Posted on 2007-06-30 16:00:00
Tags: PostgreSQL, Slony, Databases, Replication
Our PostgreSQL servers, called Janus and Kermit, use Slony replication to make sure we have a backup copy of the data. And if we need to do work on Janus, we fall over to the databases on Kermit and everything is fine for the users.
One of our databases uses inherited objects, and tables in that database use a number of standard fields and a lot of table specific fields. Recently we added a field to that parent object. Replication went fine, Slony can handle this.
One of the issues we have with that database is that we can't "vacuum full" it, which means that it will grow and grow and grow. At a certain moment the partition becomes full and we need to fall over to the other server and drop the database and replicate it back. This happens about once every two months, it takes fives minutes and everything is fine again.
Except for today... This is the parent object in the master database:
And this is an inherited object:mail=# \d barnet_objects Table "public.barnet_objects" Column | Type | Modifiers -------------------+---------+------------------------------------------ id | bigint | not null default nextval('barnet_object_i object_type | integer | default 0 owner | bigint | default (0)::bigint creator | bigint | default (0)::bigint acl | bigint | default (0)::bigint notforpublication | boolean | default false
The experienced eye can see that the field notforpublication is the new field in the barnet_object.mail=# \d chambers Table "public.chambers" Column | Type | Modifiers -------------------+---------+------------------------------------------ id | bigint | not null default nextval('barnet_object_i object_type | integer | default 6 owner | bigint | default (0)::bigint creator | bigint | default (0)::bigint acl | bigint | default (0)::bigint name | text | description | text | network_code | text | [...] ldap_address | text | notforpublication | boolean | default false
pgdump gives the following command to create the chambers table:
And it creates this table in the database:CREATE TABLE chambers ( object_type integer DEFAULT 6, name text, description text, network_code text, [...] ldap_address text ) INHERITS (barnet_objects);
As you can see, the order is different. And Slony replicates based on the order of fields and now complains about...Table "public.chambers" Column | Type | Modifiers -------------------+---------+------------------------------------------ id | bigint | not null default nextval('barnet_object_i object_type | integer | default 6 owner | bigint | default (0)::bigint creator | bigint | default (0)::bigint acl | bigint | default (0)::bigint notforpublication | boolean | default false name | text | description | text | network_code | text | [...] ldap_address | text |
DEBUG3 remoteWorkerThread_4: table "public"."chambers" does not require Slony-I serial key DEBUG4 remoteWorkerThread_4: Begin COPY of table "public"."chambers" ERROR remoteWorkerThread_4: copy from stdin on local node - PGRES_FATAL_ERROR ERROR: invalid input syntax for type boolean: "BARNET" CONTEXT: COPY chambers, line 1, column notforpublication: "BARNET" WARN remoteWorkerThread_4: data copy for set 1 failed - sleep 60 seconds
So instead of a five minute outage this afternoon, we'll have a three-four-five-six hour outage during the night, in which I have to pgdump | psql the data to the now-slave-database and start replication on that one back to the now-master:
COPY chambers (id, object_type, "owner", creator, acl, name, description, network_code, pop_server, smtp_server, mail_domain, proxy_server, imap_address, ldap_address, notforpublication) FROM stdin; 10818 6 1 961 0 BARNET BarNet Internal N022 pop.barnet.com.au smtp.barnet.com.au barnet.com.au proxy.barnet.com.au imap.barnet.com.au ldap.barnet.com.au f