Saturday, 10 June 2017

An unusual upgrade

I have mentioned in previous posts that in my 4 years with OmniTI we've tackled a lot of migrations. Most of them follow the "typical" procedure, and the methodology we use is more or less explained here. Last week we had a use case for a rather "unusual" upgrade: a 9.2 compiled with "--disable-integer-datetimes", meaning that all datetimes were represented internally as floating-point numbers, which was the default up to 8.3. The default changed in 8.4, where datetimes are stored as 64-bit integers (int64), which offers a uniform microsecond precision.
The requirement was to migrate the database to a new one that uses integer datetimes, with the minimum possible downtime. A direct upgrade wouldn't work, since pg_upgrade requires the old and new clusters to use the same datetime storage, and pg_dump / restore was not an option because of the downtime it implies, so we decided to approach and test this scenario differently.
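
To make the precision difference concrete: both representations count from 2000-01-01, the float build as seconds in a double and the integer build as microseconds in an int64. The Python sketch below (Python 3.9+ for math.ulp; the 365.25-day year and the helper names roundtrips and offset_us are illustrative assumptions, not PostgreSQL code) checks whether a microsecond value survives a round trip through float seconds.

```python
import math

SECONDS_PER_YEAR = 31_557_600  # 365.25 days, an approximation for sample offsets


def roundtrips(microseconds: int) -> bool:
    """True if an int64-style microsecond count survives conversion to
    float seconds (the --disable-integer-datetimes representation) and back."""
    as_float_seconds = microseconds / 1_000_000  # nearest double to the exact value
    return round(as_float_seconds * 1_000_000) == microseconds


def offset_us(years: int, extra_us: int) -> int:
    """Microseconds since 2000-01-01 for a whole number of (approximate)
    years plus a sub-second remainder."""
    return years * SECONDS_PER_YEAR * 1_000_000 + extra_us


# Around 2017 adjacent doubles are ~6e-08 s apart: microseconds survive.
print(math.ulp(17.0 * SECONDS_PER_YEAR))
print(roundtrips(offset_us(17, 123_457)))

# Around the year 3000 adjacent doubles are ~3.8e-06 s apart, so a trailing
# microsecond is silently lost; an int64 microsecond count keeps it exactly.
print(math.ulp(1000.0 * SECONDS_PER_YEAR))
print(roundtrips(offset_us(1000, 1)))
```

Near 2017 nothing is lost yet; the point is that float precision degrades the further a value is from 2000, while int64 microseconds stay exact across the whole supported range.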

The general idea is the following:

Upgrade to a 9.6 compiled with "--disable-integer-datetimes", then use something like pglogical or mimeo to replicate to another 9.6 that uses integer datetimes. To test this, I used 2 containers and the pagila test database, to make the simulation as realistic as I could. In this post I will describe the steps I followed.

Installed both 9.2 and 9.6 on the same box:

9.2.21 with the following options:
./configure --prefix=/home/postgres/pgsql92/ --disable-integer-datetimes
make -j 8 world
sudo make install-world

9.6.2 with the following options:
./configure --prefix=/home/postgres/pgsql96/ --disable-integer-datetimes
make -j 8 world
sudo make install-world

Initialized a new 9.2 cluster, started it and loaded the pagila test database (schema and data).
From now on this acts as my production database.

Downloaded and installed pglogical 2.0.1 for 9.6:
wget -c http://packages.2ndquadrant.com/pglogical/tarballs/pglogical-2.0.1.tar.bz2
Uncompressed:
tar jxfv pglogical-2.0.1.tar.bz2
Compiled and installed:
make USE_PGXS=1 clean all
sudo PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/home/postgres/pgsql96/bin make USE_PGXS=1 install

Note: I installed pglogical for 9.6 before upgrading because pglogical requires some parameter changes (shown later) and a preloaded library; having it in place before the upgrade meant I didn't have to restart twice.

Initialized a new 9.6 cluster to upgrade into, then stopped 9.2 and upgraded it to 9.6:

Pre-upgrade check:
pgsql96/bin/pg_upgrade -b /home/postgres/pgsql92/bin/ -B /home/postgres/pgsql96/bin/ -c -d /home/postgres/pgdata92/ -D /home/postgres/pgdata96/ -v
Stopped 9.2:
-- outage starts --
pgsql92/bin/pg_ctl -D /home/postgres/pgdata92/ stop
Upgrade:
pgsql96/bin/pg_upgrade -b /home/postgres/pgsql92/bin/ -B /home/postgres/pgsql96/bin/ -d /home/postgres/pgdata92/ -D /home/postgres/pgdata96/ -v -k
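
The two pg_upgrade invocations above differ only in their last flag. As a hedged sketch, here they are wrapped in a small Python helper (the function names are hypothetical, the paths are the ones used in this post, and run() only prints the command unless dry_run=False). Note that pg_upgrade compares the date/time storage type of the two clusters and refuses to continue if they differ, which is why the intermediate 9.6 had to be built with "--disable-integer-datetimes" as well.

```python
import shlex
import subprocess

# Paths from the commands in this post; adjust them for another layout.
OLD_BIN = "/home/postgres/pgsql92/bin/"
NEW_BIN = "/home/postgres/pgsql96/bin/"
OLD_DATA = "/home/postgres/pgdata92/"
NEW_DATA = "/home/postgres/pgdata96/"


def pg_upgrade_argv(check_only: bool) -> list[str]:
    """Build the pg_upgrade command line for the check or the real run."""
    argv = [NEW_BIN + "pg_upgrade",
            "-b", OLD_BIN, "-B", NEW_BIN,
            "-d", OLD_DATA, "-D", NEW_DATA, "-v"]
    if check_only:
        # -c: only run the compatibility checks (safe with 9.2 still running).
        argv.append("-c")
    else:
        # -k: hard-link data files instead of copying them, which keeps the
        # outage short; once 9.6 has been started, 9.2 must not be used again.
        argv.append("-k")
    return argv


def run(argv: list[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(pg_upgrade_argv(check_only=True))
    # Stop 9.2 here (the outage starts), then run the real upgrade.
    run(pg_upgrade_argv(check_only=False))
```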

Added the following to postgresql.conf:
wal_level = 'logical'
max_worker_processes = 10
max_replication_slots = 10
max_wal_senders = 10
shared_preload_libraries = 'pglogical'
track_commit_timestamp = on
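
Since forgetting one of these settings means another restart, a quick sanity check of postgresql.conf can pay off. This is a deliberately simplified Python sketch: it only understands key = value lines, treats the values used in this post as the targets (they are this setup's choices, not pglogical's documented minimums), and the helper names are hypothetical.

```python
# Targets taken from the settings used in this post.
EXPECTED_EXACT = {"wal_level": "logical", "track_commit_timestamp": "on"}
EXPECTED_AT_LEAST = {
    "max_worker_processes": 10,
    "max_replication_slots": 10,
    "max_wal_senders": 10,
}


def parse_conf(text: str) -> dict[str, str]:
    """Simplified postgresql.conf reader: `key = value` lines only, trailing
    # comments and surrounding quotes stripped, the last entry wins."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip().strip("'\"")
    return settings


def problems(settings: dict[str, str]) -> list[str]:
    """List settings that differ from the targets (booleans must be spelled 'on')."""
    found = []
    for key, want in EXPECTED_EXACT.items():
        if settings.get(key) != want:
            found.append(f"{key} should be {want!r}, got {settings.get(key)!r}")
    for key, minimum in EXPECTED_AT_LEAST.items():
        value = settings.get(key, "")
        if not value.isdigit() or int(value) < minimum:
            found.append(f"{key} should be at least {minimum}, got {value!r}")
    libraries = [lib.strip() for lib in settings.get("shared_preload_libraries", "").split(",")]
    if "pglogical" not in libraries:
        found.append("shared_preload_libraries must include 'pglogical'")
    return found


POST_CONFIG = """
wal_level = 'logical'
max_worker_processes = 10
max_replication_slots = 10
max_wal_senders = 10
shared_preload_libraries = 'pglogical'
track_commit_timestamp = on
"""

print(problems(parse_conf(POST_CONFIG)))  # []
```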

On the master's pg_hba.conf (and the slave's, because why not) I added:
host    replication     postgres        10.0.0.1/16             trust
(security was not a concern, so "trust" was OK)

Started 9.6
-- outage stops --
Analyzed the new 9.6 cluster, cleaned up the old 9.2 one
and issued "CREATE EXTENSION pglogical;" in the database to be replicated.

At this point I had my "production" db upgraded to 9.6, with pglogical installed and everything ready for logical replication.

The second box had postgres 9.6 compiled without the "--disable-integer-datetimes" flag. I installed pglogical there exactly the same way as on the first box, and at this point I was ready to replicate:

On production (the provider) I created a new node and added all tables in the public schema to the default replication set:
SELECT pglogical.create_node( node_name := 'provider1', dsn := 'host=10.0.0.2 port=5432 dbname=monkey' );
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

Keep in mind:
ERROR:  table payment_p2007_01 cannot be added to replication set default
DETAIL:  table does not have PRIMARY KEY and given replication set is configured to replicate UPDATEs and/or DELETEs
HINT:  Add a PRIMARY KEY to the table
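
To find the tables that will trigger this error before adding them to a replication set, the catalogs can be queried for ordinary tables without a primary key constraint. A minimal Python sketch, assuming a DB-API driver with the %s parameter style such as psycopg2; tables_without_pk is a hypothetical helper name:

```python
# Ordinary tables (relkind 'r') in the given schema with no 'p' (primary key) constraint.
MISSING_PK_SQL = """
SELECT n.nspname, c.relname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = %s
  AND NOT EXISTS (SELECT 1
                  FROM pg_constraint con
                  WHERE con.conrelid = c.oid
                    AND con.contype = 'p')
ORDER BY c.relname
"""


def tables_without_pk(cursor, schema: str = "public") -> list[str]:
    """Return schema-qualified names of tables in `schema` lacking a PRIMARY KEY.

    `cursor` is any DB-API cursor using the %s paramstyle (e.g. psycopg2)."""
    cursor.execute(MISSING_PK_SQL, (schema,))
    return [f"{nspname}.{relname}" for nspname, relname in cursor.fetchall()]
```

For example, with psycopg2 one would connect to the provider (host=10.0.0.2 dbname=monkey) and pass conn.cursor() to tables_without_pk; the result would include payment_p2007_01, the table from the error above.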

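ALL tables that are going to be replicated need a primary key (more precisely, when the replication set replicates UPDATEs and/or DELETEs, as the default set does).
After adding a PK to the tables that didn't have one, I went to the slave and did the following: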

Created a node for the subscriber:
SELECT pglogical.create_node(node_name := 'subscriber1',dsn := 'host=10.0.0.3 port=5432 dbname=monkey');

Started the replication process, which synced both schema and data:
SELECT pglogical.create_subscription( subscription_name := 'subscription1',provider_dsn := 'host=10.0.0.2 port=5432 dbname=monkey', synchronize_structure := true);
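
create_subscription only registers the subscription; the initial schema and data copy is done by a background worker. A hedged Python sketch for waiting until the subscription is streaming, assuming pglogical 2.x's pglogical.show_subscription_status() function and its status values ("initializing" while syncing, "replicating" afterwards, "down" or "disabled" on trouble); wait_until_replicating is a hypothetical helper and the cursor is assumed to be connected to the subscriber.

```python
import time

# Assumed pglogical 2.x function returning one row per subscription.
STATUS_SQL = "SELECT status FROM pglogical.show_subscription_status(%s)"
FAILED = {"down", "disabled"}


def wait_until_replicating(cursor, subscription: str = "subscription1",
                           timeout_s: float = 600.0, poll_s: float = 5.0) -> None:
    """Poll until the subscription reports 'replicating'.

    Raises RuntimeError on a failed state and TimeoutError if it is still
    not replicating after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        cursor.execute(STATUS_SQL, (subscription,))
        row = cursor.fetchone()
        status = row[0] if row else None
        if status == "replicating":
            return
        if status in FAILED:
            raise RuntimeError(f"subscription {subscription!r} is {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"subscription {subscription!r} still {status!r}")
        time.sleep(poll_s)
```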

To verify that these 2 databases use different date/time storage types:

postgres@old_server:~/pgdata96$ pg_controldata |grep Date
Date/time type storage:               floating-point numbers

postgres@new_server:~/pgdata$ pg_controldata |grep Date
Date/time type storage:               64-bit integers
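
When checking more than a couple of servers, the same check can be scripted. A small Python sketch that parses pg_controldata output like the lines above (the helper names are hypothetical, and it assumes untranslated English output); from SQL, SHOW integer_datetimes; reports the same property as on/off.

```python
def datetime_storage(controldata_output: str) -> str:
    """Return the 'Date/time type storage' value from pg_controldata output."""
    for line in controldata_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Date/time type storage":
            return value.strip()
    raise ValueError("no 'Date/time type storage' line found")


def uses_integer_datetimes(controldata_output: str) -> bool:
    return datetime_storage(controldata_output) == "64-bit integers"


old_server = "Date/time type storage:               floating-point numbers"
new_server = "Date/time type storage:               64-bit integers"
print(uses_integer_datetimes(old_server))  # False
print(uses_integer_datetimes(new_server))  # True
```

The text can come from subprocess.run(["pg_controldata", datadir], capture_output=True, text=True).stdout, with datadir pointing at the cluster's data directory.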

The database was transferred and is now being replicated.
If this were the real deal, it would be preferable to transfer the schema first and then start replication, just to be sure that all objects get transferred, but in my case I didn't really care about that.

Keep in mind that since postgres 10, floating-point datetimes are no longer supported. If you try to compile it with that flag, configure fails:

postgres@a1bdb0750dc5:~/postgresql-10beta1$ ./configure --disable-integer-datetimes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking which template to use... linux
configure: error: --disable-integer-datetimes is no longer supported 


This is a pretty specialized scenario, but from what I've seen there are some databases out there with their datetimes stored as floating point, which could hopefully benefit from this migration procedure.


Thanks for reading
-- Vasilis Ventirozos

3 comments:

  1. Just curious why something like the below was not an option:

    pg_dumpall(9.6/bin/) | psql -d pg06_db

    The procedure shown in the post seems to be the long way to get to the same place. Of course I may be missing something obvious.

    1. This scenario was meant to simulate a migration of a live production database while having transactions running with minimal downtime. Minimal downtime being the key point here, I don't see how this is possible with pg_dump.

  2. Hi,

    Why don't you use Slony in such situations? I have upgraded a few clusters using it with great success... You could have jumped from 9.2 to 9.6 directly, without the third cluster in the middle...
