Friday, 19 May 2017

Reusing an old master (as slave) after a switchover

Todays blogpost is old news but it's not very well known and it has to do with how to add an old master after a slave has been promoted. Fujii Masao explained the situation in his patch back in the day.

So in todays post i will be demonstrating a combination of replication slots for retaining the writes on a promoted slave and how to re-introduce an old master to the replication.

Say that we have a master-slave setup, stop the master gracefully and create and activate a replica slot on the slave :

monkey=# SELECT * FROM pg_create_physical_replication_slot('this_is_a_replica_slot');
       slot_name        | xlog_position
------------------------+---------------
 this_is_a_replica_slot |
(1 row)

postgres@bf9823730feb:~$ pg_receivexlog -D . -S this_is_a_replica_slot -v -h 10.0.0.3 -U repuser
pg_receivexlog: starting log streaming at 0/4000000 (timeline 1) ^C
pg_receivexlog: received interrupt signal, exiting
pg_receivexlog: not renaming "000000010000000000000004.partial", segment is not complete
pg_receivexlog: received interrupt signal, exiting


This way all the changes after the slave promotion will be retained. Now , lets promote the slave and check the replication slot status.

postgres@bf9823730feb:~$ pg_ctl promote
server promoting

postgres@bf9823730feb:~$ psql -c "SELECT slot_name, database,active,  pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_replication_slots;" postgres
       slot_name        | database | active | retained_bytes
------------------------+----------+--------+----------------
 this_is_a_replica_slot |          | f      |           9056

As you can see , the new master is now retaining all writes. Now, to the old master, lets make a recovery.conf that looks like this :

standby_mode = 'on'
primary_conninfo = 'user=repuser host=10.0.0.3 port=5432 application_name=a_slave'
trigger_file = '/home/postgres/pgdata/finish.recovery'
primary_slot_name = 'this_is_a_replica_slot'
recovery_target_timeline = 'latest'

where host is obviously the new master. 
Start the old master and you should see something similar to:

LOG:  entering standby mode
LOG:  consistent recovery state reached at 0/5000098
LOG:  database system is ready to accept read only connections
LOG:  invalid record length at 0/5000098: wanted 24, got 0
LOG:  fetching timeline history file for timeline 2 from primary server
LOG:  started streaming WAL from primary at 0/5000000 on timeline 1
LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 1 at 0/5000098.
LOG:  new target timeline is 2
LOG:  restarted WAL streaming at 0/5000000 on timeline 2
LOG:  redo starts at 0/5000098


This would also work if you had archiving enabled instead of replication slots. Given that you have plenty of available disk space on the new master, it should be fine to keep the old master down for any reasonable amount of time and re-introduce it to the replication without issues.

The only 2 things that you have to make sure is to shutdown postgres on old-master gracefully and that the slave has caught up before the promotion.


Thanks for reading.




No comments:

Post a Comment