Tuesday, 16 September 2014

Offsite replication problems and how to solve them.

 Those of us who use (and abuse) replication in daily basis know how cool and flexible it is. I've seen a lot of guides on how to setup streaming replication in 5 minutes, how to setup basic archiving and/or wal shipping replication but i haven't seen many guides combining these or implementing an offsite setup simulating latency, packet corruption, and basically what happens under network degradation.

In this post i will describe a resilient replication setup of 2 nodes and i will put it to the test. For this post i will use 2 debian VMs, PostgreSQL 9.4 beta2, OmniPITR 1.3.2 and netem.
Netem can be found on all current (2.6+) distributions and it can emulate variable delay, loss, duplication and re-ordering.

The Basics

Streaming replication is awesome, its fast , easy to setup, lightweight and near to realtime, but how it performs over the internet ?

I setup a simple streaming replica, set wal_segments and wal_keep_segments low (10 and 5). Now i wanna emulate how it will perform over a slow internet connection :
From lab2 and as root :
# tc qdisc add dev eth0 root tbf rate 20kbit buffer 1600 limit 3000

This will emulate an "almost" network outage limiting eth0 to 20kbit.
Next, hammer lab1 with transactions... a bit later :
FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 00000001000000000000000A has already been removed
Makes sense right ? lab2 couldn't keep up with lab1, lab1 rotated all xlogs and the replica is now broken. I know that this example is a bit extreme, these settings would never be used for an offsite replica, in all fairness they aren't even suitable for a local replica.
But ! network outages happen, and especially on geographically distributed databases this WILL happen and because :
Matter will be damaged in direct proportion to its value.

So, lets configure something that will tolerate such failures and it will do much, much more.

Requirements
First of all, we want postgres to archive , we want wal files to be transferred compressed and on an encrypted channel. For all my wal management i will use OmniPITR , a wal management suite written by OmniTI that is simple to use, has almost no dependencies and makes everything so much better, i will use rsync over ssh to transfer my wals, bellow the archive_command and the recovery.conf entries for streaming + WAL shipping replication.
(please keep in mind that these settings are for the sake of this post, they are not to be used directly on production systems)

archive_command = '/home/postgres/omnipitr/bin/omnipitr-archive -D /opt/postgres/pgdata/ -dr gzip=postgres@lab2:/opt/postgres/walarchive -l /home/postgres/archive.log -v  "%p"'           

recovery.conf :
standby_mode = 'on'
primary_conninfo = 'user=postgres host=lab1 port=5432'
restore_command = '/home/postgres/omnipitr/bin/omnipitr-restore -l /home/postgres/restore.log -s gzip=/opt/postgres/walarchive -f /home/postgres/failover.now -p /home/postgres/pause.removal -t /tmp -ep hang -pp /opt/postgres/psql/bin/pg_controldata  -sr -r -v %f %p'

Again, with the same wal settings i hammered the database with transactions, the replica started having delays because of the 20kbit limitation but eventually it caught up and everything was ok.

OmniPITR is hands down awesome, it can do much more than just archive and restore wals. You can delay your replica,  you can take hot backups from the slave, that can even be encrypted on creation, you can output the backup from a slave directly to an offsite backup server with no need for extra disk space on the replica and more..
(On some occasions directly sending WALS to the slave might create issues, if you get into this situation, remember that you can always archive wals locally to the master and schedule a script to transfer all wals generated in time intervals that serve your needs.)

This all proves that WAL shipping replication on top of streaming is still very usable when it comes to remote/offsite slaves and you can always switch to only SR or only WAL shipping without problems.
Afterall, its good to know that in case of a network outage, your only limitation is disk space for archived wals.

Feedback and ideas for future posts are always welcome :)

Thanks for reading
- Vasilis