Friday, 30 October 2015

PostgreSQL 9.4 streaming replication over SSL with Replication Slots

I never really liked the concept of keeping wal files in case the replica goes down proactively, setting wal_keep_segments was always a tricky number and most of the time we ended up setting this for the worst case scenario, something that means space consumption for no (under normal circumstances) good reason.
That’s why i really like replication slots.

Replication slots are, and i am quoting Robert Haas :

"a crash-safe data structure which can be created on either a master or a standby to prevent premature removal of write-ahead log segments needed by a standby, as well as (with hot_standby_feedback=on) pruning of tuples whose removal would cause replication conflicts. Slots have some advantages over existing techniques, as explained in the documentation."

While there are a lot of posts about how to setup replication slots, and to be honest its pretty easy, in this post i will combine this with replication over SSL and i will also show a useful trick for pg_basebackup (props to my friend Denish Patel).

The setup that will serve this post is the following

2 Debian VMS, with minimal install with PostgreSQL 9.4.5 installed on both.
we will call them :

master01 the ip will be and it will act as Master with $PGDATA at /data/.
slave01 the ip will be and it will act as Slave with $PGDATA at /data/.

on Master:
We set SSL and replication settings to :

ssl = on
ssl_cert_file = '/data/ssl/server.crt'
ssl_key_file = '/data/ssl/server.key'
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 0
max_replication_slots = 5
hot_standby = on  

hot_standby is ignored on master but we plan getting a pg_basebackup so it will be used on slave.

Before we restart the Master, we have to create the certificate files :

mkdir $PGDATA/ssl ; chown postgres /data/ssl
cd /data/ssl/
openssl req -new -text -out server.req
openssl rsa -in privkey.pem -out server.key
rm privkey.pem
openssl req -x509 -in server.req -text -key server.key -out server.crt
chmod og-rwx server.key

More info about the certifications can be found here.

Now we need a proper replication user a replication slot and a replication entry on pg_hba.conf.
for the user :
psql -c "create role repuser with login replication password 'password';" postgres
for the slot :
psql -c "SELECT * FROM pg_create_physical_replication_slot('slave01_slot');" postgres
and pg_hba.conf should loot like this :

# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   all             all                                     trust
host    all             all               trust
host    all             vasilis          md5
hostssl replication     repuser          md5
host    all             all                  reject

Two things that need attenction in pg_hba.conf :
1. hostssl for the replication user
2. keep your "reject everyone else" as your last entry.

With all these set on Master , time to restart and start working on slave.

On slave:

Connect to the replication slot BEFORE you initiate pg_basebackup. This way Master will keep all wal files needed for the duration of the backup. Connection to the replication slot can be done by using pg_receivexlog eg:

> pg_receivexlog -D . -S slave01_slot -v -h -U repuser -W
pg_receivexlog: starting log streaming at 0/3E000000 (timeline 1)
pg_receivexlog: finished segment at 0/3F000000 (timeline 1)
^Cpg_receivexlog: received interrupt signal, exiting
pg_receivexlog: received interrupt signal, exiting

notice that i pressed ^C, i stopped pg_receivexlog and with this way the wal files will remain to the Master until my slave is up.

now run pg_basebackup :
pg_basebackup -h -D $PGDATA -P -X stream -U repuser

When pg_basebackup is complete, add a recovery.conf with the following entries :

standby_mode = 'on'
primary_conninfo = 'host= port=5432 user=repuser password=password sslmode=require'
trigger_file = '/data/failover.trigger'
recovery_target_timeline = 'latest'
primary_slot_name = 'slave01_slot'

and start the slave.
you should see:
database system is ready to accept read only connections

Now, in order to monitor your replication there are two useful sql statements :
I run the first one while my slave is down so as you can see it shows how much data (wal data) have to be transferred to my slave.

postgres=# SELECT slot_name, database,
active,  pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_replication_slots;
  slot_name   | database | active | retained_bytes
 slave01_slot |          | f      |      201326688
(1 row)

and after i started the slave :

postgres=# SELECT slot_name, database, active,pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_replication_slots;
  slot_name   | database | active | retained_bytes
 slave01_slot |          | t      |              0
(1 row)

the second monitoring statement is :

postgres=#  SELECT pg_xlog_location_diff(pg_current_xlog_insert_location(), flush_location) AS lag_bytes, pid, application_name
 FROM pg_stat_replication;

lag_bytes | pid | application_name
         0 | 855 | walreceiver

This will show you how many bytes your slave is behind.
(both statements are to be run on Master unless cascading replicas are in place)

Thanks for reading,
Vasilis Ventirozos