Wednesday, 22 January 2014

Setting up Postgres Automated Failover

Nowadays, failover is one of the simplest procedures in postgres. Still, there are not many ways to automate this procedure and perform it without any human interference. This blog post will guide you through a way to implement automated failover by using two open source projects, repmgr2 and pgbouncer.

PGbouncer will act like a connection pool and as an entry point to the database cluster.
Repmgr now, (since version 2) supports automatic failover, the slave will try to connect to the master for a predefined time and number of attempts, if it fails, it will run a script that can be anything. We are going to use this functionality to perform auto-failover, redirect our application connections to the new master database, and notify the dba that a failover has been performed.

Each server will have its own copy of pgbouncer configuration file(s) and when a server is being promoted, the new master also has the responsibility to overwrite the configuration file on the server that pgbouncer runs (

In my example I used:
3 nodes (3 debian net install virtual machines)
$PGDATA configured on each node
3 postgres installations. one on each node
repmgr2-beta2 or later

bouncer witness and pgbouncer server, will act as connection to our database cluster.
pgnode1 postgres database server 1
pgnode2 postgres database server 2

I will use these hosts as examples and as a point of reference for all the configuration files, lets start. . .

Install postgres on all nodes, set passwordless ssh between the three nodes for a user that has access to write in $PGDATA and run postgres binaries.

Install repmgr on all nodes, and pgbouncer only on bouncer.
On pgnode1 set the following in $PGDATA/postgresql.conf

wal_level = 'hot_standby'
archive_mode = on
archive_command = '/bin/false'
max_wal_senders = 10
wal_keep_segments = 5           # ----- FOR MY EXAMPLE 5, suggested by repmgr is bellow -------
#### wal_keep_segments = 5000   # 80 GB required on pg_xlog
hot_standby = on
shared_preload_libraries = 'repmgr_funcs'

and in $PGDATA/pg_hba.conf:
host    all             repmgr             trust
host    replication     repmgr           trust

Bring PostgreSQL up on pgnode1 and run:

createuser -s repmgr
createdb -O repmgr repmgr
psql -f </usr/share/postgresql/9.0>/contrib/repmgr_funcs.sql repmgr

On pgnode2 run:
repmgr -d repmgr -U repmgr -h pgnode1 standby clone
(this will use rsync)

Start postgres on pgnode2, you should have a new slave.
edit a repmgr.conf , whereever you like; in my case /etc/repmgr/repmgr.conf and add:

conninfo='host= dbname=repmgr user=repmgr'
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'

Each node will need its own repmgr.conf file: check github link for all my config files.
on pgnode1:
repmgr -f /etc/repmgr/repmgr.conf master register

on pgnode2:
repmgr -f /etc/repmgr/repmgr.conf standby register
on bouncer (witness server):repmgr -d repmgr -U repmgr -h -D PGDATA -f /etc/repmgr/repmgr.conf witness create

on pgnode2 and on bouncer run:
repmgrd -f /etc/repmgr/repmgr.conf > /var/log/postgresql/repmgr.log 2>&1 
(this is the deamon for repmgr)

Put the configuration file of pgbouncer from the master (pgnode1) to bouncer (pgbouncer server). Restart and test connectivity. If you stop pgnode1's postgres, pgnode2 will takeover, rsync configuration file for pgbouncer to bouncer, restart pgbouncer with ssh and mail a notification.

The main idea behind this implementation is that pgbouncer is the only entry point from the application to the database, each node upon promotion is responsible to “show” pgbouncer where it should now connect, this way, the application can remain unaware of  the current master db.
An alternative implementation could skip pgbouncer completely and each slave could directly change the configuration file of the application. In my example, I used a pgbouncer for 3 reasons: to ensure stability, to enforce connection pooling (and its advantages), and also to skip the assumption that the database stack has access to the application stack, which in my opinion shouldn’t.

  1. If the master is running repmgrd and db crashes then the deamon will exit because it can’t connect to the db, needs a way to restart, possibly with postgres startup.
  2. Bouncer in our example is a single point of failure, witness db can be configured as pgnode1/2 with repmgr and a second pgbouncer on a second machine. The application should take care on how to use these 2 pgbouncers.
  3. runs pg_ctl promote which is a postgresql 9.1 feature, for versions prior to 9.1 you may “touch” a trigger file.

Github link of all the configuration files used


Thanks for reading
--Vasilis Ventirozos

Managing xid wraparound without looking like a (mail) chimp

My colleague Payal came across an outage that happened to mailchimp's mandrill app yesterday, link can be found HERE . Since this was P...