Wednesday, 22 January 2014

Setting up Postgres Automated Failover

Nowadays, failover is one of the simplest procedures in Postgres. Still, there are not many ways to automate this procedure and perform it without any human interference. This blog post will guide you through one way to implement automated failover using two open source projects, repmgr (version 2) and pgbouncer.

PgBouncer will act as a connection pool and as the entry point to the database cluster.
Repmgr, since version 2, supports automatic failover: the slave will try to connect to the master for a predefined time and number of attempts and, if that fails, it will run a script, which can be anything. We are going to use this functionality to perform the failover, redirect our application connections to the new master database, and notify the DBA that a failover has been performed.

Each server will have its own copy of the pgbouncer configuration file(s), and when a server is promoted, the new master is also responsible for overwriting the configuration file on the server where pgbouncer runs (failover.sh).
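A minimal sketch of what such a failover.sh could look like (every path, hostname and mail address below is an assumption of mine, not the exact script from the github link):

```shell
#!/bin/bash
# failover.sh -- sketch of a repmgr promote_command; all paths, hostnames
# and the mail address are assumptions, adjust them to your setup

PGDATA=/var/lib/postgresql/9.1/main            # assumed data directory
BOUNCER=192.168.0.16                           # the pgbouncer/witness host
LOCAL_CONF=/home/vasilis/configs/pgbouncer.ini # this node's copy, pointing at itself
REMOTE_CONF=/etc/pgbouncer/pgbouncer.ini       # assumed location on bouncer
DBA=dba@example.com                            # hypothetical address

do_failover () {
    # promote this standby to master (PostgreSQL 9.1+)
    pg_ctl -D "$PGDATA" promote

    # overwrite the pgbouncer config on bouncer with this node's copy
    rsync "$LOCAL_CONF" "$BOUNCER:$REMOTE_CONF"

    # restart pgbouncer remotely so new connections hit the new master
    ssh "$BOUNCER" "/etc/init.d/pgbouncer restart"

    # notify the dba that a failover happened
    echo "failover performed, new master: $(hostname)" \
        | mail -s "postgres failover" "$DBA"
}

# the real failover.sh would end with a plain call: do_failover
```

The important part is the order: promote first, then repoint pgbouncer; that keeps the window where connections go nowhere as short as possible.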

In my example I used:
3 nodes (3 debian net install virtual machines)
$PGDATA configured on each node
3 postgres installations, one on each node
repmgr2-beta2 or later
pgbouncer-1.5.4
rsync
sshd

Hostnames:
bouncer 192.168.0.16 witness and pgbouncer server; will act as the entry point to our database cluster.
pgnode1 192.168.0.202 postgres database server 1
pgnode2 192.168.0.203 postgres database server 2


I will use these hosts as examples and as a point of reference for all the configuration files. Let's start...

Install postgres on all nodes and set up passwordless ssh between the three nodes for a user that has write access to $PGDATA and can run the postgres binaries.
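The key exchange could look something like this, run on each of the three nodes as the postgres OS user (a sketch; the key type and the availability of ssh-copy-id are my assumptions):

```shell
# generate a key pair if none exists yet, then push the public key
# to every node in the cluster (hostnames from the example above)
setup_ssh_keys () {
    [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    for host in bouncer pgnode1 pgnode2; do
        ssh-copy-id "$host"   # appends the key to the remote authorized_keys
    done
}

# on a real node you would simply call: setup_ssh_keys
```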

Install repmgr on all nodes, and pgbouncer only on bouncer.
On pgnode1 set the following in $PGDATA/postgresql.conf

listen_addresses='*'
wal_level = 'hot_standby'
archive_mode = on
archive_command = '/bin/false'
max_wal_senders = 10
wal_keep_segments = 5           # ----- FOR MY EXAMPLE 5, suggested by repmgr is below -------
#### wal_keep_segments = 5000   # 80 GB required on pg_xlog
hot_standby = on
shared_preload_libraries = 'repmgr_funcs'


and in $PGDATA/pg_hba.conf:
host    all             repmgr             192.168.0.0/16          trust
host    replication     repmgr      192.168.0.0/16           trust

Bring PostgreSQL up on pgnode1 and run:

createuser -s repmgr
createdb -O repmgr repmgr
psql -f </usr/share/postgresql/9.0>/contrib/repmgr_funcs.sql repmgr

On pgnode2 run:
repmgr -d repmgr -U repmgr -h pgnode1 standby clone
(this will use rsync)

Start postgres on pgnode2; you should now have a new slave.
Create a repmgr.conf wherever you like, in my case /etc/repmgr/repmgr.conf, and add:

cluster=my_cluster
node=1
#priority=-1
node_name=pgnode1
conninfo='host=192.168.0.202 dbname=repmgr user=repmgr'
master_response_timeout=50
reconnect_attempts=5
reconnect_interval=1
failover=automatic
promote_command='/home/vasilis/configs/failover.sh'
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'

Each node will need its own repmgr.conf file; check the github link for all my config files.
on pgnode1:
repmgr -f /etc/repmgr/repmgr.conf master register

on pgnode2:
repmgr -f /etc/repmgr/repmgr.conf standby register
 
on bouncer (witness server):
repmgr -d repmgr -U repmgr -h 192.168.0.16 -D PGDATA -f /etc/repmgr/repmgr.conf witness create

on pgnode2 and on bouncer run:
repmgrd -f /etc/repmgr/repmgr.conf > /var/log/postgresql/repmgr.log 2>&1 
(this is the daemon for repmgr)

Copy the pgbouncer configuration file from the master (pgnode1) to bouncer (the pgbouncer server), then restart pgbouncer and test connectivity. If you stop postgres on pgnode1, pgnode2 will take over, rsync its pgbouncer configuration file to bouncer, restart pgbouncer over ssh, and mail a notification.
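For reference, a minimal pgbouncer.ini along these lines would do the job (the database name, auth settings and paths are my assumptions; the host line is the part that gets rewritten on failover):

```ini
[databases]
; the application always connects to "appdb" on bouncer:6432;
; the host below is rewritten on failover to point at the new master
appdb = host=192.168.0.202 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 192.168.0.16
listen_port = 6432
auth_type = trust
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
```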


Conclusion
The main idea behind this implementation is that pgbouncer is the only entry point from the application to the database. Each node, upon promotion, is responsible for “showing” pgbouncer where it should now connect; this way, the application can remain unaware of which database is the current master.
An alternative implementation could skip pgbouncer completely and have each slave directly change the configuration file of the application. In my example I used pgbouncer for three reasons: to ensure stability, to enforce connection pooling (and its advantages), and to avoid assuming that the database stack has access to the application stack, which in my opinion it shouldn't.


notes:
  1. If the master is running repmgrd and the db crashes, the daemon will exit because it can't connect to the db; it needs a way to be restarted, possibly along with postgres startup.
  2. Bouncer in our example is a single point of failure. The witness db can be configured like pgnode1/2 with repmgr, with a second pgbouncer on a second machine; the application should then take care of how to use these two pgbouncers.
  3. Failover.sh runs pg_ctl promote, which is a PostgreSQL 9.1 feature; for versions prior to 9.1 you may “touch” a trigger file instead.
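For the pre-9.1 case, the standby's recovery.conf would carry a trigger_file setting (the path here is hypothetical), and failover.sh would touch that file instead of calling pg_ctl promote:

```conf
# recovery.conf on the standby (PostgreSQL < 9.1)
trigger_file = '/tmp/pgsql.trigger'
```

Promotion is then a matter of running `touch /tmp/pgsql.trigger` on the standby.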

Github link of all the configuration files used

 

Thanks for reading
--Vasilis Ventirozos

5 comments:

  1. Ti Kanis Vasilis,
    I am currently looking into your procedure for automated failover in pgsql...I am currently using 9.2 of postgres...probably shouldn't make much of a difference as so far...but my question to you is this...

    Have you had any issues with Repmgr? I've been doing some digging and haven't seen too much...Just wanted to know your experiences with it in the past weeks...


    Regards,
    Thomas

    ReplyDelete
  2. Hey Thomas, you shouldn't have any issues with 9.2, the only thing that is version specific is the way i command the failover (pg_ctl promote) and that's 9.1+.
    As you probably know, repmgr started having automatic failover functionality in version 2, which at the moment is in beta2 phase. I haven't found any issues or any problems, but this was on a lab so I can't really assure you that this will be perfect in production. The other project that performs automatic failover is handyrep; to be honest I found some problems with handyrep, and because this might be implemented for a real customer I chose repmgr because it is much simpler and definitely more straightforward.
    If you need further info you can always reach me at #postgresql in freenode (nickname evol_monkey).

    ReplyDelete
  3. excellent, man
    you saved my day!!

    ReplyDelete
  4. Great post! One question: Is pgbouncer a must here? I can configure a vip or dns for the master db node and then on failover the new master will start listening on this interface. This way there is no need to update the application config files - they always go to the vip.

    ReplyDelete
  5. Hey, no, pgbouncer is there just to abstract the connection string, you could get exactly the same functionality with VIP.

    ReplyDelete