Setting up Postgres Automated Failover

- January 22, 2014

Nowadays, failover is one of the simplest procedures in postgres. Still, there are not many ways to automate this procedure and perform it without any human interference. This blog post will guide you through a way to implement automated failover by using two open source projects, repmgr2 and pgbouncer.

PGbouncer will act like a connection pool and as an entry point to the database cluster.

Repmgr now, (since version 2) supports automatic failover, the slave will try to connect to the master for a predefined time and number of attempts, if it fails, it will run a script that can be anything. We are going to use this functionality to perform auto-failover, redirect our application connections to the new master database, and notify the dba that a failover has been performed.

Each server will have its own copy of pgbouncer configuration file(s) and when a server is being promoted, the new master also has the responsibility to overwrite the configuration file on the server that pgbouncer runs (failover.sh).

In my example I used:

3 nodes (3 debian net install virtual machines)

$PGDATA configured on each node

3 postgres installations. one on each node

repmgr2-beta2 or later

pgbouncer-1.5.4

rsync

sshd

Hostnames:

bouncer 192.168.0.16 witness and pgbouncer server, will act as connection to our database cluster.

pgnode1 192.168.0.202 postgres database server 1

pgnode2 192.168.0.203 postgres database server 2

I will use these hosts as examples and as a point of reference for all the configuration files, lets start. . .

Install postgres on all nodes, set passwordless ssh between the three nodes for a user that has access to write in $PGDATA and run postgres binaries.

Install repmgr on all nodes, and pgbouncer only on bouncer.

On pgnode1 set the following in $PGDATA/postgresql.conf

listen_addresses='*'
wal_level = 'hot_standby'
archive_mode = on
archive_command = '/bin/false'
max_wal_senders = 10
wal_keep_segments = 5           # ----- FOR MY EXAMPLE 5, suggested by repmgr is bellow -------
#### wal_keep_segments = 5000   # 80 GB required on pg_xlog
hot_standby = on
shared_preload_libraries = 'repmgr_funcs'

and in $PGDATA/pg_hba.conf:
host    all             repmgr             192.168.0.0/16          trust
host    replication     repmgr      192.168.0.0/16 trust

Bring PostgreSQL up on pgnode1 and run:

createuser -s repmgr
createdb -O repmgr repmgr
psql -f </usr/share/postgresql/9.0>/contrib/repmgr_funcs.sql repmgr

On pgnode2 run:

repmgr -d repmgr -U repmgr -h pgnode1 standby clone
(this will use rsync)

Start postgres on pgnode2, you should have a new slave.

edit a repmgr.conf , whereever you like; in my case /etc/repmgr/repmgr.conf and add:

cluster=my_cluster
node=1
#priority=-1
node_name=pgnode1
conninfo='host=192.168.0.202 dbname=repmgr user=repmgr'
master_response_timeout=50

reconnect_attempts=5
reconnect_interval=1
failover=automatic
promote_command='/home/vasilis/configs/failover.sh'
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'

Each node will need its own repmgr.conf file: check github link for all my config files.

on pgnode1:

repmgr -f /etc/repmgr/repmgr.conf master register

on pgnode2:repmgr -f /etc/repmgr/repmgr.conf standby register

on bouncer (witness server):repmgr -d repmgr -U repmgr -h 192.168.0.16 -D PGDATA -f /etc/repmgr/repmgr.conf witness create

on pgnode2 and on bouncer run:

repmgrd -f /etc/repmgr/repmgr.conf > /var/log/postgresql/repmgr.log 2>&1

(this is the deamon for repmgr)

Put the configuration file of pgbouncer from the master (pgnode1) to bouncer (pgbouncer server). Restart and test connectivity. If you stop pgnode1's postgres, pgnode2 will takeover, rsync configuration file for pgbouncer to bouncer, restart pgbouncer with ssh and mail a notification.

Conclusion

The main idea behind this implementation is that pgbouncer is the only entry point from the application to the database, each node upon promotion is responsible to “show” pgbouncer where it should now connect, this way, the application can remain unaware of the current master db.

An alternative implementation could skip pgbouncer completely and each slave could directly change the configuration file of the application. In my example, I used a pgbouncer for 3 reasons: to ensure stability, to enforce connection pooling (and its advantages), and also to skip the assumption that the database stack has access to the application stack, which in my opinion shouldn’t.

notes:

If the master is running repmgrd and db crashes then the deamon will exit because it can’t connect to the db, needs a way to restart, possibly with postgres startup.
Bouncer in our example is a single point of failure, witness db can be configured as pgnode1/2 with repmgr and a second pgbouncer on a second machine. The application should take care on how to use these 2 pgbouncers.
Failover.sh runs pg_ctl promote which is a postgresql 9.1 feature, for versions prior to 9.1 you may “touch” a trigger file.

Github link of all the configuration files used

Thanks for reading

--Vasilis Ventirozos

Comments

Thomas23 January 2014 at 15:00
Ti Kanis Vasilis,
I am currently looking into your procedure for automated failover in pgsql...I am currently using 9.2 of postgres...probably shouldn't make much of a difference as so far...but my question to you is this...

Have you had any issues with Repmgr? I've been doing some digging and haven't seen too much...Just wanted to know your experiences with it in the past weeks...

Regards,
Thomas
ReplyDelete
Replies
Evol Monkey24 January 2014 at 06:10
Hey Thomas, you shouldn't have any issues with 9.2, the only thing that is version specific is the way i command the failover (pg_ctl promote) and that's 9.1+.
As you probably know repmgr started having automatic failover functionality since version 2, which at the moment is at beta2 phase. I haven't found any issues or any problems but this was on a lab so i can't really assure you that this will be perfect in production, the other project that performs automatic failover is handyrep, to be honest i found some problems with handyrep, and because this might be implemented on a real customer i chose repmgr because it is much more simple and definitely more straight forward.
If you need further info you can always reach me at #postgresql in freenode (nickname evol_monkey).
ReplyDelete
Replies
Ari1 March 2014 at 02:34
excellent, man
you saved my day!!
ReplyDelete
Replies
Yaron Naveh (MVP)9 May 2014 at 10:27
Great post! One question: Is pgbouncer a must here? I can configure a vip or dns for the master db node and then on failover the new master will start listening on this interface. This way there is no need to update the application config files - they always go to the vip.
ReplyDelete
Replies
Evol Monkey12 May 2014 at 11:26
Hey, no, pgbouncer is there just to abstract the connection string, you could get exactly the same functionality with VIP.
ReplyDelete
Replies
Unknown28 September 2014 at 04:44
eVOL, I had some problems trying to read this blog entry, the entire text is showing up to dark for me, besides that this blog post gave to me a lot of clarification about pgbouncer and repmgr, thanks!
ReplyDelete
Replies
Nitu10 November 2015 at 05:09
Hi,
Thanks for the excellent tutorial.
Hope you are still around this blog !
I am trying to configure failover as per this blog on ubuntu.
I installed repmgr using $ sudo apt-get install repmgr
However, I can't locate repmgr_funcs.sql as described in the step -

psql -f /contrib/repmgr_funcs.sql repmgr

Can you please help?
ReplyDelete
Replies
Nitu10 November 2015 at 10:23
I am still in dark !!
ReplyDelete
Replies
Evol Monkey10 November 2015 at 11:09
i checked the package, there isn't an sql file, try continue without adding the functions, they might be created when you register a new master.

Soon i will post a post about repmgr 3.0.2 by the way
ReplyDelete
Replies
Nitu10 November 2015 at 19:16
I shall try. Thank you. shall keep an eye for your 3.0.2 blog.
ReplyDelete
Replies
Nitu10 November 2015 at 19:26
Added optional dependency $sudo apt-get install postgresql-9.3-repmgr and voila postgresql service starts !!
ReplyDelete
Replies
Kieran Alen15 March 2016 at 08:05
Thanks for the great post. It helped me lot to configure the postgres failover.
I have only one question. In one configuration file of "bouncer/repmgr.conf", "promote_command.sh" is mentioned. But I haven't found any "promote_command.sh" in attachments.

Can you please provide some details on that file.

Thanks.
ReplyDelete
Replies
Dan19 May 2017 at 08:43
Thanks for this post:) I have recreated this environment as in your post. Unfortunately, the penultimate step can not be executed.

>>>>>>repmgr -d repmgr -U repmgr -h 192.168.0.16 -D PGDATA -f /etc/repmgr/repmgr.conf witness create<<<<<< on PgBouncer Server.

I get the following error: [ERROR] Cannot insert node details, FEHLER: Relation »repmgr_my_cluster.repl_nodes« existiert nicht
LINE 1: INSERT INTO repmgr_my_cluster.repl_nodes(id, cluster, name, ...

I hope you can support me after this time. In which step, the mentioned table is created?
>>repmgr_my_cluster.repl_nodes<<

I hope you can help me. Many Thanks. Dan

ReplyDelete
Replies

Add comment

Evol Monkey

Setting up Postgres Automated Failover

Comments

Post a Comment

Popular posts from this blog

Tuning checkpoints

AWS Aurora Postgres, not a great first impression

Accessing PostgreSQL data from AWS Lambda