Tuesday, 20 December 2016

Traveling in time(lines) with pg_rewind

pg_rewind was introduced in PostgreSQL 9.5. With it, you can make a server that is no longer the master follow a promoted standby that has a new timeline.
There are at least a few use cases that could benefit from this feature, to name a couple :

Testing failovers : It's pretty common for a customer to ask for a failover simulation just to make sure that everything works. With pg_rewind this is much easier and much faster, because we don't have to re-sync the whole database, which is sometimes multi-TB in size.

Minor version upgrades : during an upgrade you have to put the new binaries in place and restart the database. With pg_rewind, if the application is flexible enough, you could upgrade the slave, switch over, upgrade the old master and pg_rewind it to follow the new master, minimising the downtime even further.


pg_rewind has a couple of requirements :

1. data page checksums enabled (initdb -k), or the parameter wal_log_hints set to on
2. The old master (the one we want to re-sync) must be shut down cleanly, or you'll get:
target server must be shut down cleanly
Failure, exiting
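You can check the first requirement on a running cluster from psql; a minimal check, using the read-only data_checksums setting (which reflects whether initdb -k was used) and the wal_log_hints parameter:

```sql
-- on, if the cluster was initialized with initdb -k
SHOW data_checksums;
-- must be on if checksums are off (changing it requires a restart)
SHOW wal_log_hints;
```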

How does it work ?

pg_rewind scans the old master's data directory, finds the data blocks changed during the switchover and then copies only the necessary blocks from the promoted slave. Keep in mind that the configuration files will also be copied from the new master, so check for misconfigurations before you start it! You also need all the WAL files since the last checkpoint on the old master. Changes are identified by comparing the state of the data blocks in the data directory with the changes logged in the WAL files. Once the changed blocks are identified and copied, the WAL is replayed to bring everything to a consistent state.
For more info refer to the documentation here

How to do it

Recently I posted about how to use Docker to set up a lab environment; the link can be found here. I'm going to use two containers for this blogpost, using the image created from the Dockerfile that I have. Keep in mind that in my Dockerfile, I make sure that initdb is run with the -k option, that all the parameters are properly set for replication, and that archiving is enabled but not doing anything :

psql -c "show archive_command";
(1 row)

Before I set up my slave, I create a wal_archive directory and change the archive_command to simply archive there :
mkdir ~/wal_archive
psql -c "alter system set archive_command to 'cp %p /home/postgres/wal_archive/%f' ;" postgres
psql -c "select pg_reload_conf() " postgres
(1 row)
psql -c "select pg_switch_xlog() ;" postgres
(1 row)
psql -c "checkpoint;" postgres

Now, from the slave :

mkdir ~/wal_archive ; ~/mk_replica.sh
waiting for server to shut down.... done
server stopped
29284/29284 kB (100%), 1/1 tablespace
server starting
LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "pg_log".
-[ RECORD 1 ]----+--------
lag_bytes        | 0
pid              | 106
application_name | a_slave

Next, we promote the slave :
pg_ctl promote ; sleep 5 ; psql -c "select pg_is_in_recovery();" postgres
server promoting
(1 row)

And now, time to pg_rewind, from the old master :

pg_ctl stop
waiting for server to shut down.... done
server stopped
pg_rewind --target-pgdata=/home/postgres/pgdata --source-server="host= port=5432 user=postgres dbname=postgres"

servers diverged at WAL position 0/F000098 on timeline 1
rewinding from last common checkpoint at 0/F000028 on timeline 1
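Those WAL positions are plain byte offsets, so you can check the distance between the divergence point and the last common checkpoint yourself, using the same function the lag query uses later:

```sql
-- bytes between the divergence point and the last common checkpoint
SELECT pg_xlog_location_diff('0/F000098', '0/F000028');  -- 112
```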

Put a recovery.conf on the rewound master (now a slave) and start it up; check that it connected to the new master and you should be all good.
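For reference, a minimal recovery.conf for this step could look like the following; the host, user and application_name values are placeholders for your own environment:

```
standby_mode = 'on'
primary_conninfo = 'host=<new-master-ip> port=5432 user=postgres application_name=old_master'
restore_command = 'cp /home/postgres/wal_archive/%f %p'
```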

postgres=# SELECT pg_xlog_location_diff(pg_current_xlog_insert_location(), flush_location) AS lag_bytes, pid, application_name FROM pg_stat_replication;
 lag_bytes | pid | application_name
-----------+-----+------------------
         0 | 139 | old_master

Most of the production database clusters that I've come across don't use data checksums, which is understandable because of the performance penalty checksums carry (as an alternative to checksums, at OmniTI we use ZFS on OmniOS a lot, and we love it). Still, pg_rewind is a nice feature; it may not be for every database, but for those who use it, it can save a ton of time that would otherwise be spent re-syncing an old master.

Thanks for reading
-- Vasilis Ventirozos

Wednesday, 14 December 2016

Adding columns with default value fast

I recently came across a discussion in the #postgresql channel about how to add a new column representing record insert timestamps to a table, with minimal locking.

Normally you would do something like this :

monkey=# alter table alter_test add column insert_timestamp timestamp without time zone default now();
Time: 13643.983 ms
monkey=# create index concurrently insert_timestamp_idx on alter_test (insert_timestamp) where insert_timestamp is not null;
Time: 1616.108 ms

This, though, requires an AccessExclusiveLock on the table, and the application would have to wait for the duration of the alter.
What you could do instead is :

monkey=# alter table alter_test add column insert_timestamp timestamp without time zone;
Time: 1.135 ms
monkey=# alter table alter_test alter column insert_timestamp set default now();
Time: 1.238 ms
monkey=# create index concurrently insert_timestamp_idx on alter_test (insert_timestamp) where insert_timestamp is not null;
Time: 1618.788 ms

As you can see the timings are completely different: in the first case the database has to rewrite every pre-existing row, which isn't really needed since the backfilled value is a dummy anyway (it defaults to the transaction time of the ALTER, not the real insert time).
The second way also requires an AccessExclusiveLock, but it doesn't have to change any data, since it leaves the old rows untouched, so the lock should be almost instant (assuming no blockers exist).
Next you alter the table to add the default value, and from then on the new column will get the default for every new row. If you don't want nulls, or you want to add a not null constraint to the column, you can update the old (null) rows in a more progressive and non-intrusive way later on.
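Such a progressive backfill could be sketched like this, assuming the table and column from above; the batch size and the ctid trick are just one way to keep each transaction short. Rerun it until it reports UPDATE 0:

```sql
-- backfill the NULL rows in small batches; repeat until UPDATE 0
UPDATE alter_test
   SET insert_timestamp = now()
 WHERE ctid = ANY (ARRAY(
         SELECT ctid FROM alter_test
          WHERE insert_timestamp IS NULL
          LIMIT 10000));
```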

Thanks for reading
- Vasilis Ventirozos

Tuesday, 6 December 2016

Custom Postgres installation with Docker

Lately, I wanted to make a new test / lab environment that would be dispensable, fast to deploy and easily customisable. VMs are all nice for lab use and they have served me well over the years, but they tend to be slow(er) to deploy and, let's face it, heavy on resources.

So I decided to play around with Docker. I had played with Docker in the past but hadn't done anything too serious; I just took the official postgres Dockerfile and ran it to see how it works. That's how I started this whole Docker exploration: I took the official postgres Dockerfile, which can be found here, and made an image. I immediately saw that it wasn't exactly what I wanted; I wanted something less generic. The things I would like are :

  • latest postgres version, compiled from source
  • easily customisable pgdata and prefix
  • custom postgresql.conf settings (mostly having it replication ready)
  • custom shell environment
  • some extensions
  • data page checksums
  • easy replica deployment
So I started creating my own Dockerfile that would fill my needs and at the same time teach me some more about Docker; two birds, one stone. After several hours and some testing I came up with something that works (link at the bottom). It still needs some work, but I plan on maintaining it for my own use, so feel free to use it if you like what I've done there.

Let me explain a bit how it works.
With :
ENV PGBINDIR=/home/postgres/pgsql
ENV PGDATADIR=/home/postgres/pgdata
I can customise the installation and the pgdata directories. I install all the packages I need (note that comments on joe will be deleted!). Then I add the postgres user to sudoers (this Dockerfile is not meant to be secure; ease of use is what I'm looking for in a lab postgres instance).
I always get the latest postgres sources, compile and install them (including extensions, documentation etc.), set some environment variables, configure postgres (postgresql.conf, pg_hba.conf), put a sample recovery.done in place so slaves can get it from pg_basebackup, install some extensions in template1, run initdb with checksums, create a replication user and finally copy a very simple replica-creation script into the image.
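As a rough sketch (not the exact contents of my Dockerfile; the base image and package list here are illustrative), the top of such a file might look like:

```dockerfile
FROM debian:jessie

# build dependencies for compiling postgres from source (illustrative list)
RUN apt-get update && apt-get install -y \
    build-essential libreadline-dev zlib1g-dev bison flex sudo

# customisable installation and data directories
ENV PGBINDIR=/home/postgres/pgsql
ENV PGDATADIR=/home/postgres/pgdata

# a lab image, so the postgres user gets passwordless sudo
RUN useradd -m postgres && \
    echo "postgres ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
USER postgres
WORKDIR /home/postgres
```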

How to make it work
With Docker installed it's pretty simple :

to create dockers own playground network
docker network create --subnet= garden   
to build the image, from the directory you have the Dockerfile downloaded
docker build -t pg_dev .  
to run a container with a specific ip in our kindergarten docker network:
docker run --name master --net garden --ip -i -t pg_dev /bin/bash
docker run --name slave1 --net garden --ip -i -t pg_dev /bin/bash

and docker ps should look like this :

CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
7a38d4b11769        pg_dev              "/bin/bash"         About a minute ago   Up About a minute   5432/tcp            slave1
2c0266f942ea        pg_dev              "/bin/bash"         About a minute ago   Up About a minute   5432/tcp            master

Keep in mind that I started playing around with Docker just two days ago; there might be a better way to assign static IPs that I'm not aware of (yet).

Now, say that you have put some data into the master and you want to create a replica on slave1; something like this should work :

From any slave container home dir :
postgres@7a38d4b11769:~$ ./mk_replica.sh
waiting for server to shut down.... done
server stopped
29044/29044 kB (100%), 1/1 tablespace
server starting
LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "pg_log".

-[ RECORD 1 ]----+--------
lag_bytes        | 0
pid              | 37
application_name | a_slave

The Dockerfile can be found here and hopefully it can help some people set up a lab fast and easily.

Thanks for reading.
- Vasilis Ventirozos
