Tuesday, 15 October 2013

Parallel pg_dump backups

As I told in a previous post, i'm planning to post about the features that 9.3 brought to the game, today i will explore parallel pg_dump, how it works, how can we benefit from it and how does it compare to classic pg_dump.

First of all, pg_dump supports parallel dump only if -Fd (directory) is used, this is because directory dump  is the only format that supports multiple processes to write data at the same time. Directory format will write one file per relation and a toc.dat file, it is similar to -Fc with the output being compressed and supports parallel and selective restore.
The pg_dump switch that implements parallel dump is -j <njobs>,
when used, pg_dump will create n connections +1 (the master process).

The test:
for this test i have created 10 tables with 20million rows each. as a baseline i will use -Fc and then i will use -Fd -j increasing the number of jobs by 1 each time. the disk that was used was a simple 7.2k rpm sata3 disk connected over usb3.

> time pg_dump -Fc -f out.sql -d dump
real    4m58.831s
now with -Fd -j
> time pg_dump -Fd -j 1 -f out.sql1 -d dump
real    1m10.280s
> time pg_dump -Fd -j 2 -f out.sql2 -d dump
real    0m43.596s
> time pg_dump -Fd -j 3 -f out.sql3 -d dump
real    0m38.801s
> time pg_dump -Fd -j 4 -f out.sql4 -d dump
real    0m33.708s
> time pg_dump -Fd -j 5 -f out.sql5 -d dump
real    0m31.040s
> time pg_dump -Fd -j 6 -f out.sql6 -d dump
real    0m30.501s
> time pg_dump -Fd -j 60 -f out.sql60 -d dump
real    0m30.522s

Obviously the last run was just to show that you cannot use any degree of parallelism, on an I7 2600 (4 core + 4 multi threaded). After 6 concurrent jobs it had no difference. Still, we started with 1:10 and we end up with less than the half time.

Now, this feature is by nature not usable for everyone, in my test i had 10 equal tables with 20m rows each , if i had 1 large table and several small ones then it wouldn't scale the same. Also, I was talking with a friend about this feature and he said: Who does pg_dump now days anyway? Everyone is doing live backups
- That's partially true, but pg_dump always will have its use.
I think this feature is great, maybe not the greatest for terabytes of enterprise databases but its good for those who use pg_dump for backups. Test systems, tight on diskspace installations, databases with little network bandwidth.

Thanks for reading
- Vasilis

Tuesday, 8 October 2013

The cost of Reliability

At 9th of September PostgreSQL 9.3 was released with many exciting new features. This post is the first out of many (?) that will explore these new features and will try to evaluate them. I will start with data checksums, a feature that makes possible for PostgreSQL to checksum data pages and report corruptions.
I always liked how PostgreSQL doesn't cut corners when it comes to reliability, even when it (and it usually does) comes with a cost in performance. but how does this work and how much is this cost ? Thats why I'm writing this post :)

Checksums are set immediately prior to flush out of shared buffers
and checked when pages are read in again.

I used 2 db clusters one with the feature enabled and one without it, same config file and 3 basic operations 1 insert 2 updates, the results :

No checksums:

monkey_no_sum=# insert into test_of_sum (id) select generate_series (1,10000000);
INSERT 0 10000000
Time: 26226.654 ms
monkey_no_sum=# update test_of_sum set name = 'The Monkey';
UPDATE 10000000
Time: 51903.185 ms
monkey_no_sum=# update test_of_sum set name = 'The Monkey', address = 'somewhere in London';
UPDATE 10000000
Time: 70125.720 ms

With checksums:

monkey_sum=# insert into test_of_sum (id) select generate_series (1,10000000);
INSERT 0 10000000
Time: 23724.373 ms
monkey_sum=# update test_of_sum set name = 'The Monkey';
UPDATE 10000000
Time: 74446.178 ms
monkey_sum=# update test_of_sum set name = 'The Monkey', address = 'somewhere in London';
UPDATE 10000000
Time: 78987.092 ms

Apart from the performance impact, where obviously there is an extra overhead when you checksum all data pages, there are other things that make this great feature not easy to use. First of all, its a cluster setting. Which means if you want it on your second db you cannot have it in the same data dir. Also you cannot disable it once the cluster is created.

Don't get me wrong, I LOVE having this ! I think its absolutely brilliant that i have the option to use it when i can. but that's the problem, i can't :(
If this feature was object wide (tables / indexes) then it would have application in production systems , being up to the hand of the dba where to enable it and where not to. Even if it was database wide, then i would enable it on a secondary database that i run on the same server which i dont care about performance. Or even turning this on and off would be very useful.

There are good reasons why this feature doesn't work the way i would like, its because the whole thing happens in shared buffers and shared buffers are a cluster entity. I'm pretty positive though that this is just the beginning and that we will soon see a switch on-off operation.
Until then enjoy having this on small dbs or when reliability REALLY matters.

-- Thank for reading

Managing xid wraparound without looking like a (mail) chimp

My colleague Payal came across an outage that happened to mailchimp's mandrill app yesterday, link can be found HERE . Since this was P...