
Parallel pg_dump backups

As I mentioned in a previous post, I'm planning to write about the features that 9.3 brought to the game. Today I will explore parallel pg_dump: how it works, how we can benefit from it, and how it compares to classic pg_dump.

First of all, pg_dump supports parallel dumps only when the directory format (-Fd) is used, because that is the only output format that allows multiple processes to write data at the same time. The directory format writes one file per relation plus a toc.dat file; it is similar to -Fc in that the output is compressed and it supports parallel and selective restore. The pg_dump switch that implements parallel dumping is -j <njobs>; when used, pg_dump will open n connections plus one for the master process.

The test: for this test I created 10 tables with 20 million rows each. As a baseline I will use -Fc, and then I will use -Fd -j, increasing the number of jobs by 1 each time. The disk that was used was a simple 7.2k rpm SATA3 disk connected over USB3.
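As a rough sketch of the commands behind this kind of comparison (the database name, output paths, and job count here are placeholders, not the exact ones from my run):

# baseline: classic custom-format dump, single process
pg_dump -Fc -f /backups/monkey.dump monkey_db

# parallel dump: directory format with 4 worker jobs (plus the master connection)
pg_dump -Fd -j 4 -f /backups/monkey_dir monkey_db

# a directory-format dump can also be restored in parallel
pg_restore -j 4 -d monkey_db_restored /backups/monkey_dir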

The cost of Reliability

On the 9th of September, PostgreSQL 9.3 was released with many exciting new features. This post is the first of many (?) that will explore these new features and try to evaluate them. I will start with data checksums, a feature that makes it possible for PostgreSQL to checksum data pages and report corruption. I have always liked how PostgreSQL doesn't cut corners when it comes to reliability, even when that comes (and it usually does) with a cost in performance. But how does this work, and how big is that cost? That's why I'm writing this post :)

Checksums are set immediately prior to flushing pages out of shared buffers and are checked when pages are read in again. I used 2 db clusters, one with the feature enabled and one without it, the same config file, and 3 basic operations: 1 insert and 2 updates. The results:

No checksums:

monkey_no_sum=# insert into test_of_sum (id) select generate_series (1,10000000);
INSERT 0 10000000
Time: 26226.654 ms

monkey_no_sum=# update test_of_sum se
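For reference, here is a minimal sketch of how two such clusters could be initialized; the data directory paths are placeholders, and --data-checksums is the initdb switch that enables the feature in 9.3:

# cluster without checksums (the default)
initdb -D /data/monkey_no_sum

# cluster with data-page checksums enabled
initdb --data-checksums -D /data/monkey_sum

Note that the choice can only be made at initdb time and applies to the whole cluster; there is no per-table or post-hoc toggle in 9.3.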