Wednesday, 22 January 2014

Setting up Postgres Automated Failover

Nowadays, failover is one of the simplest procedures in Postgres. Still, there are not many ways to automate it so that it runs without any human intervention. This blog post will guide you through one way to implement automated failover using two open source projects, repmgr2 and pgbouncer.

PgBouncer will act as a connection pool and as the single entry point to the database cluster.
repmgr (since version 2) supports automatic failover: the standby will try to reconnect to the master for a predefined time and number of attempts, and if it fails it will run a script, which can be anything. We are going to use this functionality to perform the failover, redirect our application connections to the new master database, and notify the DBA that a failover has been performed.

Each server will have its own copy of the pgbouncer configuration file(s), and when a node is promoted, the new master is also responsible for overwriting the configuration file on the server where pgbouncer runs (this happens in failover.sh).
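
For reference, the pgbouncer side looks roughly like this. This is a minimal sketch: the database alias, auth settings and paths are my assumptions, not copied from the actual setup.

; /etc/pgbouncer/pgbouncer.ini on bouncer -- points to the current master (pgnode1 initially)
[databases]
appdb = host=192.168.0.202 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 192.168.0.16
listen_port = 6432
auth_type = trust
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session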

In my example I used:
3 nodes (3 debian net install virtual machines)
$PGDATA configured on each node
3 postgres installations, one on each node
repmgr2-beta2 or later
pgbouncer-1.5.4
rsync
sshd

Hostnames:
bouncer 192.168.0.16 witness and pgbouncer server; will act as the entry point to our database cluster.
pgnode1 192.168.0.202 postgres database server 1
pgnode2 192.168.0.203 postgres database server 2


I will use these hosts as examples and as a point of reference for all the configuration files. Let's start...

Install Postgres on all nodes, and set up passwordless ssh between the three nodes for a user that can write to $PGDATA and run the Postgres binaries.
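
Passwordless ssh is just the usual key exchange; roughly like this, assuming the OS user is called postgres (adjust to your environment):

# run on each of the three nodes as the postgres user
ssh-keygen -t rsa -N ""
ssh-copy-id postgres@pgnode1
ssh-copy-id postgres@pgnode2
ssh-copy-id postgres@bouncer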

Install repmgr on all nodes, and pgbouncer only on bouncer.
On pgnode1 set the following in $PGDATA/postgresql.conf

listen_addresses='*'
wal_level = 'hot_standby'
archive_mode = on
archive_command = '/bin/false'
max_wal_senders = 10
wal_keep_segments = 5           # ----- 5 for my example; the value suggested by repmgr is below -------
#### wal_keep_segments = 5000   # 80 GB required on pg_xlog
hot_standby = on
shared_preload_libraries = 'repmgr_funcs'


and in $PGDATA/pg_hba.conf:
host    all             repmgr          192.168.0.0/16          trust
host    replication     repmgr          192.168.0.0/16          trust

Bring PostgreSQL up on pgnode1 and run:

createuser -s repmgr
createdb -O repmgr repmgr
psql -f </usr/share/postgresql/9.0>/contrib/repmgr_funcs.sql repmgr

On pgnode2 run:
repmgr -d repmgr -U repmgr -h pgnode1 standby clone
(this will use rsync)

Start Postgres on pgnode2; you should now have a new slave.
Create a repmgr.conf wherever you like (in my case /etc/repmgr/repmgr.conf) and add:

cluster=my_cluster
node=1
#priority=-1
node_name=pgnode1
conninfo='host=192.168.0.202 dbname=repmgr user=repmgr'
master_response_timeout=50
reconnect_attempts=5
reconnect_interval=1
failover=automatic
promote_command='/home/vasilis/configs/failover.sh'
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'

Each node will need its own repmgr.conf file; check the github link at the end for all my config files.
On pgnode1:
repmgr -f /etc/repmgr/repmgr.conf master register

On pgnode2:
repmgr -f /etc/repmgr/repmgr.conf standby register

On bouncer (the witness server):
repmgr -d repmgr -U repmgr -h 192.168.0.16 -D $PGDATA -f /etc/repmgr/repmgr.conf witness create

On pgnode2 and on bouncer run:
repmgrd -f /etc/repmgr/repmgr.conf > /var/log/postgresql/repmgr.log 2>&1
(this is the repmgr daemon)

Copy the pgbouncer configuration file from the master (pgnode1) to bouncer (the pgbouncer server), restart pgbouncer and test connectivity. If you now stop Postgres on pgnode1, pgnode2 will take over: it will rsync its pgbouncer configuration file to bouncer, restart pgbouncer over ssh, and mail a notification.
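
To give an idea of what failover.sh does, here is a minimal sketch. The paths, the pgbouncer restart command and the mail address are assumptions on my part; the real script is in the github repo linked below.

#!/bin/bash
# promote this standby to master (PostgreSQL 9.1+)
pg_ctl -D /opt/pgdata promote

# push this node's pgbouncer config (pointing to itself) to the bouncer host
rsync -a /home/vasilis/configs/pgbouncer.ini 192.168.0.16:/etc/pgbouncer/pgbouncer.ini

# restart pgbouncer so it starts sending connections to the new master
ssh 192.168.0.16 "/etc/init.d/pgbouncer restart"

# notify the dba
echo "$(hostname) was promoted to master" | mail -s "PostgreSQL failover" dba@example.com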


Conclusion
The main idea behind this implementation is that pgbouncer is the only entry point from the application to the database; each node, upon promotion, is responsible for "showing" pgbouncer where it should now connect. This way the application can remain unaware of which database is the current master.
An alternative implementation could skip pgbouncer completely and have each slave directly change the configuration file of the application. In my example I used pgbouncer for three reasons: to ensure stability, to get connection pooling (and its advantages), and to avoid the assumption that the database stack has access to the application stack, which in my opinion it shouldn't.


notes:
  1. If the master is running repmgrd and the database crashes, the daemon will exit because it can no longer connect to the db; it needs a way to be restarted, possibly together with Postgres at startup.
  2. Bouncer in this example is a single point of failure. The witness db can be configured like pgnode1/2 with repmgr, with a second pgbouncer on a second machine; the application would then have to take care of how to use these two pgbouncers.
  3. failover.sh runs pg_ctl promote, which is a PostgreSQL 9.1 feature; for versions prior to 9.1 you can "touch" a trigger file instead (see the sketch below).
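
A sketch of the pre-9.1 alternative mentioned in note 3 (the trigger file path is an assumption):

# recovery.conf on the standby
trigger_file = '/tmp/pg_failover_trigger'

# in failover.sh, instead of pg_ctl promote:
touch /tmp/pg_failover_trigger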

Github link to all the configuration files used

 

Thanks for reading
--Vasilis Ventirozos

Monday, 30 December 2013

pgreplay-ing logfiles

Recently I wanted to find a way to rerun a workload (from logfiles) in Postgres. The reasons? Benchmarking, testing, troubleshooting.
My colleague Denish reminded me of a project called pgreplay. pgreplay reads a PostgreSQL logfile, extracts the SQL statements and executes them in the same order, with (or without) the original timing, against a PG database.

I downloaded it, compiled it and went straight to testing.

The setup:

log_min_messages = error  (or more)
   (if you know that you have no cancel requests, 'log' will do)
log_min_error_statement = log  (or more)
log_connections = on
log_disconnections = on
log_line_prefix = '%m|%u|%d|%c|'  (if you don't use CSV logging)
log_statement = 'all'
lc_messages must be set to English (the encoding does not matter)
bytea_output = escape  (from version 9.0 on, only if you want to replay
                        the log on 8.4 or earlier)

It is highly recommended that you use CSV logging, because anything that
the PostgreSQL server or any loaded modules write to standard error will
be written to the stderr log and might confuse the parser.
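
Running it is simple; roughly what I did (a hedged sketch, check the pgreplay documentation for the exact options of your version):

# parse a CSV logfile into a replay file, then replay it against a test server
pgreplay -f -c -o workload.replay postgresql.csv
pgreplay -r -h testhost -p 5432 workload.replay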


For small and simple workloads it seemed to work fine. Below is the output from a simple workload:

Replay statistics
=================

Speed factor for replay: 1.000
Total run time: 29.700 seconds
Maximum lag behind schedule: 0 seconds
Calls to the server: 8
(0.269 calls per second)
Total number of connections: 1
Maximum number of concurrent connections: 1
Average number of concurrent connections: 1.000
Average session idle percentage: 99.680%
SQL statements executed: 6
(0 or 0.000% of these completed with error)
Maximum number of concurrent SQL statements: 1
Average number of concurrent SQL statements: 0.003
Average SQL statement duration: 0.016 seconds
Maximum SQL statement duration: 0.038 seconds
Statement duration histogram:
0 to 0.02 seconds: 83.333%
0.02 to 0.1 seconds: 16.667%
0.1 to 0.5 seconds: 0.000%
0.5 to 2 seconds: 0.000%
over 2 seconds: 0.000%


So far so good. I wanted to see what would happen when concurrency kicked in, so I tried with pgbench: with 2 concurrent clients everything was OK, so I scaled it up a bit to 16 concurrent transactions, and then these nice messages started showing up:
"Execution is 30 minutes behind schedule"

The reason? Locks. Lots and lots of locks! But let's take a closer look at what happened...
While pgreplay was running, trying to handle my logfile the best way it could, a simple lock investigation query was enough to show me that I would never get the successful prompt I wanted:

monkey=#   SELECT bl.pid     AS blocked_pid,
monkey-#          a.usename  AS blocked_user,
monkey-#          kl.pid     AS blocking_pid,
monkey-#          ka.usename AS blocking_user,
monkey-#          a.query    AS blocked_statement
monkey-#    FROM  pg_catalog.pg_locks         bl
monkey-#     JOIN pg_catalog.pg_stat_activity a  ON a.pid = bl.pid
monkey-#     JOIN pg_catalog.pg_locks         kl ON kl.transactionid = bl.transactionid AND kl.pid != bl.pid
monkey-#     JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid
monkey-#    WHERE NOT bl.granted;

 blocked_pid | blocked_user | blocking_pid | blocking_user |                          blocked_statement                          
-------------+--------------+--------------+---------------+----------------------------------------------------------------------
        7398 | vasilis      |         7403 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7398 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7398 | vasilis      |         7404 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7398 | vasilis      |         7402 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7398 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7403 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7404 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7398 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7404 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7402 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7398 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7403 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7402 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7409 | vasilis      |         7404 | vasilis       | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;


Deadlock. The reason? (quoting the pgreplay documentation):

While pgreplay makes sure that commands are sent to the server in the order in which they were originally executed, there is no way to guarantee that they will be executed in the same order during replay: Network delay, processor contention and other factors may cause a later command to "overtake" an earlier one. While this does not matter if the commands don't affect each other, it can lead to SQL statements hitting locks unexpectedly, causing replay to deadlock and "hang". This is particularly likely if many different sessions change the same data repeatedly in short intervals.

And that was my case: lots of processes were changing the same data.
For the record, I also tried it against a simple setup on Postgres with 2 users each doing their own thing, and there pgreplay worked fine.

Conclusion:
OK, pgreplay didn't do the job for me, and I'm pretty sure that if I got the logs from a production DB I would run into deadlocks. Still, messing with it a bit was worthwhile: I could use it for replaying simple workload logfiles on a different machine, or just because it's a cool project that does what it says, and you never know when you might need something like this.



Wednesday, 18 December 2013

A Problem Geeked Into Big Data

Today my article got published in OmniTI Seeds. It was a very interesting project, I had a lot of fun doing it, and even if it's not exactly Postgres related, I wouldn't have been able to do it without Postgres. It began from the basic everyday operation that all DBAs do (log parsing / analysis) but it got to a whole different level. I hope you like it.

A Problem Geeked Into Big Data

cooking with fdw

Back in 9.1 a feature called foreign data wrappers was introduced: you could basically connect a PostgreSQL server to remote data stores (other databases, csv files etc). Today I plan to play with FDWs and see what we can use them for and how.

First of all, there are many FDWs; a list can be found here.

For the sake of this post I used the following data wrappers:

     Name     | Version |   Schema   |                    Description                    
--------------+---------+------------+----------------------------------------------------
 file_fdw     | 1.0     | public     | foreign-data wrapper for flat file access
 postgres_fdw | 1.0     | public     | foreign-data wrapper for remote PostgreSQL servers
 redis_fdw    | 1.0     | public     | Foreign data wrapper for querying a Redis server


file_fdw and postgres_fdw are both available in contrib; redis_fdw can be found in the following link. You will also need to compile and install hiredis, which can be found here. I won't go into much detail on how to install these, since there is plenty of documentation available.

The plan is to connect a redis server, a remote PostgreSQL server and a csv file, and interact with all of them.

The first thing that has to be done, after the extensions have been created, is to create the foreign servers and tables. The following example shows what I did for the remote PostgreSQL server:

test_redis=# CREATE SERVER node_2
FOREIGN DATA WRAPPER postgres_fdw  
OPTIONS (host '192.168.0.3', dbname 'postgres');     
CREATE SERVER
test_redis=# CREATE USER MAPPING FOR PUBLIC SERVER node_2;
CREATE USER MAPPING
test_redis=# CREATE FOREIGN TABLE test_pg (id int)
SERVER node_2 OPTIONS (table_name 'test_pg');
CREATE FOREIGN TABLE


The csv and redis foreign tables were similarly easy to create.
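
Roughly like this (a hedged sketch: the column lists and the redis connection options are assumptions on my part; the filename and table names match the rest of the post):

-- file_fdw: the csv file as a table (filename taken from the explain plan further down,
-- column list inferred from the query output)
CREATE EXTENSION file_fdw;
CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE pgcsv (id int, key text, date date, time time, num int)
    SERVER csv_server OPTIONS (filename '/home/vasilis/lala.csv', format 'csv');

-- redis_fdw: a foreign table on redis database 15 (hence the name db15)
CREATE SERVER redis_server FOREIGN DATA WRAPPER redis_fdw
    OPTIONS (address '127.0.0.1', port '6379');
CREATE USER MAPPING FOR PUBLIC SERVER redis_server;
CREATE FOREIGN TABLE db15 (key text, value text)
    SERVER redis_server OPTIONS (database '15');

When everything was in place, I had the following relations: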

 public | db15       | foreign table | vasilis
 public | pgcsv      | foreign table | vasilis
 public | test       | table         | vasilis
 public | test2_pg   | foreign table | vasilis

One local table, and one foreign table for each of my data wrappers. So, let's start playing around.

test_redis=# begin;
BEGIN
test_redis=# insert into test2_pg values ('koko','lala');
INSERT 0 1
test_redis=# commit;
COMMIT



Yep, I could insert into a remote PG server, in a transactional way. Goodbye dblink? (oh yes!)
Unfortunately this doesn't work on redis, but only because the FDW doesn't support writes (yet?).
Now let's join them all and see how that works.

test_redis=# select * from db15,pgcsv,test2_pg,test where db15.key=test2_pg.key and pgcsv.key=db15.key and db15.key=test.key and test2_pg.key=db15.key;
 key  | value | id | key  |    date    |   time   | num | key  | value | key  | value 
------+-------+----+------+------------+----------+-----+------+-------+------+--------
 koko | koko1 |  1 | koko | 2013-12-01 | 00:01:01 | 234 | koko | lala  | koko | ninini


The statement itself is not something you would actually use, but its execution plan demonstrates that all the foreign tables have been accessed:

test_redis=# explain analyze select * from db15,pgcsv,test2_pg,test where db15.key=test2_pg.key and pgcsv.key=db15.key and db15.key=test.key and test2_pg.key=db15.key;
                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=111.24..156.43 rows=1 width=244) (actual time=0.981..1.031 rows=1 loops=1)
   Join Filter: (test2_pg.key = db15.key)
   Rows Removed by Join Filter: 2
   ->  Nested Loop  (cost=101.24..143.39 rows=1 width=180) (actual time=0.857..0.862 rows=1 loops=1)
         ->  Hash Join  (cost=101.11..142.55 rows=5 width=116) (actual time=0.833..0.835 rows=1 loops=1)
               Hash Cond: (test2_pg.key = pgcsv.key)
               ->  Foreign Scan on test2_pg  (cost=100.00..137.90 rows=930 width=64) (actual time=0.767..0.768 rows=1 loops=1)
               ->  Hash  (cost=1.10..1.10 rows=1 width=52) (actual time=0.049..0.049 rows=3 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 1kB
                     ->  Foreign Scan on pgcsv  (cost=0.00..1.10 rows=1 width=52) (actual time=0.033..0.042 rows=3 loops=1)
                           Foreign File: /home/vasilis/lala.csv
                           Foreign File Size: 98
         ->  Index Scan using kokoako on test  (cost=0.12..0.16 rows=1 width=64) (actual time=0.019..0.021 rows=1 loops=1)
               Index Cond: (key = test2_pg.key)
   ->  Foreign Scan on db15  (cost=10.00..13.00 rows=3 width=64) (actual time=0.071..0.162 rows=3 loops=1)
         Foreign Redis Table Size: 3
 Total runtime: 2.184 ms
(17 rows)

This is awesome from many aspects. Imagine that you have a heterogeneous environment with mongo, redis, postgres, oracle, csv logs etc.: you can put a PostgreSQL server in the middle, acting as a bridge, and create supersets, subsets, aggregations, views, temp tables, report tables and many other things. You could even limit users to access only the "bridge" Postgres, so they see exactly what you want them to see from many different systems. You can also use SQL on top of redis, whose own commands are quite limited. And because I'm starting to get a bit paranoid, I will leave it up to you how to use this :)
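
As a trivial illustration of the "bridge" idea (just a sketch; the view and the reporting role are made-up names):

CREATE VIEW redis_vs_remote AS
    SELECT r.key, r.value AS redis_value, p.value AS remote_value
    FROM db15 r
    JOIN test2_pg p USING (key);

-- this user only ever connects to the bridge database
GRANT SELECT ON redis_vs_remote TO reporting_user;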


Thanks for reading
-- Vasilis





Tuesday, 17 December 2013

scaling postgres with pgpool-II

Recently a new release of pgpool-II came out, 3.2.7 (released 2013-12-06). For connection pooling I mostly have experience with pgbouncer, and at OmniTI we use it a lot. I've heard that pgpool has some issues, and because of its increased complexity I don't know if I would recommend it as a connection pool, but pgpool does much more than that.

What is pgpool-II (from pg-pool wiki) :


pgpool-II is a middleware that works between PostgreSQL servers and a PostgreSQL database client. It is licensed under BSD license. It provides the following features.
  • Connection Pooling
pgpool-II saves connections to the PostgreSQL servers, and reuse them whenever a new connection with the same properties (i.e. username, database, protocol version) comes in. It reduces connection overhead, and improves system's overall throughput.
  • Replication
pgpool-II can manage multiple PostgreSQL servers. Using the replication function enables creating a realtime backup on 2 or more physical disks, so that the service can continue without stopping servers in case of a disk failure.
  • Load Balancing
If a database is replicated, executing a SELECT query on any server will return the same result. pgpool-II takes an advantage of the replication feature to reduce the load on each PostgreSQL server by distributing SELECT queries among multiple servers, improving system's overall throughput. At best, performance improves proportionally to the number of PostgreSQL servers. Load balance works best in a situation where there are a lot of users executing many queries at the same time.
  • Limiting Exceeding Connections
There is a limit on the maximum number of concurrent connections with PostgreSQL, and connections are rejected after this many connections. Setting the maximum number of connections, however, increases resource consumption and affect system performance. pgpool-II also has a limit on the maximum number of connections, but extra connections will be queued instead of returning an error immediately.
  • Parallel Query
Using the parallel query function, data can be divided among the multiple servers, so that a query can be executed on all the servers concurrently to reduce the overall execution time. Parallel query works the best when searching large-scale data.

The feature I'm most interested in, and the one I want to explore in this post, is load balancing. I like this feature because, without changing the application, you can elevate your replica one step and make it even more useful.

I set up 3 virtual machines that I called pgpool1, pgnode1 and pgnode2.
I did a basic debian installation and compiled pgpool-II (3.3.2) and PostgreSQL 9.3.2.
As a first step I set up basic streaming replication between pgnode1 and pgnode2 and configured pgpool1 for load balancing; all configurations can be found in my github. I won't go into much detail on how to set pgpool up: my configuration files should be a good guide, the pgpool documentation is very informative and up to date, and many before me have blogged about how to configure it.
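
For reference, the load balancing part of pgpool.conf boils down to something like this (a minimal sketch for streaming replication mode; see my github for the full files):

listen_addresses = '*'
port = 9999
backend_hostname0 = '192.168.0.202'    # master (pgnode1)
backend_port0 = 5432
backend_weight0 = 0                    # weight 0: no SELECTs are sent to the master
backend_hostname1 = '192.168.0.203'    # slave (pgnode2)
backend_port1 = 5432
backend_weight1 = 1
master_slave_mode = on
master_slave_sub_mode = 'stream'
load_balance_mode = on
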
I will focus more on the design and what you could do with this.
As we all know, streaming replication is almost instant, so separating "reads" from "writes" can be beneficial for a large variety of applications, especially ones with more reads than writes. Let's see how this works:

root@pgpool1:~# psql -p 9999 -U postgres
postgres=# create database monkey;
CREATE DATABASE
postgres=# \connect monkey;
You are now connected to database "monkey" as user "postgres".
monkey=# create table test (id int);
CREATE TABLE
monkey=# insert into test values (1);
INSERT 0 1
monkey=# select * from test;
 id
----
  1
(1 row)

With full logging enabled in the databases, let's see the logs from pgnode1 and pgnode2:

pgnode1:
LOG:  duration: 342.044 ms  statement: create database monkey;
LOG:  duration: 4.646 ms  statement: create table test (id int);
LOG:  duration: 1.733 ms  statement: insert into test values (1);

pgnode2:
LOG:  duration: 1.018 ms  statement: select * from test;

OK, it works as expected. Now, I had heard that this doesn't work if the SELECT has a space at the beginning, so I tried to test that as well:

on pgnode2 (replica) :
LOG:  duration: 0.757 ms  statement:   selecT * from test;

LOG:  duration: 0.215 ms  statement: SeLeCt   * from test /* this is my test */;

Something more complicated maybe? Something like pgbench sounds appropriate.

I won't spam more logs; it worked as it should.
But what would happen with 2 replicas? Read operations should scale and be distributed among the slaves, so I created another database cluster on pgnode2 (port 5433) and made it a replica of pgnode1,
so now I had one master running on pgnode1 and two slaves running on pgnode2 (ports 5432 and 5433).
I changed the pgpool configuration to add the new db:

backend_hostname0 = '192.168.0.202'
backend_port0 = 5432
backend_weight0 = 0
backend_data_directory0 = '/opt/pgdata'

backend_hostname1 = '192.168.0.203'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/opt/pgdata'

backend_hostname2 = '192.168.0.203'
backend_port2 = 5433
backend_weight2 = 1
backend_data_directory2 = '/opt/pgdata2'


and from pgpool1 I ran:
while true; do psql -p 9999  -c 'selecT * from test;'  -U postgres ;done

And voilà, I had my selects distributed across my 2 slaves.

pgpool has A LOT of cool features. I'm not in a position to say whether they all work as they should, but in my load balancing tests everything was smooth. pgpool also has a web based GUI worth checking out and some system views that can be used to monitor activity (an example follows); it also supports automatic failover, a watchdog, and many other neat features that I might explore in another post.
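
One handy example: connected through pgpool's port you can ask pgpool itself about the state of the backends (a quick sketch), which reports each backend's host, port, status and load balancing weight.

# talk to pgpool (port 9999), not directly to postgres
psql -p 9999 -U postgres -c 'SHOW pool_nodes;'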

Next time I will explore parallel query.

All the configurations can be found in my github (including the postgres configs that were used).


Thanks for reading
- Vasilis












Tuesday, 15 October 2013

Parallel pg_dump backups

As I said in a previous post, I'm planning to write about the features that 9.3 brought to the game. Today I will explore parallel pg_dump: how it works, how we can benefit from it, and how it compares to classic pg_dump.

First of all, pg_dump supports parallel dumps only when -Fd (directory) is used; this is because the directory format is the only one that allows multiple processes to write data at the same time. The directory format writes one file per relation plus a toc.dat file; like -Fc, the output is compressed, and it supports parallel and selective restore.
The pg_dump switch that implements parallel dump is -j <njobs>;
when used, pg_dump will open njobs connections plus one for the master process.
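
In practice it looks like this (paths and database names are just placeholders); the nice thing is that the same directory format can also be restored in parallel with pg_restore -j:

# dump with 4 parallel jobs into a directory-format dump
pg_dump -Fd -j 4 -f /backups/mydb.dir -d mydb

# restore it with 4 parallel jobs
pg_restore -j 4 -d mydb_restored /backups/mydb.dir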

The test:
For this test I created 10 tables with 20 million rows each. As a baseline I will use -Fc, and then -Fd -j, increasing the number of jobs by 1 each time. The disk used was a simple 7.2k rpm sata3 disk connected over usb3.

> time pg_dump -Fc -f out.sql -d dump
real    4m58.831s
---------
now with -Fd -j
> time pg_dump -Fd -j 1 -f out.sql1 -d dump
real    1m10.280s
> time pg_dump -Fd -j 2 -f out.sql2 -d dump
real    0m43.596s
> time pg_dump -Fd -j 3 -f out.sql3 -d dump
real    0m38.801s
> time pg_dump -Fd -j 4 -f out.sql4 -d dump
real    0m33.708s
> time pg_dump -Fd -j 5 -f out.sql5 -d dump
real    0m31.040s
> time pg_dump -Fd -j 6 -f out.sql6 -d dump
real    0m30.501s
> time pg_dump -Fd -j 60 -f out.sql60 -d dump
real    0m30.522s

Obviously the last run was just to show that you cannot use an arbitrary degree of parallelism: on an i7 2600 (4 cores plus hyperthreading), going beyond 6 concurrent jobs made no difference. Still, we started at 1:10 with a single job and ended up with less than half that time.

Now, this feature by its nature is not equally useful for everyone: in my test I had 10 equal tables with 20m rows each; if I had one large table and several small ones it wouldn't scale the same way. Also, I was talking with a friend about this feature and he said: "Who does pg_dump nowadays anyway? Everyone is doing live backups."
- That's partially true, but pg_dump will always have its uses.
I think this feature is great. Maybe not the greatest for terabyte-scale enterprise databases, but it's good for those who use pg_dump for backups: test systems, installations tight on disk space, databases with little network bandwidth.



Thanks for reading
- Vasilis

Tuesday, 8 October 2013

The cost of Reliability

On the 9th of September PostgreSQL 9.3 was released with many exciting new features. This post is the first out of many (?) that will explore these new features and try to evaluate them. I will start with data checksums, a feature that makes it possible for PostgreSQL to checksum data pages and report corruption.
I always liked how PostgreSQL doesn't cut corners when it comes to reliability, even when that comes with a cost in performance (and it usually does). But how does this work, and how much is this cost? That's why I'm writing this post :)

Checksums are set immediately prior to flush out of shared buffers
and checked when pages are read in again.
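
Data checksums are enabled at initdb time; that is how the checksummed cluster for this test was created (a sketch, the data directory path is an assumption):

initdb --data-checksums -D /opt/pgdata_sum
# or equivalently: initdb -k -D /opt/pgdata_sum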

I used 2 db clusters, one with the feature enabled and one without, the same config file, and 3 basic operations: 1 insert and 2 updates. The results:

No checksums:

monkey_no_sum=# insert into test_of_sum (id) select generate_series (1,10000000);
INSERT 0 10000000
Time: 26226.654 ms
monkey_no_sum=# update test_of_sum set name = 'The Monkey';
UPDATE 10000000
Time: 51903.185 ms
monkey_no_sum=# update test_of_sum set name = 'The Monkey', address = 'somewhere in London';
UPDATE 10000000
Time: 70125.720 ms


With checksums:

monkey_sum=# insert into test_of_sum (id) select generate_series (1,10000000);
INSERT 0 10000000
Time: 23724.373 ms
monkey_sum=# update test_of_sum set name = 'The Monkey';
UPDATE 10000000
Time: 74446.178 ms
monkey_sum=# update test_of_sum set name = 'The Monkey', address = 'somewhere in London';
UPDATE 10000000
Time: 78987.092 ms




Apart from the performance impact (obviously there is extra overhead when you checksum all data pages), there are other things that make this great feature not so easy to use. First of all, it's a cluster-wide setting, which means that if you want it only on your second database you cannot keep that database in the same data directory. Also, you cannot disable it once the cluster has been created.
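
If you are not sure how a cluster was initialized, pg_controldata will tell you (a sketch, the path is an assumption):

pg_controldata /opt/pgdata_sum | grep -i checksum
# "Data page checksum version: 1" means checksums are enabled, 0 means they are not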

Don't get me wrong, I LOVE having this! I think it's absolutely brilliant that I have the option to use it when I can. But that's the problem: I can't :(
If this feature were object-wide (tables / indexes) it would have a place in production systems, leaving it up to the DBA where to enable it and where not to. Even if it were database-wide, I would enable it on a secondary database running on the same server whose performance I don't care about. Even just being able to turn it on and off would be very useful.

There are good reasons why this feature doesn't work the way I would like: the whole thing happens in shared buffers, and shared buffers are a cluster-wide entity. I'm pretty positive, though, that this is just the beginning and that we will soon see an on/off switch.
Until then, enjoy having this on small dbs, or wherever reliability REALLY matters.



-- Thanks for reading