Monday, 30 December 2013

pgreplay-ing logfiles

Recently I wanted to find something that could rerun a workload (from logfiles) against Postgres. The reasons? Benchmarking, testing, troubleshooting.
My colleague Denish reminded me of a project called pgreplay. pgreplay reads a PostgreSQL logfile, extracts the SQL statements and executes them in the same order, with (or without) the original timing, against a PostgreSQL database.

I downloaded it, compiled it and went straight to testing.

The setup:

log_min_messages = error  (or more)
   (if you know that you have no cancel requests, 'log' will do)
log_min_error_statement = log  (or more)
log_connections = on
log_disconnections = on
log_line_prefix = '%m|%u|%d|%c|'  (if you don't use CSV logging)
log_statement = 'all'
lc_messages must be set to English (the encoding does not matter)
bytea_output = escape  (from version 9.0 on, only if you want to replay
                        the log on 8.4 or earlier)

It is highly recommended that you use CSV logging, because anything that
the PostgreSQL server or any loaded modules write to standard error will
be written to the stderr log and might confuse the parser.
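
With the logging in place, running pgreplay is either a one-liner or a two-step parse/replay. Roughly what I used is shown below; the flags are as described in the pgreplay README, and the host, port and file names are placeholders for my test setup:

# parse the CSV log and replay it directly against a test server,
# keeping the original timing (speed factor 1)
pgreplay -c -h 192.168.0.10 -p 5432 postgresql.csv

# or split it in two steps: turn the log into a portable replay file first ...
pgreplay -f -c -o workload.replay postgresql.csv
# ... and replay that file later, for example at double speed
pgreplay -r -s 2 -h 192.168.0.10 workload.replay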


For small and simple workloads it seemed to work fine. Below is the output from a simple workload:

Replay statistics
=================

Speed factor for replay: 1.000
Total run time: 29.700 seconds
Maximum lag behind schedule: 0 seconds
Calls to the server: 8
(0.269 calls per second)
Total number of connections: 1
Maximum number of concurrent connections: 1
Average number of concurrent connections: 1.000
Average session idle percentage: 99.680%
SQL statements executed: 6
(0 or 0.000% of these completed with error)
Maximum number of concurrent SQL statements: 1
Average number of concurrent SQL statements: 0.003
Average SQL statement duration: 0.016 seconds
Maximum SQL statement duration: 0.038 seconds
Statement duration histogram:
0 to 0.02 seconds: 83.333%
0.02 to 0.1 seconds: 16.667%
0.1 to 0.5 seconds: 0.000%
0.5 to 2 seconds: 0.000%
over 2 seconds: 0.000%


So far so good. I wanted to see what would happen when concurrency occurred, so I tried it with pgbench: with 2 concurrent clients everything was OK, so I scaled it up a bit to 16 concurrent transactions, and then these nice messages started showing up:
"Execution is 30 minutes behind schedule"

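For reference, the workload I captured and replayed was nothing exotic, just pgbench pointed at the database while everything was being logged; something along these lines (scale and duration are from memory):

pgbench -i -s 10 monkey             # populate the pgbench tables
pgbench -c 16 -j 2 -T 300 monkey    # 16 concurrent clients for 5 minutes,
                                    # all captured by log_statement = 'all'
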
The reason? Locks. Lots and lots of locks! But let's take a closer look at what happened...
While pgreplay was running, trying to handle my logfile the best way it could, a simple lock-investigation query was enough to show me that I would never get the successful prompt I wanted:

monkey=#   SELECT bl.pid     AS blocked_pid,
monkey-#          a.usename  AS blocked_user,
monkey-#          kl.pid     AS blocking_pid,
monkey-#          ka.usename AS blocking_user,
monkey-#          a.query    AS blocked_statement
monkey-#    FROM  pg_catalog.pg_locks         bl
monkey-#     JOIN pg_catalog.pg_stat_activity a  ON a.pid = bl.pid
monkey-#     JOIN pg_catalog.pg_locks         kl ON kl.transactionid = bl.transactionid AND kl.pid != bl.pid
monkey-#     JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid
monkey-#    WHERE NOT bl.granted;

 blocked_pid | blocked_user | blocking_pid | blocking_user |                          blocked_statement                          
-------------+--------------+--------------+---------------+----------------------------------------------------------------------
        7398 | vasilis      |         7403 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7398 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7398 | vasilis      |         7404 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7398 | vasilis      |         7402 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7398 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7403 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7402 | vasilis      |         7404 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7398 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7404 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7403 | vasilis      |         7402 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7398 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7403 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7399 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7404 | vasilis      |         7402 | vasilis       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
        7409 | vasilis      |         7404 | vasilis       | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;


Deadlock. The reason?
(quote from the pgreplay documentation):

While pgreplay makes sure that commands are sent to the server in the order in which they were originally executed, there is no way to guarantee that they will be executed in the same order during replay: Network delay, processor contention and other factors may cause a later command to "overtake" an earlier one. While this does not matter if the commands don't affect each other, it can lead to SQL statements hitting locks unexpectedly, causing replay to deadlock and "hang". This is particularly likely if many different sessions change the same data repeatedly in short intervals.

And that was my case: lots of processes were changing the same data.
For the record, I also set up a simple test project on Postgres with 2 users each doing their own thing, and there pgreplay worked fine.

Conclusion:
OK, pgreplay didn't do the job for me; I'm pretty sure that if I got the logs from a production DB I would run into deadlocks. Still, messing with it a bit was worth it: I could use it for replaying simple workload logfiles against a different machine, and it's a cool project that does what it says, and you never know when you might need something like this.



Wednesday, 18 December 2013

A Problem Geeked Into Big Data

Today my article got published in OmniTI Seeds. It was a very interesting project, I had a lot of fun doing it, and even if it's not exactly Postgres related, I wouldn't have been able to do it without Postgres. It began from a basic everyday operation that all DBAs do (log parsing / analysis), but it went to a whole different level. I hope you like it.

A Problem Geeked Into Big Data

cooking with fdw

Back in 9.1 a feature called foreign data wrappers was introduced: you can basically connect a PostgreSQL server to remote data stores (other databases, CSV files, etc.). Today I plan to play with FDWs and see what we can use them for, and how.

First of all, there are many FDWs; a list can be found here.

For the sake of this post I used the following data wrappers:

     Name     | Version |   Schema   |                    Description                    
--------------+---------+------------+----------------------------------------------------
 file_fdw     | 1.0     | public     | foreign-data wrapper for flat file access
 postgres_fdw | 1.0     | public     | foreign-data wrapper for remote PostgreSQL servers
 redis_fdw    | 1.0     | public     | Foreign data wrapper for querying a Redis server


file_fdw and postgres_fdw are both available in contrib; redis_fdw can be found in the following link. You will also need to compile and install hiredis, which can be found here. I won't go into much detail on how to install these, but there is plenty of documentation available.
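
For the record, the build boils down to pointing the Makefiles at the right pg_config and then creating the extensions. Roughly this, assuming the sources sit side by side and PostgreSQL lives under /usr/local/pgsql (adjust paths to your layout):

cd hiredis && make && sudo make install
cd ../redis_fdw && PATH=/usr/local/pgsql/bin:$PATH make && sudo make install
psql -d test_redis -c "CREATE EXTENSION file_fdw"
psql -d test_redis -c "CREATE EXTENSION postgres_fdw"
psql -d test_redis -c "CREATE EXTENSION redis_fdw"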

The plan is to connect a Redis server, a remote PostgreSQL server and a CSV file, and interact with all of them.

The first thing that has to be done, after the extensions have been created, is to create the foreign servers and tables. The following example shows what I did for the remote PostgreSQL server:

test_redis=# CREATE SERVER node_2
FOREIGN DATA WRAPPER postgres_fdw  
OPTIONS (host '192.168.0.3', dbname 'postgres');     
CREATE SERVER
test_redis=# CREATE USER MAPPING FOR PUBLIC SERVER node_2;
CREATE USER MAPPING
test_redis=# CREATE FOREIGN TABLE test_pg (id int)
SERVER node_2 OPTIONS (table_name 'test_pg');
CREATE FOREIGN TABLE


The CSV and Redis foreign tables were very similar to create, so when I was done I had this:

 public | db15       | foreign table | vasilis
 public | pgcsv      | foreign table | vasilis
 public | test       | table         | vasilis
 public | test2_pg   | foreign table | vasilis





one local table and one foreign table for each of my wrappers.
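
For reference, the file_fdw and redis_fdw definitions looked roughly like this; the column lists and options are reconstructed to match the examples that follow, so treat them as a sketch rather than the exact DDL I ran:

CREATE SERVER csv_server FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE pgcsv (id int, key text, date date, "time" time, num int)
    SERVER csv_server
    OPTIONS (filename '/home/vasilis/lala.csv', format 'csv', delimiter ',');

CREATE SERVER redis_server FOREIGN DATA WRAPPER redis_fdw
    OPTIONS (address '127.0.0.1', port '6379');
CREATE USER MAPPING FOR PUBLIC SERVER redis_server;
CREATE FOREIGN TABLE db15 (key text, value text)
    SERVER redis_server
    OPTIONS (database '15');

So, let's start playing around: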

test_redis=# begin;
BEGIN
test_redis=# insert into test2_pg values ('koko','lala');
INSERT 0 1
test_redis=# commit;
COMMIT



Yep, I could insert into a remote PG server, in a transactional way. Goodbye dblink? (Oh yes!)
Unfortunately this wouldn't work on Redis, but only because the FDW doesn't support writes (yet?).
Now let's join them all and see how that works.

test_redis=# select * from db15,pgcsv,test2_pg,test where db15.key=test2_pg.key and pgcsv.key=db15.key and db15.key=test.key and test2_pg.key=db15.key;
 key  | value | id | key  |    date    |   time   | num | key  | value | key  | value 
------+-------+----+------+------------+----------+-----+------+-------+------+--------
 koko | koko1 |  1 | koko | 2013-12-01 | 00:01:01 | 234 | koko | lala  | koko | ninini


The statement itself is not something that is actually usable, but the execution plan demonstrates that all the foreign tables were accessed:

test_redis=# explain analyze select * from db15,pgcsv,test2_pg,test where db15.key=test2_pg.key and pgcsv.key=db15.key and db15.key=test.key and test2_pg.key=db15.key;
                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=111.24..156.43 rows=1 width=244) (actual time=0.981..1.031 rows=1 loops=1)
   Join Filter: (test2_pg.key = db15.key)
   Rows Removed by Join Filter: 2
   ->  Nested Loop  (cost=101.24..143.39 rows=1 width=180) (actual time=0.857..0.862 rows=1 loops=1)
         ->  Hash Join  (cost=101.11..142.55 rows=5 width=116) (actual time=0.833..0.835 rows=1 loops=1)
               Hash Cond: (test2_pg.key = pgcsv.key)
               ->  Foreign Scan on test2_pg  (cost=100.00..137.90 rows=930 width=64) (actual time=0.767..0.768 rows=1 loops=1)
               ->  Hash  (cost=1.10..1.10 rows=1 width=52) (actual time=0.049..0.049 rows=3 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 1kB
                     ->  Foreign Scan on pgcsv  (cost=0.00..1.10 rows=1 width=52) (actual time=0.033..0.042 rows=3 loops=1)
                           Foreign File: /home/vasilis/lala.csv
                           Foreign File Size: 98
         ->  Index Scan using kokoako on test  (cost=0.12..0.16 rows=1 width=64) (actual time=0.019..0.021 rows=1 loops=1)
               Index Cond: (key = test2_pg.key)
   ->  Foreign Scan on db15  (cost=10.00..13.00 rows=3 width=64) (actual time=0.071..0.162 rows=3 loops=1)
         Foreign Redis Table Size: 3
 Total runtime: 2.184 ms
(17 rows)

This is awesome from many aspects. Imagine that you have a heterogeneous environment with Mongo, Redis, Postgres, Oracle, CSV logs, etc.: you can put a PostgreSQL server in the middle, acting as a bridge, and create supersets, subsets, aggregations, views, temp tables, report tables and so many other things. You could even limit users to accessing only the "bridge" Postgres, so they see exactly what you want them to see from many different systems. You can also use SQL against Redis, whose native commands are quite limited. And because I'm starting to get a bit paranoid, I will leave it up to you how to use this :)
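
To make the "bridge" idea a bit more concrete, here is a minimal sketch using the tables from this post; bridge_report and reporter are names invented for the example:

-- expose only a curated subset of the joined data
CREATE VIEW bridge_report AS
    SELECT d.key,
           d.value AS redis_value,
           c.num,
           t.value AS local_value
    FROM   db15  d
    JOIN   pgcsv c ON c.key = d.key
    JOIN   test  t ON t.key = d.key;

-- a reporting user that can only read the view, never the sources directly
CREATE ROLE reporter LOGIN PASSWORD 'changeme';
GRANT SELECT ON bridge_report TO reporter;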


Thanks for reading
-- Vasilis





Tuesday, 17 December 2013

scaling postgres with pgpool-II

Recently a new release of pgpool-II came out: 3.2.7 (released 2013-12-06). For connection pooling I mostly have experience with pgbouncer, and at OmniTI we use it a lot. I have heard that pgpool has some issues, and because of its increased complexity I don't know if I would recommend it purely as a connection pool, but pgpool does much more than that.

What is pgpool-II? (from the pgpool wiki):


pgpool-II is a middleware that works between PostgreSQL servers and a PostgreSQL database client. It is licensed under BSD license. It provides the following features.
  • Connection Pooling
pgpool-II saves connections to the PostgreSQL servers, and reuse them whenever a new connection with the same properties (i.e. username, database, protocol version) comes in. It reduces connection overhead, and improves system's overall throughput.
  • Replication
pgpool-II can manage multiple PostgreSQL servers. Using the replication function enables creating a realtime backup on 2 or more physical disks, so that the service can continue without stopping servers in case of a disk failure.
  • Load Balancing
If a database is replicated, executing a SELECT query on any server will return the same result. pgpool-II takes an advantage of the replication feature to reduce the load on each PostgreSQL server by distributing SELECT queries among multiple servers, improving system's overall throughput. At best, performance improves proportionally to the number of PostgreSQL servers. Load balance works best in a situation where there are a lot of users executing many queries at the same time.
  • Limiting Exceeding Connections
There is a limit on the maximum number of concurrent connections with PostgreSQL, and connections are rejected after this many connections. Setting the maximum number of connections, however, increases resource consumption and affect system performance. pgpool-II also has a limit on the maximum number of connections, but extra connections will be queued instead of returning an error immediately.
  • Parallel Query
Using the parallel query function, data can be divided among the multiple servers, so that a query can be executed on all the servers concurrently to reduce the overall execution time. Parallel query works the best when searching large-scale data.

The feature I'm most interested in, and the one I want to explore in this post, is load balancing. I like this feature because, without changing the application, you can elevate your replica one step and make it even more useful.

I set up 3 virtual machines that I called pgpool1, pgnode1 and pgnode2.
I did a basic Debian installation and compiled pgpool-II (3.3.2) and PostgreSQL 9.3.2.
As a first step I set up basic streaming replication between pgnode1 and pgnode2 and configured pgpool1 for load balancing; all configurations can be found in my github. I won't go into much detail on how to set pgpool up: my configuration files should be a good guide, the pgpool documentation is very informative and up to date, and many before me have blogged about how to configure it.
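Still, for context, the handful of pgpool.conf settings that actually drive the read/write split look roughly like this (abridged, with illustrative backend weights; the real files are in the github repo):

listen_addresses = '*'
port = 9999                           # the port the psql examples below connect to
backend_hostname0 = '192.168.0.202'   # pgnode1, the master
backend_port0 = 5432
backend_weight0 = 1
backend_hostname1 = '192.168.0.203'   # pgnode2, the streaming replica
backend_port1 = 5432
backend_weight1 = 1
load_balance_mode = on                # spread SELECTs across the backends
master_slave_mode = on                # send writes only to the master
master_slave_sub_mode = 'stream'      # backends are kept in sync by streaming replication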
I will focus more on the design and what you could do with this.
As we all know, streaming replication is almost instant, so separating "read" from "write" can be beneficial for a large variety of applications, especially read-heavy ones. Let's see how this works:

root@pgpool1:~# psql -p 9999 -U postgres
postgres=# create database monkey;
CREATE DATABASE
postgres=# \connect monkey;
You are now connected to database "monkey" as user "postgres".
monkey=# create table test (id int);
CREATE TABLE
monkey=# insert into test values (1);
INSERT 0 1
monkey=# select * from test;
 id
----
  1
(1 row)

With full logging enabled in the databases, let's look at the logs from pgnode1 and pgnode2:

pgnode1:
LOG:  duration: 342.044 ms  statement: create database monkey;
LOG:  duration: 4.646 ms  statement: create table test (id int);
LOG:  duration: 1.733 ms  statement: insert into test values (1);

pgnode2:
LOG:  duration: 1.018 ms  statement: select * from test;

OK, it works as expected. Now, I had heard that this doesn't work if the SELECT has a space at the beginning, so I tried to test that as well:

on pgnode2 (replica) :
LOG:  duration: 0.757 ms  statement:   selecT * from test;

LOG:  duration: 0.215 ms  statement: SeLeCt   * from test /* this is my test */;

Something more complicated maybe? Something like pgbench sounds appropriate.
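
Roughly what I ran, pointed at the pgpool port so that the read-only part was eligible for load balancing (scale and duration are from memory):

pgbench -i -s 10 -p 9999 -U postgres monkey        # initialization goes through pgpool to the master
pgbench -S -c 8 -T 60 -p 9999 -U postgres monkey   # select-only run, fair game for load balancing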

I won't spam more logs; it worked as it should.
But what would happen with 2 replicas? Read operations should scale and be distributed among the slaves, so I created another database cluster on pgnode2 (port 5433) and made it a replica of pgnode1.
Now I had one master running on pgnode1 and two slaves running on pgnode2 (ports 5432 and 5433).
I changed the pgpool configuration to add the new db:

backend_hostname0 = '192.168.0.202'
backend_port0 = 5432
backend_weight0 = 0                    # weight 0: the load balancer sends no SELECTs to the master
backend_data_directory0 = '/opt/pgdata'

backend_hostname1 = '192.168.0.203'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/opt/pgdata'

backend_hostname2 = '192.168.0.203'
backend_port2 = 5433
backend_weight2 = 1
backend_data_directory2 = '/opt/pgdata2'


and from pgpool1 I ran:
while true; do psql -p 9999  -c 'selecT * from test;'  -U postgres ;done

And voilà, I had my SELECTs distributed across my 2 slaves.

pgpool has A LOT of cool features. I'm not in a position to say whether they all work as they should, but in my load balancing tests everything was smooth. pgpool also has a web-based GUI worth checking out and some system views that can be used to monitor activity; it also supports automatic failover, watchdog and many, many other neat features that I might explore in another post.
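
For example, besides tailing the backend logs, you can ask pgpool itself what it knows about its backends (the exact column list varies between pgpool versions):

# ask pgpool (not PostgreSQL) about its backends: host, port, status, load balancing weight
psql -p 9999 -U postgres -c 'SHOW pool_nodes;'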

Next time I will explore parallel query.

All the configurations can be found in my github (including the Postgres configs that were used).


Thanks for reading
- Vasilis