Wednesday, 6 February 2019

Managing xid wraparound without looking like a (mail) chimp

My colleague Payal came across an outage that happened to Mailchimp's Mandrill app yesterday; the link can be found HERE.
Since this was PostgreSQL related, I wanted to post about the technical aspect of it.
According to the company:

“Mandrill uses a sharded Postgres setup as one of our main datastores,” the email explains. “On Sunday, February 3, at 10:30pm EST, 1 of our 5 physical Postgres instances saw a significant spike in writes.”

The email continues: “The spike in writes triggered a Transaction ID Wraparound issue. When this occurs, database activity is completely halted. The database sets itself in read-only mode until offline maintenance (known as vacuuming) can occur.”

So, let's see what this "transaction ID wraparound issue" is and how someone could prevent similar outages from ever happening.

PostgreSQL uses MVCC to control transaction visibility, basically by comparing transaction IDs (XIDs): a row with an insert XID greater than the current transaction's XID shouldn't be visible to the current transaction. But transaction IDs are not unlimited; a cluster will eventually run out after 2^32 transactions (4+ billion), causing transaction ID wraparound: the transaction counter wraps around to zero, and all past transactions would appear to be in the future.
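A quick way to see how far along each database in a cluster is, is to check the age of datfrozenxid; the query below is just a sketch of such a check:

 -- how many transactions old the oldest unfrozen XID in each database is
 SELECT datname, age(datfrozenxid) AS xid_age
 FROM pg_database
 ORDER BY xid_age DESC;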

This is taken care of by vacuum, which marks rows as frozen, indicating that they were inserted by a transaction that committed far enough in the past to be visible to all current and future transactions. To control this behavior, Postgres has a configurable called autovacuum_freeze_max_age, which defaults to 200,000,000 transactions, a very conservative default that must be tuned in larger production systems.
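To check what your cluster is currently running with, something like the query below works; note that changing this parameter cluster-wide requires a server restart (its context is 'postmaster'):

 -- current value and where it can be changed from
 SELECT name, setting, context
 FROM pg_settings
 WHERE name = 'autovacuum_freeze_max_age';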

It sounds complicated, but it's relatively easy not to get to that point; for most people just having autovacuum on will prevent this situation from ever happening. You can simply schedule manual vacuums by getting a list of the tables "closest" to autovacuum_freeze_max_age with a simple query like this:


 SELECT 'vacuum analyze ' || c.oid::regclass || ' /* manual_vacuum */ ;'
 FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
 WHERE c.relkind = 'r'
 ORDER BY greatest(age(c.relfrozenxid), age(t.relfrozenxid)) DESC
 LIMIT 10;
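If you are running this interactively, psql's \gexec can execute the generated statements directly instead of just printing them; a minimal sketch (the query is terminated with \gexec instead of a semicolon):

 SELECT 'vacuum analyze ' || c.oid::regclass || ' /* manual_vacuum */'
 FROM pg_class c LEFT JOIN pg_class t ON c.reltoastrelid = t.oid
 WHERE c.relkind = 'r'
 ORDER BY greatest(age(c.relfrozenxid), age(t.relfrozenxid)) DESC
 LIMIT 10 \gexec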

Even if you want to avoid manual vacuums, you can simply create a report or a monitoring metric based on the age of relfrozenxid in pg_class combined with pg_settings, e.g.:


 SELECT oid::regclass::text AS table,
        pg_size_pretty(pg_total_relation_size(oid)) AS table_size,
        age(relfrozenxid) AS xid_age,
        (SELECT setting::int FROM pg_settings
          WHERE name = 'autovacuum_freeze_max_age') - age(relfrozenxid) AS tx_for_wraparound_vacuum
 FROM pg_class
 WHERE relfrozenxid != 0
   AND oid > 16384
 ORDER BY tx_for_wraparound_vacuum;
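The same numbers can feed a simple alert; for example (the 50 million headroom below is an arbitrary threshold, pick whatever fits your write rate):

 -- number of user tables within 50M transactions of a forced anti-wraparound vacuum
 SELECT count(*) AS tables_close_to_wraparound_vacuum
 FROM pg_class
 WHERE relfrozenxid != 0
   AND oid > 16384
   AND (SELECT setting::int FROM pg_settings
         WHERE name = 'autovacuum_freeze_max_age') - age(relfrozenxid) < 50000000;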

But let's assume that you got to the point where you start seeing "autovacuum: VACUUM tablename (to prevent wraparound)".

This vacuum will most likely happen when you don't want it to, and even if you kill it, it will keep respawning.
If you have already set autovacuum_freeze_max_age to a more viable production setting (we usually set it at 1.5 billion), you can raise autovacuum_freeze_max_age further (its hard limit is 2 billion) and immediately kick off a manual vacuum on the table with a high maintenance_work_mem value. This vacuum will be a "normal" vacuum, hence live. If the table is so big that each manual vacuum takes days to complete, then you should have partitioned it... A proper schema and proper tuning of autovacuum are really important, especially in write-heavy workloads.
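As a rough sketch of that intervention (the table name 'big_table' and the memory value are placeholders, not recommendations):

 -- give the manual vacuum plenty of memory for this session only
 SET maintenance_work_mem = '2GB';
 VACUUM (VERBOSE) big_table;
 -- VACUUM (FREEZE, VERBOSE) big_table; would aggressively freeze everything, at the cost of more I/O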

And to get back to the outage: the mail from the company insinuates that it was a problem with Postgres; in reality it wasn't, it was clearly an ops oversight.


Thanks for reading
Vasilis Ventirozos
credativ LLC






Friday, 25 January 2019

PostgreSQL 12 : Where in copy from

5 days ago a commit made it into 12devel that implements a WHERE clause in COPY FROM.
Today we're gonna see how it works, and how someone could achieve the same thing using file_fdw.

To begin with, let's create a table and put some data in it.

create table test (
id int,
date timestamp without time zone,
name text
);

insert into test (id,date) select generate_series (1,1000000),'2015-01-01';
insert into test (id,date) select generate_series (1000001,2000000),'2016-01-01';
insert into test (id,date) select generate_series (2000001,3000000),'2017-01-01';
insert into test (id,date) select generate_series (3000001,4000000),'2018-01-01';
insert into test (id,date) select generate_series (4000001,5000000),'2019-01-01';

Now, let's make this table (currently a bit larger than 170MB) bigger, dump the data to csv, and truncate the table:

monkey=# insert into test select * from test;
INSERT 0 5000000
monkey=# insert into test select * from test;
INSERT 0 10000000
monkey=# select count(*) from test ;
  count
----------
 20000000
(1 row)

monkey=# copy test to '/home/vasilis/test.csv' with csv;
COPY 20000000

monkey=# truncate test;
TRUNCATE TABLE

vasilis@Wrath > ls -lh ~/test.csv
-rw-r--r-- 1 vasilis vasilis 759M Jan 25 12:24 /home/vasilis/test.csv

Our test file is about 750MB. Now, with an empty table, let's import only the rows that are up to 2 years old:

monkey=# copy test from '/home/vasilis/test.csv' with csv where date >= now() - INTERVAL '2 year' ;
COPY 8000000
Time: 17770.267 ms (00:17.770)

It worked, awesome!
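A quick sanity check (results omitted here) is to look at the date range that actually made it in:

select min(date), max(date), count(*) from test;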

Now, let's compare this to the alternative, file_fdw:

monkey=# CREATE EXTENSION file_fdw;
CREATE EXTENSION
Time: 7.288 ms

monkey=# CREATE SERVER pgtest FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER
Time: 3.957 ms

monkey=# create foreign table file_test (id int, date timestamp without time zone, name text) server pgtest options (filename '/home/vasilis/test.csv', format 'csv');
CREATE FOREIGN TABLE
Time: 16.463 ms

monkey=# truncate test ;
TRUNCATE TABLE
Time: 15.123 ms

monkey=# insert into test select * from file_test where date >= now() - INTERVAL '2 year' ;
INSERT 0 8000000
Time: 21368.595 ms (00:21.369)


I automated the test, and COPY was (obviously) always faster. I'd imagine that the key benefit here is not the performance, which is inherited from COPY itself, but the extended functionality of COPY, which is good, especially for workloads that export data, transform it (ETL), and load it into another DB, or for restoring specific rows from backups (assuming pg_dump backups are still a thing).
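For instance, pulling just a specific slice of rows back out of the same csv dump would look something like this (the id range is arbitrary, only for illustration):

copy test from '/home/vasilis/test.csv' with csv where id between 3000001 and 3000100;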




Thanks for reading
Vasilis Ventirozos
credativ LLC
