Friday, 19 May 2017

Reusing an old master (as slave) after a switchover

Todays blogpost is old news but it's not very well known and it has to do with how to add an old master after a slave has been promoted. Fujii Masao explained the situation in his patch back in the day.

So in todays post i will be demonstrating a combination of replication slots for retaining the writes on a promoted slave and how to re-introduce an old master to the replication.

Say that we have a master-slave setup, stop the master gracefully and create and activate a replica slot on the slave :

monkey=# SELECT * FROM pg_create_physical_replication_slot('this_is_a_replica_slot');
       slot_name        | xlog_position
------------------------+---------------
 this_is_a_replica_slot |
(1 row)

postgres@bf9823730feb:~$ pg_receivexlog -D . -S this_is_a_replica_slot -v -h 10.0.0.3 -U repuser
pg_receivexlog: starting log streaming at 0/4000000 (timeline 1) ^C
pg_receivexlog: received interrupt signal, exiting
pg_receivexlog: not renaming "000000010000000000000004.partial", segment is not complete
pg_receivexlog: received interrupt signal, exiting


This way all the changes after the slave promotion will be retained. Now , lets promote the slave and check the replication slot status.

postgres@bf9823730feb:~$ pg_ctl promote
server promoting

postgres@bf9823730feb:~$ psql -c "SELECT slot_name, database,active,  pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes FROM pg_replication_slots;" postgres
       slot_name        | database | active | retained_bytes
------------------------+----------+--------+----------------
 this_is_a_replica_slot |          | f      |           9056

As you can see , the new master is now retaining all writes. Now, to the old master, lets make a recovery.conf that looks like this :

standby_mode = 'on'
primary_conninfo = 'user=repuser host=10.0.0.3 port=5432 application_name=a_slave'
trigger_file = '/home/postgres/pgdata/finish.recovery'
primary_slot_name = 'this_is_a_replica_slot'
recovery_target_timeline = 'latest'

where host is obviously the new master. 
Start the old master and you should see something similar to:

LOG:  entering standby mode
LOG:  consistent recovery state reached at 0/5000098
LOG:  database system is ready to accept read only connections
LOG:  invalid record length at 0/5000098: wanted 24, got 0
LOG:  fetching timeline history file for timeline 2 from primary server
LOG:  started streaming WAL from primary at 0/5000000 on timeline 1
LOG:  replication terminated by primary server
DETAIL:  End of WAL reached on timeline 1 at 0/5000098.
LOG:  new target timeline is 2
LOG:  restarted WAL streaming at 0/5000000 on timeline 2
LOG:  redo starts at 0/5000098


This would also work if you had archiving enabled instead of replication slots. Given that you have plenty of available disk space on the new master, it should be fine to keep the old master down for any reasonable amount of time and re-introduce it to the replication without issues.

The only 2 things that you have to make sure is to shutdown postgres on old-master gracefully and that the slave has caught up before the promotion.


Thanks for reading.




Tuesday, 9 May 2017

PostgreSQL statistics as data

 Postgres provides a lot of information when it comes to statistics. Only problem is that all the statistics are frozen in time the moment you query the system catalog views. Sometimes you need to see whats happening over time or in the past,and this is easily solvable with a script and a crontab entry. At OmniTI we use a project called system_monitoring a lot. This is basically a perl scheduler that runs as a deamon exporting useful information from postgres (and more) to tab delimited text files for later investigation. If you have decent one liner skills this is very helpful: you can answer questions about what happened yesterday night, what was running, who locked who etc. It's very good but data analysis can be a pain in the ass sometimes, especially if you need to reimport to a db and analyze with SQL. So I decided to approach this in a different way, by storing the statistics to a schema. Thats why i wrote statz
Statz is a very simple python script that will gather all postgres statistics for a period of time in an interval, aggregate them and keep raw and aggregated data in its own schema. The raw data that it keeps are just a now(),* from the most important statistics tables :

  • pg_stat_user_indexes
  • pg_stat_activity
  • pg_locks
  • pg_stat_user_tables
  • pg_stat_database
  • pg_stat_bgwriter
But its relatively easy to add more, even application specific data.

How to use :

statz needs a database to connect (for now it's the same with the database that monitors), an interval and a total duration (interval and total duration should be in seconds).
So, something like:

./statz.py -c dbname=monkey -i 5 -t 120 

Would initialize a schema called statz (this for now is hardcoded)
if schema exists it will drop it (cascade) and recreate it.
In a loop that will last for <total time> , sleeping for <interval> it will keep output of select now(),* from the tables listed above, populating the following tables :

statz.backend_activity
statz.bgwriter_activity
statz.database_activity
statz.index_activity
statz.lock_activity
statz.table_activity

In the same loop it will aggregate and populate the following aggregated tables.
statz.database_activity_agg
statz.table_activity_agg



The aggregated tables look like this :
for the database:


snap_date                 | 2017-05-08 16:35:17
interval                  | 00:00:02
step                      | 00:00:10
datname                   | monkey
commits                   | 863
rows_returned             | 21761
rows_fetched              | 9102
rows_inserted             | 1662
rows_updated              | 2251
rows_deleted              | 0
blocks_read               | 89
blocks_hit_cached         | 57572
commits_per_sec           | 432
rows_returned_per_sec     | 10881
rows_fetched_per_sec      | 4551
rows_inserted_per_sec     | 831
rows_updated_per_sec      | 1126
rows_deleted_per_sec      | 0
blocks_read_per_sec       | 45
blocks_hit_cached_per_sec | 28786
txn_per_sec               | 432
cache_hit_ratio           | 99.8456

and for each table :

snap_date                  | 2017-05-08 16:35:49
interval                   | 00:00:02
step                       | 00:00:42
table_name                 | public.pgbench_tellers
seq_scans                  | 0
seq_rows_read              | 0
index_scans                | 5807
index_rows_fetched         | 5807
rows_inserted              | 0
rows_updated               | 5807
rows_deleted               | 0
rows_hot_updated           | 5759
live_row_count             | 0
n_dead_tup                 | 634
seq_scans_per_sec          | 0
seq_rows_read_per_sec      | 0
index_scans_per_sec        | 5807
index_rows_fetched_per_sec | 2904
rows_inserted_per_sec      | 0
rows_updated_per_sec       | 2904
rows_deleted_per_sec       | 0
rows_hot_updated_per_sec   | 2880
live_row_count_per_sec     | 0
n_dead_tup_per_sec         | 317


I have also included a couple of views that are useful for fast monitoring :

statz.db_stats_per_sec 
statz.table_stats_per_sec

Keep in mind that the numbers are being calculated using the last inserted row and since its in the same loop with populating the historical tables the information will be refreshed every <interval> 
Which makes this a great tool for things like :

watch -n 2 "psql -x -c 'select * from statz.database_activity_agg order by step desc limit 1;' monkey"

You can even create and send graphs like :

Usecases


Statz is gathering a lot of data so it's not meant to run 24/7. Thats why it has an interval and a total time and thats why it will destroy old statistical data when re-run
Say that you observed that during night, queries are getting slower or wal file generation is increased or any abnormality (locks / unknown statements / rollbacks etc). You can schedule statz to run for this period of time and analyze data next morning. Keeping snapshots of pg_stat_activity is also helpful for slow query investigation where you can do things like :

select (snap_date - xact_start) as age,user,query from backend_activity order by 1 desc;

This script is also useful for analyzing application specific usage patterns, To get statistics from benchmarks and fine tune checkpoints , missing indexes etc..

And because it all happens in a single transaction, snap_date can be used as a key linking all statz tables.
This means that if you see database doing a lot of commits for example, you can use snap_date to find the busy table in statz.table_activity or you can check locks or statements run at that specific time in the rest of the tables.

link for statz can be found here.

Next steps

I plan on the following things :
  • Refactor schema / statements for performance.
  • More aggregating statements for the rest of the tables.
  • Possibly (live) graphs or maybe a dashboard.
  • Possibly rewrite some parts for better general use.
  • Lately im reading a book about machine learning in python i might experiment with that a bit.

Thanks for reading.