Friday, 9 June 2017

Tip for faster wal replay on a slave

I've been in situations where i need a slave db to replay a lot of wal files fast, and by a lot i mean tens of thousands. This could happen because of a reporting database refreshing or simply because a slave was down for an extended period of time. It's known that lowering shared_buffers speeds up wal replay for obvious reasons, but by how much ?

I did a benchmark on an old server and the results are interesting :

With 32GB of shared buffers and with 6390Mb of wals (1840 wal files)
it took 1408 seconds to complete the replay.

With 64MB of shared buffers and with 6510Mb of wals (1920 wal files)
it took 1132 seconds to complete the replay.

My test was done by stopping the slave, inserting 50 mil rows to a test table, wait for the wal transfer to complete, then stop the master and start the slave and watch OmniPITR logs.

The performance gain in wal replay was about 20% in postgres 10beta1 which doesn't sound bad, especially in times of need.


Thanks for reading
-- Vasilis

2 comments:

  1. Forgive me if I am being thick, but why would lowering shared_buffers have any impact on WAL replay speeds?

    ReplyDelete
    Replies
    1. 2 reasons, buffer management overhead and shared buffers getting full of dirty pages that have to be flushed when checkpoints happen. Actually if you try the same test in 9.0 you'll see see that it can be ~10 times slower with a large shared buffers setting.

      Delete