Tuning checkpoints

We recently had the chance to help a customer with some IO-related issues that turned out to be caused by misconfigured checkpoints. The root cause may not always be obvious, but it is actually fairly common.

Let's start with roughly how things work.
Postgres's smallest IO unit is the disk block, which is 8kB by default. Each time Postgres needs a block it fetches it from disk and loads it into an area of RAM called shared_buffers.
When Postgres needs to write, it works in the same manner:
  • It fetches the block(s) from disk, if they are not already in shared_buffers.
  • It changes the page in shared_buffers.
  • It marks the page as changed (dirty) in shared_buffers.
  • It writes the change to a "sequential ledger of changes" called the WAL (write-ahead log) to ensure durability.

This basically means that the changes are not yet in the data files on disk. Writing them there is the job of a Postgres process called the checkpointer. Checkpoints are how Postgres guarantees that data files and index files have been updated with all the changes that happened before that checkpoint. In case of a crash, Postgres will go back to the latest checkpoint record and start a REDO operation from the WAL. Checkpoints are triggered every checkpoint_timeout (default 5min) or when accumulated changes reach max_wal_size (default 1GB). This is an IO-intensive operation, and Postgres tries to spread the IO out over time with checkpoint_completion_target (default 0.5).

checkpoint_timeout* is the maximum time between checkpoints, in seconds.
min_wal_size is the minimum WAL size, below which files are recycled rather than removed.
max_wal_size** is the maximum WAL size allowed to accumulate between checkpoints.
checkpoint_completion_target spreads the data writes over a longer period of time, making the final fsync() much cheaper.
* Affects recovery time; change only after reviewing the documentation.
** This is a soft maximum; WAL can exceed this value in special cases.
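As a concrete illustration, a starting configuration might look like the fragment below. The numbers are hypothetical placeholders, not recommendations; your own values should come out of the measurement described next.

```
# postgresql.conf -- hypothetical starting values, derive your own
checkpoint_timeout = 15min          # maximum time between checkpoints
max_wal_size = 16GB                 # sized so the timeout fires first
min_wal_size = 2GB                  # keep some WAL segments for recycling
checkpoint_completion_target = 0.9  # spread checkpoint IO over the interval
```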

The best way to start is to set checkpoint_timeout to something reasonable and to set max_wal_size high enough that it won't be reached before the timeout. To figure out what "reasonable" means for your workload, schedule something like this (a cronjob will do) to run at short intervals, say every minute:

psql -XqtA -c "copy (select now()::timestamp(0), pg_current_wal_insert_lsn()) to stdout with csv;" monkey >> current_lsn

Leave it running for as long as you see fit. From the results you can extract the difference between two locations, in bytes, like this:

monkey=# SELECT pg_size_pretty(pg_wal_lsn_diff('0/B1277248','0/59CEA2F8'));
 pg_size_pretty
----------------
 1398 MB
(1 row)
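If you'd rather not load the samples back into Postgres, the arithmetic pg_wal_lsn_diff performs can be done anywhere: an LSN like '0/B1277248' is just two hex numbers, the high and low 32 bits of a 64-bit byte position. A minimal sketch in Python (the function names are my own):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a Postgres LSN like '0/B1277248' to an absolute byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def wal_bytes_between(later: str, earlier: str) -> int:
    """Local equivalent of pg_wal_lsn_diff(later, earlier)."""
    return lsn_to_bytes(later) - lsn_to_bytes(earlier)

diff = wal_bytes_between("0/B1277248", "0/59CEA2F8")
print(diff, "bytes =", round(diff / 1024 / 1024), "MB")  # ~1398 MB, as above
```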

(protip: file_fdw + window function)

This function calculates the difference between the later and the earlier location, in bytes, so having one location per minute lets you calculate (or graph) the rate of change over time.
With that number, your storage capabilities, and your recovery needs in mind, you should be able to come up with a good starting point.
max_wal_size should be set high enough that it isn't reached before the timeout; the rate-of-change figure calculated earlier is a good indication of where to start. min_wal_size should follow common sense and leave a small portion of WAL files to be recycled for the next checkpoint.
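One hypothetical rule of thumb for turning the rate into a setting: make max_wal_size comfortably larger than the WAL you expect to generate within one checkpoint_timeout. A sketch, with made-up numbers:

```python
def suggest_max_wal_size_mb(rate_mb_per_min: float,
                            checkpoint_timeout_min: int,
                            headroom: float = 1.5) -> int:
    """WAL expected within one timeout, plus headroom so write spikes
    don't trigger an early requested checkpoint."""
    return int(rate_mb_per_min * checkpoint_timeout_min * headroom)

# e.g. a measured rate of 50 MB/min and a 15 min timeout (hypothetical numbers)
print(suggest_max_wal_size_mb(50, 15))  # 1125 -> round up, e.g. 2GB
```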
checkpoint_completion_target is often simply set to 0.9, but if you want to be more "precise" it follows this rule:
(checkpoint_timeout - 2min) / checkpoint_timeout
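The rule above in executable form; the two-minute margin is the slack it leaves for the final fsync() and bookkeeping at the end of the checkpoint:

```python
def completion_target(checkpoint_timeout_min: float,
                      margin_min: float = 2.0) -> float:
    """(checkpoint_timeout - 2min) / checkpoint_timeout, per the rule above."""
    return round((checkpoint_timeout_min - margin_min) / checkpoint_timeout_min, 2)

print(completion_target(5))   # 0.6  (the 5 min default timeout)
print(completion_target(30))  # 0.93 (close to the common 0.9 setting)
```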

To watch how the checkpointer behaves after the changes, you can query pg_stat_bgwriter. This is a very interesting view that probably deserves its own blogpost, because of the data you can extract from it, but today we will concentrate on two columns: checkpoints_timed and checkpoints_req.

checkpoints_timed counts checkpoints triggered by checkpoint_timeout, and checkpoints_req counts checkpoints requested because max_wal_size was reached.
What you want to see there is the majority of checkpoints being timed.
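A quick way to read those two counters is as a ratio. A sketch, using hypothetical values sampled from pg_stat_bgwriter:

```python
def timed_checkpoint_ratio(checkpoints_timed: int, checkpoints_req: int) -> float:
    """Fraction of checkpoints triggered by checkpoint_timeout rather
    than forced by max_wal_size."""
    total = checkpoints_timed + checkpoints_req
    return checkpoints_timed / total if total else 0.0

# hypothetical counters: 940 timed vs 60 requested checkpoints
print(timed_checkpoint_ratio(940, 60))  # 0.94 -> healthy, mostly timed
```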

And because graphs are cool, here is an example from a production system before and after checkpoint configuration. The graph shows write IOPS over time on an AWS RDS database.
On RDS specifically, I find it odd that the checkpoint defaults are so low, considering the high shared_buffers default it ships with.

Thanks for reading
-- Vasilis Ventirozos
-- credativ LLC


  1. I have a big buffers_clean and small buffers_checkpoint, any idea?
    buffers_checkpoint | buffers_clean | buffers_backend
    2939561            | 19872289      | 6015787

  2. There are checkpoints_timed and checkpoints_req columns in the view, not checkpoint_timed and checkpoint_req.
