Setting shared_buffers the hard way

One of the main performance parameters in PostgreSQL is shared_buffers, probably the most important one, there are guidelines and rules of thumb that say just set it to 20-30% of your machine total memory.
Don't get me wrong , these rules are generaly good and its a perfect starting point, but there are reasons why you should tune your shared buffers in more detail

a. 30% might not be enough and you will never know if you dont know exactly how to set this parameter
b. 30% might be a lot and you spend resources in vain.
c. you want to be awesome and tune every bit of your DB the optimal way.

What is shared buffers though ?
Shared buffers defines a block of memory that PostgreSQL will use to hold requests that are awaiting attention from the kernel buffer and CPU.
so basically PostgreSQL will put temporarily data blocks in the memory in order to process - EVERYTHING will go through the shared buffers.

Why not set shared_buffers to 80% of ram in a DB dedicated server ?
The OS also has cache, and if you set shared_buffers too high you will most likely have an overlap which called double buffering, having datablocks on both caches.

So, in order to set your shared_buffers you need to know what's happening inside shared memory.PostgreSQL has an implementation called clock-sweep algorithm, so everytime you use a datablock a usage counter is making that block harder to get rid of. the block gets a popularity number from 1-5 with 5 being heavily used and most likely it will stay in shared memory.
In theory you want the most popular data blocks in the shared buffers and the least popular ones out of it. To do that you have to be able to see what is inside the shared buffers, and thats exactly what pg_buffercache package does.You will find pg_buffercache in the contrib.

lets create 2 tables and full join them, update them and generaly run operations on these 2 tables while monitoring the buffer cache
I will give dramatic examples by setting shared_buffers too low , default and too high just to demonstrate what pg_buffercache views will show and then i i will find a good value for this specific workflow.i will run the same statements while i analyze what is happening inside shared buffers.
the to most very useful sql statements from the pg_buffercache views are the following :

-- buffers per relation and size
SELECT
c.relname,
pg_size_pretty(count(*) * 8192) as buffered,
round(100.0 * count(*) /
(SELECT setting FROM pg_settings
WHERE name='shared_buffers')::integer,1)
AS buffers_percent,
round(100.0 * count(*) * 8192 /
pg_relation_size(c.oid),1)
AS percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b
ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d
ON (b.reldatabase = d.oid AND d.datname = current_database())
where pg_relation_size(c.oid) >0
GROUP BY c.oid,c.relname
ORDER BY 3 DESC
LIMIT 10;

-- buffers per usage count
SELECT
c.relname, count(*) AS buffers,usagecount
FROM pg_class c
INNER JOIN pg_buffercache b
ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d
ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.relname,usagecount
ORDER BY usagecount,c.relname


RESULTS :

512kB

relname              | buffered | buffers_percent | percent_of_relation
----------------------------------+----------+-----------------+---------------------
 parents_id_idx                       | 56 kB   |        10.9 |              0.1
 pg_operator                                  | 40 kB    |             7.8 |                35.7
 pg_amop                                        | 32 kB    |            6.3 |              100.0
 pg_statistic                                    | 24 kB    |             4.7 |                20.0
 pg_opclass                                     | 16 kB    |            3.1 |              100.0
 pg_operator_oid_index                 | 16 kB    |             3.1 |                50.0
 pg_statistic_relid_att_inh_index  | 16 kB    |             3.1 |                50.0
 pg_amop_opr_fam_index              | 16 kB    |             3.1 |                50.0
 pg_amproc_fam_proc_index         | 16 kB    |             3.1 |                50.0
 pg_amop_fam_strat_index            | 16 kB    |             3.1 |                50.0

 0.1% of one of my index in the buffer, there is no space for more

24Mb (default value)

 relname                          | buffered |     buffers_percent |    percent_of_relation
----------------------------------+----------+-----------------+---------------------
 child_id_idx                               | 12 MB  |        48.3 |             18.0
 children                                    | 11 MB  |        46.4 |              11.2
 parents                                     | 312 kB |          1.3 |               0.3
 pg_opclass_am_name_nsp_index     | 16 kB    |             0.1 |               100.0
 pg_namespace_oid_index                 | 16 kB    |             0.1 |               100.0
 pg_operator_oid_index                     | 24 kB    |             0.1 |                75.0
 pg_statistic_relid_att_inh_index      | 24 kB    |             0.1 |                75.0
 pg_index                                            | 16 kB    |             0.1 |                66.7
 pg_opclass_oid_index                       | 16 kB    |             0.1 |               100.0
 pg_amop_opr_fam_index                  | 24 kB    |             0.1 |                75.0
(10 rows)  

part of the index and part of children table is in the buffers , still not enough


5Gb
(dont forget to set your kernel resource limits)
Envy ~ # sysctl kernel.shmmax=5387018240
kernel.shmmax = 5387018240
Envy ~ # sysctl kernel.shmall=2052297
kernel.shmall = 2052297

 relname              |  buffered  | buffers_percent | percent_of_relation
----------------------------------+------------+-----------------+---------------------
 children                                  | 149 MB |          3.0 |        100.0
 parents                                   | 117 MB |          2.3 |        100.0
 child_id_idx                             | 64 MB   |          1.3 |        100.0

 parents_id_idx                        | 64 MB   |          1.3 |         100.0
 pg_statistic_relid_att_inh_index   | 32 kB      |             0.0 |           100.0
 pg_opclass_oid_index                    | 16 kB      |             0.0 |           100.0
 pg_namespace                               | 8192 bytes |         0.0 |           100.0
 pg_namespace_oid_index             | 16 kB      |             0.0 |           100.0
 pg_operator_oid_index                 | 32 kB      |             0.0 |           100.0
 pg_rewrite_rel_rulename_index  | 16 kB      |             0.0 |           100.0

as you may see now we have the full tables and their indexes inside our buffer cache , and thats what we want, to have the popular datablocks there. aparently we've set too much memory because
a.buffers_percent is too low
b.if we sum up all buffered data we will see that its  close to 394 , so a setting close to 500-600mb would be optimal value FOR THE OPERATIONS that are performed in this database.

I havent mentioned how much ram i have on this machine, i do have 6Gb.If i set shared_buffers to 30% of my ram it would be 1.8Gb, and i just proved that i need 600mb for this database, that the 1/3 of the recommended value. I also havent talked about the performance gain, for setting this parameter even if its really out of the scope of this post, i had timing on and i can tell you that the optimal numbers are when shared buffers are close to the optimal.

In real life scenarios you will have much more data than my dummy 2 table database and tuning this parameter will be much , much harder, here comes the second sql statement that i've showed you, the one with the popular pages. Every database , every workflow needs special tunning and thats why there is no  magic percentage on setting this parameter, keep the popular pages (popularity 3+) inside buffer cache , leave the rest to the OS.
20-30% is indeed a good starting point but it wont hurt after the database has run its workflow for a while to check whats happening inside the shared buffers, you might need to increase or decrease on the next restart.

thanks for reading
- Vasilis

Comments

Popular posts from this blog

Tuning checkpoints

AWS Aurora Postgres, not a great first impression

Setting up Postgres Automated Failover