PostgreSQL on ZFS with BPF tracing on top.

At OmniTI we love solaris, my personal favourite features are ZFS and DTrace. Unfortunately not many run postgres on solaris so i have decided to implement similar features in linux. Instead of Dtrace i'll install BPF, in-kernel bytecode that can be used for tracing introduced in recent kernels (4.X). 
This post will be a part of a three series post. In this post we'll start with setup, in part #2 with ZFS and how to use it for backups / snapshots. In part #3 we'll dig into BPF a bit more.

Step 1 is to setup a new ubuntu. I setup a VM using ubuntu-16.04.2-server-amd64.iso.


As root :

Add the repo for bcc :
> echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" | sudo tee /etc/apt/sources.list.d/iovisor.list

sudo apt-get update


Install all necessary and some optional packages :


> apt-get install -y sudo wget apt-transport-https joe less build-essential libreadline-dev \

zlib1g-dev flex bison libxml2-dev libxslt-dev libssl-dev openssh-server screen git \
binutils bcc bcc-tools libbcc-examples python-bcc zfsutils-linux \
systemtap systemtap-sdt-dev

Make sure kernel is properly compiled :


> cat /boot/config-`uname -r` |grep BPF

CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NET_CLS_BPF=m
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_HAVE_BPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_TEST_BPF=m

Test BCC (stands for BPF Compiler Collection)


> python /usr/share/bcc/examples/tracing/bitehist.py

Tracing... Hit Ctrl-C to end.
^C
     kbytes              : count     distribution
         0 -> 1          : 7        |************                            |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 22       |****************************************|
         8 -> 15         : 19       |**********************************      |
        16 -> 31         : 8        |**************                          |
        32 -> 63         : 6        |**********                              |
        64 -> 127        : 1        |*                                       |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |*                                       |

Now its time to install postgres on a zfs partition, in my case i had a disk (sdb) attached on my VM :


> fdisk /dev/sdb


Welcome to fdisk (util-linux 2.27.1).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.

Created a new DOS disklabel with disk identifier 0x4226e0bf.

Command (m for help): n

Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-41943039, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-41943039, default 41943039):

Created a new partition 1 of type 'Linux' and of size 20 GiB.


Command (m for help): w

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

To create the pool :


> sudo zpool create pg_zfs /dev/sdb1


> zpool status

  pool: pg_zfs
 state: ONLINE
  scan: none requested
config:

NAME        STATE     READ WRITE CKSUM

pg_zfs      ONLINE       0     0     0
 sdb1      ONLINE       0     0     0

errors: No known data errors


> mount |grep pg_

pg_zfs on /pg_zfs type zfs (rw,relatime,xattr,noacl)

> ls /pg_zfs/ -l

total 0

> cd /pg_zfs/

> mkdir pgsql
> mkdir pgdata
> chown postgres:postgres pgsql/
> chown postgres:postgres pgdata/

Now with everything ready compile postgres from source :


> wget -c https://ftp.postgresql.org/pub/source/v10beta1/postgresql-10beta1.tar.gz


> tar zxfv postgresql-10beta1.tar.gz


> cd postgresql-10*

> ./configure --prefix=/pg_zfs/pgsql/ --enable-dtrace
> make -j 4 world
> make -j 4 install-world
> export PATH=$PATH:/pg_zfs/pgsql/bin
> export LD_LIBRARY_PATH=/pg_zfs/pgsql/lib
> export PGDATA=/pg_zfs/pgdata
> initdb
> pg_ctl start

At this point, postgres binaries and datafiles are on zfs. Now to check the probes we have available :


/usr/share/bcc/tools/tplist -l /pg_zfs/pgsql/bin/postgres  |awk {'print $2'}

postgresql:clog__checkpoint__start
postgresql:clog__checkpoint__done
postgresql:multixact__checkpoint__start
postgresql:multixact__checkpoint__done
postgresql:subtrans__checkpoint__start
postgresql:subtrans__checkpoint__done
postgresql:twophase__checkpoint__start
postgresql:twophase__checkpoint__done
postgresql:transaction__start
postgresql:transaction__commit
postgresql:transaction__abort
postgresql:wal__buffer__write__dirty__start
postgresql:wal__buffer__write__dirty__done
postgresql:wal__switch
postgresql:checkpoint__start
postgresql:checkpoint__done
postgresql:wal__insert
postgresql:statement__status
postgresql:buffer__flush__start
postgresql:buffer__flush__done
postgresql:buffer__read__start
postgresql:buffer__read__done
postgresql:buffer__write__dirty__start
postgresql:buffer__write__dirty__done
postgresql:buffer__sync__done
postgresql:buffer__sync__start
postgresql:buffer__sync__written
postgresql:buffer__checkpoint__start
postgresql:buffer__checkpoint__sync__start
postgresql:buffer__checkpoint__done
postgresql:lock__wait__start
postgresql:lock__wait__done
postgresql:deadlock__found
postgresql:lwlock__wait__start
postgresql:lwlock__wait__done
postgresql:lwlock__acquire
postgresql:lwlock__condacquire
postgresql:lwlock__condacquire__fail
postgresql:lwlock__acquire__or__wait
postgresql:lwlock__acquire__or__wait__fail
postgresql:lwlock__release
postgresql:smgr__md__read__start
postgresql:smgr__md__read__done
postgresql:smgr__md__write__start
postgresql:smgr__md__write__done
postgresql:query__parse__start
postgresql:query__parse__done
postgresql:query__rewrite__start
postgresql:query__rewrite__done
postgresql:query__plan__start
postgresql:query__plan__done
postgresql:query__start
postgresql:query__done
postgresql:query__execute__start
postgresql:query__execute__done
postgresql:sort__start
postgresql:sort__done

To make sure tracing works properly, while running a statement on a different terminal :


> sudo /usr/share/bcc/tools/dbslower postgres -p 1208

Tracing database queries for pids 1208 slower than 1 ms...
TIME(s)        PID          MS QUERY
2.729496       1208   2399.665 insert into test select * from generate_series (1,100000);



Thanks for reading.

Vasilis Ventirozos
OmniTI Computer Consulting

Comments

  1. "Unfortunately not many run postgres on solaris". Given that Joyent make a living out of doing this - Postgres on SmartOS, can you comment further on your doubts.

    ReplyDelete
    Replies
    1. We do the same at OmniTI, all our internally hosted clients are on OmniOS, i know what Joyent does and how they make a living but still, you can't deny that the strong majority would go for linux.

      Delete
  2. Indeed. Thanks for your thoughts
    /d

    ReplyDelete

Post a Comment

Popular posts from this blog

Tuning checkpoints

AWS Aurora Postgres, not a great first impression

Setting up Postgres Automated Failover