Friday, 16 June 2017

PostgreSQL on ZFS with BPF tracing on top.

At OmniTI we love solaris, my personal favourite features are ZFS and DTrace. Unfortunately not many run postgres on solaris so i have decided to implement similar features in linux. Instead of Dtrace i'll install BPF, in-kernel bytecode that can be used for tracing introduced in recent kernels (4.X). 
This post will be a part of a three series post. In this post we'll start with setup, in part #2 with ZFS and how to use it for backups / snapshots. In part #3 we'll dig into BPF a bit more.

Step 1 is to setup a new ubuntu. I setup a VM using ubuntu-16.04.2-server-amd64.iso.

As root :
Add the repo for bcc :
> echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" | sudo tee /etc/apt/sources.list.d/iovisor.list

sudo apt-get update

Install all necessary and some optional packages :

> apt-get install -y sudo wget apt-transport-https joe less build-essential libreadline-dev \
zlib1g-dev flex bison libxml2-dev libxslt-dev libssl-dev openssh-server screen git \
binutils bcc bcc-tools libbcc-examples python-bcc zfsutils-linux \
systemtap systemtap-sdt-dev

Make sure kernel is properly compiled :

> cat /boot/config-`uname -r` |grep BPF
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NET_CLS_BPF=m
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_HAVE_BPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_TEST_BPF=m

Test BCC (stands for BPF Compiler Collection)

> python /usr/share/bcc/examples/tracing/bitehist.py
Tracing... Hit Ctrl-C to end.
^C
     kbytes              : count     distribution
         0 -> 1          : 7        |************                            |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 22       |****************************************|
         8 -> 15         : 19       |**********************************      |
        16 -> 31         : 8        |**************                          |
        32 -> 63         : 6        |**********                              |
        64 -> 127        : 1        |*                                       |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |*                                       |

Now its time to install postgres on a zfs partition, in my case i had a disk (sdb) attached on my VM :

> fdisk /dev/sdb

Welcome to fdisk (util-linux 2.27.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0x4226e0bf.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-41943039, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-41943039, default 41943039):

Created a new partition 1 of type 'Linux' and of size 20 GiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

To create the pool :

> sudo zpool create pg_zfs /dev/sdb1

> zpool status
  pool: pg_zfs
 state: ONLINE
  scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
pg_zfs      ONLINE       0     0     0
 sdb1      ONLINE       0     0     0

errors: No known data errors

> mount |grep pg_
pg_zfs on /pg_zfs type zfs (rw,relatime,xattr,noacl)

> ls /pg_zfs/ -l
total 0

> cd /pg_zfs/
> mkdir pgsql
> mkdir pgdata
> chown postgres:postgres pgsql/
> chown postgres:postgres pgdata/

Now with everything ready compile postgres from source :

> wget -c https://ftp.postgresql.org/pub/source/v10beta1/postgresql-10beta1.tar.gz

> tar zxfv postgresql-10beta1.tar.gz

> cd postgresql-10*
> ./configure --prefix=/pg_zfs/pgsql/ --enable-dtrace
> make -j 4 world
> make -j 4 install-world
> export PATH=$PATH:/pg_zfs/pgsql/bin
> export LD_LIBRARY_PATH=/pg_zfs/pgsql/lib
> export PGDATA=/pg_zfs/pgdata
> initdb
> pg_ctl start

At this point, postgres binaries and datafiles are on zfs. Now to check the probes we have available :

/usr/share/bcc/tools/tplist -l /pg_zfs/pgsql/bin/postgres  |awk {'print $2'}
postgresql:clog__checkpoint__start
postgresql:clog__checkpoint__done
postgresql:multixact__checkpoint__start
postgresql:multixact__checkpoint__done
postgresql:subtrans__checkpoint__start
postgresql:subtrans__checkpoint__done
postgresql:twophase__checkpoint__start
postgresql:twophase__checkpoint__done
postgresql:transaction__start
postgresql:transaction__commit
postgresql:transaction__abort
postgresql:wal__buffer__write__dirty__start
postgresql:wal__buffer__write__dirty__done
postgresql:wal__switch
postgresql:checkpoint__start
postgresql:checkpoint__done
postgresql:wal__insert
postgresql:statement__status
postgresql:buffer__flush__start
postgresql:buffer__flush__done
postgresql:buffer__read__start
postgresql:buffer__read__done
postgresql:buffer__write__dirty__start
postgresql:buffer__write__dirty__done
postgresql:buffer__sync__done
postgresql:buffer__sync__start
postgresql:buffer__sync__written
postgresql:buffer__checkpoint__start
postgresql:buffer__checkpoint__sync__start
postgresql:buffer__checkpoint__done
postgresql:lock__wait__start
postgresql:lock__wait__done
postgresql:deadlock__found
postgresql:lwlock__wait__start
postgresql:lwlock__wait__done
postgresql:lwlock__acquire
postgresql:lwlock__condacquire
postgresql:lwlock__condacquire__fail
postgresql:lwlock__acquire__or__wait
postgresql:lwlock__acquire__or__wait__fail
postgresql:lwlock__release
postgresql:smgr__md__read__start
postgresql:smgr__md__read__done
postgresql:smgr__md__write__start
postgresql:smgr__md__write__done
postgresql:query__parse__start
postgresql:query__parse__done
postgresql:query__rewrite__start
postgresql:query__rewrite__done
postgresql:query__plan__start
postgresql:query__plan__done
postgresql:query__start
postgresql:query__done
postgresql:query__execute__start
postgresql:query__execute__done
postgresql:sort__start
postgresql:sort__done

To make sure tracing works properly, while running a statement on a different terminal :

> sudo /usr/share/bcc/tools/dbslower postgres -p 1208
Tracing database queries for pids 1208 slower than 1 ms...
TIME(s)        PID          MS QUERY
2.729496       1208   2399.665 insert into test select * from generate_series (1,100000);


To be continued ...
-- Vasilis Ventirozos

3 comments:

  1. "Unfortunately not many run postgres on solaris". Given that Joyent make a living out of doing this - Postgres on SmartOS, can you comment further on your doubts.

    ReplyDelete
    Replies
    1. We do the same at OmniTI, all our internally hosted clients are on OmniOS, i know what Joyent does and how they make a living but still, you can't deny that the strong majority would go for linux.

      Delete
  2. Indeed. Thanks for your thoughts
    /d

    ReplyDelete