Wednesday, 30 August 2017

PostgreSQL on ZFS

What is ZFS. (a paste from wikipedia)

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume managementsnapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs.

Advantages of ZFS.

Data integrity guaranteed.
ZFS has volume management on filesystem level, this enables copy on write. When a data block changed, it will change its location on the disk before the new write is complete. If you have a crash, that data would be damaged. ZFS does not change the location of the data until the write is verified, which means that your data is safe during a crash. ZFS also use checksums to ensure that all data is correct, every write is tested. ZFS doesn't just protect data with copy on write, it also offers additional protection by having an additional RAID level. RAID-Z3 which allows a maximum of three disk failures in a ZFS pool. Traditional RAID levels only allow two disk failures per volume. ZFS offers RAID mirrors who are usually composed of a single disk keeping its own copy. With a multi-disk mirror you can have multiple copies, which adds levels of data integrity not found in traditional RAID setups and it's also great for reads.
Performance
ZFS also allows to send writes to individual disks, instead of just the whole RAID volume. Because of this, ZFS can spread writes which is speeding up write performance. It also has algorithms that ensure that the most frequently used data blocks are kept in the fastest media, meaning that you can have just a few SSD drives as cache.
Compression
Compression can be enabled on each dataset , its a transparent operation where zfs will compress and uncompress data on the fly. Because cpu operations are usually cheaper than IO and considering that compression algorithms are much smarter than they used to be the cpu overhead is usually not even noticeable, combined with the gain of compression  this is an extremely useful feature for databases. The supported compression algorithms are: LZJB, LZ4, ZLE, and Gz (1-9). Today we are going to play with LZ4.
Snapshots
snapshot is a read-only copy of a file system or volume. Snapshots can be created almost instantly, and they initially consume no additional disk space within the pool. However, as data within the active dataset changes, the snapshot consumes disk space by continuing to reference the old data, thus preventing the disk space from being freed.

-----------

And after this rather long intro on ZFS (that may or may not have pasted descriptions from oracle) its time to put it to work for a PostgreSQL database. In my previous post I went through how to install postgres on a openZFS volume on Ubuntu so in this post i will be using a vm that already has a ZFS volume. 
Enabling compression (LZ4).
> zfs set compression=lz4 pg_zfs 
Where pg_zfs is the name of the volume.

And yeah, that's pretty much it.

Now i will initdb a new cluster create a table and insert some data :

create table test_compression (id int , name text default 'Evol' , surname text default 'Monkey');
insert into test_compression (id) select generate_series (1,5000000);
create unique index id_unique_idx on test_compression (id);
create index name_idx on test_compression (name);
checkpoint ;
select pg_size_pretty(pg_total_relation_size('test_compression'));
 pg_size_pretty
----------------
 426 MB

now as root :

root@lab:~# zfs get compressratio pg_zfs
NAME    PROPERTY       VALUE  SOURCE

pg_zfs  compressratio  3.38x  -

Which is pretty neat!

Using snapshots.

To get a snapshot of your database directory something like this would work. as root :
psql -c "select pg_start_backup('zfs_snapshot');" -U vasilis postgres && zfs snapshot pg_zfs@blogpost && psql -c "select pg_stop_backup();" -U vasilis postgres 

zfs list -t snapshot
NAME              USED  AVAIL  REFER  MOUNTPOINT
pg_zfs@blogpost      0      -   279M  -

To restore, say that we delete the whole data directory :

rm -rf /pg_zfs/pgdata/
killall -9 postgres
zfs rollback pg_zfs@blogpost
/pg_zfs/pgsql/bin/pg_ctl -D /pg_zfs/pgdata/ start
psql -q -t -A -c "select count(*) from test_compression" monkey
5000000

All Done!
One thing to keep in mind is that all these operations are nearly instant.

At OmniTI, we use ZFS on OmniOS a lot and for this reason we have developed some scripts that can help with automation, sending snapshots on remote servers etc. These scripts can be found here.

Biggest con of ZFS is that there is no huge acceptability in linux, people are reporting that open-zfs is production ready, but personally i haven't used it with linux. I've seen that Canonical would promote it but there were some legal issues (?). Thing is that and now, with redhat ditching btrfs it's a good opportunity for ZFS to be used more. So if you have test / development environments , maybe you could give it a try and spread the word of zfs-awesomeness.


Thanks for reading
- Vasilis


2 comments:

  1. I've been running PostgreSQL + ZoL on Ubuntu in production for a couple years now, and on FreeBSD for many years before that.

    I found that setting the zfs recordsize to match the PostgreSQL block size (default 8k) increased performance as well. Compression plus recordsize didn't improve much, but I suspect it was a pg_bench bottleneck (when I benchmarked on AWS, increasing IO didn't increase the benchmark results).

    We initially thought that we would need to limit ZFS' ARC size to play nice with PostgreSQL, but it turned out to not be a problem. I have 160GB of RAM, so it could be a problem on smaller servers.

    ReplyDelete
  2. http://open-zfs.org/wiki/Performance_tuning also suggests using logbias=throughput in addition to recordsize=8k. FYI.

    ReplyDelete