Showing posts from August, 2013

Backing up PostgreSQL in HDFS

There are a number of ways to back up a PostgreSQL database; some are standard, and some just demonstrate the power of open source and the things you can do if you put your creativity and imagination to use. At OmniTI, we use the OmniPITR tool to manage WAL files and to run backups on secondary databases instead of the primary, reducing load during backup. In this post, I will combine OmniPITR and Hadoop to accomplish something very neat: storing your backups in HDFS (Hadoop Distributed File System). You might be asking: why? HDFS is rock-solid reliable, it has an extremely low cost per byte, it can sustain about 2 Gbit per computer and scale up to more than a TB per second across a cluster, and it is proven by internet giants running a wide variety of use cases. Let's say that you have a 1TB database running; an uncompressed backup will need 1TB of reliable storage just to keep one copy. HDFS has the great advantage of using cheap hardware while being fault tolerant at the same time.
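The post's OmniPITR specifics are cut off in this excerpt, but the general idea of streaming a backup into HDFS can be sketched with stock tools. This is illustrative only: the hostnames, user, and HDFS path are placeholders, and it assumes `pg_basebackup` (tar mode writing to stdout) plus the `hdfs` CLI are available.

```shell
# Illustrative sketch: stream a base backup straight into HDFS without
# landing it on local disk first. Names and paths are placeholders.
pg_basebackup -h pglab2 -U repuser -D - -Ft -X fetch \
  | gzip \
  | hdfs dfs -put - /backups/pg/base_$(date +%Y%m%d).tar.gz
```

Taking the backup from a secondary (as the post suggests) just means pointing `-h` at the standby instead of the primary.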

Setting shared_buffers the hard way

One of the main performance parameters in PostgreSQL is shared_buffers, probably the most important one. There are guidelines and rules of thumb that say to just set it to 20-30% of your machine's total memory. Don't get me wrong, these rules are generally good and a perfect starting point, but there are reasons why you should tune your shared buffers in more detail: a. 30% might not be enough, and you will never know if you don't know exactly how to set this parameter; b. 30% might be a lot, and you spend resources in vain; c. you want to be awesome and tune every bit of your DB the optimal way. What is shared buffers, though? shared_buffers defines a block of memory that PostgreSQL uses as its own cache for data pages: data blocks are placed there temporarily in order to be processed, so basically EVERYTHING goes through the shared buffers. Why not set shared_buffers to 80% of RAM on a dedicated DB server? …
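One concrete way to tune this "the hard way" rather than by rule of thumb is to look at what the buffer cache actually holds, using the contrib extension pg_buffercache. A minimal sketch (the 8192-byte block size is the default build assumption):

```sql
-- Requires the pg_buffercache contrib extension (shipped with PostgreSQL).
CREATE EXTENSION pg_buffercache;

-- How much of shared_buffers is in use, broken down by how "hot"
-- each buffer is (higher usagecount = accessed more recently/often).
SELECT usagecount,
       count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS size
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;
```

A large pile of NULL/low-usagecount buffers suggests shared_buffers is oversized; no free buffers and uniformly high usagecounts suggest the opposite.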

PostgreSQL backup and recovery

One of the main tasks of any kind of administrator is to make sure that the data he's responsible for will be available if anything bad happens (asteroids, flood, locusts, hail). In order to do that, you need a high availability solution to obtain continuity, and a good backup plan to support your system in case of human error ("Hello Mr. admin, I just dropped the customer payment table"). Trust me, it happens... a lot... I briefly covered high availability with the implementation of a hot standby in an earlier post, and now it's time to cover the backup options available in PostgreSQL. There are 2 kinds of backups in PostgreSQL: physical and logical. The good thing about logical backups is that they are simple to implement and maintain, and selective backup and restore (even into later PG versions) is easy. Usually the backup output consists of one file, which of course can be compressed. The major con of this method is that it lacks Point In Time Recovery. …
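The logical-backup properties described above (single compressible file, selective restore) can be sketched with pg_dump's custom format; database and table names here are placeholders:

```shell
# Logical backup in custom format: compressed by default, and indexed
# so pg_restore can later extract individual objects.
pg_dump -Fc -f mydb.dump mydb

# Selective restore of a single table from that dump file:
pg_restore -d mydb -t customer_payment mydb.dump
```

This is exactly the "dropped one table" scenario: restore just that table instead of the whole database.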

Migrating Oracle to PostgreSQL

I have an Oracle dump from a database that I want to migrate to Postgres for a project. I set up a 100GB VM, installed Oracle 11gR2, and compiled Postgres 9.2.3 on the same VM. I restored the dump into Oracle, and then I had to come up with a migration strategy. I had some ideas, like extracting the schema from the dump with expdp (sqlfile=) and manually translating it to Postgres, but I remembered a project I saw some time ago called ora2pg. This cool thing is basically a Perl program that needs, apart from Perl (duh), DBI and DBD for Oracle and Postgres. I set it up and started experimenting, first on the DDLs. The output was actually very good! The only things I had to change were some function-based indexes, and it was parseable by Postgres without problems. Next was the data. No problems there; it created a 20GB output file that all I had to do was throw at Postgres. Just because I wasn't very proactive with my VM's disk space, I faced some problems... …
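The two-step workflow described (DDL first, then data) maps onto ora2pg export types roughly like this. A hedged sketch — the config file name and output file names are assumptions, and connection details live in the config:

```shell
# Hypothetical ora2pg invocation; Oracle/Postgres connection settings
# are read from ora2pg.conf, not from the command line.
ora2pg -c ora2pg.conf -t TABLE -o schema.sql   # export the DDL
ora2pg -c ora2pg.conf -t COPY  -o data.sql     # export data as COPY statements
```

Using COPY-style output keeps the data file loadable with plain psql, which matches the "throw it at Postgres" step above.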

PostgreSQL Partitioning

Partitions are very useful when it comes to big tables; the documentation suggests applying table partitioning when a table is bigger than 10GB. In Postgres there are 2 kinds of partitions: range and list. The implementation just needs the following steps: enable constraint exclusion in the config file; create a master table; create child tables WITHOUT overlapping table constraints; create indexes and PKs; create a function and trigger to insert data into the child tables. First thing to notice here is that partitioning is basically using table inheritance. But enough with how things work in theory, let's create one. First, check postgresql.conf for this parameter: constraint_exclusion = partition. Now, let's create the master and child tables:

CREATE TABLE sales (
    sales_id serial NOT NULL,
    sales_date DATE NOT NULL DEFAULT CURRENT_DATE,
    description text
);

CREATE TABLE sales_2013_p1 (
    CHECK ( sales_date >= DATE '2013-01-01' AND sales_date < …
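The excerpt is cut off before the trigger step it lists, but that step could look like the sketch below. The p1 date range (first half of 2013) is an assumption extrapolated from the child table's name and CHECK constraint:

```sql
-- Sketch of the routing function; the partition boundary is assumed.
CREATE OR REPLACE FUNCTION sales_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.sales_date >= DATE '2013-01-01'
       AND NEW.sales_date < DATE '2013-07-01' THEN
        INSERT INTO sales_2013_p1 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'sales_date out of range, add a partition';
    END IF;
    RETURN NULL;  -- row already stored in the child; skip the master table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_insert
    BEFORE INSERT ON sales
    FOR EACH ROW EXECUTE PROCEDURE sales_insert_trigger();
```

Returning NULL from the BEFORE trigger is what stops the row from also landing in the (empty) master table.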

PostgreSQL, Binary replication in practice

A couple of days ago I started writing a short howto about streaming replication in PostgreSQL 9.2. Most of these things are well documented, but in this howto I will also try to experiment with switchovers and switchbacks. It aims to show how easy it is to set this up right out of the box. Streaming replication, PostgreSQL 9.2. For my example I will use 2 Debian VMs (pglab1 and pglab2). Not mandatory: exchange ssh keys for passwordless ssh, which might be used if we need to scp scripts, rsync, or whatever. If you don't know how to do it, follow these steps: ON MASTER (PGLAB1): after you create a Postgres cluster using initdb, edit the master's postgresql.conf and change the following:

listen_addresses = '*'
wal_level = hot_standby   # (could be archive too)
max_wal_senders = 5
hot_standby = on

Create a replication user:

create user repuser replication password 'passw…
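The excerpt stops at the master-side setup; on a 9.2 standby the counterpart is a recovery.conf in the standby's data directory, roughly along these lines. The hostname follows the post's VM names, and the conninfo values are assumptions:

```
# recovery.conf on pglab2 (sketch; conninfo values are assumptions)
standby_mode = 'on'
primary_conninfo = 'host=pglab1 user=repuser password=...'

# Optional: touching this file promotes the standby, which is handy
# for the switchover experiments mentioned above.
trigger_file = '/tmp/pg_failover'
```

The standby also needs a base backup of the master's data directory and `hot_standby = on` in its own postgresql.conf before it will serve read-only queries.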

PostgreSQL proper installation

Recently at work, I got assigned to upgrade some (about 10) Postgres 8.1.x installations to 9. I have always liked compiling, basically because I like the flexibility that compiling offers, and that's what I proposed to the guys in charge of the project. They gave me a test system (VM) to play with; in all fairness, they were a bit skeptical about the idea of compiling the RDBMS, mostly for reliability reasons (don't ask me why). I explained that upgrading from source would be much easier later on, and that in the last year the PostgreSQL developers have done so much work that it cannot be ignored (pg_basebackup, for example). Since the latest CentOS package was 9.1.x, they agreed and I started working. PostgreSQL is so easy to compile: no strange dependencies, not many libraries needed. And about those reliability concerns? Because the developers are so neat, I quote: "While upgrades always have some risk, PostgreSQL minor releases fix …
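For reference, "so easy to compile" boils down to the standard autoconf dance. A minimal sketch; the install prefix is a matter of taste and an assumption here:

```shell
# Typical PostgreSQL source build; prefix is illustrative.
./configure --prefix=/opt/pgsql/9.2.3
make
make check     # run the regression tests before trusting the build
make install   # as root, or into a user-writable prefix
```

Versioned prefixes like this are what make the later source upgrades mentioned above painless: build the new version alongside, then switch over.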