Tuesday, 27 August 2013

PostgreSQL Partitioning

Partitions are very usable when it comes to big tables, documentation suggests applying table partitioning when a table is bigger than 10Gb. In postgres there are 2 kinds of partitions 
  • Range
  • List
implementation will just need the following steps.
  1. Enable constraint exclusion in config file
  2. Create a master table
  3. Create child tables WITHOUT overlapping table constraints
  4. Create indexes , pk's
  5. Create function and trigger to insert data to child tables
 First thing to notice here is that partitioning is basically using table inheritance. but enough with how things are working in theory, lets create one.

First , check into postgresql.conf for this parameter:

constraint_exclusion = partition 

Now , lets create the master and child tables :

CREATE TABLE sales (
    sales_id serial NOT NULL,
    sales_date DATE NOT NULL DEFAULT CURRENT_DATE,
    description text
);

CREATE TABLE sales_2013_p1 (
CHECK ( sales_date >= DATE '2013-01-01' AND sales_date < DATE '2013-05-01' )
 ) INHERITS (sales);

 CREATE TABLE sales_2013_p2 (
CHECK ( sales_date >= DATE '2013-05-01' AND sales_date < DATE '2013-09-01' )
 ) INHERITS (sales);

 CREATE TABLE sales_2013_p3 (
CHECK ( sales_date >= DATE '2013-09-01' AND sales_date < DATE '2014-01-01' )
 ) INHERITS (sales);


notice the keyword INHERITS here :)
next, PK's and indexes on child tables,

ALTER TABLE sales_2013_p1 ADD CONSTRAINT sales_2013_p1_pkey PRIMARY KEY (sales_id, sales_date);
ALTER TABLE sales_2013_p2 ADD CONSTRAINT sales_2013_p2_pkey PRIMARY KEY (sales_id, sales_date);
ALTER TABLE sales_2013_p3 ADD CONSTRAINT sales_2013_p3_pkey PRIMARY KEY (sales_id, sales_date);


CREATE INDEX idx_2013_p1 ON sales_2013_p1 (sales_date);
CREATE INDEX idx_2013_p2 ON sales_2013_p2 (sales_date);
CREATE INDEX idx_2013_p3 ON sales_2013_p3 (sales_date);


and finaly a function that returns trigger and the on-insert trigger itself.

CREATE OR REPLACE FUNCTION sales_trig_func()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.sales_date >= DATE '2013-01-01' AND NEW.sales_date < DATE '2013-05-01' ) THEN
        INSERT INTO sales_2013_p1 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-05-01' AND NEW.sales_date < DATE '2013-09-01' ) THEN
        INSERT INTO sales_2013_p2 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-09-01' AND NEW.sales_date < DATE '2014-01-01' ) THEN
        INSERT INTO sales_2013_p3 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range.!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER insert_on_sales
    BEFORE INSERT ON sales
    FOR EACH ROW EXECUTE PROCEDURE sales_trig_func();


Now that we have a table with a basic partitioning schema, lets assume that we want to add more partitions for 2014. create a new child table (for examples sake) i will just create a partition for 2014.


CREATE TABLE sales_2014 (
CHECK ( sales_date >= DATE '2014-01-01' AND sales_date < DATE '2015-01-01' )
 ) INHERITS (sales);


ALTER TABLE sales_2014 ADD CONSTRAINT sales_2014_pkey PRIMARY KEY (sales_id, sales_date);

CREATE INDEX idx_2014 ON sales_2014 (sales_date);

CREATE OR REPLACE FUNCTION sales_trig_func()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.sales_date >= DATE '2013-01-01' AND NEW.sales_date < DATE '2013-05-01' ) THEN
        INSERT INTO sales_2013_p1 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-05-01' AND NEW.sales_date < DATE '2013-09-01' ) THEN
        INSERT INTO sales_2013_p2 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-09-01' AND NEW.sales_date < DATE '2014-01-01' ) THEN
        INSERT INTO sales_2013_p3 VALUES (NEW.*);


    ELSIF ( NEW.sales_date >= DATE '2014-01-01' AND NEW.sales_date < DATE '2015-01-01' ) THEN
        INSERT INTO sales_2014 VALUES (NEW.*);

   ELSE
        RAISE EXCEPTION 'Date out of range.!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;


and we are done!



Now lets say that 2013_p1 data are obsolete and we want to move them to a historical database, drop table, correct and replace the function and you are done.
 This is how the master table would look after these operations :

partition=# \d+ sales
                                                   Table "public.sales"
   Column    |  Type   |                        Modifiers                                               | Storage  | Stats target | Description
-------------+---------+----------------------------------------------------------+----------+--------------+-------------
 sales_id    | integer | not null default nextval('sales_sales_id_seq'::regclass) | plain    |              |
 sales_date  | date    | not null default ('now'::text)::date                                  | plain    |              |
 description | text    |                                                                                         | extended |              |
Triggers:
    insert_on_sales BEFORE INSERT ON sales FOR EACH ROW EXECUTE PROCEDURE sales_trig_func()
Child tables: sales_2013_p2,
                     sales_2013_p3,
                     sales_2014
Has OIDs: no



Good thing about this approach is that partitions are generally easy to maintain and administer, child tables can have different indexes with each other, and you can of course delete large portions of data that may not be needed any more just by dropping a partition, uh ! the performance is VERY good on insert and select and maintenance work like reindex is faster. Reindex in particular wouldn't lock the whole master table for writes.



Thanks for reading
-Vasilis

2 comments:

  1. Hello, Thx fot that post. I have one question:

    What if i would like to

    SELECT * FROM sales WHERE sales_date > 2013-06-01;

    Is it possible to execute this simple query or do we have to SELECT and UNION data from sales_2013_p2 and sales_2013_p3 ?

    ReplyDelete
  2. Of course it is, you don't need to do any UNION or change anything on the application side.

    ReplyDelete