PostgreSQL Partitioning
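Partitioning is very useful when it comes to big tables. A commonly cited
threshold for considering it is a table bigger than about 10 GB; the documentation's own rule of thumb
is that the table should exceed the physical memory of the database server. In PostgreSQL there are two kinds of partitioning: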
- Range
- List
The overall procedure is:
- Enable constraint exclusion in the config file
- Create a master table
- Create child tables WITHOUT overlapping table constraints
- Create indexes and primary keys
- Create a function and a trigger to insert data into the child tables
First, check postgresql.conf for this parameter:
constraint_exclusion = partition
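The setting can also be checked from psql without opening the file. This is a minimal sketch: `partition` is the default value in current PostgreSQL releases, so the SET is only needed if the setting was changed somewhere.

```sql
-- Show the value currently in effect for this session.
SHOW constraint_exclusion;

-- Override it for the current session only (postgresql.conf is not modified).
SET constraint_exclusion = partition;
```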
Now, let's create the master and child tables:
CREATE TABLE sales (
    sales_id    serial NOT NULL,
    sales_date  DATE   NOT NULL DEFAULT CURRENT_DATE,
    description text
);
CREATE TABLE sales_2013_p1 (
    CHECK ( sales_date >= DATE '2013-01-01' AND sales_date < DATE '2013-05-01' )
) INHERITS (sales);
CREATE TABLE sales_2013_p2 (
    CHECK ( sales_date >= DATE '2013-05-01' AND sales_date < DATE '2013-09-01' )
) INHERITS (sales);
CREATE TABLE sales_2013_p3 (
    CHECK ( sales_date >= DATE '2013-09-01' AND sales_date < DATE '2014-01-01' )
) INHERITS (sales);
Notice the keyword INHERITS here :)
Next, primary keys and indexes on the child tables:
ALTER TABLE sales_2013_p1 ADD CONSTRAINT sales_2013_p1_pkey PRIMARY KEY (sales_id, sales_date);
ALTER TABLE sales_2013_p2 ADD CONSTRAINT sales_2013_p2_pkey PRIMARY KEY (sales_id, sales_date);
ALTER TABLE sales_2013_p3 ADD CONSTRAINT sales_2013_p3_pkey PRIMARY KEY (sales_id, sales_date);
CREATE INDEX idx_2013_p1 ON sales_2013_p1 (sales_date);
CREATE INDEX idx_2013_p2 ON sales_2013_p2 (sales_date);
CREATE INDEX idx_2013_p3 ON sales_2013_p3 (sales_date);
And finally, a function that returns a trigger, and the on-insert trigger itself:
CREATE OR REPLACE FUNCTION sales_trig_func()
RETURNS TRIGGER AS $$
BEGIN
    -- Route the new row to the child table whose CHECK range contains its date.
    IF ( NEW.sales_date >= DATE '2013-01-01' AND NEW.sales_date < DATE '2013-05-01' ) THEN
        INSERT INTO sales_2013_p1 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-05-01' AND NEW.sales_date < DATE '2013-09-01' ) THEN
        INSERT INTO sales_2013_p2 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-09-01' AND NEW.sales_date < DATE '2014-01-01' ) THEN
        INSERT INTO sales_2013_p3 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range!';
    END IF;
    -- Returning NULL skips the insert into the master table itself.
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER insert_on_sales
    BEFORE INSERT ON sales
    FOR EACH ROW EXECUTE PROCEDURE sales_trig_func();
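A quick way to check the routing is to insert a row through the master table and look at where it ended up. The sample values below are made up for illustration; `tableoid::regclass` is the usual way to show which physical table a row is stored in.

```sql
-- Insert through the master table; the trigger redirects the row.
INSERT INTO sales (sales_date, description)
VALUES (DATE '2013-06-15', 'test row');

-- tableoid::regclass names the table that actually stores each row;
-- for this date it should be sales_2013_p2.
SELECT tableoid::regclass AS stored_in, sales_id, sales_date, description
FROM sales;

-- ONLY restricts the scan to the master table itself, which should stay empty.
SELECT count(*) FROM ONLY sales;
```

Because the trigger function returns NULL, psql reports INSERT 0 0 for the first statement even though the row was stored in the child table.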
Now that we have a table with a basic partitioning scheme, let's assume that we want to add more partitions for 2014. For the example's sake, I will create just a single child table covering all of 2014, and then replace the trigger function so that it routes rows to it:
CREATE TABLE sales_2014 (
    CHECK ( sales_date >= DATE '2014-01-01' AND sales_date < DATE '2015-01-01' )
) INHERITS (sales);
ALTER TABLE sales_2014 ADD CONSTRAINT sales_2014_pkey PRIMARY KEY (sales_id, sales_date);
CREATE INDEX idx_2014 ON sales_2014 (sales_date);
CREATE OR REPLACE FUNCTION sales_trig_func()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.sales_date >= DATE '2013-01-01' AND NEW.sales_date < DATE '2013-05-01' ) THEN
        INSERT INTO sales_2013_p1 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-05-01' AND NEW.sales_date < DATE '2013-09-01' ) THEN
        INSERT INTO sales_2013_p2 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-09-01' AND NEW.sales_date < DATE '2014-01-01' ) THEN
        INSERT INTO sales_2013_p3 VALUES (NEW.*);
    -- New branch for the 2014 partition.
    ELSIF ( NEW.sales_date >= DATE '2014-01-01' AND NEW.sales_date < DATE '2015-01-01' ) THEN
        INSERT INTO sales_2014 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;
And we are done!
Now let's say that the 2013_p1 data is obsolete and we want to move it to a historical database: copy the data over, drop the table, correct and replace the function, and you are done.
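One possible sequence for retiring the partition is sketched below. It assumes the data has already been copied to the historical database (for example with pg_dump -t sales_2013_p1), a step that is not shown here.

```sql
BEGIN;

-- Detach the partition so queries on sales stop seeing its rows.
ALTER TABLE sales_2013_p1 NO INHERIT sales;

-- Replace the routing function, now without the sales_2013_p1 branch.
CREATE OR REPLACE FUNCTION sales_trig_func()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.sales_date >= DATE '2013-05-01' AND NEW.sales_date < DATE '2013-09-01' ) THEN
        INSERT INTO sales_2013_p2 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-09-01' AND NEW.sales_date < DATE '2014-01-01' ) THEN
        INSERT INTO sales_2013_p3 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2014-01-01' AND NEW.sales_date < DATE '2015-01-01' ) THEN
        INSERT INTO sales_2014 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

-- Remove the old partition once its data is safely archived.
DROP TABLE sales_2013_p1;

COMMIT;
```

Wrapping the steps in one transaction means other sessions never see a state where the function still points at a table that no longer exists.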
This is how the master table would look after these operations:
partition=# \d+ sales
                                               Table "public.sales"
   Column    |  Type   |                        Modifiers                         | Storage  | Stats target | Description
-------------+---------+----------------------------------------------------------+----------+--------------+-------------
 sales_id    | integer | not null default nextval('sales_sales_id_seq'::regclass) | plain    |              |
 sales_date  | date    | not null default ('now'::text)::date                     | plain    |              |
 description | text    |                                                          | extended |              |
Triggers:
    insert_on_sales BEFORE INSERT ON sales FOR EACH ROW EXECUTE PROCEDURE sales_trig_func()
Child tables: sales_2013_p2,
              sales_2013_p3,
              sales_2014
Has OIDs: no
The good thing about this approach is that partitions are generally easy to maintain and administer. Child tables can have different indexes from one another, and you can of course delete large portions of data that are no longer needed just by dropping a partition. Performance is VERY good on inserts and on selects that filter on sales_date, and maintenance work like REINDEX is faster because it runs on one partition at a time. REINDEX in particular wouldn't lock the whole master table for writes, only the partition being reindexed.
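To see constraint exclusion at work, the plan of a date-filtered query can be inspected. This is only a sketch; the exact plan text depends on the PostgreSQL version and on table statistics, so no output is shown.

```sql
-- With constraint_exclusion = partition, the plan should only touch the
-- master table and the children whose CHECK constraints can match the filter.
EXPLAIN
SELECT * FROM sales
WHERE sales_date >= DATE '2013-06-01' AND sales_date < DATE '2013-07-01';

-- Maintenance can target a single partition; the other partitions stay writable.
REINDEX TABLE sales_2013_p2;
```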
Thanks for reading
-Vasilis
Hello, thanks for that post. I have one question:
What if I would like to
SELECT * FROM sales WHERE sales_date > '2013-06-01';
Is it possible to execute this simple query, or do we have to SELECT and UNION data from sales_2013_p2 and sales_2013_p3?
Of course it is, you don't need to do any UNION or change anything on the application side.