Wednesday, 18 December 2013

cooking with fdw

Back in 9.1, PostgreSQL introduced a feature called foreign data wrappers (FDW), which lets you connect a PostgreSQL server to remote data stores (other databases, CSV files, etc.). Today I plan to play with FDWs and see what we can do with them and how.

First of all, there are many FDWs; a list can be found here.

For the sake of this post I used the following data wrappers:

     Name     | Version |   Schema   |                    Description                    
--------------+---------+------------+----------------------------------------------------
 file_fdw     | 1.0     | public     | foreign-data wrapper for flat file access
 postgres_fdw | 1.0     | public     | foreign-data wrapper for remote PostgreSQL servers
 redis_fdw    | 1.0     | public     | Foreign data wrapper for querying a Redis server


file_fdw and postgres_fdw are both available in contrib; redis_fdw can be found at the following link. You will also need to compile and install hiredis, which can be found here. I won't go into much detail on how to install these, but there is plenty of documentation available.
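Once the wrappers are installed on the server, each one still has to be enabled in the database where it will be used. A minimal sketch, assuming a superuser session and that redis_fdw and hiredis have already been built and installed:

```sql
-- Enable each wrapper in the current database.
CREATE EXTENSION file_fdw;
CREATE EXTENSION postgres_fdw;
-- Assumes the redis_fdw build (and hiredis) is already installed on this server.
CREATE EXTENSION redis_fdw;
```

Running \dx in psql afterwards lists the installed extensions, which is where a listing like the one above comes from.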

The plan is to connect a Redis server, a remote PostgreSQL server and a CSV file, and interact with all of them.

The first thing to do after the extensions have been created is to create the foreign servers and tables. The following example shows what I did with the remote PostgreSQL server:

test_redis=# CREATE SERVER node_2
FOREIGN DATA WRAPPER postgres_fdw  
OPTIONS (host '192.168.0.3', dbname 'postgres');     
CREATE SERVER
test_redis=# CREATE USER MAPPING FOR PUBLIC SERVER node_2;
CREATE USER MAPPING
test_redis=# CREATE FOREIGN TABLE test_pg (id int)
SERVER node_2 OPTIONS (table_name 'test_pg');
CREATE FOREIGN TABLE
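For the other two wrappers the steps look much the same. The following is only a sketch of how it could be done, not the exact statements used for this post: the server names (csv_files, redis_server), the Redis address and port, and the column types are assumptions, while the table names, column names and CSV path are taken from the output further down.

```sql
-- file_fdw: the server is just a placeholder object, the file is set per table.
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;   -- server name is hypothetical

CREATE FOREIGN TABLE pgcsv (
    id   int,
    key  text,
    date date,
    time time,
    num  int                                             -- column types are assumed
) SERVER csv_files
  OPTIONS (filename '/home/vasilis/lala.csv', format 'csv');

-- redis_fdw: address and port are assumed; database '15' matches the db15 table name.
CREATE SERVER redis_server
    FOREIGN DATA WRAPPER redis_fdw
    OPTIONS (address '127.0.0.1', port '6379');

CREATE USER MAPPING FOR PUBLIC SERVER redis_server;

CREATE FOREIGN TABLE db15 (key text, value text)
    SERVER redis_server
    OPTIONS (database '15');
```

file_fdw needs no user mapping because the file is read by the server process itself, so the file only has to be readable by the operating system user that runs PostgreSQL.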


The CSV and Redis foreign tables were very similar to create, so when I was done I had this:

 public | db15       | foreign table | vasilis
 public | pgcsv      | foreign table | vasilis
 public | test       | table         | vasilis
 public | test2_pg   | foreign table | vasilis





One local table and one foreign table for each of my wrappers. So, let's start playing around.

test_redis=# begin;
BEGIN
test_redis=# insert into test2_pg values ('koko','lala');
INSERT 0 1
test_redis=# commit;
COMMIT



Yep, I could insert into a remote PG server, in a transactional way. Goodbye dblink? (Oh yes!)
Unfortunately this wouldn't work on Redis, but only because the FDW doesn't support it (yet?).
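A quick way to convince yourself of the transactional part is to roll back instead of committing. A small sketch, assuming the same test2_pg foreign table; the inserted values are made up for illustration:

```sql
BEGIN;
INSERT INTO test2_pg VALUES ('kiki', 'lolo');   -- hypothetical row
ROLLBACK;

-- postgres_fdw aborts the remote transaction together with the local one,
-- so this should not find the row on the remote server.
SELECT * FROM test2_pg WHERE key = 'kiki';
```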
Now let's join them all and see how that works.

test_redis=# select * from db15,pgcsv,test2_pg,test where db15.key=test2_pg.key and pgcsv.key=db15.key and db15.key=test.key and test2_pg.key=db15.key;
 key  | value | id | key  |    date    |   time   | num | key  | value | key  | value 
------+-------+----+------+------------+----------+-----+------+-------+------+--------
 koko | koko1 |  1 | koko | 2013-12-01 | 00:01:01 | 234 | koko | lala  | koko | ninini


The statement is not something you would actually use, but its execution plan demonstrates that all the foreign tables were accessed:

test_redis=# explain analyze select * from db15,pgcsv,test2_pg,test where db15.key=test2_pg.key and pgcsv.key=db15.key and db15.key=test.key and test2_pg.key=db15.key;
                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=111.24..156.43 rows=1 width=244) (actual time=0.981..1.031 rows=1 loops=1)
   Join Filter: (test2_pg.key = db15.key)
   Rows Removed by Join Filter: 2
   ->  Nested Loop  (cost=101.24..143.39 rows=1 width=180) (actual time=0.857..0.862 rows=1 loops=1)
         ->  Hash Join  (cost=101.11..142.55 rows=5 width=116) (actual time=0.833..0.835 rows=1 loops=1)
               Hash Cond: (test2_pg.key = pgcsv.key)
               ->  Foreign Scan on test2_pg  (cost=100.00..137.90 rows=930 width=64) (actual time=0.767..0.768 rows=1 loops=1)
               ->  Hash  (cost=1.10..1.10 rows=1 width=52) (actual time=0.049..0.049 rows=3 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 1kB
                     ->  Foreign Scan on pgcsv  (cost=0.00..1.10 rows=1 width=52) (actual time=0.033..0.042 rows=3 loops=1)
                           Foreign File: /home/vasilis/lala.csv
                           Foreign File Size: 98
         ->  Index Scan using kokoako on test  (cost=0.12..0.16 rows=1 width=64) (actual time=0.019..0.021 rows=1 loops=1)
               Index Cond: (key = test2_pg.key)
   ->  Foreign Scan on db15  (cost=10.00..13.00 rows=3 width=64) (actual time=0.071..0.162 rows=3 loops=1)
         Foreign Redis Table Size: 3
 Total runtime: 2.184 ms
(17 rows)
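Note that the planner guessed 930 rows for test2_pg while only 1 came back, which suggests there are no local statistics for that foreign table. A possible follow-up, as a sketch only, is to collect statistics and then look at what is actually sent to the remote server:

```sql
-- Sample the remote table and store statistics locally.
ANALYZE test2_pg;

-- VERBOSE makes postgres_fdw print the "Remote SQL" it ships to the other node,
-- which shows whether the WHERE clause is evaluated remotely.
EXPLAIN (VERBOSE) SELECT * FROM test2_pg WHERE key = 'koko';
```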

This is awesome in many ways. Imagine you have a heterogeneous environment with Mongo, Redis, Postgres, Oracle, CSV logs and so on. You can put a PostgreSQL server in the middle, acting as a bridge, and create supersets, subsets, aggregations, views, temp tables, report tables and many other things. You could even restrict users to the "bridge Postgres" only, so they see just what you want them to see from many different systems. You can also run SQL against Redis data, which is handy because the Redis commands themselves are so limited. And because I am starting to get a bit paranoid, I will leave it up to you how to use this :)
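As one illustration of the "bridge" idea, a view can combine the foreign tables and a restricted role can be given access to that view only. This is a sketch under assumptions: the view name koko_report and the role report_reader are hypothetical, and the columns are the ones visible in the join output above.

```sql
-- A view that hides where each column really lives.
CREATE VIEW koko_report AS
SELECT d.key,
       d.value AS redis_value,   -- from Redis
       c.num   AS csv_num,       -- from the CSV file
       p.value AS pg_value       -- from the remote PostgreSQL server
FROM db15 d
JOIN pgcsv    c ON c.key = d.key
JOIN test2_pg p ON p.key = d.key;

-- A role that may read the view but has no rights on the foreign tables themselves.
CREATE ROLE report_reader LOGIN;   -- hypothetical role
GRANT SELECT ON koko_report TO report_reader;
```

Because the underlying tables are checked against the view owner's privileges, report_reader can query koko_report without being granted anything on db15, pgcsv or test2_pg.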


Thanks for reading
-- Vasilis




