Wednesday, 18 December 2013

cooking with fdw

Back in 9.1 a feature called foreign data wrappers (FDW) was introduced; it lets you connect a PostgreSQL server to remote data stores (other databases, csv files, etc.). Today I plan to play with FDWs and see what we can do with them and how.

First of all, there are many FDWs; a list can be found here.

For the sake of this post I used the following data wrappers:

     Name     | Version |   Schema   |                    Description                    
--------------+---------+------------+----------------------------------------------------
 file_fdw     | 1.0     | public     | foreign-data wrapper for flat file access
 postgres_fdw | 1.0     | public     | foreign-data wrapper for remote PostgreSQL servers
 redis_fdw    | 1.0     | public     | Foreign data wrapper for querying a Redis server

file_fdw and postgres_fdw are both available in contrib; redis_fdw can be found in the following link. You will also need to compile and install hiredis, which can be found here. I won't go into much detail on how to install these, but there is plenty of documentation available.

The plan is to connect a redis server, a remote PostgreSQL server and a csv file, and interact with everything.

The first thing to do after the extensions have been created is to create foreign servers and tables. The following example shows what I did with the remote PostgreSQL server:

test_redis=# CREATE SERVER node_2 FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '', dbname 'postgres');
test_redis=# CREATE FOREIGN TABLE test_pg (id int)
SERVER node_2 OPTIONS (table_name 'test_pg');
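
For reference, the rest of the setup might look roughly like the sketch below. It is not the exact set of commands used for this post: the server names csv_files and redis_server, the column types, the redis address and port, and the assumption that db15 maps to redis database 15 are all guesses for illustration. The column names and the csv path are taken from the query output and execution plan further down, and the redis_fdw options follow that project's README.

```sql
-- Load the wrappers (once per database).
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE EXTENSION IF NOT EXISTS redis_fdw;

-- postgres_fdw looks up a user mapping for the local role before connecting.
-- Assumption: the same role exists on the remote side and no password is needed.
CREATE USER MAPPING FOR CURRENT_USER SERVER node_2;

-- file_fdw: the server object takes no options, the file is set per table.
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;   -- hypothetical name
CREATE FOREIGN TABLE pgcsv (
    id     int,
    key    text,
    "date" date,
    "time" time,
    num    int        -- column types are assumptions
) SERVER csv_files
OPTIONS (filename '/home/vasilis/lala.csv', format 'csv');

-- redis_fdw: address and port on the server, database number on the table.
CREATE SERVER redis_server FOREIGN DATA WRAPPER redis_fdw   -- hypothetical name
OPTIONS (address '127.0.0.1', port '6379');                 -- assumed local redis
CREATE USER MAPPING FOR PUBLIC SERVER redis_server;
CREATE FOREIGN TABLE db15 (key text, value text)
SERVER redis_server OPTIONS (database '15');                -- assumed from the table name
```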

The csv and redis foreign tables were very similar to create, so when I was done I had this:

 public | db15       | foreign table | vasilis
 public | pgcsv      | foreign table | vasilis
 public | test       | table         | vasilis
 public | test2_pg   | foreign table | vasilis

One local table and one foreign table for each of my foreign data wrappers. So, let's start playing around.

test_redis=# begin;
test_redis=# insert into test2_pg values ('koko','lala');
test_redis=# commit;

Yep, I could insert into a remote PG server in a transactional way. Goodbye dblink? (oh yes!)
Unfortunately this wouldn't work on redis, but only because the fdw doesn't support it (yet?).
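
A quick way to check the transactional part is to roll back instead of committing. This is only a sketch against the test2_pg table from above; the values 'kiki' and 'lolo' are made up for the example.

```sql
BEGIN;
-- The row is written to the remote server inside a remote transaction.
INSERT INTO test2_pg VALUES ('kiki', 'lolo');
-- Aborting locally makes postgres_fdw abort the remote transaction too.
ROLLBACK;

-- The count should be 0 if the remote insert was rolled back.
SELECT count(*) FROM test2_pg WHERE key = 'kiki';
```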
Now let's join them all and see how that works.

test_redis=# select * from db15,pgcsv,test2_pg,test where db15.key=test2_pg.key and pgcsv.key=db15.key and db15.key=test.key and test2_pg.key=db15.key;
 key  | value | id | key  |    date    |   time   | num | key  | value | key  | value 
 koko | koko1 |  1 | koko | 2013-12-01 | 00:01:01 | 234 | koko | lala  | koko | ninini

The statement is not something you would actually use, but its execution plan demonstrates that all the foreign tables have been accessed:

test_redis=# explain analyze select * from db15,pgcsv,test2_pg,test where db15.key=test2_pg.key and pgcsv.key=db15.key and db15.key=test.key and test2_pg.key=db15.key;
                                                          QUERY PLAN                                                          
 Nested Loop  (cost=111.24..156.43 rows=1 width=244) (actual time=0.981..1.031 rows=1 loops=1)
   Join Filter: (test2_pg.key = db15.key)
   Rows Removed by Join Filter: 2
   ->  Nested Loop  (cost=101.24..143.39 rows=1 width=180) (actual time=0.857..0.862 rows=1 loops=1)
         ->  Hash Join  (cost=101.11..142.55 rows=5 width=116) (actual time=0.833..0.835 rows=1 loops=1)
               Hash Cond: (test2_pg.key = pgcsv.key)
               ->  Foreign Scan on test2_pg  (cost=100.00..137.90 rows=930 width=64) (actual time=0.767..0.768 rows=1 loops=1)
               ->  Hash  (cost=1.10..1.10 rows=1 width=52) (actual time=0.049..0.049 rows=3 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 1kB
                     ->  Foreign Scan on pgcsv  (cost=0.00..1.10 rows=1 width=52) (actual time=0.033..0.042 rows=3 loops=1)
                           Foreign File: /home/vasilis/lala.csv
                           Foreign File Size: 98
         ->  Index Scan using kokoako on test  (cost=0.12..0.16 rows=1 width=64) (actual time=0.019..0.021 rows=1 loops=1)
               Index Cond: (key = test2_pg.key)
   ->  Foreign Scan on db15  (cost=10.00..13.00 rows=3 width=64) (actual time=0.071..0.162 rows=3 loops=1)
         Foreign Redis Table Size: 3
 Total runtime: 2.184 ms
(17 rows)

This is awesome in many ways. Imagine that you have a heterogeneous environment with mongo, redis, postgres, oracle, csv logs, etc. You can put a PostgreSQL server in the middle, acting as a bridge, and create supersets, subsets, aggregations, views, temp tables, report tables and so many other things. You could even limit users to the "bridge postgres", so they see only what you want them to see from many different systems. You can also run SQL against redis data, which is handy since redis commands are so limited. And because I am starting to get a bit paranoid, I will leave it up to you how to use this :)
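
As a small illustration of the bridge idea, the foreign tables from this post could be hidden behind a view that is the only thing a reporting user may read. The view name, the column aliases and the report_reader role are hypothetical; treat this as a sketch rather than a recommended setup.

```sql
-- One view that stitches the redis, csv and remote postgres data together.
CREATE VIEW combined_keys AS
SELECT r.key,
       r.value AS redis_value,
       c.num   AS csv_num,
       p.value AS remote_pg_value
FROM db15 r
JOIN pgcsv c    ON c.key = r.key
JOIN test2_pg p ON p.key = r.key;

-- A role that can read the view but has no rights on the tables behind it.
CREATE ROLE report_reader LOGIN;
GRANT SELECT ON combined_keys TO report_reader;
```

The reporting user queries combined_keys like any local table and never needs direct grants on the foreign tables.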

Thanks for reading
-- Vasilis
