Tuesday, 25 September 2018

Logical Replication in AWS RDS

Recently Amazon announced that Postgres 10 on RDS supports the logical replication syntax and that it can work as either a publisher or a subscriber. The announcement can be found here.

In order to test it out I created two 10.4 RDS instances and set them up in the same security,
subnet and parameter groups, so I ended up with something like this:

user : vasilis
password : Ap4ssw0rd
db instance : lrnode1 & 2
dbname : lrtest
port : 5432

The only thing I had to change in order to enable logical replication was setting rds.logical_replication to 1 in my parameter group.
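Once the parameter group change has been applied (it is a static parameter, so the instances need a reboot), it can be verified from psql; a small sanity check, assuming the rds.* setting is visible from a regular session:

show wal_level;                 -- should now return 'logical'
show rds.logical_replication;   -- should now return 'on'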
 

After verifying connectivity

psql -h lrnode1.random.us-east-1.rds.amazonaws.com -U vasilis -d lrtest -W
psql -h lrnode2.random.us-east-1.rds.amazonaws.com -U vasilis -d lrtest -W

I created a dummy table on both my publisher and subscriber (lrnode1 and lrnode2), since logical replication does not replicate DDL and the table has to exist on both sides:

create table lr_test_tbl (
    id int primary key,
    name text
);

On lrnode1 (the publisher) I created a publication:
CREATE PUBLICATION lr_pub FOR TABLE lr_test_tbl ;


On lrnode2, as vasilis, I created the subscription:
CREATE SUBSCRIPTION lr_sub CONNECTION 'host=lrnode1.cqrcyb3h2bq2.us-east-1.rds.amazonaws.com 
dbname=lrtest user=vasilis password=Ap4ssw0rd' PUBLICATION lr_pub ;
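To verify that replication actually flows, a quick sanity check (any test values will do):

-- on lrnode1 (the publisher)
insert into lr_test_tbl values (1, 'replicated row');

-- on lrnode2 (the subscriber), a moment later
select * from lr_test_tbl;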

There are only three things you need to know: you need to adjust your subnet / security groups so the
instances can see each other, you have to change the rds.logical_replication parameter,
and the user you are supposed to use for this is the user you set when you created the instance.
 
That was it, pretty fast and simple.

-- 
Vasilis Ventirozos
OmniTI Computer Consulting Inc.

Thursday, 1 March 2018

Implementing a "distributed" reporting server using some Postgres 10 features.

Today I will try to show how strong Postgres 10 is by combining different features in order to create a "distributed" reporting server. The features I will be using are:
  • Logical Replication
  • Partitioning
  • Foreign Data Wrappers
  • Table Inheritance 
The scenario that we want to implement is the following:
We have one central point for inserts that we will call the bucket; the bucket is range-partitioned by year. In my example we have 3 partitions for 2016, 2017 and 2018, and each partition is logically replicated to one of 3 data nodes, each responsible for 1 year of data. Finally, we have a reporting proxy that is responsible for all selects and connects to each node through foreign data wrappers.
The setup consists of 5 docker containers with the following roles:
  • 10.0.0.2, bucket, insert / update / delete
  • 10.0.0.3, node2016, data holder for 2016
  • 10.0.0.4, node2017, data holder for 2017
  • 10.0.0.5, node2018, data holder for 2018
  • 10.0.0.6, reporting proxy, main point for selects 


Now let's start with the bucket:


CREATE TABLE data_bucket (
  id int , 
  data text, 
  insert_time timestamp without time zone DEFAULT now()) 
  PARTITION BY RANGE (insert_time);

CREATE TABLE data_p2016 PARTITION OF data_bucket 
  FOR VALUES FROM ('2016-01-01 00:00:00') TO ('2017-01-01 00:00:00');
CREATE TABLE data_p2017 PARTITION OF data_bucket 
  FOR VALUES FROM ('2017-01-01 00:00:00') TO ('2018-01-01 00:00:00');
CREATE TABLE data_p2018 PARTITION OF data_bucket 
  FOR VALUES FROM ('2018-01-01 00:00:00') TO ('2019-01-01 00:00:00');

create unique index data_p2016_uniq on data_p2016 (id);
create unique index data_p2017_uniq on data_p2017 (id);
create unique index data_p2018_uniq on data_p2018 (id);

create index data_p2016_time on data_p2016 (insert_time);
create index data_p2017_time on data_p2017 (insert_time);
create index data_p2018_time on data_p2018 (insert_time);

CREATE PUBLICATION pub_data_p2016 FOR TABLE data_p2016 
  WITH (publish='insert,update');
CREATE PUBLICATION pub_data_p2017 FOR TABLE data_p2017 
  WITH (publish='insert,update');
CREATE PUBLICATION pub_data_p2018 FOR TABLE data_p2018 
  WITH (publish='insert,update');

Here we created the data bucket table that we will insert into, its yearly partitions, and some indexes for uniqueness and for searching by date (the indexes are optional, more about that later on). Last, we created 3 publications that we will use in the next step. Notice that we only replicate inserts and updates, not deletes; keep that in mind for later.
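Before moving on, it's easy to confirm which tables each publication covers directly on the bucket (output omitted here):

-- on the bucket
select * from pg_publication_tables;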

Next step is setting up the data nodes. On each container (node2016, node2017 and node2018) run the following SQL :


-- node 2016
CREATE TABLE data_p2016 (
  id int,
  data text,
  insert_time timestamp without time zone );

create unique index data_p2016_uniq on data_p2016 (id);
create index data_p2016_time on data_p2016 (insert_time);

CREATE SUBSCRIPTION sub_data_p2016 
  CONNECTION 'dbname=monkey host=10.0.0.2 user=postgres port=5432'
  PUBLICATION pub_data_p2016;

-- node 2017

CREATE TABLE data_p2017 (
  id int,
  data text,
  insert_time timestamp without time zone );

create unique index data_p2017_uniq on data_p2017 (id);
create index data_p2017_time on data_p2017 (insert_time);

CREATE SUBSCRIPTION sub_data_p2017 
  CONNECTION 'dbname=monkey host=10.0.0.2 user=postgres port=5432'
  PUBLICATION pub_data_p2017;

-- node 2018

CREATE TABLE data_p2018 (
  id int,
  data text,
  insert_time timestamp without time zone ) ;

create unique index data_p2018_uniq on data_p2018 (id);
create index data_p2018_time on data_p2018 (insert_time);

CREATE SUBSCRIPTION sub_data_p2018 
  CONNECTION 'dbname=monkey host=10.0.0.2 user=postgres port=5432'
  PUBLICATION pub_data_p2018;

Here, for each node we create the data table, indexes and a subscription pointing to the bucket server.
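After the subscriptions are created, each node should show an active subscription worker and the bucket should show three walsenders; a quick way to verify (output omitted here):

-- on each data node
select subname, received_lsn, latest_end_lsn from pg_stat_subscription;

-- on the bucket
select application_name, state from pg_stat_replication;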

Right now every row that gets into the bucket is transferred to the appropriate node. One last thing is missing: putting everything together. For aggregating all nodes we have the reporting proxy container. On this server we need to run the following SQL statements:


create extension if not exists postgres_fdw;

CREATE SERVER data_node_2016 
  FOREIGN DATA WRAPPER postgres_fdw 
  OPTIONS(host '10.0.0.3',port '5432',dbname 'monkey');
CREATE SERVER data_node_2017 
  FOREIGN DATA WRAPPER postgres_fdw 
  OPTIONS(host '10.0.0.4',port '5432',dbname 'monkey');
CREATE SERVER data_node_2018 
  FOREIGN DATA WRAPPER postgres_fdw 
  OPTIONS(host '10.0.0.5',port '5432',dbname 'monkey');

CREATE USER MAPPING FOR postgres SERVER data_node_2016 OPTIONS(user 'postgres');
CREATE USER MAPPING FOR postgres SERVER data_node_2017 OPTIONS(user 'postgres');
CREATE USER MAPPING FOR postgres SERVER data_node_2018 OPTIONS(user 'postgres');

CREATE TABLE reporting_table (
  id int,
  data text, 
  insert_time timestamp without time zone);

CREATE FOREIGN TABLE data_node_2016 (
  CHECK ( insert_time >= DATE '2016-01-01' AND insert_time < DATE '2017-01-01' ))
  INHERITS (reporting_table) SERVER data_node_2016 
  options (table_name 'data_p2016');
CREATE FOREIGN TABLE data_node_2017 (
  CHECK ( insert_time >= DATE '2017-01-01' AND insert_time < DATE '2018-01-01' ))
  INHERITS (reporting_table) SERVER data_node_2017 options (table_name 'data_p2017');
CREATE FOREIGN TABLE data_node_2018 (
  CHECK ( insert_time >= DATE '2018-01-01' AND insert_time < DATE '2019-01-01' ))
  INHERITS (reporting_table) SERVER data_node_2018 options (table_name 'data_p2018');

We first create the postgres_fdw extension, then the remote servers and user mappings for each data node, then the main reporting table, and finally three foreign tables, one for each node, using table inheritance.
The structure is ready, everything is now connected, and we should be good for testing. But before we test this, let's describe what to expect: rows inserted into data_bucket should be routed into the yearly partitions, those partitions will be replicated to their data nodes, and the reporting proxy should aggregate all nodes using foreign scans.
Let's insert some randomly generated data into the data_bucket:


insert into data_bucket
  select generate_series(1,1000000), 
  md5(random()::text),
  timestamp '2016-01-01 00:00:00' + random() * 
  (timestamp '2019-01-01 00:00:00' - timestamp '2016-01-01 00:00:00');
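Once the insert finishes, each data node should only hold the rows for its own year; a quick check on node2016 (node2017 and node2018 can be checked the same way):

-- on node2016
select count(*), min(insert_time), max(insert_time) from data_p2016;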

The data should now be distributed across all three nodes. From the reporting_table we created on the reporting proxy we should be able to see everything; notice the explain plans:


monkey=# select count (*) from reporting_table ;
  count
---------
 1000000
(1 row)

monkey=# select min (insert_time),max(insert_time) from reporting_table;
            min             |            max
----------------------------+----------------------------
 2016-01-01 00:03:17.062862 | 2018-12-31 23:59:39.671967
(1 row)

monkey=# explain analyze select min (insert_time),max(insert_time) from reporting_table;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=598.80..598.81 rows=1 width=16) (actual time=1708.333..1708.334 rows=1 loops=1)
   ->  Append  (cost=0.00..560.40 rows=7681 width=8) (actual time=0.466..1653.186 rows=1000000 loops=1)
         ->  Seq Scan on reporting_table  (cost=0.00..0.00 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=1)
         ->  Foreign Scan on data_node_2016  (cost=100.00..186.80 rows=2560 width=8) (actual time=0.464..544.597 rows=334088 loops=1)
         ->  Foreign Scan on data_node_2017  (cost=100.00..186.80 rows=2560 width=8) (actual time=0.334..533.149 rows=332875 loops=1)
         ->  Foreign Scan on data_node_2018  (cost=100.00..186.80 rows=2560 width=8) (actual time=0.323..534.776 rows=333037 loops=1)
 Planning time: 0.220 ms
 Execution time: 1709.252 ms
(8 rows)

monkey=# select * from reporting_table where insert_time = '2016-06-21 17:59:44';
 id | data | insert_time
----+------+-------------
(0 rows)

monkey=# select * from reporting_table where insert_time = '2016-06-21 17:59:44.154904';
 id  |               data               |        insert_time
-----+----------------------------------+----------------------------
 150 | 27da6c5606ea26d4ca51c6b642547d44 | 2016-06-21 17:59:44.154904
(1 row)

monkey=# explain analyze select * from reporting_table where insert_time = '2016-06-21 17:59:44.154904';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..125.17 rows=7 width=44) (actual time=0.383..0.384 rows=1 loops=1)
   ->  Seq Scan on reporting_table  (cost=0.00..0.00 rows=1 width=44) (actual time=0.002..0.002 rows=0 loops=1)
         Filter: (insert_time = '2016-06-21 17:59:44.154904'::timestamp without time zone)
   ->  Foreign Scan on data_node_2016  (cost=100.00..125.17 rows=6 width=44) (actual time=0.381..0.381 rows=1 loops=1)
 Planning time: 0.172 ms
 Execution time: 0.801 ms
(6 rows)
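One thing to notice in the plans above: the row estimates on the foreign scans (rows=2560) are nowhere near the actual row counts. If that ever leads to bad plans, running ANALYZE against the foreign tables or enabling use_remote_estimate on the servers should help; a sketch:

-- collect local statistics for the foreign tables
analyze data_node_2016;
analyze data_node_2017;
analyze data_node_2018;

-- or ask the remote servers for row estimates at planning time
alter server data_node_2016 options (add use_remote_estimate 'true');
alter server data_node_2017 options (add use_remote_estimate 'true');
alter server data_node_2018 options (add use_remote_estimate 'true');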

Some might say: OK, but now we have all the data in two places, which is true. But do we actually need the data in the bucket? The answer is no, we don't; we only need it there in case we need to update. Remember that we set logical replication to only publish inserts and updates? This means that we can delete whatever we want from either the bucket or its partitions, so we can implement any custom data retention policy; we can even truncate them if we want to remove data fast.
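A minimal retention sketch on the bucket (deletes are not published, and TRUNCATE is not replicated at all in Postgres 10, so the data nodes keep their copies):

-- on the bucket: drop local data that has already been replicated
truncate table data_p2016;
-- or keep a rolling window instead
delete from data_p2017 where insert_time < now() - interval '1 year';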
Now, is this solution perfect? No, it's not; foreign data wrappers are obviously slower and they can't perform all operations, but with each Postgres version they are getting better.


Thanks for reading.

Vasilis Ventirozos
OmniTI Computer Consulting

Tuesday, 27 February 2018

Postgres10 in RDS, first impressions

As a firm believer in Postgres, and someone who runs both Postgres 10 and RDS in production, I've been waiting for Postgres 10 on RDS to be announced ever since the release last fall. Well, today was not that day, but I was surprised to see that RDS is now sporting a "postgres10" instance you can spin up. I'm not sure if that's there on purpose, but you can be sure I jumped at the chance to get a first look at what the new Postgres 10 RDS world might look like; here is what I found.
The first thing I wanted to test was logical replication. By default it was disabled, with rds.logical_replication set to 0. The AWS console allowed me to change this, which also changed wal_level to logical, so I created a simple table to replicate. I created a publication that included my table, but that's where the party stopped: I can't create a role with the replication privilege and I can't grant replication to any user:

mydb=> SELECT SESSION_USER, CURRENT_USER;
 session_user | current_user
--------------+---------------
 testuser      | rds_superuser
(1 row)

Time: 143.554 ms
omniti=> alter role testuser with replication;
ERROR:  must be superuser to alter replication users
Time: 163.823 ms
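For reference, the part that did work before hitting the privilege wall looked roughly like this (the table and publication names here are made up for illustration):

show wal_level;   -- 'logical' after changing rds.logical_replication

create table repl_test (id int primary key, payload text);
create publication rds_test_pub for table repl_test;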

On top of that, CREATE SUBSCRIPTION requires superuser. Basically, logical replication is there but I don't see how anyone could actually use it. It's well known that RDS replicas can't exist outside RDS, and I was hoping that Postgres 10 and logical replication would add more flexibility in replication methods. I don't think this will change anytime soon, but maybe they will add functionality in the console menus to control logical replication on their own terms using their rdsadmin user, who knows.

The next thing I wanted to check was parallelism. Remember how I said we run Postgres 10 in production? One thing we found is that there are significant bugs around parallel query, and the only safe way to work around them at this point is to disable it.
I was surprised not only to see it enabled, but also that they are only running 10.1, which does not include a bunch of fixes that we need in our prod instances (not to mention upcoming fixes in 10.3). Presumably they will fix this once it becomes officially released, hopefully on 10.3. For now, please be nice and don't crash their servers just because you can.

I tried a bunch of other features and it sure looked like Postgres 10. The new partitioning syntax is there and it works, as does scram-sha-256. Obviously this is super new and they still have work to do, but I'm really excited about the chance to get a sneak peek and am looking forward to seeing this get an official release date, maybe at pgconfus later this year?


Thanks for reading.

Vasilis Ventirozos
OmniTI Computer Consulting

Wednesday, 10 January 2018

AWS Aurora Postgres, not a great first impression

Recently we had a customer facing memory issues with an Aurora PostgreSQL instance. What was happening is that while loading data or creating GiST indexes they got the following error:


2018-01-04 14:03:06 UTC:171.32.45.250(35508):root@test:[20457]:ERROR:  out of memory
2018-01-04 14:03:06 UTC:171.32.45.250(35508):root@test:[20457]:DETAIL:  Failed on request of size 23.
2018-01-04 14:03:06 UTC:171.32.45.250(35508):root@test:[20457]:STATEMENT:  INSERT INTO test_table1 VALUES
('domain1.com','Manual','[198.168.1.148,198.168.1.250]','t','f','2016-02-16 15:58:58.016626','',''),   ('domain2.com','Manual','[192.168.1.1,192.168.1.174]','t','f','2016-02-16 15:58:58.023136','',''),
...

At the same time they had a major drop in freeable memory.



The logs showed:


TopMemoryContext: 1184792 total in 13 blocks; 14024 free (5 chunks); 1170768 used
  CFuncHash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
  Type information cache: 24472 total in 2 blocks; 2840 free (0 chunks); 21632 used
  TopTransactionContext: 8192 total in 1 blocks; 7384 free (1 chunks); 808 used
  MessageContext: 8511544 total in 11 blocks; 2077632 free (0 chunks); 6433912 used
  Operator class cache: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
  smgr relation context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
    smgr relation table: 24576 total in 2 blocks; 13008 free (4 chunks); 11568 used
  TransactionAbortContext: 32768 total in 1 blocks; 32728 free (0 chunks); 40 used
  Portal hash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
  PortalMemory: 8192 total in 1 blocks; 7880 free (0 chunks); 312 used
    PortalHeapMemory: 1024 total in 1 blocks; 840 free (0 chunks); 184 used
      ExecutorState: 24576 total in 2 blocks; 3328 free (13 chunks); 21248 used
        GiST scan context: 49245329112 total in 5881 blocks; 138720 free (3880 chunks); 49245190392 used
        ExprContext: 8192 total in 1 blocks; 8056 free (0 chunks); 136 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext: 8192 total in 1 blocks; 7464 free (0 chunks); 728 used
  Relcache by OID: 24576 total in 2 blocks; 12960 free (4 chunks); 11616 used
  CacheMemoryContext: 516096 total in 6 blocks; 74648 free (1 chunks); 441448 used
    test_idx1: 1024 total in 1 blocks; 8 free (0 chunks); 1016 used
    test_idx2: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_constraint_conrelid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_user_mapping_user_server_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_user_mapping_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_type_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_type_typname_nsp_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_ts_template_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_ts_parser_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_ts_parser_prsname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_ts_dict_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_ts_dict_dictname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_ts_config_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_ts_config_cfgname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_ts_config_map_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_transform_type_lang_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_transform_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_statistic_relid_att_inh_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_range_rngtypid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_proc_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_opfamily_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_operator_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_namespace_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_namespace_nspname_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_language_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_language_name_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_foreign_table_relid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_foreign_server_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_foreign_server_name_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_event_trigger_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_event_trigger_evtname_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_enum_typid_label_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_enum_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_default_acl_role_nsp_obj_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_conversion_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_constraint_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_conversion_name_nsp_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_conversion_default_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_collation_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_collation_name_enc_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_amop_fam_strat_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_amop_opr_fam_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_aggregate_fnoid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1976 free (0 chunks); 1096 used
    pg_opclass_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_class_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_authid_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_auth_members_member_role_index: 1024 total in 1 blocks; 264 free (0 chunks); 760 used
    pg_tablespace_oid_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_shseclabel_object_index: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used
    pg_replication_origin_roname_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_database_datname_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_replication_origin_roiident_index: 1024 total in 1 blocks; 448 free (0 chunks); 576 used
    pg_auth_members_role_member_index: 1024 total in 1 blocks; 312 free (0 chunks); 712 used
    pg_database_oid_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    pg_authid_rolname_index: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
  WAL record construction: 49768 total in 2 blocks; 6584 free (0 chunks); 43184 used
    GWAL record construction: 1024 total in 1 blocks; 456 free (0 chunks); 568 used
  PrivateRefCount: 8192 total in 1 blocks; 2840 free (0 chunks); 5352 used
  Aurora WAL Context: 8192 total in 1 blocks; 8152 free (0 chunks); 40 used
  Aurora File Context: 8192 total in 1 blocks; 7512 free (0 chunks); 680 used
  MdSmgr: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  LOCALLOCK hash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
  Timezones: 104120 total in 2 blocks; 2840 free (0 chunks); 101280 used
  ErrorContext: 8192 total in 1 blocks; 8152 free (4 chunks); 40 used
Grand total: 49256003552 bytes in 6012 blocks; 2466216 free (3912 chunks); 49253537336 used
2018-01-04 14:03:06 UTC:172.31.45.250(35508):root@test:[20457]:ERROR:  out of memory
2018-01-04 14:03:06 UTC:172.31.45.250(35508):root@test:[20457]:DETAIL:  Failed on request of size 23.
2018-01-04 14:03:06 UTC:172.31.45.250(35508):root@test:[20457]:STATEMENT:  INSERT INTO test_table1 VALUES
    ('domain1.com','Manual','[198.168.1.148,198.168.1.250]','t','f','2016-02-16 15:58:58.016626','',''),
    ('domain2.com','Manual','[192.168.1.1,192.168.1.174]','t','f','2016-02-16 15:58:58.023136','',''),
...

Looking at the dump, the GiST scan context alone accounts for essentially all of the ~49 GB grand total, so my first assumption was that this had something to do with GiST, or with the ip4r extension / datatypes combined with a GiST index.
Unfortunately all this investigation happened over a Slack chat, so I couldn't collect all the data I would have liked; the table was empty and this was affecting production, so I suggested a VACUUM FULL VERBOSE, which fixed the issue.
It has been bothering me and my colleagues that we never found the root cause, so I started an Aurora instance in AWS and tried to reproduce the problem.

I wrote a shell script that generated a lot of random rows and loaded about 4 million rows into a table that looked like this:

-- citext and the custom inet range type the table depends on
create extension if not exists citext;

CREATE TYPE inet_range AS RANGE (
  subtype = inet
);

create table gist_test (
testtext1 citext not null,
testtext2 citext not null,
ip_range inet_range not null,
testbool1 boolean not null default false,
testbool2 boolean  not null default false,
created_at timestamp without time zone not null default now(),
testemail1 citext,
testemail2 citext);

create index idx_test1 on gist_test (testtext1);
create index idx_test2 on gist_test using GIST(ip_range);
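The original rows came from a shell script, but a rough, purely SQL way to generate a similar volume of random data would be something like this (hypothetical values, not the customer's data):

insert into gist_test (testtext1, testtext2, ip_range, testbool1, testbool2, testemail1, testemail2)
select md5(random()::text),
       md5(random()::text),
       -- build a small random IP range inside a random /24
       inet_range((o1||'.'||o2||'.'||o3||'.1')::inet,
                  (o1||'.'||o2||'.'||o3||'.'||(2 + (random()*250)::int))::inet,
                  '[]'),
       true, false, '', ''
from (select (random()*222)::int + 1 as o1,
             (random()*255)::int      as o2,
             (random()*255)::int      as o3
      from generate_series(1, 4000000)) ips;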


I did a bunch of tests, like bloating the GiST index up to 80% and trying to insert or use COPY to import data from external files; everything worked fine. Last, I tried importing some data (about 300k rows) from my local machine with a plain insert:


insert into gist_test (testtext1,testtext2,ip_range,testbool1,testbool2,testemail1,testemail2) values
('HyBrlWugbsGH4fusJ9czVneB+58IZaoG8Vx3NV0jgyQ','0SaIKhTlZyD2zlUFpSK0b2Y9gep6hzxO0q2crgm3iRM','[184.224.199.243,184.224.199.255]','t','f','',''),
...
...
('AZgoJrP+QUvhe3TkmaxqoX8ZKv3BQ4a2muvPGuJIqq8','rIWnCq4q7qwLWlPmbYqnk3voOQdA0cPB4AerbeV4lpA','[6.116.212.51,6.116.212.255]','t','f','','');

Aurora went mental; I couldn't cancel the client, getting:
Could not send cancel request: 
PQcancel() -- connect() failed: Connection refused
and after a while :
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

The service was obviously dead.
The logs said:

2018-01-10 14:53:09 UTC::@:[6393]:FATAL: Storage initialization failed. result = 1436, errno = 0
2018-01-10 14:53:09 UTC::@:[6391]:LOG: Aurora Runtime process (PID 6393) exited with exit code 1
2018-01-10 14:53:09 UTC::@:[6391]:LOG: terminating any other active server processes
2018-01-10 14:53:09 UTC::@:[6391]:FATAL: Can't handle storage runtime process crash
2018-01-10 14:53:09 UTC::@:[6391]:LOG: database system is shut down
2018-01-10 14:53:09 UTC::@:[6751]:LOG: Waiting for runtime initialization complete...


I tried this test 3 times; two out of three times the database restarted with the above message, and once I had to manually reboot the instance because it went completely unresponsive.
Now, at home I only have 1 Mbit of upload bandwidth, so I couldn't possibly send data fast enough to overload the server; CloudWatch was showing that everything was nice and low, except memory.




So basically, this is my progress so far. I might re-test this on a normal RDS instance, try it without the indexes, or simply file a case with Amazon :)
Obviously this post is not meant to say that Aurora is bad, but if I were about to migrate to Aurora and take advantage of its features I'd make sure to double and triple check that the application works as it should. This product is very new (I think it was released in November 2017) and some problems should be expected, especially if your schema is a bit out of the ordinary.


Thanks for reading,

Vasilis Ventirozos
OmniTI Computer Consulting
