At the same time, they had a major drop in freeable memory:
The logs showed that:
My first assumption was that this had something to do with GiST, or with the ip4r extension's data types combined with a GiST index.
Unfortunately, all of this investigation happened over a Slack chat, so I couldn't collect all the data I would have liked. The table was empty and this was affecting production, so I suggested a VACUUM FULL VERBOSE, which fixed the issue.
It has been bothering me and my colleagues that we never found the root cause, so I started an Aurora instance in AWS and tried to reproduce the problem.
I wrote a shell script that generated a lot of random rows, and I loaded about 4 million rows into a table that looked like this:
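A generator for this kind of test data can be sketched in a few lines. This is only an illustration, written in Python rather than shell: the two columns (an IPv4 range in CIDR form, which ip4r accepts as input, and a text label), the fixed /24 size and the function names are all assumptions, not the schema or script used in the test.

```python
import ipaddress
import random


def random_row(rng):
    """Return one hypothetical row: (IPv4 /24 range in CIDR form, text label)."""
    # Take a random 32-bit address and clear the low 8 bits so the
    # result is a valid /24 network address.
    addr = rng.getrandbits(32) & 0xFFFFFF00
    ip_range = f"{ipaddress.IPv4Address(addr)}/24"
    label = "row_%06d" % rng.randrange(1_000_000)
    return ip_range, label


def generate_rows(count, seed=None):
    """Yield `count` random rows; pass a seed to make the output repeatable."""
    rng = random.Random(seed)
    for _ in range(count):
        yield random_row(rng)


if __name__ == "__main__":
    # Print a few rows as tab-separated text, the default COPY text format.
    for ip_range, label in generate_rows(5, seed=1):
        print(f"{ip_range}\t{label}")
```

Redirecting the output to a file gives something that COPY can load, as long as the column order matches the target table.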
I did a bunch of tests, like bloating the GiST index up to 80% and then trying to insert or to use COPY to import data from external files; everything worked fine. Last, I tried importing some data (about 300k rows) from my local machine with plain inserts:
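To make the difference between the two load paths concrete, here is a minimal sketch of how the same row looks as a plain INSERT statement and as a line of COPY text input. The table name `ip_test` and the column names are hypothetical, and the helpers assume values without tabs, newlines or backslashes (COPY text format would need those escaped).

```python
def sql_literal(value):
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def as_insert(table, columns, row):
    """Render one row as a standalone INSERT statement (one statement per row)."""
    cols = ", ".join(columns)
    vals = ", ".join(sql_literal(v) for v in row)
    return f"INSERT INTO {table} ({cols}) VALUES ({vals});"


def as_copy_line(row):
    """Render one row as a tab-separated line for COPY ... FROM STDIN."""
    return "\t".join(row)


if __name__ == "__main__":
    row = ("10.0.0.0/24", "example")
    print(as_insert("ip_test", ["ip_range", "label"], row))
    print(as_copy_line(row))
```

With plain inserts the server parses and executes every row as its own statement, while COPY streams all rows through a single command, so 300k rows means 300k statements in the first case.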
Aurora went mental. I couldn't cancel the client, getting:
Could not send cancel request:
PQcancel() -- connect() failed: Connection refused
and after a while:
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The service was obviously dead.
The logs said:
I ran this test three times. Two out of three times the database restarted with the above message, and once I had to manually reboot the instance because it became completely unresponsive.
Now, at home I only have 1 Mbit of upload, so I couldn't possibly send data fast enough to impact the server. CloudWatch was showing that everything was nice and low, except memory:
So basically, this is my progress so far. I might try to re-test this on a normal RDS instance, or try without the indexes, or I will simply file a case with Amazon :)
Obviously, this post is not meant to say that Aurora is bad, but if I were about to migrate to Aurora and take advantage of its features, I'd make sure to double and triple check that the application is working as it should. This product is very new (I think it was released in late 2017), so some problems should be expected, especially if your schema is a bit out of the ordinary.
Thanks for reading,
OmniTI Computer Consulting