Repairing clog corruptions

Yesterday I got a page from a client about a possible corruption issue affecting one of his clients, so I decided to share how I dealt with it. Before getting into how I approached the problem, I want to say that corruption doesn't just happen in Postgres. At OmniTI we manage A LOT of databases, on different major versions and different operating systems, and in my time I haven't seen (many) cases where corruption was caused by a Postgres bug (I've seen indexes getting corrupted, but I don't remember ever seeing data being corrupted). What I want to say is that corruption doesn't appear out of nowhere: hardware failures, power outages and disks filling up are the common causes.
A replica and backups should always be in place, and the server should be properly monitored. Unfortunately, this server was not managed by us, so none of the above was in place.

At first I saw entries like these in the logs:

2016-05-18 15:50:06 EDT::@:[10191]: ERROR:  could not access status of transaction 128604706
2016-05-18 15:50:06 EDT::@:[10191]: DETAIL:  Could not open file "pg_clog/007A": No such file or directory.
2016-05-18 15:50:06 EDT::@:[10191]: CONTEXT:  automatic vacuum of table "pe.pg_toast.pg_toast_4409334"
...
many more entries like these to be expected...
...

At this point you know that something has gone wrong. Ideally, you want to copy your whole $PGDATA somewhere else and work there, while at the same time you start uncompressing your most recent backup. In my case, there was no backup and not enough disk space to copy $PGDATA.
FUN TIMES :)

I started by creating all the missing clog files that I found in the logs:
dd if=/dev/zero of=/var/db/pgdata/pg_clog/0114 bs=256k count=1
dd if=/dev/zero of=/var/db/pgdata/pg_clog/00D1 bs=256k count=1
dd if=/dev/zero of=/var/db/pgdata/pg_clog/0106 bs=256k count=1
...
...
keep creating them until the logs are clean; there can be a lot of them, in my case more than 100...
...
...
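
For reference, the segment file name can be derived from the transaction id in the error message: assuming the default 8 KB block size, each clog page tracks 32768 transaction statuses (2 bits each), a segment file holds 32 pages (hence the 256 KB size used with dd above), and the file name is the segment number in uppercase hex. A quick check against the first error above:

db=# select upper(lpad(to_hex(128604706 / (32768 * 32)), 4, '0'));
 upper
-------
 007A
(1 row)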


From the logs I also found the affected TOAST tables (the numeric suffix of a TOAST table's name is the OID of the table it belongs to):

pg_toast.pg_toast_18454
pg_toast.pg_toast_35597
pg_toast.pg_toast_35607
pg_toast.pg_toast_4409334
pg_toast.pg_toast_4409344
pg_toast.pg_toast_8817516


db=# select relname,relnamespace from pg_class where oid in (18454,35597,35607,4409334,4409344,8817516) order by relnamespace;
        relname         | relnamespace
------------------------+--------------
 table_case1            |        16872
 table_case1            |        16874
 table_case2            |        16874
 table_case2            |      4409063
 table_case1            |      4409063
 table_case2            |      8817221
(6 rows)

db=# select oid,nspname from pg_namespace where oid in (16872,16874,16874,4409063,8817221) order by oid;
   oid   |  nspname
---------+------------
   16872 | schema1
   16874 | schema2
 4409063 | schema3
 8817221 | schema4
(4 rows)

With a join, I found the schema-qualified table names:
schema1.table_case1
schema2.table_case1
schema2.table_case2
schema3.table_case2
schema3.table_case1
schema4.table_case2
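
The same mapping can also be produced in one step with a join, something along these lines (using the same OIDs as above):

db=# select n.nspname, c.relname
       from pg_class c
       join pg_namespace n on n.oid = c.relnamespace
      where c.oid in (18454,35597,35607,4409334,4409344,8817516)
      order by 1, 2;

(For the record, pg_class.reltoastrelid can also be used to map a TOAST table back to its owning table, instead of relying on the numeric suffix of the TOAST table's name.)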

Basically, this is an application with multiple schemas, and two kinds of tables were corrupted across four schemas.

For table_case1 (the simple case: the table is not referenced by any other table), for each schema:

set search_path to schema1;
create table badrows (badid int);

DO $f$
declare
    curid INT := 0;
    vcontent TEXT;
    badid INT;
begin
FOR badid IN SELECT id FROM table_case1 LOOP
    curid = curid + 1;
    if curid % 100000 = 0 then
        raise notice '% rows inspected', curid;
    end if;
    begin
        -- read the whole row (cast to text) so that TOASTed columns are actually
        -- detoasted; selecting only the id would never touch the corrupted TOAST
        -- data and no error would be raised
        SELECT t::text
        INTO vcontent
        FROM table_case1 t where id = badid;
    exception
        when others then
            insert into badrows values (badid);
            raise notice 'data for id % is corrupt', badid;
            continue;
    end;
end loop;
end;
$f$;

(This script was taken from Josh Berkus' blog and modified to fit my needs.)

create table table_case1_clean as select * from table_case1
where id not in (select badid from badrows);

TRUNCATE table_case1;
vacuum full verbose table_case1;
insert into table_case1 select * from table_case1_clean;
vacuum full analyze verbose table_case1;
drop table badrows;
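
Before moving on, a full read of the rebuilt table can confirm that it now detoasts cleanly; casting each row to text forces every TOASTed value to be read (the same trick as in the DO block above):

select count(t::text) from table_case1 t;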


For table_case2 (this one is referenced by two other tables), for each schema:

set search_path to schema2;

create table badrows (badid int);

DO $f$
declare
    curid INT := 0;
    vcontent TEXT;
    badid INT;
begin
FOR badid IN SELECT id FROM table_case2 LOOP
    curid = curid + 1;
    if curid % 100000 = 0 then
        raise notice '% rows inspected', curid;
    end if;
    begin
        -- as above: read the full row to force detoasting of every column
        SELECT t::text
        INTO vcontent
        FROM table_case2 t where id = badid;
    exception
        when others then
            insert into badrows values (badid);
            raise notice 'data for id % is corrupt', badid;
            continue;
    end;
end loop;
end;
$f$;


create table table_case2_clean as select * from table_case2
where id not in (select badid from badrows);

alter table reference_table1 drop constraint reference_table1_fkey;
delete from reference_table1 where table_case2_id in (select badid from badrows) ;

alter table reference_table2 drop constraint reference_table2_fkey;
delete from reference_table2 where table_case2_id in (select badid from badrows);
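
-- optional sanity check (not part of the original script): after the two deletes
-- above, no referencing rows should still point at the ids collected in badrows,
-- otherwise re-adding the foreign keys at the end will fail
select count(*) from reference_table1 where table_case2_id in (select badid from badrows);
select count(*) from reference_table2 where table_case2_id in (select badid from badrows);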

TRUNCATE table_case2;
vacuum full verbose table_case2;
insert into table_case2 select * from table_case2_clean;
vacuum full analyze verbose table_case2;

ALTER TABLE ONLY reference_table1
    ADD CONSTRAINT reference_table1_fkey FOREIGN KEY (table_case2_id) REFERENCES table_case2(id) ON DELETE CASCADE;

ALTER TABLE ONLY reference_table2
    ADD CONSTRAINT reference_table2_fkey FOREIGN KEY (table_case2_id) REFERENCES table_case2(id);

drop table badrows;

(Please ignore or report any typos here; I replaced the real table names while writing this post, so I might have messed up some names.)
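
As a side note, on very large reference tables the foreign keys could also be re-added as NOT VALID and validated in a separate step, so the full validation scan can be deferred (a sketch reusing the constraint definitions above):

ALTER TABLE ONLY reference_table1
    ADD CONSTRAINT reference_table1_fkey FOREIGN KEY (table_case2_id)
    REFERENCES table_case2(id) ON DELETE CASCADE NOT VALID;
ALTER TABLE reference_table1 VALIDATE CONSTRAINT reference_table1_fkey;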


What we basically did here was recreate the tables without the corrupted rows.
After this, the tables should be corruption-free with the minimum possible data loss.
To make sure you are corruption-free, you should either pg_dump and restore, or VACUUM FULL everything; a plain VACUUM will NOT reveal the corruption.
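
VACUUM FULL statements for all user tables can be generated with something along these lines (a sketch using pg_tables; run the generated statements afterwards):

select format('vacuum full verbose %I.%I;', schemaname, tablename)
  from pg_tables
 where schemaname not in ('pg_catalog', 'information_schema');

Here is why a plain vacuum is not enough: a VACUUM on one of the affected tables reported no problems, while a VACUUM FULL on the same table immediately hit the corruption: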


pe=# vacuum verbose schema1.table_case1;
INFO:  vacuuming "schema1.table_case1"
INFO:  index "schema1.table_case1_pkey" now contains 12175 row versions in 36 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.15 sec.
INFO:  " table_case1": found 0 removable, 12175 nonremovable row versions in 258 out of 258 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.17 sec.
INFO:  vacuuming "pg_toast.pg_toast_18454"
INFO:  index "pg_toast_18454_index" now contains 51370 row versions in 143 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_18454": found 0 removable, 51370 nonremovable row versions in 12331 out of 12331 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.05s/0.03u sec elapsed 0.09 sec.
VACUUM
pe=#
pe=#
pe=# vacuum full verbose schema1.table_case1;
INFO:  vacuuming "schema1.table_case1"
ERROR:  missing chunk number 0 for toast value 9270408 in pg_toast_18454



Rows were lost; in my case that was acceptable. Your situation may not be the same as mine, but I hope this provides some guidance in case you ever get into a similar situation.


Thanks for reading
- Vasilis





