Thursday, 19 May 2016

Repairing clog corruptions

Yesterday, i got a page from a client about a possible corruption issue to one of his clients, so i decided to share how i dealt with it. Before starting with how i approached the problem, i want to say that corruptions don't just happen in postgres, in OmniTI, we manage A LOT of databases, all with different major versions and in different operating systems and in my time, I haven't seen (many) cases that corruption happened because of a postgres bug (i've seen indexes getting corrupted but i don't remember ever seeing data being corrupted). What i want to say is that corruptions don't just happen, hardware failures, power outages, disks getting full are common reasons for corruptions. 
A replica and backups should always be in place and the server should be properly monitored. Unfortunately this server was not managed by us so none of the above was in place..

At first I saw in the logs entries like :


From the logs:
2016-05-18 15:50:06 EDT::@:[10191]: ERROR:  could not access status of transaction 128604706
2016-05-18 15:50:06 EDT::@:[10191]: DETAIL:  Could not open file "pg_clog/007A": No such file or directory.
2016-05-18 15:50:06 EDT::@:[10191]: CONTEXT:  automatic vacuum of table "pe.pg_toast.pg_toast_4409334"
...
much more to be expected...
...

At this point you know that something went wrong, ideally you want to copy your whole $PGDATA to somewhere else and work there while at the same time you start considering uncompressing your most recent backup. In my case,  there was no backup and not enough disk space to copy $PGDATA.
FUN TIMES :)

I started by fixing all clogs missing which i found  logs
dd if=/dev/zero of=/var/db/pgdata/pg_clog/0114 bs=256k count=1
dd if=/dev/zero of=/var/db/pgdata/pg_clog/00D1 bs=256k count=1
dd if=/dev/zero of=/var/db/pgdata/pg_clog/0106 bs=256k count=1
...
...
keep creating until logs are clean, they can be a lot, in my case they were more than 100....
...
...


From the logs i also found the tables :

pg_toast.pg_toast_18454
pg_toast.pg_toast_35597
pg_toast.pg_toast_35607
pg_toast.pg_toast_4409334
pg_toast.pg_toast_4409344
pg_toast.pg_toast_8817516


db=# select relname,relnamespace from pg_class where oid in (18454,35597,35607,4409334,4409344,8817516) order by relnamespace;
        relname       | relnamespace
------------------------+--------------
table_case1 |        16872
table_case1 |        16874
table_case2 |        16874
table_case2 |      4409063
table_case1 |      4409063
table_case2 |      8817221
(6 rows)

db=# select oid,nspname from pg_namespace where oid in (16872,16874,16874,4409063,8817221) order by oid;
   oid   |  nspname
---------+------------
   16872 | schema1
   16874 | schema2
 4409063 | schema3
 8817221 | schema4
(4 rows)

With a join i found schema.table:
schema1.table_case1
schema2.table_case1
schema2.table_case2
schema3.table_case2
schema3.table_case1
schema4.table_case2

Basically we have an application with multiple schemas and 2 kinds of tables were corrupted across 4 schemas.

For table_case1 (simple case, table not referenced by anyone):
for each schema :

set search_path to schema1;
create table badrows (badid int);

DO $f$
declare
    curid INT := 0;
    vcontent TEXT;
    badid INT;
begin
FOR badid IN SELECT id FROM table_case1 LOOP
    curid = curid + 1;
    if curid % 100000 = 0 then
        raise notice '% rows inspected', curid;
    end if;
    begin
        SELECT id
        INTO vcontent
        FROM table_case1 where id = badid;
    exception
        when others then
            insert into badrows values (badid);
raise notice 'data for id % is corrupt', badid;
            continue;
    end;
end loop;
end;
$f$;

(This script was taken from Josh Berkus blog, and it was modified to fill my needs.)

create table table_case1_clean as select * from table_case1
where id not in (select badid from badrows);

TRUNCATE table_case1;
vacuum full verbose table_case1;
insert into table_case1 select * from table_case1_clean;
vacuum full analyze verbose table_case1;
drop table badrows;


For table_case2 (this one is being referenced by 2 other tables) 
F or each org (schema):

set search_path to schema2;

create table badrows (badid int);

DO $f$
declare
    curid INT := 0;
    vcontent TEXT;
    badid INT;
begin
FOR badid IN SELECT id FROM table_case2 LOOP
    curid = curid + 1;
    if curid % 100000 = 0 then
        raise notice '% rows inspected', curid;
    end if;
    begin
        SELECT id
        INTO vcontent
        FROM table_case2 where id = badid;
    exception
        when others then
            insert into badrows values (badid);
            raise notice 'data for id % is corrupt', badid;
            continue;
    end;
end loop;
end;
$f$;


create table table_case2_clean as select * from table_case2
where id not in (select badid from badrows);

alter table reference_table1 drop constraint reference_table1_fkey;
delete from reference_table1 where table_case2_id in (select badid from badrows) ;

alter table reference_table2 drop constraint reference_table2_fkey;
delete from reference_table2 where table_case2_id in (select badid from badrows);

TRUNCATE table_case2;
vacuum full verbose table_case2;
insert into table_case2 select * from table_case2_clean;
vacuum full analyze verbose table_case2;

ALTER TABLE ONLY reference_table1
    ADD CONSTRAINT reference_table1_fkey FOREIGN KEY (table_case2_id) REFERENCES table_case2(id) ON DELETE CASCADE;

ALTER TABLE ONLY reference_table2
    ADD CONSTRAINT reference_table2_fkey FOREIGN KEY (table_case2_id) REFERENCES table_case2(id);

drop table badrows;

(please ignore or report any typos here, i replaced the real table names while i was writing this post so i might messed up with some names).


What we basically did here was to recreate the table without the corrupted rows.
After this, tables should be corruption free with the minimum possible data loss.
To ensure that you are corruption free you should either pg_dump and restore, or vacuum full everythingnormal vacuum will NOT show corruptions.


pe=# vacuum verbose schema1.table_case1;
INFO:  vacuuming " schema1.table_case1"
INFO:  index " schema1.table_case1_pkey" now contains 12175 row versions in 36 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.15 sec.
INFO:  " table_case1": found 0 removable, 12175 nonremovable row versions in 258 out of 258 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.17 sec.
INFO:  vacuuming "pg_toast.pg_toast_18454"
INFO:  index "pg_toast_18454_index" now contains 51370 row versions in 143 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_18454": found 0 removable, 51370 nonremovable row versions in 12331 out of 12331 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.05s/0.03u sec elapsed 0.09 sec.
VACUUM
pe=#
pe=#
pe=# vacuum full verbose  schema1.table_case1;
INFO:  vacuuming " schema1.table_case1"
ERROR:  missing chunk number 0 for toast value 9270408 in pg_toast_18454



Rows were lost, in my case that was acceptable and maybe your case is not the same as mine, but i hope this will provide some guidance in case you get into a similar situation..


Thanks for reading
- Vasilis