WhiteBoard Coder: Postgres pg_dumpall with compression

To do this on an Ubuntu machine is relatively easy.

Just run the following commands.

First log in as the postgres user

> sudo su postgres

Then run pg_dumpall and pipe it to a compression program, in my case I use bzip2

> pg_dumpall | bzip2 -vf > database.`date +"%Y.%m.%d"`.pgdumpall.bz2

I like to time mine so I add the time command

> time pg_dumpall | bzip2 -vf > database.`date +"%Y.%m.%d"`.pgdumpall.bz2

Replication Database

However, this does not work to well on my current database. I am trying to take a dumpall from a Replication database that has yet to catch up on all its WAL files. If I run the dumpall command I get the following error.

pg_dump: Error message from server: ERROR: canceling statement due to conflict with recovery

DETAIL: User query might have needed to see row versions that must be removed.

If I run the following command as the root user:

> ps -AF | grep post

Here you can see its still recovering a WAL file from the Master Database. If I compare it to the WAL files in the Master database it is about 30-35 behind. It seems to linger here not losing too much ground or gaining. I would think it would constantly get better or get worse. But for some reason it just remains in this range… so far.

postgresql.conf

Open up postgresql.conf for editing.

> vi /etc/postgresql/9.1/main/postgresql.conf

I updated these two settings that relate to query timeout in a replication database.

@line 212

max_standby_archive_delay = 600s

max_standby_streaming_delay = 600s

I upped them to 600 seconds to see if that is a high enough timetout to get a dumpall to work.

After updating that you need to restart postgres

> /etc/init.d/postgresql restart

This got a little further but still was not enough to finish my pg_dumpall.

So I upped it to 1200s and restarted the postgresql, that was not enough so I upped it to 2400s.

It looks like when you do this and the replication server has not caught up it puts it in a waiting mode

Which is causing it to fall further behind.

So I guess taking a pg_dumpall of a replication server that has not caught up yet is generally a bad idea?

In the end I upped the timeout to 8 hrs (28800s). That gave me enough time to create a backup and also after the backup was complete, even though it had fallen further behind it caught back up to 30-35 WAL files behind the live database.

This is another issue I need to figure out.

References

Postgres pg_dumpall with compression

Posted on Tuesday, May 15, 2012

Replication Database

postgresql.conf

3 comments: