
Problems when creating reports from slave

Last month I was working on fixing some issues in a database system I am maintaining. We have a PostgreSQL master instance that is replicated to a slave instance using streaming replication. As there are a lot of connections on the master server using a lot of resources, we had the idea to move all the reporting stuff to the slave server. And this is usually the way to do it. Why spend critical resources on the master when you only have to read data and generate reports that are displayed as web pages? So, after a few tweaks to our application we had the first reports coming from the slave. And everybody was happy. Well, until someone ran a slightly bigger report. And 30 seconds later you get an error that looks like this:
ERROR: canceling statement due to conflict with recovery
SQL state: 40001
Detail: User query might have needed to see row versions that must be removed.
Or one of the following flavours:

FATAL: terminating connection due to conflict with recovery
SQL state: 40001
Detail: User was holding a relation lock for too long.
Hint: In a moment you should be able to reconnect to the database and repeat your command.

FATAL: terminating connection due to conflict with recovery
SQL state: 40001
Detail: User query might have needed to see row versions that must be removed.
Hint: In a moment you should be able to reconnect to the database and repeat your command.
So what is going on here? Well, our slave instance has some limitations regarding the queries you can run on it. On our master, new data is coming in all the time and some of the existing rows are changed by it. When these changed rows are replicated to the slave, the queries running there might be canceled or, even worse, their connections can be terminated. This is a well-known problem, so there are some parameters you can change on the slave to fix it. Or better yet, you can discover that using PostgreSQL you develop superpowers and are now able to stop time[1].
[1] Well, you can't stop time, but you can do almost as well. Read on :)
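If you want to see how often recovery cancels queries on your standby, the statistics collector keeps per-database conflict counters. A minimal check (a generic catalog view, not something from the original setup):

-- slave: cumulative counts of queries canceled by recovery conflicts
SELECT datname,
       confl_snapshot,  -- needed row versions were removed
       confl_lock,      -- relation lock held too long
       confl_bufferpin,
       confl_deadlock
FROM pg_stat_database_conflicts
WHERE datname = current_database();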

Simulation preparations

To demonstrate this problem I will need:

• A PostgreSQL master and a slave connected with streaming replication (a quick sanity check is sketched right after this list)


• A table with some data
CREATE TABLE a (
    id BIGINT NOT NULL,
    c TIMESTAMPTZ NOT NULL DEFAULT now()
);
INSERT INTO a (SELECT generate_series(1, 10000));
• Several consoles/pgAdminIII query windows to run queries on. I will mark the connection on
each server with the connection number.
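Before starting, it is worth confirming that the replication link is actually up. A minimal check using the standard views and functions (the column names below are the 9.x ones matching the version used later in this post; PostgreSQL 10 renamed them to sent_lsn/replay_lsn):

-- master: one row per connected standby; state should be 'streaming'
SELECT application_name, state, sent_location, replay_location
FROM pg_stat_replication;

-- slave: returns true while the instance is a standby
SELECT pg_is_in_recovery();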

Simulation 1 - basic problem demonstration

We have a table a with 10000 rows of data and we want to change some of it. To do that we can run
the following script:
-- master - connection 1 = mc1
WITH random_rows AS (
    SELECT floor(random() * 10000) + 1 AS id
    FROM generate_series(1, 20)
)
UPDATE a SET c = now()
WHERE id IN (SELECT id FROM random_rows);
SELECT txid_current();
You should run this as a whole so that the last statement can tell you the transaction ID that will be
present on the rows that the statement has changed.
On the other server you can run a very intensive reporting query. Like this:

-- slave - connection 1 = sc1
BEGIN TRANSACTION;
SELECT * FROM a;
SELECT pg_sleep(300);
COMMIT;
While sc1 is running you can run the mc1 query a couple of times, and about 30 seconds after that sc1 will get the error.
So why 30 seconds? Easy: the parameters max_standby_archive_delay and max_standby_streaming_delay have a default value of 30 seconds. And the docs say:
When Hot Standby is active, this parameter determines how long the standby server should wait
before canceling standby queries that conflict with about-to-be-applied WAL entries, as described
in Section 25.5.2. max_standby_streaming_delay applies when WAL data is being received via
streaming replication. The default is 30 seconds. Units are milliseconds if not specified. A value of -1 allows the standby to wait forever for conflicting queries to complete. This parameter can only be
set in the postgresql.conf file or on the server command line.
Note that max_standby_streaming_delay is not the same as the maximum length of time a query
can run before cancellation; rather it is the maximum total time allowed to apply WAL data once it
has been received from the primary server. Thus, if one query has resulted in significant delay,
subsequent conflicting queries will have much less grace time until the standby server has caught
up again.
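You can check what your standby is currently using without opening postgresql.conf (a trivial check, nothing version-specific):

-- slave: show the effective conflict-delay settings
SHOW max_standby_streaming_delay;
SHOW max_standby_archive_delay;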
So, if I am reading this correctly, if I can create a specific sequence of reporting queries on the slave, I can stop the WAL apply process. As a consequence of that I can stop time on the slave :) Well, having superpowers is great, but I will have to prove that I have them first.
Simulation 2 - Can I stop time?

So, let's set max_standby_archive_delay and max_standby_streaming_delay to -1 first and reload the configuration on both servers.
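In postgresql.conf terms that is something like the following; a minimal sketch, where -1 means "wait forever for conflicting queries":

# postgresql.conf (we reload both servers, but only the standby uses these)
max_standby_streaming_delay = -1
max_standby_archive_delay = -1

-- reload the configuration without a restart
SELECT pg_reload_conf();

To be able to detect that time has stopped we can run the following query: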
-- slave - connection 2 = sc2
SELECT t.id, t.xmin, t.c, now(), clock_timestamp()
FROM a t
WHERE t.c = (SELECT max(c) FROM a);
In the xmin column you can see the transaction ID that created this row version, the c column is the time the row was inserted or last updated by mc1, now() returns the time the current transaction started, and clock_timestamp() returns the current time on the server running the query.
So let's run sc1, then sc2, and after that run mc1 two times. For example, sc2 will tell you that the last inserted rows were created in XID=714. And mc1 will tell you that it has created rows in XIDs 715 and 716. If you rerun sc2 you will see the last XID equals 715. Hmmm… So, we advanced one transaction. I was expecting that we would stop. As a matter of fact, this behaviour is not consistent, and I think it depends on the rows and pages that were changed. Sometimes the XID doesn't change, but this is very rare. In most of my simulations I got an increment of +1. And the time functions tell me the time didn't stop. I was hoping to get some superpowers, but nothing.
But, it is not that bad. Actually time on the slave goes on, but the data on the slave mimics the
situation when the time on the master has stopped.
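One way to put a number on this "frozen" state is to compare the standby's clock with the commit time of the last replayed transaction; a minimal check using standard functions (not part of the original simulation):

-- slave: how far behind the master's timeline is the visible data?
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;

While a conflicting report holds replay back, this value keeps growing; once the report finishes and replay catches up, it drops back towards zero.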

Can I use this knowledge to create a disaster scenario? Become evil?

Simulation 3 - Breaking bad

Our reporting query lasts only 300 seconds, so it slows down time only for that amount of time. If I can run a similar query before the first one finishes, it should slow time down for its duration. Let's use the same query:

-- slave - connection 3 = sc3
BEGIN TRANSACTION;
SELECT * FROM a;
SELECT pg_sleep(300);
COMMIT;
You run sc1 and before it finishes you run sc3. During this execution you can run mc1, simulating traffic on the master. If you keep running sc2 you will see that after sc1 finishes, the XID increases by 1 (or better yet, by 0). So before sc3 finishes, you rerun sc1, and so forth.
We now have two reporting queries that are constantly overlapping and causing the slave to behave like it is moving very slowly forward in time (or better yet, it is frozen in time if you are lucky). This still doesn't look so bad. No?
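You can also watch the damage from the master's side; a small sketch using the replication statistics view (again with the 9.x column names):

-- master: how much WAL has been sent but not yet replayed on the slave
SELECT application_name,
       pg_xlog_location_diff(sent_location, replay_location) AS replay_lag_bytes
FROM pg_stat_replication;

While the overlapping reports keep the conflict alive, replay_lag_bytes keeps growing.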

Scenario 4 - Denial of service (DOS)


At the end of scenario 3, with the sc1 and sc3 reports constantly overlapping each other, the master crashes and the slave server needs to be promoted. So you issue a promote request (the actual command might vary depending on the Linux distribution):
pg_ctl promote -D /var/lib/postgresql/9.4/main/
And… nothing happens. The slave is not promoted, because WAL replay is being throttled by the reporting queries. We have no master, and the slave won't promote until all WAL records are replayed. You can still read data from the slave, but no new data will get in. It sounds pretty bad.
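If you want to see the standby stuck in this state, two standard checks are enough (a sketch with the 9.x function names; later versions renamed them to pg_last_wal_receive_lsn/pg_last_wal_replay_lsn):

-- slave: still true while the promote request waits for replay to finish
SELECT pg_is_in_recovery();

-- slave: WAL received from the (now dead) master but not yet replayed
SELECT pg_xlog_location_diff(pg_last_xlog_receive_location(),
                             pg_last_xlog_replay_location()) AS pending_wal_bytes;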
Conclusion

The described scenarios are only theoretical, but with the values of max_standby_archive_delay and max_standby_streaming_delay set to -1, and a steady input of reporting queries, it might just be possible. Also, your reports might be wrong because they are operating on ancient data, not almost real-time data. And the DOS scenario is really far-fetched (but you should really try it at home, not in production).
Setting hot_standby_feedback = on seems to solve the reporting errors, but not fully. We still got some errors after setting it to on, but I haven't found a set of example tables/queries to demonstrate that. If I do, I will make a sequel to this post. Until then, have some sane values for max_standby_archive_delay and max_standby_streaming_delay, let's say a few minutes so that your longest report will finish. Some will fail, but be prepared for that and you will be OK.
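As a starting point, a standby configuration along these lines matches that advice; the numbers are only illustrative and should be tuned to your longest report:

# postgresql.conf on the slave -- illustrative values
hot_standby_feedback = on           # master keeps row versions the standby still needs
max_standby_streaming_delay = 5min  # grace period before conflicting reports are canceled
max_standby_archive_delay = 5min

Keep in mind that hot_standby_feedback trades fewer canceled reports for some extra bloat on the master, since old row versions are kept around longer.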
