@@ -660,29 +660,79 @@ HINT: To avoid a database shutdown, execute a database-wide VACUUM in that data
660
660
</programlisting>
661
661
662
662
(A manual <command>VACUUM</command> should fix the problem, as suggested by the
663
- hint; but note that the <command>VACUUM</command> must be performed by a
664
- superuser, else it will fail to process system catalogs and thus not
665
- be able to advance the database's <structfield>datfrozenxid</structfield>.)
666
- If these warnings are
667
- ignored, the system will shut down and refuse to start any new
668
- transactions once there are fewer than three million transactions left
669
- until wraparound:
663
+ hint; but note that the <command>VACUUM</command> should be performed by a
664
+ superuser, else it will fail to process system catalogs, which prevent it from
665
+ being able to advance the database's <structfield>datfrozenxid</structfield>.)
666
+ If these warnings are ignored, the system will refuse to assign new XIDs once
667
+ there are fewer than three million transactions left until wraparound:
670
668
671
669
<programlisting>
672
670
ERROR: database is not accepting commands to avoid wraparound data loss in database "mydb"
673
671
HINT: Stop the postmaster and vacuum that database in single-user mode.
674
672
</programlisting>
675
673
676
- The three-million-transaction safety margin exists to let the
677
- administrator recover without data loss, by manually executing the
678
- required <command>VACUUM</command> commands. However, since the system will not
679
- execute commands once it has gone into the safety shutdown mode,
680
- the only way to do this is to stop the server and start the server in single-user
681
- mode to execute <command>VACUUM</command>. The shutdown mode is not enforced
682
- in single-user mode. See the <xref linkend="app-postgres"/> reference
683
- page for details about using single-user mode.
674
+ In this condition any transactions already in progress can continue,
675
+ but only read-only transactions can be started. Operations that
676
+ modify database records or truncate relations will fail.
677
+ The <command>VACUUM</command> command can still be run normally.
678
+ Contrary to what the hint states, it is not necessary or desirable to stop the
679
+ postmaster or enter single user-mode in order to restore normal operation.
680
+ Instead, follow these steps:
681
+
682
+ <orderedlist>
683
+ <listitem>
684
+ <simpara>Resolve old prepared transactions. You can find these by checking
685
+ <link linkend="view-pg-prepared-xacts">pg_prepared_xacts</link> for rows where
686
+ <literal>age(transactionid)</literal> is large. Such transactions should be
687
+ committed or rolled back.</simpara>
688
+ </listitem>
689
+ <listitem>
690
+ <simpara>End long-running open transactions. You can find these by checking
691
+ <link linkend="monitoring-pg-stat-activity-view">pg_stat_activity</link> for rows where
692
+ <literal>age(backend_xid)</literal> or <literal>age(backend_xmin)</literal> is
693
+ large. Such transactions should be committed or rolled back, or the session
694
+ can be terminated using <literal>pg_terminate_backend</literal>.</simpara>
695
+ </listitem>
696
+ <listitem>
697
+ <simpara>Drop any old replication slots. Use
698
+ <link linkend="monitoring-pg-stat-replication-view">pg_stat_replication</link> to
699
+ find slots where <literal>age(xmin)</literal> or <literal>age(catalog_xmin)</literal>
700
+ is large. In many cases, such slots were created for replication to servers that no
701
+ longer exist, or that have been down for a long time. If you drop a slot for a server
702
+ that still exists and might still try to connect to that slot, that replica may
703
+ need to be rebuilt.</simpara>
704
+ </listitem>
705
+ <listitem>
706
+ <simpara>Execute <command>VACUUM</command> in the target database. A database-wide
707
+ <literal>VACUUM</literal> is simplest; to reduce the time required, it as also possible
708
+ to issue manual <command>VACUUM</command> commands on the tables where
709
+ <structfield>relminxid</structfield> is oldest. Do not use <literal>VACUUM FULL</literal>
710
+ in this scenario, because it requires an XID and will therefore fail, except in super-user
711
+ mode, where it will instead consume an XID and thus increase the risk of transaction ID
712
+ wraparound. Do not use <literal>VACUUM FREEZE</literal> either, because it will do
713
+ more than the minimum amount of work required to restore normal operation.</simpara>
714
+ </listitem>
715
+ <listitem>
716
+ <simpara>Once normal operation is restored, ensure that autovacuum is properly configured
717
+ in the target database in order to avoid future problems.</simpara>
718
+ </listitem>
719
+ </orderedlist>
684
720
</para>
685
721
722
+ <note>
723
+ <para>
724
+ In earlier versions, it was sometimes necessary to stop the postmaster and
725
+ <command>VACUUM</command> the database in a single-user mode. In typical scenarios, this
726
+ is no longer necessary, and should be avoided whenever possible, since it involves taking
727
+ the system down. It is also riskier, since it disables transaction ID wraparound safeguards
728
+ that are designed to prevent data loss. The only reason to use single-user mode in this
729
+ scenario is if you wish to <command>TRUNCATE</command> or <command>DROP</command> unneeded
730
+ tables to avoid needing to <command>VACUUM</command> them. The three-million-transaction
731
+ safety margin exists to let the administrator do this. See the
732
+ <xref linkend="app-postgres"/> reference page for details about using single-user mode.
733
+ </para>
734
+ </note>
735
+
686
736
<sect3 id="vacuum-for-multixact-wraparound">
687
737
<title>Multixacts and Wraparound</title>
688
738
@@ -747,6 +797,38 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
747
797
have the oldest multixact-age. Both of these kinds of aggressive
748
798
scans will occur even if autovacuum is nominally disabled.
749
799
</para>
800
+
801
+ <para>
802
+ Similar to the XID case, if autovacuum fails to clear old MXIDs from a table, the
803
+ system will begin to emit warning messages when the database's oldest MXIDs reach forty
804
+ million transactions from the wraparound point. And, just as an the XID case, if these
805
+ warnings are ignored, the system will refuse to generate new MXIDs once there are fewer
806
+ than three million left until wraparound.
807
+ </para>
808
+
809
+ <para>
810
+ Normal operation when MXIDs are exhausted can be restored in much the same way as
811
+ when XIDs are exhausted. Follow the same steps in the previous section, but with the
812
+ following differences:
813
+
814
+ <orderedlist>
815
+ <listitem>
816
+ <simpara>Running transactions and prepared transactions can be ignored if there
817
+ is no chance that they might appear in a multixact.</simpara>
818
+ </listitem>
819
+ <listitem>
820
+ <simpara>MXID information is not directly visible in system views such as
821
+ <literal>pg_stat_activity</literal>; however, looking for old XIDs is still a good
822
+ way of determining which transactions are causing MXID wraparound problems.</simpara>
823
+ </listitem>
824
+ <listitem>
825
+ <simpara>XID exhaustion will block all write transactions, but MXID exhaustion will
826
+ only block a subset of write transactions, specifically those that involve
827
+ row locks that require an MXID.</simpara>
828
+ </listitem>
829
+ </orderedlist>
830
+ </para>
831
+
750
832
</sect3>
751
833
</sect2>
752
834
0 commit comments