Skip to content

Commit 7fbcee1

Browse files
committed
Log when GetNewOidWithIndex() fails to find unused OID many times.
GetNewOidWithIndex() generates a new OID one by one until it finds one not in the relation. If there are very long runs of consecutive existing OIDs, GetNewOidWithIndex() needs to iterate many times in the loop to find unused OID. Since TOAST table can have a large number of entries and there can be such long runs of OIDs, there is the case where it takes so many iterations to find new OID not in TOAST table. Furthermore if all (i.e., 2^32) OIDs are already used, GetNewOidWithIndex() enters something like busy loop and repeats the iterations until at least one OID is marked as unused. There are some reported troubles caused by a large number of iterations in GetNewOidWithIndex(). For example, when inserting a billion of records into the table, all the backends doing that insertion operation got hang with 100% CPU usage at some point. Previously there was no easy way to detect that GetNewOidWithIndex() failed to find unused OID many times. So, for example, gdb full backtrace of hanged backends needed to be taken, in order to investigate that trouble. This is inconvenient and may not be available in some production environments. To provide easy way for that, this commit makes GetNewOidWithIndex() log that it iterates more than GETNEWOID_LOG_THRESHOLD but have not yet found OID unused in the relation. Also this commit makes it repeat logging with exponentially increasing intervals until it iterates more than GETNEWOID_LOG_MAX_INTERVAL, and makes it finally repeat logging every GETNEWOID_LOG_MAX_INTERVAL unless an unused OID is found. Those macro variables are used not to fill up the server log with the similar messages. In the discusion at pgsql-hackers, there was another idea to report the lots of iterations in GetNewOidWithIndex() via wait event. But since GetNewOidWithIndex() traverses indexes to find unused OID and which will do I/O, acquire locks, etc, which will overwrite the wait event and reset it to nothing once done. So that idea doesn't work well, and we didn't adopt it. Author: Tomohiro Hiramitsu Reviewed-by: Tatsuhito Kasahara, Kyotaro Horiguchi, Tom Lane, Fujii Masao Discussion: https://postgr.es/m/16722-93043fb459a41073@postgresql.org
1 parent 99dd75f commit 7fbcee1

File tree

1 file changed

+49
-0
lines changed

1 file changed

+49
-0
lines changed

src/backend/catalog/catalog.c

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,13 @@
4747
#include "utils/snapmgr.h"
4848
#include "utils/syscache.h"
4949

50+
/*
51+
* Parameters to determine when to emit a log message in
52+
* GetNewOidWithIndex()
53+
*/
54+
#define GETNEWOID_LOG_THRESHOLD 1000000
55+
#define GETNEWOID_LOG_MAX_INTERVAL 128000000
56+
5057
/*
5158
* IsSystemRelation
5259
* True iff the relation is either a system catalog or a toast table.
@@ -318,6 +325,8 @@ GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
318325
SysScanDesc scan;
319326
ScanKeyData key;
320327
bool collides;
328+
uint64 retries = 0;
329+
uint64 retries_before_log = GETNEWOID_LOG_THRESHOLD;
321330

322331
/* Only system relations are supported */
323332
Assert(IsSystemRelation(relation));
@@ -353,8 +362,48 @@ GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
353362
collides = HeapTupleIsValid(systable_getnext(scan));
354363

355364
systable_endscan(scan);
365+
366+
/*
367+
* Log that we iterate more than GETNEWOID_LOG_THRESHOLD but have not
368+
* yet found OID unused in the relation. Then repeat logging with
369+
* exponentially increasing intervals until we iterate more than
370+
* GETNEWOID_LOG_MAX_INTERVAL. Finally repeat logging every
371+
* GETNEWOID_LOG_MAX_INTERVAL unless an unused OID is found. This
372+
* logic is necessary not to fill up the server log with the similar
373+
* messages.
374+
*/
375+
if (retries >= retries_before_log)
376+
{
377+
ereport(LOG,
378+
(errmsg("still finding an unused OID within relation \"%s\"",
379+
RelationGetRelationName(relation)),
380+
errdetail("OID candidates were checked \"%llu\" times, but no unused OID is yet found.",
381+
(unsigned long long) retries)));
382+
383+
/*
384+
* Double the number of retries to do before logging next until it
385+
* reaches GETNEWOID_LOG_MAX_INTERVAL.
386+
*/
387+
if (retries_before_log * 2 <= GETNEWOID_LOG_MAX_INTERVAL)
388+
retries_before_log *= 2;
389+
else
390+
retries_before_log += GETNEWOID_LOG_MAX_INTERVAL;
391+
}
392+
393+
retries++;
356394
} while (collides);
357395

396+
/*
397+
* If at least one log message is emitted, also log the completion of OID
398+
* assignment.
399+
*/
400+
if (retries > GETNEWOID_LOG_THRESHOLD)
401+
{
402+
ereport(LOG,
403+
(errmsg("new OID has been assigned in relation \"%s\" after \"%llu\" retries",
404+
RelationGetRelationName(relation), (unsigned long long) retries)));
405+
}
406+
358407
return newOid;
359408
}
360409

0 commit comments

Comments
 (0)