gh-91555: disable logger while handling log record #131812

Status: Open. Wants to merge 5 commits into base branch main.
28 changes: 20 additions & 8 deletions Lib/logging/__init__.py
@@ -1474,6 +1474,8 @@ class Logger(Filterer):
level, and "input.csv", "input.xls" and "input.gnu" for the sub-levels.
There is no arbitrary limit to the depth of nesting.
"""
_tls = threading.local()

def __init__(self, name, level=NOTSET):
"""
Initialize the logger with a name and an optional level.
@@ -1670,14 +1672,19 @@ def handle(self, record):
This method is used for unpickled records received from a socket, as
well as those created locally. Logger-level filtering is applied.
"""
-        if self.disabled:
-            return
-        maybe_record = self.filter(record)
-        if not maybe_record:
+        if self._is_disabled():
graingert (Contributor), Mar 29, 2025:
rather than disabling the logging, can we instead append the record to a self._reentrant_records = collections.deque(), and then process all of the pending records:

            maybe_record = self.filter(record)
            if not maybe_record:
                return
            if isinstance(maybe_record, LogRecord):
                record = maybe_record
            was_calling_handlers = set_calling_handlers()
            try:
                if not was_calling_handlers:
                    self.callHandlers(record)
                    while True:
                        try:
                            record = self._reentrant_records.popleft()
                        except IndexError:
                            return
                        self.callHandlers(record)
                else:
                    self._reentrant_records.append(record)
            finally:
                set_not_calling_handlers()

Author:
This will still produce a stack overflow if handling the deferred log message itself logs another message

graingert (Contributor), Mar 30, 2025:
I'm not following how this would cause a stack overflow, if handling the log message logs another message it would go onto the _reentrant_records queue, and then be processed later once the stack returns all the way back to where set_calling_handlers() is first called.

Author:

> I'm not following how this would cause a stack overflow, if handling the log message logs another message it would go onto the _reentrant_records queue, and then be processed later once the stack returns all the way back to where set_calling_handlers() is first called.

Sorry, I should have said deadlock with the current example. The stack overflow is from a different way of triggering this (see the second unit test added).

The trouble is that when the first recursive logging call exits the finally block it clears the "calling handlers" flag, which means a subsequent (still recursive) one takes the wrong path and deadlocks/overflows.

That can be avoided for the original triggering example by only clearing the "handling" flag if it was initially unset (the deferred records collection also needs to be TLS not a member variable). It ends up looking something like this:

        if not hasattr(self._tls, 'reentrant_records'):
            self._tls.reentrant_records = deque()
        deferred = self._tls.reentrant_records                      
        was_calling_handlers = self.set_calling_handlers()
        try:
            if not was_calling_handlers:
                self.callHandlers(record)
                while deferred:
                    self.callHandlers(deferred.popleft())
            else:
                deferred.append(record)
        finally:
            if not was_calling_handlers:
                self.set_not_calling_handlers()

This fixes both bugs, which only log the first time they try to process a log record (and it means those recursive log messages are logged rather than silently ignored, which is nice). However, a different example which logs every time (such as the second unit test) will still live-lock and never exit that while loop.
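The live-lock failure mode described above can be distilled outside of logging entirely. A minimal sketch (all names here are illustrative, not from the patch); the only change from the real drain loop is an artificial bound so the demo terminates:

```python
import collections

# A "handler" that logs another message on every call: under the
# deferred-queue proposal each drain iteration enqueues a fresh record,
# so the queue never empties.
deferred = collections.deque()
calls = 0

def emit(record):
    global calls
    calls += 1
    deferred.append(record + "!")  # "handling" this record logs again

emit("first")
while deferred and calls < 10:  # the real loop has no such bound
    emit(deferred.popleft())

print(calls, len(deferred))  # bound reached with the queue still non-empty
```

Every pop is matched by a new append, so without the `calls < 10` guard the loop would spin forever, which is exactly the live-lock being discussed.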

graingert (Contributor), Mar 30, 2025:
> will still live-lock and never exit that while loop.

Does the system keep logging forever instead?

This seems acceptable as you'd easily track down this bug just by looking at the logs

Author:

> Does the system keep logging forever instead?

Yep, it will just sit in a tight loop spamming the log forever, or at least until/unless that exhausts disk space or wherever the logs are actually going.

> This seems acceptable as you'd easily track down this bug just by looking at the logs

IMO it is not a great failure mode, but it will certainly be obvious!

FWIW I think I prefer ignoring them: the code is much simpler and it prevents the issue in non-trivial handler implementations like Sentry's (that would otherwise trigger the live-lock failure). I was hoping this fix would mean they would be able to remove that nasty monkey-patching on supported versions.

OTOH it is definitely nice to actually handle instead of drop the recursive log messages, in cases where they don't always continue to recurse.

graingert (Contributor), Mar 30, 2025:
We could have a `logging.(_)N_RECURSIVE_CALLS` constant to limit this so it's not forever.

Author:
We could. Another alternative would be Victor Stinner's suggestion in the discourse discussion to raise an exception. That would bring it to the user's attention and force them to deal with it.

Ultimately, though, the way they will have to deal with it is by preventing, disabling, or otherwise intercepting and ignoring all such logging. That will be difficult to do reliably outside the core, will likely be overlooked unless/until it bites, and will have to be done in every susceptible log handler or application that uses one.

IMO it would be better for us to do this once, centrally, with a small, simple, and robust fix.
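For comparison, the raise-an-exception alternative mentioned above might look roughly like this standalone sketch (hypothetical names throughout; this is not what the PR implements, just the shape of the idea):

```python
import threading

_tls = threading.local()

class RecursiveLogError(RuntimeError):
    """Hypothetical: raised when handling a record emits another record."""

def handle(call_handlers, record):
    # Fail loudly instead of silently dropping re-entrant records.
    if getattr(_tls, "in_progress", False):
        raise RecursiveLogError(f"recursive log call for {record!r}")
    _tls.in_progress = True
    try:
        call_handlers(record)
    finally:
        _tls.in_progress = False

def bad_handler(record):
    # A handler that logs from inside emit() now surfaces the bug at once.
    handle(bad_handler, record + "-again")

caught = None
try:
    handle(bad_handler, "boom")
except RecursiveLogError as exc:
    caught = exc
print("caught:", caught)
```

The user sees the traceback immediately, but every susceptible handler or application would still need its own mitigation, which is the trade-off being argued here.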

             return
-        if isinstance(maybe_record, LogRecord):
-            record = maybe_record
-        self.callHandlers(record)
+
+        self._tls.in_progress = True
+        try:
+            maybe_record = self.filter(record)
+            if not maybe_record:
+                return
+            if isinstance(maybe_record, LogRecord):
+                record = maybe_record
+            self.callHandlers(record)
+        finally:
+            self._tls.in_progress = False

def addHandler(self, hdlr):
"""
@@ -1765,7 +1772,7 @@ def isEnabledFor(self, level):
"""
Is this logger enabled for level 'level'?
"""
-        if self.disabled:
+        if self._is_disabled():
return False

try:
@@ -1815,6 +1822,11 @@ def _hierlevel(logger):
if isinstance(item, Logger) and item.parent is self and
_hierlevel(item) == 1 + _hierlevel(item.parent))

def _is_disabled(self):
    # getattr is needed because the 'in_progress' flag is only set on a
    # thread the first time that thread records a log message
    return self.disabled or getattr(self._tls, 'in_progress', False)

def __repr__(self):
level = getLevelName(self.getEffectiveLevel())
return '<%s %s (%s)>' % (self.__class__.__name__, self.name, level)
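The getattr default in `_is_disabled` matters because `threading.local` attributes set in one thread are invisible in every other thread. A quick standalone illustration:

```python
import threading

tls = threading.local()
tls.in_progress = True  # set only in the main thread

seen = []
def worker():
    # A fresh thread has no 'in_progress' attribute on tls at all, so a
    # bare tls.in_progress here would raise AttributeError.
    seen.append(getattr(tls, "in_progress", False))

t = threading.Thread(target=worker)
t.start()
t.join()
print(seen[0])  # False: the worker thread never set the flag
```

This is why the flag starts out unset on each thread and only appears once that thread first handles a record.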
83 changes: 83 additions & 0 deletions Lib/test/test_logging.py
@@ -4174,6 +4174,89 @@ def __init__(self, *args, **kwargs):
handler = logging.getHandlerByName('custom')
self.assertEqual(handler.custom_kwargs, custom_kwargs)

# See gh-91555 and gh-90321
@support.requires_subprocess()
def test_deadlock_in_queue(self):
queue = multiprocessing.Queue()
handler = logging.handlers.QueueHandler(queue)
logger = multiprocessing.get_logger()
level = logger.level
try:
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.debug("deadlock")
finally:
logger.setLevel(level)
logger.removeHandler(handler)

def test_recursion_in_custom_handler(self):
class BadHandler(logging.Handler):
def __init__(self):
super().__init__()
def emit(self, record):
logger.debug("recurse")
logger = logging.getLogger("test_recursion_in_custom_handler")
logger.addHandler(BadHandler())
logger.setLevel(logging.DEBUG)
logger.debug("boom")

@threading_helper.requires_working_threading()
def test_thread_suppression_noninterference(self):
lock = threading.Lock()
logger = logging.getLogger("test_thread_suppression_noninterference")

# Block on the first call, allow others through
#
# NOTE: We need to bypass the base class's lock, otherwise that will
# block multiple calls to the same handler itself.
class BlockOnceHandler(TestHandler):
def __init__(self, barrier):
super().__init__(support.Matcher())
self.barrier = barrier

def createLock(self):
self.lock = None

def handle(self, record):
self.emit(record)

def emit(self, record):
if self.barrier:
barrier = self.barrier
self.barrier = None
barrier.wait()
with lock:
pass
super().emit(record)
logger.info("blow up if not suppressed")

barrier = threading.Barrier(2)
handler = BlockOnceHandler(barrier)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

t1 = threading.Thread(target=logger.debug, args=("1",))
with lock:

# Ensure first thread is blocked in the handler, hence suppressing logging...
t1.start()
barrier.wait()

# ...but the second thread should still be able to log...
t2 = threading.Thread(target=logger.debug, args=("2",))
t2.start()
t2.join(timeout=3)

self.assertEqual(len(handler.buffer), 1)
self.assertTrue(handler.matches(levelno=logging.DEBUG, message='2'))

# The first thread should still be blocked here
self.assertTrue(t1.is_alive())

# Now the lock has been released the first thread should complete
t1.join()
self.assertEqual(len(handler.buffer), 2)
self.assertTrue(handler.matches(levelno=logging.DEBUG, message='1'))

class ManagerTest(BaseTest):
def test_manager_loggerclass(self):
@@ -0,0 +1,2 @@
Ignore log messages generated during handling of log messages, to avoid
deadlock or infinite recursion.