Skip to content

Thread-unsafe libc functions #127081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
colesbury opened this issue Nov 20, 2024 · 3 comments
Open

Thread-unsafe libc functions #127081

colesbury opened this issue Nov 20, 2024 · 3 comments
Labels
extension-modules C modules in the Modules dir interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@colesbury
Copy link
Contributor

colesbury commented Nov 20, 2024

Bug report

There are a a few non thread-safe libc functions used in Python that can be an issue for free threading, isolated subinterpreters, or sometimes even with the GIL.

In #126316, @vstinner fixed the use setgrent / getgrent. It's probably a good time to look for other similar issues.

clang-tidy

clang-tidy has a concurrency-mt-unsafe check that looks for "known-to-be-unsafe functions".

clang-tidy notes

Prerequisites: install clang-tidy-18 and bear.

./configure -C --with-pydebug --disable-gil

# generate compile_commands.json
bear -- make -j 

run-clang-tidy-18 -checks='-*,concurrency-mt-unsafe' -p .

Unsafe libc functions

  • localeconv(): not thread-safe, see glibc's manual. Is nl_langinfo a substitue?
  • setlocale
  • setpwent, getpwent, and endpwent in pwdmodule.c. These are similar to grpmodule.c and can likely be addressed the same way.
  • getservbyname, getservbyport, getprotobyname in Modules/socketmodule.c: use getservbyname_r, etc.? Note these thread-safety issues affect the default build because we release the GIL around the relevant calls.
  • dbm_open, dbm_close, etc. in Modules/_dbmmodule.c
  • getlogin: use getlogin_r if available?

Unfixable by us?

Safe due to our usage

  • getc_unlocked, safe because we use it within a flockfile() call.
  • mbrtowc() - safe as long as the passed in mbstate_t * is non-NULL, which is the case in CPython.

Safe in glibc

These functions are flagged by clang-tidy because they are not guaranteed to be safe by POSIX, but they are safe in glibc. It'd be nice to verify that they are safe in other libc implementations. I don't think it's worth changing them:

Other

  • exit(): apparently concurrent calls to exit() are not thread-safe, but I don't think it matters for our usages.
  • ptsname(): we already use ptsname_r(), but the static analyzer gets confused

Linked PRs

@duaneg
Copy link

duaneg commented Apr 16, 2025

This reliably reproduces the race in the socket functions, with or without the GIL:

repro-127081-sockets.py
import socket
import sys
import threading

ITERATIONS=100

services = [
    ('echo', 7),
    ('ftp', 21),
    ('ssh', 22),
    ('telnet', 23),
    ('smtp', 25)]

protos = [
    ('icmp', 1),
    ('ipv4', 4),
    ('igp', 9),
    ('udp', 17),
    ('ipv6', 41)]

def getservbyname(barrier, service, port):
    barrier.wait()
    assert socket.getservbyname(service) == port

def getservbyport(barrier, service, port):
    barrier.wait()
    assert socket.getservbyport(port) == service

def getprotobyname(barrier, proto, port):
    barrier.wait()
    assert socket.getprotobyname(proto) == port

def run_threads(threads):
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def test_getservbyname():
    barrier = threading.Barrier(len(services))
    threads = [threading.Thread(target=getservbyname, args=(barrier, service, port))
               for service, port in services]
    run_threads(threads)

def test_getservbyport():
    barrier = threading.Barrier(len(services))
    threads = [threading.Thread(target=getservbyport, args=(barrier, service, port))
               for service, port in services]
    run_threads(threads)

def test_getprotobyname():
    barrier = threading.Barrier(len(protos))
    threads = [threading.Thread(target=getprotobyname, args=(barrier, proto, port))
               for proto, port in protos]
    run_threads(threads)

bailing = threading.Event()
old_except_hook = None

def bail(args):
    bailing.set()
    old_except_hook(args)

if __name__ == '__main__':
    old_except_hook = threading.excepthook
    threading.excepthook = bail
    for _ in range(ITERATIONS):
        if bailing.is_set():
            break
        test_getservbyname()
        test_getservbyport()
        test_getprotobyname()

duaneg added a commit to duaneg/cpython that referenced this issue Apr 16, 2025
Add configure tests and defines for getservbyname_r, getservbyport_r, and
getprotobyname_r. Use these if available, otherwise fallback to the
thread-unsafe variants.

Add a unit test to exercise getprotobyname, which is currently untested.

TODO:
 - Are there any platforms which define the unsafe variants but not the
   re-entrant ones? If not we can simplify the #ifdef hell somewhat.
 - Do the re-entrant functions have the same signature on all platforms?
 - These changes follow the existing code's practice: allocate a fixed-size
   (and overly large) buffer, and don't properly handle the error case if it is
   too small. Should this be fixed? If so should existing code also be fixed?
@duaneg
Copy link

duaneg commented Apr 17, 2025

Reproductions for pwd and dbm issues (only affects free-threaded builds).

repro-127081-pwd.py
import pwd
import threading
import unittest

N=5
ITERATIONS=1000

pwds = set(pwd.getpwall())

def getpwall(barrier):
    barrier.wait()
    assert pwds == set(pwd.getpwall())

def run_threads(threads):
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == '__main__':
    for _ in range(ITERATIONS):
        barrier = threading.Barrier(N)
        threads = [threading.Thread(target=getpwall, args=(barrier,)) for _ in range(N)]
        run_threads(threads)
repro-127081-dbm.py
import dbm
import random
import tempfile
import threading
from pathlib import Path

N=5
ITERATIONS=1000

def run_threads(threads):
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def traverse(barrier, db):
    barrier.wait()
    for key in db.keys():
        try:
            assert db[key] == key
        except KeyError:
            pass

def inserter(barrier, db):
    barrier.wait()
    for _ in range(N):
        ch = chr(ord('a') + random.randrange(26))
        db[ch] = ch

def deleter(barrier, db):
    barrier.wait()
    for _ in range(N):
        key = random.choice(db.keys())
        try:
            del db[key]
        except KeyError:
            pass

def defaulter(barrier, db):
    barrier.wait()
    for _ in range(N):
        ch = chr(ord('a') + random.randrange(26))
        result = db.setdefault(ch, ch)
        # TODO: Is this a gdbm bug?
        if isinstance(result, bytes):
            result = result.decode()
        assert result == ch, f"{result} != {ch}"

def checker(barrier, db):
    barrier.wait()
    for _ in range(N * 10):
        assert len(db) >= 0
        key = random.choice(db.keys())
        key in db

def init_db(path):
    with dbm.open(path, 'c') as db:
        for s in ['foo', 'bar', 'baz']:
            db[s] = s
    return path

def test_db(path):
    for _ in range(ITERATIONS):
        barrier = threading.Barrier(N + 4)
        with dbm.open(path, 'w') as db:
            args = (barrier, db)
            threads = [threading.Thread(target=traverse, args=args) for _ in range(N)]
            threads.append(threading.Thread(target=inserter, args=args))
            threads.append(threading.Thread(target=deleter, args=args))
            threads.append(threading.Thread(target=defaulter, args=args))
            threads.append(threading.Thread(target=checker, args=args))
            run_threads(threads)

if __name__ == '__main__':
    # dbm.sqlite3 only allows accessing DBs from the same thread that opened
    # them (and enforces this)
    for backend in ['dbm.gnu', 'dbm.ndbm']:
        dbm.whichdb = lambda x: backend
        with tempfile.TemporaryDirectory() as dir:
            path = Path(dir) / "test-db"
            init_db(path)
            test_db(path)

I can't reproduce a problem with getlogin at the moment: perhaps using docker or such-like and messing with /var/run/utmp would work.

The issues with setlocale and localeconv look rather more involved. Perhaps replace use of setlocale with uselocale where it is used to temporarily change locale? Various other places need to set it globally for the process, including os.setlocale(), so presumably cannot be changed. Hopefully the documentation stating it is not thread-safe is sufficient there.

duaneg added a commit to duaneg/cpython that referenced this issue Apr 17, 2025
The libc setpwent, getpwent, and endpwent functions are not thread-safe.
Protect them with mutexs in free-threading builds.
duaneg added a commit to duaneg/cpython that referenced this issue Apr 17, 2025
The dbm_* functions are not thread-safe, naturally. Add critical sections to
protect their use.
duaneg added a commit to duaneg/cpython that referenced this issue Apr 17, 2025
The getlogin function is not thread-safe: replace with getlogin_r where
available.

Note that this function is untested (unit test is skipped with a note it caused
CI failures as behaviour differs between NIX environments).
@duaneg
Copy link

duaneg commented Apr 20, 2025

See also #74667, which is a dup of this (or vice versa) specifically regarding the socket functions.

@picnixz picnixz added interpreter-core (Objects, Python, Grammar, and Parser dirs) extension-modules C modules in the Modules dir labels Apr 20, 2025
duaneg added a commit to duaneg/cpython that referenced this issue Apr 20, 2025
The dbm_* functions are not thread-safe, naturally. Add critical sections to
protect their use.
duaneg added a commit to duaneg/cpython that referenced this issue Apr 20, 2025
The libc setpwent, getpwent, and endpwent functions are not thread-safe.
Protect them with mutexs in free-threading builds.
duaneg added a commit to duaneg/cpython that referenced this issue Apr 20, 2025
Add configure tests and defines for getservbyname_r, getservbyport_r, and
getprotobyname_r. Use these if available, otherwise fallback to the
thread-unsafe variants.

Add a unit test to exercise getprotobyname, which is currently untested.

TODO:
 - Are there any platforms which define the unsafe variants but not the
   re-entrant ones? If not we can simplify the #ifdef hell somewhat.
 - Do the re-entrant functions have the same signature on all platforms?
 - These changes follow the existing code's practice: allocate a fixed-size
   (and overly large) buffer, and don't properly handle the error case if it is
   too small. Should this be fixed? If so should existing code also be fixed?
duaneg added a commit to duaneg/cpython that referenced this issue Apr 20, 2025
The getlogin function is not thread-safe: replace with getlogin_r where
available.

Note that this function is untested (unit test is skipped with a note it caused
CI failures as behaviour differs between NIX environments).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants