Skip to content

SQLite3 multithreading cache inconsistency #118172

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
adam444555 opened this issue Apr 23, 2024 · 10 comments
Open

SQLite3 multithreading cache inconsistency #118172

adam444555 opened this issue Apr 23, 2024 · 10 comments
Labels
extension-modules C modules in the Modules dir topic-sqlite3 type-bug An unexpected behavior, bug, or error

Comments

@adam444555
Copy link

adam444555 commented Apr 23, 2024

Bug report

Bug description:

When using SQLite3 in multi-threading application, the fetch reuslts are not consistent. After some testing, this seems to be caused by the caching.
The SQLite threading mode is serialized.
This issue only exists in python 3.12 and 3.13. No issue in 3.11.
Currently solution is to set cached_statements=0 when connect to the database.

A simple demo to reproduce the error:

import sqlite3
import threading

KB = sqlite3.connect(
    "example.db", check_same_thread=False
)

def execute_query():
    cursor = KB.cursor()
    cursor.execute("SELECT * FROM test_table")
    result = cursor.fetchall()
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    return result

def run_threads():
    threads = []
    for i in range(50):
        thread = threading.Thread(target=execute_query)
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()

def test_multithreading():
    cursor = KB.cursor()

    cursor.execute(
        """CREATE TABLE IF NOT EXISTS test_table (id INTEGER PRIMARY KEY, value TEXT)"""
    )
    KB.commit()

    cursor.execute("""DELETE FROM test_table""")
    KB.commit()

    cursor.execute("""INSERT INTO test_table (value) VALUES ('test1')""")
    cursor.execute("""INSERT INTO test_table (value) VALUES ('test2')""")
    cursor.execute("""INSERT INTO test_table (value) VALUES ('test3')""")
    KB.commit()

    run_threads()

    KB.close()

if __name__ == "__main__":
    test_multithreading()

Test output: (Different in each run because of inconsistent fetch results)

Exception in thread Thread-3 (execute_query):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
Exception in thread Thread-7 (execute_query):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self._target(*self._args, **self._kwargs)
  File "/home/user/tmp/kb_test.py", line 13, in execute_query
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
Exception in thread Thread-12 (execute_query):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    self._target(*self._args, **self._kwargs)
AssertionError: [(1, 'test2'), (3, 'test3')]
  File "/home/user/tmp/kb_test.py", line 13, in execute_query
Exception in thread Thread-1 (execute_query):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    self.run()
AssertionError: [(3, 'test3')]
  File "/usr/lib/python3.12/threading.py", line 1010, in run
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/tmp/kb_test.py", line 13, in execute_query
    self._target(*self._args, **self._kwargs)
  File "/home/user/tmp/kb_test.py", line 13, in execute_query
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
AssertionError: [(None, None), (), (2, 'test2'), (), (), (2, 'test2'), (0, None), (1, 'test1'), (2, 'test2'), (3, 'test3')]
AssertionError: [(3, 'test3')]

CPython versions tested on:

3.12, 3.13

Operating systems tested on:

Linux

@corona10
Copy link
Member

cc @erlend-aasland

@erlend-aasland
Copy link
Contributor

cc @erlend-aasland

Thanks Donghee! (BTW, I'm already subscribed to the topic-sqlite3 label, so no need to ping me manually 🙂)

@9001
Copy link

9001 commented Apr 23, 2024

is this correct use of the api though, setting check_same_thread=False and then accessing the same connection from multiple threads simultaneously? The issue goes away with the following change:

LK = threading.Lock()
def execute_query():
    cursor = KB.cursor()
    with LK:
        cursor.execute("SELECT * FROM test_table")
        result = cursor.fetchall()
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    return result

@adam444555
Copy link
Author

adam444555 commented Apr 23, 2024

is this correct use of the api though, setting check_same_thread=False and then accessing the same connection from multiple threads simultaneously? The issue goes away with the following change:

LK = threading.Lock()
def execute_query():
    cursor = KB.cursor()
    with LK:
        cursor.execute("SELECT * FROM test_table")
        result = cursor.fetchall()
    assert result == [(1, 'test1'), (2, 'test2'), (3, 'test3')], str(result)
    return result

I am not sure. But based on the description from the document, for serialized threading mode, "Threads may share the module, connections and cursors", and no lock needed for the old version (3.11 and older), I assume lock is not necessary.

@erlend-aasland
Copy link
Contributor

I can confirm that the issue appears in Python 3.12 and Python 3.13, but not in earlier versions (I tested back to Python 3.8).

I also tried to revert the sqlite3 extension code back to its 3.11 state (making only a few needed C API changes), just to make sure the issue exists in the sqlite3 extension and not in the runtime. I can confirm the issue is in the sqlite3 extension. I'll do a bisect as soon as possible.

@erlend-aasland
Copy link
Contributor

I bisected this back to f5c85aa, which seems plausible. I'll also try to come up with a smaller repro so we can add a regression test for this.

@erlend-aasland
Copy link
Contributor

However, I'm not completely convinced this is the actual offending commit; If I check out f5c85aa3eea1adf0c61089583e2251282a316ec1~, I get intermittent segfaults1 for about 5% of the runs.

Footnotes

  1. bad memory access in _pysqlite_query_execute

tilman-aiven added a commit to Aiven-Open/astacus that referenced this issue Jun 24, 2024
With python 3.12 and 3.13 fetch results from SQLite3
in multi-threading applications are not consistent.

This causes an InterfaceError with bad parameter
or other API misuse.

Current solution is to set the cache_statement to 0.
See python/cpython#118172
tilman-aiven added a commit to Aiven-Open/astacus that referenced this issue Jun 24, 2024
With python 3.12 and 3.13 fetch results from SQLite3
in multi-threading applications are not consistent.

This causes an InterfaceError with bad parameter
or other API misuse.

Current solution is to set the cached_statement to 0.
See python/cpython#118172
tilman-aiven added a commit to Aiven-Open/astacus that referenced this issue Jun 24, 2024
With python 3.12 and 3.13 fetch results from SQLite3
in multi-threading applications are not consistent.

This causes an InterfaceError with bad parameter
or other API misuse.

Current solution is to set the cached_statement to 0.
See python/cpython#118172
tilman-aiven added a commit to Aiven-Open/astacus that referenced this issue Jun 24, 2024
With python 3.12 and 3.13 fetch results from SQLite3
in multi-threading applications are not consistent.

This causes an InterfaceError with bad parameter
or other API misuse.

Current solution is to set the cached_statement to 0.
See python/cpython#118172
tilman-aiven added a commit to Aiven-Open/astacus that referenced this issue Jun 24, 2024
With python 3.12 and 3.13 fetch results from SQLite3
in multi-threading applications are not consistent.

This causes an InterfaceError with bad parameter
or other API misuse.

Current solution is to set the cached_statement to 0.
See python/cpython#118172
tilman-aiven added a commit to Aiven-Open/astacus that referenced this issue Jun 24, 2024
With python 3.12 and 3.13 fetch results from SQLite3
in multi-threading applications are not consistent.

This causes an InterfaceError with bad parameter
or other API misuse.

Current solution is to set the cached_statement to 0.
See python/cpython#118172
tilman-aiven added a commit to Aiven-Open/astacus that referenced this issue Jun 24, 2024
With python 3.12 and 3.13 fetch results from SQLite3
in multi-threading applications are not consistent.

This causes an InterfaceError with bad parameter
or other API misuse.

Current solution is to set the cached_statement to 0.
See python/cpython#118172
tilman-aiven added a commit to Aiven-Open/astacus that referenced this issue Jun 26, 2024
With python 3.12 and 3.13 fetch results from SQLite3
in multi-threading applications are not consistent.

This causes an InterfaceError with bad parameter
or other API misuse.

Current solution is to set the cached_statement to 0.
See python/cpython#118172
@erlend-aasland erlend-aasland moved this to TODO: Bugs in sqlite3 issues Sep 16, 2024
@olavmo-sikt
Copy link

olavmo-sikt commented Oct 11, 2024

We also experienced this issue on Python 3.12, and worked around it by setting cached_statements=0.
Adding a few more ways this issue can manifest to help people searching for Python exception messages related to SQLite:

In our case, this issue mainly manifested itself with the following exception when running connection.execute(...):

  • sqlite3.InterfaceError: bad parameter or other API misuse

More rarely we got some of the following exceptions when calling connection.execute(...):

  • sqlite3.InterfaceError: not an error
  • sqlite3.InterfaceError: no more rows available
  • sqlite3.InterfaceError: another row available

We also got some errors when calling cursor.fetchone():

  • sqlite3.OperationalError: Could not decode to UTF-8 column '[...]' with text '[...]'
  • SystemError: NULL string with positive size with NULL passed to PyUnicode_FromStringAndSize

We also got various errors when retrieving data from the sqlite3.Row objects, such as:

  • IndexError: tuple index out of range
    (when accessing a field that definitively is in the row)
  • None-values in columns that are NOT NULL.
  • Malformed JSON strings from columns that contain valid JSON.

wjblanke added a commit to Chia-Network/chia-blockchain that referenced this issue Jan 9, 2025
### Purpose:

Newer versions of python return incorrect sqlite results if
cached_statements is nonzero. See:

python/cpython#118172

### Current Behavior:

cached_statements is set to default

### New Behavior:

cached_statements is set to 0. Chia rarely repeats the same sqlite
queries so hopefully this does not affect performance.

### Testing Notes:

None
@altendky
Copy link
Contributor

We also experienced this issue on Python 3.12, and worked around it by setting cached_statements=0.

the op included explicitly using the same connection object from multiple threads simultaneously with check_same_thread=False. did you have either of these cases or any other possibly related 'not normal' settings?

@picnixz picnixz added the extension-modules C modules in the Modules dir label Feb 14, 2025
openstack-mirroring pushed a commit to openstack/openstack that referenced this issue Feb 24, 2025
* Update oslo.db from branch 'master'
  to bea56237b5cfaf3341d938210b005c99d71c50fb
  - Disable cached-statements on sqlite3
    
    Since python 3.12, sqlite3 and multi threading may not work correctly.
    
    One of the current workaround is to disable cached statements.
    
    See also: python/cpython#118172
    
    Change-Id: I35a2568474d99491b99009a8d475135d16939e2b
    Signed-off-by: Arnaud M <arnaud.morin@gmail.com>
openstack-mirroring pushed a commit to openstack/oslo.db that referenced this issue Feb 24, 2025
Since python 3.12, sqlite3 and multi threading may not work correctly.

One of the current workaround is to disable cached statements.

See also: python/cpython#118172

Change-Id: I35a2568474d99491b99009a8d475135d16939e2b
Signed-off-by: Arnaud M <arnaud.morin@gmail.com>
@erlend-aasland
Copy link
Contributor

However, I'm not completely convinced this is the actual offending commit; If I check out f5c85aa3eea1adf0c61089583e2251282a316ec1~, I get intermittent segfaults1 for about 5% of the runs.

  1. bad memory access in _pysqlite_query_execute

Picking this up after some time away from CPython development.

I tried bisecting using a more aggressive script and managed to reproduce failures all the way back to v3.10.0a2 (!). I'm unable to reproduce with official 3.10 binaries, though, so maybe we need a more focused repro. I'll try to devote more time to this before the ultimate 3.11 bugfix release later this month.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-modules C modules in the Modules dir topic-sqlite3 type-bug An unexpected behavior, bug, or error
Projects
Status: TODO: Bugs
Development

No branches or pull requests

8 participants