Skip to content

[native + sqlite WAL] Read-your-writes consistency violation under stress, due to SqlCursor escaping mutual exclusion of its parent connection #2123

@andersio

Description

@andersio

Runtime Environment
SQLDelight version: 1.4.3
Application OS: iOS 14.x

Proposed change: #2132

TL;DR

SQLite Cursors/Statements are allowed to escape the critical section for SQLite connections. This causes intermittent read-your-writes consistency violation at least with the native driver in WAL mode.

Query listeners and after commit callbacks fail to see the written data, when the system is stressing SQLDelight.

Identified issue

In executeQuery of NativeSqlDatabase , the SqlCursor escapes the critical section provided by accessConnection().

https://github.com/cashapp/sqldelight/blob/51e0621abe103761717d3d9078dcde3734ad8c04/drivers/native-driver/src/nativeMain/kotlin/com/squareup/sqldelight/drivers/native/NativeSqlDatabase.kt#L53-L63

This means other threads can potentially acquire and start using the query connection, while the earlier owner is still stepping the result set through the escaped SqlCursor. While concurrently accessing the SQLite connection itself isn't problematic, the escaped SqlCursor has implications in WAL mode.

WAL readers cannot move its end mark when it still has active statements. In other words, if the read/query connection is prematurely yielded from a 1st thread 1 to a 2nd thread, the 2nd thread may not see any new data written by itself through the "transction pool" write connection.

It is a race condition that violates read-your-writes consistency, in that the data are visible if and only if the 1st thread by chance closes the SqlCursor, before than the 2nd thread starts any read. Such race condition can be unintuitive to isolate and fix — imagine an application with lots of read and write queries issued from 2+ threads, that unit tests are passing with small test data sets, and that hiccups only start to emerge as it processes growing user generated data.

Attached below is a code snippet to illustrate the issue. Note that the cursor lifetime is exaggerated by using execute(): SqlCursor directly. In practice, we usually use the convenient executeAs*() methods, whose cursor lifetime beyond the critical section scales with the result set size.

val list = db.fruitQueries.all().executeAtList()
check(list.count() == 0)

// Execute out-of-transaction on `queryPool`.
val cursor = db.fruitQueries.all().execute()

launch(GlobalScope + DIspatchers.Main) {
   delay(100000)
   cursor.close()
}

db.transaction {
    // Execute as part of a transaction on `transactionPool`.
    db.fruitQueries.insert("Apple")
    db.fruitQueries.insert("Orange")

    afterCommit {
        // Execute out-of-transaction on `queryPool`.
        val list = db.fruitQueries.all().executeAtList()

        // FAIL: IllegalStateException
        //
        // Count is still zero, because the escaped cursor keeps the reader conn.
        // at an earlier DB snapshot.
        check(list.count() == 2)
    }
}

This issue has to be solved at framework level, since:

  • As a user, I have no option to extend the critical section (internal to SQLDelight) to cover stepping of the cursor.
  • The only user-side "solution" is to serialize all database accesses, which defeats the point of having multiple connections.

Note that this affects not only the afterCommit hook, but also any query listeners as they are both called out at the same place. Moreover, this issue is applicable to SQLite usage on any platform, since this is rooted in SQLite's API design choice.

Proposed solution

SqlCursor and any other resource should be bound to the critical section. They should be closed before the current thread releases its lock on the connection.

The signature of SqlDriver.executeQuery and Query.execute() should be changed to accept a (SqlCursor) -> R block, so that the block can be evaluated and immediately followed up with cleanup actions, right inside the accessConnection() critical section. The result of the block is passed back to the caller.

This is however an API breaking change.

Our proposed patch for the issue: #2132

interface SqlDriver {
  /**
   * Execute a SQL statement and evaluate its result set using the given block.
   *
   * @param [identifier] An opaque, unique value that can be used to implement any driver-side
   *   caching of prepared statements. If [identifier] is null, a fresh statement is required.
   * @param [sql] The SQL string to be executed.
   * @param [parameters] The number of bindable parameters [sql] contains.
   * @param [binders] A lambda which is called before execution to bind any parameters to the SQL
   *   statement.
   * @param [block] A lambda which is called with the cursor when the statement is executed
   *   successfully. The generic result of the lambda is returned to the caller, as soon as the
   *   mutual exclusion on the database connection ends. The cursor **must not escape** the block
   *   scope.
   */
  fun <R> executeQuery(
    identifier: Int?,
    sql: String,
    parameters: Int,
    binders: (SqlPreparedStatement.() -> Unit)? = null,
    block: (SqlCursor) -> R
  ): R
}
abstract class Query<T> {
  /**
   * Execute the underlying statement. The resulting cursor is passed to the given block.
   *
   * The cursor is closed automatically after the block returns.
   */
  abstract fun <R> execute(block: (SqlCursor) -> R): R
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions