SIGSEGV on cursorIterable.next() when the env is closed #185

@at055612

We have a use case where LMDB envs serve as transient stores for search result data: each env is created, filled, queried, then closed and deleted when no longer needed. Multiple threads are involved. A bug in our app caused an env to be closed while a thread was still querying the LMDB db. This resulted in a SIGSEGV that took our whole app down.
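For context, the kind of application-side coordination that would have avoided our bug can be sketched as below. This is only a minimal sketch: `CloseGate`, `read` and `close` are hypothetical names for illustration and are not part of lmdbjava. Queries run under a read lock and the close runs under the write lock, so the close waits for in-flight queries and later queries fail with an exception.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical application-side wrapper; not part of the lmdbjava API. */
final class CloseGate {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private boolean closed; // only read or written while holding the lock

  /** Runs a query under the read lock so close() cannot overlap with it. */
  <T> T read(final Supplier<T> query) {
    lock.readLock().lock();
    try {
      if (closed) {
        throw new IllegalStateException("Store has already been closed");
      }
      return query.get();
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Waits for in-flight reads to finish, then runs the close action once. */
  void close(final Runnable closeAction) {
    lock.writeLock().lock();
    try {
      if (!closed) {
        closed = true;
        closeAction.run();
      }
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

In this sketch the whole cursor iteration would go inside one `read(...)` call and `env.close()` would be passed to `close(...)` as the close action. Long-running reads delay the close, which is the trade-off of this approach.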

I appreciate that querying the db after the env is closed is wrong, but it would be preferable to get an exception rather than have the whole app crash.

I added the following test to EnvTest to demonstrate this. I imagine the same will happen with other cursor operations and possibly with gets.

  @Test
  public void cannotReadOnceClosed() throws IOException, InterruptedException, ExecutionException {
    final File path = tmp.newFile();
    final Env<ByteBuffer> env = create()
            .setMaxReaders(5)
            .open(path, MDB_NOSUBDIR);

    final Dbi<ByteBuffer> dbi = env.openDbi(DB_1, MDB_CREATE);

    dbi.put(bb(1), bb(10));
    dbi.put(bb(2), bb(20));

    try (Txn<ByteBuffer> roTxn = env.txnRead()) {
      assertThat(dbi.get(roTxn, bb(1)).getInt(), is(10));
      assertThat(dbi.get(roTxn, bb(2)).getInt(), is(20));
    }

    System.out.println("Done puts");

    final CountDownLatch firstGetFinishedLatch = new CountDownLatch(1);
    final CountDownLatch envClosedLatch = new CountDownLatch(1);

    final CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {

      System.out.println("Running async task");
      try (Txn<ByteBuffer> roTxn = env.txnRead()) {
        try (CursorIterable<ByteBuffer> iterable = dbi.iterate(roTxn, KeyRange.all())) {
          final Iterator<CursorIterable.KeyVal<ByteBuffer>> iterator = iterable.iterator();

          try {
            System.out.println("Getting first entry");
            CursorIterable.KeyVal<ByteBuffer> keyVal = iterator.next();
            assertThat(keyVal.val().getInt(), is(10));

            firstGetFinishedLatch.countDown();

            try {
              System.out.println("Waiting for env to close");
              envClosedLatch.await();
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
              throw new RuntimeException(e);
            }

            assertThat(env.isClosed(), is(true));

            System.out.println("Getting second entry, but env is now closed");
            keyVal = iterator.next();
            assertThat(keyVal.val().getInt(), is(20));
          } catch (RuntimeException e) {
            e.printStackTrace();
          }
        }
      }
    });

    System.out.println("Waiting for cursor to get first entry");
    firstGetFinishedLatch.await();

    System.out.println("Closing env");
    env.close();
    envClosedLatch.countDown();

    System.out.println("Waiting for completion");
    future.get();
  }

This produces the following output:

Done puts
Waiting for cursor to get first entry
Running async task
Getting first entry
Waiting for env to close
Closing env
Waiting for completion
Getting second entry, but env is now closed
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb5d37dab28, pid=2052, tid=2086
#
# JRE version: OpenJDK Runtime Environment (15.0.2+7) (build 15.0.2+7-27)
# Java VM: OpenJDK 64-Bit Server VM (15.0.2+7-27, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [lmdbjava-native-library-17311548015087855682.so+0x6b28]  mdb_cursor_next+0x138
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/dev/git_work/lmdbjava/hs_err_pid2052.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

I have raised PR #186 with some added guards for the env being closed.
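To illustrate the general idea of such a guard (this is not the code in the PR), here is a minimal, self-contained sketch. `GuardedEnv`, `checkNotClosed` and the use of `IllegalStateException` are assumptions made for illustration, and a plain in-memory list stands in for the native cursor. Each iterator operation checks a closed flag before touching the underlying data, so a call made after close fails with an exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a native-backed env; not the lmdbjava API. */
final class GuardedEnv implements AutoCloseable {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  private final List<Integer> values;

  GuardedEnv(final List<Integer> values) {
    this.values = new ArrayList<>(values);
  }

  boolean isClosed() {
    return closed.get();
  }

  @Override
  public void close() {
    closed.set(true);
  }

  /** Fails fast with an exception once the env has been closed. */
  void checkNotClosed() {
    if (closed.get()) {
      throw new IllegalStateException("Environment has already been closed");
    }
  }

  /** Returns an iterator that re-checks the closed flag on every call. */
  Iterator<Integer> iterate() {
    checkNotClosed();
    final Iterator<Integer> delegate = values.iterator();
    return new Iterator<Integer>() {
      @Override
      public boolean hasNext() {
        checkNotClosed();
        return delegate.hasNext();
      }

      @Override
      public Integer next() {
        checkNotClosed();
        return delegate.next();
      }
    };
  }

  public static void main(final String[] args) {
    final GuardedEnv env = new GuardedEnv(Arrays.asList(10, 20));
    final Iterator<Integer> iterator = env.iterate();
    System.out.println(iterator.next());
    env.close();
    try {
      iterator.next();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

A check like this turns the common case into an exception, but on its own it still leaves a window between the check and the native call if another thread closes the env at that moment. Closing that window entirely would need locking or some form of reference counting.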
