Skip to content

Replacing byte[] and ByteBuffer compareTo methods with JDK builtins #199

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

danielcranford
Copy link
Contributor

Resolves #198

@danielcranford
Copy link
Contributor Author

Ahh, missed the fact Arrays.compare(byte[], byte[]) was added in JDK 9. ByteByffer.compareTo() still works. ByteBuffer.wrap(byte[]).compareTo(ByteBuffer.wrap(byte[]) works as an alternative to Arrays.compare though adds allocation cost to the compare

@benalexau benalexau force-pushed the master branch 2 times, most recently from 0a78414 to a1162e1 Compare December 10, 2022 01:29
Copy link

@cgilliard cgilliard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test cases cover these changes.

@benalexau
Copy link
Member

This is a more complicated proposal than it first appears.

The PR modifies two implementations of the BufferProxy.compare(T, T) method. This method provides comparator functionality should null be passed in as the comparator to Dbi.iterate(Txn, KeyRange, Comparator). A comparator is needed for the CursorIterator to facilitate its iteration logic. CursorIterator is a relatively uncommonly used API and there is no equivalent in the LMDB native library.

LMDB native code does not call the provided comparator. While a user is free to provide their own Java-side comparator when creating a Dbi, most do not given this would impose a significant overhead. The majority of users carefully configure the Dbi with the correct flags for their key type and let native code on the LMDB side handle ordering. The JavaDocs for CursorIterable make reference to this:

If iterating over keys stored with {@link DbiFlags#MDB_INTEGERKEY} you must provide a Java comparator when constructing the {@link Dbi} or this class. It is more efficient to use a comparator only with this class, as this avoids LMDB calling back into Java code to perform the integer key comparison.

Where we run into issues is when using an MDB_INTEGERKEY with a CursorIterator. This is because what we need is a Java comparator that will treat buffers containing integer keys in the same manner as LMDB would. ByteBuffer.compareTo(..) delegates to Byte.compare(byte, byte) and Java bytes are signed. So we end up with a different order.

There are a few options available, such as better clarifying the requirements in the JavaDocs of BufferProxy.compare(T, T), requiring the user to instantiate the CursorIterator with an integer-aware comparator, or automatically doing the latter if the Dbi is using an integer key. However I am wondering how much of an issue this presently is given the limited occasions these comparators are called and the existing code already handles integers.

@benalexau
Copy link
Member

After thinking about this some more I believe the better approach is to store a mandatory Comparator in the Dbi at construction time. If the user does not provide a Comparator - which they very rarely would - the BufferProxy can select an optimal implementation based on the Dbi flags. So if there are integer keys it would use an approach like the existing code, otherwise it can fall back to a built-in Java method.

Storing a Comparator in the Dbi is the most natural approach given all keys in the Dbi must align with that Comparator for the CursorIterator to function correctly and there is no reason to provide a different Comparator for a particular invocation of Dbi.iterate(Txn, KeyRange, Comparator). Removing the Comparator parameter from the iterate method would remove this overloaded variant and potential API usage confusion.

It would be natural to only accept a single Comparator in a Dbi and use a boolean flag to denote whether it should be invoked from the LMDB native library side. The majority of the time this would be false for efficiency reasons, with the Comparator only used for CursorIterator instances if the user requests one.

The only downside I can presently see to this is it would represent a breaking change by removing the aforementioned method and some minor variations to the Env.openDbi(...) methods. Nevertheless it would allow more efficient Comparators and a more concise and clearer Dbi API. It's worth doing this but I will need to defer it until 0.9.0 to comply with semantic versioning expectations.

I'll leave this PR open as a reminder to address this when preparing 0.9.0.

@benalexau benalexau closed this in 3524995 Feb 4, 2023
benalexau added a commit that referenced this pull request Feb 4, 2023
benalexau added a commit that referenced this pull request Feb 4, 2023
benalexau added a commit that referenced this pull request Feb 4, 2023
@benalexau
Copy link
Member

Following the release of 0.8.3 I switched the target development version to 0.9.0 and implemented my prior comment. The main changes are:

  • A default Comparator is provided by each BufferProxy (just as it always has been)
  • A single Comparator must be associated with each Dbi when that database is opened
  • There is no longer support for a different Comparator when creating a CursorIterator
  • Clearer JavaDocs about how the Comparator is used
  • ByteBufferProxy detects if DbiFlags includes MDB_INTEGERKEY and uses the same custom logic as it always has (otherwise it uses the ByteBuffer.compare(..) method)
  • Various internals have been cleaned up (eg Txn no longer exposes the BufferProxy etc)

I decided against making other BufferProxy implementations (eg ByteArrayProxy) similarly differentiate between the existing custom Comparators and any inbuilt alternatives. In the case of Netty, we are already delegating to its compareTo method. In the case of byte[]s, the previously-mentioned absence of Arrays.compare(byte[], byte[]) in Java 8, plus the allocation costs of delegating through to ByteBuffers, would likely exceed any performance gain. If it is ultimately required it is easy enough to implement without further API changes, and/or for end users to provide their own BufferProxy implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ByteBufferProxy and ByteArrayProxy are using hand rolled compareTo implementations
3 participants