Skip to content

The new MDB_UNSIGNEDKEY DbiFlag should probably be the default behaviour #249

@at055612

Description

@at055612

This issue relates to this PR #199 (which addressed this issue #198) and the subsequent PR #237 that added MDB_UNSIGNED_KEY.

We have recently upgraded to 0.9.1 (thanks for this release and the fixes in it) and found that a number of our tests are failing due to cursor start/stop key comparisons. Adding the MDB_UNSIGNEDKEY DbiFlag fixes them however it feels like the old compareBuff comparator ought to be the default rather than ByteBuffer.compareTo.

While I appreciate it is preferable to use methods from the JDK rather than hand rolled comparators, this is only the case IF the JDK methods behave in the same way as the existing code, which they don't. As @benalexau pointed out in the PR, ByteBuffer.compareTo is signed while compareBuff is unsigned. The latter matches the way LMDB orders its entries while the former does not.

It is hard to see a case where you would want to use a comparator for the cursor start/stop keys that does not match that used for the iteration order imposed by LMDB's default comparator. Maybe if you are only dealing with positive signed numbers they would match, but in that case, you would be better using MDB_INTEGER_kEY for the better performance. There is always the option to use no stop key and to instead use your own code to determine when to break out of the cursor iteration.

This problem is not just limited to numbers. As an example, if your key was a UTF16 string and you had keys:

0x00 0x00 // a null char
0xFF 0xFD // a � char

They would appear in the DB in that order, but the ByteBuffer.compareTo would treat 0xFF 0xFD as being the lower of the two.

If MDB_UNSIGNEDKEY is not set then it can lead to strange behaviour when using cursors that is highly value dependant, and thus tricky to fix. A crude performance test between the two ByteBuffer comparators suggests that the JDK is only marginally quicker.

It may be preferable to return compareBuff to being the default (when no callback comparator is used) but to use the JDK comparators by setting a flag, essentially the opposite of what is in 0.9.1. I.e. the developer is making an explicit choice to get the slight performance gain of the JDK method with the understanding that it comes with risks. If they know their data fits within the constraints then it is ok.

One other option is not to use a java based comparator for the cursor start/stop keys and instead call down to this method on LMDB so that you are sure the comparison is being done in the same way as the DB is storing its entries. I suspect that it would be more costly do make this call but you would at least be sure the comparison matched LMDB's.

int mdb_cmp (
    MDB_txn *         txn,
    MDB_dbi           dbi,
    const MDB_val *   a,
    const MDB_val *   b 
)

Compare two data items according to a particular database.

This returns a comparison as if the two data items were keys in the specified database.

Parameters
[in]	txn	A transaction handle returned by mdb_txn_begin()
[in]	dbi	A database handle returned by mdb_dbi_open()
[in]	a	The first item to compare
[in]	b	The second item to compare
Returns
< 0 if a < b, 0 if a == b, > 0 if a > b

Happy to raise a PR address any changes that are agreed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions