Skip to content

INTPYTHON-527 Add Queryable Encryption support #329

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

aclark4life
Copy link
Collaborator

@aclark4life aclark4life commented Jun 27, 2025

Previous attempts and additional context here:

@aclark4life

This comment was marked as resolved.

@timgraham

This comment was marked as resolved.

@aclark4life

This comment was marked as resolved.

@timgraham

This comment was marked as resolved.

@aclark4life

This comment was marked as resolved.

@aclark4life

This comment was marked as resolved.

@aclark4life

This comment was marked as resolved.

@aclark4life

This comment was marked as resolved.

@timgraham

This comment was marked as resolved.

@aclark4life

This comment was marked as resolved.

@aclark4life

This comment was marked as resolved.

@aclark4life

This comment was marked as resolved.

@timgraham

This comment was marked as resolved.

@aclark4life

This comment was marked as resolved.

@timgraham

This comment was marked as resolved.

Copilot

This comment was marked as resolved.

@timgraham

This comment was marked as resolved.

@aclark4life

This comment was marked as resolved.

@timgraham

This comment was marked as resolved.

@aclark4life
Copy link
Collaborator Author

I'm still unclear on the user workflow. What I envisioned is:

* define model, `makemigrations`, `migrate`

* Put the output of `showencryptedfieldsmap` (which includes keyIds retrieved from server) in `AutoEncryptionOpts(encrypted_fields_map=...)`.

That is correct. We can remove --create if we never want to support this workflow:

  • define model
  • makemigrations
  • Put the output of showencryptedfieldsmap --create (which creates new keys) in AutoEncryptionOpts(encrypted_fields_map=...).
  • migrate

In other words, do we support the chicken and the egg or just the egg 😂

@aclark4life
Copy link
Collaborator Author

But this issue remains:

Creating test database for alias 'encrypted' ('test_djangotests-encrypted')...
/home/runner/.local/lib/python3.12/site-packages/pymongo/daemon.py:147: RuntimeWarning: Failed to start mongocryptd: is it on your $PATH?
Original exception: [Errno 2] No such file or directory: 'mongocryptd'
  _silence_resource_warning(_spawn(sys.argv[1:]))
/home/runner/.local/lib/python3.12/site-packages/pymongo/daemon.py:147: RuntimeWarning: Failed to start mongocryptd: is it on your $PATH?
Original exception: [Errno 2] No such file or directory: 'mongocryptd'

Any progress on this? Do we need to ask @blink1073 for help?

@addaleax
Copy link

addaleax commented Aug 6, 2025

@aclark4life Soooo ... I don't know what you've done around that area so far, but it's great that you're bringing it up, I didn't realize we'd have to still worry about this part because I don't think the PR currently concerns itself at all with how we integrate with crypt_shared/mongocryptd?

So I'll explain the whole story here, knowing that I'm risking telling you things you're alread well aware of 🙂

  • For automatic CSFLE/QE, we don't just need the libmongocrypt integration provided by the MongoDB driver, we also need a query analysis engine. This is created from part of the server source code, and comes in two forms:
    1. mongocryptd, a daemon process that can be spawned and run by the driver and talked through via the MongoDB wire protocol. We consider this a legacy solution, but it is fully supported and not officially deprecated for now.
    2. The mongo_crypt_v1 shared library, a dll/shared object that can be loaded by libmongocrypt at runtime.

Operationally, mongocryptd is more difficult to deploy in production applications (you're essentially managing an external daemon process from the driver, which is complex on its own, and potentially sharing this instance between multiple application processes), so while it is currently still supported, we are looking to phase it out in the long run. For backwards compatibility reasons, it is still the default (and fallback) though.

So we generally recommend using the crypt_shared library, which can be achieved by passing crypt_shared_lib_path=... to the driver, and ideally also setting crypt_shared_lib_required=True so that libmongocrypt doesn't fall back to mongocryptd and instead fails if the crypt_shared library could not be loaded.

Given that this is a new feature here, I'd strongly consider setting crypt_shared_lib_required=True and only using the crypt_shared library by default, and never using mongocryptd (or only if the user has explicitly requested it). I've asked our PMs about this in https://mongodb.slack.com/archives/C0406ECL478/p1754488650159029, it feels like the right call to me but I'd like to double-check.

As far as the tests here are concerned themselves, I imagine they're passing locally for you because you do have mongocryptd in your path, so the driver is able to spawn it. Regardless on how you decide regarding mongocryptd, you'll need to install at least one of it or the crypt_shared library in CI, and in the latter case, point the driver to the path to it (happy to provide help with the details of this should you have any questions).

@aclark4life
Copy link
Collaborator Author

Given that this is a new feature here, I'd strongly consider setting crypt_shared_lib_required=True and only using the crypt_shared library by default, and never using mongocryptd (or only if the user has explicitly requested it). I've asked our PMs about this in https://mongodb.slack.com/archives/C0406ECL478/p1754488650159029, it feels like the right call to me but I'd like to double-check.

Can we document our way around this by recommending crypt_shared_lib_path in production but allowing mongocryptd in development? I think that is something we can get Django folks to accept but I don't think enforcing the use of crypt shared in development would go over well.

Also what about bundling that library in the pymongocrypt wheel? It's the convenience of pip install django-mongodb-backend[encryption] that we are after and we lose that with the enterprise download step.

@addaleax
Copy link

addaleax commented Aug 6, 2025

Can we document our way around this by recommending crypt_shared_lib_path in production but allowing mongocryptd in development? I think that is something we can get Django folks to accept but I don't think enforcing the use of crypt shared in development would go over well.

Maybe, but I think we'd want to have a conversation around what the typical expectations here are. You'll generally want to have development and production environments behave similarly, and you'd still need to have a plan for what to do when mongocryptd does get deprecated eventually (long-term, I think it's fair to expect this to happen). You'll also still be in a position where you need to download and install mongocryptd, the only case in which this requirement goes away is the one where you happen to have the enterprise MongoDB server binaries already ready in your $PATH.

Also what about bundling that library in the pymongocrypt wheel? It's the convenience of pip install django-mongodb-backend[encryption] that we are after and we lose that with the enterprise download step.

This question comes up on a regular basis 🙂 Here's a Slack thread from April, which was one of the last times we spoke about this.

tl;dr: Yes, the setup process for CSFLE/QE is involved, and we'd like to make it easier. Currently, there is a requirement for the user to explicitly acknowledge that they have read and accepted the enterprise license agreement and that they are an Atlas or EA customer. Bundling this library with regular packages that can be installed via a regular package manager command like pip install is therefore something we can't do right now.)

Copy link
Contributor

@Jibola Jibola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good! I've provided comments and feedback around some documentation and clarifications, but will approve once those are addressed.

- Fix rebase merge conflict edits
- Remove integer field FIXME comments
- Remove pos_int
@@ -11,3 +11,4 @@ know:
embedded-models
transactions
known-issues
queryable-encryption
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

order above "transactions"

Comment on lines 29 to 32
.. django-admin-option:: --create-new-keys

If specified, creates the data keys instead of getting them from the
database.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option need some explanation in the howto. (use case)
Also, there is no explanation here that the command "gets the keys from the database"... need more explanation.

.. admonition:: List of encrypted fields

See the full list of :ref:`encrypted fields <encrypted-fields>` in the
:doc:`Model field reference </ref/models/fields>`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need to repeat the name of the linked doc "Model field reference"


def setUp(self):
self.appointment = Appointment.objects.create(time="8:00")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no blank line needed after every object

self.assertEqual(json1, json2)

def test_show_encrypted_fields_map(self):
self.maxDiff = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can set maxDiff = None at the class-level.

Comment on lines 42 to 45
.. admonition:: List of encrypted fields

See the full list of :ref:`encrypted fields <encrypted-fields>` in the
:doc:`Model field reference </ref/models/fields>`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth include this reference, but not in the middle of the example.

Comment on lines +64 to +68
{
_id: ObjectId('68825b066fac55353a8b2b41'),
ssn: '123-45-6789',
__safeContent__: [b'\xe0)NOFB\x9a,\x08\xd7\xdd\xb8\xa6\xba$…']
}
Copy link
Collaborator

@timgraham timgraham Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this is telling me. So encrypted fields appear in plain text, but what the heck is __safeContent__? As I developer, do I care about it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a link to explanation.

Comment on lines 47 to 48
You can query encrypted fields using a
:ref:`manual:qe-supported-query-operators` which must be specified in the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is the best page to link to (btw, the text renders as "using a Supported Query Operators". It doesn't show any of the options https://www.mongodb.com/docs/manual/core/queryable-encryption/fundamentals/encrypt-and-query/#configure-encrypted-fields-for-optimal-search-and-storage


EQUALITY_QUERY = {"queryType": "equality"}
RANGE_QUERY = {"queryType": "range"}
RANGE_QUERY_MIN_MAX = {"queryType": "range", "min": 0, "max": 100}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for the values that aren't reused, it doesn't make sense to use a module level variable.

``AutoEncryptionOpts``.

Fortunately, this is easy to do with Django MongoDB Backend. You can use
the ``showencryptedfieldsmap`` management command to generate the schema map
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:djadmin:`showencryptedfieldsmap`

Comment on lines 131 to 133
.. note:: The ``showencryptedfieldsmap`` command is only available if you
have the ``django_mongodb_backend`` app included in the
:setting:`INSTALLED_APPS` setting.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From caching docs:

.. admonition:: Didn't work?

    If you get the error ``Unknown command: 'createcachecollection'``, ensure
    ``"django_mongodb_backend"`` is in your :setting:`INSTALLED_APPS` setting.

To generate the encrypted fields map, run the following command in your Django
project::

python manage.py showencryptedfieldsmap
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If following the above steps, --database=encrypted is also needed.

]
}
connection = connections["encrypted"]
auto_encryption_opts = getattr(connection.connection._options, "auto_encryption_opts", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr should be unnecessary. This test shouldn't run if auto_encryption_opt isn't set.


In addition to :doc:`installing </intro/install>` and
:doc:`configuring </intro/configure>` Django MongoDB Backend,
you will need to install PyMongo with Queryable Encryption support::
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think saying "you will need to install PyMongo with Queryable Encryption support" and then giving a command that doesn't mention pymongo could be confusing.

Probably "you need to install some additional dependencies::" could be enough.

else:
db.create_collection(db_table)

def _get_encrypted_fields_map(self, model, client, auto_encryption_opts, from_db=False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_db is not very descriptive. How about create_data_keys?

Comment on lines 253 to 256
# Avoid using PyMongo to check the database version or require
# pymongocrypt>=1.14.2 which will contain a fix for the `buildInfo`
# command. https://jira.mongodb.org/browse/PYTHON-5429
return tuple(self.connection.admin.command("buildInfo")["versionArray"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still uses PyMongo. I guess you meant the server_info() method. Is there a big benefit to switching back to in the the future? If so, you can write something like "This can use tuple(self.connection.server_info()["versionArray"]) on pymongocrypt 1.14.2+" (heck, if there's a benefit, you can add some conditional code based on the pymongocrypt version)... pymongocrypt 1.15.0 was released Aug. 15 (1.14.1 is the latest 1.14.x release so unclear if there will be a 1.14.2). Anyway, some comment update is needed.


def handle(self, *args, **options):
db = options["database"]
create = options.get("create", False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
create = options.get("create", False)
create = options.get("`create_new_keys`", False)

for model in router.get_migratable_models(app_config, db):
if model_has_encrypted_fields(model):
from_db = not create
fields = connection.schema_editor()._get_encrypted_fields_map(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use with connection.schema_editor() as editor: at the top-level outside of the loops so you don't have to initialize a new one for every model.

else:
db.create_collection(db_table)

def _get_encrypted_fields_map(self, model, client, auto_encryption_opts, from_db=False):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto_encryption_opts needn't be an argument. It's an attribute of client.

Comment on lines +440 to +445
raise ImproperlyConfigured(
"Encrypted fields found but "
"DATABASES[[self.connection.alias}]['OPTIONS'] is missing "
"auto_encryption_opts. Please set `auto_encryption_opts` "
"in the connection settings."
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try adding a test for this exception?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested it manually but haven't written a test for it yet.

else:
# If the encrypted fields map is provided, get the map for the
# specific collection.
encrypted_fields_map = encrypted_fields_map.get(db_table, None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean to fallback to None?

Comment on lines 70 to 71
"KMS_PROVIDERS": {},
"KMS_CREDENTIALS": {},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For KMS_CREDENTIALS, I believe the options are documented here: https://pymongo.readthedocs.io/en/stable/api/pymongo/encryption.html#pymongo.encryption.ClientEncryption.create_data_key (but pymongo docs are deprecated?)

So a minimal example for AWS: "KMS_CREDENTIALS": {"aws": {"region": "...", "key": "..."}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants