Skip to content

Fixed #28586 -- Added model field fetching modes. #17554

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

adamchainz
Copy link
Member

@adamchainz adamchainz commented Nov 29, 2023

Trac ticket number

ticket-28586

Branch description

Add QuerySet.fetch_mode() to set the fetch mode, controlling how on-demand field loading occurs. Include three fetch modes, FETCH_ONE (default, existing behaviour), FETCH_PEERS, and RAISE.

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

@adamchainz adamchainz changed the title Fixed #28586 -- Added model field lazy loading modes. Fixed #28586 -- Added model field lazy-loading modes. Feb 13, 2024
@adamchainz
Copy link
Member Author

@charettes I’ve been slowly but steadily chipping away at this. Would you like to make an initial review, particularly of the public API in the release notes, and the way that the callable interface works?

@charettes
Copy link
Member

@adamchainz sure! Can it wait this weekend? I'm taking a long break for Easter so I should have plenty of focus time.

@adamchainz
Copy link
Member Author

Of course, take your time.

@adamchainz adamchainz requested a review from charettes April 7, 2024 09:56
Copy link
Member

@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 @adamchainz I just want to start by saying thank you for the continued work here, this is really looking promising 🙇

I discussed an idea on how defining a form of fetching protocol (like how resolvable w/ resolve_expression and compilable w/ as_sql work) to avoid unnecessary closures and keep fetcher modes to the strict minimum which is to receive an instance, a fetcher, and route the call to the fetcher.

There also seems to be a missing usage of fetch mode in ReverseOneToOneDescriptor.__get__ but maybe this is expected for now as this is still a WIP.

Lastly I left some comments with regards to naming. I'm sure others will have opinions as well but it felt like django.db.models.fetching and fetching_mode vs lazy might be a better .

pass


def RAISE(instance, field, fetch_for_instances, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any thoughts about including a WARN as well? It seems like the only remaining feature that doesn't make django-seal obsolete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of that, but then maybe users would want “fetch peers” + warnings, or log messages directly rather than warnings. Maybe something we can instead document the pattern for? Like:

def warn_then_fetch_peers(...):
    logger.warning("fetching peers!")
    return FETCH_PEERS(...)

set_default_lazy_mode(warn_then_fetch_peers)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes sense to separate non-terminal surfacing of leaks from the actual fetching implementation. I wonder if there is anything that we could provide to avoid such boiler plate though.

Copy link
Member Author

@adamchainz adamchainz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Simon, thoughtful as always. (P.S. just finished your Django Chat episode yesterday, nice one!). I’ve replied inline to your comments and will work on an update.

There also seems to be a missing usage of fetch mode in ReverseOneToOneDescriptor.__get__ but maybe this is expected for now as this is still a WIP.

Yes, this needs doing, as does GenericForeignKey.

pass


def RAISE(instance, field, fetch_for_instances, **kwargs):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of that, but then maybe users would want “fetch peers” + warnings, or log messages directly rather than warnings. Maybe something we can instead document the pattern for? Like:

def warn_then_fetch_peers(...):
    logger.warning("fetching peers!")
    return FETCH_PEERS(...)

set_default_lazy_mode(warn_then_fetch_peers)

@adamchainz adamchainz force-pushed the ticket_28586 branch 2 times, most recently from ad1b6d0 to 19c4f6e Compare April 25, 2024 16:45
@adamchainz
Copy link
Member Author

Yes, this needs doing, as does GenericForeignKey.

I tried GenericForeignKey just now, but it’s complicated:

def __get__(self, instance, cls=None):
if instance is None:
return self
# Don't use getattr(instance, self.ct_field) here because that might
# reload the same ContentType over and over (#5570). Instead, get the
# content type ID here, and later when the actual instance is needed,
# use ContentType.objects.get_for_id(), which has a global cache.
f = self.model._meta.get_field(self.ct_field)
ct_id = getattr(instance, f.attname, None)
pk_val = getattr(instance, self.fk_field)
rel_obj = self.get_cached_value(instance, default=None)
if rel_obj is None and self.is_cached(instance):
return rel_obj
if rel_obj is not None:
ct_match = (
ct_id == self.get_content_type(obj=rel_obj, using=instance._state.db).id
)
pk_match = ct_match and rel_obj._meta.pk.to_python(pk_val) == rel_obj.pk
if pk_match:
return rel_obj
else:
rel_obj = None
if ct_id is not None:
ct = self.get_content_type(id=ct_id, using=instance._state.db)
try:
rel_obj = ct.get_object_for_this_type(
using=instance._state.db, pk=pk_val
)
except ObjectDoesNotExist:
pass
self.set_cached_value(instance, rel_obj)
return rel_obj

A fetch() function would need to perform all this validation per-instance. I think for ease of coding and review, we should tackle it in a follow-up PR as an optimization.

@charettes
Copy link
Member

charettes commented May 5, 2024

This is looking great @adamchainz thanks for pushing this through and sorry for my slow replies.

If you'd like I could give a shot at implementing the GenericForeignKey part in a separate branch that targets this one. Even if it doesn't land in the same PR it would be great to validate that our approach works properly and possibly dog-food the newly introduced GenericPrefetch.

@charettes
Copy link
Member

charettes commented May 6, 2024

Completely untested code but from eye balling the other fetch methods the following should work without too much issues?

diff --git a/django/contrib/contenttypes/fields.py b/django/contrib/contenttypes/fields.py
index 770f88265c..2b7597623b 100644
--- a/django/contrib/contenttypes/fields.py
+++ b/django/contrib/contenttypes/fields.py
@@ -255,16 +255,30 @@ def __get__(self, instance, cls=None):
                 return rel_obj
             else:
                 rel_obj = None
-        if ct_id is not None:
-            ct = self.get_content_type(id=ct_id, using=instance._state.db)
-            try:
-                rel_obj = ct.get_object_for_this_type(
-                    using=instance._state.db, pk=pk_val
-                )
-            except ObjectDoesNotExist:
-                pass
-        self.set_cached_value(instance, rel_obj)
-        return rel_obj
+
+        get_fetching_mode()(self, instance)
+        return self.get_cached_value(instance)
+
+    def fetch(self, instances):
+        f = self.model._meta.get_field(self.ct_field)
+        if len(instances) == 1:
+            ct_id = getattr(instance, f.attname, None)
+            pk_val = getattr(instance, self.fk_field)
+            instance = instances[0]
+            rel_obj = None
+            if ct_id is not None:
+                ct = self.get_content_type(id=ct_id, using=instance._state.db)
+                try:
+                    rel_obj = ct.get_object_for_this_type(
+                        using=instance._state.db, pk=pk_val
+                    )
+                except ObjectDoesNotExist:
+                    pass
+            self.set_cached_value(instance, rel_obj)
+        else:
+            prefetch_related_objects(
+                instances, f.name
+            )
+

@charettes
Copy link
Member

Gave a shot at it here adamchainz#4

@tolomea
Copy link
Contributor

tolomea commented Jun 4, 2024

Thanks Adam for working to push this thing to it's final form. I wonder when the 10 year anniversary of our first version is.

I've had a skim, I plan to look at the details a bit more.

I quite like the fetching modes mechanism and name.

I don't know if this has been covered somewhere but separate modes for deferred fields and FK(and friends) makes a lot of sense to me and is probably what I will look to do in my own code.
With the current API I guess it's possible by doing isinstance checks inside a custom mode.
But that's not very obvious or discoverable.

My rationale here is when you fetch an object from the DB, unless you do stuff to say otherwise all the primitive fields are fetched and none of the FK's.
Absence of an FK is the default it means no one explicitly asked for it.
Absence of a primitive field is unusual it means someone explicitly said don't load this.
So I would want a config that automatically fetches the FK's (in bulk) and errors on the primitive fields.

@tolomea
Copy link
Contributor

tolomea commented Jun 4, 2024

Also further to the above the doco encourages overridding refresh_from_db which makes setting FETCH_PEERS for deferred fields in your project a third party library compatibility risk as they might done that, although it does feel unlikely.

I wonder if it could / should detect a custom refresh_from_db in FETCH_PEERS and... warn? fall back to FETCH_ONE?

Likewise with the other backward compatibility things.

FWIW I looked though old notes and it seems like the OG version was sometime around September 2016 (later than I remember) so this might still make it into core before the 10th anniversary.

@adamchainz adamchainz changed the title Fixed #28586 -- Added model field lazy-loading modes. Fixed #28586 -- Added model field fetching modes. Feb 20, 2025
@adamchainz adamchainz force-pushed the ticket_28586 branch 3 times, most recently from 66c48bd to def10a7 Compare February 21, 2025 11:40
@adamchainz
Copy link
Member Author

Hello all! @pelme pushed me to continue working on this 👍 I have just pushed a new version with these changes:

  1. Rebased onto main with updates to target Django 6.0.
  2. Merged in the branch from @charettes for GenericForeignKey support, plus a tweak to use a separate GenericForeignKeyFetcher class so it works with the RAISE method. (We could also split a descriptor class from GenericForeignKey to achieve this, but I’m concerned about backwards-compatibility, despite it not being public API.)
  3. Renamed things to cover “fetch modes” (not fetching) consistently.
  4. Added a database topic guide on fetch modes.

I don't know if this has been covered somewhere but separate modes for deferred fields and FK(and friends) makes a lot of sense to me and is probably what I will look to do in my own code.
With the current API I guess it's possible by doing isinstance checks inside a custom mode.
But that's not very obvious or discoverable.

I think I‘d rather not split into two separate fetch modes/features. That would be harder to explain and maintain. I would rather just allow custom modes to do this, and document how to. But that would require covering lots of internal APIs right now, which is why I’ve held back on it.

FWIW I looked though old notes and it seems like the OG version was sometime around September 2016 (later than I remember) so this might still make it into core before the 10th anniversary.

🤞


Review would be appreciated. I’m going to try testing this branch against some real projects to check for edge cases.

@adamchainz adamchainz force-pushed the ticket_28586 branch 3 times, most recently from f509590 to e1fca06 Compare February 21, 2025 22:02
Copy link
Member

@pelme pelme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work! This looks great! Me, @Majsvaffla, @nip3o and @saeldion have made this review together. We are very much looking forward having fetch modes in Django.


The on-demand fetching behavior of model fields is now configurable with
:doc:`fetch modes </topics/db/fetch-modes>`. Django provides three fetch
modes:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also in this first paragraph call out that the main motivation for fetch modes is to prevent N+1 queries/missing select_related/prefetch_related on foreignkeys/related fields? fetch modes does more than that (deferred fields) but I still think that would be helpful for people reading this the first time.

It is stated below but I find this an important point to get across very early 🙂

Copy link
Member

@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments about the global state fetcher approach, __set_name__ manual calls, systematic creation of peers: list[weakref].

setattr(cls, self.attname, self.descriptor_class(self))
descriptor = self.descriptor_class(self)
setattr(cls, self.attname, descriptor)
descriptor.__set_name__(cls, self.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we potentially avoid calling these magic methods directly? Why can't descriptors rely on self.field.name instead if they require it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aim here was to have the "fetcher protocol" include a consistent way to see the related field's name available, so the RAISE mode can rely on it.

Descriptors did not consistently point at fields. At least GenericForeignKey is, oddly, its own descriptor. Adding a recursive field attribute there felt wrong, but it seemed to make sense to call __set_name__ here because it's the same point that Python would do so if the descriptor were placed in the class definition.

In retrospect, it's probably less invasive and more sensible to make field consistently available, and fix GFK to have a separate descriptor class. I'll give that a try now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I just pushed a fix. It required a new commit that splits a descriptor class from GenericForeignKey, which I also put into a new PR at #19381. Another minor hack it required is a new property in ReverseOneToOneDescriptor for field, with a docstring explaining the ducktyping.

@adamchainz
Copy link
Member Author

adamchainz commented Mar 31, 2025

I've brought it before but the fact the fetch mode is a global variable as opposed to attributed to QuerySet instances is likely going to make ergonomics quite weird when wanting to mix modes for distinct querysets or enable it only in some cases.

I have just pushed a new version that is implemented solely as a QuerySet method.

After working through it, I agree with you, @charettes, that this cuts with the grain of Django. While the global state is convenient, it could easily mess up some expectations in third-party apps.

I thought it would be nice to allow swapping the default fetch mode, but from some experience testing this PR, I’m not 100% sure it’s a great idea. On one project that I tested, using the RAISE fetch mode broke migrations because some migration RunSQL operations in Wagtail relied on the default FETCH_ONE-style behaviour. Opting in individual QuerySets, or models through a custom manager is the safer option, if a little more involved.

I will respond to the other review comments later today, and I also intend to work on some further changes:

  • Nesting: copy a model instance’s fetch mode to related objects fetched through the fetch mode
  • Nesting: copy a model instance’s fetch mode through its related managers
  • Update select_related() and prefetch_related() docs to refer to fetch modes

@adamchainz adamchainz force-pushed the ticket_28586 branch 3 times, most recently from 94db176 to 2a55dc4 Compare March 31, 2025 16:08
@adamchainz
Copy link
Member Author

Re @tolomea

I don't know if this has been covered somewhere but separate modes for deferred fields and FK(and friends) makes a lot of sense to me and is probably what I will look to do in my own code.

I understand the rationale for splitting behaviour between FKs and deferred fields, but I think actually splitting them in the API would be too fine-grained. Fetching deferred fields in bulk is still useful, such as for model classes that defer some fields by default.

With the current API I guess it's possible by doing isinstance checks inside a custom mode. But that's not very obvious or discoverable.

Yes, this would be the way. We could document a public API around combining fetch modes like this, but maybe in a follow-up PR.

Also further to the above the doco encourages overridding refresh_from_db which makes setting FETCH_PEERS for deferred fields in your project a third party library compatibility risk as they might done that, although it does feel unlikely.

Thankfully, this risk is gone by adopting a QuerySet-based API. Instead, third-party libraries will need to opt into using a different fetch mode.

@adamchainz
Copy link
Member Author

I've just pushed an update with a new commit that does the "nesting", copying the fetch mode to related objects to apply to a whole tree of objects unless explicitly replaced.

I'm still going to work a bit on the select_related and prefetch_related docs, but I think we're nearly there on the QuerySet-based design.

@adamchainz adamchainz force-pushed the ticket_28586 branch 5 times, most recently from 61aa0fc to 548939e Compare April 15, 2025 13:15
@adamchainz
Copy link
Member Author

I just fixed the test failure. (I wrote behaviour instead of behavior, failing the spellcheck!)

Copy link
Contributor

@g-nie g-nie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adamchainz this looks great! I left a small thought

adamchainz and others added 4 commits April 29, 2025 23:40
co-authored-by: Andreas Pelme <andreas@pelme.se>
co-authored-by: Simon Charette <charette.s@gmail.com>
co-authored-by: Jacob Walls <jacobtylerwalls@gmail.com>
Comment on lines +215 to +218
Using this fetch mode is easier than declaring fields to fetch with
:meth:`~django.db.models.query.QuerySet.select_related` or
:meth:`~django.db.models.query.QuerySet.prefetch_related`, especially when it's
hard to predict which fields will be accessed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been puzzled before why prefetch_related() does nothing, but select_related() selects everything. We could straighten this out if we let prefetch_related() resolve to fetch_mode(models.FETCH_PEERS), if I understand correctly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that select_related() call is almost always a bad idea, it's very prone to over fetching and to doing so in a way that is fairly hostile to the database, I'd be more in favor of removing that behaviour

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lean towards removing that, too. It’s worth noting that the admin allows users to use it:

if self.list_select_related is True:
return qs.select_related()

Maybe enough people rely on it that it’s not worth removing though...? Seems kinda handy for rapid prototyping at least.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strong +1 to deprecating select_related() without a mask as a follow up that's almost always a bad idea.

Copy link
Member

@charettes charettes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing this forward and sorry for the delay in review @adamchainz.

I left a few comments mainly requesting clarifications and more verbose commit message but in essence the state of this PR is looking great to me 🥇 🙇

Comment on lines +215 to +218
Using this fetch mode is easier than declaring fields to fetch with
:meth:`~django.db.models.query.QuerySet.select_related` or
:meth:`~django.db.models.query.QuerySet.prefetch_related`, especially when it's
hard to predict which fields will be accessed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strong +1 to deprecating select_related() without a mask as a follow up that's almost always a bad idea.

@@ -157,11 +156,19 @@ def get_content_type(self, obj=None, id=None, using=None, model=None):
# This should never happen. I love comments like this, don't you?
raise Exception("Impossible arguments to GFK.get_content_type!")


class GenericForeignKeyDescriptor:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand the rationale behind this change.

The same thing could be achieved by adding a field property on GenericForeignKey that returns self but a distinct class is deemed preferable as it makes GenericForeignKey follow the pattern used by other related fields.

It might be worth explaining the rationale behind these changes in the commit message and the benefits it provides for the followed commits in the patch set for future reference.

:class:`~django.db.models.ManyToManyField`), instances of that model will have
a convenient API to access the related object(s).
:class:`~django.db.models.ManyToManyField`), instances of the model class gain
accessor attributes for the related object(s).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again sensible chain but a commit description would go a long way explain why this is being done here for future reference.

return rel_obj

def fetch_many(self, instances):
missing_instances = [i for i in instances if not self.field.is_cached(i)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but it might be worth using a local reference for self.field.is_cached to avoid two attribute lookups per instance in a potentially large loop

is_cached = self.field.is_cache
missing_instances = [i for i in instances if not is_cached(i)]

Comment on lines +490 to +491
state.pop("fetch_mode", None)
state.pop("peers", None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand, does this mean that a pickled model instances, and by definition a queryset of such objects, looses its fetch mode affinity? For example would that mean that caching a queryset though pickling and then unpickling it would not preserve the specified mode (e.g. raising errors on attributes that would require fetching).

Maybe that's by design but the lack of comments makes me wonder the purpose behind it.

raise FieldFetchBlocked(f"Fetching of {klass}.{field_name} blocked.")


RAISE = Raise()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like how this turned out, thanks for the tweaks!

Comment on lines +450 to +452
def get_queryset(self, *, instance):
return self.related.related_model._base_manager.db_manager(
hints={"instance": instance}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate that a clearer signature is used but it should be done in a distinct commit as it might be used under other circumstances in the wild.

Comment on lines +462 to 464
querysets[0] if querysets else self.get_queryset(instance=instances[0])
)
queryset._add_hints(instance=instances[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't these two instance hint passing redundant?

def fetch(self, fetcher, instance):
klass = instance.__class__.__qualname__
field_name = fetcher.field.name
raise FieldFetchBlocked(f"Fetching of {klass}.{field_name} blocked.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this tool, currently using it to debug a project. I noticed that with RAISE, the exception raised here will include context like:

Traceback (most recent call last):
  File "/Users/jwalls/django/django/db/models/fields/related_descriptors.py", line 231, in __get__
    rel_obj = self.field.get_cached_value(instance)
  File "/Users/jwalls/django/django/db/models/fields/mixins.py", line 21, in get_cached_value
    return instance._state.fields_cache[self.cache_name]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'nodegroup'

During handling of the above exception, another exception occurred:

What do you think about raising this error from None to suppress that irrelevant context?

Comment on lines +34 to +39
.. admonition:: Referencing fetch modes

Fetch modes are defined in ``django.db.models.fetch_modes``, but for
convenience they're imported into :mod:`django.db.models`. The standard
convention is to use ``from django.db import models`` and refer to the
fetch modes as ``models.<mode>``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this admonition, but the examples in the docs or the tests with this PR don't use this pattern.
If we are going to encourage this models.<mode> pattern, I think it should be used in the examples and tests too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants