-
-
Notifications
You must be signed in to change notification settings - Fork 32.8k
Fixed #28586 -- Added model field fetching modes. #17554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6f35264
to
61af026
Compare
5787d68
to
5708991
Compare
bc159cf
to
43cd3ec
Compare
@charettes I’ve been slowly but steadily chipping away at this. Would you like to make an initial review, particularly of the public API in the release notes, and the way that the callable interface works? |
@adamchainz sure! Can it wait this weekend? I'm taking a long break for Easter so I should have plenty of focus time. |
Of course, take your time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋 @adamchainz I just want to start by saying thank you for the continued work here, this is really looking promising 🙇
I discussed an idea on how defining a form of fetching protocol (like how resolvable w/ resolve_expression
and compilable w/ as_sql
work) to avoid unnecessary closures and keep fetcher modes to the strict minimum which is to receive an instance, a fetcher, and route the call to the fetcher.
There also seems to be a missing usage of fetch mode in ReverseOneToOneDescriptor.__get__
but maybe this is expected for now as this is still a WIP.
Lastly I left some comments with regards to naming. I'm sure others will have opinions as well but it felt like django.db.models.fetching
and fetching_mode
vs lazy
might be a better .
django/db/models/lazy.py
Outdated
pass | ||
|
||
|
||
def RAISE(instance, field, fetch_for_instances, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any thoughts about including a WARN
as well? It seems like the only remaining feature that doesn't make django-seal
obsolete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of that, but then maybe users would want “fetch peers” + warnings, or log messages directly rather than warnings. Maybe something we can instead document the pattern for? Like:
def warn_then_fetch_peers(...):
logger.warning("fetching peers!")
return FETCH_PEERS(...)
set_default_lazy_mode(warn_then_fetch_peers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that makes sense to separate non-terminal surfacing of leaks from the actual fetching implementation. I wonder if there is anything that we could provide to avoid such boiler plate though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Simon, thoughtful as always. (P.S. just finished your Django Chat episode yesterday, nice one!). I’ve replied inline to your comments and will work on an update.
There also seems to be a missing usage of fetch mode in
ReverseOneToOneDescriptor.__get__
but maybe this is expected for now as this is still a WIP.
Yes, this needs doing, as does GenericForeignKey
.
django/db/models/lazy.py
Outdated
pass | ||
|
||
|
||
def RAISE(instance, field, fetch_for_instances, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of that, but then maybe users would want “fetch peers” + warnings, or log messages directly rather than warnings. Maybe something we can instead document the pattern for? Like:
def warn_then_fetch_peers(...):
logger.warning("fetching peers!")
return FETCH_PEERS(...)
set_default_lazy_mode(warn_then_fetch_peers)
ad1b6d0
to
19c4f6e
Compare
I tried django/django/contrib/contenttypes/fields.py Lines 234 to 267 in ec85524
A |
This is looking great @adamchainz thanks for pushing this through and sorry for my slow replies. If you'd like I could give a shot at implementing the |
Completely untested code but from eye balling the other diff --git a/django/contrib/contenttypes/fields.py b/django/contrib/contenttypes/fields.py
index 770f88265c..2b7597623b 100644
--- a/django/contrib/contenttypes/fields.py
+++ b/django/contrib/contenttypes/fields.py
@@ -255,16 +255,30 @@ def __get__(self, instance, cls=None):
return rel_obj
else:
rel_obj = None
- if ct_id is not None:
- ct = self.get_content_type(id=ct_id, using=instance._state.db)
- try:
- rel_obj = ct.get_object_for_this_type(
- using=instance._state.db, pk=pk_val
- )
- except ObjectDoesNotExist:
- pass
- self.set_cached_value(instance, rel_obj)
- return rel_obj
+
+ get_fetching_mode()(self, instance)
+ return self.get_cached_value(instance)
+
+ def fetch(self, instances):
+ f = self.model._meta.get_field(self.ct_field)
+ if len(instances) == 1:
+ ct_id = getattr(instance, f.attname, None)
+ pk_val = getattr(instance, self.fk_field)
+ instance = instances[0]
+ rel_obj = None
+ if ct_id is not None:
+ ct = self.get_content_type(id=ct_id, using=instance._state.db)
+ try:
+ rel_obj = ct.get_object_for_this_type(
+ using=instance._state.db, pk=pk_val
+ )
+ except ObjectDoesNotExist:
+ pass
+ self.set_cached_value(instance, rel_obj)
+ else:
+ prefetch_related_objects(
+ instances, f.name
+ )
+ |
Gave a shot at it here adamchainz#4 |
Thanks Adam for working to push this thing to it's final form. I wonder when the 10 year anniversary of our first version is. I've had a skim, I plan to look at the details a bit more. I quite like the fetching modes mechanism and name. I don't know if this has been covered somewhere but separate modes for deferred fields and FK(and friends) makes a lot of sense to me and is probably what I will look to do in my own code. My rationale here is when you fetch an object from the DB, unless you do stuff to say otherwise all the primitive fields are fetched and none of the FK's. |
Also further to the above the doco encourages overridding refresh_from_db which makes setting FETCH_PEERS for deferred fields in your project a third party library compatibility risk as they might done that, although it does feel unlikely. I wonder if it could / should detect a custom refresh_from_db in FETCH_PEERS and... warn? fall back to FETCH_ONE? Likewise with the other backward compatibility things. FWIW I looked though old notes and it seems like the OG version was sometime around September 2016 (later than I remember) so this might still make it into core before the 10th anniversary. |
19c4f6e
to
0420124
Compare
66c48bd
to
def10a7
Compare
Hello all! @pelme pushed me to continue working on this 👍 I have just pushed a new version with these changes:
I think I‘d rather not split into two separate fetch modes/features. That would be harder to explain and maintain. I would rather just allow custom modes to do this, and document how to. But that would require covering lots of internal APIs right now, which is why I’ve held back on it.
🤞 Review would be appreciated. I’m going to try testing this branch against some real projects to check for edge cases. |
f509590
to
e1fca06
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing work! This looks great! Me, @Majsvaffla, @nip3o and @saeldion have made this review together. We are very much looking forward having fetch modes in Django.
docs/releases/6.0.txt
Outdated
|
||
The on-demand fetching behavior of model fields is now configurable with | ||
:doc:`fetch modes </topics/db/fetch-modes>`. Django provides three fetch | ||
modes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also in this first paragraph call out that the main motivation for fetch modes is to prevent N+1 queries/missing select_related/prefetch_related on foreignkeys/related fields? fetch modes does more than that (deferred fields) but I still think that would be helpful for people reading this the first time.
It is stated below but I find this an important point to get across very early 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments about the global state fetcher approach, __set_name__
manual calls, systematic creation of peers: list[weakref]
.
django/db/models/fields/__init__.py
Outdated
setattr(cls, self.attname, self.descriptor_class(self)) | ||
descriptor = self.descriptor_class(self) | ||
setattr(cls, self.attname, descriptor) | ||
descriptor.__set_name__(cls, self.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we potentially avoid calling these magic methods directly? Why can't descriptors rely on self.field.name
instead if they require it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aim here was to have the "fetcher protocol" include a consistent way to see the related field's name available, so the RAISE
mode can rely on it.
Descriptors did not consistently point at fields. At least GenericForeignKey
is, oddly, its own descriptor. Adding a recursive field
attribute there felt wrong, but it seemed to make sense to call __set_name__
here because it's the same point that Python would do so if the descriptor were placed in the class definition.
In retrospect, it's probably less invasive and more sensible to make field
consistently available, and fix GFK to have a separate descriptor class. I'll give that a try now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I just pushed a fix. It required a new commit that splits a descriptor class from GenericForeignKey
, which I also put into a new PR at #19381. Another minor hack it required is a new property in ReverseOneToOneDescriptor
for field
, with a docstring explaining the ducktyping.
b596f1d
to
c375047
Compare
I have just pushed a new version that is implemented solely as a After working through it, I agree with you, @charettes, that this cuts with the grain of Django. While the global state is convenient, it could easily mess up some expectations in third-party apps. I thought it would be nice to allow swapping the default fetch mode, but from some experience testing this PR, I’m not 100% sure it’s a great idea. On one project that I tested, using the I will respond to the other review comments later today, and I also intend to work on some further changes:
|
94db176
to
2a55dc4
Compare
Re @tolomea
I understand the rationale for splitting behaviour between FKs and deferred fields, but I think actually splitting them in the API would be too fine-grained. Fetching deferred fields in bulk is still useful, such as for model classes that defer some fields by default.
Yes, this would be the way. We could document a public API around combining fetch modes like this, but maybe in a follow-up PR.
Thankfully, this risk is gone by adopting a |
2a55dc4
to
d138911
Compare
I've just pushed an update with a new commit that does the "nesting", copying the fetch mode to related objects to apply to a whole tree of objects unless explicitly replaced. I'm still going to work a bit on the |
61aa0fc
to
548939e
Compare
I just fixed the test failure. (I wrote behaviour instead of behavior, failing the spellcheck!) |
548939e
to
b676579
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @adamchainz this looks great! I left a small thought
co-authored-by: Andreas Pelme <andreas@pelme.se> co-authored-by: Simon Charette <charette.s@gmail.com> co-authored-by: Jacob Walls <jacobtylerwalls@gmail.com>
b676579
to
03a80bb
Compare
Using this fetch mode is easier than declaring fields to fetch with | ||
:meth:`~django.db.models.query.QuerySet.select_related` or | ||
:meth:`~django.db.models.query.QuerySet.prefetch_related`, especially when it's | ||
hard to predict which fields will be accessed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been puzzled before why prefetch_related()
does nothing, but select_related()
selects everything. We could straighten this out if we let prefetch_related()
resolve to fetch_mode(models.FETCH_PEERS)
, if I understand correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that select_related() call is almost always a bad idea, it's very prone to over fetching and to doing so in a way that is fairly hostile to the database, I'd be more in favor of removing that behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lean towards removing that, too. It’s worth noting that the admin allows users to use it:
django/django/contrib/admin/views/main.py
Lines 601 to 602 in 7e4b371
if self.list_select_related is True: | |
return qs.select_related() |
Maybe enough people rely on it that it’s not worth removing though...? Seems kinda handy for rapid prototyping at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strong +1 to deprecating select_related()
without a mask as a follow up that's almost always a bad idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pushing this forward and sorry for the delay in review @adamchainz.
I left a few comments mainly requesting clarifications and more verbose commit message but in essence the state of this PR is looking great to me 🥇 🙇
Using this fetch mode is easier than declaring fields to fetch with | ||
:meth:`~django.db.models.query.QuerySet.select_related` or | ||
:meth:`~django.db.models.query.QuerySet.prefetch_related`, especially when it's | ||
hard to predict which fields will be accessed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strong +1 to deprecating select_related()
without a mask as a follow up that's almost always a bad idea.
@@ -157,11 +156,19 @@ def get_content_type(self, obj=None, id=None, using=None, model=None): | |||
# This should never happen. I love comments like this, don't you? | |||
raise Exception("Impossible arguments to GFK.get_content_type!") | |||
|
|||
|
|||
class GenericForeignKeyDescriptor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I understand the rationale behind this change.
The same thing could be achieved by adding a field
property on GenericForeignKey
that returns self
but a distinct class is deemed preferable as it makes GenericForeignKey
follow the pattern used by other related fields.
It might be worth explaining the rationale behind these changes in the commit message and the benefits it provides for the followed commits in the patch set for future reference.
:class:`~django.db.models.ManyToManyField`), instances of that model will have | ||
a convenient API to access the related object(s). | ||
:class:`~django.db.models.ManyToManyField`), instances of the model class gain | ||
accessor attributes for the related object(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again sensible chain but a commit description would go a long way explain why this is being done here for future reference.
return rel_obj | ||
|
||
def fetch_many(self, instances): | ||
missing_instances = [i for i in instances if not self.field.is_cached(i)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor but it might be worth using a local reference for self.field.is_cached
to avoid two attribute lookups per instance in a potentially large loop
is_cached = self.field.is_cache
missing_instances = [i for i in instances if not is_cached(i)]
state.pop("fetch_mode", None) | ||
state.pop("peers", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I understand, does this mean that a pickled model instances, and by definition a queryset of such objects, looses its fetch mode affinity? For example would that mean that caching a queryset though pickling and then unpickling it would not preserve the specified mode (e.g. raising errors on attributes that would require fetching).
Maybe that's by design but the lack of comments makes me wonder the purpose behind it.
raise FieldFetchBlocked(f"Fetching of {klass}.{field_name} blocked.") | ||
|
||
|
||
RAISE = Raise() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really like how this turned out, thanks for the tweaks!
def get_queryset(self, *, instance): | ||
return self.related.related_model._base_manager.db_manager( | ||
hints={"instance": instance} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate that a clearer signature is used but it should be done in a distinct commit as it might be used under other circumstances in the wild.
querysets[0] if querysets else self.get_queryset(instance=instances[0]) | ||
) | ||
queryset._add_hints(instance=instances[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't these two instance
hint passing redundant?
def fetch(self, fetcher, instance): | ||
klass = instance.__class__.__qualname__ | ||
field_name = fetcher.field.name | ||
raise FieldFetchBlocked(f"Fetching of {klass}.{field_name} blocked.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this tool, currently using it to debug a project. I noticed that with RAISE
, the exception raised here will include context like:
Traceback (most recent call last):
File "/Users/jwalls/django/django/db/models/fields/related_descriptors.py", line 231, in __get__
rel_obj = self.field.get_cached_value(instance)
File "/Users/jwalls/django/django/db/models/fields/mixins.py", line 21, in get_cached_value
return instance._state.fields_cache[self.cache_name]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
KeyError: 'nodegroup'
During handling of the above exception, another exception occurred:
What do you think about raising this error from None
to suppress that irrelevant context?
.. admonition:: Referencing fetch modes | ||
|
||
Fetch modes are defined in ``django.db.models.fetch_modes``, but for | ||
convenience they're imported into :mod:`django.db.models`. The standard | ||
convention is to use ``from django.db import models`` and refer to the | ||
fetch modes as ``models.<mode>``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this admonition, but the examples in the docs or the tests with this PR don't use this pattern.
If we are going to encourage this models.<mode>
pattern, I think it should be used in the examples and tests too.
Trac ticket number
ticket-28586
Branch description
Add
QuerySet.fetch_mode()
to set the fetch mode, controlling how on-demand field loading occurs. Include three fetch modes,FETCH_ONE
(default, existing behaviour),FETCH_PEERS
, andRAISE
.Checklist
main
branch.