-
-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Optimize calls to get_type_ref
#10863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Deploying pydantic-docs with
|
Latest commit: |
000e60e
|
Status: | ✅ Deploy successful! |
Preview URL: | https://286f9f34.pydantic-docs.pages.dev |
Branch Preview URL: | https://type-ref-perf.pydantic-docs.pages.dev |
CodSpeed Performance ReportMerging #10863 will improve performances by 30.11%Comparing Summary
Benchmarks breakdown
|
I don't think it is a viable option to remove support for |
@@ -671,30 +671,6 @@ def model_validate_strings( | |||
__tracebackhide__ = True | |||
return cls.__pydantic_validator__.validate_strings(obj, strict=strict, context=context) | |||
|
|||
@classmethod | |||
def __get_pydantic_core_schema__(cls, source: type[BaseModel], handler: GetCoreSchemaHandler, /) -> CoreSchema: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this is a good idea. This is definitely being relied on. Its not internal and also documented for customizing the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the perf improvement could be maybe done so that the __get_pydantic_core_schema__
can not be overriden but instead coming from model_config (with similar args). Users then have option to migrate into that. Then if its defined in model config the perf improvements are not applied. (Didnt fully follow why its causing issues here but maybe possible)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, this doesn't remove support for __get_pydantic_core_schema__
. This is still handled by the added _generate_schema_from_get_schema_method
method.
Users are still free to implement __get_pydantic_core_schema__
on Pydantic models, however we provide no guarantees (and never did, see the breaking changes concerns section in my original post) it will work flawlessly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah interesting. Will give this another review then. Didn't realize users could still implement support. Nice!
Ah, I jumped the gun here on my review. Reread your new description, excited to look again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, I'm in favor of this change, after better understanding the scope.
if is_function_with_inner_schema(schema): | ||
ref = schema['schema'].pop('ref', None) # pyright: ignore[reportCallIssue, reportArgumentType] | ||
if ref: | ||
schema['ref'] = ref | ||
else: | ||
ref = get_ref(schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we lost this logic - is this needed anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to what I mentioned:
the first one I could find: #7102. The #7102 (comment) isn't really compelling, the user was doing things not the intended way. It also led to a scary and hacky fix that I would really like not having.
These removed lines had the effect of moving the ref from the inner schema of a function-*
schema to the the function-*
schema itself. With the following simplified test code added in the mentioned issue above:
class Numeric(BaseModel):
value: float
@classmethod
def __get_pydantic_core_schema__(cls, source_type, handler):
return core_schema.no_info_before_validator_function(cls.validate, handler(source_type))
@classmethod
def validate(cls, v):
...
class OuterModel(BaseModel):
x: Numeric
y: Numeric
On main
, the schema of OuterModel
would look like:
{
│ 'type': 'definitions',
│ 'schema': {
│ │ 'type': 'model',
│ │ 'cls': <class '__main__.OuterModel'>,
│ │ 'schema': {
│ │ │ 'type': 'model-fields',
│ │ │ 'fields': {
│ │ │ │ 'x': {'type': 'model-field', 'schema': {'type': 'definition-ref', 'schema_ref': '__main__.Numeric:109238506193520'}, 'metadata': {}},
│ │ │ │ 'y': {'type': 'model-field', 'schema': {'type': 'definition-ref', 'schema_ref': '__main__.Numeric:109238506193520'}, 'metadata': {}}
│ │ │ },
│ │ │ 'model_name': 'OuterModel',
│ │ │ 'computed_fields': []
│ │ },
│ │ 'config': {'title': 'OuterModel'},
│ │ 'ref': '__main__.OuterModel:109238503123056',
│ │ 'metadata': {'<stripped>'}
│ },
│ 'definitions': [
│ │ {
│ │ │ 'function': {'type': 'no-info', 'function': <bound method Numeric.validate of <class '__main__.Numeric'>>},
│ │ │ 'schema': {
│ │ │ │ 'type': 'model',
│ │ │ │ 'cls': <class '__main__.Numeric'>,
│ │ │ │ 'schema': {
│ │ │ │ │ 'type': 'model-fields',
│ │ │ │ │ 'fields': {'value': {'type': 'model-field', 'schema': {'type': 'float'}, 'metadata': {}}},
│ │ │ │ │ 'model_name': 'Numeric',
│ │ │ │ │ 'computed_fields': []
│ │ │ │ },
│ │ │ │ 'config': {'title': 'Numeric'}
│ │ │ },
│ │ │ 'ref': '__main__.Numeric:109238506193520',
│ │ │ 'metadata': {'<stripped>'},
│ │ │ 'type': 'function-before'
│ │ }
│ ]
}
On this PR, it looks like:
{
│ 'type': 'definitions',
│ 'schema': {
│ │ 'type': 'model',
│ │ 'cls': <class '__main__.OuterModel'>,
│ │ 'schema': {
│ │ │ 'type': 'model-fields',
│ │ │ 'fields': {
│ │ │ │ 'x': {
│ │ │ │ │ 'type': 'model-field',
│ │ │ │ │ 'schema': {
│ │ │ │ │ │ 'function': {'type': 'no-info', 'function': <bound method Numeric.validate of <class '__main__.Numeric'>>},
│ │ │ │ │ │ 'schema': {
│ │ │ │ │ │ │ 'function': {'type': 'no-info', 'function': <bound method Numeric.validate of <class '__main__.Numeric'>>},
│ │ │ │ │ │ │ 'schema': {'type': 'definition-ref', 'schema_ref': '__main__.Numeric:105945921898336'},
│ │ │ │ │ │ │ 'metadata': {'<stripped>'},
│ │ │ │ │ │ │ 'type': 'function-before'
│ │ │ │ │ │ },
│ │ │ │ │ │ 'metadata': {'<stripped>'},
│ │ │ │ │ │ 'type': 'function-before'
│ │ │ │ │ },
│ │ │ │ │ 'metadata': {}
│ │ │ │ },
│ │ │ │ 'y': {
│ │ │ │ │ 'type': 'model-field',
│ │ │ │ │ 'schema': {
│ │ │ │ │ │ 'function': {'type': 'no-info', 'function': <bound method Numeric.validate of <class '__main__.Numeric'>>},
│ │ │ │ │ │ 'schema': {
│ │ │ │ │ │ │ 'function': {'type': 'no-info', 'function': <bound method Numeric.validate of <class '__main__.Numeric'>>},
│ │ │ │ │ │ │ 'schema': {'type': 'definition-ref', 'schema_ref': '__main__.Numeric:105945921898336'},
│ │ │ │ │ │ │ 'metadata': {'<stripped>'},
│ │ │ │ │ │ │ 'type': 'function-before'
│ │ │ │ │ │ },
│ │ │ │ │ │ 'metadata': {'<stripped>'},
│ │ │ │ │ │ 'type': 'function-before'
│ │ │ │ │ },
│ │ │ │ │ 'metadata': {}
│ │ │ │ }
│ │ │ },
│ │ │ 'model_name': 'OuterModel',
│ │ │ 'computed_fields': []
│ │ },
│ │ 'config': {'title': 'OuterModel'},
│ │ 'ref': '__main__.OuterModel:105945922827312',
│ │ 'metadata': {'<stripped>'}
│ },
│ 'definitions': [
│ │ {
│ │ │ 'type': 'model',
│ │ │ 'cls': <class '__main__.Numeric'>,
│ │ │ 'schema': {
│ │ │ │ 'type': 'model-fields',
│ │ │ │ 'fields': {'value': {'type': 'model-field', 'schema': {'type': 'float'}, 'metadata': {}}},
│ │ │ │ 'model_name': 'Numeric',
│ │ │ │ 'computed_fields': []
│ │ │ },
│ │ │ 'config': {'title': 'Numeric'},
│ │ │ 'ref': '__main__.Numeric:105945921898336'
│ │ }
│ ]
}
Essentially, the difference in these two schemas is that we don't "move" the ref from the inner schema to the function-before
schemas.
The changes in this PR + removing this reference moving coincidentally make it work still.
However, doing so was a dangerous game: on L793, schema
directly comes from another model. The pop
calls removes the reference to the schema, and mutating schemas from other models has been a known issue.
You may be wondering: why this doesn't break things in the example I gave? Surely the pop
call should have mutated the core schema of Numeric
. Turns out it doesn't, because Numeric.__get_pydantic_core_schema__
does not cache the schema, so calling it will generate a new one every time (and this is what happens during the schema gen of OuterModel
). But on a similar issue, I mentioned that explicitly caching the core schema in the __get_pydantic_core_schema__
method would resolve the user issue (as the use case was slightly different)!
So to conclude, overriding BaseModel.__get_pydantic_core_schema__
is full of unexpected behaviors, but that's fine as officially supporting them would be a huge pain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to break this change out into a different PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc (but I'm not sure), I was able to remove it only thanks to the other changes. This won't clutter the git diff though, because it's just a removal. Probably by having a proper commit description when merging, I can add a note about this?
I've removed the example I added here to address #7833. It's no longer relevant, as there's no cc @chbndrhnns, you might find this interesting. This simplification now works: from typing import Type
from pydantic_core import CoreSchema
from typing_extensions import Annotated
from pydantic import BaseModel, GetCoreSchemaHandler
class Metadata(BaseModel):
foo: str = 'metadata!'
bar: int = 100
class Model(BaseModel):
state: Annotated[int, Metadata()]
m = Model.model_validate({'state': 2})
print(repr(m))
#> Model(state=2)
print(m.model_fields)
"""
{
'state': FieldInfo(
annotation=int,
required=True,
metadata=[Metadata(foo='metadata!', bar=100)],
)
}
""" |
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
As seen above, annotating a field with a `BaseModel` type can be used to modify or override the generated json schema. | ||
However, if you want to take advantage of storing metadata via `Annotated`, but you don't want to override the generated JSON | ||
schema, you can use the following approach with a no-op version of `__get_pydantic_core_schema__` implemented on the | ||
metadata class: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does seem like a terrible example, but is ther ean alternative way to do what it is trying to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, see #10863 (comment)
You just don't have to use the nasty no-op, so it's even better!
@Viicos, I'd love to chat about a few more nitty gritty details here tomorrow, but other than that, I think we can move forward with this. Great find! |
Thanks for the ping, @sydney-runkle. This sounds exciting! I am waiting for the release now ;) |
Sure thing. We just released v2.10, and this won't fit in until v2.11, which will be a bit 😅 Feel free to install from |
be2095c
to
a22872a
Compare
I went and tested this branch on a couple libraries (using the following GH search), and couldn't find any issues. This should imo be good to merge once approved. |
a22872a
to
0495519
Compare
Nice, tests passing! Let's add a few other integrations before we move forward with a merge here? |
0495519
to
34b876d
Compare
@Viicos, I can review this again early next week once we have some more third party tests added? |
Two integration tests were added since #10863 (comment), but overall it is pretty hard to see how this affects existing code, as most 3rd-party libraries are not defining a lot of custom types/ |
Fair enough, I'll review today then 👍 |
34b876d
to
99565e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great, a few follow up questions 👍
if is_function_with_inner_schema(schema): | ||
ref = schema['schema'].pop('ref', None) # pyright: ignore[reportCallIssue, reportArgumentType] | ||
if ref: | ||
schema['ref'] = ref | ||
else: | ||
ref = get_ref(schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to break this change out into a different PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's wait on merging 3.8, but otherwise LGTM :)
@sydney-runkle
|
Indirectly closes #9443.
What we're trying to achieve and how
get_type_ref
is pretty expensive, and we'll try to reduce the amount of calls to this function.Currently, for every call to
GenerateSchema.generate_schema
, we unconditionally end up with the following call chain:_generate_schema_from_property
->get_schema_or_ref
->get_type_ref
.The idea is thus to remove this part (L750-752):
pydantic/pydantic/_internal/_generate_schema.py
Lines 740 to 752 in 30ee4f4
So that any
GenerateSchema
method responsible for generating a core schema for a referenceable type handle it. For example,_model_schema
, which already has this check:pydantic/pydantic/_internal/_generate_schema.py
Lines 624 to 628 in 30ee4f4
Note that the
if maybe_schema is not None
branch is never reached (same for all the other referenceable types such as dataclasses, typeddict, etc) because the branch we're trying to remove handles it already.So surely removing the branch from
_generate_schema_from_property
should work.Challenges
However, this breaks for Pydantic models:
Let's say you have the following:
When we generate a schema for
sub2
, we follow the call chaingenerate_schema
->_generate_schema_from_property
. If we don't check forget_schema_or_ref
, we would want it to be checked in_model_schema
as shown above. However, the rest of the_generate_schema_from_property
implementation will check for__get_pydantic_core_schema__()
, and the__pydantic_core_schema__
property. In our example,Sub
is already in the definitions thanks tosub1
, so it would break by returning the complete core schema ofSub
1 instead of a definition reference schema (produced byget_schema_or_ref
).So let's try to push things a bit further. Why not move the
__get_pydantic_core_schema__()
/__pydantic_core_schema__
logic to the_model_schema
method? (same for dataclasses).The issue is that while it would be fine for the
__pydantic_core_schema__
property (only relevant for Pydantic models and dataclasses), the__get_pydantic_core_schema__()
method can be set on pretty much anything.Solution
That's why I had to remove the
__get_pydantic_core_schema__
method fromBaseModel
. The current implementation is really only returning the cached__pydantic_core_schema__
property or delegating the generation to theGenerateSchema
class.During model construction, we were doing something even more confusing:
pydantic/pydantic/_internal/_model_construction.py
Lines 648 to 655 in 30ee4f4
The following call chain happens:
cls.__get_pydantic_core_schema__(cls, handler)
->handler(cls)
~GenerateSchema().generate_schema(cls, from_dunder_get_core_schema=False)
.I think having
from_dunder_get_core_schema
to avoid an infinite loop is a sign that the current implementation is quite convoluted.Instead, by:
BaseModel.__get_pydantic_core_schema__
method, and moving the check for the__pydantic_core_schema__
attribute toGenerateSchema._model_schema
__get_pydantic_core_schema__
method (and not the property) when callingGenerateSchema.generate_schema
(which isn't an issue for Pydantic models as we removed it).We:
GenerateSchema().generate_schema(some_model_cls)
, and the call chain is way simpler:generate_schema
-> ... ->_model_schema
.get_schema_or_ref
(and check for the__pydantic_core_schema__
property) to the responsible methods:_model_schema
,_dataclass_schema
, etc.Breaking changes concerns
We had a couple reports from people overriding
BaseModel.__get_pydantic_core_schema__
:__get_pydantic_core_schema__
ignored in self-referencing models #10582, a user also facing unexpected issues (maybe the same ones).Feels like doing so always led to unexpected behavior, and a hacky fix. That's why I'm in support of having this pattern for
BaseModel
removed. It might break existing code, but we are probably helping the users in avoiding unexpected issues in the future.We could try to still support the existing
BaseModel.__get_pydantic_core_schema__
method, but honestly it's been quite hard to implement this already, and it might be that it's impossible to have both this perf improvement and support for the existing behavior at the same timeFootnotes
By calling
Sub.__get_pydantic_core_schema__()
, which would return it from the cached attribute. Confusing! Adds an extra level of indirection. ↩