Skip to content

CoreMetadata refactor with an emphasis on documentation and reducing complexity #10675

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Oct 28, 2024

Conversation

sydney-runkle
Copy link
Contributor

@sydney-runkle sydney-runkle commented Oct 21, 2024

I would recommend reviewing this commit by commit, though looking over the whole thing afterwards gives some helpful perspective as well.

Changes in this PR:

  • Remove the CoreMetadataHandler structure - it offered minimal type checking benefits, some overhead, and a good amount of unnecessary complexity.
  • Better documentation for the CoreMetadata dict so that it's clear what we're adding to it and why.
  • Remove as much of the json schema (callable) generation logic from _generate_schema.py and _known_annotated_metadata.py and consolidate it in json_schema.py
  • For basic json schema updates and json_schema_extra updates, no longer build a callable ahead of time.
  • As a nice benefit, seeing some 2-5% improvements in schema build times, which is good considering this is just JSON schema related changes.

TODOs, in the future:

  • Minimize amount of callable wrapper logic we need for JSON schema generation - requires a major refactor, unfortunately.
  • Move the CoreMetadata structure to rust so that we can pull out some of the cast calls and just get natural type checking benefits there. One hiccup here is that replacing metadata: dict[str, Any] with metadata: CoreMetadata, doesn't allow for arbitrary injection (say, in a custom schema), so I'm not sure what we want to do about that. It also then raises errors when we do our temporary injects for say, discriminator application, but we shouldn't be doing that anyway.
  • Standardize whether or not we simplify refs for callable json schema application so we don't need pydantic_js_functions and pydantic_js_annotation_functions

@github-actions github-actions bot added the relnotes-fix Used for bugfixes. label Oct 21, 2024
Copy link

cloudflare-workers-and-pages bot commented Oct 21, 2024

Deploying pydantic-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: 2a08ddf
Status: ✅  Deploy successful!
Preview URL: https://1b3fed65.pydantic-docs.pages.dev
Branch Preview URL: https://metadata-stuck-on-a-flight.pydantic-docs.pages.dev

View logs

Copy link

codspeed-hq bot commented Oct 21, 2024

CodSpeed Performance Report

Merging #10675 will not alter performance

Comparing metadata-stuck-on-a-flight (2a08ddf) with main (19f7d41)

Summary

✅ 44 untouched benchmarks

@sydney-runkle sydney-runkle changed the title CoreMetadata refactor as a consequence of a very long flight CoreMetadata refactor with an emphasis on documentation and reducing complexity Oct 22, 2024
Comment on lines 51 to 52
pydantic_js_prefer_positional_arguments: bool | None = None,
pydantic_js_input_core_schema: CoreSchema | None = None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably overkill (these params specifically). Happy to remove. The others make more sense given that they need special update logic.

Comment on lines -2474 to -2510
def get_json_schema_update_func(
json_schema_update: JsonSchemaValue, json_schema_extra: JsonDict | typing.Callable[[JsonDict], None] | None
) -> GetJsonSchemaFunction:
def json_schema_update_func(
core_schema_or_field: CoreSchemaOrField, handler: GetJsonSchemaHandler
) -> JsonSchemaValue:
json_schema = {**handler(core_schema_or_field), **json_schema_update}
add_json_schema_extra(json_schema, json_schema_extra)
return json_schema

return json_schema_update_func


def add_json_schema_extra(
json_schema: JsonSchemaValue, json_schema_extra: JsonDict | typing.Callable[[JsonDict], None] | None
):
if isinstance(json_schema_extra, dict):
json_schema.update(to_jsonable_python(json_schema_extra))
elif callable(json_schema_extra):
json_schema_extra(json_schema)


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, shouldn't be in json_schema.py

Comment on lines +1484 to +1494
def _update_class_schema(self, json_schema: JsonSchemaValue, cls: type[Any], config: ConfigDict) -> None:
"""Update json_schema with the following, extracted from `config` and `cls`:

* title
* description
* additional properties
* json_schema_extra
* deprecated

Done in place, hence there's no return value as the original json_schema is mutated.
No ref resolving is involved here, as that's not appropriate for simple updates.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now has lots of the logic that was mistakenly in _generate_schema.py

schema_to_update = self.get_schema_from_definitions(JsonRef(ref))
if schema_to_update is None:
raise RuntimeError(f'Cannot update undefined schema for $ref={ref}')
return self.resolve_ref_schema(schema_to_update)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed and simplified. There's a function called this in GenerateJsonSchemaHandler, and ideally, we'd get rid of that eventually, so I think this helps draw some parallels to make that simplification easier in the future.

Comment on lines -2616 to -2630
def test_typeddict_with_extra_behavior_allow():
class Model:
@classmethod
def __get_pydantic_core_schema__(cls, source_type: Any, handler: GetCoreSchemaHandler) -> CoreSchema:
return core_schema.typed_dict_schema(
{'a': core_schema.typed_dict_field(core_schema.str_schema())},
extra_behavior='allow',
)

assert TypeAdapter(Model).json_schema() == {
'type': 'object',
'properties': {'a': {'title': 'A', 'type': 'string'}},
'required': ['a'],
'additionalProperties': True,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think we should only be pulling the extra_behavior from the config here. If folks disagree, I can pull from the schema as well, though not sure why we have this setting there - seems redundant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra items are going to be supported soon for typed dicts in Python. Is Pydantic support for it going to be easy to implement with these changes?

Copy link
Contributor

@hyperlint-ai hyperlint-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 files reviewed, 1 total issue(s) found.

Copy link
Contributor

github-actions bot commented Oct 23, 2024

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  pydantic
  fields.py
  functional_validators.py
  json_schema.py
  pydantic/_internal
  _core_metadata.py 68, 87, 95
  _core_utils.py
  _generate_schema.py
  _known_annotated_metadata.py 270
Project Total  

This report was generated by python-coverage-comment-action

@sydney-runkle sydney-runkle added the relnotes-change Used for changes to existing functionality which don't have a better categorization. label Oct 23, 2024
Copy link
Contributor

@dmontagu dmontagu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looking good to me other than the couple comments I made above about the warning message; not sure if you plan to address any of the TODO comments you added before merging, but I'm okay even if they aren't addressed.

@sydney-runkle
Copy link
Contributor Author

Wasn't planning on addressing the TODOs before merging - I feel like this is big enough already, and the callable stuff would represent a pretty significant structural change.

I will work on getting the fastapi CI step to green, though.

Copy link
Member

@Viicos Viicos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look reasonable and indeed having less stuff in the GenerateSchema class is much better.

Move the CoreMetadata structure to rust so that we can pull out some of the cast calls and just get natural type checking benefits there.

Will we get any benefits to move things to Rust here? Will this introduce an extra level of indirection, making things harder to understand?

Comment on lines -2616 to -2630
def test_typeddict_with_extra_behavior_allow():
class Model:
@classmethod
def __get_pydantic_core_schema__(cls, source_type: Any, handler: GetCoreSchemaHandler) -> CoreSchema:
return core_schema.typed_dict_schema(
{'a': core_schema.typed_dict_field(core_schema.str_schema())},
extra_behavior='allow',
)

assert TypeAdapter(Model).json_schema() == {
'type': 'object',
'properties': {'a': {'title': 'A', 'type': 'string'}},
'required': ['a'],
'additionalProperties': True,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra items are going to be supported soon for typed dicts in Python. Is Pydantic support for it going to be easy to implement with these changes?

@sydney-runkle
Copy link
Contributor Author

Will we get any benefits to move things to Rust here? Will this introduce an extra level of indirection, making things harder to understand?

My only thought here was then we could then type metadata as CoreMetadata instead of dict[str, Any] so that we can avoid some of the casting here and get more type checking across the board.

@sydney-runkle
Copy link
Contributor Author

extra items are going to be supported soon for typed dicts in Python

This is where moving this to rust would be super helpful, then we can type metadata: CoreMetadata, but extra values will be allowed both from the user side and when we're doing the odd mutation for discriminator mutation stuff.

@sydney-runkle
Copy link
Contributor Author

Is Pydantic support for it going to be easy to implement with these changes?

Yes, I haven't removed the extra_behavior attribute, I just think we should pull that info from config so that we don't have two ways to set it. Can have a different chat about this, I don't think the change here is super meaningful.

@Viicos
Copy link
Member

Viicos commented Oct 28, 2024

Yes, I haven't removed the extra_behavior attribute, I just think we should pull that info from config so that we don't have two ways to set it. Can have a different chat about this, I don't think the change here is super meaningful.

I would rather support the PEP 728 added feature and slowly deprecate pulling the info from the config (or support both, but disallow having both specified at the same time?)

@sydney-runkle sydney-runkle requested a review from Viicos October 28, 2024 16:26
Copy link
Member

@Viicos Viicos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also fine with removing extra_behavior for now, and re-add it in some way with PEP 728 coming soon

Comment on lines 14 to +15
if TYPE_CHECKING:
from ..annotated_handlers import GetJsonSchemaHandler
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed

@sydney-runkle sydney-runkle merged commit 80353c2 into main Oct 28, 2024
56 checks passed
@sydney-runkle sydney-runkle deleted the metadata-stuck-on-a-flight branch October 28, 2024 21:29
# Remove the first one once that test is fixed, see https://github.com/pydantic/pydantic/pull/10029
# the remaining tests all failing bc we now correctly add a `'deprecated': True` attribute to the JSON schema,
# So it's the FastAPI tests that need to be updated here
./scripts/test.sh -vv \
Copy link
Contributor

@tamird tamird Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I think the first one is already passing and the others should be fixed by https://github.com/fastapi/fastapi/pull/12971/files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

If you wouldn't mind, could you open a PR against pydantic to remove these once your FastAPI PR is merged? Thanks!!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be happy to! Judging by my past experience with FastAPI I wouldn't expect that to happen any time soon 🙃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much!!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes-change Used for changes to existing functionality which don't have a better categorization. relnotes-fix Used for bugfixes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants