Refactor namespace logic for annotations evaluation #10530

Viicos · 2024-10-01T12:23:36Z

Change Summary

This can be reviewed commit per commit, with details added in each commit message.

There are some similarities with #10425, but here are the main things that differ:

As said in Improve namespace handling performance in core schema building #10425 (comment), we are no longer only using the local argument. Instead, the NamespacesTuple (name TBD) is used to tightly couple globals and locals together.
The NsResolver class is still present, but is closer to the TypesNamespaceStack we had previously and simplified. It is not a mapping. The lazy evaluation capability is still available for the locals, thanks to the LazyLocalNamespace mapping. It also supports the concept of a fallback and override namespace, as we specified.

The NsResolver can be used in two ways:

By explicitly specifying globals and locals. This is used for example when generating a schema from validate_call or from TypeAdapter. For validate_call, the globals and locals are fetched using ns_from and when we enter GenerateSchema._callable_schema, the explicitly passed globals and locals will be used.
By not specifying any globals or locals. This is used for example when generating a schema when a Pydantic model is created. In this case, NsResolver.push will be called when we enter GenerateSchema._model_schema, and the initial globals and locals are then irrelevant.
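The two usage modes above can be illustrated with a toy resolver. This is only a sketch: `NamespacesPair` and `SketchResolver` are hypothetical names, not the PR's `NamespacesTuple` and `NsResolver`, and the real classes handle more cases (lazy locals, fallback and override namespaces).

```python
import sys
from contextlib import contextmanager
from typing import Any, Iterator, Mapping, NamedTuple, Optional


class NamespacesPair(NamedTuple):
    """Hypothetical stand-in for a structure coupling globals and locals."""

    globals: dict
    locals: Mapping[str, Any]


class SketchResolver:
    """Toy resolver: explicit namespaces, or namespaces derived from pushed types."""

    def __init__(self, initial: Optional[NamespacesPair] = None) -> None:
        self._initial = initial or NamespacesPair({}, {})
        self._types_stack: list = []

    @property
    def types_namespace(self) -> NamespacesPair:
        if not self._types_stack:
            # Mode 1: nothing was pushed, use the explicitly provided namespaces.
            return self._initial
        # Mode 2: use the namespaces of the type currently being processed.
        typ = self._types_stack[-1]
        module = sys.modules.get(typ.__module__)
        module_globals = vars(module) if module is not None else {}
        return NamespacesPair(module_globals, {**vars(typ), typ.__name__: typ})

    @contextmanager
    def push(self, typ: type) -> Iterator[None]:
        # The pushed type is only active inside the `with` block.
        self._types_stack.append(typ)
        try:
            yield
        finally:
            self._types_stack.pop()
```

Inside a `with resolver.push(SomeType):` block the initial namespaces are ignored, which mirrors the "initial globals and locals are then irrelevant" remark above.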

Related issue number

Checklist

The pull request title is a good summary of the changes - it will be used in the changelog
Unit tests for the changes exist
Tests pass on CI
Documentation reflects the changes where applicable
My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

codspeed-hq · 2024-10-01T12:30:32Z

CodSpeed Performance Report

Merging #10530 will not alter performance

Comparing ns-refactor (86fe4cc) with main (07e2b72)

Summary

✅ 38 untouched benchmarks

Viicos · 2024-10-03T14:34:59Z

One issue that I have and I'm unsure how to solve it yet:

The following currently unexpectedly works on main/Markus' PR:

# module1.py
from typing import Optional

from pydantic import BaseModel

class Foo(BaseModel):
    # note that this is a type checking error,
    # Bar is not defined in this module:
    a: Optional['Bar'] = None

# main.py
from pydantic import BaseModel

from module1 import Foo

class Bar(BaseModel):
    b: Foo

Bar.__pydantic_complete__
#> True
# When building `Bar`, we will re-enter `GenerateSchema._model_schema` for `Foo`,
# but because we merge every ns together, `'Bar'` successfully resolves.

It doesn't (and shouldn't) with this PR. However, it also doesn't (but in theory should) work when in the same module:

from typing import Optional

from pydantic import BaseModel

class Foo(BaseModel):
    a: Optional['Bar'] = None

class Bar(BaseModel):
    b: Foo

Bar.__pydantic_complete__
#> False (on this PR)

Similarly, when building Bar, we will re-enter GenerateSchema._model_schema for Foo. On this PR, only the module namespace of Foo (which doesn't contain Bar because we are currently defining it!) and the locals of Foo (vars(Foo) + {Foo.__name__: Foo}) are used to resolve annotations (see the NsResolver.types_namespace implementation).

So there's a tradeoff here, either we keep using the ns of all the types in the stack (but then we allow invalid examples such as the first one), or we don't but we need to introduce extra model_rebuild calls for users.

Or we could implement something even smarter: when rebuilding Foo, because Bar was defined in the same scope (i.e. same locals), we can "share" locals from Bar (which includes {Bar.__name__: Bar}) to the schema rebuilding of Foo. Not sure how easy it is to do so, however...
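The "shared locals" idea can be illustrated with plain `eval`, without Pydantic. This is a hedged sketch: `try_resolve` and the namespace dictionaries are made up for the illustration and do not reflect how the PR implements resolution.

```python
from typing import Optional


class Bar:
    """Stand-in for the model that is defined after `Foo`."""


# Module namespace as it looked while `Foo` was being built: `Bar` is missing.
module_ns_when_foo_was_built = {"Optional": Optional}


def try_resolve(annotation, globalns, localns):
    """Evaluate a string annotation, returning None if a name cannot be resolved."""
    try:
        return eval(annotation, globalns, localns)
    except NameError:
        return None


# Using only Foo's own namespaces, 'Bar' cannot be resolved.
unresolved = try_resolve("Optional[Bar]", module_ns_when_foo_was_built, {})

# Sharing the locals of `Bar` (which include {Bar.__name__: Bar}) makes it resolve.
shared = try_resolve("Optional[Bar]", module_ns_when_foo_was_built, {Bar.__name__: Bar})
```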

sydney-runkle

@Viicos,

This is amazing work. I'm particularly excited about:

The abundance of new documentation. This is really important, as we manage namespaces slightly differently for each of TypeAdapters, BaseModels, dataclasses, etc, but the differences are slight and confusing.
The ease of reviewing this PR - thanks for breaking things down into small commits with in depth explanations
The conceptual changes here - inspired by @MarkusSintonen, the abstraction of some of this namespace management logic is really useful.
As mentioned above with the different namespace management paths, there's a lot to address here, and you've done a wonderful job wrapping your head around these issues.

I still need to review 47fdd78, but I felt there was enough in this review already to go ahead and submit.

pydantic/_internal/_namespace_utils.py

pydantic/_internal/_dataclasses.py

pydantic/_internal/_validate_call.py

pydantic/type_adapter.py

pydantic/dataclasses.py

MarkusSintonen · 2024-10-03T18:45:42Z

pydantic/_internal/_namespace_utils.py

+    globals: GlobalsNamespace
+    """The namespace to be used as the `globals` argument during annotations evaluation."""
+
+    locals: MappingNamespace


Why isn't this always a LazyLocalNamespace? Is anything actually allowed to initialize this with an arbitrary Mapping? There are some empty mappings somewhere in the PR, but those could then also just be an empty LazyLocalNamespace.

I wanted to keep LazyLocalNamespace as an implementation detail, as this gives us more flexibility, especially if in the future we expose the GenerateSchema class to Pydantic users (the types_namespace property exposes NamespacesTuple).
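To make the "any Mapping works for locals" point concrete, here is a minimal lazy mapping sketch. The `__init__` and `data` members follow the diff fragments quoted in this thread; the class name `LazyLocals` and the `__getitem__`, `__iter__` and `__len__` methods are assumptions added to make it a usable `Mapping`.

```python
from collections.abc import Mapping
from functools import cached_property
from typing import Any


class LazyLocals(Mapping):
    """Read-only mapping that merges its namespaces only on first access."""

    def __init__(self, *namespaces: Mapping) -> None:
        self._namespaces = namespaces

    @cached_property
    def data(self) -> dict:
        # Later namespaces take precedence over earlier ones.
        return {k: v for ns in self._namespaces for k, v in ns.items()}

    def __getitem__(self, key: str) -> Any:
        return self.data[key]

    def __iter__(self):
        return iter(self.data)

    def __len__(self) -> int:
        return len(self.data)


lazy = LazyLocals({"a": 1}, {"a": 2, "b": 3})
merged_before_use = "data" in vars(lazy)  # nothing has been merged yet
result = eval("a + b", {}, lazy)  # `eval` accepts any mapping as locals
```

Because `eval` only requires a mapping for its locals argument, code that receives `locals` does not need to know whether it is a plain dict or a lazy wrapper.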

MarkusSintonen · 2024-10-03T18:50:33Z

pydantic/_internal/_namespace_utils.py

+    def __init__(self, *namespaces: MappingNamespace) -> None:
+        self._namespaces = namespaces
+
+    @cached_property


How's the perf looking with this in ForwardRef cases, like the mega kube file when using future annotations? So how does flattening the dicts compare with just iterating over them, perf-wise? I guess the data property is never really active when ForwardRefs are not being used by the user, right?

I guess the data property is never really active when ForwardRefs are not being used by the user, right?

Correct.

Running memray on the k8s file, I get:

~3 GiB total allocations on main

~1.5 GiB total allocations here

~1.9 GiB total allocations here, with the future import.

So how does flattening the dicts compare with just iterating over them, perf-wise?

Considering most of the local namespaces are really small (compared to globals), if I remember correctly the difference is marginal, at least from experimenting quickly with timeit and some dummy examples.

This is great, I'm super glad to see we've reduced the duplicate allocations in context management that we were dealing with before this PR.

cloudflare-workers-and-pages · 2024-10-04T14:54:44Z

Deploying pydantic-docs with Cloudflare Pages

Latest commit:	`86fe4cc`
Status:	✅ Deploy successful!
Preview URL:	https://50667832.pydantic-docs.pages.dev
Branch Preview URL:	https://ns-refactor.pydantic-docs.pages.dev


pydantic/_internal/_generate_schema.py

This defines:
- two type aliases, representing the global and local namespaces
- a `NamespacesTuple` data structure, to make sure globals and locals are coupled together
- a `LazyLocalNamespace` mapping implementation. This is considered an implementation detail and is used to lazily merge locals if required when calling `eval`
- a `ns_from` function returning the global and local ns to be used for annotations evaluation. It handles some edge cases not covered by the Python stdlib
- a `NsResolver` class to be used *only* with the `GenerateSchema` class

Using the newly defined `ns_from` function, we improve correctness by fetching the correct globals and locals for each base of the class. Previously, the same global namespace was used, meaning base classes defined in different modules could have forward annotations resolved to the wrong type. We also fix type annotations of the eval like functions, using our two type aliases.
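As a rough illustration of "the correct globals for each base", the module globals of a class can be looked up through `sys.modules`, which is also how `typing.get_type_hints` finds them. The helpers below (`module_globals_of`, `globals_per_base`) are hypothetical names for this sketch and are not the PR's functions.

```python
import json
import sys


def module_globals_of(cls: type) -> dict:
    """Return the globals of the module where `cls` was defined (empty if unknown)."""
    module = sys.modules.get(cls.__module__)
    return vars(module) if module is not None else {}


def globals_per_base(cls: type) -> dict:
    """Map each base in the MRO (except `object`) to its own module globals."""
    return {
        base.__qualname__: module_globals_of(base)
        for base in cls.__mro__
        if base is not object
    }


# `JSONDecodeError` lives in `json.decoder`, its base `ValueError` in `builtins`,
# so a forward reference on each base is looked up in a different global namespace.
per_base = globals_per_base(json.JSONDecodeError)
```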

Only set type params if no local namespace was provided. Otherwise, it is assumed that `get_function_type_hints` is called from `GenerateSchema._callable_schema` and `ns_from` was used (which is smarter when it comes to handle type params).

and use new type aliases

Make it compatible with the new arguments that `GenerateSchema` and `collect_model_fields` expects.

This is where we use the concept of a "fallback" and "override" namespace.

Use new type aliases, correctly instantiate the `GenerateSchema` class

This is a pretty big change, but basically it removes the hack merging all the global namespaces for each dataclass base, and instead improves `collect_dataclass_fields` to mimic what we do for Pydantic models. The core config handling seems unnecessary as well, so it was simplified.

Similar to the previous commit for Pydantic models.

Use the `ns_from` function that will take care of type params.

This is still WIP, I'm unsure about the fallback ns

sydney-runkle

Looking better and better! We're definitely close here, great work.

pydantic/_internal/_dataclasses.py

pydantic/_internal/_namespace_utils.py

sydney-runkle · 2024-10-08T16:20:17Z

pydantic/_internal/_namespace_utils.py

+    def __init__(self, *namespaces: MappingNamespace) -> None:
+        self._namespaces = namespaces
+
+    @cached_property


This is great, I'm super glad to see we've reduced the duplicate allocations in context management that we were dealing with before this PR.

pydantic/_internal/_namespace_utils.py

pydantic/_internal/_generate_schema.py

pydantic/_internal/_model_construction.py

pydantic/main.py

sydney-runkle · 2024-10-09T01:31:47Z

Changes look good thus far 👍

FastAPI uses it (but shouldn't!). No runtime warning is emitted, to avoid end users having to deal with this.

The optimization avoids computing `types_namespace` if it isn't necessary. The `GetSetDescriptorType` check was also fixed, `isinstance` is the way to go (see `typing.get_type_hints`).

github-actions · 2024-10-09T19:02:08Z

Coverage report

Files covered by the report:

pydantic: annotated_handlers.py, dataclasses.py, fields.py, functional_serializers.py, main.py, type_adapter.py, validate_call_decorator.py
pydantic/_internal: _dataclasses.py, _decorators.py, _fields.py, _generate_schema.py, _generics.py, _model_construction.py, _namespace_utils.py, _schema_generation_shared.py, _typing_extra.py, _validate_call.py

Lines missing:
pydantic/_internal/_generate_schema.py: 1465-1466, 1532-1533
pydantic/_internal/_typing_extra.py: 271, 285

This report was generated by python-coverage-comment-action

sydney-runkle

Amazing work, @Viicos!

dmontagu · 2024-10-10T13:28:07Z

docs/concepts/experimental.md

-    family: Annotated[  # (7)!
-        list[User],
-        validate_as_deferred(lambda: list[User]).transform(lambda x: x[1:]),
-    ]


CC @adriangb I think this is broken by this PR but maybe worth breaking. I think it's possible to fix by fiddling with the __get_pydantic_core_schema__ on the Pipeline and making sure we plumb through the namespace stuff, but maybe harder than it's worth doing this second

I think @Viicos decided this was worth breaking. As long as there's clear documentation showing the alternative path forward I'm okay with that.

Copy pasting what I added on Slack so that it's not lost:

the TL;DR is using lambdas with references to other symbols in annotations opens the door to a lot of weird behaviors.

This is the (simplified) test failing on the PR:

from typing_extensions import Annotated

from pydantic import BaseModel
from pydantic.experimental.pipeline import validate_as_deferred

class User(BaseModel):
    family: 'Annotated[list[User], validate_as_deferred(lambda: list[User])]'

# The `family` annotation is successfully evaluated, but at some point during
# core schema generation, the pipeline API logic is triggered and when the lambda
# gets called, we end up with:
# NameError: name 'User' is not defined

On main, when we evaluate (read: Python will call eval) the annotation for family, we use the following as globals:
{'User': __main__.User, 'Annotated': ..., 'BaseModel': ..., ...}
and locals are empty.

On this PR, we cleaned up how globals and locals were mixed up before. This means that we now use the following as globals:
{'Annotated': ..., 'BaseModel': ..., ...}
and locals:
{'User': __main__.User, ...}

And the issue comes from what could be considered as a limitation of eval. Consider this example:

def func():
    A = int

    works = lambda: list[A]
    fails = eval('lambda: list[A]', globals(), locals())

    works()  # list[int]
    fails()  # NameError: A is not defined.

The eval limitation is that it does not have access to the non-locals of the lambda environment (which is a new scope, like with a def statement). Even though A is present in locals, it won't be used to resolve A and so eval will look up in the globals instead (that's why it works on main because User was added in globals for the eval call).
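The difference between main and this PR can be reproduced in isolation by moving the same name between the globals and locals passed to `eval`. This is a small sketch; `make_lambda` and `call_or_error` are helper names invented for the illustration.

```python
def make_lambda(globalns, localns):
    """Build the lambda through `eval`, as happens for a stringified annotation."""
    return eval("lambda: list[A]", globalns, localns)


def call_or_error(fn):
    """Call `fn`, returning the NameError instead of raising it."""
    try:
        return fn()
    except NameError as exc:
        return exc


# `A` only in locals: the lambda body is a new scope and cannot see it.
from_locals = call_or_error(make_lambda({}, {"A": int}))

# `A` in globals: the lambda's global lookup finds it.
from_globals = call_or_error(make_lambda({"A": int}, {}))
```

Here `from_locals` ends up holding a `NameError`, while `from_globals` evaluates to `list[int]`, matching the behavior described for this PR and for main respectively.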

This limitation is documented in this open CPython PR.

dmontagu · 2024-10-10T13:33:17Z

pydantic/_internal/_namespace_utils.py

+
+    @cached_property
+    def data(self) -> Mapping[str, Any]:
+        return {k: v for ns in self._namespaces for k, v in ns.items()}


Any reason not to use collections.ChainMap here instead of looping over all the items in all the dicts?

I thought about using chain maps but couldn't remember all the reasons why I did not. Looking into it, we could also ditch LazyLocalNamespace and use a chain map instead, but ChainMap is a mutable mapping, and ideally we'd like to enforce immutability (at least at the type checker level). I added a comment about this.

Regarding whether data should return a chain map instead, we discussed it in this comment.
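For reference, the two approaches compare as follows in a small standalone sketch (the namespace contents are made up). Note that the dict comprehension gives precedence to later namespaces, while `ChainMap` gives precedence to its first mapping, so the order has to be reversed to get the same lookups.

```python
from collections import ChainMap

namespaces = ({"A": int, "B": str}, {"A": float})

# Eager merge, as in the `data` property: later namespaces win.
merged = {k: v for ns in namespaces for k, v in ns.items()}

# ChainMap searches its first mapping first, hence the reversal.
chained = ChainMap(*reversed(namespaces))

same_lookup = merged["A"] is chained["A"]

# ChainMap is mutable: writes go to its first mapping, i.e. `namespaces[-1]`,
# whereas `merged` is an independent snapshot.
chained["C"] = bytes
```

After the last line, `namespaces[-1]` contains `"C"` while `merged` does not, which is the mutability concern raised in the reply above.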

pydantic/type_adapter.py

dmontagu · 2024-10-10T13:38:09Z

pydantic/main.py

+        parent_ns = _model_construction.unpack_lenient_weakvaluedict(cls.__pydantic_parent_namespace__) or {}
+
+        ns_resolver = _namespace_utils.NsResolver(
+            parent_namespace={**rebuild_ns, **parent_ns},


Another place collections.ChainMap could maybe be used, maybe not necessary


github-actions bot added the relnotes-fix Used for bugfixes. label Oct 1, 2024

sydney-runkle mentioned this pull request Oct 1, 2024

Improve namespace handling performance in core schema building #10425

Closed

5 tasks

sydney-runkle added relnotes-performance Used for performance improvements. and removed relnotes-fix Used for bugfixes. labels Oct 3, 2024

sydney-runkle reviewed Oct 3, 2024


sydney-runkle mentioned this pull request Oct 3, 2024

Add rebuild() method for TypeAdapter and simplify defer_build patterns #10537

Merged

MarkusSintonen reviewed Oct 3, 2024


sydney-runkle mentioned this pull request Oct 7, 2024

pydantic work plan for Sydney Runkle: October 7th - October 11th, 2024 #10561

Closed

9 tasks

Viicos force-pushed the ns-refactor branch from 077039e to 8cc84d9 Compare October 7, 2024 20:17

Viicos mentioned this pull request Oct 8, 2024

Do not use the previous config from the stack for dataclasses without config #10576

Merged

5 tasks

Viicos commented Oct 8, 2024

pydantic/_internal/_generate_schema.py

Viicos added 15 commits October 8, 2024 18:00

Add temporary xfailing tests that we will hopefully fix

e5c4207

Improve annotations resolving for functions

1da3698

Only set type params if no local namespace was provided. Otherwise, it is assumed that `get_function_type_hints` is called from `GenerateSchema._callable_schema` and `ns_from` was used (which is smarter when it comes to handle type params).

Adapt collect_model_fields to handle the parent namespace

f4f2ae8

and use new type aliases

Adapt the GenerateSchema class to use the NsResolver class

bb7ffff

Update ModelMetaclass to be compatible with the previous changes

2835c7f

Make it compatible with the new arguments that `GenerateSchema` and `collect_model_fields` expects.

Change BaseModel.model_rebuid logic to match the proposed spec

9b77d05

This is where we use the concept of a "fallback" and "override" namespace.

Make the dataclass logic compatible with the proposed changes

132c06c

Use new type aliases, correctly instantiate the `GenerateSchema` class

Change rebuild_dataclass logic to match the proposed spec

cee89ad

Similar to the previous commit for Pydantic models.

Change validate_call namespace logic

24338c9

Use the `ns_from` function that will take care of type params.

Adapt TypeAdapter namespace logic

6a86ac8

This is still WIP, I'm unsure about the fallback ns

Misc. changes to adapt with the new structures

93bd021

Process some feedback

7c3bf4f

Almost finished

e182033

Viicos force-pushed the ns-refactor branch from 1b03127 to e182033 Compare October 8, 2024 16:03

sydney-runkle reviewed Oct 8, 2024


sydney-runkle mentioned this pull request Oct 8, 2024

some of the type annotations in _typing_extra are wrong #10577

Closed

1 task

First round of feedback

89199ba

Viicos added 2 commits October 9, 2024 13:50

Cleanup xfail tests, last fixes

127aefe

Add back eval_type_lenient and deprecate it

66c024e

FastAPI uses it (but shouldn't!). No runtime warning is emitted, to avoid end users having to deal with this.

Viicos force-pushed the ns-refactor branch from 301136a to 985d4ec Compare October 9, 2024 12:24

lint

2a371ca

Viicos force-pushed the ns-refactor branch from 985d4ec to 2a371ca Compare October 9, 2024 12:30

Viicos added 3 commits October 9, 2024 17:38

Fix pipeline example

8d57b83

Compat fixes, optimize get_cls_type_hints

7861286

The optimization avoids computing `types_namespace` if it isn't necessary. The `GetSetDescriptorType` check was also fixed, `isinstance` is the way to go (see `typing.get_type_hints`).

Update outdated comment

08691ec

sydney-runkle approved these changes Oct 9, 2024


Viicos mentioned this pull request Oct 10, 2024

Add documentation for annotations resolving #10592

Merged

5 tasks

dmontagu reviewed Oct 10, 2024

pydantic/type_adapter.py

dmontagu reviewed Oct 10, 2024


Feedback

86fe4cc

sydney-runkle mentioned this pull request Oct 10, 2024

Revisit namespace management with pipeline API #10599

Open

Viicos merged commit c772b43 into main Oct 10, 2024
63 checks passed

Viicos deleted the ns-refactor branch October 10, 2024 19:55

sydney-runkle mentioned this pull request Oct 10, 2024

Refactor how we handle parent namespaces for model construction #10074

Closed

Daraan mentioned this pull request Oct 25, 2024

Third-party tests failed on Fri Oct 25 2024 python/typing_extensions#493

Closed

Viicos mentioned this pull request Nov 20, 2024

pydantic take dataclass Text as str if from __future__ import annotations #10889

Closed

1 task

RF-Tar-Railt added a commit to RF-Tar-Railt/nonebot-plugin-uninfo that referenced this pull request Feb 17, 2025

🐛 version 0.6.10

4b70ee1

adapt pydantic BC in 2.10 (pydantic/pydantic#10530)

RF-Tar-Railt added a commit to RF-Tar-Railt/nonebot-plugin-uninfo that referenced this pull request Feb 17, 2025

🐛 version 0.6.10

1c1a0c2

adapt pydantic BC in 2.10 (pydantic/pydantic#10530)
