Skip to content

Refactor namespace logic for annotations evaluation #10530

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Oct 10, 2024
Merged

Refactor namespace logic for annotations evaluation #10530

merged 27 commits into from
Oct 10, 2024

Conversation

Viicos
Copy link
Member

@Viicos Viicos commented Oct 1, 2024

Change Summary

This can be reviewed commit per commit, with details added in each commit message.

There are some similarities with #10425, but here are the main things that differ:

  • As said in Improve namespace handling performance in core schema building #10425 (comment), we are no longer only using the local argument. Instead, the NamespacesTuple (name TBD) is used to tightly couple globals and locals together.
  • The NsResolver class is still present, but is closer to the TypesNamespaceStack we had previously and simplified. It is not a mapping. The lazy evaluation capability is still available for the locals, thanks to the LazyLocalNamespace mapping. It also supports the concept of a fallback and override namespace, as we specified.

The NsResolver can be used in two ways:

  • By explicitly specifying globals and locals. This is used for example when generating a schema from validate_call or from TypeAdapter. For validate_call, the globals and locals are fetched using ns_from and when we enter GenerateSchema._callable_schema, the explicitly passed globals and locals will be used.
  • By not specifying any globals or locals. This is used for example when generating a schema when a Pydantic model is created. In this case, NsResolver.push will be called when we enter GenerateSchema._model_schema, and the initial globals and locals are then irrelevant.

Related issue number

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

@github-actions github-actions bot added the relnotes-fix Used for bugfixes. label Oct 1, 2024
Copy link

codspeed-hq bot commented Oct 1, 2024

CodSpeed Performance Report

Merging #10530 will not alter performance

Comparing ns-refactor (86fe4cc) with main (07e2b72)

Summary

✅ 38 untouched benchmarks

@sydney-runkle sydney-runkle added relnotes-performance Used for performance improvements. and removed relnotes-fix Used for bugfixes. labels Oct 3, 2024
@Viicos
Copy link
Member Author

Viicos commented Oct 3, 2024

One issue that I have and I'm unsure how to solve it yet:

The following currently unexpectedly works on main/Markus' PR:

# module1.py
from typing import Optional

from pydantic import BaseModel

class Foo(BaseModel):
    # note that this is a type checking error,
    # Bar is not defined in this module:
    a: Optional['Bar'] = None

# main.py
from pydantic import BaseModel

from repro import Foo

class Bar(BaseModel):
    b: Foo

Bar.__pydantic_complete__
#> True
# When building `Bar`, we will re-enter `GenerateSchema_model_schema` for `Foo`,
# but because we merge every ns together, `'Bar'` successfully resolves.

It doesn't (and shouldn't) with this PR. However, it also doesn't (but in theory should) work when in the same module:

from typing import Optional

from pydantic import BaseModel

class Foo(BaseModel):
    a: Optional['Bar'] = None

class Bar(BaseModel):
    b: Foo

Bar.__pydantic_complete__
#> False (on this PR)

Similarly, when building Bar, we will re-enter GenerateSchema_model_schema for Foo. On this PR, only the module namespace of Foo (which doesn't contain Bar because we are currently defining it!) and locals of Foo (vars(Foo) + {Foo.__name__: Foo}) is used to resolve annotations (see the NsResolver.types_namespace implementation).


So there's a tradeoff here, either we keep using the ns of all the types in the stack (but then we allow invalid examples such as the first one), or we don't but we need to introduce extra model_rebuild calls for users.

Or we could implement something even smarter: when rebuilding Foo, because Bar was defined in the same scope (i.e. same locals), we can "share" locals from Bar (which includes {Bar.__name__: Bar}) to the schema rebuilding of Foo. Not sure how easy it is to do so, however...

Copy link
Contributor

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Viicos,

This is amazing work. I'm particularly excited about:

  1. The abundance of new documentation. This is really important, as we manage namespaces slightly differently for each of TypeAdapters, BaseModels, dataclasses, etc, but the differences are slight and confusing.
  2. The ease of reviewing this PR - thanks for breaking things down into small commits with in depth explanations
  3. The conceptual changes here - inspired by @MarkusSintonen, the abstraction of some of this namespace management logic is really useful.
  4. As mentioned above with the different namespace management paths, there's a lot to address here, and you've done a wonderful job wrapping your head around these issues.

I still need to review 47fdd78, but I felt there was enough in this review already to go ahead and submit.

globals: GlobalsNamespace
"""The namespace to be used as the `globals` argument during annotations evaluation."""

locals: MappingNamespace
Copy link
Contributor

@MarkusSintonen MarkusSintonen Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this isnt always LazyLocalNamespace? Is something actually allowed to initialize this with any kind of Mapping? There is some empties somewhere in the PR but it could be also then just empty LazyLocalNamespace

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to have LazyLocalNamespace as an implementation detail, as this gives us more flexibility, especially if in the future we expose the GenerateSchema class to Pydantic users (the types_namespace property exposes NamespaceTuple).

def __init__(self, *namespaces: MappingNamespace) -> None:
self._namespaces = namespaces

@cached_property
Copy link
Contributor

@MarkusSintonen MarkusSintonen Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hows the perf looking with this in ForwardRefs cases like in the mega kube file thing when using future annotations? So how this works by flatting the dicts vs just iterating them, perf wise? I guess the def data is really never active when ForwardRefs are not being used by user, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the def data is really never active when ForwardRefs are not being used by user, right?

Correct.

Running memray on the k8s file, I get:

  • ~3 GiB total allocations on main
  • ~1.5GiB total allocations here
  • ~1.9GiB total allocations here, with the future import.

So how this works by flatting the dicts vs just iterating them, perf wise?

Considering most of the local namespaces are really small (compared to globals), If I remember correctly the difference is marginal, at least by experimenting quickly with timeit and some dummy examples.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, I'm super glad to see we've reduced the duplicate allocations in context management that we were dealing with before this PR.

Copy link

cloudflare-workers-and-pages bot commented Oct 4, 2024

Deploying pydantic-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: 86fe4cc
Status: ✅  Deploy successful!
Preview URL: https://50667832.pydantic-docs.pages.dev
Branch Preview URL: https://ns-refactor.pydantic-docs.pages.dev

View logs

Viicos added 15 commits October 8, 2024 18:00
This defines:
- two type aliases, representing the global and local namespaces
- a `NamespacesTuple` datastructure, to make sure globals and locals
  are coupled together
- a `LazyLocalNamespace` mapping implementation. This is considered
  an implementation detail and is used to lazily merge locals if
  required when calling `eval`
- a `ns_from` function returning the global and local ns to be used
  for annotations evalutation. It handles some edge cases not covered
  by the Python stdlib
- A `NsResolver` class to be used *only* with the `GenerateSchema`
  class
Using the newly defined `ns_from` function, we improve correctness
by fetching the correct globals and locals for each base of the class.
Previously, the same global namespace was used, meaning base classes
defined in different modules could have forward annotations resolved
to the wrong type.

We also fix type annotations of the eval like functions, using our
two type aliases.
Only set type params if no local namespace was provided. Otherwise,
it is assumed that `get_function_type_hints` is called from
`GenerateSchema._callable_schema` and `ns_from` was used (which
is smarter when it comes to handle type params).
Make it compatible with the new arguments that `GenerateSchema`
and `collect_model_fields` expects.
This is where we use the concept of a "fallback" and "override"
namespace.
Use new type aliases, correctly instantiate the `GenerateSchema`
class
This is a pretty big changes, but basically this removes the hack
merging all the global namespaces for each dataclass base, and
instead improve the `collect_dataclass_fields` to mimic what
we do for Pydantic models. The core config handling seems
unnecessary as well, so it was simplified.
Similar to the previous commit for Pydantic models.
Use the `ns_from` function that will take care of type params.
This is still WIP, I'm unsure about the fallback ns
Copy link
Contributor

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking better and better! We're definitely close here, great work.

def __init__(self, *namespaces: MappingNamespace) -> None:
self._namespaces = namespaces

@cached_property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, I'm super glad to see we've reduced the duplicate allocations in context management that we were dealing with before this PR.

@sydney-runkle
Copy link
Contributor

Changes look good thus far 👍

Viicos added 2 commits October 9, 2024 13:50
FastAPI uses it (but shouldn't!). No runtime warning is emitted,
to avoid end users having to deal with this.
Viicos added 3 commits October 9, 2024 17:38
The optimization avoids computing `types_namespace` if it isn't necessary.
The `GetSetDescriptorType` check was also fixed, `isinstance` is the way to go
(see `typing.get_type_hints`).
Copy link
Contributor

github-actions bot commented Oct 9, 2024

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  pydantic
  annotated_handlers.py
  dataclasses.py
  fields.py
  functional_serializers.py
  main.py
  type_adapter.py
  validate_call_decorator.py
  pydantic/_internal
  _dataclasses.py
  _decorators.py
  _fields.py
  _generate_schema.py 1465-1466, 1532-1533
  _generics.py
  _model_construction.py
  _namespace_utils.py
  _schema_generation_shared.py
  _typing_extra.py 271, 285
  _validate_call.py
Project Total  

This report was generated by python-coverage-comment-action

Copy link
Contributor

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work, @Viicos!

Comment on lines -74 to -77
family: Annotated[ # (7)!
list[User],
validate_as_deferred(lambda: list[User]).transform(lambda x: x[1:]),
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @adriangb I think this is broken by this PR but maybe worth breaking. I think it's possible to fix by fiddling with the __get_pydantic_core_schema__ on the Pipeline and making sure we plumb through the namespace stuff, but maybe harder than it's worth doing this second

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @Viicos decided this was worth breaking. As long as there's clear documentation showing the alternative path forward I'm okay with that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy pasting what I added on Slack so that it's not lost:

the TL;DR is using lambdas with references to other symbols in annotations opens the door to a lot of weird behaviors.


This is the (simplified) test failing on the PR:

from typing_extensions import Annotated

from pydantic import BaseModel
from pydantic.experimental.pipeline import validate_as_deferred

class User(BaseModel):
    family: 'Annotated[list[User], validate_as_deferred(lambda: list[User])]'

# The `family` annotation is successfully evaluated, but at some point during
# core schema generation, the pipeline API logic is triggered and when the lambda
# gets called, we end up with:
# NameError: name 'User' is not defined

On main, when we evaluate (read: Python will call eval) the annotation for family, we use the following as globals:
{'User': __main__.User, 'Annotated': ..., 'BaseModel': ..., ...}
and locals are empty.

On this PR, we cleaned up how globals and locals were mixed up before. This means that we now use the following as globals:
{'Annotated': ..., 'BaseModel': ..., ...}
and locals:
{'User': __main__.User, ...}

And the issue comes from what could be considered as a limitation of eval. Consider this example:

def func():
    A = int

    works = lambda: list[A]
    fails = eval('lambda: list[A]', globals(), locals())

    works()
    # list[int]
    fails()
    # NameError: A is not defined.

The eval limitation is that it does not have access to the non-locals of the lambda environment (which is a new scope, like with a def statement). Even though A is present in locals, it won't be used to resolve A and so eval will look up in the globals instead (that's why it works on main because User was added in globals for the eval call).

This limitation is documented in this open CPython PR.


@cached_property
def data(self) -> Mapping[str, Any]:
return {k: v for ns in self._namespaces for k, v in ns.items()}
Copy link
Contributor

@dmontagu dmontagu Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use collections.ChainMap here instead of looping over all the items in all the dicts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about using chain maps but couldn't remember all the reasons I did not. Looking into it, we could also
ditch LazyLocalNamespace and use a chain map instead, but ChainMap is a mutable mapping, and ideally we'd like to enforce immutability (at least from at the type checker level). I added a comment about this.

Regarding whether data should return a chain map instead, we discussed about it in this comment.

parent_ns = _model_construction.unpack_lenient_weakvaluedict(cls.__pydantic_parent_namespace__) or {}

ns_resolver = _namespace_utils.NsResolver(
parent_namespace={**rebuild_ns, **parent_ns},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another place collections.ChainMap could maybe be used, maybe not necessary

@Viicos Viicos merged commit c772b43 into main Oct 10, 2024
63 checks passed
@Viicos Viicos deleted the ns-refactor branch October 10, 2024 19:55
RF-Tar-Railt added a commit to RF-Tar-Railt/nonebot-plugin-uninfo that referenced this pull request Feb 17, 2025
adapt pydantic BC in 2.10 (pydantic/pydantic#10530)
RF-Tar-Railt added a commit to RF-Tar-Railt/nonebot-plugin-uninfo that referenced this pull request Feb 17, 2025
adapt pydantic BC in 2.10 (pydantic/pydantic#10530)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes-performance Used for performance improvements.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants