Skip to content

The hash and equals methods of code objects are large, complex and unnecessary. #101346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
markshannon opened this issue Jan 26, 2023 · 9 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@markshannon
Copy link
Member

markshannon commented Jan 26, 2023

We should just compare code objects by identity.

Technically, they are immutable values but realistically, there is only on code object per source unit.
Attempting to maintain value semantics when the underlying code object is internally mutable (due to specialization and instrumentation) is cumbersome and probably buggy, and will only get worse with more optimizations.

We should just delete the hash and comparison methods and have identity comparison. It will be faster too.

Linked PRs

@markshannon markshannon added the performance Performance or resource usage label Jan 26, 2023
@markshannon
Copy link
Member Author

This will involves a slight change in semantics:
Currently:

>>> def (): pass
>>> c = f.__code__
>>> c.replace() == c
True

With this change:

>>> c.replace() == c
False

@Eclips4
Copy link
Member

Eclips4 commented Feb 2, 2023

It's a good idea.
What's stopping you from implementing this?

How i understand, we need to delete code_hash and code_richcompare from Objects/codeobject.c(and ofc, set tp_hash and tp_richcompare to 0 in PyCode_Type
Does it breaks backwards compatibility?

@carljm
Copy link
Member

carljm commented Mar 18, 2023

We should just compare code objects by value.

I assume this meant to say "by identity."

This will involves a slight change in semantics

So do we need a SC ruling on whether this needs a deprecation process? I guess technically we could raise a DeprecationWarning anytime an __eq__ comparison returns True for different code objects.

@carljm
Copy link
Member

carljm commented Apr 6, 2023

I put up a pull request for this, and looking at the resulting test failures in the CPython test suite alone has changed my mind about whether this is a good idea (or at the very least, whether we can do it without a deprecation process.)

Dynamically compiling Python source strings to code objects is not uncommon, and in that scenario it is a) quite likely that the same input will result in the "same" output code object, position information and all, and b) quite likely that the test suite for such dynamic-compilation code may be using code object equality. And in fact this case does occur in our own test suite (e.g. in test_codeop and elsewhere), which makes me think it's reasonably likely to be occurring in third-party test suites also.

I didn't see any reasonable way to fix test_codeop other than to reimplement the entire current definition of code object equality as an assertion method in the codeop tests. This is actually quite difficult to do given the unavailability of _PyCode_ConstantKey to managed code and the fact that code objects can occur nested arbitrarily deep in co_consts. And the fact that we would need to do it at all suggests that the current definition of code object equality is in fact sometimes useful.

@markshannon
Copy link
Member Author

We have lots of tests that test artifacts of the implementation, rather than any spec.
It may not be as bad as it appears.

@carljm
Copy link
Member

carljm commented Apr 6, 2023

Indeed, I hoped initially it was not as bad as it appeared! But when I dug into trying to fix test_codeop, it did lead me to question whether we should do this, for the reasons I mentioned above. Happy to hear counter-arguments, though! Ideally specific proposals for what we should do with test_codeop, and what we would tell any third-party libraries that do dynamic compilation if they have similar code in their test suite.

@markshannon
Copy link
Member Author

What is test_codeop supposed to be testing? I'm pretty sure it isn't the notion of equality of code objects.

Whatever notion of equivalence it has should be spelled, rather than using == on code objects.
I suspect that testing that the bytecode, names and constants were the same would be enough.

@markshannon
Copy link
Member Author

any third-party libraries that do dynamic compilation

Given the bytecode changes from one version to the next, I suspect a change in code object equality will be the least of their problems. We have changed the behavior of code object equality several times in the past.

@carljm
Copy link
Member

carljm commented Apr 6, 2023

I think in principle what test_codeop wants to test is "would any consumer of this code object observe any behavior difference from this other code object," which does sound a lot like "code object equality."

We have changed the behavior of code object equality several times in the past.

Has it ever been defined in a way that would mean that compile("some code") == compile("some code") would ever not have been true? This is the scenario that I think (in somewhat more complex disguise) may not be uncommon in test suites for libraries doing dynamic compilation.

I suspect that testing that the bytecode, names and constants were the same would be enough.

Constants make this non-trivial, since you have to check the types of constants and, if they are code objects, recursively use whatever definition of pseudo-code-object equality you have chosen on those nested code objects too, instead of just comparing them. self.assertEqual(c1.co_consts, c2.co_consts) doesn't work. This is of course doable in any particular case, but it seems to me like it may be a pretty noticeable burden to impose on anyone who wants to ask "is method 1 of compiling this code leading to the 'same' results as method 2."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants