
Improve hash() builtin docstring with caveats. #125229

Open: wants to merge 2 commits into main

@gpshead (Member) commented Oct 9, 2024

Mention its return type and that the value can be expected to change between processes (hash randomization).

Why? The `hash` builtin gets reached for and used by a lot of people, whether or not it is the right tool. IDEs surface docstrings, and people use pydoc and `help(hash)`.

There are more caveats we could go into here, such as classes that implement their own dunder methods like `__eq__` or `__hash__` naturally being able to violate the constraint stated in this docstring. But _that_ feels like too much for a beginner-friendly docstring.
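The constraint mentioned above (objects that compare equal must have equal hashes) is upheld only by convention, so a class can break it. A minimal sketch, using hypothetical `BadPoint` and `GoodPoint` classes, of how a `__hash__` that disagrees with `__eq__` breaks set membership:

```python
class BadPoint:
    """Hypothetical class: __eq__ compares fields, but __hash__ uses identity."""

    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, BadPoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        # Violates the rule: two equal points get different hashes.
        return id(self)


class GoodPoint:
    """Hypothetical class whose hash is built from the same fields __eq__ uses."""

    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, GoodPoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        return hash((self.x, self.y))


a, b = BadPoint(1, 2), BadPoint(1, 2)
print(a == b)        # True
print(len({a, b}))   # 2: the set keeps both "equal" points
print(b in {a})      # False

c, d = GoodPoint(1, 2), GoodPoint(1, 2)
print(len({c, d}))   # 1
print(d in {c})      # True
```

Deriving `__hash__` from a tuple of the fields compared in `__eq__` is the usual way to keep the two consistent.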

@gpshead added the labels docs (Documentation in the Doc dir), skip issue, skip news, needs backport to 3.12 (only security fixes), and needs backport to 3.13 (bugs and security fixes) on Oct 9, 2024
reverse is not necessarily true. Hash values may vary between Python
processes.

This hash value is used internally by Python dict and set hash tables.
Member Author:
I'm not sure if this sentence adds value.

Member:
Maybe remove the "hash tables" part.

Contributor:
Or say "the dict and set builtin types".
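To make the sentence under discussion concrete: dict and set call a key's `__hash__` when inserting or looking it up. A small sketch with a hypothetical `CountingKey` class that records those calls (the exact count shown is what CPython does; other implementations may differ):

```python
class CountingKey:
    """Hypothetical key type that counts how often __hash__ is called."""

    hash_calls = 0

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, CountingKey):
            return NotImplemented
        return self.name == other.name

    def __hash__(self) -> int:
        CountingKey.hash_calls += 1
        return hash(self.name)


table = {CountingKey("spam"): 1}   # insertion hashes the key
members = {CountingKey("eggs")}    # so does adding to a set

before = CountingKey.hash_calls
print(table[CountingKey("spam")])       # 1: lookup hashes the new key, then compares with __eq__
print(CountingKey("eggs") in members)   # True
print(CountingKey.hash_calls - before)  # 2 in CPython: one hash per lookup
```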

@@ -1623,15 +1623,18 @@ hash as builtin_hash
obj: object
/

- Return the hash value for the given object.
+ Return the integer hash value for the given object within this process.
Member Author:
I struggled with how to word this. Do people, especially newbies, understand what a process is? I've also seen linter wording around this say "between runs", but that is less technical: what even is a run? People come to interactive Python use from so many environments that I don't know whether the concept of a run or a process makes sense to them. But "process" is at least technically accurate: hash randomization, for example, is per-process, and the hashes of things like object() instances are derived from pointer values and are thus process-specific.
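The per-process behaviour described above can be observed directly. A minimal sketch, assuming CPython, that computes `hash()` of the same string in fresh interpreters started with different `PYTHONHASHSEED` values (the helper name `hash_in_fresh_process` is hypothetical):

```python
import os
import subprocess
import sys


def hash_in_fresh_process(text: str, seed: str) -> int:
    """Return hash(text) as computed by a new interpreter using PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Same seed: the string hashes identically in both processes.
print(hash_in_fresh_process("abc", "1") == hash_in_fresh_process("abc", "1"))  # True
# Different seeds: the hashes almost certainly differ.
print(hash_in_fresh_process("abc", "1") == hash_in_fresh_process("abc", "2"))
# Small ints are not salted, so hash(42) is 42 in every CPython process.
print(hash(42))  # 42
```

The seed is pinned here only to make the demo repeatable. By default each process picks its own random seed for str and bytes hashing, so the second comparison is what users normally experience between runs.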

Contributor:
I agree with @JelleZijlstra: given the addition to the next paragraph, I don't think you need this addition here.

@@ -1623,15 +1623,18 @@ hash as builtin_hash
obj: object
/

- Return the hash value for the given object.
+ Return the integer hash value for the given object within this process.
Member:
Not sure the "within this process" part adds value; why did you add it?

  Two objects that compare equal must also have the same hash value, but the
- reverse is not necessarily true.
+ reverse is not necessarily true. Hash values may vary between Python
+ processes.
Member:
Worth mentioning that mutable objects aren't hashable, and that hash() on them raises a TypeError?
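A quick way to check the point above. It holds for the built-in mutable containers (list, dict, set), while instances of a plain user-defined class are hashable by default even though they are mutable. The helper `is_hashable` and the class `Box` are hypothetical names for this sketch:

```python
def is_hashable(obj: object) -> bool:
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class Box:
    """A plain user-defined class: mutable, yet hashable by default (by identity)."""

    def __init__(self) -> None:
        self.value = 0


print(is_hashable([1, 2]))          # False: list is unhashable
print(is_hashable({"a": 1}))        # False: dict is unhashable
print(is_hashable({1, 2}))          # False: set is unhashable
print(is_hashable((1, 2)))          # True
print(is_hashable(frozenset({1})))  # True
print(is_hashable((1, [2])))        # False: a tuple is hashable only if its items are
print(is_hashable(Box()))           # True
```

Calling `hash()` inside `try`/`except` is used instead of `isinstance(obj, collections.abc.Hashable)` because the ABC only checks that `__hash__` is not `None`: a tuple containing a list passes that check yet still raises `TypeError` when hashed.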

Contributor:
Something like "Equivalent objects will always give the same hash value within a single Python process, but a different Python process may report a different hash value. Not all objects are hashable."?

Member:
Or, "The same object can have different hash values in different processes. Use hash() only within a single process."

@sobolevn (Member) left a comment:

Should we explicitly say that hash() should not be used for hashing in cryptographic operations?

This is something I've noticed several times.

However, sometimes adding a note telling people not to do something actually makes them do it even more 🙈
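For context on the concern above: `hash()` is a hash-table helper, not a cryptographic digest. It returns a machine-sized int, is salted per process for str and bytes, and for most small ints is simply the value itself. When a stable or cryptographic hash is needed, the standard library's `hashlib` is the usual tool. A minimal sketch (the helper name `stable_digest` is hypothetical):

```python
import hashlib


def stable_digest(text: str) -> str:
    """SHA-256 hex digest of text's UTF-8 bytes; identical in every process."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(stable_digest("abc"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

# By contrast, hash() of a small int is trivially reversible in CPython:
print(hash(12345))  # 12345
```

For password storage, `hashlib.pbkdf2_hmac` or `hashlib.scrypt` are more appropriate than a bare SHA-256.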

@ncoghlan (Contributor) left a comment:

General idea seems sound; wordsmithing is tricky :(

@antiseebs commented:

I like the "within this process" language. While I can see that it is also covered in another paragraph, I think this is something people empirically find very easy to miss or be confused by. The extra couple of words, which ensure that someone who reads one paragraph and stops there (because they think they now know what the function does) still sees this caveat, are probably a very good investment in reduced user pain.

@willingc (Contributor) left a comment:

Looks good. Thanks @gpshead

Labels: awaiting changes, docs (Documentation in the Doc dir), needs backport to 3.13 (bugs and security fixes), skip issue, skip news
Projects
Status: Todo

8 participants