Improve hash() builtin docstring with caveats. #125229
base: main
Mention its return type and that the value can be expected to change between processes (hash randomization). Why? The `hash` builtin gets reached for and used by a lot of people whether it is the right tool or not. IDEs surface docstrings and people use pydoc and `help(hash)`. There are more possible caveats we could go into here such as classes implementing their own dunder methods like `__eq__` or `__hash__` naturally being able to violate the constraint stated in this docstring. But _that_ feels like too much for a beginner friendly docstring.
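To illustrate the "change between processes" caveat, here is a minimal sketch that evaluates `hash()` in fresh interpreters. The helper `hash_in_new_process` is a hypothetical name introduced for this example, and the behavior shown assumes CPython with string hash randomization enabled by default.

```python
import os
import subprocess
import sys


def hash_in_new_process(expr, seed=None):
    """Evaluate hash(<expr>) in a fresh interpreter and return the result.

    Hypothetical helper for illustration. `seed` sets PYTHONHASHSEED for the
    child; None leaves randomization at its default (a random seed per process).
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    # str hashes are salted per process, so these two values usually differ.
    print(hash_in_new_process("'spam'"), hash_in_new_process("'spam'"))
    # Pinning the seed makes the value reproducible across processes.
    print(hash_in_new_process("'spam'", seed=1) == hash_in_new_process("'spam'", seed=1))
    # In CPython, small ints hash to themselves regardless of the seed.
    print(hash_in_new_process("12345"))
```

Not every type is affected: in CPython the salt applies to `str` and `bytes` hashes, which is part of why the problem is easy to miss when experimenting with integers.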
+reverse is not necessarily true. Hash values may vary between Python
+processes.
+
+This hash value is used internally by Python dict and set hash tables.
I'm not sure if this sentence adds value.
Maybe remove the "hash tables" part.
Or say "the dict and set builtin types".
@@ -1623,15 +1623,18 @@ hash as builtin_hash
 obj: object
 /

-Return the hash value for the given object.
+Return the integer hash value for the given object within this process.
I struggled with which way to word this. Do people, especially newbies, understand what a process is? I've also seen linter wording around this say "between runs", but that is less technical: what even is a run? People come at interactive Python use from so many environments that I don't know if the concept of a run or process makes sense. But "process" is at least technically accurate, as hash randomization, for example, is process based, and hashes of things like `object()` are the pointer values and thus process specific.
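A small sketch of the `object()` point, assuming CPython, where the default `__hash__` is derived from the object's memory address (an implementation detail, not a language guarantee):

```python
a = object()
b = object()

# The default hash is stable for the lifetime of the object...
assert hash(a) == hash(a)

# ...and default equality is identity, so distinct instances are unequal.
assert a != b

# In CPython the value is computed from the address reported by id(),
# so it carries no meaning outside the process that created the object.
print(hex(id(a)), hash(a))
```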
I agree with @JelleZijlstra - given the addition to the next paragraph, I don't think you need this addition here.
@@ -1623,15 +1623,18 @@ hash as builtin_hash
 obj: object
 /

-Return the hash value for the given object.
+Return the integer hash value for the given object within this process.
Not sure the "within this process" part adds value; why did you add it?
 Two objects that compare equal must also have the same hash value, but the
-reverse is not necessarily true.
+reverse is not necessarily true. Hash values may vary between Python
+processes.
Worth mentioning that mutable objects aren't hashable, and that hash() on them raises a TypeError?
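On the hashability question, a short sketch of current CPython behavior may help with the wording. Note that it is the built-in mutable containers that are unhashable; instances of user-defined classes are mutable yet hashable by default. The names `is_hashable`, `Plain`, and `EqOnly` are made up for this example.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


class Plain:
    """No __eq__ or __hash__ defined: inherits identity-based hashing."""


class EqOnly:
    """Defining __eq__ without __hash__ sets __hash__ to None."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


assert not is_hashable([1, 2])         # list
assert not is_hashable({"a": 1})       # dict
assert not is_hashable({1, 2})         # set
assert not is_hashable((1, [2]))       # tuple containing a list
assert is_hashable((1, 2))             # tuple of hashable items
assert is_hashable(frozenset({1, 2}))  # immutable set
assert is_hashable(Plain())            # mutable instance, still hashable
assert not is_hashable(EqOnly())       # __eq__ without __hash__
```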
Something like "Equivalent objects will always give the same hash value within a single Python process, but a different Python process may report a different hash value. Not all objects are hashable."?
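The invariant behind this wording, and the way a user-defined class can break it, can be sketched as follows. `BrokenKey` is a hypothetical class written for this example.

```python
# Equal objects hash equally, even across numeric types.
assert 1 == 1.0 == True
assert hash(1) == hash(1.0) == hash(True)


class BrokenKey:
    """Compares by value but hashes by identity, violating the invariant."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, BrokenKey) and self.value == other.value

    def __hash__(self):
        return id(self)


a, b = BrokenKey(1), BrokenKey(1)
assert a == b
assert hash(a) != hash(b)

# dict lookups check the hash before equality, so the equal key is missed.
table = {a: "found"}
assert table.get(b) is None
```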
Or, "The same object can have different hash values in different processes. Use hash() only within a single process."
Should we explicitly say that `hash` should not be used for hashing in crypto operations? This is something I've noticed several times.
However, sometimes adding a note to not do something actually makes people do it even more 🙈
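For contrast with the crypto concern, a minimal sketch of the usual alternative: `hashlib` digests are computed from the input bytes alone, so they are stable across processes and machines. The helper name `stable_digest` is an assumption for this example, and SHA-256 is just one reasonable choice.

```python
import hashlib


def stable_digest(text):
    """Return a hex SHA-256 digest of `text`, encoded as UTF-8."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digest = stable_digest("spam")
print(digest)  # identical on every run, unlike hash("spam")
```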
General idea seems sound, wordsmithing is tricky :(
I like the "within this process" language, and while I can see that it is also covered by something in another paragraph... I also think this is a thing that is empirically very easy for people to miss or get confused by. The extra couple of words, aimed at someone who reads one paragraph and stops there because they think they know what the function does, are probably a very good investment in reduced user pain.
Looks good. Thanks @gpshead