-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
Use CSPRNG for random UUID node ID #135244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Another CSPRNG( I've run the brench mark wieh pyperf. It seems like from os import urandom
return int.from_bytes(urandom(2), 'big') & 0x3fff so to sum up, may we use |
To be precise, I should have said that this is assuming that the UUIDs are generated sequentially and that the underlying MT-19937 state is not touched in between, so it's a bit more complex. |
I think I saw the benchmarks results somewhere. Can you post them in full as well? (I think you edited your message). |
I'll treat this as a bugfix but not a security bug fix as it's not critical enough IMO. |
os.urandom(): Mean +- std dev: 205 ns +- 6 ns pyperf on python 3.9 |
You're right. And there are also secruity warnings of |
Can you test using the latest main branch instead of Python 3.9? TiA |
I've done the benchmarks on my side and they are different: $ ./python -m pyperf timeit -s "import random" "random.getrandbits(14)"
Mean +- std dev: 14.0 ns +- 0.3 ns
$ ./python -m pyperf timeit -s "import os" "int.from_bytes(os.urandom(2)) & 0x3fff"
Mean +- std dev: 555 ns +- 5 ns
$ ./python -m pyperf timeit -s "import secrets" "secrets.randbits(14)"
Mean +- std dev: 599 ns +- 5 ns So we're still losing quite a lot of speed. |
This is a very huge drop of performance. We're 25x times slower now. Since the node ID is cached, I think we can live with a CSPRNG. However resampling the clock sequence every time is too much of a performance loss, so we won't use it. However, we can add a note. |
My benchmark on 3.13:
Well yes, we are losing a quiet lot of speed |
Correct. node ID is cached so even if its 25x slower we just need to generate it once. |
So the new changes would be only in _random_getnode:
and some notes in docs? |
Yes. But prefer using |
Ok. I will post the final changes soon at the linked PR |
…, §6.10.3 (#135226) This aligns with the recommendations of RFC 9562, Section 6.10, paragraph 3 [1]. [1]: https://www.rfc-editor.org/rfc/rfc9562.html#section-6.10-3. --------- Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
…C 9562, §6.10.3 (pythonGH-135226) This aligns with the recommendations of RFC 9562, Section 6.10, paragraph 3 [1]. [1]: https://www.rfc-editor.org/rfc/rfc9562.html#section-6.10-3. --------- (cherry picked from commit 1cb7163) Co-authored-by: LamentXU <108666168+LamentXU123@users.noreply.github.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
…FC 9562, §6.10.3 (GH-135226) (#135255) gh-135244: generate UUID random Node ID with a CSPRNG as per RFC 9562, §6.10.3 (GH-135226) This aligns with the recommendations of RFC 9562, Section 6.10, paragraph 3 [1]. [1]: https://www.rfc-editor.org/rfc/rfc9562.html#section-6.10-3. --------- (cherry picked from commit 1cb7163) Co-authored-by: LamentXU <108666168+LamentXU123@users.noreply.github.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
For 3.13, I'll hold the backport because I need #134704, which I don't want to merge for a while (see #132901 (comment) for the rationale). |
I'd like to keep the issue open until the backport is merged |
(Backport is merged now) |
I meant the 3.13 back port that I hadn't created yet because it's blocked by other backports that I deferred due to possible runtime issues that I'd like first to see in 3.14 beta. |
Oops, sorry! |
…GH-135433) (cherry picked from commit 394d798) Co-authored-by: LamentXU <108666168+LamentXU123@users.noreply.github.com>
Hi there, @picnixz I still think we may not use all PRNG for Since RFC9562 do not mentioned how implementations should generate UUID8 by default, I think it's still arguable whether to use CSPRNG or PRNG. Now, I know there is indeed 25x performance drop if we use all CSPRNG, but think about this, there are ppl who love to rely on default attributes. I still think we may not set all a, b and c to vulnerable PRNG in default attributes tho. IMO we can set one attributes to CSPRNG which could solve the security problem while minimum the performance loss. According to the previous discussion, you've mentioned that RFC states that we shouldn't assume UUID is hard to guess, and if we change uuid8 to all CSPRNG it will be like UUID4 which is kind of meaningless. But yeah, UUID8 is highly customizable, we are assuming our users will customize them. IMO we may at least ensure it is cryptographically secure when careless developers are using the default attributes. I think we could set b to CSPRNG at least. Another point is, collecting 159 UUIDs sequentially is not that hard in some scene, especially when it comes to web app auth issues. Think about this: if a web app use uuid8() for their token generation for password-forgot feature. Once they use the default attributes, the token will be so easy to guess because we could certainly get 159 UUIDs sequentially by requesting 159 times to the forgot-password route in a short time period, which is kind of danger. So to sum up, I think we could get attr b to CSPRNG since it is just 12 bits long and will not cause unacceptable performance decline while ensuring the security of non-customize uuid8(). def uuid8(a=None, b=None, c=None):
if a is None:
import random
a = random.getrandbits(48)
if b is None:
- import random
- b = random.getrandbits(12)
+ b = int.from_bytes(os.urandom(2)) & 0xFFF
if c is None:
import random
c = random.getrandbits(62)
int_uuid_8 = (a & 0xffff_ffff_ffff) << 80
int_uuid_8 |= (b & 0xfff) << 64
int_uuid_8 |= c & 0x3fff_ffff_ffff_ffff
# by construction, the variant and version bits are already cleared
int_uuid_8 |= _RFC_4122_VERSION_8_FLAGS
return UUID._from_int(int_uuid_8) Thank you for considering this. |
I don't want to only make 'b' "special". UUIDv8 should be left to users and is not meant to be secure at all (in its general form). There's no reason to make it use a CSPRNG, that's the wrong semantics and UUIDv4 should be used instead.
This is an application vulnerability, not a Python issue in this case. And it's not necessarily "so" easy to guess. Why? because adversaries don't know if they are indeed sequential. What if someone else is querying the API at the same time? what if there are other Recovering the state of the PRNG only works if one has direct access to the PRNG as an oracle that is not tempered by someone else.
And that's why I make the choice of using a PRNG and not a CSPRNG for UUIDv8 by default. I added UUIDv8 because it's part of the RFC, but most of the time, users will use UUIDv4 and v7 (before they would have used v6 but v7 sold pretty well in the db-oriented community) |
ur right, but I mean we could avoid those vulnerability if we use CSPRNG, I suggest to use b is because b is only 12 bits long and will cause less performance impact. I know users will use version 4 and 7 at most time. But I think we could provide a more secure uuid8 func in case of some users use uuid8() with default attr, which is completely wrong, but possible I think, and we COULD avoid it by changing b to CSPRNG. |
Again, the attack surface is way too small and quite hard to exploit.
This penalizes other users, which I don't want to. And it wouldn't really solve the issue. Assuming that the adversary is able to attack a UUIDv8 where a, b, and c are generated by the same PRNG (this is the case here), then even if 'b' is generated by a CSPRNG, they can ignore it. And then, they can predict the next 'a' and 'c' chunks, and can just bruteforce the 'b' chunk directly (12 bits is only 5k possibilities, and if they can spam 200 API sequential calls to get their password-reset token, we should assume they can also spam 5k sequential API calls). Once again, I will reject this feature. |
Uh oh!
There was an error while loading. Please reload this page.
Feature or enhancement
Proposal:
Issue for: #135226
(I forgot to open issue first, sry)
I'll do the benchmark of the performance impact when switching to
secrets.getrandbits()
for the clock sequence soon. I'll post here.I totally agree with the previous discussion. Summary:
_random_getnode()
from PRNG to CSPRNG which randomly generates 48-bits note when python failed to get MAC addressclock_seq
from PRNG to CSPRNG in bothuuidv1()
anduuidv6()
. Well I think this is of great importance. Those two algorithm are time-based. The only random thing comes fromclock_seq
. Hackers can predict the next UUID if they got 624*32 random bits generated from the previous UUIDs (1427 UUIDs will be enough).note that nothing will be changed if the performance impact is unacceptable.
Has this already been discussed elsewhere?
I have already discussed this feature proposal on Discourse
Links to previous discussion of this feature:
#135226
Linked PRs
uuid8
docs about CSPRNG #135433uuid8
docs about CSPRNG (GH-135433) #135467Note
3.13 needs a backport but we need to wait until the MAC address detection PR is merged (see #135244 (comment)), as otherwise, there is a small penalty at import time and upon first use of uuid1() and uuid6().
The text was updated successfully, but these errors were encountered: