Skip to content

Use CSPRNG for random UUID node ID #135244

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
LamentXU123 opened this issue Jun 8, 2025 · 23 comments
Open

Use CSPRNG for random UUID node ID #135244

LamentXU123 opened this issue Jun 8, 2025 · 23 comments
Assignees
Labels
3.13 bugs and security fixes 3.14 bugs and security fixes 3.15 new features, bugs and security fixes deferred-blocker stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@LamentXU123
Copy link
Contributor

LamentXU123 commented Jun 8, 2025

Feature or enhancement

Proposal:

Issue for: #135226
(I forgot to open issue first, sry)
I'll do the benchmark of the performance impact when switching to secrets.getrandbits() for the clock sequence soon. I'll post here.
I totally agree with the previous discussion. Summary:

  • Maybe make it more clearer in the docs of UUIDv8(?) Its possible to predict UUIDv8 if hackers have collected enough UUIDs. BUT, UUIDv8 is highly customizable. It's pretty ez if someone wants to use CSPRNG in UUIDv8s by writing a wrapper outside the call(or using UUIDv4 instead lol).
  • Change _random_getnode() from PRNG to CSPRNG which randomly generates 48-bits note when python failed to get MAC address
def _random_getnode():
+	import secrets
+	return secrets.randbits(48) | (1 << 40)
-	import random
-	return random.getranbits(48) | (1 << 40)
  • Change the logic when generating clock_seq from PRNG to CSPRNG in both uuidv1() and uuidv6(). Well I think this is of great importance. Those two algorithm are time-based. The only random thing comes from clock_seq. Hackers can predict the next UUID if they got 624*32 random bits generated from the previous UUIDs (1427 UUIDs will be enough).
def uuid1(node=None, clock_seq=None):
    ...
    if clock_seq is None:
+       import secrets
+       clock_seq = secrets.randbits(14) # instead of stable storage
-		import random
-		clock_seq = random.getrandbits(14) # instead of stable storage
    ...

def uuid6(node=None, clock_seq=None):
    ...
+       import secrets
+       clock_seq = secrets.randbits(14) # instead of stable storage
-		import random
-		clock_seq = random.getrandbits(14) # instead of stable storage
    ...

note that nothing will be changed if the performance impact is unacceptable.

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

#135226

Linked PRs

Note

3.13 needs a backport but we need to wait until the MAC address detection PR is merged (see #135244 (comment)), as otherwise, there is a small penalty at import time and upon first use of uuid1() and uuid6().

@LamentXU123 LamentXU123 added the type-feature A feature request or enhancement label Jun 8, 2025
@LamentXU123
Copy link
Contributor Author

LamentXU123 commented Jun 8, 2025

Another CSPRNG( os.urandom() + cutoff + int.from_bytes()) is considered if its faster than secrets.randbits()

I've run the brench mark wieh pyperf. It seems like os.urandom() might be faster.

    from os import urandom
    return int.from_bytes(urandom(2), 'big') & 0x3fff

so to sum up, may we use os.urandom() instead of random.getrandbits() in clock_seq and notes generations, and make clearer security clarifications in doc of UUIDv8? @picnixz

@picnixz picnixz added the stdlib Python modules in the Lib dir label Jun 8, 2025
@picnixz
Copy link
Member

picnixz commented Jun 8, 2025

Hackers can predict the next UUID if they got 624*32 random bits generated from the previous UUIDs (1427 UUIDs will be enough).

To be precise, I should have said that this is assuming that the UUIDs are generated sequentially and that the underlying MT-19937 state is not touched in between, so it's a bit more complex.

@picnixz
Copy link
Member

picnixz commented Jun 8, 2025

so to sum up, may we use os.urandom() instead of random.getrandbits() in clock_seq and notes generations

I think I saw the benchmarks results somewhere. Can you post them in full as well? (I think you edited your message).

@picnixz picnixz changed the title Get random in UUID more cryptography secure Use CSPRNG for random UUID node ID & clock sequence #135226 Jun 8, 2025
@picnixz picnixz changed the title Use CSPRNG for random UUID node ID & clock sequence #135226 Use CSPRNG for random UUID node ID & clock sequenc Jun 8, 2025
@picnixz picnixz changed the title Use CSPRNG for random UUID node ID & clock sequenc Use CSPRNG for random UUID node ID & clock sequence Jun 8, 2025
@picnixz picnixz added type-bug An unexpected behavior, bug, or error 3.13 bugs and security fixes 3.14 bugs and security fixes 3.15 new features, bugs and security fixes and removed type-feature A feature request or enhancement labels Jun 8, 2025
@picnixz
Copy link
Member

picnixz commented Jun 8, 2025

I'll treat this as a bugfix but not a security bug fix as it's not critical enough IMO.

@LamentXU123
Copy link
Contributor Author

LamentXU123 commented Jun 8, 2025

so to sum up, may we use os.urandom() instead of random.getrandbits() in clock_seq and notes generations

I think I saw the benchmarks results somewhere. Can you post them in full as well? (I think you edited your message).

os.urandom(): Mean +- std dev: 205 ns +- 6 ns
random.getrandbits(): Mean +- std dev: 105 ns +- 5 ns
secrets.randbits(): Mean +- std dev: 17.6 us +- 2.3 us

pyperf on python 3.9

benchmark_UUID.json

@LamentXU123
Copy link
Contributor Author

LamentXU123 commented Jun 8, 2025

Hackers can predict the next UUID if they got 624*32 random bits generated from the previous UUIDs (1427 UUIDs will be enough).

To be precise, I should have said that this is assuming that the UUIDs are generated sequentially and that the underlying MT-19937 state is not touched in between, so it's a bit more complex.

You're right. And there are also secruity warnings of uuid1() and uuid6() in docs. So its just not a serious problem.

@picnixz
Copy link
Member

picnixz commented Jun 8, 2025

pyperf on python 3.9

Can you test using the latest main branch instead of Python 3.9? TiA

@picnixz
Copy link
Member

picnixz commented Jun 8, 2025

I've done the benchmarks on my side and they are different:

$ ./python -m pyperf timeit -s "import random" "random.getrandbits(14)"
Mean +- std dev: 14.0 ns +- 0.3 ns

$ ./python -m pyperf timeit -s "import os" "int.from_bytes(os.urandom(2)) & 0x3fff"
Mean +- std dev: 555 ns +- 5 ns

$ ./python -m pyperf timeit -s "import secrets" "secrets.randbits(14)"
Mean +- std dev: 599 ns +- 5 ns

So we're still losing quite a lot of speed.

@picnixz
Copy link
Member

picnixz commented Jun 8, 2025

This is a very huge drop of performance. We're 25x times slower now. Since the node ID is cached, I think we can live with a CSPRNG. However resampling the clock sequence every time is too much of a performance loss, so we won't use it. However, we can add a note.

@picnixz picnixz changed the title Use CSPRNG for random UUID node ID & clock sequence Use CSPRNG for random UUID node ID Jun 8, 2025
@LamentXU123
Copy link
Contributor Author

I've done the benchmarks on my side and they are different:

$ ./python -m pyperf timeit -s "import random" "random.getrandbits(14)"
Mean +- std dev: 14.0 ns +- 0.3 ns

$ ./python -m pyperf timeit -s "import os" "int.from_bytes(os.urandom(2)) & 0x3fff"
Mean +- std dev: 555 ns +- 5 ns

$ ./python -m pyperf timeit -s "import secrets" "secrets.randbits(14)"
Mean +- std dev: 599 ns +- 5 ns
So we're still losing quite a lot of speed.

My benchmark on 3.13:

os.urandom(): Mean +- std dev: 183 ns +- 7 ns
random.getrandbits(): Mean +- std dev: 75.0 ns +- 3.1 ns

Well yes, we are losing a quiet lot of speed

@LamentXU123
Copy link
Contributor Author

Correct. node ID is cached so even if its 25x slower we just need to generate it once.

@LamentXU123
Copy link
Contributor Author

LamentXU123 commented Jun 8, 2025

So the new changes would be only in _random_getnode:

def _random_getnode():
	import secrets
	return secrets.randbits(48) | (1 << 40)

and some notes in docs?

@picnixz
Copy link
Member

picnixz commented Jun 8, 2025

Yes. But prefer using os.urandom instead of secrets. os is likely already imported (and 48 is already a multiple of 8 so it's easy)

@LamentXU123
Copy link
Contributor Author

Ok. I will post the final changes soon at the linked PR

picnixz added a commit that referenced this issue Jun 8, 2025
…, §6.10.3 (#135226)

This aligns with the recommendations of RFC 9562, Section 6.10, paragraph 3 [1].

[1]: https://www.rfc-editor.org/rfc/rfc9562.html#section-6.10-3.

---------

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 8, 2025
…C 9562, §6.10.3 (pythonGH-135226)

This aligns with the recommendations of RFC 9562, Section 6.10, paragraph 3 [1].

[1]: https://www.rfc-editor.org/rfc/rfc9562.html#section-6.10-3.

---------
(cherry picked from commit 1cb7163)

Co-authored-by: LamentXU <108666168+LamentXU123@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
picnixz added a commit that referenced this issue Jun 8, 2025
…FC 9562, §6.10.3 (GH-135226) (#135255)

gh-135244: generate UUID random Node ID with a CSPRNG as per RFC 9562, §6.10.3 (GH-135226)

This aligns with the recommendations of RFC 9562, Section 6.10, paragraph 3 [1].

[1]: https://www.rfc-editor.org/rfc/rfc9562.html#section-6.10-3.

---------
(cherry picked from commit 1cb7163)

Co-authored-by: LamentXU <108666168+LamentXU123@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@picnixz
Copy link
Member

picnixz commented Jun 8, 2025

For 3.13, I'll hold the backport because I need #134704, which I don't want to merge for a while (see #132901 (comment) for the rationale).

@picnixz
Copy link
Member

picnixz commented Jun 10, 2025

I'd like to keep the issue open until the backport is merged

@ZeroIntensity
Copy link
Member

(Backport is merged now)

@picnixz
Copy link
Member

picnixz commented Jun 13, 2025

I meant the 3.13 back port that I hadn't created yet because it's blocked by other backports that I deferred due to possible runtime issues that I'd like first to see in 3.14 beta.

@picnixz picnixz reopened this Jun 13, 2025
@ZeroIntensity
Copy link
Member

Oops, sorry!

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 13, 2025
…GH-135433)

(cherry picked from commit 394d798)

Co-authored-by: LamentXU <108666168+LamentXU123@users.noreply.github.com>
picnixz pushed a commit that referenced this issue Jun 13, 2025
…5433) (#135467)

gh-135244: improve wording of `uuid8` docs about CSPRNG (GH-135433)
(cherry picked from commit 394d798)

Co-authored-by: LamentXU <108666168+LamentXU123@users.noreply.github.com>
@LamentXU123
Copy link
Contributor Author

LamentXU123 commented Jun 15, 2025

Hi there, @picnixz

I still think we may not use all PRNG for uuid8() func by default.

Since RFC9562 do not mentioned how implementations should generate UUID8 by default, I think it's still arguable whether to use CSPRNG or PRNG. Now, I know there is indeed 25x performance drop if we use all CSPRNG, but think about this, there are ppl who love to rely on default attributes. I still think we may not set all a, b and c to vulnerable PRNG in default attributes tho. IMO we can set one attributes to CSPRNG which could solve the security problem while minimum the performance loss.

According to the previous discussion, you've mentioned that RFC states that we shouldn't assume UUID is hard to guess, and if we change uuid8 to all CSPRNG it will be like UUID4 which is kind of meaningless. But yeah, UUID8 is highly customizable, we are assuming our users will customize them. IMO we may at least ensure it is cryptographically secure when careless developers are using the default attributes. I think we could set b to CSPRNG at least.

Another point is, collecting 159 UUIDs sequentially is not that hard in some scene, especially when it comes to web app auth issues.

Think about this: if a web app use uuid8() for their token generation for password-forgot feature. Once they use the default attributes, the token will be so easy to guess because we could certainly get 159 UUIDs sequentially by requesting 159 times to the forgot-password route in a short time period, which is kind of danger.

So to sum up, I think we could get attr b to CSPRNG since it is just 12 bits long and will not cause unacceptable performance decline while ensuring the security of non-customize uuid8().

def uuid8(a=None, b=None, c=None):
    if a is None:
        import random
        a = random.getrandbits(48)
    if b is None:
-        import random
-        b = random.getrandbits(12)
+        b = int.from_bytes(os.urandom(2)) & 0xFFF
    if c is None:
        import random
        c = random.getrandbits(62)
    int_uuid_8 = (a & 0xffff_ffff_ffff) << 80
    int_uuid_8 |= (b & 0xfff) << 64
    int_uuid_8 |= c & 0x3fff_ffff_ffff_ffff
    # by construction, the variant and version bits are already cleared
    int_uuid_8 |= _RFC_4122_VERSION_8_FLAGS
    return UUID._from_int(int_uuid_8)

Thank you for considering this.

@picnixz
Copy link
Member

picnixz commented Jun 15, 2025

I don't want to only make 'b' "special". UUIDv8 should be left to users and is not meant to be secure at all (in its general form). There's no reason to make it use a CSPRNG, that's the wrong semantics and UUIDv4 should be used instead.

Once they use the default attributes, the token will be so easy to guess because we could certainly get 159 UUIDs sequentially by requesting 159 times to the forgot-password route in a short time period, which is kind of danger.

This is an application vulnerability, not a Python issue in this case. And it's not necessarily "so" easy to guess. Why? because adversaries don't know if they are indeed sequential. What if someone else is querying the API at the same time? what if there are other random.* calls elsewhere in the program? recovering the state of a MT-19937 generator is hard in practice because adversaries don't know if the state is updated elsewhere and they have no way to detect this (well they can detect it by trying to the attack and check if their bet was correct, but trhey can't avoid this).

Recovering the state of the PRNG only works if one has direct access to the PRNG as an oracle that is not tempered by someone else.

Since RFC9562 do not mentioned how implementations should generate UUID8 by default

And that's why I make the choice of using a PRNG and not a CSPRNG for UUIDv8 by default. I added UUIDv8 because it's part of the RFC, but most of the time, users will use UUIDv4 and v7 (before they would have used v6 but v7 sold pretty well in the db-oriented community)

@LamentXU123
Copy link
Contributor Author

LamentXU123 commented Jun 15, 2025

This is an application vulnerability, not a Python issue in this case.

ur right, but I mean we could avoid those vulnerability if we use CSPRNG, I suggest to use b is because b is only 12 bits long and will cause less performance impact.

I know users will use version 4 and 7 at most time. But I think we could provide a more secure uuid8 func in case of some users use uuid8() with default attr, which is completely wrong, but possible I think, and we COULD avoid it by changing b to CSPRNG.

@picnixz
Copy link
Member

picnixz commented Jun 15, 2025

Again, the attack surface is way too small and quite hard to exploit.

But I think we could provide a more secure uuid8 func in case of some users use uuid8() with default attr, which is completely wrong, but possible I think, and we COULD avoid it by changing b to CSPRNG.

This penalizes other users, which I don't want to. And it wouldn't really solve the issue. Assuming that the adversary is able to attack a UUIDv8 where a, b, and c are generated by the same PRNG (this is the case here), then even if 'b' is generated by a CSPRNG, they can ignore it. And then, they can predict the next 'a' and 'c' chunks, and can just bruteforce the 'b' chunk directly (12 bits is only 5k possibilities, and if they can spam 200 API sequential calls to get their password-reset token, we should assume they can also spam 5k sequential API calls).

Once again, I will reject this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 bugs and security fixes 3.14 bugs and security fixes 3.15 new features, bugs and security fixes deferred-blocker stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: In Progress
Development

No branches or pull requests

3 participants