-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
Allow for larger int64s in random.randint #6824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I don't remember whether we consider it legit to use Could you add a test? And either run that test on Windows and report back, or if you don't have convenient access to a Windows box then speak up and we'll hassle @jreback or someone into doing it? |
Also, as a tip for the future: it's nice to write something like (You can also write |
For this to work, I think you need to make |
@jaimefrio : Ah, I think you are right. However, if we propagate, I think that would also require changing Secondly, does anyone have an idea as to why only one build managed to pass the test I wrote, but the other builds failed? Does it have to do with what was discussed in my first paragraph? |
I haven't analyzed in detail, but the reason that that one build passed almost certainly has something to do with that being the one build that we run in 32-bit mode (so for that build |
Okay. That seems to make sense. From what I understand, that seems to be related to the comments made by @jaimefrio. I think before I can make any further changes (to the C code most likely), I'll need to have a better understanding of the internals though. |
Close and reopen to restart tests. |
@charris : Thanks! I'm still working on understanding this random number generation thing. If anyone has any input/ideas they would like to contribute, that would be great! |
Undid a good portion of my changes and left just the unit test because all of these |
Many |
Travis is finally happy! 😄 Please review carefully! I did a lot of recasting all over the place, so I could have inadvertently broken a function that Travis failed to catch, much like the issue I was trying to fix in this PR. |
I think this is wrong (iow, the change of datatype in the rng code should not be made): this is the 32-bit Mersenne twister generator --- it does not become 64-bit just by increasing the data type. The 64-bit MT is a different rng. The rng code should not be changed. The issue is only in the randint / rk_interval methods, and only those need modification. |
rk_ulong(rk_state *state) | ||
{ | ||
#if ULONG_MAX <= 0xffffffffUL | ||
#if ULLONG_MAX <= 0xffffffffULL | ||
return rk_random(state); | ||
#else | ||
return (rk_random(state) << 32) | (rk_random(state)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get this part to work with long long / int64 (after reverting the change of datatype in the rng core), you need a cast: ((unsigned long long)rk_random(state) << 32) | ...
The generator only produces 32-bit numbers: changing the data types in
|
The maximum |
|
Okay, but how are such values generated if |
|
I am referring to the |
They aren't; they are |
Oh right, right! My bad. For some reason I kept reading |
@@ -703,13 +703,13 @@ cdef class RandomState: | |||
|
|||
""" | |||
cdef ndarray state "arrayObject_state" | |||
state = <ndarray>np.empty(624, np.uint) | |||
state = <ndarray>np.empty(624, np.uint64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be reverted
This changes the return type and random stream on 32-bit platforms, which can be considered backward incompatible. I'd suggest to continue instead the approach in the other PR --- adding a dtype parameter seems a better way to do this. |
Adding the |
@pv: Reverted more of the code, and Travis is still happy. What are your thoughts about what I proposed for the PR organization? |
You can't change Making I do apologize, but this is unfortunately not the most new-contributor-friendly part of the code to start with. There are subtle and strict backwards-compatibility requirements that are unique to the PRNG subsystem. |
@rkern : No worries. Given the subtleties though you have brought up, it would have been preferable if they had been brought up earlier. That may have avoided a lot of the confusion that transpired over the past week or so with this PR. |
My alternative idea was to rename the functions to describe what they are doing as a result of my changes (e.g. It also occurs to me (and perhaps this is a stupid question, but I haven't found a convincing answer online): why is it not possible to run the 32-bit Mersenne Twister algorithm with 64-bit integers? As long as the internal values are constrained to having the upper 32 bits = 0, I think the numbers generated will be identical because it's shifting / bit-wise operations with numbers whose values also have the upper 32 bits = 0? I do believe you are correct in saying that |
It is possible to modify the MT code that way, but it doesn't solve any problem. It won't make As for the cross-platformness, this is what I meant by "subtle" backwards compatibility requirements. If the code currently raises an exception with certain inputs, then we do allow changes that will make those inputs work. |
Could you define what you mean by "large"? Actually, as the current implementation stands, you actually cannot generate numbers up to Actually, in light of this, I don't even think my changes will suffice to take in inputs like as large as |
I meant that it doesn't change anything with respect to the problem that you are trying to fix. The types internally used by the core PRNG are simply irrelevant. The problem that you seem to want to fix can be fixed solely within the confines of By "large", I meant |
Okay, that makes sense. Building out new functions should not be too bad then I think since it is essentially boilerplate with different types ( |
@rkern : Refactored with the new methods, and Travis is happy. |
Addresses bug in random integer generation in which the bounds for the generation are stored as longs. For 32-bit Python, longs are only 32-bit. Random integer generation has to acommodate integers up to 64-bit. Thus, if a 33-bit integer is passed in for one of the bounds, an OverflowError is thrown. This commit adds '64-bit' versions of relevant random integer generation methods that are called if the original randint method overflows. Closes gh-6812.
☔ The latest upstream changes (presumably #6885) made this pull request unmergeable. Please resolve the merge conflicts. |
Whoops! Did not know that. 😄 Though the comment does seem relevant nonetheless given the discussion going on. |
@gfyoung I'll be adding tests at the limits for all the types, so that will get covered in the process. |
@charris : Oh, okay. Cool, that works. Thanks! |
Addresses enhancement requested in #6812 in which attempts to generate random, large 64-bit integers was causing overflow issues in Windows. This PR allows such generation to be done.