Lack of clarity in RandomState compatibility guarantee #8771
Comments
@rkern probably knows this best. I guess you are probably right, but doubt there is much you can do about it (except trying to document details). I don't know if e.g. |
The sequences of floats, integers, etc. produced by the base rng should be the same, assuming that division conforms to the IEEE standard for IEEE doubles. Platform variations in library functions such as I think you are correct that we cannot make guarantees cross platform, or even between library/compiler versions, when certain of the distributions are involved. |
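To illustrate why the base stream of doubles can be bit-for-bit reproducible, here is a minimal sketch of the 53-bit construction used by the MT19937 reference code (`genrand_res53`). That this matches the base generator discussed here is an assumption, and `double_from_words` is a hypothetical name, not a NumPy function. It uses only integer shifts, an exact multiply-add, and a division by a power of two, so no libm call is involved.

```python
def double_from_words(w1: int, w2: int) -> float:
    """Combine two 32-bit words into a double in [0, 1) with 53 random bits."""
    a = (w1 & 0xFFFFFFFF) >> 5  # keep the top 27 bits of the first word
    b = (w2 & 0xFFFFFFFF) >> 6  # keep the top 26 bits of the second word
    # 67108864 == 2**26 and 9007199254740992 == 2**53.
    # a * 2**26 + b is below 2**53, so it is exactly representable in a double,
    # and dividing by a power of two is exact under IEEE 754 arithmetic.
    return (a * 67108864.0 + b) / 9007199254740992.0


smallest = double_from_words(0x00000000, 0x00000000)
largest = double_from_words(0xFFFFFFFF, 0xFFFFFFFF)
print(smallest, largest)
```

Every step is exact, so the result depends only on the two input words and not on the platform's math library.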
To be a bit more precise, roundoff errors and small function differences will only make small differences in floating results if they are not involved in rejection methods. Differences in functions involved in rejection methods can change the whole following sequence, but that is unlikely unless a very large number of trials are made. We do have tests for repeatability and I'm not aware of any failures at this time. |
There's one other source of cross-platform wonkiness that is somewhat easier to realize: internal integer values are C |
Possible, definitely: a single ulp difference in a pow result could be enough to cause a rejection-method inner loop to go for one more iteration, and such single ulp differences are very common: there's wide variation in I agree that you'd have to be very unlucky to run into any of these cases by accident, so that in practice the guarantee is likely to hold almost always. So perhaps my objection can be dismissed under "practicality beats purity" rules. |
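A minimal sketch of the borderline case, assuming a generic acceptance test of the form `v <= threshold` (the `accepts` helper is hypothetical, not taken from the NumPy sources). It uses `math.nextafter`, which needs Python 3.9 or later, to move the threshold by exactly one ulp.

```python
import math


def accepts(v: float, threshold: float) -> bool:
    """Acceptance test of a generic rejection step: keep the candidate if v <= threshold."""
    return v <= threshold


threshold = 0.3
one_ulp_lower = math.nextafter(threshold, 0.0)  # the next double below 0.3

v = threshold  # a draw that lands exactly on the boundary
accepted_exact = accepts(v, threshold)
accepted_perturbed = accepts(v, one_ulp_lower)
print(accepted_exact, accepted_perturbed)
```

When the two platforms disagree on the test, one of them goes round the loop again and consumes more uniforms, so every later draw from that state differs as well.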
Yeah, there are two faces to the compatibility policy: what guarantees users can expect and what goals developers should strive towards. Being compatible "up to numerical rounding errors" across platforms is really an aspirational goal of the latter kind, not a firm guarantee of the former. I think I had the developer-goal in my head when I approved the language. And I was probably thinking of the transformation-type algorithms instead of the rejection-type ones. |
The multivariate case mentioned by @seberg is probably the most likely to cause problems. The factorization is unique only up to the sign of the eigenvectors when the singular values are distinct, and it is even less constrained when the covariance has degenerate eigenvalues, so differences between implementations of the SVD that is used can have a big effect. I believe the Cholesky factorization would be more specific. |
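A small pure-Python sketch of the sign ambiguity, using a hand-worked 2x2 covariance `[[2, 1], [1, 2]]` (eigenvalues 3 and 1). The helper names are hypothetical, and this is not the code path NumPy uses. Both factors reproduce the covariance, yet they map the same standard-normal draw `z` to different samples.

```python
import math


def gram(a):
    """Return a @ a.T for a 2x2 matrix given as nested lists."""
    return [[sum(a[i][k] * a[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(a, z):
    """Return the matrix-vector product a @ z for a 2x2 matrix."""
    return [a[i][0] * z[0] + a[i][1] * z[1] for i in range(2)]


s = 1.0 / math.sqrt(2.0)
r3 = math.sqrt(3.0)

# Columns are eigenvectors scaled by sqrt(eigenvalue): (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
factor_a = [[s * r3, s], [s * r3, -s]]
# Same decomposition with the sign of the first eigenvector flipped.
factor_b = [[-s * r3, s], [-s * r3, -s]]

z = [0.5, -1.25]  # the same underlying standard-normal draw for both
sample_a = apply(factor_a, z)
sample_b = apply(factor_b, z)
print(gram(factor_a), gram(factor_b))
print(sample_a, sample_b)
```

A Cholesky factor with a positive diagonal is unique for a positive-definite covariance, which is why it would pin down the output more tightly.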
Can we close this or do we need to clarify the documentation? |
Given that NEP 19 has been accepted, it looks like this issue can be closed. |
The docs for numpy.random.RandomState say:
Question: in this context, does "always" mean that this guarantee should apply across platforms and machines, or just that it should apply across runs on a single machine?
If the latter, then the "up to roundoff error" is probably unnecessary, which leads me to believe that the intent is that the guarantee should apply across platforms. But now I'm failing to see how it's possible to make such a guarantee: some of the sample generation methods use the rejection method, and so consume some (unknown in advance) number of random samples. The number of samples actually consumed may depend on floating-point and libm variations. A good example is the zipf distribution:
numpy/numpy/random/mtrand/distributions.c
Lines 720 to 742 in b94c2b0
Here, the number of calls to `rk_double` for a given `a` and random state may change depending on tiny floating-point differences in the result of `pow` (for example).
Should the guarantee be restricted to some subset of the RandomState methods?
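To make the concern concrete, here is a minimal Python sketch of a Devroye-style rejection sampler for the Zipf distribution. It is an illustration under stated assumptions, not a transcription of the referenced C lines: `zipf_sketch` is a hypothetical name, and Python's `random.Random` stands in for the `rk_double` stream. The function reports how many uniforms each variate consumed.

```python
import math
import random


def zipf_sketch(rng: random.Random, a: float):
    """Draw one Zipf(a) variate by rejection; return (value, uniforms_consumed)."""
    if a <= 1.0:
        raise ValueError("a must be greater than 1")
    am1 = a - 1.0
    b = 2.0 ** am1
    consumed = 0
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1]
        v = rng.random()        # uniform on [0, 1)
        consumed += 2
        try:
            x = math.floor(u ** (-1.0 / am1))  # candidate, always >= 1
        except OverflowError:
            continue  # candidate too large to represent; try again
        t = (1.0 + 1.0 / x) ** am1
        # Acceptance test: both sides depend on results of floating-point pow.
        if v * x * (t - 1.0) / (b - 1.0) <= t / b:
            return x, consumed


rng = random.Random(12345)
draws = [zipf_sketch(rng, 2.0) for _ in range(1000)]
counts = sorted({consumed for _, consumed in draws})
print(counts)
```

The set of counts shows that the number of uniforms consumed per variate is not fixed, so a flipped acceptance test shifts the position in the underlying stream for every subsequent call.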
Related: #6180 (where the wording explicitly includes "regardless of platform"), #6405 (where it doesn't).