ENH Replaced RandomState with Generator compatible calls #22271
Conversation
Replaced `RandomState.random_sample` calls with `RandomState.uniform` calls.
There look to be a few more. For

```
grep -r "random_sample" sklearn --exclude="*tests*" --exclude="*.pyc" \
    --exclude="*.pxi" --exclude="*.c" --exclude="*.html" --exclude="*.so" -in
```

Results:

```
sklearn/ensemble/_gb.py:36:from ._gradient_boosting import _random_sample_mask
sklearn/ensemble/_gb.py:660: sample_mask = _random_sample_mask(n_samples, n_inbag, random_state)
sklearn/ensemble/_gradient_boosting.pyx:237:def _random_sample_mask(np.npy_intp n_total_samples,
sklearn/cluster/_kmeans.py:212: rand_vals = random_state.random_sample(n_local_trials) * current_pot
sklearn/multiclass.py:1058: self.code_book_ = random_state.random_sample((n_classes, code_size_))
sklearn/neural_network/_rbm.py:199: return rng.random_sample(size=p.shape) < p
sklearn/neural_network/_rbm.py:220: return rng.random_sample(size=p.shape) < p
sklearn/neighbors/_kde.py:108: >>> X = rng.random_sample((100, 3))
```

For

```
grep -r "randn" sklearn --exclude="*tests*" --exclude="*.pyc" \
    --exclude="*.pxi" --exclude="*.c" --exclude="*.html" --exclude="*.so" -in
```

Results:

```
sklearn/cluster/_affinity_propagation.py:174: ) * random_state.randn(n_samples, n_samples)
sklearn/datasets/_samples_generator.py:233: X[:, :n_informative] = generator.randn(n_samples, n_informative)
sklearn/datasets/_samples_generator.py:262: X[:, -n_useless:] = generator.randn(n_samples, n_useless)
sklearn/datasets/_samples_generator.py:598: X = generator.randn(n_samples, n_features)
sklearn/datasets/_samples_generator.py:1025: + noise * generator.randn(n_samples)
sklearn/datasets/_samples_generator.py:1091: ) ** 0.5 + noise * generator.randn(n_samples)
sklearn/datasets/_samples_generator.py:1156: ) + noise * generator.randn(n_samples)
sklearn/datasets/_samples_generator.py:1221: u, _ = linalg.qr(generator.randn(n_samples, n), mode="economic", check_finite=False)
sklearn/datasets/_samples_generator.py:1223: generator.randn(n_features, n), mode="economic", check_finite=False
sklearn/datasets/_samples_generator.py:1283: D = generator.randn(n_features, n_components)
sklearn/datasets/_samples_generator.py:1292: X[idx, i] = generator.randn(n_nonzero_coefs)
sklearn/datasets/_samples_generator.py:1522: X += noise * generator.randn(3, n_samples)
sklearn/datasets/_samples_generator.py:1564: X += noise * generator.randn(3, n_samples)
sklearn/linear_model/_quantile.py:92: >>> y = rng.randn(n_samples)
sklearn/linear_model/_quantile.py:93: >>> X = rng.randn(n_samples, n_features)
sklearn/linear_model/_sag.py:216: >>> X = rng.randn(n_samples, n_features)
sklearn/linear_model/_sag.py:217: >>> y = rng.randn(n_samples)
sklearn/linear_model/_ridge.py:971: >>> y = rng.randn(n_samples)
sklearn/linear_model/_ridge.py:972: >>> X = rng.randn(n_samples, n_features)
sklearn/linear_model/_stochastic_gradient.py:1890: >>> y = rng.randn(n_samples)
sklearn/linear_model/_stochastic_gradient.py:1891: >>> X = rng.randn(n_samples, n_features)
sklearn/kernel_ridge.py:126: >>> y = rng.randn(n_samples)
sklearn/kernel_ridge.py:127: >>> X = rng.randn(n_samples, n_features)
sklearn/utils/estimator_checks.py:171: X = rng.randn(10, 5)
sklearn/feature_selection/_mutual_info.py:293: 1e-10 * means * rng.randn(n_samples, np.sum(continuous_mask))
sklearn/feature_selection/_mutual_info.py:298: y += 1e-10 * np.maximum(1, np.mean(np.abs(y))) * rng.randn(n_samples)
sklearn/svm/_classes.py:1196: >>> y = rng.randn(n_samples)
sklearn/svm/_classes.py:1197: >>> X = rng.randn(n_samples, n_features)
sklearn/svm/_classes.py:1388: >>> y = np.random.randn(n_samples)
sklearn/svm/_classes.py:1389: >>> X = np.random.randn(n_samples, n_features)
sklearn/manifold/_t_sne.py:994: X_embedded = 1e-4 * random_state.randn(n_samples, self.n_components).astype(
sklearn/manifold/_spectral_embedding.py:339: X = random_state.randn(laplacian.shape[0], n_components + 1)
sklearn/manifold/_spectral_embedding.py:370: X = random_state.randn(laplacian.shape[0], n_components + 1)
sklearn/mixture/_base.py:459: mean + rng.randn(sample, n_features) * np.sqrt(covariance)
sklearn/model_selection/_split.py:1006: >>> X = np.random.randn(12, 2)
sklearn/decomposition/_nmf.py:317: H = avg * rng.randn(n_components, n_features).astype(X.dtype, copy=False)
sklearn/decomposition/_nmf.py:318: W = avg * rng.randn(n_samples, n_components).astype(X.dtype, copy=False)
sklearn/decomposition/_nmf.py:372: W[W == 0] = abs(avg * rng.randn(len(W[W == 0])) / 100)
sklearn/decomposition/_nmf.py:373: H[H == 0] = abs(avg * rng.randn(len(H[H == 0])) / 100)
sklearn/neighbors/_nca.py:454: transformation = self.random_state_.randn(n_components, X.shape[1])
```
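As a quick sanity check on the `randn` occurrences surfaced by this grep (an illustrative sketch, not part of the PR): `RandomState.randn` forwards to `standard_normal` internally, so identically seeded states produce identical draws under either name.

```python
import numpy as np
from numpy.testing import assert_array_equal

# randn is a thin wrapper over standard_normal on RandomState, so
# the same seed yields bit-identical output from either method.
a = np.random.RandomState(0).randn(5)
b = np.random.RandomState(0).standard_normal(5)
assert_array_equal(a, b)
```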
Thank you for that. I'm on Windows and its string searching/parsing utilities aren't... great. Will update the PR.
How would I log this in the changelog?
I do not think this needs a changelog entry since there should be no changes for the user. The title should be updated though, since the scope increased. (The failing test is unrelated; looking into it now.)
Replaced `RandomState`-specific calls with equivalent calls whose signatures match `Generator` calls.
LGTM!

For future reviewers: this change does not alter any generated random numbers. It is more to make NumPy `Generator`s easier to adopt.

For reference:

- `randn` -> `standard_normal` are the same call: https://github.com/numpy/numpy/blob/6077afd650a503034d0a8a5917bb9a5fa3f115fd/numpy/random/mtrand.pyx#L1243-L1246
- `rand` calls `random_sample`: https://github.com/numpy/numpy/blob/6077afd650a503034d0a8a5917bb9a5fa3f115fd/numpy/random/mtrand.pyx#L1179-L1182
- `random_sample` -> `uniform` generate the same values since they use the same underlying C code.

Through testing, they are all the same:

```python
import numpy as np
from numpy.testing import assert_allclose

for i in range(20):
    rng1 = np.random.RandomState(i)
    rng2 = np.random.RandomState(i)
    for row, col in zip(range(0, 1000, 100), range(0, 1000, 100)):
        x1 = rng1.random_sample((row, col))
        x2 = rng2.uniform(size=(row, col))
        assert_allclose(x1, x2)
```

The only difference is that `uniform` expands it a little by allowing for `low` and `high`.
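To illustrate those extra `low`/`high` parameters (an illustrative sketch, not part of the PR): on the legacy `RandomState`, `uniform(low, high)` affinely rescales the same underlying [0, 1) double stream that `random_sample` draws from.

```python
import numpy as np
from numpy.testing import assert_allclose

r1 = np.random.RandomState(42)
r2 = np.random.RandomState(42)
# uniform(low, high) draws the same [0, 1) doubles as random_sample
# and rescales them, so a hand-rescaled random_sample matches it.
lo, hi = -3.0, 3.0
x = r1.uniform(low=lo, high=hi, size=10)
y = lo + (hi - lo) * r2.random_sample(10)
assert_allclose(x, y)
```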
LGTM. Thanks @Micky774!
LGTM
As a follow-up PR, there are more in

```
grep -r "rand(" sklearn --exclude="*tests*" --exclude="*.pyc" --exclude="*.cpp" --exclude="*.h" \
    --exclude="*.pxi" --exclude="*.c" --exclude="*.html" --exclude="*.so" -in
```

Results (could be some false positives):

```
sklearn/metrics/pairwise.py:1655: >>> X = np.random.RandomState(0).rand(5, 3)
sklearn/ensemble/_gradient_boosting.pyx:259: random_state.rand(n_total_samples)
sklearn/semi_supervised/_label_propagation.py:39:>>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
sklearn/semi_supervised/_label_propagation.py:411: >>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
sklearn/semi_supervised/_label_propagation.py:567: >>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
sklearn/semi_supervised/_self_training.py:133: >>> random_unlabeled_points = rng.rand(iris.target.shape[0]) < 0.3
sklearn/datasets/_samples_generator.py:229: centroids *= generator.rand(n_clusters, 1)
sklearn/datasets/_samples_generator.py:230: centroids *= generator.rand(1, n_informative)
sklearn/datasets/_samples_generator.py:242: A = 2 * generator.rand(n_informative, n_informative) - 1
sklearn/datasets/_samples_generator.py:249: B = 2 * generator.rand(n_informative, n_redundant) - 1
sklearn/datasets/_samples_generator.py:257: indices = ((n - 1) * generator.rand(n_repeated) + 0.5).astype(np.intp)
sklearn/datasets/_samples_generator.py:266: flip_mask = generator.rand(n_samples) < flip_y
sklearn/datasets/_samples_generator.py:271: shift = (2 * generator.rand(n_features) - 1) * class_sep
sklearn/datasets/_samples_generator.py:275: scale = 1 + 100 * generator.rand(n_features)
sklearn/datasets/_samples_generator.py:394: p_c = generator.rand(n_classes)
sklearn/datasets/_samples_generator.py:397: p_w_c = generator.rand(n_features, n_classes)
sklearn/datasets/_samples_generator.py:412: c = np.searchsorted(cumulative_p_c, generator.rand(y_size - len(y)))
sklearn/datasets/_samples_generator.py:430: words = np.searchsorted(cumulative_p_w_sample, generator.rand(n_words))
sklearn/datasets/_samples_generator.py:614: ground_truth[:n_informative, :] = 100 * generator.rand(n_informative, n_targets)
sklearn/datasets/_samples_generator.py:1019: X = generator.rand(n_samples, n_features)
sklearn/datasets/_samples_generator.py:1082: X = generator.rand(n_samples, 4)
sklearn/datasets/_samples_generator.py:1147: X = generator.rand(n_samples, 4)
sklearn/datasets/_samples_generator.py:1377: A = generator.rand(n_dim, n_dim)
sklearn/datasets/_samples_generator.py:1379: X = np.dot(np.dot(U, 1.0 + np.diag(generator.rand(n_dim))), Vt)
sklearn/datasets/_samples_generator.py:1439: aux = random_state.rand(dim, dim)
sklearn/datasets/_samples_generator.py:1443: ) * random_state.rand(np.sum(aux > alpha))
sklearn/datasets/_samples_generator.py:1507: t = 1.5 * np.pi * (1 + 2 * generator.rand(n_samples))
sklearn/datasets/_samples_generator.py:1508: y = 21 * generator.rand(n_samples)
sklearn/datasets/_samples_generator.py:1515: parameters = generator.rand(2, n_samples) * np.array([[np.pi], [7]])
sklearn/datasets/_samples_generator.py:1558: t = 3 * np.pi * (generator.rand(1, n_samples) - 0.5)
sklearn/datasets/_samples_generator.py:1560: y = 2.0 * generator.rand(1, n_samples)
sklearn/random_projection.py:512: >>> X = rng.rand(25, 3000)
sklearn/random_projection.py:662: >>> X = rng.rand(25, 3000)
sklearn/utils/estimator_checks.py:801: X = rng.rand(40, 3)
sklearn/utils/estimator_checks.py:805: y = (4 * rng.rand(40)).astype(int)
sklearn/utils/estimator_checks.py:1091: X = _pairwise_estimator_convert_X(rng.rand(40, 10), estimator_orig)
sklearn/utils/random.py:92: class_probability_nz_norm.cumsum(), rng.rand(nnz)
sklearn/manifold/_mds.py:87: X = random_state.rand(n_samples * n_components)
sklearn/mixture/_base.py:151: resp = random_state.rand(n_samples, self.n_components)
sklearn/decomposition/_truncated_svd.py:143: >>> X_dense = np.random.rand(100, 100)
```
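A quick check (illustrative, not from the PR) that these `rand` call sites could be migrated the same way: `rand(d0, d1)` is shorthand for `random_sample((d0, d1))`, which shares its stream with size-only `uniform` on an identically seeded `RandomState`.

```python
import numpy as np
from numpy.testing import assert_allclose

# rand(d0, d1) is sugar for random_sample((d0, d1)), which draws the
# same doubles as uniform(size=...) given the same seed.
r1 = np.random.RandomState(7)
r2 = np.random.RandomState(7)
assert_allclose(r1.rand(3, 2), r2.uniform(size=(3, 2)))
```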
This is a great first step for Generator, thanks!
Reference Issues/PRs

Issue #20669
Towards #16988
What does this implement/fix? Explain your changes.

Per #20669, it is discussed that we will likely need an eventual change towards adopting NumPy's `Generator` interface instead of remaining on `RandomState`. To ease such a transition, this PR replaces `RandomState.random_sample` calls with their corresponding `RandomState.uniform` equivalents to match the `Generator.uniform` syntax, preserving functionality while allowing a drop-in replacement of the underlying object without syntax errors.

Any other comments?
Thank you to @thomasjpfan for the direction and guidance on this PR.

In working on this PR, I also looked at similarly changing `RandomState` methods to equivalent methods with overlapping names/signatures between the two interfaces, namely `randn` -> `standard_normal`, and `random_sample`, `rand` -> `uniform`.
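The drop-in property described in the PR can be sketched as follows (illustrative only; `make_noise` is a hypothetical helper, not scikit-learn code). `uniform(size=...)` is spelled identically on `RandomState` and `Generator`, whereas `random_sample` exists only on `RandomState` (the `Generator` equivalent is named `random`).

```python
import numpy as np

def make_noise(rng, shape):
    # Works unchanged with either interface because uniform(size=...)
    # has the same spelling on RandomState and Generator.
    return rng.uniform(size=shape)

legacy = np.random.RandomState(0)   # legacy interface
modern = np.random.default_rng(0)   # Generator interface
assert make_noise(legacy, (2, 3)).shape == (2, 3)
assert make_noise(modern, (2, 3)).shape == (2, 3)
# random_sample would break under Generator: it was renamed to random.
assert not hasattr(modern, "random_sample")
assert hasattr(modern, "random")
```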