ENH Replaced RandomState with Generator compatible calls #22271
Conversation
Replaced `RandomState.random_sample` calls with `RandomState.uniform` calls.
There look to be a few more. For

```
grep -r "random_sample" sklearn --exclude="*tests*" --exclude="*.pyc" \
    --exclude="*.pxi" --exclude="*.c" --exclude="*.html" --exclude="*.so" -in
```

Results:

```
sklearn/ensemble/_gb.py:36:from ._gradient_boosting import _random_sample_mask
sklearn/ensemble/_gb.py:660: sample_mask = _random_sample_mask(n_samples, n_inbag, random_state)
sklearn/ensemble/_gradient_boosting.pyx:237:def _random_sample_mask(np.npy_intp n_total_samples,
sklearn/cluster/_kmeans.py:212: rand_vals = random_state.random_sample(n_local_trials) * current_pot
sklearn/multiclass.py:1058: self.code_book_ = random_state.random_sample((n_classes, code_size_))
sklearn/neural_network/_rbm.py:199: return rng.random_sample(size=p.shape) < p
sklearn/neural_network/_rbm.py:220: return rng.random_sample(size=p.shape) < p
sklearn/neighbors/_kde.py:108: >>> X = rng.random_sample((100, 3))
```

For

```
grep -r "randn" sklearn --exclude="*tests*" --exclude="*.pyc" \
    --exclude="*.pxi" --exclude="*.c" --exclude="*.html" --exclude="*.so" -in
```

Results:

```
sklearn/cluster/_affinity_propagation.py:174: ) * random_state.randn(n_samples, n_samples)
sklearn/datasets/_samples_generator.py:233: X[:, :n_informative] = generator.randn(n_samples, n_informative)
sklearn/datasets/_samples_generator.py:262: X[:, -n_useless:] = generator.randn(n_samples, n_useless)
sklearn/datasets/_samples_generator.py:598: X = generator.randn(n_samples, n_features)
sklearn/datasets/_samples_generator.py:1025: + noise * generator.randn(n_samples)
sklearn/datasets/_samples_generator.py:1091: ) ** 0.5 + noise * generator.randn(n_samples)
sklearn/datasets/_samples_generator.py:1156: ) + noise * generator.randn(n_samples)
sklearn/datasets/_samples_generator.py:1221: u, _ = linalg.qr(generator.randn(n_samples, n), mode="economic", check_finite=False)
sklearn/datasets/_samples_generator.py:1223: generator.randn(n_features, n), mode="economic", check_finite=False
sklearn/datasets/_samples_generator.py:1283: D = generator.randn(n_features, n_components)
sklearn/datasets/_samples_generator.py:1292: X[idx, i] = generator.randn(n_nonzero_coefs)
sklearn/datasets/_samples_generator.py:1522: X += noise * generator.randn(3, n_samples)
sklearn/datasets/_samples_generator.py:1564: X += noise * generator.randn(3, n_samples)
sklearn/linear_model/_quantile.py:92: >>> y = rng.randn(n_samples)
sklearn/linear_model/_quantile.py:93: >>> X = rng.randn(n_samples, n_features)
sklearn/linear_model/_sag.py:216: >>> X = rng.randn(n_samples, n_features)
sklearn/linear_model/_sag.py:217: >>> y = rng.randn(n_samples)
sklearn/linear_model/_ridge.py:971: >>> y = rng.randn(n_samples)
sklearn/linear_model/_ridge.py:972: >>> X = rng.randn(n_samples, n_features)
sklearn/linear_model/_stochastic_gradient.py:1890: >>> y = rng.randn(n_samples)
sklearn/linear_model/_stochastic_gradient.py:1891: >>> X = rng.randn(n_samples, n_features)
sklearn/kernel_ridge.py:126: >>> y = rng.randn(n_samples)
sklearn/kernel_ridge.py:127: >>> X = rng.randn(n_samples, n_features)
sklearn/utils/estimator_checks.py:171: X = rng.randn(10, 5)
sklearn/feature_selection/_mutual_info.py:293: 1e-10 * means * rng.randn(n_samples, np.sum(continuous_mask))
sklearn/feature_selection/_mutual_info.py:298: y += 1e-10 * np.maximum(1, np.mean(np.abs(y))) * rng.randn(n_samples)
sklearn/svm/_classes.py:1196: >>> y = rng.randn(n_samples)
sklearn/svm/_classes.py:1197: >>> X = rng.randn(n_samples, n_features)
sklearn/svm/_classes.py:1388: >>> y = np.random.randn(n_samples)
sklearn/svm/_classes.py:1389: >>> X = np.random.randn(n_samples, n_features)
sklearn/manifold/_t_sne.py:994: X_embedded = 1e-4 * random_state.randn(n_samples, self.n_components).astype(
sklearn/manifold/_spectral_embedding.py:339: X = random_state.randn(laplacian.shape[0], n_components + 1)
sklearn/manifold/_spectral_embedding.py:370: X = random_state.randn(laplacian.shape[0], n_components + 1)
sklearn/mixture/_base.py:459: mean + rng.randn(sample, n_features) * np.sqrt(covariance)
sklearn/model_selection/_split.py:1006: >>> X = np.random.randn(12, 2)
sklearn/decomposition/_nmf.py:317: H = avg * rng.randn(n_components, n_features).astype(X.dtype, copy=False)
sklearn/decomposition/_nmf.py:318: W = avg * rng.randn(n_samples, n_components).astype(X.dtype, copy=False)
sklearn/decomposition/_nmf.py:372: W[W == 0] = abs(avg * rng.randn(len(W[W == 0])) / 100)
sklearn/decomposition/_nmf.py:373: H[H == 0] = abs(avg * rng.randn(len(H[H == 0])) / 100)
sklearn/neighbors/_nca.py:454: transformation = self.random_state_.randn(n_components, X.shape[1])
```
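As a quick sanity check on the `randn` occurrences surfaced by this grep (an illustrative sketch, not part of the PR): `RandomState.randn` forwards to `standard_normal` internally, so identically seeded states produce identical draws under either name.

```python
import numpy as np
from numpy.testing import assert_array_equal

# randn is a thin wrapper over standard_normal on RandomState, so
# the same seed yields bit-identical output from either method.
a = np.random.RandomState(0).randn(5)
b = np.random.RandomState(0).standard_normal(5)
assert_array_equal(a, b)
```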
Thank you for that. I'm on Windows and its string searching/parsing utilities aren't... great. Will update the PR.
How would I log this in the changelog?
I do not think this needs a changelog entry since there should be no changes for the user. The title should be updated though, since the scope increased. (The failing test is unrelated; looking into it now.)
Replaced `RandomState`-specific calls with equivalent calls whose signatures match `Generator` calls.
LGTM!

For future reviewers: this change does not alter any generated random numbers. It is more to make NumPy `Generator`s easier to adopt.

For reference:

- `randn` -> `standard_normal` are the same call: https://github.com/numpy/numpy/blob/6077afd650a503034d0a8a5917bb9a5fa3f115fd/numpy/random/mtrand.pyx#L1243-L1246
- `rand` calls `random_sample`: https://github.com/numpy/numpy/blob/6077afd650a503034d0a8a5917bb9a5fa3f115fd/numpy/random/mtrand.pyx#L1179-L1182
- `random_sample` -> `uniform` generate the same values since they use the same underlying C code.

Through testing, they are all the same:

```python
import numpy as np
from numpy.testing import assert_allclose

for i in range(20):
    rng1 = np.random.RandomState(i)
    rng2 = np.random.RandomState(i)
    for row, col in zip(range(0, 1000, 100), range(0, 1000, 100)):
        x1 = rng1.random_sample((row, col))
        x2 = rng2.uniform(size=(row, col))
        assert_allclose(x1, x2)
```

The only difference is that `uniform` expands it a little by allowing for `low` and `high`.
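To illustrate those extra `low`/`high` parameters (an illustrative sketch, not part of the PR): on the legacy `RandomState`, `uniform(low, high)` affinely rescales the same underlying [0, 1) double stream that `random_sample` draws from.

```python
import numpy as np
from numpy.testing import assert_allclose

r1 = np.random.RandomState(42)
r2 = np.random.RandomState(42)
# uniform(low, high) draws the same [0, 1) doubles as random_sample
# and rescales them, so a hand-rescaled random_sample matches it.
lo, hi = -3.0, 3.0
x = r1.uniform(low=lo, high=hi, size=10)
y = lo + (hi - lo) * r2.random_sample(10)
assert_allclose(x, y)
```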
LGTM. Thanks @Micky774!
LGTM
As a follow-up PR, there are more in

```
grep -r "rand(" sklearn --exclude="*tests*" --exclude="*.pyc" --exclude="*.cpp" --exclude="*.h" \
    --exclude="*.pxi" --exclude="*.c" --exclude="*.html" --exclude="*.so" -in
```

Results (could be some false positives):

```
sklearn/metrics/pairwise.py:1655: >>> X = np.random.RandomState(0).rand(5, 3)
sklearn/ensemble/_gradient_boosting.pyx:259: random_state.rand(n_total_samples)
sklearn/semi_supervised/_label_propagation.py:39:>>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
sklearn/semi_supervised/_label_propagation.py:411: >>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
sklearn/semi_supervised/_label_propagation.py:567: >>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
sklearn/semi_supervised/_self_training.py:133: >>> random_unlabeled_points = rng.rand(iris.target.shape[0]) < 0.3
sklearn/datasets/_samples_generator.py:229: centroids *= generator.rand(n_clusters, 1)
sklearn/datasets/_samples_generator.py:230: centroids *= generator.rand(1, n_informative)
sklearn/datasets/_samples_generator.py:242: A = 2 * generator.rand(n_informative, n_informative) - 1
sklearn/datasets/_samples_generator.py:249: B = 2 * generator.rand(n_informative, n_redundant) - 1
sklearn/datasets/_samples_generator.py:257: indices = ((n - 1) * generator.rand(n_repeated) + 0.5).astype(np.intp)
sklearn/datasets/_samples_generator.py:266: flip_mask = generator.rand(n_samples) < flip_y
sklearn/datasets/_samples_generator.py:271: shift = (2 * generator.rand(n_features) - 1) * class_sep
sklearn/datasets/_samples_generator.py:275: scale = 1 + 100 * generator.rand(n_features)
sklearn/datasets/_samples_generator.py:394: p_c = generator.rand(n_classes)
sklearn/datasets/_samples_generator.py:397: p_w_c = generator.rand(n_features, n_classes)
sklearn/datasets/_samples_generator.py:412: c = np.searchsorted(cumulative_p_c, generator.rand(y_size - len(y)))
sklearn/datasets/_samples_generator.py:430: words = np.searchsorted(cumulative_p_w_sample, generator.rand(n_words))
sklearn/datasets/_samples_generator.py:614: ground_truth[:n_informative, :] = 100 * generator.rand(n_informative, n_targets)
sklearn/datasets/_samples_generator.py:1019: X = generator.rand(n_samples, n_features)
sklearn/datasets/_samples_generator.py:1082: X = generator.rand(n_samples, 4)
sklearn/datasets/_samples_generator.py:1147: X = generator.rand(n_samples, 4)
sklearn/datasets/_samples_generator.py:1377: A = generator.rand(n_dim, n_dim)
sklearn/datasets/_samples_generator.py:1379: X = np.dot(np.dot(U, 1.0 + np.diag(generator.rand(n_dim))), Vt)
sklearn/datasets/_samples_generator.py:1439: aux = random_state.rand(dim, dim)
sklearn/datasets/_samples_generator.py:1443: ) * random_state.rand(np.sum(aux > alpha))
sklearn/datasets/_samples_generator.py:1507: t = 1.5 * np.pi * (1 + 2 * generator.rand(n_samples))
sklearn/datasets/_samples_generator.py:1508: y = 21 * generator.rand(n_samples)
sklearn/datasets/_samples_generator.py:1515: parameters = generator.rand(2, n_samples) * np.array([[np.pi], [7]])
sklearn/datasets/_samples_generator.py:1558: t = 3 * np.pi * (generator.rand(1, n_samples) - 0.5)
sklearn/datasets/_samples_generator.py:1560: y = 2.0 * generator.rand(1, n_samples)
sklearn/random_projection.py:512: >>> X = rng.rand(25, 3000)
sklearn/random_projection.py:662: >>> X = rng.rand(25, 3000)
sklearn/utils/estimator_checks.py:801: X = rng.rand(40, 3)
sklearn/utils/estimator_checks.py:805: y = (4 * rng.rand(40)).astype(int)
sklearn/utils/estimator_checks.py:1091: X = _pairwise_estimator_convert_X(rng.rand(40, 10), estimator_orig)
sklearn/utils/random.py:92: class_probability_nz_norm.cumsum(), rng.rand(nnz)
sklearn/manifold/_mds.py:87: X = random_state.rand(n_samples * n_components)
sklearn/mixture/_base.py:151: resp = random_state.rand(n_samples, self.n_components)
sklearn/decomposition/_truncated_svd.py:143: >>> X_dense = np.random.rand(100, 100)
```
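A quick check (illustrative, not from the PR) that these `rand` call sites could be migrated the same way: `rand(d0, d1)` is shorthand for `random_sample((d0, d1))`, which shares its stream with size-only `uniform` on an identically seeded `RandomState`.

```python
import numpy as np
from numpy.testing import assert_allclose

# rand(d0, d1) is sugar for random_sample((d0, d1)), which draws the
# same doubles as uniform(size=...) given the same seed.
r1 = np.random.RandomState(7)
r2 = np.random.RandomState(7)
assert_allclose(r1.rand(3, 2), r2.uniform(size=(3, 2)))
```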
This is a great first step for Generator, thanks!
Reference Issues/PRs

Issue #20669
Towards #16988
What does this implement/fix? Explain your changes.

Per #20669, it is discussed that we will likely need an eventual change towards adopting NumPy's `Generator` interface instead of remaining on `RandomState`. To ease such a transition, this PR replaces `RandomState.random_sample` calls with their corresponding `RandomState.uniform` equivalents to match the `Generator.uniform` syntax, preserving functionality while allowing a drop-in replacement of the underlying object without syntax errors.

Any other comments?
Thank you to @thomasjpfan for the direction and guidance on this PR.

In working on this PR, I also looked at similarly changing `RandomState` methods to equivalent methods with overlapping names/signatures between the two interfaces, namely `randn` -> `standard_normal`, and `random_sample`, `rand` -> `uniform`.
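The drop-in property described in the PR can be sketched as follows (illustrative only; `make_noise` is a hypothetical helper, not scikit-learn code). `uniform(size=...)` is spelled identically on `RandomState` and `Generator`, whereas `random_sample` exists only on `RandomState` (the `Generator` equivalent is named `random`).

```python
import numpy as np

def make_noise(rng, shape):
    # Works unchanged with either interface because uniform(size=...)
    # has the same spelling on RandomState and Generator.
    return rng.uniform(size=shape)

legacy = np.random.RandomState(0)   # legacy interface
modern = np.random.default_rng(0)   # Generator interface
assert make_noise(legacy, (2, 3)).shape == (2, 3)
assert make_noise(modern, (2, 3)).shape == (2, 3)
# random_sample would break under Generator: it was renamed to random.
assert not hasattr(modern, "random_sample")
assert hasattr(modern, "random")
```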