Support RandomState._bit_generator as input in check_random_state for users with numpy version >= 1.17.0 #20669

Open
@zoj613

Describe the workflow you want to enable

Since numpy version 1.17.0, the np.random.RandomState constructor accepts a BitGenerator instance, such as the one stored in the ._bit_generator attribute of a np.random.Generator. This would be a plus for those who use np.random.Generator in their code and want to use the same bit generator with sklearn's estimators. Currently this is not possible, see:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE

X, y = make_classification(n_samples=150, n_features=5, n_informative=5,
                           n_redundant=0, n_repeated=0, n_classes=3,
                           n_clusters_per_class=1,
                           weights=[0.01, 0.05, 0.94],
                           class_sep=0.8, random_state=0)

rng = np.random.default_rng(12345)
tsne = TSNE()
# some piece of code here
# then later we use our own rng to set the seed of `tsne`
# notice `_bit_generator` used here, which is compatible with RandomState
tsne.set_params(random_state=rng._bit_generator)
tsne.fit_transform(X, y)

This leads to the error:

  File "/home/python3.6/site-packages/sklearn/manifold/_t_sne.py", line 932, in fit_transform
    embedding = self._fit(X)
  File "/home/python3.6/site-packages/sklearn/manifold/_t_sne.py", line 728, in _fit
    random_state = check_random_state(self.random_state)
  File "/home/python3.6/site-packages/sklearn/utils/validation.py", line 944, in check_random_state
    ' instance' % seed)
ValueError: <numpy.random.pcg64.PCG64 object at 0x7ffa3ab471b8> cannot be used to seed a numpy.random.RandomState instance
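The constructor behaviour this request relies on can be checked without sklearn. The snippet below is only a sketch, assuming numpy >= 1.17.0; it uses the public `bit_generator` property of the Generator (the same object as the private `_bit_generator` used above), and the names `legacy`, `state_before` and `state_after` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Wrap the Generator's bit generator in a legacy RandomState.
# RandomState accepts a BitGenerator instance in numpy >= 1.17.0.
legacy = np.random.RandomState(rng.bit_generator)

# Snapshot the bit generator state, draw through the legacy interface,
# then snapshot again.
state_before = rng.bit_generator.state
sample = legacy.random_sample(3)
state_after = rng.bit_generator.state

# Both objects consume the same underlying stream, so drawing from
# `legacy` also advances the state seen by `rng`.
print(state_before != state_after)
```

Because the stream is shared rather than copied, an estimator seeded this way would continue the caller's random sequence instead of starting an independent one.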

Describe your proposed solution

I propose we add a conditional in check_random_state that supports an instance of BitGenerator. The current implementation is:

import numbers

import numpy as np


def check_random_state(seed):
    """Turn seed into a np.random.RandomState instance

    Parameters
    ----------
    seed : None, int or instance of RandomState
        If seed is None, return the RandomState singleton used by np.random.
        If seed is an int, return a new RandomState instance seeded with seed.
        If seed is already a RandomState instance, return it.
        Otherwise raise ValueError.
    """
    if seed is None or seed is np.random:
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError('%r cannot be used to seed a numpy.random.RandomState'
                     ' instance' % seed)

The change could be something like:

supported_bitgenerators = {'PCG64', 'SFC64', 'Philox', ...}

def check_random_state(seed):
    ...
    if seed.__class__.__name__ in supported_bitgenerators:
        return np.random.RandomState(seed)  # should work if numpy>=1.17.0
    ...
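As a more concrete illustration of the idea, here is a minimal sketch that replaces the class-name lookup with an `isinstance` check. It is not sklearn's implementation: `check_random_state_sketch` is a hypothetical name, and the sketch assumes a numpy release that exposes `np.random.BitGenerator` as the common base class of PCG64, SFC64, Philox and the other bit generators.

```python
import numbers

import numpy as np


def check_random_state_sketch(seed):
    """Hypothetical variant of check_random_state that accepts a BitGenerator."""
    if seed is None or seed is np.random:
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    # Assumption: np.random.BitGenerator is available in the installed numpy.
    # Wrapping shares the bit generator with the caller; it does not copy it.
    if isinstance(seed, np.random.BitGenerator):
        return np.random.RandomState(seed)
    raise ValueError('%r cannot be used to seed a numpy.random.RandomState'
                     ' instance' % seed)


rng = np.random.default_rng(12345)
random_state = check_random_state_sketch(rng.bit_generator)
print(type(random_state).__name__)
```

An `isinstance` check avoids maintaining a list of class names and would also cover third-party bit generators that subclass `np.random.BitGenerator`, at the cost of requiring a numpy version where that class is importable.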

Describe alternatives you've considered, if relevant

I know there is an issue regarding supporting the new numpy Generator interface, but I feel this is slightly different since it does not attempt to replace RandomState.
