Description
Describe the workflow you want to enable
I'd like to use a Generator or BitGenerator with scikit-learn where I'd otherwise use RandomState or a seed int.
For example:
import numpy as np
bit_generator = np.random.PCG64(seed=0)
generator = np.random.Generator(bit_generator)
and then use this for random_state= in scikit-learn:
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import LinearSVC
X, y = make_classification(random_state=generator) # or my bit_generator here
classifier = LinearSVC(random_state=generator)
cv = ShuffleSplit(random_state=generator)
This fails because these methods expect a RandomState object or int seed value. The specific trigger is check_random_state(random_state).
Describe your proposed solution
This would require:
- changing code to allow Generator or BitGenerator as acceptable values for random_state=.. in every function and class constructor that accepts random_state.
- changing check_random_state() to allow Generator and/or BitGenerator objects.
- adding tests for using Generator or BitGenerator with classes or functions that consume random_state (similar to the existing tests for seed int or RandomState objects)
- changing any internal code that assumes RandomState methods that aren't available with Generator (e.g. rand, randn, see )
- maybe switching to using Generator instead of RandomState by default, when a seed int is given
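To make the check_random_state() change listed above more concrete, here is a minimal sketch of what an extended validator might look like. The name check_random_state_extended is hypothetical and is not part of scikit-learn. The None, int, and RandomState branches are written to resemble the documented behaviour of check_random_state; passing a Generator through unchanged and wrapping a BitGenerator in a Generator are assumptions about one possible design, not a decided API.

```python
import numbers

import numpy as np


def check_random_state_extended(seed):
    """Hypothetical validator (NOT scikit-learn's check_random_state).

    Accepts None, an int, a RandomState, a Generator, or a BitGenerator.
    """
    if seed is None or seed is np.random:
        # Fall back to NumPy's global RandomState singleton.
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        # An int seed still produces a legacy RandomState in this sketch.
        return np.random.RandomState(seed)
    if isinstance(seed, (np.random.RandomState, np.random.Generator)):
        # Already a usable random number source; return it as-is.
        return seed
    if isinstance(seed, np.random.BitGenerator):
        # Assumption: wrap a bare BitGenerator in a Generator.
        return np.random.Generator(seed)
    raise ValueError(
        f"{seed!r} cannot be used to seed a random number generator"
    )


from_int = check_random_state_extended(0)
from_generator = check_random_state_extended(np.random.default_rng(0))
from_bit_generator = check_random_state_extended(np.random.PCG64(0))
```

Callers would then receive either a RandomState or a Generator, so any internal code that calls RandomState-only methods would still need to be adapted.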
Describe alternatives you've considered, if relevant
The scope could include either or both of BitGenerator and Generator.
It might be easiest to allow only BitGenerator, and not Generator.
- This allows flexibility.
- Users have control over seed and PRNG algorithm.
- This is easier to implement (can be treated just like a seed int value).
- BitGenerator can be given to RandomState, and I think it then produces the same values as Generator.
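A short sketch of the BitGenerator-only route, relying on the fact that the RandomState constructor accepts either an int seed or a BitGenerator instance. The helper name as_legacy_random_state is hypothetical. The sketch only shows that the wrapping works and is reproducible; it does not assume that the values match what a Generator would produce from the same BitGenerator.

```python
import numpy as np


def as_legacy_random_state(seed_or_bit_generator):
    """Hypothetical helper: build a RandomState from an int or a BitGenerator."""
    # RandomState accepts both kinds of input through the same argument,
    # so a BitGenerator can be handled exactly like a seed int.
    return np.random.RandomState(seed_or_bit_generator)


# Two identically seeded PCG64 bit generators, each wrapped in a RandomState.
legacy_a = as_legacy_random_state(np.random.PCG64(0))
legacy_b = as_legacy_random_state(np.random.PCG64(0))

sample_a = legacy_a.uniform(size=5)
sample_b = legacy_b.uniform(size=5)

# The plain int path keeps working through the same helper.
legacy_from_int = as_legacy_random_state(0)
```

With this approach, internal code that calls RandomState methods such as rand or randn would not need to change.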
Additional context
NumPy v1.17 added the numpy.random.Generator (docs) interface for random number generation.
Overview:
- Generator is similar to RandomState, but enables different PRNG algorithms
- BitGenerator (docs) encapsulates the PRNG and seed value, e.g. PCG64(seed=0)
- RandomState "is considered frozen" and uses "the slow Mersenne Twister" by default (docs)
- RandomState can work with non-Mersenne BitGenerator objects
- More info in NEP-19, the design document from NumPy.
The API for Generator and BitGenerator looks like:
from numpy import random
bit_generator = random.PCG64(seed=0) # PCG64 is a BitGenerator subclass
generator = random.Generator(bit_generator)
generator.uniform(size=3) # API is similar to RandomState
# there's also this, for making a PCG64-backed Generator
generator = random.default_rng(seed=0)
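For the internal code that relies on RandomState-only methods, a small side-by-side sketch of the calls that differ may help. It uses only NumPy; the variable names are illustrative, and the two objects are not expected to produce the same numbers since they use different default bit generators.

```python
import numpy as np

legacy = np.random.RandomState(0)
rng = np.random.default_rng(0)

# Uniform floats in [0, 1): rand(*dims) vs random(shape)
legacy_uniform = legacy.rand(2, 3)
new_uniform = rng.random((2, 3))

# Standard normal samples: randn(*dims) vs standard_normal(shape)
legacy_normal = legacy.randn(4)
new_normal = rng.standard_normal(4)

# Integers in [0, 10): randint vs integers
legacy_ints = legacy.randint(0, 10, size=5)
new_ints = rng.integers(0, 10, size=5)

# Generator does not provide the legacy convenience methods.
missing_on_generator = [
    name for name in ("rand", "randn", "randint") if not hasattr(rng, name)
]
```

Methods such as uniform, normal, choice, shuffle, and permutation exist on both classes, so much calling code could stay unchanged.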