Support numpy.random.Generator and/or BitGenerator for random number generation #16988

Open
@grisaitis

Describe the workflow you want to enable

I'd like to use a Generator or BitGenerator with scikit-learn where I'd otherwise use RandomState or a seed int.

For example:

import numpy as np

bit_generator = np.random.PCG64(seed=0)
generator = np.random.Generator(bit_generator)

and then use this for random_state= in scikit-learn:

from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import LinearSVC

X, y = make_classification(random_state=generator)  # or my bit_generator here 
classifier = LinearSVC(random_state=generator)
cv = ShuffleSplit(random_state=generator)

This fails because these functions and constructors expect random_state to be a RandomState object or an int seed. The failure originates in check_random_state(random_state), which does not recognize Generator or BitGenerator objects.

Describe your proposed solution

This would require:

  • changing the code to accept Generator or BitGenerator as values for random_state= in every function and class constructor that takes random_state.
  • changing check_random_state() to accept Generator and/or BitGenerator objects.
  • adding tests that use Generator or BitGenerator with the classes and functions that consume random_state (similar to the existing tests for int seeds and RandomState objects).
  • changing any internal code that relies on RandomState methods that Generator does not provide (e.g. rand, randn).
  • maybe switching to Generator instead of RandomState by default when an int seed is given.
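To make the second bullet concrete, here is a rough sketch of what an extended check could look like. The name check_random_state_extended is hypothetical and this is not scikit-learn's actual code; it also simplifies the None case by returning a fresh RandomState instead of NumPy's global one.

```python
import numbers

import numpy as np


def check_random_state_extended(seed):
    """Hypothetical variant of check_random_state (illustration only)."""
    if seed is None:
        # Simplification: a fresh, OS-seeded RandomState rather than the global one.
        return np.random.RandomState()
    if isinstance(seed, numbers.Integral):
        # Keep the existing behavior for int seeds.
        return np.random.RandomState(seed)
    if isinstance(seed, (np.random.RandomState, np.random.Generator)):
        # Pass existing RNG objects through unchanged.
        return seed
    if isinstance(seed, np.random.BitGenerator):
        # Assumption: wrap a bare BitGenerator in a Generator.
        return np.random.Generator(seed)
    raise ValueError(f"{seed!r} cannot be used as a random_state")


rng_from_int = check_random_state_extended(0)
rng_from_bitgen = check_random_state_extended(np.random.PCG64(0))
```

Callers would still have to cope with getting back either a RandomState or a Generator, which is where the fourth bullet (methods missing from Generator) comes in.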

Describe alternatives you've considered, if relevant

The scope could include either or both of BitGenerator or Generator.

It might be easiest to allow only BitGenerator, and not Generator.

  • This allows flexibility.
    • Users have control over seed and PRNG algorithm.
  • This is easier to implement (can be treated just like a seed int value).
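    • A BitGenerator can be given to RandomState, so both would draw from the same underlying bit stream. I think the resulting values can still differ from Generator's for some methods, since Generator uses newer sampling algorithms (e.g. for normal draws).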
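One way to check the last point is to wrap identically seeded BitGenerators both ways and compare the draws. This is only a quick experiment, and the variable names are mine:

```python
import numpy as np

# Same algorithm and seed, wrapped by the legacy and the new interface.
legacy = np.random.RandomState(np.random.PCG64(0))
modern = np.random.Generator(np.random.PCG64(0))

legacy_normal = legacy.standard_normal(5)
modern_normal = modern.standard_normal(5)

# Do the two interfaces return the same normal draws?
same_normal = np.allclose(legacy_normal, modern_normal)
print("standard_normal draws match:", same_normal)

# A second RandomState built the same way reproduces the first one's draws.
repeat_normal = np.random.RandomState(np.random.PCG64(0)).standard_normal(5)
print("RandomState is reproducible:", np.array_equal(legacy_normal, repeat_normal))
```

As far as I understand the NumPy docs, Generator samples normals with a ziggurat method while RandomState keeps its legacy algorithm, so the first comparison should not be expected to match even though the bit stream is the same.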

Additional context

NumPy v1.17 added the numpy.random.Generator (docs) interface for random number generation.

Overview:

  • Generator is similar to RandomState, but enables different PRNG algorithms
  • BitGenerator (docs) encapsulates the PRNG and seed value, e.g. PCG64(seed=0)
  • RandomState "is considered frozen" and uses "the slow Mersenne Twister" by default (docs)
  • RandomState can work with non-Mersenne BitGenerator objects
  • More info in NEP-19, the design document from NumPy.
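To illustrate "similar, but not identical": a few RandomState convenience methods have no direct counterpart on Generator. The pairs below are the closest equivalents I know of; the drawn values are not expected to match between the two objects.

```python
import numpy as np

legacy = np.random.RandomState(0)
modern = np.random.default_rng(0)

# RandomState method               -> closest Generator method
u_old = legacy.rand(2, 3)          # uniform on [0, 1), shape given as positional args
u_new = modern.random((2, 3))      # uniform on [0, 1), shape given as a tuple

n_old = legacy.randn(2, 3)               # standard normal
n_new = modern.standard_normal((2, 3))

i_old = legacy.randint(0, 10, size=4)    # integers in [0, 10)
i_new = modern.integers(0, 10, size=4)

# Names that internal code may call on RandomState but Generator lacks.
missing = [name for name in ("rand", "randn", "randint") if not hasattr(modern, name)]
print(missing)
```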

The API for Generator and BitGenerator looks like:

from numpy import random

bit_generator = random.PCG64(seed=0)  # PCG64 is a BitGenerator subclass
generator = random.Generator(bit_generator)

generator.uniform(0.0, 1.0, size=3)  # API is similar to RandomState

# there's also this, for making a PCG64-backed Generator
generator = random.default_rng(seed=0)
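As a quick sanity check of that shortcut, the two constructions can be compared directly. This assumes default_rng is backed by PCG64, which is what the NumPy docs currently describe but could change in a future release:

```python
import numpy as np

explicit = np.random.Generator(np.random.PCG64(0))
shortcut = np.random.default_rng(0)

draws_explicit = explicit.random(3)
draws_shortcut = shortcut.random(3)

# Both generators should be backed by the same BitGenerator type ...
print(type(shortcut.bit_generator).__name__)
# ... and, with the same seed, produce the same stream.
print(np.array_equal(draws_explicit, draws_shortcut))
```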
