Wrong value for the sum of p in np.random.choice #13713

Closed
@glemaitre

There seems to be a regression since #13163 was merged.

We hit this in scikit-learn, where the parameter p passed to np.random.choice is a 1D view on a 2D C-contiguous array, so the view itself is not contiguous. In the previous version, p was guaranteed to be contiguous in memory:

p = <ndarray>PyArray_ContiguousFromObject(p, NPY_DOUBLE, 1, 1)

This has changed with the following:

p = <np.ndarray>np.PyArray_FROM_OTF(p, np.NPY_DOUBLE, np.NPY_ALIGNED)

However, no flag enforcing contiguity is passed. As a result, kahan_sum, which expects a contiguous array, computes the wrong sum.
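
To illustrate the failure mode, here is a rough pure-Python model of what happens when a summation routine walks a strided view as if it were contiguous. This is only a sketch: the `kahan_sum` below is a hypothetical Python stand-in, not the Cython function from mtrand.pyx, and the "misread" is simulated by slicing the parent buffer from the view's starting offset.

```python
import numpy as np


def kahan_sum(values):
    """Compensated (Kahan) summation over an iterable of floats."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total


p = np.repeat(np.array([[0.1, 0, 0.3, 0.6, 0]]).T, 3, axis=1)
view = p[:, 1]  # 5 elements, one every 3 doubles in memory

# Correct: honour the strides of the view.
right = kahan_sum(view)

# Simulated misread: take 5 *consecutive* doubles starting at the
# view's first element (p[0, 1]), ignoring the stride.
start = 1  # flat offset of p[0, 1] in the row-major parent buffer
wrong = kahan_sum(p.ravel()[start:start + view.size])

print(right)  # close to 1.0
print(wrong)  # close to 0.2, so the "sum to 1" check would fail
```

In this model the misread picks up p[0, 1], p[0, 2] and then three zeros from the next row, which is consistent with the "probabilities do not sum to 1" error.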

I assume it should be possible to pass the NPY_ARRAY_C_CONTIGUOUS flag when calling PyArray_FROM_OTF.
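
Until that is fixed, a possible caller-side workaround (a sketch, not the proposed patch) is to check the contiguity flag of the view and copy it with np.ascontiguousarray before passing it as p. The variable names `col` and `col_c` are only for illustration.

```python
import numpy as np

p = np.repeat(np.array([[0.1, 0, 0.3, 0.6, 0]]).T, 3, axis=1)
col = p[:, 1]  # strided 1D view on the 2D array

# The view is not contiguous: consecutive elements are 24 bytes apart.
print(col.flags["C_CONTIGUOUS"], col.strides)

# Force a contiguous copy on the Python side before calling choice.
col_c = np.ascontiguousarray(col)
print(col_c.flags["C_CONTIGUOUS"], col_c.strides)

x = np.random.choice(5, 3, p=col_c)
print(x)  # random draws; only indices with non-zero probability can appear
```

The copy is cheap for a probability vector and makes the call independent of how the C side converts p.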

Reproducing code example:

import numpy as np

p = np.repeat(np.array([[0.1, 0, 0.3, 0.6, 0]]).T, 3, axis=1)
x = np.random.choice(5, 3, p=p[:, 1])
print(x)

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/tmp.py in <module>
      3 p = np.repeat(np.array([[0.1, 0, 0.3, 0.6, 0]]).T, 3, axis=1)
      4 
----> 5 x = np.random.choice(5, 3, p=p[:, 1])
      6 print(x)

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: probabilities do not sum to 1

Numpy/Python version information:

In [12]: np.__version__
Out[12]: '1.17.0.dev0+495de50'
