Closed
Description
Describe the bug
_py_sort() raises a ValueError with numpy 1.26.4 but works correctly with numpy 2.x. I created two conda envs with different numpy versions from conda-forge:
conda create -n numpy_1.26.4 numpy=1.26.4 scikit-learn=1.6.1 -c conda-forge --override-channels
and
conda create -n numpy_2 numpy=2 scikit-learn=1.6.1 -c conda-forge --override-channels
In each env, I reproduced the test_sort_log2_build test (https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/tree/tests/test_tree.py#L2820), which behaves differently between the two. It works correctly with numpy 2, but with numpy 1.26.4 it raises:
ValueError: Buffer dtype mismatch, expected 'intp_t' but got 'long'
Steps/Code to Reproduce
This is essentially a copy of the test_sort_log2_build test:
>>> import numpy as np
>>> print(np.__version__)
1.26.4
>>> import sklearn
>>> print(sklearn.__version__)
1.6.1
>>> from sklearn.tree._partitioner import _py_sort
>>> rng = np.random.default_rng(75)
>>> some = rng.normal(loc=0.0, scale=10.0, size=10).astype(np.float32)
>>> feature_values = np.concatenate([some] * 5)
>>> samples = np.arange(50)
>>> _py_sort(feature_values, samples, 50)
Expected Results
>>> _py_sort(feature_values, samples, 50)
>>>
This is the behavior with numpy 2, where the call returns without error:
>>> import numpy as np
>>> print(np.__version__)
2.1.2
Actual Results
>>> _py_sort(feature_values, samples, 50)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "_partitioner.pyx", line 705, in sklearn.tree._partitioner._py_sort
ValueError: Buffer dtype mismatch, expected 'intp_t' but got 'long'
The same error occurs when the test itself is run with numpy 1.26.4.
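A possible workaround, sketched under an assumption the report does not confirm: the mismatch may come from np.arange(50) defaulting to the C long type under numpy 1.x, which is 32-bit on some platforms (for example 64-bit Windows), while _py_sort expects a pointer-sized integer buffer. Requesting np.intp explicitly when building samples should then give the same dtype under both numpy versions. The snippet below only builds the inputs and prints the dtypes; the _py_sort call is left as a comment so that it runs without scikit-learn.

```python
import numpy as np

# Same inputs as in the reproduction above.
rng = np.random.default_rng(75)
some = rng.normal(loc=0.0, scale=10.0, size=10).astype(np.float32)
feature_values = np.concatenate([some] * 5)

# Assumption: the failure is caused by the platform-dependent default
# integer dtype of np.arange. Ask for the pointer-sized integer explicitly.
samples = np.arange(50, dtype=np.intp)

# Compare the default dtype with the explicit one on this platform.
print(np.arange(50).dtype, samples.dtype)

# In an environment with scikit-learn 1.6.1 installed, the call would then be:
# from sklearn.tree._partitioner import _py_sort
# _py_sort(feature_values, samples, 50)
```

If the two printed dtypes differ under numpy 1.26.4, that would support the assumption; if they are identical, the cause lies elsewhere.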
Versions
>>> import sklearn
>>> print(sklearn.__version__)
1.6.1