Describe the bug
Locally I see intermittent failures of the KMeans()-check_transformer_data_not_an_array test. I don't see these failures on 0.24.2.
One additional weird thing is that this does not happen in the CI, and I seem to be the first to complain about it (at least I could not find it in the issues).
❯ pytest sklearn/tests/test_common.py -k 'KMeans and data_not_an_array'
================================================================================================================================== test session starts ===================================================================================================================================
platform linux -- Python 3.7.7, pytest-6.2.3, py-1.10.0, pluggy-0.13.1
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/lesteve/dev/scikit-learn/.hypothesis/examples')
rootdir: /home/lesteve/dev/scikit-learn, configfile: setup.cfg
plugins: hypothesis-4.36.2, asyncio-0.10.0, cov-2.7.1
collected 7785 items / 7783 deselected / 2 selected
sklearn/tests/test_common.py F. [100%]
======================================================================================================================================== FAILURES ========================================================================================================================================
_____________________________________________________________________________________________________________ test_estimators[KMeans()-check_transformer_data_not_an_array] ______________________________________________________________________________________________________________
estimator = KMeans(max_iter=5, n_clusters=2, n_init=2), check = functools.partial(<function check_transformer_data_not_an_array at 0x7fec5ceb7050>, 'KMeans'), request = <FixtureRequest for <Function test_estimators[KMeans()-check_transformer_data_not_an_array]>>
    @parametrize_with_checks(list(_tested_estimators()))
    def test_estimators(estimator, check, request):
        # Common tests for estimator instances
        with ignore_warnings(category=(FutureWarning,
                                       ConvergenceWarning,
                                       UserWarning, FutureWarning)):
            _set_checking_parameters(estimator)
>           check(estimator)
sklearn/tests/test_common.py:90:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/_testing.py:308: in wrapper
return fn(*args, **kwargs)
sklearn/utils/estimator_checks.py:1289: in check_transformer_data_not_an_array
_check_transformer(name, transformer, X, y)
sklearn/utils/estimator_checks.py:1366: in _check_transformer
% transformer, atol=1e-2)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
x = array([[0.20255037, 3.57452256],
[3.22900935, 0.23278803],
[3.2526988 , 0.33507256],
[0.34546063,...79101, 0.53812952],
[0.32877832, 3.45131422],
[0.19314137, 3.21278957],
[3.79826855, 0.41606372]])
y = array([[3.57452256, 0.20255037],
[0.23278803, 3.22900935],
[0.33507256, 3.2526988 ],
[3.23476868,...12952, 3.75679101],
[3.45131422, 0.32877832],
[3.21278957, 0.19314137],
[0.41606372, 3.79826855]]), rtol = 1e-07, atol = 0.01
err_msg = 'fit_transform and transform outcomes not consistent in KMeans(max_iter=5, n_clusters=2, n_init=2, random_state=0)'
    def assert_allclose_dense_sparse(x, y, rtol=1e-07, atol=1e-9, err_msg=''):
        """Assert allclose for sparse and dense data.

        Both x and y need to be either sparse or dense, they
        can't be mixed.

        Parameters
        ----------
        x : {array-like, sparse matrix}
            First array to compare.
        y : {array-like, sparse matrix}
            Second array to compare.
        rtol : float, default=1e-07
            relative tolerance; see numpy.allclose.
        atol : float, default=1e-9
            absolute tolerance; see numpy.allclose. Note that the default here is
            more tolerant than the default for numpy.testing.assert_allclose, where
            atol=0.
        err_msg : str, default=''
            Error message to raise.
        """
        if sp.sparse.issparse(x) and sp.sparse.issparse(y):
            x = x.tocsr()
            y = y.tocsr()
            x.sum_duplicates()
            y.sum_duplicates()
            assert_array_equal(x.indices, y.indices, err_msg=err_msg)
            assert_array_equal(x.indptr, y.indptr, err_msg=err_msg)
            assert_allclose(x.data, y.data, rtol=rtol, atol=atol, err_msg=err_msg)
        elif not sp.sparse.issparse(x) and not sp.sparse.issparse(y):
            # both dense
>           assert_allclose(x, y, rtol=rtol, atol=atol, err_msg=err_msg)
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0.01
E fit_transform and transform outcomes not consistent in KMeans(max_iter=5, n_clusters=2, n_init=2, random_state=0)
E Mismatched elements: 60 / 60 (100%)
E Max absolute difference: 3.38712923
E Max relative difference: 24.28104678
E x: array([[0.20255 , 3.574523],
E [3.229009, 0.232788],
E [3.252699, 0.335073],...
E y: array([[3.574523, 0.20255 ],
E [0.232788, 3.229009],
E [0.335073, 3.252699],...
sklearn/utils/_testing.py:415: AssertionError
=============================================================================================================== 1 failed, 1 passed, 7783 deselected, 32 warnings in 2.69s ================================================================================================================
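For context, the failing check essentially verifies that, for a transformer, calling transform after fit gives the same result as fit_transform on the same data. A rough, self-contained sketch of that idea (not the actual estimator_checks code; the data and parameters below are made up for illustration):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# toy two-cluster data
X, _ = make_blobs(n_samples=30, centers=2, random_state=0)

km = KMeans(n_clusters=2, n_init=2, max_iter=5, random_state=0)

# fit_transform and a subsequent transform on the same data should agree
# (up to a small tolerance)
Xt_fit = km.fit_transform(X)
Xt = km.transform(X)
np.testing.assert_allclose(Xt_fit, Xt, atol=1e-2)

Note that in the failure above the two outputs only differ by a permutation of the columns, i.e. the two cluster centers appear in a different order.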
Steps/Code to Reproduce
I can reproduce this failure consistently with:
Create a test-kmeans.sh file:
#!/bin/bash
set -e

conda create -n test python scipy cython pytest joblib threadpoolctl -y
conda activate test
pip install --no-build-isolation --editable .

# make sure to run it a few times to trigger the test failure
for i in $(seq 1 50); do
    pytest sklearn/tests/test_common.py -k 'KMeans and data_not_an_array'
done
Run test-kmeans.sh:
source test-kmeans.sh
Expected Results
No test failure
Actual Results
Test failure
Other comments
Looking a bit more, it seems that when calling KMeans.fit the cluster centers can end up in a different order on main, whereas the order is consistent in 0.24.2. Wild guess: maybe it is due to some use of low-level parallelism in KMeans?
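A quick way to see whether the ordering is stable (illustrative sketch only, not the common-test code; the data and parameters are made up): fit the same KMeans configuration repeatedly on identical data and compare cluster_centers_ across runs. With a fixed random_state, the centers, including their row order, should be identical every time.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=2, random_state=0)

def centers():
    # same configuration and random_state on every call
    return KMeans(n_clusters=2, n_init=2, max_iter=5,
                  random_state=0).fit(X).cluster_centers_

reference = centers()
for _ in range(50):
    # any run where the two centers come back in swapped order fails here
    np.testing.assert_allclose(reference, centers())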