Tests for sample order invariance in estimator_checks #8695

jnothman · 2017-04-03T23:45:15Z

While sample and feature order can have subtle effects on the model fit by an estimator, I think we should have common tests to ensure that reordering or subsampling X in predict, transform, score_samples, predict_proba or decision_function does not change the sample-wise output. That is:

idx = np.random.randint(X.shape[0], size=X.shape[0] // 2)
assert_array_equal(method(X)[idx], method(X[idx]))

Apologies if we already have such tests, but I can't see them (which is also an issue: we don't actually have a clear list of what is asserted by estimator_checks)
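
The two-line assertion above can be fleshed out into a small runnable sketch. The helper name check_sample_order_invariance, the seed parameter and the row_norms stand-in are all hypothetical and not part of estimator_checks. The sketch uses assert_allclose rather than assert_array_equal on the assumption that batched floating-point computations may differ in the last bits between the full and the subsampled call.

```python
import numpy as np
from numpy.testing import assert_allclose


def check_sample_order_invariance(method, X, seed=0):
    """Hypothetical helper: method(X)[idx] must match method(X[idx])."""
    rng = np.random.RandomState(seed)
    # Random row indices, possibly repeated and in arbitrary order
    idx = rng.randint(X.shape[0], size=X.shape[0] // 2)
    assert_allclose(method(X)[idx], method(X[idx]))


def row_norms(X):
    # Stand-in for a well-behaved predict/transform:
    # each output value depends only on its own row
    return np.sqrt((X ** 2).sum(axis=1))


X = np.random.RandomState(42).normal(size=(20, 3))
check_sample_order_invariance(row_norms, X)  # passes silently
```

Any callable that maps the rows of X to one output per row can be passed as method, for example a fitted estimator's bound predict.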


jmcol · 2017-04-05T01:21:33Z

Hi! I would like to work on this, but I'm new to this project. I took a look at estimator_checks.py and there's a lot there. Would this be a completely new test or would this be added to test_check_estimator?

jnothman · 2017-04-05T01:41:08Z

You could add part of this to check_classifiers_train, but then would need the same for regressors, transformers and outlier detectors. The best way to do it is often more apparent after trying one way.
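
One possible way to avoid repeating the check for each estimator type is a single generic check that tries every relevant method the estimator happens to expose. This is only a sketch under assumptions: check_methods_sample_invariance and ToyScaler are hypothetical names, and the method list is simply the one given in the issue description.

```python
import numpy as np
from numpy.testing import assert_allclose

# Method names taken from the issue description
METHODS = ["predict", "transform", "decision_function",
           "score_samples", "predict_proba"]


def check_methods_sample_invariance(estimator, X, seed=0):
    """Hypothetical generic check; returns the names of the methods tested."""
    rng = np.random.RandomState(seed)
    # Shuffled subset of half the rows
    idx = rng.permutation(X.shape[0])[: X.shape[0] // 2]
    tested = []
    for name in METHODS:
        method = getattr(estimator, name, None)
        if method is None:
            continue  # this estimator type does not provide the method
        assert_allclose(method(X)[idx], method(X[idx]),
                        err_msg="%s is not sample-order invariant" % name)
        tested.append(name)
    return tested


class ToyScaler:
    """Stand-in transformer for illustration, not a scikit-learn class."""

    def fit(self, X, y=None):
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        return X - self.mean_


X = np.random.RandomState(0).normal(size=(30, 4))
print(check_methods_sample_invariance(ToyScaler().fit(X), X))  # ['transform']
```

Because the loop skips methods the estimator lacks, the same function could in principle be applied to classifiers, regressors, transformers and outlier detectors alike.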


… for sample invariance in predict_proba to ensure that reordering or subsampling does not change the sample-wide output. Addresses: scikit-learn#8695


jmcol · 2017-04-10T23:47:26Z

Would the test for transformers only be in check_transformer_general, or would it need to be present in all the check_transformer* functions? I'm also unsure how I would produce an estimator that would fail one of these tests. Also, I've been unable to find a straightforward way to build from source on Windows. Do you (or anyone) have any recommendations?

Thanks so much!


jnothman · 2017-04-12T11:10:50Z

I'm sorry, I know nothing about building on Windows beyond what's in the docs.

An example of an estimator that fails on one of these tests is:

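import numpy as np
from sklearn.base import BaseEstimator

class Bad(BaseEstimator):
    def fit(self, X, y=None): return self
    def predict(self, X): return np.arange(len(X))  # output depends on row position, not row content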

I'm not sure of the best way to structure the tests in the current checking framework. Try one and we'll see if there's a better one when reviewing your pull request.
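
To see why the Bad estimator above violates the invariance, here is a minimal sketch with a fixed index subset. It assumes a NumPy-only variant of Bad (the BaseEstimator base class is dropped so the snippet has no scikit-learn dependency); the variable names are illustrative.

```python
import numpy as np
from numpy.testing import assert_array_equal


class Bad:
    # Same fit/predict as the example above; BaseEstimator is dropped here
    # (an assumption) so that the sketch only needs NumPy.
    def fit(self, X, y=None):
        return self

    def predict(self, X):
        return np.arange(len(X))


X = np.zeros((6, 2))
idx = np.array([4, 1, 3])  # fixed subset so the outcome is reproducible
est = Bad().fit(X)

full_then_subset = est.predict(X)[idx]      # positions in the full array: 4, 1, 3
subset_then_predict = est.predict(X[idx])   # positions in the subset: 0, 1, 2

try:
    assert_array_equal(full_then_subset, subset_then_predict)
    failed = False
except AssertionError:
    failed = True
print("invariance check failed:", failed)
```

The prediction for a given row changes with its position in the input, which is exactly the behaviour the proposed check is meant to catch.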

jmcol · 2017-04-14T01:05:47Z

Hmm, this might not be the place to ask this, but I'm having a hard time building/testing. I created a VM on my Windows machine and I'm not able to use make in the scikit-learn directory. I'm getting the following error:

ERROR: Failure: ImportError (cannot import name _hierarchical)
Traceback (most recent call last):
  File "/home/jcolfer/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/jcolfer/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/jcolfer/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/jcolfer/scikit-learn/sklearn/tests/test_random_projection.py", line 14, in <module>
    from sklearn.utils.testing import assert_less
  File "/home/jcolfer/scikit-learn/sklearn/utils/testing.py", line 61, in <module>
    from sklearn.cluster import DBSCAN
  File "/home/jcolfer/scikit-learn/sklearn/cluster/__init__.py", line 10, in <module>
    from .hierarchical import (ward_tree, AgglomerativeClustering, linkage_tree,
  File "/home/jcolfer/scikit-learn/sklearn/cluster/hierarchical.py", line 23, in <module>
    from . import _hierarchical
ImportError: cannot import name _hierarchical
Ran 348 tests in 1.328s

FAILED (errors=143)
Makefile:32: recipe for target 'test-code' failed

Building doesn't fail when I just type python setup.py install. However, I get a similar error about the hierarchical package when I try to execute the following code:

from sklearn.utils.estimator_checks import check_estimator
from sklearn.base import BaseEstimator
import numpy as np


class Bad(BaseEstimator):
    def fit(self, X, y=None): return self

    def predict(self, X): return np.arange(len(X))


check_estimator(Bad)

Traceback (most recent call last):
  File "test_check_estimators.py", line 1, in <module>
    from sklearn.utils.estimator_checks import check_estimator
  File "/home/jcolfer/anaconda2/lib/python2.7/site-packages/sklearn/utils/estimator_checks.py", line 16, in <module>
    from sklearn.utils.testing import assert_raises
  File "/home/jcolfer/anaconda2/lib/python2.7/site-packages/sklearn/utils/testing.py", line 61, in <module>
    from sklearn.cluster import DBSCAN
  File "/home/jcolfer/anaconda2/lib/python2.7/site-packages/sklearn/cluster/__init__.py", line 10, in <module>
    from .hierarchical import (ward_tree, AgglomerativeClustering, linkage_tree,
  File "/home/jcolfer/anaconda2/lib/python2.7/site-packages/sklearn/cluster/hierarchical.py", line 23, in <module>
    from . import _hierarchical
  File "sklearn/utils/fast_dict.pxd", line 22, in init sklearn.cluster._hierarchical (sklearn/cluster/_hierarchical.cpp:21043)
ImportError: /home/jcolfer/anaconda2/lib/python2.7/site-packages/sklearn/utils/fast_dict.so: undefined symbol: _ZTINSt8ios_base7failureB5cxx11E

I'm using Ubuntu 16.04.2 in a VMware Workstation VM on a 64-bit Intel system.

AishwaryaRK · 2017-08-26T19:31:00Z

@jnothman this issue seems to be open and I would like to contribute. Can you please give me some pointers to start with?

jnothman · 2017-08-27T14:53:05Z

Get your head around sklearn/utils/estimator_checks.py for a start. Thanks!

AishwaryaRK · 2017-08-28T17:13:04Z

Thanks, I'm picking it up.

anhqngo · 2020-06-08T22:01:16Z

Hey, is anyone working on this? Can I pick this up?

cmarmo · 2020-10-08T13:09:07Z

Fixed in #17598 via #18570.

jnothman added the Easy (well-defined and straightforward way to resolve) and Need Contributor labels Apr 3, 2017

jmcol added a commit to jmcol/scikit-learn that referenced this issue Apr 7, 2017

test: Add sample order invariance check to regressors

8d8342a

Adds a check for sample order invariance for regressors in estimator_checks. Addresses: scikit-learn#8695

jmcol added a commit to jmcol/scikit-learn that referenced this issue Apr 11, 2017

test: Adds invariance check to predict function

546fbb6

Adds a simple check for sample order invariance in the predict function when using the estimator_checks. Addresses: scikit-learn#8695

lesteve added help wanted and removed Need Contributor labels Oct 18, 2017

This was referenced Jun 10, 2020

Add sample order invariance to estimator_checks MLH-Fellowship/scikit-learn#1

Merged

TST Add sample order invariance to estimator_checks #17598

Closed

cmarmo removed the help wanted label Jun 16, 2020

cmarmo closed this as completed Oct 8, 2020
