MRG factorize common tests. #893

Merged
merged 27 commits into scikit-learn:master on Jun 26, 2012

Conversation

@amueller (Member) commented Jun 7, 2012

This is a shot at factorizing common tests for all estimators.
Any suggestions on what to test are very welcome :)

Closes #406.

I think I would like to merge what I already did before going on with more tests. At the moment, everything passes.
There are some (minor) issues with going further, for example the fit_pairwise problem, which prevents me from testing clustering algorithms.

I feel there is already quite a lot in this PR and the longer it gets, the less motivation someone will have to review ;)
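
For the record, the basic shape of such a common test loop is roughly the following. This is only a sketch using today's naming (all_estimators is importable from sklearn.utils in recent versions); the helpers and the 0.8 cutoff are illustrative, not the PR's exact code:

from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.utils import all_estimators

X, y = load_iris(return_X_y=True)
for name, Estimator in all_estimators(type_filter='classifier'):
    try:
        est = Estimator()          # default parameters
    except TypeError:
        continue                   # skip estimators that need required constructor arguments
    clone(est)                     # every estimator must be cloneable
    repr(est)                      # and have a working __repr__
    est.fit(X, y)
    assert est.score(X, y) > 0.8   # rough sanity check on training-set accuracy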

@amueller (Member Author) commented Jun 7, 2012

Btw, I think this PR will work better together with #803, as that one enforces a more consistent API.

@amueller (Member Author) commented Jun 7, 2012

The following classifiers fail on iris with default params: SGDClassifier, Perceptron, BernoulliNB.
I tried to normalize the features with Scaler. That didn't help, but then NearestCentroid also failed.
Oh, and by the way, I am talking about training-set performance.

I am a bit surprised that SGDClassifier has problems. It seems a bit hard to tune.
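
(For context, the check being discussed is roughly the following sketch, written with today's names: Scaler has since been renamed StandardScaler, and the exact pass threshold is only implied here.)

from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler  # called Scaler in 2012

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

clf = SGDClassifier().fit(X_scaled, y)
print(clf.score(X_scaled, y))  # training-set accuracy that the common test compares to a threshold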

@ogrisel (Member) commented Jun 8, 2012

What surprises me is that Perceptron and NearestCentroid fail. Both are simpler than SGDClassifier, with fewer hyperparameters to tweak.

@amueller (Member Author) commented Jun 8, 2012

Also true. NearestCentroid "only" fails when using Scaler... but it shouldn't either. You're right.

@amueller (Member Author) commented Jun 8, 2012

I haven't looked into it much yet, as there are also other tests failing.

from sklearn.base import clone

# e is the estimator instance under test
# test cloning
clone(e)
# test __repr__
print(e)

You can use the built-in function repr for that.
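
That is, the last line of the test could become something like:

assert repr(e)  # exercises __repr__ without writing to stdout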

@amueller (Member Author)

I have trouble testing the NB classifiers. Maybe @larsmans has some opinion on that. BernoulliNB by default thresholds at zero, so it can not handle non-negative data. MultinomialNB on the other hand assumes the entries to be counts, in particular non-negative. Neither raised any error when given arbitrary data.

My understanding is that MultinomialNB should raise an error on negative data.
I would like it if all estimators could either handle zero-mean data or raise an error when that doesn't make sense (as in the case of MultinomialNB).
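
Such a check could be as simple as the following sketch (illustrative only, not necessarily the code that was eventually added):

import numpy as np

def check_non_negative(X, whom):
    # Fail fast instead of silently fitting on data that violates the model's assumptions.
    if np.min(X) < 0:
        raise ValueError("Negative values in data passed to %s" % whom)

# e.g. called at the start of MultinomialNB.fit:
# check_non_negative(X, "MultinomialNB (input X)")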

@mblondel (Member)

@amueller Shuffle the iris dataset first. SGDClassifier and Perceptron are sensitive to the shuffling and by default the iris dataset is not shuffled.
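
In other words (sketch):

from sklearn.datasets import load_iris
from sklearn.utils import shuffle

iris = load_iris()
# iris.data is ordered by class label, which hurts epoch-based online learners
X, y = shuffle(iris.data, iris.target, random_state=0)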

@amueller (Member Author)

@mblondel Thanks. That one has bitten me so often already ;) I reverted the SGD default changes, since it seems to work now. It is quite unstable though, and I think adding some more iterations by default might be good. What do you think?

@mblondel (Member)

You could add a list of estimators that must be handled specifically: this way you could add SGDClassifier with extra parameters.
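
That could be as simple as a dict of per-estimator overrides consulted inside the common test loop; a sketch (names and values are illustrative, and n_iter has since been renamed max_iter):

SPECIAL_PARAMS = {
    'SGDClassifier': dict(n_iter=20),
    'Perceptron': dict(n_iter=20),
}

est = Estimator(**SPECIAL_PARAMS.get(name, {}))  # name, Estimator as in the loop sketched above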

@amueller (Member Author)

Well, for the moment it works. My question was more about how sensible the current defaults are. Usually I use many more than 5 iterations. I guess these are good for huge, very redundant datasets. But would it hurt to do 20 iterations instead?

@ogrisel (Member) commented Jun 13, 2012

Ideally we should have a large number of iterations (e.g. 100) plus some early stopping criterion based on an exponentially weighted moving average of the validation error, measured on a validation set subsampled from the training set if one is not passed explicitly.
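
A rough sketch of such a stopping rule (nothing like this is in the PR; clf, the train/validation splits and classes are assumed to exist):

ema, best, stale = None, float('inf'), 0
for epoch in range(100):                      # generous iteration budget
    clf.partial_fit(X_train, y_train, classes=classes)
    err = 1.0 - clf.score(X_val, y_val)       # validation error after this epoch
    ema = err if ema is None else 0.9 * ema + 0.1 * err
    if ema < best - 1e-4:
        best, stale = ema, 0                  # the moving average is still improving
    else:
        stale += 1
        if stale >= 5:                        # no improvement for 5 epochs: stop early
            break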

@larsmans (Member)

2012/6/12 Andreas Mueller
reply@reply.github.com:

> I have trouble testing the NB classifiers. Maybe @larsmans has some opinion on that. BernoulliNB by default thresholds at zero, so it can not handle non-negative data. MultinomialNB on the other hand assumes the entries to be counts, in particular non-negative. Neither raised any error when given arbitrary data.

Well, BernoulliNB can handle negative data, though it doesn't really make much sense to feed it that. It wants booleans.

> My understanding is that MultinomialNB should raise an error on negative data.

That's right. It didn't, so far, because our tf-idf implementation was buggy and could produce negative values.

@amueller (Member Author)

@larsmans Maybe I misread you or you misread me: I wanted to say that BernoulliNB cannot handle all-positive data, since it thresholds at zero by default. It handles zero-mean data "reasonably".

About tf-idf: was that fixed? I remember some PR hanging around a while ago.

@amueller (Member Author)

@ogrisel wouldn't it also be good to stop on the training error? In my view, early stopping is more of a convergence check than a regularizer. And I'm a bit afraid of doing too much behind the scenes.

@larsmans (Member)

@amueller: ah, I get you now. No, BernoulliNB by default doesn't handle all-positive feature values nicely. I'm not sure whether that should raise an exception, though; a user might want to grid search for an appropriate threshold. (Or does the grid searcher catch exceptions from estimators?)
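
Grid searching the threshold is indeed possible, since BernoulliNB exposes it as the binarize constructor parameter (default 0.0); a sketch with today's module layout (the grid search lived in sklearn.grid_search at the time, and X, y are assumed):

from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB

grid = GridSearchCV(BernoulliNB(), {'binarize': [0.0, 0.25, 0.5, 1.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)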

I think @ogrisel fixed tf-idf (?).

@amueller (Member Author)

@larsmans: My idea was that a per-feature median might be a more natural default threshold than 0. That was just something that came to mind; I'm not sure if it makes sense. I'm really not familiar with the kind of data that is used with this classifier.
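
Concretely, since binarize is a single scalar threshold, a per-feature median would have to be applied as a preprocessing step; a sketch (X and y assumed):

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# binarize each feature at its own median instead of at a global threshold of 0;
# BernoulliNB(binarize=None) expects already-binarized input
X_bin = (X > np.median(X, axis=0)).astype(np.float64)
clf = BernoulliNB(binarize=None).fit(X_bin, y)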

@larsmans (Member)

The common application domains of Bernoulli NB, AFAIK, are text classification and NLP tasks like word sense disambiguation (WSD) with word-window features. In both cases, the median is likely zero for each feature except the stop words.

@ogrisel (Member) commented Jun 13, 2012

Yes, it's "fixed": it no longer outputs negative values. However, it does so in a non-standard way and I might change it again in the future. The last time I tried to implement the canonical normalization from the textbooks, the k-means text clustering example behaved much worse, hence I decided to keep the current scheme for now, as I did not have time to investigate further.

@amueller (Member Author)

@larsmans @ogrisel Did the MultinomialNB check in #908.

@amueller (Member Author)

Any comments? Should I merge? There are some smaller fixes that would be good to have in master. I can also cherry-pick them and wait for a proper review on the testing stuff...

@ogrisel (Member) commented Jun 25, 2012

If all tests pass on your box, +1 for merging. More tests is always better :)

@amueller (Member Author)

Ok then I'll have a look again tonight and if everything works I'll merge.

amueller added 22 commits June 26, 2012 14:16

Among them, a commit whose message reads "Turned up alpha and n_iter. This corresponds to more regularization and more careful SGD. On which kind of problems do the old defaults work?", and a later commit that reverts it (b659bc399da94be0de4a970da15e69cb778d4101).
@amueller (Member Author)

Runs fine. Still needs work but should be a good place to start from.
Merging now :)

amueller added a commit that referenced this pull request Jun 26, 2012
amueller merged commit 549d82f into scikit-learn:master Jun 26, 2012
amueller mentioned this pull request Jun 26, 2012
@ogrisel (Member) commented Jun 27, 2012

It seems that some of the new tests are broken on Python 2.6: https://jenkins.shiningpanda.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/703/consoleText

@amueller (Member Author)

Yes, I saw that but didn't have time to fix it. I hope I can do it later today.

@amueller (Member Author)

Should be fixed now.

@ogrisel (Member) commented Jun 27, 2012

Thanks!
