MRG factorize common tests. #893

Merged
merged 27 commits into scikit-learn:master on Jun 26, 2012

Conversation

@amueller (Member) commented Jun 7, 2012

This is a shot at factorizing common tests for all estimators.
Any suggestions on what to test are very welcome :)

Closes #406.

I think I would like to merge what I already did before going on with more tests. At the moment, everything passes.
There are some (minor) issues with going further, for example the fit_pairwise problem, which prevents me from testing clustering algorithms.

I feel there is already quite a lot in this PR and the longer it gets, the less motivation someone will have to review ;)
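
For the record, the basic shape of such a common test loop is roughly the following. This is only a sketch using today's naming (all_estimators is importable from sklearn.utils in recent versions); the helpers and the 0.8 cutoff are illustrative, not the PR's exact code:

from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.utils import all_estimators

X, y = load_iris(return_X_y=True)
for name, Estimator in all_estimators(type_filter='classifier'):
    try:
        est = Estimator()          # default parameters
    except TypeError:
        continue                   # skip estimators that need required constructor arguments
    clone(est)                     # every estimator must be cloneable
    repr(est)                      # and have a working __repr__
    est.fit(X, y)
    assert est.score(X, y) > 0.8   # rough sanity check on training-set accuracy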

@amueller (Member Author) commented Jun 7, 2012

Btw, I think this PR will work better together with #803, as that one enforces a more consistent API.

@amueller (Member Author) commented Jun 7, 2012

The following classifiers fail on iris with default params: SGDClassifier, Perceptron, BernoulliNB.
I tried to normalize the features with Scaler. That didn't help, but then NearestCentroid also failed.
Oh, and by the way, I am talking about training-set performance.

I am a bit surprised that SGDClassifier has problems. It seems a bit hard to tune.
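
(For context, the check being discussed is roughly the following sketch, written with today's names: Scaler has since been renamed StandardScaler, and the exact pass threshold is only implied here.)

from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler  # called Scaler in 2012

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

clf = SGDClassifier().fit(X_scaled, y)
print(clf.score(X_scaled, y))  # training-set accuracy that the common test compares to a threshold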

@ogrisel (Member) commented Jun 8, 2012

What surprises me is that Perceptron and NearestCentroid fail. Both are simpler than SGDClassifier, with fewer hyperparameters to tweak.

@amueller (Member Author) commented Jun 8, 2012

Also true. NearestCentroid "only" fails when using Scaler... but it shouldn't either. You're right.

@amueller (Member Author) commented Jun 8, 2012

I haven't looked into it much yet, as there are also other tests failing.

from sklearn.base import clone

# e is the estimator instance under test
# test cloning
clone(e)
# test __repr__
print(e)

You can use the built-in function repr for that.
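
That is, the last line of the test could become something like:

assert repr(e)  # exercises __repr__ without writing to stdout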

@amueller (Member Author)

I have trouble testing the NB classifiers. Maybe @larsmans has some opinion on that. BernoulliNB by default thresholds at zero, so it can not handle non-negative data. MultinomialNB on the other hand assumes the entries to be counts, in particular non-negative. Neither raised any error when given arbitrary data.

My understanding is that MultinomialNB should raise an error on negative data.
I would like it if all estimators could either handle zero-mean data or raise an error when that doesn't make sense (as in the case of MultinomialNB).
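
Such a check could be as simple as the following sketch (illustrative only, not necessarily the code that was eventually added):

import numpy as np

def check_non_negative(X, whom):
    # Fail fast instead of silently fitting on data that violates the model's assumptions.
    if np.min(X) < 0:
        raise ValueError("Negative values in data passed to %s" % whom)

# e.g. called at the start of MultinomialNB.fit:
# check_non_negative(X, "MultinomialNB (input X)")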

@mblondel (Member)

@amueller Shuffle the iris dataset first. SGDClassifier and Perceptron are sensitive to the shuffling and by default the iris dataset is not shuffled.
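
In other words (sketch):

from sklearn.datasets import load_iris
from sklearn.utils import shuffle

iris = load_iris()
# iris.data is ordered by class label, which hurts epoch-based online learners
X, y = shuffle(iris.data, iris.target, random_state=0)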

@amueller (Member Author)

@mblondel Thanks. That one has bitten me so often already ;) I reverted the SGD default changes, since it seems to work now. It is quite unstable though, and I think adding some more iterations by default might be good. What do you think?

@mblondel (Member)

You could add a list of estimators that must be handled specifically: this way you could add SGDClassifier with extra parameters.
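
That could be as simple as a dict of per-estimator overrides consulted inside the common test loop; a sketch (names and values are illustrative, and n_iter has since been renamed max_iter):

SPECIAL_PARAMS = {
    'SGDClassifier': dict(n_iter=20),
    'Perceptron': dict(n_iter=20),
}

est = Estimator(**SPECIAL_PARAMS.get(name, {}))  # name, Estimator as in the loop sketched above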

@amueller (Member Author)

Well, for the moment it works. My question was more about how sensible the current defaults are. Usually I use many more than 5 iterations. I guess these are good for huge, very redundant datasets. But would it hurt to do 20 iterations instead?

@ogrisel (Member) commented Jun 13, 2012

Ideally we should have a large number of iterations (e.g. 100) plus some early stopping criterion based on an exponentially weighted moving average of the validation error, measured on a validation set subsampled from the training set if one is not passed explicitly.
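
A rough sketch of such a stopping rule (nothing like this is in the PR; clf, the train/validation splits and classes are assumed to exist):

ema, best, stale = None, float('inf'), 0
for epoch in range(100):                      # generous iteration budget
    clf.partial_fit(X_train, y_train, classes=classes)
    err = 1.0 - clf.score(X_val, y_val)       # validation error after this epoch
    ema = err if ema is None else 0.9 * ema + 0.1 * err
    if ema < best - 1e-4:
        best, stale = ema, 0                  # the moving average is still improving
    else:
        stale += 1
        if stale >= 5:                        # no improvement for 5 epochs: stop early
            break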

@larsmans (Member)

2012/6/12 Andreas Mueller
reply@reply.github.com:

> I have trouble testing the NB classifiers. Maybe @larsmans has some opinion on that. BernoulliNB by default thresholds at zero, so it can not handle non-negative data. MultinomialNB on the other hand assumes the entries to be counts, in particular non-negative. Neither raised any error when given arbitrary data.

Well, BernoulliNB can handle negative data, though it doesn't really make much sense to feed it that. It wants booleans.

> My understanding is that MultinomialNB should raise an error on negative data.

That's right. It didn't, so far, because our tf-idf implementation was buggy and could produce negative values.

@amueller (Member Author)

@larsmans Maybe I misread you or you misread me: I wanted to say that BernoulliNB cannot handle all-positive data, since it thresholds at zero by default. It handles zero-mean data "reasonably".

About tf-idf: was that fixed? I remember some PR hanging around a while ago.

@amueller (Member Author)

@ogrisel wouldn't it also be good to stop on the training error? In my view, early stopping is more of a convergence check than a regularizer. And I'm a bit afraid of doing too much behind the scenes.

@larsmans (Member)

@amueller: ah, I get you now. No, BernoulliNB by default doesn't handle all-positive feature values nicely. I'm not sure whether that should raise an exception, though; a user might want to grid search for an appropriate threshold. (Or does the grid searcher catch exceptions from estimators?)
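
Grid searching the threshold is indeed possible, since BernoulliNB exposes it as the binarize constructor parameter (default 0.0); a sketch with today's module layout (the grid search lived in sklearn.grid_search at the time, and X, y are assumed):

from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB

grid = GridSearchCV(BernoulliNB(), {'binarize': [0.0, 0.25, 0.5, 1.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)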

I think @ogrisel fixed tf-idf (?).

@amueller (Member Author)

@larsmans: My idea was that a per-feature median might be a more natural default threshold than 0. That was just something that came to mind; I'm not sure if it makes sense. I'm really not familiar with the kind of data that is used with this classifier.
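
Concretely, since binarize is a single scalar threshold, a per-feature median would have to be applied as a preprocessing step; a sketch (X and y assumed):

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# binarize each feature at its own median instead of at a global threshold of 0;
# BernoulliNB(binarize=None) expects already-binarized input
X_bin = (X > np.median(X, axis=0)).astype(np.float64)
clf = BernoulliNB(binarize=None).fit(X_bin, y)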

@larsmans (Member)

The common application domains of Bernoulli NB, AFAIK, are text classification and NLP tasks like word sense disambiguation (WSD) with word-window features. In both cases, the median is likely zero for each feature except the stop words.

@ogrisel (Member) commented Jun 13, 2012

Yes, it's "fixed": it no longer outputs negative values. However, it does so in a non-standard way and I might change it again in the future. The last time I tried to implement the canonical normalization from the textbooks, the k-means text clustering example behaved much worse, hence I decided to keep the current scheme for now, as I did not have time to investigate further.

@amueller (Member Author)

@larsmans @ogrisel Did the MultinomialNB check in #908.

@amueller (Member Author)

Any comments? Should I merge? There are some smaller fixes that would be good to have in master. I can also cherry-pick them and wait for a proper review on the testing stuff...

@ogrisel (Member) commented Jun 25, 2012

If all tests pass on your box, +1 for merging. More tests is always better :)

@amueller (Member Author)

Ok then I'll have a look again tonight and if everything works I'll merge.

amueller added 22 commits June 26, 2012 14:16

Among them, a commit whose message reads "Turned up alpha and n_iter. This corresponds to more regularization and more careful SGD. On which kind of problems do the old defaults work?", and a later commit that reverts it (b659bc399da94be0de4a970da15e69cb778d4101).
@amueller (Member Author)

Runs fine. Still needs work but should be a good place to start from.
Merging now :)

amueller added a commit that referenced this pull request Jun 26, 2012
amueller merged commit 549d82f into scikit-learn:master Jun 26, 2012
amueller mentioned this pull request Jun 26, 2012
@ogrisel (Member) commented Jun 27, 2012

It seems that some of the new tests are broken on Python 2.6: https://jenkins.shiningpanda.com/scikit-learn/job/python-2.6-numpy-1.3.0-scipy-0.7.2/703/consoleText

@amueller (Member Author)

Yes, I saw that but didn't have time to fix it. I hope I can do it later today.

@amueller (Member Author)

Should be fixed now.

@ogrisel (Member) commented Jun 27, 2012

Thanks!
