
[MRG] Use tag requires_positive_X for NMF + ComplementNB #14680


Merged: 7 commits into scikit-learn:master on Aug 20, 2019

Conversation

@agramfort (Member) commented Aug 18, 2019

I need the tag requires_positive_X for #13246 following the suggestion of @rth
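For context, a minimal sketch of how an estimator opts into this tag via `_more_tags`, the tag mechanism used by `check_estimator` at the time of this PR. `MyNonNegativeEstimator` is a hypothetical example, not code from this PR:

```python
# Hypothetical estimator declaring the requires_positive_X tag.
# _more_tags is merged into the default tags by BaseEstimator.
from sklearn.base import BaseEstimator


class MyNonNegativeEstimator(BaseEstimator):
    """Hypothetical estimator whose fit only accepts non-negative X."""

    def _more_tags(self):
        return {'requires_positive_X': True}


tags = MyNonNegativeEstimator()._more_tags()
```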

@agramfort agramfort changed the title Tag positive x Use tag requires_positive_X for NMF + ComplementNB Aug 18, 2019
@agramfort agramfort changed the title Use tag requires_positive_X for NMF + ComplementNB [MRG] Use tag requires_positive_X for NMF + ComplementNB Aug 18, 2019
@glemaitre (Member):

Shall we have a check_non_negative for NMF as well? We could also add a common test that checks that an error is raised when the tag is True.
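A sketch of the common test suggested here, assuming the existing `sklearn.utils.validation.check_non_negative` helper: data containing negative values should raise a ValueError naming the caller.

```python
# check_non_negative(X, whom) raises ValueError("Negative values in
# data passed to <whom>") when X contains negative entries.
import numpy as np
from sklearn.utils.validation import check_non_negative

X_neg = np.array([[-1.0, 1.0], [-1.0, 1.0]])

try:
    check_non_negative(X_neg, "NMF (input X)")
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)
```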

@wdevazelhes (Contributor):

I had been working on a version of this locally, but I just saw this PR (see #14685).
Shouldn't ComplementNB also have the requires_positive_X flag? Looking at the user guide, it seems to be an enhancement of MultinomialNB. Maybe BernoulliNB too, since it takes binary inputs (0 or 1, so positive)?

Also, maybe this could be a good occasion to update the X -= X.min() statements that deal with this, using an enforce_estimator_tags_X helper like the enforce_estimator_tags_y one introduced in #14095? (I tried to implement this in #14685.)

@wdevazelhes (Contributor):

Also LatentDirichletAllocation ?

```python
X = self._check_non_neg_array(X, "LatentDirichletAllocation.fit")
```

@glemaitre (Member) left a comment

LGTM. Just add an entry in the what's new, since this changes check_estimator, which is public.

@qinhanmin2014 (Member) left a comment

We can always add this tag to more classes/functions in the future if needed.

```python
estimator = clone(estimator_orig)
assert_raises_regex(ValueError, "Negative values in data passed to",
                    estimator.fit, X, y)
```

Member:

a redundant blank line (a flake8 error that can't be detected in the PR diff)

@agramfort (Member, Author):

Sorry, I don't get what you mean here. My editor does not complain on these lines (but it does elsewhere in the file).

Member:

There are 3 blank lines between this function and the next one:
E303 too many blank lines (3)

@agramfort (Member, Author):

my bad... that's what happens when you have so many flake8 errors in the file that you stop paying attention to them :(

```python
X = np.array([[-1., 1], [-1., 1]])
y = np.array([1, 2])
estimator = clone(estimator_orig)
assert_raises_regex(ValueError, "Negative values in data passed to",
                    estimator.fit, X, y)
```
Member:

We've decided to deprecate assert_raises_regex, see #14216.
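For illustration, the pytest-based replacement for `assert_raises_regex` discussed in #14216 is `pytest.raises` with a `match` pattern. A sketch of what the check would look like in a pytest test file, using NMF's existing non-negativity validation:

```python
# pytest.raises(..., match=...) asserts that the block raises the given
# exception with a message matching the regex. NMF validates its input
# with check_non_negative, so negative X triggers this ValueError.
import numpy as np
import pytest
from sklearn.decomposition import NMF

X = np.array([[-1.0, 1.0], [-1.0, 1.0]])
with pytest.raises(ValueError, match="Negative values in data passed to"):
    NMF(n_components=1).fit(X)
```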

@agramfort (Member, Author):

That would require importing pytest in a non-test file. Do we have such a context manager in the testing module?

Member:

I don't know, let's leave it as it is.

@qinhanmin2014 qinhanmin2014 merged commit 951c501 into scikit-learn:master Aug 20, 2019
@amueller (Member):

We should also do the inverse: right now many tests use positive data, and they should use this tag instead.

@agramfort (Member, Author) commented Aug 20, 2019 via email

@amueller (Member) commented Aug 20, 2019

We get the tags in most of them already; why would that be an issue? We check whether they are multi-output in each check, I think.

@agramfort (Member, Author) commented Aug 20, 2019 via email

@wdevazelhes (Contributor) commented Aug 20, 2019

Not sure if that's what you're talking about, but if you want I can open a PR that uses an enforce_estimator_tags_X function to make X positive when the tags require it, just like the enforce_estimator_tags_y function introduced in #14095 (see this closed PR: #14685)?
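A sketch of the proposed helper, by analogy with the enforce_estimator_tags_y pattern from #14095. The name and signature here are illustrative; the actual implementation landed in #14705 and may differ:

```python
# Hypothetical helper: modify X so it satisfies the estimator's input tags.
import numpy as np


def enforce_estimator_tags_X(tags, X):
    """Return X adjusted to satisfy the given estimator tags."""
    if tags.get('requires_positive_X', False):
        # Same trick as the inline "X -= X.min()" statements scattered
        # through the common tests: shift the data to be non-negative.
        X = X - X.min()
    return X


X = np.array([[-2.0, 1.0], [0.0, 3.0]])
X_pos = enforce_estimator_tags_X({'requires_positive_X': True}, X)
```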

@amueller (Member):

@wdevazelhes yes, ideally that'll replace the existing helpers on X.

arpanchowdhry added a commit to arpanchowdhry/scikit-learn that referenced this pull request Aug 20, 2019
@wdevazelhes (Contributor):

> @wdevazelhes yes, ideally that'll replace the existing helpers on X.

Done, in #14705.

5 participants