Skip to content

[MRG+1] Remove check_array for X in DummyRegressor/DummyClassifier and replace X.shape[0] with _num_samples #9835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 71 commits into from
Dec 20, 2017

Conversation

vrishank97
Copy link
Contributor

Fixes #9832

Adds dtype="None" and ensure_min_features=0 as params for check_array in DummyRegressor predict.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a test.

Perhaps just discard the check_array and use utils.validation._num_samples. Note sure if this will pass the common tests, though.

@vrishank97
Copy link
Contributor Author

I haven't made tests before. Would adding the code to reproduce the bug from the report to sklearn/tests/test_dummy.py in the form of function suffice?

@jnothman
Copy link
Member

jnothman commented Sep 27, 2017 via email

@vrishank97 vrishank97 changed the title Add params for check_array in DummyRegressor predict [MRG] Add params for check_array in DummyRegressor predict Sep 27, 2017


def test_dummy_regressor_on_object_value():
X = np.array(['foo', 'bar', 'baz'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this won't actually have dtype object (check for yourself) so the test is incorrectly named.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I'll change it to string.

Changed test name from test_dummy_regressor_on_object_value to test_dummy_regressor_on_string_values.
@vrishank97
Copy link
Contributor Author

I have changed the test name. @jnothman is there anything else that needs to be changed?

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@jnothman jnothman changed the title [MRG] Add params for check_array in DummyRegressor predict [MRG+1] Add params for check_array in DummyRegressor predict Nov 29, 2017
@qinhanmin2014
Copy link
Member

ping @vrishank97 Some personal concerns:
(1) Why is the check only modified in predict, not in fit? fit and perdict do the same check here.
(2) Seems that the test will pass even in current master. E.g.

X = np.array(['a', 'b', 'c']).reshape(-1, 1)
check_array(X, dtype='numeric') # no error

(3) Seems that the original code snippet in the issue will still get an error.

@qinhanmin2014
Copy link
Member

qinhanmin2014 commented Nov 30, 2017

So seems that there's something wrong with check_array (#10229).
@vrishank97 Could you please consider to copy the change to fit (and maybe DummyClassifier) and extend the test accordingly? Thanks.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @vrishank97 Could you please take some time to finish it. Otherwise I'll pick it up.
Even with current check_array, I believe we can still construct a regression test using astype("O").

@vrishank97
Copy link
Contributor Author

Hi. Some work came up earlier. I'll continue my work on this now. Sorry for the delay.

@amueller
Copy link
Member

We should probably check with a zero feature array as well, right?

@vrishank97
Copy link
Contributor Author

@qinhanmin2014
I've modified the check_array params for fit and predict in both DummyClassifier and DummyRegressor.
What other tests should I add here?

@qinhanmin2014
Copy link
Member

@vrishank97
There's predict_proba in DummyClassifier.
For the test, I think you need to cover all your changes. Also make sure that your test fail on master but success in the PR.

y_pred = cls.predict(X)
assert_array_equal(y_pred, y_expected)

test_dummy_regressor_on_string_values()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test dummy classifier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just made the test. Pushing now

@qinhanmin2014
Copy link
Member

@vrishank97 Please also update what's new(docs/whats_new/v0.20.rst) accordingly.

@vrishank97
Copy link
Contributor Author

I've updated the what's new section.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vrishank97 Please avoid making unrelated changes.

@@ -116,6 +116,10 @@ Classifiers and regressors
:class:`linear_model.BayesianRidge` for weighted linear regression.
:issue:`10111` by :user:`Peter St. John <pstjohn>`.

- :class:'dummy.DummyClassifier' and :class:'dummy.DummyRegresssor' now
only requires X to be an object with finite length or shape.
:issue:'9832' by :user:'Vrishank Bhardwaj <vrishank97>'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use ` instead of ', please follow what others' are doing and check the result from Circle CI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops.

@vrishank97
Copy link
Contributor Author

I've removed the typos. I think I also accidentally included a few unrelated commits by others while rebasing. How should I remove them?

@qinhanmin2014
Copy link
Member

At least you might recover the code according to the diff.

@vrishank97
Copy link
Contributor Author

Removed the diffs.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ping @jnothman

@jnothman jnothman merged commit 58ff9b8 into scikit-learn:master Dec 20, 2017
@lesteve
Copy link
Member

lesteve commented Dec 21, 2017

Wow I have no idea about the root reason for this ... but it looks like merging this one reverted some changes in master, probably because of the weird history ...

  • the diff in this PR changes 3 files
  • the squashed commit 58ff9b8 changes 7 files, reverting some changes in master ...

I'll fix that in master but this is something to keep in mind (check the history has no quirks) before merging.

@lesteve
Copy link
Member

lesteve commented Dec 21, 2017

Reverted the unwanted changes in master in e824fa1

@jnothman
Copy link
Member

jnothman commented Dec 21, 2017 via email

@qinhanmin2014
Copy link
Member

Thanks @lesteve Really sorry about that. Also curious about the reason.

@lesteve
Copy link
Member

lesteve commented Dec 21, 2017

Thanks @lesteve Really sorry about that. Also curious about the reason.

No worries! Anyone with a half-baked explanation I am interested ...

@jnothman
Copy link
Member

Perhaps it's worth notifying support@github.com that there is a bug in squash and merge...?

@lesteve
Copy link
Member

lesteve commented Dec 22, 2017

Maybe, before classifying it as a bug, it would be nice if someone can do a bit of research about

  • what git commands are executed in the background for the squash and merge
  • whether similar behaviours have been reported in the past

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants