DOC Scale data before using k-neighbours regression #31201
Conversation
Thanks for the PR. Since we now use custom estimator names, let's not use class names anymore.
Also, could you please fix the linting problems (use the formatters) as instructed in the automated comment above?
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Thanks for the PR @5nizza. These suggestions should fix the failing CI, but I strongly recommend using pre-commit to format your files before committing:
conda install pre-commit
Thank you for the PR, @5nizza!
I have one small nitpick. Otherwise, LGTM!
Co-authored-by: Virgil Chan <virchan.math@gmail.com>
Even though I agree that kNN requires feature scaling in general, in this case the features of the diabetes dataset already have the same scale, so it's normal that we don't really see an improvement with your PR.
For the California housing dataset we do have different scales, but the features "Population" and "AveOccup" have large outliers, so maybe using a StandardScaler is not the best option. How about we try a RobustScaler?
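A minimal sketch of that comparison, assuming the California housing data is fetched with `fetch_california_housing` and scored with default 5-fold cross-validation (this is an illustration of the suggestion, not code from the PR):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler

X, y = fetch_california_housing(return_X_y=True)

# Compare both scalers in front of the same k-NN regressor;
# RobustScaler centers on the median and scales by the IQR, so large
# outliers in "Population" and "AveOccup" influence the scaling less.
for scaler in (StandardScaler(), RobustScaler()):
    model = make_pipeline(scaler, KNeighborsRegressor())
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{scaler.__class__.__name__}: mean R^2 = {scores.mean():.3f}")
```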
Thanks for the comments.
Here is the summary of the updates.
Changes in file:
…target_estimator) and re-initialization of rng
Looks good to me. What do you think @ArturoAmorQ?
Co-authored-by: Tim Head <betatim@gmail.com>
Fixes #31200 by basically replacing `KNeighborsRegressor(...)` with `make_pipeline(StandardScaler(), KNeighborsRegressor(...))`.
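In other words, the change is roughly the following pattern (estimator parameters elided, so this is a sketch rather than the exact diff):

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Before: neighbor distances are computed on the raw, unscaled features.
model = KNeighborsRegressor()

# After: features are standardized before the neighbor search.
model = make_pipeline(StandardScaler(), KNeighborsRegressor())
```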