DOC Add links to preprocessing examples in docstrings and userguide #26877
Thanks for the PR @StefanieSenger. Here is a first batch of comments.
Apart from a bit of wording, LGTM :) Thanks again @StefanieSenger!
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Left a few comments. It's hard to review since this PR is rather large and touches many examples; it's easier if PRs have a smaller scope.
Resolved the CI issues, thank you @adrinjalali
Overall, LGTM, thanks for the PR. In addition to @adrinjalali's remarks, here are a few more.
sklearn/preprocessing/_data.py
Outdated
@@ -291,6 +290,10 @@ class MinMaxScaler(OneToOneFeatureMixin, TransformerMixin, BaseEstimator):
This transformation is often used as an alternative to zero mean,
unit variance scaling.

MinMaxScaler doesn't reduce the effect of outliers; it only linearily
scales them down. For an example visualization, refer to :ref:`Compare |
This statement holds for all scalers (`StandardScaler`, `RobustScaler`, `MaxAbsScaler` and `MinMaxScaler`). What is different is that the scale value found by `RobustScaler` is not sensitive to the presence of a few large marginal outliers, while it is for `StandardScaler` and even more so for `MinMaxScaler` and `MaxAbsScaler`.
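To make the point above concrete, here is a minimal sketch (not part of the PR) that fits each scaler on the same feature with and without a single large outlier and compares the fitted scale. The synthetic data, the outlier value of 1000, and the helper name `fitted_divisor` are assumptions chosen for illustration. For `MinMaxScaler` the sketch reads `data_range_` rather than `scale_`, because its `scale_` is a multiplicative factor while the other three scalers store the value the data is divided by.

```python
import numpy as np
from sklearn.preprocessing import (
    MaxAbsScaler,
    MinMaxScaler,
    RobustScaler,
    StandardScaler,
)

# Synthetic single-feature data (assumption): 1000 standard normal samples,
# then the same samples plus one large marginal outlier.
rng = np.random.RandomState(0)
X_clean = rng.normal(loc=0.0, scale=1.0, size=(1000, 1))
X_outlier = np.vstack([X_clean, [[1000.0]]])


def fitted_divisor(scaler, X):
    """Return the value the fitted scaler divides the feature by (hypothetical helper)."""
    scaler.fit(X)
    if isinstance(scaler, MinMaxScaler):
        # data_range_ is data_max_ - data_min_, the divisor used by MinMaxScaler.
        return float(scaler.data_range_[0])
    # StandardScaler: standard deviation; RobustScaler: interquartile range;
    # MaxAbsScaler: maximum absolute value.
    return float(scaler.scale_[0])


# Ratio of the divisor fitted with the outlier to the divisor fitted without it.
ratios = {}
for scaler_cls in (StandardScaler, RobustScaler, MinMaxScaler, MaxAbsScaler):
    clean = fitted_divisor(scaler_cls(), X_clean)
    with_outlier = fitted_divisor(scaler_cls(), X_outlier)
    ratios[scaler_cls.__name__] = with_outlier / clean

for name, ratio in ratios.items():
    print(f"{name}: scale grows by a factor of {ratio:.2f}")
```

A ratio close to 1 means the outlier barely changed the fitted scale; a large ratio means the inliers end up compressed into a much narrower interval after scaling.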
Yes, I see. To express how the `MinMaxScaler` differs from the other scalers concerning outliers, I have tried to come up with a new wording:
`MinMaxScaler` doesn't reduce the effect of outliers, but it linearly
scales them down into a fixed range, where the largest occurring data point
corresponds to the maximum value and the smallest one corresponds to the
minimum value.
What do you think?
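For reference, the behaviour described in the proposed wording can be checked with a small sketch. The toy array below is an assumption made up for illustration and is not taken from the PR or from the linked example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature (assumption): four inliers and one outlier.
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Default feature_range is (0, 1); it is spelled out here for clarity.
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)

# The smallest data point maps to the range minimum, the largest to the maximum.
# The mapping is linear, so the outlier stays an outlier relative to the
# inliers, which end up squeezed near the bottom of the range.
print(X_scaled.ravel())
```

The outlier is not clipped or down-weighted; it only defines the top of the range, which is why the inliers occupy a small fraction of `[0, 1]`.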
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…er/scikit-learn into link_examples_preprocessing
…cikit-learn#26877) Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
This PR adds links to the examples of the Preprocessing section in the docstrings of the respective classes and functions.