[MRG] DOC Mention StandardScaler ddof #12950
Conversation
…he estimator of the standard deviation is the biased one
Put it in a Notes section and explain that the choice of ddof is unlikely to affect ML performance
What do you mean by 'Put it in a Notes section'? I've searched 'notes' in the contributing guidelines, but all I can find is a section about 'working notes'.
…d affect model performance.
See https://scikit-learn.org/dev/modules/generated/sklearn.model_selection.StratifiedKFold.html for notes section.
I agree that we should emphasize the difference in the Notes section, since we're not going to modify our implementation.
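To make the difference concrete, here is a minimal sketch (illustrative only, not part of the PR; assumes NumPy and scikit-learn are installed) checking that the `scale_` attribute of `StandardScaler` matches the `ddof=0` standard deviation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))

scaler = StandardScaler().fit(X)

# scale_ matches the biased (ddof=0) estimate, not the unbiased (ddof=1) one.
print(np.allclose(scaler.scale_, X.std(axis=0, ddof=0)))  # True
print(np.allclose(scaler.scale_, X.std(axis=0, ddof=1)))  # False
```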
sklearn/preprocessing/data.py
Outdated
@@ -478,7 +478,10 @@ class StandardScaler(BaseEstimator, TransformerMixin):

 where `u` is the mean of the training samples or zero if `with_mean=False`,
 and `s` is the standard deviation of the training samples or one if
-`with_std=False`.
+`with_std=False`. Note that `s` is a biased estimator of the standard
+deviation, equivalent to numpy.sqrt(numpy.var(x, ddof=0)), and that it is
use `numpy.std` instead?
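For reference, a quick check (an illustrative sketch, not from the PR) that the two spellings are identical, since `numpy.std` defaults to `ddof=0`:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# numpy.std defaults to ddof=0, so both expressions give the biased estimate.
print(np.sqrt(np.var(x, ddof=0)) == np.std(x, ddof=0))  # True
```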
….std instead of np.var
thanks @MarcoGorelli
sklearn/preprocessing/data.py
Outdated
@@ -574,6 +574,10 @@ class StandardScaler(BaseEstimator, TransformerMixin):
 -----
 NaNs are treated as missing values: disregarded in fit, and maintained in
 transform.
+
+We use a biased estimator for the standard deviation, equivalent to
+`numpy.std(x, ddof=0)`. Note, however, that the choice of `ddof` is
remove however?
Actually, before merge, can we get this note replicated in the `scale` function?
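For context, the `scale` function applies the same standardization as `StandardScaler`, so the same `ddof=0` note applies there; a small illustrative check (not part of the PR):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, scale

rng = np.random.RandomState(0)
X = rng.normal(size=(10, 2))

# scale() and StandardScaler use the same biased (ddof=0) standard deviation,
# so the standardized outputs agree.
print(np.allclose(scale(X), StandardScaler().fit_transform(X)))  # True
```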
This reverts commit 6a85a17.
Reference Issues/PRs
Fixes #7757
What does this implement/fix? Explain your changes.
Expands the documentation so it's clear that the estimate of the standard deviation in StandardScaler is the biased one (equivalent to `numpy.sqrt(numpy.var(x, ddof=0))`).
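As an aside (an illustrative sketch, not part of the PR), the biased and unbiased estimates differ only by the constant factor sqrt(n / (n - 1)), which is why the choice of `ddof` is unlikely to affect model performance for reasonably sized training sets:

```python
import numpy as np

# std(ddof=1) = std(ddof=0) * sqrt(n / (n - 1)); the factor tends to 1 as n grows.
for n in (10, 100, 10_000):
    print(n, np.sqrt(n / (n - 1)))
```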
Any other comments?