=============================================================
Feature 0 (median income in a block) and feature 5 (number of households) of
- the `California housing dataset
- <https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html>`_ have very
+ the :ref:`california_housing_dataset` have very
different scales and contain some very large outliers. These two
characteristics make it difficult to visualize the data and, more
importantly, they can degrade the predictive performance of many machine
data within a pre-defined range.
Scalers are linear (or more precisely affine) transformers and differ from each
- other in the way to estimate the parameters used to shift and scale each
+ other in the way they estimate the parameters used to shift and scale each
feature.
- ``QuantileTransformer`` provides non-linear transformations in which distances
- between marginal outliers and inliers are shrunk. ``PowerTransformer`` provides
+ :class:`~sklearn.preprocessing.QuantileTransformer` provides non-linear
+ transformations in which distances
+ between marginal outliers and inliers are shrunk.
+ :class:`~sklearn.preprocessing.PowerTransformer` provides
non-linear transformations in which data is mapped to a normal distribution to
stabilize variance and minimize skewness.
     PowerTransformer(method='yeo-johnson').fit_transform(X)),
    ('Data after power transformation (Box-Cox)',
     PowerTransformer(method='box-cox').fit_transform(X)),
-    ('Data after quantile transformation (gaussian pdf)',
-     QuantileTransformer(output_distribution='normal')
-     .fit_transform(X)),
    ('Data after quantile transformation (uniform pdf)',
     QuantileTransformer(output_distribution='uniform')
     .fit_transform(X)),
+    ('Data after quantile transformation (gaussian pdf)',
+     QuantileTransformer(output_distribution='normal')
+     .fit_transform(X)),
    ('Data after sample-wise L2 normalizing',
     Normalizer().fit_transform(X)),
]
@@ -184,7 +185,7 @@ def plot_distribution(axes, X, y, hist_nbins=50, title="",
# figure will show a scatter plot of the full data set while the right figure
# will exclude the extreme values considering only 99 % of the data set,
# excluding marginal outliers. In addition, the marginal distributions for each
- # feature will be shown on the side of the scatter plot.
+ # feature will be shown on the sides of the scatter plot.


def make_plot(item_idx):
@@ -238,16 +239,18 @@ def make_plot(item_idx):
# StandardScaler
# --------------
#
- # ``StandardScaler`` removes the mean and scales the data to unit variance.
+ # :class:`~sklearn.preprocessing.StandardScaler` removes the mean and scales
+ # the data to unit variance. The scaling shrinks the range of the feature
+ # values as shown in the left figure below.
# However, the outliers have an influence when computing the empirical mean and
- # standard deviation which shrink the range of the feature values as shown in
- # the left figure below. Note in particular that because the outliers on each
+ # standard deviation. Note in particular that because the outliers on each
# feature have different magnitudes, the spread of the transformed data on
# each feature is very different: most of the data lie in the [-2, 4] range for
# the transformed median income feature while the same data is squeezed in the
# smaller [-0.2, 0.2] range for the transformed number of households.
#
- # ``StandardScaler`` therefore cannot guarantee balanced feature scales in the
+ # :class:`~sklearn.preprocessing.StandardScaler` therefore cannot guarantee
+ # balanced feature scales in the
# presence of outliers.

make_plot(1)
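
# A minimal standalone sketch (synthetic single-feature data with one planted
# outlier, not the housing features above) of how one extreme value inflates
# the statistics that StandardScaler estimates:

import numpy as np
from sklearn.preprocessing import StandardScaler

X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])  # one huge outlier
# The outlier drags the mean to ~202 and inflates the standard deviation,
# so the four inliers collapse into a sliver around -0.5.
print(StandardScaler().fit_transform(X_toy).ravel())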
@@ -256,33 +259,38 @@ def make_plot(item_idx):
# MinMaxScaler
# ------------
#
- # ``MinMaxScaler`` rescales the data set such that all feature values are in
+ # :class:`~sklearn.preprocessing.MinMaxScaler` rescales the data set such that
+ # all feature values are in
# the range [0, 1] as shown in the right panel below. However, this scaling
- # compress all inliers in the narrow range [0, 0.005] for the transformed
+ # compresses all inliers into the narrow range [0, 0.005] for the transformed
# number of households.
#
- # As ``StandardScaler``, ``MinMaxScaler`` is very sensitive to the presence of
- # outliers.
+ # Both :class:`~sklearn.preprocessing.StandardScaler` and
+ # :class:`~sklearn.preprocessing.MinMaxScaler` are very sensitive to the
+ # presence of outliers.

make_plot(2)
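
# The same toy sketch (assumed synthetic data, not the housing features) shows
# the inlier compression: MinMaxScaler maps the minimum to 0 and the maximum
# to 1, so a single outlier at 1000 squeezes everything else near 0.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])
print(MinMaxScaler().fit_transform(X_toy).ravel())
# -> inliers land below 0.004; only the outlier reaches 1.0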

#############################################################################
# MaxAbsScaler
# ------------
#
- # ``MaxAbsScaler`` differs from the previous scaler such that the absolute
- # values are mapped in the range [0, 1]. On positive only data, this scaler
- # behaves similarly to ``MinMaxScaler`` and therefore also suffers from the
- # presence of large outliers.
+ # :class:`~sklearn.preprocessing.MaxAbsScaler` is similar to
+ # :class:`~sklearn.preprocessing.MinMaxScaler` except that the
+ # values are mapped in the range [0, 1]. On positive-only data, both scalers
+ # behave similarly.
+ # :class:`~sklearn.preprocessing.MaxAbsScaler` therefore also suffers from
+ # the presence of large outliers.

make_plot(3)
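
# A hedged sketch of the similarity on positive-only data (synthetic toy
# feature again): MaxAbsScaler just divides by max(|x|), so it gives almost
# the same result as MinMaxScaler, up to the missing minimum shift.

import numpy as np
from sklearn.preprocessing import MaxAbsScaler

X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])
print(MaxAbsScaler().fit_transform(X_toy).ravel())
# -> [0.001 0.002 0.003 0.004 1.]; the inliers are again crushed near zero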

##############################################################################
# RobustScaler
# ------------
#
- # Unlike the previous scalers, the centering and scaling statistics of this
- # scaler are based on percentiles and are therefore not influenced by a few
+ # Unlike the previous scalers, the centering and scaling statistics of
+ # :class:`~sklearn.preprocessing.RobustScaler`
+ # are based on percentiles and are therefore not influenced by a small
# number of very large marginal outliers. Consequently, the resulting range of
# the transformed feature values is larger than for the previous scalers and,
# more importantly, is approximately similar: for both features most of the
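
# A standalone sketch of the percentile logic (synthetic toy data, not the
# housing features): RobustScaler subtracts the median and divides by the
# interquartile range, so the planted outlier barely affects the inliers.

import numpy as np
from sklearn.preprocessing import RobustScaler

X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])
print(RobustScaler().fit_transform(X_toy).ravel())
# median = 3 and IQR = 2, so the inliers map to [-1, 0.5] while the
# outlier stays isolated at ~498.5 instead of compressing the inliers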
@@ -297,53 +305,57 @@ def make_plot(item_idx):
# PowerTransformer
# ----------------
#
- # ``PowerTransformer`` applies a power transformation to each feature to make
- # the data more Gaussian-like. Currently, ``PowerTransformer`` implements the
- # Yeo-Johnson and Box-Cox transforms. The power transform finds the optimal
- # scaling factor to stabilize variance and mimimize skewness through maximum
- # likelihood estimation. By default, ``PowerTransformer`` also applies
- # zero-mean, unit variance normalization to the transformed output. Note that
+ # :class:`~sklearn.preprocessing.PowerTransformer` applies a power
+ # transformation to each feature to make the data more Gaussian-like in order
+ # to stabilize variance and minimize skewness. Currently, the Yeo-Johnson
+ # and Box-Cox transforms are supported and the optimal
+ # scaling factor is determined via maximum likelihood estimation in both
+ # methods. By default, :class:`~sklearn.preprocessing.PowerTransformer` applies
+ # zero-mean, unit variance normalization. Note that
# Box-Cox can only be applied to strictly positive data. Income and number of
# households happen to be strictly positive, but if negative values are present
- # the Yeo-Johnson transformed is to be preferred.
+ # the Yeo-Johnson transform is preferred.

make_plot(5)
make_plot(6)
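
# A minimal sketch on synthetic skewed data (log-normal draws, not the housing
# features): ``lambdas_`` exposes the per-feature exponent fitted by maximum
# likelihood for each method.

import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.RandomState(0)
X_skewed = rng.lognormal(size=(100, 1))  # strictly positive, right-skewed
for method in ('yeo-johnson', 'box-cox'):
    pt = PowerTransformer(method=method)  # standardize=True by default
    Xt = pt.fit_transform(X_skewed)
    # After the transform the feature is roughly zero-mean, unit-variance
    print(method, pt.lambdas_, Xt.mean(), Xt.std())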

- ##############################################################################
- # QuantileTransformer (Gaussian output)
- # -------------------------------------
- #
- # ``QuantileTransformer`` has an additional ``output_distribution`` parameter
- # allowing to match a Gaussian distribution instead of a uniform distribution.
- # Note that this non-parametetric transformer introduces saturation artifacts
- # for extreme values.
-
- make_plot(7)
-
###################################################################
# QuantileTransformer (uniform output)
# ------------------------------------
#
- # ``QuantileTransformer`` applies a non-linear transformation such that the
+ # :class:`~sklearn.preprocessing.QuantileTransformer` applies a non-linear
+ # transformation such that the
# probability density function of each feature will be mapped to a uniform
- # distribution. In this case, all the data will be mapped in the range [0, 1],
- # even the outliers which cannot be distinguished anymore from the inliers.
+ # or Gaussian distribution. In this case, all the data, including outliers,
+ # will be mapped to a uniform distribution on the range [0, 1], making
+ # outliers indistinguishable from inliers.
+ #
+ # :class:`~sklearn.preprocessing.RobustScaler` and
+ # :class:`~sklearn.preprocessing.QuantileTransformer` are robust to outliers in
+ # the sense that adding or removing outliers in the training set will yield
+ # approximately the same transformation. But contrary to
+ # :class:`~sklearn.preprocessing.RobustScaler`,
+ # :class:`~sklearn.preprocessing.QuantileTransformer` will also automatically
+ # collapse any outliers by setting them to the a priori defined range boundaries
+ # (0 and 1). This can result in saturation artifacts for extreme values.
+
+ make_plot(7)
+
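
# A standalone sketch (synthetic toy feature, not the housing data): the
# transform keeps only ranks, so the huge gap between the inliers and the
# outlier disappears. ``n_quantiles`` is lowered to the sample count to fit
# such tiny data.

import numpy as np
from sklearn.preprocessing import QuantileTransformer

X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])
qt_uniform = QuantileTransformer(n_quantiles=5, output_distribution='uniform')
print(qt_uniform.fit_transform(X_toy).ravel())
# -> [0. 0.25 0.5 0.75 1.]; the outlier is pinned to the boundary value 1
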
+ ##############################################################################
+ # QuantileTransformer (Gaussian output)
+ # -------------------------------------
#
- # As ``RobustScaler``, ``QuantileTransformer`` is robust to outliers in the
- # sense that adding or removing outliers in the training set will yield
- # approximately the same transformation on held out data. But contrary to
- # ``RobustScaler``, ``QuantileTransformer`` will also automatically collapse
- # any outlier by setting them to the a priori defined range boundaries (0 and
- # 1).
+ # To map to a Gaussian distribution, set the parameter
+ # ``output_distribution='normal'``.

make_plot(8)
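
# The same toy sketch with Gaussian output: the ranks are pushed through the
# normal inverse CDF, and the boundary ranks are clipped internally, which is
# the saturation effect for extreme values mentioned above.

import numpy as np
from sklearn.preprocessing import QuantileTransformer

X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])
qt_normal = QuantileTransformer(n_quantiles=5, output_distribution='normal')
# Extreme ranks map to large but finite values rather than +/- infinity
print(qt_normal.fit_transform(X_toy).ravel())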

##############################################################################
# Normalizer
# ----------
#
- # The ``Normalizer`` rescales the vector for each sample to have unit norm,
+ # The :class:`~sklearn.preprocessing.Normalizer` rescales the vector for each
+ # sample to have unit norm,
# independently of the distribution of the samples. It can be seen on both
# figures below where all samples are mapped onto the unit circle. In our
# example the two selected features have only positive values; therefore the