[MRG+1] TheilSen robust linear regression #2949

Merged 99 commits on Nov 20, 2014

Commits
d4519d2
Added multiple linear Theil-Sen regression
FlorianWilhelm Jan 14, 2014
b3e8a64
Added an example and documentation for Theil-Sen
FlorianWilhelm Jan 17, 2014
7dea60d
Added subpopulation parameter to Theil-Sen
FlorianWilhelm Jan 27, 2014
8d48d45
Added parallelization support to TheilSen Estimator
FlorianWilhelm Mar 1, 2014
86a7461
Improved parallelization for Theilsen estimator
FlorianWilhelm Mar 2, 2014
601b82b
Merge branch 'master' into theilsen
FlorianWilhelm Mar 2, 2014
c19d967
Cleanups and corrections for Theil-Sen regression.
FlorianWilhelm Mar 4, 2014
8653c20
Removed subpopulation=None option in TheilSen
FlorianWilhelm Mar 8, 2014
960c45b
xrange fix for Python3 in TheilSen
FlorianWilhelm Mar 8, 2014
0eda5bf
FIX Theil-Sen unittest for older Scipy Version
FlorianWilhelm Mar 9, 2014
d133712
FIX that some functions in Theil-Sen were public
FlorianWilhelm Mar 9, 2014
d1d221b
FIX usage of linalg from Scipy in Theil-Sen
FlorianWilhelm Mar 9, 2014
61a5195
FIX: Let Theil-Sen handle n_samples < n_features case
FlorianWilhelm Mar 9, 2014
13d9212
FIX: Python 2.6 format syntax in Theil-Sen
FlorianWilhelm Mar 9, 2014
3e8ca8c
Vectorization of theilsen._modweiszfeld_step
FlorianWilhelm Mar 9, 2014
1d75896
FIX: Parallel unittests for Theil-Sen estimator
FlorianWilhelm Mar 9, 2014
b374e7e
FIX: TheilSen supports old Numpy versions
Mar 10, 2014
0b10f49
DOC: Comparison of Theil-Sen and RANSAC
FlorianWilhelm Mar 22, 2014
ca9a275
DOC: Fixed typo in Theil-Sen example.
FlorianWilhelm Mar 22, 2014
675babb
FIX: Some coding style fixes in TheilSen unittest.
FlorianWilhelm Mar 22, 2014
279dc90
FIX: Reduced the runtime of the TheilSen unittest.
FlorianWilhelm Mar 23, 2014
39efe30
DOC: Small corrections in the docs of Theil-Sen
FlorianWilhelm Mar 23, 2014
689bf29
DOC: improve Theil-Sen vs RANSAC example
GaelVaroquaux Mar 23, 2014
b2366b2
ENH: speed up theilsen
GaelVaroquaux Mar 23, 2014
43d76ac
DOC: Explanation when TheilSen outperforms RANSAC.
FlorianWilhelm Mar 24, 2014
8935df6
Merge branch 'master' into theilsen
Jul 17, 2014
4b6768f
Merge branch 'master' into theilsen
FlorianWilhelm Jul 17, 2014
e4503b1
Fix for old Numpy 1.6.3
Jul 18, 2014
1119e3b
Added comments to better explain last commit
Jul 18, 2014
b6ed218
Use string argument for legend's loc parameter
Jul 18, 2014
5189cfb
ENH: add a median absolute deviation metric
GaelVaroquaux Jul 18, 2014
434c20d
DOC: add an example of robust fitting
GaelVaroquaux Jul 18, 2014
e7fa78d
TST: fix doctest
GaelVaroquaux Jul 19, 2014
110d3f2
DOC: better documentation for robust models
GaelVaroquaux Jul 19, 2014
d3cddfe
API: naming: CamelCase class -> camel_case function
GaelVaroquaux Jul 19, 2014
c03d4bd
Merge remote-tracking branch 'gvaroquaux/pr_2949' into theilsen
FlorianWilhelm Jul 19, 2014
2fcba22
Merge remote-tracking branch 'upstream/master' into theilsen
FlorianWilhelm Jul 19, 2014
181c720
TST: Cleanups in test_theil_sen
FlorianWilhelm Sep 5, 2014
e44f9c7
COSMIT: Renamed _lse to _lstsq in theil_sen.py
FlorianWilhelm Sep 5, 2014
09d44f1
ENH: Removed shared-memory parallelism in theil_sen
FlorianWilhelm Sep 5, 2014
970e4c2
COSMIT: Inlined two methods in theil_sen.py
FlorianWilhelm Sep 5, 2014
00071bd
ENH: Use warnings instead of logging in theil_sen
FlorianWilhelm Sep 5, 2014
1314697
ENH: Removed _split_indices method in TheilSen
FlorianWilhelm Sep 5, 2014
d105145
ENH: Rewrote TheilSen._get_n_jobs as a function
FlorianWilhelm Sep 5, 2014
43b5e43
COSMIT: More explicit names for vars in theil_sen
FlorianWilhelm Sep 6, 2014
8dbec4f
Merge branch 'master' into theilsen
FlorianWilhelm Sep 6, 2014
3e70d84
FIX: usage of check_array in theil_sen
FlorianWilhelm Sep 6, 2014
030f7e1
FIX: Use check_consistent_length in theil_sen
FlorianWilhelm Sep 6, 2014
9db5ba1
ENH: Refactoring in theil_sen
FlorianWilhelm Sep 6, 2014
b28e2b6
ENH: Removed unnecessary generator in theil_sen
FlorianWilhelm Sep 6, 2014
49d3043
FIX: doctest of get_n_jobs
FlorianWilhelm Sep 6, 2014
7d2179b
ENH: Theil-Sen vs. RANSAC example
FlorianWilhelm Sep 7, 2014
b0ab714
Merge branch 'master' into theilsen
FlorianWilhelm Sep 24, 2014
a800040
COSMIT: Small changes regarding Theil-Sen
FlorianWilhelm Sep 24, 2014
3de9f22
DOC: Better documentation for Theil-Sen
FlorianWilhelm Sep 24, 2014
ef17f80
ENH: Improvements in the Theil-Sen regressor
FlorianWilhelm Sep 24, 2014
961bf67
ENH: Shortcut for 1d case in spatial median
FlorianWilhelm Sep 25, 2014
374ac6d
ENH: Avoid trailing \ in test_theilsen imports
FlorianWilhelm Sep 25, 2014
cf3b476
FIX: TheilSen -> TheilSenRegressor in docs
FlorianWilhelm Oct 2, 2014
ef43e7d
DOC: Narrative doc for median_absolute_error
Oct 9, 2014
53d1711
ENH: Reworked _modified_weiszfeld_step
Oct 9, 2014
afda6b0
DOC: Improved _spatial_median docs
Oct 9, 2014
89d5aa0
COSMIT: Replaced xrange by range
Oct 9, 2014
51cef86
COSMIT: Renamed y -> x_old in _modified_weiszfeld_step
Oct 9, 2014
d3de579
COSMIT: Renamed spmed[_old] to spatial_median[_old]
Oct 9, 2014
1f67b56
COSMIT: Break and for .. else in _spatial_median
Oct 9, 2014
9ce2444
ENH: Reworked _lstsq in theil_sen.py
Oct 9, 2014
389d13d
ENH: Replace AssertionError by ValueError
Oct 9, 2014
25746e2
COSMIT: Improved error message in theil_sen.py
Oct 9, 2014
8955fd0
COSMIT: Renamed n_all to all_combinations in theil_sen.py
Oct 9, 2014
961a2ea
COSMIT: Consistent naming for n_subpop
Oct 9, 2014
27ef88f
COSMIT: Fixed pep8 problem in theil_sen.py
Oct 9, 2014
4633c52
DOC: Moved notes section to long description
Oct 9, 2014
a5d2fd9
Merge branch 'master' into theilsen
Oct 9, 2014
aadd504
DOC: Added return doc to _lstsq
FlorianWilhelm Oct 10, 2014
562f4e4
COSMIT: Better variable names for _modified_weiszfeld_step
FlorianWilhelm Oct 10, 2014
d9bdddd
COSMIT: Moved epsilon to module level
FlorianWilhelm Oct 10, 2014
9c1b0c0
COSMIT: Some empty lines for better readability
FlorianWilhelm Oct 10, 2014
076bad6
ENH: Removed median_absolute_error due to PR #3761
FlorianWilhelm Oct 12, 2014
5b3a387
ENH: Made n_subpopulation a fit parameter
FlorianWilhelm Oct 12, 2014
9a3ad6a
COSMIT: Some renamings and PEP8 compliance
Oct 13, 2014
9ebed28
Merge branch 'master' into theilsen
Oct 13, 2014
657c86f
ENH: Removed 1d shortcut in _spatial_median again
Oct 13, 2014
8e5347b
COSMIT: Clearer slicing syntax in _modified_weiszfeld_step
Oct 13, 2014
6564d23
ENH: Fixed confusing X = X.T renaming
Oct 13, 2014
7a1a35d
COSMIT: Some renamings in _lstsq
Oct 13, 2014
923619f
FIX: Fix of last merge with master
Oct 13, 2014
c377ccd
COSMIT: Renamings for easier understanding
Oct 13, 2014
ec663e6
COSMIT: Another slicing syntax cleanup
Oct 13, 2014
29c21f0
Revert "ENH: Removed 1d shortcut in _spatial_median again"
Oct 13, 2014
777cd87
ENH: sample without replacement
Oct 13, 2014
64fc5c9
Merge branch 'master' into theilsen
Oct 14, 2014
4fcd6b8
COSMIT: pep8 and renaming
Oct 16, 2014
b8cd60b
COSMIT: replaced assert by assert_less/greater etc.
Oct 16, 2014
d5ec39b
TEST: No console output during unit tests
Oct 17, 2014
f9ecbf7
ENH: Always set random_state in unit tests
Oct 17, 2014
f92074a
ENH: Speedup of unit tests
Oct 17, 2014
c54c17e
COSMIT: Better consistency
Oct 17, 2014
2137d82
ENH: Added random_state in plot_theilsen.py
Oct 17, 2014
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -659,6 +659,7 @@ From text
linear_model.RidgeCV
linear_model.SGDClassifier
linear_model.SGDRegressor
linear_model.TheilSenRegressor

.. autosummary::
:toctree: generated/
156 changes: 151 additions & 5 deletions doc/modules/linear_model.rst
@@ -789,13 +789,90 @@ For classification, :class:`PassiveAggressiveClassifier` can be used with
<http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf>`_
K. Crammer, O. Dekel, J. Keshat, S. Shalev-Shwartz, Y. Singer - JMLR 7 (2006)

Robustness regression: outliers and modeling errors
=====================================================

Robust regression is concerned with fitting a regression model in the
presence of corrupt data: either outliers, or errors in the model.

.. figure:: ../auto_examples/linear_model/images/plot_theilsen_001.png
:target: ../auto_examples/linear_model/plot_theilsen.html
:scale: 50%
:align: center

Different scenarios and useful concepts
----------------------------------------

There are different things to keep in mind when dealing with data
corrupted by outliers:

.. |y_outliers| image:: ../auto_examples/linear_model/images/plot_robust_fit_003.png
:target: ../auto_examples/linear_model/plot_robust_fit.html
:scale: 60%

.. |X_outliers| image:: ../auto_examples/linear_model/images/plot_robust_fit_002.png
:target: ../auto_examples/linear_model/plot_robust_fit.html
:scale: 60%

.. |large_y_outliers| image:: ../auto_examples/linear_model/images/plot_robust_fit_005.png
:target: ../auto_examples/linear_model/plot_robust_fit.html
:scale: 60%

* **Outliers in X or in y**?

==================================== ====================================
Outliers in the y direction Outliers in the X direction
==================================== ====================================
|y_outliers| |X_outliers|
==================================== ====================================

* **Fraction of outliers versus amplitude of error**

The number of outlying points matters, but also how far they lie from
the rest of the data.

==================================== ====================================
Small outliers Large outliers
==================================== ====================================
|y_outliers| |large_y_outliers|
==================================== ====================================

An important notion in robust fitting is the breakdown point: the
fraction of data that can be outlying before the fit starts missing the
inlying data.

Note that in general, robust fitting in high-dimensional settings (large
``n_features``) is very hard. The robust models presented here will probably
not work in those settings.


.. topic:: **Trade-offs: which estimator?**

Scikit-learn provides two robust regression estimators:
:ref:`RANSAC <ransac_regression>` and
:ref:`Theil Sen <theil_sen_regression>`.

* :ref:`RANSAC <ransac_regression>` is faster, and scales much better
with the number of samples

* :ref:`RANSAC <ransac_regression>` will deal better with large
outliers in the y direction (most common situation)

* :ref:`Theil Sen <theil_sen_regression>` will cope better with
medium-size outliers in the X direction, but this property disappears
in high-dimensional settings.

When in doubt, use :ref:`RANSAC <ransac_regression>`; a rough comparison
of the two estimators is sketched below.
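
As a rough illustration of these trade-offs, here is a minimal sketch (toy
data, illustrative only; exact coefficients will vary) fitting both
estimators on data with large outliers in the y direction::

    import numpy as np
    from sklearn.linear_model import RANSACRegressor, TheilSenRegressor

    rng = np.random.RandomState(42)
    X = rng.normal(size=(200, 1))
    y = 3 * X.ravel() + 2 + 0.1 * rng.normal(size=200)
    y[:40] += 20 * rng.normal(size=40)   # 20% large outliers in y

    ransac = RANSACRegressor(random_state=42).fit(X, y)
    theil_sen = TheilSenRegressor(random_state=42).fit(X, y)
    # Both should recover a slope close to 3; RANSAC exposes the fitted
    # base model as estimator_, Theil-Sen exposes coef_ directly.
    print(ransac.estimator_.coef_)
    print(theil_sen.coef_)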

.. _ransac_regression:

RANSAC: RANdom SAmple Consensus
--------------------------------

RANSAC (RANdom SAmple Consensus) fits a model from random subsets of
inliers from the complete data set.

It is an iterative method to estimate the parameters of a mathematical model.
RANSAC is a non-deterministic algorithm that produces a reasonable result only
with a certain probability, which depends on the number of iterations (see
the ``max_trials`` parameter). It is typically used for linear and non-linear
@@ -812,6 +889,9 @@ estimated only from the determined inliers.
:align: center
:scale: 50%
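
Before the step-by-step details, a minimal usage sketch (toy data assumed)
showing how the inlier/outlier split found by RANSAC can be inspected via
the fitted ``inlier_mask_`` attribute::

    import numpy as np
    from sklearn.linear_model import RANSACRegressor

    rng = np.random.RandomState(0)
    X = rng.normal(size=(100, 1))
    y = 2 * X.ravel() + 0.1 * rng.normal(size=100)
    y[:10] += 15                        # corrupt 10% of the targets

    ransac = RANSACRegressor(random_state=0).fit(X, y)
    inlier_mask = ransac.inlier_mask_   # boolean mask of detected inliers
    print(inlier_mask.sum(), "inliers,", (~inlier_mask).sum(), "outliers")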

Details of the algorithm
^^^^^^^^^^^^^^^^^^^^^^^^

Each iteration performs the following steps:

1. Select ``min_samples`` random samples from the original data and check
@@ -841,6 +921,7 @@ performance.
.. topic:: Examples:

* :ref:`example_linear_model_plot_ransac.py`
* :ref:`example_linear_model_plot_robust_fit.py`

.. topic:: References:

@@ -853,6 +934,68 @@
<http://www.bmva.org/bmvc/2009/Papers/Paper355/Paper355.pdf>`_
Sunglok Choi, Taemin Kim and Wonpil Yu - BMVC (2009)

.. _theil_sen_regression:

Theil-Sen estimator: generalized-median-based estimator
--------------------------------------------------------

The :class:`TheilSenRegressor` estimator uses a generalization of the median in
multiple dimensions. It is thus robust to multivariate outliers. Note however
that the robustness of the estimator decreases quickly with the dimensionality
of the problem. It loses its robustness properties and becomes no
better than ordinary least squares in high dimensions.
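
In the classical univariate case, the Theil-Sen estimate of the slope is
simply the median of the slopes through all pairs of points. A minimal NumPy
sketch of this special case, which :class:`TheilSenRegressor` generalizes to
multiple dimensions (toy data; the helper name is ours)::

    import itertools
    import numpy as np

    def theil_sen_slope_1d(x, y):
        # Median of all pairwise slopes: the classical univariate Theil-Sen.
        slopes = [(y[j] - y[i]) / (x[j] - x[i])
                  for i, j in itertools.combinations(range(len(x)), 2)
                  if x[j] != x[i]]
        return np.median(slopes)

    rng = np.random.RandomState(0)
    x = rng.normal(size=50)
    y = 3 * x + 2 + 0.1 * rng.normal(size=50)
    y[:10] += 10                     # 20% outliers barely move the median
    print(theil_sen_slope_1d(x, y))  # close to the true slope of 3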

.. topic:: Examples:

* :ref:`example_linear_model_plot_theilsen.py`
* :ref:`example_linear_model_plot_robust_fit.py`

.. topic:: References:

* http://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator

Theoretical considerations
^^^^^^^^^^^^^^^^^^^^^^^^^^

:class:`TheilSenRegressor` is comparable to :ref:`Ordinary Least Squares
(OLS) <ordinary_least_squares>` in terms of asymptotic efficiency and as an
unbiased estimator. In contrast to OLS, Theil-Sen is a non-parametric
method, which means it makes no assumptions about the underlying
distribution of the data. Since Theil-Sen is a median-based estimator, it
Review comment from ogrisel (Member):
It feels strange to call a linear model a non-parametric model: the number of parameters is fixed (it's the number of features + 1) and it assumes that the output variable is linearly dependent on the input variables (excluding the outliers). I would not call that a non-parametric model.

To me a non-parametric model can grow in complexity with the number of samples in the training set. Here the complexity (the number of parameters) is bounded by the number of features, which is assumed to be fixed.

Reply from FlorianWilhelm (Contributor, Author):

@ogrisel From Wikipedia: "In statistics, a parametric model or parametric family or finite-dimensional model is a family of distributions that can be described using a finite number of parameters."
Least squares, for instance, makes the assumption that the error is normally distributed; the parameters of this distribution are then found by maximum likelihood. At least, this is one way to derive the least squares method. On the other hand, Theil-Sen makes no assumption about the underlying distribution; the method does not try to find the parameters of some distribution and is therefore considered non-parametric, although some parameters are of course determined, just not the parameters of a distribution.
I think this is just a question of how one defines parametric and non-parametric. Can I keep it that way, since this is consistent with the definition in statistics?

is more robust against corrupted data, a.k.a. outliers. In the univariate
setting, Theil-Sen has a breakdown point of about 29.3% in the case of
simple linear regression, which means that it can tolerate up to 29.3% of
arbitrarily corrupted data.
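
A rough numerical illustration of the breakdown point (toy data; the exact
numbers are not guaranteed, but the fit typically degrades once the corrupted
fraction exceeds roughly 29.3%)::

    import numpy as np
    from sklearn.linear_model import TheilSenRegressor

    rng = np.random.RandomState(0)
    X = rng.normal(size=(200, 1))
    y_clean = 3 * X.ravel() + 2

    for frac in (0.25, 0.35):        # just below and above the ~29.3% point
        y = y_clean.copy()
        n_corrupt = int(frac * len(y))
        y[:n_corrupt] = 50 + 10 * rng.normal(size=n_corrupt)
        slope = TheilSenRegressor(random_state=0).fit(X, y).coef_[0]
        print("%.0f%% corrupted -> slope %.2f" % (100 * frac, slope))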

.. figure:: ../auto_examples/linear_model/images/plot_theilsen_001.png
:target: ../auto_examples/linear_model/plot_theilsen.html
:align: center
:scale: 50%

The implementation of :class:`TheilSenRegressor` in scikit-learn follows a
generalization to a multivariate linear regression model [#f1]_ using the
spatial median, which is a generalization of the median to multiple
dimensions [#f2]_.
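
The spatial median (also known as the geometric median) has no closed form in
more than one dimension and is typically computed by a Weiszfeld-type
fixed-point iteration. A simplified sketch of the plain iteration (the
implementation uses a modified, more careful variant)::

    import numpy as np

    def spatial_median(X, n_iter=100, tol=1e-6):
        # Plain Weiszfeld iteration for the geometric median of the rows of X.
        median = X.mean(axis=0)                      # start from the mean
        for _ in range(n_iter):
            dist = np.linalg.norm(X - median, axis=1)
            dist = np.where(dist < tol, tol, dist)   # guard against division by zero
            weights = 1.0 / dist
            new_median = (weights[:, None] * X).sum(axis=0) / weights.sum()
            if np.linalg.norm(new_median - median) < tol:
                break
            median = new_median
        return median

    rng = np.random.RandomState(0)
    points = rng.normal(size=(100, 3))
    print(spatial_median(points))    # near the origin for centered data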

In terms of time and space complexity, Theil-Sen scales according to

.. math::

    \binom{n_{samples}}{n_{subsamples}}

which makes it infeasible to apply exhaustively to problems with a
large number of samples and features. Therefore, the size of the
subpopulation can be chosen to limit the time and space complexity by
considering only a random subset of all possible combinations.
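
For instance, the exact number of combinations explodes quickly, which is
what the ``max_subpopulation`` parameter (10,000 by default) caps; a small
sketch of both sides of the trade-off::

    from scipy.special import comb
    from sklearn.linear_model import TheilSenRegressor

    print(comb(100, 2, exact=True))   # 4950 exact pairs for 100 samples
    print(comb(1000, 2, exact=True))  # 499500 -- already above the default cap

    # Whenever the exact count exceeds max_subpopulation, a random subset
    # of the combinations is used instead.
    reg = TheilSenRegressor(max_subpopulation=1000, random_state=0)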

.. topic:: Examples:

* :ref:`example_linear_model_plot_theilsen.py`

.. topic:: References:

.. [#f1] Xin Dang, Hanxiang Peng, Xueqin Wang and Heping Zhang: `Theil-Sen Estimators in a Multiple Linear Regression Model. <http://www.math.iupui.edu/~hpeng/MTSE_0908.pdf>`_

.. [#f2] T. Kärkkäinen and S. Äyrämö: `On Computation of Spatial Median for Robust Data Mining. <http://users.jyu.fi/~samiayr/pdf/ayramo_eurogen05.pdf>`_

.. _polynomial_regression:

@@ -965,3 +1108,6 @@ This way, we can solve the XOR problem with a linear classifier::
    >>> clf = Perceptron(fit_intercept=False, n_iter=10).fit(X, y)
    >>> clf.score(X, y)
    1.0



87 changes: 87 additions & 0 deletions examples/linear_model/plot_robust_fit.py
@@ -0,0 +1,87 @@
"""
Robust linear estimator fitting
===============================

Here a sine function is fit with a polynomial of order 3, for values
close to zero.

Robust fitting is demoed in different situations:

- No measurement errors, only modelling errors (fitting a sine with a
polynomial)

- Measurement errors in X

- Measurement errors in y

The mean squared error on new, non-corrupt data is used to judge
the quality of the prediction.

What we can see is that:

- RANSAC is good for strong outliers in the y direction

- TheilSen is good for small outliers, both in the X and y directions, but has
  a breakdown point above which it performs worse than OLS.

"""

from matplotlib import pyplot as plt
import numpy as np

from sklearn import linear_model, metrics
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

np.random.seed(42)

X = np.random.normal(size=400)
y = np.sin(X)
# Make sure that X is 2D
X = X[:, np.newaxis]

X_test = np.random.normal(size=200)
y_test = np.sin(X_test)
X_test = X_test[:, np.newaxis]

y_errors = y.copy()
y_errors[::3] = 3

X_errors = X.copy()
X_errors[::3] = 3

y_errors_large = y.copy()
y_errors_large[::3] = 10

X_errors_large = X.copy()
X_errors_large[::3] = 10

estimators = [('OLS', linear_model.LinearRegression()),
              ('Theil-Sen', linear_model.TheilSenRegressor(random_state=42)),
              ('RANSAC', linear_model.RANSACRegressor(random_state=42)), ]

x_plot = np.linspace(X.min(), X.max())

for title, this_X, this_y in [
        ('Modeling errors only', X, y),
        ('Corrupt X, small deviants', X_errors, y),
        ('Corrupt y, small deviants', X, y_errors),
        ('Corrupt X, large deviants', X_errors_large, y),
        ('Corrupt y, large deviants', X, y_errors_large)]:
    plt.figure(figsize=(5, 4))
    plt.plot(this_X[:, 0], this_y, 'k+')

    for name, estimator in estimators:
        model = make_pipeline(PolynomialFeatures(3), estimator)
        model.fit(this_X, this_y)
        mse = metrics.mean_squared_error(model.predict(X_test), y_test)
        y_plot = model.predict(x_plot[:, np.newaxis])
        plt.plot(x_plot, y_plot,
                 label='%s: error = %.3f' % (name, mse))

    plt.legend(loc='best', frameon=False,
               title='Mean squared error\n to non-corrupt data')
    plt.xlim(-4, 10.2)
    plt.ylim(-2, 10.2)
    plt.title(title)
plt.show()
108 changes: 108 additions & 0 deletions examples/linear_model/plot_theilsen.py
@@ -0,0 +1,108 @@
"""
====================
Theil-Sen Regression
====================

Computes a Theil-Sen Regression on a synthetic dataset.

See :ref:`theil_sen_regression` for more information on the regressor.

Compared to the OLS (ordinary least squares) estimator, the Theil-Sen
estimator is robust against outliers. It has a breakdown point of about 29.3%
in case of a simple linear regression which means that it can tolerate
arbitrary corrupted data (outliers) of up to 29.3% in the two-dimensional
case.

The estimation of the model is done by calculating the slopes and intercepts
of a subpopulation of all possible combinations of p subsample points. If an
intercept is fitted, p must be greater than or equal to n_features + 1. The
final slope and intercept is then defined as the spatial median of these
slopes and intercepts.

In certain cases Theil-Sen performs better than :ref:`RANSAC
<ransac_regression>`, which is also a robust method. This is illustrated in the
second example below, where outliers with respect to the x-axis perturb RANSAC.
Tuning the ``residual_threshold`` parameter of RANSAC remedies this, but in
general a priori knowledge about the data and the nature of the outliers is
needed.

Due to the computational complexity of Theil-Sen it is recommended to use it
only for small problems in terms of number of samples and features. For larger
problems the ``max_subpopulation`` parameter restricts the magnitude of all
possible combinations of p subsample points to a randomly chosen subset and
therefore also limits the runtime. Theil-Sen can thus be applied to larger
problems, with the drawback of losing some of its mathematical properties,
since it then works on a random subset.
"""

# Author: Florian Wilhelm -- <florian.wilhelm@gmail.com>
# License: BSD 3 clause

import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, TheilSenRegressor
from sklearn.linear_model import RANSACRegressor

print(__doc__)

estimators = [('OLS', LinearRegression()),
              ('Theil-Sen', TheilSenRegressor(random_state=42)),
              ('RANSAC', RANSACRegressor(random_state=42)), ]

##############################################################################
# Outliers only in the y direction

np.random.seed(0)
n_samples = 200
# Linear model y = 3*x + N(2, 0.1**2)
x = np.random.randn(n_samples)
w = 3.
c = 2.
noise = 0.1 * np.random.randn(n_samples)
y = w * x + c + noise
# 10% outliers
y[-20:] += -20 * x[-20:]
X = x[:, np.newaxis]

plt.plot(x, y, 'k+', mew=2, ms=8)
line_x = np.array([-3, 3])
for name, estimator in estimators:
    t0 = time.time()
    estimator.fit(X, y)
    elapsed_time = time.time() - t0
    y_pred = estimator.predict(line_x.reshape(2, 1))
    plt.plot(line_x, y_pred,
             label='%s (fit time: %.2fs)' % (name, elapsed_time))

plt.axis('tight')
plt.legend(loc='upper left')


##############################################################################
# Outliers in the X direction

np.random.seed(0)
# Linear model y = 3*x + N(2, 0.1**2)
x = np.random.randn(n_samples)
noise = 0.1 * np.random.randn(n_samples)
y = 3 * x + 2 + noise
# 10% outliers
x[-20:] = 9.9
y[-20:] += 22
X = x[:, np.newaxis]

plt.figure()
plt.plot(x, y, 'k+', mew=2, ms=8)

line_x = np.array([-3, 10])
for name, estimator in estimators:
    t0 = time.time()
    estimator.fit(X, y)
    elapsed_time = time.time() - t0
    y_pred = estimator.predict(line_x.reshape(2, 1))
    plt.plot(line_x, y_pred,
             label='%s (fit time: %.2fs)' % (name, elapsed_time))

plt.axis('tight')
plt.legend(loc='upper left')
plt.show()