[MRG] New Feature: VotingRegressor #12513


Merged · 44 commits · Apr 10, 2019

Commits (44)
- 25ed281 · AverageRegressor based on VotingClassifier implemented. (Nov 3, 2018)
- ae880ac · travis tests fixed. (Nov 3, 2018)
- 6e484e2 · tutorial added. (Nov 4, 2018)
- c68b7eb · doctests fixed. (Nov 4, 2018)
- fd4d3f3 · watsnew added. (Nov 6, 2018)
- 7a447d2 · Merge remote-tracking branch 'sklearn/master' (Nov 6, 2018)
- 2c5113b · doc fixed. (Nov 6, 2018)
- 039969b · after review: (Nov 7, 2018)
- a3fe3a6 · Merge remote-tracking branch 'sklearn/master' (Nov 7, 2018)
- e3317f2 · flake8 fixed. (Nov 7, 2018)
- 7f87eb2 · setparams getparams moved to common class. (Nov 8, 2018)
- 55ecc4f · Merge remote-tracking branch 'sklearn/master' (Nov 8, 2018)
- f395268 · whats_new fixed. (Nov 8, 2018)
- aac0fff · Voting classifier tests not changed. (Nov 9, 2018)
- 7262e48 · Merge remote-tracking branch 'sklearn/master' (Nov 9, 2018)
- e16796d · Merge remote-tracking branch 'sklearn/master' (Nov 11, 2018)
- 4cae3f8 · Merge remote-tracking branch 'sklearn/master' (Nov 12, 2018)
- 412e80c · Merge remote-tracking branch 'sklearn/master' (Nov 13, 2018)
- 8dc81c8 · weights test fixed. (Nov 13, 2018)
- 3b16a26 · Merge remote-tracking branch 'sklearn/master' (Nov 20, 2018)
- 4088af3 · Merge remote-tracking branch 'sklearn/master' (Dec 1, 2018)
- 6c2d789 · Merge remote-tracking branch 'sklearn/master' (Dec 11, 2018)
- 82e13b7 · Merge remote-tracking branch 'sklearn/master' (Jan 14, 2019)
- b56a7e1 · Merge remote-tracking branch 'sklearn/master' (Feb 14, 2019)
- 36d591c · doc fix. (Feb 14, 2019)
- 4f5baa2 · renamed AveragingRegressor to VotingRegressor (Feb 14, 2019)
- 35eb73a · docs fixes. (Feb 14, 2019)
- 1f30512 · Merge remote-tracking branch 'sklearn/master' (Apr 5, 2019)
- b011547 · six removed (Apr 5, 2019)
- b080cc1 · tests fixed (Apr 5, 2019)
- a870376 · Merge remote-tracking branch 'sklearn/master' (Apr 6, 2019)
- 95695da · fixes: (Apr 6, 2019)
- 85e53bc · Update sklearn/ensemble/voting.py (NicolasHug, Apr 7, 2019)
- 7506a0b · Update sklearn/ensemble/voting.py (NicolasHug, Apr 7, 2019)
- 8ebcd75 · Update sklearn/ensemble/voting.py (NicolasHug, Apr 7, 2019)
- 237fa88 · Update sklearn/ensemble/voting.py (NicolasHug, Apr 7, 2019)
- 65ca753 · Update sklearn/ensemble/voting.py (NicolasHug, Apr 7, 2019)
- d7098ea · docs fixed. (Apr 7, 2019)
- 5ea624b · plot fixed (Apr 7, 2019)
- ee2dba4 · ref fixed (Apr 7, 2019)
- d67a106 · * https://github.com/scikit-learn/scikit-learn/pull/12513#discussion_… (Apr 10, 2019)
- 3d0c714 · Merge remote-tracking branch 'origin/master' (Apr 10, 2019)
- b722e6d · https://github.com/scikit-learn/scikit-learn/pull/12513#discussion_r2… (Apr 10, 2019)
- ba9ff4f · Update doc/modules/ensemble.rst (NicolasHug, Apr 10, 2019)
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -421,6 +421,7 @@ Samples generator
ensemble.RandomForestRegressor
ensemble.RandomTreesEmbedding
ensemble.VotingClassifier
ensemble.VotingRegressor

.. autosummary::
:toctree: generated/
45 changes: 44 additions & 1 deletion doc/modules/ensemble.rst
@@ -927,7 +927,7 @@ averaged.
Voting Classifier
========================

The idea behind the :class:`VotingClassifier` is to combine
The idea behind the `VotingClassifier` is to combine
conceptually different machine learning classifiers and use a majority vote
or the average predicted probabilities (soft vote) to predict the class labels.
Such a classifier can be useful for a set of equally well performing model
@@ -1084,3 +1084,46 @@ Optionally, weights can be provided for the individual classifiers::

>>> eclf = VotingClassifier(estimators=[('lr', clf1), ('rf', clf2), ('gnb', clf3)],
... voting='soft', weights=[2, 5, 1])
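
For intuition, here is a minimal sketch of what soft voting computes with such
weights, using hypothetical per-classifier probabilities (not output from the
classifiers above): the class probabilities are averaged with the supplied
weights, and the class with the largest weighted average wins::

>>> import numpy as np
>>> # hypothetical class probabilities from clf1, clf2, clf3 for one sample
>>> probas = np.array([[0.7, 0.3], [0.6, 0.4], [0.1, 0.9]])
>>> avg = np.average(probas, axis=0, weights=[2, 5, 1])
>>> avg.tolist()
[0.5625, 0.4375]
>>> int(np.argmax(avg))  # class 0 wins despite clf3's strong vote for class 1
0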


.. _voting_regressor:

Voting Regressor
================

The idea behind the `VotingRegressor` is to combine conceptually
different machine learning regressors and return the average predicted values.
Such a regressor can be useful for a set of equally well performing models
in order to balance out their individual weaknesses.
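
Concretely, with :math:`m` fitted regressors producing predictions
:math:`\hat{y}_i` and optional weights :math:`w_i` (uniform when omitted),
the ensemble returns the weighted arithmetic mean

.. math::

    \hat{y} = \frac{\sum_{i=1}^{m} w_i \hat{y}_i}{\sum_{i=1}^{m} w_i}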

Usage
.....

The following example shows how to fit the VotingRegressor::

>>> from sklearn import datasets
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.ensemble import VotingRegressor

>>> # Loading some example data
>>> boston = datasets.load_boston()
>>> X = boston.data
>>> y = boston.target

>>> # Training regressors
>>> reg1 = GradientBoostingRegressor(random_state=1, n_estimators=10)
>>> reg2 = RandomForestRegressor(random_state=1, n_estimators=10)
>>> reg3 = LinearRegression()
>>> ereg = VotingRegressor(estimators=[('gb', reg1), ('rf', reg2), ('lr', reg3)])
>>> ereg = ereg.fit(X, y)
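
As a minimal sanity check (a sketch assuming the ``ereg`` fitted above; the
fitted base regressors are exposed as ``estimators_``), the ensemble
prediction is just the element-wise mean of the individual predictions::

>>> import numpy as np
>>> preds = np.asarray([est.predict(X[:5]) for est in ereg.estimators_])
>>> bool(np.allclose(ereg.predict(X[:5]), preds.mean(axis=0)))
True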

.. figure:: ../auto_examples/ensemble/images/sphx_glr_plot_voting_regressor_001.png
:target: ../auto_examples/ensemble/plot_voting_regressor.html
:align: center
:scale: 75%

.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_ensemble_plot_voting_regressor.py`
6 changes: 6 additions & 0 deletions doc/whats_new/v0.21.rst
@@ -230,6 +230,12 @@ Support for Python 3.4 and below has been officially dropped.
gradient boosting model has been trained with sample weights.
:issue:`13193` by :user:`Samuel O. Ronsin <samronsin>`.

- |Feature| Add :class:`ensemble.VotingRegressor`
which provides an equivalent of :class:`ensemble.VotingClassifier`
for regression problems.
:issue:`12513` by :user:`Ramil Nugmanov <stsouko>` and
:user:`Mohamed Ali Jamaoui <mohamed-ali>`.

:mod:`sklearn.externals`
........................

53 changes: 53 additions & 0 deletions examples/ensemble/plot_voting_regressor.py
@@ -0,0 +1,53 @@
"""
=================================================
Plot individual and voting regression predictions
=================================================

Plot individual and averaged regression predictions for the Boston dataset.

First, three example regressors (`GradientBoostingRegressor`,
`RandomForestRegressor`, and `LinearRegression`) are initialized and used to
build a `VotingRegressor`.

The red stars show the averaged predictions made by the `VotingRegressor`.

"""
print(__doc__)

import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import VotingRegressor

# Loading some example data
boston = datasets.load_boston()
X = boston.data
y = boston.target

# Training regressors
reg1 = GradientBoostingRegressor(random_state=1, n_estimators=10)
reg2 = RandomForestRegressor(random_state=1, n_estimators=10)
reg3 = LinearRegression()
ereg = VotingRegressor([('gb', reg1), ('rf', reg2), ('lr', reg3)])
reg1.fit(X, y)
reg2.fit(X, y)
reg3.fit(X, y)
ereg.fit(X, y)

xt = X[:20]

plt.figure()
plt.plot(reg1.predict(xt), 'gd', label='GradientBoostingRegressor')
plt.plot(reg2.predict(xt), 'b^', label='RandomForestRegressor')
plt.plot(reg3.predict(xt), 'ys', label='LinearRegression')
plt.plot(ereg.predict(xt), 'r*', label='VotingRegressor')
plt.tick_params(axis='x', which='both', bottom=False, top=False,
labelbottom=False)
plt.ylabel('predicted')
plt.xlabel('training samples')
plt.legend(loc="best")
plt.title('Comparison of individual and averaged predictions')
plt.show()
5 changes: 3 additions & 2 deletions sklearn/ensemble/__init__.py
@@ -16,7 +16,8 @@
from .weight_boosting import AdaBoostRegressor
from .gradient_boosting import GradientBoostingClassifier
from .gradient_boosting import GradientBoostingRegressor
from .voting_classifier import VotingClassifier
from .voting import VotingClassifier
from .voting import VotingRegressor

from . import bagging
from . import forest
@@ -30,6 +31,6 @@
"ExtraTreesRegressor", "BaggingClassifier",
"BaggingRegressor", "IsolationForest", "GradientBoostingClassifier",
"GradientBoostingRegressor", "AdaBoostClassifier",
"AdaBoostRegressor", "VotingClassifier",
"AdaBoostRegressor", "VotingClassifier", "VotingRegressor",
"bagging", "forest", "gradient_boosting",
"partial_dependence", "weight_boosting"]
@@ -1,4 +1,4 @@
"""Testing for the VotingClassifier"""
"""Testing for the VotingClassifier and VotingRegressor"""

import pytest
import numpy as np
@@ -11,21 +11,25 @@
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import VotingClassifier, VotingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.datasets import make_multilabel_classification
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.dummy import DummyRegressor


# Load the iris dataset and randomly permute it
# Load datasets
iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

boston = datasets.load_boston()
X_r, y_r = boston.data, boston.target


@pytest.mark.filterwarnings('ignore: Default solver will be changed') # 0.22
@pytest.mark.filterwarnings('ignore: Default multi_class will') # 0.22
@@ -42,7 +46,7 @@ def test_estimator_init():
assert_raise_message(ValueError, msg, eclf.fit, X, y)

eclf = VotingClassifier(estimators=[('lr', clf)], weights=[1, 2])
msg = ('Number of classifiers and weights must be equal'
msg = ('Number of `estimators` and weights must be equal'
'; got 2 weights, 1 estimators')
assert_raise_message(ValueError, msg, eclf.fit, X, y)

@@ -76,9 +80,19 @@ def test_notfitted():
eclf = VotingClassifier(estimators=[('lr1', LogisticRegression()),
('lr2', LogisticRegression())],
voting='soft')
msg = ("This VotingClassifier instance is not fitted yet. Call \'fit\'"
ereg = VotingRegressor([('dr', DummyRegressor())])
msg = ("This %s instance is not fitted yet. Call \'fit\'"
" with appropriate arguments before using this method.")
assert_raise_message(NotFittedError, msg, eclf.predict_proba, X)
assert_raise_message(NotFittedError, msg % 'VotingClassifier',
eclf.predict, X)
assert_raise_message(NotFittedError, msg % 'VotingClassifier',
eclf.predict_proba, X)
assert_raise_message(NotFittedError, msg % 'VotingClassifier',
eclf.transform, X)
assert_raise_message(NotFittedError, msg % 'VotingRegressor',
ereg.predict, X_r)
assert_raise_message(NotFittedError, msg % 'VotingRegressor',
ereg.transform, X_r)


@pytest.mark.filterwarnings('ignore: Default solver will be changed') # 0.22
@@ -125,6 +139,38 @@ def test_weights_iris():
assert_almost_equal(scores.mean(), 0.93, decimal=2)


def test_weights_regressor():
"""Check weighted average regression prediction on boston dataset."""
reg1 = DummyRegressor(strategy='mean')
reg2 = DummyRegressor(strategy='median')
reg3 = DummyRegressor(strategy='quantile', quantile=.2)
ereg = VotingRegressor([('mean', reg1), ('median', reg2),
('quantile', reg3)], weights=[1, 2, 10])

X_r_train, X_r_test, y_r_train, y_r_test = \
train_test_split(X_r, y_r, test_size=.25)

reg1_pred = reg1.fit(X_r_train, y_r_train).predict(X_r_test)
reg2_pred = reg2.fit(X_r_train, y_r_train).predict(X_r_test)
reg3_pred = reg3.fit(X_r_train, y_r_train).predict(X_r_test)
ereg_pred = ereg.fit(X_r_train, y_r_train).predict(X_r_test)

avg = np.average(np.asarray([reg1_pred, reg2_pred, reg3_pred]), axis=0,
weights=[1, 2, 10])
assert_almost_equal(ereg_pred, avg, decimal=2)

ereg_weights_none = VotingRegressor([('mean', reg1), ('median', reg2),
('quantile', reg3)], weights=None)
ereg_weights_equal = VotingRegressor([('mean', reg1), ('median', reg2),
('quantile', reg3)],
weights=[1, 1, 1])
ereg_weights_none.fit(X_r_train, y_r_train)
ereg_weights_equal.fit(X_r_train, y_r_train)
ereg_none_pred = ereg_weights_none.predict(X_r_test)
ereg_equal_pred = ereg_weights_equal.predict(X_r_test)
assert_almost_equal(ereg_none_pred, ereg_equal_pred, decimal=2)


@pytest.mark.filterwarnings('ignore: Default solver will be changed') # 0.22
@pytest.mark.filterwarnings('ignore: Default multi_class will') # 0.22
@pytest.mark.filterwarnings('ignore:The default value of n_estimators')
@@ -382,8 +428,7 @@ def test_set_estimator_none():
eclf2.set_params(voting='soft').fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_almost_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))
msg = ('All estimators are None. At least one is required'
' to be a classifier!')
msg = 'All estimators are None. At least one is required!'
assert_raise_message(
ValueError, msg, eclf2.set_params(lr=None, rf=None, nb=None).fit, X, y)
