Skip to content

MRG Deprecates 'normalize' in LinearRegression (_base.py) #17743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 141 commits into from
Jan 22, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
141 commits
Select commit Hold shift + click to select a range
06de537
first normalize changes
maikia Jun 26, 2020
d1c9816
exchanged setting self.normalize by _normalize
maikia Jun 26, 2020
233a82e
updated the warning
maikia Jun 26, 2020
9369ed3
clean up
maikia Jun 26, 2020
2e93e06
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jun 26, 2020
523d588
added test if warnings do show up
maikia Jun 26, 2020
a7b7422
clean up
maikia Jun 26, 2020
15368a7
change of the warning msg
maikia Jun 26, 2020
293682f
clean up
maikia Jun 26, 2020
6258d1c
updated warning msg
maikia Jun 26, 2020
2f4d60a
updated warning msg
maikia Jun 26, 2020
582532a
removed ignore warning from the test
maikia Jun 26, 2020
428f1fa
Update sklearn/linear_model/tests/test_base.py
maikia Jun 26, 2020
a93d367
cleaning up the test
maikia Jun 26, 2020
0e89ab9
Update sklearn/linear_model/tests/test_base.py
maikia Jun 26, 2020
97c6221
cleaning up the test
maikia Jun 26, 2020
25fe971
Merge branch 'depreciate_normalize_base' of https://github.com/maikia…
maikia Jun 26, 2020
0b3e5b5
updated tests in test_coordinate_descent
maikia Jun 26, 2020
38b9b5e
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jun 29, 2020
7fc22ee
removed with_mean=False from standardScaler
maikia Jun 29, 2020
86bb2ef
added private function _deprecate_normalize(normalize, default) to ca…
maikia Jun 29, 2020
59e4c87
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jun 30, 2020
2e52377
added whats new
maikia Jun 30, 2020
c8d9295
corrected the text in whats new
maikia Jun 30, 2020
15f2a1e
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jul 1, 2020
5a3008b
extended underline of the title
maikia Jul 1, 2020
45b42e3
Update doc/whats_new/v0.24.rst
maikia Jul 1, 2020
afc9c01
Update doc/whats_new/v0.24.rst
maikia Jul 1, 2020
63f93cc
Update doc/whats_new/v0.24.rst
maikia Jul 1, 2020
ca413e1
update the test to init LinearRegression outside of the pytest.warns
maikia Jul 1, 2020
e546ec1
Update sklearn/linear_model/_base.py
maikia Jul 1, 2020
3886e02
changed the warning message
maikia Jul 1, 2020
da1ec7d
Merge branch 'depreciate_normalize_base' of https://github.com/maikia…
maikia Jul 1, 2020
4e4afe7
changed deprecate for deprecated
maikia Jul 1, 2020
e29932d
changed deprecate for deprecated
maikia Jul 1, 2020
7f32058
added checking if message correct in the test
maikia Jul 1, 2020
82584f4
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jul 2, 2020
6d3ee72
Update sklearn/linear_model/tests/test_base.py
maikia Jul 2, 2020
ad7a508
updated the whats new file
maikia Jul 2, 2020
9cceafe
updated warning message
maikia Jul 2, 2020
74c30e0
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jul 2, 2020
f0eb556
Merge branch 'depreciate_normalize_base' of https://github.com/maikia…
maikia Jul 2, 2020
798efcc
clean up
maikia Jul 2, 2020
abfd95d
merging changes
maikia Jul 22, 2020
3cbb7b0
checking for the version sklearn and gives the appropriate message if…
maikia Jul 22, 2020
c8d66b5
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jul 22, 2020
d8ee96c
added tests for test_deprecate_normalize
maikia Jul 22, 2020
0dbc444
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jul 24, 2020
63424b7
updated all the messages depending on the value of normalize, default…
maikia Jul 24, 2020
117059e
Update sklearn/linear_model/_base.py
maikia Jul 24, 2020
5ae8ece
removed functionality from version 0.26 up\
maikia Jul 24, 2020
3aec615
clean up
maikia Jul 24, 2020
ff6bec1
added relative import to _base.py
maikia Jul 24, 2020
6d5d588
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Aug 5, 2020
5170b2c
removed checking for version of sklearn in _base.py
maikia Aug 5, 2020
ac18d63
removed checking for the version from the test
maikia Aug 5, 2020
2c7f8d6
merging conflict with master
maikia Aug 26, 2020
8024a5a
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Sep 30, 2020
1d8e3de
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Oct 28, 2020
3ee028e
updated the message
maikia Oct 28, 2020
73c3f70
added parameter est_name
maikia Oct 28, 2020
f0d850c
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Oct 28, 2020
04dd2b2
clean up;
maikia Oct 28, 2020
ec950dc
merging with master
maikia Jan 4, 2021
509c7c8
updated the versioning
maikia Jan 4, 2021
1ae7ead
update versioning in test_base
maikia Jan 4, 2021
011d751
updated the doc
maikia Jan 4, 2021
79a82e9
clean up
maikia Jan 4, 2021
91fd435
update versioning
maikia Jan 4, 2021
1911725
cleanup
maikia Jan 4, 2021
8c39982
added test for sample_weights with normalize and in a pipeline
maikia Jan 4, 2021
b49d4cc
updated the deprecate message to sugest the use of the fit params in …
maikia Jan 4, 2021
f0d81fa
test updated
maikia Jan 4, 2021
0208ee9
cleanup;
maikia Jan 4, 2021
a84ac5c
merge master
maikia Jan 5, 2021
01fbdd1
update versioning
maikia Jan 5, 2021
54e7197
updated the test
maikia Jan 5, 2021
eb913f6
Update sklearn/linear_model/_base.py
maikia Jan 12, 2021
053ceaa
Update sklearn/linear_model/_base.py
maikia Jan 12, 2021
71ba0bf
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jan 12, 2021
4a6d7bc
updated the docstring
maikia Jan 12, 2021
10e7380
clean up
maikia Jan 12, 2021
4a078f4
Update sklearn/linear_model/tests/test_coordinate_descent.py
maikia Jan 12, 2021
a66a14e
update variable name
maikia Jan 12, 2021
9cd4156
update params in the test
maikia Jan 12, 2021
787fe67
update param names
maikia Jan 12, 2021
b0c7bc8
update param names
maikia Jan 12, 2021
b01c01d
cleanup
maikia Jan 12, 2021
8649118
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jan 14, 2021
3d14885
update the whatsnew
maikia Jan 14, 2021
169dae5
Update sklearn/linear_model/_base.py
maikia Jan 14, 2021
d61aad0
Update sklearn/linear_model/_base.py
maikia Jan 14, 2021
511261e
add value error on wrong normalize
maikia Jan 14, 2021
86b1c21
update the info on what to do in v1.2
maikia Jan 14, 2021
80eb9e5
remove patch version from warnings
maikia Jan 14, 2021
7aee6ea
update warning msg
maikia Jan 14, 2021
64cf6aa
update the warning message
maikia Jan 14, 2021
c8f0456
update the if statement
maikia Jan 14, 2021
866ced7
Update sklearn/linear_model/_base.py
maikia Jan 14, 2021
505bcd9
Update sklearn/linear_model/_base.py
maikia Jan 14, 2021
f969696
Update sklearn/linear_model/_base.py
maikia Jan 14, 2021
828a3e9
improve formatting
maikia Jan 14, 2021
f5e878c
merge differences
maikia Jan 14, 2021
a9403c5
cleanup
maikia Jan 14, 2021
4683f62
Update sklearn/linear_model/tests/test_base.py
maikia Jan 14, 2021
1a0dfe1
Update sklearn/linear_model/tests/test_base.py
maikia Jan 14, 2021
f1f21f0
Update sklearn/linear_model/tests/test_base.py
maikia Jan 14, 2021
620d011
Update sklearn/linear_model/tests/test_base.py
maikia Jan 14, 2021
270a946
Update sklearn/linear_model/tests/test_base.py
maikia Jan 14, 2021
48c704e
Update sklearn/linear_model/tests/test_base.py
maikia Jan 14, 2021
2afefc5
Update sklearn/linear_model/tests/test_base.py
maikia Jan 14, 2021
68aa605
Update sklearn/linear_model/tests/test_base.py
maikia Jan 14, 2021
c97a6be
Update sklearn/linear_model/tests/test_base.py
maikia Jan 14, 2021
0b035a3
Update sklearn/linear_model/tests/test_base.py
maikia Jan 14, 2021
48f3b87
Update sklearn/linear_model/tests/test_coordinate_descent.py
maikia Jan 14, 2021
57b8819
update the test
maikia Jan 14, 2021
46d8606
update tests
maikia Jan 14, 2021
45d9026
cleanup the test
maikia Jan 14, 2021
4a21c22
typo
maikia Jan 14, 2021
ca7f528
merge
maikia Jan 14, 2021
8609a9e
correct the test
maikia Jan 14, 2021
84f7be7
update getting the class name
maikia Jan 14, 2021
81f276d
cleanup
maikia Jan 14, 2021
0653714
sklearn/linear_model/tests/test_base.py
maikia Jan 14, 2021
80822f7
cleanup
maikia Jan 14, 2021
db71adf
Update sklearn/linear_model/_base.py
maikia Jan 18, 2021
e8d7396
Update sklearn/linear_model/_base.py
maikia Jan 18, 2021
26b729a
Update sklearn/linear_model/_base.py
maikia Jan 18, 2021
ba547a0
Update sklearn/linear_model/tests/test_base.py
maikia Jan 18, 2021
4bd9b63
Update sklearn/linear_model/tests/test_base.py
maikia Jan 18, 2021
32447a8
Update sklearn/linear_model/tests/test_coordinate_descent.py
maikia Jan 18, 2021
5efe0de
Update sklearn/linear_model/tests/test_coordinate_descent.py
maikia Jan 18, 2021
e9527e5
Update sklearn/linear_model/_base.py
maikia Jan 18, 2021
6686800
Update sklearn/linear_model/_base.py
maikia Jan 18, 2021
21914d0
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jan 18, 2021
69b0080
update a doc
maikia Jan 18, 2021
a149dcc
add the doc to the test
maikia Jan 18, 2021
0036778
Update sklearn/linear_model/_base.py
maikia Jan 18, 2021
7ae55a3
Update sklearn/linear_model/_base.py
maikia Jan 18, 2021
81d34b4
Extend test_linear_model_sample_weights_normalize_in_pipeline to bett…
ogrisel Jan 20, 2021
498c430
Merge branch 'master' of https://github.com/scikit-learn/scikit-learn…
maikia Jan 21, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions doc/whats_new/v1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,18 @@ Changelog
Use ``var_`` instead.
:pr:`18842` by :user:`Hong Shao Yang <hongshaoyang>`.

- |API|: The parameter ``normalize`` of :class:`linear_model.LinearRegression`
is deprecated and will be removed in 1.2.
Motivation for this deprecation: ``normalize`` parameter did not take any
effect if ``fit_intercept`` was set to False and therefore was deemed
confusing.
The behavior of the deprecated LinearRegression(normalize=True) can be
reproduced with :class:`~sklearn.pipeline.Pipeline` with
:class:`~sklearn.preprocessing.StandardScaler`as follows:
make_pipeline(StandardScaler(with_mean=False), LinearRegression()).
:pr:`17743` by :user:`Maria Telenczuk <maikia>` and
:user:`Alexandre Gramfort <agramfort>`.

Code and Documentation Contributors
-----------------------------------

Expand Down
110 changes: 105 additions & 5 deletions sklearn/linear_model/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
# Lars Buitinck
# Maryan Morel <maryan.morel@polytechnique.edu>
# Giorgio Patrini <giorgio.patrini@anu.edu.au>
# Maria Telenczuk <https://github.com/maikia>
# License: BSD 3 clause

from abc import ABCMeta, abstractmethod
Expand Down Expand Up @@ -49,6 +50,94 @@
# intercept oscillation.


# FIXME in 1.2: parameter 'normalize' should be removed from linear models
# in cases where now normalize=False. The default value of 'normalize' should
# be changed to False in linear models where now normalize=True
def _deprecate_normalize(normalize, default, estimator_name):
""" Normalize is to be deprecated from linear models and a use of
a pipeline with a StandardScaler is to be recommended instead.
Here the appropriate message is selected to be displayed to the user
depending on the default normalize value (as it varies between the linear
models and normalize value selected by the user).

Parameters
----------
normalize : bool,
normalize value passed by the user

default : bool,
default normalize value used by the estimator

estimator_name : string,
name of the linear estimator which calls this function.
The name will be used for writing the deprecation warnings

Returns
-------
normalize : bool,
normalize value which should further be used by the estimator at this
stage of the depreciation process

Notes
-----
This function should be updated in 1.2 depending on the value of
`normalize`:
- True, warning: `normalize` was deprecated in 1.2 and will be removed in
1.4. Suggest to use pipeline instead.
- False, `normalize` was deprecated in 1.2 and it will be removed in 1.4.
Leave normalize to its default value.
- `deprecated` - this should only be possible with default == False as from
1.2 `normalize` in all the linear models should be either removed or the
default should be set to False.
This function should be completely removed in 1.4.
"""

if normalize not in [True, False, 'deprecated']:
raise ValueError("Leave 'normalize' to its default value or set it "
"to True or False")

if normalize == 'deprecated':
_normalize = default
else:
_normalize = normalize

if default and normalize == 'deprecated':
warnings.warn(
"The default of 'normalize' will be set to False in version 1.2 "
"and deprecated in version 1.4. \nPass normalize=False and use "
"Pipeline with a StandardScaler in a preprocessing stage if you "
"wish to reproduce the previous behavior:\n"
"model = make_pipeline(StandardScaler(with_mean=False), \n"
f"{estimator_name}(normalize=False))\n"
"If you wish to use additional parameters in "
"the fit() you can include them as follows:\n"
"kwargs = {model.steps[-1][0] + "
"'__<your_param_name>': <your_param_value>}\n"
"model.fit(X, y, **kwargs)", FutureWarning
)
elif normalize != 'deprecated' and normalize and not default:
warnings.warn(
"'normalize' was deprecated in version 1.0 and will be "
"removed in 1.2 \nIf you still wish to normalize use "
"Pipeline with a StandardScaler in a preprocessing stage if you "
"wish to reproduce the previous behavior:\n"
"model = make_pipeline(StandardScaler(with_mean=False), "
f"{estimator_name}()). \nIf you wish to use additional "
"parameters in the fit() you can include them as follows: "
"kwargs = {model.steps[-1][0] + "
"'__<your_param_name>': <your_param_value>}\n"
"model.fit(X, y, **kwargs)", FutureWarning
)
elif not normalize and not default:
warnings.warn(
"'normalize' was deprecated in version 1.0 and will be"
" removed in 1.2 Don't set 'normalize' parameter"
" and leave it to its default value", FutureWarning
)

return _normalize


def make_dataset(X, y, sample_weight, random_state=None):
"""Create ``Dataset`` abstraction for sparse and dense inputs.

Expand Down Expand Up @@ -407,6 +496,10 @@ class LinearRegression(MultiOutputMixin, RegressorMixin, LinearModel):
:class:`~sklearn.preprocessing.StandardScaler` before calling ``fit``
on an estimator with ``normalize=False``.

.. deprecated:: 1.0
`normalize` was deprecated in version 1.0 and will be
removed in 1.2.

copy_X : bool, default=True
If True, X will be copied; else, it may be overwritten.

Expand Down Expand Up @@ -476,8 +569,8 @@ class LinearRegression(MultiOutputMixin, RegressorMixin, LinearModel):
array([16.])
"""
@_deprecate_positional_args
def __init__(self, *, fit_intercept=True, normalize=False, copy_X=True,
n_jobs=None, positive=False):
def __init__(self, *, fit_intercept=True, normalize='deprecated',
copy_X=True, n_jobs=None, positive=False):
self.fit_intercept = fit_intercept
self.normalize = normalize
self.copy_X = copy_X
Expand Down Expand Up @@ -507,6 +600,11 @@ def fit(self, X, y, sample_weight=None):
self : returns an instance of self.
"""

_normalize = _deprecate_normalize(
self.normalize, default=False,
estimator_name=self.__class__.__name__
)

n_jobs_ = self.n_jobs

accept_sparse = False if self.positive else ['csr', 'csc', 'coo']
Expand All @@ -519,7 +617,7 @@ def fit(self, X, y, sample_weight=None):
dtype=X.dtype)

X, y, X_offset, y_offset, X_scale = self._preprocess_data(
X, y, fit_intercept=self.fit_intercept, normalize=self.normalize,
X, y, fit_intercept=self.fit_intercept, normalize=_normalize,
copy=self.copy_X, sample_weight=sample_weight,
return_mean=True)

Expand Down Expand Up @@ -651,10 +749,12 @@ def _pre_fit(X, y, Xy, precompute, normalize, fit_intercept, copy,
check_input=check_input, sample_weight=sample_weight)
if sample_weight is not None:
X, y = _rescale_data(X, y, sample_weight=sample_weight)

# FIXME: 'normalize' to be removed in 1.2
if hasattr(precompute, '__array__'):
if (fit_intercept and not np.allclose(X_offset, np.zeros(n_features))
or normalize and not np.allclose(X_scale,
np.ones(n_features))):
or normalize and not np.allclose(X_scale, np.ones(n_features)
)):
warnings.warn(
"Gram matrix was provided but X was centered to fit "
"intercept, or X was normalized : recomputing Gram matrix.",
Expand Down
87 changes: 86 additions & 1 deletion sklearn/linear_model/tests/test_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
from sklearn.utils.fixes import parse_version

from sklearn.linear_model import LinearRegression
from sklearn.linear_model._base import _deprecate_normalize
from sklearn.linear_model._base import _preprocess_data
from sklearn.linear_model._base import _rescale_data
from sklearn.linear_model._base import make_dataset
Expand Down Expand Up @@ -106,6 +107,7 @@ def test_raises_value_error_if_positive_and_sparse():
with pytest.raises(TypeError, match=error_msg):
reg.fit(X, y)


def test_raises_value_error_if_sample_weights_greater_than_1d():
# Sample weights must be either scalar or 1D

Expand Down Expand Up @@ -149,6 +151,59 @@ def test_fit_intercept():
lr3_without_intercept.coef_.ndim)


def test_error_on_wrong_normalize():
normalize = 'wrong'
default = True
error_msg = "Leave 'normalize' to its default"
with pytest.raises(ValueError, match=error_msg):
_deprecate_normalize(normalize, default, 'estimator')
ValueError


@pytest.mark.parametrize('normalize', [True, False, 'deprecated'])
@pytest.mark.parametrize('default', [True, False])
# FIXME update test in 1.2 for new versions
def test_deprecate_normalize(normalize, default):
# test all possible case of the normalize parameter deprecation
if not default:
if normalize == 'deprecated':
# no warning
output = default
expected = None
warning_msg = []
else:
output = normalize
expected = FutureWarning
warning_msg = ['1.2']
if not normalize:
warning_msg.append('default value')
else:
warning_msg.append('StandardScaler(')
elif default:
if normalize == 'deprecated':
# warning to pass False and use StandardScaler
output = default
expected = FutureWarning
warning_msg = ['False', '1.2', 'StandardScaler(']
else:
# no warning
output = normalize
expected = None
warning_msg = []

with pytest.warns(expected) as record:
_normalize = _deprecate_normalize(normalize, default, 'estimator')
assert _normalize == output

n_warnings = 0 if expected is None else 1
assert len(record) == n_warnings
if n_warnings:
assert all([
warning in str(record[0].message)
for warning in warning_msg
])


def test_linear_regression_sparse(random_state=0):
# Test that linear regression also works with sparse data
random_state = check_random_state(random_state)
Expand All @@ -165,6 +220,35 @@ def test_linear_regression_sparse(random_state=0):
assert_array_almost_equal(ols.predict(X) - y.ravel(), 0)


@pytest.mark.parametrize(
'normalize, n_warnings, warning',
[(True, 1, FutureWarning),
(False, 1, FutureWarning),
("deprecated", 0, None)]
)
# FIXME remove test in 1.4
def test_linear_regression_normalize_deprecation(
normalize, n_warnings, warning
):
# check that we issue a FutureWarning when normalize was set in
# LinearRegression
rng = check_random_state(0)
n_samples = 200
n_features = 2
X = rng.randn(n_samples, n_features)
X[X < 0.1] = 0.0
y = rng.rand(n_samples)

model = LinearRegression(normalize=normalize)
with pytest.warns(warning) as record:
model.fit(X, y)
assert len(record) == n_warnings
if n_warnings:
assert "'normalize' was deprecated" in str(record[0].message)


# FIXME: 'normalize' to be removed in 1.2 in LinearRegression
@pytest.mark.filterwarnings("ignore:'normalize' was deprecated")
@pytest.mark.parametrize('normalize', [True, False])
@pytest.mark.parametrize('fit_intercept', [True, False])
def test_linear_regression_sparse_equal_dense(normalize, fit_intercept):
Expand Down Expand Up @@ -303,8 +387,9 @@ def test_linear_regression_pd_sparse_dataframe_warning():
df[str(col)] = arr

msg = "pandas.DataFrame with sparse columns found."

reg = LinearRegression()
with pytest.warns(UserWarning, match=msg):
reg = LinearRegression()
reg.fit(df.iloc[:, 0:2], df.iloc[:, 3])

# does not warn when the whole dataframe is sparse
Expand Down
57 changes: 57 additions & 0 deletions sklearn/linear_model/tests/test_coordinate_descent.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
from sklearn.utils._testing import assert_warns_message
from sklearn.utils._testing import ignore_warnings
from sklearn.utils._testing import assert_array_equal
from sklearn.utils._testing import _convert_container
from sklearn.utils._testing import TempMemmap
from sklearn.utils.fixes import parse_version

Expand Down Expand Up @@ -301,6 +302,8 @@ def test_lasso_cv_positive_constraint():
assert min(clf_constrained.coef_) >= 0


# FIXME: 'normalize' to be removed in 1.2
@pytest.mark.filterwarnings("ignore:'normalize' was deprecated")
@pytest.mark.parametrize(
"LinearModel, params",
[(Lasso, {"tol": 1e-16, "alpha": 0.1}),
Expand Down Expand Up @@ -384,6 +387,60 @@ def test_model_pipeline_same_as_normalize_true(LinearModel, params):
assert_allclose(y_pred_normalize, y_pred_standardize)


# FIXME: 'normalize' to be removed in 1.2
@pytest.mark.filterwarnings("ignore:'normalize' was deprecated")
@pytest.mark.parametrize(
"estimator, is_sparse, with_mean",
[(LinearRegression, True, False),
(LinearRegression, False, True),
(LinearRegression, False, False)]
)
def test_linear_model_sample_weights_normalize_in_pipeline(
estimator, is_sparse, with_mean
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this test is only meant to test LinearRegression it should be moved to sklearn/linear_model/tests/test_base.py. If it's meant to be extended to Ridge, Lasso... maybe it should be move to a new file, e.g. sklearn/linear_model/tests/test_linear_model.py

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall I was proposing sklearn/linear_model/tests/test_common.py that is the usual way that we structure common tests for a module.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To anticipate this question, I tried to see if this test would pass with the current code for Ridge and Lasso and actually it always fails whether with_mean is True or False on dense data and it also fails with with_mean=False on sparse data:

sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-True-False] PASSED                                                          [ 12%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-False-True] PASSED                                                          [ 25%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[LinearRegression-False-False] PASSED                                                         [ 37%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-True-False] FAILED                                                                     [ 50%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-False-True] FAILED                                                                     [ 62%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Ridge-False-False] FAILED                                                                    [ 75%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Lasso-False-True] FAILED                                                                     [ 87%]
sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_model_sample_weights_normalize_in_pipeline[Lasso-False-False] FAILED                                                                    [100%]

So the deprecation of their normalize option should not be implemented with the same message I believe.

To keep this PR focused, let's just move this test to sklearn/linear_model/tests/test_base.py for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @glemaitre . I answered your comment, but it must have gone lost int the flow of other comments:
#17743 (comment)

I will move it to test_base.py.

@ogrisel failing for Ridge and Lasso might indeed be a problem as it was supposed to be extended to include them in this test. Why is this the case (for their failing)?

Copy link
Contributor Author

@maikia maikia Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@glemaitre
There is no file: sklearn/linear_model/tests/test_common.py
(there is: sklearn/tests/test_common.py, hence my previous question above).

Should I create it?

Copy link
Member

@ogrisel ogrisel Jan 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no file: sklearn/linear_model/tests/test_common.py

We could create one. But it's fine to keep in sklearn/linear_model/tests/test_base.py for now. You can move this test to sklearn/linear_model/tests/test_common.py in a PR that needs to reuse it for another estimator of the sklearn.linear_model module.

# Test that the results for running linear regression LinearRegression with
# sample_weight set and with normalize set to True gives similar results as
# LinearRegression with no normalize in a pipeline with a StandardScaler
# and set sample_weight.
rng = np.random.RandomState(0)
X, y = make_regression(n_samples=20, n_features=5, noise=1e-2,
random_state=rng)
# make sure the data is not centered to make the problem more
# difficult
X += 10
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
random_state=rng)
if is_sparse:
X_train = sparse.csr_matrix(X_train)
X_test = _convert_container(X_train, 'sparse')

sample_weight = rng.rand(X_train.shape[0])

# linear estimator with explicit sample_weight
reg_with_normalize = estimator(normalize=True)
reg_with_normalize.fit(X_train, y_train, sample_weight=sample_weight)

# linear estimator in a pipeline
reg_with_scaler = make_pipeline(
StandardScaler(with_mean=with_mean),
estimator(normalize=False)
)
kwargs = {reg_with_scaler.steps[-1][0] + '__sample_weight':
sample_weight}
reg_with_scaler.fit(X_train, y_train, **kwargs)

y_pred_norm = reg_with_normalize.predict(X_test)
y_pred_pip = reg_with_scaler.predict(X_test)

assert_allclose(
reg_with_normalize.coef_ * reg_with_scaler[0].scale_,
reg_with_scaler[1].coef_
)
assert_allclose(y_pred_norm, y_pred_pip)


# FIXME: 'normalize' to be removed in 1.2
@pytest.mark.filterwarnings("ignore:'normalize' was deprecated")
@pytest.mark.parametrize(
"LinearModel, params",
[(Lasso, {"tol": 1e-16, "alpha": 0.1}),
Expand Down