Skip to content

API Deprecates copy_X in TheilSenRegressor #29105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
May 30, 2024
7 changes: 7 additions & 0 deletions doc/whats_new/v1.6.rst
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,13 @@ Changelog
- |Enhancement| Added a function :func:`base.is_clusterer` which determines
whether a given estimator is of category clusterer.
:pr:`28936` by :user:`Christian Veenhuis <ChVeen>`.

:mod:`sklearn.linear_model`
...........................

- |API| Deprecates `copy_X` in :class:`linear_model.TheilSenRegressor` as the parameter
has no effect. `copy_X` will be removed in 1.8.
:pr:`29105` by :user:`Adam Li <adam2392>`.

:mod:`sklearn.metrics`
......................
Expand Down
18 changes: 15 additions & 3 deletions sklearn/linear_model/_theil_sen.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
from ..base import RegressorMixin, _fit_context
from ..exceptions import ConvergenceWarning
from ..utils import check_random_state
from ..utils._param_validation import Interval
from ..utils._param_validation import Hidden, Interval, StrOptions
from ..utils.parallel import Parallel, delayed
from ._base import LinearModel

Expand Down Expand Up @@ -228,6 +228,10 @@ class TheilSenRegressor(RegressorMixin, LinearModel):
copy_X : bool, default=True
If True, X will be copied; else, it may be overwritten.

.. deprecated:: 1.6
`copy_X` was deprecated in 1.6 and will be removed in 1.8.
It has no effect as a copy is always made.

max_subpopulation : int, default=1e4
Instead of computing with a set of cardinality 'n choose k', where n is
the number of samples and k is the number of subsamples (at least
Expand Down Expand Up @@ -324,7 +328,7 @@ class TheilSenRegressor(RegressorMixin, LinearModel):

_parameter_constraints: dict = {
"fit_intercept": ["boolean"],
"copy_X": ["boolean"],
"copy_X": ["boolean", Hidden(StrOptions({"deprecated"}))],
# target_type should be Integral but can accept Real for backward compatibility
"max_subpopulation": [Interval(Real, 1, None, closed="left")],
"n_subsamples": [None, Integral],
Expand All @@ -339,7 +343,7 @@ def __init__(
self,
*,
fit_intercept=True,
copy_X=True,
copy_X="deprecated",
max_subpopulation=1e4,
n_subsamples=None,
max_iter=300,
Expand Down Expand Up @@ -411,6 +415,14 @@ def fit(self, X, y):
self : returns an instance of self.
Fitted `TheilSenRegressor` estimator.
"""
if self.copy_X != "deprecated":
warnings.warn(
"`copy_X` was deprecated in 1.6 and will be removed in 1.8 since it "
"has no effect internally. Simply leave this parameter to its default "
"value to avoid this warning.",
FutureWarning,
)

random_state = check_random_state(self.random_state)
X, y = self._validate_data(X, y, y_numeric=True)
n_samples, n_features = X.shape
Expand Down
8 changes: 8 additions & 0 deletions sklearn/linear_model/tests/test_theil_sen.py
Original file line number Diff line number Diff line change
Expand Up @@ -292,3 +292,11 @@ def test_less_samples_than_features():
theil_sen = TheilSenRegressor(fit_intercept=True, random_state=0).fit(X, y)
y_pred = theil_sen.predict(X)
assert_array_almost_equal(y_pred, y, 12)


# TODO(1.8): Remove
def test_copy_X_deprecated():
X, y, _, _ = gen_toy_problem_1d()
theil_sen = TheilSenRegressor(copy_X=True, random_state=0)
with pytest.warns(FutureWarning, match="`copy_X` was deprecated"):
theil_sen.fit(X, y)
Loading