Commit b279e4e

s-banach, adrinjalali, and thomasjpfan authored and committed
FIX Target encoder const y (#28233)
Co-authored-by: s-banach <john@hopfensperger.family>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
1 parent 01ae2c5 commit b279e4e

3 files changed: +24 -2 lines changed

doc/whats_new/v1.4.rst

Lines changed: 9 additions & 0 deletions
@@ -131,6 +131,15 @@ Changelog
   and `axis=1`, as documented in the docstring.
   :pr:`28222` by :user:`Guillaume Lemaitre <glemaitre>`.
 
+
+:mod:`sklearn.preprocessing`
+............................
+
+- |Fix| :class:`preprocessing.TargetEncoder` no longer fails when
+  `target_type="continuous"` and the input is read-only. In particular, it now
+  works with pandas copy-on-write mode enabled.
+  :pr:`28233` by :user:`John Hopfensperger <s-banach>`.
+
 .. _changes_1_4:
 
 Version 1.4.0

sklearn/preprocessing/_target_encoder_fast.pyx

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ ctypedef fused Y_DTYPE:
 
 def _fit_encoding_fast(
     INT_DTYPE[:, ::1] X_int,
-    Y_DTYPE[:] y,
+    const Y_DTYPE[:] y,
     cnp.int64_t[::1] n_categories,
     double smooth,
     double y_mean,
@@ -79,7 +79,7 @@ def _fit_encoding_fast(
 
 def _fit_encoding_fast_auto_smooth(
     INT_DTYPE[:, ::1] X_int,
-    Y_DTYPE[:] y,
+    const Y_DTYPE[:] y,
     cnp.int64_t[::1] n_categories,
     double y_mean,
     double y_variance,
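
The `const` qualifier matters because a non-const Cython typed memoryview asks the buffer provider for write access, and that request is refused when the underlying array is read-only. The following is a rough pure-Python analogy of that behaviour using only the standard library. It is not the Cython code from this commit: the helper names `mean_nonconst` and `mean_const` are hypothetical, and the error message only imitates the one Cython is assumed to raise.

```python
from array import array


def mean_nonconst(view: memoryview) -> float:
    """Hypothetical stand-in for a non-const memoryview argument.

    It never writes, but it still insists on a writable buffer.
    """
    if view.readonly:
        # Imitates the rejection a read-only buffer is assumed to trigger.
        raise ValueError("buffer source array is read-only")
    return sum(view) / len(view)


def mean_const(view: memoryview) -> float:
    """Hypothetical stand-in for a const memoryview argument.

    It only reads, so writable and read-only buffers are both accepted.
    """
    return sum(view) / len(view)


values = array("d", [4.0, 5.0, 6.0])
writable_view = memoryview(values)            # array.array exports a writable buffer
readonly_view = writable_view.toreadonly()    # same data, write access removed

print(mean_const(writable_view))
print(mean_const(readonly_view))

try:
    mean_nonconst(readonly_view)
except ValueError as exc:
    print(f"non-const variant rejected the read-only buffer: {exc}")
```

Neither function in the diff writes to `y`, so declaring the argument `const` loosens the requirement on callers without changing what the functions compute.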

sklearn/preprocessing/tests/test_target_encoder.py

Lines changed: 13 additions & 0 deletions
@@ -701,3 +701,16 @@ def test_target_encoding_for_linear_regression(smooth, global_random_seed):
     # cardinality yet non-informative feature instead of the lower
     # cardinality yet informative feature:
     assert abs(coef[0]) < abs(coef[2])
+
+
+def test_pandas_copy_on_write():
+    """
+    Test target-encoder cython code when y is read-only.
+
+    The numpy array underlying df["y"] is read-only when copy-on-write is enabled.
+    Non-regression test for gh-27879.
+    """
+    pd = pytest.importorskip("pandas", minversion="2.0")
+    with pd.option_context("mode.copy_on_write", True):
+        df = pd.DataFrame({"x": ["a", "b", "b"], "y": [4.0, 5.0, 6.0]})
+        TargetEncoder(target_type="continuous").fit(df[["x"]], df["y"])
