-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX Target encoder const y #28233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX Target encoder const y #28233
Conversation
CI fails to compile though. Do we need a newwer cython for this to compile? As for setting the option on pandas, it should be in a context manager kind of thing, and adding a new test for this is okay. |
Evidently the version of cython on conda-forge is a couple weeks out of date. I know you guys need conda because your build system is so complicated, so I’ll just wait on this. |
@lesteve would know better where we need to change to upgrade the cython version on builds. |
If I understand correctly, it looks like you need Cython 3.0.8 to be our minimum Cython supported version, I think we need to discuss this outside of this PR. There was at least one other issue where Cython >= 3 was needed #27682. I am fine with it personnally since numpy and scipy require Cython >=3 IIRC but feed-back needs to be gathered and then someone needs to update the lock-files. Also currently the lock-file update failed because pandas started to warn about PyArrow becoming a required dependency, so this needs to be handled too ...
Yes it does, so to avoid side-effects maybe pandas has a context manager or you need to reset it with a On It would be great to do a common test although it won't catch everything since it uses default parameter, see #27879 (comment) |
Opened #28258 to discuss pandas issue. I think we can simply update Cython, since the main issue was on cython==3.0, which is resolved now. |
I opened #28259 to get feed-back on Cython 3.0.8 being our minimum supported version. I am +1 for this. |
CI failing @s-banach |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @s-banach !
@@ -701,3 +701,10 @@ def test_target_encoding_for_linear_regression(smooth, global_random_seed): | |||
# cardinality yet non-informative feature instead of the lower | |||
# cardinality yet informative feature: | |||
assert abs(coef[0]) < abs(coef[2]) | |||
|
|||
|
|||
def test_27879(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a better function name?
def test_27879(): | |
def test_pandas_copy_on_write_true(): | |
"""Check target encoder works with copy on write. | |
Non-regression test for gh-27879. | |
""" |
|
||
|
||
def test_27879(): | ||
pd = pytest.importorskip("pandas") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the failing CI, it is using a really old version of pandas that does not have mode.copy_on_write
. Here we can set the min version here:
pd = pytest.importorskip("pandas") | |
pd = pytest.importorskip("pandas", minversion="2.0") |
Please add an entry to the change log at |
Sorry, I use all my brain power at work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM.
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hhhjk
Co-authored-by: s-banach <john@hopfensperger.family> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: s-banach <john@hopfensperger.family> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: s-banach <john@hopfensperger.family> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Reference Issues/PRs
Regarding #27879, test failure when pandas copy-on-write is enabled.
What does this implement/fix? Explain your changes.
The Cython code for
TargetEncoder
does not modify the target columny
, so we mark itconst
.Any other comments?
Simply adding
const
is now an option, since cython/cython#5230 is closed.I assume someone else may want to improve the test here:
pd.options.mode.copy_on_write = True
in one test.Does this configuration carry forward to later tests in the test suite?