Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Reference Issues/PRs
Fixes #31290
What does this implement/fix? Explain your changes.
This PR fixes an issue where
_safe_indexing
triggers aSettingWithCopyWarning
when used with a slice on pandas DataFrames.The problem:
When
_safe_indexing
is used with a slice (e.g.,slice(0, 2)
) on a pandas DataFrame, it returns a view rather than a copy. If the user then modifies this view, pandas raises aSettingWithCopyWarning
because it's unclear whether the modification should affect the original DataFrame.The fix:
I modified the
_pandas_indexing
function insklearn/utils/_indexing.py
to explicitly create a copy when a slice is used:This ensures that when
_safe_indexing
is used with a slice, it always returns a copy of the data, not a view. This prevents theSettingWithCopyWarning
from being raised when the result is modified.This change maintains the existing behavior of
_safe_indexing
(it still returns the same data), but ensures that the returned object is always a copy when using slices, which is consistent with the behavior when using other types of indexers.Any other comments?
.copy(deep=True)
or.copy()
but decided on.copy()
because of the performance overhead of deep copying. The trade off is that deep copy would be safer but at the cost of performance.