_safe_indexing triggers SettingWithCopyWarning when used with slice #31290

Open
MarcoGorelli opened this issue May 1, 2025 · 1 comment
Labels
Bug Needs Triage Issue requires triage

Comments

@MarcoGorelli
Contributor

Describe the bug

Here's something I noticed while looking into #31127

The test

pytest sklearn/utils/tests/test_indexing.py::test_safe_indexing_pandas_no_settingwithcopy_warning

checks that a copy is produced and that no SettingWithCopyWarning is raised.

Indeed, no warning is raised in that test, but why is _safe_indexing allowed to not make a copy when it is used with a slice? Is this intentional?

Based on responses, I can suggest what to do instead in #31127

(I am a little surprised that this always makes copies, given that a lot of the discussion in #28341 centered around wanting to avoid copies)
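
A pandas-only sketch of the difference in question may help here. It assumes, without confirmation in this thread, that _safe_indexing ends up calling DataFrame.take for integer array-like keys and DataFrame.iloc for slices. The helper warning_names_on_write is hypothetical and exists only for this illustration.

```python
import warnings

import pandas as pd


def warning_names_on_write(frame):
    """Write one cell of `frame` and return the class names of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        frame.iloc[0, 0] = 10
    return [type(w.message).__name__ for w in caught]


X = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})

# Assumed path for integer array-like keys: DataFrame.take
via_take = X.take([0, 1], axis=0)
# Assumed path for slice keys: DataFrame.iloc
via_slice = X.iloc[slice(0, 2)]

print("take :", warning_names_on_write(via_take))
print("slice:", warning_names_on_write(via_slice))
```

With the pandas 2.2.2 setup listed under Versions, the slice line would be expected to include SettingWithCopyWarning, in line with the actual results reported in this issue. Other pandas versions, or copy-on-write mode, may behave differently.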

Steps/Code to Reproduce

import numpy as np

from sklearn.utils import _safe_indexing
import pandas as pd

X = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})
subset = _safe_indexing(X, slice(0, 2), axis=0)
subset.iloc[0, 0] = 10

Expected Results

No SettingWithCopyWarning
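
For comparison, one plain-pandas way to get this outcome is to copy the slice explicitly before writing to it. This is only a sketch of the expected behaviour, not a proposal for how _safe_indexing should be changed. The name caught_names is introduced here for illustration.

```python
import warnings

import pandas as pd

X = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5]})

# Explicit copy of the sliced rows, so the result no longer refers back to X
subset = X.iloc[slice(0, 2)].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    subset.iloc[0, 0] = 10

caught_names = [type(w.message).__name__ for w in caught]
print(caught_names)  # no SettingWithCopyWarning expected in this list
print(X.iloc[0, 0])  # the original frame keeps its value
```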

Actual Results

/home/marcogorelli/scikit-learn-dev/t.py:13: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  subset.iloc[0, 0] = 10

Versions

System:
    python: 3.11.11 (main, Dec  4 2024, 08:55:07) [GCC 11.4.0]
executable: /home/marcogorelli/scikit-learn-dev/.venv/bin/python
   machine: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.7.dev0
          pip: 24.2
   setuptools: None
        numpy: 2.1.0
        scipy: 1.14.0
       Cython: 3.0.11
       pandas: 2.2.2
   matplotlib: None
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /home/marcogorelli/scikit-learn-dev/.venv/lib/python3.11/site-packages/numpy.libs/libscipy_openblas64_-ff651d7f.so
        version: 0.3.27
threading_layer: pthreads
   architecture: SkylakeX

       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /home/marcogorelli/scikit-learn-dev/.venv/lib/python3.11/site-packages/scipy.libs/libscipy_openblas-c128ec02.so
        version: 0.3.27.dev
threading_layer: pthreads
   architecture: SkylakeX

       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libgomp
       filepath: /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0
        version: None

@MarcoGorelli
Contributor Author

Based on git logs for related functionality, I'm going to tag @glemaitre @ogrisel @lorentzenchr @jeremiedbb to ask what the intended behaviour is.
