Skip to content

Partial dependence broken when categorical_features has an empty list #31077

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Atm-Moumen opened this issue Mar 26, 2025 · 3 comments · Fixed by #31146
Closed

Partial dependence broken when categorical_features has an empty list #31077

Atm-Moumen opened this issue Mar 26, 2025 · 3 comments · Fixed by #31146
Labels

Comments

@Atm-Moumen
Copy link

Atm-Moumen commented Mar 26, 2025

Describe the bug

When we pass an empty list to categorical_features, partial_dependence will raise an error ValueError: Expected categorical_features to be an array-like of boolean, integer, or string. Got float64 instead.

Steps/Code to Reproduce

from sklearn import datasets
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.inspection import partial_dependence

iris, Species = datasets.load_iris(return_X_y=True)
iris = pd.DataFrame(
iris,
columns=["sepal_length", "sepal_width", "petal_length", "petal_width"]
)
iris["species"] = pd.Series(Species).map({0: "A", 1: "B", 2: "C"})
iris.head()

species_encoder = make_pipeline(
SimpleImputer(strategy="constant", fill_value="A"),
OneHotEncoder(drop=["A"], sparse_output=False)
)

preprocessor = ColumnTransformer(
transformers=[
("species_encoder", species_encoder, ["species"]),
("other", SimpleImputer(), ["sepal_width", "petal_width", "petal_length"])
],
verbose_feature_names_out=False
).set_output(transform="pandas")

model = make_pipeline(preprocessor, LinearRegression())

model.fit(iris, iris.sepal_length)

pd = partial_dependence(estimator=model, X= iris, features= ["sepal_length"], categorical_features= [])

Expected Results

.

Actual Results

ValueError Traceback (most recent call last)
Cell In[12], line 28
24 model = make_pipeline(preprocessor, LinearRegression())
26 model.fit(iris, iris.sepal_length)
---> 28 pd = partial_dependence(estimator=model, X= iris, features= ["sepal_length"], categorical_features= [])

File ~.venv/lib/python3.10/site-packages/sklearn/utils/_param_validation.py:213, in validate_params..decorator..wrapper(*args, **kwargs)
207 try:
208 with config_context(
209 skip_parameter_validation=(
210 prefer_skip_nested_validation or global_skip_validation
211 )
212 ):
--> 213 return func(*args, **kwargs)
214 except InvalidParameterError as e:
215 # When the function is just a wrapper around an estimator, we allow
216 # the function to delegate validation to the estimator, but we replace
217 # the name of the estimator by the name of the function in the error
218 # message to avoid confusion.
219 msg = re.sub(
220 r"parameter of \w+ must be",
221 f"parameter of {func.qualname} must be",
222 str(e),
223 )

File ~.venv/lib/python3.10/site-packages/sklearn/inspection/_partial_dependence.py:679, in partial_dependence(estimator, X, features, sample_weight, categorical_features, feature_names, response_method, percentiles, grid_resolution, method, kind)
675 is_categorical = [
676 idx in categorical_features_idx for idx in features_indices
677 ]
678 else:
--> 679 raise ValueError(
680 "Expected categorical_features to be an array-like of boolean,"
681 f" integer, or string. Got {categorical_features.dtype} instead."
682 )
684 grid, values = _grid_from_X(
685 _safe_indexing(X, features_indices, axis=1),
686 percentiles,
687 is_categorical,
688 grid_resolution,
689 )
691 if method == "brute":

ValueError: Expected categorical_features to be an array-like of boolean, integer, or string. Got float64 instead.

Versions

System:
    python: 3.10.16 (main, Dec  3 2024, 17:27:57) [Clang 16.0.0 (clang-1600.0.26.4)]
executable: /Users/datategy/papAI/o2_ml/.venv/bin/python
   machine: macOS-15.3-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.5.0
          pip: 25.0.1
   setuptools: 75.6.0
        numpy: 1.26.4
        scipy: 1.13.1
       Cython: 3.0.12
       pandas: 1.5.3
   matplotlib: 3.8.4
       joblib: 1.2.0
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 4
         prefix: libopenblas
...
    num_threads: 8
         prefix: libomp
       filepath: /usr/local/Cellar/libomp/19.1.7/lib/libomp.dylib
        version: None
@Atm-Moumen Atm-Moumen added Bug Needs Triage Issue requires triage labels Mar 26, 2025
@glemaitre
Copy link
Member

We should not accept an empty list indeed and raise a better error message. None would be the right input for such use case.

@glemaitre glemaitre removed the Needs Triage Issue requires triage label Mar 27, 2025
pedroL0pes added a commit to pedroL0pes/scikit-learn that referenced this issue Mar 28, 2025
… empty list

Passing an empty list to `categorical_features` in `partial_dependence` was causing a ValueError with an unclear message.
Now, we explicitly check for this case and raise a clearer ValueError instructing users to use `None` instead.
Added a test case to ensure proper exception handling.

Signed-off-by: Pedro Lopes <pedro.m.a.r.lopes@tecnico.ulisboa.pt>
pedroL0pes added a commit to pedroL0pes/scikit-learn that referenced this issue Mar 28, 2025
… empty list

Passing an empty list to `categorical_features` in
`partial_dependence` was causing a ValueError with
an unclear message. Now, we explicitly check for
this case and raise a clearer ValueError instructing
users to use `None` instead. Added a test case to
ensure proper exception handling.

Signed-off-by: Pedro Lopes <pedro.m.a.r.lopes@tecnico.ulisboa.pt>
@MarcBresson
Copy link
Contributor

Any reason why empty lists should not be accepted? I feel like giving an empty list [] should lead the same behaviour as giving None.

What do you think?

@jeremiedbb
Copy link
Member

I agree that if empty lists were accepted, the behavior should be the same as None. However I don't think that we should accept them. They weren't accepted before, the only thing that changed is that now we raise a more informative error message. So unless it represents a big convenience feature for the users, in which case we should reconsider, let's not introduce a new feature (small one but still) because it increases the maintenance burden.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants