Partial dependence broken when categorical_features has an empty list #31077

Atm-Moumen · 2025-03-26T10:12:18Z

Describe the bug

When we pass an empty list to categorical_features, partial_dependence will raise an error ValueError: Expected categorical_features to be an array-like of boolean, integer, or string. Got float64 instead.

Steps/Code to Reproduce

from sklearn import datasets
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.inspection import partial_dependence

iris, Species = datasets.load_iris(return_X_y=True)
iris = pd.DataFrame(
iris,
columns=["sepal_length", "sepal_width", "petal_length", "petal_width"]
)
iris["species"] = pd.Series(Species).map({0: "A", 1: "B", 2: "C"})
iris.head()

species_encoder = make_pipeline(
SimpleImputer(strategy="constant", fill_value="A"),
OneHotEncoder(drop=["A"], sparse_output=False)
)

preprocessor = ColumnTransformer(
transformers=[
("species_encoder", species_encoder, ["species"]),
("other", SimpleImputer(), ["sepal_width", "petal_width", "petal_length"])
],
verbose_feature_names_out=False
).set_output(transform="pandas")

model = make_pipeline(preprocessor, LinearRegression())

model.fit(iris, iris.sepal_length)

pd = partial_dependence(estimator=model, X= iris, features= ["sepal_length"], categorical_features= [])

Expected Results

.

Actual Results

ValueError Traceback (most recent call last)
Cell In[12], line 28
24 model = make_pipeline(preprocessor, LinearRegression())
26 model.fit(iris, iris.sepal_length)
---> 28 pd = partial_dependence(estimator=model, X= iris, features= ["sepal_length"], categorical_features= [])

File ~.venv/lib/python3.10/site-packages/sklearn/utils/_param_validation.py:213, in validate_params..decorator..wrapper(*args, **kwargs)
207 try:
208 with config_context(
209 skip_parameter_validation=(
210 prefer_skip_nested_validation or global_skip_validation
211 )
212 ):
--> 213 return func(*args, **kwargs)
214 except InvalidParameterError as e:
215 # When the function is just a wrapper around an estimator, we allow
216 # the function to delegate validation to the estimator, but we replace
217 # the name of the estimator by the name of the function in the error
218 # message to avoid confusion.
219 msg = re.sub(
220 r"parameter of \w+ must be",
221 f"parameter of {func.qualname} must be",
222 str(e),
223 )

File ~.venv/lib/python3.10/site-packages/sklearn/inspection/_partial_dependence.py:679, in partial_dependence(estimator, X, features, sample_weight, categorical_features, feature_names, response_method, percentiles, grid_resolution, method, kind)
675 is_categorical = [
676 idx in categorical_features_idx for idx in features_indices
677 ]
678 else:
--> 679 raise ValueError(
680 "Expected categorical_features to be an array-like of boolean,"
681 f" integer, or string. Got {categorical_features.dtype} instead."
682 )
684 grid, values = _grid_from_X(
685 _safe_indexing(X, features_indices, axis=1),
686 percentiles,
687 is_categorical,
688 grid_resolution,
689 )
691 if method == "brute":

ValueError: Expected categorical_features to be an array-like of boolean, integer, or string. Got float64 instead.

Versions

System:
    python: 3.10.16 (main, Dec  3 2024, 17:27:57) [Clang 16.0.0 (clang-1600.0.26.4)]
executable: /Users/datategy/papAI/o2_ml/.venv/bin/python
   machine: macOS-15.3-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.5.0
          pip: 25.0.1
   setuptools: 75.6.0
        numpy: 1.26.4
        scipy: 1.13.1
       Cython: 3.0.12
       pandas: 1.5.3
   matplotlib: 3.8.4
       joblib: 1.2.0
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 4
         prefix: libopenblas
...
    num_threads: 8
         prefix: libomp
       filepath: /usr/local/Cellar/libomp/19.1.7/lib/libomp.dylib
        version: None

glemaitre · 2025-03-27T09:43:15Z

We should not accept an empty list indeed and raise a better error message. None would be the right input for such use case.

… empty list Passing an empty list to `categorical_features` in `partial_dependence` was causing a ValueError with an unclear message. Now, we explicitly check for this case and raise a clearer ValueError instructing users to use `None` instead. Added a test case to ensure proper exception handling. Signed-off-by: Pedro Lopes <pedro.m.a.r.lopes@tecnico.ulisboa.pt>

MarcBresson · 2025-04-23T09:15:21Z

Any reason why empty lists should not be accepted? I feel like giving an empty list [] should lead the same behaviour as giving None.

What do you think?

jeremiedbb · 2025-04-23T17:25:02Z

I agree that if empty lists were accepted, the behavior should be the same as None. However I don't think that we should accept them. They weren't accepted before, the only thing that changed is that now we raise a more informative error message. So unless it represents a big convenience feature for the users, in which case we should reconsider, let's not introduce a new feature (small one but still) because it increases the maintenance burden.

Atm-Moumen added Bug Needs Triage Issue requires triage labels Mar 26, 2025

glemaitre removed the Needs Triage Issue requires triage label Mar 27, 2025

pedroL0pes mentioned this issue Apr 3, 2025

Fix #31077: Improve error when categorical_features is an empty list #31146

Merged

jeremiedbb closed this as completed in #31146 Apr 21, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Partial dependence broken when categorical_features has an empty list #31077

Partial dependence broken when categorical_features has an empty list #31077

Atm-Moumen commented Mar 26, 2025 •

edited by glemaitre

Loading

glemaitre commented Mar 27, 2025

MarcBresson commented Apr 23, 2025

jeremiedbb commented Apr 23, 2025

Partial dependence broken when categorical_features has an empty list #31077

Partial dependence broken when categorical_features has an empty list #31077

Comments

Atm-Moumen commented Mar 26, 2025 • edited by glemaitre Loading

Describe the bug

Steps/Code to Reproduce

Expected Results

Actual Results

Versions

glemaitre commented Mar 27, 2025

MarcBresson commented Apr 23, 2025

jeremiedbb commented Apr 23, 2025

Atm-Moumen commented Mar 26, 2025 •

edited by glemaitre

Loading