Skip to content

FIX Array validation for DecisionBoundaryDisplay #25077

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Dec 2, 2022

Conversation

ArturoAmorQ
Copy link
Member

Reference Issues/PRs

What does this implement/fix? Explain your changes.

DecisionBoundaryDisplay expects the input data X to be of shape (n_samples, 2), but this is not ensured by an internal validation. Because of this, the following code will output an unintuitive ValueError message:

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.inspection import DecisionBoundaryDisplay

rng = np.random.RandomState(0)
X = pd.DataFrame(np.random.randn(5, 4))
y = pd.Series(rng.randint(0, 2, 5))

model = KNeighborsClassifier()
model.fit(X, y)

DecisionBoundaryDisplay.from_estimator(estimator=model, X=X)

Even worse, a further

from sklearn.ensemble import HistGradientBoostingClassifier

model = HistGradientBoostingClassifier()
model.fit(X, y)

DecisionBoundaryDisplay.from_estimator(estimator=model, X=X)

makes a plot!

This PR is a simple fix to solve the issue.

Any other comments?

A meta comment: I am wondering if we could add a parameter column_values (or similar) that accepts dictionaries to fix values for n_features - 2 of the variables, allowing then for surface curves of the decision boundary display.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @ArturoAmorQ !

@glemaitre glemaitre changed the title MAINT Array validation for DecisionBoundaryDisplay FIX Array validation for DecisionBoundaryDisplay Dec 1, 2022
@glemaitre
Copy link
Member

Actually, I would consider this as a fix and we should also add an entry in the changelog.
We can make it for the 1.2.0 release.

@glemaitre glemaitre added this to the 1.2 milestone Dec 1, 2022
@glemaitre glemaitre added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Dec 1, 2022
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @ArturoAmorQ

@ArturoAmorQ
Copy link
Member Author

Any idea of the reason for the Windows install fail?

@jeremiedbb
Copy link
Member

It's failing everywhere, but we did not find out why yet

@jeremiedbb jeremiedbb merged commit c047241 into scikit-learn:main Dec 2, 2022
@ArturoAmorQ ArturoAmorQ deleted the DecisionBoundaryDisplay branch December 2, 2022 12:44
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Dec 6, 2022
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:inspection To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants