Mention possibility of regression targets in warning about unique classes >50% of n_samples #31689

Merged: 3 commits merged into scikit-learn:main on Jul 14, 2025

Conversation

lucyleeow
Member

Reference Issues/PRs

Addresses #26335 (comment)

What does this implement/fix? Explain your changes.

In #26335 we added a warning when unique classes >50% of n_samples. This adds a note that it could also be due to the data being from a regression problem.
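
The rule behind the warning can be sketched in plain Python. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: `warn_if_many_classes` is a hypothetical helper, and the message text is pieced together from the fragments quoted in the review threads of this PR.

```python
import warnings


def warn_if_many_classes(y):
    """Hypothetical helper: warn when unique labels exceed 50% of n_samples."""
    n_samples = len(y)
    n_classes = len(set(y))
    if n_classes > 0.5 * n_samples:
        # Illustrative wording only; the real message lives in scikit-learn.
        warnings.warn(
            "The number of unique classes is greater than 50% of the number "
            "of samples. y could represent a regression problem, not a "
            "classification problem.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

With this sketch, a target such as `[3, 7, 12, 19, 25, 31]` (six samples, six distinct values) triggers the warning, while `[0, 1, 0, 1, 1, 0]` does not.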

Any other comments?

github-actions bot commented Jul 2, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 428ef4d.

Comment on lines 421 to 422
"of samples. Note this may be because the data belongs to a regression "
"problem, not a classification problem.",
Member

That would only be triggered by a regression target containing only integers and with an integer dtype. Is it worth it?

Member Author

Well technically integer float values are "multiclass", but I get your point...

>>> from sklearn.utils.multiclass import type_of_target
>>> type_of_target([1.0, 0.0, 3.0])
'multiclass'
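
To make the point about integer-valued floats concrete, here is a simplified sketch of that kind of check. `sketch_type_of_target` is a hypothetical stand-in written for this thread, not the real `type_of_target`; it handles only flat 1-D lists and assumes that any value with a fractional part marks the target as continuous.

```python
def sketch_type_of_target(y):
    """Simplified, hypothetical stand-in for a target-type check (1-D input only)."""
    # A float with a fractional part can only belong to a continuous target.
    if any(isinstance(v, float) and not v.is_integer() for v in y):
        return "continuous"
    # Otherwise the values are treated as discrete labels, even if stored as floats.
    return "binary" if len(set(y)) <= 2 else "multiclass"


print(sketch_type_of_target([1.0, 0.0, 3.0]))  # multiclass
print(sketch_type_of_target([0.5, 1.0, 2.0]))  # continuous
```

Under this assumption, a regression target that happens to contain only whole numbers falls into the "multiclass" branch regardless of dtype, which is the case the extra sentence in the warning is meant to flag.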

Member Author

Maybe it is too rare. Happy to close

Member

Right, I misread the previous code block; integer float values end up as multiclass.

The original motivation for the warning was not about confusion with regression problems according to #16399 (comment).
However, this comment #26335 (comment) goes in your direction. So I think your addition is worthwhile.

Member Author

Hmm yes. Christian also mentions it later: #26335 (comment)

Comment on lines 421 to 422
"of samples. Note this may be because the data belongs to a regression "
"problem, not a classification problem.",
Member

Small nit:

Suggested change
- "of samples. Note this may be because the data belongs to a regression "
- "problem, not a classification problem.",
+ "of samples. The target data could represent a regression "
+ "problem, not a classification problem.",

Member Author

@lucyleeow lucyleeow Jul 12, 2025

This is nicer, thank you. What do you think about:

"y could represent a regression problem, not a classification problem." ..?

(I am happy either way)

Member

Your suggestion works for me too.

Member Author

Done

@thomasjpfan thomasjpfan merged commit 6848353 into scikit-learn:main Jul 14, 2025
36 checks passed
@lucyleeow lucyleeow deleted the type_target_warn branch July 14, 2025 21:58
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 15, 2025
…sses >50% of n_samples (scikit-learn#31689)

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
@jeremiedbb jeremiedbb mentioned this pull request Jul 15, 2025
13 tasks
3 participants