Improve check_classification_targets to warn users about large number of classes and avoid redundant computation #16399

- `check_classification_targets` should warn the user if the number of unique classes is larger than, say, 50% of `n_samples`. In that case the user is likely feeding bad targets, which can be highly problematic when the size of the model grows linearly (or faster) with the number of classes (e.g. Gradient Boosting models, OneVsOne SVM, ...).

- `check_classification_targets` calls `type_of_target`, which triggers calls to `_assert_all_finite` and `np.unique(y)`. These are redundant with checks done elsewhere (e.g. in `check_X_y`).

- `check_classification_targets` should probably be refactored to return the array of unique classes (to be used for the `classes_` attribute of the classifier) to avoid redundant computation.

- `check_classification_targets` should probably be used more consistently across all classifiers, perhaps by calling it internally in `check_X_y` or in `_validate_data` instead of calling it manually on a case-by-case basis.

We probably also need a common test to check the above (e.g. all classifiers should raise the warning).
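
To make the proposal concrete, here is a rough sketch of what a combined check could look like. It is not the existing scikit-learn implementation: the function name `check_targets_and_get_classes`, the `max_class_ratio` parameter, and the simplified "continuous target" test (a stand-in for `type_of_target`) are all assumptions made for illustration. The point is only that a single `np.unique(y)` call can serve both the warning and the `classes_` attribute.

```python
import warnings

import numpy as np


def check_targets_and_get_classes(y, max_class_ratio=0.5):
    """Hypothetical helper: validate 1D classification targets and return classes.

    Computes np.unique(y) exactly once, warns when the number of classes
    exceeds max_class_ratio * n_samples, and returns the sorted classes so
    the caller can store them as `classes_`.
    """
    y = np.asarray(y)
    if y.ndim != 1:
        raise ValueError("This sketch only handles 1D targets.")
    n_samples = y.shape[0]
    if n_samples == 0:
        raise ValueError("y must contain at least one sample.")

    # Simplified stand-in for type_of_target: reject non-finite or
    # non-integral floating point targets as "continuous".
    if y.dtype.kind == "f":
        if not np.all(np.isfinite(y)):
            raise ValueError("y contains NaN or infinity.")
        if np.any(y != np.floor(y)):
            raise ValueError("Unknown label type: continuous")

    # Single pass over y: reused for the warning and for the return value.
    classes = np.unique(y)

    if classes.shape[0] > max_class_ratio * n_samples:
        warnings.warn(
            f"The number of unique classes ({classes.shape[0]}) is greater "
            f"than {max_class_ratio:.0%} of the number of samples "
            f"({n_samples}). Check that y holds classification targets.",
            UserWarning,
        )
    return classes
```

A classifier's `fit` could then do `self.classes_ = check_targets_and_get_classes(y)` instead of calling `np.unique(y)` a second time. A common test along the lines suggested above could fit each classifier on something like `y = np.arange(n_samples)` and assert that the warning is raised. Note that a fixed 50% ratio would also fire on very small datasets (e.g. 2 samples with 2 classes), so the threshold would need some care.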
