
Proposal to decouple classifier output classes from the y vector with a classes= parameter in __init__ #1183

Closed
@erg

From the discussion on the mailing list: http://sourceforge.net/mailarchive/message.php?msg_id=29883772

"I think we could have classes=None constructor parameter in
SGDClassifier and possibly many other classifiers. When provided we
would not use the traditional self.classes_ = np.unique(y) idiom
already implemented in some classifiers of the project (but not all).

+1 also for raising a ValueError exception when classes != None and
if the y provided at fit time has some values not in classes.
However we need to check with some benchmarks that this integrity
check is not too costly.

This constructor parameter could be overridden by a fit_param to
preserve backward compat, especially for classifier models with a
partial_fit method.

The expected behavior for a classifier that is passed a non-None
classes constructor param would be to never predict a class value
outside of classes. For the predict_proba method, the probabilities
of classes missing at fit time should be 0.0.

This protocol (including expected exception types and error messages)
should be formalized as a series of common tests in
sklearn/tests/test_common.py and redundant bookkeeping code should be
factored into the ClassifierMixin class in sklearn/base.py IMHO."

-Olivier Grisel
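
To make the quoted proposal more concrete, here is a minimal sketch of two of the behaviors it describes: the fit-time integrity check that raises ValueError when y contains labels outside classes, and padding predict_proba-style output with 0.0 for classes that were declared but not seen at fit time. The helper names resolve_classes and expand_proba are hypothetical and not part of scikit-learn. The sketch also assumes that classes is kept in sorted order, which the proposal does not specify.

```python
import numpy as np


def resolve_classes(y, classes=None):
    """Return the class labels to use at fit time (hypothetical helper)."""
    observed = np.unique(y)
    if classes is None:
        # Traditional idiom: infer the classes from y.
        return observed
    # Assumption: classes are stored sorted and de-duplicated.
    classes = np.unique(classes)
    # Integrity check: every label seen in y must be declared in classes.
    unknown = np.setdiff1d(observed, classes)
    if unknown.size > 0:
        raise ValueError(
            "y contains labels not in classes: %r" % unknown.tolist()
        )
    return classes


def expand_proba(proba, fitted_classes, classes):
    """Pad probabilities with 0.0 columns for classes absent at fit time.

    Assumes `classes` is sorted and contains every entry of `fitted_classes`.
    """
    proba = np.asarray(proba, dtype=float)
    classes = np.asarray(classes)
    full = np.zeros((proba.shape[0], classes.shape[0]))
    # Because classes is sorted, searchsorted gives each fitted class's column.
    cols = np.searchsorted(classes, fitted_classes)
    full[:, cols] = proba
    return full
```

In this sketch the integrity check costs one np.unique call over y (a sort) plus a set difference on the unique labels, which is the part the proposal suggests benchmarking.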
