BUG need to ensure classification metrics are sane under (non-stratified) cross-validation #2029

Closed
jnothman opened this issue Jun 4, 2013 · 3 comments
@jnothman
Member

jnothman commented Jun 4, 2013

Where a dataset is split into folds and not evaluated all at once, some classes may be missing from a given evaluation. The metric implementations handle classes that appear in only one of y_true and y_pred by taking the union of their labels. However, this is insufficient if a label that existed in a fold's training set is absent from both the predicted and the true test targets.

This is at least a problem for the P/R/F family of metrics with average='macro' and labels unspecified, and it should be documented (though a user shouldn't be using 'macro' if there are infrequent labels). I haven't thought yet about whether it is an issue elsewhere, or whether it can be reasonably tested.
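To make the macro case concrete, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: the helper names `f1_per_class` and `macro_f1` are made up for illustration, and scoring an entirely absent class as 0 is an assumed convention. The toy fold assumes the training set had classes 0, 1 and 2, while class 2 appears in neither the true nor the predicted test targets.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class: 2*tp / (2*tp + fp + fn), or 0.0 if undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Assumed convention: a class with no true and no predicted samples scores 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 over `labels`."""
    if labels is None:
        # Fallback described in the issue: union of the labels actually seen.
        labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, l) for l in labels) / len(labels)


# Test fold in which class 2 (known from training) never shows up.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

union_score = macro_f1(y_true, y_pred)                   # averages over {0, 1}
full_score = macro_f1(y_true, y_pred, labels=[0, 1, 2])  # averages over {0, 1, 2}

print(union_score, full_score)
```

The two averages differ only because the denominator changes from two classes to three, so per-fold macro scores are not comparable unless the full label set is passed explicitly.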

@jnothman
Member Author

jnothman commented Jun 4, 2013

Because P/R/F special-cases binary problems, this is also a problem for other values of average. That is, if one or more missing classes reduce the problem from multiclass to binary, the result is completely different from the one expected.
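A rough illustration of why the binary special case matters, again in plain Python rather than scikit-learn code. It assumes the special case reports the score of the positive class alone (which is how average='binary' is documented in current scikit-learn) and compares that with a micro-average over both classes of the same fold. The helpers `counts` and `f1` are hypothetical names.

```python
def counts(y_true, y_pred, label):
    """Return (tp, fp, fn) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 from raw counts, 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# A fold of a three-class problem in which only classes 0 and 1 occur,
# so it looks like a binary problem.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

# Assumed binary special case: score for the positive class (label 1) only.
positive_only = f1(*counts(y_true, y_pred, 1))

# Micro-average: pool the counts of every class present, then compute F1 once.
pooled = [sum(c) for c in zip(*(counts(y_true, y_pred, l) for l in (0, 1)))]
micro = f1(*pooled)

print(positive_only, micro)
```

On folds where all three classes are present the averaged score would be reported, so a single cross-validation run could mix two different quantities.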

@jnothman
Member Author

I think this is partially solved, and not specific enough to be generally solved. Closing.

@jnothman
Member Author

By partially solved I mean we no longer allow multiclass data with the default f1 parameters, and the labels parameter works more flexibly.
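For reference, a short sketch of what those two points look like with a recent scikit-learn release. The toy data is made up, and the `zero_division` argument assumes scikit-learn 0.22 or later; older versions may behave differently.

```python
from sklearn.metrics import f1_score

# 1. Multiclass targets with the default average='binary' are rejected.
y_true_multi = [0, 1, 2, 2]
y_pred_multi = [0, 2, 1, 2]
try:
    f1_score(y_true_multi, y_pred_multi)
    raised = False
except ValueError:
    raised = True

# 2. Passing labels explicitly keeps the averaging set fixed, even when
#    class 2 is absent from both the true and predicted targets of a fold.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
macro_union = f1_score(y_true, y_pred, average="macro")
macro_fixed = f1_score(
    y_true, y_pred, labels=[0, 1, 2], average="macro", zero_division=0
)

print(raised, macro_union, macro_fixed)
```

Passing the same labels list to every fold keeps per-fold macro scores on a common footing, at the cost of counting absent classes as zeros.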

2 participants