
[WIP] rewrite precision_recall_fscore_support #1990


Closed
jnothman wants to merge 30 commits

Conversation

jnothman (Member)
precision_recall_fscore_support was getting gargantuan, because the similarities between the different input formats and metric variations weren't being exploited; rather, almost everything was special-cased. As a result, some bugs and inconsistencies crept in (admittedly on my watch as a reviewer).

This implementation:

  • is much smaller and less deeply nested, and should be faster in some cases: LabelEncoder is used so that per-label counts can be taken with bincount (building on #1985, "FIX helper to check multilabel types", and #1987, "ENH support multilabel targets in LabelEncoder"); a sketch of the bincount idea follows this list.
  • deprecates pos_label and introduces neg_label, which makes micro-averaging meaningful in the multiclass case by allowing a majority class to be ignored (see #1983, "ENH P/R/F should be able to ignore a majority class in the multiclass case").
  • neg_label has not yet been tested; to support it, multilabel indicator matrices are assumed to be represented with values <1 and 1 (or False and True).
  • has not yet implemented support for the labels argument, largely because I don't know what it means (see #1989, "DOC clarify the use of label in P/R/F metric family"). It's not hard to implement, but I wish labels were deprecated in favour of stating a convention regarding label ordering in the average=None case.
  • has not yet fixed some broken tests for multilabel average='samples'
  • has not yet updated the documentation or signatures of the derivative functions (precision_score, etc.) with respect to pos_label/neg_label.
  • currently assumes P and R go to 0 when their denominators are 0. I think this is incorrect behaviour, but it is backward-compatible (except with the average='samples' implementation; again, whoops): arguably precision should be perfect when nothing is retrieved, and recall should be perfect when there are no instances to retrieve. (Not realising that scikit-learn had adopted the 0 convention, I suggested in the documentation that 1 should be used.) The decision made elsewhere is to return 0 and warn; this is not yet implemented, but a sketch of that convention also follows this list.
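
As a rough illustration of the bincount point above, here is a minimal sketch (not the code in this PR; the variable names are illustrative only) of how LabelEncoder reduces per-label counting to vectorized bincount calls:

```python
# Minimal sketch of the LabelEncoder + bincount idea; not the PR's code.
import numpy as np
from sklearn.preprocessing import LabelEncoder

y_true = np.array(["cat", "dog", "cat", "bird", "dog"])
y_pred = np.array(["cat", "cat", "cat", "bird", "dog"])

le = LabelEncoder().fit(np.hstack([y_true, y_pred]))
t = le.transform(y_true)
p = le.transform(y_pred)
n_labels = len(le.classes_)

# One bincount per quantity, instead of a Python loop over labels:
tp = np.bincount(t[t == p], minlength=n_labels)   # true positives per label
pred_per_label = np.bincount(p, minlength=n_labels)
true_per_label = np.bincount(t, minlength=n_labels)

precision = tp / np.maximum(pred_per_label, 1)  # 0 where nothing is predicted
recall = tp / np.maximum(true_per_label, 1)     # 0 where a label never occurs
```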
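And a sketch of the "return 0 and warn" convention mentioned in the last bullet; _zero_safe_divide is a hypothetical helper name, not scikit-learn API:

```python
# Hedged sketch of the "return 0 and warn" convention discussed above;
# _zero_safe_divide is a hypothetical helper, not part of scikit-learn.
import warnings
import numpy as np

def _zero_safe_divide(numerator, denominator, metric_name):
    """Elementwise division returning 0.0 (with a warning) where the
    denominator is 0, rather than propagating NaN or raising."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    result = np.zeros_like(denominator)
    nonzero = denominator != 0
    result[nonzero] = numerator[nonzero] / denominator[nonzero]
    if not nonzero.all():
        warnings.warn("%s is ill-defined and set to 0.0 for labels with a "
                      "zero denominator" % metric_name)
    return result

# e.g. precision = _zero_safe_divide(tp, pred_per_label, "precision")
```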

jnothman (Member, Author)
The remaining test failures result from problems in the previous implementation, the currently unhandled labels parameter, or my stubborn refusal to implement the label indicator matrix's pos_label application.

neg_label is also yet to be tested.

jnothman (Member, Author)
Rebased on #1988 to correct values in test cases.

jnothman (Member, Author) commented Jul 8, 2013

Status of this PR:

jnothman closed this Jul 27, 2013