
Macro/micro average precision/recall/f1-score #83

Closed
@mblondel

Description


When n_classes > 2, the precision / recall / f1-score need to be averaged in some way.

Currently the code in precision_recall_fscore_support does:

precision = true_pos / (true_pos + false_pos)
recall = true_pos / (true_pos + false_neg)

Since true_pos, false_pos and false_neg are arrays of size n_classes, precision and recall are also arrays of the same size. Then, to obtain a single score, a weighted sum is taken.

In the literature, the macro-average and the micro-average are the usual choices, but as far as I understand the current code computes neither. The macro-average is the unweighted mean of the precision/recall computed separately for each class, so it is an average over classes. The micro-average, on the contrary, is an average over instances, so classes with many instances carry more weight. However, AFAIK it's not the same as taking the weighted average as currently done in the code.

I think the code should be:

micro_avg_precision = true_pos.sum() / (true_pos.sum() + false_pos.sum())
micro_avg_recall = true_pos.sum() / (true_pos.sum() + false_neg.sum())
macro_avg_precision = np.mean(true_pos / (true_pos + false_pos))
macro_avg_recall = np.mean(true_pos / (true_pos + false_neg))
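For a quick sanity check, here is a small numeric example (toy per-class counts, not taken from the actual code or tests) showing how the two averages diverge when the class sizes differ:

import numpy as np

# Hypothetical per-class counts: class 0 is much larger than classes 1 and 2.
true_pos = np.array([90.0, 1.0, 1.0])
false_pos = np.array([10.0, 9.0, 9.0])
false_neg = np.array([5.0, 9.0, 9.0])

micro_avg_precision = true_pos.sum() / (true_pos.sum() + false_pos.sum())
macro_avg_precision = np.mean(true_pos / (true_pos + false_pos))

print(micro_avg_precision)  # 92 / 120 ~ 0.77, dominated by the large class
print(macro_avg_precision)  # (0.9 + 0.1 + 0.1) / 3 ~ 0.37, each class counted equally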

It's easy to fix (add a micro=True|False option) but the tests may be a pain to update :-/
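For illustration, a minimal standalone sketch of what such an option could look like (the helper name and the micro flag are hypothetical, not the actual precision_recall_fscore_support signature):

import numpy as np

def averaged_precision_recall(true_pos, false_pos, false_neg, micro=False):
    # Hypothetical helper: the inputs are per-class count arrays of length n_classes.
    true_pos = np.asarray(true_pos, dtype=float)
    false_pos = np.asarray(false_pos, dtype=float)
    false_neg = np.asarray(false_neg, dtype=float)
    if micro:
        # Micro-average: pool the counts over all classes, then take the ratios.
        precision = true_pos.sum() / (true_pos.sum() + false_pos.sum())
        recall = true_pos.sum() / (true_pos.sum() + false_neg.sum())
    else:
        # Macro-average: per-class ratios first, then their unweighted mean.
        precision = np.mean(true_pos / (true_pos + false_pos))
        recall = np.mean(true_pos / (true_pos + false_neg))
    return precision, recall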
