Description
When n_classes > 2, the precision / recall / f1-score need to be averaged in some way.
Currently the code in precision_recall_fscore_support does:
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
Since true_pos, false_pos and false_neg are arrays of size n_classes, precision and recall are also arrays of the same size. Then, to reduce them to a single number, their weighted average is taken.
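To make the discussion concrete, here is a minimal NumPy sketch of what this amounts to; the counts are made-up numbers, and weighting by class support is my assumption about how the weighted average is computed, not code taken from scikit-learn:

    import numpy as np

    # Made-up per-class counts for a 3-class problem (illustrative only).
    true_pos = np.array([50.0, 10.0, 5.0])
    false_pos = np.array([5.0, 2.0, 20.0])
    false_neg = np.array([10.0, 4.0, 1.0])
    support = true_pos + false_neg  # number of true instances per class

    # Per-class scores, each an array of size n_classes.
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)

    # Single number obtained by weighting each class by its support
    # (my reading of what the current weighted average does).
    weighted_precision = np.average(precision, weights=support)
    weighted_recall = np.average(recall, weights=support)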
In the literature, the macro-average and the micro-average are the ones usually reported, but as far as I understand the current code computes neither. The macro-average is the unweighted mean of the precision/recall computed separately for each class, so it is an average over classes. The micro-average, on the contrary, is an average over instances, so classes with many instances are given more importance. However, AFAIK it is still not the same as the weighted average currently taken in the code.
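For example, with the made-up counts from the sketch above, the per-class precisions are roughly 0.91, 0.83 and 0.20 with supports 60, 14 and 6: the support-weighted average is about 0.84, the macro-average is about 0.65, and the micro-average (65 true positives out of 92 positive predictions) is about 0.71, so the three quantities really do differ.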
I think the code should be:
    micro_avg_precision = true_pos.sum() / (true_pos.sum() + false_pos.sum())
    micro_avg_recall = true_pos.sum() / (true_pos.sum() + false_neg.sum())

    macro_avg_precision = np.mean(true_pos / (true_pos + false_pos))
    macro_avg_recall = np.mean(true_pos / (true_pos + false_neg))
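Note that in the plain single-label multiclass case every misclassified instance counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so false_pos.sum() == false_neg.sum() and the micro-averaged precision and recall coincide (and equal the overall accuracy).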
It's easy to fix (add a micro=True|False option) but the tests may be a pain to update :-/
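A rough sketch of what such an option could look like as a standalone helper; the function name and the micro keyword are just illustrative, not the actual precision_recall_fscore_support signature:

    import numpy as np

    def averaged_precision_recall(true_pos, false_pos, false_neg, micro=False):
        """Illustrative helper, not the real scikit-learn API.

        Pools the per-class counts over all instances when micro=True
        (micro-average); otherwise takes the unweighted mean over classes
        (macro-average).
        """
        true_pos = np.asarray(true_pos, dtype=float)
        false_pos = np.asarray(false_pos, dtype=float)
        false_neg = np.asarray(false_neg, dtype=float)
        if micro:
            precision = true_pos.sum() / (true_pos.sum() + false_pos.sum())
            recall = true_pos.sum() / (true_pos.sum() + false_neg.sum())
        else:
            precision = np.mean(true_pos / (true_pos + false_pos))
            recall = np.mean(true_pos / (true_pos + false_neg))
        return precision, recall

    # With the made-up counts from above:
    # averaged_precision_recall([50, 10, 5], [5, 2, 20], [10, 4, 1], micro=True)
    # -> (about 0.71, about 0.81)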