Description
Where a dataset is split up and not all evaluated at once, some classes may be missing from a given evaluation. Metric implementations get around problems with classes not appearing in both `y_true` and `y_pred` by considering the union of their labels. However, this is insufficient if a label that existed in the training set for a fold is absent from both the predicted and true test targets.
This is at least a problem for the P/R/F family of metrics with `average='macro'` and `labels` unspecified, and it should be documented (though a user shouldn't be using `'macro'` if there are infrequent labels). I haven't thought yet about whether it is an issue elsewhere, or whether it can be reasonably tested.
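A minimal sketch of the situation (the toy arrays and label set are only illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score

# Suppose the training data for this fold contained three classes {0, 1, 2},
# but class 2 happens to be absent from both the true and predicted test targets.
y_true = np.array([0, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1])

# With `labels` unspecified, the macro average is taken over the union of
# labels present in y_true and y_pred (here {0, 1}), so the missing class 2
# silently does not contribute.
print(f1_score(y_true, y_pred, average='macro'))  # ~0.733

# Passing the full label set from the training data gives a different result:
# class 2 is included with an F-score of 0.
print(f1_score(y_true, y_pred, average='macro', labels=[0, 1, 2]))  # ~0.489
```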