Multiclass classification tasks should be included, and the metrics should be split into two groups: those that apply to multiclass problems and those that are binary-only.
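
To make the suggested split concrete, here is a minimal sketch of one way it could look. Every name in it (`MULTICLASS_METRICS`, `BINARY_ONLY_METRICS`, `metrics_for`, and the three metric functions) is hypothetical and chosen for illustration, not taken from the project. It also assumes integer class labels, with `1` as the positive class in the binary case.

```python
from typing import Callable, Dict, Sequence

Metric = Callable[[Sequence[int], Sequence[int]], float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of exact matches; valid for any number of classes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1; valid for any number of classes."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall of the positive class (assumed to be label 1); binary-only."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p != 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical registries: one per group, as the comment above proposes.
MULTICLASS_METRICS: Dict[str, Metric] = {
    "accuracy": accuracy,
    "macro_f1": macro_f1,
}
BINARY_ONLY_METRICS: Dict[str, Metric] = {
    "binary_recall": binary_recall,
}


def metrics_for(n_classes: int) -> Dict[str, Metric]:
    """Return the metrics that are well defined for a task with n_classes."""
    if n_classes < 2:
        raise ValueError("a classification task needs at least two classes")
    metrics = dict(MULTICLASS_METRICS)
    if n_classes == 2:
        # Binary-only metrics are added only when the task is binary.
        metrics.update(BINARY_ONLY_METRICS)
    return metrics
```

With this layout, a three-class task would be evaluated via `metrics_for(3)` and would never receive a binary-only metric, while `metrics_for(2)` returns both groups. Keeping two separate registries, rather than a per-metric flag, is just one option; it keeps the binary-only set easy to audit.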