NMI and AMI use inconsistent definitions of mutual information #10308

Closed
@kno10

There exist many definitions of NMI and AMI.

Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct), 2837-2854.

mention 5 different definitions of NMI and, based on those, give 4 different definitions of AMI.

The NMI implemented in sklearn uses sqrt(H(U) * H(V)) for normalization.
The AMI implemented in sklearn uses max(H(U), H(V)) for normalization.

There exists an NMI with the max normalization and an AMI with the sqrt normalization, so the two sklearn defaults are inconsistent with each other. Ideally, both would use the same definition by default and allow any of the others to be selected via an option.
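
To make the difference concrete, here is a minimal pure-Python sketch of the five NMI normalizations listed by Vinh et al. (joint, max, sum, sqrt, min). It is not sklearn's implementation: the function names, the `method` argument, and the handling of zero entropy are assumptions made only for illustration.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_info(u, v):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(u)
    cu, cv, cuv = Counter(u), Counter(v), Counter(zip(u, v))
    return sum(
        (c / n) * log(c * n / (cu[a] * cv[b])) for (a, b), c in cuv.items()
    )


# Hypothetical option names; each maps (H(U), H(V)) to a normalizer.
NORMALIZERS = {
    "min": min,
    "sqrt": lambda hu, hv: sqrt(hu * hv),
    "sum": lambda hu, hv: (hu + hv) / 2,
    "max": max,
}


def nmi(u, v, method="sqrt"):
    """NMI with a selectable normalization (illustrative sketch only)."""
    hu, hv = entropy(u), entropy(v)
    mi = mutual_info(u, v)
    if method == "joint":
        denom = hu + hv - mi  # joint entropy H(U, V)
    else:
        denom = NORMALIZERS[method](hu, hv)
    if denom <= 0:
        # Convention assumed for this sketch: two trivial labelings agree.
        return 1.0 if hu == hv else 0.0
    return mi / denom


u = [0, 0, 1, 1, 2, 2]
v = [0, 0, 0, 1, 1, 1]
for method in ("joint", "max", "sum", "sqrt", "min"):
    print(method, round(nmi(u, v, method), 4))
```

On the same pair of labelings the five variants give different scores, so an NMI normalized by sqrt and an AMI normalized by max are not directly comparable. Each AMI variant in the paper uses the same normalizer as its NMI counterpart, with the expected mutual information subtracted from both numerator and denominator.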

Labels: Moderate (anything that requires some knowledge of conventions and best practices), help wanted