RFC should the scikit-learn metrics return a Python scalar or a NumPy scalar? #27339

Closed
@glemaitre

While working on the scalar representation imposed by NEP 51, I found out that we recently made accuracy_score return a Python scalar while, until now, the other metrics have been returning NumPy scalars.
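To make the inconsistency concrete, here is a minimal sketch of the two kinds of return value. It does not call scikit-learn; `describe_scalar` is a hypothetical helper written only for this illustration, and the accuracy is computed by hand with NumPy. Note that `np.float64` subclasses Python's `float`, so the check has to be on the exact type.

```python
import numpy as np


def describe_scalar(value):
    """Hypothetical helper: report the type name, whether the value is
    exactly a Python float, and whether it is a NumPy scalar."""
    # np.float64 subclasses float, so isinstance(value, float) would be
    # True for both kinds of scalar; compare the exact type instead.
    return type(value).__name__, type(value) is float, isinstance(value, np.generic)


y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])

# Hand-rolled accuracy: a NumPy reduction yields a NumPy scalar.
numpy_score = np.mean(y_true == y_pred)
# Wrapping the result in float() yields a plain Python scalar.
python_score = float(numpy_score)

print(describe_scalar(numpy_score))
print(describe_scalar(python_score))
# The two print differently once NEP 51 is in effect (NumPy 2.x).
print(repr(numpy_score), repr(python_score))
```

Both values compare equal; what differs is the type and, with the NEP 51 representation, the `repr`, which is what surfaces in doctests and printed reports.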

This change was made due to the array API work:

def _weighted_sum(sample_score, sample_weight, normalize=False, xp=None):
    # XXX: this function accepts Array API input but returns a Python scalar
    # float. The call to float() is convenient because it removes the need to
    # move back results from device to host memory (e.g. calling `.cpu()` on a
    # torch tensor). However, this might interact in unexpected ways (break?)
    # with lazy Array API implementations. See:
    # https://github.com/data-apis/array-api/issues/642

I think we have reached a point where we should make the output of our metrics consistent, while also anticipating future requirements: as the comment indicates, calling float() is a synchronization point, which might not be the best strategy for lazy computation.
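The trade-off can be sketched as follows. This is not the scikit-learn implementation: `weighted_sum_sketch` and its `as_float` flag are hypothetical, and NumPy stands in for the array namespace `xp`. With a device-backed or lazy namespace, the `float()` branch is where the value would have to be materialized on the host, while the other branch leaves the result as a 0-d array in the original namespace.

```python
import numpy as np


def weighted_sum_sketch(sample_score, sample_weight=None, normalize=False,
                        xp=np, as_float=True):
    """Hypothetical illustration of the two candidate return strategies."""
    sample_score = xp.asarray(sample_score, dtype=xp.float64)
    if sample_weight is not None:
        sample_weight = xp.asarray(sample_weight, dtype=xp.float64)
        total = xp.sum(sample_score * sample_weight)
        denominator = xp.sum(sample_weight)
    else:
        total = xp.sum(sample_score)
        denominator = sample_score.shape[0]
    result = total / denominator if normalize else total
    if as_float:
        # Eager strategy: float() forces the value to be computed and copied
        # to host memory, i.e. a synchronization point.
        return float(result)
    # Deferred strategy: keep a 0-d array in the input namespace and let the
    # caller decide when (or whether) to convert it.
    return xp.asarray(result)


per_sample = [1.0, 1.0, 0.0, 1.0]
eager = weighted_sum_sketch(per_sample, normalize=True)
deferred = weighted_sum_sketch(per_sample, normalize=True, as_float=False)
```

With NumPy both strategies are equally cheap, so the sketch only shows the difference in return type; the cost of the eager branch becomes visible with GPU or lazily evaluated arrays.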

This RFC is a placeholder to discuss which strategy we should implement.
