RFC: Should scikit-learn metrics return a Python scalar or a NumPy scalar? #27339

While working on the scalar representation changes introduced by NEP 51, I found that we recently made `accuracy_score` return a Python scalar whereas, until now, the other metrics have returned NumPy scalars.

This change was made due to the array API work:

https://github.com/scikit-learn/scikit-learn/blob/b0da1b7706054f0b78f0a0582a9362a188e1fa38/sklearn/utils/_array_api.py#L448-L454

I think we have reached a point where we should make the output of our metrics consistent while also anticipating future requirements: as the comment indicates, calling `float()` introduces a sync point, which might not be the best strategy for lazy computation.

This RFC is a placeholder to discuss which strategy we should implement.

	def _weighted_sum(sample_score, sample_weight, normalize=False, xp=None):
	    # XXX: this function accepts Array API input but returns a Python scalar
	    # float. The call to float() is convenient because it removes the need to
	    # move back results from device to host memory (e.g. calling `.cpu()` on a
	    # torch tensor). However, this might interact in unexpected ways (break?)
	    # with lazy Array API implementations. See:
	    # https://github.com/data-apis/array-api/issues/642
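
To make the two options concrete, here is a minimal sketch of the difference, assuming plain NumPy inputs. The helper names `weighted_sum_numpy_scalar` and `weighted_sum_python_scalar` are hypothetical and are not the scikit-learn implementation; they only illustrate what the caller gets back in each case.

```python
import numpy as np


def weighted_sum_numpy_scalar(sample_score, sample_weight=None, normalize=False):
    """Hypothetical helper: reduce per-sample scores, keep the NumPy scalar."""
    sample_score = np.asarray(sample_score, dtype=np.float64)
    if normalize:
        # Weighted mean; with weights=None this is the plain mean.
        return np.average(sample_score, weights=sample_weight)
    if sample_weight is not None:
        # Weighted sum of the per-sample scores.
        return np.dot(sample_score, sample_weight)
    return sample_score.sum()


def weighted_sum_python_scalar(sample_score, sample_weight=None, normalize=False):
    """Hypothetical helper: same reduction, converted to a builtin float."""
    # float() forces the value to be materialized on the host, which is the
    # "sync point" mentioned in the comment above.
    return float(weighted_sum_numpy_scalar(sample_score, sample_weight, normalize))


if __name__ == "__main__":
    per_sample = [1, 0, 1, 1]  # e.g. 1 where y_true == y_pred, else 0
    numpy_result = weighted_sum_numpy_scalar(per_sample, normalize=True)
    python_result = weighted_sum_python_scalar(per_sample, normalize=True)
    # The values are equal; only the type (and therefore the repr) differs.
    print(type(numpy_result), repr(numpy_result))
    print(type(python_result), repr(python_result))
```

Note that `np.float64` subclasses the builtin `float`, so `isinstance(result, float)` is true for both variants; the user-visible difference is mostly the repr, which under NEP 51 (NumPy 2) includes the type name for NumPy scalars.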
