
Inconsistency in zero_division handling between precision/recall/f1 and precision_recall_curve/roc_curve related metrics #27047

@qmarcou

Description


Describe the workflow you want to enable

For precision_score, recall_score, and f1_score, the API lets the user set the behavior when a zero division occurs (for instance when no positive label is present in the dataset) via the keyword argument:

zero_division : {"warn", 0.0, 1.0, np.nan}, default="warn"

Sets the value to return when there is a zero division.

Notes:
- If set to "warn", this acts like 0, but a warning is also raised.
- If set to np.nan, such values will be excluded from the average.

New in version 1.3: np.nan option was added.
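
For reference, a minimal example of how this keyword already behaves in precision_score and recall_score when the denominator is zero (requires scikit-learn >= 1.3 for the np.nan option):

import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.zeros(4, dtype=int)  # no positive labels at all
y_pred = np.zeros(4, dtype=int)  # no positive predictions either

# Both denominators are zero, so the zero_division value is returned as-is:
precision_score(y_true, y_pred, zero_division=np.nan)  # nan, no warning
recall_score(y_true, y_pred, zero_division=1.0)        # 1.0, no warning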

precision_recall_curve, roc_curve, roc_auc_score, average_precision_score and label_ranking_average_precision_score, despite having to compute precision or recall under the hood, do not offer the same possibility.
While this is unlikely to pose a problem in the micro-averaging setting, it becomes more likely in the sample-averaging (AP and LRAP, possibly roc_auc_score) or macro-averaging setting.

For instance, precision_recall_curve does not use the precision_score or recall_score functions:

ps = tps + fps
# Initialize the result array with zeros to make sure that precision[ps == 0]
# does not contain uninitialized values.
precision = np.zeros_like(tps)
np.divide(tps, ps, out=precision, where=(ps != 0))
# When no positive label in y_true, recall is set to 1 for all thresholds
# tps[-1] == 0 <=> y_true == all negative labels
if tps[-1] == 0:
    warnings.warn(
        "No positive class found in y_true, "
        "recall is set to one for all thresholds."
    )
    recall = np.ones_like(tps)
else:
    recall = tps / tps[-1]

In this implementation, when there is no positive example, recall is hard-coded to 1 and precision evaluates to 0 for all thresholds.
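
The consequence is easy to see side by side: recall_score honours zero_division while precision_recall_curve always falls back to recall = 1 (small illustration, outputs derived from the snippet above):

import numpy as np
from sklearn.metrics import recall_score, precision_recall_curve

y_true = np.zeros(4, dtype=int)           # all-negative ground truth
y_score = np.array([0.1, 0.4, 0.6, 0.8])

# recall_score lets the caller choose the zero-division value...
recall_score(y_true, (y_score > 0.5).astype(int), zero_division=0.0)  # 0.0
# ...while precision_recall_curve hard-codes recall = 1 for every threshold
# (and emits the UserWarning quoted above):
_, recall, _ = precision_recall_curve(y_true, y_score, pos_label=1)
recall  # array([1., 1., 1., 1., 0.])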

Describe your proposed solution

Use the existing precision_score and recall_score functions in all precision-recall and ROC curve related functions, add the same zero_division keyword argument, and forward it to precision_score and recall_score.
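
As a rough sketch of the intent (the zero_division keyword on precision_recall_curve and the _zero_division_value helper below are hypothetical, not current scikit-learn API), the degenerate branch quoted above could honour the caller's choice instead of hard-coding recall = 1:

import warnings

def _zero_division_value(zero_division):
    # Hypothetical helper: map the keyword to the value used when the
    # denominator is zero ("warn" behaves like 0.0 plus a warning).
    if zero_division == "warn":
        warnings.warn("No positive class found in y_true, recall is ill-defined.")
        return 0.0
    return float(zero_division)  # 0.0, 1.0 or np.nan

# Sketch of the corresponding branch inside precision_recall_curve:
# if tps[-1] == 0:
#     recall = np.full(tps.shape, _zero_division_value(zero_division))
# else:
#     recall = tps / tps[-1]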

Describe alternatives you've considered, if relevant

No response

Additional context

I think this discussion is linked to:

Also, to add more context (though changing this could break some existing code): roc_curve and precision_recall_curve do not handle this problem consistently:

import numpy as np
import sklearn.metrics as skmet

y_true = np.zeros(10)
y_pred = np.random.uniform(size=y_true.shape)

skmet.roc_curve(y_true,y_pred,pos_label=1)
# UndefinedMetricWarning: No positive samples in y_true, true positive value should be meaningless
#  warnings.warn(
# Out: 
# (array([0. , 0.1, 1. ]),
# array([nan, nan, nan]),  # Recall or True positive Rate
# array([1.82341255, 0.82341255, 0.0795866 ]))

skmet.precision_recall_curve(y_true,y_pred,pos_label=1)
#UserWarning: No positive class found in y_true, recall is set to one for all thresholds.
#  warnings.warn(
#Out: 
#(array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]),  # Precision
# array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0.]),  # Recall or True positive Rate
# array([0.0795866 , 0.3813231 , 0.41105316, 0.56378517, 0.56951648,
#        0.60346455, 0.61754398, 0.61861517, 0.70285933, 0.82341255]))

# Compute AUC for the PR curve
prec, recall, thresh = skmet.precision_recall_curve(y_true, y_pred, pos_label=1)
# UserWarning: No positive class found in y_true, recall is set to one for all thresholds.
skmet.auc(recall, prec)
# Out: 0.5

This translates into an undefined (NaN) AUC for the ROC curve and a strange 0.5 value (neither 0 nor 1) for the AUC of the PR curve. I also find the warning confusing: it states that recall is set to 1 for all thresholds, yet the last value of the recall vector is 0.
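
For completeness, roc_auc_score treats the same degenerate input differently again: instead of returning NaN or a fallback value, it raises outright (behavior of current scikit-learn releases, continuing the snippet above):

skmet.roc_auc_score(y_true, y_pred)
# ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.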

On the same theme, I think ndcg_score is also somewhat inconsistent (in the binary relevance case without positive examples the ideal DCG is 0.0, hence a zero division somewhere):

y_true = np.zeros((10, 5))
y_pred = np.random.uniform(size=y_true.shape)

skmet.ndcg_score(y_true, y_pred)
# Out: 0.0

If one wanted to perform sample averaging of these three different metrics (though for the PR curve, AP and LRAP are the way to go for sample averaging), they would all behave differently upon encountering a sample with zero positive examples.
