
Fixes #16065 Ignore weights equal zero in precision recall curve #16319


Conversation

alonsosilvaallende (Contributor) opened this pull request:

Fixes #16065.

What does this implement/fix? Explain your changes.

It ignores weights equal to zero when the precision_recall_curve function is called.

Any other comments?

I'm not sure if there is a more elegant solution.

indexes_zeros = sample_weight.index(0)
del y_true[indexes_zeros]
del probas_pred[indexes_zeros]
del sample_weight[indexes_zeros]
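
For context, the behavior requested in #16065 can be sketched as a small test. The data values here are illustrative, and the assertions express the equivalence the fix aims to guarantee:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1])
probas_pred = np.array([0.1, 0.4, 0.35, 0.8])
sample_weight = np.array([1.0, 0.0, 1.0, 1.0])  # one zero-weighted sample

# Passing a zero weight should be equivalent to dropping that sample.
p_w, r_w, t_w = precision_recall_curve(y_true, probas_pred,
                                       sample_weight=sample_weight)
mask = sample_weight > 0
p_d, r_d, t_d = precision_recall_curve(y_true[mask], probas_pred[mask],
                                       sample_weight=sample_weight[mask])
assert np.allclose(p_w, p_d)
assert np.allclose(r_w, r_d)
assert np.allclose(t_w, t_d)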
@ogrisel (Member) commented Jan 30, 2020:

All those arguments are expected to be numpy arrays rather than lists, so we should instead write numpy code such as the following:

if sample_weight is not None:
    if not np.all(sample_weight >= 0):
        raise ValueError("negative values in sample_weight are invalid")

    # Filter out zero-weighted samples so they cannot affect the curve.
    nonzero_mask = sample_weight > 0
    y_true = y_true[nonzero_mask]
    probas_pred = probas_pred[nonzero_mask]
    sample_weight = sample_weight[nonzero_mask]

But this should only be done after we call check_array, column_or_1d, or similar validation functions on all the arguments. Those checks are centralized in the _binary_clf_curve function itself, so the zero-weight filtering code should probably be moved into the body of that function.
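
For reference, the validation that comment refers to sits at the top of _binary_clf_curve, roughly as sketched below (abridged; helper imports shown for completeness). The zero-weight filtering would slot in right after it:

from sklearn.utils.validation import check_consistent_length, column_or_1d

def _binary_clf_curve(y_true, y_score, pos_label=None, sample_weight=None):
    # Centralized input validation; everything below can assume 1D arrays.
    check_consistent_length(y_true, y_score, sample_weight)
    y_true = column_or_1d(y_true)
    y_score = column_or_1d(y_score)
    if sample_weight is not None:
        sample_weight = column_or_1d(sample_weight)
        # ... the non-negativity check and zero-weight filtering from the
        # snippet above would go here ...
    ...  # rest of the curve computation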

alonsosilvaallende (Contributor, Author) replied:

Thank you very much, Olivier. I have added a check_negative option to _check_sample_weight, which partially addresses #15531. I have also added some lines to check that the weights are not zero.

@ogrisel (Member) commented Jan 30, 2020:

We might want to move the check for negative sample weights into _check_sample_weight and use that function here instead.

@cmarmo (Contributor) commented Mar 3, 2020:

Hi @alonsosilvaallende could you please check the lint errors? Thanks!

@ogrisel (Member) left a comment:

More suggestions:

@@ -1172,7 +1172,7 @@ def _check_psd_eigenvalues(lambdas, enable_warnings=False):
     return lambdas
 
 
-def _check_sample_weight(sample_weight, X, dtype=None):
+def _check_sample_weight(sample_weight, X, dtype=None, check_negative=False):

I find the naming check_negative confusing. What about:

Suggested change
-def _check_sample_weight(sample_weight, X, dtype=None, check_negative=False):
+def _check_sample_weight(sample_weight, X, dtype=None, check_nonnegative=False):
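
With that rename, the helper could look roughly as follows. This is only a sketch: the existing body of _check_sample_weight is elided, and the error message is illustrative:

import numpy as np

def _check_sample_weight(sample_weight, X, dtype=None, check_nonnegative=False):
    # ... existing logic: broadcast None or scalar weights to a full array
    # and validate shape and dtype, yielding a 1D float ndarray ...
    if check_nonnegative and np.any(sample_weight < 0):
        raise ValueError("sample_weight cannot contain negative values")
    return sample_weight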

@@ -544,6 +545,13 @@ def _binary_clf_curve(y_true, y_score, pos_label=None, sample_weight=None):
     if sample_weight is not None:
         sample_weight = column_or_1d(sample_weight)
 
+        # Check to make sure sample_weight is strictly positive

Suggested change
-        # Check to make sure sample_weight is strictly positive
+        # Check that sample_weight values are non-negative and filter out
+        # zero-weighted samples, as they should not impact the score value
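
Putting the two suggestions together, the relevant block of _binary_clf_curve might end up reading as follows (a sketch assuming the check_nonnegative flag discussed above):

if sample_weight is not None:
    # Validate the weights and reject negative values via the shared helper.
    sample_weight = _check_sample_weight(sample_weight, y_true,
                                         check_nonnegative=True)
    # Filter out zero-weighted samples, as they should not impact the
    # score value.
    nonzero_mask = sample_weight != 0
    y_true = y_true[nonzero_mask]
    y_score = y_score[nonzero_mask]
    sample_weight = sample_weight[nonzero_mask]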

@albertvillanova (Contributor) commented Sep 2, 2020:

I am taking over this PR.

Should I merge or rebase onto master?

@cmarmo (Contributor) commented Sep 2, 2020:

@albertvillanova you can find some documentation here on how to take over stalled pull requests. I hope this helps!

@cmarmo cmarmo removed the help wanted label Sep 2, 2020
@albertvillanova (Contributor) commented Sep 2, 2020:

Thanks @cmarmo. I have already pulled it. My question was: once pulled, should I merge or rebase onto master? I will do whatever makes it easier for you to manage my eventual PR.

@thomasjpfan (Member) replied:

My question was, once pulled, should I merge or rebase onto master.

We prefer to merge.
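
For anyone following along, the merge-based sync looks roughly like this, assuming the main scikit-learn repository is configured as the upstream remote:

git fetch upstream
git merge upstream/master

Merging keeps the existing branch history intact, which avoids the kind of conflicts that rewriting it with rebase can produce on a long-lived pull request.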

@albertvillanova (Contributor) replied:

Thank you, @thomasjpfan. Indeed, rebasing was generating conflicts, while merging worked without problems.

@cmarmo cmarmo added the Superseded PR has been replace by a newer PR label Sep 2, 2020
@albertvillanova (Contributor) commented:

Take

Base automatically changed from master to main January 22, 2021 10:52
Successfully merging this pull request may close the following issue:

#16065 [Feature Request/Improvement] Dealing with weight=0 in precision_recall_curve