ENH fast path for binary confusion matrix #15403
Conversation
@@ -879,12 +879,6 @@ def test_confusion_matrix_dtype():
    assert cm[0, 0] == 4294967295
    assert cm[1, 1] == 8589934590

-    # np.iinfo(np.int64).max should cause an overflow
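The removed assertion depended on overflow behavior, which differs between counting strategies. A minimal sketch of the underlying issue in plain NumPy (illustrative only, not the PR's code): accumulating in a fixed-width integer wraps around, while accumulating through float64 (as `np.bincount` does with weights) rounds instead.

```python
import numpy as np

big = np.iinfo(np.int64).max  # 9223372036854775807

# Path 1: accumulate in int64; overflow wraps (two's complement).
a = np.array([big, 1], dtype=np.int64)
wrapped = a.sum(dtype=np.int64)  # wraps past the maximum to a negative value

# Path 2: accumulate via float64; big cannot be represented exactly,
# so the result is rounded rather than wrapped.
w = np.array([big, 1], dtype=np.float64)
rounded = w.sum()  # a large positive float, not the wrapped value

print(wrapped, rounded)
```

Two implementations that are identical for in-range inputs therefore disagree once the counts overflow, so asserting a specific overflowed value is brittle.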
There might be a neater solution than removing this, but it turns out that different implementations give different results in the case of overflow.
Should the test instead check for the binary type and the exact selection conditions used in the optimized code, and only make these assertions when the fast path is excluded?
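One way such a guard could look (a sketch only; `takes_binary_fast_path` and its condition are illustrative, not the actual selection logic in this PR):

```python
import numpy as np

def takes_binary_fast_path(y_true, y_pred):
    # Illustrative guard: assume the optimized path applies only to
    # binary 0/1 labels; the overflow assertions would be skipped then.
    labels = np.union1d(np.unique(y_true), np.unique(y_pred))
    return labels.size <= 2 and np.isin(labels, [0, 1]).all()
```

The test could then run the overflow assertions only when this returns False.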
Thank you so much for the PR. One question: what about handling of the normalization case? I do not see anything that deals with it. Update: never mind. I must have been looking at another function or version; there is currently no normalize argument for the confusion matrix. One could theoretically be added, since it is easy enough to divide by np.sum.
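For reference, normalizing an already-computed confusion matrix by hand is a one-liner either way (plain NumPy; the matrix values below are made up for illustration):

```python
import numpy as np

cm = np.array([[5, 1],
               [2, 4]])

# Normalize over all entries (each cell is a fraction of all samples):
cm_all = cm / cm.sum()

# Normalize per true class (each row sums to 1):
cm_true = cm / cm.sum(axis=1, keepdims=True)
```

So the fast path itself does not need to know about normalization; it can stay a post-processing step.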
Looks like CI is unhappy anyway.
Perhaps instead of
Nice job, it works now; only the doc system has some sort of issue. I would consider putting
Please feel free to run some benchmarks, @GregoryMorse
Would using timeit between the old and new versions be sufficient?
Yes. Maybe check a few different affected cases.
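A timeit comparison along those lines might look like this. This is a sketch: `binary_fast_path` is an illustrative bincount-based counter standing in for the optimized path, and a real benchmark would time the old and new `confusion_matrix` side by side over the affected cases (binary labels, with and without sample weights, varying sizes).

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100_000)
y_pred = rng.integers(0, 2, size=100_000)

def binary_fast_path(y_true, y_pred):
    # Count the four cells TN, FP, FN, TP in a single bincount pass.
    return np.bincount(2 * y_true + y_pred, minlength=4).reshape(2, 2)

elapsed = timeit.timeit(lambda: binary_fast_path(y_true, y_pred), number=100)
print(f"100 calls: {elapsed:.3f}s")
```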
sample_weight = np.asarray(sample_weight)
check_consistent_length(y_true, y_pred, sample_weight)
Suggested change:
sample_weight = np.asarray(sample_weight)
check_consistent_length(y_true, y_pred, sample_weight)
sample_weight = _check_sample_weight(sample_weight, X)
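`_check_sample_weight` (a private helper in `sklearn.utils.validation`) replaces the asarray-plus-length-check pair: it defaults to uniform weights when `sample_weight` is None and validates the shape in one step. A rough stand-in for what it does (simplified sketch, not the actual implementation):

```python
import numpy as np

def check_sample_weight_sketch(sample_weight, y, dtype=np.float64):
    # Simplified stand-in for sklearn.utils.validation._check_sample_weight.
    if sample_weight is None:
        # Default: every sample gets unit weight.
        return np.ones(len(y), dtype=dtype)
    sample_weight = np.asarray(sample_weight, dtype=dtype)
    if sample_weight.shape != (len(y),):
        raise ValueError("sample_weight has an incompatible shape")
    return sample_weight
```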
This PR needs a simple benchmark (as a comment/post here in GitHub), e.g. with
Some profiling done in #28578 showed that it's actually all the checks that dominate in
A patch for @GregoryMorse to benchmark.
Fixes #15388.