PERF speed up confusion matrix calculation #26820

adrinjalali · 2023-07-11T19:58:40Z

Speed up confusion_matrix calculation.

The speedup comes from computing the unique values of y_true and y_pred only once and passing them down to the inner functions through new arguments.

I'm not sure about the names of the added arguments on the public functions, and we might need to refactor to hide them.

The same change is still needed in some other functions to make the tests pass, but for now this works and brings the runtime of the script below from about 15s to 5s on my machine.

import numpy as np

from sklearn.metrics import classification_report

y_true = np.random.randint(0, 2, size=2**23)
y_pred = y_true.copy()
np.random.shuffle(y_pred[2**20 : 2**21])

print(
    classification_report(
        y_true=y_true,
        y_pred=y_pred,
        digits=10,
        output_dict=False,
        zero_division=0,
    )
)
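The idea described above (compute the unique values once, then pass them to the inner helpers) can be sketched roughly as follows. This is only an illustration, not the code in this PR: `union_of_labels` is a hypothetical helper, and only the argument names `unique_y_true` and `unique_y_pred` are taken from the diff.

```python
import numpy as np


def union_of_labels(y_true, y_pred, unique_y_true=None, unique_y_pred=None):
    """Hypothetical inner helper: reuse precomputed uniques when they are given."""
    if unique_y_true is None:
        unique_y_true = np.unique(y_true)  # sort-based, costly on large arrays
    if unique_y_pred is None:
        unique_y_pred = np.unique(y_pred)
    return np.union1d(unique_y_true, unique_y_pred)


rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=2**16)
y_pred = y_true.copy()

# Compute the unique values once at the top level ...
unique_y_true = np.unique(y_true)
unique_y_pred = np.unique(y_pred)

# ... and hand them to every helper that would otherwise recompute them.
labels = union_of_labels(y_true, y_pred, unique_y_true, unique_y_pred)
```

Each helper still works without the extra arguments, so existing callers are unaffected; the cost is two more keyword arguments on every function along the call chain.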

github-actions · 2023-07-11T20:00:41Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 75929c9.

fingoldo · 2023-07-25T13:01:43Z

Hi, what is required to merge this change?

adrinjalali · 2023-07-25T13:11:05Z

reviews, our bottleneck is usually reviewer time ;)

thomasjpfan

In principle, I like removing the unique calls. On the other hand, I do not like adding more to the public API. I'm overall +0.5.

thomasjpfan · 2023-07-25T15:18:36Z

sklearn/metrics/_classification.py

+    unique_y_true=None,
+    unique_y_pred=None,


Having labels, unique_y_true and unique_y_pred makes the public API look really bloated. The only alternative I see is to have a private function.

glemaitre · 2023-11-03 (edited)

I agree with @thomasjpfan. I see only 2 solutions:

1. Make a private function, with the public one being a thin wrapper around it.

2. Implement our own efficient unique for NumPy arrays, but that could be rather complex.

(A third solution is to contribute upstream :))
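A minimal sketch of the first option, assuming a private worker that receives the already computed labels and a public wrapper whose signature stays unchanged. The names `_confusion_matrix` and `confusion_matrix_sketch` are hypothetical and are not scikit-learn's actual implementation.

```python
import numpy as np


def _confusion_matrix(y_true, y_pred, labels):
    """Private worker: trusts that `labels` is sorted and covers both inputs."""
    n_labels = labels.size
    # Map each value to its position in `labels` (valid because labels is sorted).
    true_idx = np.searchsorted(labels, y_true)
    pred_idx = np.searchsorted(labels, y_pred)
    # Count (true, pred) pairs through a flattened 2D index.
    flat = true_idx * n_labels + pred_idx
    counts = np.bincount(flat, minlength=n_labels * n_labels)
    return counts.reshape(n_labels, n_labels)


def confusion_matrix_sketch(y_true, y_pred):
    """Public wrapper: computes the labels once, then delegates."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    labels = np.union1d(y_true, y_pred)
    return _confusion_matrix(y_true, y_pred, labels)


cm = confusion_matrix_sketch([0, 1, 1, 0], [0, 1, 0, 0])
```

Internal callers that already know the labels can call the private worker directly, while the public signature gains no new parameters.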


PERF speed up confusion matrix calculation

75929c9

github-actions bot added module:metrics module:utils labels Jul 11, 2023

adrinjalali mentioned this pull request Jul 11, 2023

Speed up classification_report #26808

Closed

thomasjpfan reviewed Jul 25, 2023


thomasjpfan mentioned this pull request Aug 15, 2023

ENH add pos_label to confusion_matrix #26839

Open

adrinjalali mentioned this pull request Dec 12, 2023

Attempt to speed up unique value discovery in _BaseEncoder for polars and pandas series #27911

Open

jeremiedbb mentioned this pull request Mar 7, 2024

ENH Add fast path for binary confusion matrix #28578

Closed

lucyleeow mentioned this pull request Mar 7, 2024

metrics.confusion_matrix far too slow for Boolean cases #15388

Closed

adrinjalali mentioned this pull request Aug 29, 2024

PERF speedup classification_report by attaching unique values to dtype.metadata #29738

Merged

adrinjalali closed this in #29738 Sep 5, 2024
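The title of #29738 mentions attaching unique values to dtype.metadata. A rough sketch of how that can work with plain NumPy is shown below; the function names are hypothetical and the merged code may differ. It assumes the tagged array is passed along unchanged, since many NumPy operations do not preserve dtype metadata.

```python
import numpy as np


def attach_unique_sketch(y):
    """Return a view of `y` whose dtype carries its unique values as metadata."""
    unique = np.unique(y)
    dtype_with_unique = np.dtype(y.dtype, metadata={"unique": unique})
    # A view shares memory with `y`; only the dtype object differs.
    return y.view(dtype=dtype_with_unique)


def cached_unique_sketch(y):
    """Return the cached unique values if present, otherwise compute them."""
    metadata = y.dtype.metadata
    if metadata is not None and "unique" in metadata:
        return metadata["unique"]
    return np.unique(y)


y = np.array([1, 0, 1, 1, 0])
y_tagged = attach_unique_sketch(y)
uniques = cached_unique_sketch(y_tagged)
```

With this approach no function signature changes: the cached values travel with the array itself.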

glemaitre commented Nov 3, 2023 (edited)