-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
PERF speed up confusion matrix calculation #26820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi, what is required to merge this change? |
reviews, our bottleneck is usually reviewer time ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, I like removing the unique
calls. On the other hand, I do not like adding more to the public API. I'm overall +0.5.
unique_y_true=None, | ||
unique_y_pred=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
labels
, unique_y_true
and unique_y_pred
makes the public API looks really bloated. The only alternative I see is to have a private function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @thomasjpfan. I see only 2 solutions:
- Make some private function where the public one is a wrapper around
- Implementing our own efficient
unique
for NumPy arrays but it could be rather complex
(third solution is to contribute upstream :))
Fixes #26808
Speed up
confusion_matrix
calculation.This achieves the speedup by calculating the unique values on
y_true
andy_pred
only once, and adds the required args along the way to the inner methods.Not sure about the names of the added args on public functions, and might need to refactor to hide them if necessary.
Also this needs to add the same for some other functions to make the tests pass, but for now this works, and brings the time from about 15s to 5s on my machine.