adjusted_mutual_info_score takes a long time with lists containing many unique values #24254
Comments
Thank you for opening the issue. One would need to profile the implementation to see what the slow parts are to move forward on this issue.
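A minimal profiling sketch, using only the standard library's cProfile and pstats modules. The helper name profile_call is hypothetical and not part of scikit-learn. Note that cProfile only times Python-level calls, so time spent inside compiled code is attributed to the call that entered it rather than broken down further.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile (hypothetical helper).

    Returns the function's result together with a text report of the
    `top` entries, sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        # Stop profiling even if func raises.
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

For example, passing sklearn.metrics.adjusted_mutual_info_score and two lists of a few thousand unique labels to profile_call should show which call dominates the cumulative time.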
Performed a high-level analysis and found that, in a sample of 6000 records, 99% of the time is taken by
Almost the entire performance degradation comes from this line of code. For 6000 records,
So, to me, since the time complexity of the above line of code is O(N^2), we can't actually expect a linear increase in runtime.
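To illustrate how a quadratic cost can arise, here is a minimal pure-Python sketch of the expected mutual information (EMI) term under the hypergeometric model (Vinh et al., 2010), which the adjusted score subtracts from the raw mutual information. That this term is the line referred to above is an assumption, since the profiled line is not quoted here. The sketch is not scikit-learn's implementation, and the function name expected_mutual_info_sketch is hypothetical. Its outer loops visit every (true cluster, predicted cluster) pair. With R and C distinct labels, that is R × C iterations, or n² when both lists of length n consist entirely of unique values.

```python
import math
from collections import Counter


def expected_mutual_info_sketch(labels_true, labels_pred):
    """Return (EMI in nats, number of cluster pairs visited).

    Hypothetical, unoptimized sketch of the expected mutual information
    under the hypergeometric model of randomness.
    """
    n = len(labels_true)
    a = list(Counter(labels_true).values())  # size of each true cluster
    b = list(Counter(labels_pred).values())  # size of each predicted cluster
    lg_n = math.lgamma(n + 1)  # log(n!)
    emi = 0.0
    pairs = 0
    # Every (true, predicted) cluster pair is visited: len(a) * len(b)
    # iterations, i.e. n**2 when every label in both lists is unique.
    for ai in a:
        for bj in b:
            pairs += 1
            # log of a_i! b_j! (n - a_i)! (n - b_j)! / n!
            log_const = (math.lgamma(ai + 1) + math.lgamma(bj + 1)
                         + math.lgamma(n - ai + 1) + math.lgamma(n - bj + 1)
                         - lg_n)
            # Range of possible overlaps n_ij between the two clusters.
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                # Hypergeometric probability of this overlap, in log space.
                log_p = log_const - (math.lgamma(nij + 1)
                                     + math.lgamma(ai - nij + 1)
                                     + math.lgamma(bj - nij + 1)
                                     + math.lgamma(n - ai - bj + nij + 1))
                term = (nij / n) * math.log(n * nij / (ai * bj))
                emi += term * math.exp(log_p)
    return emi, pairs
```

With all-unique labels, doubling the list length quadruples the number of pairs visited. A production implementation can make each iteration much cheaper, but under this formulation it cannot avoid visiting all R × C pairs.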
Describe the bug
The runtime of adjusted_mutual_info_score jumps significantly when the two lists contain large numbers of unique values. At around 6k unique values in total (i.e. two columns of 3k unique values each), the runtime stays around a minute, but as we increase the number of unique values, the runtime shoots up.
Steps/Code to Reproduce
Expected Results
A small, or at most linear, increase in runtime as the number of unique values increases.
Actual Results
A large increase in runtime:
2 rows of 6k unique values: 598s
2 rows of 8k unique values: 889s
2 rows of 10k unique values: 1118s
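As a quick arithmetic check on the timings above, the sketch below compares each reported runtime with the 6k baseline and with what strictly linear or strictly quadratic growth in the number of unique values would predict. Only the reported numbers are used. The helper name scaling_ratios is hypothetical.

```python
def scaling_ratios(reported, base):
    """Map each size to (observed, linear, quadratic) growth factors vs. base.

    `reported` maps thousands of unique values to runtime in seconds.
    """
    base_seconds = reported[base]
    ratios = {}
    for k, seconds in reported.items():
        size_ratio = k / base
        ratios[k] = (seconds / base_seconds, size_ratio, size_ratio ** 2)
    return ratios


# Runtimes (seconds) as reported in this issue.
reported = {6: 598, 8: 889, 10: 1118}
for k, (obs, lin, quad) in scaling_ratios(reported, base=6).items():
    print(f"{k}k: observed x{obs:.2f}, linear x{lin:.2f}, quadratic x{quad:.2f}")
```

For these three measurements, the observed factors (about 1.49 at 8k and 1.87 at 10k) fall between the linear and quadratic predictions.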
Versions