-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Error in d2_log_loss_score multiclass when one of the classes is missing in y_true. #30903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR @vmargonis
983caf3
to
6d4e09e
Compare
@OmarManzoor I see the test |
I think this is an issue of the numpy versions. I think we may have numpy > 2.0 whereas the failing pipelines are using numpy 1.21.5 and numpy 1.19.5. I think we might need a solution that works with the earlier versions of numpy considering that np.equal.outer doesn't really work there. |
6d4e09e
to
4b8ad82
Compare
I rolled back to my original implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. Can you please add a changelog entry as a fix. You can check for examples of any open PRs which contain a changelog on how and where to place the entry.
4b8ad82
to
51e65de
Compare
51e65de
to
d190c11
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you @vmargonis
@lorentzenchr Could you please have a look? |
d190c11
to
598301d
Compare
Hello, any updates on this? @lorentzenchr @OmarManzoor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @vmargonis.
I simplified a bit the tests and splitted them to have proper unit tests. I also updated one test to better test that we actually compute the right thing.
…missing in y_true. (scikit-learn#30903) Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
task list:
Reference Issues/PRs
Fixes #30713
What does this implement/fix? Explain your changes.
This implementation fixes an exception that is being thrown in
d2_log_loss_score
function, when the user passes alabels
variable (hence labels are not inferred) and the vectory_true
does not contain one or more labels.An example that reproduces the error:
Breakdown:
d2_log_loss_score
, two log losses are calculated, the log loss of the fitted model akanumerator
and the log loss of the null model akadenominator
.log_loss
function in which thelabels
variable is passed down to.numerator
,y_pred
is equal to they_pred
passed by the user.y_pred
has already the "correct" dimension (n, k=3) with respect to the number oflabels
-- 3 labels given, 3 columns are present iny_pred
:denominator
, wherey_true
is supposed to be transformed into a 2D array of dimension (n, k):y_value_indices = [0, 1, 1]
because the unique elements iny_true
are 2:[0,1]
counts = [1, 2]
because there are two bins iny_value_indices
. At this pointcounts
should be[1,2,0]
butlabels
are not taken into consideration. This is also why the error does not appear when all labels are present iny_true
.y_prob
has the same size ascounts
, i.e. 2y_pred_null
is thefore an array of (n, 2) instead of (n, 3)log_loss
becauselabels
has size 3 andy_pred
has 2 columnsChange explanation:
labels
array (if passed) for the computation ofcounts
Any other comments?