Error in d2_log_loss_score multiclass when one of the classes is missing in y_true. #30903

Merged (5 commits, Apr 15, 2025)

Conversation

@vmargonis (Contributor) commented Feb 25, 2025

task list:

  • implement fix
  • add appropriate tests (regression)
  • successful tests run / docs etc

Reference Issues/PRs

Fixes #30713

What does this implement/fix? Explain your changes.

This implementation fixes an exception thrown by the d2_log_loss_score function when the user passes a labels argument (so labels are not inferred) and the vector y_true does not contain one or more of those labels.

An example that reproduces the error:

from sklearn.metrics import d2_log_loss_score

y_true = [0, 1, 1]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
labels = [0, 1, 2]

d2_log_loss_score(y_true, y_pred, labels=labels)

Breakdown:

  • Inside d2_log_loss_score, two log losses are calculated: the log loss of the fitted model (the numerator) and the log loss of the null model (the denominator).
  • Both losses are calculated with the log_loss function, to which the labels variable is passed down.
  • For numerator, y_pred is the y_pred passed by the user. It already has the "correct" shape (n, k=3) with respect to the number of labels -- 3 labels given, 3 columns present in y_pred:
    # log loss of the fitted model
    numerator = log_loss(
        y_true=y_true,
        y_pred=y_pred,
        normalize=False,
        sample_weight=sample_weight,
        labels=labels,
    )
  • The issue arises in the computation of denominator, where y_true is supposed to be transformed into a 2D array of shape (n, k):
    _, y_value_indices = np.unique(y_true, return_inverse=True)
    counts = np.bincount(y_value_indices, weights=weights)
    y_prob = counts / weights.sum()
    y_pred_null = np.tile(y_prob, (len(y_true), 1))

    # log loss of the null model
    denominator = log_loss(
        y_true=y_true,
        y_pred=y_pred_null,
        normalize=False,
        sample_weight=sample_weight,
        labels=labels,
    )
  • y_value_indices = [0, 1, 1], because y_true contains only 2 unique elements: [0, 1]
  • counts = [1, 2], because there are two bins in y_value_indices. At this point counts should be [1, 2, 0], but labels is not taken into consideration. This is also why the error does not appear when all labels are present in y_true.
  • y_prob has the same size as counts, i.e. 2
  • y_pred_null is therefore an array of shape (n, 2) instead of (n, 3)
  • an exception is thrown inside log_loss, because labels has size 3 while y_pred_null has only 2 columns
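The shape mismatch described in the breakdown above can be reproduced directly with numpy, using the values from the reproduction example at the top:

```python
import numpy as np

y_true = np.asarray([0, 1, 1])
weights = np.ones(len(y_true))

# np.unique only sees the classes actually present in y_true,
# so label 2 never gets a bin
_, y_value_indices = np.unique(y_true, return_inverse=True)
counts = np.bincount(y_value_indices, weights=weights)

print(counts)             # [1. 2.] -- two bins instead of three
y_prob = counts / weights.sum()
y_pred_null = np.tile(y_prob, (len(y_true), 1))
print(y_pred_null.shape)  # (3, 2) instead of (3, 3)
```

log_loss then rejects y_pred_null because its column count (2) does not match the size of labels (3).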

Change explanation:

  • Take into account the labels array (if passed) for the computation of counts
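A minimal sketch of that idea (not necessarily the exact code that was merged): index y_true against the full label set rather than np.unique(y_true), and pass minlength to np.bincount so that classes absent from y_true keep a zero-count bin. The snippet assumes labels is sorted, which np.searchsorted requires.

```python
import numpy as np

y_true = np.asarray([0, 1, 1])
labels = np.asarray([0, 1, 2])
weights = np.ones(len(y_true))

# Map each y_true entry to its position in labels; missing classes
# still get a bin because minlength covers the full label set.
label_indices = np.searchsorted(labels, y_true)  # assumes labels is sorted
counts = np.bincount(label_indices, weights=weights, minlength=len(labels))

y_prob = counts / weights.sum()                  # [1/3, 2/3, 0]
y_pred_null = np.tile(y_prob, (len(y_true), 1))  # shape (3, 3), matches labels
```

With counts = [1, 2, 0], y_pred_null has one column per label and log_loss accepts it.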

Any other comments?

github-actions bot commented Feb 25, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: db7ed2e.

@OmarManzoor (Contributor) left a comment

Thank you for the PR @vmargonis

@vmargonis vmargonis force-pushed the bugfix/issue30713 branch 2 times, most recently from 983caf3 to 6d4e09e Compare February 28, 2025 19:51
@vmargonis vmargonis marked this pull request as ready for review February 28, 2025 19:53
@vmargonis (Contributor, Author)

@OmarManzoor I see the test test_d2_log_loss_score failing in the ci but it passes normally when I run in my machine. Any idea why?

@OmarManzoor (Contributor)

> @OmarManzoor I see the test test_d2_log_loss_score failing in the ci but it passes normally when I run in my machine. Any idea why?

I think this is an issue with the numpy versions. We may have numpy > 2.0 locally, whereas the failing pipelines are using numpy 1.21.5 and numpy 1.19.5. We might need a solution that works with the earlier versions of numpy, considering that np.equal.outer doesn't really work there.

@vmargonis vmargonis force-pushed the bugfix/issue30713 branch from 6d4e09e to 4b8ad82 Compare March 1, 2025 18:00
@vmargonis (Contributor, Author)

> I think this is an issue of the numpy versions. I think we may have numpy > 2.0 whereas the failing pipelines are using numpy 1.21.5 and numpy 1.19.5. I think we might need a solution that works with the earlier versions of numpy considering that np.equal.outer doesn't really work there.

I rolled back to my original implementation.

@OmarManzoor (Contributor) left a comment

I think this looks good. Could you please add a changelog entry as a fix? You can check open PRs that contain a changelog entry for examples of how and where to place it.

@vmargonis vmargonis force-pushed the bugfix/issue30713 branch from 4b8ad82 to 51e65de Compare March 2, 2025 10:56
@vmargonis vmargonis force-pushed the bugfix/issue30713 branch from 51e65de to d190c11 Compare March 3, 2025 09:10
@OmarManzoor (Contributor) left a comment

LGTM. Thank you @vmargonis

@OmarManzoor (Contributor)

@lorentzenchr Could you please have a look?

@OmarManzoor OmarManzoor added the Waiting for Second Reviewer First reviewer is done, need a second one! label Mar 10, 2025
@vmargonis (Contributor, Author)

Hello, any updates on this? @lorentzenchr @OmarManzoor

@OmarManzoor OmarManzoor added this to the 1.7 milestone Apr 11, 2025
@jeremiedbb (Member) left a comment

LGTM. Thanks @vmargonis.

I simplified the tests a bit and split them to have proper unit tests. I also updated one test to better check that we actually compute the right thing.

@jeremiedbb jeremiedbb merged commit 1ef751b into scikit-learn:main Apr 15, 2025
36 checks passed
@vmargonis vmargonis deleted the bugfix/issue30713 branch April 19, 2025 11:53
lucyleeow pushed a commit to EmilyXinyi/scikit-learn that referenced this pull request Apr 23, 2025
…missing in y_true. (scikit-learn#30903)

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
Labels: module:metrics, Waiting for Second Reviewer (First reviewer is done, need a second one!)