FEA Add DummyClassifier strategy that produces randomized probabilities #31488

Open: cboseak wants to merge 5 commits into base: main

cboseak commented Jun 5, 2025

Reference Issues/PRs

Fixes #31462

What does this implement/fix? Explain your changes.

This PR adds a new strategy to DummyClassifier called "random_proba" that generates randomized probability distributions for classification tasks. This strategy can be used for benchmarking and testing purposes where completely random probabilistic outputs are desirable.
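To make the idea concrete, here is a minimal standard-library sketch of what "randomized probability distributions" could look like. It is not the PR's implementation: the helper name `random_proba_rows` and the sampling scheme (normalizing exponential draws, which is equivalent to a flat Dirichlet distribution) are assumptions chosen for illustration, and the PR may sample differently.

```python
import random


def random_proba_rows(n_samples, n_classes, seed=None):
    """Return n_samples rows, each a random probability vector over n_classes.

    Hypothetical helper, not the PR's code. Each row is built by normalizing
    independent Exponential(1) draws, which samples uniformly from the
    probability simplex (a flat Dirichlet distribution).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        draws = [rng.expovariate(1.0) for _ in range(n_classes)]
        total = sum(draws)
        # Normalize so the entries are non-negative and sum to 1.
        rows.append([d / total for d in draws])
    return rows


proba = random_proba_rows(n_samples=4, n_classes=3, seed=0)
for row in proba:
    print([round(p, 3) for p in row], "sum =", round(sum(row), 6))
```

Each row differs from sample to sample, which is the property a pipeline test would rely on.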

github-actions bot commented Jun 5, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e39ae04. Link to the linter CI: here

tmcclintock left a comment

LGTM. Thanks for implementing this so quickly. I'd have done it myself but I appreciate you being so eager!

I left one non-blocking suggestion that I expect will be ignored, since it does not seem to be the trend in this project.

Great work! 🚀

betatim changed the title from "[31462] DummyClassifier strategy that produces randomized probabilities" to "FEA Add DummyClassifier strategy that produces randomized probabilities" on Jun 6, 2025
Member

betatim left a comment

Looks good to me.

I think we need to do two things in addition to a second review:

  1. do we want this feature?
  2. is there a better name for the strategy? uniform-proba doesn't really tell you what it is if you don't already know the answer. Could we have this behaviour as part of uniform (so no new name needed)? Ideas welcome

Thanks a lot for making the PR so quickly and without waiting for a 👍 / 👎. As is so often the case, the discussion about naming and whether we want to do this can take much longer than the actual implementation work. Patience please :D

betatim added the labels "Needs Decision - Include Feature" (requires decision regarding including feature) and "Quick Review" (for PRs that are quick to review) on Jun 6, 2025
Author

cboseak commented Jun 6, 2025

  1. do we want this feature?

I see it as a 'why not' feature. It's added functionality that doesn't hinder or affect existing functionality. Worst case, it goes unused, but it shouldn't negatively affect anyone. It's a two-way-door decision.

  2. is there a better name for the strategy? uniform-proba doesn't really tell you what it is if you don't already know the answer.

Just let me know what to update it to. I named it what was suggested in the issue, but I have no preference on the name.

@tmcclintock

@betatim thank you for your healthy skepticism :)

  1. Do we need this? Yes, I think so. I have personally seen this functionality implemented at three companies in order to test their ML pipelines, so it would likely be used in many places.
  2. I think uniform-proba is good for two reasons:
    i. proba implies that the strategy applies to the probabilities (because of predict_proba)
    ii. uniform implies randomness and a relation to the uniform strategy, which exists: uniform applies to the predicted labels, while uniform-proba applies to the probabilities
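
For context on the labels-versus-probabilities distinction, the sketch below shows how the existing uniform strategy behaves in released scikit-learn: predict draws labels uniformly at random, while predict_proba returns the same constant vector for every sample. The toy data is made up for illustration; only the existing DummyClassifier API is used, not the strategy proposed here.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy data (made up): DummyClassifier ignores the features and
# only uses the set of labels seen during fit.
X = np.zeros((6, 1))
y = np.array([0, 1, 2, 0, 1, 2])

clf = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)

labels = clf.predict(X)        # labels drawn uniformly at random
proba = clf.predict_proba(X)   # every row is the constant [1/3, 1/3, 1/3]

print(labels)
print(proba)
```

With three classes, the predicted labels vary but the probabilities do not, which appears to be the gap the proposed strategy is meant to fill.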

Co-authored-by: Tom McClintock <thmsmcclintock@gmail.com>

Successfully merging this pull request may close these issues.

Feat: DummyClassifier strategy that produces randomized probabilities
3 participants