Feat: DummyClassifier strategy that produces randomized probabilities #31462

tmcclintock · 2025-06-01T17:27:18Z

Describe the workflow you want to enable

Motivation

The dummy module is fantastic for testing pipelines all the way up through enterprise scales. The strategies offered in the DummyClassifier are excellent for testing corner cases. However, the strategies offered fall short when testing pipelines that include downstream tasks that depend on moments of the predicted probabilities (e.g. gains charts).

This is because the existing strategies do not include sampling random probabilities.

Proposed API:

Consider adding a new strategy with a name like uniform-proba or score-random or something similar that results in this behavior for binary classification:

print(DummyClassifier(strategy="uniform-proba").fit(X, y).predict_proba(X))
"""
[[0.5651713  0.4348287 ]
 [0.36557341 0.63442659]
 [0.42386353 0.57613647]
 ...
 [0.30348692 0.69651308]
 [0.59589879 0.40410121]
 [0.32664176 0.67335824]]
"""

Describe your proposed solution

Proposed implementation

I had something like this in mind:

class DummyClassifier(MultiOutputMixin, ClassifierMixin, BaseEstimator):
    ...

    def predict_proba(self, X):
        ...
        for k in range(self.n_outputs_):
            if self._strategy == "uniform-proba":
                out = rs.dirichlet([1] * n_classes_[k], size=n_samples)
                out = out.astype(np.float64)
            ...

Similar to the "stratified" strategy, this simple implementation relies on numpy.random, in this case the Dirichlet distribution. Setting all the alphas to 1 makes each row of predicted probabilities uniformly distributed over the probability simplex. In contrast, the "stratified" strategy draws one-hot rows from a multinomial over the class prior, which corresponds to the limit of a Dirichlet distribution whose alphas all shrink toward 0 in proportion to the class prior.
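
To make the contrast concrete, here is a minimal standalone sketch of the two sampling schemes. The helper names `uniform_proba` and `one_hot_proba` are hypothetical and are not part of scikit-learn; the sketch uses `np.random.default_rng` rather than the estimator's own random state, and the class prior `[0.7, 0.3]` is an arbitrary illustration.

```python
import numpy as np


def uniform_proba(n_samples, n_classes, seed=None):
    """Draw each row uniformly from the probability simplex (all alphas = 1)."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.ones(n_classes), size=n_samples)


def one_hot_proba(n_samples, class_prior, seed=None):
    """Draw one-hot rows: a single multinomial trial per sample."""
    rng = np.random.default_rng(seed)
    return rng.multinomial(1, class_prior, size=n_samples).astype(np.float64)


proba = uniform_proba(1000, n_classes=2, seed=0)
one_hot = one_hot_proba(1000, class_prior=[0.7, 0.3], seed=0)

print(proba[:3])           # rows vary continuously and each sums to 1
print(np.unique(one_hot))  # only 0.0 and 1.0 appear
```

Both outputs are valid probability rows, but only the Dirichlet draw yields a continuous spread of scores across samples.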

Describe alternatives you've considered, if relevant

No response

Additional context

I am happy to make the PR. The biggest question is what the strategy string should be.

Thank you for reading 🙏.

betatim · 2025-06-03T14:25:54Z

I think this could be useful. The open questions are what to call the strategy and which strategy to use.

tmcclintock · 2025-06-04T02:27:13Z

Thanks, @betatim. Do you recommend I create a PR or wait for more discussion?

betatim · 2025-06-04T07:11:32Z

To be honest, I don't know. If you are ok investing a bit of time to make a PR that would be good, though it could be wasted if people don't like the idea.

@ogrisel do you have an opinion on this or know who we could ask?

tmcclintock · 2025-06-06T03:40:56Z

Looks like the author of #31488 eagerly knocked this out! We just need an approving reviewer.

glevv · 2025-06-08T14:18:51Z

It will always give a ROC AUC score of around 0.5. The most useful application of DummyClassifier is for model selection and comparisons. Wouldn't the uniform-proba strategy be a bit redundant in this case?
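
The ROC AUC claim can be checked with a short sketch. It assumes the scores are drawn from a Dirichlet with all alphas equal to 1 and are independent of the labels; `rank_auc` is a hypothetical helper that computes AUC from ranks (the Mann-Whitney U statistic) and assumes there are no tied scores.

```python
import numpy as np


def rank_auc(y_true, scores):
    """ROC AUC from ranks (Mann-Whitney U); assumes no tied scores."""
    y_true = np.asarray(y_true)
    n = len(scores)
    ranks = np.empty(n)
    ranks[np.argsort(scores)] = np.arange(1, n + 1)  # 1-based ranks
    n_pos = int(y_true.sum())
    n_neg = n - n_pos
    u = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)                # labels unrelated to the scores
scores = rng.dirichlet([1, 1], size=2000)[:, 1]  # random positive-class probability
auc = rank_auc(y, scores)
print(round(auc, 3))                             # close to 0.5, up to sampling noise
```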

tmcclintock · 2025-06-08T15:06:14Z

@glevv good question, please see my original post for an example. Some performance metrics, such as a gains chart, depend on there being high entropy in the predicted probabilities. The "uniform" strategy does not produce high enough entropy to test these.
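
A rough sketch of why this matters for a gains chart: `cumulative_gains` below is a hypothetical helper, not a scikit-learn function, and the constant 0.5 scores stand in for what the "uniform" strategy returns for binary problems, as far as I can tell. With constant scores the ranking is decided purely by tie-breaking, whereas random scores give the chart a real ordering to exercise.

```python
import numpy as np


def cumulative_gains(y_true, scores, n_bins=10):
    """Fraction of positives captured in the top-k bins when ranked by score."""
    order = np.argsort(-np.asarray(scores), kind="stable")  # highest score first
    sorted_y = np.asarray(y_true)[order]
    per_bin = [chunk.sum() for chunk in np.array_split(sorted_y, n_bins)]
    return np.cumsum(per_bin) / sorted_y.sum()


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)

constant_scores = np.full(1000, 0.5)                    # every sample tied
random_scores = rng.dirichlet([1, 1], size=1000)[:, 1]  # continuous spread

gains_constant = cumulative_gains(y, constant_scores)
gains_random = cumulative_gains(y, random_scores)

print(np.unique(constant_scores).size)  # 1: nothing to rank on
print(gains_random)                     # monotone curve ending at 1.0
```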

tmcclintock added the New Feature and Needs Triage labels Jun 1, 2025

betatim removed the Needs Triage label Jun 3, 2025

cboseak linked a pull request Jun 5, 2025 that will close this issue

FEA Add DummyClassifier strategy that produces randomized probabilities #31488

Open

virchan added the Needs Decision - Include Feature label Jun 8, 2025
