Support for regression by classification

My team and I are working on an application of **regression by classification,** a technique described in [this article](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.5374&rep=rep1&type=pdf).

In a nut shell **regression by classification** is approaching a regression problem with multi-class classification algorithms. The key part of this technique is to perform discretization, or binning, of the (continous) target prior to classification. The article mentions 3 different approaches for target discretization which are supported by sklearn's [KBinsDiscretizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer).

1. Equally probable interval (this is the **_quantile_** strategy of KBinsDiscretizer) 
1. Equal width interval (this is the **_uniform_** strategy of KBinsDiscretizer) 
1. K-means clustering (this is the **_kmeans_** strategy of KBinsDiscretizer) 

In **regression by classification**, the choice of the numbers of classes, the **n_bins** parameter, is critical. One straight forward way to tune this parameter and to choose the binning strategy is to use cross-validation. But because _transformations on y_ (see #4143) are currently forbidden in scikit-learn, this is not "natively" supported. 

We found a way around this by creating our own meta-estimator, as suggested by @jnothman elsewhere. But one problem remained. **How can we tell scikit-learn to compute evaluation metrics on BINNED targets, and not the original CONTINOUS targets?** 

We achieved this by hacking the **_PredictScorer** class on our scikit-learn fork. The hack looks for a special custom method called **get_transformed_targets** on our home-brewed meta-estimator. If this method is present, the score is computed using transformed (binned) targets. Here is the hack:

``` py
class _PredictScorer(_BaseScorer):
    def _score(self, method_caller, estimator, X, y_true, sample_weight=None):
        """[... docstring ...]
        """
        #Here starts the hack
        if hasattr(estimator, 'get_transformed_targets'):
            y_true = estimator.get_transformed_targets(X, y_true)
        #Here ends the hack

        y_pred = method_caller(estimator, "predict", X)
        if sample_weight is not None:
            return self._sign * self._score_func(y_true, y_pred,
                                                 sample_weight=sample_weight,
                                                 **self._kwargs)
        else:
            return self._sign * self._score_func(y_true, y_pred,
                                                 **self._kwargs)
```
Another problem we encounter is to use the KBinsDiscretizer class on targets. We plan on doing this with a custom meta-transformer.

It would be nice if the **regression by classification** was supported by scikit-learn out of the box. Perhaps the re-sampling options coming soon will make this possible, but it will have to be tested.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Support for regression by classification #15850

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Support for regression by classification #15850

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions