Support for regression by classification #15850
Why do you want to evaluate on the binned targets? I'd assume that you ultimately want to evaluate on the continuous targets, but might want a classification evaluation as a diagnostic. This doesn't actually fit the resampling case nicely, because resamplers do not modify the predicted data, and indeed I find it quite strange that one could use a resampler to change the prediction space. I don't really see why regression by classification could not be supported by TransformedTargetRegressor.
This is what is done in the referenced article, and you are right, this can be handled by TransformedTargetRegressor. I am not fully comfortable with evaluating in the continuous space, because the inverse of the binning transform is ambiguous due to rounding errors. I would rather switch to classification metrics altogether, especially for small n_bins. My approach to regression by classification would be more accurately described as formulating regression problems as classification problems. It would be nice to be able to "skip" the inverse transformation with TransformedTargetRegressor.
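For context, here is a minimal sketch of the TransformedTargetRegressor approach being discussed (not code from the thread; the classifier choice and synthetic data are illustrative assumptions). check_inverse=False silences the round-trip warning that the lossy binning would otherwise trigger, which is exactly the ambiguity mentioned above:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import KBinsDiscretizer

# Regression by classification: y is binned into ordinal codes, a
# classifier predicts the bin, and inverse_transform maps the predicted
# bin back to its bin center, so predictions stay in the continuous space.
reg_by_clf = TransformedTargetRegressor(
    regressor=RandomForestClassifier(random_state=0),
    transformer=KBinsDiscretizer(n_bins=5, encode="ordinal",
                                 strategy="quantile"),
    check_inverse=False,  # binning is lossy, so the inverse check would warn
)

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
y = X.sum(axis=1)
reg_by_clf.fit(X, y)
y_pred = reg_by_clf.predict(X)  # continuous values (bin centers)
```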
@dabana Since your task is regression (independently of how you do it), you should use regression evaluation metrics. A corner case would be ordinal regression, where I suppose you need evaluation metrics for both domains (classification and regression), but this is not the case here.
Yes, evaluating faithfully to the task (and not necessarily faithfully to how you model it) is a very essential part of predictive learning. If you believe that the task is still meaningfully evaluated in the transformed space, then you should be able to discretise y using task-related rules before applying scikit-learn.

If TransformedTargetRegressor supports regression by classification, evaluating in the original space, then I think that is what we should support, and this issue can be closed (although maybe we could do with an example of this technique in our gallery).
But if it is already supported, why not support evaluation in the transformed space too? It could just be a simple boolean input to TransformedTargetRegressor. The task at hand has identifiable "task-related rules" for creating classes, it is true. But the class definitions are ambiguous (region/regulation dependent, for instance). This is why we opted for regression in the first place. But we are looking back at classification for several reasons.
@jnothman I think you should leave the issue open.
(the above might need some tweaking)
Sorry @jnothman, I am not sure why evaluating in the transformed space is "bad practice in general". Can you explain a bit more?
Or something like:

```python
from sklearn.metrics import get_scorer


class TransformedTargetScorer:
    def __init__(self, scoring):
        self.scorer = get_scorer(scoring)

    def __call__(self, estimator, X, y_true):
        # Score against the transformed (binned) targets when the
        # estimator knows how to produce them.
        if hasattr(estimator, 'get_transformed_targets'):
            y_true = estimator.get_transformed_targets(X, y_true)
        return self.scorer(estimator, X, y_true)
```

Thanks for the discussion. This is the way I implemented it.
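Such a callable can be passed directly as the scoring argument of scikit-learn's model-selection utilities. A hypothetical usage sketch, assuming model is the home-brewed meta-estimator exposing get_transformed_targets:

```python
from sklearn.model_selection import cross_val_score

# `model`, `X`, `y` are assumed from the surrounding discussion; the scorer
# falls back to the raw targets for estimators without the custom hook.
scores = cross_val_score(model, X, y,
                         scoring=TransformedTargetScorer("accuracy"), cv=5)
```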
We try to encourage good practice, particularly around evaluation. Evaluating in the classification space does not tell you how well you solved the regression problem. I think the API can make it possible, but it should not make it too easy or the default.
Okay, fine with me if you close. I found an article on arXiv about discretizing the targets (action space) in reinforcement learning, but that's about it. Most of the time, discretization is performed on the features.
But that has an extrinsic evaluation.
See #15872.
@dabana What I can see from this issue is that you might have a real-world and maybe good use case for applying a "regression by classification" approach using several components from scikit-learn. This is actually useful, and we could think about making an example documenting how to solve such problems: (i) identify when it will be profitable for someone, (ii) how to build a machine learning model in this case, (iii) how to evaluate such a model, and (iv) compare with some other baseline to show the benefit.
I would say that this does not meet the scikit-learn inclusion criteria. The discussion then shifted to whether or not to evaluate a model on the transformed scale (discretized, in this example). While some interpretability tools can make (more) sense on the transformed scale, I second @jnothman's comment about bad practice. Summary: I'm closing this issue.
That is correct. Although I think that we could include an example that leverages the aforementioned technique, as we did with the inductive clustering example.
@chkoar From a statistical point of view, I have major concerns about converting a regression task on a continuous target into a classification task. Therefore, I would rather not put it in an example.
scikit-learn has all the components needed to implement the regression-by-discretization approach. If I am not mistaken, even WEKA includes this meta-estimator by default. Since we always evaluate using cross-validation, I would appreciate it if you could list your major concerns regarding the approach. Thanks.
Fair enough. But let's not make a long discussion out of it. (Disclaimer: maybe some points are misplaced, as I have not studied the approach in detail.)
@lorentzenchr I found the closing a bit abrupt :) I completely agree with your point regarding the non-inclusion of a potential meta-estimator, and with the danger and bad/wrong practice of evaluating a regression problem via the underlying classification proxy problem. Regarding not introducing an example, I would not be as categorical: if there is a meaningful and well-defined problem where it makes sense, I would not be against it. Your arguments seem fair to me and really meaningful for linear models. However, I am not sure that tree-based/rule-based models would not benefit from the classification proxy problem. But it might be possible that there are better alternatives (skope-rules for regression?) out there.
Indeed. That's why I took the time to lay out my reasons. And as a contributor, the most frustrating experience is not getting a response at all. This one was stalled for 1.5 years.
As a reference for more scientific reasoning, V. Fedorov, F. Mannino, and R. Zhang, "Consequences of dichotomization", doi: 10.1002/pst.331, concludes:
I just read the abstract. Is it only looking at binarizing the output?
It means transforming/discretizing the observed y.
Thanks for the reference. It looks compelling.
As info: there is now a longer discussion here: https://stats.stackexchange.com/questions/565537/is-there-ever-a-reason-to-solve-a-regression-problem-as-a-classification-problem
My team and I are working on an application of regression by classification, a technique described in this article.
In a nutshell, regression by classification means approaching a regression problem with multi-class classification algorithms. The key part of this technique is to perform discretization, or binning, of the (continuous) target prior to classification. The article mentions three different approaches for target discretization, all of which are supported by sklearn's KBinsDiscretizer.
In regression by classification, the choice of the number of classes, the n_bins parameter, is critical. One straightforward way to tune this parameter, and to choose the binning strategy, is cross-validation, as sketched below. But because transformations on y (see #4143) are currently forbidden in scikit-learn, this is not "natively" supported.
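Once the y-binning is wrapped in a meta-estimator (for instance the TransformedTargetRegressor sketch earlier in this thread), such tuning becomes ordinary grid search over nested parameters. A hedged sketch, evaluating in the original continuous space; the grids and classifier are illustrative assumptions:

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import KBinsDiscretizer

# Tune the number of bins and the binning strategy by cross-validation,
# scoring with a regression metric on the continuous predictions.
reg_by_clf = TransformedTargetRegressor(
    regressor=RandomForestClassifier(random_state=0),
    transformer=KBinsDiscretizer(encode="ordinal"),
    check_inverse=False,
)
param_grid = {
    "transformer__n_bins": [3, 5, 10, 20],
    "transformer__strategy": ["uniform", "quantile", "kmeans"],
}
search = GridSearchCV(reg_by_clf, param_grid,
                      scoring="neg_mean_absolute_error", cv=5)
```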
We found a way around this by creating our own meta-estimator, as suggested by @jnothman elsewhere. But one problem remained: how can we tell scikit-learn to compute evaluation metrics on the BINNED targets, and not the original CONTINUOUS targets?
We achieved this by hacking the _PredictScorer class on our scikit-learn fork. The hack looks for a special custom method called get_transformed_targets on our home-brewed meta-estimator. If this method is present, the score is computed using the transformed (binned) targets. Here is the hack:
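The original snippet does not appear to have survived in this archive. A hedged reconstruction from the description above (not the author's exact code; the module path and _score signature follow 0.22-era scikit-learn and may differ in other versions):

```python
from sklearn.metrics._scorer import _BaseScorer


class _PredictScorer(_BaseScorer):
    def _score(self, method_caller, estimator, X, y_true, sample_weight=None):
        y_pred = method_caller(estimator, "predict", X)
        # The hack: if the meta-estimator exposes get_transformed_targets,
        # score against the binned targets instead of the continuous ones.
        if hasattr(estimator, "get_transformed_targets"):
            y_true = estimator.get_transformed_targets(X, y_true)
        if sample_weight is not None:
            return self._sign * self._score_func(
                y_true, y_pred, sample_weight=sample_weight, **self._kwargs)
        return self._sign * self._score_func(y_true, y_pred, **self._kwargs)
```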
Another problem we encountered was using the KBinsDiscretizer class on targets. We plan on doing this with a custom meta-transformer.
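In the meantime, binning a 1-d target with KBinsDiscretizer only requires a reshape; a minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# KBinsDiscretizer expects a 2-d array, so reshape the target, bin it,
# and flatten the ordinal codes back into 1-d class labels.
rng = np.random.RandomState(0)
y = rng.exponential(size=100)
disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
y_binned = disc.fit_transform(y.reshape(-1, 1)).ravel().astype(int)
```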
It would be nice if regression by classification were supported by scikit-learn out of the box. Perhaps the resampling options coming soon will make this possible, but that will have to be tested.