Support for regression by classification #15850


Closed
dabana opened this issue Dec 9, 2019 · 25 comments

@dabana commented Dec 9, 2019

My team and I are working on an application of regression by classification, a technique described in this article.

In a nutshell, regression by classification means approaching a regression problem with multi-class classification algorithms. The key part of this technique is to perform discretization, or binning, of the (continuous) target prior to classification. The article mentions three approaches to target discretization, all of which are supported by sklearn's KBinsDiscretizer (a short sketch follows the list below):

  1. Equally probable interval (this is the quantile strategy of KBinsDiscretizer)
  2. Equal width interval (this is the uniform strategy of KBinsDiscretizer)
  3. K-means clustering (this is the kmeans strategy of KBinsDiscretizer)
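For concreteness, here is a minimal sketch of the three strategies on a synthetic skewed target (the data is made up purely for illustration):

import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.RandomState(0)
y = rng.lognormal(size=(1000, 1))  # skewed continuous target, as a column vector

for strategy in ("quantile", "uniform", "kmeans"):
    disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy=strategy)
    y_binned = disc.fit_transform(y)
    # Class counts differ by strategy: "quantile" gives equally probable bins,
    # "uniform" gives equal-width bins, "kmeans" gives cluster-based bins.
    print(strategy, np.bincount(y_binned.ravel().astype(int)))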

In regression by classification, the choice of the number of classes, i.e. the n_bins parameter, is critical. One straightforward way to tune this parameter, and to choose the binning strategy, is cross-validation. But because transformations on y (see #4143) are currently forbidden in scikit-learn, this is not "natively" supported.

We found a way around this by creating our own meta-estimator, as suggested by @jnothman elsewhere. But one problem remained: how can we tell scikit-learn to compute evaluation metrics on the BINNED targets, and not the original CONTINUOUS targets?

We achieved this by hacking the _PredictScorer class on our scikit-learn fork. The hack looks for a special custom method called get_transformed_targets on our home-brewed meta-estimator. If this method is present, the score is computed using transformed (binned) targets. Here is the hack:

class _PredictScorer(_BaseScorer):
    def _score(self, method_caller, estimator, X, y_true, sample_weight=None):
        """[... docstring ...]
        """
        # Here starts the hack: if the estimator exposes our custom
        # get_transformed_targets method, score against the binned targets.
        if hasattr(estimator, 'get_transformed_targets'):
            y_true = estimator.get_transformed_targets(X, y_true)
        # Here ends the hack.

        y_pred = method_caller(estimator, "predict", X)
        if sample_weight is not None:
            return self._sign * self._score_func(y_true, y_pred,
                                                 sample_weight=sample_weight,
                                                 **self._kwargs)
        else:
            return self._sign * self._score_func(y_true, y_pred,
                                                 **self._kwargs)

Another problem we encountered is using the KBinsDiscretizer class on targets. We plan to do this with a custom meta-transformer; a rough sketch of the idea follows.
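For illustration, a minimal sketch of what such a meta-estimator could look like. The name BinnedTargetClassifier and its structure are hypothetical, not our actual implementation; it just shows how get_transformed_targets fits in:

import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.preprocessing import KBinsDiscretizer

class BinnedTargetClassifier(BaseEstimator):
    """Hypothetical meta-estimator: bins y, then fits a classifier."""

    def __init__(self, classifier, n_bins=5, strategy="quantile"):
        self.classifier = classifier
        self.n_bins = n_bins
        self.strategy = strategy

    def fit(self, X, y):
        self.transformer_ = KBinsDiscretizer(
            n_bins=self.n_bins, encode="ordinal", strategy=self.strategy)
        # KBinsDiscretizer expects 2-D input, so reshape y before binning.
        y_binned = self.transformer_.fit_transform(
            np.asarray(y).reshape(-1, 1)).ravel()
        self.classifier_ = clone(self.classifier).fit(X, y_binned)
        return self

    def predict(self, X):
        return self.classifier_.predict(X)

    def get_transformed_targets(self, X, y):
        # Bin the true targets so the hacked scorer compares like with like.
        return self.transformer_.transform(np.asarray(y).reshape(-1, 1)).ravel()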

It would be nice if regression by classification were supported by scikit-learn out of the box. Perhaps the re-sampling options coming soon will make this possible, but that will have to be tested.

@jnothman (Member)

Why do you want to evaluate on the binned targets? I'd assume that you ultimately want to evaluate on the continuous targets, but might want a classification evaluation as a diagnostic.

This doesn't actually fit the resampling case nicely, because resamplers do not modify the predicted data, and indeed I find it quite strange that one could use a resampler to change the prediction space.

I don't really see why regression by classification could not be supported by TransformedTargetRegressor. Does TransformedTargetRegressor(LogisticRegression(), KBinsDiscretizer(encode='ordinal'), check_inverse=False) just work (but evaluate in the continuous space)?
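For instance, something along these lines (a hedged sketch with the keyword arguments spelled out, on synthetic data; the exact behaviour may depend on the scikit-learn version):

from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer

X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = TransformedTargetRegressor(
    regressor=LogisticRegression(max_iter=1000),
    transformer=KBinsDiscretizer(n_bins=10, encode="ordinal"),
    check_inverse=False,
)
reg.fit(X_train, y_train)
# predict() maps the predicted bin indices back through inverse_transform
# (bin centers), so this evaluates in the original continuous space.
print(mean_absolute_error(y_test, reg.predict(X_test)))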

@dabana (Author) commented Dec 10, 2019

Why do you want to evaluate on the binned targets? I'd assume that you ultimately want to evaluate on the continuous targets, but might want a classification evaluation as a diagnostic.

I don't really see why regression by classification could not be supported by TransformedTargetRegressor. Does TransformedTargetRegressor(LogisticRegression(), KBinsDiscretizer(encode='ordinal'), check_inverse=False) just work (but evaluate in the continuous space)?

This is what is done in the referenced article, and you are right, this can be handled by TransformedTargetRegressor. They use regression evaluation metrics to compare the different binning methods.

I am not fully comfortable with evaluating in the continuous space because the inverse of the binning transform is ambiguous due to rounding errors. I would rather switch to classification metrics altogether, especially for small n_bins. My approach to regression by classification would be more accurately described as formulating regression problems as classification problems.

It would be nice to be able to "skip" the inverse transformation with TransformedTargetRegressor and to evaluate in the so-called "transformed" space by applying the direct transform on the targets (true values) before computing the score. This is effectively what is done through the "_PredictScorer hack" mentioned above.

@chkoar (Contributor) commented Dec 10, 2019

@dabana Since your task is regression (independently of how you do it), you should use regression evaluation metrics. A corner case would be ordinal regression, where I suppose you need evaluation metrics for both domains (classification and regression), but this is not the case here.

@jnothman (Member) commented Dec 10, 2019 via email

@dabana (Author) commented Dec 11, 2019

If TransformedTargetRegressor supports regression by classification, evaluating in the original space, then I think that is what we should support.

But if it is already supported, why not support evaluation in the transformed space too? It could just be a simple boolean input to TransformedTargetRegressor: a toggle switch that turns inverse_transform on or off. Anything along the lines of improving the API of TransformedTargetRegressor so it could also be a fully functional TransformedTargetClassifier sounds good to me. I volunteer for the PR.

It is true that the task at hand has identifiable "task-related rules" for creating classes. But the class definitions are ambiguous (region- or regulation-dependent, for instance). This is why we opted for regression in the first place. But we are looking back at classification for several reasons:

  1. In our domain, classification fits some use-cases better, but there are other use-cases where continuous predictions are more appropriate (they are simply mandatory in our case). Basically I am stuck: I need to be able to process both classification and regression tasks with the same automated data pipeline. I believe many application domains face this same dilemma.

  2. By treating n_bins as a hyper-parameter (perhaps with the kmeans strategy of KBinsDiscretizer), we might be able to learn more interpretable classes (see the sketch after this list). This would greatly help data scientists as well as application scientists.

  3. I am also wondering whether the stability of predictions with respect to variations in the input data could be improved. This is a big issue in our application. More robust models would help.

  4. Maybe the learning problem could be less computationally intensive if formulated as a classification task rather than a regression task? But that I doubt. Sounds like a no-free-lunch scenario?
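Here is the kind of hyper-parameter search I have in mind for point 2, as a hedged sketch on synthetic data (tuning with a regression metric so that scores stay comparable across n_bins settings):

from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import KBinsDiscretizer

X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=0)

search = GridSearchCV(
    TransformedTargetRegressor(
        regressor=LogisticRegression(max_iter=1000),
        transformer=KBinsDiscretizer(encode="ordinal"),
        check_inverse=False,
    ),
    param_grid={
        "transformer__n_bins": [3, 5, 10, 20],
        "transformer__strategy": ["uniform", "quantile", "kmeans"],
    },
    scoring="neg_mean_absolute_error",  # compare settings in the original space
)
search.fit(X, y)
print(search.best_params_)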

@jnothman I think you should leave the issue open.

@jnothman (Member)

But if it is already supported, why not support evaluation in the transformed space too? It could just be a simple boolean input to TransformedTargetRegressor?

  1. because it's bad practice in general.
  2. because if you want such diagnostic measures you can use something like:
    def make_regression_by_classification_scorer(scoring):
        scorer = get_scorer(scoring)
        return lambda est, X, y: scorer(est.regressor_, X, est.transformer_.transform(y))

(the above might need some tweaking)
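For instance, a version with the tweaks spelled out (assuming est is a fitted TransformedTargetRegressor whose transformer, such as KBinsDiscretizer, expects 2-D input):

import numpy as np
from sklearn.metrics import get_scorer

def make_regression_by_classification_scorer(scoring):
    scorer = get_scorer(scoring)

    def score(est, X, y):
        # Bin the true targets, then score the inner classifier against them.
        y_binned = est.transformer_.transform(np.asarray(y).reshape(-1, 1)).ravel()
        return scorer(est.regressor_, X, y_binned)

    return score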

if n_bins is a hyperparameter, then most classification metrics won't be comparable across n_bins settings...

@dabana (Author) commented Dec 11, 2019

  1. because it's bad practice in general.

Sorry @jnothman, I am not sure why evaluating in the transformed space is "bad practice in general". Can you explain a bit more?

because if you want such diagnostic measures you can use something like:

    def make_regression_by_classification_scorer(scoring):
        scorer = get_scorer(scoring)
        return lambda est, X, y: scorer(est.regressor_, X, est.transformer_.transform(y))

Or something like:

from sklearn.metrics import get_scorer

class TransformedTargetScorer:

    def __init__(self, scoring):
        self.scorer = get_scorer(scoring)

    def __call__(self, estimator, X, y_true):
        # Score against the binned targets when the meta-estimator exposes them.
        if hasattr(estimator, 'get_transformed_targets'):
            y_true = estimator.get_transformed_targets(X, y_true)
        return self.scorer(estimator, X, y_true)

Thanks for the discussion. This is the way I implemented it.

@jnothman (Member)

We try to encourage good practice, particularly around evaluation. Evaluating in the classification space does not tell you about how well you solved the regression problem. I think the API can make it possible, but should not make it too easy or the default.

@dabana (Author) commented Dec 11, 2019

Okay, fine with me if you close.

I found an article on arXiv about discretizing the targets (the action space) in reinforcement learning:
https://arxiv.org/pdf/1901.10500.pdf

But that's about it. Most of the time discretization is performed on the features.

@jnothman (Member) commented Dec 12, 2019 via email

@dabana (Author) commented Dec 12, 2019

Does TransformedTargetRegressor(LogisticRegression(), KBinsDiscretizer(encode='ordinal'), check_inverse=False) just work (but evaluate in the continuous space)?

see #15872

@glemaitre (Member)

I agree with @jnothman and @chkoar. It makes little sense to me not to evaluate the performance of a model based on the type of problem one is trying to solve. Not supporting this use case would make sense 99% of the time, and I am still not convinced about the remaining 1%.

@glemaitre (Member)

@dabana What I can see from this issue is that you might have a real-world and maybe good use-case for applying a "regression by classification" approach using several components from scikit-learn. This is actually useful, and we could think about making an example documenting how to solve such problems: (i) identify when it would be profitable for someone, (ii) how to build a machine learning model in this case, (iii) how to evaluate such a model, and (iv) compare with some baseline to show the benefit.

@lorentzenchr (Member)

It would be nice if the regression by classification was supported by scikit-learn out of the box.

I would say that this does not meet the scikit-learn inclusion criteria.

The discussion then shifted to whether or not to evaluate a model on the transformed scale (discretized in this example). While some interpretability tools can make (more) sense on the transformed scale, I second @jnothman's comment about bad practice.

Summary: I‘m closing this issue.

@chkoar (Contributor) commented Aug 10, 2021

I would say that this does not meet the scikit-learn inclusion criteria.

That is correct. Still, I think we could include an example that leverages the aforementioned technique, as we did with the inductive clustering example.

@lorentzenchr (Member)

@chkoar From a statistical point of view, I have major concerns about converting a regression task on a continuous target into a classification task. Therefore, I would rather not put it in an example.

@chkoar (Contributor) commented Aug 10, 2021

scikit-learn has all the components needed to implement the regression by discretization approach. If I am not mistaken, even WEKA includes this meta-estimator by default. Since we always evaluate using cross-validation, I would appreciate it if you could list your major concerns regarding the approach. Thanks.

@lorentzenchr (Member) commented Aug 10, 2021

Fair enough. But let‘s not make a long discussion out of it. (Disclaimer: Maybe some points are misplaced as I have not studied the approach in detail.)

  • First, I prefer strong positive arguments: Why should I follow this approach?
  • One loses information by binning (that‘s why I favor the SplineTransformer over the KBinsDiscretizer). Why should you do that in the first place (except data compression)?
  • Continuous targets have an order (<). Classification classes (in scikit-learn) don’t.
  • Continuous targets usually have some kind of smoothness: Proximity in feature space (for continuous features) means proximity in target space.
  • All this loss of information is accompanied by possibly more parameters in the model; e.g. LogisticRegression has a number of coefficients proportional to the number of classes (a quick illustration follows this list).
  • The binning obfuscates whether one is trying to predict the expectation/mean or a quantile. (I guess it's more meant for the expectation.)
  • One can end up with a badly (conditionally) calibrated regression model, i.e. biased. (OK, this can also happen with standard regression techniques.)
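A quick illustration of the parameter-count point, on synthetic data (multinomial LogisticRegression fits one coefficient row per class):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=300, n_features=4, n_classes=5, n_informative=4,
    n_redundant=0, random_state=0,
)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.coef_.shape)  # (5, 4): 5 classes x 4 features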

@glemaitre (Member)

@lorentzenchr I found the closing a bit abrupt :)

I completely agree with your point regarding the non-inclusion of a potential meta-estimator, and the danger and bad/wrong practice of evaluating a regression problem via the underlying classification proxy problem.

Regarding not introducing an example, I would not be as categorical. If there is a meaningful and well-defined problem where it makes sense, I would not be against it. Your arguments seem fair to me and really meaningful for linear models. However, I am not sure that the tree-based/rule-based models would not benefit from the classification proxy problem.
For instance, I was looking at the following: https://www.jair.org/index.php/jair/article/download/10150/24055

But it might be possible that there are better alternatives (skope-rules for regression?) out there.

@lorentzenchr (Member)

I found the closing a bit abrupt

Indeed. That‘s why I took the time to lay out my reasons. And as a contributor, the most frustrating experience is to not get a response at all. This one was stalled for 1.5 years.
If there is a compelling use case, an example would be great. I just don't have enough imagination for it, and therefore think that time could be better spent on other issues.

@lorentzenchr (Member) commented Dec 20, 2021

As a reference for more scientific reasoning, V. Fedorov, F. Mannino, Rongmei Zhang, "Consequences of dichotomization", doi: 10.1002/pst.331, concludes:

While the analysis of dichotomized outcomes may be easier, there are no benefits to this approach when the true outcomes can be observed and the 'working' model is flexible enough to describe the population at hand. Thus, dichotomization should be avoided in most cases.

@glemaitre (Member)

Dichotomization is the transformation of a continuous outcome (response) to a binary outcome

I just read the abstract. Is it only looking at binarizing the output?

@lorentzenchr (Member)

It means transforming / discretizing the observed y.

@glemaitre (Member)

Thanks, this looks like a compelling reference.
