
Create a way to transform the target variable within a custom sklearn transformer. #26936

Closed
amorimds opened this issue Jul 29, 2023 · 1 comment
Labels: Needs Triage, New Feature

amorimds commented Jul 29, 2023

Describe the workflow you want to enable

It is not uncommon for the target variable in the raw dataset to be in a format that is not ideal for fitting the estimator:

  • In multiclass classification, we may need to apply a custom encoding.
  • In regression, we may want to scale the target.
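As a point of reference for the classification case, here is a minimal sketch of how a target is typically encoded today with LabelEncoder, outside of any pipeline. The class names in y_raw are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Illustrative raw target; the labels are hypothetical.
y_raw = np.array(["cat", "dog", "bird", "dog"])

# LabelEncoder is stateful: fit learns the sorted set of classes.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_raw)

# The same fitted encoder is needed later to map predictions back.
y_decoded = encoder.inverse_transform(y_encoded)
```

Because the encoder is fitted separately from the pipeline, it has to be stored and applied alongside the model by hand.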

It is good practice to keep all data transformations inside the sklearn pipeline, so that the model can accept the raw features and target as input when performing streaming predictions. If some transformations live outside the pipeline, the input data for online requests must first pass through a separate preprocessing pipeline, which adds a lot of unnecessary complexity. If a transformation in that preprocessing pipeline is stateful (it learns its parameters from the training dataset through fit), building such a pipeline becomes even more complicated.

Describe your proposed solution

Enable a way for a Transformer to change the target variable and pass it forward as y.
The default behavior of a transformer should stay the same, meaning it neither changes nor returns y:

  • If the transformer doesn't return y (default), then we can assume that y did not change.
  • If it returns X, y, we should replace the old y with the returned y.
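The two rules above could be sketched roughly as follows. This is not scikit-learn API: LogTarget, ScaleFeatures, and apply_steps are hypothetical names invented here, and the tuple check is just one possible way to detect that a step returned a new y.

```python
import numpy as np


class ScaleFeatures:
    """Hypothetical step that only changes X (the default behavior)."""

    def fit(self, X, y=None):
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        return self

    def transform(self, X, y=None):
        return (X - self.mean_) / self.scale_


class LogTarget:
    """Hypothetical step that returns (X, y) to signal a changed target."""

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X, np.log1p(y)


def apply_steps(steps, X, y):
    """Hypothetical helper: fit and apply each step, tracking y."""
    for step in steps:
        out = step.fit(X, y).transform(X, y)
        if isinstance(out, tuple):
            X, y = out  # step returned X, y: replace the old y
        else:
            X = out  # step returned only X: y is unchanged
    return X, y


X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0.0, np.e - 1.0])
X_t, y_t = apply_steps([ScaleFeatures(), LogTarget()], X, y)
```

A real design would also need a way to invert the target transformation at predict time, which this sketch leaves out.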

Describe alternatives you've considered, if relevant

No response

Additional context

No response

amorimds changed the title from "Create a way to transform the target valued within a custom sklearn transformer." to "Create a way to transform the target variable within a custom sklearn transformer." on Jul 29, 2023
thomasjpfan (Member) commented

For transforming regression targets there is TransformedTargetRegressor.
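For readers unfamiliar with it, a minimal sketch of TransformedTargetRegressor on synthetic data might look like this. The data, the coefficients, and the choice of StandardScaler and LinearRegression are assumptions made for illustration only.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with a large-scale target (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 1000.0 * (X @ np.array([1.0, -2.0, 0.5])) + 5000.0

# The transformer is fitted on y during fit, and its inverse is
# applied to the regressor's output during predict.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), LinearRegression()),
    transformer=StandardScaler(),
)
model.fit(X, y)

# Predictions come back on the original scale of y.
pred = model.predict(X)
```

The fitted target transformer is stored on the estimator, so raw features go in and predictions in the raw target's units come out of a single object.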

For generically transforming y there is already an open issue: #4143. According to our triaging guidelines, I am closing this issue as a duplicate. You are welcome to continue the discussion in #4143.
