Describe the workflow you want to enable
The target variable in a raw dataset is often not in a format that the estimator can be fitted on directly:
- In multiclass classification, we may need to apply a custom encoding.
- In regression, we may want to scale the target.
It is good practice to keep all data transformations within the sklearn pipeline. This ensures that the model can accept the raw features and target as input when performing streaming predictions. If the transformations are not all concentrated in the sklearn pipeline, the input data for online requests must first pass through a separate preprocessing pipeline, which adds a lot of unnecessary complexity. If a transformation in that preprocessing pipeline needs to be stateful (learning its parameters from the training dataset through fit), building it becomes even more complicated.
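As a point of comparison for the regression case, scikit-learn's existing `TransformedTargetRegressor` (in `sklearn.compose`) already bundles a stateful target transformation with the estimator. The sketch below uses made-up toy data and an arbitrary choice of `StandardScaler` and `LinearRegression`. It does not cover the custom-encoding case for classification, which is what this request is about.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): the target lives on a large scale.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [2000.0, 5000.0, 8000.0, 11000.0, 14000.0]

model = TransformedTargetRegressor(
    # Feature preprocessing and the estimator stay in one pipeline.
    regressor=make_pipeline(StandardScaler(), LinearRegression()),
    # The target scaler is fitted on y during fit() and inverted in predict().
    transformer=StandardScaler(),
)
model.fit(X, y)

# Predictions come back on the original scale of y.
predictions = model.predict(X)
```

The wrapper only transforms `y` around a single final regressor, so it cannot express a target transformation that happens inside an arbitrary pipeline step.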
Describe your proposed solution
Enable a way for a Transformer to change the target variable and pass it forward as y.
The default behavior of a transformer should neither change nor return y:
- If the transformer doesn't return y (the default), we can assume that y did not change.
- If it returns X, y, we should replace the old y with the returned y.
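The two rules above could be sketched in plain Python as follows. This is not scikit-learn API: `apply_transform`, `DoubleFeatures`, and `EncodeLabels` are hypothetical names, and the `transform(X, y=None)` signature is an assumption made only to illustrate the proposed semantics.

```python
def apply_transform(transformer, X, y=None):
    """Hypothetical helper: call transform and normalise the result to (X, y)."""
    result = transformer.transform(X, y)
    # A tuple result is read as (X, y): the returned y replaces the old one.
    if isinstance(result, tuple):
        X_new, y_new = result
        return X_new, y_new
    # Anything else is X alone: y is assumed unchanged.
    return result, y


class DoubleFeatures:
    """Ordinary transformer: returns X only, so y is left untouched."""

    def transform(self, X, y=None):
        return [[2 * value for value in row] for row in X]


class EncodeLabels:
    """Target-aware transformer: returns (X, y) with a custom label encoding."""

    def __init__(self, mapping):
        self.mapping = mapping

    def transform(self, X, y=None):
        if y is None:  # prediction time: there is no target to encode
            return X
        return X, [self.mapping[label] for label in y]


X, y = [[1, 2], [3, 4]], ["cat", "dog"]
X, y = apply_transform(DoubleFeatures(), X, y)                    # y unchanged
X, y = apply_transform(EncodeLabels({"cat": 0, "dog": 1}), X, y)  # y replaced
```

Dispatching on a tuple return value is only one possible design. It would be ambiguous if a transformer legitimately returned X as a tuple, so a real implementation would likely need a more explicit signal.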
Describe alternatives you've considered, if relevant
No response
Additional context
No response
amorimds changed the title from "Create a way to transform the target valued within a custom sklearn transformer." to "Create a way to transform the target variable within a custom sklearn transformer." on Jul 29, 2023
For generically transforming y there is already an open issue: #4143. According to our triaging guidelines, I am closing this issue as a duplicate. You are welcome to continue the discussion in #4143.