Description
I wonder if we could make outlier removal available in pipelines. I tried implementing it, for example using `IsolationForest`, but so far I couldn't get it to work, and I know why. The problem boils down to `fit_transform` only returning a transformed `X`. This suffices in the vast majority of cases, since we typically only throw away columns (think of a PCA). For outlier removal in a pipeline, however, we need to throw away rows of `X` *and* `y` during training, and do nothing during testing. This is not supported so far. Essentially, we would need to turn the `predict` function into some kind of `transform` function during training.
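To make the failure mode concrete, here is a minimal reproduction (the data is made up): `IsolationForest` implements `fit` and `predict` but no `transform`, so the pipeline's step validation rejects it as an intermediate step.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
y = X.sum(axis=1)

# IsolationForest has no transform method, so Pipeline raises a
# TypeError at fit time: "All intermediate steps should be
# transformers ..."
failed = False
try:
    pipe = Pipeline([
        ("outliers", IsolationForest(random_state=0)),
        ("model", LinearRegression()),
    ])
    pipe.fit(X, y)
except TypeError:
    failed = True

print("pipeline fit failed:", failed)
```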
Investigating the pipeline implementation shows that `fit_transform` is called, if present, during the fitting part of the pipeline, rather than `fit(X, y).transform(X)`. In particular, during cross-validation `fit_transform` is only called on the training data. This would be perfect for outlier removal. It only remains to do nothing in the test step, and for that we can simply implement a "do-nothing" `transform` function.
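A sketch of what such a transformer could look like (the class name `OutlierRemover` and its parameters are hypothetical; note that returning the correspondingly shrunken `y` from `fit_transform` is exactly the part the current API does not support, so this sketch can only drop rows of `X`):

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.ensemble import IsolationForest

class OutlierRemover(BaseEstimator):
    """Hypothetical sketch: drop outlier rows during fitting,
    pass data through unchanged at test time."""

    def __init__(self, contamination=0.01, random_state=None):
        self.contamination = contamination
        self.random_state = random_state

    def fit(self, X, y=None):
        self.detector_ = IsolationForest(
            contamination=self.contamination,
            random_state=self.random_state,
        ).fit(X)
        return self

    def fit_transform(self, X, y=None):
        self.fit(X, y)
        mask = self.detector_.predict(X) == 1  # 1 = inlier, -1 = outlier
        # y[mask] would also be needed here -- that is the
        # unsupported part of the current API.
        return X[mask]

    def transform(self, X):
        # "do-nothing" transform for the test step
        return X

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(size=(100, 2)),
               [[10.0, 10.0]]])  # one obvious outlier
remover = OutlierRemover(contamination=0.01, random_state=0)
X_clean = remover.fit_transform(X)
print("rows before/after:", X.shape[0], X_clean.shape[0])
```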
Unfortunately, the most direct way to implement this would be an API change to the `TransformerMixin` class.
So my questions are:
Would it be interesting to support outlier removal in pipelines?
Are there other, more suitable ways of implementing this feature in a pipeline?
If the content of this question is somehow inappropriate (e.g. since I'm only an active user, not an active developer of the project) or in the wrong place, feel free to remove the thread.