Suggestion: cache all Pipeline steps by default #9007

Open
@lsorber

PR #7990 implements a caching pipeline which caches all steps but the last. I haven't found much discussion on this topic specifically, so I can only speculate on why the last step is not cached. With this issue, I would like to make the case for caching all pipeline steps.

Arguments pro:

  1. I would guess the last step is not cached because it is usually not a transformer. However, the last step may well be a transformer, for example when a pipeline of preprocessing steps is nested inside a parent pipeline that ends in a non-transforming estimator.
  2. Even if the last step is not a transformer, there could be cases where the user would want to cache the last step.
  3. If all steps are cached, it is easy to recreate the behaviour currently implemented in PR #7990: put all the steps you want cached in a caching pipeline, and insert that pipeline into a non-caching pipeline that holds the steps you don't want cached. Conversely, it is difficult to cache all steps in a pipeline with the current behaviour.
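
To make the nesting in arguments 1 and 3 concrete, here is a minimal sketch. It assumes the `memory` parameter of `sklearn.pipeline.Pipeline` as it exists in released scikit-learn versions, which may differ from the interface in the PR at the time of writing. The step names, the toy dataset and the choice of estimators are all illustrative assumptions, not taken from the PR.

```python
from shutil import rmtree
from tempfile import mkdtemp

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, for illustration only.
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

cache_dir = mkdtemp()  # temporary directory used as the cache location
try:
    # Inner "caching" pipeline: every step here is a transformer,
    # including the last one ("reduce").
    preprocessing = Pipeline(
        [("scale", StandardScaler()), ("reduce", PCA(n_components=3))],
        memory=cache_dir,
    )
    # Outer "non-caching" pipeline that ends in a non-transforming estimator.
    model = Pipeline([("prep", preprocessing), ("clf", LogisticRegression())])
    model.fit(X, y)
    predictions = model.predict(X)
finally:
    rmtree(cache_dir)  # clean up the cache directory
```

With the behaviour this issue describes (all steps but the last are cached), only `scale` would be memoized here; `reduce` is a transformer but is skipped because it is the last step of the inner pipeline. If all steps were cached, this same structure would cache `scale` and `reduce` and leave `clf` uncached.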
