RFC Support for Some Developer Utilities #15801

@thomasjpfan

On the Utilities for Developers page, it states:

Warning: These utilities are meant to be used internally within the scikit-learn package. They are not guaranteed to be stable between versions of scikit-learn. Backports, in particular, will be removed as the scikit-learn dependencies evolve.

If we want to provide utilities to support third-party estimators, we should treat some of these utilities as "first class" citizens.

For example, safe_indexing would be extremely useful for third parties that want to support DataFrames as input. Currently, the options for third-party developers are to build their own "safe_indexing" or to depend on our private version, which may not be stable.
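To illustrate what "build their own" looks like today, here is a minimal sketch of a row-indexing helper a third party might write. The name `my_safe_indexing` and its duck-typing checks are assumptions for illustration only; this is not scikit-learn's implementation and it covers only integer row positions.

```python
def my_safe_indexing(X, indices):
    """Hypothetical third-party helper: select rows of X by integer position."""
    if hasattr(X, "iloc"):
        # pandas DataFrame / Series: use positional indexing
        return X.iloc[indices]
    if hasattr(X, "shape"):
        # array-like containers (e.g. NumPy arrays) accept a list of row positions
        return X[indices]
    # plain Python sequences: index element by element
    return [X[i] for i in indices]


rows = my_safe_indexing([10, 20, 30], [0, 2])
print(rows)  # [10, 30]
```

Every third-party project ends up maintaining a variant of this, which is the duplication a supported public utility would remove.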

Another example is scikit-learn/enhancement_proposals#22, which defines an n_features_in_ contract that we will satisfy internally through private methods. Third-party estimators would need to build their own methods or functions to comply with the SLEP.
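As a rough sketch of what a third-party estimator would have to write itself, assuming the contract amounts to "record the number of features in fit, and reject a different number afterwards": the helper `check_n_features` and the toy `MeanRegressor` below are hypothetical names, not part of scikit-learn or of the SLEP text.

```python
def check_n_features(estimator, X, reset):
    """Hypothetical helper: set or validate estimator.n_features_in_."""
    # Use .shape for array-likes, otherwise assume a list of rows.
    n_features = X.shape[1] if hasattr(X, "shape") else len(X[0])
    if reset:
        # Called from fit: remember how many features were seen.
        estimator.n_features_in_ = n_features
    elif n_features != estimator.n_features_in_:
        # Called from predict/transform: input must match what fit saw.
        raise ValueError(
            f"X has {n_features} features, but this estimator was fitted "
            f"with {estimator.n_features_in_} features."
        )


class MeanRegressor:
    """Toy estimator that always predicts the mean of the training targets."""

    def fit(self, X, y):
        check_n_features(self, X, reset=True)
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        check_n_features(self, X, reset=False)
        return [self.mean_] * len(X)


reg = MeanRegressor().fit([[1, 2], [3, 4]], [1.0, 3.0])
print(reg.n_features_in_)  # 2
```

If a helper like this were public and supported, third-party estimators could call it directly instead of re-implementing the check.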

TLDR: Now that many of the utilities are "private", we can make deliberate decisions about which utilities should be public and supported by us. Supporting them would mean deprecation cycles, etc. If we support some of the utils module, it will be easier to build estimators, which will enrich the ecosystem of scikit-learn-compatible estimators.

CC @scikit-learn/core-devs
