
RFC: Vendor or add a dependency on threadpoolctl? #16242

@jeremiedbb


@scikit-learn/core-devs, following up on the in-person (IRL) discussion at the meeting.

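threadpoolctl is needed in the new implementation of KMeans (#11950) to prevent oversubscription caused by nested BLAS calls inside an outer OpenMP loop: without it, each OpenMP thread can call a multithreaded BLAS, so the process ends up running many more threads than there are cores. It's a single-file, pure-Python package: https://github.com/joblib/threadpoolctl.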
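To illustrate the kind of call involved, here is a minimal sketch of the pattern: cap BLAS at one thread while an outer loop runs in parallel. It is not the KMeans code from #11950. Assumptions: a Python `ThreadPoolExecutor` stands in for the Cython/OpenMP outer loop, and `assign_labels` / `_chunk_distances` are hypothetical helpers. `threadpool_limits(limits=..., user_api="blas")` is threadpoolctl's context manager. The `ImportError` fallback to a no-op exists only so the snippet runs without threadpoolctl installed; in that case it provides no protection against oversubscription.

```python
import contextlib
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    # threadpoolctl's context manager that temporarily caps the size of
    # native thread pools (BLAS, OpenMP) loaded in the current process.
    from threadpoolctl import threadpool_limits
except ImportError:
    # Sketch-only assumption: degrade to a no-op so the example still runs.
    def threadpool_limits(limits=None, user_api=None):
        return contextlib.nullcontext()


def _chunk_distances(X_chunk, centers):
    # Hypothetical helper: squared Euclidean distances, where the
    # matrix product X_chunk @ centers.T is dispatched to BLAS.
    return (
        (X_chunk ** 2).sum(axis=1)[:, None]
        - 2 * X_chunk @ centers.T
        + (centers ** 2).sum(axis=1)[None, :]
    )


def assign_labels(X, centers, n_chunks=4, n_outer_threads=4):
    # Hypothetical helper: label each sample with its closest center.
    chunks = np.array_split(X, n_chunks)
    # Limit BLAS to 1 thread while the outer loop is parallel, so the total
    # thread count stays close to n_outer_threads instead of
    # n_outer_threads * n_blas_threads.
    with threadpool_limits(limits=1, user_api="blas"):
        with ThreadPoolExecutor(max_workers=n_outer_threads) as pool:
            dists = list(pool.map(lambda c: _chunk_distances(c, centers), chunks))
    return np.concatenate([d.argmin(axis=1) for d in dists])
```

The limit is restored when the `with` block exits, so BLAS calls made elsewhere keep their usual thread count.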

  • vendor
    The easiest and quickest way would be to vendor it in scikit-learn. There's even a PR ready for that ([MRG] Vendor threadpoolctl #14980). However, it adds yet another thing to externals :(. It also means that a bug fix in threadpoolctl would not be available to users until the next scikit-learn release.

  • dependency
    On the other hand, we could make it a dependency. For that, it needs to be available on conda (both conda-forge and the defaults channel). It might also be a bit overkill for a single call to threadpoolctl in the whole of scikit-learn.

What are your thoughts on this? (Feel free to edit this post to add more pros and cons.)
