Use _openmp_effective_n_threads to take cgroups CPU quotas into account in Elkan and Minibatch KMeans #20483

@ogrisel

A partial output of git grep prange shows that these Cython files do not always pass an explicit num_threads, which means that those parallel loops will typically be over-subscribed in a Docker container with low CPU quotas:

sklearn/cluster/_k_means_elkan.pyx:from cython.parallel import prange, parallel
sklearn/cluster/_k_means_elkan.pyx:    for i in prange(n_samples, schedule='static', nogil=True):
sklearn/cluster/_k_means_elkan.pyx:    for i in prange(n_samples, schedule='static', nogil=True):
sklearn/cluster/_k_means_elkan.pyx:        for chunk_idx in prange(n_chunks, schedule='static'):
sklearn/cluster/_k_means_elkan.pyx:        for chunk_idx in prange(n_chunks, schedule='static'):
sklearn/cluster/_k_means_lloyd.pyx:from cython.parallel import prange, parallel
sklearn/cluster/_k_means_lloyd.pyx:        for chunk_idx in prange(n_chunks, schedule='static'):
sklearn/cluster/_k_means_lloyd.pyx:        for chunk_idx in prange(n_chunks, schedule='static'):
sklearn/cluster/_k_means_minibatch.pyx:from cython.parallel cimport parallel, prange
sklearn/cluster/_k_means_minibatch.pyx:        for cluster_idx in prange(n_clusters, schedule="static"):
sklearn/cluster/_k_means_minibatch.pyx:        for cluster_idx in prange(n_clusters, schedule="static"):

Similar to #20477.
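
To make the over-subscription concern concrete, here is a minimal sketch of how a cgroup CPU quota could be turned into a thread count to pass as num_threads. The function name effective_n_threads and its parameters are hypothetical and only for illustration; this is not the implementation of _openmp_effective_n_threads. It assumes the quota and period are the CFS bandwidth values in microseconds, with a missing or non-positive quota meaning "unlimited".

```python
import math


def effective_n_threads(n_logical_cpus, quota_us=None, period_us=None):
    """Hypothetical helper: cap a thread count by a CFS CPU quota.

    n_logical_cpus: number of CPUs visible to the process.
    quota_us, period_us: assumed cgroup CFS quota and period in
    microseconds; None or a non-positive value means no limit.
    """
    if n_logical_cpus < 1:
        raise ValueError("n_logical_cpus must be >= 1")

    # No usable quota information: fall back to the visible CPU count.
    if quota_us is None or period_us is None:
        return n_logical_cpus
    if quota_us <= 0 or period_us <= 0:
        return n_logical_cpus

    # A quota of 200000us per 100000us period allows about 2 CPUs of work.
    allowed_cpus = math.ceil(quota_us / period_us)

    # Never exceed the visible CPUs and always keep at least one thread.
    return max(1, min(n_logical_cpus, allowed_cpus))
```

For example, on a 16-core host running a container limited to 2 CPUs, this sketch returns 2, whereas a prange call without num_threads would typically start one thread per host core.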
