KMeans(init='k-means++') performance issue with OpenBLAS #17334

Open
@ogrisel

Description

I open this issue to investigate a performance problem that might be related to #17230.

I adapted the reproducer of #17230 to display more info and make it work on a medium-sized random dataset.

from sklearn import cluster
from time import time
from pprint import pprint
from threadpoolctl import threadpool_info
import numpy as np


pprint(threadpool_info())
rng = np.random.RandomState(0)
data = rng.randn(5000, 50)
t0_global = time()
for k in range(1, 15):
    t0 = time()
    # print(f"Running k-means with k={k}: ", end="", flush=True)
    cluster.KMeans(
        n_clusters=k,
        random_state=42,
        n_init=10,
        max_iter=2000,
        algorithm='full',
        init='k-means++').fit(data)
    # print(f"{time() - t0:.3f} s")

print(f"Total duration: {time() - t0_global:.3f} s")

I tried to run this on Linux with scikit-learn master (therefore including the #16499 fix), with 2 different builds of scipy (OpenBLAS from PyPI and MKL from Anaconda), and various values of OMP_NUM_THREADS (unset, 1, 2, 4), on a laptop with 2 physical CPU cores (4 logical CPUs).

In both cases, I use the same scikit-learn binaries (built with GCC in editable mode). I just change the env.
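A single-process variant of this comparison is also possible (a sketch I am adding here, not part of the original reproducer): threadpoolctl's `threadpool_limits` context manager caps the OpenMP and BLAS thread pools from within one Python session, so the same timings can be collected without restarting the process with different OMP_NUM_THREADS values. The `algorithm='full'` argument is left out here to keep the sketch version-agnostic.

```python
# Hypothetical single-process variant of the reproducer: cap the thread
# pools with threadpoolctl instead of changing the environment.
from time import time

import numpy as np
from sklearn import cluster
from threadpoolctl import threadpool_limits

rng = np.random.RandomState(0)
data = rng.randn(5000, 50)

timings = {}
for n_threads in (1, 2, 4):
    # Limit OpenMP and BLAS thread pools to n_threads inside this block.
    with threadpool_limits(limits=n_threads):
        t0 = time()
        cluster.KMeans(
            n_clusters=8,
            random_state=42,
            n_init=10,
            init='k-means++').fit(data)
        timings[n_threads] = time() - t0

for n_threads, duration in timings.items():
    print(f"{n_threads} thread(s): {duration:.3f} s")
```

Note that limits set this way apply only inside the `with` block, which makes it easy to interleave configurations in one run; whether it reproduces the slowdown identically to setting OMP_NUM_THREADS at process start is exactly what would need checking.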

The summary is:

  • with MKL there is no problem: large or unset values of OMP_NUM_THREADS are faster than OMP_NUM_THREADS=1;
  • with OpenBLAS, leaving OMP_NUM_THREADS unset or setting a large value is significantly slower than the forced sequential run with OMP_NUM_THREADS=1.
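To tell the two configurations apart at runtime, the `threadpool_info()` call already used in the reproducer can be filtered for the loaded BLAS and the effective thread counts (a minimal sketch, not from the issue itself; it assumes numpy has been imported so the BLAS runtime is loaded):

```python
# Print which BLAS implementation is loaded (e.g. openblas vs mkl) and
# how many threads each pool actually uses, to verify that
# OMP_NUM_THREADS was picked up.
import numpy as np  # importing numpy loads the BLAS runtime
from threadpoolctl import threadpool_info

for info in threadpool_info():
    print(info["user_api"], info["internal_api"],
          info["num_threads"], info["filepath"])
```

The `internal_api` field shows whether the pool belongs to OpenBLAS, MKL, or an OpenMP runtime, and `num_threads` reflects the limit each pool is currently running with.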

I will include my runs in the first comment.

/cc @jeremiedbb
