Persistent UserWarning about KMeans Memory Leak on Windows Despite Applying Suggested Fixes #30921

Closed
rahimHub opened this issue Mar 1, 2025 · 2 comments

rahimHub commented Mar 1, 2025

Describe the bug

Issue Description
When running code involving GaussianMixture (or KMeans), a UserWarning about a known memory leak on Windows with MKL is raised, even after implementing the suggested workaround (OMP_NUM_THREADS=1 or 2). The warning persists across multiple environments and configurations, indicating the issue may require further investigation.

Warning Message:

C:\ProgramData\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1429: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
  warnings.warn(

Steps to Reproduce

Code example:

import os
os.environ["OMP_NUM_THREADS"] = "1" # Also tested with "2"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Generate synthetic 3D data
X, _ = make_blobs(n_samples=300, n_features=3, centers=3, random_state=42)

# Train GMM model
gmm = GaussianMixture(n_components=3, random_state=42)
gmm.fit(X) # Warning triggered here

Environment:

OS: Windows 11
Python: 3.10.12
scikit-learn: 1.3.2
numpy: 1.26.0 (linked to MKL via Anaconda)
Installation Method: Anaconda (conda install scikit-learn).

Expected vs. Actual Behavior

Expected: Setting OMP_NUM_THREADS should suppress the warning and resolve the memory leak.

Actual: The warning persists despite environment variable configurations, reinstalls, and thread-limiting methods.

Attempted Fixes

Set OMP_NUM_THREADS=1 or 2 in code and system environment variables.
Limited threads via threadpoolctl:

from threadpoolctl import threadpool_limits
with threadpool_limits(limits=1, user_api='blas'):
    gmm.fit(X)

Reinstalled numpy and scipy with OpenBLAS instead of MKL.
Tested in fresh conda environments.
Updated all packages to latest versions.
None of these resolved the warning.

Additional Context:
The warning appears even when using GaussianMixture, whose default initialization (init_params="kmeans") runs KMeans internally.

The issue is specific to Windows + MKL. No warnings on Linux/Mac.

Questions for Maintainers:
Is there a deeper configuration or bug causing this warning to persist?
Are there alternative workarounds for Windows users?
Is this issue being tracked in ongoing development?
Thank you for your time and support!
Let me know if further details are needed.

Expected Results

No UserWarning is raised once OMP_NUM_THREADS is set.

Actual Results

C:\ProgramData\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1429: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
  warnings.warn(

Versions

scikit-learn: 1.3.2

numpy: 1.26.0 (linked to MKL via Anaconda)
@rahimHub rahimHub added Bug Needs Triage Issue requires triage labels Mar 1, 2025
@ogrisel ogrisel removed the Needs Triage Issue requires triage label Mar 4, 2025
Member

ogrisel commented Mar 4, 2025

Thanks for the report.

> Reinstalled numpy and scipy with OpenBLAS instead of MKL.

Have you checked the output of python -c "import sklearn; sklearn.show_versions()" to confirm that MKL is no longer linked into the Python process?

Please include the full output of sklearn.show_versions() for each environment in your report.
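A complementary check is to look at the native libraries that threadpoolctl reports for the running process. The following is only a sketch: the helper names `has_mkl_and_vcomp` and `inspect_current_process` are hypothetical, and it assumes threadpoolctl is installed alongside scikit-learn.

```python
def has_mkl_and_vcomp(modules):
    """Return (has_vcomp, has_mkl) for a list of threadpoolctl module dicts."""
    # vcomp is Microsoft's OpenMP runtime; threadpoolctl reports it by prefix.
    has_vcomp = any(m.get("prefix") == "vcomp" for m in modules)
    # MKL only counts here when it uses the Intel threading layer.
    has_mkl = any(
        m.get("internal_api") == "mkl" and m.get("threading_layer") == "intel"
        for m in modules
    )
    return has_vcomp, has_mkl


def inspect_current_process():
    """Print the thread pools loaded in this process and classify them."""
    # Imported lazily so has_mkl_and_vcomp stays usable without these packages.
    from threadpoolctl import threadpool_info
    import numpy  # noqa: F401  (loads the BLAS that NumPy is linked against)
    import sklearn.cluster  # noqa: F401  (should load the OpenMP runtime used by KMeans)

    modules = threadpool_info()
    for m in modules:
        print(m["user_api"], m["internal_api"], m["prefix"], m.get("filepath"))
    return has_mkl_and_vcomp(modules)
```

Calling `inspect_current_process()` in the environment that still shows the warning should reveal whether an MKL library is loaded despite the reinstall, and from which file path.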

> Are there alternative workarounds for Windows users?

Using OpenBLAS should fix the problem. See the precise conditions under which this warning is raised in the source code:

def _check_mkl_vcomp(self, X, n_samples):
    """Check when vcomp and mkl are both present"""
    # The BLAS call inside a prange in lloyd_iter_chunked_dense is known to
    # cause a small memory leak when there are less chunks than the number
    # of available threads. It only happens when the OpenMP library is
    # vcomp (microsoft OpenMP) and the BLAS library is MKL. see #18653
    if sp.issparse(X):
        return
    n_active_threads = int(np.ceil(n_samples / CHUNK_SIZE))
    if n_active_threads < self._n_threads:
        modules = _get_threadpool_controller().info()
        has_vcomp = "vcomp" in [module["prefix"] for module in modules]
        has_mkl = ("mkl", "intel") in [
            (module["internal_api"], module.get("threading_layer", None))
            for module in modules
        ]
        if has_vcomp and has_mkl:
            self._warn_mkl_vcomp(n_active_threads)
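To make the condition concrete for the reported case, here is a small standalone sketch of the same arithmetic. It is not scikit-learn code: `CHUNK_SIZE = 256` is an assumed value (check `_k_means_common.pyx` in your installed version), and `n_active_threads` and `check_would_fire` are hypothetical helper names.

```python
import math

# Assumed chunk size; verify against your installed scikit-learn version.
CHUNK_SIZE = 256


def n_active_threads(n_samples, chunk_size=CHUNK_SIZE):
    """Number of chunks the dense Lloyd iteration would split the data into."""
    return int(math.ceil(n_samples / chunk_size))


def check_would_fire(n_samples, n_threads, has_vcomp, has_mkl):
    """Mirror the three conditions that must all hold for the warning."""
    return n_active_threads(n_samples) < n_threads and has_vcomp and has_mkl


# The report uses n_samples=300, which gives two chunks under this assumption.
print(n_active_threads(300))
print(check_would_fire(300, n_threads=8, has_vcomp=True, has_mkl=True))
print(check_would_fire(300, n_threads=2, has_vcomp=True, has_mkl=True))
```

Under these assumptions, 300 samples yield two chunks, which is consistent with the warning suggesting OMP_NUM_THREADS=2. The check stops firing only if the thread limit actually reaches `self._n_threads`, or if vcomp and MKL are not both loaded.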

BTW, next time, please use markdown formatting to make the issue more readable (I edited it myself this time).

@adrinjalali
Member

Closing due to lack of response from the OP.
