Persistent UserWarning about KMeans Memory Leak on Windows Despite Applying Suggested Fixes #30921

rahimHub · 2025-03-01T19:34:29Z

Describe the bug

Issue Description
When running code involving GaussianMixture (or KMeans), a UserWarning about a known memory leak on Windows with MKL is raised, even after implementing the suggested workaround (OMP_NUM_THREADS=1 or 2). The warning persists across multiple environments and configurations, indicating the issue may require further investigation.

Warning Message:

C:\ProgramData\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1429: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
  warnings.warn(

Steps to Reproduce

Code example:

import os
os.environ["OMP_NUM_THREADS"] = "1" # Also tested with "2"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Generate synthetic 3D data
X, _ = make_blobs(n_samples=300, n_features=3, centers=3, random_state=42)

# Train GMM model
gmm = GaussianMixture(n_components=3, random_state=42)
gmm.fit(X) # Warning triggered here

Environment:

OS: Windows 11
Python: 3.10.12
scikit-learn: 1.3.2
numpy: 1.26.0 (linked to MKL via Anaconda)
Installation Method: Anaconda (conda install scikit-learn).

Expected vs. Actual Behavior

Expected: Setting OMP_NUM_THREADS should suppress the warning and resolve the memory leak.

Actual: The warning persists despite environment variable configurations, reinstalls, and thread-limiting methods.

Attempted Fixes

Set OMP_NUM_THREADS=1 or 2 in code and system environment variables.
Limited threads via threadpoolctl:
code:

from threadpoolctl import threadpool_limits
with threadpool_limits(limits=1, user_api='blas'):
    gmm.fit(X)

Reinstalled numpy and scipy with OpenBLAS instead of MKL.
Tested in fresh conda environments.
Updated all packages to latest versions.
None of these resolved the warning.
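One thing worth ruling out is whether the variable was set before the numerical libraries were loaded. Thread-limiting variables such as OMP_NUM_THREADS are generally read when the threading runtime initializes, so an assignment made after numpy or scikit-learn has already been imported (for example in a long-running notebook or IDE session) may have no effect. The sketch below is a hypothetical helper, not part of scikit-learn; it only reports the current value and which of the listed libraries are already imported.

```python
import os
import sys


def thread_env_report(var="OMP_NUM_THREADS", libs=("numpy", "scipy", "sklearn")):
    """Report a thread-limiting variable and which libraries are already loaded.

    Hypothetical diagnostic helper: it does not change any setting.
    """
    return {
        "variable": var,
        # None means the variable is not set in this process.
        "value": os.environ.get(var),
        # Libraries imported before this call; setting the variable after
        # these imports may come too late to influence their thread pools.
        "already_imported": [name for name in libs if name in sys.modules],
    }


if __name__ == "__main__":
    print(thread_env_report())
```

If "already_imported" is not empty at the point where os.environ is assigned, restarting the interpreter and setting the variable first (or setting it in the shell before launching Python) is the safer test.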

Additional Context:
The warning appears even when using GaussianMixture, which indirectly relies on KMeans-related code.

The issue is specific to Windows + MKL. No warnings on Linux/Mac.

Full error log: [Attach log if available].

Questions for Maintainers:
Is there a deeper configuration or bug causing this warning to persist?
Are there alternative workarounds for Windows users?
Is this issue being tracked in ongoing development?
Thank you for your time and support!
Let me know if further details are needed.

Steps/Code to Reproduce

import os
os.environ["OMP_NUM_THREADS"] = "1"  # Also tested with "2"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Generate synthetic 3D data
X, _ = make_blobs(n_samples=300, n_features=3, centers=3, random_state=42)

# Train GMM model
gmm = GaussianMixture(n_components=3, random_state=42)
gmm.fit(X)  # Warning triggered here

Expected Results

C:\ProgramData\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1429: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
  warnings.warn(

Actual Results

C:\ProgramData\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1429: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
  warnings.warn(

Versions

scikit-learn: 1.3.2

numpy: 1.26.0 (linked to MKL via Anaconda)


ogrisel · 2025-03-04T16:06:25Z

Thanks for the report.

> Reinstalled numpy and scipy with OpenBLAS instead of MKL.

Have you checked the output of python -c "import sklearn; sklearn.show_versions()" to confirm that MKL is no longer linked into the Python process?

Please include the full output of sklearn.show_versions() for each environment in your report.

> Are there alternative workarounds for Windows users?

Using OpenBLAS should fix the problem. See the precise conditions under which this warning is raised in the source code:

scikit-learn/sklearn/cluster/_kmeans.py

Lines 910 to 928 in d0ee195

    
    def _check_mkl_vcomp(self, X, n_samples):
        """Check when vcomp and mkl are both present"""
        # The BLAS call inside a prange in lloyd_iter_chunked_dense is known to
        # cause a small memory leak when there are less chunks than the number
        # of available threads. It only happens when the OpenMP library is
        # vcomp (microsoft OpenMP) and the BLAS library is MKL. see #18653
        if sp.issparse(X):
            return

        n_active_threads = int(np.ceil(n_samples / CHUNK_SIZE))
        if n_active_threads < self._n_threads:
            modules = _get_threadpool_controller().info()
            has_vcomp = "vcomp" in [module["prefix"] for module in modules]
            has_mkl = ("mkl", "intel") in [
                (module["internal_api"], module.get("threading_layer", None))
                for module in modules
            ]
            if has_vcomp and has_mkl:
                self._warn_mkl_vcomp(n_active_threads)
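
To see which of these conditions holds in a given environment, the same logic can be replayed outside scikit-learn. The following is a rough sketch, not the library's own code: `warning_conditions` is a hypothetical helper, and `CHUNK_SIZE = 256` is an assumed value that should be checked against the installed scikit-learn version. When `modules` is not supplied, it uses `threadpoolctl.threadpool_info()` to inspect the current process.

```python
import math

# Assumed value for illustration; verify against your scikit-learn version.
CHUNK_SIZE = 256


def warning_conditions(n_samples, n_threads, modules=None, chunk_size=CHUNK_SIZE):
    """Replay the checks from _check_mkl_vcomp for dense input (hypothetical helper)."""
    if modules is None:
        # Imported lazily so the function can also be fed recorded module info.
        from threadpoolctl import threadpool_info

        modules = threadpool_info()

    n_active_threads = math.ceil(n_samples / chunk_size)
    has_vcomp = "vcomp" in [m["prefix"] for m in modules]
    has_mkl = ("mkl", "intel") in [
        (m["internal_api"], m.get("threading_layer")) for m in modules
    ]
    fewer_chunks = n_active_threads < n_threads
    return {
        "n_active_threads": n_active_threads,
        "fewer_chunks_than_threads": fewer_chunks,
        "has_vcomp": has_vcomp,
        "has_mkl": has_mkl,
        "would_warn": fewer_chunks and has_vcomp and has_mkl,
    }
```

Under that chunk-size assumption, the 300-sample example from the report gives ceil(300 / 256) = 2 active threads, so the first condition should only be true when more than 2 OpenMP threads are in effect. If the warning still appears with OMP_NUM_THREADS set to 1 or 2, that suggests the variable is not reaching the process, which the `has_vcomp` and `has_mkl` entries together with `sklearn.show_versions()` can help confirm.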

BTW, next time, please use markdown formatting to make the issue more readable (I edited it myself this time).

adrinjalali · 2025-06-04T13:47:33Z

Closing due to lack of response from the OP.

rahimHub added the Bug and Needs Triage labels Mar 1, 2025

ogrisel removed the Needs Triage label Mar 4, 2025

ogrisel added the Needs Info label Mar 4, 2025

adrinjalali closed this as completed Jun 4, 2025
