feature_importance causes a BSOD on Windows 10 #18187

Closed
Devilmoon opened this issue Aug 18, 2020 · 32 comments
Labels
Bug Large Scale OS:Windows Problem specific to Windows

Comments

@Devilmoon

Describe the bug

Running permutation_importance on a medium-sized data set results in a BSOD on Windows 10. The dataset is 470605 x 332, code is running in a Jupyter notebook, Python version 3.7.6, scikit version 0.22.1.
The BSOD is a KERNEL_SECURITY_CHECK_FAILURE, with ERROR_CODE: (NTSTATUS) 0xc0000409 - The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.
The machine has a Ryzen 5 3600 with 16GB of RAM.

Steps/Code to Reproduce

from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
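# Note: X_train, y_train, X_val, y_val are splits of the reporter's private
# 470605 x 332 dataset (under NDA); a synthetic reproduction is given further down.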
rf = RandomForestClassifier(n_estimators = 250,
                           n_jobs = -1,
                           oob_score = True,
                           bootstrap = True,
                           random_state = 42)
rf.fit(X_train, y_train)
permImp = permutation_importance(rf,
                                 X_val,
                                 y_val,
                                 scoring='f1',
                                 n_repeats=5,
                                 n_jobs=-1,
                                 random_state=42)

Expected Results

No BSOD, permutation importance computed.

Actual Results

BSOD after ~1-2 minutes

Versions

sklearn.show_versions()

System:
python: 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\lucag\anaconda3\python.exe
machine: Windows-10-10.0.18362-SP0

Python dependencies:
pip: 20.0.2
setuptools: 45.2.0.post20200210
sklearn: 0.22.1
numpy: 1.18.1
scipy: 1.4.1
Cython: 0.29.15
pandas: 1.0.1
matplotlib: 3.1.3
joblib: 0.14.1

Built with OpenMP: True

@rth
Member

rth commented Aug 18, 2020

Thanks for the report. It's surprising that, even if it detected a stack-based buffer overrun, it would trigger the equivalent of a kernel panic instead of just killing the application.

This will likely be hard to reproduce without the training data on which it happens. Unless you also get it when you generate a dataset with an equivalent shape using make_classification(..., random_state=0)?

@Devilmoon
Author

I will test with a synthetic data set of the same shape ASAP. Unfortunately, the data I'm currently using is under NDA, so I cannot share it for reproducibility purposes.

@Devilmoon
Author

I confirm I can reproduce the issue with a synthetic dataset (I used the defaults of make_classification because I don't think the makeup of the data is relevant to the issue):

from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=470605, n_features=332, random_state=0) 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

rf = RandomForestClassifier(n_estimators = 250,
                           n_jobs = -1,
                           oob_score = True,
                           bootstrap = True,
                           random_state = 42)
rf.fit(X_train, y_train)

permImp = permutation_importance(rf,
                                 X_val,
                                 y_val,
                                 scoring='f1',
                                 n_repeats=5,
                                 n_jobs=-1,
                                 random_state=42)

I see very high memory usage (>80%, sometimes close to 100%), a lot of activity on the pagefile, and a process related to a joblib_memmapping_folder (https://joblib.readthedocs.io/en/latest/auto_examples/parallel_memmap.html), which I suppose is how scikit-learn stores/retrieves data on disk when it does not fit in RAM.

@Devilmoon
Author

This might be where the issue happens: https://joblib.readthedocs.io/en/latest/parallel.html#working-with-numerical-data-in-shared-memory-memmapping

Should I try increasing the page file size (8GB currently) and see if it still crashes? Although joblib doesn't seem to use the page file; it just dumps the data to disk.
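(Editorial side note, not part of the original discussion: joblib picks the folder for its memmapped dumps from the temp_folder argument, the JOBLIB_TEMP_FOLDER environment variable, or the system temporary directory, in that order, so the page file itself is not involved. A minimal sketch of redirecting those dumps to a drive with spare space; the path is hypothetical:)

import os
import shutil

# Hypothetical folder on a drive with plenty of free space; set it before any
# joblib/scikit-learn parallel work is dispatched.
dump_dir = r"D:\joblib_tmp"
os.makedirs(dump_dir, exist_ok=True)
os.environ["JOBLIB_TEMP_FOLDER"] = dump_dir

free_gb = shutil.disk_usage(dump_dir).free / 1e9
print(f"Free space available for joblib memmap dumps: {free_gb:.1f} GB")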

@rth
Member

rth commented Aug 18, 2020

Thanks for the example. I can't reproduce on Linux. Yes, my guess would be that,

  • either it's somehow related to the memmapping used in joblib.Parallel (used to share data between processes),
  • or to spawning threads. RandomForestClassifier.predict now also uses joblib.Parallel, so when it is used in combination with permutation_importance(..., n_jobs=) this could lead to the creation of N_CPU**2 threads, or around 400 with 12 physical CPU cores. The Windows kernel might flag that as a security check violation.

Either way, this issue being raised by one or multiple system calls seems more plausible than any Cython code issue in the RF.

So setting n_jobs to a lower value in one or both of the functions/classes used should probably resolve it. Also, with n_jobs=1, RandomForestClassifier will not use memmapping.
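(For illustration, an editorial sketch of that mitigation; the values are arbitrary and it assumes the reproduction snippet above, where rf is already fitted:)

from sklearn.inspection import permutation_importance

# Limit the inner parallelism used by RandomForestClassifier.predict ...
rf.set_params(n_jobs=2)

# ... and keep the outer permutation_importance worker count low as well, so the
# total number of threads/processes stays close to the physical core count.
permImp = permutation_importance(rf, X_val, y_val,
                                 scoring='f1',
                                 n_repeats=5,
                                 n_jobs=2,
                                 random_state=42)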

@Devilmoon
Author

I'll try halving the number of cores on the permutation_importance call, but it seems strange to me that scikit-learn would crash and burn on such a simple example.
permutation_importance is using the validation set, which is 20% of the whole dataset (~95000 x 332), and yet the whole OS commits seppuku rather than compute it?

@Devilmoon
Author

Halving the number of cores (n_jobs=6) seems to hold up; however, I just got this warning message:

C:\Users\lucag\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py:706: UserWarning: A worker stopped while some jobs were given to the executor. This can be caused by a too short worker timeout or by a memory leak.
  "timeout or by a memory leak.", UserWarning

Furthermore, joblib seems to have already used up 10GB of space on my drive

@Devilmoon
Author

So, the PC didn't crash and I got the results back on the synthetic dataset with n_jobs=6; however, joblib had used ~30GB of space on my drive by the end of it. How is this possible? What is it storing?
Also, what about the warning message I got? Are the results reliable anyway?

@rth
Member

rth commented Aug 19, 2020

The disk usage is concerning as well. @lesteve or @ogrisel might have some ideas. To summarize, the issues are:

  • kernel panic (BSOD on Windows)
  • possible CPU oversubscription when doing RF.predict inside permutation_importance. I thought joblib should prevent it, but maybe you need a later version. You are using joblib 0.14.1 while the latest is 0.16.0.
  • using 30GB of disk for caching, which is also not OK.

@rth rth added Bug and removed Bug: triage labels Aug 19, 2020
@Devilmoon
Author

I updated joblib to 0.16.0 and tried running permutation importance on the original dataset. CPU usage showed some dips from 100%, so I don't think the CPU is oversubscribed (with n_jobs=6 in the permutation_importance call).
However, I still got a BSOD after joblib cached ~140GB of data (!!) and the space left on the drive was around ~20GB.
Throughout execution, disk I/O was always around ~100MB/s, frequently rising to >200MB/s with peaks of ~500MB/s, all writes.
Am I missing something obvious, or is this completely overkill?

@Devilmoon
Author

Furthermore, after restarting, the space left on the drive is 25.8GB, meaning that the cached data is still somewhere on disk!

@Devilmoon
Author

This is totally not OK: almost 300GB of data left behind by joblib.

[Image: joblib]

@rth
Member

rth commented Aug 19, 2020

Thanks for investigating! Yes, it's absolutely not OK as a default behavior. The 50k created file objects are also not good. Could you show a few filenames in one of those memmapping folders, please? We'll need some joblib experts to have a look at it.

@rth rth added the Blocker label Aug 19, 2020
@Devilmoon
Author

Unfortunately, I already removed those folders and cannot remember the exact file names; however, I can tell you that they all had short file names, and all of them had the same size, around ~2MB.

@rth
Member

rth commented Aug 19, 2020

I can reproduce on Windows. On a machine with 4 CPU cores, I get a joblib_memmapping_folder containing thousands of 1MB files for a total of ~3GB; I imagine it scales as some power law of N_CPU (or maybe I haven't let it run to the end).

Investigating this: when one runs RandomForestClassifier.predict, it parallelizes over trees using threading with require="sharedmem". This should not use memmapping.

Then permutation_importance parallelizes over features:

scores = Parallel(n_jobs=n_jobs)(delayed(_calculate_permutation_scores)(...))

This should use memmapping; however, it should have memmapped X only once, not thousands of times. I have the impression that require="sharedmem" is somehow misbehaving in this case of nested parallelism and triggers memmapping as well.
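(To make the nested-parallelism hypothesis concrete, here is a tiny standalone editorial sketch, not scikit-learn's actual code: an outer process-based Parallel whose workers each run an inner Parallel with require="sharedmem", which falls back to the threading backend, mirroring permutation_importance dispatching to processes while RandomForestClassifier.predict spawns threads:)

from joblib import Parallel, delayed

def inner(i):
    return i * i

def outer(chunk):
    # require="sharedmem" forces a thread-based backend inside this worker process
    return Parallel(n_jobs=2, require="sharedmem")(delayed(inner)(i) for i in chunk)

if __name__ == "__main__":
    # The outer level uses the default process-based loky backend
    results = Parallel(n_jobs=2)(delayed(outer)(range(k, k + 3)) for k in range(0, 9, 3))
    print(results)  # [[0, 1, 4], [9, 16, 25], [36, 49, 64]]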

@Devilmoon
Author

Devilmoon commented Aug 19, 2020

Would it be possible that both the loky and threading back-ends are working at the same time?
If I understand correctly, sharedmem should force the threading back-end, which is thread-based; however, during my testing I saw multiple active python.exe processes, which would be the behaviour of loky. IIRC there were 6 different processes, which coincidentally is n_jobs for permutation_importance.
Could this be what is causing the issue? I don't know enough about parallelism to give an informed opinion, but wouldn't it be possible for each of those processes to spawn its own full set of threads (n_jobs=-1) that somehow misbehave?

@Devilmoon
Author

@rth, might this be related? joblib/joblib#966. Although it seemed not to be present in 0.14.1, which I was using at the beginning.

@Devilmoon
Author

Also this might be related joblib/joblib#690

@ogrisel
Member

ogrisel commented Sep 29, 2020

So, the PC didn't crash and I got the results back on the synthetic dataset with n_jobs=6; however, joblib had used ~30GB of space on my drive by the end of it. How is this possible? What is it storing?

joblib is storing the input data for each subproblem as memory-mapped files to share memory between the subprocesses and the main process. For this specific workload the memmapping is never useful, because the data is never going to be the same: the permutations are random and independent of one another.

I will have a look to see if we can disable it there.
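(Editorial sketch for context, independent of scikit-learn: joblib's loky backend automatically dumps any argument larger than max_nbytes, "1M" by default, into the joblib_memmapping_folder and hands workers a read-only numpy.memmap instead of a pickled copy. A toy illustration:)

import numpy as np
from joblib import Parallel, delayed

def describe(arr):
    # Inside a loky worker, a large array should arrive as a read-only numpy.memmap
    # backed by a file in the joblib_memmapping_folder.
    return type(arr).__name__, float(arr[:5, :5].sum())

if __name__ == "__main__":
    X = np.random.rand(2000, 500)  # ~8 MB, well above the default "1M" threshold
    out = Parallel(n_jobs=2, max_nbytes="1M")(delayed(describe)(X) for _ in range(4))
    print(out)  # the reported type should be 'memmap' rather than 'ndarray'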

@ogrisel
Member

ogrisel commented Sep 29, 2020

Actually, looking at the code, using memmapping should be fine (and useful) because joblib.Parallel is called on the non-permuted data and the permutation is applied to a copy of the data in the worker.

Maybe there is something wrong at a low level that only happens on Windows. I need to investigate with a Windows VM.

@cmarmo cmarmo added this to the 1.0 milestone Feb 24, 2021
@adrinjalali
Member

Removing the milestone since it doesn't seem to be a blocker.

@adrinjalali adrinjalali removed this from the 1.0 milestone Aug 22, 2021
@glemaitre
Member

In issue #16716, it seems that a lot of memmaps were created. I am wondering if this could be a similar issue?

@lesteve lesteve removed the Blocker label Jun 18, 2024
@lesteve
Member

lesteve commented Jun 18, 2024

Removing the Blocker label since this hasn't been a blocker for a while. My wild guess is that this is a joblib issue, where joblib on Windows sometimes does things that are very inefficient?

In an ideal world, someone with a Windows machine would try to see whether #18187 (comment) still reproduces and compare with the status of #18187 (comment); I am guessing the problem may still exist ...

Edit: I added the "OS:Windows" label that I just created together with "OS:Linux" and "OS:macOS"

@lesteve lesteve added the OS:Windows Problem specific to Windows label Jun 18, 2024
@vitorpohlenz
Contributor

Hi, thanks for the suggestion @lesteve!

I have a Windows setup and can work on this. As you said, the first step would be reproducing the problem again, since the issue is a bit old (I will probably get to it next week).

So, I can /take this issue.

@vitorpohlenz
Contributor

Update:

I started working on it, but it seems that the CI is not recognizing /take.

@adrinjalali
Member

adrinjalali commented Apr 23, 2025

@vitorpohlenz thanks for working on this. And don't worry about taking the issue. Commenting as you have is enough.

@vitorpohlenz
Contributor

vitorpohlenz commented Apr 23, 2025

The disk usage is concerning as well. @lesteve or @ogrisel might have some ideas. To summarize, the issues are:

  • kernel panic (BSOD on Windows)
  • possible CPU oversubscription when doing RF.predict inside permutation_importance. I thought joblib should prevent it, but maybe you need a later version. You are using joblib 0.14.1 while the latest is 0.16.0.
  • using 30GB of disk for caching, which is also not OK.

Thanks for the summary @rth.
I reproduced the issue, and the problem still persists on sklearn version 1.7.dev0 (an error is thrown, but no BSOD) when using n_jobs=-1 in RandomForestClassifier and permutation_importance, but it does not occur if you use half of the available CPUs on each of them.

System details:


System:
    python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
executable: C:\Users\vitor.pohlenz\vpz\scikit-learn\sklearn-env\Scripts\python.exe
   machine: Windows-10-10.0.26100-SP0

Python dependencies:
      sklearn: 1.7.dev0
          pip: 25.0.1
   setuptools: 65.5.0
        numpy: 2.2.5
        scipy: 1.15.2
       Cython: 3.0.12
       pandas: 2.2.3
   matplotlib: 3.10.1
       joblib: 1.4.2
threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 20
         prefix: libscipy_openblas
       filepath: C:\Users\vitor.pohlenz\vpz\scikit-learn\sklearn-env\Lib\site-packages\numpy.libs\libscipy_openblas64_-43e11ff0749b8cbe0a615c9cf6737e0e.dll
        version: 0.3.28
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 20
         prefix: libscipy_openblas
       filepath: C:\Users\vitor.pohlenz\vpz\scikit-learn\sklearn-env\Lib\site-packages\scipy.libs\libs07a3a47104dca54d6d007a3a47104dca54d6d0c86a.dll
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 20
         prefix: vcomp
       filepath: C:\Windows\System32\vcomp140.dll
        version: None

Here is the example provided above, with a few modifications to use half of the available CPUs:

from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Import os to check the number of available CPUs
import os

n_cpus = os.cpu_count()

cpus_to_use = n_cpus//2

X, y = make_classification(n_samples=470605, n_features=332, random_state=0) 

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

rf = RandomForestClassifier(n_estimators = 250,
                           n_jobs = cpus_to_use,
                           oob_score = True,
                           bootstrap = True,
                           random_state = 42)
rf.fit(X_train, y_train)

permImp = permutation_importance(rf,
                                 X_val,
                                 y_val,
                                 scoring='f1',
                                 n_repeats=5,
                                 n_jobs=cpus_to_use,
                                 random_state=42)

With the example above, everything runs smoothly (at least on my machine), and the problem of high disk usage for caching does not occur.

But if I run the same code with n_jobs=-1 in RandomForestClassifier and permutation_importance, I get the error:
OSError: [WinError 1450] Insufficient system resources exist to complete the requested service.

Error log

---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "c:\Users\vitor.pohlenz\vpz\scikit-learn\sklearn-env\lib\site-packages\joblib\externals\loky\backend\queues.py", line 161, in _feed
    send_bytes(obj_)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\multiprocessing\connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\multiprocessing\connection.py", line 280, in _send_bytes
    ov, err = _winapi.WriteFile(self._handle, buf, overlapped=True)
OSError: [WinError 1450] Insufficient system resources exist to complete the requested service
"""

The above exception was the direct cause of the following exception:

PicklingError                             Traceback (most recent call last)
File c:\Users\vitor.pohlenz\vpz\scikit-learn\test_delete_me.py:26
     19 rf = RandomForestClassifier(n_estimators = 250,
     20                            n_jobs = -1,
     21                            oob_score = True,
     22                            bootstrap = True,
     23                            random_state = 42)
     24 rf.fit(X_train, y_train)
---> 26 permImp = permutation_importance(rf,
     27                                  X_val,
     28                                  y_val,
     29                                  scoring='f1',
     30                                  n_repeats=5,
     31                                  n_jobs=-1,
     32                                  random_state=42)

File ~\vpz\scikit-learn\sklearn\utils\_param_validation.py:218, in validate_params.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    212 try:
    213     with config_context(
    214         skip_parameter_validation=(
    215             prefer_skip_nested_validation or global_skip_validation
    216         )
    217     ):
--> 218         return func(*args, **kwargs)
    219 except InvalidParameterError as e:
    220     # When the function is just a wrapper around an estimator, we allow
    221     # the function to delegate validation to the estimator, but we replace
    222     # the name of the estimator by the name of the function in the error
    223     # message to avoid confusion.
    224     msg = re.sub(
    225         r"parameter of \w+ must be",
    226         f"parameter of {func.__qualname__} must be",
    227         str(e),
    228     )

File ~\vpz\scikit-learn\sklearn\inspection\_permutation_importance.py:288, in permutation_importance(estimator, X, y, scoring, n_repeats, n_jobs, random_state, sample_weight, max_samples)
    285 scorer = check_scoring(estimator, scoring=scoring)
    286 baseline_score = _weights_scorer(scorer, estimator, X, y, sample_weight)
--> 288 scores = Parallel(n_jobs=n_jobs)(
    289     delayed(_calculate_permutation_scores)(
    290         estimator,
    291         X,
    292         y,
    293         sample_weight,
    294         col_idx,
    295         random_seed,
    296         n_repeats,
    297         scorer,
    298         max_samples,
    299     )
    300     for col_idx in range(X.shape[1])
    301 )
    303 if isinstance(baseline_score, dict):
    304     return {
    305         name: _create_importances_bunch(
    306             baseline_score[name],
   (...)
    310         for name in baseline_score
    311     }

File ~\vpz\scikit-learn\sklearn\utils\parallel.py:82, in Parallel.__call__(self, iterable)
     73 warning_filters = warnings.filters
     74 iterable_with_config_and_warning_filters = (
     75     (
     76         _with_config_and_warning_filters(delayed_func, config, warning_filters),
   (...)
     80     for delayed_func, args, kwargs in iterable
     81 )
---> 82 return super().__call__(iterable_with_config_and_warning_filters)

File c:\Users\vitor.pohlenz\vpz\scikit-learn\sklearn-env\lib\site-packages\joblib\parallel.py:2007, in Parallel.__call__(self, iterable)
   2001 # The first item from the output is blank, but it makes the interpreter
   2002 # progress until it enters the Try/Except block of the generator and
   2003 # reaches the first `yield` statement. This starts the asynchronous
   2004 # dispatch of the tasks to the workers.
   2005 next(output)
-> 2007 return output if self.return_generator else list(output)

File c:\Users\vitor.pohlenz\vpz\scikit-learn\sklearn-env\lib\site-packages\joblib\parallel.py:1650, in Parallel._get_outputs(self, iterator, pre_dispatch)
   1647     yield
   1649     with self._backend.retrieval_context():
-> 1650         yield from self._retrieve()
   1652 except GeneratorExit:
   1653     # The generator has been garbage collected before being fully
   1654     # consumed. This aborts the remaining tasks if possible and warn
   1655     # the user if necessary.
   1656     self._exception = True

File c:\Users\vitor.pohlenz\vpz\scikit-learn\sklearn-env\lib\site-packages\joblib\parallel.py:1754, in Parallel._retrieve(self)
   1747 while self._wait_retrieval():
   1748 
   1749     # If the callback thread of a worker has signaled that its task
   1750     # triggered an exception, or if the retrieval loop has raised an
   1751     # exception (e.g. `GeneratorExit`), exit the loop and surface the
   1752     # worker traceback.
   1753     if self._aborting:
-> 1754         self._raise_error_fast()
   1755         break
   1757     # If the next job is not ready for retrieval yet, we just wait for
   1758     # async callbacks to progress.

File c:\Users\vitor.pohlenz\vpz\scikit-learn\sklearn-env\lib\site-packages\joblib\parallel.py:1789, in Parallel._raise_error_fast(self)
   1785 # If this error job exists, immediately raise the error by
   1786 # calling get_result. This job might not exists if abort has been
   1787 # called directly or if the generator is gc'ed.
   1788 if error_job is not None:
-> 1789     error_job.get_result(self.timeout)

File c:\Users\vitor.pohlenz\vpz\scikit-learn\sklearn-env\lib\site-packages\joblib\parallel.py:745, in BatchCompletionCallBack.get_result(self, timeout)
    739 backend = self.parallel._backend
    741 if backend.supports_retrieve_callback:
    742     # We assume that the result has already been retrieved by the
    743     # callback thread, and is stored internally. It's just waiting to
    744     # be returned.
--> 745     return self._return_or_raise()
    747 # For other backends, the main thread needs to run the retrieval step.
    748 try:

File c:\Users\vitor.pohlenz\vpz\scikit-learn\sklearn-env\lib\site-packages\joblib\parallel.py:763, in BatchCompletionCallBack._return_or_raise(self)
    761 try:
    762     if self.status == TASK_ERROR:
--> 763         raise self._result
    764     return self._result
    765 finally:

PicklingError: Could not pickle the task to send it to the workers.

So, it seems that the possible "CPU oversubscription" still occurs.

@glemaitre, as you mentioned, #16716 may be related, but I'm not sure.

@vitorpohlenz
Contributor

Update:

It seems that the OSError: WinError 1450 was not related to the code itself but to some permissions in my Windows setup. For me, just giving my user enough permissions to access "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\" was enough, as opposed to the other approaches described on Stack Overflow for OSError: WinError 1450.

So I have some new pieces of information:

  1. kernel panic (BSOD on Windows): does not occur anymore; it did not occur in any of my attempts when using n_jobs=-1.

  2. possible CPU oversubscription when doing RF.predict inside permutation_importance: also does not occur anymore, even when using n_jobs=-1 in both. RAM and CPU usage go to 100% (as expected in this example), but the code runs and finishes without any error.

  3. using 30GB of disk for caching, which is also not OK: yes, this is still happening, but as soon as the process finishes, the temporary folder is deleted, freeing up the disk space.

For [3.], I have the following image showing a 37GB joblib_memmapping_folder temporary folder:

[Image: 37GB joblib_memmapping_folder]

With the above information, I have a question you may be able to help me with.

Since [1.] and [2.] seem to be solved, is it worth working on [3.]? I think it is relevant, but it would be great to have confirmation.

@adrinjalali
Member

Thanks for your investigations. With the information you provide, I think we can close this issue. However, as for the memmapping size, if you can find a way to reduce it w/o impacting performance, that would be a very welcome contribution.

Thanks again for all the work!

@vitorpohlenz
Contributor

Thanks for your investigations. With the information you provide, I think we can close this issue. However, as for the memmapping size, if you can find a way to reduce it w/o impacting performance, that would be a very welcome contribution.

Thanks again for all the work!

Thanks for replying @adrinjalali !

Actually, I looked at the code to check whether I could improve the disk usage, but it seems that this part is handled by joblib.Parallel, and I'm not so familiar with that code. What I understood (or guess) is that joblib writes to disk to avoid consuming too much RAM. In this example, X is more than 1.2GB in size, so when using n_jobs=-1 in both the RF and permutation_importance (which also has n_repeats), the data is "duplicated" a lot.

But using 30GB of disk for temporary files is still better than using 30GB of RAM, so I'm not sure how to improve this situation...
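(A rough editorial back-of-envelope, assuming float64 and the shapes from the synthetic reproduction above: the observed disk usage corresponds to on the order of a hundred copies of the validation set being dumped.)

n_samples, n_features = 470_605, 332

x_bytes = n_samples * n_features * 8      # full X: ~1.25 GB
n_val = int(n_samples * 0.8 * 0.25)       # validation split used by permutation_importance
x_val_bytes = n_val * n_features * 8      # ~0.25 GB

observed_dump_gb = 30
print(f"X ~ {x_bytes / 1e9:.2f} GB, X_val ~ {x_val_bytes / 1e9:.2f} GB")
print(f"{observed_dump_gb} GB of temporary files ~ {observed_dump_gb * 1e9 / x_val_bytes:.0f} copies of X_val")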

@adrinjalali
Member

One can look into whether, and how hard, it is feasible to reduce that "duplication"; if it is not, then there is no solution to it.

@vitorpohlenz
Contributor

One can look into whether, and how hard, it is feasible to reduce that "duplication"; if it is not, then there is no solution to it.

Thanks for the tip @adrinjalali,

I took a look at joblib.Parallel and how sklearn uses it in permutation_importance, to think about a possible workaround. The default backend 'loky' is process-based (Windows has no fork, so workers cannot simply inherit the parent's memory), and since each worker process has its own address space, each one needs its own copy (or memmapped view) of the data. So the "duplication" is needed; to reduce it, we would need to reduce the number of workers, but then we cannot use n_jobs=-1.

This looks like a "short blanket" dilemma, in my opinion, but I'm not sure there is a straightforward solution to it.
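(A possible direction, as an editorial sketch: it assumes a recent joblib where parallel_config accepts n_jobs, temp_folder and max_nbytes, so the exact keywords should be checked against the joblib documentation. scikit-learn's internal Parallel calls honor the active joblib configuration, so the worker count can be capped and the memmapped dumps redirected, or disabled, around the permutation_importance call without changing scikit-learn itself:)

from joblib import parallel_config
from sklearn.inspection import permutation_importance

# rf, X_val, y_val come from the reproduction snippet earlier in the thread.
# The temp_folder path is hypothetical; max_nbytes=None would disable memmapping
# entirely at the cost of pickling the data to each worker instead.
with parallel_config(n_jobs=4, temp_folder=r"D:\joblib_tmp"):
    permImp = permutation_importance(rf, X_val, y_val,
                                     scoring="f1", n_repeats=5,
                                     random_state=42)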
