feature_importance causes a BSOD on Windows 10 #18187
Comments
Thanks for the report. It's surprising that even if it detected a stack-based buffer overrun, it would trigger the equivalent of a kernel panic instead of just killing the application. This will likely be hard to reproduce without the training data where it happens, unless you also get it when you generate a dataset with an equivalent shape using `make_classification`. |
I will test with a synthetic dataset of the same shape ASAP. Unfortunately, the data I'm currently using is under NDA, so I cannot share it for reproducibility purposes. |
I confirm I can reproduce the issue with a synthetic dataset (I used the defaults of `make_classification` because I don't think the make-up of the data is relevant to the issue):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=470605, n_features=332, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

rf = RandomForestClassifier(n_estimators=250,
                            n_jobs=-1,
                            oob_score=True,
                            bootstrap=True,
                            random_state=42)
rf.fit(X_train, y_train)

permImp = permutation_importance(rf,
                                 X_val,
                                 y_val,
                                 scoring='f1',
                                 n_repeats=5,
                                 n_jobs=-1,
                                 random_state=42)
```

I see very high memory usage (>80%, sometimes close to 100%), a ton of activity on the pagefile, and heavy activity from a related process. |
This might be where the issue happens: https://joblib.readthedocs.io/en/latest/parallel.html#working-with-numerical-data-in-shared-memory-memmapping Should I try increasing the page file size (8 GB currently) and see if it still crashes? Although joblib doesn't seem to care about the page file; it just dumps the data to disk. |
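For context, the joblib page linked above documents two `joblib.Parallel` parameters controlling this behavior: `max_nbytes` (the array-size threshold above which inputs are dumped to disk, 1 MB by default) and `temp_folder` (where the dump files go). A minimal sketch of these knobs; note that scikit-learn's `permutation_importance` does not expose them directly, so this is only illustrative:

```python
# Sketch of joblib's memmapping knobs (parameter names per the joblib docs).
# permutation_importance does not expose these parameters itself.
import numpy as np
from joblib import Parallel, delayed

X = np.random.rand(2000, 500)  # ~8 MB, above the default 1 MB threshold

# max_nbytes=None disables memmapping entirely, so workers get in-memory
# copies instead of .mmap files written to a temp folder.
sums = Parallel(n_jobs=2, max_nbytes=None)(delayed(np.sum)(X) for _ in range(4))
print(len(sums))
```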
Thanks for the example. I can't reproduce on Linux. Yes, my guess would be that either way, this issue being raised by one or multiple system calls seems more plausible than any Cython code issue in the RF. So setting `n_jobs` to a lower value may help. |
I'll try halving the no. of cores on the `permutation_importance` call, but to me it seems strange that scikit-learn would crash and burn on such a simple example. |
Halving the no. of cores now. Furthermore, joblib seems to have already used up 10 GB of space on my drive. |
So, the PC didn't crash and I got the results back on the synthetic dataset with the halved core count. |
The disk usage is concerning as well. @lesteve or @ogrisel might have some ideas. To summarize, the issues are: |
I updated joblib to 0.16.0 and tried running permutation importance on the original dataset. CPU usage showed some dips from 100%, so I don't think the CPU is oversubscribed. |
Furthermore, after restarting, the space left on the drive is 25.8 GB, meaning that the cached data is still somewhere on disk! |
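To check for leftover cache, one could look for joblib's memmapping folders in the system temp directory. A stdlib-only sketch; the folder-name pattern is an assumption taken from joblib's source and may vary across versions:

```python
# List leftover joblib memmapping folders (candidates for manual deletion
# once no joblib job is running). The "joblib_memmapping_folder_*" pattern
# is an assumption based on joblib's source code.
import glob
import os
import tempfile

pattern = os.path.join(tempfile.gettempdir(), "joblib_memmapping_folder_*")
leftovers = glob.glob(pattern)
for folder in leftovers:
    print(folder)
```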
Thanks for investigating! Yes, it's absolutely not OK as a default behavior. The 50k created file objects is also not good. Could you show a few filenames in one of those memmapping folders, please? We'll need some joblib experts to have a look at it. |
Unfortunately, I already removed those folders and cannot remember the exact file names. However, I can tell you that they all had short file names, and all of them had the same size, around 2 MB. |
I can reproduce on Windows. On a machine with 4 CPU cores, I also get the failure. Investigating this: `permutation_importance` will parallelize over features, and `require="sharedmem"` is somehow misbehaving in this case of nested parallelism, triggering memmapping as well. |
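For readers unfamiliar with the flag mentioned above: `require="sharedmem"` tells `joblib.Parallel` to use a backend whose workers share the caller's memory (i.e. threads rather than processes). A small sketch of what that guarantees:

```python
# require="sharedmem" forces a thread-based backend, so workers can mutate
# state that lives in the caller's memory; with a process-based backend
# each worker would mutate its own copy instead.
from joblib import Parallel, delayed

acc = []

def append(i):
    acc.append(i)  # only visible to the caller because memory is shared

Parallel(n_jobs=2, require="sharedmem")(delayed(append)(i) for i in range(4))
print(sorted(acc))
```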
Would it be possible that both the |
@rth, this might be related? joblib/joblib#966 Although it seemed to not be present in 0.14.1, which I was using at the beginning. |
Also, this might be related: joblib/joblib#690 |
joblib is storing the input data for each subproblem as memory-mapped files to share memory between the subprocesses and the main process. For this specific workload, this memmapping is never useful because the data is never going to be the same: the permutations are random and independent of one another. I will have a look to see if we can disable it there. |
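The sharing described above can be observed directly: with the default loky backend, joblib hands workers a read-only `numpy.memmap` view of any input array bigger than `max_nbytes` (1 MB by default) instead of a pickled copy. A small sketch:

```python
# Workers receive a memmap, not a copy, for large inputs (default loky
# backend, default max_nbytes='1M').
import numpy as np
from joblib import Parallel, delayed

def kind(arr):
    # Report the concrete array type seen inside the worker.
    return type(arr).__name__

X = np.zeros((2000, 500))  # ~8 MB, above the 1 MB threshold

kinds = Parallel(n_jobs=2)(delayed(kind)(X) for _ in range(2))
print(kinds)  # expected to show 'memmap' entries when memmapping kicks in
```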
Actually, looking at the code, using memmapping should be fine (and useful) because joblib.Parallel is called on the non-permuted data and the permutation is applied to a copy of the data in the worker. Maybe there is something wrong at a low level happening only on Windows. I need to investigate with a Windows VM. |
Removing the milestone since it doesn't seem to be a blocker. |
In the issue #16716, it seems that a lot of memmaps were created. I am wondering if it could be a similar issue? |
Removing the Blocker label since this hasn't been a blocker for a while. My wild guess is that this is a joblib issue which on Windows sometimes does things that are very inefficient. In an ideal world, someone with a Windows machine would try to see if #18187 (comment) still reproduces and compare with the status of #18187 (comment); I am guessing the problem may still exist.

Edit: I added the "OS:Windows" label that I just created, together with "OS:Linux" and "OS:macOS". |
Hi, thanks for the suggestion @lesteve! I have a Windows setup and can work on it, as you said. The first step would be reproducing it again, since the issue is a bit old. I will probably get to reproducing it next week. So, I can /take this issue. |
Update: Started working on it, but it seems that CI is not recognizing /take. |
@vitorpohlenz thanks for working on this. And don't worry about taking the issue. Commenting as you have is enough. |
Thanks for the summary @rth. System details:
Here is the example provided, with a few modifications to use half of the available CPUs:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Import os to check the number of CPUs available
import os

n_cpus = os.cpu_count()
cpus_to_use = n_cpus // 2

X, y = make_classification(n_samples=470605, n_features=332, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

rf = RandomForestClassifier(n_estimators=250,
                            n_jobs=cpus_to_use,
                            oob_score=True,
                            bootstrap=True,
                            random_state=42)
rf.fit(X_train, y_train)

permImp = permutation_importance(rf,
                                 X_val,
                                 y_val,
                                 scoring='f1',
                                 n_repeats=5,
                                 n_jobs=cpus_to_use,
                                 random_state=42)
```

With this example everything runs smoothly (at least on my machine), and the problem with high disk usage for caching does not occur. But if I run the same code with all cores, I get the error below.

Error log
So, it seems that the possible "CPU oversubscription" still occurs. @glemaitre, as you mentioned in #16716, it may be related, but I'm not sure. |
Update: I have some new pieces of information: |
For [3.], I have the following image showing a 37 GB temporary folder. With the above information, I have some questions that you all may help me with. Since [1.] and [2.] seem to be solved, is it worth working on [3.]? I think that it is relevant, but it would be great to have a confirmation. |
Thanks for your investigations. With the information you provide, I think we can close this issue. However, as for the memmapping size, if you can find a way to reduce it w/o impacting performance, that would be a very welcome contribution. Thanks again for all the work! |
Thanks for replying @adrinjalali! Actually, I looked at the code to check if I could improve the disk usage, but it seems that this part is handled by joblib. But using 30 GB of disk for temporary files is still better than using 30 GB of RAM, so I'm not sure how to improve this situation... |
One can look into whether, and how easily, that "duplication" can be reduced; but if not, then there's no solution to it. |
Thanks for the tip @adrinjalali. I had a look at joblib.Parallel and how sklearn uses it in `permutation_importance`. This seems like the "short blanket dilemma," in my opinion, but I'm not sure there is a straightforward solution to it. |
Describe the bug
Running permutation_importance on a medium-sized dataset results in a BSOD on Windows 10. The dataset is 470605 x 332; the code is running in a Jupyter notebook, Python version 3.7.6, scikit-learn version 0.22.1.
The BSOD is a KERNEL_SECURITY_CHECK_FAILURE, with ERROR_CODE:
(NTSTATUS) 0xc0000409 - The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.
The machine has a Ryzen 5 3600 with 16GB of RAM.
Steps/Code to Reproduce
Expected Results
No BSOD, permutation importance computed.
Actual Results
BSOD after ~1-2 minutes
Versions
```
System:
    python: 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\lucag\anaconda3\python.exe
    machine: Windows-10-10.0.18362-SP0

Python dependencies:
    pip: 20.0.2
    setuptools: 45.2.0.post20200210
    sklearn: 0.22.1
    numpy: 1.18.1
    scipy: 1.4.1
    Cython: 0.29.15
    pandas: 1.0.1
    matplotlib: 3.1.3
    joblib: 0.14.1

Built with OpenMP: True
```