GridSearchCV parallel execution with own scorer freezes #2889
Comments
This is surprising, so we'll have to work out what the problem is and make sure it works! Can you please provide a little more detail:
FWIW, my machine has no problem fitting iris with this snippet on the development version of sklearn. |
Thank you for your fast reply. By crashing I actually mean freezing: it does not continue, and there is no more activity to be seen for the Python processes in the Windows task manager. The processes are still there and consume a constant amount of RAM, but require no processing time. This is scikit-learn version 0.14, last updated and run using Enthought Canopy. I am on platform "Windows-7-6.1.7601-SP1". I will go more into depth by providing a generic example of the problem. I think it has to do with GridSearchCV being placed in a for loop. (To not waste too much of your time, you should probably start at the run_tune_process() method, which is called at the bottom of the code and calls the method containing GridSearchCV() in a for loop.) Code:
Once again, this code works on my computer only when I change n_jobs to 1 or when I don't define a scoring= argument. |
Generally, multiprocessing on Windows encounters a lot of problems. Are you able to try this out in the current development version, to see if the problem persists?
|
(As a side point, @ogrisel, I note there seems to be a lot more joblib …)
|
This has nothing to do with custom scorers. This is a well-known feature of Python multiprocessing on Windows: you have to run everything that uses n_jobs inside an if __name__ == '__main__': block. |
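For illustration, a minimal sketch of that pattern (not the original poster's script; the iris data and the current sklearn.model_selection import path are assumptions):

from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

def run_search():
    # everything that spawns worker processes (n_jobs > 1) lives inside a function
    X, y = datasets.load_iris(return_X_y=True)
    search = GridSearchCV(svm.SVC(), {'C': [0.1, 1, 10]}, n_jobs=2)
    search.fit(X, y)
    print(search.best_params_)

if __name__ == '__main__':
    # the guard ensures Windows child processes can import this module
    # without re-running the search themselves
    run_search()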
Well, the good news is that nowadays joblib gives a meaningful error message in that situation. |
@GaelVaroquaux does current scikit-learn give that error message? If so, the issue can be considered fixed, IMHO. |
It should do. The only way to be sure is to check. I am on the move right now. |
I'm not going to install a C compiler on Windows just for this. Sorry, but I really don't do Windows :) |
I have a Windows VM. I can check. It's just a question of finding a moment to do it. |
@larsmans, you are completely right. The custom scorer object was a mistake on my part; the problem indeed lies in multiprocessing on Windows. I tried the same code on Linux and it runs well. I don't get any error messages because it doesn't crash, it just stops doing anything meaningful. |
@adverley Could you try the most recent version from GitHub on your Windows box? |
Closing because of lack of feedback; it is probably a known issue that is fixed in newer joblib. |
Not sure if this is related, but it does seem to be. On Windows, a custom scorer still freezes. I encountered this thread on Google, removed the scorer, and the grid search works. When it freezes, it shows no error message. There are 3 Python processes spawned too (because I set n_jobs=3). However, the CPU utilization remains 0 for all Python processes. I am using IPython Notebook. |
Can you share the code of the scorer? It seems a bit unlikely. |
Does your scorer use joblib / n_jobs anywhere? It shouldn't, and that could maybe cause problems (though I think joblib should detect that). |
Sure - here's the full code - http://pastebin.com/yUE26SNs. The scorer function is "score_model"; it doesn't use joblib. This runs from the command prompt, but not from IPython Notebook. The error message is - Then IPython and all the spawned Python instances become idle - silently - and don't respond to any Python code anymore until I restart it. |
Fix the attribute error, then it'll work. |
Well, I do not know what causes the AttributeError... though it is most likely related to joblib, since it happens only when n_jobs is more than 1 and runs fine with n_jobs=1. The error talks about an attribute (I realized that the error line was pasted incorrectly above - I edited the post above). I don't use pylab. Here's the full extended error message - http://pastebin.com/23y5uHT2 |
Hum, that is likely related to issues of multiprocessing on windows. Maybe @GaelVaroquaux or @ogrisel can help. |
Guys, thanks for the thread. I should have checked this thread before; I wasted 5 hours of my time on this, trying to run in parallel. Thanks a lot :) |
The best way of turning these 5 hours into something useful for the project, would be to provide us with a stand-alone example reproducing the problem. |
I was experiencing the same issue on Windows 10, working in a Jupyter notebook, trying to use a custom scorer within a nested cross-validation with n_jobs=-1. I was getting the same behaviour. |
I have the exact same problem on OSX 10.10.5 |
Same here. |
Please give a reproducible code snippet. We'd love to get to the bottom of this. It is hard to understand without code, including data, that shows us the issue. |
Just run these lines in a python shell:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_predict
np.random.seed(1234)
X = np.random.sample((1000, 100))
Y = np.random.sample((1000)) > 0.5
svc_pipeline = Pipeline([('pca', PCA(n_components=95)), ('svc', SVC())])
predictions = cross_val_predict(svc_pipeline, X, Y, cv=30, n_jobs=-1)
print classification_report(Y, predictions)

Note that removing the PCA step from the pipeline solves the issue. More info: Darwin-16.6.0-x86_64-i386-64bit, scikit-learn==0.18.2, scipy==0.19.1, numpy==1.12.1 |
Seeing as you don't use a custom scorer, should we assume that is a separate issue?
|
@KaisJM I think it is more useful if you start from your freezing script, manage to simplify it, and post a fully stand-alone snippet that freezes for you. |
@lesteve Agreed. I created a new Python 2 environment like the one I had before installing Gensim. The code ran fine, with NO freeze with n_jobs=-1. What's more, numpy is using OpenBLAS and has the same config as the environment that exhibits the freeze (the one where Gensim was installed). So it seems that OpenBLAS is not the cause of this freeze.
|
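As an aside, for anyone comparing environments like this, numpy's build-time BLAS/LAPACK configuration can be printed with a short sketch:

import numpy as np

np.__config__.show()     # shows which BLAS/LAPACK (OpenBLAS, MKL, Accelerate, ...) numpy was built against
print(np.__version__)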
@KaisJM I'm running the same snippet here (Windows) and it freezes. I know that it's awkward, but it didn't freeze when running with a custom metric. |
I have a similar problem. I have been running the same code and simply wanted to update the model with the new month's data, and it stopped running. I believe sklearn was updated to 0.19 in the meantime. |
Running GridSearchCV or RandomizedSearchCV in a loop with n_jobs > 1 would hang silently in Jupyter & IntelliJ:
Followed @lesteve's recommendation, checked the environment, and removed the numpy installed with pip:

Darwin-16.6.0-x86_64-i386-64bit

$ conda list | grep numpy
$ pip uninstall numpy
$ conda list | grep numpy
$ conda install numpy -f  // most likely unnecessary
$ conda list | grep numpy

Fixed my problem. |
@paulaceccon your problem is related to …
The following will run your code, with external.py:
Results running on 8 cores: Fitting 3 folds for each of 54 candidates, totalling 162 fits |
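The external.py that was attached is not preserved in this thread; as a general illustration only (the file name, scorer body, and iris data are assumptions), a custom scoring function can be placed in an importable module so that joblib worker processes can pickle and import it:

# external.py (hypothetical module name, holding the custom scorer)
from sklearn.metrics import f1_score, make_scorer

def score_model(y_true, y_pred):
    # placeholder metric; any custom logic would go here
    return f1_score(y_true, y_pred, average="macro")

custom_scorer = make_scorer(score_model)

# main script / notebook cell
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from external import custom_scorer

if __name__ == '__main__':
    X, y = datasets.load_iris(return_X_y=True)
    search = GridSearchCV(SVC(), {'C': [1, 10]}, scoring=custom_scorer, n_jobs=2)
    search.fit(X, y)
    print(search.best_score_)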
The issue is still there, guys. I am using a custom scorer and it keeps going forever when I set n_jobs to anything. When I don't specify n_jobs at all it works fine, but otherwise it freezes. |
Can you provide a stand-alone snippet to reproduce the problem ? Please read https://stackoverflow.com/help/mcve for more details. |
Still facing this problem with the same sample code. Windows-10-10.0.15063-SP0 |
Can you provide a stand-alone snippet to reproduce the problem ? Please read https://stackoverflow.com/help/mcve for more details. |
I suspect this is the same old multiprocessing-on-Windows issue; see our FAQ.
|
I tested the code in thomberg1's #2889 (comment). OS: Windows 10 x64 10.0.16299.309 It worked fine in Jupyter Notebook and command-line. |
Hi, I'm having the same issue, so I did not want to open a new one which would lead to an almost identical thread.
- macOS
Code is from a tutorial: https://machinelearningmastery.com/use-keras-deep-learning-models-scikit-learn-python/ |
It runs if I add the multiprocessing import and the if statement as shown below - I don't work with Keras, so I don't have more insight.
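The code attached to that comment is not preserved; a minimal sketch of the described change (assumptions: the Keras 2.x scikit-learn wrapper from the linked tutorial, a simplified model, and random data) could look like this:

import multiprocessing  # the added import mentioned above (not strictly needed by this sketch)
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def create_model(optimizer='adam'):
    # small binary classifier, stand-in for the tutorial's model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

if __name__ == '__main__':
    # the guard is the crucial part: the grid search only runs in the parent
    # process, so spawned workers can import this module safely
    X = np.random.rand(100, 8)
    y = (np.random.rand(100) > 0.5).astype(int)
    clf = KerasClassifier(build_fn=create_model, epochs=5, batch_size=10, verbose=0)
    grid = GridSearchCV(clf, {'optimizer': ['adam', 'rmsprop']}, cv=3, n_jobs=-1, verbose=1)
    grid.fit(X, y)
    print(grid.best_params_)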
Fitting 3 folds for each of 18 candidates, totalling 54 fits
[Parallel(n_jobs=12)]: Done 26 tasks | elapsed: 18.4s

Version info if needed: compiler: GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final) |
Thank you @thomberg1, but adding the if statement did not help. The problem is still the same. |
Same problem on my machine when using a customized scoring function in GridSearchCV. |
@byrony can you provide code to reproduce? Did you use an if __name__ == '__main__': block? |
I've experienced a similar problem multiple times on my machine when using GridSearchCV with n_jobs > 1.
Here is the code I used:

from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.utils import shuffle
from sklearn.neural_network import MLPClassifier
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
def main():
    df = pd.read_csv('../csvs/my_data.csv', nrows=4000000)
    X = np.array(list(map(lambda a: np.fromstring(a[1:-1], sep=','), df['X'])))
    y = np.array(list(map(lambda a: np.fromstring(a[1:-1], sep=','), df['y'])))
    scalerX = MinMaxScaler()
    scalerY = MinMaxScaler()
    X = scalerX.fit_transform(X)
    y = scalerY.fit_transform(y)
    grid_params = {
        'beta_1': [.1, .2, .3, .4, .5, .6, .7, .8, .9],
        'activation': ['identity', 'logistic', 'tanh', 'relu'],
        'learning_rate_init': [0.01, 0.001, 0.0001]
    }
    estimator = MLPClassifier(random_state=1,
                              max_iter=1000,
                              verbose=10,
                              early_stopping=True)
    gs = GridSearchCV(estimator,
                      grid_params,
                      cv=5,
                      verbose=10,
                      return_train_score=True,
                      n_jobs=8)
    X, y = shuffle(X, y, random_state=0)
    y = y.astype(np.int16)
    gs.fit(X, y.ravel())
    print("GridSearchCV Report \n\n")
    print("best_estimator_ {}".format(gs.best_estimator_))
    print("best_score_ {}".format(gs.best_score_))
    print("best_params_ {}".format(gs.best_params_))
    print("best_index_ {}".format(gs.best_index_))
    print("scorer_ {}".format(gs.scorer_))
    print("n_splits_ {}".format(gs.n_splits_))
    print("Exporting")
    results = pd.DataFrame(data=gs.cv_results_)
    results.to_csv('../csvs/gs_results.csv')

if __name__ == '__main__':
    main()

I know it is a big dataset, so I expected it would take some time to get results, but after 2 days of running it just stopped working (the script keeps executing but is not using any resources apart from RAM and swap). Thanks in advance! |
@amueller I didn't use the if __name__ == '__main__': block.
|
You're using XGBoost. I don't know what they do internally; it's very possible that's the issue. Can you try to see if adding the if __name__ == '__main__': block helps? |
@Pazitos10 can you reproduce with synthetic data and/or smaller data? I can't reproduce without your data and it would be good to reproduce in shorter time. |
@amueller Ok, I will run it again with 500k rows and will post the results. Thanks! |
@amueller, running the script with 50k rows works as expected. The script ends correctly, showing the results as follows (sorry, I meant 50k not 500k): The problem is that I don't know if these results are going to be the best for my whole dataset. Any advice? |
Seems like you're running out of ram. Maybe try using Keras instead, it's likely a better solution for large scale neural nets. |
@amueller Oh, ok. I will try using Keras instead. Thank you again! |
Is it perhaps an idea for scikit-learn, in the case of Windows, to alter the function … |
@PGTBoos this is fixed in scikit-learn 0.20.0 |
I have been searching for hours on this problem and can consistently replicate it:
This snippet crashes because of scoring=metrics.make_scorer(metrics.scorer.f1_score, average="macro"), where metrics refers to the sklearn.metrics module. If I comment out the scoring=... line, the parallel execution works. If I want to use the F1 score as the evaluation method, I have to disable parallel execution by setting n_jobs = 1.
Is there a way I can define another score method without losing the parallel execution possibility?
Thanks
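A minimal sketch of the setup described (the original snippet is not reproduced here; the iris data, a linear SVC, and the parameter grid are assumptions, and the modern sklearn.model_selection import is used in place of 0.14's sklearn.grid_search):

from sklearn import datasets, metrics, svm
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_iris(return_X_y=True)
# the custom scorer that reportedly triggers the freeze when n_jobs > 1
f1_macro = metrics.make_scorer(metrics.f1_score, average="macro")
search = GridSearchCV(svm.SVC(kernel='linear'),
                      {'C': [0.1, 1, 10]},
                      scoring=f1_macro,  # removing this argument reportedly avoids the freeze
                      n_jobs=-1)         # works with n_jobs=1 per the report
search.fit(X, y)
print(search.best_params_)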