Changed n_jobs parameter to increase speed in plot_validation_curve.py #21638
Conversation
Running with -1 by default is problematic because on machines with a large number of CPUs (e.g. 64 or more), spawning the workers can dominate the runtime: the concurrent disk access needed just to start the Python interpreters and import the modules is expensive. Furthermore, it can use too much memory and cause crashes. This is why we would rather use a small number of workers (e.g. 2 instead of -1) when we want to use parallelism in examples or tests in scikit-learn.
I agree with @ogrisel, and I think the alternative is to find other ways to speed up the example. You can set n_jobs to 2, and then look for other ways to make the example faster.
@ogrisel @adrinjalali Okay, that makes sense, thanks for the explanation :) Will set n_jobs to 2.
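For concreteness, a minimal sketch of what the agreed change looks like in the example's `validation_curve` call; the `gamma` grid and scoring mirror the existing `plot_validation_curve.py` setup, but treat the exact values as illustrative:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# A small, fixed worker count instead of n_jobs=-1 keeps the example
# well-behaved on machines with many CPUs.
train_scores, test_scores = validation_curve(
    SVC(),
    X,
    y,
    param_name="gamma",
    param_range=np.logspace(-6, -1, 5),
    scoring="accuracy",
    n_jobs=2,
)
```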
LGTM, let's hope it runs faster on CircleCI :)
This example uses the digits dataset, and I think that's the main source of its slowness. It would be nice if you could try either iris or a synthetic dataset to see whether you can get similar plots while making the example significantly faster (I have seen a 100x speedup in some examples just from getting rid of the digits dataset).
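For reference, the synthetic-dataset route could look roughly like this; the `make_classification` settings below are hypothetical stand-ins, not values from the PR:

```python
from sklearn.datasets import make_classification

# Hypothetical synthetic replacement for digits: same feature count (64)
# but far fewer samples, so each SVC fit is much cheaper.
X, y = make_classification(
    n_samples=400,
    n_features=64,
    n_informative=10,
    random_state=0,
)
```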
I believe the combination of a Gaussian RBF kernel and digits is important to get such characteristic validation curves for gamma. But maybe it would be possible to get similar results with a random sub-sample, or by considering a binary classification subproblem such as 1 vs 2 (to keep it non-trivial):

```python
import numpy as np
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
subset_mask = np.isin(y, [1, 2])  # binary classification: 1 vs 2
X, y = X[subset_mask], y[subset_mask]
```

Since SVC is a One-vs-Rest classifier, that should greatly help ;) Edit: changed to 1 vs 2, which is slightly harder than 1 vs 7.
@sveneschlbeck could you please apply Olivier's suggestion? |
@adrinjalali Yes, I'm on it!
@adrinjalali @ogrisel The change makes a big difference in execution time (18 s vs. 3 s), but the curve isn't as big and clearly shaped as before. What do you think? Should I change the code given this result?
To me it still shows the effect the same way, I'd be happy with it. |
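Putting the two suggestions together, the sped-up example would look roughly as follows; this is a sketch combining the snippets above, not the exact merged diff:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Restricting to a binary 1-vs-2 subproblem means SVC fits a single
# binary classifier per parameter setting, instead of a full
# multiclass decomposition over all ten digits.
subset_mask = np.isin(y, [1, 2])
X, y = X[subset_mask], y[subset_mask]

train_scores, test_scores = validation_curve(
    SVC(),
    X,
    y,
    param_name="gamma",
    param_range=np.logspace(-6, -1, 5),
    scoring="accuracy",
    n_jobs=2,  # modest parallelism, per the discussion above
)
```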
* Changed n_jobs parameter to increase speed
* Update plot_validation_curve.py
* Update plot_validation_curve.py
#21598 @adrinjalali Adapted the `n_jobs` parameter from 1 to -1 (auto-detect mode), which halved the time needed to run the module.