Dictionary learning is slower with n_jobs > 1 #4769
The incriminated line is l. 692 of dict_learning.py:

```python
this_code = sparse_encode(this_X, dictionary.T, algorithm=method,
                          alpha=alpha, n_jobs=n_jobs).T
```

We have this_X.shape[0] == batch_size, which is typically of the same order as n_jobs, so sparse_encode's computation time may be dominated by joblib overhead rather than by the lasso computation itself.
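To make that overhead concrete, here is a minimal, hypothetical timing sketch (not from the issue): it calls sparse_encode repeatedly on a batch of only a few samples, roughly the shape it receives inside the mini-batch fit loop, so with n_jobs > 1 most of the time goes into joblib dispatch rather than lasso solves. The sizes and alpha are arbitrary.

```python
from time import time

import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.RandomState(0)
n_components, n_features, batch_size = 100, 64, 3  # batch_size ~ n_jobs
dictionary = rng.randn(n_components, n_features)
this_X = rng.randn(batch_size, n_features)

for n_jobs in (1, 4):
    t0 = time()
    for _ in range(100):  # mimic the repeated calls made inside the fit loop
        sparse_encode(this_X, dictionary, algorithm='lasso_lars',
                      alpha=0.1, n_jobs=n_jobs)
    print("n_jobs=%d: %.3fs" % (n_jobs, time() - t0))
```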
Fix in pull request #4773
I believe I'm seeing this problem here: #4779
Agreed, I have run into this myself as well. Joblib would normally make sense for finding the sparse code for a precomputed dictionary. In this specific instance, however, it is being called in a tight loop as part of dictionary learning, and the overhead of starting Joblib in that tight loop is too costly. One is better off using the threaded parallelism that BLAS provides instead. Edit: Alternatively, it may make sense to start the Joblib thread or multiprocessing pool before the tight loop and keep it running, so that the tight loop merely schedules jobs on this pool. I have not tested this latter approach, but it may work reasonably well.
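As a rough illustration of the "start the pool once" idea, the sketch below uses joblib's Parallel as a context manager so the workers are created before the loop and reused for every batch. The encode_one helper and the array sizes are made up for the example; this is not scikit-learn's actual code path.

```python
import numpy as np
from joblib import Parallel, delayed


def encode_one(x, dictionary):
    # Stand-in for the per-sample sparse coding step.
    return dictionary @ x


rng = np.random.RandomState(0)
dictionary = rng.randn(20, 20)
batches = [rng.randn(3, 20) for _ in range(50)]

# Workers are started once here and reused for every batch, so each
# iteration only pays for job dispatch, not for pool start-up.
with Parallel(n_jobs=4) as parallel:
    for this_X in batches:
        codes = parallel(delayed(encode_one)(x, dictionary) for x in this_X)
```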
From #4773 (comment): the joblib context manager API is implemented and available in #5016. We did some profiling with @arthurmensch, and apparently we should benefit a lot from skipping redundant inner input data validation checks by introducing a check_input=False flag for the functions called inside the fit loop (the default being check_input=True). Input validation takes a large fraction of the time and causes unnecessary GIL contention when using the threading backend.
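A sketch of that pattern, assuming the check_input flag that sparse_encode exposes: the data and dictionary are validated once up front, and the per-batch calls inside the loop skip the redundant checks. The batch size and alpha here are arbitrary.

```python
import numpy as np
from sklearn.decomposition import sparse_encode
from sklearn.utils import check_array

rng = np.random.RandomState(0)
dictionary = check_array(rng.randn(15, 20))   # validated once, up front
X = check_array(rng.randn(300, 20))

for start in range(0, X.shape[0], 3):         # tight mini-batch loop
    this_X = X[start:start + 3]
    code = sparse_encode(this_X, dictionary, algorithm='lasso_lars',
                         alpha=0.1, check_input=False)  # skip re-validation
```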
FWIW there has been a lot of work on this (particularly in the past few weeks) and IMHO things are getting better. The main things that remain slow on the sparse coding side are this copy and … Most of these improvements have involved skipping the …
On the Joblib side, I would expect PR https://github.com/tomMoral/loky/pull/135 to end up being very helpful here, as the bigger issue is likely thread contention between the BLAS used and Joblib in sparse coding.
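One common way to sidestep that contention, shown here only as an illustrative sketch, is to cap the BLAS thread pool with threadpoolctl while joblib's threading backend provides the outer parallelism. Neither threadpoolctl nor the threading backend is prescribed by the comment above, and the array sizes are arbitrary.

```python
import numpy as np
from joblib import parallel_backend
from sklearn.decomposition import sparse_encode
from threadpoolctl import threadpool_limits

rng = np.random.RandomState(0)
dictionary = rng.randn(15, 20)
X = rng.randn(200, 20)

# With the threading backend all work stays in this process, so capping the
# BLAS pool here keeps BLAS threads and joblib threads from fighting over
# the same cores.
with threadpool_limits(limits=1, user_api='blas'):
    with parallel_backend('threading'):
        code = sparse_encode(X, dictionary, algorithm='lasso_lars',
                             alpha=0.1, n_jobs=4)
```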
The gists are no longer available.

```python
from time import time

import numpy as np
from sklearn.datasets import make_sparse_coded_signal
from sklearn.decomposition import MiniBatchDictionaryLearning

X, dictionary, code = make_sparse_coded_signal(
    n_samples=1000, n_components=15, n_features=20, n_nonzero_coefs=10,
    random_state=42)

n_jobs = [1, 2, 3, 4, 10]

for nj in n_jobs:
    dict_learner = MiniBatchDictionaryLearning(
        n_components=15, transform_algorithm='lasso_lars', random_state=42,
        n_jobs=nj
    )
    t0 = time()
    X_transformed = dict_learner.fit_transform(X)
    print("done in %0.3fs, with n_jobs = %d." % ((time() - t0), nj))
```

gives the following output:
With the following environment:
It seems that this issue is still relevant?
Thanks for rechecking this; that is a good data point to have. I had tried to Cythonize a good chunk of the code and release the GIL for the bulk of it with PR #11874. Maybe that helps? Unfortunately I don't have much time to work on that PR these days.
Removing the milestone. Maybe @jeremiedbb would like to have a look?
Setting n_jobs > 1 in MiniBatchDictionaryLearning (and in the dict_learning_online function) leads to worse performance.
Multiprocessing is handled in sklearn.decomposition, in the dict_learning function, l. 249.
Minimal example:
https://gist.github.com/arthurmensch/091d16c135f4a3ba5580
Output with n_jobs = 1:
Output with n_jobs = 2:
Output with n_jobs = 4:
We can see that the transform method of MiniBatchDictionaryLearning (which relies on the sparse_encode function) benefits from multiprocessing, as expected.
Dictionary learning itself relies on successive calls to the sparse_encode function: the slowness may come from this.
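Since the gist linked above is no longer available, here is a hedged sketch of the kind of comparison it presumably made: timing fit (many small sparse_encode calls inside the mini-batch loop) against transform (one large sparse_encode call) for several n_jobs values. The data sizes mirror the reproduction script shown earlier in this thread; the original gist's numbers are not reproduced.

```python
from time import time

from sklearn.datasets import make_sparse_coded_signal
from sklearn.decomposition import MiniBatchDictionaryLearning

X, dictionary, code = make_sparse_coded_signal(
    n_samples=1000, n_components=15, n_features=20, n_nonzero_coefs=10,
    random_state=42)

for n_jobs in (1, 2, 4):
    est = MiniBatchDictionaryLearning(n_components=15, random_state=42,
                                      transform_algorithm='lasso_lars',
                                      n_jobs=n_jobs)
    t0 = time()
    est.fit(X)          # parallelism sits inside the tight mini-batch loop
    t_fit = time() - t0
    t0 = time()
    est.transform(X)    # a single large sparse_encode call
    t_transform = time() - t0
    print("n_jobs=%d: fit %.3fs, transform %.3fs"
          % (n_jobs, t_fit, t_transform))
```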