Parallel grid-search for multiple slices of same data to improve throughput. #29645
dhritimaandas
started this conversation in
Ideas
Replies: 0 comments
Hi, GridSearchCV currently parallelizes the search over a parameter space for one estimator on a single set of data, using n_jobs. However, if we want to tune the same estimator on multiple slices of the data, to get one tuned model per slice, we have to run the search in a sequential loop over the slices, because nested multiprocessing is restricted in Python.
I would like to discuss this issue here because it could be addressed simply: accept a list of slices as input to the function, and modify the call to parallel() in the GridSearchCV .fit() method to add a level of parallelism across slices. I have tested this change and we are getting a 2x throughput boost for multiple sliced data inputs.
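
To make the use case concrete, here is a minimal sketch of one way to tune several slices concurrently with the existing public API. It is not the proposed change to .fit(), only an illustration of the scenario. The helper names `tune_slice` and `tune_slices`, the `Ridge` estimator, the synthetic data and the strided slicing are all assumptions made for the example. Each inner search runs with n_jobs=1, so the only worker pool is the outer one over slices.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV


def tune_slice(X, y, param_grid):
    """Run a sequential grid search on one slice and return its best parameters."""
    # n_jobs=1 keeps the inner search single-process; parallelism comes
    # only from the outer loop over slices.
    search = GridSearchCV(Ridge(), param_grid, cv=3, n_jobs=1)
    search.fit(X, y)
    return search.best_params_


def tune_slices(slices, param_grid, n_jobs=2):
    """Tune one model per slice, dispatching the slices to a joblib pool."""
    return Parallel(n_jobs=n_jobs)(
        delayed(tune_slice)(X, y, param_grid) for X, y in slices
    )


# Synthetic regression data split into four row-wise slices (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=400)
slices = [(X[i::4], y[i::4]) for i in range(4)]

param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
best_params = tune_slices(slices, param_grid)  # one dict per slice
```

This version parallelizes across slices only. With few slices and many cores, some workers sit idle, which is the gap a single call dispatching every (slice, parameter, fold) combination would close.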