n_jobs support in GradientBoostingClassifier #3628
Comments
we could use joblib w/ the threading backend here but we need to ensure that each thread has a separate … |
Indeed, we should pay attention to this. The safest way is to instantiate as many splitters and criteria objects as … |
Does the new joblib backend allow sharing a thread pool across calls? |
@glouppe Creating one splitter per class would generate a lot of redundant work in the case of PresortBestSplitter because of X_argsorted. It would be nice if X_argsorted could be passed as an optional argument to the build method of a builder, and then the builder would pass it to the splitter if needed. In the meantime, perhaps it's better to just use BestSplitter. |
At this time this is not the case, the thread pool is created at each call. |
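For reference, current joblib versions can reuse a pool of workers across consecutive calls through the context-manager API (this may postdate the comment above). A minimal sketch:

```python
from math import sqrt
from joblib import Parallel, delayed

# The pool of workers is created once and reused by every call made inside the block.
with Parallel(n_jobs=4, backend="threading") as parallel:
    for stage in range(3):
        results = parallel(delayed(sqrt)(i) for i in range(10))
        print(stage, results[:3])
```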
That would need to be benchmarked. |
I think it might be more generally useful to do threading for finding the best feature to split on inside the … This would benefit both GBRT classification and regression and also AdaBoost with trees. The refactoring of the Cython code to do so does not seem trivial, though. |
For RF, in the single machine case, the search for the best feature to split on should probably not be parallelized, since parallelization is already done for creating trees. Two levels of parallelization are likely to lead to hangs. For the multiple machine case, this would be useful though. For GBRT, this is of course not a problem since learning is sequential. |
@ogrisel I strongly agree and would love to do that but I see some issues currently:
|
I would like to contribute to scikit-learn, is this issue open to take up? |
@nachi11 which one? Parallelizing the multi-class case or parallelizing the induction of a single tree by computing the best split for each feature in parallel? Both would be cool, but the latter would be the path to fame & glory & a crate of beer. |
I will try the easier one first, I'll work on parallelizing the multi-class case. I have the environment configured. How do I start? |
@nachi11 please read this guide carefully http://scikit-learn.org/stable/developers/#contributing . |
Honestly I am not sure that working on … |
@pprett Could you please suggest some urgent issues that can be solved by a beginner? |
@ogrisel Thanks I'll try to solve it and get back. |
Is there active development on this issue? |
@angadgill no. See #3628 (comment) ;) |
Ah, I see. Thanks for pointing me to it. I'm considering taking this up as a research project. Any useful pointers? |
+1 |
I think this is open to people who want to try the "multiclass" parallelization, though I agree with @ogrisel that it might not be that useful. We could be convinced by benchmarks, though. It is also still open for a brave soul to parallelize the tree at the Cython level. |
Regarding the first task, multiclass parallelization, has it been done or is someone working on it? |
Parallelized decision tree building is still an unsolved issue, in part due to the complexity of the underlying code. |
For the parallelized tree building, is it correct that essentially the loop over features starting here (scikit-learn/sklearn/tree/_splitter.pyx, line 381 at 194c231) … |
Yes. |
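For illustration, here is a naive pure-Python version of that per-feature search (the real Cython splitter works on a node's sample subset with cached sort orders and is far more efficient). It mainly shows why the loop over features decomposes into independent tasks:

```python
import numpy as np

def best_split_for_feature(x, y):
    """Naive search for the MSE-minimizing threshold on one feature column."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_impurity, best_threshold = np.inf, None
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue  # cannot split between identical values
        left, right = y[:i], y[i:]
        impurity = left.var() * len(left) + right.var() * len(right)
        if impurity < best_impurity:
            best_impurity, best_threshold = impurity, (x[i - 1] + x[i]) / 2.0
    return best_impurity, best_threshold

def best_split(X, y):
    # The candidates for different features are independent, so this is the loop
    # that could be dispatched to threads (e.g. a prange in Cython) and merged
    # with a single reduction at the end.
    candidates = [(*best_split_for_feature(X[:, j], y), j) for j in range(X.shape[1])]
    return min(candidates, key=lambda c: c[0])
```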
While working on #8779 I noticed the following related to … |
Regarding 5., you have a large overlap. We are implementing the same architecture as in XGBoost for the exact approach. Currently, @raghavrv is Cythonizing the implementation. It should be almost ready in a couple of days, in order to be benchmarked. With this implementation, it should be relatively easy to parallelize the gradient boosting when finding the split for each feature, using … In the meanwhile, I started to work on an approximation approach using histograms, which is much faster with a large number of samples. This is a feature used in LightGBM, XGBoost, and FastBDT. We first needed a quantile transformer, which took more time than expected since it can be used for pre-processing as well. It is almost ready to be merged and I will be able to focus on the histogram computation from now on. @MechCoder I passed by … |
Thanks for the ping and your great work on the quantile transformer. Sorry if this is dumb, but do you know why the quantile transformation helps in the case of a decision tree, i.e. the split computation is independent of the feature scaling, no? Regarding scikit-garden, the quantiles are returned at predict time as an estimate of … |
It will help in the approximation mode. The quantile transformer is used for binning the data that is later used to build the histogram. In addition, using something that fits in an int8 could allow the CPU to be used more efficiently. |
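A minimal sketch of that binning idea (not the scikit-learn implementation): each feature is mapped onto at most 256 quantile bins so a value fits in a uint8, and split finding can then operate on per-feature histograms of gradients instead of raw thresholds.

```python
import numpy as np

def quantile_bin(X, n_bins=256):
    """Return uint8-binned data and the quantile bin edges used for each feature."""
    edges = [np.unique(np.quantile(col, np.linspace(0, 1, n_bins + 1)[1:-1]))
             for col in X.T]
    binned = np.stack(
        [np.searchsorted(e, col).astype(np.uint8) for e, col in zip(edges, X.T)],
        axis=1)
    return binned, edges

def gradient_histograms(binned, gradients, n_bins=256):
    """Sum of the gradients falling into each bin, one histogram per feature."""
    hist = np.zeros((binned.shape[1], n_bins))
    for j in range(binned.shape[1]):
        np.add.at(hist[j], binned[:, j], gradients)
    return hist
```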
Ah yes, thanks!
|
@glemaitre, @NicolasHug, the … |
Yeah, I think this was about parallelizing the … |
Parallelizing over the features in the decision tree would still be a thing to try. |
No doubt about that, but the reference in the description is a 404 (not found) link and a lot of things have changed in scikit-learn since. As this is a "help wanted" issue, maybe it is worth opening a new issue with updated references and description. |
just curious, in what cases do you think this would be useful? For forests, we parallelize with joblib at the tree level so that would not be needed. |
This level of parallelization would be a … |
I agree that HGBDT is the algorithm to use. However, the improvement would impact the three classes, so if the GBDT can be sped up for "free" by having a … |
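For reference, the histogram-based estimator alluded to above ended up in scikit-learn and already parallelizes split finding with OpenMP threads; it has no n_jobs parameter, and the thread count follows OMP_NUM_THREADS / threadpoolctl. A minimal usage example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
# Histogram building and split finding are multi-threaded internally via OpenMP.
clf = HistGradientBoostingClassifier(max_iter=100, random_state=0).fit(X, y)
print(clf.score(X, y))
```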
Yeah I think this should rather happen after the refactoring of the tree code, and I think once we get to doing that, we would also add the parallelization at the same time. So this can stay closed, but I don't think we'll get rid of the tree code or the old ensembles. |
The following loop is embarrassingly parallel:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/gradient_boosting.py#L552
Edit: I (@ogrisel) removed the Easy tag and put a Moderate tag instead. Based on the discussion below, the most beneficial way to add n_jobs support for GBRT would be deep inside the Cython tree code (to benefit GB regression and AdaBoost models as well, instead of just GB classification).
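A rough sketch of what parallelizing that per-class loop with the threading backend could look like (not the actual scikit-learn code, which also handles the loss, sample weights, and in-place prediction updates): every task clones its own tree, so each thread has private splitter/criterion state while X stays shared.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.tree import DecisionTreeRegressor

def fit_stage(X, residuals, n_jobs=2, max_depth=3):
    """Fit one regression tree per class on its negative gradient, using threads."""
    template = DecisionTreeRegressor(max_depth=max_depth)

    def fit_one(r):
        # clone() gives every task its own tree (and thus its own splitter/criterion).
        return clone(template).fit(X, r)

    return Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(fit_one)(r) for r in residuals
    )

# Toy usage: a 3-class problem means 3 residual vectors per boosting stage.
rng = np.random.RandomState(0)
X = rng.rand(500, 10)
trees = fit_stage(X, [rng.randn(500) for _ in range(3)])
```

Since most of the Cython tree-building code releases the GIL, the threads can actually run concurrently, and the threading backend avoids copying X for each class.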