[MRG] Sparse input support for Gradient Boosting #3880
Conversation
@arjoly this is an initial draft -- it's fully functional, but I need to do more benchmarks
on covertype:

GBRT (25 trees of depth 3):

```
Classifier   train-time  test-time  error-rate
----------------------------------------------
dense          69.2286s    0.0894s      0.2201
sparse        105.2789s    0.1667s      0.2201
```

CART tree:

```
Classifier   train-time  test-time  error-rate
----------------------------------------------
dense          18.0742s    0.0215s      0.0423
sparse         63.5370s    0.0408s      0.0423
```
covertype is not too sparse -- but sparse enough to get an improvement for SGD:

```
Classifier   train-time  test-time  error-rate
----------------------------------------------
sparse          0.2035s    0.0042s      0.2301
dense           0.4488s    0.0235s      0.2302
```
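A minimal sketch of how this kind of dense-vs-sparse timing comparison can be reproduced. The dataset and settings are illustrative assumptions; the numbers above come from the covertype benchmark script, not from this snippet:

```python
import time

import scipy.sparse as sp
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the covertype data used in the benchmark above.
X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

for name, X_in in [("dense", X), ("sparse", sp.csc_matrix(X))]:
    clf = GradientBoostingClassifier(n_estimators=25, max_depth=3)
    t0 = time.time()
    clf.fit(X_in, y)
    train_time = time.time() - t0
    t0 = time.time()
    error = 1.0 - clf.score(X_in, y)
    test_time = time.time() - t0
    print("%-7s train=%.4fs test=%.4fs error=%.4f"
          % (name, train_time, test_time, error))
```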
```python
# Dense fast path: accumulate the scaled prediction of every tree in the
# ensemble into `out`.
for i in range(n_estimators):
    for k in range(K):
        tree = estimators[i, k].tree_
        out += scale * tree.predict(X).reshape((X.shape[0], 1))
```
in the sparse case we fall back to `tree.predict` -- the special implementation above is only to get good performance for very small X
Please put that remark as an inline comment.
check
I am curious: can you benchmark on 20 newsgroups? There is a benchmark script in
If you have time, can you have a look at #3790? I would like to wrap that up and at least get some of this PR's gbrt improvements in.
```python
if sparse:
    # pick the sparse constructor that matches the requested memory layout
    to_sparse = sp.csc_matrix
    if order.lower() == 'c':
        to_sparse = sp.csr_matrix
```
It's a bit hackish...
I would make the `sparse` argument a str taking one of 'csr' / 'csc' / 'coo' / None (default).
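A sketch of what that suggestion could look like. The helper name `_to_format` is hypothetical, not code from this PR:

```python
import scipy.sparse as sp

# Hypothetical helper implementing the suggestion above: `sparse` is a
# format string ('csr', 'csc', 'coo') or None (dense), replacing the
# boolean flag combined with the `order` argument.
def _to_format(X, sparse=None):
    if sparse is None:
        return X
    converters = {"csr": sp.csr_matrix,
                  "csc": sp.csc_matrix,
                  "coo": sp.coo_matrix}
    if sparse not in converters:
        raise ValueError("unknown sparse format: %r" % sparse)
    return converters[sparse](X)
```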
check
@arjoly I tried it -- it takes ages -- waiting 5 minutes to train a single decision stump; unbearable. I wonder if we should create a dedicated
strange... adaboost is way faster on this benchmark than gbrt... both use
@arjoly nevermind -- stupid me... stupid me.. stupid me...
so -- here we go -- of course we will try to distinguish between

```
Classifier            train-time  test-time  Accuracy
------------------------------------------------------
dummy                    0.0003s    0.0006s    0.4658
naive_bayes              0.0207s    0.0032s    0.6750
cart                     0.3463s    0.0014s    0.7406
logistic_regression      0.1091s    0.0010s    0.8563
adaboost                31.0893s    0.1319s    0.9219
gbrt                    24.4430s    0.0965s    0.9247
```

adaboost and gbrt use 100 stumps
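Roughly how such a run can be reproduced. This is a sketch using the vectorized 20 newsgroups loader, not the benchmark script referenced earlier in the thread:

```python
import time

from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.ensemble import GradientBoostingClassifier

# 20 newsgroups arrives as a sparse CSR matrix -- exactly the kind of
# high-dimensional text data this PR makes usable with GBRT.
data = fetch_20newsgroups_vectorized(subset="train")
X, y = data.data, data.target

clf = GradientBoostingClassifier(n_estimators=100, max_depth=1)  # 100 stumps
t0 = time.time()
clf.fit(X, y)
print("gbrt train-time: %.4fs" % (time.time() - t0))
```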
the time difference between gbrt and adaboost is the difference between optimizing for MSE and Gini, respectively.
Thanks for the benchmark. I will try to find time tomorrow to read this PR further.
Would be nice to find a way around this.
yes, we currently need csr to make predictions (which training example ended up in which leaf) and csc for fitting the trees. If we record during tree fitting which training examples ended up in which leaf, then we don't need to run the

To make matters worse: our trees require CSR format for predictions (it makes a lot of sense to assume that), so if you want to make predictions using your model, you need your data in CSR anyway. For example, if you run a grid search on a sparse matrix, each grid point will convert your X anyway (to either CSC or CSR).
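A small illustration of the double-format situation described here (a sketch of the general CSC/CSR trade-off, not the PR's internals):

```python
import scipy.sparse as sp

# Illustrative only: the fit path described above holds the same data twice.
# CSC gives cheap column access (split finding during fit); CSR gives cheap
# row access (tree.apply / predict). Conversion between them is O(nnz).
X_csc = sp.random(1000, 100, density=0.1, format="csc", random_state=0)
X_csr = X_csc.tocsr()  # the extra copy this thread is discussing

col = X_csc[:, 3]  # efficient on CSC
row = X_csr[7]     # efficient on CSR
```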
fix: staged predictions on sparse data
Making an `apply` for csc matrices could be an option. It would require coding the same algorithm as the depth-first tree builder.
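For context, `apply` maps each sample to its terminal leaf. A hedged illustration using the public estimator method (the thread is discussing the lower-level Cython `tree.apply`, which likewise takes CSR):

```python
import scipy.sparse as sp
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

X, y = make_classification(n_samples=100, n_features=10, random_state=0)
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)

# apply() returns the id of the leaf each sample lands in -- the
# terminal-leaf assignment GBRT needs during fit. Sparse input must be
# CSR, which is why the data is currently held in both formats.
leaves = tree.apply(sp.csr_matrix(X))
print(leaves[:10])
```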
```python
    Each estimator in the stage is scaled by ``scale`` before
    its prediction is added to ``out``.
    """
    return predict_stages_sparse(estimators[stage:stage + 1], X, scale, out)
```
Why introduce the pair of functions `predict_stage_sparse` / `predict_stage_dense` instead of checking `sp.issparse(X)` to dispatch to the right `predict_stages_xxx` method?
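i.e., roughly this (a sketch of the reviewer's suggestion, not the code in the PR; it assumes the `predict_stages_sparse` / `predict_stages_dense` helpers this PR introduces):

```python
import scipy.sparse as sp

# Sketch of the suggested dispatch: one public entry point that inspects
# the input type, instead of parallel *_sparse / *_dense functions.
def predict_stage(estimators, stage, X, scale, out):
    if sp.issparse(X):
        return predict_stages_sparse(estimators[stage:stage + 1], X, scale, out)
    return predict_stages_dense(estimators[stage:stage + 1], X, scale, out)
```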
check
I think it's ok to hold the data as both CSR and CSC during fit for now. +1 for merge on my side.
forgot to push the commits that address the review
I haven't thought about it deeply, but it should be possible to implement
That would be great. However, I don't want to delay this PR too much if it's not easy to do. We could add an inline comment where
I don't oppose merging, but CSC support would be nice to have before the
```
@@ -20,6 +21,16 @@
from .gradient_boosting import BaseGradientBoosting


def _csc_col_minmax(X):
```
Why not use `min_max_axis` from `sklearn.utils.sparsefuncs`?
good point -- I'll check `min_max_axis` -- I still need a utility function though, because I need the same return value as `mquantile`
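For reference, `min_max_axis` does exist in `sklearn.utils.sparsefuncs` and works on CSR/CSC input; a small hedged example of what it returns:

```python
import scipy.sparse as sp
from sklearn.utils.sparsefuncs import min_max_axis

# min_max_axis returns a (mins, maxs) pair of 1-D arrays; with axis=0 they
# hold the per-column minimum and maximum of the sparse matrix.
X = sp.csc_matrix([[0.0, 2.0], [-1.0, 0.0], [3.0, 4.0]])
mins, maxs = min_max_axis(X, axis=0)
print(mins)  # [-1.  0.]
print(maxs)  # [ 3.  4.]
```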
@arjoly Any clue why this is so slow? This seems like an algorithmic issue.
@pprett didn't you say at the sprint this was merged? hum :-/ |
adds sparse input support:

- uses `BestSparseSplitter`
- holds the data in both `csc` and `csr` format during fit (needs the latter just to call `tree.apply` to get the terminal leaf assignment)

Todo:

cc @arjoly @jnothman @glouppe
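A hedged usage sketch of the feature this PR adds, on synthetic data; the estimator performs the internal format conversions described above:

```python
import scipy.sparse as sp
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# With this PR, fit/predict accept scipy.sparse input directly; internally
# the data is held as csc for fitting and csr for tree.apply / predictions.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_sp = sp.csr_matrix(X)

clf = GradientBoostingClassifier(n_estimators=25, max_depth=3).fit(X_sp, y)
print(clf.predict(X_sp[:5]))
```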