[MRG] Sparse input support for Gradient Boosting #3880


Closed
pprett wants to merge 13 commits

Conversation


@pprett pprett commented Nov 24, 2014

Adds sparse input support:

  • Falls back to BestSparseSplitter.
  • Holds the data in both CSC and CSR format during fit (the latter is needed only to call tree.apply for the terminal leaf assignment; see the sketch below).

Todo:

  • more & better tests
  • benchmarks
  • partial dependence

cc @arjoly @jnothman @glouppe
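
A minimal sketch of the dual-format pattern described above (illustrative only, not the actual code in this PR; X and tree stand in for the training data and a fitted tree):

import scipy.sparse as sp

# the splitter wants fast column access, so fit on CSC;
# tree.apply wants fast row access, so keep a CSR copy as well
X_csc = sp.csc_matrix(X)    # consumed by BestSparseSplitter during fit
X_csr = sp.csr_matrix(X)    # consumed by tree.apply / predict
leaves = tree.apply(X_csr)  # terminal leaf index per training sample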


pprett commented Nov 24, 2014

@arjoly this is an initial draft -- it's fully functional, but I need to run more benchmarks


pprett commented Nov 24, 2014

On covertype:

GBRT (25 trees of depth 3):

Classifier   train-time  test-time  error-rate
----------------------------------------------
dense          69.2286s    0.0894s      0.2201
sparse        105.2789s    0.1667s      0.2201

CART tree:

Classifier   train-time  test-time  error-rate
----------------------------------------------
dense          18.0742s    0.0215s      0.0423
sparse         63.5370s    0.0408s      0.0423


pprett commented Nov 24, 2014

Covertype is not very sparse -- but sparse enough that SGD gets an improvement:

Classifier   train-time  test-time  error-rate
----------------------------------------------
sparse          0.2035s    0.0042s      0.2301
dense           0.4488s    0.0235s      0.2302

for i in range(n_estimators):
    for k in range(K):
        tree = estimators[i, k].tree_
        out += scale * tree.predict(X).reshape((X.shape[0], 1))
Member Author

In the sparse case we fall back to tree.predict -- the specialized implementation above exists only to get good performance for very small X.

Member

Please put that remark as an inline comment.

Member Author

check


arjoly commented Nov 25, 2014

I am curious. Can you benchmark on 20 newsgroups? There is a benchmark script in the benchmarks directory.


arjoly commented Nov 25, 2014

If you have time, can you have a look at #3790? I would like to finish it and at least get some of the GBRT improvements from that PR in.

if sparse:
    to_sparse = sp.csc_matrix
    if order.lower() == 'c':
        to_sparse = sp.csr_matrix
Member

It's a bit hackish...

Member

I would make the sparse argument a str taking one of 'csr' / 'csc' / 'coo' / None (default).
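
A hypothetical sketch of that signature (the helper name and surrounding code are illustrative, not the actual utility touched by this PR):

import scipy.sparse as sp

def make_data(X, sparse=None):
    # sparse is one of None (keep dense), 'csr', 'csc', 'coo'
    if sparse is not None:
        to_sparse = {'csr': sp.csr_matrix,
                     'csc': sp.csc_matrix,
                     'coo': sp.coo_matrix}[sparse]
        X = to_sparse(X)
    return X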

Member Author

check


pprett commented Nov 25, 2014

I am curious. Can you benchmark on 20 newsgroups? There is a benchmark script in the benchmarks directory.

@arjoly I tried it -- it takes ages: five minutes to train a single decision stump; unbearable. I wonder whether we should create a dedicated DecisionStumpClassifier|Regressor that is optimized for sparse inputs.


pprett commented Nov 25, 2014

Strange... AdaBoost is way faster on this benchmark than GBRT, yet both use max_depth=1 and both use the BestSparseSplitter. @arjoly, do you know of any sparse-data enhancements to AdaBoost?


pprett commented Nov 25, 2014

@arjoly never mind -- stupid me... stupid me... stupid me...

Thou shalt not use gradient boosting for multi-class problems


pprett commented Nov 25, 2014

So -- here we go -- of course we try to distinguish between alt.atheism and soc.religion.christian:

Classifier           train-time  test-time  Accuracy
-----------------------------------------------------
dummy                   0.0003s    0.0006s    0.4658
naive_bayes             0.0207s    0.0032s    0.6750
cart                    0.3463s    0.0014s    0.7406
logistic_regression     0.1091s    0.0010s    0.8563
adaboost               31.0893s    0.1319s    0.9219
gbrt                   24.4430s    0.0965s    0.9247

AdaBoost and GBRT both use 100 stumps.


pprett commented Nov 25, 2014

The time difference between GBRT and AdaBoost is the difference between optimizing MSE and Gini, respectively.


arjoly commented Nov 25, 2014

Thanks for the benchmark. I will try to find time tomorrow to read further into this PR.

@mblondel

Holds the data in both CSC and CSR format during fit

Would be nice to find a way around this.


pprett commented Nov 25, 2014

Would be nice to find a way around this.

Yes, we currently need CSR to make predictions (to find which training example ended up in which leaf) and CSC for fitting the trees. If we recorded during tree fitting which training examples ended up in which leaf, we wouldn't need to run tree.apply; however, we would still need it for the out-of-bag examples (subsample < 1).

To make matters worse: our trees require CSR format for predictions (a sensible assumption), so if you want to make predictions with your model you need your data in CSR anyway. For example, in a grid search over a sparse matrix, each grid point will convert your X anyway (to either CSC or CSR).
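
For reference, the step that forces the CSR copy looks roughly like this (a sketch, not the exact code in this PR; tree_ stands in for a fitted tree):

X_csr = X.tocsr()            # one-time conversion during fit
leaves = tree_.apply(X_csr)  # terminal leaf index for every training sample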

@pprett pprett changed the title from "Sparse input support for Gradient Boosting" to "[MRG] Sparse input support for Gradient Boosting" on Nov 26, 2014

arjoly commented Nov 27, 2014

Making an apply for CSC matrices could be an option. It would require coding the same algorithm as for the depth-first tree builder.

    Each estimator in the stage is scaled by ``scale`` before
    its prediction is added to ``out``.
    """
    return predict_stages_sparse(estimators[stage:stage + 1], X, scale, out)
Member

Why introduce the pair of functions predict_stage_sparse / predict_stage_dense instead of checking sp.issparse(X) to dispatch to the right predict_stages_xxx method?
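
For illustration, the suggested dispatch might look like this (a sketch; only the predict_stages_sparse / predict_stages_dense names come from the PR):

import scipy.sparse as sp

def predict_stages(estimators, X, scale, out):
    # route to the sparse or dense implementation based on the input type
    if sp.issparse(X):
        return predict_stages_sparse(estimators, X, scale, out)
    return predict_stages_dense(estimators, X, scale, out)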

Member Author

check


ogrisel commented Nov 27, 2014

I think it's ok to hold data under both CSR and CSC during fit for now. +1 for merge on my side.


pprett commented Nov 27, 2014

I forgot to push the commits addressing the review.

@mblondel

Making an apply for CSC matrices could be an option. It would require coding the same algorithm as for the depth-first tree builder.

I haven't thought about it deeply but it should be possible to implement apply directly on a CSC matrix using a binary search to retrieve feature j in sample i.
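
A sketch of that lookup, assuming sorted row indices within each column (csc_value is a hypothetical helper, not code from this PR):

import numpy as np

def csc_value(X, i, j):
    # explicit entries of column j live in X.data[start:end], with
    # their row indices sorted in X.indices[start:end]
    start, end = X.indptr[j], X.indptr[j + 1]
    k = start + np.searchsorted(X.indices[start:end], i)
    if k < end and X.indices[k] == i:
        return X.data[k]
    return 0.0  # implicit zero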


ogrisel commented Nov 28, 2014

I haven't thought about it deeply but it should be possible to implement apply directly on a CSC matrix using a binary search to retrieve feature j in sample i.

That would be great. However, I don't want to delay this PR too much if it's not easy to do. We could add an inline comment where X_apply is pre-computed, noting that it would be better to add native CSC support to the apply method of trees to spare the in-memory copy of the input data.

@mblondel

I don't oppose merging but CSC support would be nice to have before the next release :)

@@ -20,6 +21,16 @@
from .gradient_boosting import BaseGradientBoosting


def _csc_col_minmax(X):
Member

Why not use min_max_axis from sklearn.utils.sparsefuncs?

Member Author

Good point -- I'll check min_max_axis. I still need a utility function, though, because I need the same return value as mquantile.
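
For reference, a sketch of the suggested utility (assuming X is a CSR or CSC matrix; min_max_axis returns both extrema in one pass):

from sklearn.utils.sparsefuncs import min_max_axis

# per-feature minima and maxima, computed without densifying X
mins, maxs = min_max_axis(X, axis=0)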


arjoly commented Jan 19, 2015

I don't oppose merging but CSC support would be nice to have before the next release :)

I have just implemented a CSC version of apply (see this branch). Unfortunately, it is considerably slower than the CSR implementation (benchmark here).

@mblondel

@arjoly Any clue why this is so slow? This seems like an algorithmic issue.


amueller commented Jun 8, 2015

@pprett didn't you say at the sprint this was merged? hum :-/


arjoly commented Sep 29, 2015

Done and superseded by #5252, thanks to @jmschrei.

@arjoly arjoly closed this Sep 29, 2015