
[MRG+1] Optimize criterion update by performing reverse update #5220


Merged: 1 commit merged into scikit-learn:master from fast-reverse-update on Sep 8, 2015

Conversation

@arjoly (Member) commented Sep 7, 2015

The speed-up idea is to update the left and right node statistics from end down to new_pos, instead of from pos up to new_pos, whenever that is cheaper (end - new_pos < new_pos - pos).
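A minimal pure-Python sketch of the trick (the real code is Cython in sklearn/tree/_criterion.pyx; the function name and signature here are illustrative, not the actual API):

def move_split(y, w, samples, sum_total, sum_left, pos, new_pos, end):
    """Advance the split position from pos to new_pos, updating the
    weighted sum of the left child from whichever side is shorter."""
    if (new_pos - pos) <= (end - new_pos):
        # Forward update: fold samples[pos:new_pos] into the left child.
        for p in range(pos, new_pos):
            i = samples[p]
            sum_left += w[i] * y[i]
    else:
        # Reverse update: start from the node total and peel off
        # samples[new_pos:end]; cheaper when end - new_pos < new_pos - pos.
        sum_left = sum_total
        for p in range(end - 1, new_pos - 1, -1):
            i = samples[p]
            sum_left -= w[i] * y[i]
    # The right-child statistic comes for free by linearity.
    sum_right = sum_total - sum_left
    return sum_left, sum_right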

Benchmarks are coming.

# Go through each sample in the new indexing
for p in range(pos, new_pos):
    i = samples[p]
if (new_pos - pos) <= (end - new_pos):
Inline review comment (Contributor):

Could you add a comment here explaining the motivation of this trick? Why would it be more efficient to count backwards?

@arjoly (Member Author) commented Sep 7, 2015

It's faster on mnist and covertype.

# mnist master (classifier, train time, test time, error rate)
ExtraTrees                   46.54s       0.50s       0.0294
RandomForest                 41.30s       0.42s       0.0318
CART                         21.15s       0.18s       0.1219

# mnist this-pr (classifier, train time, test time, error rate)
ExtraTrees                   41.98s       0.49s       0.0294
RandomForest                 39.61s       0.42s       0.0318
CART                         20.23s       0.22s       0.1219

# covertype master (classifier, train time, test time, error rate)
RandomForest  65.3439s   0.3412s     0.0330  
ExtraTrees    43.6079s   0.5208s     0.0372  
CART          14.4965s   0.0763s     0.0424  
GBRT         491.9427s   0.2677s     0.1777  

# covertype this-pr (classifier, train time, test time, error rate)
RandomForest  44.7000s   0.3477s     0.0330  
ExtraTrees    33.8317s   0.5236s     0.0372  
CART          11.7058s   0.0702s     0.0424  
GBRT         435.0230s   0.2712s     0.1777  

@arjoly (Member Author) commented Sep 7, 2015

Here are the results on 20 newsgroups.

# master (classifier, train time, test time, score)
random_forest     38.7310s   0.9190s     0.7645  
extra_trees       98.5542s   1.2527s     0.7965  

# this-pr (classifier, train time, test time, score)
random_forest     24.5683s   0.9274s     0.7645  
extra_trees       35.4025s   1.0758s     0.7965  

@arjoly (Member Author) commented Sep 7, 2015

Overall, this is a speed-up on all three datasets. The gain is impressive on 20 newsgroups and covertype.

@glouppe (Contributor) commented Sep 7, 2015

The gain is expected for random cuts, where it obviously makes sense to compute statistics only from the shorter side of the cut. However, how do you explain it in the case of best splits? Shouldn't it be the same, since we have to compute everything anyway? Or is it due to constant feature values in the middle of the interval? Could you benchmark master vs. this-pr on a fully random problem? (X = np.random.rand())

@@ -138,6 +138,10 @@ cdef class Criterion:

        pass

    cdef void reverse_reset(self) nogil:
        """Reset the criterion at pos=end."""
        pass
Inline review comment (Contributor):

Add "This method must be implemented by the subclass.", as we have for reset.

Inline review comment (Contributor):

Actually, I am not sure whether we should add such a method or instead pass a flag to reset. (I have no strong opinion either way. Though, flat might be better than nested.)

Inline review comment (Member Author):

I have added a separate method to avoid an if check at each call of reset.
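
For context, here is roughly what reverse_reset amounts to in a concrete criterion (a hedged sketch with illustrative attribute names, not the exact Cython code):

def reverse_reset(self):
    """Reset the criterion at pos=end: the left child holds every
    sample of the node and the right child is empty, the mirror
    image of reset(), which starts at pos=start."""
    self.pos = self.end
    self.weighted_n_left = self.weighted_n_node_samples
    self.weighted_n_right = 0.0
    for k in range(self.n_outputs):
        self.sum_left[k] = self.sum_total[k]
        self.sum_right[k] = 0.0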

@glouppe (Contributor) commented Sep 7, 2015

Apart from my comment above, +1 for merge. Nice trick :)

@arjoly (Member Author) commented Sep 7, 2015

For random forest, the gain is due to the nature of the datasets. For best splits, there is also the effect of the FEATURE_THRESHOLD constant.

On the uniform benchmark, we lose a bit of performance (about as much as when we added the min_weight_leaf parameter).

# master rand-uniform with X ~ [0, 1]^100 (fit time in seconds)
random forest = 7.957109000000002
extra trees = 2.2590310000000002
cart = 1.619597999999998

# this-pr rand-uniform with X ~ [0, 1]^100 (fit time in seconds)
random forest = 8.304107
extra trees = 2.220293
cart = 1.6509159999999987

The effect of FEATURE_THRESHOLD is particularly visible if X ~ [0, 10^-6]^100. Edit: what we see here is not so much the effect of this PR, but rather that possible splits are being skipped altogether (see the sketch after the benchmark script below).

np.random.seed(0)
n_samples = 10 ** 4
X = np.random.uniform(0., 10 ** -6, size=(n_samples, 100))
y = np.random.uniform(size=n_samples)

# master rand-uniform with X ~ [0, 10^-6]^100 (fit time in seconds)
random forest = 0.483679
extra trees = 2.519623
cart = 0.0839139999999996

# this-pr rand-uniform with X ~ [0, 10^-6]^100 (fit time in seconds)
random forest = 0.485565
extra trees = 2.413003
cart = 0.08735800000000005

# bench_random.py

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.tree import DecisionTreeRegressor
try:
    from time import process_time as bench_time
except ImportError:
    from time import time as bench_time

np.random.seed(0)
n_samples = 10 ** 4
X = np.random.uniform(size=(n_samples, 100))
y = np.random.uniform(size=n_samples)
print("# rand-uniform ")
for name, EstimatorClass in [("random forest", RandomForestRegressor),
                             ("extra trees", ExtraTreesRegressor),
                             ("cart", DecisionTreeRegressor)]:
    est = EstimatorClass(random_state=0)
    time_start = bench_time()
    est.fit(X, y)
    chrono = bench_time() - time_start
    print('%s = %s' % (name, chrono))
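
For reference, the mechanism referred to above: FEATURE_THRESHOLD is a real constant in sklearn/tree/_splitter.pyx (1e-7 at the time of this PR), and the best splitter only evaluates a split between two consecutive sorted values when they differ by more than it. The loop below is a paraphrase of that check, not the actual Cython:

import numpy as np

FEATURE_THRESHOLD = 1e-7

# Xf: the current feature's values, sorted within the node. A split
# between positions p-1 and p is only considered when the two values
# differ by more than the threshold.
Xf = np.sort(np.random.uniform(0.0, 1e-6, size=10 ** 4))
candidates = sum(1 for p in range(1, len(Xf))
                 if Xf[p] > Xf[p - 1] + FEATURE_THRESHOLD)
# With 10**4 values packed into [0, 1e-6], consecutive gaps are ~1e-10,
# so nearly every candidate split is skipped:
print(candidates)  # close to 0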

@arjoly (Member Author) commented Sep 7, 2015

@glouppe I have added the comment.

@arjoly arjoly changed the title [MRG] Optimize criterion update by performing reverse update [MRG+1] Optimize criterion update by performing reverse update Sep 7, 2015
…rion

The idea is to take into account the linear properties of the impurity
between the prospective left, right, and parent splits.
@arjoly arjoly force-pushed the fast-reverse-update branch from e475dac to 251fecc Compare September 7, 2015 12:02
@arjoly (Member Author) commented Sep 7, 2015

I have added a what's new entry and squashed everything.

@jmschrei (Member) commented Sep 7, 2015

I think the majority of the speed gain comes from features with many constant values (such as sparse data). The 20 newsgroups result makes sense to me, because if you have 100 values, of which the first 90 are 0, it's faster to compute the sufficient statistics backwards from 100 to 90 than forwards from 0 to 90 to find a split.
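
A toy illustration of the point, with hypothetical numbers:

# A feature column whose first 90 sorted values are 0: the only
# interesting split positions sit between index 90 and 100.
pos, new_pos, end = 0, 90, 100
forward_steps = new_pos - pos   # 90 updates, walking from the left
reverse_steps = end - new_pos   # 10 updates, walking back from the end
assert reverse_steps < forward_steps  # the reverse update wins here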

@jmschrei (Member) commented Sep 7, 2015

I can also see why it may be slower on random dense data. There is a cost to doing the split, and it seems likely that this path will be taken towards the end of many splits. Caching w, yw, and yw**2 from previous splits seems like a good complement to this optimization, as the "reset" won't actually cost anything other than the update step.
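
A hedged sketch of the caching idea (not part of this PR; the class and names are hypothetical):

class NodeStats(object):
    """Cache the weighted sufficient statistics of a node once, so a
    later reset or reverse_reset only copies them instead of looping
    over the node's samples again."""
    def __init__(self, y, w):
        self.w = sum(w)                                       # total weight
        self.yw = sum(yi * wi for yi, wi in zip(y, w))        # weighted sum
        self.yyw = sum(yi * yi * wi for yi, wi in zip(y, w))  # weighted sum of squares

With these cached, a reset at pos=start or a reverse_reset at pos=end would cost a copy of three numbers per output instead of a pass over the node's samples.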

@arjoly (Member Author) commented Sep 7, 2015

> I think the majority of the speed gain comes from features with many constant values (such as sparse data).

This PR should always speed up extra trees.

@arjoly (Member Author) commented Sep 7, 2015

> Caching w, yw, and yw**2 from previous splits seems like a good complement to this optimization

+1

@arjoly (Member Author) commented Sep 7, 2015

@jmschrei Do you give your +1?

@jmschrei (Member) commented Sep 7, 2015

Yes, this looks good. I didn't realize I could give a +1.

@glouppe (Contributor) commented Sep 8, 2015

Thanks for the review @jmschrei :) Merging this one.

glouppe added a commit that referenced this pull request Sep 8, 2015
[MRG+1] Optimize criterion update by performing reverse update
@glouppe glouppe merged commit 59a1bdd into scikit-learn:master Sep 8, 2015
@arjoly (Member Author) commented Sep 8, 2015

Great! 🍻

@jnothman (Member) commented Sep 8, 2015

> Yes, this looks good. I didn't realize I could give a +1.

Free speech! ... only a question of whether anyone will listen :)


@mblondel (Member) commented Sep 8, 2015

Very nice.

@arjoly arjoly deleted the fast-reverse-update branch September 8, 2015 07:54