[MRG+1] Optimize criterion update by performing reverse update #5220
Conversation
# Go through each sample in the new indexing
for p in range(pos, new_pos):
    i = samples[p]
if (new_pos - pos) <= (end - new_pos):
Could you add a comment here explaining the motivation of this trick? Why would it be more efficient to count backwards?
It's faster on mnist and covertype.
Here are the results on 20 newsgroups.
Overall speed-up over the 3 datasets. The gain is impressive on 20 newsgroups and covertype.
The gain is expected for random cuts, where obviously it makes sense to compute statistics only from the shorter side of the cut. However, how do you explain this in the case of best splits? Shouldn't it be the same, since we have to compute everything anyway? Or is it due to constant feature values in the middle of the interval? Could you benchmark master vs this PR for a fully random problem?
@@ -138,6 +138,10 @@ cdef class Criterion:

        pass

    cdef void reverse_reset(self) nogil:
        """Reset the criterion at pos=end."""
        pass
Add "This method must be implemented by the subclass.", as we have for reset.
Actually, I am not sure whether we should add such a method or pass a flag to reset instead. (I have no strong opinion one way or the other. Though, flat might be better than nested.)
I have added a method to avoid the if at each call of reset.
Except for my comment above, +1 for merge. Nice trick :)
The gain is due to the nature of the datasets for random forest. For the best split, there is also the effect of FEATURE_THRESHOLD. On the uniform benchmark, we lose a bit of performance (as much as when we added the min_weight_leaf parameter).
The effect of FEATURE_THRESHOLD is particularly visible if X ~ [0, 10^-6]^100. Edit: but we don't see the effect of this PR much there; it is more that we are skipping possible splits altogether.
# bench_random.py
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.tree import DecisionTreeRegressor

try:
    from time import process_time as bench_time
except ImportError:
    from time import time as bench_time

np.random.seed(0)
n_samples = 10 ** 4
X = np.random.uniform(size=(n_samples, 100))
y = np.random.uniform(size=n_samples)

print("# rand-uniform")
for name, EstimatorClass in [("random forest", RandomForestRegressor),
                             ("extra trees", ExtraTreesRegressor),
                             ("cart", DecisionTreeRegressor)]:
    est = EstimatorClass(random_state=0)
    time_start = bench_time()
    est.fit(X, y)
    chrono = bench_time() - time_start
    print("%s = %s" % (name, chrono))
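To exercise the X ~ [0, 1e-6]^100 case mentioned above, one possible variant of the benchmark (my own sketch, not part of the original script) is to shrink the feature range so that many adjacent values fall within FEATURE_THRESHOLD:

```python
# Variant of bench_random.py: scale the features down so that many candidate
# splits are skipped because adjacent values differ by less than FEATURE_THRESHOLD.
import numpy as np

np.random.seed(0)
n_samples = 10 ** 4
X = np.random.uniform(size=(n_samples, 100)) * 1e-6   # X ~ [0, 1e-6]^100
y = np.random.uniform(size=n_samples)
# ...then fit the same estimators as in the script above.
```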
@glouppe I have added the comment.
…rion The idea is to take into account the linear properties of the impurity between the prospective left, right and parent splits.
e475dac to 251fecc
I have added a what's new entry and squashed everything.
I think the majority of the speed gain comes from when you have a lot of constants in a feature (such as sparse data). The 20 newsgroups result makes sense to me, because you can imagine that if you have 100 values, of which the first 90 are 0, it's faster to calculate sufficient statistics backwards from 100 to 90 rather than forwards from 0 to 90 to find a split.
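As a quick worked count of that example (a sketch of the bookkeeping only, with positions chosen to match the comment above):

```python
# 100 sorted values for one feature, the first 90 of which are 0.
# Moving the split position from pos=0 to new_pos=90:
pos, new_pos, end = 0, 90, 100
forward_work = new_pos - pos   # 90 samples touched when accumulating forwards
reverse_work = end - new_pos   # 10 samples touched when starting from the end
print(forward_work, reverse_work)  # 90 10
```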
I can also see why it may be slower on random dense data. There is a cost to doing the split, and it seems likely that this will be called towards the end of many splits. Caching w, yw, and yw**2 from previous splits seems like a good complement to this optimization, as the "reset" won't actually cost anything other than the update step.
This PR should always speed up extra trees.
+1
@jmschrei Do you give your +1?
Yes, this looks good. I didn't realize I could give a +1.
Thanks for the review @jmschrei :) Merging this one.
Great! 🍻
Free speech! ... only a question of whether anyone will listen :)
Very nice.
The speed-up idea is to update the left and right node statistics from end to new_pos instead of from pos to new_pos when it is beneficial (end - new_pos < new_pos - pos).
Benchmarks are coming.
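To make the idea concrete, here is a minimal pure-Python sketch of the update step; the names (sum_left, sum_total, etc.) are illustrative, and the real code operates on weighted per-output statistics in Cython:

```python
# Sketch of the reverse-update trick for a single accumulated statistic.
# Invariant: sum_left holds the statistic for samples[start:pos],
# and sum_left + sum_right == sum_total for the whole node.

def update(y, samples, pos, new_pos, end, sum_left, sum_total):
    """Move the split position from pos to new_pos, touching the fewest samples."""
    if (new_pos - pos) <= (end - new_pos):
        # Forward update: add the samples entering the left partition.
        for p in range(pos, new_pos):
            sum_left += y[samples[p]]
    else:
        # Reverse update: reset the left statistics at pos=end
        # (the reverse_reset idea) and subtract the samples that stay right.
        sum_left = sum_total
        for p in range(new_pos, end):
            sum_left -= y[samples[p]]
    sum_right = sum_total - sum_left
    return sum_left, sum_right
```

Both branches end with the same statistics; the reverse branch simply walks end - new_pos samples instead of new_pos - pos.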