-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
BUG: fix KD Tree node construction #11556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
do you have a tiny bench script to share? |
No, I could create one if it would be useful |
it would be great thanks
|
Here's a quick test: from sklearn.datasets import make_blobs
from sklearn.neighbors import KDTree
import sklearn
print(sklearn.__version__)
X, y = make_blobs(n_samples=100000, n_features=3)
X1 = X[:50000]
X2 = X[50000:]
%timeit KDTree(X1).query(X2, 1) Current release:
On this branch:
|
Exploring a bit more... the interesting thing is that the use of the correct radius does not seem to appreciably impact the query time... as currently written, it ends up being essentially a proxy for number of points in the node. We could probably explicitly rank query nodes by number of points rather than by radius and marginally improve both memory use (from not having to store each node's radius) as well as tree construction time (from not having to compute each node's radius), without an appreciable impact on query time in most cases... but that's probably for another PR. from sklearn.datasets import make_blobs
from sklearn.neighbors import KDTree
import sklearn
print(sklearn.__version__)
X, y = make_blobs(n_samples=100000, n_features=3)
X1 = X[:50000]
X2 = X[50000:]
%timeit KDTree(X1)
tree = KDTree(X1)
%timeit tree.query(X2, 5)
|
Appveyor had failed. I've restarted. |
@jakevdp can you just add a what's new entry? thx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks a lot for this fix @jakevdp !
I'm not sure if it's worth waiting for Appveyor or merge as it is, given the issues we have been having lately with it.
This PR is soon to be processed by appveyor, let's wait a bit. |
can someone restart appveyor here? |
rebased to restart CIs |
Thanks a lot for this fix @jakevdp ! Appveyor passed, merging. BTW, if you have an opinion on #11103 and #597 (comment) it would be very much appreciated. |
Fixes #11313
The issue is in pre-computation of KD Tree node radii, which are used during querying to rank the queue of nodes to be split & explored. As such, I think the main effect of this bug will be related to performance – this is why our tests didn't catch it.