-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Fix Kd-tree time complexity problem. #11103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you @jiefangxuanyan, this looks nice. Replacing a homemade partial quicksort implementation with a standard C++ function call sounds reasonable. Could you please provide some performance benchmarks both in the general case and the degenerate one you mentionned? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping @jakevdp
How does that compare with using numpy.partition? |
Sorry for my late reply. I added a simple benchmark including a general case and cases causing time complexity degeneration. All of the cases should be able to finish in around or less than 0.1 second. But the old implementation uses several seconds. |
Great ! On my desktop: # this PR
random: 0.03181624412536621
ordered: 0.024422883987426758
reverse ordered: 0.025389671325683594
duplicated: 0.022417306900024414
# master
random: 0.08892083168029785
ordered: 3.657707691192627
reverse ordered: 3.7428951263427734
duplicated: 2.4862685203552246 I am not sure we need to keep the benchmark though, since it's useful only for this PR. |
@jnothman I didn't use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree -- let's remove this benchmark script from the PR. Posting it in a comment of this PR might I think be sufficient.
Could you please provide the benchmarks results with leaf_size=40
(default)?
How does this impart the query time? Could we benchmark that as well?
Generally this approach looks reasonable to me but I have close to no experience with C++ (STL) and it would be good if this was reviewed by someone who did. Maybe @jakirkham ?
begin = time() | ||
tree = KDTree(expanded_case, leaf_size=1) | ||
end = time() | ||
del tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
garbage collector will take care of this, so, I think, we can remove it.
|
||
|
||
if __name__ == "__main__": | ||
main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just write the benchmark script directly without defining the main
function
I'll try to take a look. Going down a valgrind rabbit hole ATM. Could you please give a high level summary of this code and change? Am not too familiar with either. So would need that to give a review. |
Similar code has been merged in SciPy at scipy/scipy#11653. We amended the index comparator to ensure the partial sort is stable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sturlamolden for the reminder. @jiefangxuanyan are you able to finish this up for us? Thanks
@@ -0,0 +1,31 @@ | |||
#include "nth_element.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to write this in Cython?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::nth_element
requires a functor comparator. This is not easy to implement in Cython.
sklearn/neighbors/src/nth_element.h
Outdated
Py_intptr_t split_index, | ||
Py_intptr_t n_features, | ||
Py_intptr_t n_points); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Scipy PR (scipy/scipy#11653) put the comparator in the .h
file. Do you think there would be any benefit in this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to a template based implementation of partition_node_indices
. It can only be place in a header file.
sklearn/neighbors/nth_element.pyx
Outdated
@@ -0,0 +1,9 @@ | |||
from typedefs cimport DTYPE_t, ITYPE_t | |||
|
|||
cdef extern from "nth_element.h": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we not just define this extern directly in binary_tree.pxi
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original definition is fine to be put in binary_tree.pxi
. But in my recent changes partition_node_indices
is based on template now. The new nth_element.pyx
is actually a bridge between Cython and C++ code. Without this, I'll have to change _kd_tree.pyx
and _ball_tree.pyx
to C++ mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this, I'll have to change _kd_tree.pyx and _ball_tree.pyx to C++ mode.
Excuse my ignorance... What risks come with changing them to C++ mode?
@jnothman OK, I'll work on it this weekend. |
# Conflicts: # sklearn/neighbors/setup.py
@jnothman I struggled on this for several weekends. And I found it's not easy to write a |
1. Add "_" in the file name; 2. Make the comparator stable; 3. Use a template implementation of partition_node_indices, the original one has hard-coded types(DTYPE_t -> double, ITYPE_t -> Py_intptr_t) which would be easily broken when these type changes.
What exactly is difficult? If you decide to replace quickselect with introselect: You can find C code in NumPy (for numpy.partition) which is simple to translate to Cython. Be careful not to copy from GCC standard libaries as they are GPL (with an exception for binaries). numpy.partition and numpy.argpartition are introselect. Will using any of those be a major overhead? |
I agree, @sturlamolden, that it may be possible to more directly use numpy.argpartition or the C-API PyArray_ArgPartition here. The only problem is that we've currently got pointers which then need to be wrapped in an array. @jiefangxuanyan, we would be keen to avoid adding C++ code here if possible. Would you be able to try out using numpy's introselect implementation? |
Creating an ndarray from a pointer is easy, either class PointerWrapper:
def __init__(self, pointer, shape, dtype, order, strides):
self.__array_interface__ = dict(
data = (pointer, False),
descr = dtype.descr,
shape = shape,
strides = strides,
typestr = dtype.str,
version = 3,
)
def wrap_pointer(pointer, shape, dtype, order, strides):
return np.asarray(PointerWrapper(pointer, shape, dtype, order, strides)) Using NumPy C API ( |
Here is an improved version of introselect in C++, which is very easy to translate to Cython or C. I just looked through the C++ code and C++ features (like templates and auto) are just used for generics. It has BOOST license so it should be fine to include in Scikit-learn. https://github.com/andralex/MedianOfNinthers Corresponding paper: |
Why would we not use numpy's? |
I put together a numpy-based solution in master...jnothman:partition_node_indices @jiefangxuanyan, could you please present some benchmarks comparing (a) master; (b) your solution; (c) my Are you okay to give that a go? |
Hi @jiefangxuanyan, |
You can embed C or C++ code in a Cython file, similarly to using inline assembly in C or C++. You do not have to use an extra header file to write a C++ comparator in Cython. Only the comparator class will have to be written in C++ – and it just a tiny class (about 10 loc or so). Just throw out the extra header file and inline the C++ comparator class directly into the Cython source file. |
And this is how you do it... # distutils : language = c++
cdef extern from *:
“””
// inlined C++ code goes here
“””
|
You can see here from scipy.spatrial.cKDTree what std::nth_element will do compared to quickselect: Notice the performance improvement on the sorted dataset. |
As a first start, here are the results that I got with the script on
@jiefangxuanyan's suggestion seems to be a good proposal, though it needs to be benchmarked on more cases. I propose to supersede it, including @sturlamolden's comments if you all agree. 🙂 |
I've done a quick follow-up in #19473 with some asv benchmarks. |
Closing because this is fixed in #19473 |
Reference Issues/PRs
Fixes #7687.
What does this implement/fix? Explain your changes.
The original selection algorithm used by Scikit-learn's Kd-tree may suffer from time complexity degeneracy on some special cases. Two obvious cases are sorted inputs and duplicate inputs. Fortunately, there is a efficient and robost intro-select implementation in C++ STL which can handle our problem perfectly. So I replaced the original implementation with a wrapped call to
std::nth_element
.