Skip to content

Fix Kd-tree time complexity problem. #11103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jiefangxuanyan
Copy link
Contributor

Reference Issues/PRs

Fixes #7687.

What does this implement/fix? Explain your changes.

The original selection algorithm used by Scikit-learn's Kd-tree may suffer from time complexity degeneracy on some special cases. Two obvious cases are sorted inputs and duplicate inputs. Fortunately, there is a efficient and robost intro-select implementation in C++ STL which can handle our problem perfectly. So I replaced the original implementation with a wrapped call to std::nth_element.

@rth
Copy link
Member

rth commented May 17, 2018

Thank you @jiefangxuanyan, this looks nice.

Replacing a homemade partial quicksort implementation with a standard C++ function call sounds reasonable.

Could you please provide some performance benchmarks both in the general case and the degenerate one you mentionned?

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping @jakevdp

@jnothman
Copy link
Member

How does that compare with using numpy.partition?

@jiefangxuanyan
Copy link
Contributor Author

Sorry for my late reply. I added a simple benchmark including a general case and cases causing time complexity degeneration. All of the cases should be able to finish in around or less than 0.1 second. But the old implementation uses several seconds.

@TomDLT
Copy link
Member

TomDLT commented Jun 8, 2018

Great ! On my desktop:

# this PR
random: 0.03181624412536621
ordered: 0.024422883987426758
reverse ordered: 0.025389671325683594
duplicated: 0.022417306900024414

# master
random: 0.08892083168029785
ordered: 3.657707691192627
reverse ordered: 3.7428951263427734
duplicated: 2.4862685203552246

I am not sure we need to keep the benchmark though, since it's useful only for this PR.

@jiefangxuanyan
Copy link
Contributor Author

@jnothman I didn't use numpy.partition because it doesn't support partitioning an array of index. There are two ways to solve this. One is to gather values from the data array with indices, use np.argpartition, then apply a permutation composition to reorder the indices. The other is to rewrite the tree building function entirely. But I think these two ways are all too complex.

@jiefangxuanyan
Copy link
Contributor Author

@TomDLT
@rth asked me to write a benchmark for this. If it's not useful for further development I can also exclude it from my commit. For my personal opinion I think it's good to verify our time complexity in case of some one else want's to modify these code later.

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree -- let's remove this benchmark script from the PR. Posting it in a comment of this PR might I think be sufficient.

Could you please provide the benchmarks results with leaf_size=40 (default)?

How does this impart the query time? Could we benchmark that as well?

Generally this approach looks reasonable to me but I have close to no experience with C++ (STL) and it would be good if this was reviewed by someone who did. Maybe @jakirkham ?

begin = time()
tree = KDTree(expanded_case, leaf_size=1)
end = time()
del tree
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

garbage collector will take care of this, so, I think, we can remove it.



if __name__ == "__main__":
main()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just write the benchmark script directly without defining the main function

@jakirkham
Copy link
Contributor

I'll try to take a look. Going down a valgrind rabbit hole ATM.

Could you please give a high level summary of this code and change? Am not too familiar with either. So would need that to give a review.

@sturlamolden
Copy link
Contributor

sturlamolden commented Mar 15, 2020

Similar code has been merged in SciPy at scipy/scipy#11653. We amended the index comparator to ensure the partial sort is stable.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sturlamolden for the reminder. @jiefangxuanyan are you able to finish this up for us? Thanks

@@ -0,0 +1,31 @@
#include "nth_element.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to write this in Cython?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::nth_element requires a functor comparator. This is not easy to implement in Cython.

Py_intptr_t split_index,
Py_intptr_t n_features,
Py_intptr_t n_points);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Scipy PR (scipy/scipy#11653) put the comparator in the .h file. Do you think there would be any benefit in this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to a template based implementation of partition_node_indices. It can only be place in a header file.

@@ -0,0 +1,9 @@
from typedefs cimport DTYPE_t, ITYPE_t

cdef extern from "nth_element.h":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not just define this extern directly in binary_tree.pxi?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original definition is fine to be put in binary_tree.pxi. But in my recent changes partition_node_indices is based on template now. The new nth_element.pyx is actually a bridge between Cython and C++ code. Without this, I'll have to change _kd_tree.pyx and _ball_tree.pyx to C++ mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, I'll have to change _kd_tree.pyx and _ball_tree.pyx to C++ mode.

Excuse my ignorance... What risks come with changing them to C++ mode?

@jiefangxuanyan
Copy link
Contributor Author

@jnothman OK, I'll work on it this weekend.

# Conflicts:
#	sklearn/neighbors/setup.py
@jiefangxuanyan
Copy link
Contributor Author

@jnothman I struggled on this for several weekends. And I found it's not easy to write a std::nth_element compatible comparator with Cython. Maybe I have to write it with raw C++ or re-write a Cython version of std::nth_element by myself. How do you think about it?

1. Add "_" in the file name;
2. Make the comparator stable;
3. Use a template implementation of partition_node_indices, the original one has hard-coded types(DTYPE_t -> double, ITYPE_t -> Py_intptr_t) which would be easily broken when these type changes.
@sturlamolden
Copy link
Contributor

What exactly is difficult?

If you decide to replace quickselect with introselect: You can find C code in NumPy (for numpy.partition) which is simple to translate to Cython. Be careful not to copy from GCC standard libaries as they are GPL (with an exception for binaries).

numpy.partition and numpy.argpartition are introselect. Will using any of those be a major overhead?

@sturlamolden
Copy link
Contributor

@jnothman
Copy link
Member

I agree, @sturlamolden, that it may be possible to more directly use numpy.argpartition or the C-API PyArray_ArgPartition here. The only problem is that we've currently got pointers which then need to be wrapped in an array.

@jiefangxuanyan, we would be keen to avoid adding C++ code here if possible. Would you be able to try out using numpy's introselect implementation?

@sturlamolden
Copy link
Contributor

sturlamolden commented Apr 19, 2020

Creating an ndarray from a pointer is easy, either PyArray_SimpleNewFromData or using some Python (after creating a Python int from the data adress, after casting it to void* and then to Py_uintptr_t in Cython).

class PointerWrapper:
    def __init__(self, pointer, shape, dtype, order, strides):
        self.__array_interface__ = dict(
            data = (pointer, False),
            descr = dtype.descr,
            shape = shape,
            strides = strides,
            typestr = dtype.str,
            version = 3,
        )

def wrap_pointer(pointer, shape, dtype, order, strides):
    return np.asarray(PointerWrapper(pointer, shape, dtype, order, strides))

Using NumPy C API (PyArray_SimpleNewFromData) has smaller overhead though.

@sturlamolden
Copy link
Contributor

Here is an improved version of introselect in C++, which is very easy to translate to Cython or C. I just looked through the C++ code and C++ features (like templates and auto) are just used for generics. It has BOOST license so it should be fine to include in Scikit-learn.

https://github.com/andralex/MedianOfNinthers

Corresponding paper:
http://erdani.com/research/sea2017.pdf

@jnothman
Copy link
Member

Why would we not use numpy's?

@jnothman
Copy link
Member

I put together a numpy-based solution in master...jnothman:partition_node_indices

@jiefangxuanyan, could you please present some benchmarks comparing (a) master; (b) your solution; (c) my partition_node_indices. It should compare timing for ordinary cases, as well as the special cases you describe where a sort was costly.

Are you okay to give that a go?

Base automatically changed from master to main January 22, 2021 10:50
@jjerphan
Copy link
Member

Hi @jiefangxuanyan,
Are you willing to continue to work on this PR or can I supersede it? 🙂

@sturlamolden
Copy link
Contributor

@jnothman I struggled on this for several weekends. And I found it's not easy to write a std::nth_element compatible comparator with Cython. Maybe I have to write it with raw C++ or re-write a Cython version of std::nth_element by myself. How do you think about it?

@jiefangxuanyan, we would be keen to avoid adding C++ code here if possible. Would you be able to try out using numpy's introselect implementation?

You can embed C or C++ code in a Cython file, similarly to using inline assembly in C or C++. You do not have to use an extra header file to write a C++ comparator in Cython. Only the comparator class will have to be written in C++ – and it just a tiny class (about 10 loc or so). Just throw out the extra header file and inline the C++ comparator class directly into the Cython source file.

@sturlamolden
Copy link
Contributor

And this is how you do it...

# distutils : language = c++

cdef extern from *:
    “””
    // inlined C++ code goes here
    “””

@sturlamolden
Copy link
Contributor

You can see here from scipy.spatrial.cKDTree what std::nth_element will do compared to quickselect:

scipy/scipy#11679

Notice the performance improvement on the sorted dataset.

@jjerphan
Copy link
Member

I put together a numpy-based solution in master...jnothman:partition_node_indices

@jiefangxuanyan, could you please present some benchmarks comparing (a) master; (b) your solution; (c) my partition_node_indices. It should compare timing for ordinary cases, as well as the special cases you describe where a sort was costly.

Are you okay to give that a go?

As a first start, here are the results that I got with the script on main's current state:

random: 0.06900787353515625s
ordered: 3.3418073654174805s
reverse ordered: 3.6653695106506348s
duplicated: 2.4857819080352783s
  • on partition_node_indices after merging main (5f2633b)
random: 0.24102473258972168s
ordered: 0.23180484771728516s
reverse ordered: 0.23183298110961914s
duplicated: 0.22904562950134277s
  • on cpp-nth-element after merging main (d2f0f37)
random: 0.03429245948791504s
ordered: 0.020520448684692383s
reverse ordered: 0.017237424850463867s
duplicated: 0.012929916381835938s

@jiefangxuanyan's suggestion seems to be a good proposal, though it needs to be benchmarked on more cases.

I propose to supersede it, including @sturlamolden's comments if you all agree. 🙂

@jjerphan
Copy link
Member

I've done a quick follow-up in #19473 with some asv benchmarks.

@thomasjpfan
Copy link
Member

Closing because this is fixed in #19473

@thomasjpfan thomasjpfan closed this Apr 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sklearn.neighbors.KDTree complexity for building is not O(n(k+log(n))
9 participants