Fix Kd-tree time complexity problem. #11103

jiefangxuanyan · 2018-05-17T09:03:47Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

The selection algorithm currently used by scikit-learn's Kd-tree can degenerate in time complexity on some special inputs. Two obvious cases are sorted inputs and duplicate inputs. Fortunately, the C++ STL provides an efficient and robust introselect implementation that handles this problem well, so I replaced the original implementation with a wrapped call to std::nth_element.
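To illustrate the kind of degeneration described above, here is a minimal sketch that counts comparisons in a simplified quickselect with a last-element pivot. It is an assumption that this resembles the hand-written selection being replaced; the function name `quickselect_comparisons` and the input sizes are hypothetical and chosen only for illustration.

```python
import random


def quickselect_comparisons(values, k):
    """Place the k-th smallest value at position k using a last-element
    pivot, and return the number of comparisons performed.

    Simplified illustration only, not the scikit-learn implementation.
    """
    a = list(values)
    left, right = 0, len(a) - 1
    comparisons = 0
    while True:
        mid = left
        pivot = a[right]
        # Lomuto-style partition of a[left:right] around the pivot.
        for i in range(left, right):
            comparisons += 1
            if a[i] < pivot:
                a[i], a[mid] = a[mid], a[i]
                mid += 1
        a[mid], a[right] = a[right], a[mid]
        if mid == k:
            return comparisons
        if mid < k:
            left = mid + 1
        else:
            right = mid - 1


n = 2000
rng = random.Random(0)
random_case = [rng.random() for _ in range(n)]
cases = {
    "random": random_case,
    "ordered": sorted(random_case),
    "duplicated": [1.0] * n,
}
counts = {name: quickselect_comparisons(case, n // 2) for name, case in cases.items()}
for name, count in counts.items():
    print(f"{name}: {count} comparisons")
```

On sorted or all-equal input, each pass of this sketch shrinks the active range by a single element, so the comparison count grows quadratically with the number of points instead of linearly.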

rth · 2018-05-17T12:06:33Z

Thank you @jiefangxuanyan, this looks nice.

Replacing a homemade partial quicksort implementation with a standard C++ function call sounds reasonable.

Could you please provide some performance benchmarks, both for the general case and for the degenerate ones you mentioned?

jnothman

Ping @jakevdp

jnothman · 2018-05-17T22:21:40Z

How does that compare with using numpy.partition?


jiefangxuanyan · 2018-06-08T09:01:03Z

Sorry for the late reply. I added a simple benchmark that includes a general case and cases that trigger the time complexity degeneration. Every case should finish in about 0.1 second or less, but the old implementation takes several seconds on the degenerate ones.
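The benchmark script itself is not reproduced in this thread, so the following is only a sketch of how such cases could be generated and timed. The case names follow the output reported in this PR; the helper names `make_cases` and `time_build`, the array sizes, and the exact data are assumptions.

```python
import time

import numpy as np


def make_cases(n_points=10_000, n_features=3, seed=0):
    """Build one general case and three potentially degenerate ones."""
    rng = np.random.default_rng(seed)
    column = np.arange(n_points, dtype=np.float64)
    # Every feature column is the same increasing sequence.
    ordered = np.repeat(column[:, None], n_features, axis=1)
    return {
        "random": rng.random((n_points, n_features)),
        "ordered": ordered,
        "reverse ordered": ordered[::-1].copy(),
        "duplicated": np.zeros((n_points, n_features)),
    }


def time_build(build, cases):
    """Time `build(X)` for every case and return seconds per case."""
    timings = {}
    for name, X in cases.items():
        begin = time.perf_counter()
        build(X)
        timings[name] = time.perf_counter() - begin
    return timings
```

With scikit-learn installed, something like `time_build(lambda X: KDTree(X, leaf_size=1), make_cases())` after `from sklearn.neighbors import KDTree` would time tree construction on each case.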

TomDLT · 2018-06-08T09:27:43Z

Great! On my desktop:

# this PR
random: 0.03181624412536621
ordered: 0.024422883987426758
reverse ordered: 0.025389671325683594
duplicated: 0.022417306900024414

# master
random: 0.08892083168029785
ordered: 3.657707691192627
reverse ordered: 3.7428951263427734
duplicated: 2.4862685203552246

I am not sure we need to keep the benchmark though, since it's useful only for this PR.

jiefangxuanyan · 2018-07-18T05:59:30Z

@jnothman I didn't use numpy.partition because it doesn't support partitioning an array of indices by the values they refer to. There are two ways around this. One is to gather the values from the data array using the indices, call np.argpartition, then compose the resulting permutation with the indices to reorder them. The other is to rewrite the tree-building function entirely. I think both approaches are too complex.
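For reference, the first approach (gather, argpartition, compose) can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the code in this PR: the function name `partition_indices_with_argpartition` is hypothetical, and it works on NumPy arrays rather than on the raw pointers used in the tree code.

```python
import numpy as np


def partition_indices_with_argpartition(data, node_indices, split_dim, split_index):
    """Reorder `node_indices` in place so that the point at position
    `split_index` is in its sorted position along `split_dim`."""
    # 1. Gather the values the indices refer to.
    values = data[node_indices, split_dim]
    # 2. Partition positions within `node_indices` (introselect by default).
    order = np.argpartition(values, split_index)
    # 3. Compose the permutation with the original indices.
    node_indices[:] = node_indices[order]
    return node_indices


rng = np.random.default_rng(0)
data = rng.random((20, 3))
node_indices = rng.permutation(20)
partition_indices_with_argpartition(data, node_indices, split_dim=1, split_index=10)
```

The gather step allocates a temporary array per call, which is one plausible source of overhead compared with selecting directly on the index array.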

jiefangxuanyan · 2018-07-18T06:05:34Z

@TomDLT
@rth asked me to write a benchmark for this. If it's not useful for further development, I can remove it from my commit. Personally, I think it's good to have a way to verify the time complexity in case someone else wants to modify this code later.

rth

I agree -- let's remove this benchmark script from the PR. Posting it in a comment on this PR should, I think, be sufficient.

Could you please provide the benchmarks results with leaf_size=40 (default)?

How does this impact the query time? Could we benchmark that as well?

Generally this approach looks reasonable to me but I have close to no experience with C++ (STL) and it would be good if this was reviewed by someone who did. Maybe @jakirkham ?

rth · 2018-07-19T00:37:44Z

benchmarks/bench_binary_tree_partition.py

+        begin = time()
+        tree = KDTree(expanded_case, leaf_size=1)
+        end = time()
+        del tree


The garbage collector will take care of this, so I think we can remove it.

rth · 2018-07-19T00:38:43Z

benchmarks/bench_binary_tree_partition.py

+
+
+if __name__ == "__main__":
+    main()


I would just write the benchmark script directly without defining the main function

jakirkham · 2018-07-19T17:02:02Z

I'll try to take a look. Going down a valgrind rabbit hole ATM.

Could you please give a high level summary of this code and change? Am not too familiar with either. So would need that to give a review.

sturlamolden · 2020-03-15T14:34:57Z

Similar code has been merged in SciPy at scipy/scipy#11653. We amended the index comparator to ensure the partial sort is stable.
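As an illustration of what a stable index comparator means here, the sketch below breaks ties on the index itself, so points with equal coordinates always end up in the same relative order. It is written in Python for brevity and is not the SciPy code; `make_index_comparator` and the sample data are hypothetical.

```python
from functools import cmp_to_key


def make_index_comparator(data, split_dim):
    """Compare two point indices by coordinate, then by index on ties."""

    def compare(a, b):
        va, vb = data[a][split_dim], data[b][split_dim]
        if va == vb:
            # Tie-break on the index so the ordering is deterministic.
            return (a > b) - (a < b)
        return -1 if va < vb else 1

    return compare


data = [[1.0, 5.0], [1.0, 3.0], [0.5, 9.0], [1.0, 7.0]]
indices = [3, 1, 2, 0]
ordered = sorted(indices, key=cmp_to_key(make_index_comparator(data, 0)))
print(ordered)  # [2, 0, 1, 3]
```

Without the tie-break, the three points sharing the coordinate 1.0 could come out in any order, depending on the selection algorithm and the initial arrangement.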

jnothman

Thanks @sturlamolden for the reminder. @jiefangxuanyan are you able to finish this up for us? Thanks

jnothman · 2018-06-09T11:10:49Z

sklearn/neighbors/src/nth_element.cpp

@@ -0,0 +1,31 @@
+#include "nth_element.h"


Is it possible to write this in Cython?

std::nth_element requires a functor comparator. This is not easy to implement in Cython.

jnothman · 2020-03-18T13:31:38Z

sklearn/neighbors/src/nth_element.h

+                           Py_intptr_t split_index,
+                           Py_intptr_t n_features,
+                           Py_intptr_t n_points);
+


The Scipy PR (scipy/scipy#11653) put the comparator in the .h file. Do you think there would be any benefit in this?

I switched to a template-based implementation of partition_node_indices. It can only be placed in a header file.

jnothman · 2020-03-18T13:33:19Z

sklearn/neighbors/nth_element.pyx

@@ -0,0 +1,9 @@
+from typedefs cimport DTYPE_t, ITYPE_t
+
+cdef extern from "nth_element.h":


Why do we not just define this extern directly in binary_tree.pxi?

My original definition would be fine to put in binary_tree.pxi. But in my recent changes, partition_node_indices is template-based. The new nth_element.pyx is actually a bridge between the Cython and C++ code. Without it, I'd have to switch _kd_tree.pyx and _ball_tree.pyx to C++ mode.

> Without this, I'll have to change _kd_tree.pyx and _ball_tree.pyx to C++ mode.

Excuse my ignorance... What risks come with changing them to C++ mode?

jiefangxuanyan · 2020-03-18T15:27:52Z

@jnothman OK, I'll work on it this weekend.

# Conflicts: # sklearn/neighbors/setup.py

jiefangxuanyan · 2020-04-11T10:43:05Z

@jnothman I struggled with this for several weekends, and I found it's not easy to write a std::nth_element-compatible comparator in Cython. I may have to write it in raw C++ or rewrite a Cython version of std::nth_element myself. What do you think?

1. Add "_" to the file name; 2. Make the comparator stable; 3. Use a template implementation of partition_node_indices; the original one has hard-coded types (DTYPE_t -> double, ITYPE_t -> Py_intptr_t), which would easily break if these types change.

sturlamolden · 2020-04-11T17:00:49Z

What exactly is difficult?

If you decide to replace quickselect with introselect: you can find C code in NumPy (for numpy.partition) that is simple to translate to Cython. Be careful not to copy from the GCC standard libraries, as they are GPL (with an exception for binaries).

numpy.partition and numpy.argpartition use introselect. Would using either of them add major overhead?

sturlamolden · 2020-04-11T17:18:32Z

https://github.com/numpy/numpy/blob/master/numpy/core/src/npysort/selection.c.src

jnothman · 2020-04-19T14:48:55Z

I agree, @sturlamolden, that it may be possible to more directly use numpy.argpartition or the C-API PyArray_ArgPartition here. The only problem is that we've currently got pointers which then need to be wrapped in an array.

@jiefangxuanyan, we would be keen to avoid adding C++ code here if possible. Would you be able to try out using numpy's introselect implementation?

sturlamolden · 2020-04-19T22:16:20Z

Creating an ndarray from a pointer is easy, either with PyArray_SimpleNewFromData or with some Python (after creating a Python int from the data address by casting it to void* and then to Py_uintptr_t in Cython).

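import numpy as np

class PointerWrapper:
    """Expose raw memory to NumPy through the array interface protocol."""
    def __init__(self, pointer, shape, dtype, order, strides):
        # `pointer` is the data address as a Python int and `dtype` is a
        # numpy.dtype instance. `order` is unused: the memory layout is
        # fully described by `strides`.
        self.__array_interface__ = dict(
            data = (pointer, False),  # (address, read_only)
            descr = dtype.descr,
            shape = shape,
            strides = strides,
            typestr = dtype.str,
            version = 3,
        )

def wrap_pointer(pointer, shape, dtype, order, strides):
    # No copy is made; the caller must keep the underlying memory alive.
    return np.asarray(PointerWrapper(pointer, shape, dtype, order, strides))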

Using NumPy C API (PyArray_SimpleNewFromData) has smaller overhead though.
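As a quick check of the pure-Python route, here is a small self-contained sketch in which a ctypes buffer stands in for memory owned by C code. The class name `_PointerView` and the buffer are hypothetical; only the `__array_interface__` protocol itself comes from NumPy.

```python
import ctypes

import numpy as np


class _PointerView:
    """Minimal array-interface wrapper around a raw address (illustrative)."""

    def __init__(self, address, shape, dtype):
        self.__array_interface__ = {
            "data": (address, False),  # (address, read_only)
            "shape": shape,
            "typestr": np.dtype(dtype).str,
            "version": 3,
        }


# A ctypes array plays the role of a C buffer; it must outlive the view.
buf = (ctypes.c_double * 6)(0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
address = ctypes.addressof(buf)

view = np.asarray(_PointerView(address, (2, 3), np.float64))
view[0, 0] = 42.0  # writes through to the underlying buffer
```

Because the view shares memory with the buffer, an in-place operation on it (for example a partition of an index array) would be visible to the code that owns the pointer.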

sturlamolden · 2020-04-19T22:35:54Z

Here is an improved version of introselect in C++ that is very easy to translate to Cython or C. I just looked through the code, and the C++ features (like templates and auto) are only used for generics. It is under the Boost license, so it should be fine to include in scikit-learn.

https://github.com/andralex/MedianOfNinthers

Corresponding paper:
http://erdani.com/research/sea2017.pdf

jnothman · 2020-04-20T01:11:53Z

Why would we not use numpy's?

jnothman · 2020-04-22T14:25:11Z

I put together a numpy-based solution in master...jnothman:partition_node_indices

@jiefangxuanyan, could you please present some benchmarks comparing (a) master; (b) your solution; (c) my partition_node_indices. It should compare timing for ordinary cases, as well as the special cases you describe where a sort was costly.

Are you okay to give that a go?

jjerphan · 2021-01-26T17:41:33Z

Hi @jiefangxuanyan,
Are you willing to continue to work on this PR or can I supersede it? 🙂

sturlamolden · 2021-01-26T21:42:25Z

> @jnothman I struggled with this for several weekends, and I found it's not easy to write a std::nth_element-compatible comparator in Cython. I may have to write it in raw C++ or rewrite a Cython version of std::nth_element myself. What do you think?

> @jiefangxuanyan, we would be keen to avoid adding C++ code here if possible. Would you be able to try out using numpy's introselect implementation?

You can embed C or C++ code in a Cython file, similarly to using inline assembly in C or C++. You do not need an extra header file to write a C++ comparator in Cython. Only the comparator class has to be written in C++, and it is just a tiny class (about 10 lines of code). Just drop the extra header file and inline the C++ comparator class directly in the Cython source file.

sturlamolden · 2021-01-26T21:46:30Z

And this is how you do it...

# distutils: language = c++

cdef extern from *:
    """
    // inlined C++ code goes here
    """

sturlamolden · 2021-01-26T21:58:03Z

You can see here, from scipy.spatial.cKDTree, what std::nth_element does compared to quickselect:

scipy/scipy#11679

Notice the performance improvement on the sorted dataset.

jjerphan · 2021-01-27T11:23:19Z

> I put together a numpy-based solution in master...jnothman:partition_node_indices
>
> @jiefangxuanyan, could you please present some benchmarks comparing (a) master; (b) your solution; (c) my partition_node_indices. It should compare timing for ordinary cases, as well as the special cases you describe where a sort was costly.
>
> Are you okay to give that a go?

As a first step, here are the results I got with the script on the current state of main:

on main (6a9c286)

random: 0.06900787353515625s
ordered: 3.3418073654174805s
reverse ordered: 3.6653695106506348s
duplicated: 2.4857819080352783s

on partition_node_indices after merging main (5f2633b)

random: 0.24102473258972168s
ordered: 0.23180484771728516s
reverse ordered: 0.23183298110961914s
duplicated: 0.22904562950134277s

on cpp-nth-element after merging main (d2f0f37)

random: 0.03429245948791504s
ordered: 0.020520448684692383s
reverse ordered: 0.017237424850463867s
duplicated: 0.012929916381835938s

@jiefangxuanyan's suggestion seems to be a good proposal, though it needs to be benchmarked on more cases.

I propose to supersede it, including @sturlamolden's comments if you all agree. 🙂

jjerphan · 2021-02-16T23:06:48Z

I've done a quick follow-up in #19473 with some asv benchmarks.

thomasjpfan · 2021-04-09T01:04:51Z

Closing because this is fixed in #19473

Use std::nth_element from C++ standard library.

82eb949

jnothman reviewed May 17, 2018


jiefangxuanyan added 2 commits June 8, 2018 15:07

Add a benchmark to expose time complexity degeneration problem of the original partition algorithm.

076069a

Added a general case for binary tree partition algorithm benchmark.

3c34c7c

rth mentioned this pull request Jul 19, 2018

BUG: fix KD Tree node construction #11556

Merged

rth reviewed Jul 19, 2018


amueller added the Stalled label Aug 5, 2019

99991 mentioned this pull request Feb 26, 2020

scipy.spatial.cKDTree construction slow for some datasets scipy/scipy#11595

Closed

github-actions bot added the module:neighbors label Mar 2, 2020

scidam mentioned this pull request Mar 9, 2020

PERF: Switch to C++ STL std::nth_element scipy/scipy#11653

Merged

jnothman reviewed Mar 18, 2020


Merge branch 'master' into cpp-nth-element

5c1f40e

# Conflicts: # sklearn/neighbors/setup.py

Base automatically changed from master to main January 22, 2021 10:50

jjerphan mentioned this pull request Feb 16, 2021

ENH Improve the creation of KDTree and BallTree on their worst-case time complexity #19473

Merged

thomasjpfan closed this Apr 9, 2021
