
ENH Improve the creation of KDTree and BallTree on their worst-case time complexity #19473

Merged
merged 18 commits from cpp-nth-element into scikit-learn:main on Apr 8, 2021

Conversation

@jjerphan (Member, Author) commented Feb 16, 2021

Reference Issues/PRs

Supersedes #11103.
Fixes #7687.

What does this implement/fix? Explain your changes.

The creation of KDTree and BallTree (which relies on a partial quicksort) suffers from a worst-case time complexity of $\mathcal{O}(n^2)$ on some edge cases (for instance, sorted inputs and inputs containing many duplicates).

The C++ std::nth_element algorithm is a better alternative in those cases, as it relies on an introselect algorithm with a worst-case time complexity of $\mathcal{O}(n)$.

This PR replaces the former algorithm with the latter, using a comparator and a minimal Cython wrapper.
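
To make the change concrete, here is a minimal, self-contained C++ sketch of the idea (this is not the PR's exact Cython/C++ code; the signature and the toy 2-D data are only illustrative):

#include <algorithm>
#include <cstddef>
#include <cstdio>

// Compares two point indices by the coordinate of the corresponding
// points along the split dimension of the current node.
struct IndexComparator {
    const double* data;      // row-major (n_samples, n_features) array
    std::size_t split_dim;   // dimension along which the node is split
    std::size_t n_features;
    bool operator()(std::size_t i, std::size_t j) const {
        // Note: the PR's comparator additionally breaks ties on the index
        // itself ("make the comparator stable"); omitted here for brevity.
        return data[i * n_features + split_dim] < data[j * n_features + split_dim];
    }
};

// After the call, node_indices[split_index] refers to the point with the
// median coordinate along split_dim; indices of points with smaller
// coordinates come before it and indices of larger ones after it.
void partition_node_indices(const double* data, std::size_t* node_indices,
                            std::size_t split_dim, std::size_t split_index,
                            std::size_t n_features, std::size_t n_points) {
    IndexComparator comparator{data, split_dim, n_features};
    std::nth_element(node_indices, node_indices + split_index,
                     node_indices + n_points, comparator);
}

int main() {
    // Five 2-D points; partition their indices around the median of x.
    const double data[] = {4, 0,  1, 0,  5, 0,  2, 0,  3, 0};
    std::size_t indices[] = {0, 1, 2, 3, 4};
    partition_node_indices(data, indices, /*split_dim=*/0, /*split_index=*/2,
                           /*n_features=*/2, /*n_points=*/5);
    std::printf("median index: %zu (x = %g)\n", indices[2], data[indices[2] * 2]);
    return 0;
}

Sorted and duplicated inputs are exactly the cases where the previous partial quicksort hits its quadratic worst case on this partitioning step.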

jiefangxuanyan and others added 8 commits May 17, 2018 16:55
# Conflicts:
#	sklearn/neighbors/setup.py
1. Add "_" in the file name;
2. Make the comparator stable;
3. Use a template implementation of partition_node_indices; the original one has hard-coded types (DTYPE_t -> double, ITYPE_t -> Py_intptr_t) which would easily break when these types change.
@jjerphan (Member, Author) commented Feb 16, 2021

Using a simple ASV benchmark (to be placed in asv_benchmarks/benchmarks/ for reproducibility), I obtained the following results comparing main (at b251f3f) and this branch (at 719e6c8).

There is a significant improvement for KDTree creation and a marginal one when querying them.

KDTree creation

$ asv run -b KDTreeCreation  
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl.
·· Installing 719e6c8f <cpp-nth-element> into conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl.
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[  0.00%] · For scikit-learn commit 719e6c8f <cpp-nth-element>:
[  0.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl
[ 50.00%] ··· binary_tree.KDTreeCreation.time_creation                                                                          ok
[ 50.00%] ··· ======= ================= ============ ============= ============= =============
              --                                              leaf_size                       
              ------------------------- ------------------------------------------------------
                 n       matrix_type         1             20            40           100     
              ======= ================= ============ ============= ============= =============
                 10         random        80.1±3μs      79.0±2μs      79.3±4μs      81.2±3μs  
                 10        ordered        85.8±5μs      82.2±4μs      79.7±3μs      80.4±3μs  
                 10    reverse_ordered    83.6±2μs      83.6±7μs      85.3±7μs      81.4±2μs  
                 10       duplicated      84.7±3μs      88.9±4μs      85.8±3μs      84.4±2μs  
                100         random        110±3μs       100±7μs       87.1±3μs      85.1±2μs  
                100        ordered        120±10μs      91.1±3μs      87.8±2μs      84.2±2μs  
                100    reverse_ordered    110±3μs       89.4±2μs      86.2±2μs      88.4±2μs  
                100       duplicated      109±7μs       88.5±2μs      88.5±2μs      85.3±3μs  
                1000        random        455±20μs      238±7μs       206±7μs       180±5μs   
                1000       ordered        357±10μs      187±6μs       162±4μs       152±5μs   
                1000   reverse_ordered    375±10μs      204±20μs      173±7μs       153±7μs   
                1000      duplicated      295±20μs      181±8μs       157±4μs       140±3μs   
               10000        random       4.60±0.1ms   2.14±0.05ms   1.87±0.05ms   1.70±0.05ms 
               10000       ordered       3.50±0.2ms   1.52±0.04ms   1.31±0.04ms   1.11±0.04ms 
               10000   reverse_ordered   3.62±0.1ms    1.59±0.2ms    1.53±0.1ms    1.24±0.1ms 
               10000      duplicated     3.00±0.3ms   1.44±0.05ms   1.24±0.04ms   1.07±0.03ms 
              ======= ================= ============ ============= ============= =============

[ 50.00%] · For scikit-learn commit b251f3f8 <main>:
[ 50.00%] ·· Building for conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl..
[ 50.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl
[100.00%] ··· binary_tree.KDTreeCreation.time_creation                                                                          ok
[100.00%] ··· ======= ================= ============= ============= ============= =============
              --                                               leaf_size                       
              ------------------------- -------------------------------------------------------
                 n       matrix_type          1             20            40           100     
              ======= ================= ============= ============= ============= =============
                 10         random         84.7±2μs      82.9±2μs     88.0±10μs      83.8±3μs  
                 10        ordered         83.0±1μs      86.3±5μs      86.3±4μs      84.3±3μs  
                 10    reverse_ordered    94.3±10μs      83.4±2μs      82.5±2μs      83.4±2μs  
                 10       duplicated       85.7±2μs      83.7±3μs      82.7±2μs      83.4±2μs  
                100         random         111±3μs       88.9±2μs      84.3±2μs      81.5±2μs  
                100        ordered         116±2μs       100±3μs       93.2±2μs      84.0±2μs  
                100    reverse_ordered     120±3μs       101±3μs       95.4±2μs      83.2±2μs  
                100       duplicated       111±3μs       95.0±2μs      90.0±2μs      82.4±3μs  
                1000        random         492±20μs      331±20μs      278±10μs      220±10μs  
                1000       ordered       1.79±0.05ms   1.63±0.03ms   1.60±0.04ms   1.49±0.03ms 
                1000   reverse_ordered   1.94±0.05ms   1.77±0.04ms    1.74±0.1ms    3.00±0.5ms 
                1000      duplicated      3.02±0.2ms    1.31±0.1ms   1.24±0.04ms   1.09±0.03ms 
               10000        random        8.10±0.4ms    6.01±0.4ms    5.51±0.4ms    4.82±0.3ms 
               10000       ordered        153±0.9ms     150±0.8ms      159±9ms       153±4ms   
               10000   reverse_ordered    165±0.9ms      163±1ms       164±2ms       165±7ms   
               10000      duplicated       126±10ms      116±7ms       115±6ms       110±1ms   
              ======= ================= ============= ============= ============= =============

KDTree query

$ asv run -b KDTreeQuery
· Creating environments
· Discovering benchmarks
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[  0.00%] · For scikit-learn commit 719e6c8f <cpp-nth-element>:
[  0.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl
[ 50.00%] ··· binary_tree.KDTreeQuery.time_query                                                                                ok
[ 50.00%] ··· ======= ================= ============ ============ ============ ============
              --                                             leaf_size                     
              ------------------------- ---------------------------------------------------
                 n       matrix_type         1            20           40          100     
              ======= ================= ============ ============ ============ ============
                 10         random        83.9±2μs     66.8±1μs     67.4±2μs     68.6±2μs  
                 10        ordered        83.1±2μs     66.2±2μs     66.4±2μs     68.0±2μs  
                 10    reverse_ordered    90.1±3μs     75.9±2μs     76.4±1μs     76.3±2μs  
                 10       duplicated      134±4μs      71.0±2μs     71.6±2μs     70.4±2μs  
                100         random        113±3μs      97.5±2μs     105±3μs      123±4μs   
                100        ordered        109±3μs      90.5±2μs     94.6±2μs     112±3μs   
                100    reverse_ordered    122±4μs      97.6±3μs     102±2μs      205±80μs  
                100       duplicated     964±200μs     153±9μs      128±3μs      118±4μs   
                1000        random        143±3μs      133±4μs      139±4μs      161±6μs   
                1000       ordered        131±4μs      117±3μs      125±2μs      145±3μs   
                1000   reverse_ordered    149±4μs      131±3μs      134±3μs      152±4μs   
                1000      duplicated     5.17±0.1ms    846±20μs     670±20μs     587±10μs  
               10000        random        183±5μs      165±5μs      178±4μs      216±6μs   
               10000       ordered        159±4μs      139±4μs      150±4μs      184±7μs   
               10000   reverse_ordered    192±6μs      162±5μs      172±5μs      200±5μs   
               10000      duplicated     78.0±0.9ms   7.15±0.2ms   5.80±0.2ms   5.29±0.2ms 
              ======= ================= ============ ============ ============ ============

[ 50.00%] · For scikit-learn commit b251f3f8 <main>:
[ 50.00%] ·· Building for conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl..
[ 50.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl
[100.00%] ··· binary_tree.KDTreeQuery.time_query                                                                                ok
[100.00%] ··· ======= ================= ============ ============ ============ ============
              --                                             leaf_size                     
              ------------------------- ---------------------------------------------------
                 n       matrix_type         1            20           40          100     
              ======= ================= ============ ============ ============ ============
                 10         random        91.8±2μs     73.0±2μs     73.9±2μs     73.1±2μs  
                 10        ordered        89.2±2μs     70.5±2μs     70.8±1μs     70.7±2μs  
                 10    reverse_ordered    95.2±3μs     159±20μs     158±20μs     81.4±3μs  
                 10       duplicated      142±10μs     76.5±3μs    79.0±20μs     154±50μs  
                100         random        251±8μs      105±3μs      111±4μs      127±4μs   
                100        ordered        113±4μs      94.6±2μs     100±2μs      117±3μs   
                100    reverse_ordered    125±3μs      107±10μs     108±7μs     197±100μs  
                100       duplicated      747±40μs     160±7μs      135±4μs      123±4μs   
                1000        random        147±6μs      134±4μs      148±5μs      178±4μs   
                1000       ordered        133±2μs      123±3μs      128±4μs      151±7μs   
                1000   reverse_ordered    149±4μs      132±3μs      137±3μs      158±5μs   
                1000      duplicated     5.19±0.1ms    871±30μs     710±20μs     629±20μs  
               10000        random        186±5μs      173±5μs      199±7μs      248±10μs  
               10000       ordered        171±5μs      149±3μs      162±4μs      193±5μs   
               10000   reverse_ordered    203±7μs      170±5μs      242±70μs     218±8μs   
               10000      duplicated      85.9±5ms    7.80±0.3ms   6.25±0.1ms   5.53±0.1ms 
              ======= ================= ============ ============ ============ ============

Machine specs (adapted from machine.json)

  • Arch: "x86_64"
  • Cpu: "Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz"
  • Num Cpu: "4"
  • OS: "Linux 5.10.15-200.fc33.x86_64"
  • RAM: "7799752"
>>> from sklearn import show_versions

>>> show_versions()

System:
    python: 3.9.1 (default, Jan 20 2021, 00:00:00)  [GCC 10.2.1 20201125 (Red Hat 10.2.1-9)]
executable: /home/jjerphan/.virtualenvs/sk/bin/python
   machine: Linux-5.10.15-200.fc33.x86_64-x86_64-with-glibc2.32

Python dependencies:
          pip: 21.0.1
   setuptools: 51.1.1
      sklearn: 1.0.dev0
        numpy: 1.20.1
        scipy: 1.6.0
       Cython: 0.29.21
       pandas: 1.2.2
   matplotlib: 3.3.4
       joblib: 1.0.1
threadpoolctl: 2.1.0

Built with OpenMP: True

@sturlamolden (Contributor):

Actually, std::nth_element is required by the C++ standard to be worst-case O(n).

It used to be worst-case O(n log n) on GNU, as the implementation fell back to heapselect. This was a known bug, as it violated the C++ standard. It has been about 10 years since I saw a bug report on this, so presumably it is fixed by now.

@sturlamolden (Contributor):

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35968

@jjerphan (Member, Author):

Thanks @sturlamolden for this pointer!

My results rely on GCC:

 → gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.1 20201125 (Red Hat 10.2.1-9) (GCC)

I'll include my machine setup in my comment above. Ideally, we would like to run those benchmarks on configurations using different compilers.

@NicolasHug (Member):

Thanks for the PR @jjerphan

As far as I know, we've been reluctant to include C++ code, except when vendoring libraries like libsvm, liblinear and murmur.

Have you considered alternatives that might not require writing C++ ourselves? For example, is there a way to write the comparator directly in Cython and bypass the wrapper?

Personally, I'm not necessarily opposed to having C++ within the codebase (this is something I considered at some point for #18394). But this needs to be carefully examined, as this would set a precedent.

@jjerphan (Member, Author) commented Feb 17, 2021

Hi @NicolasHug

Thanks for your comment.

Yes, I am currently trying to see how I can define and use IndexComparator directly as a template class in Cython. If possible, the only C++ dependency would be the import of std::nth_element from <algorithm>.

@NicolasHug (Member) commented Feb 17, 2021

Got it. You might want to prefix the PR title with [WIP] then ;) and switch to [MRG] when ready for reviews.

@sturlamolden (Contributor):

It is only a few lines of C++ inlined into the Cython code. In this case it cannot be avoided, as Cython cannot implement a C++ class with member functions. The cppclass keyword in Cython only allows us to declare a C++ class.

The alternative would be to write introselect in C or Cython. There is C code for it in NumPy. It uses median-of-5 as a fallback, so it is worst-case O(n). It should not be that difficult to clean it up.

https://github.com/numpy/numpy/blob/master/numpy/core/src/npysort/selection.c.src

@NicolasHug (Member):

It is only a few lines of C++ inlined into the Cython code.

I agree, although the size or the complexity of the code is not really what matters here.

The alternative would be to write introselect in C or Cython

I agree this is worth considering - in Cython though, not C, for the same reasons as above. For ref, we do have a custom implementation of introsort (not introselect) with median3 in Cython: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/tree/_splitter.pyx#L496:L496
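
For readers who do not want to open the link: "median3" simply means choosing the pivot as the median of the first, middle and last elements of the range, which protects introsort against already sorted input. A tiny C++ sketch of that pivot choice (not the _splitter.pyx code):

#include <cstdio>

// Returns the median of the first, middle and last values of a[0..n-1],
// to be used as a quicksort/introsort pivot.
double median3(const double* a, int n) {
    double lo = a[0], mid = a[n / 2], hi = a[n - 1];
    if (lo > mid) { const double t = lo; lo = mid; mid = t; }  // now lo <= mid
    if (mid > hi) mid = hi;                                    // mid = min(mid, hi)
    if (lo > mid) mid = lo;                                    // mid = max(lo, mid)
    return mid;
}

int main() {
    const double sorted_input[] = {1, 2, 3, 4, 5, 6, 7};
    std::printf("pivot: %g\n", median3(sorted_input, 7));  // prints 4
    return 0;
}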

@sturlamolden (Contributor) commented Feb 17, 2021

It is only a few lines of C++ inlined into the Cython code.

I agree, although the size or the complexity of the code is not really what matters here.

Then what is it exactly that matters here? In terms of maintainability, this snippet of inlined C++ is a no-brainer.

It is just there because Cython cannot use cdef struct or cdef cppclass to implement a C++ functor. A simple C++ class (or struct) is therefore needed to interface with std::nth_element.

The alternative would be to write introselect in C or Cython

I agree this is worth considering - in Cython though, not C, for the same reasons as above.

True, but this would lead to code that is much more complex and has a higher chance of containing errors than a tiny driver for std::nth_element.

@NicolasHug (Member):

Then what is it exactly that matters here? In terms of maintainability, this snippet of inlined C++ is a no-brainer.

Isolated, yes. In the context of the whole project, no. This one might be simple, but it opens the door to more complex contributions that we may not want. It sets a precedent, as I mentioned, and we need to have clear requirements as to what we can and cannot accept in the project. So far, we have not accepted C++ code (except when vendoring).

Again, I'm not opposed to it. I'm only saying this needs to be discussed, in particular with other devs. If there's a simple enough solution that doesn't require input from the rest of the core devs, it is preferable.

The decision to favor Cython over C/C++ code was made long before I got involved, so I'm not entirely sure of what drove the decision. One argument I believe is that Cython is easier to write than C/C++ for contributors who are used to Python. If you ask me, after having written thousands of lines of Cython, I personally find that this is absolutely not true.

@jnothman (Member) left a review comment:

If it's difficult to implement a compatible comparator in Cython, I see no great problem with this. I don't think this is going to set us on a dangerous slippery slope towards accepting contributions with large portions in C/C++. It's only really defining a comparator (and abstracting that away so that it can be used in Cython). I think it would benefit from a few more comments, under the assumption that readers of this codebase won't often know C++.

@sturlamolden (Contributor):

It might actually be possible to avoid a functor here. One could use globals *data, split_dim and n_features, and declare them thread-local. I am not sure whether we can make Cython accept the C++ keyword thread_local, though. But if we could, the IndexComparator functor could be replaced with an inline function.

Using a mutex would also work, but it would prevent concurrent construction of kd-trees.

@sturlamolden (Contributor):

We could reduce the amount of inlined C++ to three lines:

static thread_local DTYPE_t *data = nullptr;
static thread_local ITYPE_t split_dim = 0;
static thread_local ITYPE_t n_features = 0;

Then we could cdef these static globals (without thread_local) in Cython, and it should just work. This is, as far as I can tell, the minimal amount of C++ we need to run std::nth_element.
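
A possible reading of this suggestion, sketched in plain, self-contained C++ (this is not code from the PR; the DTYPE_t/ITYPE_t typedefs and the index_less helper are assumptions made so the snippet compiles on its own):

#include <algorithm>
#include <cstddef>
#include <cstdio>

typedef double DTYPE_t;          // stands in for scikit-learn's DTYPE_t
typedef std::ptrdiff_t ITYPE_t;  // stands in for ITYPE_t (Py_intptr_t)

// The three thread_local globals carry the state the functor used to hold.
static thread_local DTYPE_t *data = nullptr;
static thread_local ITYPE_t split_dim = 0;
static thread_local ITYPE_t n_features = 0;

// Plain comparison function: orders two point indices by their coordinate
// along split_dim, reading the thread-local state instead of captured members.
static bool index_less(ITYPE_t i, ITYPE_t j) {
    return data[i * n_features + split_dim] < data[j * n_features + split_dim];
}

void partition_node_indices(DTYPE_t *node_data, ITYPE_t *node_indices,
                            ITYPE_t dim, ITYPE_t split_index,
                            ITYPE_t features, ITYPE_t n_points) {
    // Publish the per-call state, then select on the index array.
    data = node_data;
    split_dim = dim;
    n_features = features;
    std::nth_element(node_indices, node_indices + split_index,
                     node_indices + n_points, index_less);
}

int main() {
    DTYPE_t points[] = {3, 0,  1, 0,  2, 0};  // three 2-D points
    ITYPE_t indices[] = {0, 1, 2};
    partition_node_indices(points, indices, /*dim=*/0, /*split_index=*/1,
                           /*features=*/2, /*n_points=*/3);
    std::printf("median index: %td\n", indices[1]);  // prints 2 (the point with x = 2)
    return 0;
}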

@jjerphan (Member, Author):

On the one hand, I believe that relying on an efficient and well-tested implementation of an algorithm is better than reimplementing it. This seems to be the case for GCC's implementation of std::nth_element.

On the other hand, I do understand and respect the will to minimise, or even get rid of, code which isn't written in Python or Cython within scikit-learn.

I spent some time yesterday trying to reduce the amount of C++ code: it is not easy when it comes to implementing a comparator, as underlined by @jnothman, and I haven't been able to converge on something simpler¹. To follow up on @NicolasHug's remark, which I share:

One argument I believe is that Cython is easier to write than C/C++ for contributors who are used to Python. If you ask me, after having written thousands of lines of Cython, I personally find that this is absolutely not true.

Would having a Cython comparator plugged into a C++ interface really be easier for another reader of the code to understand?

Furthermore, I haven't yet inspected @sturlamolden's proposals to remove the C++ part nor @NicolasHug's pointer to a Cython implementation of another algorithm. In both cases, this would remove the C++ dependency but, to me, it would also add complexity with bits that are harder to understand. Do you agree?

Finally, I haven't (yet) been able to find other discussions or PRs mentioning the motive for solely using Python and Cython; please do share pointers if you have some!

Using C++ here (as of 711ea35) does not come with the cons of code vendoring (for instance: a new file extension, extra build setup, code duplication between projects, larger wheels and source distributions, etc.): it is a simple and natural piece of glue which offers a fair trade-off to expose an efficient and trusted implementation.

But there might be others, and I would be glad if this could be further improved in an even simpler fashion! 🙂


Technical note:

[1] See this part of the Cython docs for the technical aspects of using C++ interfaces in Cython while minimising the amount of C++ code.

@sturlamolden (Contributor) commented Feb 18, 2021

Furthermore, I haven't yet inspected @sturlamolden's proposals to remove the C++ part nor @NicolasHug's pointer to a Cython implementation of another algorithm. In both cases, this would remove the C++ dependency but, to me, it would also add complexity with bits that are harder to understand. Do you agree?

I don’t know. Using global variables does not add complexity, in terms of the required C++.

thread_local is only required if we release the GIL during the build. I must admit it has been a while since I looked at sklearn's kd-tree code. If the GIL is not released, then no inlined C++ is needed to use std::nth_element. If the GIL is released, you only need to make sure the three variables are declared thread-local to the C++ compiler.

Personally I think the solution with an inlined struct is perfectly ok. It is what we use in SciPy, though almost all of the SciPy kd-tree code is C++.

@sturlamolden (Contributor):

I should perhaps mention that NumPy also makes use of thread-local globals. For example, if we have a Fortran function that makes callbacks to Python, then f2py will store the global state in a thread-local variable.

@cmarmo (Contributor) commented Mar 1, 2021

I'm wondering if this change is worth a changelog entry... the check is green because no tests are modified, but this is meant to improve performance, right?

@jjerphan (Member, Author) commented Mar 1, 2021

I'm wondering if this change is worth a changelog entry... the check is green because no tests are modified, but this is meant to improve performance, right?

This is solely meant to improve performance, yes.

Let me know if I should add a changelog entry; on my end, I am mainly looking for other maintainers' opinions, suggestions and/or approvals.

@jnothman (Member) commented Mar 1, 2021 via email

@sturlamolden (Contributor) left a review comment:

The changelog should reflect what the C++ standard says and what all current implementations do, not what an 8-year-old bug in GNU libstdc++ used to do.

@thomasjpfan (Member) left a review comment:

I tend to side with allowing more C++ code because I think it makes code easier to read and write. Having access to C++ containers and algorithms can help reduce the amount of code we have to maintain. (In this PR, we are taking advantage of <algorithm>.)

jjerphan and others added 6 commits April 7, 2021 13:23
Reintroduce the docstring of partition_node_indices.

Co-authored-by: "Thomas J. Fan" <thomasjpfan@gmail.com>
Co-authored-by: "Thomas J. Fan" <thomasjpfan@gmail.com>
As the changes relate to both of BinaryTree's subclasses.
@jjerphan changed the title from "ENH Improve KDTree worst-case time complexity" to "ENH Improve the creation of KDTree and BallTree on their worst-case time complexity" on Apr 7, 2021
@jjerphan (Member, Author) commented Apr 7, 2021

Here are some (reorganised) results from the latest benchmark script, which now compares changes for both KDTree and BallTree (see the raw results here).

BinaryTree creation

main, at 9cfacf1
======================================= ======= ================= ============= ============= ============= =============
--                                                                                       leaf_size
----------------------------------------------------------------- -------------------------------------------------------
               BinaryTree                  n       matrix_type          1             20            40           100
======================================= ======= ================= ============= ============= ============= =============
   sklearn.neighbors._kd_tree.KDTree       10         random         84.3±2μs      88.0±5μs      80.1±2μs      80.9±2μs
   sklearn.neighbors._kd_tree.KDTree       10        ordered         79.4±3μs      78.2±2μs      79.2±2μs      81.2±4μs
   sklearn.neighbors._kd_tree.KDTree       10    reverse_ordered     82.1±4μs      81.3±3μs      79.5±2μs      84.8±2μs
   sklearn.neighbors._kd_tree.KDTree       10       duplicated       81.5±2μs      84.9±2μs      84.7±2μs      84.5±2μs
   sklearn.neighbors._kd_tree.KDTree      100         random         105±2μs       86.6±2μs      83.8±2μs      81.4±2μs
   sklearn.neighbors._kd_tree.KDTree      100        ordered         116±2μs       104±3μs       97.6±3μs      81.4±1μs
   sklearn.neighbors._kd_tree.KDTree      100    reverse_ordered     127±4μs       108±3μs       98.5±2μs      85.5±2μs
   sklearn.neighbors._kd_tree.KDTree      100       duplicated       108±2μs       96.5±3μs      98.9±4μs      81.5±2μs
   sklearn.neighbors._kd_tree.KDTree      1000        random         505±20μs      395±20μs      296±20μs      266±20μs
   sklearn.neighbors._kd_tree.KDTree      1000       ordered       2.00±0.08ms   1.73±0.08ms   1.82±0.08ms    1.75±0.1ms
   sklearn.neighbors._kd_tree.KDTree      1000   reverse_ordered    2.20±0.2ms   1.96±0.07ms    2.10±0.2ms   1.66±0.08ms
   sklearn.neighbors._kd_tree.KDTree      1000      duplicated     1.64±0.08ms    1.45±0.1ms   1.18±0.03ms   1.08±0.02ms
   sklearn.neighbors._kd_tree.KDTree     10000        random        8.07±0.3ms    5.96±0.3ms    5.71±0.3ms    5.53±0.6ms
   sklearn.neighbors._kd_tree.KDTree     10000       ordered         153±1ms       159±1ms       156±4ms       164±8ms
   sklearn.neighbors._kd_tree.KDTree     10000   reverse_ordered     185±8ms       168±5ms       173±6ms       175±8ms
   sklearn.neighbors._kd_tree.KDTree     10000      duplicated       127±7ms       128±5ms       117±5ms       119±8ms
 sklearn.neighbors._ball_tree.BallTree     10         random         80.1±2μs      82.7±4μs      78.7±2μs      79.4±2μs
 sklearn.neighbors._ball_tree.BallTree     10        ordered         81.4±3μs      80.4±2μs      82.5±5μs      84.2±2μs
 sklearn.neighbors._ball_tree.BallTree     10    reverse_ordered     85.3±3μs      83.8±2μs      83.6±2μs      83.7±2μs
 sklearn.neighbors._ball_tree.BallTree     10       duplicated       85.5±3μs      83.0±2μs      79.1±2μs      80.2±3μs
 sklearn.neighbors._ball_tree.BallTree    100         random         99.8±3μs      91.0±3μs      87.8±2μs      83.8±2μs
 sklearn.neighbors._ball_tree.BallTree    100        ordered         116±3μs       96.4±2μs      95.3±2μs      81.0±3μs
 sklearn.neighbors._ball_tree.BallTree    100    reverse_ordered     112±2μs       97.5±2μs      91.9±2μs      80.9±2μs
 sklearn.neighbors._ball_tree.BallTree    100       duplicated       112±3μs       99.7±2μs      94.8±3μs      89.7±3μs
 sklearn.neighbors._ball_tree.BallTree    1000        random         545±50μs      330±20μs      285±20μs      226±10μs
 sklearn.neighbors._ball_tree.BallTree    1000       ordered       1.78±0.04ms   1.76±0.04ms   1.68±0.04ms   1.60±0.08ms
 sklearn.neighbors._ball_tree.BallTree    1000   reverse_ordered   1.99±0.04ms   1.85±0.04ms    1.87±0.1ms   1.81±0.08ms
 sklearn.neighbors._ball_tree.BallTree    1000      duplicated     1.43±0.05ms   1.29±0.03ms   1.23±0.03ms   1.09±0.03ms
 sklearn.neighbors._ball_tree.BallTree   10000        random        7.92±0.4ms    6.28±0.5ms    5.95±0.3ms    5.45±0.4ms
 sklearn.neighbors._ball_tree.BallTree   10000       ordered         164±3ms       160±4ms       150±2ms       160±7ms
 sklearn.neighbors._ball_tree.BallTree   10000   reverse_ordered     176±8ms       163±5ms       171±7ms       162±3ms
 sklearn.neighbors._ball_tree.BallTree   10000      duplicated       116±6ms       120±5ms       115±4ms       112±3ms
======================================= ======= ================= ============= ============= ============= =============
This PR, at 809aa8b
======================================= ======= ================= ============ ============= ============= =============
--                                                                                      leaf_size
----------------------------------------------------------------- ------------------------------------------------------
               BinaryTree                  n       matrix_type         1             20            40           100
======================================= ======= ================= ============ ============= ============= =============
   sklearn.neighbors._kd_tree.KDTree       10         random        98.0±7μs      91.9±5μs      91.5±4μs      92.9±4μs
   sklearn.neighbors._kd_tree.KDTree       10        ordered        100±10μs      136±20μs      160±30μs      158±30μs
   sklearn.neighbors._kd_tree.KDTree       10    reverse_ordered    161±20μs      98.3±8μs      106±10μs      98.2±5μs
   sklearn.neighbors._kd_tree.KDTree       10       duplicated      103±10μs      96.7±7μs      104±10μs      94.5±7μs
   sklearn.neighbors._kd_tree.KDTree      100         random        121±8μs       104±8μs       104±7μs       102±8μs
   sklearn.neighbors._kd_tree.KDTree      100        ordered        118±10μs      104±10μs      103±8μs       102±8μs
   sklearn.neighbors._kd_tree.KDTree      100    reverse_ordered    139±10μs      103±6μs       101±6μs       104±10μs
   sklearn.neighbors._kd_tree.KDTree      100       duplicated      127±10μs      185±20μs      106±10μs      185±20μs
   sklearn.neighbors._kd_tree.KDTree      1000        random        893±90μs      305±30μs      257±20μs      226±20μs
   sklearn.neighbors._kd_tree.KDTree      1000       ordered        473±70μs      225±20μs      193±20μs      185±20μs
   sklearn.neighbors._kd_tree.KDTree      1000   reverse_ordered    441±30μs      230±20μs      181±10μs      172±10μs
   sklearn.neighbors._kd_tree.KDTree      1000      duplicated      342±20μs      188±10μs      193±20μs      173±20μs
   sklearn.neighbors._kd_tree.KDTree     10000        random       5.26±0.2ms    2.32±0.1ms    2.16±0.1ms   1.94±0.08ms
   sklearn.neighbors._kd_tree.KDTree     10000       ordered       4.01±0.2ms    1.80±0.1ms   1.52±0.08ms    1.36±0.1ms
   sklearn.neighbors._kd_tree.KDTree     10000   reverse_ordered   4.25±0.3ms    1.85±0.1ms   1.56±0.08ms    1.48±0.2ms
   sklearn.neighbors._kd_tree.KDTree     10000      duplicated     3.11±0.3ms   1.48±0.08ms   1.29±0.07ms   1.28±0.08ms
 sklearn.neighbors._ball_tree.BallTree     10         random        90.3±5μs      103±30μs      96.4±9μs      119±30μs
 sklearn.neighbors._ball_tree.BallTree     10        ordered        181±50μs     95.7±10μs      87.7±7μs     87.0±10μs
 sklearn.neighbors._ball_tree.BallTree     10    reverse_ordered   95.7±10μs      89.4±6μs      85.1±3μs      95.6±9μs
 sklearn.neighbors._ball_tree.BallTree     10       duplicated      86.0±4μs      89.7±2μs      82.3±2μs      82.2±2μs
 sklearn.neighbors._ball_tree.BallTree    100         random        112±2μs       95.6±3μs      92.0±3μs      82.5±2μs
 sklearn.neighbors._ball_tree.BallTree    100        ordered        110±4μs       92.6±3μs      92.9±4μs     92.8±20μs
 sklearn.neighbors._ball_tree.BallTree    100    reverse_ordered    110±4μs       94.2±2μs      86.2±2μs      82.1±1μs
 sklearn.neighbors._ball_tree.BallTree    100       duplicated      99.0±3μs      86.6±2μs     86.7±10μs      89.4±3μs
 sklearn.neighbors._ball_tree.BallTree    1000        random        422±10μs      260±7μs       211±5μs       199±6μs
 sklearn.neighbors._ball_tree.BallTree    1000       ordered        333±20μs      190±5μs       167±3μs       181±40μs
 sklearn.neighbors._ball_tree.BallTree    1000   reverse_ordered   510±100μs      228±10μs      190±10μs      157±10μs
 sklearn.neighbors._ball_tree.BallTree    1000      duplicated      326±60μs      193±8μs       215±20μs      153±10μs
 sklearn.neighbors._ball_tree.BallTree   10000        random       4.19±0.1ms   2.28±0.06ms   1.98±0.05ms   1.74±0.06ms
 sklearn.neighbors._ball_tree.BallTree   10000       ordered       3.37±0.1ms   1.74±0.05ms   1.53±0.05ms    1.44±0.2ms
 sklearn.neighbors._ball_tree.BallTree   10000   reverse_ordered   3.50±0.6ms   1.66±0.05ms    1.53±0.1ms   1.29±0.06ms
 sklearn.neighbors._ball_tree.BallTree   10000      duplicated     2.87±0.1ms   1.56±0.07ms   1.47±0.06ms   1.28±0.05ms
======================================= ======= ================= ============ ============= ============= =============

BinaryTree querying

main, at 9cfacf1
======================================= ======= ================= ============= ============ ============= ============
--                                                                                      leaf_size
----------------------------------------------------------------- -----------------------------------------------------
               BinaryTree                  n       matrix_type          1            20            40          100
======================================= ======= ================= ============= ============ ============= ============
   sklearn.neighbors._kd_tree.KDTree       10         random         98.6±3μs     71.9±2μs      71.4±1μs     72.0±2μs
   sklearn.neighbors._kd_tree.KDTree       10        ordered         96.7±6μs     73.2±2μs      73.9±2μs     68.9±1μs
   sklearn.neighbors._kd_tree.KDTree       10    reverse_ordered     98.7±3μs     81.3±3μs      76.2±2μs     80.3±2μs
   sklearn.neighbors._kd_tree.KDTree       10       duplicated       141±4μs      75.7±4μs      75.1±2μs     70.6±2μs
   sklearn.neighbors._kd_tree.KDTree      100         random         113±2μs      98.6±2μs      118±5μs      137±8μs
   sklearn.neighbors._kd_tree.KDTree      100        ordered         119±3μs      90.7±2μs      106±4μs      143±10μs
   sklearn.neighbors._kd_tree.KDTree      100    reverse_ordered     129±3μs      104±3μs       106±3μs      184±6μs
   sklearn.neighbors._kd_tree.KDTree      100       duplicated       711±20μs     154±3μs       132±4μs      116±5μs
   sklearn.neighbors._kd_tree.KDTree      1000        random         153±6μs      157±7μs       186±10μs     183±8μs
   sklearn.neighbors._kd_tree.KDTree      1000       ordered         158±8μs      123±5μs       141±7μs      166±9μs
   sklearn.neighbors._kd_tree.KDTree      1000   reverse_ordered     150±5μs      131±3μs       136±4μs      171±6μs
   sklearn.neighbors._kd_tree.KDTree      1000      duplicated     5.16±0.09ms    918±20μs      751±10μs     655±30μs
   sklearn.neighbors._kd_tree.KDTree     10000        random         192±7μs      188±10μs      222±10μs     260±20μs
   sklearn.neighbors._kd_tree.KDTree     10000       ordered         172±6μs      149±5μs       161±4μs      200±6μs
   sklearn.neighbors._kd_tree.KDTree     10000   reverse_ordered     204±7μs      173±5μs       186±6μs      214±6μs
   sklearn.neighbors._kd_tree.KDTree     10000      duplicated      79.4±0.9ms   7.07±0.2ms   5.90±0.09ms   5.55±0.1ms
 sklearn.neighbors._ball_tree.BallTree     10         random         86.4±5μs     80.9±3μs      74.7±3μs     82.8±5μs
 sklearn.neighbors._ball_tree.BallTree     10        ordered         74.5±2μs     71.3±5μs      68.4±1μs     69.9±5μs
 sklearn.neighbors._ball_tree.BallTree     10    reverse_ordered     76.1±3μs     74.2±2μs      73.1±2μs     74.2±3μs
 sklearn.neighbors._ball_tree.BallTree     10       duplicated       96.4±4μs     68.6±1μs      67.2±1μs     67.9±2μs
 sklearn.neighbors._ball_tree.BallTree    100         random         95.3±2μs     105±8μs       104±3μs      122±2μs
 sklearn.neighbors._ball_tree.BallTree    100        ordered         92.4±2μs     91.0±2μs      97.8±3μs     120±3μs
 sklearn.neighbors._ball_tree.BallTree    100    reverse_ordered     90.8±5μs     82.1±2μs      95.4±2μs     177±5μs
 sklearn.neighbors._ball_tree.BallTree    100       duplicated       367±20μs     127±2μs       118±2μs      114±3μs
 sklearn.neighbors._ball_tree.BallTree    1000        random         113±3μs      120±3μs       146±4μs      180±6μs
 sklearn.neighbors._ball_tree.BallTree    1000       ordered         107±3μs      106±3μs       116±2μs      144±4μs
 sklearn.neighbors._ball_tree.BallTree    1000   reverse_ordered     101±3μs      104±3μs       109±3μs      136±3μs
 sklearn.neighbors._ball_tree.BallTree    1000      duplicated     2.47±0.09ms    726±20μs      654±20μs     606±20μs
 sklearn.neighbors._ball_tree.BallTree   10000        random         150±10μs     155±5μs       182±6μs      239±7μs
 sklearn.neighbors._ball_tree.BallTree   10000       ordered         167±60μs     170±70μs      140±3μs      202±30μs
 sklearn.neighbors._ball_tree.BallTree   10000   reverse_ordered     126±6μs      133±10μs      138±3μs      218±50μs
 sklearn.neighbors._ball_tree.BallTree   10000      duplicated       37.6±2ms    6.36±0.2ms    5.45±0.2ms   5.35±0.1ms
======================================= ======= ================= ============= ============ ============= ============
This PR, at 809aa8b
======================================= ======= ================= ============ ============ ============ ============
--                                                                                     leaf_size
----------------------------------------------------------------- ---------------------------------------------------
               BinaryTree                  n       matrix_type         1            20           40          100
======================================= ======= ================= ============ ============ ============ ============
   sklearn.neighbors._kd_tree.KDTree       10         random        106±5μs      84.2±5μs     82.7±5μs     97.0±8μs
   sklearn.neighbors._kd_tree.KDTree       10        ordered        93.0±3μs     81.8±3μs     78.1±2μs     77.6±4μs
   sklearn.neighbors._kd_tree.KDTree       10    reverse_ordered    105±3μs      80.8±3μs     82.3±4μs     84.2±7μs
   sklearn.neighbors._kd_tree.KDTree       10       duplicated      151±10μs     75.5±2μs     83.6±5μs     84.0±4μs
   sklearn.neighbors._kd_tree.KDTree      100         random        117±2μs      111±4μs      119±4μs      135±6μs
   sklearn.neighbors._kd_tree.KDTree      100        ordered        124±6μs      100±4μs      107±5μs      125±8μs
   sklearn.neighbors._kd_tree.KDTree      100    reverse_ordered    126±7μs      114±6μs      115±20μs     198±9μs
   sklearn.neighbors._kd_tree.KDTree      100       duplicated     793±100μs     158±6μs      129±6μs      115±3μs
   sklearn.neighbors._kd_tree.KDTree      1000        random        148±7μs      148±10μs     180±10μs     187±10μs
   sklearn.neighbors._kd_tree.KDTree      1000       ordered        135±4μs      131±5μs      137±4μs      160±7μs
   sklearn.neighbors._kd_tree.KDTree      1000   reverse_ordered    155±7μs      132±3μs      136±4μs      160±6μs
   sklearn.neighbors._kd_tree.KDTree      1000      duplicated     5.25±0.2ms   1.28±0.2ms   1.21±0.2ms   865±100μs
   sklearn.neighbors._kd_tree.KDTree     10000        random        232±20μs     211±30μs     260±60μs     312±50μs
   sklearn.neighbors._kd_tree.KDTree     10000       ordered        210±10μs     174±10μs     197±30μs     240±20μs
   sklearn.neighbors._kd_tree.KDTree     10000   reverse_ordered    289±50μs     232±30μs     223±20μs     235±20μs
   sklearn.neighbors._kd_tree.KDTree     10000      duplicated      101±6ms     8.17±0.5ms   6.92±0.6ms   5.94±0.3ms
 sklearn.neighbors._ball_tree.BallTree     10         random        116±30μs    98.8±10μs     86.6±5μs     85.8±4μs
 sklearn.neighbors._ball_tree.BallTree     10        ordered        95.7±7μs     125±30μs    92.4±10μs     86.4±4μs
 sklearn.neighbors._ball_tree.BallTree     10    reverse_ordered    96.7±8μs     91.4±6μs     94.3±6μs     94.8±7μs
 sklearn.neighbors._ball_tree.BallTree     10       duplicated      125±8μs      82.0±5μs     87.8±9μs     84.8±6μs
 sklearn.neighbors._ball_tree.BallTree    100         random        113±9μs      120±9μs      131±9μs      144±10μs
 sklearn.neighbors._ball_tree.BallTree    100        ordered        110±9μs      107±6μs      125±10μs     142±9μs
 sklearn.neighbors._ball_tree.BallTree    100    reverse_ordered    107±6μs      108±6μs      110±8μs      223±20μs
 sklearn.neighbors._ball_tree.BallTree    100       duplicated      413±30μs     165±10μs     162±20μs     155±30μs
 sklearn.neighbors._ball_tree.BallTree    1000        random        135±10μs     148±20μs     178±40μs     182±20μs
 sklearn.neighbors._ball_tree.BallTree    1000       ordered        107±5μs      110±5μs      133±20μs     147±5μs
 sklearn.neighbors._ball_tree.BallTree    1000   reverse_ordered    106±5μs      124±30μs     151±30μs     178±20μs
 sklearn.neighbors._ball_tree.BallTree    1000      duplicated     2.79±0.1ms    733±20μs     673±50μs     633±20μs
 sklearn.neighbors._ball_tree.BallTree   10000        random        144±5μs      145±4μs      168±6μs      206±6μs
 sklearn.neighbors._ball_tree.BallTree   10000       ordered        120±4μs      130±5μs      147±5μs      186±6μs
 sklearn.neighbors._ball_tree.BallTree   10000   reverse_ordered    115±2μs      128±5μs      136±4μs      170±5μs
 sklearn.neighbors._ball_tree.BallTree   10000      duplicated      36.9±1ms    6.26±0.2ms   6.07±0.2ms    6.85±1ms
======================================= ======= ================= ============ ============ ============ ============

Machine specs (adapted from machine.json)

  • Arch: "x86_64"
  • Cpu: "Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz"
  • Num Cpu: "4"
  • OS: "Linux 5.10.22-200.fc33.x86_64"
  • RAM: "7799752"
>>> from sklearn import show_versions

>>> show_versions()

System:
    python: 3.9.2 (default, Feb 20 2021, 00:00:00)  [GCC 10.2.1 20201125 (Red Hat 10.2.1-9)]
   machine: Linux-5.10.22-200.fc33.x86_64-x86_64-with-glibc2.32

Python dependencies:
          pip: 21.0.1
   setuptools: 51.1.1
      sklearn: 1.0.dev0
        numpy: 1.20.1
        scipy: 1.6.0
       Cython: 0.29.21
       pandas: 1.2.2
   matplotlib: 3.3.4
       joblib: 1.0.1
threadpoolctl: 2.1.0

Built with OpenMP: True

@thomasjpfan added this to the 1.0 milestone on Apr 7, 2021
@thomasjpfan (Member) left a review comment:

Minor comment, otherwise LGTM.

@thomasjpfan merged commit ee524f4 into scikit-learn:main on Apr 8, 2021
@jjerphan deleted the cpp-nth-element branch on April 8, 2021 at 18:43
thomasjpfan added a commit to thomasjpfan/scikit-learn that referenced this pull request on Apr 19, 2021:
…ime complexity (scikit-learn#19473)

Co-authored-by: jiefangxuanyan <505745416@qq.com>
Co-authored-by: "Thomas J. Fan" <thomasjpfan@gmail.com>
@glemaitre mentioned this pull request on Apr 22, 2021
Development

Successfully merging this pull request may close these issues.

sklearn.neighbors.KDTree complexity for building is not O(n(k+log(n))
7 participants