MNT simplify some tree code with memoryviews #12886

adrinjalali · 2018-12-28T19:08:16Z

Related to #10624

This for now is more of a question, before potentially going further.

There are parts of the tree code which can be significantly simplified using memoryviews, like the one presented in this PR. The question is, is this change valid? I think it passes all the tests and dosn't add anything to the time required for the tests, but I'm not sure if our tests cover things such as passing a readonly array to the function, which I guess is an edge case reading the discussion there.

amueller · 2018-12-28T19:57:21Z

we do test readonly arrays, the question is whether they survive all conversions to that point (probably they get copied somewhere).
The tests are not a good benchmark, I think you need to run actual benchmarks on larger datasets both using big and small trees and do full profiling to see if this makes it slower.

adrinjalali · 2018-12-30T15:43:47Z

It seems at least according to the benchmarks/bench_tree.py, such changes make it faster, and not just by a tiny bit.

Results from the master on my machine:

Results from this branch on my machine:

amueller · 2018-12-30T22:04:08Z

well that's interesting. Should also be interesting to @NicolasHug

amueller · 2018-12-30T22:04:47Z

how stable are these curves though, can we get bars? ;)

sklearn/tree/_tree.pyx

adrinjalali · 2018-12-31T11:03:13Z

I know I'm doing something wrong here, but really don't know what. I get this very odd looking cython compile error:

Memoryview 'DOUBLE_t[:, :]' not conformable to memoryview 'DOUBLE_t[:, :]'.

Been trying to figure it out, but any help would be appreciated.

NicolasHug · 2018-12-31T13:37:36Z

don't ask me why but I think that

from _tree cimport DOUBLE_t

in both splitter and criterion will fix it (also remove ctypedef np.npy_float64 DOUBLE_t).

Very weird that the other dtypes aren't causing the same thing.

Looks similar to this where the fix is similar

adrinjalali · 2018-12-31T17:05:39Z

don't ask me why but I think that
from _tree cimport DOUBLE_t
in both splitter and criterion will fix it (also remove ctypedef np.npy_float64 DOUBLE_t).

Very weird that the other dtypes aren't causing the same thing.

Looks similar to this where the fix is similar

Ooh, that makes a lot of sense. I wish cython would print the complete type, like, _tree.DOUBLE_t vs. _criterion.DOUBLE_t instead or something. Thanks.

NicolasHug · 2018-12-31T17:19:10Z

But does it make sense to you that only DOUBLE_t is causing this, and not e.g. DTYPE_t?

It looks like a Cython bug to me.

adrinjalali · 2018-12-31T17:28:10Z

But does it make sense to you that only DOUBLE_t is causing this, and not e.g. DTYPE_t?

It looks like a Cython bug to me.

I think the difference may be that DOUBLE_t is the only one used in the context of a memoryview while passing between modules. I'd agree that it's a bug or at least an inconsistency that it has stricter type checking for a_t [:] x than a_t* x though. But please read all of that with the hint of "adrin is still learning cython" :D.

jnothman

I personally find MemoryViews beneficial to maintainability, and helpful for catching bugs. If bench_tree is happy, and @NicolasHug is happy, I think this is a positive change.

jnothman · 2019-01-02T11:44:32Z

Sorry, got confused. Not sure that @NicolasHug is the one most in need of being happy. Rather, if you are happy that this does not make life more difficult with your other work on trees, I'm happy for it.

NicolasHug · 2019-01-02T12:57:56Z

It sure doesn't make my life harder at least ^^
Like Adrin I'm still learning those Cython features, so I can't give an informed review for this. I'm mostly interested on how thing can be done to get some inspiration for #12807

adrinjalali · 2019-01-02T13:44:08Z

I made some more reliable benchmarks with error bars (10 repeats each) using the benchmarks/bench_tree.py, under 3 different conditions. In the following plots:

master: the master
only_x: uses memoryviews for X in _tree.* and _splitter.*
criterion_y: also uses memoryviews for y in _criterion.*

Charts for increasing number of rows:
Classifier:

Regressor:

Increasing number of columns:
Classifier:

Regressor:

I guess the conclusion from these charts is that when we use memory views for X, there's little or no overhead. But there's some overhead when we try to apply that for y in criterion. Here are a few points about that:

In the case of using memory views for X, we only do a type cast/conversion when we want to use it, and the memory view is not passed around in functions. This can be done since in all those cases we operate under gil and we can do python stuff.

All the _criterion.* code is under nogil, which means we can't do that type cast. As a result y needs to be passed around as a memory view struct. Now the issue is that the memoryview struct is a rather large one, i.e. the generated C code:

typedef struct {
  struct __pyx_memoryview_obj *memview;
  char *data;
  Py_ssize_t shape[8];
  Py_ssize_t strides[8];
  Py_ssize_t suboffsets[8];
} __Pyx_memviewslice;

And this struct is what's passed around, and not a pointer to it. I couldn't find a way to pass around pointers and then do a cast while keeping gil, and I suspect this is the reason behind the overhead for this case.

WDYT?

NicolasHug · 2019-01-02T14:35:12Z

What are the units of the axis? You don't just use 10 samples / rows right?

Have you tried removing some of the unnecessary casts in criterion.pyx? For example

    cdef int update(self, SIZE_t new_pos) nogil except -1:
        ...
        cdef DOUBLE_t[:, :] y = self.y
        ...

is probably not needed since self.y has been declared as a memory view already.

I assume looking at the annotated generated code (cython -a) doesn't help since it's all no-gil / no-python interaction code ?

In any case I don't think the overhead of passing the memview struct is causing the slowdown, because this struct is only passed once when building the criterion object right? It's not passed into tons of function calls. Also, I would assume the same overhead occurs for the X view: Cython still has to allocate the memview struct for X, on the heap instead of on the stack.

Like I said above I'm not experienced with advanced Cython so... don't take all this for granted ;)

NicolasHug · 2019-01-02T14:50:21Z

Another lead might be that the previous code for y knew that y was C-aligned, while now Cython has to generate code for any kind of alignment adding an unnecessary multiplication. It might be worth it to specify that the view is C-aligned to avoid the innermost stride
multiplication. (see section Caring about memory layout in this paper)

here is the doc for C or F contiguous memviews

adrinjalali · 2019-01-02T15:47:02Z

Nice tips @NicolasHug . Those together bring all tests as shown in these plots, i.e. no overhead (most of it was after I applied your first comment):

Now I need to figure out a way to to handle the read-only input issue (i.e. failing tests).

adrinjalali · 2019-01-02T18:12:53Z

I guess this looks good for a review now.

sklearn/tree/_criterion.pyx

rth

I'm not too familiar with the trees code but this definitely look more readable.

jnothman

I think it is helpful to improve the maintainability of the tree code... but this also changes the interface of Criterion which users are known to develop custom extensions to. Perhaps the same is true of Splitter. I'm a little concerned that this will break code (even on not-really-supported API) sufficiently to not be worthwhile.

If we have reason to break these APIs in any case, then this maintainability improvement could be thrown in, but it's not very fair to users otherwise. What do others think?

NicolasHug · 2019-01-07T20:28:56Z

My 2 cents is that we should not start caring about non public API changes.

Quoting #10251 (comment)

this interface is not considered public, stable API and should be used at one's own risk.

adrinjalali · 2019-01-07T20:51:41Z

It's also mostly (if not all) the C API, and not even the python API. I really don't think we should care about backward compatibility of those pieces of the code. That said, I'm happy to have a blog post somewhere (we don't have a blog, do we?) and announce well in advance that there are some C API changes to those parts and explain them.

jnothman · 2019-01-07T21:44:45Z

I don't see how C vs Python API should make a difference, except insofar as it might be harder to write Cython code compatible with multiple versions of Scikit-learn. If I am not mistaken it would require translation-time switching based on sklearn.__version__. Perhaps it's right that this is not public, and we should stick to our guns and not support it. But we know people are extending Criterion, we recently offered them a pxd to do so (albeit with no promises of future support), and as long as this change is largely cosmetic, I don't see why it is good to force downstream libraries to be pinned to a particular Scikit-learn version.

adrinjalali · 2019-01-10T20:00:10Z

Personally, I was really confused by the strides and it took me a while to figure they're just there to index the matrices.

I guess we [mostly] agree that this part of the package is not the easiest to get used to, which contributes to inhibiting contributions to it by a wider developer community. FWIW, I personally find memoryviews significantly nicer than what they're replacing here, and much less intimidating.

Generally, I think this PR is not the only thing which can potentially improve readability of the tree code base, and I'd find any contribution towards this goal valuable (the least it does is to ease my life working on trees, and that's why I started this PR). So I would have liked us to follow the "it's not public, therefore we can change the API" principle, at least here.

I understand it's a nice thing to not change APIs, but:

If I were a developer using private parts of a package, I wouldn't expect them to keep it as is.
Even if it was public, I'd support a deprecation cycle to clean up the code base, cause I feel it helps with the maintainability of the code base.

I don't know, that's all I have I guess.

thomasjpfan · 2019-01-14T23:20:37Z

@jnothman Is it possible to direct the people using the current version of Criterion to this PR and see what they think about it?

jnothman · 2019-01-15T07:34:12Z

We can certainly ask people at #12381

NicolasHug · 2019-02-06T18:50:43Z

As far as I can tell the only change to the Criterion is pretty minimal and only impacts the __init__ signature. Someone with a custom Criterion object would only need to wrap they y pointer into a memory view (EDIT: the other way around actually), and remove the stride argument. That's a pretty minor inconvenience for using a private API, IMHO.

@camilstaps, you opened #10251 and needed custom Criterions, WDYT?

Also @glemaitre , @jmschrei, @glouppe , you've been working a lot on the tree code, maybe you want to weight in?

EDIT: for context: this PR greatly simplifies tree code by using memory views instead of pointer + stride, but breaks the Criterion interface, which is private but apparently used anyway by some users.

jnothman · 2019-02-06T20:48:25Z

Thanks for the persuasive market research, @NicolasHug :D

Maybe we should just make the change.

glemaitre · 2019-02-06T20:52:39Z

Regarding custom Criterion, I think that it is a minor issue. If I recall the PR which was exposing the criterion, it was mentioned that we did not want to officially support it but exposed the criterion for convenience instead.

adrinjalali · 2019-02-06T20:55:43Z

Maybe we should just make the change.

\o/

jnothman · 2019-02-06T20:58:23Z

What did we decide about read only memoryview handling? Are we copying at master? In this PR?

adrinjalali · 2019-02-06T21:23:59Z

I don't think we're copying. Some tests did actually fail and then I changed them in splitter and criterions to const memoryview instead and now tests are happy.

NicolasHug

Other than that LGTM!

NicolasHug · 2019-02-06T22:19:45Z

sklearn/tree/_criterion.pyx

@@ -454,7 +444,7 @@ cdef class ClassificationCriterion(Criterion):

                for k in range(self.n_outputs):
                    label_index = (k * self.sum_stride +
-                                   <SIZE_t> y[i * self.y_stride + k])
+                                   <SIZE_t> self.y[i, k])


Looking at the annotated html file we can see that using self generates some small Python interactions. I still don't understand why: this is a cdef class and self.y is properly typed, but I have observed this in #12807 as well.

Having only one Python interation is OK, but having interactions inside loops like here might be costly, so it's best to remove them. The workaround I found is to declare a local view and use it instead:

def const DOUBLE_t [:, ::1] y = self.y

Sorry @adrinjalali I know I pretty much told you to use self. in #12886 (comment), but I wasn't aware of this at the time :s

It's quite strange that the observation contradicts what we'd expect. I did change them and put the local variables in place, but the overhead of the local variable seems to be kinda much more than the little python interaction when using self. So I guess we should keep them as is, although along the way I did a bit more cleanups and the benchmarks are still intact.

Okay,

I'll try to come up with a reproducing example and ask Cython folks why we're having Python interactions. Meanwhile I agree, let's keep it as is.

Thanks for double-checking!

adrinjalali · 2019-02-15T09:24:14Z

@jnothman any final verdict on this one?

jnothman

I'm in favour. Is that final?

adrinjalali · 2019-02-19T09:50:56Z

I'm in favour. Is that final?

It is for this PR. I've found a bit more in the context of the NOCATS PR, but that can stay there.

For future reference, any such changes should go through time performance tests, sometimes what we see is not intuitive and they may actually slow down the run.

adrinjalali · 2019-02-25T16:18:34Z

@glemaitre review ping :)

sklearn/tree/_criterion.pxd

sklearn/tree/_criterion.pyx

* simplify apply_dense with memoryviews * change more Xs to memviews in splitter * remove commented out lines * trying to remove y_stride * float->dtype_t * criterion has no more y_stride * remove redundant constructs and handle const y input * fix criterion docstring for y's dtype * revert docstring for y * formatting: no space between type name and [:, :] * remove redundant X * more cleanup on criterion * address comment

)" This reverts commit ace4eb1.

* simplify apply_dense with memoryviews * change more Xs to memviews in splitter * remove commented out lines * trying to remove y_stride * float->dtype_t * criterion has no more y_stride * remove redundant constructs and handle const y input * fix criterion docstring for y's dtype * revert docstring for y * formatting: no space between type name and [:, :] * remove redundant X * more cleanup on criterion * address comment

simplify apply_dense with memoryviews

095f173

adrinjalali added 2 commits December 30, 2018 16:31

change more Xs to memviews in splitter

99b946e

remove commented out lines

17c3bbb

NicolasHug reviewed Dec 30, 2018

View reviewed changes

sklearn/tree/_tree.pyx Outdated Show resolved Hide resolved

adrinjalali added 2 commits December 31, 2018 11:58

trying to remove y_stride

096276e

float->dtype_t

7248a8c

jnothman reviewed Jan 2, 2019

View reviewed changes

criterion has no more y_stride

a416c70

adrinjalali added 2 commits January 2, 2019 17:02

remove redundant constructs and handle const y input

554b79e

fix criterion docstring for y's dtype

3ae6c5a

NicolasHug reviewed Jan 2, 2019

View reviewed changes

sklearn/tree/_criterion.pyx Outdated Show resolved Hide resolved

rth reviewed Jan 2, 2019

View reviewed changes

adrinjalali added 2 commits January 2, 2019 23:53

revert docstring for y

b136c8b

formatting: no space between type name and [:, :]

5899d08

jnothman reviewed Jan 7, 2019

View reviewed changes

adrinjalali mentioned this pull request Jan 15, 2019

[MRG] Fix #12363 Included criterion.pxd #12381

Merged

NicolasHug reviewed Feb 6, 2019

View reviewed changes

more cleanup on criterion

a5d2457

NicolasHug approved these changes Feb 10, 2019

View reviewed changes

jnothman approved these changes Feb 19, 2019

View reviewed changes

thomasjpfan reviewed Feb 25, 2019

View reviewed changes

sklearn/tree/_criterion.pxd Outdated Show resolved Hide resolved

sklearn/tree/_criterion.pyx Show resolved Hide resolved

address comment

fd7b5fe

glemaitre merged commit 90570ab into scikit-learn:master Feb 28, 2019

adrinjalali deleted the tree/memoryviews branch February 28, 2019 18:31

jeremiedbb mentioned this pull request Apr 12, 2019

[MRG] Fix missing 'const' in a few memoryview declaration in trees. #13626

Merged

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "MNT simplify some tree code with memoryviews (scikit-learn#12886

74595ff

)" This reverts commit ace4eb1.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "MNT simplify some tree code with memoryviews (scikit-learn#12886

b7db165

)" This reverts commit ace4eb1.

MNT simplify some tree code with memoryviews #12886

MNT simplify some tree code with memoryviews #12886

Conversation

adrinjalali commented Dec 28, 2018

amueller commented Dec 28, 2018

adrinjalali commented Dec 30, 2018

amueller commented Dec 30, 2018

amueller commented Dec 30, 2018

adrinjalali commented Dec 31, 2018

NicolasHug commented Dec 31, 2018 • edited Loading

adrinjalali commented Dec 31, 2018

NicolasHug commented Dec 31, 2018

adrinjalali commented Dec 31, 2018

jnothman left a comment

Choose a reason for hiding this comment

jnothman commented Jan 2, 2019

NicolasHug commented Jan 2, 2019

adrinjalali commented Jan 2, 2019

NicolasHug commented Jan 2, 2019

NicolasHug commented Jan 2, 2019

adrinjalali commented Jan 2, 2019

adrinjalali commented Jan 2, 2019

rth left a comment

Choose a reason for hiding this comment

jnothman left a comment

Choose a reason for hiding this comment

NicolasHug commented Jan 7, 2019

adrinjalali commented Jan 7, 2019

jnothman commented Jan 7, 2019 via email

adrinjalali commented Jan 10, 2019

thomasjpfan commented Jan 14, 2019

jnothman commented Jan 15, 2019 via email

NicolasHug commented Feb 6, 2019 • edited Loading

jnothman commented Feb 6, 2019

glemaitre commented Feb 6, 2019 • edited Loading

adrinjalali commented Feb 6, 2019

jnothman commented Feb 6, 2019

adrinjalali commented Feb 6, 2019

NicolasHug left a comment

Choose a reason for hiding this comment

NicolasHug Feb 6, 2019

Choose a reason for hiding this comment

adrinjalali Feb 9, 2019

Choose a reason for hiding this comment

NicolasHug Feb 9, 2019

Choose a reason for hiding this comment

adrinjalali commented Feb 15, 2019

jnothman left a comment

Choose a reason for hiding this comment

adrinjalali commented Feb 19, 2019

adrinjalali commented Feb 25, 2019

NicolasHug commented Dec 31, 2018 •

edited

Loading

NicolasHug commented Feb 6, 2019 •

edited

Loading

glemaitre commented Feb 6, 2019 •

edited

Loading