Skip to content

MNT simplify some tree code with memoryviews #12886

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Feb 28, 2019

Conversation

adrinjalali
Copy link
Member

Related to #10624

This for now is more of a question, before potentially going further.

There are parts of the tree code which can be significantly simplified using memoryviews, like the one presented in this PR. The question is, is this change valid? I think it passes all the tests and dosn't add anything to the time required for the tests, but I'm not sure if our tests cover things such as passing a readonly array to the function, which I guess is an edge case reading the discussion there.

@amueller
Copy link
Member

we do test readonly arrays, the question is whether they survive all conversions to that point (probably they get copied somewhere).
The tests are not a good benchmark, I think you need to run actual benchmarks on larger datasets both using big and small trees and do full profiling to see if this makes it slower.

@adrinjalali
Copy link
Member Author

It seems at least according to the benchmarks/bench_tree.py, such changes make it faster, and not just by a tiny bit.

Results from the master on my machine:
scikit-learn_tree_benchmark_results-master

Results from this branch on my machine:
scikit-learn_tree_benchmark_results-memview

@amueller
Copy link
Member

well that's interesting. Should also be interesting to @NicolasHug

@amueller
Copy link
Member

how stable are these curves though, can we get bars? ;)

@adrinjalali
Copy link
Member Author

I know I'm doing something wrong here, but really don't know what. I get this very odd looking cython compile error:

Memoryview 'DOUBLE_t[:, :]' not conformable to memoryview 'DOUBLE_t[:, :]'.

Been trying to figure it out, but any help would be appreciated.

@NicolasHug
Copy link
Member

NicolasHug commented Dec 31, 2018

don't ask me why but I think that

from _tree cimport DOUBLE_t

in both splitter and criterion will fix it (also remove ctypedef np.npy_float64 DOUBLE_t).

Very weird that the other dtypes aren't causing the same thing.

Looks similar to this where the fix is similar

@adrinjalali
Copy link
Member Author

don't ask me why but I think that

from _tree cimport DOUBLE_t

in both splitter and criterion will fix it (also remove ctypedef np.npy_float64 DOUBLE_t).

Very weird that the other dtypes aren't causing the same thing.

Looks similar to this where the fix is similar

Ooh, that makes a lot of sense. I wish cython would print the complete type, like, _tree.DOUBLE_t vs. _criterion.DOUBLE_t instead or something. Thanks.

@NicolasHug
Copy link
Member

But does it make sense to you that only DOUBLE_t is causing this, and not e.g. DTYPE_t?

It looks like a Cython bug to me.

@adrinjalali
Copy link
Member Author

But does it make sense to you that only DOUBLE_t is causing this, and not e.g. DTYPE_t?

It looks like a Cython bug to me.

I think the difference may be that DOUBLE_t is the only one used in the context of a memoryview while passing between modules. I'd agree that it's a bug or at least an inconsistency that it has stricter type checking for a_t [:] x than a_t* x though. But please read all of that with the hint of "adrin is still learning cython" :D.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally find MemoryViews beneficial to maintainability, and helpful for catching bugs. If bench_tree is happy, and @NicolasHug is happy, I think this is a positive change.

@jnothman
Copy link
Member

jnothman commented Jan 2, 2019

Sorry, got confused. Not sure that @NicolasHug is the one most in need of being happy. Rather, if you are happy that this does not make life more difficult with your other work on trees, I'm happy for it.

@NicolasHug
Copy link
Member

It sure doesn't make my life harder at least ^^
Like Adrin I'm still learning those Cython features, so I can't give an informed review for this. I'm mostly interested on how thing can be done to get some inspiration for #12807

@adrinjalali
Copy link
Member Author

I made some more reliable benchmarks with error bars (10 repeats each) using the benchmarks/bench_tree.py, under 3 different conditions. In the following plots:

  • master: the master
  • only_x: uses memoryviews for X in _tree.* and _splitter.*
  • criterion_y: also uses memoryviews for y in _criterion.*

Charts for increasing number of rows:
Classifier:
clf-rows
Regressor:
reg-rows

Increasing number of columns:
Classifier:
clf-cols
Regressor:
reg-cols

I guess the conclusion from these charts is that when we use memory views for X, there's little or no overhead. But there's some overhead when we try to apply that for y in criterion. Here are a few points about that:

In the case of using memory views for X, we only do a type cast/conversion when we want to use it, and the memory view is not passed around in functions. This can be done since in all those cases we operate under gil and we can do python stuff.

All the _criterion.* code is under nogil, which means we can't do that type cast. As a result y needs to be passed around as a memory view struct. Now the issue is that the memoryview struct is a rather large one, i.e. the generated C code:

typedef struct {
  struct __pyx_memoryview_obj *memview;
  char *data;
  Py_ssize_t shape[8];
  Py_ssize_t strides[8];
  Py_ssize_t suboffsets[8];
} __Pyx_memviewslice;

And this struct is what's passed around, and not a pointer to it. I couldn't find a way to pass around pointers and then do a cast while keeping gil, and I suspect this is the reason behind the overhead for this case.

WDYT?

@NicolasHug
Copy link
Member

What are the units of the axis? You don't just use 10 samples / rows right?

Have you tried removing some of the unnecessary casts in criterion.pyx? For example

    cdef int update(self, SIZE_t new_pos) nogil except -1:
        ...
        cdef DOUBLE_t[:, :] y = self.y
        ...

is probably not needed since self.y has been declared as a memory view already.

I assume looking at the annotated generated code (cython -a) doesn't help since it's all no-gil / no-python interaction code ?

In any case I don't think the overhead of passing the memview struct is causing the slowdown, because this struct is only passed once when building the criterion object right? It's not passed into tons of function calls. Also, I would assume the same overhead occurs for the X view: Cython still has to allocate the memview struct for X, on the heap instead of on the stack.

Like I said above I'm not experienced with advanced Cython so... don't take all this for granted ;)

@NicolasHug
Copy link
Member

Another lead might be that the previous code for y knew that y was C-aligned, while now Cython has to generate code for any kind of alignment adding an unnecessary multiplication. It might be worth it to specify that the view is C-aligned to avoid the innermost stride
multiplication. (see section Caring about memory layout in this paper)

here is the doc for C or F contiguous memviews

@adrinjalali
Copy link
Member Author

Nice tips @NicolasHug . Those together bring all tests as shown in these plots, i.e. no overhead (most of it was after I applied your first comment):

clf-cols
clf-rows
reg-cols
reg-rows

Now I need to figure out a way to to handle the read-only input issue (i.e. failing tests).

@adrinjalali
Copy link
Member Author

I guess this looks good for a review now.

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with the trees code but this definitely look more readable.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is helpful to improve the maintainability of the tree code... but this also changes the interface of Criterion which users are known to develop custom extensions to. Perhaps the same is true of Splitter. I'm a little concerned that this will break code (even on not-really-supported API) sufficiently to not be worthwhile.

If we have reason to break these APIs in any case, then this maintainability improvement could be thrown in, but it's not very fair to users otherwise. What do others think?

@NicolasHug
Copy link
Member

My 2 cents is that we should not start caring about non public API changes.

Quoting #10251 (comment)

this interface is not considered public, stable API and should be used at one's own risk.

@adrinjalali
Copy link
Member Author

It's also mostly (if not all) the C API, and not even the python API. I really don't think we should care about backward compatibility of those pieces of the code. That said, I'm happy to have a blog post somewhere (we don't have a blog, do we?) and announce well in advance that there are some C API changes to those parts and explain them.

@jnothman
Copy link
Member

jnothman commented Jan 7, 2019 via email

@adrinjalali
Copy link
Member Author

Personally, I was really confused by the strides and it took me a while to figure they're just there to index the matrices.

I guess we [mostly] agree that this part of the package is not the easiest to get used to, which contributes to inhibiting contributions to it by a wider developer community. FWIW, I personally find memoryviews significantly nicer than what they're replacing here, and much less intimidating.

Generally, I think this PR is not the only thing which can potentially improve readability of the tree code base, and I'd find any contribution towards this goal valuable (the least it does is to ease my life working on trees, and that's why I started this PR). So I would have liked us to follow the "it's not public, therefore we can change the API" principle, at least here.

I understand it's a nice thing to not change APIs, but:

  1. If I were a developer using private parts of a package, I wouldn't expect them to keep it as is.
  2. Even if it was public, I'd support a deprecation cycle to clean up the code base, cause I feel it helps with the maintainability of the code base.

I don't know, that's all I have I guess.

@thomasjpfan
Copy link
Member

@jnothman Is it possible to direct the people using the current version of Criterion to this PR and see what they think about it?

@jnothman
Copy link
Member

jnothman commented Jan 15, 2019 via email

@NicolasHug
Copy link
Member

NicolasHug commented Feb 6, 2019

As far as I can tell the only change to the Criterion is pretty minimal and only impacts the __init__ signature. Someone with a custom Criterion object would only need to wrap they y pointer into a memory view (EDIT: the other way around actually), and remove the stride argument. That's a pretty minor inconvenience for using a private API, IMHO.

@camilstaps, you opened #10251 and needed custom Criterions, WDYT?

Also @glemaitre , @jmschrei, @glouppe , you've been working a lot on the tree code, maybe you want to weight in?

EDIT: for context: this PR greatly simplifies tree code by using memory views instead of pointer + stride, but breaks the Criterion interface, which is private but apparently used anyway by some users.

@jnothman
Copy link
Member

jnothman commented Feb 6, 2019

Thanks for the persuasive market research, @NicolasHug :D

Maybe we should just make the change.

@glemaitre
Copy link
Member

glemaitre commented Feb 6, 2019

Regarding custom Criterion, I think that it is a minor issue. If I recall the PR which was exposing the criterion, it was mentioned that we did not want to officially support it but exposed the criterion for convenience instead.

@adrinjalali
Copy link
Member Author

Maybe we should just make the change.

\o/

@jnothman
Copy link
Member

jnothman commented Feb 6, 2019

What did we decide about read only memoryview handling? Are we copying at master? In this PR?

@adrinjalali
Copy link
Member Author

I don't think we're copying. Some tests did actually fail and then I changed them in splitter and criterions to const memoryview instead and now tests are happy.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than that LGTM!

@@ -454,7 +444,7 @@ cdef class ClassificationCriterion(Criterion):

for k in range(self.n_outputs):
label_index = (k * self.sum_stride +
<SIZE_t> y[i * self.y_stride + k])
<SIZE_t> self.y[i, k])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the annotated html file we can see that using self generates some small Python interactions. I still don't understand why: this is a cdef class and self.y is properly typed, but I have observed this in #12807 as well.

Having only one Python interation is OK, but having interactions inside loops like here might be costly, so it's best to remove them. The workaround I found is to declare a local view and use it instead:

def const DOUBLE_t [:, ::1] y = self.y

Sorry @adrinjalali I know I pretty much told you to use self. in #12886 (comment), but I wasn't aware of this at the time :s

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's quite strange that the observation contradicts what we'd expect. I did change them and put the local variables in place, but the overhead of the local variable seems to be kinda much more than the little python interaction when using self. So I guess we should keep them as is, although along the way I did a bit more cleanups and the benchmarks are still intact.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay,

I'll try to come up with a reproducing example and ask Cython folks why we're having Python interactions. Meanwhile I agree, let's keep it as is.

Thanks for double-checking!

@adrinjalali
Copy link
Member Author

@jnothman any final verdict on this one?

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favour. Is that final?

@adrinjalali
Copy link
Member Author

I'm in favour. Is that final?

It is for this PR. I've found a bit more in the context of the NOCATS PR, but that can stay there.

For future reference, any such changes should go through time performance tests, sometimes what we see is not intuitive and they may actually slow down the run.

@adrinjalali
Copy link
Member Author

@glemaitre review ping :)

@glemaitre glemaitre merged commit 90570ab into scikit-learn:master Feb 28, 2019
@adrinjalali adrinjalali deleted the tree/memoryviews branch February 28, 2019 18:31
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
* simplify apply_dense with memoryviews

* change more Xs to memviews in splitter

* remove commented out lines

* trying to remove y_stride

* float->dtype_t

* criterion has no more y_stride

* remove redundant constructs and handle const y input

* fix criterion docstring for y's dtype

* revert docstring for y

* formatting: no space between type name and [:, :]

* remove redundant X

* more cleanup on criterion

* address comment
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
* simplify apply_dense with memoryviews

* change more Xs to memviews in splitter

* remove commented out lines

* trying to remove y_stride

* float->dtype_t

* criterion has no more y_stride

* remove redundant constructs and handle const y input

* fix criterion docstring for y's dtype

* revert docstring for y

* formatting: no space between type name and [:, :]

* remove redundant X

* more cleanup on criterion

* address comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants