-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] split tree module into several packages #5230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In all files using Numpy, make sure that the C-API is properly initialized using See https://github.com/cython/cython/wiki/tutorials-numpy#c-api-initalization This should fix the segfault I believe. I'll check the structure tomorrow. |
Solved. You're a genius. :) |
I got bit by this very obscure error several times and learned from it :) |
@@ -1074,6 +1074,7 @@ def _init_decision_function(self, X): | |||
raise ValueError("X.shape[1] should be {0:d}, not {1:d}.".format( | |||
self.n_features, X.shape[1])) | |||
score = self.init_.predict(X).astype(np.float64) | |||
print(score) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove this with the first round of comments.
As a double check, could you quickly make sure that there is no regression in terms of speed? (Should be all the same, but we are never too sure) (I am saying this because these changes add 12000 new lines of C code) |
For |
Other than my comments, I am happy with the code structure. I hope it will make our lives easier for the next PRs :) |
@@ -14,7 +14,7 @@ | |||
|
|||
from ..externals import six | |||
|
|||
from . import _tree | |||
from . import _tree, _criterion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you have one import per line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is better to have one import per line as it's easier to see changes whenever you use a software like git.
The culprit is cython and this is well known. Cython adds a lot of functions to handle all sort of situations without letting you know. |
Looks good, thanks @jmschrei |
All changes incorporated, and all unit tests pass on my computer. Lets see if travis/appveyor build. |
A last nitpick if you agree, can you sort import alphabetically? Thanks a lot for this pr. |
Times seem to be the same as well. All estimators use default settings and I'll alphabetize imports now.
|
Could you run |
Thanks for the checks. +1 on my side |
Windows is not happy. I think maybe I'm not allowed to import inline functions? |
034a187
to
5978c0b
Compare
Everything looks good, commits have been squashed. @arjoly do you give your +1? |
|
I was thinking to this benchmark.
Overall it looks good. I am merging. |
[MRG+1] split tree module into several packages
🎺 |
thanks @jmschrei I am eager to see new speed up! :-) |
This pull request handles splitting
_tree.pyx
into_tree.pyx
with Tree and TreeBuilders,_splitter.pyx
with all splitters and split records,_criterion.pyx
with all criteria, and moves some functions into_utils.pyx
. No code is changed, simply moved, in this PR. I almost have this working, with all unit tests passing except those which trigger a specific segfault. The segfault relates to the following function:specifically, this call:
In the test case,
shape = 1
,data = [ 2 ]
, and it triggers a segfault. I have been reading aboutPyArray_SimpleNewFromData
but figured that it might be easier to ask. So, (1) do you agree with this layout, and (2) do you know why this segfault is being triggered? Thanks!cc @glouppe @arjoly @ogrisel