-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] _tree.pyx documentation additions #5010
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -157,7 +230,18 @@ cdef class ClassificationCriterion(Criterion): | |||
|
|||
def __cinit__(self, SIZE_t n_outputs, | |||
np.ndarray[SIZE_t, ndim=1] n_classes): | |||
# Default values | |||
""" | |||
Initialize attributes for this classifier, automatically calling the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
classifier -> criterion
General comment: we should agree on a common terminology regarding outputs. So far the code was mentioning "targets" or "outputs". Now you have added/replaced with "responses". I dont know what is best but we should try to limit ourselves -- it is otherwise more confusing than helpful. |
total += entropy | ||
label_count_total += label_count_stride | ||
|
||
# Return the total entropy over all responses divided by the number of | ||
# responses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, not sure if the added verbosity is so helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, this code is quite straight forward. You might want to rename total
to total_entropy
if you want to make it fully explicit. No need for inline comments in this function.
Thanks for this PR! In general, I agree with your proposal, in particular at places where docstrings were missing. However, I am not totally happy with the inline comments you added. It is sometimes adding too much verbosity for nothing. In my opinion, inline comments should be used with parsimony, at places where the code may be difficult to understand by itself -- adding too many of them may have an overall negative impact. I dont know what is others opinion on this? |
I can tone down the inline comments, especially in places where they were not helpful. I think 'targets' is the right way to talk about y, so I can change that as well. |
I agree in general. When the "what" is not clear from the Python code, then generally renaming the local variable to have more explicit names is a better solution than adding an inline comment. When the "why" of a particular line is not clear, then an inline comment is the good solution. |
@@ -391,29 +548,49 @@ cdef class Entropy(ClassificationCriterion): | |||
|
|||
cdef double entropy = 0.0 | |||
cdef double total = 0.0 | |||
cdef double tmp | |||
cdef double pmk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think probability_mass_k
is more explicit. The statements are short so we can afford a long variable name here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively density_k
or frequency_k
are fine as well.
General remark for docstrings: it is recommend to follow PEP 257 to be nice with developer tools like IDE tooltips and sphinx API doc. In particular that means following the structure: def my_function(argument_1, argument_2):
"""One-line short summary of the function in less than 80 cols
Extended paragraph that give a detailed description of what the function
does. Note that the extended paragraph is separated from the one line
short summary by a blank line.
... (parameters doc and so on)
"""
# body of the function here |
The input data which is being split | ||
start: int64 | ||
The first sample to be used on this leaf | ||
end: int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use SIZE_t as type definition.
if tmp > 0.0: | ||
tmp /= weighted_n_right | ||
entropy_right -= tmp * log(tmp) | ||
# Calculate this classes entropy on the left node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this
Did my round of review, should be good to go once they are fixed along with Olivier's. |
pass | ||
|
||
cdef void reset(self) nogil: | ||
"""Reset the criterion at pos=start.""" | ||
"""Placeholder for a method which will reset the criterion at | ||
pos=start. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to add the fact that this is a place holder, I would say
"""Reset the criterion at pos=start
This methods must be implemented by the sub-class.
"""
Instead of saying that the method is a placeholder, I would find more informative to say that the method |
safe_realloc(&self.n_classes, n_outputs) | ||
|
||
cdef SIZE_t k = 0 | ||
cdef SIZE_t label_count_stride = 0 | ||
|
||
# For each target, set the number of unique classes in that target, | ||
# and also set the stride for that target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and also set the stride for that target
-> and also compute the maximal stride of all targets
All suggestions should be incorporated. Thank you! |
Apparently this commit broke Gaussian processes! But only for Python 3.4 on Travis. I always knew that Gaussian processes were secretly opposed to documentation. |
Yes, this is due to #5045. Dont worry about it :) |
_gradient_boosting.c decided to tag along for the ride, so I accidentally pushed it too, but I went back and fixed the issue (I believe). All changes should be incorporated now. |
Merging this and making a few minor changes. Thanks for your patience! |
[MRG] _tree.pyx documentation additions
Thanks @jmschrei ! |
This is the documentation aspect of #5007 alone. No functional changes to the code are made, and it passes all nosetests. This documentation covers all criterion, and the dense splitters, and includes both docstrings and inline comments.