[MRG] _tree.pyx documentation additions #5010

jmschrei · 2015-07-21T07:59:06Z

This is the documentation aspect of #5007 alone. No functional changes are made to the code, and it passes all nosetests. The documentation covers all criteria and the dense splitters, and includes both docstrings and inline comments.

amueller · 2015-07-21T08:54:12Z

This looks great! ping @glouppe @arjoly

glouppe · 2015-07-21T13:45:31Z

sklearn/tree/_tree.pyx

@@ -157,7 +230,18 @@ cdef class ClassificationCriterion(Criterion):

    def __cinit__(self, SIZE_t n_outputs,
                  np.ndarray[SIZE_t, ndim=1] n_classes):
-        # Default values
+        """
+        Initialize attributes for this classifier, automatically calling the


classifier -> criterion

glouppe · 2015-07-21T13:58:44Z

General comment: we should agree on a common terminology regarding outputs. So far the code mentioned "targets" or "outputs". Now you have added or replaced these with "responses". I don't know which is best, but we should try to limit ourselves -- otherwise it is more confusing than helpful.

glouppe · 2015-07-21T13:59:57Z

sklearn/tree/_tree.pyx

            total += entropy
            label_count_total += label_count_stride

+        # Return the total entropy over all responses divided by the number of
+        # responses.


again, not sure if the added verbosity is so helpful

+1, this code is quite straightforward. You might want to rename total to total_entropy if you want to make it fully explicit. No need for inline comments in this function.

glouppe · 2015-07-21T14:08:15Z

Thanks for this PR! In general, I agree with your proposal, in particular at places where docstrings were missing. However, I am not totally happy with the inline comments you added. They sometimes add verbosity for nothing. In my opinion, inline comments should be used sparingly, at places where the code may be difficult to understand by itself -- adding too many of them may have an overall negative impact. I don't know what others think about this?

jmschrei · 2015-07-21T14:10:45Z

I can tone down the inline comments, especially in places where they were not helpful. I think 'targets' is the right way to talk about y, so I can change that as well.

ogrisel · 2015-07-21T14:18:23Z

In my opinion, inline comments should be used sparingly, at places where the code may be difficult to understand by itself -- adding too many of them may have an overall negative impact. I don't know what others think about this?

I agree in general. When the "what" is not clear from the Python code, renaming local variables to more explicit names is generally a better solution than adding an inline comment.

When the "why" of a particular line is not clear, then an inline comment is the right solution.

ogrisel · 2015-07-21T14:20:01Z

sklearn/tree/_tree.pyx

@@ -391,29 +548,49 @@ cdef class Entropy(ClassificationCriterion):

        cdef double entropy = 0.0
        cdef double total = 0.0
-        cdef double tmp
+        cdef double pmk


I think probability_mass_k is more explicit. The statements are short, so we can afford a long variable name here.

Alternatively, density_k or frequency_k are fine as well.
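To make the naming suggestions in this thread concrete, here is a minimal pure-Python sketch of a node entropy averaged over all targets. It is not the Cython code from `_tree.pyx`: the function name `node_entropy` and the nested-list layout of `label_counts` are assumptions made for illustration. It uses the proposed names `total_entropy` and `probability_mass_k`, and keeps a single inline comment for the one line whose "why" is not obvious.

```python
from math import log


def node_entropy(label_counts, weighted_n_node_samples):
    """Return the entropy of a node, averaged over all targets.

    label_counts is a hypothetical layout: one list of weighted class
    counts per target.
    """
    total_entropy = 0.0

    for counts_for_target in label_counts:
        entropy = 0.0

        for count in counts_for_target:
            probability_mass_k = count / weighted_n_node_samples

            # log(0) is undefined; an empty class contributes nothing.
            if probability_mass_k > 0.0:
                entropy -= probability_mass_k * log(probability_mass_k)

        total_entropy += entropy

    return total_entropy / len(label_counts)
```

With explicit names, the accumulation and the final division read without further comments; only the guard on the probability mass needs an explanation.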

ogrisel · 2015-07-21T14:25:24Z

General remark for docstrings: it is recommended to follow PEP 257 so that they play nicely with developer tools such as IDE tooltips and the Sphinx API docs. In particular, that means following this structure:

def my_function(argument_1, argument_2):
    """One-line short summary of the function in less than 80 cols

    Extended paragraph that gives a detailed description of what the function
    does. Note that the extended paragraph is separated from the one-line
    short summary by a blank line.

    ... (parameters doc and so on)
    """
    # body of the function here
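
As a rough illustration of why the one-line summary matters to tooling, the sketch below extracts it the way a tooltip might. The helper `summary_line` is hypothetical and written for this example; only `inspect.getdoc` is a standard-library call.

```python
import inspect


def my_function(argument_1, argument_2):
    """Return the sum of the two arguments.

    The extended description is separated from the summary by a blank
    line, so tools can tell the two apart.
    """
    return argument_1 + argument_2


def summary_line(obj):
    """Return the first line of an object's docstring, or '' if it has none."""
    doc = inspect.getdoc(obj)  # also strips the docstring's indentation
    return doc.splitlines()[0] if doc else ""


print(summary_line(my_function))
```

If the docstring instead opened with a blank line or ran the summary into the description, a helper like this would return an empty or truncated string.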

arjoly · 2015-07-23T07:40:09Z

sklearn/tree/_tree.pyx

+            The input data which is being split
+        start: int64
+            The first sample to be used on this leaf
+        end: int64


I would use SIZE_t as type definition.

glouppe · 2015-07-24T06:22:55Z

sklearn/tree/_tree.pyx

-                if tmp > 0.0:
-                    tmp /= weighted_n_right
-                    entropy_right -= tmp * log(tmp)
+                # Calculate this classes entropy on the left node


Remove this

glouppe · 2015-07-24T06:33:54Z

Did my round of review. This should be good to go once my comments are addressed, along with Olivier's.

arjoly · 2015-07-24T07:33:15Z

sklearn/tree/_tree.pyx

        pass

    cdef void reset(self) nogil:
-        """Reset the criterion at pos=start."""
+        """Placeholder for a method which will reset the criterion at 
+        pos=start.


If you want to mention that this is a placeholder, I would say

"""Reset the criterion at pos=start.

This method must be implemented by the sub-class.
"""

arjoly · 2015-07-24T07:46:13Z

Instead of saying that the method is a placeholder, I would find it more informative to say that the method
must be implemented by the sub-class.
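
A plain-Python sketch of the "must be implemented by the sub-class" contract follows. It is not the Cython code under review: raising `NotImplementedError` in the base class is an assumption of this example, and `CountingCriterion` is a hypothetical subclass.

```python
class Criterion:
    """Base class; concrete criteria override reset."""

    def reset(self):
        """Reset the criterion at pos=start.

        This method must be implemented by the sub-class.
        """
        # Assumption for this sketch: fail loudly if not overridden.
        raise NotImplementedError


class CountingCriterion(Criterion):
    """Hypothetical subclass used only for illustration."""

    def __init__(self, start):
        self.start = start
        self.pos = None

    def reset(self):
        """Reset the criterion at pos=start."""
        self.pos = self.start
```

The docstring wording tells a reader what the subclass owes the base class, which "placeholder" alone does not.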

arjoly · 2015-07-24T07:48:21Z

sklearn/tree/_tree.pyx

        safe_realloc(&self.n_classes, n_outputs)

        cdef SIZE_t k = 0
        cdef SIZE_t label_count_stride = 0

+        # For each target, set the number of unique classes in that target,
+        # and also set the stride for that target


"and also set the stride for that target" -> "and also compute the maximal stride of all targets"
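
The corrected wording can be sketched in plain Python as below. The function names are hypothetical, and the flat-buffer indexing is an assumption about how a single stride would be used, not something shown in the diff.

```python
def max_label_count_stride(n_classes):
    """Return the largest number of classes over all targets."""
    label_count_stride = 0
    for n_classes_k in n_classes:
        if n_classes_k > label_count_stride:
            label_count_stride = n_classes_k
    return label_count_stride


def flat_index(target, class_label, label_count_stride):
    """Offset of (target, class_label) in a flat per-target count buffer.

    Assumed layout: each target owns label_count_stride consecutive slots.
    """
    return target * label_count_stride + class_label
```

Under that assumption, one stride shared by all targets is why the comment should say "maximal stride of all targets" rather than a stride per target.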

jmschrei · 2015-07-29T10:55:21Z

All suggestions should be incorporated. Thank you!

jmschrei · 2015-07-29T11:17:45Z

Apparently this commit broke Gaussian processes! But only for Python 3.4 on Travis. I always knew that Gaussian processes were secretly opposed to documentation.

glouppe · 2015-07-29T11:25:06Z

Yes, this is due to #5045. Don't worry about it :)

jmschrei · 2015-07-30T16:22:47Z

_gradient_boosting.c decided to tag along for the ride, so I accidentally pushed it too, but I went back and fixed the issue (I believe). All changes should be incorporated now.

glouppe · 2015-07-31T07:15:34Z

Merging this and making a few minor changes. Thanks for your patience!

[MRG] _tree.pyx documentation additions

arjoly · 2015-07-31T08:00:13Z

Thanks @jmschrei !

Jacob Schreiber added 3 commits July 16, 2015 09:53

Generated C file

bda486d

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn

8b6844a

Documentation added to criterion, dense splitters.

176f4cf

jmschrei force-pushed the _tree_doc branch from ce6f945 to 176f4cf Compare July 21, 2015 08:00

glouppe reviewed Jul 21, 2015

ogrisel reviewed Jul 21, 2015

arjoly reviewed Jul 23, 2015

glouppe reviewed Jul 24, 2015

arjoly reviewed Jul 24, 2015

glouppe mentioned this pull request Jul 28, 2015

[WIP] _tree.pyx rewrite #5041

Closed

30 tasks

Final round of revisions

3a135f5

jmschrei force-pushed the _tree_doc branch from fcc6add to 3a135f5 Compare July 30, 2015 16:21

glouppe added a commit that referenced this pull request Jul 31, 2015

Merge pull request #5010 from jmschrei/_tree_doc

d857349

[MRG] _tree.pyx documentation additions

glouppe merged commit d857349 into scikit-learn:master Jul 31, 2015
