Skip to content

DOC Fix tree explanation of tree_.value in example #29331

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jul 2, 2024

Conversation

adam2392
Copy link
Member

Reference Issues/PRs

Fixes: #28921

What does this implement/fix? Explain your changes.

Fixes the explanation of tree_.value as suggested in #28921 (comment).

A follow-on PR that changes the name of tree_.value with a deprecation to tree_.weighted_proportion seems better and more explicit.

Any other comments?

cc: @lesteve

Signed-off-by: Adam Li <adam2392@gmail.com>
Copy link

github-actions bot commented Jun 21, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: dfcd9e5. Link to the linter CI: here

Copy link
Member

@ArturoAmorQ ArturoAmorQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @adam2392, thanks for addressing this issue :) Here's a couple of comments:

  • On top of the API changelog entry, we might want to also include a versionchanged directive for each of the attributes on those estimators.

  • For the FIX changelog entry, I think you are linking to an unrelated PR, but in any case, the proportion parameter of plot_tree function has not changed in version 1.6 as far as I know, or am I misunderstanding something?

@adam2392
Copy link
Member Author

adam2392 commented Jun 24, 2024

  • On top of the API changelog entry, we might want to also include a versionchanged directive for each of the attributes on those estimators.

The tree_.value is currently an undocumented attributed, so should I add a versionchanged to the tree_ attribute in each of these tree classes?

  • For the FIX changelog entry, I think you are linking to an unrelated PR, but in any case, the proportion parameter of plot_tree function has not changed in version 1.6 as far as I know, or am I misunderstanding something?

The PR that I linked is when the behavior changed such that tree_.value in Cython stores proportions instead of absolute counts. The plot_tree does not have any API changes you're right, so I suppose I can change the fix changelog entry for v1.6 and remove that altogether, since this is just a patch at the documentation/example level.

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 requested a review from ArturoAmorQ June 24, 2024 17:09
@adam2392
Copy link
Member Author

The issues raised are fixed now:

  1. only show floating point up to 3 decimal places when printing tree_.value in the example
  2. show proportions when plotting tree (here I added an API change to set proportion=True by default, since this is in line with the behavior of tree_.value. (pending commit)

ref: https://output.circle-artifacts.com/output/job/bdc192fe-d4f9-4649-9ff0-03ddfb038abf/artifacts/0/doc/auto_examples/tree/plot_unveil_tree_structure.html

Signed-off-by: Adam Li <adam2392@gmail.com>
@ArturoAmorQ
Copy link
Member

I would rather not change the behavior of plot_tree for backward compatibility and just set proportion=True in the example. Could you please revert the latest commit?

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Member Author

adam2392 commented Jun 25, 2024

Kay that makes sense! Done

Copy link
Member

@ArturoAmorQ ArturoAmorQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of final comments but otherwise LGTM :)

Comment on lines 944 to 945
:pr:`13649` by :user:`Samuel Ronsin <samronsin>`, initiated by
:user:`Patrick O'Reilly <pat-oreilly>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinging @glemaitre for his opinion to rather linking #27639 here.

Copy link
Member

@lesteve lesteve Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#27639 indeed seems more appropriate, since this is where the choice of switching to a weighted proportion was made IIUC.

For better visibility I would also put it in a changed models section i.e. a lot more towards the top. For example there is such a section in 1.5 but a new one needs to be added for 1.4.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edited and moved to the top.

@ArturoAmorQ ArturoAmorQ changed the title API, FIX, DOC Fix tree explanation of tree_.value in example DOC Fix tree explanation of tree_.value in example Jun 27, 2024
Copy link
Member

@lesteve lesteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, a few comments from me

Comment on lines 944 to 945
:pr:`13649` by :user:`Samuel Ronsin <samronsin>`, initiated by
:user:`Patrick O'Reilly <pat-oreilly>`.
Copy link
Member

@lesteve lesteve Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#27639 indeed seems more appropriate, since this is where the choice of switching to a weighted proportion was made IIUC.

For better visibility I would also put it in a changed models section i.e. a lot more towards the top. For example there is such a section in 1.5 but a new one needs to be added for 1.4.

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Member Author

Thanks for the review @lesteve! I have addressed your comments in 7343096

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 requested a review from lesteve June 27, 2024 13:58
#
# One could convert this to the absolute weighted number of samples reaching a node,
# by multiplying this number by `tree_.weighted_n_node_samples[node_idx]` for the
# given node, or by `tree_.n_node_samples[node_idx]` for the unweighted number of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit unsure about multiplying tree_.value (proportion of samples taking into account sample weights) by tree_.n_node_samples[node_idx] which does not take sample weights into account. It seems to me that the output of this multiplication has not a clear meaning, but maybe I am missing something ...

Maybe only mention tree_.weighted_n_nodes_sample and mention that sample weights are not used in this example, which is the same as being one for all the samples?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 3c1f24a


tree.plot_tree(clf)
tree.plot_tree(clf, proportion=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that works for this example, but in an ideal world (not this PR) it would be nice if by default plot_tree would show something that matches tree_.value. Is it worth doing a 2 release deprecation period where you say that proportion default value is going to switch from False to True. Could we bend our rules and switch to proportion=True directly?

I'd slightly lean towards bending our rules and switch to proportion=True by default but this is a bit controversial.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally did deprecation, but @ArturoAmorQ (#29331 (comment)) suggested I keep the default behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly think either way is okay as this is like power-user stuff

@adam2392 adam2392 requested a review from lesteve July 1, 2024 15:37
@lesteve
Copy link
Member

lesteve commented Jul 2, 2024

OK thanks a lot for your patience and continued work on this @adam2392, let's merge this one!

@lesteve lesteve merged commit 610d4f7 into scikit-learn:main Jul 2, 2024
30 checks passed
@adam2392 adam2392 deleted the treedescription branch July 2, 2024 12:11
@lesteve
Copy link
Member

lesteve commented Jul 2, 2024

@adam2392 if you feel adventurous you may want to open an issue about aggressively switching the default to proportion=True in plot_tree in order to match tree.value and get other maintainers feeling about it. Not sure if other visualization functions are affected by this to be 100% honest ...

snath-xoc pushed a commit to snath-xoc/scikit-learn that referenced this pull request Jul 5, 2024
)

Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 9, 2024
)

Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
glemaitre pushed a commit that referenced this pull request Sep 11, 2024
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Undocumented change in tree_.value example for DecisionTreeClassifier between versions 1.3.2 and 1.4.2
3 participants