Skip to content

Doc: follow up on histogram normalization example #27459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

story645
Copy link
Member

@story645 story645 commented Dec 7, 2023

PR summary

Per #27426 (comment), PR aimed at reducing some of the cognitive load required to understand this example by adding:

  • subheadings pulling from the bullet list at the top of the page,
    • particularly pulled out/made each parameter the focus
    • moved examples to the bottom -> that way focus is on Matplotlib functionality
  • add titles to plots to guide folks on what they're supposed to be looking for
  • add an alpha to the data so that it's not competing with the histogram for visual focus.
  • deemphasizing labeling in the code because the point of this example is supposed to be histograms

As mentioned in #27426 (comment), reducing cognitive load makes the example more accessible (in a DEI way), which is in keeping with our CoC and mission. Universal design principals are that material that is more accessible tends to lead to better learning outcomes even for folks who don't need the accommodation.

I also moved it to tutorials because on https://matplotlib.org/devdocs/ we've started making the distinction that

  • demos -> plot types, examples
  • usage -> user guide, tutorial

This document is very much focused on why and how normalization works and when to use it rather than a demonstration of features (though this PR revision emphasizes the parameters) and I think the more we start scoping things the easier it'll be to start building identities around each section of the docs, which hopefully will help w/ discoverability.

-->

PR checklist

@story645 story645 added the Documentation: tutorials files in galleries/tutorials label Dec 7, 2023
@story645 story645 changed the title Doc histogram Doc: follow up on histogram normalization example Dec 7, 2023
@story645 story645 marked this pull request as draft December 8, 2023 06:49
@story645 story645 force-pushed the doc-histogram branch 2 times, most recently from aec07ce to de7b768 Compare December 18, 2023 07:30
@story645 story645 marked this pull request as ready for review December 18, 2023 08:11
Copy link
Member

@QuLogic QuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The figure width comment applies to all figures, but I did not feel like commenting on all of them.

Copy link
Member

@QuLogic QuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, submitted a bit early.

I also think using 'False'/'True' instead of False/True is a bit weird, but that's not new here.

Comment on lines 188 to 280
ax.hist(xdata, bins=xbins, weights=1/len(xdata) * np.ones(len(xdata)),
histtype='step', label=f'{dx}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be equivalent:

Suggested change
ax.hist(xdata, bins=xbins, weights=1/len(xdata) * np.ones(len(xdata)),
histtype='step', label=f'{dx}')
ax.hist(xdata, bins=xbins, weights=np.full_like(xdata, 1 / len(xdata)),
histtype='step', label=f'{dx}')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think np.ones(N)/N is also equivalent, but I thought the goal was to make it clear that the scale factor here is 1/N

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1/N is still shown in my suggestion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but I don't see how your suggestion is clearer since full_like isn't a common method (also when I tried it I got all 0s since I didn't do 1./N)

@story645 story645 force-pushed the doc-histogram branch 3 times, most recently from 1d65522 to e9c7d92 Compare December 20, 2023 04:03
@story645 story645 marked this pull request as draft December 20, 2023 16:26
@story645
Copy link
Member Author

story645 commented Dec 20, 2023

ToDo: I wanna pull out the one line that's the histogram before or after the code block to more clearly isolate/explain how to do the things. - done

@story645 story645 marked this pull request as ready for review December 21, 2023 05:56
@tacaswell
Copy link
Member

I have several substantial concerns about this.

The bulk of these changes were proposed and rejected as part of the discussion in #27426. I don't see any discussion of what is different now.

If this content is moved, it should probably be moved to "exp}lain" not "tutorial" as it is (was) an explanation of what each of the (many) imports to hist do (https://diataxis.fr/explanation/ vs https://diataxis.fr/tutorials/).

While discussion of cognitive load is reasonable, I have concerns with it being used as the primary justification of these changes. While in some cases increasing the structure can help, introducing the wrong structure can also make it worse.

Previously the two top level sections were "I want to change the bin edges" (which is only bins) and "I want to change the meaning of the bin values" (which is density and weights). In this case I actually think these changes increase cognitive load by leaning too far into a parameter per subheading and by moving the most clarifying text abou why things behave the way they to into "case studies".

# weights
# =======
#
# Sometimes people want to normalize so that the sum of counts is one. This is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is underselling why the weight keyword exists. If we are going to go with this re-organization the subsections need to be individually complete and correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at all the changes here, but agreed that the point of the example was not to explain all the uses of the weight parameter, but rather to point out that it can be used to get the effect of normalizing by N which people sometimes want.

@story645
Copy link
Member Author

story645 commented Dec 23, 2023

The bulk of these changes were proposed and rejected as part of the discussion in #27426.

As linked to at the very top of this PR, @jklymak said to put these changes in a follow up PR:

I'd suggest further changes could be follow-up PRs.

"I want to change the meaning of the bin values" (which is density and weights). In this case I actually think these changes increase cognitive load by leaning too far into a parameter per subheading

The reason for separating them is that the Matplotlib library provides them as separate independent keywords so someone scanning this doc can see "oh density, and here's an explanation", "oh weights, here's an explanation" -> that separation enforces that these are distinct parameters used to achieve distinct goals & that's why the API provides both as distinct entities.

and by moving the most clarifying text abou why things behave the way they to into "case studies".

The explanations of what weights does and what density does are still under the respective parameter subheadings, so can you please explain what you mean here? Frankly I think the "why I want to change the meaning of the bin values" is completely out of scope for the Matplotlib docs because that's data/stats, but I can see the value in it so I left it as case studies.

I'll flesh out that intro to that section/change the title to make that clearer, but I think for the Matplotlib docs we want to separate the "how does it work" from the "why do I want to do this" b/c the why is different for every user and many users know their why but still need to know our how.

ETA: also I think it's a good idea to put back a higher level "computing bin values" before the density and weights section and I'll put that back.

@story645
Copy link
Member Author

story645 commented Dec 23, 2023

If this content is moved, it should probably be moved to "exp}lain" not "tutorial" as it is (was) an explanation of what each of the (many) imports to hist do (https://diataxis.fr/explanation/ vs https://diataxis.fr/tutorials/).

I could see that, in that the example it's most similar to is https://matplotlib.org/devdocs/users/explain/artists/imshow_extent.html and I don't mind it moving there.

But also, the rewrite is in line with the diataxis 'tutorial' style

The principles outlined below - repetition, action, small steps, results early and often, concreteness

it works through the parameters one by one building them up, with repetition and concrete images at every step and It breaks up the explanations into very small chunks paired with immediate concrete examples. Even the pulling out the code before the larger examples is to prime the reader to see it in the larger example-> again repetition + small steps. (ETA: it's even written in the we and that was @jklymak )

Which, a related but separate discussion is untangling tutorials from user guide because currently they're explicitly intermixed in a way where it's very unclear which one a user should go to.

@jklymak
Copy link
Member

jklymak commented Dec 24, 2023

The reason for separating them is that the Matplotlib library provides them as separate independent keywords so someone scanning this doc can see "oh density, and here's an explanation", "oh weights, here's an explanation" -> that separation enforces that these are distinct parameters used to achieve distinct goals & that's why the API provides both as distinct entities.

The original goal of this example was to give people an idea how to make certain types of histograms, not explain the API, which is what the API docs should do. In particular, the original goal was to document how you get the probability mass function, since people are often confused that is not what density=True provides.

An example of using weights for other uses would probably be fine, since the scope of the example broadened somewhat as it was composed.

I don't think this belongs in Tutorials. Having Tutorials creep back into being a catchall for things we want to explain would be an organizational step backwards. I think it belongs in Examples, because we do not have in-depth examples like this in the Users guide, and adding a bunch of very specialized ones like this would be hard to do in a balanced fashion.

@story645
Copy link
Member Author

. I think it belongs in Examples, because we do not have in-depth examples like this in the Users guide, and adding a bunch of very specialized ones like this would be hard to do in a balanced fashion.

I definitely don't think it belongs in the user guide - but I also don't think that the imshow guide belongs in the user guide either. You said yourself elsewhere that you think of the gallery as short stack overflow style demos -> which an explanation of histogram normalization is not.

Having Tutorials creep back into being a catchall for things we want to explain would be an organizational step backwards.

What I'm proposing is tutorials be the place for long form explanatory tutorials of how to use our plotting methods...which yes means pulling some things back from the user guide. When I'm thinking clean separation and scope, my mind goes to:

  • User guide: key concepts/core objects-what is a figure/axes/formatter/locator/norm/colormap
  • Tutorial: how do I use the histogram method?
  • Gallery: here's a demo of the pmf with a link back to the histogram usage guide/tagged plot_type:histogram like the usage guide so that they'll show up together.

Granted, what I really would like is to maybe do this sorting out as another GSOD, but there's no way it'll be successful if we can't compromise somehow. I think for the most part the bulk of the work ends up being cleaning up framing so that it's consistent within a section, with some moving around based on agreed upon content guidelines.

The original goal of this example was to give people an idea how to make certain types of histograms, not explain the API, which is what the API docs should do.

I think it's nice to have docs that show how all the API parameters work in one place/how to use those parameters (this is a thing folks ask for a lot - how do I use pie) & that in the API docs this kinda step by step w/ pictures would be majorly distracting/annoying for folks who already know how to use hist & just want to double check an argument - which that's the instructive vs reference distinction.

In particular, the original goal was to document how you get the probability mass function, since people are often confused that is not what density=True provides

Then it should have been scoped to that, but the scope grew and it became a tutorial on histogram binning. Fundementally what I'm aiming at here is reframing this document so that it's primarily focused on library functionality. I was very careful to not remove any of the content you wrote.

@jklymak
Copy link
Member

jklymak commented Dec 24, 2023

What I'm proposing is tutorials be the place for long form explanatory tutorials of how to use our plotting methods...which yes means pulling some things back from the user guide. When I'm thinking clean separation and scope, my mind goes to:

  • User guide: key concepts/core objects-what is a figure/axes/formatter/locator/norm/colormap
  • Tutorial: how do I use the histogram method?
  • Gallery: here's a demo of the pmf with a link back to the histogram usage guide/tagged plot_type:histogram like the usage guide so that they'll show up together.

I don't agree with that as an endpoint for the user guide. The user guide should be how to use Matplotlib, as completely as we can manage. To me, that eventually would include "how do I use the histogram method". Tutorials should be like the "Lifecycle of a plot" variety, or other online tutorials that take the reader on a complete guided journey. Examples should be standalone, and relatively short.

This particular example is long, so I agree it is borderline. An advantage to leaving it in "examples" is that it puts all the hist things in one spot. If one were to make a "hist tutorial" it would ideally also discuss styling the histograms, and whatever else is in the hist examples. Eventually with enough such method-based tutorials, you could imagine migrating them to the user guide.

Then it should have been scoped to that, but the scope grew and it became a tutorial on histogram binning.

Not clear that is a problem - you need to understand the binning if you were going to understand the density arguments because they interact so strongly. I mean I guess there could have been two separate examples, but that doesn't seem necessary

@story645
Copy link
Member Author

story645 commented Dec 25, 2023

Tutorials should be like the "Lifecycle of a plot" variety, or other online tutorials that take the reader on a complete guided journey.

Yeah, I'm proposing complete guided journeys of the plotting methods. The feedback I've consistently heard on the lifecycle tutorial is that we ruined what made it good & now nobody understands what it's for. And what do you mean by other online tutorials?

What I'm struggling with is what would be the paragraph at the top of "user guide" and on top of "tutorial" telling the user what they could expect from each page.

Granted, I also wouldn't mind getting rid of tutorials all together if we can't come to a consensus on its purpose.

An advantage to leaving it in "examples" is that it puts all the hist things in one spot.

We can also use tags to do that.

If one were to make a "hist tutorial" it would ideally also discuss styling the histograms, and whatever else is in the hist examples.

Sort of the intent behind restructuring in this way - it provides a natural way to slot in the other hist parameters.

You need to understand the binning if you were going to understand the density arguments because they interact so strongly.
I mean I guess there could have been two separate examples, but that doesn't seem necessary

Two separate examples means that there's a binning example that we could link to/cross reference across every example where it may be helpful to know about binning. If I was gonna draw an n-level deep toctree of the docs, the implication would be that binning is a topic that's important only for normalization because that's where it's explained - a big motivator for a lot of what I'm proposing is to try and prevent users from conflating topics.

ETA: one of my major meta motivations is that I think it's much easier to see what docs are missing if we've got our docs well organized. And to organize our docs, we have to collectively decide on what we think should go where.

ETA 2: Granted, I also think we've had so many back and forths on this and are at a total impasse & it's probably more constructive to hash this out on a call in January at this point.

@story645
Copy link
Member Author

story645 commented Dec 25, 2023

Also, just for the record, I'm not really married to the exact user guide/tutorial bins that I'm proposing here - @melissawm and @esibinga weren't super keen on my exact division either. I'm concerned about putting plotting function tutorials in user guide b/c a) I think we agree that they should be written as tutorials even if we disagree on what that means, b) we've got 60 odd methods, 30 or so in the plot type gallery, and that's a lot of content to add the guide c) my meta proposal is that the guide cover the topics that apply to an arbitrary plot -> figure/axes/formatting/etc - while plotting method usage is guided by individual needs (for example, I've never needed to make a contour plot, but yes under this definition something like the imshow guide would stay under user guide but expanded to make clear it applies to all the AxesImage plot types, which # ties back to underlying object as the organizing principal).

My priority is that each section of the docs have a clear distinct identity, scope, & pedagogical approach (ETA: which is what the diataxes approach is fundementally advocating) so that we can sort documents, improve cohesiveness within a section, and reduce duplication/redundancy.

The motivation for this PR was that the contents of the document changed scope within the original (pulling back to being the histogram features example this replaced) & this attempts to update the structure to match that change in scope. Functionally, this is what I've been doing with basically all my doc PRs lately - sorting and binning to make the distribution of docs/topics/content cleaner. ETA: This is b/c I had thought that there was long standing consensus that the major issue w/ the docs is the information architecture, so I was approaching this PR w/ that in mind. I mostly do not have major concerns w/ the original PR as a standalone document, but within the context of the rest of the documentation I think it is too much a tutorial on histogram to be an example (bins don't need to be explained for a demo of using hist for norming) but as a tutorial is too focused on the statistics/why use histograms and not enough on library functionality and I'm fairly certain our goal is not to recreate https://www.itl.nist.gov/div898/handbook/eda/section3/histogra.htm

Pull tutorial content out of example and into standalone tutorial.

Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>
@story645 story645 marked this pull request as draft December 29, 2023 06:20
@story645
Copy link
Member Author

story645 commented Dec 29, 2023

makes more sense to rework this w/ consolidating the other histogram examples in #27569 so shifting back to draft.

@story645 story645 added Documentation and removed Documentation: tutorials files in galleries/tutorials labels Jan 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants