Doc: follow up on histogram normalization example #27459

story645 · 2023-12-07T07:04:40Z

PR summary

Per #27426 (comment), PR aimed at reducing some of the cognitive load required to understand this example by adding:

subheadings pulling from the bullet list at the top of the page,
- particularly pulled out/made each parameter the focus
- moved examples to the bottom -> that way focus is on Matplotlib functionality
add titles to plots to guide folks on what they're supposed to be looking for
add an alpha to the data so that it's not competing with the histogram for visual focus.
deemphasizing labeling in the code because the point of this example is supposed to be histograms

As mentioned in #27426 (comment), reducing cognitive load makes the example more accessible (in a DEI way), which is in keeping with our CoC and mission. Universal design principals are that material that is more accessible tends to lead to better learning outcomes even for folks who don't need the accommodation.

I also moved it to tutorials because on https://matplotlib.org/devdocs/ we've started making the distinction that

demos -> plot types, examples
usage -> user guide, tutorial

This document is very much focused on why and how normalization works and when to use it rather than a demonstration of features (though this PR revision emphasizes the parameters) and I think the more we start scoping things the easier it'll be to start building identities around each section of the docs, which hopefully will help w/ discoverability.

-->

PR checklist

"closes #0000" is in the body of the PR description to link the related issue
new and changed code is tested
Plotting related features are demonstrated in an example
New Features and API Changes are noted with a directive and release note
Documentation complies with general and docstring guidelines

QuLogic

The figure width comment applies to all figures, but I did not feel like commenting on all of them.

galleries/tutorials/histogram_normalization.py

QuLogic

Oops, submitted a bit early.

I also think using 'False'/'True' instead of False/True is a bit weird, but that's not new here.

QuLogic · 2023-12-19T00:36:51Z

galleries/tutorials/histogram_normalization.py

+    ax.hist(xdata, bins=xbins, weights=1/len(xdata) * np.ones(len(xdata)),
+                   histtype='step', label=f'{dx}')


I think this should be equivalent:

Suggested change

ax.hist(xdata, bins=xbins, weights=1/len(xdata) * np.ones(len(xdata)),

histtype='step', label=f'{dx}')

ax.hist(xdata, bins=xbins, weights=np.full_like(xdata, 1 / len(xdata)),

histtype='step', label=f'{dx}')

I think np.ones(N)/N is also equivalent, but I thought the goal was to make it clear that the scale factor here is 1/N

1/N is still shown in my suggestion?

Yeah, but I don't see how your suggestion is clearer since full_like isn't a common method (also when I tried it I got all 0s since I didn't do 1./N)

galleries/tutorials/histogram_normalization.py

story645 · 2023-12-20T16:27:32Z

ToDo: I wanna pull out the one line that's the histogram before or after the code block to more clearly isolate/explain how to do the things. - done

tacaswell · 2023-12-22T22:02:14Z

I have several substantial concerns about this.

The bulk of these changes were proposed and rejected as part of the discussion in #27426. I don't see any discussion of what is different now.

If this content is moved, it should probably be moved to "exp}lain" not "tutorial" as it is (was) an explanation of what each of the (many) imports to hist do (https://diataxis.fr/explanation/ vs https://diataxis.fr/tutorials/).

While discussion of cognitive load is reasonable, I have concerns with it being used as the primary justification of these changes. While in some cases increasing the structure can help, introducing the wrong structure can also make it worse.

Previously the two top level sections were "I want to change the bin edges" (which is only bins) and "I want to change the meaning of the bin values" (which is density and weights). In this case I actually think these changes increase cognitive load by leaning too far into a parameter per subheading and by moving the most clarifying text abou why things behave the way they to into "case studies".

tacaswell · 2023-12-22T22:03:13Z

galleries/tutorials/histogram_normalization.py

+# weights
+# =======
+#
+# Sometimes people want to normalize so that the sum of counts is one.  This is


This is underselling why the weight keyword exists. If we are going to go with this re-organization the subsections need to be individually complete and correct.

I haven't looked at all the changes here, but agreed that the point of the example was not to explain all the uses of the weight parameter, but rather to point out that it can be used to get the effect of normalizing by N which people sometimes want.

story645 · 2023-12-23T23:16:13Z

The bulk of these changes were proposed and rejected as part of the discussion in #27426.

As linked to at the very top of this PR, @jklymak said to put these changes in a follow up PR:

I'd suggest further changes could be follow-up PRs.

"I want to change the meaning of the bin values" (which is density and weights). In this case I actually think these changes increase cognitive load by leaning too far into a parameter per subheading

The reason for separating them is that the Matplotlib library provides them as separate independent keywords so someone scanning this doc can see "oh density, and here's an explanation", "oh weights, here's an explanation" -> that separation enforces that these are distinct parameters used to achieve distinct goals & that's why the API provides both as distinct entities.

and by moving the most clarifying text abou why things behave the way they to into "case studies".

The explanations of what weights does and what density does are still under the respective parameter subheadings, so can you please explain what you mean here? Frankly I think the "why I want to change the meaning of the bin values" is completely out of scope for the Matplotlib docs because that's data/stats, but I can see the value in it so I left it as case studies.

I'll flesh out that intro to that section/change the title to make that clearer, but I think for the Matplotlib docs we want to separate the "how does it work" from the "why do I want to do this" b/c the why is different for every user and many users know their why but still need to know our how.

ETA: also I think it's a good idea to put back a higher level "computing bin values" before the density and weights section and I'll put that back.

story645 · 2023-12-23T23:40:53Z

If this content is moved, it should probably be moved to "exp}lain" not "tutorial" as it is (was) an explanation of what each of the (many) imports to hist do (https://diataxis.fr/explanation/ vs https://diataxis.fr/tutorials/).

I could see that, in that the example it's most similar to is https://matplotlib.org/devdocs/users/explain/artists/imshow_extent.html and I don't mind it moving there.

But also, the rewrite is in line with the diataxis 'tutorial' style

The principles outlined below - repetition, action, small steps, results early and often, concreteness

it works through the parameters one by one building them up, with repetition and concrete images at every step and It breaks up the explanations into very small chunks paired with immediate concrete examples. Even the pulling out the code before the larger examples is to prime the reader to see it in the larger example-> again repetition + small steps. (ETA: it's even written in the we and that was @jklymak )

Which, a related but separate discussion is untangling tutorials from user guide because currently they're explicitly intermixed in a way where it's very unclear which one a user should go to.

jklymak · 2023-12-24T17:17:18Z

The reason for separating them is that the Matplotlib library provides them as separate independent keywords so someone scanning this doc can see "oh density, and here's an explanation", "oh weights, here's an explanation" -> that separation enforces that these are distinct parameters used to achieve distinct goals & that's why the API provides both as distinct entities.

The original goal of this example was to give people an idea how to make certain types of histograms, not explain the API, which is what the API docs should do. In particular, the original goal was to document how you get the probability mass function, since people are often confused that is not what density=True provides.

An example of using weights for other uses would probably be fine, since the scope of the example broadened somewhat as it was composed.

I don't think this belongs in Tutorials. Having Tutorials creep back into being a catchall for things we want to explain would be an organizational step backwards. I think it belongs in Examples, because we do not have in-depth examples like this in the Users guide, and adding a bunch of very specialized ones like this would be hard to do in a balanced fashion.

story645 · 2023-12-24T19:42:47Z

. I think it belongs in Examples, because we do not have in-depth examples like this in the Users guide, and adding a bunch of very specialized ones like this would be hard to do in a balanced fashion.

I definitely don't think it belongs in the user guide - but I also don't think that the imshow guide belongs in the user guide either. You said yourself elsewhere that you think of the gallery as short stack overflow style demos -> which an explanation of histogram normalization is not.

Having Tutorials creep back into being a catchall for things we want to explain would be an organizational step backwards.

What I'm proposing is tutorials be the place for long form explanatory tutorials of how to use our plotting methods...which yes means pulling some things back from the user guide. When I'm thinking clean separation and scope, my mind goes to:

User guide: key concepts/core objects-what is a figure/axes/formatter/locator/norm/colormap
Tutorial: how do I use the histogram method?
Gallery: here's a demo of the pmf with a link back to the histogram usage guide/tagged plot_type:histogram like the usage guide so that they'll show up together.

Granted, what I really would like is to maybe do this sorting out as another GSOD, but there's no way it'll be successful if we can't compromise somehow. I think for the most part the bulk of the work ends up being cleaning up framing so that it's consistent within a section, with some moving around based on agreed upon content guidelines.

The original goal of this example was to give people an idea how to make certain types of histograms, not explain the API, which is what the API docs should do.

I think it's nice to have docs that show how all the API parameters work in one place/how to use those parameters (this is a thing folks ask for a lot - how do I use pie) & that in the API docs this kinda step by step w/ pictures would be majorly distracting/annoying for folks who already know how to use hist & just want to double check an argument - which that's the instructive vs reference distinction.

In particular, the original goal was to document how you get the probability mass function, since people are often confused that is not what density=True provides

Then it should have been scoped to that, but the scope grew and it became a tutorial on histogram binning. Fundementally what I'm aiming at here is reframing this document so that it's primarily focused on library functionality. I was very careful to not remove any of the content you wrote.

jklymak · 2023-12-24T23:27:01Z

What I'm proposing is tutorials be the place for long form explanatory tutorials of how to use our plotting methods...which yes means pulling some things back from the user guide. When I'm thinking clean separation and scope, my mind goes to:

User guide: key concepts/core objects-what is a figure/axes/formatter/locator/norm/colormap

Tutorial: how do I use the histogram method?

Gallery: here's a demo of the pmf with a link back to the histogram usage guide/tagged plot_type:histogram like the usage guide so that they'll show up together.

I don't agree with that as an endpoint for the user guide. The user guide should be how to use Matplotlib, as completely as we can manage. To me, that eventually would include "how do I use the histogram method". Tutorials should be like the "Lifecycle of a plot" variety, or other online tutorials that take the reader on a complete guided journey. Examples should be standalone, and relatively short.

This particular example is long, so I agree it is borderline. An advantage to leaving it in "examples" is that it puts all the hist things in one spot. If one were to make a "hist tutorial" it would ideally also discuss styling the histograms, and whatever else is in the hist examples. Eventually with enough such method-based tutorials, you could imagine migrating them to the user guide.

Then it should have been scoped to that, but the scope grew and it became a tutorial on histogram binning.

Not clear that is a problem - you need to understand the binning if you were going to understand the density arguments because they interact so strongly. I mean I guess there could have been two separate examples, but that doesn't seem necessary

story645 · 2023-12-25T02:45:24Z

Tutorials should be like the "Lifecycle of a plot" variety, or other online tutorials that take the reader on a complete guided journey.

Yeah, I'm proposing complete guided journeys of the plotting methods. The feedback I've consistently heard on the lifecycle tutorial is that we ruined what made it good & now nobody understands what it's for. And what do you mean by other online tutorials?

What I'm struggling with is what would be the paragraph at the top of "user guide" and on top of "tutorial" telling the user what they could expect from each page.

Granted, I also wouldn't mind getting rid of tutorials all together if we can't come to a consensus on its purpose.

An advantage to leaving it in "examples" is that it puts all the hist things in one spot.

We can also use tags to do that.

If one were to make a "hist tutorial" it would ideally also discuss styling the histograms, and whatever else is in the hist examples.

Sort of the intent behind restructuring in this way - it provides a natural way to slot in the other hist parameters.

You need to understand the binning if you were going to understand the density arguments because they interact so strongly.
I mean I guess there could have been two separate examples, but that doesn't seem necessary

Two separate examples means that there's a binning example that we could link to/cross reference across every example where it may be helpful to know about binning. If I was gonna draw an n-level deep toctree of the docs, the implication would be that binning is a topic that's important only for normalization because that's where it's explained - a big motivator for a lot of what I'm proposing is to try and prevent users from conflating topics.

ETA: one of my major meta motivations is that I think it's much easier to see what docs are missing if we've got our docs well organized. And to organize our docs, we have to collectively decide on what we think should go where.

ETA 2: Granted, I also think we've had so many back and forths on this and are at a total impasse & it's probably more constructive to hash this out on a call in January at this point.

story645 · 2023-12-25T03:34:32Z

Also, just for the record, I'm not really married to the exact user guide/tutorial bins that I'm proposing here - @melissawm and @esibinga weren't super keen on my exact division either. I'm concerned about putting plotting function tutorials in user guide b/c a) I think we agree that they should be written as tutorials even if we disagree on what that means, b) we've got 60 odd methods, 30 or so in the plot type gallery, and that's a lot of content to add the guide c) my meta proposal is that the guide cover the topics that apply to an arbitrary plot -> figure/axes/formatting/etc - while plotting method usage is guided by individual needs (for example, I've never needed to make a contour plot, but yes under this definition something like the imshow guide would stay under user guide but expanded to make clear it applies to all the AxesImage plot types, which # ties back to underlying object as the organizing principal).

My priority is that each section of the docs have a clear distinct identity, scope, & pedagogical approach (ETA: which is what the diataxes approach is fundementally advocating) so that we can sort documents, improve cohesiveness within a section, and reduce duplication/redundancy.

The motivation for this PR was that the contents of the document changed scope within the original (pulling back to being the histogram features example this replaced) & this attempts to update the structure to match that change in scope. Functionally, this is what I've been doing with basically all my doc PRs lately - sorting and binning to make the distribution of docs/topics/content cleaner. ETA: This is b/c I had thought that there was long standing consensus that the major issue w/ the docs is the information architecture, so I was approaching this PR w/ that in mind. I mostly do not have major concerns w/ the original PR as a standalone document, but within the context of the rest of the documentation I think it is too much a tutorial on histogram to be an example (bins don't need to be explained for a demo of using hist for norming) but as a tutorial is too focused on the statistics/why use histograms and not enough on library functionality and I'm fairly certain our goal is not to recreate https://www.itl.nist.gov/div898/handbook/eda/section3/histogra.htm

Pull tutorial content out of example and into standalone tutorial. Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>

story645 · 2023-12-29T06:21:19Z

makes more sense to rework this w/ consolidating the other histogram examples in #27569 so shifting back to draft.

story645 added the Documentation: tutorials files in galleries/tutorials label Dec 7, 2023

story645 changed the title ~~Doc histogram~~ Doc: follow up on histogram normalization example Dec 7, 2023

story645 force-pushed the doc-histogram branch from 56b4b24 to 9fdb0cd Compare December 7, 2023 08:51

story645 marked this pull request as draft December 8, 2023 06:49

story645 force-pushed the doc-histogram branch 2 times, most recently from aec07ce to de7b768 Compare December 18, 2023 07:30

story645 marked this pull request as ready for review December 18, 2023 08:11

QuLogic reviewed Dec 19, 2023

View reviewed changes

story645 force-pushed the doc-histogram branch 3 times, most recently from 1d65522 to e9c7d92 Compare December 20, 2023 04:03

story645 marked this pull request as draft December 20, 2023 16:26

story645 force-pushed the doc-histogram branch from e9c7d92 to 165a3f2 Compare December 21, 2023 05:56

story645 marked this pull request as ready for review December 21, 2023 05:56

tacaswell reviewed Dec 22, 2023

View reviewed changes

story645 force-pushed the doc-histogram branch from 165a3f2 to 215411a Compare December 24, 2023 04:08

story645 force-pushed the doc-histogram branch from 215411a to ca4675e Compare December 25, 2023 20:19

doc: follow up for normalizing histogram

a15c661

Pull tutorial content out of example and into standalone tutorial. Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>

story645 force-pushed the doc-histogram branch from ca4675e to a15c661 Compare December 25, 2023 22:11

story645 marked this pull request as draft December 29, 2023 06:20

story645 added Documentation and removed Documentation: tutorials files in galleries/tutorials labels Jan 17, 2024

github-actions bot added the status: needs rebase label Oct 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Doc: follow up on histogram normalization example #27459

Doc: follow up on histogram normalization example #27459

story645 commented Dec 7, 2023 •

edited

Loading

QuLogic left a comment

QuLogic left a comment •

edited

Loading

QuLogic Dec 19, 2023

story645 Dec 19, 2023

QuLogic Dec 20, 2023

story645 Dec 20, 2023

story645 commented Dec 20, 2023 •

edited

Loading

tacaswell commented Dec 22, 2023

tacaswell Dec 22, 2023

jklymak Dec 23, 2023

story645 commented Dec 23, 2023 •

edited

Loading

story645 commented Dec 23, 2023 •

edited

Loading

jklymak commented Dec 24, 2023

story645 commented Dec 24, 2023

jklymak commented Dec 24, 2023

story645 commented Dec 25, 2023 •

edited

Loading

story645 commented Dec 25, 2023 •

edited

Loading

story645 commented Dec 29, 2023 •

edited

Loading

		ax.hist(xdata, bins=xbins, weights=1/len(xdata) * np.ones(len(xdata)),
		histtype='step', label=f'{dx}')

Doc: follow up on histogram normalization example #27459

Are you sure you want to change the base?

Doc: follow up on histogram normalization example #27459

Conversation

story645 commented Dec 7, 2023 • edited Loading

PR summary

PR checklist

QuLogic left a comment

Choose a reason for hiding this comment

QuLogic left a comment • edited Loading

Choose a reason for hiding this comment

QuLogic Dec 19, 2023

Choose a reason for hiding this comment

story645 Dec 19, 2023

Choose a reason for hiding this comment

QuLogic Dec 20, 2023

Choose a reason for hiding this comment

story645 Dec 20, 2023

Choose a reason for hiding this comment

story645 commented Dec 20, 2023 • edited Loading

tacaswell commented Dec 22, 2023

tacaswell Dec 22, 2023

Choose a reason for hiding this comment

jklymak Dec 23, 2023

Choose a reason for hiding this comment

story645 commented Dec 23, 2023 • edited Loading

story645 commented Dec 23, 2023 • edited Loading

jklymak commented Dec 24, 2023

story645 commented Dec 24, 2023

jklymak commented Dec 24, 2023

story645 commented Dec 25, 2023 • edited Loading

story645 commented Dec 25, 2023 • edited Loading

story645 commented Dec 29, 2023 • edited Loading

story645 commented Dec 7, 2023 •

edited

Loading

QuLogic left a comment •

edited

Loading

story645 commented Dec 20, 2023 •

edited

Loading

story645 commented Dec 23, 2023 •

edited

Loading

story645 commented Dec 23, 2023 •

edited

Loading

story645 commented Dec 25, 2023 •

edited

Loading

story645 commented Dec 25, 2023 •

edited

Loading

story645 commented Dec 29, 2023 •

edited

Loading