Skip to content

Computational pattern tutorial edits #186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Jul 6, 2023

Conversation

JessicaS11
Copy link
Contributor

Addresses #166 and adds examples of why you should use these functions instead of loops.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@JessicaS11 JessicaS11 linked an issue Jun 23, 2023 that may be closed by this pull request
@JessicaS11
Copy link
Contributor Author

@scottyhq @dcherian @yutik-nn @e-marshall

It turns out I'm having trouble creating a loop example because I've hit upon the thing that usually keeps me from using groupby (this code is also in the tutorial notebook on this branch):

import numpy as np
import xarray as xr

xr.set_options(keep_attrs=True, display_expand_data=False)
da = xr.tutorial.load_dataset("air_temperature", engine="netcdf4").air

the outputs (acknowledging the differences in printing vs storing and object type) of:

months = [1,2,3,4,5,6,7,8,9,10,11,12]
avg_temps = []

for mon in months:
    avg = da[da["time.month"]==mon].mean()
    avg_temps.append(float(avg.data))

print(avg_temps)

and

for label, group in da.groupby("time.month"):
    print(group.mean().data)

match and make sense to me (12 values, one for each month, with all the days of each of the years included in the mean).

Then

da.groupby("time.month").mean()

returns a large array of len 12, with each of those also being an array with lots of values. Why is the behavior of these different, and what am I missing conceptually?

@dcherian
Copy link
Contributor

Use group.mean("time"). The reduction for groupby operates along all dimensions of the variable you're grouping by, not all dimensions of the object you are reducing (Use groupby().mean(...) for the latter)

@JessicaS11 JessicaS11 changed the title High level tutorial edits High level computational pattern tutorial edits Jun 29, 2023
@JessicaS11
Copy link
Contributor Author

Thanks, @dcherian!

It took me some head scratching, but I see what's going on now. I also dug in a bit to the groupby.datasetgroupby.mean() and dataarray.mean docs. I see why it's set up the way it is, but I find it incredibly confusing that the default behavior of .mean() is different depending on what context you use it in. If mean reduces along all dimensions by default in the dataarray and dataset .mean() implementations, to me it should also reduce along all dimensions if I use .mean() on a groupby operation (which results in a bunch of dataarrays!) and I should have to specify if I want something different (e.g. reducing along only the groupby dimension).

I'd be curious to explore some ideas around minimizing this cognitive dissonance for new users (assuming others encounter this issue). Something like changing the default of the datarray.mean to ... instead of None and considering how we showcase the use of .mean() in tutorials/examples (making sure we note this). I've realized through working on this tutorial (and looking back at old code) that I only ever used groupby to iterate within a for loop because I couldn't find an explanation for these differing behaviors (that will change with these edits though!) and wasn't code-savvy enough to pick up on the difference in how they were implemented.

@dcherian
Copy link
Contributor

dcherian commented Jul 3, 2023

The need for grouped/rolling/coarsen reduction over all dimensions of the array seems rare (even in my experience): pydata/xarray#2363, not to mention somewhat confusing for a dataset where arrays can have different dimensions.

At this point, any changes to defaults are very unlikely. However, every "pattern" should be consistent in this behaviour.

@JessicaS11
Copy link
Contributor Author

@scottyhq The QC linkchecker is failing with an anchor error for a specific discussion comment, even though the url works fine if you stick it in a browser... thoughts on how we might want to handle this? As best I understand it (from e.g.), GH generated anchors are an unresolved issue in certain cases... the comments for the particular discussion thread I'm linking to are long (>60 replies), so the only alternative I can think of to point to the correct comment would be to provide the date of the solution post.

@JessicaS11 JessicaS11 marked this pull request as ready for review July 3, 2023 20:05
Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like these additions a lot! I think they make the material a lot more approachable when you're working through it on your own.

@keewis
Copy link
Contributor

keewis commented Jul 4, 2023

the comments for the particular discussion thread I'm linking to are long (>60 replies)

Note that that doesn't even contain the final solution: there were a lot of things that were unclear, so instead we resorted to using a different channel. The final solution computes the binary mask, applies scipy.ndimage.label using apply_ufunc, then uses groupby to compute event stats (but I don't remember the exact code).

Not sure how to deal with that, maybe we can just ask them to post the final state in a new comment?

@JessicaS11
Copy link
Contributor Author

Not sure how to deal with that, maybe we can just ask them to post the final state in a new comment?

Great idea, @keewis. I've added a comment in the discussion to this effect.

@dcherian
Copy link
Contributor

dcherian commented Jul 5, 2023

Note that that doesn't even contain the final solution:

Maybe we don't need to link to it ?

@JessicaS11
Copy link
Contributor Author

Maybe we don't need to link to it ?

I took the link out for now until the final solution is posted (though it turns our that even though the QC checker failed the anchored link it renders fine in the preview.

I also noticed the preview has all the exercise solutions numbered as 8. @lsetiawan Is this an easy thing to fix?

Otherwise this should be about ready to go!

@dcherian
Copy link
Contributor

dcherian commented Jul 6, 2023

< also noticed the preview has all the exercise solutions numbered as 8

I think you need

```{exercise}
:label: generalize
```

instead of

```{exercise} generalize
```

@JessicaS11
Copy link
Contributor Author

< also noticed the preview has all the exercise solutions numbered as 8

I think you need

```{exercise}
:label: generalize

I tried and it failed: ERROR: Directive 'solution': 1 argument(s) required, 0 supplied

@dcherian
Copy link
Contributor

dcherian commented Jul 6, 2023

I think you assign the label under exercise and then reference it in the solution as

```{exercise} 
:label: generalize
```
```{solution} generalize
```

(it took me a while to figure it out too)

@dcherian
Copy link
Contributor

dcherian commented Jul 6, 2023

For some reason --- didn't render properly in my jupyterlab 🤷🏾

@dcherian dcherian changed the title High level computational pattern tutorial edits Computational pattern tutorial edits Jul 6, 2023
@dcherian dcherian enabled auto-merge (squash) July 6, 2023 21:28
@dcherian dcherian merged commit 219ac74 into xarray-contrib:main Jul 6, 2023
@dcherian
Copy link
Contributor

dcherian commented Jul 6, 2023

Thanks @JessicaS11 this is a great improvement!

@JessicaS11
Copy link
Contributor Author

I think you assign the label under exercise and then reference it in the solution as

```{exercise} 
:label: generalize

(it took me a while to figure it out too)

Haha - well it seems obvious in retrospect (I thought "generalize" was a sort of odd command). Thanks for the update and for fixing+merging!

@JessicaS11 JessicaS11 deleted the highleveltutorial branch July 7, 2023 16:51
dcherian added a commit to negin513/xarray-tutorial that referenced this pull request Jul 10, 2023
* upstream/main:
  Update indexing material (xarray-contrib#192)
  Migrate to exercise syntax (xarray-contrib#196)
  Bump pangeo/base-image from 2023.06.20 to 2023.07.05 in /.devcontainer (xarray-contrib#193)
  Add Nebari-specific instructions for SciPy 2023 (xarray-contrib#195)
  Add contributors image (xarray-contrib#194)
  Computational pattern tutorial edits (xarray-contrib#186)
  Housekeeping + clean up sidebar (xarray-contrib#190)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feedback on "high-level computation patterns"
3 participants