From 53747e7dd9244388264476b37bde29438882155f Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Wed, 14 Jun 2023 09:50:29 -0400 Subject: [PATCH 01/22] revise intro portion of notebook --- .../01-high-level-computation-patterns.ipynb | 36 ++++++++++++------- 1 file changed, 23 insertions(+), 13 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 77a0a3ad..0cc5a134 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -1,6 +1,7 @@ { "cells": [ { + "attachments": {}, "cell_type": "markdown", "id": "a349a0a5-eeb3-410a-b5d1-f472a8ca14b2", "metadata": { @@ -10,10 +11,28 @@ "tags": [] }, "source": [ - "# High-level computational patterns\n" + "# Utilizing computational patterns\n", + "\n", + "From https://toolz.readthedocs.io/en/latest/control.html\n", + "\n", + "Often when writing code we repeat certain patterns, whether we realize it or not.\n", + "If you have learned to write list comprehensions, you are taking advantage of a \"control pattern\".\n", + "Often, these patterns are so common that many packages have built in functions to implement them.\n", + "\n", + "> The Toolz library contains dozens of patterns like map and groupby. Learning a\n", + "> core set (maybe a dozen) covers the vast majority of common programming tasks\n", + "> often done by hand. 
A rich vocabulary of core control functions conveys the\n", + "> following benefits:\n", + ">\n", + "> - You identify new patterns\n", + "> - You make fewer errors in rote coding\n", + "> - You can depend on well tested and benchmarked implementations\n", + "\n", + "The same is true for xarray" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "623d5170-f32d-4643-9a59-c54768ee7185", "metadata": { @@ -25,18 +44,9 @@ "source": [ "## Motivation / Learning goals\n", "\n", - "From https://toolz.readthedocs.io/en/latest/control.html\n", - "\n", - "> The Toolz library contains dozens of patterns like map and groupby. Learning a\n", - "> core set (maybe a dozen) covers the vast majority of common programming tasks\n", - "> often done by hand. A rich vocabulary of core control functions conveys the\n", - "> following benefits:\n", - ">\n", - "> - You identify new patterns\n", - "> - You make fewer errors in rote coding\n", - "> - You can depend on well tested and benchmarked implementations\n", - "\n", - "The same is true for xarray\n" + "- Learn what high-level computational patterns are available in Xarray\n", + "- Identify when you are using a high-level computational pattern\n", + "- Implement that pattern using built-in Xarray functionality" ] }, { From 7726976377fac0170300f3a0f8efb261101f6b9b Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Fri, 23 Jun 2023 15:45:57 -0400 Subject: [PATCH 02/22] update rolling-reduce example to use a non built-in function --- .../01-high-level-computation-patterns.ipynb | 57 +++++++++++++------ 1 file changed, 39 insertions(+), 18 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 0cc5a134..0c28403c 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -50,6 +50,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": 
"5f0cda65-cfaa-42ed-bd4d-f290c1e98bb3", "metadata": { @@ -79,12 +80,13 @@ " [Bin data in to groups and reduce](https://docs.xarray.dev/en/stable/groupby.html)\n", "1. `groupby_bins`: GroupBy after discretizing a numeric variable.\n", "1. `resample` :\n", - " [Groupby specialized for time axes. Either downsample or upsample your data.](https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations)\n", + " [GroupBy specialized for time axes. Either downsample or upsample your data.](https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations)\n", "1. `weighted` :\n", " [Weight your data before reducing](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "71d8581c-1ffb-47a0-b992-12c3997f3586", "metadata": { @@ -94,7 +96,7 @@ "tags": [] }, "source": [ - "## Load example dataset\n" + "### Load example dataset\n" ] }, { @@ -116,6 +118,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "90832354-d0f3-4d83-a979-23b685203d3e", "metadata": { @@ -127,12 +130,7 @@ "source": [ "---\n", "\n", - "## Concept: \"index space\" vs \"label space\"\n", - "\n", - "These are windowed operations with a window of a fixed size.\n", - "\n", - "- `rolling`: sliding window operations e.g. running mean\n", - "- `coarsen`: decimating; reshaping\n" + "### Concept refresher: \"index space\" vs \"label space\"\n" ] }, { @@ -204,6 +202,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "e9b80381-8a0d-4833-97fa-687bf693ca5a", "metadata": {}, @@ -214,13 +213,17 @@ "\n", "### Index space\n", "\n", + "These are windowed operations with a window of a fixed size.\n", + "\n", "1. `rolling` :\n", - " [Operate on rolling windows of your data e.g. running mean](https://docs.xarray.dev/en/stable/user-guide/computation.html#rolling-window-operations)\n", + " [Operate on rolling (sliding) windows of your data e.g. 
running mean](https://docs.xarray.dev/en/stable/user-guide/computation.html#rolling-window-operations)\n", "1. `coarsen` :\n", - " [Downsample your data](https://docs.xarray.dev/en/stable/user-guide/computation.html#coarsen-large-arrays)\n", + " [Downsample your data (decimating, reshaping)](https://docs.xarray.dev/en/stable/user-guide/computation.html#coarsen-large-arrays)\n", "\n", "### Label space\n", "\n", + "These are windowed operations with irregular windows based on your data.\n", + "\n", "1. `groupby` :\n", " [Bin data in to groups and reduce](https://docs.xarray.dev/en/stable/groupby.html)\n", "1. `groupby_bins`: GroupBy after discretizing a numeric variable.\n", @@ -229,6 +232,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "339bdf89-c7da-4fca-89e1-a6655e65a6a3", "metadata": { @@ -238,11 +242,15 @@ "tags": [] }, "source": [ + "START HERE\n", + "ToDo revise/edit these examples\n", + "add some \"loop\" versions to show what a user might come up with that could be turned into one of these pattern operations\n", + "\n", "---\n", "\n", "## Index space: windows of fixed width\n", "\n", - "### Sliding windows of fixed length: `rolling`\n", + "### Sliding windows of fixed length: [`rolling`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.rolling.html)\n", "\n", "- returns object of same shape as input\n", "- pads with NaNs to make this happen\n", @@ -280,6 +288,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "b88c116e-ad63-4fea-81a7-bcabc194dee5", "metadata": { @@ -291,29 +300,41 @@ "source": [ "#### Apply an existing numpy-only function with `reduce`\n", "\n", + "In some cases, we may want to apply a sliding window function using rolling that is not built in to Xarray. 
In these cases we can still leverage the sliding windows of rolling and apply our own function with [`reduce`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.reduce.html).\n", + "\n", "Tip: The `reduce` method expects a function that can receive and return plain\n", - "arrays (e.g. numpy). The `map` method expects a function that can receive and\n", - "return Xarray objects.\n", + "arrays (e.g. numpy), as in each of the \"windows\" provided by the rolling iterator. This is in contrast to the `map` method, which expects a function that can receive and return Xarray objects.\n", "\n", - "Here's an example function: `np.mean`\n" + "Here's an example function: [`np.ptp`](https://numpy.org/doc/stable/reference/generated/numpy.ptp.html).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b1610220", + "metadata": {}, + "outputs": [], + "source": [ + "data.rolling(lat=5, lon=5, center=True).reduce(np.ptp).plot()" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "9ef251aa-ce3c-4318-95ba-470568ebd967", "metadata": {}, "source": [ - "**Exercise** Calculate the rolling mean in 5 point bins along both latitude and\n", - "longitude using\n", - "[`rolling(...).reduce`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.reduce.html)\n" + "**Exercise** Calculate the rolling mean in 5 point bins along both latitude and longitude using\n", + "[`rolling(**kwargs).reduce`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.reduce.html)\n" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "75397b3d-5961-4924-b688-23520b79aae8", "metadata": {}, "source": [ - "**Answer:**\n" + "**Answer**\n" ] }, { From f51eff2c2ae66930349d1af7084a41e58dfbaabd Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Fri, 23 Jun 2023 16:20:15 -0400 Subject: [PATCH 03/22] revisions to rest of rolling and coarsen --- .../01-high-level-computation-patterns.ipynb | 66 
+++++++++++-------- 1 file changed, 37 insertions(+), 29 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 0c28403c..3de052d3 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -82,7 +82,10 @@ "1. `resample` :\n", " [GroupBy specialized for time axes. Either downsample or upsample your data.](https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations)\n", "1. `weighted` :\n", - " [Weight your data before reducing](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n" + " [Weight your data before reducing](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n", + "\n", + "\n", + "Note: the documentation links in this tutorial point to the DataArray implementations of each function, but they are also available for DataSet objects.\n" ] }, { @@ -243,7 +246,6 @@ }, "source": [ "START HERE\n", - "ToDo revise/edit these examples\n", "add some \"loop\" versions to show what a user might come up with that could be turned into one of these pattern operations\n", "\n", "---\n", @@ -357,6 +359,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "d0155b62-d08f-42c6-b467-1af73a7829c0", "metadata": { @@ -366,10 +369,9 @@ "tags": [] }, "source": [ - "#### For more complicated analysis, construct a new array with a new dimension.\n", + "#### Storing the outputs from `rolling` operations with `construct`\n", "\n", - "Allows things like short-time fourier transform, spectrogram, windowed rolling\n", - "etc.\n" + "In the above examples, we plotted the outputs of our rolling operations. 
Xarray makes it easy to store the outputs from `rolling` directly into the DataArray using the [`construct`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.construct.html#xarray.core.rolling.DataArrayRolling.construct) method." ] }, { @@ -395,12 +397,12 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "0a23b9a9-076b-472d-b7a6-57083566a32d", "metadata": {}, "source": [ - "**Exercise** Calculate the 5 point running mean in time using\n", - "`rolling.construct`\n" + "**Exercise** Calculate the 5 point running mean in time and add it to your DataArray using `rolling.construct`" ] }, { @@ -441,6 +443,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "816929d5-6635-4e93-99fc-79b5729c5491", "metadata": { @@ -449,7 +452,7 @@ "source": [ "#### Advanced: Another `construct` example\n", "\n", - "This is a 2D rolling example; we need to provide two new dimension names\n" + "This is a 2D rolling example; we need to provide two new dimension names.\n" ] }, { @@ -459,10 +462,11 @@ "metadata": {}, "outputs": [], "source": [ - "(data.rolling(lat=5, lon=5, center=True).construct(lat=\"lat_roll\", lon=\"lon_roll\"))" + "data.rolling(lat=5, lon=5, center=True).construct(lat=\"lat_roll\", lon=\"lon_roll\")" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "f75d2a5f-31d0-4943-b70a-06e7c8a30601", "metadata": { @@ -476,9 +480,7 @@ "\n", "### Block windows of fixed length: `coarsen`\n", "\n", - "For non-overlapping windows or \"blocks\" use `coarsen`. The syntax is very\n", - "similar to `rolling`. You will need to specify `boundary` if the length of the\n", - "dimension is not a multiple of the block size\n" + "For non-overlapping windows or \"blocks\" use [`coarsen`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.coarsen.html). The syntax is very similar to `rolling`. 
You will need to specify how you want Xarray to handle the `boundary` if the length of the dimension is not a multiple of the block size.\n" ] }, { @@ -508,7 +510,7 @@ "metadata": {}, "outputs": [], "source": [ - "data.coarsen(lat=5, lon=5, boundary=\"trim\").std()" + "data.coarsen(lat=5, lon=5, boundary=\"trim\").mean()" ] }, { @@ -522,6 +524,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "b30794c7-1aeb-4e13-b6b4-824f23ac07df", "metadata": { @@ -531,10 +534,9 @@ "tags": [] }, "source": [ - "#### coarsen supports `reduce` for custom reductions\n", + "#### Coarsen supports `reduce` for custom reductions\n", "\n", - "**Exercise** Use `coarsen.reduce` to apply `np.mean` in 5x5 (latxlon) point\n", - "blocks of `data`\n" + "**Exercise** Use `coarsen.reduce` to apply `np.ptp` in 5x5 (lat x lon) point blocks to `data`" ] }, { @@ -561,6 +563,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "0e7cbd7b-da18-48a3-bd29-708d97cc3bb7", "metadata": { @@ -570,12 +573,11 @@ "tags": [] }, "source": [ - "#### coarsen supports `construct` for block reshaping\n", + "#### Coarsen supports `construct` for block reshaping and storing outputs\n", "\n", "This is usually a good alternative to `np.reshape`\n", "\n", - "A simple example splits a 2-year long monthly 1D time series into a 2D array\n", - "shaped (year x month)\n" + "A simple example splits a 2-year long monthly 1D time series into a 2D array shaped (year x month)\n" ] }, { @@ -605,6 +607,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "5dc5c7e7-bc3c-4362-bbd1-6a05801b7c90", "metadata": {}, @@ -615,8 +618,7 @@ "1. The new dimensions `year` and `month` don't have any coordinate labels\n", " associated with them.\n", "\n", - "What if the data had say 23 instead of 24 values? In that case we specify a\n", - "different `boundary` , here we pad to 24 values.\n" + "What if the data had say 23 instead of 24 values (`months.isel(time=slice(1, None)`)? 
In that case we specify a different `boundary` (the default `boundary=\"exact\"` worked above); here we pad to 24 values.\n" ] }, { @@ -630,11 +632,12 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "f4e90b49-42e4-411f-9148-bcaf145de26c", "metadata": {}, "source": [ - "This ends up adding values at the end of the array, not so sensible for this\n", + "This adds values at the end of the array (see the 'nan' at the end of the time coordinate?), which is not so sensible for this\n", "problem. We have some control of the padding through the `side` kwarg to `coarsen`. For `side=\"right\"` we get more sensible output." ] }, @@ -651,11 +654,12 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "8174aad1-d6e1-4772-bf23-91e363a92c19", "metadata": {}, "source": [ - "Note that `coarsen` pads with NaNs. For more control over paddnig, use\n", + "Note that `coarsen` pads with NaNs. For more control over padding, use\n", "[DataArray.pad](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.pad.html) explicitly." ] }, @@ -671,7 +675,9 @@ " .pad(time=(1, 0), constant_values=-1)\n", " .coarsen(time=12)\n", " .construct(time=(\"year\", \"month\"))\n", - ")" + ")\n", + "\n", + "#NOTE: check output of this cell (why is the first value of time nan instead of -1?)" ] }, { @@ -733,21 +739,20 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "4de2984e-9c28-4ed7-909f-bab47b6eae49", "metadata": {}, "source": [ - "This exercise came up during the live lecture.\n", + "This exercise came up during a live lecture.\n", "\n", "**Exercise** Calculate the rolling 4 month average, averaged across years.\n", "\n", "**Answer**\n", "\n", "1. We first reshape using `coarsen.construct` to add `year` as a new dimension.\n", - "2. Then `rolling` on the month dimension.\n", - "3. It turns out that `roll.mean([\"year\", \"month\"])` doesn't work. 
So we use\n", - " `roll.construct` to get a DataArray with a new dimension `window` and then\n", - " take the mean over `window` and `year`\n" + "2. Apply `rolling` on the month dimension.\n", + "3. It turns out that `roll.mean([\"year\", \"month\"])` doesn't work. So we use `roll.construct` to get a DataArray with a new dimension `window` and then take the mean over `window` and `year`\n" ] }, { @@ -779,6 +784,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "25fd132c-5436-4af6-b8ad-75269cb45e75", "metadata": { @@ -788,6 +794,8 @@ "tags": [] }, "source": [ + "START EDITING HERE!\n", + "\n", "---\n", "\n", "## Label space \"windows\" or bins : GroupBy\n", From 5173d87ca77c9272db191bd579538504a1e525fc Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 23 Jun 2023 20:27:38 +0000 Subject: [PATCH 04/22] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- intermediate/01-high-level-computation-patterns.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 3de052d3..3ac16c2e 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -677,7 +677,7 @@ " .construct(time=(\"year\", \"month\"))\n", ")\n", "\n", - "#NOTE: check output of this cell (why is the first value of time nan instead of -1?)" + "# NOTE: check output of this cell (why is the first value of time nan instead of -1?)" ] }, { From a1681bc951aad694c5f0d38ab00428ccba91f347 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Tue, 27 Jun 2023 14:36:09 -0400 Subject: [PATCH 05/22] finish revising groupby through end; start adding loop example --- .../01-high-level-computation-patterns.ipynb | 891 ++++++++++++++++-- 1 file changed, 797 insertions(+), 94 deletions(-) diff --git 
a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 3ac16c2e..04d70106 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -1,7 +1,6 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "id": "a349a0a5-eeb3-410a-b5d1-f472a8ca14b2", "metadata": { @@ -32,7 +31,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "623d5170-f32d-4643-9a59-c54768ee7185", "metadata": { @@ -50,7 +48,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "5f0cda65-cfaa-42ed-bd4d-f290c1e98bb3", "metadata": { @@ -81,15 +78,12 @@ "1. `groupby_bins`: GroupBy after discretizing a numeric variable.\n", "1. `resample` :\n", " [GroupBy specialized for time axes. Either downsample or upsample your data.](https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations)\n", - "1. `weighted` :\n", - " [Weight your data before reducing](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n", "\n", "\n", "Note: the documentation links in this tutorial point to the DataArray implementations of each function, but they are also available for DataSet objects.\n" ] }, { - "attachments": {}, "cell_type": "markdown", "id": "71d8581c-1ffb-47a0-b992-12c3997f3586", "metadata": { @@ -104,10 +98,31 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "id": "02a9022f-1503-45a2-b57a-05ebfeb11d16", "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "import numpy as np\n", "import xarray as xr\n", @@ -121,7 +136,624 @@ ] }, { - "attachments": {}, + "cell_type": "markdown", + "id": "bd47d199", + "metadata": {}, + "source": [ + "---\n", + "\n", + "### Identifying high-level computation patterns\n", + "\n", + "*or, when should I use these functions?*\n", + "\n", + "Consider a common use case. We want to complete some \"task\" for each of \"something\". The \"task\" might be a computation (e.g. mean, median, plot). The \"something\" could be a group of array values (e.g. pixels) or segments of time (e.g. monthly or seasonally).\n", + "\n", + "Often, our solution to this type of problem is to write a loop. Say we want the average air temperature for each month:" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "70159772", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[273.416748046875, 273.13104248046875, 275.1137390136719, 278.5469970703125, 283.299072265625, 287.5657043457031, 289.90692138671875, 290.089111328125, 287.41375732421875, 283.6811828613281, 277.9678039550781, 274.35107421875]\n" + ] + } + ], + "source": [ + "months = [1,2,3,4,5,6,7,8,9,10,11,12]\n", + "avg_temps = []\n", + "\n", + "for mon in months:\n", + " avg = da[da[\"time.month\"]==mon].mean()\n", + " avg_temps.append(float(avg.data))\n", + "\n", + "print(avg_temps)" + ] + }, + { + "cell_type": "markdown", + "id": "c1772b16", + "metadata": {}, + "source": [ + "Writing a for-loop here is not wrong, but it can quickly become cumbersome if you have a complex function to apply and it will take awhile to compute on a large dataset (you may even run out of memory). 
Parallelizing the computation would take a lot of additional work.\n", + "\n", + "Xarray's functionality instead allows us to do the same computation in one line of code (plus, the computation is optimized and ready to take advantage of parallel compute resources)!" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "id": "6f1b23fa", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "273.41675\n", + "273.13104\n", + "275.11374\n", + "278.547\n", + "283.29907\n", + "287.5657\n", + "289.90692\n", + "290.0891\n", + "287.41376\n", + "283.68118\n", + "277.9678\n", + "274.35107\n", + "[[[246.34987 246.38608 246.21518 ... 243.06113 244.08795 245.6467 ]\n", + " [248.8576 248.90733 248.7104 ... 241.52866 243.50865 246.75471]\n", + " [251.57712 251.19661 250.71463 ... 243.39891 246.78462 251.56572]\n", + " ...\n", + " [295.85028 295.24405 295.22684 ... 295.18625 294.65707 294.0485 ]\n", + " [296.5446 296.46982 296.15994 ... 295.35593 295.0812 294.53006]\n", + " [297.15417 297.2383 297.04892 ... 296.01797 295.77554 295.63647]]\n", + "\n", + " [[246.67715 246.40576 245.9484 ... 241.85838 243.0021 244.44383]\n", + " [247.8001 247.75992 247.47757 ... 240.64706 242.26633 245.06662]\n", + " [249.07079 248.57234 247.94254 ... 242.42874 245.33348 249.72273]\n", + " ...\n", + " [295.92886 295.41788 295.16602 ... 294.4443 293.78143 293.18265]\n", + " [296.78754 296.63443 296.15707 ... 294.51166 294.2178 293.70258]\n", + " [297.2889 297.2165 296.85797 ... 295.16058 294.9558 294.87967]]\n", + "\n", + " [[251.03168 250.67987 250.18945 ... 242.19398 243.11484 244.30956]\n", + " [252.97194 252.86617 252.57347 ... 241.61102 243.02509 245.38196]\n", + " [254.46768 254.09142 253.63428 ... 241.67184 244.49664 248.74258]\n", + " ...\n", + " [295.65652 295.24582 295.22464 ... 294.7663 294.16412 293.6729 ]\n", + " [296.70294 296.68756 296.3824 ... 294.85083 294.57034 294.15213]\n", + " [297.38174 297.4631 297.22668 ... 
295.3349 295.11124 295.01654]]\n", + "\n", + " ...\n", + "\n", + " [[261.8136 261.21255 260.5036 ... 248.19336 249.06995 250.41624]\n", + " [269.02225 268.92944 268.71478 ... 246.41554 248.16833 251.14897]\n", + " [269.64017 268.7958 268.45483 ... 246.01215 249.6174 254.69598]\n", + " ...\n", + " [299.09723 298.30466 297.9945 ... 299.09454 298.6955 298.29483]\n", + " [299.43155 299.23853 298.7375 ... 299.2589 299.28873 299.0363 ]\n", + " [299.37054 299.42462 299.15607 ... 299.72403 299.66312 299.76233]]\n", + "\n", + " [[253.74484 253.64487 253.49716 ... 242.96066 243.9345 245.14209]\n", + " [259.12967 258.62927 258.19144 ... 241.84921 243.07965 245.46625]\n", + " [261.04227 258.83536 257.51193 ... 242.38234 245.13663 249.52368]\n", + " ...\n", + " [297.8426 297.1406 296.98773 ... 297.96884 297.56888 297.1611 ]\n", + " [298.58783 298.42026 297.96896 ... 298.16412 298.19397 297.9083 ]\n", + " [298.81143 298.8566 298.62103 ... 298.72955 298.7519 298.8189 ]]\n", + "\n", + " [[247.971 248.02118 247.91302 ... 239.7719 241.02383 242.62823]\n", + " [249.73361 250.16037 250.48581 ... 238.78964 240.96469 244.11626]\n", + " [252.0296 251.53136 251.36629 ... 238.07542 241.91293 247.06987]\n", + " ...\n", + " [296.76508 295.97668 295.88922 ... 296.45605 296.09137 295.65756]\n", + " [297.46814 297.38025 297.04428 ... 296.8556 296.84668 296.52133]\n", + " [297.8809 297.9868 297.77554 ... 297.60034 297.5655 297.53763]]]\n" + ] + } + ], + "source": [ + "for label, group in da.groupby(\"time.month\"):\n", + " print(group.mean().data)\n", + " \n", + "avg_temps = da.groupby(\"time.month\").mean()\n", + "print(avg_temps.data)" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "id": "ad5ee977", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
<xarray.DataArray 'air' (month: 12, lat: 25, lon: 53)>\n",
+       "246.3 246.4 246.2 245.8 245.2 244.6 ... 298.1 298.0 298.0 297.6 297.6 297.5\n",
+       "Coordinates:\n",
+       "  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0\n",
+       "  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0\n",
+       "  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12\n",
+       "Attributes:\n",
+       "    long_name:     4xDaily Air temperature at sigma level 995\n",
+       "    units:         degK\n",
+       "    precision:     2\n",
+       "    GRIB_id:       11\n",
+       "    GRIB_name:     TMP\n",
+       "    var_desc:      Air temperature\n",
+       "    dataset:       NMC Reanalysis\n",
+       "    level_desc:    Surface\n",
+       "    statistic:     Individual Obs\n",
+       "    parent_stat:   Other\n",
+       "    actual_range:  [185.16 322.1 ]
" + ], + "text/plain": [ + "\n", + "246.3 246.4 246.2 245.8 245.2 244.6 ... 298.1 298.0 298.0 297.6 297.6 297.5\n", + "Coordinates:\n", + " * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0\n", + " * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0\n", + " * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12\n", + "Attributes:\n", + " long_name: 4xDaily Air temperature at sigma level 995\n", + " units: degK\n", + " precision: 2\n", + " GRIB_id: 11\n", + " GRIB_name: TMP\n", + " var_desc: Air temperature\n", + " dataset: NMC Reanalysis\n", + " level_desc: Surface\n", + " statistic: Individual Obs\n", + " parent_stat: Other\n", + " actual_range: [185.16 322.1 ]" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "da.groupby(\"time.month\").mean()" + ] + }, + { + "cell_type": "markdown", + "id": "4f548b71", + "metadata": {}, + "source": [ + "Read on through this tutorial to learn some of the incredible ways to use Xarray to avoid writing long for-loops and efficiently complete computational analyses on your data." + ] + }, + { "cell_type": "markdown", "id": "90832354-d0f3-4d83-a979-23b685203d3e", "metadata": { @@ -205,7 +837,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "e9b80381-8a0d-4833-97fa-687bf693ca5a", "metadata": {}, @@ -235,7 +866,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "339bdf89-c7da-4fca-89e1-a6655e65a6a3", "metadata": { @@ -245,7 +875,6 @@ "tags": [] }, "source": [ - "START HERE\n", "add some \"loop\" versions to show what a user might come up with that could be turned into one of these pattern operations\n", "\n", "---\n", @@ -290,7 +919,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "b88c116e-ad63-4fea-81a7-bcabc194dee5", "metadata": { @@ -304,8 +932,7 @@ "\n", "In some cases, we may want to apply a sliding window function using rolling that is not built in to Xarray. 
In these cases we can still leverage the sliding windows of rolling and apply our own function with [`reduce`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.reduce.html).\n", "\n", - "Tip: The `reduce` method expects a function that can receive and return plain\n", - "arrays (e.g. numpy), as in each of the \"windows\" provided by the rolling iterator. This is in contrast to the `map` method, which expects a function that can receive and return Xarray objects.\n", + "Tip: The `reduce` method expects a function that can receive and return plain arrays (e.g. numpy), as in each of the \"windows\" provided by the rolling iterator. This is in contrast to the `map` method, which expects a function that can receive and return Xarray objects.\n", "\n", "Here's an example function: [`np.ptp`](https://numpy.org/doc/stable/reference/generated/numpy.ptp.html).\n" ] @@ -321,7 +948,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "9ef251aa-ce3c-4318-95ba-470568ebd967", "metadata": {}, @@ -331,7 +957,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "75397b3d-5961-4924-b688-23520b79aae8", "metadata": {}, @@ -359,7 +984,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "d0155b62-d08f-42c6-b467-1af73a7829c0", "metadata": { @@ -397,7 +1021,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "0a23b9a9-076b-472d-b7a6-57083566a32d", "metadata": {}, @@ -443,7 +1066,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "816929d5-6635-4e93-99fc-79b5729c5491", "metadata": { @@ -466,7 +1088,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "f75d2a5f-31d0-4943-b70a-06e7c8a30601", "metadata": { @@ -524,7 +1145,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "b30794c7-1aeb-4e13-b6b4-824f23ac07df", "metadata": { @@ -563,7 +1183,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "0e7cbd7b-da18-48a3-bd29-708d97cc3bb7", "metadata": { @@ -607,7 +1226,6 @@ ] 
}, { - "attachments": {}, "cell_type": "markdown", "id": "5dc5c7e7-bc3c-4362-bbd1-6a05801b7c90", "metadata": {}, @@ -632,7 +1250,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "f4e90b49-42e4-411f-9148-bcaf145de26c", "metadata": {}, @@ -654,7 +1271,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "8174aad1-d6e1-4772-bf23-91e363a92c19", "metadata": {}, @@ -739,7 +1355,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "4de2984e-9c28-4ed7-909f-bab47b6eae49", "metadata": {}, @@ -784,7 +1399,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "25fd132c-5436-4af6-b8ad-75269cb45e75", "metadata": { @@ -794,35 +1408,42 @@ "tags": [] }, "source": [ - "START EDITING HERE!\n", - "\n", "---\n", "\n", "## Label space \"windows\" or bins : GroupBy\n", "\n", - "Generalization of `coarsen`: sometimes the windows you want are not regular.\n", + "Sometimes the windows you want are not regularly spaced or even defined by a grid.\n", + "For instance, grouping data by month (which have varying numbers of days) or the results of an image classification.\n", + "The GroupBy functions are essentially a generalization of `coarsen`: \n", "\n", - "- `groupby`: e.g. climatologies, composites; works when \"groups\" are exact: e.g.\n", - " characters or integers; not floats\n", - "- `groupby_bins`: binning operations e.g. histograms\n", - "- `resample`: groupby but specialized for time grouping (so far)\n", + "- `groupby`: divide data into distinct groups, e.g. climatologies, composites. Works when \"groups\" are exact and can be determined using equality (`==`), e.g. characters or integers. Remember that floats are not exact values.\n", + "- `groupby_bins`: Use binning operations, e.g. 
histograms, to group your data.\n", + "- `resample`: Specialized implementation of GroupBy specifically for time grouping (so far)\n", "\n", - "**tip** Both `groupby_bins` and `resample` are implemented as `GroupBy` with a\n", - "specific way of constructing group labels.\n", + "**hint** Both `groupby_bins` and `resample` are implemented as `GroupBy` with a specific way of constructing group labels.\n", "\n", "### Deconstructing GroupBy\n", "\n", - "Commonly called \"split-apply-combine\".\n", + "The GroupBy workflow is commonly called \"split-apply-combine\".\n", "\n", "1. \"split\" : break dataset into groups\n", - "1. \"apply\" : apply an operation, usually a reduction like `mean`\n", - "1. \"combine\" : concatenate results from apply step along new \"group\" dimension\n", + "1. \"apply\" : apply an operation, for instance a reduction like `mean`\n", + "1. \"combine\" : concatenate results from apply step along a new \"group\" dimension\n", "\n", - "But really there is a first step: \"identifying groups\" also called\n", - "\"factorization\" (or \"binning\"). Usually this is the hard part.\n", + "But really there is a \"hidden\" first step: identifying groups (also called factorization or binning). 
Usually this is the hard part.\n", "\n", - "So \"identify groups\" → \"split into groups\" → \"apply function\" → \"combine\n", - "results\".\n" + "In reality the workflow is: \"identify groups\" → \"split into groups\" → \"apply function\" → \"combine results\".\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55c5e475", + "metadata": {}, + "outputs": [], + "source": [ + "# recall our earlier DataArray\n", + "da" ] }, { @@ -832,6 +1453,9 @@ "metadata": {}, "outputs": [], "source": [ + "# GroupBy returns an iterator that traverses the specified groups, here by month.\n", + "# Notice that groupby is clever enough for us to leave out the `.dt` before `.month`\n", + "# we would need to specify to access the month data directly (see plot below).\n", "da.groupby(\"time.month\")" ] }, @@ -842,6 +1466,8 @@ "metadata": {}, "outputs": [], "source": [ + "# for each group (e.g. the air temperature in a given month for all the years),\n", + "# compute the mean\n", "da.groupby(\"time.month\").mean()" ] }, @@ -850,7 +1476,9 @@ "id": "7a579539-1634-462c-b4d9-ea558fceadfb", "metadata": {}, "source": [ - "This is how xarray identifies \"groups\" for the monthly climatology computation\n" + "Notice that since we have averaged over all the years for each month, our resulting DataArray no longer has a \"year\" coordinate.\n", + "\n", + "If we want to see how Xarray identifies \"groups\" for the monthly climatology computation, we can plot our input to `groupby`. 
GroupBy is clever enough to figure out how many values there are and thus how many groups to make.\n" ] }, { @@ -860,7 +1488,7 @@ "metadata": {}, "outputs": [], "source": [ - "da.time.dt.month.plot()" + "da.time.month.plot()" ] }, { @@ -868,7 +1496,7 @@ "id": "a6d21727-4c15-4f13-ae53-61d5f4944554", "metadata": {}, "source": [ - "Similarly for binning,\n" + "Similarly for binning (remember this is useful when the parameter you are binning over is not \"exact\", like a float),\n" ] }, { @@ -899,6 +1527,14 @@ "da.resample(time=\"M\")" ] }, + { + "cell_type": "markdown", + "id": "3763efb3", + "metadata": {}, + "source": [ + "QUSTION (intentionally spelled wrong so a check will catch this block) - I want to explain why resample with time=\"M\" has 24 bins, while groupby over month had 12. But I don't actually know..." + ] + }, { "cell_type": "markdown", "id": "0b2de08d-0b7b-4725-80f3-c94d19d91669", @@ -911,14 +1547,13 @@ "source": [ "### Constructing group labels\n", "\n", - "Xarray uses `pandas.factorize` for `groupby` and `pandas.cut` for\n", - "`groupby_bins`.\n", + "Xarray uses [`pandas.factorize`](https://pandas.pydata.org/docs/reference/api/pandas.factorize.html) for `groupby` and [`pandas.cut`](https://pandas.pydata.org/docs/reference/api/pandas.cut.html) for `groupby_bins`.\n", "\n", - "If the automatic group detection doesn't work for your problem then these\n", - "functions are useful for constructing \"group labels\" in many cases\n", + "#### Functions to construct group labels\n", + "If the automatic group detection doesn't work for your problem then these functions are useful for constructing specific \"group labels\" in many cases\n", "\n", "1. [numpy.digitize](https://numpy.org/doc/stable/reference/generated/numpy.digitize.html)\n", - " (binning)\n", + " for binning\n", "1. [numpy.searchsorted](https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html)\n", " supports many other data types\n", "1. 
[pandas.factorize](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.factorize.html)\n", @@ -938,12 +1573,12 @@ "tags": [] }, "source": [ - "#### More commonly useful are [\"datetime components\"](https://docs.xarray.dev/en/stable/user-guide/time-series.html#datetime-components)\n", + "#### [\"Datetime components\"](https://docs.xarray.dev/en/stable/user-guide/time-series.html#datetime-components) for creating groups\n", "\n", "See a full list\n", "[here](https://docs.xarray.dev/en/stable/generated/xarray.core.accessor_dt.DatetimeAccessor.html?highlight=DatetimeAccessor)\n", "\n", - "Accessed using `DataArray.dt.*`\n" + "These can be accessed in a few different ways as illustrated below.\n" ] }, { @@ -993,10 +1628,11 @@ "id": "db7bd7e6-59cd-4b2a-ac37-2ff4d40d9fc8", "metadata": {}, "source": [ + "#### Construct and use custom labels\n", + "\n", "**Demo** Grouping over a custom definition of seasons using numpy.isin.\n", "\n", - "We want to group over 4 seasons: `DJF`, `MAM`, `JJAS`, `ON` - this makes\n", - "physical sense in the Indian Ocean basin\n", + "We want to group over four seasons: `DJF`, `MAM`, `JJAS`, `ON` - this makes physical sense in the Indian Ocean basin.\n", "\n", "Start by extracting months.\n" ] @@ -1027,8 +1663,8 @@ "metadata": {}, "outputs": [], "source": [ - "season = np.full(month.shape, \" \")\n", - "season" + "myseason = np.full(month.shape, \" \")\n", + "myseason" ] }, { @@ -1046,12 +1682,29 @@ "metadata": {}, "outputs": [], "source": [ - "season[np.isin(month, [12, 1, 2])] = \"DJF\"\n", - "season[np.isin(month, [3, 4, 5])] = \"MAM\"\n", - "season[np.isin(month, [6, 7, 8, 9])] = \"JJAS\"\n", - "season[np.isin(month, [10, 11])] = \"ON\"\n", - "season = da.time.copy(data=season)\n", - "season" + "myseason[np.isin(month, [12, 1, 2])] = \"DJF\"\n", + "myseason[np.isin(month, [3, 4, 5])] = \"MAM\"\n", + "myseason[np.isin(month, [6, 7, 8, 9])] = \"JJAS\"\n", + "myseason[np.isin(month, [10, 11])] = \"ON\"" + ] + }, + { + 
"cell_type": "markdown", + "id": "297f4d2f", + "metadata": {}, + "source": [ + "Turn our new seasonal group array into a DataArray." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6a72a117", + "metadata": {}, + "outputs": [], + "source": [ + "myseason_da = da.time.copy(data=myseason)\n", + "myseason_da" ] }, { @@ -1061,9 +1714,38 @@ "metadata": {}, "outputs": [], "source": [ + "# Why don't we add the season array to the original dataarray (da) and\n", + "# use da.groupby(\"myseason\")? To me, it's confusing to now need two dataarrays\n", + "\n", + "(\n", + " # Calculate climatology\n", + " da.groupby(myseason_da)\n", + " .mean()\n", + " # reindex to get seasons in logical order (not alphabetical order)\n", + " .reindex(time=[\"DJF\", \"MAM\", \"JJAS\", \"ON\"])\n", + " .plot(col=\"time\")\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "cae97fe4", + "metadata": {}, + "source": [ + "Equivalently, we could add our custom seasons to our original DataArray and use groupby there." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36dfd6fd", + "metadata": {}, + "outputs": [], + "source": [ + "da = da.assign({\"myseason\" : ((\"time\"), myseason)})\n", "(\n", " # Calculate climatology\n", - " da.groupby(season)\n", + " da.groupby(\"myseason\")\n", " .mean()\n", " # reindex to get seasons in logical order (not alphabetical order)\n", " .reindex(time=[\"DJF\", \"MAM\", \"JJAS\", \"ON\"])\n", @@ -1081,9 +1763,9 @@ "tags": [] }, "source": [ - "#### `floor`, `ceil` and `round` time\n", + "##### `floor`, `ceil` and `round` on time\n", "\n", - "Basically \"resampling\"\n" + "Additional functionality in the [datetime accessor](https://docs.xarray.dev/en/stable/generated/xarray.core.accessor_dt.DatetimeAccessor.html) allows us to effectively \"resample\" our time data to remove roundoff errors in timestamps.\n" ] }, { @@ -1118,11 +1800,11 @@ "tags": [] }, "source": [ - "#### `strftime` can be extremely useful\n", + "##### `strftime` is another powerful option\n", "\n", "So useful and so unintuitive that it has its own website: https://strftime.org/\n", "\n", - "This example avoids merging \"Feb-29\" and \"Mar-01\" for a daily climatology\n" + "This is useful to avoid merging \"Feb-29\" and \"Mar-01\" for a daily climatology\n" ] }, { @@ -1142,9 +1824,9 @@ "tags": [] }, "source": [ - "### groupby supports `reduce` for custom reductions\n", + "### Custom reductions with GroupBy\n", "\n", - "This applies to `groupby_bins` and `resample`\n" + "Analogous to `rolling`, `reduce` and `map` apply custom reductions to `groupby_bins` and `resample`.\n" ] }, { @@ -1154,7 +1836,7 @@ "metadata": {}, "outputs": [], "source": [ - "(da.groupby(\"time.month\").reduce(np.mean).plot(col=\"month\", col_wrap=4))" + "(da.groupby(\"time.month\").reduce(np.ptp).plot(col=\"month\", col_wrap=4))" ] }, { @@ -1162,9 +1844,7 @@ "id": "7cd7ede5-8e57-4099-ab39-b9d75427f125", "metadata": {}, "source": [ - "**tip** `map` is for functions that expect and 
return 
xarray objects (see also\n", - "`Dataset.map`). `reduce` is for functions that expect and return plain arrays\n", - "(like numpy or scipy functions)\n" + "**tip** `map` is for functions that expect and return xarray objects (see also [`Dataset.map`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.map.html)). `reduce` is for functions that expect and return plain arrays (like NumPy or SciPy functions).\n" ] }, { @@ -1174,14 +1854,15 @@ "tags": [] }, "source": [ - "### GroupBy does not provide construct\n", + "### Adding GroupBy outputs to your DataArray or Dataset\n", + "\n", + "GroupBy does not provide a `construct` method, because all the groups need not be the same \"length\" (e.g. months can have 28, 29, 30, or 31 days).\n", "\n", - "All the groups need not be the same \"length\" (e.g. months can have 28, 29, 30,\n", - "or 31 days)\n", + "#### Instead looping over groupby objects is possible\n", "\n", - "### Instead looping over groupby objects is possible\n", + "Because `groupby` returns an iterator that loops over each group, it is easy to loop over groupby objects.\n", "\n", - "Maybe you want to plot data in each group separately?\n" + "Maybe you want to plot data in each group separately:\n" ] }, { @@ -1200,7 +1881,7 @@ "id": "8017d842-ff79-47ec-928d-43e3cf4e7b66", "metadata": {}, "source": [ - "This is a DataArray contain data for all December days\n" + "This is a DataArray containing data for all December days (because the last printed `label` value is `12`, so the last `group` value is for December)." ] }, { @@ -1233,6 +1914,24 @@ "group.plot.hist()" ] }, + { + "cell_type": "markdown", + "id": "d339c52c", + "metadata": {}, + "source": [ + "Remember, this example is just to show how you could operate on each group object in a groupby operation. 
If we wanted to just explore the December (or March) data, we should just filter for it directly:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0c4fd9b2", + "metadata": {}, + "outputs": [], + "source": [ + "da[[\"time.month\"==12]].plot.hist()" + ] + }, { "cell_type": "markdown", "id": "32dfe5fd-0e8f-4b69-a3c1-03f73c484b6b", @@ -1240,11 +1939,9 @@ "tags": [] }, "source": [ - "### In most cases, avoid a for loop using `map`\n", + "#### In most cases, avoid a for loop using `map`\n", "\n", - "Apply functions that expect xarray Datasets or DataArrays.\n", - "\n", - "Avoid having to manually combine results using concat\n" + "`map` enables us to apply functions that expect xarray Datasets or DataArrays. This makes it easy to perform calculations on the grouped data, add the results from each group back to the original object, and avoid having to manually combine results (using concat).\n" ] }, { @@ -1256,9 +1953,9 @@ }, "outputs": [], "source": [ - "def iqr(da, dim):\n", + "def iqr(gb_da, dim):\n", " \"\"\"Calculates interquartile range\"\"\"\n", - " return (da.quantile(q=0.75, dim=dim) - da.quantile(q=0.25, dim=dim)).rename(\"iqr\")\n", + " return (gb_da.quantile(q=0.75, dim=dim) - gb_da.quantile(q=0.25, dim=dim)).rename(\"iqr\")\n", "\n", "\n", "da.groupby(\"time.month\").map(iqr, dim=\"time\")" @@ -1282,19 +1979,19 @@ "Xarray provides methods for high-level analysis patterns:\n", "\n", "1. `rolling` :\n", - " [Operate on rolling windows of your data e.g. running mean](https://docs.xarray.dev/en/stable/user-guide/computation.html#rolling-window-operations)\n", + " [Operate on rolling (fixed length, overlapping) windows of your data e.g. running mean](https://docs.xarray.dev/en/stable/user-guide/computation.html#rolling-window-operations)\n", "1. 
`coarsen` :\n", - " [Downsample your data](https://docs.xarray.dev/en/stable/user-guide/computation.html#coarsen-large-arrays)\n", + " [Operate on blocks (fixed length) of your data (downsample)](https://docs.xarray.dev/en/stable/user-guide/computation.html#coarsen-large-arrays)\n", "1. `groupby` :\n", - " [Bin data in to groups and reduce](https://docs.xarray.dev/en/stable/groupby.html)\n", - "1. `groupby_bins`: GroupBy after discretizing a numeric variable.\n", + " [Parse data into groups (using an exact value) and operate on each one (reduce data)](https://docs.xarray.dev/en/stable/groupby.html)\n", + "1. `groupby_bins`: [GroupBy after discretizing a numeric (non-exact, e.g. float) variable](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.groupby_bins.html)\n", "1. `resample` :\n", " [Groupby specialized for time axes. Either downsample or upsample your data.](https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations)\n", - "1. `weighted` :\n", - " [Weight your data before reducing](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n", + "\n", "\n", "## More resources\n", "\n", + "1. [Weight your data before reducing](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n", "1. More tutorials here: https://tutorial.xarray.dev/\n", "1. 
Answers to common questions on \"how to do X\" are here:\n", " https://docs.xarray.dev/en/stable/howdoi.html\n" @@ -1302,6 +1999,11 @@ } ], "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -1311,7 +2013,8 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" + "pygments_lexer": "ipython3", + "version": "3.10.10" } }, "nbformat": 4, From 644e2037f5d81de923078ea9c24b8a716bb3bd5d Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Tue, 27 Jun 2023 18:37:26 +0000 Subject: [PATCH 06/22] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- .../01-high-level-computation-patterns.ipynb | 600 +----------------- 1 file changed, 15 insertions(+), 585 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 04d70106..373d4c2e 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -98,31 +98,10 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "02a9022f-1503-45a2-b57a-05ebfeb11d16", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 1, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "outputs": [], "source": [ "import numpy as np\n", "import xarray as xr\n", @@ -153,24 +132,16 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": null, "id": "70159772", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[273.416748046875, 273.13104248046875, 275.1137390136719, 278.5469970703125, 283.299072265625, 287.5657043457031, 289.90692138671875, 290.089111328125, 287.41375732421875, 283.6811828613281, 277.9678039550781, 274.35107421875]\n" - ] - } - ], - "source": [ - "months = [1,2,3,4,5,6,7,8,9,10,11,12]\n", + "outputs": [], + "source": [ + "months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]\n", "avg_temps = []\n", "\n", "for mon in months:\n", - " avg = da[da[\"time.month\"]==mon].mean()\n", + " avg = da[da[\"time.month\"] == mon].mean()\n", " avg_temps.append(float(avg.data))\n", "\n", "print(avg_temps)" @@ -188,559 +159,24 @@ }, { "cell_type": "code", - "execution_count": 54, + "execution_count": null, "id": "6f1b23fa", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "273.41675\n", - "273.13104\n", - "275.11374\n", - "278.547\n", - "283.29907\n", - "287.5657\n", - "289.90692\n", - "290.0891\n", - "287.41376\n", - "283.68118\n", - "277.9678\n", - "274.35107\n", - "[[[246.34987 246.38608 246.21518 ... 243.06113 244.08795 245.6467 ]\n", - " [248.8576 248.90733 248.7104 ... 241.52866 243.50865 246.75471]\n", - " [251.57712 251.19661 250.71463 ... 243.39891 246.78462 251.56572]\n", - " ...\n", - " [295.85028 295.24405 295.22684 ... 295.18625 294.65707 294.0485 ]\n", - " [296.5446 296.46982 296.15994 ... 295.35593 295.0812 294.53006]\n", - " [297.15417 297.2383 297.04892 ... 296.01797 295.77554 295.63647]]\n", - "\n", - " [[246.67715 246.40576 245.9484 ... 241.85838 243.0021 244.44383]\n", - " [247.8001 247.75992 247.47757 ... 
240.64706 242.26633 245.06662]\n", - " [249.07079 248.57234 247.94254 ... 242.42874 245.33348 249.72273]\n", - " ...\n", - " [295.92886 295.41788 295.16602 ... 294.4443 293.78143 293.18265]\n", - " [296.78754 296.63443 296.15707 ... 294.51166 294.2178 293.70258]\n", - " [297.2889 297.2165 296.85797 ... 295.16058 294.9558 294.87967]]\n", - "\n", - " [[251.03168 250.67987 250.18945 ... 242.19398 243.11484 244.30956]\n", - " [252.97194 252.86617 252.57347 ... 241.61102 243.02509 245.38196]\n", - " [254.46768 254.09142 253.63428 ... 241.67184 244.49664 248.74258]\n", - " ...\n", - " [295.65652 295.24582 295.22464 ... 294.7663 294.16412 293.6729 ]\n", - " [296.70294 296.68756 296.3824 ... 294.85083 294.57034 294.15213]\n", - " [297.38174 297.4631 297.22668 ... 295.3349 295.11124 295.01654]]\n", - "\n", - " ...\n", - "\n", - " [[261.8136 261.21255 260.5036 ... 248.19336 249.06995 250.41624]\n", - " [269.02225 268.92944 268.71478 ... 246.41554 248.16833 251.14897]\n", - " [269.64017 268.7958 268.45483 ... 246.01215 249.6174 254.69598]\n", - " ...\n", - " [299.09723 298.30466 297.9945 ... 299.09454 298.6955 298.29483]\n", - " [299.43155 299.23853 298.7375 ... 299.2589 299.28873 299.0363 ]\n", - " [299.37054 299.42462 299.15607 ... 299.72403 299.66312 299.76233]]\n", - "\n", - " [[253.74484 253.64487 253.49716 ... 242.96066 243.9345 245.14209]\n", - " [259.12967 258.62927 258.19144 ... 241.84921 243.07965 245.46625]\n", - " [261.04227 258.83536 257.51193 ... 242.38234 245.13663 249.52368]\n", - " ...\n", - " [297.8426 297.1406 296.98773 ... 297.96884 297.56888 297.1611 ]\n", - " [298.58783 298.42026 297.96896 ... 298.16412 298.19397 297.9083 ]\n", - " [298.81143 298.8566 298.62103 ... 298.72955 298.7519 298.8189 ]]\n", - "\n", - " [[247.971 248.02118 247.91302 ... 239.7719 241.02383 242.62823]\n", - " [249.73361 250.16037 250.48581 ... 238.78964 240.96469 244.11626]\n", - " [252.0296 251.53136 251.36629 ... 
238.07542 241.91293 247.06987]\n", - " ...\n", - " [296.76508 295.97668 295.88922 ... 296.45605 296.09137 295.65756]\n", - " [297.46814 297.38025 297.04428 ... 296.8556 296.84668 296.52133]\n", - " [297.8809 297.9868 297.77554 ... 297.60034 297.5655 297.53763]]]\n" - ] - } - ], + "outputs": [], "source": [ "for label, group in da.groupby(\"time.month\"):\n", " print(group.mean().data)\n", - " \n", + "\n", "avg_temps = da.groupby(\"time.month\").mean()\n", "print(avg_temps.data)" ] }, { "cell_type": "code", - "execution_count": 49, + "execution_count": null, "id": "ad5ee977", "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
<xarray.DataArray 'air' (month: 12, lat: 25, lon: 53)>\n",
-       "246.3 246.4 246.2 245.8 245.2 244.6 ... 298.1 298.0 298.0 297.6 297.6 297.5\n",
-       "Coordinates:\n",
-       "  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0\n",
-       "  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0\n",
-       "  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12\n",
-       "Attributes:\n",
-       "    long_name:     4xDaily Air temperature at sigma level 995\n",
-       "    units:         degK\n",
-       "    precision:     2\n",
-       "    GRIB_id:       11\n",
-       "    GRIB_name:     TMP\n",
-       "    var_desc:      Air temperature\n",
-       "    dataset:       NMC Reanalysis\n",
-       "    level_desc:    Surface\n",
-       "    statistic:     Individual Obs\n",
-       "    parent_stat:   Other\n",
-       "    actual_range:  [185.16 322.1 ]
" - ], - "text/plain": [ - "\n", - "246.3 246.4 246.2 245.8 245.2 244.6 ... 298.1 298.0 298.0 297.6 297.6 297.5\n", - "Coordinates:\n", - " * lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0\n", - " * lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0\n", - " * month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12\n", - "Attributes:\n", - " long_name: 4xDaily Air temperature at sigma level 995\n", - " units: degK\n", - " precision: 2\n", - " GRIB_id: 11\n", - " GRIB_name: TMP\n", - " var_desc: Air temperature\n", - " dataset: NMC Reanalysis\n", - " level_desc: Surface\n", - " statistic: Individual Obs\n", - " parent_stat: Other\n", - " actual_range: [185.16 322.1 ]" - ] - }, - "execution_count": 49, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ "da.groupby(\"time.month\").mean()" ] @@ -1742,7 +1178,7 @@ "metadata": {}, "outputs": [], "source": [ - "da = da.assign({\"myseason\" : ((\"time\"), myseason)})\n", + "da = da.assign({\"myseason\": ((\"time\"), myseason)})\n", "(\n", " # Calculate climatology\n", " da.groupby(\"myseason\")\n", @@ -1929,7 +1365,7 @@ "metadata": {}, "outputs": [], "source": [ - "da[[\"time.month\"==12]].plot.hist()" + "da[[\"time.month\" == 12]].plot.hist()" ] }, { @@ -1999,11 +1435,6 @@ } ], "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -2013,8 +1444,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.10" + "pygments_lexer": "ipython3" } }, "nbformat": 4, From c25b749a289add851f55b9c24b0e59a198cb3039 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Thu, 29 Jun 2023 16:49:33 -0400 Subject: [PATCH 07/22] finish for loop to groupby example and add note about default behavior of mean() --- .../01-high-level-computation-patterns.ipynb | 55 
+++++++++++++++---- 1 file changed, 43 insertions(+), 12 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 373d4c2e..f41fa69b 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -114,6 +114,16 @@ "data.plot()" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "edc35fa6", + "metadata": {}, + "outputs": [], + "source": [ + "da" + ] + }, { "cell_type": "markdown", "id": "bd47d199", @@ -127,7 +137,7 @@ "\n", "Consider a common use case. We want to complete some \"task\" for each of \"something\". The \"task\" might be a computation (e.g. mean, median, plot). The \"something\" could be a group of array values (e.g. pixels) or segments of time (e.g. monthly or seasonally).\n", "\n", - "Often, our solution to this type of problem is to write a loop. Say we want the average air temperature for each month:" + "Often, our solution to this type of problem is to write a for loop. Say we want the average air temperature for each month across the entire domain (all lat and lon values):" ] }, { @@ -149,12 +159,10 @@ }, { "cell_type": "markdown", - "id": "c1772b16", + "id": "d3a992bf", "metadata": {}, "source": [ - "Writing a for-loop here is not wrong, but it can quickly become cumbersome if you have a complex function to apply and it will take awhile to compute on a large dataset (you may even run out of memory). Parallelizing the computation would take a lot of additional work.\n", - "\n", - "Xarray's functionality instead allows us to do the same computation in one line of code (plus, the computation is optimized and ready to take advantage of parallel compute resources)!" + "An easy conceptual next step for this example (but still using our for loop) would be to use Xarray's `groupby` function to create an iterator that does the work of grouping our data by month and looping over each month." 
] }, { @@ -164,21 +172,33 @@ "metadata": {}, "outputs": [], "source": [ + "avg_temps = []\n", + "\n", "for label, group in da.groupby(\"time.month\"):\n", - " print(group.mean().data)\n", + " avg_temps.append(float(group.mean().data))\n", "\n", - "avg_temps = da.groupby(\"time.month\").mean()\n", - "print(avg_temps.data)" + "print(avg_temps)" + ] + }, + { + "cell_type": "markdown", + "id": "c1772b16", + "metadata": {}, + "source": [ + "Writing a for-loop here is not wrong, but it can quickly become cumbersome if you have a complex function to apply and it will take awhile to compute on a large dataset (you may even run out of memory). Parallelizing the computation would take a lot of additional work.\n", + "\n", + "Xarray's functionality instead allows us to do the same computation in one line of code (plus, the computation is optimized and ready to take advantage of parallel compute resources)!" ] }, { "cell_type": "code", "execution_count": null, - "id": "ad5ee977", + "id": "c53fda41", "metadata": {}, "outputs": [], "source": [ - "da.groupby(\"time.month\").mean()" + "avg_temps = da.groupby(\"time.month\").mean(...) # note the use of the ellipses here\n", + "print(avg_temps.data)" ] }, { @@ -186,7 +206,12 @@ "id": "4f548b71", "metadata": {}, "source": [ - "Read on through this tutorial to learn some of the incredible ways to use Xarray to avoid writing long for-loops and efficiently complete computational analyses on your data." + "Read on through this tutorial to learn some of the incredible ways to use Xarray to avoid writing long for-loops and efficiently complete computational analyses on your data.\n", + "\n", + "```{note}\n", + "By default, `da.mean()` (and `df.mean()`) will calculate the mean by reducing your data over all dimensions (unless you specify otherwise using the `dim` kwarg). 
The default behavior of `.mean()` on a groupby is to calculate the mean over all dimensions of the variable you are grouping by - but not all the dimensions of the object you are operating on. To compute the mean across all dimensions of a groupby, we must specify `...` for all dimensions (or use the `dim` kwarg to specify which dimensions to reduce by).\n", + "\n", + "```" ] }, { @@ -1435,6 +1460,11 @@ } ], "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -1444,7 +1474,8 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" + "pygments_lexer": "ipython3", + "version": "3.10.10" } }, "nbformat": 4, From 3af07255a5b2038c6d0be08e2f6cc049e77053cc Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Thu, 29 Jun 2023 20:52:01 +0000 Subject: [PATCH 08/22] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- intermediate/01-high-level-computation-patterns.ipynb | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index f41fa69b..c4e1e201 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -197,7 +197,7 @@ "metadata": {}, "outputs": [], "source": [ - "avg_temps = da.groupby(\"time.month\").mean(...) # note the use of the ellipses here\n", + "avg_temps = da.groupby(\"time.month\").mean(...) 
# note the use of the ellipses here\n", "print(avg_temps.data)" ] }, @@ -1460,11 +1460,6 @@ } ], "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -1474,8 +1469,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.10" + "pygments_lexer": "ipython3" } }, "nbformat": 4, From 88c70640ef15ea5a4bd764e6f361b6e0037409d1 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Fri, 30 Jun 2023 10:44:53 -0400 Subject: [PATCH 09/22] add pointers to complex for loop to groupby example --- intermediate/01-high-level-computation-patterns.ipynb | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index c4e1e201..51d41876 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -202,6 +202,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "4f548b71", "metadata": {}, @@ -211,7 +212,9 @@ "```{note}\n", "By default, `da.mean()` (and `df.mean()`) will calculate the mean by reducing your data over all dimensions (unless you specify otherwise using the `dim` kwarg). The default behavior of `.mean()` on a groupby is to calculate the mean over all dimensions of the variable you are grouping by - but not all the dimensions of the object you are operating on. 
To compute the mean across all dimensions of a groupby, we must specify `...` for all dimensions (or use the `dim` kwarg to specify which dimensions to reduce by).\n", "\n", - "```" + "```\n", + "\n", + "For a more complex example (identifying flood events - including their start and end date - from rainfall data) illustrating the transition from for loops to high level computation tools, see [this discussion](https://github.com/pydata/xarray/discussions/7641). The [original 40 lines of code](https://github.com/pydata/xarray/discussions/7641#discussion-4976005), including nested for loops, was streamlined into [this ~14 line workflow](https://github.com/pydata/xarray/discussions/7641#discussioncomment-5635306) without any loops." ] }, { From 5247e25d4f54dee31913c71f164496cb8104b2c6 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Fri, 30 Jun 2023 11:14:45 -0400 Subject: [PATCH 10/22] change exercises and solutions to use directives --- .../01-high-level-computation-patterns.ipynb | 139 +++++++++--------- 1 file changed, 72 insertions(+), 67 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 51d41876..8fd04820 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -202,7 +202,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "4f548b71", "metadata": {}, @@ -459,7 +458,7 @@ "source": [ "#### Storing the outputs from `rolling` operations with `construct`\n", "\n", - "In the above examples, we plotted the outputs of our rolling operations. Xarray makes it easy to store the outputs from `rolling` directly into the DataArray using the [`construct`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.construct.html#xarray.core.rolling.DataArrayRolling.construct) method." + "In the above examples, we plotted the outputs of our rolling operations. 
Xarray makes it easy to integrate the outputs from `rolling` directly into the DataArray using the [`construct`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.construct.html#xarray.core.rolling.DataArrayRolling.construct) method." ] }, { @@ -486,33 +485,37 @@ }, { "cell_type": "markdown", - "id": "0a23b9a9-076b-472d-b7a6-57083566a32d", + "id": "5d7562a7", "metadata": {}, "source": [ - "**Exercise** Calculate the 5 point running mean in time and add it to your DataArray using `rolling.construct`" + "Because `.construct()` only returns a \"view\" (not a copy) of the original data object, in order to \"save\" the results you would need to rewrite the original object: `simple = simple.rolling(time=5, center=True).construct(\"window\")`." ] }, { + "attachments": {}, "cell_type": "markdown", - "id": "fcd19bd6-0564-4b0e-b3cb-b5c31f88b4da", + "id": "0a23b9a9-076b-472d-b7a6-57083566a32d", "metadata": {}, "source": [ - "**Answer**\n" + "```{exercise}\n", + "Calculate the 5 point running mean in time and add it to your DataArray using `rolling.construct`\n", + "```" ] }, { - "cell_type": "code", - "execution_count": null, - "id": "80c80728-440a-43d9-957c-65bf111e710d", - "metadata": { - "jupyter": { - "source_hidden": true - }, - "tags": [] - }, - "outputs": [], + "attachments": {}, + "cell_type": "markdown", + "id": "fcd19bd6-0564-4b0e-b3cb-b5c31f88b4da", + "metadata": {}, "source": [ - "(simple.rolling(time=5, center=True).construct(\"window\").mean(\"window\"))" + "```{solution}\n", + ":class: dropdown\n", + "\n", + "```python\n", + "simple.rolling(time=5, center=True).construct(\"window\").mean(\"window\")\n", + "```\n", + "\n", + "```\n" ] }, { @@ -609,6 +612,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "b30794c7-1aeb-4e13-b6b4-824f23ac07df", "metadata": { @@ -620,30 +624,25 @@ "source": [ "#### Coarsen supports `reduce` for custom reductions\n", "\n", - "**Exercise** Use `coarsen.reduce` to apply `np.ptp` in 5x5 (lat 
x lon) point blocks to `data`" + "```{exercise}\n", + "Use `coarsen.reduce` to apply `np.ptp` in 5x5 (lat x lon) point blocks to `data`\n", + "```" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "4f88d113-86d1-4158-b4e7-f54f98af3c0c", "metadata": {}, "source": [ - "**Answer**\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "446c773f-59e4-4b7a-86bd-fd7d40e223e6", - "metadata": { - "jupyter": { - "source_hidden": true - }, - "tags": [] - }, - "outputs": [], - "source": [ - "(data.coarsen(lat=5, lon=5, boundary=\"trim\").reduce(np.mean).plot())" + "```{solution}\n", + ":class: dropdown\n", + "\n", + "```python\n", + "data.coarsen(lat=5, lon=5, boundary=\"trim\").reduce(np.mean).plot()\n", + "```\n", + "\n", + "```\n" ] }, { @@ -761,37 +760,27 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "db43eb72-fb9f-4d6a-aab3-4617c9c41ab1", "metadata": {}, "source": [ - "**Exercise** Reshape the `time` dimension of the DataArray `monthly` to year x\n", - "month and visualize the seasonal cycle for two years at 250°E\n" + "```{exercise}\n", + "Reshape the `time` dimension of the DataArray `monthly` to year x\n", + "month and visualize the seasonal cycle for two years at 250°E\n", + "```\n" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "b668514e-b40c-4c64-98bf-4579747ae6ab", "metadata": {}, "source": [ - "**Answer**\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d01c6873-dc67-4d8b-928a-ad4f834429fa", - "metadata": { - "jupyter": { - "source_hidden": true - }, - "slideshow": { - "slide_type": "subslide" - }, - "tags": [] - }, - "outputs": [], - "source": [ + "```{solution}\n", + ":class: dropdown\n", + "\n", + "```python\n", "# splits time dimension into year x month\n", "year_month = monthly.coarsen(time=12).construct(time=(\"year\", \"month\"))\n", "\n", @@ -815,35 +804,45 @@ "year_month[\"year\"] = [2013, 2014]\n", "\n", "# seasonal cycle for two years\n", - 
"year_month.sel(lon=250).plot.contourf(col=\"year\", x=\"month\", y=\"lat\")" + "year_month.sel(lon=250).plot.contourf(col=\"year\", x=\"month\", y=\"lat\")\n", + "```\n", + "\n", + "```\n" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "4de2984e-9c28-4ed7-909f-bab47b6eae49", "metadata": {}, "source": [ "This exercise came up during a live lecture.\n", "\n", - "**Exercise** Calculate the rolling 4 month average, averaged across years.\n", - "\n", - "**Answer**\n", - "\n", - "1. We first reshape using `coarsen.construct` to add `year` as a new dimension.\n", - "2. Apply `rolling` on the month dimension.\n", - "3. It turns out that `roll.mean([\"year\", \"month\"])` doesn't work. So we use `roll.construct` to get a DataArray with a new dimension `window` and then take the mean over `window` and `year`\n" + "```{exercise}\n", + "Calculate the rolling 4 month average, averaged across years.\n", + "```\n" ] }, { - "cell_type": "code", - "execution_count": null, + "attachments": {}, + "cell_type": "markdown", "id": "9d907b2b-c9c8-41cb-8af0-756d5c21ffef", "metadata": {}, - "outputs": [], "source": [ + "```{solution}\n", + ":class: dropdown\n", + "\n", + "1. We first reshape using `coarsen.construct` to add `year` as a new dimension.\n", + "2. Apply `rolling` on the month dimension.\n", + "3. It turns out that `roll.mean([\"year\", \"month\"])` doesn't work. 
So we use `roll.construct` to get a DataArray with a new dimension `window` and then take the mean over `window` and `year`\n", + "\n", + "```python\n", "reshaped = months.coarsen(time=12).construct(time=(\"year\", \"month\"))\n", "roll = reshaped.rolling(month=4, center=True)\n", - "roll.construct(\"window\").mean([\"window\", \"year\"])" + "roll.construct(\"window\").mean([\"window\", \"year\"])\n", + "```\n", + "\n", + "```" ] }, { @@ -1463,6 +1462,11 @@ } ], "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -1472,7 +1476,8 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3" + "pygments_lexer": "ipython3", + "version": "3.10.10" } }, "nbformat": 4, From fbca164e9a1f3c5637bf09a64e425cbbfa26db33 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Fri, 30 Jun 2023 11:17:33 -0400 Subject: [PATCH 11/22] fix spelling typo --- intermediate/01-high-level-computation-patterns.ipynb | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 8fd04820..c99fad97 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -1281,6 +1281,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "b5724cbb-dd2a-4dc5-9118-7c7dd294755f", "metadata": { @@ -1289,7 +1290,7 @@ "source": [ "### Custom reductions with GroupBy\n", "\n", - "Analagous to `rolling`, `reduce` and `map` apply custom reductions to `groupby_bins` and `resample`.\n" + "Analogous to `rolling`, `reduce` and `map` apply custom reductions to `groupby_bins` and `resample`.\n" ] }, { From 8e839226b56910d4802913c59d5cc292b5a62dd1 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Fri, 30 Jun 2023 11:55:11 -0400 Subject: [PATCH 
12/22] debug code cells --- .../01-high-level-computation-patterns.ipynb | 44 ++----------------- 1 file changed, 3 insertions(+), 41 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index c99fad97..b24198a4 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -492,7 +492,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "0a23b9a9-076b-472d-b7a6-57083566a32d", "metadata": {}, @@ -503,7 +502,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "fcd19bd6-0564-4b0e-b3cb-b5c31f88b4da", "metadata": {}, @@ -612,7 +610,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "b30794c7-1aeb-4e13-b6b4-824f23ac07df", "metadata": { @@ -630,7 +627,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "4f88d113-86d1-4158-b4e7-f54f98af3c0c", "metadata": {}, @@ -760,7 +756,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "db43eb72-fb9f-4d6a-aab3-4617c9c41ab1", "metadata": {}, @@ -772,7 +767,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "b668514e-b40c-4c64-98bf-4579747ae6ab", "metadata": {}, @@ -811,7 +805,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "4de2984e-9c28-4ed7-909f-bab47b6eae49", "metadata": {}, @@ -824,7 +817,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "9d907b2b-c9c8-41cb-8af0-756d5c21ffef", "metadata": {}, @@ -951,7 +943,7 @@ "metadata": {}, "outputs": [], "source": [ - "da.time.month.plot()" + "da.time.dt.month.plot()" ] }, { @@ -1177,9 +1169,6 @@ "metadata": {}, "outputs": [], "source": [ - "# Why don't we add the season array to the original dataarray (da) and\n", - "# use da.groupby(\"myseason\")? 
To me, it's confusing to now need two dataarrays\n", - "\n", "(\n", " # Calculate climatology\n", " da.groupby(myseason_da)\n", @@ -1190,32 +1179,6 @@ ")" ] }, - { - "cell_type": "markdown", - "id": "cae97fe4", - "metadata": {}, - "source": [ - "Equivalently, we could add our custom seasons to our original DataArray and use groupby there." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "36dfd6fd", - "metadata": {}, - "outputs": [], - "source": [ - "da = da.assign({\"myseason\": ((\"time\"), myseason)})\n", - "(\n", - " # Calculate climatology\n", - " da.groupby(\"myseason\")\n", - " .mean()\n", - " # reindex to get seasons in logical order (not alphabetical order)\n", - " .reindex(time=[\"DJF\", \"MAM\", \"JJAS\", \"ON\"])\n", - " .plot(col=\"time\")\n", - ")" - ] - }, { "cell_type": "markdown", "id": "7f3097b9-987f-4f9b-9c82-d5c6e879a862", @@ -1281,7 +1244,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "b5724cbb-dd2a-4dc5-9118-7c7dd294755f", "metadata": { @@ -1383,7 +1345,7 @@ "id": "d339c52c", "metadata": {}, "source": [ - "Remember, this is example is just to show how you could operate on each group object in a groupby operation. If we wanted to just explore the December (or March) data, we should just filter for it directly:" + "Remember, this example is just to show how you could operate on each group object in a groupby operation. 
If we wanted to just explore the December (or March) data, we should just filter for it directly:" ] }, { @@ -1393,7 +1355,7 @@ "metadata": {}, "outputs": [], "source": [ - "da[[\"time.month\" == 12]].plot.hist()" + "da[da[\"time.month\"] == 12].plot.hist()" ] }, { From d75298c2048057659e134f449dc5c3ce07227277 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Fri, 30 Jun 2023 15:56:07 +0000 Subject: [PATCH 13/22] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- intermediate/01-high-level-computation-patterns.ipynb | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index b24198a4..5e5dbe31 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -1425,11 +1425,6 @@ } ], "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, "language_info": { "codemirror_mode": { "name": "ipython", @@ -1439,8 +1434,7 @@ "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.10" + "pygments_lexer": "ipython3" } }, "nbformat": 4, From bd23865972e011179bff4345b4649c2b0bfd09f6 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Mon, 3 Jul 2023 14:58:40 -0400 Subject: [PATCH 14/22] add another tic to solution syntax --- .../01-high-level-computation-patterns.ipynb | 20 +++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 5e5dbe31..f4157f8b 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -502,18 +502,19 @@ 
] }, { + "attachments": {}, "cell_type": "markdown", "id": "fcd19bd6-0564-4b0e-b3cb-b5c31f88b4da", "metadata": {}, "source": [ - "```{solution}\n", + "````{solution}\n", ":class: dropdown\n", "\n", "```python\n", "simple.rolling(time=5, center=True).construct(\"window\").mean(\"window\")\n", "```\n", "\n", - "```\n" + "````\n" ] }, { @@ -627,18 +628,19 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "4f88d113-86d1-4158-b4e7-f54f98af3c0c", "metadata": {}, "source": [ - "```{solution}\n", + "````{solution}\n", ":class: dropdown\n", "\n", "```python\n", "data.coarsen(lat=5, lon=5, boundary=\"trim\").reduce(np.mean).plot()\n", "```\n", "\n", - "```\n" + "````\n" ] }, { @@ -767,11 +769,12 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "b668514e-b40c-4c64-98bf-4579747ae6ab", "metadata": {}, "source": [ - "```{solution}\n", + "````{solution}\n", ":class: dropdown\n", "\n", "```python\n", @@ -801,7 +804,7 @@ "year_month.sel(lon=250).plot.contourf(col=\"year\", x=\"month\", y=\"lat\")\n", "```\n", "\n", - "```\n" + "````\n" ] }, { @@ -817,11 +820,12 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "9d907b2b-c9c8-41cb-8af0-756d5c21ffef", "metadata": {}, "source": [ - "```{solution}\n", + "````{solution}\n", ":class: dropdown\n", "\n", "1. 
We first reshape using `coarsen.construct` to add `year` as a new dimension.\n", @@ -834,7 +838,7 @@ "roll.construct(\"window\").mean([\"window\", \"year\"])\n", "```\n", "\n", - "```" + "````" ] }, { From 7dabe1a3ed027a0286bb75c7f9b91583f9a878b0 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Mon, 3 Jul 2023 15:04:24 -0400 Subject: [PATCH 15/22] add generalize input for solution --- intermediate/01-high-level-computation-patterns.ipynb | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index f4157f8b..45cf803c 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -507,7 +507,7 @@ "id": "fcd19bd6-0564-4b0e-b3cb-b5c31f88b4da", "metadata": {}, "source": [ - "````{solution}\n", + "````{solution} generalize\n", ":class: dropdown\n", "\n", "```python\n", @@ -633,7 +633,7 @@ "id": "4f88d113-86d1-4158-b4e7-f54f98af3c0c", "metadata": {}, "source": [ - "````{solution}\n", + "````{solution} generalize\n", ":class: dropdown\n", "\n", "```python\n", @@ -774,7 +774,7 @@ "id": "b668514e-b40c-4c64-98bf-4579747ae6ab", "metadata": {}, "source": [ - "````{solution}\n", + "````{solution} generalize\n", ":class: dropdown\n", "\n", "```python\n", @@ -825,7 +825,7 @@ "id": "9d907b2b-c9c8-41cb-8af0-756d5c21ffef", "metadata": {}, "source": [ - "````{solution}\n", + "````{solution} generalize\n", ":class: dropdown\n", "\n", "1. 
We first reshape using `coarsen.construct` to add `year` as a new dimension.\n", From 641225527ff9e33e47fa0f2b627732ebed307312 Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Wed, 5 Jul 2023 15:43:51 -0400 Subject: [PATCH 16/22] apply suggestions from review --- .../01-high-level-computation-patterns.ipynb | 159 +++++++++++------- 1 file changed, 102 insertions(+), 57 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 45cf803c..3039db7b 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -1,6 +1,7 @@ { "cells": [ { + "attachments": {}, "cell_type": "markdown", "id": "a349a0a5-eeb3-410a-b5d1-f472a8ca14b2", "metadata": { @@ -10,14 +11,14 @@ "tags": [] }, "source": [ - "# Utilizing computational patterns\n", - "\n", - "From https://toolz.readthedocs.io/en/latest/control.html\n", + "# Computational Patterns\n", "\n", "Often when writing code we repeat certain patterns, whether we realize it or not.\n", "If you have learned to write list comprehensions, you are taking advantage of a \"control pattern\".\n", "Often, these patterns are so common that many packages have built in functions to implement them.\n", "\n", + "Quoting the [toolz documentation](https://toolz.readthedocs.io/en/latest/control.html):\n", + "\n", "> The Toolz library contains dozens of patterns like map and groupby. Learning a\n", "> core set (maybe a dozen) covers the vast majority of common programming tasks\n", "> often done by hand. A rich vocabulary of core control functions conveys the\n", @@ -27,10 +28,11 @@ "> - You make fewer errors in rote coding\n", "> - You can depend on well tested and benchmarked implementations\n", "\n", - "The same is true for xarray" + "The same is true for xarray." 
] }, { + "attachments": {}, "cell_type": "markdown", "id": "623d5170-f32d-4643-9a59-c54768ee7185", "metadata": { @@ -43,11 +45,13 @@ "## Motivation / Learning goals\n", "\n", "- Learn what high-level computational patterns are available in Xarray\n", - "- Identify when you are using a high-level computational pattern\n", - "- Implement that pattern using built-in Xarray functionality" + "- Identify when you are re-implementing a high-level computational pattern\n", + "- Implement that pattern using built-in Xarray functionality\n", + "- Understand the difference between `map` and `reduce`" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "5f0cda65-cfaa-42ed-bd4d-f290c1e98bb3", "metadata": { @@ -70,17 +74,21 @@ "analysis tasks.\n", "\n", "1. `rolling` :\n", - " [Operate on rolling windows of your data e.g. running mean](https://docs.xarray.dev/en/stable/user-guide/computation.html#rolling-window-operations)\n", + " [Operate on rolling windows of your data e.g. running mean.](https://docs.xarray.dev/en/stable/user-guide/computation.html#rolling-window-operations)\n", "1. `coarsen` :\n", - " [Downsample your data](https://docs.xarray.dev/en/stable/user-guide/computation.html#coarsen-large-arrays)\n", + " [Downsample your data.](https://docs.xarray.dev/en/stable/user-guide/computation.html#coarsen-large-arrays)\n", "1. `groupby` :\n", - " [Bin data in to groups and reduce](https://docs.xarray.dev/en/stable/groupby.html)\n", + " [Bin data in to groups and reduce.](https://docs.xarray.dev/en/stable/groupby.html)\n", "1. `groupby_bins`: GroupBy after discretizing a numeric variable.\n", "1. `resample` :\n", " [GroupBy specialized for time axes. Either downsample or upsample your data.](https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations)\n", + "1. 
`weighted` : [Weight your data before reducing.](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions), as in [this tutorial](https://tutorial.xarray.dev/fundamentals/03.4_weighted.html).\n", "\n", "\n", - "Note: the documentation links in this tutorial point to the DataArray implementations of each function, but they are also available for DataSet objects.\n" + "\n", + "```{Note}\n", + "the documentation links in this tutorial point to the DataArray implementations of each function, but they are also available for DataSet objects.\n", + "```\n" ] }, { @@ -111,7 +119,7 @@ "da = xr.tutorial.load_dataset(\"air_temperature\", engine=\"netcdf4\").air\n", "monthly = da.resample(time=\"M\").mean()\n", "data = da.isel(time=0)\n", - "data.plot()" + "data.plot();" ] }, { @@ -125,6 +133,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "bd47d199", "metadata": {}, @@ -202,10 +211,13 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "4f548b71", "metadata": {}, "source": [ + "Here we showed an example for computing a mean over a certain period of time (months), which ultimately uses the `GroupBy` function. The transition from loops to a built-in function is similar for `rolling` and `coarsen` over windows of values (e.g. 
pixels) instead of \"groups\" of time.\n", + "\n", "Read on through this tutorial to learn some of the incredible ways to use Xarray to avoid writing long for-loops and efficiently complete computational analyses on your data.\n", "\n", "```{note}\n", @@ -360,7 +372,7 @@ "metadata": {}, "outputs": [], "source": [ - "data.plot()" + "data.plot();" ] }, { @@ -378,10 +390,11 @@ "metadata": {}, "outputs": [], "source": [ - "data.rolling(lat=5, lon=5, center=True).mean().plot()" + "data.rolling(lat=5, lon=5, center=True).mean().plot();" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "b88c116e-ad63-4fea-81a7-bcabc194dee5", "metadata": { @@ -395,7 +408,9 @@ "\n", "In some cases, we may want to apply a sliding window function using rolling that is not built in to Xarray. In these cases we can still leverage the sliding windows of rolling and apply our own function with [`reduce`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.reduce.html).\n", "\n", - "Tip: The `reduce` method expects a function that can receive and return plain arrays (e.g. numpy), as in each of the \"windows\" provided by the rolling iterator. This is in contrast to the `map` method, which expects a function that can receive and return Xarray objects.\n", + "```{tip}\n", + " The `reduce` method expects a function that can receive and return plain arrays (e.g. numpy), as in each of the \"windows\" provided by the rolling iterator. 
This is in contrast to the `map` method, which expects a function that can receive and return Xarray objects.\n", + "```\n", "\n", "Here's an example function: [`np.ptp`](https://numpy.org/doc/stable/reference/generated/numpy.ptp.html).\n" ] @@ -407,29 +422,26 @@ "metadata": {}, "outputs": [], "source": [ - "data.rolling(lat=5, lon=5, center=True).reduce(np.ptp).plot()" + "data.rolling(lat=5, lon=5, center=True).reduce(np.ptp).plot();" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "9ef251aa-ce3c-4318-95ba-470568ebd967", "metadata": {}, "source": [ - "**Exercise** Calculate the rolling mean in 5 point bins along both latitude and longitude using\n", - "[`rolling(**kwargs).reduce`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.reduce.html)\n" + "```{exercise} generalize\n", + "\n", + "Calculate the rolling mean in 5 point bins along both latitude and longitude using\n", + "[`rolling(**kwargs).reduce`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.reduce.html)\n", + "\n", + "```" ] }, { + "attachments": {}, "cell_type": "markdown", - "id": "75397b3d-5961-4924-b688-23520b79aae8", - "metadata": {}, - "source": [ - "**Answer**\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, "id": "a36cbf94-ed41-42c6-8ccf-9278927d395b", "metadata": { "jupyter": { @@ -440,13 +452,20 @@ }, "tags": [] }, - "outputs": [], "source": [ + "````{solution} generalize\n", + ":class: dropdown\n", + "\n", + "```python\n", "# exactly equivalent to data.rolling(...).mean()\n", - "data.rolling(lat=5, lon=5, center=True).reduce(np.mean).plot()" + "data.rolling(lat=5, lon=5, center=True).reduce(np.mean).plot();\n", + "```\n", + "\n", + "````" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "d0155b62-d08f-42c6-b467-1af73a7829c0", "metadata": { @@ -456,7 +475,7 @@ "tags": [] }, "source": [ - "#### Storing the outputs from `rolling` operations with `construct`\n", + "#### View outputs from 
`rolling` operations with `construct`\n", "\n", "In the above examples, we plotted the outputs of our rolling operations. Xarray makes it easy to integrate the outputs from `rolling` directly into the DataArray using the [`construct`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.construct.html#xarray.core.rolling.DataArrayRolling.construct) method." ] @@ -484,11 +503,12 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "5d7562a7", "metadata": {}, "source": [ - "Because `.construct()` only returns a \"view\" (not a copy) of the original data object, in order to \"save\" the results you would need to rewrite the original object: `simple = simple.rolling(time=5, center=True).construct(\"window\")`." + "Because `.construct()` only returns a \"view\" (not a copy) of the original data object (i.e. it is not operating \"in-place\"), in order to \"save\" the results you would need to rewrite the original object: `simple = simple.rolling(time=5, center=True).construct(\"window\")`." ] }, { @@ -518,16 +538,15 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "c81f95b1-e1c3-4b28-8b7f-9519c9316e3f", "metadata": {}, "source": [ "`construct` is clever.\n", "\n", - "1. It constructs a **view** of the original array, so it is memory-efficient.\n", - " but you didn't have to know that.\n", - "1. It does something sensible for dask arrays (though generally you want big\n", - " chunksizes for the dimension you're sliding along).\n", + "1. It constructs a [**view**](https://numpy.org/doc/stable/user/basics.copies.html) of the original array, so it is memory-efficient.\n", + "1. It does something sensible for dask arrays (though generally you want big chunksizes for the dimension you're sliding along).\n", "1. 
It also works with rolling along multiple dimensions!\n" ] }, @@ -587,7 +606,7 @@ "metadata": {}, "outputs": [], "source": [ - "data.plot()" + "data.plot();" ] }, { @@ -607,7 +626,7 @@ "metadata": {}, "outputs": [], "source": [ - "(data.coarsen(lat=5, lon=5, boundary=\"trim\").mean().plot())" + "(data.coarsen(lat=5, lon=5, boundary=\"trim\").mean().plot();)" ] }, { @@ -637,7 +656,7 @@ ":class: dropdown\n", "\n", "```python\n", - "data.coarsen(lat=5, lon=5, boundary=\"trim\").reduce(np.mean).plot()\n", + "data.coarsen(lat=5, lon=5, boundary=\"trim\").reduce(np.mean).plot();\n", "```\n", "\n", "````\n" @@ -752,9 +771,19 @@ " .pad(time=(1, 0), constant_values=-1)\n", " .coarsen(time=12)\n", " .construct(time=(\"year\", \"month\"))\n", - ")\n", - "\n", - "# NOTE: check output of this cell (why is the first value of time nan instead of -1?)" + ")\n" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "fbe916a3", + "metadata": {}, + "source": [ + "```{note}\n", + "The value specified in `.pad` only applies the `fill_value` to the array, not to coordinate variables.\n", + "This is why the first value of time in the above example is NaN and not -1.\n", + "```" ] }, { @@ -858,6 +887,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "25fd132c-5436-4af6-b8ad-75269cb45e75", "metadata": { @@ -879,7 +909,11 @@ "- `groupby_bins`: Use binning operations, e.g. 
histograms, to group your data.\n", "- `resample`: Specialized implementation of GroupBy specifically for time grouping (so far)\n", "\n", - "**hint** Both `groupby_bins` and `resample` are implemented as `GroupBy` with a specific way of constructing group labels.\n", + "\n", + "```{hint}\n", + " Both `groupby_bins` and `resample` are implemented as `GroupBy` with a specific way of constructing group labels.\n", + "```\n", + "\n", "\n", "### Deconstructing GroupBy\n", "\n", @@ -914,7 +948,7 @@ "source": [ "# GroupBy returns an iterator that traverses the specified groups, here by month.\n", "# Notice that groupby is clever enough for us to leave out the `.dt` before `.month`\n", - "# we would need to specify to access the month data directly (see plot below).\n", + "# we would need to specify to access the month data directly, as in `da.time.dt.month`.\n", "da.groupby(\"time.month\")" ] }, @@ -947,7 +981,7 @@ "metadata": {}, "outputs": [], "source": [ - "da.time.dt.month.plot()" + "da[\"time.month\"].plot();" ] }, { @@ -987,11 +1021,16 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "3763efb3", "metadata": {}, "source": [ - "QUSTION (intentionally spelled wrong so a check will catch this block) - I want to explain why resample with time=\"M\" has 24 bins, while groupby over month had 12. But I don't actually know..." + "```{note}\n", + "\n", + "Resampling is changing the frequency of our data to monthly (for two years), so we have 24 bins. 
GroupBy is taking the average across all data in the same month for two years, so we have 12 bins.\n", + "\n", + "```" ] }, { @@ -1083,13 +1122,14 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "db7bd7e6-59cd-4b2a-ac37-2ff4d40d9fc8", "metadata": {}, "source": [ "#### Construct and use custom labels\n", "\n", - "**Demo** Grouping over a custom definition of seasons using numpy.isin.\n", + "##### Custom seasons with `numpy.isin`.\n", "\n", "We want to group over four seasons: `DJF`, `MAM`, `JJAS`, `ON` - this makes physical sense in the Indian Ocean basin.\n", "\n", @@ -1270,14 +1310,18 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "7cd7ede5-8e57-4099-ab39-b9d75427f125", "metadata": {}, "source": [ - "**tip** `map` is for functions that expect and return xarray objects (see also [`Dataset.map`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.map.html)). `reduce` is for functions that expect and return plain arrays (like Numpy or SciPy functions).\n" + "```{tip}\n", + " `map` is for functions that expect and return xarray objects (see also [`Dataset.map`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.map.html)). `reduce` is for functions that expect and return plain arrays (like Numpy or SciPy functions).\n", + "```\n" ] }, { + "attachments": {}, "cell_type": "markdown", "id": "ed904da0-fb85-4432-8083-ef644209affd", "metadata": { @@ -1290,7 +1334,7 @@ "\n", "#### Instead looping over groupby objects is possible\n", "\n", - "Because `groupby` returns an iterator that loops over each group, it is easy to loop over groupby objects.\n", + "Because `groupby` returns an iterator that loops over each group, it is easy to loop over groupby objects. 
You can also iterate over `rolling` and `coarsen` objects, however this approach is usually quite slow.\n", "\n", "Maybe you want to plot data in each group separately:\n" ] @@ -1400,6 +1444,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "id": "a37ec2e3-24a0-4306-abe0-a6c04933bd88", "metadata": {}, @@ -1409,22 +1454,22 @@ "Xarray provides methods for high-level analysis patterns:\n", "\n", "1. `rolling` :\n", - " [Operate on rolling (fixed length, overlapping) windows of your data e.g. running mean](https://docs.xarray.dev/en/stable/user-guide/computation.html#rolling-window-operations)\n", + " [Operate on rolling (fixed length, overlapping) windows of your data e.g. running mean.](https://docs.xarray.dev/en/stable/user-guide/computation.html#rolling-window-operations)\n", "1. `coarsen` :\n", - " [Operate on blocks (fixed length) of your data (downsample)](https://docs.xarray.dev/en/stable/user-guide/computation.html#coarsen-large-arrays)\n", + " [Operate on blocks (fixed length) of your data (downsample).](https://docs.xarray.dev/en/stable/user-guide/computation.html#coarsen-large-arrays)\n", "1. `groupby` :\n", - " [Parse data into groups (using an exact value) and operate on each one (reduce data)](https://docs.xarray.dev/en/stable/groupby.html)\n", - "1. `groupby_bins`: [GroupBy after discretizing a numeric (non-exact, e.g. float) variable](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.groupby_bins.html)\n", + " [Parse data into groups (using an exact value) and operate on each one (reduce data).](https://docs.xarray.dev/en/stable/groupby.html)\n", + "1. `groupby_bins`: [GroupBy after discretizing a numeric (non-exact, e.g. float) variable.](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.groupby_bins.html)\n", "1. `resample` :\n", " [Groupby specialized for time axes. 
Either downsample or upsample your data.](https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations)\n", "\n", + "Xarray also provides features to make using those patterns easy:\n", "\n", - "## More resources\n", - "\n", - "1. [Weight your data before reducing](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n", - "1. More tutorials here: https://tutorial.xarray.dev/\n", - "1. Answers to common questions on \"how to do X\" are here:\n", - " https://docs.xarray.dev/en/stable/howdoi.html\n" + "1. [Weight your data before reducing.](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n", + "1. Iterate over the operators (`rolling`, `coarsen`, `groupby`, `groupby_bins`, `resample`).\n", + "1. Apply functions that accept numpy-like arrays with `reduce`.\n", + "1. Reshape to a new xarray object with `.construct` (`rolling`, `coarsen` only).\n", + "1. Apply functions that accept xarray objects with `map` (`groupby`, `groupby_bins`, `resample` only).\n" ] } ], From 68a9f2b18508564a0cabfaf54a0e025209c9c8c2 Mon Sep 17 00:00:00 2001 From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com> Date: Wed, 5 Jul 2023 19:44:13 +0000 Subject: [PATCH 17/22] [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --- intermediate/01-high-level-computation-patterns.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 3039db7b..97e0db91 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -771,7 +771,7 @@ " .pad(time=(1, 0), constant_values=-1)\n", " .coarsen(time=12)\n", " .construct(time=(\"year\", \"month\"))\n", - ")\n" + ")" ] }, { From 2086172b885c2348bca9147630a07e5d156a77ea Mon Sep 17 
00:00:00 2001 From: Jessica Scheick Date: Thu, 6 Jul 2023 09:27:24 -0400 Subject: [PATCH 18/22] fix typo causing CI failure --- intermediate/01-high-level-computation-patterns.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 97e0db91..5d4f8833 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -626,7 +626,7 @@ "metadata": {}, "outputs": [], "source": [ - "(data.coarsen(lat=5, lon=5, boundary=\"trim\").mean().plot();)" + "(data.coarsen(lat=5, lon=5, boundary=\"trim\").mean().plot())" ] }, { From 38d5bd2fe1a48dd4500d6725e57cd289c234e58b Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Thu, 6 Jul 2023 14:58:34 -0400 Subject: [PATCH 19/22] remove link with failing anchor --- intermediate/01-high-level-computation-patterns.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 5d4f8833..5baca58e 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -225,7 +225,7 @@ "\n", "```\n", "\n", - "For a more complex example (identifying flood events - including their start and end date - from rainfall data) illustrating the transition from for loops to high level computation tools, see [this discussion](https://github.com/pydata/xarray/discussions/7641). The [original 40 lines of code](https://github.com/pydata/xarray/discussions/7641#discussion-4976005), including nested for loops, was streamlined into [this ~14 line workflow](https://github.com/pydata/xarray/discussions/7641#discussioncomment-5635306) without any loops." 
+ "For a more complex example (identifying flood events - including their start and end date - from rainfall data) illustrating the transition from for loops to high level computation tools, see [this discussion](https://github.com/pydata/xarray/discussions/7641). The [original 40 lines of code](https://github.com/pydata/xarray/discussions/7641#discussion-4976005), including nested for loops, was streamlined into a ~15 line workflow without any loops." ] }, { From ded5fbc6c71bf9abd230ebf46ea085f478a5d13a Mon Sep 17 00:00:00 2001 From: Jessica Scheick Date: Thu, 6 Jul 2023 15:10:01 -0400 Subject: [PATCH 20/22] add :label: to solution directive --- .../01-high-level-computation-patterns.ipynb | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index 5baca58e..d79ea096 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -453,7 +453,8 @@ "tags": [] }, "source": [ - "````{solution} generalize\n", + "````{solution} \n", + ":label: generalize\n", ":class: dropdown\n", "\n", "```python\n", @@ -527,7 +528,8 @@ "id": "fcd19bd6-0564-4b0e-b3cb-b5c31f88b4da", "metadata": {}, "source": [ - "````{solution} generalize\n", + "````{solution}\n", + ":label: generalize\n", ":class: dropdown\n", "\n", "```python\n", @@ -652,7 +654,8 @@ "id": "4f88d113-86d1-4158-b4e7-f54f98af3c0c", "metadata": {}, "source": [ - "````{solution} generalize\n", + "````{solution}\n", + ":label: generalize\n", ":class: dropdown\n", "\n", "```python\n", @@ -803,7 +806,8 @@ "id": "b668514e-b40c-4c64-98bf-4579747ae6ab", "metadata": {}, "source": [ - "````{solution} generalize\n", + "````{solution}\n", + ":label: generalize\n", ":class: dropdown\n", "\n", "```python\n", @@ -854,7 +858,8 @@ "id": "9d907b2b-c9c8-41cb-8af0-756d5c21ffef", "metadata": {}, "source": [ - "````{solution} generalize\n", + 
"````{solution}\n", + ":label: generalize\n", ":class: dropdown\n", "\n", "1. We first reshape using `coarsen.construct` to add `year` as a new dimension.\n", From 05c8d9c3689e9444b7746929e30674956dc626f3 Mon Sep 17 00:00:00 2001 From: dcherian Date: Thu, 6 Jul 2023 15:21:53 -0600 Subject: [PATCH 21/22] fix? --- .../01-high-level-computation-patterns.ipynb | 71 ++++++------------- 1 file changed, 22 insertions(+), 49 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index d79ea096..cca30a9f 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -1,7 +1,6 @@ { "cells": [ { - "attachments": {}, "cell_type": "markdown", "id": "a349a0a5-eeb3-410a-b5d1-f472a8ca14b2", "metadata": { @@ -32,7 +31,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "623d5170-f32d-4643-9a59-c54768ee7185", "metadata": { @@ -51,7 +49,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "5f0cda65-cfaa-42ed-bd4d-f290c1e98bb3", "metadata": { @@ -133,12 +130,11 @@ ] }, { - "attachments": {}, "cell_type": "markdown", - "id": "bd47d199", + "id": "6ff7edbb-ab97-4bf0-881a-0627230565f3", "metadata": {}, "source": [ - "---\n", + "***\n", "\n", "### Identifying high-level computation patterns\n", "\n", @@ -211,7 +207,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "4f548b71", "metadata": {}, @@ -238,7 +233,7 @@ "tags": [] }, "source": [ - "---\n", + "***\n", "\n", "### Concept refresher: \"index space\" vs \"label space\"\n" ] @@ -316,7 +311,7 @@ "id": "e9b80381-8a0d-4833-97fa-687bf693ca5a", "metadata": {}, "source": [ - "---\n", + "***\n", "\n", "## Xarray provides high-level patterns in both \"index space\" and \"label space\"\n", "\n", @@ -394,7 +389,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "b88c116e-ad63-4fea-81a7-bcabc194dee5", "metadata": { @@ -426,7 +420,6 @@ ] }, { - 
"attachments": {}, "cell_type": "markdown", "id": "9ef251aa-ce3c-4318-95ba-470568ebd967", "metadata": {}, @@ -440,7 +433,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "a36cbf94-ed41-42c6-8ccf-9278927d395b", "metadata": { @@ -466,7 +458,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "d0155b62-d08f-42c6-b467-1af73a7829c0", "metadata": { @@ -504,7 +495,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "5d7562a7", "metadata": {}, @@ -514,22 +504,15 @@ }, { "cell_type": "markdown", - "id": "0a23b9a9-076b-472d-b7a6-57083566a32d", + "id": "5743ba77-def9-4b6f-a777-87014311253d", "metadata": {}, "source": [ "```{exercise}\n", - "Calculate the 5 point running mean in time and add it to your DataArray using `rolling.construct`\n", - "```" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "id": "fcd19bd6-0564-4b0e-b3cb-b5c31f88b4da", - "metadata": {}, - "source": [ - "````{solution}\n", ":label: generalize\n", + "Calculate the 5 point running mean in time and add it to your DataArray using `rolling.construct`\n", + "```\n", + "\n", + "````{solution} generalize\n", ":class: dropdown\n", "\n", "```python\n", @@ -540,7 +523,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "c81f95b1-e1c3-4b28-8b7f-9519c9316e3f", "metadata": {}, @@ -584,7 +566,7 @@ "tags": [] }, "source": [ - "---\n", + "***\n", "\n", "### Block windows of fixed length: `coarsen`\n", "\n", @@ -644,18 +626,17 @@ "#### Coarsen supports `reduce` for custom reductions\n", "\n", "```{exercise}\n", + ":label: coarsen-reduce\n", "Use `coarsen.reduce` to apply `np.ptp` in 5x5 (lat x lon) point blocks to `data`\n", "```" ] }, { - "attachments": {}, "cell_type": "markdown", "id": "4f88d113-86d1-4158-b4e7-f54f98af3c0c", "metadata": {}, "source": [ - "````{solution}\n", - ":label: generalize\n", + "````{solution} coarsen-reduce\n", ":class: dropdown\n", "\n", "```python\n", @@ -778,7 +759,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": 
"fbe916a3", "metadata": {}, @@ -795,19 +775,18 @@ "metadata": {}, "source": [ "```{exercise}\n", + ":label: reshape\n", "Reshape the `time` dimension of the DataArray `monthly` to year x\n", "month and visualize the seasonal cycle for two years at 250°E\n", "```\n" ] }, { - "attachments": {}, "cell_type": "markdown", "id": "b668514e-b40c-4c64-98bf-4579747ae6ab", "metadata": {}, "source": [ - "````{solution}\n", - ":label: generalize\n", + "````{solution} reshape\n", ":class: dropdown\n", "\n", "```python\n", @@ -848,18 +827,17 @@ "This exercise came up during a live lecture.\n", "\n", "```{exercise}\n", + ":label: rolling\n", "Calculate the rolling 4 month average, averaged across years.\n", "```\n" ] }, { - "attachments": {}, "cell_type": "markdown", "id": "9d907b2b-c9c8-41cb-8af0-756d5c21ffef", "metadata": {}, "source": [ - "````{solution}\n", - ":label: generalize\n", + "````{solution} rolling\n", ":class: dropdown\n", "\n", "1. We first reshape using `coarsen.construct` to add `year` as a new dimension.\n", @@ -892,7 +870,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "25fd132c-5436-4af6-b8ad-75269cb45e75", "metadata": { @@ -902,7 +879,7 @@ "tags": [] }, "source": [ - "---\n", + "***\n", "\n", "## Label space \"windows\" or bins : GroupBy\n", "\n", @@ -1026,7 +1003,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "3763efb3", "metadata": {}, @@ -1063,7 +1039,8 @@ " supports characters, strings etc.\n", "1. [pandas.cut](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html)\n", " for binning\n", - "1. [DataArray.isin](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.isin.html)\n" + "1. [DataArray.isin](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.isin.html)\n", + "1. 
[scipy.ndimage.label](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.label.html)" ] }, { @@ -1127,7 +1104,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "db7bd7e6-59cd-4b2a-ac37-2ff4d40d9fc8", "metadata": {}, @@ -1315,7 +1291,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "7cd7ede5-8e57-4099-ab39-b9d75427f125", "metadata": {}, @@ -1326,7 +1301,6 @@ ] }, { - "attachments": {}, "cell_type": "markdown", "id": "ed904da0-fb85-4432-8083-ef644209affd", "metadata": { @@ -1445,11 +1419,10 @@ "id": "3122e22a-77f0-402f-baf6-111821973250", "metadata": {}, "source": [ - "---\n" + "***" ] }, { - "attachments": {}, "cell_type": "markdown", "id": "a37ec2e3-24a0-4306-abe0-a6c04933bd88", "metadata": {}, @@ -1467,10 +1440,10 @@ "1. `groupby_bins`: [GroupBy after discretizing a numeric (non-exact, e.g. float) variable.](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.groupby_bins.html)\n", "1. `resample` :\n", " [Groupby specialized for time axes. Either downsample or upsample your data.](https://docs.xarray.dev/en/stable/user-guide/time-series.html#resampling-and-grouped-operations)\n", + "1. [Weight your data before reducing.](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n", "\n", - "Xarray also provides features to make using those patterns easy:\n", + "Xarray also provides a consistent interface to make using those patterns easy:\n", "\n", - "1. [Weight your data before reducing.](https://docs.xarray.dev/en/stable/user-guide/computation.html#weighted-array-reductions)\n", "1. Iterate over the operators (`rolling`, `coarsen`, `groupby`, `groupby_bins`, `resample`).\n", "1. Apply functions that accept numpy-like arrays with `reduce`.\n", "1. 
Reshape to a new xarray object with `.construct` (`rolling`, `coarsen` only).\n", From 2aa328ed1106997ef7e8d10dc35aa593837ee0ad Mon Sep 17 00:00:00 2001 From: dcherian Date: Thu, 6 Jul 2023 15:25:17 -0600 Subject: [PATCH 22/22] more fix --- .../01-high-level-computation-patterns.ipynb | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb index cca30a9f..ba429693 100644 --- a/intermediate/01-high-level-computation-patterns.ipynb +++ b/intermediate/01-high-level-computation-patterns.ipynb @@ -424,7 +424,8 @@ "id": "9ef251aa-ce3c-4318-95ba-470568ebd967", "metadata": {}, "source": [ - "```{exercise} generalize\n", + "```{exercise}\n", + ":label: rolling-reduce\n", "\n", "Calculate the rolling mean in 5 point bins along both latitude and longitude using\n", "[`rolling(**kwargs).reduce`](https://docs.xarray.dev/en/stable/generated/xarray.core.rolling.DataArrayRolling.reduce.html)\n", @@ -436,17 +437,13 @@ "cell_type": "markdown", "id": "a36cbf94-ed41-42c6-8ccf-9278927d395b", "metadata": { - "jupyter": { - "source_hidden": true - }, "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ - "````{solution} \n", - ":label: generalize\n", + "````{solution} rolling-reduce\n", ":class: dropdown\n", "\n", "```python\n", @@ -508,11 +505,11 @@ "metadata": {}, "source": [ "```{exercise}\n", - ":label: generalize\n", + ":label: rolling-construct\n", "Calculate the 5 point running mean in time and add it to your DataArray using `rolling.construct`\n", "```\n", "\n", - "````{solution} generalize\n", + "````{solution} rolling-construct\n", ":class: dropdown\n", "\n", "```python\n",