diff --git a/fundamentals/02.3_aligning_data_objects.ipynb b/fundamentals/02.3_aligning_data_objects.ipynb index 9ba38380..879d4bf2 100644 --- a/fundamentals/02.3_aligning_data_objects.ipynb +++ b/fundamentals/02.3_aligning_data_objects.ipynb @@ -249,40 +249,31 @@ }, { "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Exercise\n", - "\n", - "Consider the following 2D array. What are the dimensions of `array - array.mean(\"time\")`?" - ] - }, - { - "cell_type": "code", - "execution_count": null, "metadata": { "tags": [] }, - "outputs": [], "source": [ + "````{exercise}\n", + ":label: ex1\n", + "\n", + "Consider the following 2D array. What are the dimensions of `array - array.mean(\"time\")`?\n", + "```python\n", "array = xr.DataArray(\n", " np.arange(12).reshape(3, 4),\n", " dims=(\"space\", \"time\"),\n", " coords={\"space\": [\"a\", \"b\", \"c\"], \"time\": [0, 1, 2, 3]},\n", " name=\"array\",\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "hide-output" - ] - }, - "outputs": [], - "source": [ - "(array - array.mean(\"time\")).dims" + ")\n", + "```\n", + "````\n", + "\n", + "````{solution} ex1\n", + ":class: dropdown\n", + "\n", + "```python\n", + "(array - array.mean(\"time\")).dims\n", + "```\n", + "````" ] }, { @@ -448,17 +439,17 @@ }, { "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Exercise** Consider the following two arrays. Write down the `x` and `y` coordinate locations for `da1 - da2`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], + "metadata": { + "tags": [ + "hide-input" + ] + }, "source": [ + "````{exercise}\n", + ":label: dims\n", + "\n", + "Consider the following two arrays. 
Write down the `space` and `time` coordinate values for `da1 - da2`.\n", + "```python\n", "da1 = xr.DataArray(\n", " np.arange(12).reshape(3, 4),\n", " dims=(\"space\", \"time\"),\n", @@ -468,18 +459,15 @@ " [0, 1],\n", " dims=\"space\",\n", " coords={\"space\": [\"b\", \"d\"]},\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "tags": [ - "hide-input" - ] - }, - "source": [ - "**Answer** `x = [\"b\"], y=[0, 1, 2, 3]` . `da2` has been broadcasted to 2D (so dimension \"y\" has been inserted) and the two arrays are aligned using `join=\"inner\"` prior to subtraction." + ")\n", + "```\n", + "````\n", + "\n", + "```{solution} dims\n", + ":class: dropdown\n", + "\n", + "`space = [\"b\"]`, `time = [0, 1, 2, 3]`. `da2` is broadcast to 2D (so dimension \"time\" is inserted), and the two arrays are aligned using `join=\"inner\"` prior to subtraction.\n", + "```" ] }, { diff --git a/fundamentals/03.1_computation_with_xarray.ipynb b/fundamentals/03.1_computation_with_xarray.ipynb index 0484c373..2f5c5075 100644 --- a/fundamentals/03.1_computation_with_xarray.ipynb +++ b/fundamentals/03.1_computation_with_xarray.ipynb @@ -4,8 +4,6 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\n", - "\n", "# Basic Computation\n", "\n", "In this lesson, we discuss how to do scientific computations with xarray\n", @@ -176,12 +174,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "
\n", - " Note: apply_ufunc is a powerful function.\n", - " It has many options for doing more complicated things.\n", - " Unfortunately, we don't have time to go into more depth here.\n", - " Please consult the documentation for more details.\n", - "
\n" + "```{tip}\n", + "`apply_ufunc` is a powerful function. It has many options for doing more complicated things. Unfortunately, we don't have time to go into more depth here. See the [`apply_ufunc` tutorial material](https://tutorial.xarray.dev/advanced/apply_ufunc/apply_ufunc.html) for more.\n", + "```\n" ] }, { @@ -270,24 +265,17 @@ "tags": [] }, "source": [ - "### Exercise\n", + "```{exercise}\n", + ":label: sst-mean\n", + "Take the mean of `sst` in both longitude and latitude. Make a simple timeseries plot.\n", + "```\n", + "````{solution} sst-mean\n", + ":class: dropdown\n", "\n", - "Take the mean of `sst` in both longitude and latitude. Make a simple timeseries\n", - "plot:\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "hide-input", - "hide-output" - ] - }, - "outputs": [], - "source": [ - "sst.mean([\"lat\", \"lon\"]).plot();" + "```python\n", + "sst.mean([\"lat\", \"lon\"]).plot();\n", + "```\n", + "````\n" ] } ], diff --git a/fundamentals/03.2_groupby_with_xarray.ipynb b/fundamentals/03.2_groupby_with_xarray.ipynb index c96ca4f6..f16d61ef 100644 --- a/fundamentals/03.2_groupby_with_xarray.ipynb +++ b/fundamentals/03.2_groupby_with_xarray.ipynb @@ -4,8 +4,6 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "\n", - "\n", "# Grouped Computations\n", "\n", "In this lesson, we discuss how to do scientific computations with defined \"groups\" of data\n", @@ -29,7 +27,7 @@ "import matplotlib.pyplot as plt\n", "\n", "# don't expand data by default\n", - "xr.set_options(display_expand_data=False)\n", + "xr.set_options(display_expand_data=False, display_expand_attrs=False)\n", "\n", "%config InlineBackend.figure_format='retina'" ] @@ -217,9 +215,10 @@ "`gb` is a DatasetGroupBy object. It represents a GroupBy operation and helpfully tells us the unique \"groups\" or labels found during the split step.\n", "\n", "\n", - "
\n", - " Xarrays' computation methods (groupby, groupby_bins, rolling, coarsen, weighted) all return special objects that represent the basic underlying computation pattern. For e.g. `gb` above is a `DatasetGroupBy` object that represents monthly groupings of the data in `ds` . It is usually helpful to save and reuse these objects for multiple operations (e.g. a mean and standard deviation calculation).\n", - "
" + "```{tip}\n", + "\n", + "Xarray's computation methods (`groupby`, `groupby_bins`, `rolling`, `coarsen`, `weighted`) all return special objects that represent the basic underlying computation pattern. For example, `gb` above is a `DatasetGroupBy` object that represents monthly groupings of the data in `ds`. It is usually helpful to save and reuse these objects for multiple operations (e.g. a mean and standard deviation calculation).\n", + "```" ] }, { @@ -480,22 +479,17 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Exercise\n", + "```{exercise}\n", + ":label: annual-mean\n", "\n", - "Using `groupby`, plot the annual mean time series of SST at 300°E, 50°N" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "hide-output" - ] - }, - "outputs": [], - "source": [ - "ds.groupby(\"time.year\").mean().sst.sel(lon=300, lat=50).plot();" + "Using `groupby`, plot the annual mean time series of SST at 300°E, 50°N.\n", + "```\n", + "````{solution} annual-mean\n", + ":class: dropdown\n", + "```python\n", + "ds.groupby(\"time.year\").mean().sst.sel(lon=300, lat=50).plot();\n", + "```\n", + "````" ] }, { @@ -533,9 +533,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "
\n", - " Note: resample only works with proper datetime64 coordinate labels. Note the `dtype` of `time` in the repr above.\n", - "
" + "```{note}\n", + "`resample` only works with proper datetime64 coordinate labels. Note the `dtype` of `time` in the repr above.\n", + "```" ] }, { @@ -588,26 +582,20 @@ "tags": [] }, "source": [ - "### Exercise\n", + "```{exercise}\n", + ":label: resample-mean\n", "\n", "Using `resample`, plot the annual mean time series of SST at 300°E, 50°N.\n", "\n", - "Compare this output to the groupby output. What differences do you see?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "hide-output", - "hide-input" - ] - }, - "outputs": [], - "source": [ + "Compare this output to the groupby output. What differences do you see?\n", + "```\n", + "````{solution} resample-mean\n", + ":class: dropdown\n", + "```python\n", "resampled = ds.resample(time='Y').mean().sst.sel(lon=300, lat=50)\n", - "resampled.plot();" + "resampled.plot();\n", + "```\n", + "````" ] }, { @@ -618,6 +606,8 @@ ] }, "source": [ + "## GroupBy vs Resample \n", + "\n", "Let's compare the grouped and resampled outputs.\n", "\n", "\n", @@ -630,16 +620,14 @@ "cell_type": "code", "execution_count": null, "metadata": { - "tags": [ - "hide-output", - "hide-input" - ] + "tags": [] }, "outputs": [], "source": [ "from IPython.display import display_html\n", "\n", "grouped = ds.groupby(\"time.year\").mean().sst.sel(lon=300, lat=50)\n", + "resampled = ds.resample(time='Y').mean().sst.sel(lon=300, lat=50)\n", "display_html(grouped)\n", "display_html(resampled)" ] @@ -648,10 +636,7 @@ "cell_type": "code", "execution_count": null, "metadata": { - "tags": [ - "hide-input", - "hide-output" - ] + "tags": [] }, "outputs": [], "source": [ diff --git a/fundamentals/03.3_windowed.ipynb b/fundamentals/03.3_windowed.ipynb index 6954f806..7c28884c 100644 --- a/fundamentals/03.3_windowed.ipynb +++ b/fundamentals/03.3_windowed.ipynb @@ -5,8 +5,6 @@ "id": "5d7c85f4-6334-4527-ae2e-34312161aa31", "metadata": {}, "source": [ - "\n", - "\n", "# Windowed Computations\n", "\n", "Xarray has 
built-in support for windowed operations:\n", @@ -31,6 +29,9 @@ "import xarray as xr\n", "import matplotlib.pyplot as plt\n", "\n", + "np.set_printoptions(threshold=10, edgeitems=2)\n", + "xr.set_options(display_expand_data=False)\n", + "\n", "%config InlineBackend.figure_format='retina'" ] }, @@ -82,9 +83,9 @@ "id": "b48567bc-108a-4527-8011-5a4d45bdf966", "metadata": {}, "source": [ - "
\n", - " Xarrays' computation methods (groupby, groupby_bins, rolling, coarsen, weighted) all return special objects that represent the basic underlying computation pattern. For e.g. `rolling` above is a `DatasetRolling` object that represents 12-point rolling windows of the data in `ds` . It is usually helpful to save and reuse these objects for multiple operations (e.g. a mean and standard deviation calculation).\n", - "
" + "```{tip}\n", + "Xarray's computation methods (`groupby`, `groupby_bins`, `rolling`, `coarsen`, `weighted`) all return special objects that represent the basic underlying computation pattern. For example, `rolling` above is a `DatasetRolling` object that represents 12-point rolling windows of the data in `ds`. It is usually helpful to save and reuse these objects for multiple operations (e.g. a mean and standard deviation calculation).\n", + "```" ] }, { @@ -192,29 +193,20 @@ "id": "3c903632-7221-4969-9f00-963881f87853", "metadata": {}, "source": [ - "### Exercise\n", - "\n", - "\n", - "Illustrate the difference between `center=True` and `center=False` for rolling by looking at the `construct`-ed array." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "23e2166f-ef45-4ea8-bbf7-2c133fa11ca2", - "metadata": { - "tags": [ - "hide-input", - "hide-output" - ] - }, - "outputs": [], - "source": [ + "```{exercise}\n", + ":label: rolling-center\n", + "Illustrate the difference between `center=True` and `center=False` for rolling by looking at the `construct`-ed array.\n", + "```\n", + "````{solution} rolling-center\n", + ":class: dropdown\n", + "```python\n", "display(\"center=True\")\n", "display(simple.rolling(time=5, center=True).construct(\"window\"))\n", "\n", "display(\"center=False\")\n", - "display(simple.rolling(time=5, center=False).construct(\"window\"))" + "display(simple.rolling(time=5, center=False).construct(\"window\"))\n", + "```\n", + "````" ] }, { @@ -249,9 +241,7 @@ "id": "002b8759-aa0e-4186-aaea-3e415392fe42", "metadata": {}, "source": [ - "
\n", - " Xarrays' computation methods (groupby, groupby_bins, rolling, coarsen, weighted) all return special objects that represent the basic underlying computation pattern. For e.g. `coarse` above is a `DatasetCoarsen` object that represents 5-point windows along lat, lon of the data in `ds`. It is usually helpful to save and reuse these objects for multiple operations (e.g. a mean and standard deviation calculation).\n", - "
" + "Xarray's computation methods (`groupby`, `groupby_bins`, `rolling`, `coarsen`, `weighted`) all return special objects that represent the basic underlying computation pattern. For example, `coarse` above is a `DatasetCoarsen` object that represents 5-point windows along `lat` and `lon` of the data in `ds`. It is usually helpful to save and reuse these objects for multiple operations (e.g. a mean and standard deviation calculation)." ] }, { @@ -300,7 +290,9 @@ "\n", "Like `rolling`, `coarsen` also provides a `construct` method for custom block operations. \n", "\n", - "**Tip** `coarsen.construct` is a handy way to reshape Xarray objects.\n", + "```{tip}\n", + "`coarsen.construct` is a handy way to reshape Xarray objects.\n", + "```\n", "\n", "Consider a \"monthly\" 1D timeseries. This simple example has one value per month for 2 years" ] @@ -344,31 +336,25 @@ "id": "9b97df2d-b320-4dc7-9d40-7377519fb2e7", "metadata": {}, "source": [ - "### Exercise\n", + "````{exercise}\n", + ":label: rolling-pad\n", "\n", "Imagine the array `months` was one element shorter. Use `boundary=\"pad\"` and the `side` kwarg to reshape `months.isel(time=slice(1, None))` to a 2D DataArray with the following values:\n", "\n", - "```\n", + "```python\n", "array([[nan, 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.],\n", " [ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.]])\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "5f7733ad-0483-43d4-b48b-6a782de27332", - "metadata": { - "tags": [ - "hide-input", - "hide-output" - ] - }, - "outputs": [], - "source": [ + "```\n", + "````\n", + "````{solution} rolling-pad\n", + ":class: dropdown\n", + "\n", + "```python\n", "months.isel(time=slice(1, None)).coarsen({\"time\": 12}, boundary=\"pad\", side=\"right\").construct(\n", " time=(\"year\", \"month\")\n", - ")" + ")\n", + "```\n", + "````" ] }, { @@ -376,7 +362,7 @@ "id": "ac49ab39-b8b4-4514-a46f-28601276db48", "metadata": {}, "source": [ - "Note that `coarsen` pads with NaNs. 
For more control over paddnig, use\n", + "Note that `coarsen` pads with NaNs. For more control over padding, use\n", "[DataArray.pad](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.pad.html) explicitly." ] }, diff --git a/intermediate/xarray_and_dask.ipynb b/intermediate/xarray_and_dask.ipynb index 52b7decf..100bdb47 100644 --- a/intermediate/xarray_and_dask.ipynb +++ b/intermediate/xarray_and_dask.ipynb @@ -19,7 +19,9 @@ "2. Learn that all xarray built-in operations can transparently use dask\n", "\n", "\n", - "**Important:** *Using Dask does not always make your computations run faster!* \n", + "```{important}\n", + "Using Dask does not always make your computations run faster!\n", + "```\n", "\n", "Performance will depend on the computational infrastructure you're using (for example, how many CPU cores), how the data you're working with is structured and stored, and the algorithms and code you're running. Be sure to review the [Dask best-practices](https://docs.dask.org/en/stable/best-practices.html) if you're new to Dask!" ] @@ -38,7 +40,7 @@ "\n", "> Dask Array implements a subset of the NumPy ndarray interface using blocked algorithms, cutting up the large array into many small arrays (*blocks* or *chunks*). This lets us compute on arrays larger than memory using all of our cores. We coordinate these blocked algorithms using Dask graphs.\n", "\n", - "" + "" ] }, { @@ -82,8 +84,6 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**first an apology!**\n", - "\n", "So what did dask do when you called `.mean`? It added that operation to the \"graph\" or a blueprint of operations to execute later."
] }, @@ -139,7 +139,14 @@ "metadata": {}, "outputs": [], "source": [ - "import xarray as xr" + "%xmode minimal\n", + "\n", + "import numpy as np\n", + "import xarray as xr\n", + "\n", + "# limit the amount of information printed to screen\n", + "xr.set_options(display_expand_data=False)\n", + "np.set_printoptions(threshold=10, edgeitems=2)" ] }, { @@ -198,8 +205,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Tip**: All variables in a `Dataset` need _not_ have the same chunk size along\n", - "common dimensions." + "```{tip}\n", + "All variables in a `Dataset` need _not_ have the same chunk size along\n", + "common dimensions.\n", + "```" ] }, { @@ -214,7 +223,9 @@ " this means that compute will always be called\n", "2. `.data` will return a Dask array\n", "\n", - "**tip**: Use `to_numpy` or `as_numpy` instead of `.values` so that your code generalizes to other array types (like CuPy arrays, sparse arrays)" + "```{tip}\n", + "Use `to_numpy` or `as_numpy` instead of `.values` so that your code generalizes to other array types (like CuPy arrays, sparse arrays).\n", + "```" ] }, { @@ -241,9 +252,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Exercise\n", - "\n", - "Try calling `ds.air.values` and `ds.air.data`. Do you understand the difference?" + "```{exercise}\n", + ":label: data-values\n", + "Try calling `ds.air.values` and `ds.air.data`. Do you understand the difference?\n", + "```" ] }, { @@ -325,101 +337,43 @@ "2. `.load()` replaces the dask array in the xarray object with a numpy array.\n", " This is equivalent to `ds = ds.compute()`\n", " \n", - "**Tip:** There is a third option : \"persisting\". `.persist()` loads the values into distributed RAM. The values are computed but remain distributed across workers. So `ds.air.persist()` still returns a dask array. This is useful if you will be repeatedly using a dataset for computation but it is too large to load into local memory. You will see a persistent task on the dashboard. 
See the [dask user guide](https://docs.dask.org/en/latest/api.html#dask.persist) for more on persisting" + "```{tip}\n", + "There is a third option: \"persisting\". `.persist()` loads the values into distributed RAM. The values are computed but remain distributed across workers. So `ds.air.persist()` still returns a dask array. This is useful if you will be repeatedly using a dataset for computation but it is too large to load into local memory. You will see a persistent task on the dashboard. See the [dask user guide](https://docs.dask.org/en/latest/api.html#dask.persist) for more on persisting.\n", + "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Exercise\n", + "```{exercise}\n", + ":label: compute\n", "\n", - "Try running `mean.compute` and then examine `mean` after that. Is it still a dask array?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "mean" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "hide-output", - "hide-input" - ] - }, - "outputs": [], - "source": [ - "mean.compute()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "hide-input", - "hide-output" - ] - }, - "outputs": [], - "source": [ - "mean" + "Try running `mean.compute` and then examine `mean` after that. Is it still a dask array?\n", + "```\n", + "```{solution} compute\n", + ":class: dropdown\n", + "\n", + "`compute` returns a new Xarray object backed by a NumPy array but does not modify `mean` in place, so `mean` still contains a dask array.\n", + "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Exercise\n", + "```{exercise}\n", + ":label: load\n", "\n", - "Now repeat that exercise with `mean.load`."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "hide-input", - "hide-output" - ] - }, - "outputs": [], - "source": [ - "mean.load()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "hide-input", - "hide-output" - ] - }, - "outputs": [], - "source": [ - "mean" + "Now repeat that exercise with `mean.load`.\n", + "```\n", + "```{solution} load\n", + ":class: dropdown\n", + "\n", + "`load` modifies an Xarray object in place, so `mean` now contains a NumPy array.\n", + "```" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "markdown", "metadata": {}, @@ -462,13 +416,6 @@ "client" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "markdown", "metadata": {}, @@ -477,7 +424,9 @@ "\n", "👈 Or click the \"Search\" 🔍 button in the [dask-labextension](https://github.com/dask/dask-labextension) dashboard.\n", "\n", - "NOTE: if using the dask-labextension, you should disable the 'Simple' JupyterLab interface (`View -> Simple Interface`), so that you can drag and rearrange whichever dashboards you want. The `Workers` and `Task Stream` are good to make sure the dashboard is working!" + "```{note}\n", + "If using the dask-labextension, you should disable the 'Simple' JupyterLab interface (`View -> Simple Interface`), so that you can drag and rearrange whichever dashboards you want. The `Workers` and `Task Stream` panes are good for checking that the dashboard is working!\n", + "```" ] }, { @@ -575,7 +524,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**tip** While these operations all work, not all of them are necessarily the optimal implementation for parallelism. Usually analysis pipelines need some tinkering and tweaking to get things to work. 
In particular read the user guidie recommendations for [chunking](https://docs.xarray.dev/en/stable/user-guide/dask.html#chunking-and-performance) and [performance](https://docs.xarray.dev/en/stable/user-guide/dask.html#optimization-tips)" + "```{tip}\n", + "While these operations all work, not all of them are necessarily the optimal implementation for parallelism. Usually analysis pipelines need some tinkering and tweaking to get things to work. In particular, read the user guide's recommendations for [chunking](https://docs.xarray.dev/en/stable/user-guide/dask.html#chunking-and-performance) and [performance](https://docs.xarray.dev/en/stable/user-guide/dask.html#optimization-tips).\n", + "```" ] }, { @@ -591,11 +542,14 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "jp-MarkdownHeadingCollapsed": true, + "tags": [] + }, "source": [ - "### Exercise\n", - "\n", - "Visualize the task graph for a few different computations on `ds.air`! " + "```{exercise}\n", + "Visualize the task graph for a few different computations on `ds.air`!\n", + "```" ] }, { diff --git a/workshops/scipy2023/README.md b/workshops/scipy2023/README.md index 85834561..1ff7b87c 100644 --- a/workshops/scipy2023/README.md +++ b/workshops/scipy2023/README.md @@ -66,8 +66,8 @@ Once your codespace is launched, the following happens: {doc}`../../intermediate/01-high-level-computation-patterns` ``` -```{dropdown} Wrapping other arrays - +```{dropdown} Wrapping other arrays: dask +{doc}`../../intermediate/xarray_and_dask` ``` ```{dropdown} Wrapping custom computation