diff --git a/fundamentals/02.1_indexing_Basic.ipynb b/fundamentals/02.1_indexing_Basic.ipynb index 8fc5d018..6b98ee39 100644 --- a/fundamentals/02.1_indexing_Basic.ipynb +++ b/fundamentals/02.1_indexing_Basic.ipynb @@ -53,7 +53,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "import xarray as xr\n", @@ -71,7 +73,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "ds = xr.tutorial.load_dataset(\"air_temperature\")\n", @@ -81,7 +85,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "da = ds[\"air\"]" @@ -111,7 +117,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "np_array = ds[\"air\"].data # numpy array\n", @@ -128,7 +136,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "np_array[1, 0, 0]" @@ -144,7 +154,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "# extract a time-series for one spatial location\n", @@ -154,7 +166,6 @@ { "cell_type": "markdown", "metadata": { - "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ @@ -185,7 +196,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "da[:, 20, 40]" @@ -194,41 +207,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "```{caution}\n", - "Positional indexing deviates from the NumPy behavior when indexing with multiple arrays. 
\n", - "```\n", - "We can show this with an example: " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "np_array[:, [0, 1], [0, 1]].shape" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "da[:, [0, 1], [0, 1]].shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Please note how the dimension of the `DataArray()` object is different from the `numpy.ndarray`.\n", - "\n", - "```{tip}\n", - "However, users can still achieve NumPy-like pointwise indexing across multiple labeled dimensions by using Xarray vectorized indexing techniques. We will delve further into this topic in the advanced indexing notebook.\n", - "```" - ] + "source": [] }, { "cell_type": "markdown", @@ -253,7 +232,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "da.isel(lat=20, lon=40).plot();" @@ -269,7 +250,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "da.isel(time=slice(0, 20), lat=20, lon=40).plot();" @@ -286,95 +269,113 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ - "But what if I wanted to select data only for 2014, how would I know the indices for it? Xarray reduce this complexity by introducing label-based indexing. 
\n", + "Indexing a DataArray directly works (mostly) just like it does for numpy arrays, except that the returned object is always another DataArray; however, when indexing with multiple arrays, positional indexing in Xarray behaves differently compared to NumPy.\n", "\n", - "## Label-based Indexing\n", - "\n", - "To select data by coordinate labels instead of integer indices we can use the same syntax, using `sel` instead of `isel`:" + "```{caution}\n", + "Positional indexing deviates from the NumPy behavior when indexing with multiple arrays. \n", + "```\n", + "We can show this with an example: " ] }, { - "cell_type": "markdown", - "metadata": {}, + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [] + }, + "outputs": [], "source": [ - "For example, let's select the data for one day 2014-01-01 at Lat 25 N and Lon 210 E using `sel` :" + "np_array[:, [0, 1], [0, 1]].shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "tags": [ - "hide-output" - ] + "tags": [] }, "outputs": [], "source": [ - "da.sel(time=\"2014-01-01\", lat=25, lon=210).plot();" + "da[:, [0, 1], [0, 1]].shape" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ - "Now, let's select data for year 2014 at this coordinate:\n" + "Please note how the dimension of the `DataArray()` object is different from the `numpy.ndarray`.\n", + "\n", + "```{tip}\n", + "However, users can still achieve NumPy-like pointwise indexing across multiple labeled dimensions by using Xarray vectorized indexing techniques. We will delve further into this topic in the advanced indexing notebook.\n", + "```" ] }, { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "hide-output" - ] - }, - "outputs": [], + "cell_type": "markdown", + "metadata": {}, "source": [ - "da.sel(lat=50.0, lon=200.0, time=\"2014\")" + "So far, we have explored positional indexing, which relies on knowing the exact indices. 
But what if you wanted to select data specifically for a particular latitude? It becomes challenging to determine the corresponding indices in such cases. Xarray reduces this complexity by introducing label-based indexing. \n", + "\n", + "## Label-based Indexing\n", + "\n", + "To select data by coordinate labels instead of integer indices we can use the same syntax, using `sel` instead of `isel`:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Similarly we can do slicing or filter a date range using the `.slice` function: " + "For example, let's select all data for Lat 25 °N and Lon 210 °E using `sel`:" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "hide-output" + ] + }, "outputs": [], "source": [ - "# demonstrate slicing\n", - "da.sel(time=slice(\"2014-02-14\", \"2014-12-13\"))" + "da.sel(lat=25, lon=210).plot();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Xarray also supports label-based indexing, just like pandas using `.loc`. 
To do label based indexing, use the `loc` attribute:" + "Similarly, we can slice or filter a range using the built-in `slice` function: " ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "da.loc[\"2014-02-14\":\"2014-12-13\"]" + "# demonstrate slicing\n", + "da.sel(lon=slice(210, 215))" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "da.time" + "# demonstrate slicing\n", + "da.sel(lat=slice(50, 25), lon=slice(210, 215))" ] }, { @@ -389,10 +390,12 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "da.drop_sel(lat=50.0, lon=200.0, time=\"2014\")" + "da.drop_sel(lat=50.0, lon=200.0)" ] }, { @@ -403,17 +406,14 @@ "\n", "## Nearest Neighbor Lookups\n", "\n", - "The label based selection methods `sel()` support `method` and `tolerance` keyword argument. The `method` parameter allows for enabling nearest neighbor (inexact) lookups by use of the methods `pad`, `backfill` or `nearest`:\n", - "\n" + "The label-based selection method `sel()` supports the `method` and `tolerance` keyword arguments. 
The `method` parameter allows for enabling nearest neighbor (inexact) lookups by use of the methods `pad`, `backfill` or `nearest`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { - "tags": [ - "hide-output" - ] + "tags": [] }, "outputs": [], "source": [ @@ -422,7 +422,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ "`tolerance` argument limits the maximum distance for valid matches with an inexact lookup:" ] @@ -430,7 +432,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "da.sel(lat=52.25, lon=251.8998, method=\"nearest\", tolerance=2)" @@ -450,75 +454,14 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "ds.sel(lat=52.25, lon=251.8998, method=\"nearest\")" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Exercises\n", - "\n", - "Practice the syntax you’ve learned so far:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```{exercise}\n", - ":label: indexing-1\n", - "\n", - "Select the first 30 entries of `latitude` and 30th to 40th entries of `longitude`:\n", - "```\n", - "\n", - "````{solution} indexing-1\n", - ":class: dropdown\n", - "```python\n", - "ds.isel(lat=slice(None, 30), lon=slice(30, 40))\n", - "```\n", - "\n", - "````" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```{exercise}\n", - ":label: indexing-2\n", - "\n", - "Select all data at 75 degree north and between Jan 1, 2013 and Oct 15, 2013 :\n", - "```\n", - "````{solution} indexing-2\n", - ":class: dropdown\n", - "```python\n", - "ds.sel(lat=75, time=slice(\"2013-01-01\", \"2013-10-15\"))\n", - "```\n", - "````" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```{exercise}\n", - ":label: indexing-3\n", - "\n", - "Remove all entries at 260 and 270 degrees :\n", - 
"\n", - "```\n", - "````{solution} indexing-3\n", - ":class: dropdown\n", - "```python\n", - "ds.drop_sel(lon=[260, 270])\n", - "```\n", - "````" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -538,7 +481,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "ds.sel(time='2013-01-01 06:00')" @@ -554,7 +499,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "ds.sel(time='2013-01-01')" @@ -562,9 +509,11 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ - "We can use this feature to select all points in a month" + "We can use this feature to select all points in a year:" ] }, { @@ -575,14 +524,14 @@ }, "outputs": [], "source": [ - "ds.sel(time=\"2014-May\")" + "ds.sel(time=\"2014\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "or a year" + "or a month:" ] }, { @@ -593,12 +542,14 @@ }, "outputs": [], "source": [ - "ds.sel(time=\"2014\")" + "ds.sel(time=\"2014-May\")" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ "### Selecting data for a range of dates\n", "\n", @@ -608,7 +559,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "# This will return a subset of the dataset corresponding to the entire year of 2013.\n", @@ -617,7 +570,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ "```{note}\n", "\n", @@ -638,7 +593,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "dates = ['2013-07-09', '2013-10-11', '2013-12-24']\n", @@ -657,7 +614,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ 
"ds.sel(time=ds.time.dt.month == 7)" @@ -665,7 +624,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ "Or, if you wanted to select data from a specific day of each month, you could use:" ] @@ -673,12 +634,85 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "ds.sel(time=ds.time.dt.day == 15)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Exercises\n", + "\n", + "Practice the syntax you’ve learned so far:" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```{exercise}\n", + ":label: indexing-1\n", + "\n", + "Select the first 30 entries of `latitude` and 30th to 40th entries of `longitude`:\n", + "```\n", + "\n", + "````{solution} indexing-1\n", + ":class: dropdown\n", + "```python\n", + "ds.isel(lat=slice(None, 30), lon=slice(30, 40))\n", + "```\n", + "\n", + "````" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```{exercise}\n", + ":label: indexing-2\n", + "\n", + "Select all data at 75 degree north and between Jan 1, 2013 and Oct 15, 2013 :\n", + "```\n", + "````{solution} indexing-2\n", + ":class: dropdown\n", + "```python\n", + "ds.sel(lat=75, time=slice(\"2013-01-01\", \"2013-10-15\"))\n", + "```\n", + "````" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "```{exercise}\n", + ":label: indexing-3\n", + "\n", + "Remove all entries at 260 and 270 degrees :\n", + "\n", + "```\n", + "````{solution} indexing-3\n", + ":class: dropdown\n", + "```python\n", + "ds.drop_sel(lon=[260, 270])\n", + "```\n", + "````" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -703,11 +737,6 @@ "\n", "- [Xarray Docs - Indexing and Selecting Data](https://docs.xarray.dev/en/stable/indexing.html)" ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": 
[] } ], "metadata": { diff --git a/intermediate/indexing/advanced-indexing.ipynb b/intermediate/indexing/advanced-indexing.ipynb index 0f57cae1..804af21c 100644 --- a/intermediate/indexing/advanced-indexing.ipynb +++ b/intermediate/indexing/advanced-indexing.ipynb @@ -24,7 +24,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "import numpy as np\n", @@ -46,7 +48,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "ds = xr.tutorial.load_dataset(\"air_temperature\")\n", @@ -72,7 +76,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "da.isel(time=0, lat=[2, 4, 10, 13], lon=[1, 6, 7]).plot(); # -- orthogonal indexing" @@ -90,7 +96,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "target_lat = xr.DataArray([31, 41, 42, 42], dims=\"degrees_north\")\n", @@ -108,7 +116,9 @@ }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "tags": [] + }, "source": [ "\n", "But what if we would like to find the information from the nearest grid cell to a collection of specified points (for example, weather stations or tower data)?\n", @@ -130,7 +140,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "# Define target latitude and longitude (where weather stations might be)\n", @@ -142,7 +154,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "lon_points" @@ -158,7 +172,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "da.sel(lat=lat_points, lon=lon_points, method=\"nearest\")" @@ -174,7 +190,9 @@ { "cell_type": 
"code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "da.sel(lat=lat_points, lon=lon_points, method=\"nearest\").dims" @@ -194,7 +212,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "da.sel(lat=[20, 30, 40], lon=lon_points, method=\"nearest\")" @@ -219,7 +239,9 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "da = xr.DataArray(\n", diff --git a/workshops/scipy2023/index.ipynb b/workshops/scipy2023/index.ipynb index c984c0ee..bd6bed23 100644 --- a/workshops/scipy2023/index.ipynb +++ b/workshops/scipy2023/index.ipynb @@ -26,7 +26,7 @@ "| Topic | Time | Notebook Links | \n", "| :- | - | - | \n", "| Introduction and Setup | 1:30 (20 min) | --- | \n", - "| Indexing | 1:50 (30 min) | [Advanced and Vectorized Indexing](../../intermediate/indexing/advanced-indexing.ipynb) | \n", + "| Indexing | 1:50 (30 min) | [Quick Introduction to Indexing](../../fundamentals/02.1_indexing_Basic.ipynb)
[Advanced and Vectorized Indexing](../../intermediate/indexing/advanced-indexing.ipynb)
[Boolean Indexing & Masking](../../intermediate/indexing/boolean-masking-indexing.ipynb) | \n", "| *10 minute Break* \n", "| Computational Patterns | 2:30 (50 min) | [Computation Patterns](../../intermediate/01-high-level-computation-patterns.ipynb) | \n", "| *10 minute Break* | \n",