xarray-contrib
diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile
Lines changed: 1 addition & 1 deletion
diff --git a/.devcontainer/scipy2024/devcontainer.json b/.devcontainer/scipy2024/devcontainer.json
Lines changed: 1 addition & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/_config.yml b/_config.yml
Lines changed: 2 additions & 1 deletion
diff --git a/_static/announcement.css b/_static/announcement.css
Lines changed: 0 additions & 4 deletions
diff --git a/_static/style.css b/_static/style.css
Lines changed: 3 additions & 0 deletions
diff --git a/_toc.yml b/_toc.yml
Lines changed: 5 additions & 4 deletions
diff --git a/fundamentals/01_data_structures.md b/fundamentals/01_data_structures.md
Lines changed: 64 additions & 0 deletions
diff --git a/fundamentals/01_datastructures.ipynb b/fundamentals/01_datastructures.ipynb
Lines changed: 34 additions & 107 deletions
diff --git a/images/malaria_dataset.png b/images/malaria_dataset.png
429 KB
diff --git a/images/orthogonal_vs_vectorized.png b/images/orthogonal_vs_vectorized.png
168 KB
diff --git a/intermediate/01-high-level-computation-patterns.ipynb b/intermediate/01-high-level-computation-patterns.ipynb
Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-FROM pangeo/base-image:2024.06.24
+FROM pangeo/base-image:2024.06.28
@@ -12,7 +12,7 @@
   },
   "customizations": {
     "codespaces": {
-      "openFiles": ["workshops/scipy2024/README.md"]
+      "openFiles": ["workshops/scipy2024/index.ipynb"]
     },
     "vscode": {
       "extensions": ["ms-toolsai.jupyter", "ms-python.python"]
 
@@ -2,7 +2,7 @@
 
 [![CI](https://github.com/xarray-contrib/xarray-tutorial/workflows/CI/badge.svg?branch=main)](https://github.com/xarray-contrib/xarray-tutorial/actions?query=branch%3Amain)
 [![Jupyter Book Badge](https://jupyterbook.org/badge.svg)](https://tutorial.xarray.dev)
-[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/xarray-contrib/xarray-tutorial/HEAD?labpath=overview/fundamental-path/index.ipynb)
+[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/xarray-contrib/xarray-tutorial/HEAD?labpath=workshops/scipy2024/index.ipynb)
 
 This is the repository for a Jupyter Book website with tutorial material for [Xarray](https://github.com/pydata/xarray), _an open source project and Python package that makes working with labelled multi-dimensional arrays simple, efficient, and fun!_
 
 
@@ -8,7 +8,8 @@ only_build_toc_files: true
 # See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository
 html:
   # NOTE: this announcement shows up on all pages
-  #announcement: 'ℹ️ SciPy Tutorial Attendees. <a href="https://tutorial.xarray.dev/workshops/scipy2023/README.html">Click here </a>.'
+  announcement: '<a href="https://forms.gle/KEq7WviCdz9xTaJX6">The Xarray 2024 User Survey is live. Please take ~5 minutes to fill it out and help us improve Xarray.</a>'
+  #announcement: 'ℹ️ SciPy 2024 Tutorial Attendees. <a href="https://tutorial.xarray.dev/workshops/scipy2024/README.html">Click here </a>.'
   home_page_in_navbar: false
   use_edit_page_button: true
   use_issues_button: true
 
@@ -0,0 +1,3 @@
+.bd-header-announcement {
+  background-color: var(--pst-color-info-bg);
+}
@@ -81,15 +81,16 @@ parts:
 
   - caption: Workshops
     chapters:
+      - file: workshops/scipy2024/index.ipynb
       - file: workshops/scipy2023/README
-      - file: workshops/oceanhackweek2020/README
-        sections:
-          - url: https://tutorial.xarray.dev/overview/xarray-in-45-min
-            title: Xarray in 45 minutes
       - file: workshops/thinking-like-xarray/README
         sections:
           - url: https://tutorial.xarray.dev/intermediate/01-high-level-computation-patterns
             title: High-level computation patterns
+      - file: workshops/oceanhackweek2020/README
+        sections:
+          - url: https://tutorial.xarray.dev/overview/xarray-in-45-min
+            title: Xarray in 45 minutes
       - file: workshops/online-tutorial-series/README
         sections:
           - file: workshops/online-tutorial-series/01_xarray_fundamentals
 
@@ -1,5 +1,69 @@
 # Data Structures
 
+Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called “tensors”)
+are an essential part of computational science. They are encountered in a wide
+range of fields, including physics, astronomy, geoscience, bioinformatics,
+engineering, finance, and deep learning. In Python, [NumPy](https://numpy.org/)
+provides the fundamental data structure and API for working with raw ND arrays.
+However, real-world datasets are usually more than just raw numbers; they have
+labels which encode information about how the array values map to locations in
+space, time, etc.
+
+The N-dimensional nature of Xarray’s data structures makes it suitable for
+dealing with multi-dimensional scientific data, and its use of dimension names
+instead of axis labels (`dim='time'` instead of `axis=0`) makes such arrays much
+more manageable than the raw NumPy ndarray: with Xarray, you don’t need to keep
+track of the order of an array’s dimensions or insert dummy dimensions of size 1
+to align arrays (e.g., using np.newaxis).
+
+The immediate payoff of using Xarray is that you’ll write less code. The
+long-term payoff is that you’ll understand what you were thinking when you come
+back to look at it weeks or months later.
+
+## Example: Weather forecast
+
+Here is an example of how we might structure a dataset for a weather forecast:
+
+<img src="https://docs.xarray.dev/en/stable/_images/dataset-diagram.png" align="center" width="80%">
+
+You'll notice multiple data variables (temperature, precipitation), coordinate
+variables (latitude, longitude), and dimensions (x, y, t). We'll cover how these
+fit into Xarray's data structures below.
+
+Xarray doesn’t just keep track of labels on arrays – it uses them to provide a
+powerful and concise interface. For example:
+
+- Apply operations over dimensions by name: `x.sum('time')`.
+
+- Select values by label (or logical location) instead of integer location:
+  `x.loc['2014-01-01']` or `x.sel(time='2014-01-01')`.
+
+- Mathematical operations (e.g., `x - y`) vectorize across multiple dimensions
+  (array broadcasting) based on dimension names, not shape.
+
+- Easily use the split-apply-combine paradigm with groupby:
+  `x.groupby('time.dayofyear').mean()`.
+
+- Database-like alignment based on coordinate labels that smoothly handles
+  missing values: `x, y = xr.align(x, y, join='outer')`.
+
+- Keep track of arbitrary metadata in the form of a Python dictionary:
+  `x.attrs`.
+
+## Example: Mosquito genetics
+
+Although the Xarray library was originally developed with Earth Science datasets in mind, the data structures work well across many other domains! For example, below is a side-by-side view of a data schematic on the left and Xarray Dataset representation on the right taken from a mosquito genetics analysis:
+
+![malaria_dataset](../images/malaria_dataset.png)
+
+The data can be stored as a 3-dimensional array, where one dimension of the array corresponds to positions (**variants**) within a reference genome, another dimension corresponds to the individual mosquitoes that were sequenced (**samples**), and a third dimension corresponds to the number of genomes within each individual (**ploidy**).
+
+You can explore this dataset in detail via the [training course in data analysis for genomic surveillance of African malaria vectors](https://anopheles-genomic-surveillance.github.io/workshop-5/module-1-xarray.html)!
+
+## Explore on your own
+
+The following collection of notebooks provides interactive code examples for working with example datasets and constructing Xarray data structures manually.
+
 ```{tableofcontents}
 
 ```
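Reviewer sketch (not part of the patch): the label-based operations listed in the new `01_data_structures.md` page can be tried on a small synthetic array. The array contents, the `station` dimension, and all variable names below are hypothetical and chosen only for illustration; the calls themselves are the ones named in the bullet list.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: 4 daily time steps at 3 made-up stations.
x = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    dims=("time", "station"),
    coords={
        "time": pd.date_range("2014-01-01", periods=4, freq="D"),
        "station": ["a", "b", "c"],
    },
    attrs={"units": "arbitrary"},
)
# A second array that only has the "station" dimension.
y = xr.DataArray(
    [10.0, 20.0, 30.0], dims="station", coords={"station": ["a", "b", "c"]}
)

total = x.sum("time")                            # reduce over a dimension by name
first_day = x.sel(time="2014-01-01")             # select by label, not position
anomaly = x - y                                  # broadcast by dimension name
daily_mean = x.groupby("time.dayofyear").mean()  # split-apply-combine

# Outer alignment pads the shorter array with NaN where labels are missing.
x_full, x_short = xr.align(x, x.isel(time=slice(1, None)), join="outer")

print(total.values)
print(anomaly.dims)
print(int(x_short.isnull().sum()))
print(x.attrs)
```

Note that `y` never needs a dummy `time` axis: the subtraction matches on the `station` name alone, which is the point the page makes about `np.newaxis`.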
@@ -6,62 +6,13 @@
    "source": [
     "# Xarray's Data structures\n",
     "\n",
-    "In this lesson, we cover the basics of Xarray data structures. Our\n",
-    "learning goals are as follows. By the end of the lesson, we will be able to:\n",
+    "In this lesson, we cover the basics of Xarray data structures. By the end of the lesson, we will be able to:\n",
     "\n",
-    "- Understand the basic data structures (`DataArray` and `Dataset` objects) in Xarray\n",
-    "\n",
-    "---\n",
-    "\n",
-    "## Introduction\n",
-    "\n",
-    "Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called “tensors”)\n",
-    "are an essential part of computational science. They are encountered in a wide\n",
-    "range of fields, including physics, astronomy, geoscience, bioinformatics,\n",
-    "engineering, finance, and deep learning. In Python, [NumPy](https://numpy.org/)\n",
-    "provides the fundamental data structure and API for working with raw ND arrays.\n",
-    "However, real-world datasets are usually more than just raw numbers; they have\n",
-    "labels which encode information about how the array values map to locations in\n",
-    "space, time, etc.\n",
-    "\n",
-    "Here is an example of how we might structure a dataset for a weather forecast:\n",
-    "\n",
-    "<img src=\"https://docs.xarray.dev/en/stable/_images/dataset-diagram.png\" align=\"center\" width=\"80%\">\n",
-    "\n",
-    "You'll notice multiple data variables (temperature, precipitation), coordinate\n",
-    "variables (latitude, longitude), and dimensions (x, y, t). We'll cover how these\n",
-    "fit into Xarray's data structures below.\n",
-    "\n",
-    "Xarray doesn’t just keep track of labels on arrays – it uses them to provide a\n",
-    "powerful and concise interface. For example:\n",
-    "\n",
-    "- Apply operations over dimensions by name: `x.sum('time')`.\n",
-    "\n",
-    "- Select values by label (or logical location) instead of integer location:\n",
-    "  `x.loc['2014-01-01']` or `x.sel(time='2014-01-01')`.\n",
-    "\n",
-    "- Mathematical operations (e.g., `x - y`) vectorize across multiple dimensions\n",
-    "  (array broadcasting) based on dimension names, not shape.\n",
-    "\n",
-    "- Easily use the split-apply-combine paradigm with groupby:\n",
-    "  `x.groupby('time.dayofyear').mean()`.\n",
-    "\n",
-    "- Database-like alignment based on coordinate labels that smoothly handles\n",
-    "  missing values: `x, y = xr.align(x, y, join='outer')`.\n",
-    "\n",
-    "- Keep track of arbitrary metadata in the form of a Python dictionary:\n",
-    "  `x.attrs`.\n",
-    "\n",
-    "The N-dimensional nature of xarray’s data structures makes it suitable for\n",
-    "dealing with multi-dimensional scientific data, and its use of dimension names\n",
-    "instead of axis labels (`dim='time'` instead of `axis=0`) makes such arrays much\n",
-    "more manageable than the raw numpy ndarray: with xarray, you don’t need to keep\n",
-    "track of the order of an array’s dimensions or insert dummy dimensions of size 1\n",
-    "to align arrays (e.g., using np.newaxis).\n",
-    "\n",
-    "The immediate payoff of using xarray is that you’ll write less code. The\n",
-    "long-term payoff is that you’ll understand what you were thinking when you come\n",
-    "back to look at it weeks or months later.\n"
+    ":::{admonition} Learning Goals\n",
+    "- Understand the basic Xarray data structures `DataArray` and `Dataset` \n",
+    "- Customize the display of Xarray data structures\n",
+    "- Understand the connection between pandas and Xarray data structures\n",
+    ":::"
    ]
   },
   {
@@ -72,13 +23,10 @@
     "\n",
     "Xarray provides two data structures: the `DataArray` and `Dataset`. The\n",
     "`DataArray` class attaches dimension names, coordinates and attributes to\n",
-    "multi-dimensional arrays while `Dataset` combines multiple arrays.\n",
+    "multi-dimensional arrays while `Dataset` combines multiple DataArrays.\n",
     "\n",
     "Both classes are most commonly created by reading data.\n",
-    "To learn how to create a DataArray or Dataset manually, see the [Creating Data Structures](01.1_creating_data_structures.ipynb) tutorial.\n",
-    "\n",
-    "Xarray has a few small real-world tutorial datasets hosted in this GitHub repository https://github.com/pydata/xarray-data.\n",
-    "We'll use the [xarray.tutorial.load_dataset](https://docs.xarray.dev/en/stable/generated/xarray.tutorial.open_dataset.html#xarray.tutorial.open_dataset) convenience function to download and open the `air_temperature` (National Centers for Environmental Prediction) Dataset by name."
+    "To learn how to create a DataArray or Dataset manually, see the [Creating Data Structures](01.1_creating_data_structures.ipynb) tutorial."
    ]
   },
   {
@@ -88,7 +36,13 @@
    "outputs": [],
    "source": [
     "import numpy as np\n",
-    "import xarray as xr"
+    "import xarray as xr\n",
+    "import pandas as pd\n",
+    "\n",
+    "# When working in a Jupyter Notebook you might want to customize Xarray display settings to your liking\n",
+    "# The following settings reduce the amount of data displayed out by default\n",
+    "# The following settings reduce the amount of data displayed by default\n",
+    "np.set_printoptions(threshold=10, edgeitems=2)"
    ]
   },
   {
@@ -97,7 +51,10 @@
    "source": [
     "### Dataset\n",
     "\n",
-    "`Dataset` objects are dictionary-like containers of DataArrays, mapping a variable name to each DataArray.\n"
+    "`Dataset` objects are dictionary-like containers of DataArrays, mapping a variable name to each DataArray.\n",
+    "\n",
+    "Xarray has a few small real-world tutorial datasets hosted in this GitHub repository https://github.com/pydata/xarray-data.\n",
+    "We'll use the [xarray.tutorial.load_dataset](https://docs.xarray.dev/en/stable/generated/xarray.tutorial.open_dataset.html#xarray.tutorial.open_dataset) convenience function to download and open the `air_temperature` (National Centers for Environmental Prediction) Dataset by name."
    ]
   },
   {
@@ -147,14 +104,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### What is all this anyway? (String representations)\n",
+    "#### HTML vs text representations\n",
     "\n",
     "Xarray has two representation types: `\"html\"` (which is only available in\n",
     "notebooks) and `\"text\"`. To choose between them, use the `display_style` option.\n",
     "\n",
     "So far, our notebook has automatically displayed the `\"html\"` representation (which we will continue using).\n",
-    "The `\"html\"` representation is interactive, allowing you to collapse sections (left arrows) and\n",
-    "view attributes and values for each value (right hand sheet icon and data symbol)."
+    "The `\"html\"` representation is interactive, allowing you to collapse sections (▶) and\n",
+    "view attributes and values for each variable (📄 and ≡)."
    ]
   },
   {
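Reviewer sketch (not part of the patch): the `display_style` option mentioned in the hunk above can be switched temporarily with `xr.set_options` used as a context manager. The toy dataset is hypothetical, and `_repr_html_` is called directly only to mimic what a notebook front end requests when it displays an object.

```python
import numpy as np
import xarray as xr

# Hypothetical toy dataset, used only to have something to display.
ds = xr.Dataset(
    {"air": (("time", "lat"), np.zeros((2, 3)))},
    coords={"lat": [10.0, 20.0, 30.0]},
    attrs={"title": "hypothetical toy dataset"},
)

# Inside the block the notebook representation falls back to plain text;
# the previous setting is restored when the block exits.
with xr.set_options(display_style="text"):
    as_text = ds._repr_html_()

# Outside the block the default interactive HTML representation is used.
as_html = ds._repr_html_()

# print() always shows the text representation, whatever the option is.
print(ds)
```

Using the context manager keeps the change local to one cell, whereas calling `xr.set_options(...)` on its own (as the import cell in this PR does) changes the setting for the rest of the session.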
@@ -171,18 +128,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The output consists of:\n",
+    "☝️ From top to bottom the output consists of:\n",
     "\n",
-    "- a summary of all *dimensions* of the `Dataset` `(lat: 25, time: 2920, lon: 53)`: this tells us that the first\n",
-    "  dimension is named `lat` and has a size of `25`, the second dimension is named\n",
-    "  `time` and has a size of `2920`, and the third dimension is named `lon` and has a size\n",
-    "  of `53`. Because we will access the dimensions by name, the order doesn't matter.\n",
-    "- an unordered list of *coordinates* or dimensions with coordinates with one item\n",
-    "  per line. Each item has a name, one or more dimensions in parentheses, a dtype\n",
-    "  and a preview of the values. Also, if it is a dimension coordinate, it will be\n",
-    "  marked with a `*`.\n",
-    "- an alphabetically sorted list of *dimensions without coordinates* (if there are any)\n",
-    "- an unordered list of *attributes*, or metadata"
+    "- **Dimensions**: summary of all *dimensions* of the `Dataset` `(lat: 25, time: 2920, lon: 53)`: this tells us that the first dimension is named `lat` and has a size of `25`, the second dimension is named `time` and has a size of `2920`, and the third dimension is named `lon` and has a size of `53`. Because we will access the dimensions by name, the order doesn't matter.\n",
+    "- **Coordinates**: an unordered list of *coordinates* (or dimensions with coordinates), one item per line. Each item has a name, one or more dimensions in parentheses, a dtype and a preview of the values. Also, if it is a dimension coordinate, it will be printed in **bold** font. Non-dimension coordinates appear in plain font (there are none in this example, but you might imagine a 'mask' coordinate that has a value assigned at every point).\n",
+    "- **Data variables**: names of each nD *measurement* in the dataset, followed by its dimensions `(time, lat, lon)`, dtype, and a preview of values.\n",
+    "- **Indexes**: Each dimension with coordinates is backed by an \"Index\". In this example, each dimension is backed by a `PandasIndex`.\n",
+    "- **Attributes**: an unordered list of metadata (for example, a paragraph describing the dataset)"
    ]
   },
   {
@@ -379,15 +331,6 @@
     "methods on `xarray` objects:\n"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import pandas as pd"
-   ]
-  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -429,8 +372,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**<code>to_series</code>**: This will always convert `DataArray` objects to\n",
-    "`pandas.Series`, using a `MultiIndex` for higher dimensions\n"
+    "### to_series\n",
+    "This will always convert `DataArray` objects to `pandas.Series`, using a `MultiIndex` for higher dimensions\n"
    ]
   },
   {
@@ -446,9 +389,10 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**<code>to_dataframe</code>**: This will always convert `DataArray` or `Dataset`\n",
-    "objects to a `pandas.DataFrame`. Note that `DataArray` objects have to be named\n",
-    "for this.\n"
+    "### to_dataframe\n",
+    "\n",
+    "This will always convert `DataArray` or `Dataset` objects to a `pandas.DataFrame`. Note that `DataArray` objects have to be named for this. Since columns in a `DataFrame` need to have the same index, they are\n",
+    "broadcast."
    ]
   },
   {
@@ -459,23 +403,6 @@
    "source": [
     "ds.air.to_dataframe()"
    ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Since columns in a `DataFrame` need to have the same index, they are\n",
-    "broadcasted.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "ds.to_dataframe()"
-   ]
   }
  ],
  "metadata": {
 
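Reviewer sketch (not part of the patch): since this hunk drops the `ds.to_dataframe()` cell, here is a small self-contained illustration of the naming requirement and the broadcasting behaviour that the merged `to_dataframe` text describes. The arrays, the `lat_offset` variable, and the column name `values` are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical 2 x 3 array standing in for `ds.air`.
air = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("lat", "lon"),
    coords={"lat": [10.0, 20.0], "lon": [0.0, 1.0, 2.0]},
    name="air",
)

series = air.to_series()    # pandas.Series indexed by a (lat, lon) MultiIndex
frame = air.to_dataframe()  # pandas.DataFrame with a single "air" column

# An unnamed DataArray cannot become a column unless a name is supplied.
unnamed = xr.DataArray(np.arange(3.0), dims="x")
try:
    unnamed.to_dataframe()
except ValueError as err:
    print("unnamed array:", err)
named_frame = unnamed.to_dataframe(name="values")

# A 1-D variable is broadcast against the full (lat, lon) index,
# so each lat value is repeated once per lon.
toy = xr.Dataset({"air": air, "lat_offset": ("lat", [0.5, 1.5])})
wide = toy.to_dataframe(dim_order=["lat", "lon"])
print(wide)
```

Passing `dim_order` explicitly is a choice made here to keep the row order predictable; the notebook cells rely on the default order.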
@@ -657,7 +657,7 @@
     ":class: dropdown\n",
     "\n",
     "```python\n",
-    "data.coarsen(lat=5, lon=5, boundary=\"trim\").reduce(np.mean).plot();\n",
+    "data.coarsen(lat=5, lon=5, boundary=\"trim\").reduce(np.ptp).plot();\n",
     "```\n",
     ":::\n",
     "::::"
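Reviewer sketch (not part of the patch): the last hunk swaps `np.mean` for `np.ptp` in the coarsen exercise solution. A minimal comparison on a hypothetical 10 x 10 grid, assuming the exercise intends a per-block range (max minus min) rather than a per-block average:

```python
import numpy as np
import xarray as xr

# Hypothetical 10 x 10 grid with values 0..99, standing in for `data`.
data = xr.DataArray(
    np.arange(100.0).reshape(10, 10),
    dims=("lat", "lon"),
    coords={"lat": np.arange(10.0), "lon": np.arange(10.0)},
)

# Group the grid into 5 x 5 blocks; "trim" would drop leftover edge cells.
blocks = data.coarsen(lat=5, lon=5, boundary="trim")

block_mean = blocks.reduce(np.mean)  # previous solution: average per block
block_range = blocks.reduce(np.ptp)  # new solution: max minus min per block

print(block_mean.values)
print(block_range.values)
```

Both reductions return a 2 x 2 array here; only the statistic computed inside each block differs.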