diff --git a/.binder/environment.yml b/.binder/environment.yml
index 7f7afd2d..0f565b35 100644
--- a/.binder/environment.yml
+++ b/.binder/environment.yml
@@ -5,6 +5,7 @@ dependencies:
- python=3.8
- bokeh>=2.0.0
- cartopy
+ - cf_xarray
- dask=2.20.0
- dask-labextension>=2.0.0
- distributed=2.20.0
@@ -16,12 +17,12 @@ dependencies:
- ipywidgets>=7.5
- jupyter-server-proxy
- jupyterlab>=2.0.0
- - matplotlib
+ - matplotlib!=3.3.1
- notebook
- nodejs
- netcdf4
- numpy>=1.18.1
- - pandas>=1.0.1
+ - pandas<1.1
- pydap
- python-graphviz
- scipy>=1.3.0
diff --git a/.gitignore b/.gitignore
index 14b0aee1..f06d7f12 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,6 +3,7 @@ dask-report.html
_build/
*.nc
*.zarr
+dask-worker-space/
# Byte-compiled / optimized / DLL files
__pycache__/
diff --git a/index.rst b/index.rst
index a6ee2e4a..6abd1989 100644
--- a/index.rst
+++ b/index.rst
@@ -3,17 +3,21 @@ Xarray Tutorial
You can run this tutorial in a live session here: |Binder|
-This tutorial is being prepared for SciPy 2020.
-
..
.. |Binder| image:: https://mybinder.org/badge_logo.svg
:target: https://mybinder.org/v2/gh/pydata/xarray-tutorial/master?urlpath=lab
-Contents
---------
.. toctree::
:maxdepth: 1
+ :caption: OceanHackWeek 2020
+
+ oceanhackweek-2020/xarray-oceanhackweek20.ipynb
+
+
+.. toctree::
+ :maxdepth: 1
+ :caption: SciPy 2020
scipy-tutorial/00_overview.ipynb
scipy-tutorial/01_datastructures_and_io.ipynb
diff --git a/oceanhackweek-2020/xarray-oceanhackweek20.ipynb b/oceanhackweek-2020/xarray-oceanhackweek20.ipynb
new file mode 100644
index 00000000..b479e838
--- /dev/null
+++ b/oceanhackweek-2020/xarray-oceanhackweek20.ipynb
@@ -0,0 +1,1258 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "# Xarray in 45 minutes\n",
+ "\n",
+ "In this lesson, we cover the basics of Xarray data structures. By the\n",
+ "end of the lesson, we will be able to:\n",
+ "\n",
+ "- Understand the basic data structures in Xarray.\n",
+ "- Inspect `DataArray` and `Dataset` objects.\n",
+ "- Read and write netCDF files using Xarray.\n",
+ "- Understand that there are many packages that build on top of Xarray.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## A practical example\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:07:17.256417Z",
+ "start_time": "2020-08-19T16:07:15.713910Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import xarray as xr\n",
+ "\n",
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:07:18.118460Z",
+ "start_time": "2020-08-19T16:07:18.017413Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# load tutorial dataset\n",
+ "ds = xr.tutorial.load_dataset(\"air_temperature\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## What's in a Dataset? Many DataArrays\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:07:19.783476Z",
+ "start_time": "2020-08-19T16:07:19.719819Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# dataset repr\n",
+ "ds"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Datasets are dict-like containers of DataArrays, i.e. they map each variable\n",
+ "name to a DataArray.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:07:23.486767Z",
+ "start_time": "2020-08-19T16:07:23.452957Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# pull out \"air\" dataarray with dictionary syntax\n",
+ "ds[\"air\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can save some typing by using the \"attribute\" or \"dot\" notation. This won't\n",
+ "work for variable names that clash with a built-in method name (like `mean` for\n",
+ "example).\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# pull out dataarray using dot notation\n",
+ "ds.air  # same as ds[\"air\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## What's in a DataArray? Data + (a lot of) metadata\n",
+ "\n",
+ "### Named dimensions `.dims`\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:08:27.758670Z",
+ "start_time": "2020-08-19T16:08:27.727504Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "ds.air.dims"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Coordinate variables or \"tick labels\" (`.coords`)\n",
+ "\n",
+ "`.coords` is a simple\n",
+ "[data container](https://xarray.pydata.org/en/stable/data-structures.html#coordinates)\n",
+ "for coordinate variables.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:08:29.769909Z",
+ "start_time": "2020-08-19T16:08:29.749571Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "ds.air.coords"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:09:26.528972Z",
+ "start_time": "2020-08-19T16:09:26.512791Z"
+ }
+ },
+ "source": [
+ "Coordinate variables can be extracted with the same dict-like or dot notation:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# extracting coordinate variables\n",
+ "ds.air.lon"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# extracting coordinate variables from .coords\n",
+ "ds.coords[\"lon\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "It is useful to think of the values in these coordinate variables as axis\n",
+ "\"labels\" such as \"tick labels\" in a figure. These are coordinate locations on a\n",
+ "grid at which you have data.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Arbitrary attributes (`.attrs`)\n",
+ "\n",
+ "`.attrs` is a dictionary that can contain arbitrary Python objects. The only\n",
+ "limitation is that some attribute types cannot be written to a netCDF file.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:09:43.977690Z",
+ "start_time": "2020-08-19T16:09:43.951688Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "ds.air.attrs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:09:45.395462Z",
+ "start_time": "2020-08-19T16:09:45.373786Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# assign your own attribute\n",
+ "ds.air.attrs[\"who_is_awesome\"] = \"xarray\"\n",
+ "ds.air.attrs"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Underlying data (`.data`)\n",
+ "\n",
+ "Xarray structures wrap simpler underlying data structures. In this case, the\n",
+ "underlying data is a NumPy array, which you may already be familiar with.\n",
+ "\n",
+ "This part of xarray is quite extensible, allowing for GPU arrays, sparse arrays,\n",
+ "arrays with units, etc. See the demo at the end.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:09:47.536691Z",
+ "start_time": "2020-08-19T16:09:47.501968Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "ds.air.data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# what is the type of the underlying data\n",
+ "type(ds.air.data)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A NumPy array!\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Review\n",
+ "\n",
+ "Xarray provides two main data structures:\n",
+ "\n",
+ "- DataArrays that wrap underlying data containers (e.g. NumPy arrays) and\n",
+ " contain associated metadata\n",
+ "- Datasets that are dict-like containers of DataArrays\n",
+ "\n",
+ "For more, see\n",
+ "\n",
+ "- https://xarray.pydata.org/en/stable/data-structures.html#dataset\n",
+ "- https://xarray.pydata.org/en/stable/data-structures.html#dataarray\n"
+ ]
+ },
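+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a small illustration of both structures, the sketch below builds a\n",
+ "`DataArray` by hand from a NumPy array and wraps it in a `Dataset`. The values,\n",
+ "coordinates, and the names `air_demo` and `ds_demo` are made up for this example;\n",
+ "they are not part of the tutorial dataset.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import xarray as xr\n",
+ "\n",
+ "# made-up 2 x 3 grid of temperatures in kelvin\n",
+ "demo_values = 280.0 + np.arange(6.0).reshape(2, 3)\n",
+ "\n",
+ "# a DataArray: data + dimension names + coordinate labels + attributes\n",
+ "air_demo = xr.DataArray(\n",
+ "    demo_values,\n",
+ "    dims=('lat', 'lon'),\n",
+ "    coords={'lat': [40.0, 50.0], 'lon': [230.0, 240.0, 250.0]},\n",
+ "    attrs={'units': 'K'},\n",
+ "    name='air',\n",
+ ")\n",
+ "\n",
+ "# a Dataset: a dict-like mapping of variable name to DataArray\n",
+ "ds_demo = xr.Dataset({'air': air_demo})\n",
+ "ds_demo"
+ ]
+ },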
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## Why xarray? Use metadata for fun and ~profit~ papers!\n",
+ "\n",
+ "### Analysis without xarray: `X(`\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:10:25.379707Z",
+ "start_time": "2020-08-19T16:10:25.192751Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# plot the first timestep\n",
+ "lat = ds.air.lat.data # numpy array\n",
+ "lon = ds.air.lon.data # numpy array\n",
+ "temp = ds.air.data # numpy array\n",
+ "plt.figure()\n",
+ "plt.pcolormesh(lon, lat, temp[0, :, :])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:10:28.631492Z",
+ "start_time": "2020-08-19T16:10:28.594725Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "temp.mean(axis=1)  # what did I just do? I can't tell by looking at this line."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Analysis with xarray `=)`\n",
+ "\n",
+ "How readable is this code?\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:10:53.391181Z",
+ "start_time": "2020-08-19T16:10:53.151292Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "ds.air.isel(time=0).plot(x=\"lon\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Use dimension names instead of axis numbers\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:10:49.711769Z",
+ "start_time": "2020-08-19T16:10:49.668537Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "ds.air.mean(\"time\")"
+ ]
+ },
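+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For comparison with `temp.mean(axis=1)` above: `axis=1` is the `lat` dimension\n",
+ "of `ds.air`, whose dimensions are `(time, lat, lon)`. The sketch below uses a small\n",
+ "random stand-in (`rand_demo`, a made-up name) to check that reducing by name and\n",
+ "by axis number give the same numbers, with `get_axis_num` translating between the two.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import xarray as xr\n",
+ "\n",
+ "# random stand-in with the same dimension order as ds.air\n",
+ "rng = np.random.default_rng(seed=0)\n",
+ "rand_demo = xr.DataArray(rng.random((4, 3, 2)), dims=('time', 'lat', 'lon'))\n",
+ "\n",
+ "# axis number of the 'lat' dimension (1 here)\n",
+ "lat_axis = rand_demo.get_axis_num('lat')\n",
+ "\n",
+ "by_name = rand_demo.mean('lat')  # says what is being averaged\n",
+ "by_axis = rand_demo.data.mean(axis=lat_axis)  # same numbers, plain NumPy\n",
+ "np.allclose(by_name.data, by_axis)"
+ ]
+ },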
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## Extracting data or \"indexing\" : `.sel`, `.isel`\n",
+ "\n",
+ "Xarray supports\n",
+ "\n",
+ "- label-based indexing using `.sel`\n",
+ "- position-based indexing using `.isel`\n",
+ "\n",
+ "For more see https://xarray.pydata.org/en/stable/indexing.html\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Label-based indexing\n",
+ "\n",
+ "Xarray inherits its label-based indexing rules from pandas; this means great\n",
+ "support for dates and times!\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# pull out data for all of May 2013\n",
+ "ds.sel(time=\"2013-05\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# demonstrate slicing\n",
+ "ds.sel(time=slice(\"2013-05\", \"2013-07\"))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# demonstrate \"nearest\" indexing\n",
+ "ds.sel(lon=240.2, method=\"nearest\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# \"nearest\" indexing at multiple points\n",
+ "ds.sel(lon=[240.125, 234], lat=[40.3, 50.3], method=\"nearest\")"
+ ]
+ },
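+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Passing lists to `.sel`, as in the previous cell, selects every combination of\n",
+ "the requested `lat` and `lon` values (a 2 x 2 grid), not two separate points. A\n",
+ "minimal sketch of point-by-point selection instead, on a made-up array: give each\n",
+ "indexer as a `DataArray` along a shared new dimension (here arbitrarily named\n",
+ "`points`).\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import xarray as xr\n",
+ "\n",
+ "# made-up 3 x 4 (lat x lon) grid holding 0, 1, ..., 11\n",
+ "grid_demo = xr.DataArray(\n",
+ "    np.arange(12.0).reshape(3, 4),\n",
+ "    dims=('lat', 'lon'),\n",
+ "    coords={'lat': [40.0, 45.0, 50.0], 'lon': [230.0, 235.0, 240.0, 245.0]},\n",
+ ")\n",
+ "\n",
+ "# lists: orthogonal (outer) selection, shape (2, 2)\n",
+ "outer = grid_demo.sel(lat=[40.3, 50.3], lon=[240.125, 234.0], method='nearest')\n",
+ "\n",
+ "# DataArray indexers along a shared dimension: pointwise selection, shape (2,)\n",
+ "pointwise = grid_demo.sel(\n",
+ "    lat=xr.DataArray([40.3, 50.3], dims='points'),\n",
+ "    lon=xr.DataArray([240.125, 234.0], dims='points'),\n",
+ "    method='nearest',\n",
+ ")\n",
+ "print(outer.shape, pointwise.shape)"
+ ]
+ },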
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Position-based indexing\n",
+ "\n",
+ "This is similar to your usual numpy `array[0, 2, 3]` but with the power of named\n",
+ "dimensions!\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# pull out time index 0 and lat index 0\n",
+ "ds.air.isel(time=0, lat=0) # much better than ds.air[0, 0, :]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# demonstrate slicing\n",
+ "ds.air.isel(lat=slice(10))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## Concepts for computation\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Broadcasting: expanding data\n",
+ "\n",
+ "Let's try to calculate the grid cell area associated with the air temperature\n",
+ "data. We may want this to make a proper area-weighted domain average, for example.\n",
+ "\n",
+ "A very approximate formula is\n",
+ "\n",
+ "\\begin{equation} Δlat \\times Δlon \\times \\cos(\\text{latitude}) \\end{equation}\n",
+ "\n",
+ "assuming that $Δlat \\approx Δlon \\approx 111$ km, roughly the length of one degree\n",
+ "of latitude (and of one degree of longitude at the equator).\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dlon = np.cos(ds.air.lat * np.pi / 180) * 111e3\n",
+ "dlon"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dlat = 111e3 * xr.ones_like(ds.air.lon)\n",
+ "dlat"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "cell_area = dlon * dlat\n",
+ "cell_area"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The result has two dimensions because xarray realizes that dimensions `lon` and\n",
+ "`lat` are different, so it automatically \"broadcasts\" to get a 2D result. See the\n",
+ "last row in this image from Jake VanderPlas' _Python Data Science Handbook_\n",
+ "\n",
+ "\n",
+ "\n",
+ "Because xarray knows about dimension names, we avoid having to create unnecessary\n",
+ "size-1 dimensions using `np.newaxis` or `.reshape`. For more, see\n",
+ "https://xarray.pydata.org/en/stable/computation.html#broadcasting-by-dimension-name\n"
+ ]
+ },
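+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For comparison, here is a sketch of the same area calculation done both ways on\n",
+ "a small made-up grid (the `toy_*` names are hypothetical): xarray broadcasts by\n",
+ "dimension name, while the plain NumPy version needs explicit `np.newaxis`\n",
+ "insertions to produce the same 2D result.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import xarray as xr\n",
+ "\n",
+ "# made-up 1D coordinate variables\n",
+ "toy_lat = xr.DataArray([0.0, 30.0, 60.0], dims='lat', coords={'lat': [0.0, 30.0, 60.0]})\n",
+ "toy_lon = xr.DataArray([200.0, 210.0], dims='lon', coords={'lon': [200.0, 210.0]})\n",
+ "\n",
+ "toy_dlon = np.cos(np.deg2rad(toy_lat)) * 111e3  # dims: ('lat',)\n",
+ "toy_dlat = 111e3 * xr.ones_like(toy_lon)  # dims: ('lon',)\n",
+ "\n",
+ "# xarray: broadcast by name, giving dims ('lat', 'lon')\n",
+ "toy_area = toy_dlon * toy_dlat\n",
+ "\n",
+ "# NumPy: insert size-1 axes by hand to get the same (3, 2) array\n",
+ "toy_area_np = toy_dlon.data[:, np.newaxis] * toy_dlat.data[np.newaxis, :]\n",
+ "np.allclose(toy_area.data, toy_area_np)"
+ ]
+ },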
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "### Alignment: putting data on the same grid\n",
+ "\n",
+ "When doing arithmetic operations, xarray automatically \"aligns\" the inputs, i.e.\n",
+ "puts the data on the same grid. In this case `cell_area` and `ds.air` share the\n",
+ "same lat, lon points, so things are multiplied as you would expect.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "(cell_area * ds.air.isel(time=1))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now let's make `cell_area` unaligned, i.e. change its coordinate labels:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# make a copy of cell_area\n",
+ "# then add 1e-5 to lat\n",
+ "cell_area_bad = cell_area.copy(deep=True)\n",
+ "cell_area_bad[\"lat\"] = cell_area.lat + 1e-5\n",
+ "cell_area_bad"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "cell_area_bad * ds.air.isel(time=1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Tip:** If you notice extra NaNs or missing points after an xarray computation,\n",
+ "it usually means that your xarray coordinates were not aligned _exactly_.\n",
+ "\n",
+ "For more, see\n",
+ "https://xarray.pydata.org/en/stable/computation.html#automatic-alignment\n"
+ ]
+ },
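+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A sketch of two ways to catch or repair such a mismatch, on a made-up 1D array:\n",
+ "`xr.align(..., join='exact')` raises an error instead of silently dropping points,\n",
+ "and `reindex_like` with `method='nearest'` and a `tolerance` snaps shifted labels\n",
+ "back onto a reference grid. The tolerance used here is an arbitrary choice; pick\n",
+ "one that is small compared to your grid spacing.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import xarray as xr\n",
+ "\n",
+ "ref = xr.DataArray([1.0, 2.0, 3.0], dims='lat', coords={'lat': [10.0, 20.0, 30.0]})\n",
+ "# same data, but every label shifted by 1e-5 (as in cell_area_bad)\n",
+ "shifted = ref.assign_coords(lat=ref.lat.data + 1e-5)\n",
+ "\n",
+ "# arithmetic uses an inner join by default: no label matches exactly,\n",
+ "# so the product has no points left along lat\n",
+ "empty = ref * shifted\n",
+ "print('points left:', empty.sizes['lat'])\n",
+ "\n",
+ "# join='exact' raises instead of silently dropping points\n",
+ "try:\n",
+ "    xr.align(ref, shifted, join='exact')\n",
+ "except ValueError as err:\n",
+ "    print('not aligned:', err)\n",
+ "\n",
+ "# snap the shifted labels back onto ref's grid, within a tolerance\n",
+ "repaired = shifted.reindex_like(ref, method='nearest', tolerance=1e-3)\n",
+ "ref * repaired"
+ ]
+ },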
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## High level computation: `groupby`, `resample`, `rolling`, `coarsen`, `weighted`\n",
+ "\n",
+ "Xarray has some very useful high level objects that let you do common\n",
+ "computations:\n",
+ "\n",
+ "1. `groupby` :\n",
+ " [Bin data into groups and reduce](https://xarray.pydata.org/en/stable/groupby.html)\n",
+ "1. `resample` :\n",
+ " [Groupby specialized for time axes. Either downsample or upsample your data.](https://xarray.pydata.org/en/stable/time-series.html#resampling-and-grouped-operations)\n",
+ "1. `rolling` :\n",
+ " [Operate on rolling windows of your data e.g. running mean](https://xarray.pydata.org/en/stable/computation.html#rolling-window-operations)\n",
+ "1. `coarsen` :\n",
+ " [Downsample your data](https://xarray.pydata.org/en/stable/computation.html#coarsen-large-arrays)\n",
+ "1. `weighted` :\n",
+ " [Weight your data before reducing](https://xarray.pydata.org/en/stable/computation.html#weighted-array-reductions)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### groupby\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# seasonal groups\n",
+ "ds.groupby(\"time.season\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:14:03.590082Z",
+ "start_time": "2020-08-19T16:14:03.507391Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# make a seasonal mean\n",
+ "seasonal_mean = ds.groupby(\"time.season\").mean()\n",
+ "seasonal_mean"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The seasons are out of order (they are sorted alphabetically). This is a common\n",
+ "annoyance. One solution is to use `.reindex`:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:15:07.345540Z",
+ "start_time": "2020-08-19T16:15:07.304690Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "seasonal_mean = seasonal_mean.reindex(season=[\"DJF\", \"MAM\", \"JJA\", \"SON\"])\n",
+ "seasonal_mean"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### resample\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# resample to monthly frequency\n",
+ "ds.resample(time=\"M\").mean()"
+ ]
+ },
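+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "`rolling` and `coarsen` from the list above are not shown on `ds` in this lesson.\n",
+ "Here is a minimal sketch on a made-up daily series (`daily_demo`): `rolling`\n",
+ "computes a running mean over a moving window, while `coarsen` averages\n",
+ "non-overlapping blocks and so reduces the number of time steps.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import xarray as xr\n",
+ "\n",
+ "# made-up series: 8 daily values 0, 1, ..., 7\n",
+ "days = pd.date_range('2013-01-01', periods=8, freq='D')\n",
+ "daily_demo = xr.DataArray(np.arange(8.0), dims='time', coords={'time': days})\n",
+ "\n",
+ "# running 4-day mean: same length; the first 3 values are NaN (incomplete window)\n",
+ "running = daily_demo.rolling(time=4).mean()\n",
+ "\n",
+ "# non-overlapping 4-day blocks: 8 values become 2\n",
+ "blocks = daily_demo.coarsen(time=4).mean()\n",
+ "print(running.data)\n",
+ "print(blocks.data)"
+ ]
+ },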
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### weighted\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# weight by cell_area and take mean over (time, lon)\n",
+ "ds.weighted(cell_area).mean([\"lon\", \"time\"]).air.plot()"
+ ]
+ },
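+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "What `weighted(...).mean` computes can be checked by hand: the weighted sum\n",
+ "divided by the sum of the weights. A sketch with two made-up latitudes weighted by\n",
+ "$\\cos(\\text{latitude})$, as in the cell-area formula (the names `temp_demo` and\n",
+ "`lat_weights` are hypothetical):\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import xarray as xr\n",
+ "\n",
+ "# made-up temperatures at two latitudes\n",
+ "temp_demo = xr.DataArray([300.0, 270.0], dims='lat', coords={'lat': [0.0, 60.0]})\n",
+ "lat_weights = np.cos(np.deg2rad(temp_demo.lat))  # about 1.0 and 0.5\n",
+ "\n",
+ "weighted_mean = temp_demo.weighted(lat_weights).mean('lat')\n",
+ "manual_mean = (temp_demo * lat_weights).sum('lat') / lat_weights.sum('lat')\n",
+ "\n",
+ "# expected: (300 * 1.0 + 270 * 0.5) / 1.5 = 290\n",
+ "print(float(weighted_mean), float(manual_mean))"
+ ]
+ },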
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## Visualization: `.plot`\n",
+ "\n",
+ "For more see https://xarray.pydata.org/en/stable/plotting.html and\n",
+ "https://xarray.pydata.org/en/stable/examples/visualization_gallery.html\n",
+ "\n",
+ "We have seen very simple plots earlier. Xarray has some support for visualizing\n",
+ "3D and 4D datasets by presenting multiple facets (or panels or subplots) showing\n",
+ "variations across rows and/or columns.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:14:06.090565Z",
+ "start_time": "2020-08-19T16:14:05.580534Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# facet the seasonal_mean\n",
+ "seasonal_mean.air.plot(col=\"season\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:15:13.777572Z",
+ "start_time": "2020-08-19T16:15:13.136044Z"
+ },
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "# contours\n",
+ "seasonal_mean.air.plot.contour(col=\"season\", levels=20, add_colorbar=True)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:15:15.998115Z",
+ "start_time": "2020-08-19T16:15:15.805569Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# line plots too? wut\n",
+ "seasonal_mean.air.mean(\"lon\").plot.line(hue=\"season\", y=\"lat\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## Reading and writing to disk\n",
+ "\n",
+ "Xarray supports many disk formats. Below is a small example using netCDF. For\n",
+ "more see https://xarray.pydata.org/en/stable/io.html\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# write ds to netCDF\n",
+ "ds.to_netcdf(\"my-example-dataset.nc\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# read from disk\n",
+ "fromdisk = xr.open_dataset(\"my-example-dataset.nc\")\n",
+ "fromdisk"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# check that the two are identical\n",
+ "ds.identical(fromdisk)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Tip:** A common use case is reading a dataset that is split across many netCDF\n",
+ "files. See\n",
+ "https://xarray.pydata.org/en/stable/io.html#reading-multi-file-datasets for how\n",
+ "to handle that.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## More information\n",
+ "\n",
+ "1. A description of common terms used in the xarray documentation:\n",
+ " https://xarray.pydata.org/en/stable/terminology.html\n",
+ "1. For information on how to create a DataArray from an existing numpy array:\n",
+ " https://xarray.pydata.org/en/stable/data-structures.html#creating-a-dataarray\n",
+ "1. Answers to common questions on \"how to do X\" are here:\n",
+ " https://xarray.pydata.org/en/stable/howdoi.html\n",
+ "1. Our more extensive SciPy 2020 tutorial material:\n",
+ " https://xarray-contrib.github.io/xarray-tutorial/\n",
+ "1. Ryan Abernathey has a book on data analysis with a chapter on Xarray:\n",
+ " https://earth-env-data-science.github.io/lectures/xarray/xarray_intro.html\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## The scientific python / pangeo ecosystem: demo\n",
+ "\n",
+ "Xarray ties into the larger scientific Python ecosystem, and in turn many\n",
+ "packages build on top of xarray. A long list of such packages is here:\n",
+ "https://xarray.pydata.org/en/stable/related-projects.html.\n",
+ "\n",
+ "Now we will demonstrate some cool features.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Pandas: tabular data structures\n",
+ "\n",
+ "You can easily convert between xarray and pandas structures:\n",
+ "https://pandas.pydata.org/\n",
+ "\n",
+ "This allows you to conveniently use the extensive pandas ecosystem of packages\n",
+ "(like seaborn) for your work.\n",
+ "\n",
+ "See https://xarray.pydata.org/en/stable/pandas.html\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:16:19.439068Z",
+ "start_time": "2020-08-19T16:16:19.371165Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# convert to pandas dataframe\n",
+ "df = ds.isel(time=slice(10)).to_dataframe()\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:16:20.986334Z",
+ "start_time": "2020-08-19T16:16:20.942823Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# convert dataframe to xarray\n",
+ "df.to_xarray()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### xarray can wrap other array types, not just numpy\n",
+ "\n",
+ "\n",
+ "\n",
+ "**dask** : parallel arrays https://xarray.pydata.org/en/stable/dask.html &\n",
+ "https://docs.dask.org/en/latest/array.html\n",
+ "\n",
+ "\n",
+ "\n",
+ "**pydata/sparse** : sparse arrays http://sparse.pydata.org\n",
+ "\n",
+ "\n",
+ "\n",
+ "**cupy** : GPU arrays http://cupy.chainer.org\n",
+ "\n",
+ "\n",
+ "\n",
+ "**pint** : unit-aware computations https://pint.readthedocs.org &\n",
+ "https://github.com/xarray-contrib/pint-xarray\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Xarray + dask\n",
+ "\n",
+ "Dask cuts up NumPy arrays into blocks and parallelizes your analysis code across\n",
+ "these blocks.\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:16:28.269631Z",
+ "start_time": "2020-08-19T16:16:27.141306Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# make dask cluster; this is for demo purposes\n",
+ "import dask\n",
+ "import distributed\n",
+ "\n",
+ "cluster = distributed.LocalCluster()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:16:28.912370Z",
+ "start_time": "2020-08-19T16:16:28.843239Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "client = distributed.Client(cluster)\n",
+ "client"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:16:30.205686Z",
+ "start_time": "2020-08-19T16:16:30.140967Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# demonstrate dask dataset\n",
+ "dasky = xr.tutorial.open_dataset(\n",
+ " \"air_temperature\", chunks={\"time\": 10}, # 10 time steps in each block\n",
+ ")\n",
+ "\n",
+ "dasky.air"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "All computations with dask-backed xarray objects are lazy, allowing you to build\n",
+ "up a complicated chain of analysis steps quickly.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:16:32.606881Z",
+ "start_time": "2020-08-19T16:16:32.561338Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# demonstrate lazy mean\n",
+ "dasky.air.mean(\"lat\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To get concrete values, call `.compute` or `.load`:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:16:38.409452Z",
+ "start_time": "2020-08-19T16:16:36.914676Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# \"compute\" the mean\n",
+ "dasky.air.mean(\"lat\").compute()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### holoviews: JavaScript interactive plots\n",
+ "\n",
+ "The `hvplot` package is an easy way to access\n",
+ "[holoviews](http://holoviews.org/) functionality. It attaches itself to all\n",
+ "xarray objects under the `.hvplot` namespace, so instead of using `.plot`, use\n",
+ "`.hvplot`.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:17:26.384153Z",
+ "start_time": "2020-08-19T16:17:24.365996Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "import hvplot.xarray\n",
+ "\n",
+ "ds.air.hvplot(groupby=\"time\", clim=(270, 300))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Try the slider!\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### cf_xarray : use even more metadata for even more fun and ~profit~ papers\n",
+ "\n",
+ "[cf_xarray](https://cf-xarray.readthedocs.io/) is a new project that tries to\n",
+ "let you make use of other CF attributes that xarray ignores. It attaches itself\n",
+ "to all xarray objects under the `.cf` namespace.\n",
+ "\n",
+ "Where xarray allows you to specify dimension names for analysis, `cf_xarray`\n",
+ "lets you specify logical names like `\"latitude\"` or `\"longitude\"` instead, as\n",
+ "long as the appropriate CF attributes are set.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:17:43.015284Z",
+ "start_time": "2020-08-19T16:17:42.981131Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "import cf_xarray"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:18:31.279888Z",
+ "start_time": "2020-08-19T16:18:31.245785Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# describe cf attributes in dataset\n",
+ "ds.air.cf.describe()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The following `mean` operation will work with any dataset that has the\n",
+ "attributes needed to detect the \"latitude\" variable (e.g.\n",
+ "`units: \"degrees_north\"` or `standard_name: \"latitude\"`).\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:18:33.806279Z",
+ "start_time": "2020-08-19T16:18:33.738629Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# demonstrate equivalent of .mean(\"lat\")\n",
+ "ds.air.cf.mean(\"latitude\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2020-08-19T16:19:27.628895Z",
+ "start_time": "2020-08-19T16:19:27.580971Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# demonstrate indexing\n",
+ "ds.air.cf.sel(longitude=242.5, method=\"nearest\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Other cool packages\n",
+ "\n",
+ "- [xgcm](https://xgcm.readthedocs.io/) : grid-aware operations with xarray\n",
+ " objects\n",
+ "- [xrft](https://xrft.readthedocs.io/) : Fourier transforms with xarray\n",
+ "- [xclim](https://xclim.readthedocs.io/) : calculating climate indices with\n",
+ " xarray objects\n",
+ "- [intake-xarray](https://intake-xarray.readthedocs.io/) : forget about file\n",
+ " paths\n",
+ "- [rioxarray](https://corteva.github.io/rioxarray/stable/index.html) : raster\n",
+ " files and xarray\n",
+ "- [xesmf](https://xesmf.readthedocs.io/) : regrid using ESMF\n",
+ "- [MetPy](https://unidata.github.io/MetPy/latest/index.html) : tools for working\n",
+ " with weather data\n",
+ "\n",
+ "More here: https://xarray.pydata.org/en/stable/related-projects.html\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.8"
+ },
+ "toc": {
+ "base_numbering": 1,
+ "nav_menu": {},
+ "number_sections": true,
+ "sideBar": true,
+ "skip_h1_title": false,
+ "title_cell": "Table of Contents",
+ "title_sidebar": "Contents",
+ "toc_cell": true,
+ "toc_position": {},
+ "toc_section_display": true,
+ "toc_window_display": true
+ },
+ "widgets": {
+ "application/vnd.jupyter.widget-state+json": {
+ "state": {},
+ "version_major": 2,
+ "version_minor": 0
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}