diff --git a/.binder/environment.yml b/.binder/environment.yml index 7f7afd2d..0f565b35 100644 --- a/.binder/environment.yml +++ b/.binder/environment.yml @@ -5,6 +5,7 @@ dependencies: - python=3.8 - bokeh>=2.0.0 - cartopy + - cf_xarray - dask=2.20.0 - dask-labextension>=2.0.0 - distributed=2.20.0 @@ -16,12 +17,12 @@ dependencies: - ipywidgets>=7.5 - jupyter-server-proxy - jupyterlab>=2.0.0 - - matplotlib + - matplotlib!=3.3.1 - notebook - nodejs - netcdf4 - numpy>=1.18.1 - - pandas>=1.0.1 + - pandas<1.1 - pydap - python-graphviz - scipy>=1.3.0 diff --git a/.gitignore b/.gitignore index 14b0aee1..f06d7f12 100644 --- a/.gitignore +++ b/.gitignore @@ -3,6 +3,7 @@ dask-report.html _build/ *.nc *.zarr +dask-worker-space/ # Byte-compiled / optimized / DLL files __pycache__/ diff --git a/index.rst b/index.rst index a6ee2e4a..6abd1989 100644 --- a/index.rst +++ b/index.rst @@ -3,17 +3,21 @@ Xarray Tutorial You can run this tutorial in a live session here: |Binder| -This tutorial is being prepared for SciPy 2020. - .. .. |Binder| image:: https://mybinder.org/badge_logo.svg :target: https://mybinder.org/v2/gh/pydata/xarray-tutorial/master?urlpath=lab -Contents --------- .. toctree:: :maxdepth: 1 + :caption: OceanHackWeek 2020 + + oceanhackweek-2020/xarray-oceanhackweek20.ipynb + + +.. toctree:: + :maxdepth: 1 + :caption: SciPy 2020 scipy-tutorial/00_overview.ipynb scipy-tutorial/01_datastructures_and_io.ipynb diff --git a/oceanhackweek-2020/xarray-oceanhackweek20.ipynb b/oceanhackweek-2020/xarray-oceanhackweek20.ipynb new file mode 100644 index 00000000..b479e838 --- /dev/null +++ b/oceanhackweek-2020/xarray-oceanhackweek20.ipynb @@ -0,0 +1,1258 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "# Xarray in 45 minutes\n", + "\n", + "In this lesson, we discuss cover the basics of Xarray data structures. By the\n", + "end of the lesson, we will be able to:\n", + "\n", + "- Understand the basic data structures in Xarray\n", + "- Inspect `DataArray` and `Dataset` objects.\n", + "- Read and write netCDF files using Xarray.\n", + "- Understand that there are many packages that build on top of xarray\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## A practical example\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:07:17.256417Z", + "start_time": "2020-08-19T16:07:15.713910Z" + } + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import xarray as xr\n", + "\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:07:18.118460Z", + "start_time": "2020-08-19T16:07:18.017413Z" + } + }, + "outputs": [], + "source": [ + "# load tutorial dataset\n", + "ds = xr.tutorial.load_dataset(\"air_temperature\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What's in a dataset? many DataArrays\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:07:19.783476Z", + "start_time": "2020-08-19T16:07:19.719819Z" + } + }, + "outputs": [], + "source": [ + "# dataset repr\n", + "ds" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Datasets are dict-like containers of DataArrays i.e. they are a mapping of\n", + "variable name to DataArray.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:07:23.486767Z", + "start_time": "2020-08-19T16:07:23.452957Z" + } + }, + "outputs": [], + "source": [ + "# pull out \"air\" dataarray with dictionary syntax\n", + "ds[\"air\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can save some typing by using the \"attribute\" or \"dot\" notation. This won't\n", + "work for variable names that clash with a built-in method name (like `mean` for\n", + "example).\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# pull out dataarray using dot notation\n", + "ds.air ## same as ds[\"air\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What's in a DataArray? data + (a lot of) metadata\n", + "\n", + "### Named dimensions `.dims`\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:08:27.758670Z", + "start_time": "2020-08-19T16:08:27.727504Z" + } + }, + "outputs": [], + "source": [ + "ds.air.dims" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Coordinate variables or \"tick labels\" (`.coords`)\n", + "\n", + "`.coords` is a simple\n", + "[data container](https://xarray.pydata.org/en/stable/data-structures.html#coordinates)\n", + "for coordinate variables.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:08:29.769909Z", + "start_time": "2020-08-19T16:08:29.749571Z" + } + }, + "outputs": [], + "source": [ + "ds.air.coords" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:09:26.528972Z", + "start_time": "2020-08-19T16:09:26.512791Z" + } + }, + "source": [ + "Coordinates objects support similar indexing notation\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# extracting coordinate variables\n", + "ds.air.lon" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# extracting coorindate variables from .coords\n", + "ds.coords[\"lon\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is useful to think of the values in these coordinate variables as axis\n", + "\"labels\" such as \"tick labels\" in a figure. These are coordinate locations on a\n", + "grid at which you have data.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Arbitrary attributes (`.attrs`)\n", + "\n", + "`.attrs` is a dictionary that can contain arbitrary python objects. Your only\n", + "limitation is that some attributes may not be writeable to a netCDF file\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:09:43.977690Z", + "start_time": "2020-08-19T16:09:43.951688Z" + } + }, + "outputs": [], + "source": [ + "ds.air.attrs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:09:45.395462Z", + "start_time": "2020-08-19T16:09:45.373786Z" + } + }, + "outputs": [], + "source": [ + "# assign your own attribute\n", + "ds.air.attrs[\"who_is_awesome\"] = \"xarray\"\n", + "ds.air.attrs" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Underlying data (`.data`)\n", + "\n", + "Xarray structures wrap underlying simpler data structures. In this case, the\n", + "underlying data is a numpy array which you may be familiar with.\n", + "\n", + "This part of xarray is quite extensible allowing for GPU arrays, sparse arrays,\n", + "arrays with units etc. See the demo at the end.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:09:47.536691Z", + "start_time": "2020-08-19T16:09:47.501968Z" + } + }, + "outputs": [], + "source": [ + "ds.air.data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# what is the type of the underlying data\n", + "type(ds.air.data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A numpy array!\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Review\n", + "\n", + "Xarray provides two main data structures\n", + "\n", + "- DataArrays that wrap underlying data containers (e.g. numpy arrays) and\n", + " contain associated metadata\n", + "- Datasets that are dict-like containers of DataArrays\n", + "\n", + "For more see\n", + "\n", + "- https://xarray.pydata.org/en/stable/data-structures.html#dataset\n", + "- https://xarray.pydata.org/en/stable/data-structures.html#dataarray\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Why xarray? Use metadata for fun and ~profit~ papers!\n", + "\n", + "### Analysis without xarray: `X(`\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:10:25.379707Z", + "start_time": "2020-08-19T16:10:25.192751Z" + } + }, + "outputs": [], + "source": [ + "# plot the first timestep\n", + "lat = ds.air.lat.data # numpy array\n", + "lon = ds.air.lon.data # numpy array\n", + "temp = ds.air.data # numpy array\n", + "plt.figure()\n", + "plt.pcolormesh(lon, lat, temp[0, :, :])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:10:28.631492Z", + "start_time": "2020-08-19T16:10:28.594725Z" + } + }, + "outputs": [], + "source": [ + "temp.mean(axis=1) ## what did I just do? I can't tell by looking at this line." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Analysis with xarray `=)`\n", + "\n", + "How readable is this code?\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:10:53.391181Z", + "start_time": "2020-08-19T16:10:53.151292Z" + } + }, + "outputs": [], + "source": [ + "ds.air.isel(time=1).plot(x=\"lon\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use dimension names instead of axis numbers\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:10:49.711769Z", + "start_time": "2020-08-19T16:10:49.668537Z" + } + }, + "outputs": [], + "source": [ + "ds.air.mean(\"time\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Extracting data or \"indexing\" : `.sel`, `.isel`\n", + "\n", + "Xarray supports\n", + "\n", + "- label-based indexing using `.sel`\n", + "- position-based indexing using `.isel`\n", + "\n", + "For more see https://xarray.pydata.org/en/stable/indexing.html\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Label-based indexing\n", + "\n", + "Xarray inherits its label-based indexing rules from pandas; this means great\n", + "support for dates and times!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# pull out data for all of 2013-May\n", + "ds.sel(time=\"2013-05\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# demonstrate slicing\n", + "ds.sel(time=slice(\"2013-05\", \"2013-07\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# demonstrate \"nearest\" indexing\n", + "ds.sel(lon=240.2, method=\"nearest\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# \"nearest indexing at multiple points\"\n", + "ds.sel(lon=[240.125, 234], lat=[40.3, 50.3], method=\"nearest\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Position-based indexing\n", + "\n", + "This is similar to your usual numpy `array[0, 2, 3]` but with the power of named\n", + "dimensions!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# pull out time index 0 and lat index 0\n", + "ds.air.isel(time=0, lat=0) # much better than ds.air[0, 0, :]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# demonstrate slicing\n", + "ds.air.isel(lat=slice(10))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Concepts for computation\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Broadcasting: expanding data\n", + "\n", + "Let's try to calculate grid cell area associated with the air temperature data.\n", + "We may want this to make a proper area-weighted domain-average for example\n", + "\n", + "A very approximate formula is\n", + "\n", + "\\begin{equation} Δlat \\times Δlon \\times \\cos(\\text{latitude}) \\end{equation}\n", + "\n", + "assuming that $Δlon$ = 111km and $Δlat$ = 111km\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dlon = np.cos(ds.air.lat * np.pi / 180) * 111e3\n", + "dlon" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dlat = 111e3 * xr.ones_like(ds.air.lon)\n", + "dlat" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cell_area = dlon * dlat\n", + "cell_area" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The result has two dimensions because xarray realizes that dimensions `lon` and\n", + "`lat` are different so it automatically \"broadcasts\" to get a 2D result. See the\n", + "last row in this image from _Jake VanderPlas Python Data Science Handbook_\n", + "\n", + "\n", + "\n", + "Because xarray knows about dimension names we avoid having to create unnecessary\n", + "size-1 dimensions using `np.newaxis` or `.reshape`. For more, see\n", + "https://xarray.pydata.org/en/stable/computation.html#broadcasting-by-dimension-name\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "### Alignment: putting data on the same grid\n", + "\n", + "When doing arithmetic operations xarray automatically \"aligns\" i.e. puts the\n", + "data on the same grid. In this case `cell_area` and `ds.air` are at the same\n", + "lat, lon points so things are multiplied as you would expect\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "(cell_area * ds.air.isel(time=1))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now lets make `cell_area` unaligned i.e. change the coordinate labels\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# make a copy of cell_area\n", + "# then add 1e-5 to lat\n", + "cell_area_bad = cell_area.copy(deep=True)\n", + "cell_area_bad[\"lat\"] = cell_area.lat + 1e-5\n", + "cell_area_bad" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cell_area_bad * ds.air.isel(time=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Tip:** If you notice extra NaNs or missing points after xarray computation, it\n", + "means that your xarray coordinates were not aligned _exactly_.\n", + "\n", + "For more, see\n", + "https://xarray.pydata.org/en/stable/computation.html#automatic-alignment\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## High level computation: `groupby`, `resample`, `rolling`, `coarsen`, `weighted`\n", + "\n", + "Xarray has some very useful high level objects that let you do common\n", + "computations:\n", + "\n", + "1. `groupby` :\n", + " [Bin data in to groups and reduce](https://xarray.pydata.org/en/stable/groupby.html)\n", + "1. `resample` :\n", + " [Groupby specialized for time axes. Either downsample or upsample your data.](https://xarray.pydata.org/en/stable/time-series.html#resampling-and-grouped-operations)\n", + "1. `rolling` :\n", + " [Operate on rolling windows of your data e.g. running mean](https://xarray.pydata.org/en/stable/computation.html#rolling-window-operations)\n", + "1. `coarsen` :\n", + " [Downsample your data](https://xarray.pydata.org/en/stable/computation.html#coarsen-large-arrays)\n", + "1. `weighted` :\n", + " [Weight your data before reducing](https://xarray.pydata.org/en/stable/computation.html#weighted-array-reductions)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### groupby\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# seasonal groups\n", + "ds.groupby(\"time.season\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:14:03.590082Z", + "start_time": "2020-08-19T16:14:03.507391Z" + } + }, + "outputs": [], + "source": [ + "# make a seasonal mean\n", + "seasonal_mean = ds.groupby(\"time.season\").mean()\n", + "seasonal_mean" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The seasons are out of order (they are alphabetically sorted). This is a common\n", + "annoyance. The solution is to use `.reindex`\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:15:07.345540Z", + "start_time": "2020-08-19T16:15:07.304690Z" + } + }, + "outputs": [], + "source": [ + "seasonal_mean = seasonal_mean.reindex(season=[\"DJF\", \"MAM\", \"JJA\", \"SON\"])\n", + "seasonal_mean" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### resample\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# resample to monthly frequency\n", + "ds.resample(time=\"M\").mean()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### weighted\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# weight by cell_area and take mean over (time, lon)\n", + "ds.weighted(cell_area).mean([\"lon\", \"time\"]).air.plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Visualization: `.plot`\n", + "\n", + "For more see https://xarray.pydata.org/en/stable/plotting.html and\n", + "https://xarray.pydata.org/en/stable/examples/visualization_gallery.html\n", + "\n", + "We have seen very simple plots earlier. Xarray has some support for visualizing\n", + "3D and 4D datasets by presenting multiple facets (or panels or subplots) showing\n", + "variations across rows and/or columns.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:14:06.090565Z", + "start_time": "2020-08-19T16:14:05.580534Z" + } + }, + "outputs": [], + "source": [ + "# facet the seasonal_mean\n", + "seasonal_mean.air.plot(col=\"season\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:15:13.777572Z", + "start_time": "2020-08-19T16:15:13.136044Z" + }, + "scrolled": true + }, + "outputs": [], + "source": [ + "# contours\n", + "seasonal_mean.air.plot.contour(col=\"season\", levels=20, add_colorbar=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:15:15.998115Z", + "start_time": "2020-08-19T16:15:15.805569Z" + } + }, + "outputs": [], + "source": [ + "# line plots too? wut\n", + "seasonal_mean.air.mean(\"lon\").plot.line(hue=\"season\", y=\"lat\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Reading and writing to disk\n", + "\n", + "Xarray supports many disk formats. Below is a small example using netCDF. For\n", + "more see https://xarray.pydata.org/en/stable/io.html\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# write ds to netCDF\n", + "ds.to_netcdf(\"my-example-dataset.nc\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# read from disk\n", + "fromdisk = xr.open_dataset(\"my-example-dataset.nc\")\n", + "fromdisk" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# check that the two are identical\n", + "ds.identical(fromdisk)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Tip:** A common use case to read datasets that are a collection of many netCDF\n", + "files. See\n", + "https://xarray.pydata.org/en/stable/io.html#reading-multi-file-datasets for how\n", + "to handle that\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## More information\n", + "\n", + "1. A description of common terms used in the xarray documentation:\n", + " https://xarray.pydata.org/en/stable/terminology.html\n", + "1. For information on how to create a DataArray from an existing numpy array:\n", + " https://xarray.pydata.org/en/stable/data-structures.html#creating-a-dataarray\n", + "1. Answers to common questions on \"how to do X\" are here:\n", + " https://xarray.pydata.org/en/stable/howdoi.html\n", + "1. Our more extensive Scipy 2020 tutorial material:\n", + " https://xarray-contrib.github.io/xarray-tutorial/\n", + "1. Ryan Abernathey has a book on data analysis with a chapter on Xarray:\n", + " https://earth-env-data-science.github.io/lectures/xarray/xarray_intro.html\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## The scientific python / pangeo ecosystem: demo\n", + "\n", + "Xarray ties in to the larger scientific python ecosystem and in turn many\n", + "packages build on top of xarray. A long list of such packages is here:\n", + "https://xarray.pydata.org/en/stable/related-projects.html.\n", + "\n", + "Now we will demonstrate some cool features.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Pandas: tabular data structures\n", + "\n", + "You can easily convert between xarray and pandas structures:\n", + "https://pandas.pydata.org/\n", + "\n", + "This allows you to conveniently use the extensive pandas ecosystem of packages\n", + "(like seaborn) for your work.\n", + "\n", + "See https://xarray.pydata.org/en/stable/pandas.html\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:16:19.439068Z", + "start_time": "2020-08-19T16:16:19.371165Z" + } + }, + "outputs": [], + "source": [ + "# convert to pandas dataframe\n", + "df = ds.isel(time=slice(10)).to_dataframe()\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:16:20.986334Z", + "start_time": "2020-08-19T16:16:20.942823Z" + } + }, + "outputs": [], + "source": [ + "# convert dataframe to xarray\n", + "df.to_xarray()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### xarray can wrap other array types, not just numpy\n", + "\n", + "\n", + "\n", + "**dask** : parallel arrays https://xarray.pydata.org/en/stable/dask.html &\n", + "https://docs.dask.org/en/latest/array.html\n", + "\n", + "\n", + "\n", + "**pydata/sparse** : sparse arrays http://sparse.pydata.org\n", + "\n", + "\n", + "\n", + "**cupy** : GPU arrays http://cupy.chainer.org\n", + "\n", + "\n", + "\n", + "**pint** : unit-aware computations https://pint.readthedocs.org &\n", + "https://github.com/xarray-contrib/pint-xarray\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Xarray + dask\n", + "\n", + "Dask cuts up NumPy arrays into blocks and parallelizes your analysis code across\n", + "these blocks\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:16:28.269631Z", + "start_time": "2020-08-19T16:16:27.141306Z" + } + }, + "outputs": [], + "source": [ + "# make dask cluster; this is for demo purposes\n", + "import dask\n", + "import distributed\n", + "\n", + "cluster = distributed.LocalCluster()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:16:28.912370Z", + "start_time": "2020-08-19T16:16:28.843239Z" + } + }, + "outputs": [], + "source": [ + "client = distributed.Client(cluster)\n", + "client" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:16:30.205686Z", + "start_time": "2020-08-19T16:16:30.140967Z" + } + }, + "outputs": [], + "source": [ + "# demonstrate dask dataset\n", + "dasky = xr.tutorial.open_dataset(\n", + " \"air_temperature\", chunks={\"time\": 10}, # 10 time steps in each block\n", + ")\n", + "\n", + "dasky.air" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All computations with dask-backed xarray objects are lazy, allowing you to build\n", + "up a complicated chain of analysis steps quickly\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:16:32.606881Z", + "start_time": "2020-08-19T16:16:32.561338Z" + } + }, + "outputs": [], + "source": [ + "# demonstrate lazy mean\n", + "dasky.air.mean(\"lat\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To get concrete values, call `.compute` or `.load`\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:16:38.409452Z", + "start_time": "2020-08-19T16:16:36.914676Z" + } + }, + "outputs": [], + "source": [ + "# \"compute\" the mean\n", + "dasky.air.mean(\"lat\").compute()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### holoviews: javascript interactive plots\n", + "\n", + "the `hvplot` package is a nice easy way to access\n", + "[holoviews](http://holoviews.org/) functionality. It attaches itself to all\n", + "xarray objects under the `.hvplot` namespace. So instead of using `.plot` use\n", + "`.hvplot`\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:17:26.384153Z", + "start_time": "2020-08-19T16:17:24.365996Z" + } + }, + "outputs": [], + "source": [ + "import hvplot.xarray\n", + "\n", + "ds.air.hvplot(groupby=\"time\", clim=(270, 300))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Try the slider!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### cf_xarray : use even more metadata for even more fun and ~profit~ papers\n", + "\n", + "[cf_xarray](https://cf-xarray.readthedocs.io/) is a new project that tries to\n", + "let you make use of other CF attributes that xarray ignores. It attaches itself\n", + "to all xarray objects under the `.cf` namespace.\n", + "\n", + "Where xarray allows you to specify dimension names for analysis, `cf_xarray`\n", + "lets you specify logical names like `\"latitude\"` or `\"longitude\"` instead as\n", + "long as the appropriate CF attributes are set.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:17:43.015284Z", + "start_time": "2020-08-19T16:17:42.981131Z" + } + }, + "outputs": [], + "source": [ + "import cf_xarray" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:18:31.279888Z", + "start_time": "2020-08-19T16:18:31.245785Z" + } + }, + "outputs": [], + "source": [ + "# describe cf attributes in dataset\n", + "ds.air.cf.describe()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following `mean` operation will work with any dataset that has appropriate\n", + "attributes set that allow detection of the \"latitude\" variable (e.g.\n", + "`units: \"degress_north\"` or `standard_name: \"latitude\"`)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:18:33.806279Z", + "start_time": "2020-08-19T16:18:33.738629Z" + } + }, + "outputs": [], + "source": [ + "# demonstrate equivalent of .mean(\"lat\")\n", + "ds.air.cf.mean(\"latitude\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2020-08-19T16:19:27.628895Z", + "start_time": "2020-08-19T16:19:27.580971Z" + } + }, + "outputs": [], + "source": [ + "# demonstrate indexing\n", + "ds.air.cf.sel(longitude=242.5, method=\"nearest\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Other cool packages\n", + "\n", + "- [xgcm](https://xgcm.readthedocs.io/) : grid-aware operations with xarray\n", + " objects\n", + "- [xrft](https://xgcm.readthedocs.io/) : fourier transforms with xarray\n", + "- [xclim](https://xclim.readthedocs.io/) : calculating climate indices with\n", + " xarray objects\n", + "- [intake-xarray](https://intake-xarray.readthedocs.io/) : forget about file\n", + " paths\n", + "- [rioxarray](https://corteva.github.io/rioxarray/stable/index.html) : raster\n", + " files and xarray\n", + "- [xesmf](https://xesmf.readthedocs.io/) : regrid using ESMF\n", + "- [MetPy](https://unidata.github.io/MetPy/latest/index.html) : tools for working\n", + " with weather data\n", + "\n", + "More here: https://xarray.pydata.org/en/stable/related-projects.html\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.8" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": true, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": true + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}