{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"[](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W1D1_ClimateSystemOverview/student/W1D1_Tutorial3.ipynb)
\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Tutorial 3: Opening and Plotting netCDF Data\n",
"\n",
"**Week 1, Day 1, Climate System Overview**\n",
"\n",
"**Content creators:** Sloane Garelick, Julia Kent\n",
"\n",
"**Content reviewers:** Katrina Dobson, Younkap Nina Duplex, Danika Gupta, Maria Gonzalez, Will Gregory, Nahid Hasan, Paul Heubel, Sherry Mi, Beatriz Cosenza Muralles, Jenna Pearson, Agustina Pesce, Chi Zhang, Ohad Zivan\n",
"\n",
"**Content editors:** Paul Heubel, Jenna Pearson, Chi Zhang, Ohad Zivan\n",
"\n",
"**Production editors:** Wesley Banfield, Paul Heubel, Jenna Pearson, Konstantine Tsafatinos, Chi Zhang, Ohad Zivan\n",
"\n",
"**Our 2024 Sponsors:** CMIP, NFDI4Earth\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## \n",
"\n",
"Pythia credit: Rose, B. E. J., Kent, J., Tyle, K., Clyne, J., Banihirwe, A., Camron, D., May, R., Grover, M., Ford, R. R., Paul, K., Morley, J., Eroglu, O., Kailyn, L., & Zacharias, A. (2023). Pythia Foundations (Version v2023.05.01) https://zenodo.org/record/8065851\n",
"\n",
"## \n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Tutorial Objectives\n",
"\n",
"*Estimated timing of tutorial:* 30 minutes \n",
"\n",
"Many global climate datasets are stored as [NetCDF](https://pro.arcgis.com/en/pro-app/latest/help/data/multidimensional/what-is-netcdf-data.htm) (network Common Data Form) files. NetCDF is a file format for storing multidimensional variables such as temperature, humidity, pressure, wind speed, and direction. These types of files also include metadata that gives you information about the variables and the dataset itself.\n",
"\n",
"In this tutorial, we will import atmospheric pressure and temperature data stored in a NetCDF file. We will learn how to use various attributes of Xarray to import, analyze, interpret, and plot the data."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Setup\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# installations ( uncomment and run this cell ONLY when using google colab or kaggle )\n",
"#!pip install pythia_datasets"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 975,
"status": "ok",
"timestamp": 1681571208838,
"user": {
"displayName": "Sloane Garelick",
"userId": "04706287370408131987"
},
"user_tz": 240
},
"tags": []
},
"outputs": [],
"source": [
"# imports\n",
"import numpy as np\n",
"import xarray as xr\n",
"from pythia_datasets import DATASETS\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Figure Settings\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Figure Settings\n",
"import ipywidgets as widgets # interactive display\n",
"\n",
"%config InlineBackend.figure_format = 'retina'\n",
"plt.style.use(\n",
" \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Video 1: Atmospheric Climate Systems\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Video 1: Atmospheric Climate Systems\n",
"\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == \"Bilibili\":\n",
" src = f\"https://player.bilibili.com/player.html?bvid={id}&page={page}\"\n",
" elif source == \"Osf\":\n",
" src = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render\"\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == \"Youtube\":\n",
" video = YouTubeVideo(\n",
" id=video_ids[i][1], width=W, height=H, fs=fs, rel=0\n",
" )\n",
" print(\n",
" f\"Video available at https://youtube.com/watch?v={video.id}\")\n",
" else:\n",
" video = PlayVideo(\n",
" id=video_ids[i][1],\n",
" source=video_ids[i][0],\n",
" width=W,\n",
" height=H,\n",
" fs=fs,\n",
" autoplay=False,\n",
" )\n",
" if video_ids[i][0] == \"Bilibili\":\n",
" print(\n",
" f\"Video available at https://www.bilibili.com/video/{video.id}\"\n",
" )\n",
" elif video_ids[i][0] == \"Osf\":\n",
" print(f\"Video available at https://osf.io/{video.id}\")\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [(\"Youtube\", \"7yJM9MDLeoo\"), (\"Bilibili\", \"BV1wz4y177xE\")]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"pycharm": {
"name": "#%%\n"
},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @markdown\n",
"from ipywidgets import widgets\n",
"from IPython.display import IFrame\n",
"\n",
"link_id = \"cnfwz\"\n",
"\n",
"download_link = f\"https://osf.io/download/{link_id}/\"\n",
"render_link = f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\"\n",
"# @markdown\n",
"out = widgets.Output()\n",
"with out:\n",
" print(f\"If you want to download the slides: {download_link}\")\n",
" display(IFrame(src=f\"{render_link}\", width=730, height=410))\n",
"display(out)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Section 1: Opening netCDF Data\n",
"\n",
"Xarray is closely linked with the netCDF data model, and it even treats netCDF as a 'first-class' file format. This means that Xarray can easily open netCDF datasets. However, these datasets need to follow some of Xarray's rules. One such rule is that coordinates must be 1-dimensional.\n",
"\n",
"Here we're getting the data from Project Pythia's custom library of example data, which we already imported above with from pythia_datasets import DATASETS
. The DATASETS.fetch()
method will automatically download and cache (store) our example data file NARR_19930313_0000.nc
locally.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 1064,
"status": "ok",
"timestamp": 1681571216216,
"user": {
"displayName": "Sloane Garelick",
"userId": "04706287370408131987"
},
"user_tz": 240
},
"tags": []
},
"outputs": [],
"source": [
"filepath = DATASETS.fetch(\"NARR_19930313_0000.nc\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Once we have a valid path to a data file that Xarray knows how to read, we can open it like this:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 891,
"status": "ok",
"timestamp": 1681571219656,
"user": {
"displayName": "Sloane Garelick",
"userId": "04706287370408131987"
},
"user_tz": 240
},
"tags": []
},
"outputs": [],
"source": [
"ds = xr.open_dataset(filepath)\n",
"ds"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Questions 1\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"1. What are the dimensions of this dataset?\n",
"1. How many climate variables are in this dataset?\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"tags": []
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W1D1_ClimateSystemOverview/solutions/W1D1_Tutorial3_Solution_f9b80099.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.1: Subsetting the `Dataset`\n",
"\n",
"Our call to `xr.open_dataset()` above returned a `Dataset` object that we've decided to call `ds`. We can then pull out individual fields. First, let's assess the `isobaric1` values. **Isobaric** means characterized by constant or equal pressure. Let's look at the `isobaric1` values:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 268,
"status": "ok",
"timestamp": 1681571222970,
"user": {
"displayName": "Sloane Garelick",
"userId": "04706287370408131987"
},
"user_tz": 240
},
"tags": []
},
"outputs": [],
"source": [
"ds.isobaric1\n",
"# Recall that we can also use dictionary syntax like `ds['isobaric1']` to do the same thing"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"The `isobaric1` coordinate contains 29 pressure values (in hPa) corresponding to different pressures of the atmosphere. Recall from the video that pressure decreases with height in the atmosphere. Therefore, in our dataset lower atmospheric pressure values will correspond to higher altitudes. For each isobaric pressure value, there is data for all other variables in the dataset at that same pressure level of the atmosphere:\n",
"\n",
"- **Wind**: the $u$ and $v$ components of the wind describe the direction of wind movement along a pressure level of the atmosphere. The $u$ wind component is parallel to the x-axis (i.e. longitude) and the $v$ wind component is parallel to the y-axis (i.e. latitude).\n",
"- **Temperature**: temperatures on a specific atmospheric pressure level\n",
"- **Geopotential Height**: the height of a given point in the atmosphere in units proportional to the potential energy of unit mass (geopotential) at this height relative to sea level\n",
"\n",
"Let's explore this `Dataset` a bit further.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"`Datasets` also support much of the same subsetting operations as `DataArray`, but will perform the operation on all data. Let's subset all data from an atmospheric pressure of 1000 hPa (the typical pressure at sea level):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 143,
"status": "ok",
"timestamp": 1681571229281,
"user": {
"displayName": "Sloane Garelick",
"userId": "04706287370408131987"
},
"user_tz": 240
},
"tags": []
},
"outputs": [],
"source": [
"ds_1000 = ds.sel(isobaric1=1000.0)\n",
"ds_1000"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"We can further subset to a single `DataArray` to isolate a specific climate measurement. Let's subset temperature from the atmospheric level at which the isobaric pressure is 1000 hPa:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 227,
"status": "ok",
"timestamp": 1681571232661,
"user": {
"displayName": "Sloane Garelick",
"userId": "04706287370408131987"
},
"user_tz": 240
},
"tags": []
},
"outputs": [],
"source": [
"ds_1000.Temperature_isobaric"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.2: Aggregation Operations\n",
"\n",
"Not only can you use the named dimensions for manual slicing and indexing of data (as we saw in the last tutorial), but you can also use it to control aggregation operations (e.g., average, sum, standard deviation). Aggregation methods for Xarray objects operate over the named coordinate dimensions specified by keyword argument dim
.\n",
"\n",
"First, let's try calculating the [`.std()`](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.std.html) ([standard deviation](https://en.wikipedia.org/wiki/Standard_deviation)) of the $u$ component of the isobaric wind from our `Dataset`. The following code will calculate the standard deviation of all the `u-component_of_wind_isobaric` values at each isobaric level. In other words, we'll end up with one standard deviation value for each of the 29 isobaric levels. Note: because of the '-' present in the name, we cannot use dot notation to select this variable and must use a dictionary key style selection instead.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 161,
"status": "ok",
"timestamp": 1681571235929,
"user": {
"displayName": "Sloane Garelick",
"userId": "04706287370408131987"
},
"user_tz": 240
},
"tags": []
},
"outputs": [],
"source": [
"# get wind data in longitudinal direction\n",
"u_winds = ds[\"u-component_of_wind_isobaric\"]\n",
"\n",
"# take the standard deviation\n",
"u_winds.std(dim=[\"x\", \"y\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Side note: Recall that the $u$ wind component is parallel to the x-axis (i.e. longitude) and the $v$ wind component is parallel to the y-axis (i.e. latitude). A positive $u$ wind comes from the west, and a negative $u$ wind comes from the east. A positive $v$ wind comes from the south, and a negative $v$ wind comes from the north."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Next, let's try calculating the mean of the temperature profile (temperature as a function of pressure) over a specific region. For this exercise, we will calculate the temperature profile over Colorado, USA. The bounds of Colorado are:\n",
"\n",
"- x: -182km to 424km\n",
"- y: -1450km to -990km\n",
"\n",
"If you look back at the values for `x` and `y` in our dataset, the units for these values are kilometers (km). Remember that they are also the coordinates for the `lat` and `lon` variables in our dataset. The bounds for Colorado correspond to the coordinates 37°N to 41°N and 102°W to 109°W.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 149,
"status": "ok",
"timestamp": 1681571239494,
"user": {
"displayName": "Sloane Garelick",
"userId": "04706287370408131987"
},
"user_tz": 240
},
"tags": []
},
"outputs": [],
"source": [
"# get the temperature data\n",
"temps = ds.Temperature_isobaric\n",
"\n",
"# take just the spatial data we are interested in for Colorado\n",
"co_temps = temps.sel(x=slice(-182, 424), y=slice(-1450, -990))\n",
"\n",
"# take the spatial average\n",
"prof = co_temps.mean(dim=[\"x\", \"y\"])\n",
"prof"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Section 2: Plotting with Xarray\n",
"\n",
"Another major benefit of using labeled data structures is that they enable automated plotting with axis labels.\n",
"\n",
"## Section 2.1: Simple Visualization with `.plot()`\n",
"\n",
"Much like [Pandas](https://foundations.projectpythia.org/core/pandas.html), Xarray includes an interface to [Matplotlib](https://foundations.projectpythia.org/core/matplotlib.html) that we can access through the `.plot()` method of every `DataArray`.\n",
"\n",
"For quick and easy data exploration, we can just call [`.plot()`](https://docs.xarray.dev/en/latest/user-guide/plotting.html) without any modifiers:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 336,
"status": "ok",
"timestamp": 1681571243036,
"user": {
"displayName": "Sloane Garelick",
"userId": "04706287370408131987"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"prof.plot()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Here Xarray has generated a line plot of the temperature data against the coordinate variable `isobaric`. Also, the metadata are used to auto-generate axis labels and units.\n",
"\n",
"Consider the following questions:\n",
"\n",
"- What isobaric pressure corresponds to Earth's surface?\n",
"- How does temperature change with increasing altitude in the atmosphere?\n",
"\n",
"It might be a bit difficult to answer these questions with our current plot, so let's try customizing our figure to present the data clearer.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 2.2: Customizing the Plot\n",
"\n",
"As in Pandas, the [`.plot()`](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.plot.html) method is mostly just a wrapper to Matplotlib, so we can customize our plot in familiar ways.\n",
"\n",
"In this air temperature profile example, we would like to make two changes:\n",
"\n",
"- swap the axes so that we have isobaric levels on the y-axis (vertical) of the figure (since isobaric levels correspond to altitude)\n",
"- make pressure decrease upward in the figure so that up is up (since pressure decreases with altitude)\n",
"\n",
"We can do this by adding a few keyword arguments to our `.plot()`:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 446,
"status": "ok",
"timestamp": 1681571246625,
"user": {
"displayName": "Sloane Garelick",
"userId": "04706287370408131987"
},
"user_tz": 240
},
"tags": []
},
"outputs": [],
"source": [
"prof.plot(y=\"isobaric1\", yincrease=False)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Questions 2.2\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"1. What isobaric pressure corresponds to Earth's surface?\n",
"2. Why do you think temperature generally decreases with height?\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"tags": []
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W1D1_ClimateSystemOverview/solutions/W1D1_Tutorial3_Solution_6605983a.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 2.3: Plotting 2D Data\n",
"\n",
"In the example above, the `.plot()` method produced a line plot.\n",
"\n",
"What if we call `.plot()` on a 2D array? Let's try plotting the temperature data from the 1000 hPa isobaric level (surface temperature) for all x and y values:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 1089,
"status": "ok",
"timestamp": 1681571250322,
"user": {
"displayName": "Sloane Garelick",
"userId": "04706287370408131987"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"temps.sel(isobaric1=1000).plot()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Xarray has recognized that the `DataArray` object calling the plot method has two coordinate variables, and generates a 2D plot using the `.pcolormesh()` method from Matplotlib.\n",
"\n",
"In this case, we are looking at air temperatures on the 1000 hPa isobaric surface over North America. Note you could improve this figure further by using [Cartopy](https://foundations.projectpythia.org/core/cartopy.html) to handle the map projection and geographic features.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Questions 2.3: Climate Connection\n",
"\n",
"1. The map you made shows the temperature across the United States at the 1000 hPa level of the atmosphere. How do you think temperatures at the 500 hPa level would compare? What might be causing the spatial differences in temperature seen in the map?\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"tags": []
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W1D1_ClimateSystemOverview/solutions/W1D1_Tutorial3_Solution_1c4e9001.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Summary\n",
"\n",
"Xarray brings the joy of Pandas-style labeled data operations to N-dimensional data. As such, it has become a central workhorse in the geoscience community for analyzing gridded datasets. Xarray allows us to open self-describing NetCDF files and make full use of the coordinate axes, labels, units, and other metadata. By utilizing labeled coordinates, our code becomes simpler to write, easier to read, and more robust.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Resources\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Code and data for this tutorial is based on existing content from [Project Pythia](https://foundations.projectpythia.org/core/xarray/xarray-intro.html).\n"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"include_colab_link": true,
"name": "W1D1_Tutorial3",
"provenance": [
{
"file_id": "1f2uyMuRNCH2LLG5u4Z4Tdb_OHLHB9saW",
"timestamp": 1679941598643
}
],
"toc_visible": true
},
"kernel": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
}
},
"nbformat": 4,
"nbformat_minor": 4
}