Open In Colab   Open in Kaggle

Tutorial 3: Opening and Plotting netCDF Data#

Week 1, Day 1, Climate System Overview

Content creators: Sloane Garelick, Julia Kent

Content reviewers: Katrina Dobson, Younkap Nina Duplex, Danika Gupta, Maria Gonzalez, Will Gregory, Nahid Hasan, Paul Heubel, Sherry Mi, Beatriz Cosenza Muralles, Jenna Pearson, Agustina Pesce, Chi Zhang, Ohad Zivan

Content editors: Paul Heubel, Jenna Pearson, Chi Zhang, Ohad Zivan

Production editors: Wesley Banfield, Paul Heubel, Jenna Pearson, Konstantine Tsafatinos, Chi Zhang, Ohad Zivan

Our 2024 Sponsors: CMIP, NFDI4Earth

project pythia#

Pythia credit: Rose, B. E. J., Kent, J., Tyle, K., Clyne, J., Banihirwe, A., Camron, D., May, R., Grover, M., Ford, R. R., Paul, K., Morley, J., Eroglu, O., Kailyn, L., & Zacharias, A. (2023). Pythia Foundations (Version v2023.05.01) https://zenodo.org/record/8065851

CMIP.png#

Tutorial Objectives#

Estimated timing of tutorial: 30 minutes

Many global climate datasets are stored as NetCDF (network Common Data Form) files. NetCDF is a file format for storing multidimensional variables such as temperature, humidity, pressure, wind speed, and direction. These types of files also include metadata that gives you information about the variables and the dataset itself.

In this tutorial, we will import atmospheric pressure and temperature data stored in a NetCDF file. We will learn how to use various attributes of Xarray to import, analyze, interpret, and plot the data.

Setup#

# installations ( uncomment and run this cell ONLY when using google colab or kaggle )
#!pip install pythia_datasets
# imports
import numpy as np
import xarray as xr
from pythia_datasets import DATASETS
import matplotlib.pyplot as plt

Install and import feedback gadget#

Hide code cell source
# @title Install and import feedback gadget

!pip3 install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "comptools_4clim",
            "user_key": "l5jpxuee",
        },
    ).render()


feedback_prefix = "W1D1_T3"
[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: pip install --upgrade pip

Figure Settings#

Hide code cell source
# @title Figure Settings
import ipywidgets as widgets  # interactive display

%config InlineBackend.figure_format = 'retina'
plt.style.use(
    "https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle"
)

Video 1: Atmospheric Climate Systems#

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Atmospheric_Climate_Systems_Video")
If you want to download the slides: https://osf.io/download/cnfwz/

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Atmospheric_Climate_Systems_Slides")

Section 1: Opening netCDF Data#

Xarray is closely linked with the netCDF data model, and it even treats netCDF as a ‘first-class’ file format. This means that Xarray can easily open netCDF datasets. However, these datasets need to follow some of Xarray’s rules. One such rule is that coordinates must be 1-dimensional.

Here we’re getting the data from Project Pythia’s custom library of example data, which we already imported above with from pythia_datasets import DATASETS. The DATASETS.fetch() method will automatically download and cache (store) our example data file NARR_19930313_0000.nc locally.

filepath = DATASETS.fetch("NARR_19930313_0000.nc")
Downloading file 'NARR_19930313_0000.nc' from 'https://github.com/ProjectPythia/pythia-datasets/raw/main/data/NARR_19930313_0000.nc' to '/home/runner/.cache/pythia-datasets'.

Once we have a valid path to a data file that Xarray knows how to read, we can open it like this:

ds = xr.open_dataset(filepath)
ds
<xarray.Dataset> Size: 15MB
Dimensions:                       (time1: 1, isobaric1: 29, y: 119, x: 268)
Coordinates:
  * time1                         (time1) datetime64[ns] 8B 1993-03-13
  * isobaric1                     (isobaric1) float32 116B 100.0 125.0 ... 1e+03
  * y                             (y) float32 476B -3.117e+03 ... 714.1
  * x                             (x) float32 1kB -3.324e+03 ... 5.343e+03
Data variables:
    u-component_of_wind_isobaric  (time1, isobaric1, y, x) float32 4MB ...
    LambertConformal_Projection   int32 4B ...
    lat                           (y, x) float64 255kB ...
    lon                           (y, x) float64 255kB ...
    Geopotential_height_isobaric  (time1, isobaric1, y, x) float32 4MB ...
    v-component_of_wind_isobaric  (time1, isobaric1, y, x) float32 4MB ...
    Temperature_isobaric          (time1, isobaric1, y, x) float32 4MB ...
Attributes:
    Originating_or_generating_Center:     US National Weather Service, Nation...
    Originating_or_generating_Subcenter:  North American Regional Reanalysis ...
    GRIB_table_version:                   0,131
    Generating_process_or_model:          North American Regional Reanalysis ...
    Conventions:                          CF-1.6
    history:                              Read using CDM IOSP GribCollection v3
    featureType:                          GRID
    History:                              Translated to CF-1.0 Conventions by...
    geospatial_lat_min:                   10.753308882144761
    geospatial_lat_max:                   46.8308828962289
    geospatial_lon_min:                   -153.88242040519995
    geospatial_lon_max:                   -42.666108129242815

Questions 1#

  1. What are the dimensions of this dataset?

  2. How many climate variables are in this dataset?

Click for solution

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Questions_1")

Section 1.1: Subsetting the Dataset#

Our call to xr.open_dataset() above returned a Dataset object that we’ve decided to call ds. We can then pull out individual fields. First, let’s assess the isobaric1 values. Isobaric means characterized by constant or equal pressure. Let’s look at the isobaric1 values:

ds.isobaric1
# Recall that we can also use dictionary syntax like `ds['isobaric1']` to do the same thing
<xarray.DataArray 'isobaric1' (isobaric1: 29)> Size: 116B
array([ 100.,  125.,  150.,  175.,  200.,  225.,  250.,  275.,  300.,  350.,
        400.,  450.,  500.,  550.,  600.,  650.,  700.,  725.,  750.,  775.,
        800.,  825.,  850.,  875.,  900.,  925.,  950.,  975., 1000.],
      dtype=float32)
Coordinates:
  * isobaric1  (isobaric1) float32 116B 100.0 125.0 150.0 ... 950.0 975.0 1e+03
Attributes:
    units:                   hPa
    long_name:               Isobaric surface
    positive:                down
    Grib_level_type:         100
    _CoordinateAxisType:     Pressure
    _CoordinateZisPositive:  down

The isobaric1 coordinate contains 29 pressure values (in hPa) corresponding to different pressures of the atmosphere. Recall from the video that pressure decreases with height in the atmosphere. Therefore, in our dataset lower atmospheric pressure values will correspond to higher altitudes. For each isobaric pressure value, there is data for all other variables in the dataset at that same pressure level of the atmosphere:

  • Wind: the \(u\) and \(v\) components of the wind describe the direction of wind movement along a pressure level of the atmosphere. The \(u\) wind component is parallel to the x-axis (i.e. longitude) and the \(v\) wind component is parallel to the y-axis (i.e. latitude).

  • Temperature: temperatures on a specific atmospheric pressure level

  • Geopotential Height: the height of a given point in the atmosphere in units proportional to the potential energy of unit mass (geopotential) at this height relative to sea level

Let’s explore this Dataset a bit further.

Datasets also support much of the same subsetting operations as DataArray, but will perform the operation on all data. Let’s subset all data from an atmospheric pressure of 1000 hPa (the typical pressure at sea level):

ds_1000 = ds.sel(isobaric1=1000.0)
ds_1000
<xarray.Dataset> Size: 1MB
Dimensions:                       (time1: 1, y: 119, x: 268)
Coordinates:
  * time1                         (time1) datetime64[ns] 8B 1993-03-13
    isobaric1                     float32 4B 1e+03
  * y                             (y) float32 476B -3.117e+03 ... 714.1
  * x                             (x) float32 1kB -3.324e+03 ... 5.343e+03
Data variables:
    u-component_of_wind_isobaric  (time1, y, x) float32 128kB ...
    LambertConformal_Projection   int32 4B ...
    lat                           (y, x) float64 255kB ...
    lon                           (y, x) float64 255kB ...
    Geopotential_height_isobaric  (time1, y, x) float32 128kB ...
    v-component_of_wind_isobaric  (time1, y, x) float32 128kB ...
    Temperature_isobaric          (time1, y, x) float32 128kB ...
Attributes:
    Originating_or_generating_Center:     US National Weather Service, Nation...
    Originating_or_generating_Subcenter:  North American Regional Reanalysis ...
    GRIB_table_version:                   0,131
    Generating_process_or_model:          North American Regional Reanalysis ...
    Conventions:                          CF-1.6
    history:                              Read using CDM IOSP GribCollection v3
    featureType:                          GRID
    History:                              Translated to CF-1.0 Conventions by...
    geospatial_lat_min:                   10.753308882144761
    geospatial_lat_max:                   46.8308828962289
    geospatial_lon_min:                   -153.88242040519995
    geospatial_lon_max:                   -42.666108129242815

We can further subset to a single DataArray to isolate a specific climate measurement. Let’s subset temperature from the atmospheric level at which the isobaric pressure is 1000 hPa:

ds_1000.Temperature_isobaric
<xarray.DataArray 'Temperature_isobaric' (time1: 1, y: 119, x: 268)> Size: 128kB
[31892 values with dtype=float32]
Coordinates:
  * time1      (time1) datetime64[ns] 8B 1993-03-13
    isobaric1  float32 4B 1e+03
  * y          (y) float32 476B -3.117e+03 -3.084e+03 -3.052e+03 ... 681.6 714.1
  * x          (x) float32 1kB -3.324e+03 -3.292e+03 ... 5.311e+03 5.343e+03
Attributes:
    long_name:           Temperature @ Isobaric surface
    units:               K
    description:         Temperature
    grid_mapping:        LambertConformal_Projection
    Grib_Variable_Id:    VAR_7-15-131-11_L100
    Grib1_Center:        7
    Grib1_Subcenter:     15
    Grib1_TableVersion:  131
    Grib1_Parameter:     11
    Grib1_Level_Type:    100
    Grib1_Level_Desc:    Isobaric surface

Section 1.2: Aggregation Operations#

Not only can you use the named dimensions for manual slicing and indexing of data (as we saw in the last tutorial), but you can also use it to control aggregation operations (e.g., average, sum, standard deviation). Aggregation methods for Xarray objects operate over the named coordinate dimensions specified by keyword argument dim.

First, let’s try calculating the .std() (standard deviation) of the \(u\) component of the isobaric wind from our Dataset. The following code will calculate the standard deviation of all the u-component_of_wind_isobaric values at each isobaric level. In other words, we’ll end up with one standard deviation value for each of the 29 isobaric levels. Note: because of the ‘-’ present in the name, we cannot use dot notation to select this variable and must use a dictionary key style selection instead.

# get wind data in longitudinal direction
u_winds = ds["u-component_of_wind_isobaric"]

# take the standard deviation
u_winds.std(dim=["x", "y"])
<xarray.DataArray 'u-component_of_wind_isobaric' (time1: 1, isobaric1: 29)> Size: 116B
array([[ 8.673962 , 10.212324 , 11.556413 , 12.254428 , 13.372146 ,
        15.472462 , 16.09197  , 15.846294 , 15.195834 , 13.936979 ,
        12.938881 , 12.060708 , 10.972139 ,  9.722329 ,  8.853286 ,
         8.257242 ,  7.679721 ,  7.45165  ,  7.2352104,  7.039894 ,
         6.8833704,  6.782149 ,  6.7088237,  6.6865997,  6.7247376,
         6.7450233,  6.6859775,  6.5107226,  5.9722614]], dtype=float32)
Coordinates:
  * time1      (time1) datetime64[ns] 8B 1993-03-13
  * isobaric1  (isobaric1) float32 116B 100.0 125.0 150.0 ... 950.0 975.0 1e+03

Side note: Recall that the \(u\) wind component is parallel to the x-axis (i.e. longitude) and the \(v\) wind component is parallel to the y-axis (i.e. latitude). A positive \(u\) wind comes from the west, and a negative \(u\) wind comes from the east. A positive \(v\) wind comes from the south, and a negative \(v\) wind comes from the north.

Next, let’s try calculating the mean of the temperature profile (temperature as a function of pressure) over a specific region. For this exercise, we will calculate the temperature profile over Colorado, USA. The bounds of Colorado are:

  • x: -182km to 424km

  • y: -1450km to -990km

If you look back at the values for x and y in our dataset, the units for these values are kilometers (km). Remember that they are also the coordinates for the lat and lon variables in our dataset. The bounds for Colorado correspond to the coordinates 37°N to 41°N and 102°W to 109°W.

# get the temperature data
temps = ds.Temperature_isobaric

# take just the spatial data we are interested in for Colorado
co_temps = temps.sel(x=slice(-182, 424), y=slice(-1450, -990))

# take the spatial average
prof = co_temps.mean(dim=["x", "y"])
prof
<xarray.DataArray 'Temperature_isobaric' (time1: 1, isobaric1: 29)> Size: 116B
array([[215.078  , 215.76935, 217.243  , 217.82663, 215.83487, 216.10933,
        219.99902, 224.66118, 228.80576, 234.88701, 238.78503, 242.66309,
        246.44807, 249.26636, 250.84995, 253.37354, 257.0429 , 259.08398,
        260.97955, 262.98364, 264.82138, 266.5198 , 268.22467, 269.7471 ,
        271.18216, 272.66815, 274.13037, 275.54718, 276.97675]],
      dtype=float32)
Coordinates:
  * time1      (time1) datetime64[ns] 8B 1993-03-13
  * isobaric1  (isobaric1) float32 116B 100.0 125.0 150.0 ... 950.0 975.0 1e+03

Section 2: Plotting with Xarray#

Another major benefit of using labeled data structures is that they enable automated plotting with axis labels.

Section 2.1: Simple Visualization with .plot()#

Much like Pandas, Xarray includes an interface to Matplotlib that we can access through the .plot() method of every DataArray.

For quick and easy data exploration, we can just call .plot() without any modifiers:

prof.plot()
[<matplotlib.lines.Line2D at 0x7fde1c1aa430>]
../../../_images/300787e13379ea7bd11c05e07dce096e4bbec1b01a931ccb277277ac6492533b.png

Here Xarray has generated a line plot of the temperature data against the coordinate variable isobaric. Also, the metadata are used to auto-generate axis labels and units.

Consider the following questions:

  • What isobaric pressure corresponds to Earth’s surface?

  • How does temperature change with increasing altitude in the atmosphere?

It might be a bit difficult to answer these questions with our current plot, so let’s try customizing our figure to present the data clearer.

Section 2.2: Customizing the Plot#

As in Pandas, the .plot() method is mostly just a wrapper to Matplotlib, so we can customize our plot in familiar ways.

In this air temperature profile example, we would like to make two changes:

  • swap the axes so that we have isobaric levels on the y-axis (vertical) of the figure (since isobaric levels correspond to altitude)

  • make pressure decrease upward in the figure so that up is up (since pressure decreases with altitude)

We can do this by adding a few keyword arguments to our .plot():

prof.plot(y="isobaric1", yincrease=False)
[<matplotlib.lines.Line2D at 0x7fde1c25e6d0>]
../../../_images/da96f44860c20009c19c8424fa239b034ca83a2fb7b601be45038bd13411505d.png

Questions 2.2#

  1. What isobaric pressure corresponds to Earth’s surface?

  2. Why do you think temperature generally decreases with height?

Click for solution

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Questions_2_2")

Section 2.3: Plotting 2D Data#

In the example above, the .plot() method produced a line plot.

What if we call .plot() on a 2D array? Let’s try plotting the temperature data from the 1000 hPa isobaric level (surface temperature) for all x and y values:

temps.sel(isobaric1=1000).plot()
<matplotlib.collections.QuadMesh at 0x7fde189c1850>
../../../_images/0ddc90e99bd0532c028eb101432d127bc5d12c5efce2c2213750204d7dce6614.png

Xarray has recognized that the DataArray object calling the plot method has two coordinate variables, and generates a 2D plot using the .pcolormesh() method from Matplotlib.

In this case, we are looking at air temperatures on the 1000 hPa isobaric surface over North America. Note you could improve this figure further by using Cartopy to handle the map projection and geographic features.

Questions 2.3: Climate Connection#

  1. The map you made shows the temperature across the United States at the 1000 hPa level of the atmosphere. How do you think temperatures at the 500 hPa level would compare? What might be causing the spatial differences in temperature seen in the map?

Click for solution

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Questions_2_3")

Summary#

Xarray brings the joy of Pandas-style labeled data operations to N-dimensional data. As such, it has become a central workhorse in the geoscience community for analyzing gridded datasets. Xarray allows us to open self-describing NetCDF files and make full use of the coordinate axes, labels, units, and other metadata. By utilizing labeled coordinates, our code becomes simpler to write, easier to read, and more robust.

Resources#

Code and data for this tutorial is based on existing content from Project Pythia.