Tutorial 2: A Lot of Weather Makes Climate - Exploring the ERA5 Reanalysis#
Week 1, Day 2, Ocean-Atmosphere Reanalysis
Content creators: Momme Hell
Content reviewers: Katrina Dobson, Danika Gupta, Maria Gonzalez, Will Gregory, Nahid Hasan, Paul Heubel, Sherry Mi, Beatriz Cosenza Muralles, Jenna Pearson, Chi Zhang, Ohad Zivan
Content editors: Paul Heubel, Jenna Pearson, Chi Zhang, Ohad Zivan
Production editors: Wesley Banfield, Paul Heubel, Jenna Pearson, Konstantine Tsafatinos, Chi Zhang, Ohad Zivan
Our 2024 Sponsors: NFDI4Earth, CMIP
Tutorial Objectives#
Estimated timing of tutorial: 25 mins
In the previous tutorial, we learned about the El Niño Southern Oscillation (ENSO), which is a specific atmosphere-ocean dynamical phenomenon. You will now examine the atmosphere and the ocean systems more generally.
In this tutorial, you will learn to work with reanalysis data. These data combine observations and models of the Earth system and are a critical tool for weather and climate science. You will first access a specific reanalysis dataset: ECMWF’s ERA5. You will then select variables and slices of interest from the preprocessed file, investigating how important climate variables change on medium-length timescales (hours to months) within a certain region.
By the end of this tutorial, you will be able to:
Access reanalysis data of climatically important variables.
Plot interactive maps to explore changes on various time scales.
Compute and compare time series of different variables from reanalysis data.
Setup#
# installations ( uncomment and run this cell ONLY when using google colab or kaggle )
# !pip install cartopy
# !pip install geoviews
# !pip install cdsapi
import cdsapi
import matplotlib.pyplot as plt
import xarray as xr
import numpy as np
import geoviews as gv
import geoviews.feature as gf
import holoviews
import os
import pooch
import tempfile
from cartopy import crs as ccrs
import warnings
# Suppress warnings issued by Cartopy when downloading data files
warnings.filterwarnings('ignore')
Install and import feedback gadget#
Show code cell source
# @title Install and import feedback gadget
!pip3 install vibecheck datatops --quiet
from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
return DatatopsContentReviewContainer(
"", # No text prompt
notebook_section,
{
"url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
"name": "comptools_4clim",
"user_key": "l5jpxuee",
},
).render()
feedback_prefix = "W1D2_T2"
[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: pip install --upgrade pip
Helper functions#
Show code cell source
# @title Helper functions
def pooch_load(filelocation=None, filename=None, processor=None):
shared_location = "/home/jovyan/shared/Data/tutorials/W1D2_Ocean-AtmosphereReanalysis" # this is different for each day
user_temp_cache = tempfile.gettempdir()
if os.path.exists(os.path.join(shared_location, filename)):
file = os.path.join(shared_location, filename)
else:
file = pooch.retrieve(
filelocation,
known_hash=None,
fname=os.path.join(user_temp_cache, filename),
processor=processor,
)
return file
Figure Settings#
Show code cell source
# @title Figure Settings
import ipywidgets as widgets # interactive display
%config InlineBackend.figure_format = 'retina'
plt.style.use(
"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle"
)
Video 1: ECMWF Reanalysis#
Submit your feedback#
Show code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_ECMWF_Reanalysis_Video")
Section 1: What is Reanalysis Data?#
Reanalysis refers to the process of combining historical observations from a variety of sources, such as weather stations, satellite measurements, and ocean buoys, with numerical models to create a comprehensive and consistent record of past weather and climate conditions. Reanalysis data is a useful tool to examine the Earth’s climate system over a wide range of time scales, from seasonal through decadal to century-scale changes.
There are multiple Earth system reanalysis products (e.g. MERRA-2, NCEP-NCAR, JRA-55C, see an extensive list here), and no single product fits all needs. For this tutorial, you will be using a product from the European Centre for Medium-Range Weather Forecasts (ECMWF) called ECMWF Reanalysis v5 (ERA5). This video from the ECMWF provides you with a brief introduction to the ERA5 product.
Section 1.1: Accessing ERA5 Data#
You will access the data through our OSF cloud storage to simplify the downloading process. If you are keen to download the data yourself or are simply interested in exploring other variables, please have a look into the get_ERA5_reanalysis_data.ipynb
notebook, where we use the Climate Data Store (CDS) API to get a subset of the huge ECMWF ERA5 Reanalysis data set.
Let’s select a specific year and month to work with, March of 2018:
# load data: 5 variables of ERA5 reanalysis, subregion, hourly, March 2018
fname_ERA5_allvars = "ERA5_5vars_032018_hourly_NE-US.nc"
url_ERA5_allvars = "https://osf.io/7kcwn/download"
ERA5_allvars = xr.open_dataset(pooch_load(url_ERA5_allvars, fname_ERA5_allvars))
Downloading data from 'https://osf.io/7kcwn/download' to file '/tmp/ERA5_5vars_032018_hourly_NE-US.nc'.
SHA256 hash of downloaded file: 8415d3380ab72642775ade13eb0f61818ea079e69dc6bade36f898c2bae5687d
Use this value as the 'known_hash' argument of 'pooch.retrieve' to ensure that the file hasn't changed if it is downloaded again in the future.
ERA5_allvars
<xarray.Dataset> Size: 152MB Dimensions: (longitude: 101, latitude: 101, time: 744) Coordinates: * longitude (longitude) float32 404B -90.0 -89.75 -89.5 ... -65.25 -65.0 * latitude (latitude) float32 404B 55.2 54.95 54.7 54.45 ... 30.7 30.45 30.2 * time (time) datetime64[ns] 6kB 2018-03-01 ... 2018-03-31T23:00:00 Data variables: u10 (time, latitude, longitude) float32 30MB ... v10 (time, latitude, longitude) float32 30MB ... t2m (time, latitude, longitude) float32 30MB ... sst (time, latitude, longitude) float32 30MB ... sp (time, latitude, longitude) float32 30MB ... Attributes: Conventions: CF-1.6 history: 2024-03-12 10:37:17 GMT by grib_to_netcdf-2.25.1: /opt/ecmw...
You just loaded an xarray
dataset, as introduced on the first day. This dataset contains 5 variables covering the Northeastern United States along with their respective coordinates. With this dataset, you have access to our best estimates of climate parameters with a temporal resolution of 1 hour and a spatial resolution of 1/4 degree (i.e. grid points near the Equator represent a ~25 km x 25 km region). This is a lot of data, but still just a fraction of the data available through the full ERA5 dataset.
Section 1.2: Selecting Regions of Interest#
The global ERA5 data over the entire time range is so large that even just one variable would be too large to store on your computer. Here we use preprocessed slices of an example region, to load another region (i.e., a spatial subset) of the data, please check out the get_ERA5_reanalysis_data.ipynb
notebook. In this first example, you will load air surface temperature at 2 meters data for a small region in the Northeastern United States. In later tutorials, you will have the opportunity to select a region of your choice and explore other climate variables.
The magnitude of the wind vector represents the wind speed
which you will use later in the tutorial for time series comparison and discuss in more detail in Tutorial 4. We will calculate that here and add it to our dataset.
# compute ten-meter wind speed, the magnitude of the wind vector
ERA5_allvars["wind_speed"] = np.sqrt(
ERA5_allvars["u10"] ** 2
+ ERA5_allvars["v10"] ** 2
)
# add name and units to the metadata:
ERA5_allvars["wind_speed"].attrs[
"long_name"
] = "10-meter wind speed" # assigning the long name to the attributes
ERA5_allvars["wind_speed"].attrs["units"] = "m/s" # assigning units
ERA5_allvars
<xarray.Dataset> Size: 182MB Dimensions: (longitude: 101, latitude: 101, time: 744) Coordinates: * longitude (longitude) float32 404B -90.0 -89.75 -89.5 ... -65.25 -65.0 * latitude (latitude) float32 404B 55.2 54.95 54.7 ... 30.7 30.45 30.2 * time (time) datetime64[ns] 6kB 2018-03-01 ... 2018-03-31T23:00:00 Data variables: u10 (time, latitude, longitude) float32 30MB -3.345 ... -3.221 v10 (time, latitude, longitude) float32 30MB 2.463 2.297 ... -2.706 t2m (time, latitude, longitude) float32 30MB ... sst (time, latitude, longitude) float32 30MB ... sp (time, latitude, longitude) float32 30MB ... wind_speed (time, latitude, longitude) float32 30MB 4.154 4.124 ... 4.207 Attributes: Conventions: CF-1.6 history: 2024-03-12 10:37:17 GMT by grib_to_netcdf-2.25.1: /opt/ecmw...
Section 2: Plotting Spatial Maps of Reanalysis Data#
First, let’s plot the region’s surface temperature for the first time step of the reanalysis dataset. To do this let’s extract the 2m air temperature data from the dataset that contains all the variables.
ds_surface_temp_2m = ERA5_allvars.t2m
ds_surface_temp_2m
<xarray.DataArray 't2m' (time: 744, latitude: 101, longitude: 101)> Size: 30MB [7589544 values with dtype=float32] Coordinates: * longitude (longitude) float32 404B -90.0 -89.75 -89.5 ... -65.25 -65.0 * latitude (latitude) float32 404B 55.2 54.95 54.7 54.45 ... 30.7 30.45 30.2 * time (time) datetime64[ns] 6kB 2018-03-01 ... 2018-03-31T23:00:00 Attributes: units: K long_name: 2 metre temperature
We will be plotting this a little bit differently than you have previously plotted a map (and differently from how you will plot in most tutorials) so we can look at a few times steps interactively later. To do this we are using the packages geoviews and holoviews.
holoviews.extension("bokeh")
dataset_plot = gv.Dataset(ds_surface_temp_2m.isel(time=0)) # select the first time step
# create the image
images = dataset_plot.to(
gv.Image, ["longitude", "latitude"], ["t2m"], "hour"
)
# aesthetics, add coastlines etc.
images.opts(
cmap="coolwarm",
colorbar=True,
width=600,
height=400,
projection=ccrs.PlateCarree(),
clabel="2m Air Temperature (K)",
) * gf.coastline
In the above figure, coastlines are shown as black lines. Most of the selected region is land, with some ocean (lower right) and a lake (top middle).
Next, we will examine variability at two different frequencies using interactive plots:
Hourly variability
Daily variability
Note that in the previous tutorial, you computed the monthly variability, or climatology, but here you only have one month of data loaded (March 2018). If you are curious about longer timescales you will visit this in the next tutorial!
# average temperatures over the whole month after grouping by the hour
ds_surface_temp_2m_hour = ds_surface_temp_2m.groupby("time.hour").mean()
# interactive plot of hourly frequency of surface temperature
# this cell may take a little longer as it contains several maps in a single plotting function
dataset_plot = gv.Dataset(
ds_surface_temp_2m_hour.isel(hour=slice(0, 12))
) # only the first 12 time steps (midnight to noon), as it is a time-consuming task
images = dataset_plot.to(
gv.Image, ["longitude", "latitude"], ["t2m"], "hour"
)
images.opts(
cmap="coolwarm",
colorbar=True,
width=600,
height=400,
projection=ccrs.PlateCarree(),
clabel="2m Air Temperature (K)",
) * gf.coastline
# average temperatures over the whole day after grouping by the day
ds_surface_temp_2m_day = ds_surface_temp_2m.groupby("time.day").mean()
# interactive plot of daily frequency of surface temperature
# this cell may take a little longer as it contains several maps in a single plotting function holoviews.extension('bokeh')
dataset_plot = gv.Dataset(
ds_surface_temp_2m_day.isel(day=slice(0, 10))
) # only the first 10 time steps, as it is a time-consuming task
images = dataset_plot.to(
gv.Image, ["longitude", "latitude"], ["t2m"], "day"
)
images.opts(
cmap="coolwarm",
colorbar=True,
width=600,
height=400,
projection=ccrs.PlateCarree(),
clabel="2m Air Temperature (K)",
) * gf.coastline
Section 3: Plotting Time Series of Reanalysis Data#
Section 3.1: Surface Air Temperature Time Series#
You have demonstrated that there are a lot of changes in surface temperature within a day and between days. It is crucial to understand this temporal variability in the data when performing climate analysis.
Rather than plotting interactive spatial maps for different timescales, in this last section, you will create a time series of surface air temperature from the data you have already examined to look at variability on longer than daily timescales. Instead of taking the mean in time to create maps, you will now take the mean in space to create time series.
Note that the spatially averaged data will now only have a time coordinate, making it a time series (ts).
# find weights (this is a regular grid so we can use cos(latitude))
weights = np.cos(np.deg2rad(ds_surface_temp_2m.latitude))
weights.name = "weights"
# take the weighted spatial mean since the latitude range of the region of interest is large
ds_surface_temp_2m_ts = ds_surface_temp_2m.weighted(weights).mean(["longitude", "latitude"])
ds_surface_temp_2m_ts
<xarray.DataArray 't2m' (time: 744)> Size: 3kB array([279.38586, 279.24496, 278.75916, 278.48044, 278.38004, 277.85248, 277.7561 , 277.6406 , 277.30057, 277.34055, 276.9405 , 276.95288, 276.95398, 277.10703, 278.01758, 278.59567, 278.9511 , 279.9758 , 280.1939 , 280.3524 , 280.50916, 280.3102 , 279.29538, 279.03204, 278.74936, 278.56732, 277.9061 , 277.49103, 277.3653 , 276.83374, 276.65958, 276.43158, 276.01898, 275.6874 , 274.97195, 274.82266, 274.67667, 274.64728, 275.6112 , 276.29434, 276.76215, 278.043 , 278.36322, 278.5558 , 278.84253, 278.82635, 278.00223, 277.5949 , 277.01642, 276.5853 , 275.99768, 275.46774, 275.2386 , 274.70895, 274.56573, 274.37988, 273.92798, 273.75653, 273.44467, 273.3855 , 273.365 , 273.5404 , 274.80493, 275.93512, 276.61172, 278.06274, 278.51086, 278.84872, 279.2242 , 279.33823, 278.44324, 278.01486, 277.43558, 277.07 , 276.18137, 275.55072, 275.31705, 274.66705, 274.45557, 274.19122, 273.74622, 273.60147, 273.035 , 272.9576 , 272.9595 , 273.19333, 274.46942, 275.58524, 276.23816, 277.4844 , 277.91852, 278.2262 , 278.5752 , 278.64044, 277.69016, 277.18253, 276.62097, 276.20938, 275.40378, 274.88394, 274.64395, 274.0723 , 273.86444, 273.6995 , 273.19196, 272.9751 , 272.71893, 272.668 , 272.5954 , 272.92517, 274.05133, 274.93915, 275.5301 , 276.71628, 277.15103, 277.45184, 277.86447, 277.94495, 277.17047, 276.80698, ... 277.60468, 277.5178 , 277.13773, 277.02316, 277.21945, 277.22427, 277.45337, 277.92896, 279.2616 , 279.88177, 280.26828, 281.0504 , 281.27747, 281.47263, 281.65076, 281.78964, 281.1339 , 280.93436, 280.7854 , 280.54193, 280.02304, 279.7768 , 279.6665 , 279.2424 , 279.14877, 279.0568 , 278.7595 , 278.67413, 278.34985, 278.3431 , 278.46454, 278.88058, 279.92087, 280.81958, 281.3288 , 282.1876 , 282.57104, 282.85272, 283.1092 , 283.21652, 282.59885, 282.37424, 282.03394, 281.8969 , 281.30286, 281.03802, 280.90686, 280.50687, 280.38593, 280.2525 , 279.84207, 279.70578, 279.46384, 279.288 , 279.5107 , 279.9113 , 281.1815 , 281.86655, 282.19662, 283.00238, 283.26505, 283.41943, 283.3755 , 283.34506, 282.17163, 281.88385, 281.54935, 281.29965, 280.55542, 280.20657, 280.10605, 279.5731 , 279.42786, 279.28293, 278.843 , 278.61072, 278.10794, 278.04105, 277.94052, 278.08585, 278.97806, 279.4434 , 279.75375, 280.4592 , 280.65598, 280.76498, 280.8995 , 280.85284, 279.99188, 279.6526 , 279.11496, 278.7411 , 277.62344, 276.84186, 276.63025, 275.65338, 275.29016, 275.02515, 274.21457, 273.98886, 273.75955, 273.6511 , 273.73898, 274.4401 , 276.57278, 277.7429 , 278.26334, 279.3847 , 279.69296, 279.9276 , 280.28668, 280.41367, 279.54553, 279.22537], dtype=float32) Coordinates: * time (time) datetime64[ns] 6kB 2018-03-01 ... 2018-03-31T23:00:00
# plot the time series of surface temperature
fig, ax = plt.subplots()
ax.plot(ds_surface_temp_2m_ts.time, ds_surface_temp_2m_ts)
# aesthetics
ax.set_xlabel("Time (hours)")
ax.set_ylabel("2m Air \nTemperature (K)")
ax.xaxis.set_tick_params(rotation=45)
ax.grid(True)
Questions 3.1#
What is the dominant source of the high frequency (short timescale) variability?
What drives the lower frequency variability?
Would the ENSO variablity that you computed in the previous tutorial show up here? Why or why not?
Submit your feedback#
Show code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Questions_3_1")
Section 3.2: Comparing Time Series of Multiple Variables#
Below you will calculate the time series of the surface air temperature which we just plotted, alongside the time series of several other ERA5 variables for the same period and region: 10-meter wind speed (wind_speed
), atmospheric surface pressure (sp
), and sea surface temperature (sst
).
ERA5_allvars_ts = ERA5_allvars.weighted(weights).mean(["longitude", "latitude"])
ERA5_allvars_ts
<xarray.Dataset> Size: 24kB Dimensions: (time: 744) Coordinates: * time (time) datetime64[ns] 6kB 2018-03-01 ... 2018-03-31T23:00:00 Data variables: u10 (time) float32 3kB -0.1129 -0.1174 -0.07807 ... 0.7965 0.4342 v10 (time) float32 3kB 1.171 1.286 1.376 ... 0.3178 0.1441 0.1829 t2m (time) float32 3kB 279.4 279.2 278.8 278.5 ... 280.4 279.5 279.2 sst (time) float32 3kB 287.1 287.1 287.1 287.1 ... 286.2 286.4 286.4 sp (time) float32 3kB 9.928e+04 9.929e+04 ... 9.938e+04 9.936e+04 wind_speed (time) float32 3kB 3.937 4.035 4.114 4.185 ... 5.22 4.726 4.567
plot_vars = [
"t2m", # air temperature at 2 meters
"wind_speed", # magnitude of the wind vector, cf. Section 1
"sp", # surface air pressure
"sst", # sea surface temperature
]
fig, ax_list = plt.subplots(len(plot_vars), 1, sharex=True)
for var, ax in zip(plot_vars, ax_list): # loop through variables and figure axes
legend_entry = ERA5_allvars[var].attrs["long_name"] # create legend entry
ax.plot(ERA5_allvars_ts.time, ERA5_allvars_ts[var], label=legend_entry) # plot time series
# aesthetics
ax.set_ylabel(f'{var.capitalize()}\n({ERA5_allvars[var].attrs["units"]})') # add ylabel w/ units
ax.xaxis.set_tick_params(rotation=45) # rotate dates of xticks
ax.legend(loc='upper center') # add legend with shared location (upper center)
Summary#
In this tutorial, you learned how to access and process ERA5 reanalysis data. You are now able to select specific slices within the reanalysis dataset and perform operations such as taking spatial and temporal averages to plot them interactively.
You also looked at different climate variables to distinguish and identify the variability present at different timescales.
Resources#
Data for this tutorial can be accessed here. We summarized the download procedure in a separate notebook named get_ERA5_reanalysis_data.ipynb
.