
Heatwaves#

Content creators: Wil Laura

Content reviewers: Will Gregory, Paul Heubel, Laura Paccini, Jenna Pearson, Ohad Zivan

Content editors: Paul Heubel

Production editors: Paul Heubel, Konstantine Tsafatinos

Our 2024 Sponsors: CMIP, NFDI4Earth

Project Background#

In this project, you will characterize heatwaves using near-surface air temperature reanalysis data. Since heatwaves are extreme events in which the temperature exceeds a certain threshold for several consecutive days, we will first analyze the global spatial and temporal distribution of air temperature. Next, we will calculate the number and timing of heatwaves for a local area, and then determine the percentage of a region under heatwave conditions. You can additionally explore the relationship of heatwaves with other climate drivers, and you are encouraged to analyze their health impact using an available mortality dataset. Finally, enjoy exploring the heatwaves!

Project Template#

(Figure: 2024 Heatwaves project template, 2024_Heatwaves.svg)

Data Exploration Notebook#

Project Setup#

# this cell enables conda in google colab and has to be run first
# it will force the kernel to restart; this is necessary to install all system dependencies of cfgrib,
# which in turn allows us to open grib files via xarray
#!pip install -q condacolab
#import condacolab
#condacolab.install()
# google colab installs

#!mamba install --quiet cartopy cdsapi cfgrib eccodes numpy==1.26.4
# import packages
#import xclim
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
#import matplotlib.dates as mdates
import cartopy.feature as cfeature
import cartopy.crs as ccrs

#from xclim.core.calendar import percentile_doy

Helper functions#

# @title Helper functions
import os
import pooch
import tempfile

def pooch_load(filelocation=None, filename=None, processor=None):
    shared_location = "/home/jovyan/shared/Data/projects/Heatwaves"  # this is different for each day
    user_temp_cache = tempfile.gettempdir()

    if os.path.exists(os.path.join(shared_location, filename)):
        file = os.path.join(shared_location, filename)
    else:
        file = pooch.retrieve(
            filelocation,
            known_hash=None,
            fname=os.path.join(user_temp_cache, filename),
            processor=processor,
        )

    return file

Figure settings#

# @title Figure settings

import ipywidgets as widgets       # interactive display

%config InlineBackend.figure_format = 'retina'
plt.style.use("https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle")

ECMWF Reanalysis v5 (ERA5): Air Temperature at 2m#

You will utilize the ERA5 dataset to examine temperature trends and heatwaves, applying the loading methods introduced in W1D1. Please see the W1D2 course material for more information on reanalysis data. You can also read more about ERA5 here: Climate reanalysis.

Specifically, in this project, you will focus on near-surface temperature, i.e. the temperature of the air at \(2 \text{m}\) above the surface of land, sea, or inland waters, given in units of Kelvin \(\left(\text{K}\right)\).

You will access the following subsampled data through the OSF cloud storage to simplify downloading. For the project, you will need to download data yourself if you are interested in exploring other regional subsets or variables. Please have a look at the get_ERA5_reanalysis_data.ipynb notebook, where we show how to use the Climate Data Store (CDS) API to get a subset of the huge ECMWF ERA5 reanalysis data set.

In the following, we show how to load, explore, and visualize ERA5 data, using a small subsample file that was downloaded beforehand.

# loading a subsample of the ERA5 reanalysis dataset, daily from 1991 to 2000
link_id = "z9xfv"
url_ERA5 = f"https://osf.io/download/{link_id}/"
#filepath = "/content/file_sample.nc"
fname_ERA5 = "file_sample.nc"

ds = xr.open_dataset(pooch_load(url_ERA5, fname_ERA5))
ds
Downloading data from 'https://osf.io/download/z9xfv/' to file '/tmp/file_sample.nc'.
SHA256 hash of downloaded file: 2236e254419e163ed9ae6ad44b61bfe39d5fb2bd943dbd912093e703c5fa1485
Use this value as the 'known_hash' argument of 'pooch.retrieve' to ensure that the file hasn't changed if it is downloaded again in the future.
<xarray.Dataset> Size: 25MB
Dimensions:    (longitude: 41, latitude: 41, time: 3653)
Coordinates:
  * longitude  (longitude) float32 164B 143.0 143.2 143.5 ... 152.5 152.8 153.0
  * latitude   (latitude) float32 164B 5.0 4.75 4.5 4.25 ... -4.5 -4.75 -5.0
  * time       (time) datetime64[ns] 29kB 1991-01-01 1991-01-02 ... 2000-12-31
Data variables:
    t2m        (time, latitude, longitude) float32 25MB ...
Attributes:
    Conventions:  CF-1.6
    history:      2024-02-14 07:40:23 GMT by grib_to_netcdf-2.25.1: /opt/ecmw...
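Note that the t2m values above are given in Kelvin; converting them to degrees Celsius is a simple offset of \(273.15\), as in this minimal sketch:

# convert 2m air temperature from Kelvin to degrees Celsius (simple offset)
t2m_celsius = ds["t2m"] - 273.15
t2m_celsius.attrs["units"] = "degC"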

Let’s visualize the distribution of the annual mean near-surface temperature for the year 2000 in the given area around the equator. After calculating the anomaly according to the hints included in the template, you should be able to visualize the answer to Question 1 similarly.
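For the anomaly, a minimal sketch could subtract a climatological baseline from the annual mean; using the full 1991 to 2000 subsample as the baseline here is an illustrative assumption:

# climatological mean over the full subsample (illustrative baseline choice)
climatology = ds.t2m.mean(dim="time")
# anomaly of the year-2000 annual mean relative to that baseline
anomaly_2000 = ds.t2m.sel(time="2000").mean(dim="time") - climatology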

# calculate the annual average of the year selected
year_to_use = ds.t2m.loc["2000-01-01":"2000-12-31",:,:].mean(dim="time")
year_to_use
<xarray.DataArray 't2m' (latitude: 41, longitude: 41)> Size: 7kB
array([[300.1665 , 300.18317, 300.19998, ..., 300.12216, 300.10117,
        300.07745],
       [300.17957, 300.1994 , 300.21912, ..., 300.12817, 300.12216,
        300.105  ],
       [300.19885, 300.2193 , 300.23114, ..., 300.14175, 300.13776,
        300.13687],
       ...,
       [299.88504, 300.1443 , 300.40536, ..., 300.0386 , 299.20694,
        299.89627],
       [298.28378, 298.61932, 298.86646, ..., 299.7144 , 299.8679 ,
        300.66873],
       [294.1361 , 294.42368, 295.4092 , ..., 299.97037, 300.18015,
        300.3619 ]], dtype=float32)
Coordinates:
  * longitude  (longitude) float32 164B 143.0 143.2 143.5 ... 152.5 152.8 153.0
  * latitude   (latitude) float32 164B 5.0 4.75 4.5 4.25 ... -4.5 -4.75 -5.0
# plot temperature
fig, ax = plt.subplots(1, subplot_kw={'projection':ccrs.PlateCarree()}, layout='constrained')

extent = [year_to_use.longitude.min(), year_to_use.longitude.max(), year_to_use.latitude.min(),
          year_to_use.latitude.max()]
#im = ax.imshow(year_to_use, extent=extent, transform=ccrs.PlateCarree(),cmap="coolwarm")
im = year_to_use.plot(ax=ax,
                      transform=ccrs.PlateCarree(),
                      cmap="coolwarm",
                      add_colorbar=False)

# add coastlines, labeled gridlines and continent boundaries
ax.coastlines()
ax.gridlines(draw_labels={"bottom": "x", "left": "y"})
ax.add_feature(cfeature.BORDERS, linestyle='-.')

# create colorbar and set cbar label
cbar = plt.colorbar(im, ax=ax, orientation='vertical', shrink=0.9, pad=0.1)
cbar.set_label('Air temperature at 2m (K)')

# set title
plt.title("Annual mean temperature in 2000")
plt.show()
(Output figure: map of the annual mean 2m air temperature in 2000 over the selected equatorial region.)

Additionally, you can calculate the air temperature trend. For this, we choose a longer time period and therefore another subsample, from 1991 to 2020, which we load in the next cell:

link_id = "3xbq8"
url_ERA5 = f"https://osf.io/download/{link_id}/"  # rebuild the URL for the new link_id
fname_ERA5 = "data_sample_91_20.nc"
ds_long = xr.open_dataset(pooch_load(url_ERA5, fname_ERA5))
ds_long
# find the last year of the dataset
last_year = ds_long['time.year'].max().item()

# filter the last 30 years
ds_30y = ds_long.sel(time=slice(str(last_year-30+1),str(last_year)))

# calculate the mean temperature for each year
mean_time_dim = ds_30y['t2m'].resample(time="YE").mean(dim="time")

# apply cosine of latitude as weights to the dataset variables
weights = np.cos(np.deg2rad(mean_time_dim.latitude))
weighted_mean_time_dim = mean_time_dim.weighted(weights)

# calculate the area-weighted mean and convert to degrees Celsius
weighted_global_mean_temp = weighted_mean_time_dim.mean(dim=["longitude","latitude"])
weighted_global_mean_temp_c = weighted_global_mean_temp - 273.15

# calculate the trend line
years = weighted_global_mean_temp_c['time'].dt.year.values
annual_temperature = weighted_global_mean_temp_c.values
trend_coefficients = np.polyfit(years, annual_temperature, 1)
trend_line = np.poly1d(trend_coefficients)

# draw data
plt.plot(years, annual_temperature, color="blue", label="ERA5 Reanalysis - annually resampled")
plt.plot(years, trend_line(years), color="red", linestyle="--", label='Trend line')

# aesthetics
plt.xlabel("Time (years)")
plt.ylabel("Air temperature at 2m (°C)")
plt.legend()
plt.show()
(Output figure: annually resampled, area-weighted mean 2m air temperature with linear trend line.)

ERA5-Land hourly data from 1950 to present#

This ERA5 reanalysis dataset of hourly data has an increased spatial resolution and focuses on the evolution of land variables over the last decades. It is updated regularly by ECMWF and accessible via the CDS API (cf. get_ERA5_reanalysis_data.ipynb). A similar dataset of lower temporal resolution, i.e. monthly averages, can be found here.

Depending on your research question, it is essential to choose an adequate frequency; moreover, due to the huge amount of data available, it might become necessary to focus on a regional subset of the ERA5 dataset.

In the following, we show how we downloaded global ERA5-Land data via the CDS API to answer Q2 by calculating global temperature trends. Please note that the API request serves as an example only; do not trigger the download unless necessary. Think about an adequate frequency and domain of interest beforehand, so that you request a subset that is just sufficient to answer your questions.

import cdsapi

#c = cdsapi.Client()

# Uncomment the following block after adjusting it according to your research question
# and after successfully working through the `get_ERA5_reanalysis_data.ipynb` notebook.

#c.retrieve(
#    'reanalysis-era5-land',
#    {
#        'variable': '2m_temperature',
#        'year': ['1974', '1975', '1976', '1977', '1978',
#                 '1979', '1980', '1981', '1982', '1983',
#                 '1984', '1985', '1986', '1987', '1988',
#                 '1989', '1990', '1991', '1992', '1993',
#                 '1994', '1995', '1996', '1997', '1998',
#                 '1999', '2000', '2001', '2002', '2003',
#                 '2004', '2005', '2006', '2007', '2008',
#                 '2009', '2010', '2011', '2012', '2013',
#                 '2014', '2015', '2016', '2017', '2018',
#                 '2019', '2020', '2021', '2022', '2023'],
#        'month': ['01','02','03','04','05','06','07','08','09','10','11','12'],
#        'day': '15',
#        'time': '12:00',
#        'grid': ['0.4', '0.4'],
#        'format': 'grib',
#
#    },
#    'reanalysis-era5-land_1974_2023_04x04.grib')

As you can see in the request code block, we downloaded the 2m_temperature / \(\text{t2m}\) variable from reanalysis-era5-land at noon on every 15th of the month over the last 50 years. In other words, the requested data is not averaged over the whole month but is just a sample. To reduce the resolution, we chose a grid of 0.4° in both spatial dimensions. As we want to calculate global trends over the whole time period, this choice should be adequate and saves us a few computationally intensive averaging calculations. Furthermore, it helps to illustrate what the downloaded data looks like.

The output is given as a file named reanalysis-era5-land_1974_2023_04x04.grib in the GRIB format, which requires a small addition to the familiar file-reading method: xr.open_dataset(path, engine='cfgrib'); check out this resource for more information. Additionally, we get an idx file; this index is experimental and useful if the file is opened repeatedly, but it can also be ignored or deleted.

Again, we uploaded these files to the OSF cloud for simple data retrieval, and we converted the file to the .nc (NetCDF) format as an additional option. The following lines help to open both file types.
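If you want to perform such a conversion yourself, a minimal sketch could look like this (assuming the grib file from the request above is present and cfgrib is installed, as in the setup cell):

# convert the grib file to NetCDF (one possible approach)
ds_grib = xr.open_dataset("reanalysis-era5-land_1974_2023_04x04.grib", engine="cfgrib")
ds_grib.to_netcdf("reanalysis-era5-land_1974_2023_04x04.nc")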

Note that the frequency of our example file reanalysis-era5-land_1974_2023_04x04.grib is monthly, not daily, hence no further question of the template can be answered using it. Please increase the frequency and spatial resolution, and reduce the domain of interest, when downloading the data to allow for investigations of regional heatwaves.

# specify filename and filetype
filetype = 'grib'
#filetype = 'nc'
fname_ERA5 = f"reanalysis-era5-land_1974_2023_04x04.{filetype}"

# check whether the specified path/file exists or not (locally or in the JupyterHub)
isExist = os.path.exists(fname_ERA5)

# load data and create data set
if isExist:
    print(f'The file {fname_ERA5} exists locally.\nLoading the data ...\n')
    if filetype == 'grib':
        ds_global = xr.open_dataset(fname_ERA5, engine='cfgrib')
    elif filetype == 'nc':
        ds_global = xr.open_dataset(fname_ERA5)
    else:
        raise ValueError("Please choose an appropriate file type: 'nc' or 'grib'.")

else:
    print(f'The file {fname_ERA5} does not exist locally and has to be downloaded from OSF.\nDownloading the data ...\n')

    # retrieve the grib file from the OSF cloud storage
    if filetype == 'grib':
        link_id = "6d9mf"
    elif filetype == 'nc':
        link_id = "8v63z"
    else:
        raise ValueError("Please choose an appropriate file type: 'nc' or 'grib'.")

    url_grib = f"https://osf.io/download/{link_id}/"

    # The following line is the correct approach; however, it sometimes raises an error that could not be solved by the curriculum team
    # (cf. https://github.com/ecmwf/cfgrib/blob/master/README.rst & https://github.com/pydata/xarray/issues/6512).
    # We therefore recommend downloading the file separately if this EOFError arises.

    fcached = pooch_load(url_grib, fname_ERA5)

    try:
        if filetype == 'grib':
            ds_global = xr.open_dataset(fcached, engine='cfgrib')
        elif filetype == 'nc':
            ds_global = xr.open_dataset(fcached)
    except EOFError:
        print(f'The cached .grib file could not be parsed with Xarray.\nPlease download the file to your local directory via {url_grib} or download its NetCDF equivalent.')

print(ds_global)
Downloading data from 'https://osf.io/download/6d9mf/' to file '/tmp/reanalysis-era5-land_1974_2023_04x04.grib'.
The file reanalysis-era5-land_1974_2023_04x04.grib does not exist locally and has to be downloaded from OSF.
Downloading the data ...
SHA256 hash of downloaded file: 91297515da329e89e4035e8e94734046ea2003259a29f10bc13069cce3654cc7
Use this value as the 'known_hash' argument of 'pooch.retrieve' to ensure that the file hasn't changed if it is downloaded again in the future.
<xarray.Dataset> Size: 974MB
Dimensions:     (time: 600, latitude: 451, longitude: 900)
Coordinates:
    number      int64 8B ...
  * time        (time) datetime64[ns] 5kB 1974-01-15 1974-02-15 ... 2023-12-15
    step        timedelta64[ns] 8B ...
    surface     float64 8B ...
  * latitude    (latitude) float64 4kB 90.0 89.6 89.2 88.8 ... -89.2 -89.6 -90.0
  * longitude   (longitude) float64 7kB 0.0 0.4 0.8 1.2 ... 358.8 359.2 359.6
    valid_time  (time) datetime64[ns] 5kB ...
Data variables:
    t2m         (time, latitude, longitude) float32 974MB ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2024-06-20T22:00 GRIB to CDM+CF via cfgrib-0.9.1...
# plot the mean 2m temperature of the year 1974 as an example
t2m_1974 = ds_global.sel(time='1974').mean(dim='time')
# plot the first (and only) data variable, i.e. t2m
_ = t2m_1974[list(t2m_1974.data_vars)[0]].plot()
(Output figure: global map of the mean 2m air temperature in 1974.)

Weekly Mortality Data (Optional)#

The Organisation for Economic Co-operation and Development (OECD) provides weekly mortality data for 38 countries. The list of countries can be found in the OECD data explorer in the filters section, under reference area. This dataset can be used to analyze the impact of heatwaves on health through the general mortality of a country.

# read the mortality data from a csv file
link_id = "rh3mp"
url = f"https://osf.io/download/{link_id}/"
#data_mortality = pd.read_csv("Weekly_mortality_OECD.csv")
data_mortality = pd.read_csv(url)
data_mortality.info()
data_mortality.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16168 entries, 0 to 16167
Data columns (total 16 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   STRUCTURE_NAME            16168 non-null  object
 1   ACTION                    16168 non-null  object
 2   REF_AREA                  16168 non-null  object
 3   Reference area            16168 non-null  object
 4   FREQ                      16168 non-null  object
 5   Frequency of observation  16168 non-null  object
 6   MEASURE                   16168 non-null  object
 7   Measure                   16168 non-null  object
 8   AGE                       16168 non-null  object
 9   Age                       16168 non-null  object
 10  SEX                       16168 non-null  object
 11  Sex                       16168 non-null  object
 12  UNIT_MEASURE              16168 non-null  object
 13  Unit of measure           16168 non-null  object
 14  TIME_PERIOD               16168 non-null  object
 15  OBS_VALUE                 16168 non-null  int64 
dtypes: int64(1), object(15)
memory usage: 2.0+ MB
   STRUCTURE_NAME     ACTION  REF_AREA  Reference area  FREQ  Frequency of observation  MEASURE  Measure    AGE  Age    SEX  Sex    UNIT_MEASURE  Unit of measure  TIME_PERIOD  OBS_VALUE
0  Mortality by week  I       SWE       Sweden          W     Weekly                    M        Mortality  _T   Total  _T   Total  DT            Deaths           2015-W01     1927
1  Mortality by week  I       SWE       Sweden          W     Weekly                    M        Mortality  _T   Total  _T   Total  DT            Deaths           2015-W02     1966
2  Mortality by week  I       SWE       Sweden          W     Weekly                    M        Mortality  _T   Total  _T   Total  DT            Deaths           2015-W03     1935
3  Mortality by week  I       SWE       Sweden          W     Weekly                    M        Mortality  _T   Total  _T   Total  DT            Deaths           2015-W04     1946
4  Mortality by week  I       SWE       Sweden          W     Weekly                    M        Mortality  _T   Total  _T   Total  DT            Deaths           2015-W05     1975

Hint for Q3#

For this question, you will calculate percentiles; you can read more about percentiles here and, e.g., in W2D3 Tutorial 1. Furthermore, a heatwave definition is recommended in the template for this question; however, there is a great diversity of definitions, which you can read about in the following article: Perkins & Alexander (2013).
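As a starting point, here is a minimal sketch that counts heatwave events at a single grid point of the subsample loaded above. The definition used (t2m above the local 90th percentile on at least three consecutive days), the grid point, and the threshold level are illustrative assumptions, not the template's prescribed definition; xclim's percentile_doy (see the commented import above) offers a calendar-day-dependent alternative.

# minimal sketch: count heatwave events at one grid point (illustrative
# definition: t2m above the local 90th percentile on >= 3 consecutive days)
t2m_point = ds.t2m.sel(latitude=0.0, longitude=147.0, method="nearest")
threshold = t2m_point.quantile(0.9, dim="time")  # local 90th percentile
hot = (t2m_point > threshold).values             # boolean time series

# run-length counter: length of the current streak of hot days
run = np.zeros(hot.size, dtype=int)
for i in range(hot.size):
    run[i] = run[i - 1] + 1 if (hot[i] and i > 0) else int(hot[i])

# every streak of >= 3 days passes through run == 3 exactly once
n_heatwaves = int((run == 3).sum())
print(f"Number of >=3-day heatwave events at this grid point: {n_heatwaves}")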

Hint for Q4#

For Question 4, to understand the method of calculating the percentage of an area under heatwaves, please read the following article: Silva et al. (2022).
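As a rough sketch of the idea (with an illustrative exceedance criterion, not the exact method of Silva et al. (2022)), one can flag every grid cell that exceeds its local percentile threshold on a given day and average the flags over the region, weighting by the cosine of latitude:

# minimal sketch: daily percentage of the region above the local 90th percentile
threshold = ds.t2m.quantile(0.9, dim="time")    # per-grid-cell threshold
hot_flags = (ds.t2m > threshold).astype(float)  # 1 where exceeded, else 0
weights = np.cos(np.deg2rad(ds.latitude))       # area weighting by latitude
area_percent = 100 * hot_flags.weighted(weights).mean(dim=["latitude", "longitude"])

# plot the daily area percentage for the year 2000
area_percent.sel(time="2000").plot()
plt.ylabel("Area above threshold (%)")
plt.show()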

Hint for Q5#

The following articles will be helpful: Heo & Bell (2019) (not open access) and Smith et al. (2012).

Hint for Q6#

The following article will be helpful: Reddy et al. (2022).

Hint for Q7#

The following article will help you learn about a method to determine the influence of heatwaves on health by analyzing mortality: Nori-Sarma et al. (2019).
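To relate this mortality data to temperature, both series need a common time axis. The sketch below is a minimal, illustrative example: Sweden is an arbitrary choice of country, and the parsing format assumes the ISO-week labels of the TIME_PERIOD column (e.g. '2015-W01'):

# minimal sketch: convert ISO week labels to dates and plot weekly deaths
swe = data_mortality[data_mortality["REF_AREA"] == "SWE"].copy()
# '2015-W01' + '-1' -> the Monday of that ISO week
swe["date"] = pd.to_datetime(swe["TIME_PERIOD"] + "-1", format="%G-W%V-%u")
swe = swe.set_index("date").sort_index()

swe["OBS_VALUE"].plot()
plt.ylabel("Weekly deaths")
plt.title("Weekly mortality, Sweden")
plt.show()

# a matching weekly temperature series could be built analogously, e.g. via
# ds.t2m.resample(time="1W").mean() for a regional subset covering Sweden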

Further reading#