Open In Colab   Open in Kaggle

Tutorial 1: Internal Climate Variability and Single-model Ensembles#

Week 2, Day 1, An Ensemble of Futures

Content creators: Brodie Pearson, Julius Busecke, Tom Nicholas, Paridhi Rustogi, Sam Ditkovsky

Content reviewers: Mujeeb Abdulfatai, Nkongho Ayuketang Arreyndip, Jeffrey N. A. Aryee, Younkap Nina Duplex, Sloane Garelick, Paul Heubel, Zahra Khodakaramimaghsoud, Peter Ohue, Jenna Pearson, Abel Shibu, Derick Temfack, Peizhen Yang, Cheng Zhang, Chi Zhang, Ohad Zivan

Content editors: Paul Heubel, Jenna Pearson, Ohad Zivan, Chi Zhang

Production editors: Wesley Banfield, Paul Heubel, Jenna Pearson, Konstantine Tsafatinos, Chi Zhang, Ohad Zivan

Our 2024 Sponsors: CMIP, NFDI4Earth

Tutorial Objectives#

Estimated timing for tutorial: 25 minutes

In this tutorial, we will learn about the concept of internal climate variability, how it influences the predictability of climate phenomena and how it contributes to uncertainty in CMIP6 model estimates. We will work with a single-model ensemble, which utilizes the MPI-ESM1-2-LR model from CMIP6, to isolate and quantify internal climate variability.

By the end of this tutorial, you will:

  • Understand the importance of internal climate variability and its role in climate prediction and model uncertainty.

  • Create and evaluate a single-model ensemble using IPCC uncertainty bands, providing a visual representation of model uncertainty.

Setup#

# installations ( uncomment and run this cell ONLY when using google colab or kaggle )

# !pip install condacolab &> /dev/null
# import condacolab
# condacolab.install()

# # Install all packages in one call (+ use mamba instead of conda), this must in one line or code will fail
# !mamba install xarray-datatree intake-esm gcsfs xmip aiohttp nc-time-axis cf_xarray xarrayutils &> /dev/null
# imports
import intake
import matplotlib.pyplot as plt
import xarray as xr

from xmip.preprocessing import combined_preprocessing
from xmip.postprocessing import _parse_metric
from datatree import DataTree
/opt/hostedtoolcache/Python/3.9.20/x64/lib/python3.9/site-packages/esmpy/interface/loadESMF.py:94: VersionWarning: ESMF installation version 8.8.0 beta snapshot, ESMPy version 8.8.0b4
  warnings.warn("ESMF installation version {}, ESMPy version {}".format(

Install and import feedback gadget#

Hide code cell source
# @title Install and import feedback gadget

!pip3 install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "comptools_4clim",
            "user_key": "l5jpxuee",
        },
    ).render()


feedback_prefix = "W2D1_T1"
[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: pip install --upgrade pip

Figure settings#

Hide code cell source
# @title Figure settings
import ipywidgets as widgets  # interactive display

plt.style.use(
    "https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle"
)

%matplotlib inline

Helper functions#

Hide code cell source
# @title Helper functions

def readin_cmip6_to_datatree(facet_dict):
    # open an intake catalog containing the Pangeo CMIP cloud data
    col = intake.open_esm_datastore(
        "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
    )

    # from the full `col` object, create a subset using facet search
    cat = col.search(
        source_id=facet_dict['source_id'],
        variable_id=facet_dict['variable_id'],
        member_id=facet_dict['member_id'],
        table_id=facet_dict['table_id'],
        grid_label=facet_dict['grid_label'],
        experiment_id=facet_dict['experiment_id'],
        require_all_on=facet_dict['require_all_on']  # make sure that we only get models which have all of the above experiments
    )

    # convert the sub-catalog into a datatree object, by opening each dataset into an xarray.Dataset (without loading the data)
    kwargs = dict(
        preprocess=combined_preprocessing,  # apply xMIP fixes to each dataset
        xarray_open_kwargs=dict(
            use_cftime=True
        ),  # ensure all datasets use the same time index
        storage_options={
            "token": "anon"
        },  # anonymous/public authentication to google cloud storage
    )

    cat.esmcat.aggregation_control.groupby_attrs = ["source_id", "experiment_id"]
    dt = cat.to_datatree(**kwargs)

    return dt



def global_mean(ds: xr.Dataset) -> xr.Dataset:
    """Global average, weighted by the cell area"""
    return ds.weighted(ds.areacello.fillna(0)).mean(["x", "y"], keep_attrs=True)


# Calculate anomaly to reference period
def datatree_anomaly(dt):
    dt_out = DataTree()
    for model, subtree in dt.items():
        ref = dt[model]["historical"].ds.sel(time=slice("1950", "1980")).mean()
        dt_out[model] = subtree - ref
    return dt_out

Video 1: Speaker Intro Brodie Pearson#

If you want to download the slides: https://osf.io/download/vfn8z/

Video 2: Internal Climate Variability#

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Internal_Climate_Variability_Video")
If you want to download the slides: https://osf.io/download/dsm6a/

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Internal_Climate_Variability_Slides")

Section 1: Internal Climate Variability & Single-model Ensembles#

One of the CMIP6 models we are using in today’s tutorials, MPI-ESM1-2-LR, is part of single-model ensemble, where its modelling centre carried out multiple simulations of the model for at least one of the CMIP6 experiments. To create a single-model ensemble, the modelling centre will run a model using the same forcing data, but with small changes, often via the initial conditions. Due to the chaotic nature of the climate system, small changes in initial conditions lead to differences in the modelled climate as time progresses. These differences are often referred to as internal variability. Running a single-model ensemble, and comparing the results to simulations using other models and/or different forcing datasets, allows us to separate the internal variability of the simulated climate from the variability due to model differences (which you’ll explore in the next tutorial) and/or external-forcing. If you are interested in learning more about large ensemble climate models, you can read this paper.

Let’s take advantage of this single-model ensemble to quantify the internal variability of this model’s simulated climate.

Section 1.1: Loading CMIP6 SST Data with xarray#

As a reminder, these ESMs simulate several systems (ocean, atmosphere, cryosphere, land) that are coupled to each other, and each system has its own variables, physics, and discretizations (grid & timestep).

“EarthSystemModel”

Atmospheric Model Schematic (Credit: Wikipedia)

Let’s repeat the CMIP6 loading method that we learned in Tutorial 6 on last week’s Introduction to Climate Modeling day (W1D5).

Although we will only work with monthly ocean SST data today, the methods introduced can easily be applied/extended to load and analyze various CMIP6 variables, including from other components of the Earth system.

To learn more about CMIP, including additional methods to access CMIP data, please see our CMIP Resource Bank and the CMIP website.

As a reminder, the facets that have to be specified for CMIP6, along with the facet choice(s) we make in this tutorial, are:

  1. variable_id: tos = sea surface temperature

  2. source_id: The CMIP6 model(s) that we want data from

  3. table_id: Omon (ocean monthly output)

  4. grid_id: gn = data on the model’s native grid

  5. experiment_id: ssp585 (we’ll discuss experiments later today)

  6. member_id: r1i1p1f1 etc.

Now, let’s repeat our CMIP6 loading method from the previous tutorial.

Let us load 5 different realizations of the MPI-ESM1-2-LR experiments (r1i1p1f1 through r5i1p1f1). This numbering convention means they were each initialized using a different time-snapshot of the base/spin-up simulation. We use the helper functions readin_cmip6_to_datatree() to access the tos (\(SST\)) and areacello (grid cell area) variables, and datatree_anomaly() to calculate the anomalies. A DataTree method named .map_over_subtree() is used to add the grid cell area to the datatree and to average every ensemble member globally via the global_mean() helper function.

# dictionary of facets for query of surface temperature data
facet_dict = { "source_id":"MPI-ESM1-2-LR",
    "variable_id":"tos",
    "member_id":["r1i1p1f1", "r2i1p1f1", "r3i1p1f1", "r4i1p1f1", "r5i1p1f1"],
    "table_id":"Omon",
    "grid_label":"gn",
    "experiment_id":"historical",
    "require_all_on":["source_id", "member_id"]
    }

# dictionary for query of cell area metric
facet_dict_area = { "source_id":"MPI-ESM1-2-LR",
    "variable_id":"areacello",
    "member_id":"r1i1p1f1",
    "table_id":"Ofx",
    "grid_label":"gn",
    "experiment_id":"historical",
    "require_all_on":"source_id"
    }

# search for temperature and area data and return datatree objects
dt_ensemble = readin_cmip6_to_datatree(facet_dict)
dt_area = readin_cmip6_to_datatree(facet_dict_area)


# add the area (we can reuse the area from before,
# since for a given model the horizontal area does not vary between members)
dt_ensemble_with_area = DataTree()
for model, subtree in dt_ensemble.items():
    metric = dt_area["MPI-ESM1-2-LR"]["historical"].ds["areacello"].squeeze()
    dt_ensemble_with_area[model] = subtree.map_over_subtree(_parse_metric, metric)

# global average
# average every dataset in the tree globally
dt_ensemble_gm = dt_ensemble_with_area.map_over_subtree(global_mean)

# calculate anomaly
dt_ensemble_gm_anomaly = datatree_anomaly(dt_ensemble_gm)

# coarsen data for plotting the historical values of the 5 members
dt_ensemble_gm_anomaly_hist = dt_ensemble_gm_anomaly['MPI-ESM1-2-LR']['historical'].ds.coarsen(time=12).mean().tos
--> The keys in the returned dictionary of datasets are constructed as follows:
	'source_id/experiment_id'
100.00% [1/1 00:04<00:00]
--> The keys in the returned dictionary of datasets are constructed as follows:
	'source_id/experiment_id'
100.00% [1/1 00:00<00:00]

Coding Exercise 1.1#

Complete the following code to:

  1. Plot the historical experiment data for each realization, using a distinct color for each realization.

fig, ax = plt.subplots()

# plot the data dt_ensemble_gm_anomaly_hist with a different color for each member id
_ = ...

ax.set_title(
    "Global Mean SST Anomaly from a 5-member single-model ensemble"
)

plt.axhline(0, linestyle='dashed', color='lightgrey')
ax.set_ylabel("Global Mean SST Anomaly (°C)")
ax.set_xlabel("Time (years)")
Text(0.5, 0, 'Time (years)')
../../../_images/4ae89d1ecec785fe274f931e19cf65a8064bae8a60255b46691edd01cf00ac7b.png

Click for solution

Example output:

Solution hint

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Coding_Exercise_1_1")

Coding Exercise 1.2#

Complete the following code to:

  1. Plot the mean SST anomaly of the single-model ensemble, with an envelope visualizing the spread across members (excluding the lowest and highest 17%)

# First calculate the tos/SST annual mean
da = (
    dt_ensemble_gm_anomaly["MPI-ESM1-2-LR"]['historical']
    .ds.tos.coarsen(time=12)
    .mean()
    .load()
)
fig, ax = plt.subplots()

# calculate the mean across ensemble members and plot it
da_mean = ...
_ = ...

# shading representing spread between members
x = da.time.data
# diagnose the lower range of the likely bounds
da_lower = ...
# diagnose the upper range of the likely bounds
da_upper = ...
# use ax.fill_between() and above bounds to shade likely range
_ = ...

# aesthetics
plt.axhline(0, linestyle='dashed', color='lightgrey')
ax.set_title(
    "Global Mean SST Anomaly from a 5-member single-model ensemble"
)
ax.set_ylabel("Global Mean SST Anomaly (°C)")
ax.set_xlabel("Time (years)")
ax.legend()
/tmp/ipykernel_119392/308364069.py:23: UserWarning: No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
  ax.legend()
<matplotlib.legend.Legend at 0x7fbf54685160>
../../../_images/9b03bba8e2026f939c947061bf3840fb1a9c4beefbfec48bb58c7f6cdd20bdba.png

Click for solution

Example output:

Solution hint

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Coding_Exercise_1_2")

Summary#

In this tutorial, we explored the internal climate variability and its implications for climate modeling and prediction. We discussed the utility of single-model ensembles for isolating the effects of internal variability by contrasting simulations with identical physics, numerics, and discretization. We quantified the internal variability of MPI-ESM1-2-LR model’s simulated climate. Through this tutorial, we better understand the boundaries of climate prediction and the effect of internal variability as a source of uncertainty in CMIP6 models.

Resources#

This tutorial uses data from the simulations conducted as part of the CMIP6 multi-model ensemble.

For examples on how to access and analyze data, please visit the Pangeo Cloud CMIP6 Gallery

For more information on what CMIP is and how to access the data, please see this page.

For more information about large ensemble climate modelling see this paper.