
Changes in Land Cover: Albedo and Carbon Sequestration#

Content creators: Oz Kira, Julius Bamah

Content reviewers: Yuhan Douglas Rao, Abigail Bodner

Content editors: Zane Mitrevica, Natalie Steinemann, Jenna Pearson, Chi Zhang, Ohad Zivan

Production editors: Wesley Banfield, Jenna Pearson, Chi Zhang, Ohad Zivan

Our 2023 Sponsors: NASA TOPS, Google DeepMind, and CMIP

# @title Project Background

from ipywidgets import widgets
from IPython.display import YouTubeVideo
from IPython.display import IFrame
from IPython.display import display


class PlayVideo(IFrame):
    def __init__(self, id, source, page=1, width=400, height=300, **kwargs):
        self.id = id
        if source == "Bilibili":
            src = f"https://player.bilibili.com/player.html?bvid={id}&page={page}"
        elif source == "Osf":
            src = f"https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render"
        super(PlayVideo, self).__init__(src, width, height, **kwargs)


def display_videos(video_ids, W=400, H=300, fs=1):
    tab_contents = []
    for i, video_id in enumerate(video_ids):
        out = widgets.Output()
        with out:
            if video_ids[i][0] == "Youtube":
                video = YouTubeVideo(
                    id=video_ids[i][1], width=W, height=H, fs=fs, rel=0
                )
                print(f"Video available at https://youtube.com/watch?v={video.id}")
            else:
                video = PlayVideo(
                    id=video_ids[i][1],
                    source=video_ids[i][0],
                    width=W,
                    height=H,
                    fs=fs,
                    autoplay=False,
                )
                if video_ids[i][0] == "Bilibili":
                    print(
                        f"Video available at https://www.bilibili.com/video/{video.id}"
                    )
                elif video_ids[i][0] == "Osf":
                    print(f"Video available at https://osf.io/{video.id}")
            display(video)
        tab_contents.append(out)
    return tab_contents


video_ids = [('Youtube', 'qHZJeZnvQ60'), ('Bilibili', 'BV1fh4y1j7LX')]
tab_contents = display_videos(video_ids, W=730, H=410)
tabs = widgets.Tab()
tabs.children = tab_contents
for i in range(len(tab_contents)):
    tabs.set_title(i, video_ids[i][0])
display(tabs)
# @title Tutorial slides
# @markdown These are the slides for the videos in all tutorials today
from IPython.display import IFrame

link_id = "w8ny7"
# render the slides via the same OSF render service used for the videos above
IFrame(src=f"https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{link_id}/?direct%26mode=render", width=730, height=410)

The global radiative budget is affected by land cover (e.g., forests, grasslands, agricultural fields), since different land cover types reflect different fractions of incoming sunlight; this reflectivity is known as albedo. At the same time, vegetation sequesters carbon, potentially counteracting these radiative effects.

In this project, you will evaluate the trade-off between albedo change and carbon sequestration. In addition, you will track significant land cover changes, specifically the creation and abandonment of agricultural land.

You will have the opportunity to explore terrestrial remote sensing (recall our W1D3 tutorial on remote sensing) and meteorological data from GLASS and ERA5. These datasets provide information on reflectance, albedo, meteorological variables, and land cover changes in your region of interest. We encourage you to investigate the relationships between these variables and their impact on the global radiative budget. Moreover, you can track agricultural land abandonment and analyze its potential connection to climate change. This project aligns well with the topics covered in W2D3, which you are encouraged to explore further.

Project Template#

Project Template

Note: The dashed boxes are socio-economic questions.

Data Exploration Notebook#

Project Setup#

# google colab installs
# !pip install cartopy
# !pip install DateTime
# !pip install matplotlib
# !pip install pyhdf
# !pip install numpy
# !pip install pandas
# !pip install modis-tools
# the imports below also rely on these packages (uncomment on Colab):
# !pip install intake intake-esm gcsfs xmip xarrayutils xarray-datatree rioxarray
# imports
import os
import tempfile

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
import cartopy.crs as ccrs
import intake
import pooch
from netCDF4 import Dataset

from xmip.preprocessing import combined_preprocessing
from xmip.postprocessing import _parse_metric
from xmip.utils import google_cmip_col
from xarrayutils.plotting import shaded_line_plot
from datatree import DataTree

from IPython.display import display, HTML, Markdown
# helper functions

def pooch_load(filelocation=None, filename=None, processor=None):
    # data are mirrored on the course JupyterHub; this path is different for each project day
    shared_location = "/home/jovyan/shared/Data/Projects/Albedo"
    user_temp_cache = tempfile.gettempdir()

    if os.path.exists(os.path.join(shared_location, filename)):
        # use the local shared copy if it exists
        file = os.path.join(shared_location, filename)
    else:
        # otherwise download (and cache) the file from the remote location
        file = pooch.retrieve(
            filelocation,
            known_hash=None,
            fname=os.path.join(user_temp_cache, filename),
            processor=processor,
        )

    return file
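
As a quick illustration, here is how pooch_load might be called; the URL and filename below are hypothetical placeholders, not real project files:

# hypothetical usage of pooch_load (placeholders only; substitute real values)
# example_file = pooch_load(
#     filelocation="https://osf.io/download/<file_id>/",  # placeholder URL
#     filename="example_dataset.nc",  # placeholder filename
# )
# ds_example = xr.open_dataset(example_file)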

Obtain Land and Atmospheric Variables from CMIP6#

Here you will use the Pangeo cloud service to access CMIP6 data using the methods encountered in W1D5 and W2D1. To learn more about CMIP, including additional ways to access CMIP data, please see our CMIP Resource Bank and the CMIP website.

# open an intake catalog containing the Pangeo CMIP cloud data
col = intake.open_esm_datastore(
    "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
)
col

pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):

unique
activity_id 18
institution_id 36
source_id 88
experiment_id 170
member_id 657
table_id 37
variable_id 700
grid_label 10
zstore 514818
dcpp_init_year 60
version 736
derived_variable_id 0
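
If you prefer to inspect the catalog directly, intake-esm exposes its contents as a pandas DataFrame; a quick optional check:

# the catalog's underlying table is a pandas DataFrame
col.df.head()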

To see a list of CMIP6 variables and models please visit the Earth System Grid Federation (ESGF) website. Note that not all of these variables are hosted through the Pangeo cloud, but there are additional ways to access all the CMIP6 data as described here, including direct access through ESGF.

You can see which variables and models are available within Pangeo using the sample code below, where we look for models having the variable ‘pastureFrac’ for the historical simulation:

expts = ["historical"]

query = dict(
    experiment_id=expts,
    table_id="Lmon",
    variable_id=["pastureFrac"],
    member_id="r1i1p1f1",
)

col_subset = col.search(require_all_on=["source_id"], **query)
col_subset.df.groupby("source_id")[
    ["experiment_id", "variable_id", "table_id"]
].nunique()
experiment_id variable_id table_id
source_id
GFDL-CM4 1 1 1
GFDL-ESM4 1 1 1

Here we will download several variables from the GFDL-ESM4 historical CMIP6 simulation:

import pandas as pd
from IPython.display import display, HTML, Markdown

# Data as list of dictionaries
classification_system = [
    {
        "Name": "gpp",
        "Description": "Carbon Mass Flux out of Atmosphere due to Gross Primary Production on Land",
    },
    {
        "Name": "npp",
        "Description": "Carbon Mass Flux out of Atmosphere due to Net Primary Production on Land",
    },
    {
        "Name": "nep",
        "Description": "Carbon Mass Flux out of Atmophere due to Net Ecosystem Production on Land",
    },
    {
        "Name": "nbp",
        "Description": "Carbon Mass Flux out of Atmosphere due to Net Biospheric Production on Land",
    },
    {"Name": "treeFrac", "Description": "Land Area Percentage Tree Cover"},
    {"Name": "grassFrac", "Description": "Land Area Percentage Natural Grass"},
    {"Name": "cropFrac", "Description": "Land Area Percentage Crop Cover"},
    {
        "Name": "pastureFrac",
        "Description": "Land Area Percentage Anthropogenic Pasture Cover",
    },
    {"Name": "rsus", "Description": "Surface Upwelling Shortwave Radiation"},
    {"Name": "rsds", "Description": "Surface Downwelling Shortwave Radiation"},
    {"Name": "tas", "Description": "Near-Surface Air Temperature"},
    {"Name": "pr", "Description": "Precipitation"},
    {
        "Name": "areacella",
        "Description": "Grid-Cell Area for Atmospheric Variables (all variabeles are on this grid however)",
    },
]

df = pd.DataFrame(classification_system)
pd.set_option("display.max_colwidth", None)
html = df.to_html(index=False)
title_md = "### Table 1: CMIP6 Variables"
display(Markdown(title_md))
display(HTML(html))

Table 1: CMIP6 Variables

Name Description
gpp Carbon Mass Flux out of Atmosphere due to Gross Primary Production on Land
npp Carbon Mass Flux out of Atmosphere due to Net Primary Production on Land
nep Carbon Mass Flux out of Atmosphere due to Net Ecosystem Production on Land
nbp Carbon Mass Flux out of Atmosphere due to Net Biospheric Production on Land
treeFrac Land Area Percentage Tree Cover
grassFrac Land Area Percentage Natural Grass
cropFrac Land Area Percentage Crop Cover
pastureFrac Land Area Percentage Anthropogenic Pasture Cover
rsus Surface Upwelling Shortwave Radiation
rsds Surface Downwelling Shortwave Radiation
tas Near-Surface Air Temperature
pr Precipitation
areacella Grid-Cell Area for Atmospheric Variables (all variables here are on this grid, however)

There are different timescales on which carbon is released back into the atmosphere, and these are reflected in the different production terms. This is highlighted in the figure below (please note these numbers are quite outdated).

Figure 1-2: Global terrestrial carbon uptake. Plant (autotrophic) respiration releases CO2 to the atmosphere, reducing GPP to NPP and resulting in short-term carbon uptake. Decomposition (heterotrophic respiration) of litter and soils in excess of that resulting from disturbance further releases CO2 to the atmosphere, reducing NPP to NEP and resulting in medium-term carbon uptake. Disturbance from both natural and anthropogenic sources (e.g., harvest) leads to further release of CO2 to the atmosphere by additional heterotrophic respiration and combustion, which in turn leads to long-term carbon storage (adapted from Steffen et al., 1998). Credit: IPCC
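
In symbols, the successive production terms in the figure differ by respiration and disturbance fluxes (a standard decomposition, stated here for reference):

NPP = GPP - autotrophic (plant) respiration
NEP = NPP - heterotrophic respiration (decomposition)
NBP = NEP - carbon losses from disturbance (e.g., fire, harvest)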

Now you are ready to extract all the variables!

# get monthly land variables

# from the full `col` object, create a subset using facet search
cat = col.search(
    source_id="GFDL-ESM4",
    variable_id=[
        "gpp",
        "npp",
        "nbp",
        "treeFrac",
        "grassFrac",
        "cropFrac",
        "pastureFrac",
    ],  # No 'shrubFrac','baresoilFrac','residualFrac' in GFDL-ESM4
    member_id="r1i1p1f1",
    table_id="Lmon",
    grid_label="gr1",
    experiment_id=["historical"],
    require_all_on=[
        "source_id"
    ],  # make sure that we only get models which have all of the above experiments
)

# convert the sub-catalog into a datatree object, by opening each dataset into an xarray.Dataset (without loading the data)
kwargs = dict(
    preprocess=combined_preprocessing,  # apply xMIP fixes to each dataset
    xarray_open_kwargs=dict(
        use_cftime=True
    ),  # ensure all datasets use the same time index
    storage_options={
        "token": "anon"
    },  # anonymous/public authentication to google cloud storage
)

cat.esmcat.aggregation_control.groupby_attrs = ["source_id", "experiment_id"]
dt_Lmon_variables = cat.to_datatree(**kwargs)

# convert to dataset instead of datatree, remove extra singleton dimensions
ds_Lmon = dt_Lmon_variables["GFDL-ESM4"]["historical"].to_dataset().squeeze()
ds_Lmon
--> The keys in the returned dictionary of datasets are constructed as follows:
	'source_id/experiment_id'
100.00% [1/1 00:04<00:00]
<xarray.Dataset> Size: 3GB
Dimensions:         (time: 1980, y: 180, x: 288, bnds: 2, vertex: 4)
Coordinates:
  * y               (y) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
  * x               (x) float64 2kB 0.625 1.875 3.125 ... 356.9 358.1 359.4
  * time            (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:...
    lon_bounds      (x, bnds, y) float64 829kB dask.array<chunksize=(288, 2, 180), meta=np.ndarray>
    lat_bounds      (y, bnds, x) float64 829kB dask.array<chunksize=(180, 2, 288), meta=np.ndarray>
    time_bounds     (time, bnds) object 32kB dask.array<chunksize=(1980, 2), meta=np.ndarray>
    lon             (x, y) float64 415kB 0.625 0.625 0.625 ... 359.4 359.4 359.4
    lat             (x, y) float64 415kB -89.5 -88.5 -87.5 ... 87.5 88.5 89.5
    lon_verticies   (vertex, x, y) float64 2MB dask.array<chunksize=(1, 288, 180), meta=np.ndarray>
    lat_verticies   (vertex, x, y) float64 2MB dask.array<chunksize=(1, 288, 180), meta=np.ndarray>
    member_id       <U8 32B 'r1i1p1f1'
    dcpp_init_year  float64 8B nan
Dimensions without coordinates: bnds, vertex
Data variables:
    cropFrac        (time, y, x) float32 411MB dask.array<chunksize=(901, 180, 288), meta=np.ndarray>
    gpp             (time, y, x) float32 411MB dask.array<chunksize=(990, 180, 288), meta=np.ndarray>
    grassFrac       (time, y, x) float32 411MB dask.array<chunksize=(838, 180, 288), meta=np.ndarray>
    nbp             (time, y, x) float32 411MB dask.array<chunksize=(793, 180, 288), meta=np.ndarray>
    npp             (time, y, x) float32 411MB dask.array<chunksize=(794, 180, 288), meta=np.ndarray>
    pastureFrac     (time, y, x) float32 411MB dask.array<chunksize=(990, 180, 288), meta=np.ndarray>
    treeFrac        (time, y, x) float32 411MB dask.array<chunksize=(878, 180, 288), meta=np.ndarray>
Attributes: (12/54)
    Conventions:                      CF-1.7 CMIP-6.0 UGRID-1.0
    activity_id:                      CMIP
    branch_method:                    standard
    branch_time_in_child:             0.0
    branch_time_in_parent:            36500.0
    comment:                          <null ref>
    ...                               ...
    intake_esm_attrs:member_id:       r1i1p1f1
    intake_esm_attrs:table_id:        Lmon
    intake_esm_attrs:grid_label:      gr1
    intake_esm_attrs:version:         20190726
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           GFDL-ESM4/historical
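
As a quick optional check (a sketch, not part of the original workflow), you can plot a single time slice of one of the lazily loaded fields; this triggers computation of that slice only:

# map of tree cover fraction in the first month of the record
ds_Lmon["treeFrac"].isel(time=0).plot()
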
# get monthly 'extension' variables

# from the full `col` object, create a subset using facet search
cat = col.search(
    source_id="GFDL-ESM4",
    variable_id="nep",
    member_id="r1i1p1f1",
    table_id="Emon",
    grid_label="gr1",
    experiment_id=["historical"],
    require_all_on=[
        "source_id"
    ],  # make sure that we only get models which have all of the above experiments
)

# convert the sub-catalog into a datatree object, by opening each dataset into an xarray.Dataset (without loading the data)
kwargs = dict(
    preprocess=combined_preprocessing,  # apply xMIP fixes to each dataset
    xarray_open_kwargs=dict(
        use_cftime=True
    ),  # ensure all datasets use the same time index
    storage_options={
        "token": "anon"
    },  # anonymous/public authentication to google cloud storage
)

cat.esmcat.aggregation_control.groupby_attrs = ["source_id", "experiment_id"]
dt_Emon_variables = cat.to_datatree(**kwargs)

# convert to dataset instead of datatree, remove extra singleton dimensions
ds_Emon = dt_Emon_variables["GFDL-ESM4"]["historical"].to_dataset().squeeze()
ds_Emon
--> The keys in the returned dictionary of datasets are constructed as follows:
	'source_id/experiment_id'
100.00% [1/1 00:01<00:00]
<xarray.Dataset> Size: 416MB
Dimensions:         (time: 1980, y: 180, x: 288, bnds: 2, vertex: 4)
Coordinates:
  * y               (y) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
  * x               (x) float64 2kB 0.625 1.875 3.125 ... 356.9 358.1 359.4
  * time            (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:...
    lon_bounds      (x, bnds, y) float64 829kB dask.array<chunksize=(288, 2, 180), meta=np.ndarray>
    lat_bounds      (y, bnds, x) float64 829kB dask.array<chunksize=(180, 2, 288), meta=np.ndarray>
    time_bounds     (time, bnds) object 32kB dask.array<chunksize=(1980, 2), meta=np.ndarray>
    lon             (x, y) float64 415kB 0.625 0.625 0.625 ... 359.4 359.4 359.4
    lat             (x, y) float64 415kB -89.5 -88.5 -87.5 ... 87.5 88.5 89.5
    lon_verticies   (vertex, x, y) float64 2MB dask.array<chunksize=(1, 288, 180), meta=np.ndarray>
    lat_verticies   (vertex, x, y) float64 2MB dask.array<chunksize=(1, 288, 180), meta=np.ndarray>
    member_id       <U8 32B 'r1i1p1f1'
    dcpp_init_year  float64 8B nan
Dimensions without coordinates: bnds, vertex
Data variables:
    nep             (time, y, x) float32 411MB dask.array<chunksize=(794, 180, 288), meta=np.ndarray>
Attributes: (12/62)
    Conventions:                      CF-1.7 CMIP-6.0 UGRID-1.0
    activity_id:                      CMIP
    branch_method:                    standard
    branch_time_in_child:             0.0
    branch_time_in_parent:            36500.0
    comment:                          <null ref>
    ...                               ...
    intake_esm_attrs:variable_id:     nep
    intake_esm_attrs:grid_label:      gr1
    intake_esm_attrs:zstore:          gs://cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-ES...
    intake_esm_attrs:version:         20190726
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           GFDL-ESM4/historical
# get atmospheric variables

# from the full `col` object, create a subset using facet search
cat = col.search(
    source_id="GFDL-ESM4",
    variable_id=["rsds", "rsus", "tas", "pr"],
    member_id="r1i1p1f1",
    table_id="Amon",
    grid_label="gr1",
    experiment_id=["historical"],
    require_all_on=[
        "source_id"
    ],  # make sure that we only get models which have all of the above experiments
)

# convert the sub-catalog into a datatree object, by opening each dataset into an xarray.Dataset (without loading the data)
kwargs = dict(
    preprocess=combined_preprocessing,  # apply xMIP fixes to each dataset
    xarray_open_kwargs=dict(
        use_cftime=True
    ),  # ensure all datasets use the same time index
    storage_options={
        "token": "anon"
    },  # anonymous/public authentication to google cloud storage
)

cat.esmcat.aggregation_control.groupby_attrs = ["source_id", "experiment_id"]
dt_Amon_variables = cat.to_datatree(**kwargs)

# convert to dataset instead of datatree, remove extra singleton dimensions
ds_Amon = dt_Amon_variables["GFDL-ESM4"]["historical"].to_dataset().squeeze()
ds_Amon
--> The keys in the returned dictionary of datasets are constructed as follows:
	'source_id/experiment_id'
100.00% [1/1 00:01<00:00]
<xarray.Dataset> Size: 2GB
Dimensions:         (time: 1980, y: 180, x: 288, bnds: 2, vertex: 4)
Coordinates: (12/13)
  * y               (y) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
  * x               (x) float64 2kB 0.625 1.875 3.125 ... 356.9 358.1 359.4
  * time            (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:...
    lon_bounds      (x, bnds, y) float64 829kB dask.array<chunksize=(288, 2, 180), meta=np.ndarray>
    lat_bounds      (y, bnds, x) float64 829kB dask.array<chunksize=(180, 2, 288), meta=np.ndarray>
    time_bounds     (time, bnds) object 32kB dask.array<chunksize=(1980, 2), meta=np.ndarray>
    ...              ...
    lat             (x, y) float64 415kB -89.5 -88.5 -87.5 ... 87.5 88.5 89.5
    lon_verticies   (vertex, x, y) float64 2MB dask.array<chunksize=(1, 288, 180), meta=np.ndarray>
    lat_verticies   (vertex, x, y) float64 2MB dask.array<chunksize=(1, 288, 180), meta=np.ndarray>
    member_id       <U8 32B 'r1i1p1f1'
    dcpp_init_year  float64 8B nan
    height          float64 8B 2.0
Dimensions without coordinates: bnds, vertex
Data variables:
    pr              (time, y, x) float32 411MB dask.array<chunksize=(600, 180, 288), meta=np.ndarray>
    rsds            (time, y, x) float32 411MB dask.array<chunksize=(358, 180, 288), meta=np.ndarray>
    rsus            (time, y, x) float32 411MB dask.array<chunksize=(345, 180, 288), meta=np.ndarray>
    tas             (time, y, x) float32 411MB dask.array<chunksize=(600, 180, 288), meta=np.ndarray>
Attributes: (12/51)
    Conventions:                      CF-1.7 CMIP-6.0 UGRID-1.0
    activity_id:                      CMIP
    branch_method:                    standard
    branch_time_in_child:             0.0
    branch_time_in_parent:            36500.0
    comment:                          <null ref>
    ...                               ...
    intake_esm_attrs:member_id:       r1i1p1f1
    intake_esm_attrs:table_id:        Amon
    intake_esm_attrs:grid_label:      gr1
    intake_esm_attrs:version:         20190726
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           GFDL-ESM4/historical
# get grid-cell area (fx: time-invariant "fixed" variables)

# from the full `col` object, create a subset using facet search
cat = col.search(
    source_id="GFDL-ESM4",
    variable_id=["areacella"],
    member_id="r1i1p1f1",
    table_id="fx",
    grid_label="gr1",
    experiment_id=["historical"],
    require_all_on=[
        "source_id"
    ],  # make sure that we only get models which have all of the above experiments
)

# convert the sub-catalog into a datatree object, by opening each dataset into an xarray.Dataset (without loading the data)
kwargs = dict(
    preprocess=combined_preprocessing,  # apply xMIP fixes to each dataset
    xarray_open_kwargs=dict(
        use_cftime=True
    ),  # ensure all datasets use the same time index
    storage_options={
        "token": "anon"
    },  # anonymous/public authentication to google cloud storage
)

cat.esmcat.aggregation_control.groupby_attrs = ["source_id", "experiment_id"]
dt_fx_variables = cat.to_datatree(**kwargs)

# convert to dataset instead of datatree, remove extra singleton dimensions
ds_fx = dt_fx_variables["GFDL-ESM4"]["historical"].to_dataset().squeeze()
ds_fx
--> The keys in the returned dictionary of datasets are constructed as follows:
	'source_id/experiment_id'
100.00% [1/1 00:00<00:00]
<xarray.Dataset> Size: 6MB
Dimensions:         (y: 180, x: 288, bnds: 2, vertex: 4)
Coordinates:
  * y               (y) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
  * x               (x) float64 2kB 0.625 1.875 3.125 ... 356.9 358.1 359.4
    lon_bounds      (x, bnds, y) float64 829kB dask.array<chunksize=(288, 2, 180), meta=np.ndarray>
    lat_bounds      (y, bnds, x) float64 829kB dask.array<chunksize=(180, 2, 288), meta=np.ndarray>
    lon             (x, y) float64 415kB 0.625 0.625 0.625 ... 359.4 359.4 359.4
    lat             (x, y) float64 415kB -89.5 -88.5 -87.5 ... 87.5 88.5 89.5
    lon_verticies   (vertex, x, y) float64 2MB dask.array<chunksize=(1, 288, 180), meta=np.ndarray>
    lat_verticies   (vertex, x, y) float64 2MB dask.array<chunksize=(1, 288, 180), meta=np.ndarray>
    member_id       <U8 32B 'r1i1p1f1'
    dcpp_init_year  float64 8B nan
Dimensions without coordinates: bnds, vertex
Data variables:
    areacella       (y, x) float32 207kB dask.array<chunksize=(180, 288), meta=np.ndarray>
Attributes: (12/61)
    Conventions:                      CF-1.7 CMIP-6.0 UGRID-1.0
    activity_id:                      CMIP
    branch_method:                    standard
    branch_time_in_child:             0.0
    branch_time_in_parent:            36500.0
    comment:                          <null ref>
    ...                               ...
    intake_esm_attrs:variable_id:     areacella
    intake_esm_attrs:grid_label:      gr1
    intake_esm_attrs:zstore:          gs://cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-ES...
    intake_esm_attrs:version:         20190726
    intake_esm_attrs:_data_format_:   zarr
    intake_esm_dataset_key:           GFDL-ESM4/historical
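
As an optional sanity check (a sketch, not part of the original workflow), the grid-cell areas should sum to roughly Earth's surface area of about 5.1e14 m^2:

# total grid area; should be close to Earth's surface area (~5.1e14 m^2)
total_area = ds_fx["areacella"].sum().compute()
print(f"Total grid area: {float(total_area):.3e} m^2")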

Since we are only using one model here, it is practical to extract the variables of interest into DataArrays and combine them into one compact dataset. In addition, we need to calculate the surface albedo. Note that you will learn more about surface albedo (and CMIP6 data) in W1D5.

# merge into single dataset. note, these are all on the 'gr1' grid.
ds = xr.Dataset()

# add land variables
for var in ds_Lmon.data_vars:
    ds[var] = ds_Lmon[var]

# add extension variables
for var in ds_Emon.data_vars:
    ds[var] = ds_Emon[var]

# add atmospheric variables
for var in ds_Amon.data_vars:
    ds[var] = ds_Amon[var]

# add grid cell area
for var in ds_fx.data_vars:
    ds[var] = ds_fx[var]

# drop unnecessary coordinates
ds = ds.drop_vars(["member_id", "dcpp_init_year", "height"])
ds
<xarray.Dataset> Size: 5GB
Dimensions:      (y: 180, x: 288, time: 1980)
Coordinates:
  * y            (y) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * x            (x) float64 2kB 0.625 1.875 3.125 4.375 ... 356.9 358.1 359.4
  * time         (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
    lon          (x, y) float64 415kB 0.625 0.625 0.625 ... 359.4 359.4 359.4
    lat          (x, y) float64 415kB -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
Data variables: (12/13)
    cropFrac     (time, y, x) float32 411MB dask.array<chunksize=(901, 180, 288), meta=np.ndarray>
    gpp          (time, y, x) float32 411MB dask.array<chunksize=(990, 180, 288), meta=np.ndarray>
    grassFrac    (time, y, x) float32 411MB dask.array<chunksize=(838, 180, 288), meta=np.ndarray>
    nbp          (time, y, x) float32 411MB dask.array<chunksize=(793, 180, 288), meta=np.ndarray>
    npp          (time, y, x) float32 411MB dask.array<chunksize=(794, 180, 288), meta=np.ndarray>
    pastureFrac  (time, y, x) float32 411MB dask.array<chunksize=(990, 180, 288), meta=np.ndarray>
    ...           ...
    nep          (time, y, x) float32 411MB dask.array<chunksize=(794, 180, 288), meta=np.ndarray>
    pr           (time, y, x) float32 411MB dask.array<chunksize=(600, 180, 288), meta=np.ndarray>
    rsds         (time, y, x) float32 411MB dask.array<chunksize=(358, 180, 288), meta=np.ndarray>
    rsus         (time, y, x) float32 411MB dask.array<chunksize=(345, 180, 288), meta=np.ndarray>
    tas          (time, y, x) float32 411MB dask.array<chunksize=(600, 180, 288), meta=np.ndarray>
    areacella    (y, x) float32 207kB dask.array<chunksize=(180, 288), meta=np.ndarray>
# surface albedo is ratio of upwelling shortwave radiation (reflected) to downwelling shortwave radiation (incoming solar radiation).
ds["surf_albedo"] = ds.rsus / ds.rsds

# add attributes
ds["surf_albedo"].attrs = {"units": "Dimensionless", "long_name": "Surface Albedo"}
ds
<xarray.Dataset> Size: 5GB
Dimensions:      (y: 180, x: 288, time: 1980)
Coordinates:
  * y            (y) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * x            (x) float64 2kB 0.625 1.875 3.125 4.375 ... 356.9 358.1 359.4
  * time         (time) object 16kB 1850-01-16 12:00:00 ... 2014-12-16 12:00:00
    lon          (x, y) float64 415kB 0.625 0.625 0.625 ... 359.4 359.4 359.4
    lat          (x, y) float64 415kB -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
Data variables: (12/14)
    cropFrac     (time, y, x) float32 411MB dask.array<chunksize=(901, 180, 288), meta=np.ndarray>
    gpp          (time, y, x) float32 411MB dask.array<chunksize=(990, 180, 288), meta=np.ndarray>
    grassFrac    (time, y, x) float32 411MB dask.array<chunksize=(838, 180, 288), meta=np.ndarray>
    nbp          (time, y, x) float32 411MB dask.array<chunksize=(793, 180, 288), meta=np.ndarray>
    npp          (time, y, x) float32 411MB dask.array<chunksize=(794, 180, 288), meta=np.ndarray>
    pastureFrac  (time, y, x) float32 411MB dask.array<chunksize=(990, 180, 288), meta=np.ndarray>
    ...           ...
    pr           (time, y, x) float32 411MB dask.array<chunksize=(600, 180, 288), meta=np.ndarray>
    rsds         (time, y, x) float32 411MB dask.array<chunksize=(358, 180, 288), meta=np.ndarray>
    rsus         (time, y, x) float32 411MB dask.array<chunksize=(345, 180, 288), meta=np.ndarray>
    tas          (time, y, x) float32 411MB dask.array<chunksize=(600, 180, 288), meta=np.ndarray>
    areacella    (y, x) float32 207kB dask.array<chunksize=(180, 288), meta=np.ndarray>
    surf_albedo  (time, y, x) float32 411MB dask.array<chunksize=(345, 180, 288), meta=np.ndarray>
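
With everything in one dataset, a typical first diagnostic is an area-weighted global mean. Below is a minimal sketch using areacella as weights; note it averages over ocean cells too, and surf_albedo can be undefined where rsds is zero (e.g., polar night):

# area-weighted global mean surface albedo time series (rough sketch)
weights = ds["areacella"].fillna(0)  # weights must not contain NaNs
global_albedo = ds["surf_albedo"].weighted(weights).mean(dim=["x", "y"])
# global_albedo.plot()  # uncomment to plot the time series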

Alternative Land Cover Approach: Global Land Surface Satellite (GLASS) Dataset#

The Global Land Surface Satellite (GLASS) datasets are primarily based on NASA's Advanced Very High Resolution Radiometer (AVHRR) long-term data record (LTDR) and Moderate Resolution Imaging Spectroradiometer (MODIS) data, in conjunction with other satellite data and ancillary information.

Currently, dozens of GLASS products have been officially released, including leaf area index, fraction of green vegetation coverage, gross primary production, broadband albedo, land surface temperature, and evapotranspiration.

Here we provide the GLASS-GLC dataset, a 34-year record (1982 to 2015) of annual global land cover dynamics at 5 km resolution. It distinguishes 7 classes: cropland, forest, grassland, shrubland, tundra, barren land, and snow/ice. Each annual global land cover map (5 km) is provided as a GeoTIFF file named in the form 'GLASS-GLC_7classes_<year>' with a WGS 84 projection. The relationship between the labels in the files and the 7 land cover classes is shown in the following table.

You can refer to this paper for a detailed description of the dataset.

# Table 2: GLASS classification system, with 7 land cover classes. From the paper https://www.earth-syst-sci-data-discuss.net/essd-2019-23
import pandas as pd
from IPython.display import display, HTML, Markdown

# Data as list of dictionaries
classification_system = [
    {"Label": 10, "Class": "Cropland", "Subclass": "Rice paddy", "Description": ""},
    {"Label": 10, "Class": "Cropland", "Subclass": "Greenhouse", "Description": ""},
    {"Label": 10, "Class": "Cropland", "Subclass": "Other farmland", "Description": ""},
    {"Label": 10, "Class": "Cropland", "Subclass": "Orchard", "Description": ""},
    {"Label": 10, "Class": "Cropland", "Subclass": "Bare farmland", "Description": ""},
    {
        "Label": 20,
        "Class": "Forest",
        "Subclass": "Broadleaf, leaf-on",
        "Description": "Tree cover≥10%; Height>5m; For mixed leaf, neither coniferous nor broadleaf types exceed 60%",
    },
    {
        "Label": 20,
        "Class": "Forest",
        "Subclass": "Broadleaf, leaf-off",
        "Description": "",
    },
    {
        "Label": 20,
        "Class": "Forest",
        "Subclass": "Needle-leaf, leaf-on",
        "Description": "",
    },
    {
        "Label": 20,
        "Class": "Forest",
        "Subclass": "Needle-leaf, leaf-off",
        "Description": "",
    },
    {
        "Label": 20,
        "Class": "Forest",
        "Subclass": "Mixed leaf type, leaf-on",
        "Description": "",
    },
    {
        "Label": 20,
        "Class": "Forest",
        "Subclass": "Mixed leaf type, leaf-off",
        "Description": "",
    },
    {
        "Label": 30,
        "Class": "Grassland",
        "Subclass": "Pasture, leaf-on",
        "Description": "Canopy cover≥20%",
    },
    {
        "Label": 30,
        "Class": "Grassland",
        "Subclass": "Natural grassland, leaf-on",
        "Description": "",
    },
    {
        "Label": 30,
        "Class": "Grassland",
        "Subclass": "Grassland, leaf-off",
        "Description": "",
    },
    {
        "Label": 40,
        "Class": "Shrubland",
        "Subclass": "Shrub cover, leaf-on",
        "Description": "Canopy cover≥20%; Height<5m",
    },
    {
        "Label": 40,
        "Class": "Shrubland",
        "Subclass": "Shrub cover, leaf-off",
        "Description": "",
    },
    {
        "Label": 70,
        "Class": "Tundra",
        "Subclass": "Shrub and brush tundra",
        "Description": "",
    },
    {
        "Label": 70,
        "Class": "Tundra",
        "Subclass": "Herbaceous tundra",
        "Description": "",
    },
    {
        "Label": 90,
        "Class": "Barren land",
        "Subclass": "Barren land",
        "Description": "Vegetation cover<10%",
    },
    {"Label": 100, "Class": "Snow/Ice", "Subclass": "Snow", "Description": ""},
    {"Label": 100, "Class": "Snow/Ice", "Subclass": "Ice", "Description": ""},
    {"Label": 0, "Class": "No data", "Subclass": "", "Description": ""},
]

df = pd.DataFrame(classification_system)
pd.set_option("display.max_colwidth", None)
html = df.to_html(index=False)
title_md = "### Table 1 GLASS classification system with 7 land cover classes. From [this paper](https://www.earth-syst-sci-data-discuss.net/essd-2019-23)."
display(Markdown(title_md))
display(HTML(html))

Table 2: GLASS classification system with 7 land cover classes. From this paper.

Label Class Subclass Description
10 Cropland Rice paddy
10 Cropland Greenhouse
10 Cropland Other farmland
10 Cropland Orchard
10 Cropland Bare farmland
20 Forest Broadleaf, leaf-on Tree cover≥10%; Height>5m; For mixed leaf, neither coniferous nor broadleaf types exceed 60%
20 Forest Broadleaf, leaf-off
20 Forest Needle-leaf, leaf-on
20 Forest Needle-leaf, leaf-off
20 Forest Mixed leaf type, leaf-on
20 Forest Mixed leaf type, leaf-off
30 Grassland Pasture, leaf-on Canopy cover≥20%
30 Grassland Natural grassland, leaf-on
30 Grassland Grassland, leaf-off
40 Shrubland Shrub cover, leaf-on Canopy cover≥20%; Height<5m
40 Shrubland Shrub cover, leaf-off
70 Tundra Shrub and brush tundra
70 Tundra Herbaceous tundra
90 Barren land Barren land Vegetation cover<10%
100 Snow/Ice Snow
100 Snow/Ice Ice
0 No data
# source of land use data: https://doi.pangaea.de/10.1594/PANGAEA.913496
# the folder "land-use" has the data for years 1982 to 2015; choose the years you need and change the path accordingly
path_LandUse = os.path.expanduser(
    "~/shared/Data/Projects/Albedo/land-use/GLASS-GLC_7classes_1982.tif"
)
# GeoTIFF files need the "rasterio" backend (provided by the rioxarray package)
ds_landuse = xr.open_dataset(path_LandUse, engine="rasterio").rename(
    {"x": "longitude", "y": "latitude"}
)
# ds_landuse.band_data[0, :, :].plot()  # how to plot the global data
ds_landuse
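
Once the GeoTIFF is loaded, the labels from the classification table can be used directly. Here is a minimal sketch of the global cropland pixel fraction, assuming the band_data layout shown in the commented plot line above:

# fraction of 5 km pixels labeled cropland (label 10) in 1982
# note: 'no data' pixels (label 0) are included in the denominator
landcover_1982 = ds_landuse["band_data"][0, :, :]
cropland_fraction = float((landcover_1982 == 10).mean())
print(f"Cropland pixel fraction, 1982: {cropland_fraction:.3f}")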

Alternative Approach: ERA5-Land Monthly Averaged Data from 1950 to Present#

ERA5-Land is a reanalysis dataset that offers an enhanced resolution compared to ERA5, providing a consistent view of land variables over several decades. It is created by replaying the land component of the ECMWF ERA5 climate reanalysis, which combines model data and global observations to generate a complete and reliable dataset using the laws of physics.

ERA5-Land focuses on the water and energy cycles at the surface level, offering a detailed record starting from 1950. The data used here are a post-processed subset of the complete ERA5-Land dataset. Monthly-mean averages have been pre-calculated to facilitate quick and convenient access, particularly for applications that do not require sub-monthly fields. The native spatial resolution of the ERA5-Land reanalysis is 9 km on a reduced Gaussian grid (TCo1279). The data in the Copernicus Climate Data Store (CDS) have been regridded to a regular latitude-longitude grid of 0.1 x 0.1 degrees.

Calculating Albedo Using ERA5-Land#

The ERA5 parameter "forecast albedo" (fal) is a measure of the reflectivity of the Earth's surface. It is the fraction of solar (shortwave) radiation reflected by the surface, across the solar spectrum, for both direct and diffuse radiation. Values range between 0 and 1. Typically, snow and ice are highly reflective, with albedo values of 0.8 and above; land has intermediate values between about 0.1 and 0.4; and the ocean has low values of 0.1 or less. Radiation from the Sun (solar, or shortwave, radiation) is partly reflected back to space by clouds and particles in the atmosphere (aerosols), and some of it is absorbed. The rest is incident on the Earth's surface, where some of it is reflected. The portion that is reflected by the Earth's surface depends on the albedo. In the ECMWF Integrated Forecasting System (IFS), a climatological background albedo (observed values averaged over several years) is used, modified by the model over water, ice, and snow. Albedo is often shown as a percentage (%).

# link for albedo data:
albedo_path = "~/shared/Data/Projects/Albedo/ERA/albedo-001.nc"
ds_albedo = xr.open_dataset(albedo_path)
ds_albedo  # note the official variable name is fal (forecast albedo)
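
Once the file is available (e.g., on the course JupyterHub), a simple starting point is the mean seasonal cycle. A minimal sketch, assuming the dataset loads with a 'time' coordinate:

# monthly climatology of forecast albedo
albedo_climatology = ds_albedo["fal"].groupby("time.month").mean("time")
# albedo_climatology.sel(month=7).plot()  # e.g., map of mean July albedo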

For your convenience, the ERA5 precipitation and temperature datasets for the same period as the albedo dataset are included below.

precp_path = "~/shared/Data/Projects/Albedo/ERA/precipitation-002.nc"
ds_precp = xr.open_dataset(precp_path)
ds_precp  # the variable name is tp (total precipitation)

tempr_path = "~/shared/Data/Projects/Albedo/ERA/Temperature-003.nc"
ds_tempr = xr.open_dataset(tempr_path)
ds_tempr  # the variable name is t2m (2 m air temperature)
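
As a small worked example of the units involved (assuming the standard ERA5 conventions: t2m in kelvin, tp in metres of water):

# convert 2 m temperature from kelvin to degrees Celsius
t2m_celsius = ds_tempr["t2m"] - 273.15
# convert total precipitation from metres to millimetres
tp_mm = ds_precp["tp"] * 1000.0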

Further Reading#

Resources#

This tutorial uses data from the simulations conducted as part of the CMIP6 multi-model ensemble.

For examples of how to access and analyze CMIP6 data, please visit the Pangeo Cloud CMIP6 Gallery.

For more information on what CMIP is and how to access the data, please see this page.