Open In Colab   Open in Kaggle

Tutorial 6: Spectral Analysis of Paleoclimate Data#

Week 1, Day 4, Paleoclimate

Content creators: Sloane Garelick

Content reviewers: Yosmely Bermúdez, Dionessa Biton, Katrina Dobson, Maria Gonzalez, Will Gregory, Nahid Hasan, Paul Heubel, Sherry Mi, Beatriz Cosenza Muralles, Brodie Pearson, Jenna Pearson, Chi Zhang, Ohad Zivan

Content editors: Yosmely Bermúdez, Paul Heubel, Zahra Khodakaramimaghsoud, Jenna Pearson, Agustina Pesce, Chi Zhang, Ohad Zivan

Production editors: Wesley Banfield, Paul Heubel, Jenna Pearson, Konstantine Tsafatinos, Chi Zhang, Ohad Zivan

Our 2024 Sponsors: CMIP, NFDI4Earth

Tutorial Objectives#

Estimated timing of tutorial: 25 minutes

In this tutorial, you will manipulate paleoclimate proxy datasets using previously learned computational tools, and apply spectral analysis to further interpret the data.

By the end of this tutorial you will be able to:

  • Interpret the LR04 benthic δ18O curve and how it records temperature

  • Perform various spectral analysis methods to assess dominant spectral powers in the LR04 data

  • Interpret variations in glacial-interglacial cycles recorded by the LR04 δ18O record

Setup#

# installations ( uncomment and run this cell ONLY when using google colab or kaggle )

# !pip install pyleoclim
# imports
import pooch
import os
import tempfile
import pandas as pd
import matplotlib.pyplot as plt

import pyleoclim as pyleo

Install and import feedback gadget#

Hide code cell source
# @title Install and import feedback gadget

!pip3 install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "comptools_4clim",
            "user_key": "l5jpxuee",
        },
    ).render()


feedback_prefix = "W1D4_T6"
[notice] A new release of pip is available: 24.2 -> 24.3.1
[notice] To update, run: pip install --upgrade pip

Figure Settings#

Hide code cell source
# @title Figure Settings
import ipywidgets as widgets  # interactive display

%config InlineBackend.figure_format = 'retina'
plt.style.use(
    "https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle"
)

Helper functions#

Hide code cell source
# @title Helper functions

def pooch_load(filelocation=None, filename=None, processor=None):
    shared_location = "/home/jovyan/shared/Data/tutorials/W1D4_Paleoclimate"  # this is different for each day
    user_temp_cache = tempfile.gettempdir()

    if os.path.exists(os.path.join(shared_location, filename)):
        file = os.path.join(shared_location, filename)
    else:
        file = pooch.retrieve(
            filelocation,
            known_hash=None,
            fname=os.path.join(user_temp_cache, filename),
            processor=processor,
        )

    return file

Video 1: Spectral Analysis#

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Spectral_Analysis_Video")
If you want to download the slides: https://osf.io/download/g7rp4/

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Spectral_Analysis_Slides")

Section 1: Plotting the LR04 δ18O benthic stack#

We will be analyzing a δ18O data from Lisiecki, L. E., and Raymo, M. E. (2005), a stack of 57 globally distributed records of δ18O from benthic foraminifera that spans the past ~5 million years. As we learned from the introductory video, δ18O of foraminifera records temperature due to differences in the isotopic composition of the ocean during glacial and interglacial periods. The δ18O of the ocean (and forams) is more depleted (takes on a smaller values) during interglacials and more enriched (takes on a larger value) during glacials.

Let’s start by importing the data:

# Donwload the data
filename_LR04 = "LR04.csv"
url_LR04 = (
    "https://raw.githubusercontent.com/LinkedEarth/PyleoTutorials/main/data/LR04.csv"
)
lr04_data = pd.read_csv(
    pooch_load(filelocation=url_LR04, filename=filename_LR04), skiprows=4
)
lr04_data.head()
Time (ka) Benthic d18O (per mil) Standard error (per mil)
0 0.0 3.23 0.03
1 1.0 3.23 0.04
2 2.0 3.18 0.03
3 3.0 3.29 0.03
4 4.0 3.30 0.03

We can now create a Series object containing the data:

ts_lr04 = pyleo.Series(
    time=lr04_data["Time (ka)"],
    value=lr04_data["Benthic d18O (per mil)  "],
    time_name="Age",
    time_unit="kyr BP",
    value_name="Benthic d18O (per mil)",
    value_unit="\u2030",
    label="LR04",
)
Time axis values sorted in ascending order
/tmp/ipykernel_118771/827671795.py:1: UserWarning: auto_time_params is not specified. Currently default behavior sets this to True, which might modify your supplied time metadata.  Please set to False if you want a different behavior.
  ts_lr04 = pyleo.Series(

This record spans the past ~5 million years (5,000 kyr), but we’re going to focus in on the past 1 million years (1,000 kyr), so let’s create a time slice of just this time period, and plot the time series.

lr04_slice = ts_lr04.slice([0, 1000])
fig, ax = plt.subplots()  # assign a new plot axis
lr04_slice.plot(ax=ax, legend=False, invert_yaxis=True)
<Axes: xlabel='Age [kyr BP]', ylabel='Benthic d18O (per mil) [‰]'>
../../../_images/8e27a20986f2b56dbc4774df77eecd8b27fbdc3442a3b31c5c0afc16a87bad07.png

Questions 1#

  1. What patterns do you observe in the record?

  2. Does the amplitude of the glacial-interglacial cycles vary over time?

  3. At what frequency do these glacial-interglacial cycles occur?

Click for solution

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Questions_1")

Section 2: Spectral analysis of the LRO4 δ18O benthic stack#

To better assess the dominant temporal patterns in this record, you can use spectral analysis. As you learned in the introductory video, spectral analysis is a useful data analysis tool in paleoclimate because it allows us to discover underlying periodicities in time series data and can help establish quantitative relationships between forcings and climate variations.

Let’s explore various spectral analysis methods and apply them to the LR04 record to interpret changes in the frequency of glacial-interglacial cycles.

Section 2.1: Spectral Density#

Pyleoclim enables five methods to estimate the spectral density, which will be our main interest in this tutorial:

  • Basic Periodogram, which uses a Fourier transform. The method has various windowing available to reduce variance.

  • Welch’s periodogram, a variant of the basic periodogram, which uses Welch’s method of overlapping segments. The periodogram is computed on each segment and averaged together.

  • Multi-taper method (MTM), which attempts to reduce the variance of spectral estimates by using a small set of tapers rather than the unique data taper or spectral window.

  • Lomb-Scargle periodogram, an inverse approach designed for unevenly-spaced datasets. Several windows are available and Welch’s segmentation can also be used with this method.

  • Weighted wavelet Z-transform (WWZ), a wavelet-based method also made for unevenly-spaced datasets.

All of these methods are available through Series.spectral() by changing the method argument. In this tutorial, we’ll focus on Lomb-Scargle and Weighted wavelet Z-transform in more detail. For additional information on the other methods, refer to this notebook from Pyleoclim.

To get a sense of how the estimates of the spectral density using each of these methods compare, we can plot them on the same graph below. Note we must first interpolate to an even grid and then standardize before estimating the spectral density.

fig, ax = plt.subplots()

# basic periodogram
lr04_slice.interp(step=0.5).standardize().spectral(method="periodogram").plot(
    ax=ax, xlim=[1000, 5], ylim=[0.001, 1000], label="periodogram"
)
# Welch's periodogram
lr04_slice.interp(step=0.5).standardize().spectral(method="welch").plot(
    ax=ax, xlim=[1000, 5], ylim=[0.001, 1000], label="welch"
)
# Multi-taper Method
lr04_slice.interp(step=0.5).standardize().spectral(method="mtm").plot(
    ax=ax, xlim=[1000, 5], ylim=[0.001, 1000], label="mtm"
)
# Lomb-Scargle periodogram
lr04_slice.interp(step=0.5).standardize().spectral(method="lomb_scargle").plot(
    ax=ax, xlim=[1000, 5], ylim=[0.001, 1000], label="lomb_scargle"
)
# weighted wavelet Z-transform (WWZ)
lr04_slice.interp(step=0.5).standardize().spectral(method="wwz").plot(
    ax=ax, xlim=[1000, 5], ylim=[0.001, 1000], label="wwz"
)
<Axes: xlabel='Period [kyr]', ylabel='PSD'>
../../../_images/dd636b7ee55b55dd3c674eecc3b3a4921d102cbf33363f4c96700d70239d91d1.png

Which method should you choose? The Lomb-Scargle periodogram is the default method in Pyleoclim, and is a good choice for many applications because it:

  1. works with unevenly-spaced time series which are often encountered in paleoclimate data,

  2. seems to give consistent results over parameter ranges,

  3. doesn’t require interpolation, limiting time series manipulation, and

  4. is fairly fast to compute.

We will choose this estimate to analyze.

# Lomb-Scargle periodogram
lr04_ls_sd = lr04_slice.interp(step=0.5).standardize().spectral(method="lomb_scargle")

lr04_ls_sd.plot(xlim=[1000, 5], ylim=[0.001, 1000], label="lomb_scargle")
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='Period [kyr]', ylabel='PSD'>)
../../../_images/4e29895f10e30a8db5fa4f6c69e3e111050398039fedbed1ba2b257083b7d22f.png

Section 2.2: Significance Testing#

Note that there are few peaks that seem more dominant than others, but which of these peaks are significant? That is, do they stand out more than peaks would from a random time series? To figure that out, we can use the method signif_test. Pyleoclim generates many surrogates of the time series using an AR(1) (Autoregressive Process of order 1 - i.e. a random process). These surrogates serve to identify the probability of a peak not just being from random noise and the likelihood that the peak of the same magnitude would be unlikely to be present in a random dataset. The default number of surrogates generated is 200, and the larger this value, the more smooth and precise our results are.

lr04_ls_sd_sig = lr04_ls_sd.signif_test()
Performing spectral analysis on individual series:   0%|          | 0/200 [00:00<?, ?it/s]
Performing spectral analysis on individual series:   4%|▎         | 7/200 [00:00<00:03, 63.75it/s]
Performing spectral analysis on individual series:   7%|▋         | 14/200 [00:00<00:02, 63.47it/s]
Performing spectral analysis on individual series:  10%|█         | 21/200 [00:00<00:02, 63.48it/s]
Performing spectral analysis on individual series:  14%|█▍        | 28/200 [00:00<00:02, 63.38it/s]
Performing spectral analysis on individual series:  18%|█▊        | 35/200 [00:00<00:02, 63.47it/s]
Performing spectral analysis on individual series:  21%|██        | 42/200 [00:00<00:02, 63.44it/s]
Performing spectral analysis on individual series:  24%|██▍       | 49/200 [00:00<00:02, 63.48it/s]
Performing spectral analysis on individual series:  28%|██▊       | 56/200 [00:00<00:02, 63.49it/s]
Performing spectral analysis on individual series:  32%|███▏      | 63/200 [00:00<00:02, 63.51it/s]
Performing spectral analysis on individual series:  35%|███▌      | 70/200 [00:01<00:02, 63.51it/s]
Performing spectral analysis on individual series:  38%|███▊      | 77/200 [00:01<00:01, 63.54it/s]
Performing spectral analysis on individual series:  42%|████▏     | 84/200 [00:01<00:01, 63.58it/s]
Performing spectral analysis on individual series:  46%|████▌     | 91/200 [00:01<00:01, 63.52it/s]
Performing spectral analysis on individual series:  49%|████▉     | 98/200 [00:01<00:01, 63.54it/s]
Performing spectral analysis on individual series:  52%|█████▎    | 105/200 [00:01<00:01, 63.53it/s]
Performing spectral analysis on individual series:  56%|█████▌    | 112/200 [00:01<00:01, 63.58it/s]
Performing spectral analysis on individual series:  60%|█████▉    | 119/200 [00:01<00:01, 63.53it/s]
Performing spectral analysis on individual series:  63%|██████▎   | 126/200 [00:01<00:01, 63.54it/s]
Performing spectral analysis on individual series:  66%|██████▋   | 133/200 [00:02<00:01, 63.55it/s]
Performing spectral analysis on individual series:  70%|███████   | 140/200 [00:02<00:00, 63.50it/s]
Performing spectral analysis on individual series:  74%|███████▎  | 147/200 [00:02<00:00, 63.54it/s]
Performing spectral analysis on individual series:  77%|███████▋  | 154/200 [00:02<00:00, 63.48it/s]
Performing spectral analysis on individual series:  80%|████████  | 161/200 [00:02<00:00, 63.52it/s]
Performing spectral analysis on individual series:  84%|████████▍ | 168/200 [00:02<00:00, 63.31it/s]
Performing spectral analysis on individual series:  88%|████████▊ | 175/200 [00:02<00:00, 63.38it/s]
Performing spectral analysis on individual series:  91%|█████████ | 182/200 [00:02<00:00, 63.43it/s]
Performing spectral analysis on individual series:  94%|█████████▍| 189/200 [00:02<00:00, 63.48it/s]
Performing spectral analysis on individual series:  98%|█████████▊| 196/200 [00:03<00:00, 63.50it/s]
Performing spectral analysis on individual series: 100%|██████████| 200/200 [00:03<00:00, 63.49it/s]

lr04_ls_sd_sig.plot(xlabel="Period [kyrs]")
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='Period [kyrs]', ylabel='PSD'>)
../../../_images/affbca861f47f130f08666f5a11b649f477c08f17a8833fb782cb4b612c3f805.png

The variable lr04_ls_sd_sig PSD object contains the significant frequencies of the significant peaks from the spectral density of the LR04 data. We can now create the same plot of spectral power, but with only the significant peaks.

NOTE: For this figure, we will plot the x-axis a bit differently. In the previous plot, the x-axis was the “period” but this time the x-axis will be “frequency”, which is the inverse of the period.

fig, ax = plt.subplots()
ax.plot(lr04_ls_sd_sig.frequency, lr04_ls_sd_sig.amplitude)
ax.set_xlim([0.005, 0.07])
ax.set_ylim([0, 200])
ax.set_xlabel("Frequency [1/kyr]")
ax.set_ylabel("Spectra Amplitude")
Text(0, 0.5, 'Spectra Amplitude')
../../../_images/fb5050f86940609fb4b6e5674a20c98f8ff0c6ec34a5c3fd441daef61a6b4d2e.png

The largest spectral powers in the data occur at a frequencies of ~0.01 and ~0.025, as well as a smaller peak at ~0.042, which correspond to 100 kyr, 40 kyr, and 23 kyr, respectively. Do those periodicities sound familiar?

These are the cycles of eccentricity, obliquity and precession. The presence of spectral powers in the LR04 data at the eccentricity, obliquity and precession frequencies highlights the influence of orbital forcings on glacial-interglacial cycles.

Section 2.3: Wavelet Analysis#

Another related tool we can use to learn more about the climate variability recorded in the data is wavelet analysis. This method allows us to “unfold” a spectrum and look at its evolution over time. In other words, wavelet analysis can help us determine changes in the spectral power over time. For additional details about wavelet analysis, refer to this guide.

There are several ways to access this functionality in Pyleoclim, but here we use summary_plot, which stacks together the timeseries itself in the upper panel, its scalogram (a plot of the magnitude of the wavelet power) in the middle, and the power spectral density (PSD, which we plotted earlier in this tutorial) obtained from summing the wavelet coefficients over time to the right.

# create a scalogram
scal = lr04_slice.interp(step=0.5).standardize().wavelet()
# make plot
lr04_slice.summary_plot(
    psd=lr04_ls_sd_sig, scalogram=scal, time_lim=[0, 1000], psd_label="Amplitude"
)
(<Figure size 800x1000 with 4 Axes>,
 {'ts': <Axes: xlabel='Age [kyr BP]', ylabel='Benthic d18O (per mil) [‰]'>,
  'scal': <Axes: xlabel='Age [kyr BP]', ylabel='Scale [kyrs]'>,
  'psd': <Axes: xlabel='Amplitude'>,
  'cb': <Axes: xlabel='cwt Amplitude'>})
../../../_images/80c68b47e259471ed0d06d60fbf065bb43e48fb138015faf2ca0e93df1a7ac1f.png

In this wavelet spectrum, the age of the record is on the x-axis, the period is on the y-axis, and the color represents the amplitude of that power spectrum, with yellow indicating a higher power and purple indicating a lower power. The time series on top is the original LR04 δ18O data, and the plot on the right is the spectral analysis and significance test figure we created earlier in this tutorial.

Questions 2.3: Climate Connection#

  1. In the spectral analysis above, the 100 kyr and 40 kyr period are the most dominant. Here, we further see that over the past 1 million years, the 100 kyr cycle is the strongest (as seen by the yellow color at the 100 kyr scale), followed by the 40 kyr cycle (as seen by the light purple color at the 40 kyr scale). You may notice an absence of much color at the 23 kyr scale. What does this suggest about the influence of precession on glacial-interglacial cycles on this timescale?

Click for solution

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_Questions_2_3")

Summary#

In this tutorial you learned how to analyze paleoclimate proxy datasets using spectral and wavelet analysis. You were introduced to several spectral analysis methods, identified significant spectral peaks in the spectral density, and looked at the wavelet transform, it’s evolution over time. By the end, you were able to determine the influence of various periods, like the 100 kyr and 40 kyr orbital cycles, on glacial-interglacial cycles.

Resources#

Code for this tutorial is based on existing notebooks from LinkedEarth for spectral analysis and wavelet analysis.

Data from the following sources are used in this tutorial:

  • Lisiecki, L. E., and Raymo, M. E. (2005), A Pliocene-Pleistocene stack of 57 globally distributed benthic δ18O records, Paleoceanography, 20, PA1003, https://doi.org/10.1029/2004PA001071.