{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/climate-course-content/blob/main/tutorials/W2D3_ExtremesandVariability/student/W2D3_Tutorial3.ipynb)   <a href=\"https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/climate-course-content/main/tutorials/W2D3_ExtremesandVariability/student/W2D3_Tutorial3.ipynb\" target=\"_blank\"><img alt=\"Open in Kaggle\" src=\"https://kaggle.com/static/images/open-in-kaggle.svg\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Tutorial 3: Extreme Value Analysis - the GEV Distribution\n",
    "\n",
    "**Week 2, Day 3, Extremes & Variability**\n",
    "\n",
    "**Content creators:** Matthias Aengenheyster, Joeri Reinders\n",
    "\n",
    "**Content reviewers:** Younkap Nina Duplex, Sloane Garelick, Paul Heubel, Zahra Khodakaramimaghsoud, Peter Ohue, Laura Paccini, Jenna Pearson, Agustina Pesce, Derick Temfack, Peizhen Yang, Cheng Zhang, Chi Zhang, Ohad Zivan\n",
    "\n",
    "**Content editors:** Paul Heubel, Jenna Pearson, Konstantine Tsafatinos, Chi Zhang, Ohad Zivan\n",
    "\n",
    "**Production editors:** Wesley Banfield, Paul Heubel, Jenna Pearson, Chi Zhang, Ohad Zivan\n",
    "\n",
    "**Our 2024 Sponsors:** CMIP, NFDI4Earth"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Tutorial Objectives"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "*Estimated timing of tutorial:* 30 minutes\n",
    "\n",
    "In the previous tutorial, you used the empirical method to determine return values. Another approach involves computing the probability of exceeding a certain threshold using a probability density function (PDF) fitted to the data. For instance, in Tutorial 1, we applied the normal distribution PDF. However, the normal distribution did not adequately fit our precipitation data. Here, we will explore fitting a distribution, i.e. the **Generalized Extreme Value (GEV)** distribution, that is typically more suitable for data with skewed histograms.\n",
    "\n",
    "By the end of this tutorial, you will have gained the following skills:\n",
    "\n",
    "- Creating a quantile-quantile (QQ) plot to assess the goodness-of-fit between a distribution and the data.\n",
    "- Fitting a Generalized Extreme Value (GEV) distribution to the data.\n",
    "- Understanding how the parameters of the GEV distribution influence its behavior."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 2336,
     "status": "ok",
     "timestamp": 1681923364409,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# imports\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import pandas as pd\n",
    "from scipy import stats\n",
    "from scipy.stats import genextreme as gev\n",
    "import os\n",
    "import pooch\n",
    "import tempfile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Install and import feedback gadget\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Install and import feedback gadget\n",
    "\n",
    "!pip3 install vibecheck datatops --quiet\n",
    "\n",
    "from vibecheck import DatatopsContentReviewContainer\n",
    "def content_review(notebook_section: str):\n",
    "    return DatatopsContentReviewContainer(\n",
    "        \"\",  # No text prompt\n",
    "        notebook_section,\n",
    "        {\n",
    "            \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
    "            \"name\": \"comptools_4clim\",\n",
    "            \"user_key\": \"l5jpxuee\",\n",
    "        },\n",
    "    ).render()\n",
    "\n",
    "\n",
    "feedback_prefix = \"W2D3_T3\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Figure Settings\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Figure Settings\n",
    "import ipywidgets as widgets  # interactive display\n",
    "\n",
    "%config InlineBackend.figure_format = 'retina'\n",
    "plt.style.use(\n",
    "    \"https://raw.githubusercontent.com/neuromatch/climate-course-content/main/cma.mplstyle\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Helper functions\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Helper functions\n",
    "\n",
    "\n",
    "def pooch_load(filelocation=None, filename=None, processor=None):\n",
    "    shared_location = \"/home/jovyan/shared/Data/tutorials/W2D3_ExtremesandVariability\"  # this is different for each day\n",
    "    user_temp_cache = tempfile.gettempdir()\n",
    "\n",
    "    if os.path.exists(os.path.join(shared_location, filename)):\n",
    "        file = os.path.join(shared_location, filename)\n",
    "    else:\n",
    "        file = pooch.retrieve(\n",
    "            filelocation,\n",
    "            known_hash=None,\n",
    "            fname=os.path.join(user_temp_cache, filename),\n",
    "            processor=processor,\n",
    "        )\n",
    "\n",
    "    return file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Video 1: Extreme Value Analysis\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Video 1: Extreme Value Analysis\n",
    "\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import YouTubeVideo\n",
    "from IPython.display import IFrame\n",
    "from IPython.display import display\n",
    "\n",
    "\n",
    "class PlayVideo(IFrame):\n",
    "  def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
    "    self.id = id\n",
    "    if source == 'Bilibili':\n",
    "      src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
    "    elif source == 'Osf':\n",
    "      src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
    "    super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
    "\n",
    "\n",
    "def display_videos(video_ids, W=400, H=300, fs=1):\n",
    "  tab_contents = []\n",
    "  for i, video_id in enumerate(video_ids):\n",
    "    out = widgets.Output()\n",
    "    with out:\n",
    "      if video_ids[i][0] == 'Youtube':\n",
    "        video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
    "                             height=H, fs=fs, rel=0)\n",
    "        print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
    "      else:\n",
    "        video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
    "                          height=H, fs=fs, autoplay=False)\n",
    "        if video_ids[i][0] == 'Bilibili':\n",
    "          print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
    "        elif video_ids[i][0] == 'Osf':\n",
    "          print(f'Video available at https://osf.io/{video.id}')\n",
    "      display(video)\n",
    "    tab_contents.append(out)\n",
    "  return tab_contents\n",
    "\n",
    "\n",
    "video_ids = [('Youtube', 'Vblk--ifjsc'), ('Bilibili', 'BV1WM4y1x7aS')]\n",
    "tab_contents = display_videos(video_ids, W=730, H=410)\n",
    "tabs = widgets.Tab()\n",
    "tabs.children = tab_contents\n",
    "for i in range(len(tab_contents)):\n",
    "  tabs.set_title(i, video_ids[i][0])\n",
    "display(tabs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Extreme_Value_Analysis_Video\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "pycharm": {
     "name": "#%%\n"
    },
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown\n",
    "from ipywidgets import widgets\n",
    "from IPython.display import IFrame\n",
    "\n",
    "link_id = \"3bv76\"\n",
    "\n",
    "print(f\"If you want to download the slides: https://osf.io/download/{link_id}/\")\n",
    "IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\", width=854, height=480)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Extreme_Value_Analysis_Slides\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Section 1: Precipitation Data Histogram and Normal Distribution\n",
    "Let's repeat our first steps from Tutorials 1 and 2: \n",
    "1) Open the annual maximum daily precipitation record from Germany.\n",
    "2) Create a histogram of the data and plot the normal distribution PDF."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 331,
     "status": "ok",
     "timestamp": 1681923455336,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# download file: 'precipitationGermany_1920-2022.csv'\n",
    "filename_precipitationGermany = \"precipitationGermany_1920-2022.csv\"\n",
    "url_precipitationGermany = \"https://osf.io/xs7h6/download\"\n",
    "data = pd.read_csv(\n",
    "    pooch_load(url_precipitationGermany, filename_precipitationGermany), index_col=0\n",
    ").set_index(\"years\")\n",
    "data.columns = [\"precipitation\"]\n",
    "precipitation = data.precipitation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 755,
     "status": "ok",
     "timestamp": 1681923459971,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots()\n",
    "x_r100 = np.arange(0, 100, 1)\n",
    "bins = np.arange(0, precipitation.max(), 2)\n",
    "\n",
    "# make histogram\n",
    "sns.histplot(precipitation, bins=bins, stat=\"density\", ax=ax)\n",
    "\n",
    "# plot PDF\n",
    "ax.plot(\n",
    "    x_r100,\n",
    "    stats.norm.pdf(x_r100, precipitation.mean(), precipitation.std()),\n",
    "    c=\"k\",\n",
    "    lw=3,\n",
    ")\n",
    "\n",
    "ax.set_xlim(bins[0], bins[-1])\n",
    "ylim = ax.get_ylim()\n",
    "ax.set_xlabel(\"Annual Maximum Daily Precipitation \\n(mm/day)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "<details>\n",
    "<summary> <font color='orangered'>Click here for a description of the plot  </font></summary>\n",
    "Histogram plot of the annual maximum daily precipitation in millimeters per day. The horizontal axis of the histogram represents the data's range via bins, while the vertical axis represents the frequency of occurrences within each interval by counting the number of data points that fall into each bin. By visually inspecting the histogram we get an impression of the shape, central tendency, and spread of the dataset. The bin with the largest amount of occurrences shows 12 counts of annual maximum daily precipitation between 14 and 16 millimeters per day. Bins that show at least one count range from 14 millimeters per day to 60 millimeters per day. Furthermore, the normal distribution is overlaid in black. \n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Next, we generate a [quantile-quantile (QQ) plot](https://en.wikipedia.org/wiki/Q–Q_plot) to assess how well our data aligns with a normal distribution. A **QQ plot** compares the *actual* [percentiles](https://en.wikipedia.org/wiki/Percentile) of the data to the *expected* percentiles based on a specific distribution, such as a normal distribution in this case. The percentiles are the points in your data below which a certain proportion of your data falls. Here we will be using [`norm.ppf()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html#scipy-stats-norm) where 'ppf' stands for percent point function and returns the quantiles of the normal distribution function (i.e. percentiles). On our precipitation data, we will be using [`numpy.quantile()`](https://numpy.org/doc/stable/reference/generated/numpy.quantile.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 245,
     "status": "ok",
     "timestamp": 1681923461802,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "x_r1 = np.linspace(0, 1, 100)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "\n",
    "ax.plot(stats.norm.ppf(x_r1, precipitation.mean(),\n",
    "                       precipitation.std()),\n",
    "        # quantiles of a normal distribution with the mean and std of our precip data\n",
    "        np.quantile(precipitation, x_r1),  # quantiles of our precip data\n",
    "        \"o\",\n",
    "        )\n",
    "ax.plot(x_r100, x_r100, \"k\")\n",
    "\n",
    "ax.set_xlim(10, 72)\n",
    "ax.set_ylim(10, 72)\n",
    "\n",
    "ax.set_xlabel(\"Normal Quantiles\")\n",
    "ax.set_ylabel(\"Sample Quantiles\")\n",
    "\n",
    "ax.grid(True)\n",
    "ax.set_aspect(\"equal\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "<details>\n",
    "<summary> <font color='orangered'>Click here for a description of the plot  </font></summary>\n",
    "A quantile-quantile (QQ) plot that compares the actual percentiles of the data with the expected percentiles of the normal distribution. As lower and upper quantiles differ strongly from the identity line, the normal distribution does not represent the data well in terms of extremes.\n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "If the fit was perfect, the quantile comparison would fall on the identity line (1:1) plotted in black. The stronger the deviations from the identity line, the worse this particular PDF fits our data. Hopefully, you concur that the current fit could be improved, particularly when it comes to the extreme values (lower and upper quantiles), which appear to be over- and underestimated by the normal distribution model. Therefore, let's explore alternative options, as there are numerous other distributions available. One example is the [Generalized Extreme Value (GEV) distribution](https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution).\n",
    "\n",
    "The normal distribution is completely defined by two parameters: its mean and standard deviation. \n",
    "In contrast, the GEV distribution is defined by three parameters: the location, scale, and shape parameters. When the mean of the normal distribution is increased, it shifts the distribution towards higher values, while increasing the standard deviation makes the distribution wider. The normal distribution is symmetrical to its mean as it lacks a parameter that influences its skewness: There is always exactly half of the distribution to the right and left of its mean. This can pose challenges in certain scenarios.\n",
    "\n",
    "In the GEV distribution, the location and scale parameters behave similarly to the mean and standard deviation in the normal distribution. The shape parameter impacts the tails of the distribution, making them thinner or thicker. As extreme event distributions often exhibit thick tails, they tend to possess slight skewness. Adjusting the shape parameter, therefore, influences the skewness (and kurtosis) of the data.\n",
    "\n",
    "To estimate the parameters of the GEV distribution, we utilize the [`stats.genextreme()`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.genextreme.html#scipy-stats-genextreme) function from the `scipy` package. Here we call this function `gev`. The GEV distribution's three parameters (location, scale, and shape) can be estimated from data by calling `gev.fit()`. Note the second argument to this function given below is optional, it is the starting guess for the shape parameter. It sometimes makes sense to set this to zero as otherwise the fitting algorithm may be unstable and return incorrect values (hint: always check if your results are sensible!)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 170,
     "status": "ok",
     "timestamp": 1681923489030,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "shape, loc, scale = gev.fit(precipitation.values,0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "shape, loc, scale"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Let's generate another histogram of the data and overlay the GEV distribution with the fitted parameters. It's important to note that there are two sign conventions for the shape parameter. You may encounter an application that has it defined the other way around - check the documentation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "tags": []
   },
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots()\n",
    "# make histogram\n",
    "sns.histplot(precipitation, bins=bins, stat=\"density\", ax=ax)\n",
    "ax.set_xlim(bins[0], bins[-1])\n",
    "\n",
    "# add GEV PDF\n",
    "x_r80 = np.arange(80)\n",
    "\n",
    "ax.plot(x_r80, gev.pdf(x_r80, shape, loc=loc, scale=scale), \"k\", lw=3)\n",
    "ax.set_xlabel(\"Annual Maximum Daily Precipitation \\n(mm/day)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "<details>\n",
    "<summary> <font color='orangered'>Click here for a description of the plot  </font></summary>\n",
    "Histogram plot of the annual maximum daily precipitation in millimeters per day as above. Instead of the normal distribution the fitted generalized extreme value (GEV) distribution is overlaid in black. Its skewness and the thicker tails improve the data representation. \n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "And also create a QQ-plot:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 480,
     "status": "ok",
     "timestamp": 1681923514940,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "x_r1 = np.linspace(0, 1, 100)\n",
    "x_r100 = np.linspace(0, 100, 100)\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.plot(\n",
    "    gev.ppf(x_r1, shape, loc=loc, scale=scale),\n",
    "    # quantiles of GEV distribution using the parameter estimates from our data\n",
    "    np.quantile(precipitation, x_r1),\n",
    "    \"o\",\n",
    ")\n",
    "\n",
    "# actual quantiles of our data\n",
    "ax.plot(x_r100, x_r100, \"k\")\n",
    "\n",
    "# aesthetics\n",
    "ax.set_xlim(10, 72)\n",
    "ax.set_ylim(10, 72)\n",
    "\n",
    "ax.set_xlabel(\"GEV Quantiles\")\n",
    "ax.set_ylabel(\"Sample Quantiles\")\n",
    "\n",
    "ax.grid(True)\n",
    "ax.set_aspect(\"equal\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "<details>\n",
    "<summary> <font color='orangered'>Click here for a description of the plot  </font></summary>\n",
    "A quantile-quantile (QQ) plot that compares the actual percentiles of the data with the expected percentiles of the GEV distribution. As lower and upper quantiles differ less from the identity line, the GEV distribution represents the data much better in terms of extremes compared to the normal distribution. \n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "This looks much better! As we expected, the GEV distribution is a better fit for the data than the normal distribution given the skewness of the data we observed in Tutorial 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Now, we will overlay both PDFs (normal and GEV) on a single plot to visualize and compare the differences between them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 671,
     "status": "ok",
     "timestamp": 1681923521300,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots()\n",
    "\n",
    "sns.histplot(precipitation, bins=bins, stat=\"density\", ax=ax, alpha=0.5, lw=0)\n",
    "\n",
    "# normal distribution\n",
    "ax.plot(\n",
    "    x_r100,\n",
    "    stats.norm.pdf(x_r100, precipitation.mean(), precipitation.std()),\n",
    "    c=\"C3\",\n",
    "    lw=3,\n",
    "    label=\"Normal\",\n",
    ")\n",
    "# GEV distribution\n",
    "ax.plot(x_r100, gev.pdf(x_r100, shape, loc=loc, scale=scale), c=\"k\", lw=3, label=\"GEV\")\n",
    "ax.legend()\n",
    "ax.set_xlabel(\"Annual Maximum Daily Precipitation \\n(mm/day)\")\n",
    "ax.set_ylabel(\"Density\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "<details>\n",
    "<summary> <font color='orangered'>Click here for a description of the plot  </font></summary>\n",
    "Histogram plot of the annual maximum daily precipitation in millimeters per day as above. Both, the normal distribution and the fitted generalized extreme value (GEV) distribution are overlaid in red and black, respectively.\n",
    "</details>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "How well do the two fitted distributions reflect the observed data and how do they compare to each other? "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Coding Exercise 1\n",
    "\n",
    "**Parameters of the GEV distribution**\n",
    "\n",
    "Play a little with the `gev.pdf()` function to get a better sense of how the parameters affect the shape of this particular PDF. Plot the GEV distribution against the ‘x’ sequence and vary one parameter. What does this particular parameter affect? \n",
    "\n",
    "First, create a plot in the following cell. In this case, the most important, i.e. the shape parameter, is varied while the other two (location, scale) are held constant. The parameter values and ranges below may be a useful starting point."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {}
   },
   "outputs": [],
   "source": [
    "# set range of shape values to use\n",
    "range_shape = np.arange(-0.4, 0.4 + 0.1, 0.1)\n",
    "\n",
    "# set scale parameter\n",
    "scale = 7\n",
    "\n",
    "# set location parameter\n",
    "loc = 26\n",
    "\n",
    "# create precipitation array\n",
    "x_r80 = np.arange(80,step=0.5)\n",
    "\n",
    "# setup plots\n",
    "fig, ax = plt.subplots()\n",
    "\n",
    "# setup colors to use for lines\n",
    "colors_shape = plt.cm.coolwarm(np.linspace(0, 1, range_shape.size))\n",
    "\n",
    "# generate pdf for each shape value\n",
    "for idx, shapei in enumerate(range_shape):\n",
    "    ...\n",
    "\n",
    "# aesthetics\n",
    "ax.legend()\n",
    "ax.set_xlabel(\"Annual Maximum Daily Precipitation \\n(mm/day)\")\n",
    "ax.set_ylabel(\"Density\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {},
    "executionInfo": {
     "elapsed": 5,
     "status": "ok",
     "timestamp": 1681923523786,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "source": [
    "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D3_ExtremesandVariability/solutions/W2D3_Tutorial3_Solution_e56638a6.py)\n",
    "\n",
    "*Example output:*\n",
    "\n",
    "<img alt='Solution hint' align='left' width=775.0 height=575.0 src=https://raw.githubusercontent.com/neuromatch/climate-course-content/main/tutorials/W2D3_ExtremesandVariability/static/W2D3_Tutorial3_Solution_e56638a6_1.png>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Coding_Exercise_1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "Secondly, you can examine the effects of the location and scale parameters on the GEV by changing them with the sliders in the interactive plot below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " Interactive Plot: Scale and Location Parameter\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @markdown Interactive Plot: Scale and Location Parameter\n",
    "from ipywidgets import interact\n",
    "\n",
    "# slider for scale parameter\n",
    "scale_slider = widgets.FloatSlider(value=7,\n",
    "                                   min=4,\n",
    "                                   max=13,\n",
    "                                   step=0.2,\n",
    "                                   description='Scale parameter: ',\n",
    "                                   disabled=False,\n",
    "                                   orientation='horizontal',\n",
    "                                   continuous_update=True,\n",
    "                                   readout=True,\n",
    "                                   readout_format='.1f'\n",
    "                                  )\n",
    "# slider for location parameter\n",
    "location_slider = widgets.FloatSlider(value=26,\n",
    "                                      min=15,\n",
    "                                      max=50,\n",
    "                                      step=0.2,\n",
    "                                      description='Location parameter: ',\n",
    "                                      disabled=False,\n",
    "                                      orientation='horizontal',\n",
    "                                      continuous_update=True,\n",
    "                                      readout=True,\n",
    "                                      readout_format='.1f'\n",
    "                                     )\n",
    "\n",
    "ui = widgets.HBox([scale_slider, location_slider])\n",
    "\n",
    "def f(scale,loc):\n",
    "    # define shape parameter\n",
    "    shape = 0\n",
    "    # set x values\n",
    "    x_r80 = np.linspace(0, 80, 1000)\n",
    "    # create plot\n",
    "    fig, ax = plt.subplots()\n",
    "    # plot GEV PDF with selected parameters\n",
    "    ax.plot(\n",
    "            x_r80,\n",
    "            gev.pdf(x_r80, shape, loc=loc, scale=scale),\n",
    "            label=f\"Parameters:\\n\\nlocation = {loc:.0f}\\nscale = {scale:.0f}\\nshape = {shape:.0f}\",\n",
    "            )\n",
    "    # fix y-axis by defining limits\n",
    "    ax.set_ylim(-0.005,0.095)\n",
    "    # aesthetics\n",
    "    ax.legend()\n",
    "    ax.set_xlabel(\"Annual Maximum Daily Precipitation \\n(mm/day)\")\n",
    "    ax.set_ylabel(\"Density\")\n",
    "\n",
    "# combine sliders and plotting function\n",
    "out = widgets.interactive_output(f, {'scale':scale_slider, 'loc':location_slider})\n",
    "# show interactive plot\n",
    "display(ui,out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "We summarize the observed behavior in the following two plots. The top one shows the GEV with varying location parameter, while the bottom one shows the effect of the scale parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {},
    "executionInfo": {
     "elapsed": 887,
     "status": "ok",
     "timestamp": 1681923522185,
     "user": {
      "displayName": "Matthias Aengenheyster",
      "userId": "16322208118439170907"
     },
     "user_tz": -60
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# set range of location & scale values to use\n",
    "range_loc = np.arange(20, 41, 4)\n",
    "range_scale = np.arange(4, 11, 1)\n",
    "\n",
    "# set shape parameter\n",
    "shape = 0\n",
    "# set scale parameter for upper plot\n",
    "scale = 7\n",
    "# set location parameter for lower plot\n",
    "loc = 26\n",
    "\n",
    "# set x values\n",
    "x_r80 = np.linspace(0, 80, 1000)\n",
    "\n",
    "# setup plots\n",
    "fig, axs = plt.subplots(2,1, sharex=True)\n",
    "\n",
    "# setup colors to use for lines\n",
    "colors_loc = plt.cm.coolwarm(np.linspace(0, 1, range_loc.size))\n",
    "colors_scale = plt.cm.coolwarm(np.linspace(0, 1, range_scale.size))\n",
    "\n",
    "# generate pdf for each location value\n",
    "for idx, loci in enumerate(range_loc):\n",
    "    axs[0].plot(\n",
    "        x_r80,\n",
    "        gev.pdf(x_r80, shape, loc=loci, scale=scale),\n",
    "        color=colors_loc[idx],\n",
    "        label=\"loc = %i\" % loci,\n",
    "    )\n",
    "for idx, scalei in enumerate(range_scale):\n",
    "    axs[1].plot(\n",
    "        x_r80,\n",
    "        gev.pdf(x_r80, shape, loc=loc, scale=scalei),\n",
    "        color=colors_scale[idx],\n",
    "        label=\"scale = %.2f\" % scalei,\n",
    "    )\n",
    "# aesthetics\n",
    "for i in range(len(axs)):\n",
    "    axs[i].legend()\n",
    "    axs[i].set_ylabel(\"Density\")\n",
    "axs[1].set_xlabel(\"Annual Maximum Daily Precipitation \\n(mm/day)\")\n",
    "axs[1].set_ylabel(\"Density\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "## Questions 1\n",
    "\n",
    "1. How do the three parameters impact the shape of the distribution? What can you think of how these parameters affect extreme events?**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "execution": {}
   },
   "source": [
    "[*Click for solution*](https://github.com/neuromatch/climate-course-content/tree/main/tutorials/W2D3_ExtremesandVariability/solutions/W2D3_Tutorial3_Solution_7a0210e5.py)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Submit your feedback\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "form",
    "execution": {},
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "# @title Submit your feedback\n",
    "content_review(f\"{feedback_prefix}_Questions_1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Summary\n",
    "In this tutorial, you explored the the Generalized Extreme Value (GEV) distribution suitable for climate variables containing higher probabilities of extreme events. You used scipy to estimate these parameters from our data and fit a GEV distribution. You compared the fit of the normal and GEV distributions to your data using Quantile-Quantile (QQ) plots and overlaid the probability density functions of both distributions for visual comparison. Finally, you manipulated the parameters of the GEV distribution to understand their effects on the shape of the distribution."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "execution": {}
   },
   "source": [
    "# Resources\n",
    "\n",
    "Data from this tutorial uses the 0.25 degree precipitation dataset E-OBS. It combines precipitation observations to generate a gridded (i.e. no \"holes\") precipitation over Europe. We used the precipitation data from the gridpoint at 51 N, 6 E. \n",
    "\n",
    "The dataset can be accessed using the KNMI Climate Explorer [here](https://climexp.knmi.nl/select.cgi?id=someone@somewhere&field=ensembles_025_rr). The Climate Explorer is a great resource to access, manipulate and visualize climate data, including observations and climate model simulations. It is freely accessible - feel free to explore!"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "name": "W2D3_Tutorial3",
   "provenance": [],
   "toc_visible": true
  },
  "kernel": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}