In this section we download and preprocess carbon dioxide (CO2) measurements taken at the Mauna Loa Observatory in Hawaii. The data is provided by Dr. Pieter Tans, NOAA/ESRL and Dr. Ralph Keeling, Scripps Institution of Oceanography and may be downloaded here.

The data describes the ongoing change in the concentration of carbon dioxide in Earth's atmosphere since the 1950s. The data collection was initiated under the supervision of Charles David Keeling. Keeling's measurements showed the first significant evidence of rapidly increasing carbon dioxide levels in the atmosphere. If the connection fails, you may download a copy of the data set here (downloaded on June 25, 2022).

In [2]:
# First, let's import the needed libraries.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

from datetime import datetime

Let us get familiar with the location by plotting the Mauna Loa Observatory on an interactive map. You may click on the blue marker.

In [3]:
import folium

# Create a map, centered on the Mauna Loa Observatory
m = folium.Map(location=[19.53611, 204.4239], zoom_start=18)

# Add a marker at the Mauna Loa Observatory
folium.Marker(
    location=[19.53611, 204.4239],  # coordinates
    popup="Mauna Loa Observatory, Hawaii",  # pop-up label
).add_to(m)


# Add custom base maps to folium
basemaps = {
    "Esri Satellite": folium.TileLayer(
        tiles="https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{z}/{y}/{x}",
        attr="Esri",
        name="Esri Satellite",
        overlay=True,
        control=True,
    )
}

basemaps["Esri Satellite"].add_to(m)


# Display map (m)
m
Out[3]:
[Interactive map with a marker at the Mauna Loa Observatory, Hawaii]
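
The longitude 204.4239 used in the map cell above is given on a 0 to 360 degree (east) scale. The following is a minimal sketch, assuming you prefer the more common -180 to 180 degree convention. The helper name wrap_longitude is hypothetical and is not part of folium.

```python
def wrap_longitude(lon_deg):
    """Map a longitude in degrees onto the interval [-180, 180)."""
    return (lon_deg + 180.0) % 360.0 - 180.0


lon_east = 204.4239  # longitude used for the map above
lon_wrapped = wrap_longitude(lon_east)
print(round(lon_wrapped, 4))
```

The wrapped value is approximately -155.5761, i.e. about 155.58 degrees west, which is the usual way the longitude of Mauna Loa is quoted.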
In [4]:
current_time = datetime.now().strftime("%Y-%m-%d")
print("Date of download: ", current_time)
Date of download:  2023-04-03

Now, we want to import the .csv file from the URL.

In [5]:
import requests
from io import StringIO


url = "https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.csv"
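response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early if the download failed
s = response.text

# skiprows skips the introductory text at the top of the file;
# header=1 takes the second remaining line as the header row
# (the column names are overwritten in the next cell)
co2_raw = pd.read_csv(StringIO(s), sep=",", skiprows=56, header=1)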
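
The number of introductory lines in the source file may change over time, which makes a hard-coded skiprows value fragile. The following is a minimal sketch of an alternative, assuming that every introductory line starts with the character #. The text in sample_text is a made-up stand-in for the real file (its header names are assumptions, and only two data rows from the table in this section are reused), so it runs without a network connection.

```python
from io import StringIO

import pandas as pd

# Made-up sample imitating the file layout: comment lines, header row, data rows.
# The header names are assumptions, not taken from the real file.
sample_text = """# Introductory text, line 1
# Introductory text, line 2
year,month,decimal date,average,deseasonalized,ndays,sdev,unc
1958,4,1958.2877,317.45,315.16,-1,-9.99,-0.99
1958,5,1958.3699,317.51,314.71,-1,-9.99,-0.99
"""

# comment="#" makes pandas ignore every line that starts with "#",
# so the first remaining line is used as the header row
sample_df = pd.read_csv(StringIO(sample_text), comment="#")
print(sample_df.shape)
```

With this approach no line count is needed, but it only works if none of the column names in the header row contains a # character.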
In [6]:
co2_raw.columns = [
    "year",
    "month",
    "decimal date",
    "interpolated",
    "trend season corr",
    "#days",
    "sdev",
    "unc",
]

co2_raw.head(15)
Out[6]:
    year  month  decimal date  interpolated  trend season corr  #days   sdev    unc
 0  1958      4     1958.2877        317.45             315.16     -1  -9.99  -0.99
 1  1958      5     1958.3699        317.51             314.71     -1  -9.99  -0.99
 2  1958      6     1958.4548        317.24             315.14     -1  -9.99  -0.99
 3  1958      7     1958.5370        315.86             315.18     -1  -9.99  -0.99
 4  1958      8     1958.6219        314.93             316.18     -1  -9.99  -0.99
 5  1958      9     1958.7068        313.20             316.08     -1  -9.99  -0.99
 6  1958     10     1958.7890        312.43             315.41     -1  -9.99  -0.99
 7  1958     11     1958.8740        313.33             315.20     -1  -9.99  -0.99
 8  1958     12     1958.9562        314.67             315.43     -1  -9.99  -0.99
 9  1959      1     1959.0411        315.58             315.55     -1  -9.99  -0.99
10  1959      2     1959.1260        316.48             315.86     -1  -9.99  -0.99
11  1959      3     1959.2027        316.65             315.38     -1  -9.99  -0.99
12  1959      4     1959.2877        317.72             315.41     -1  -9.99  -0.99
13  1959      5     1959.3699        318.29             315.49     -1  -9.99  -0.99
14  1959      6     1959.4548        318.15             316.03     -1  -9.99  -0.99

The data set (co2_raw) contains carbon dioxide (CO2) measurements taken at the Mauna Loa Observatory in Hawaii. Source: Trends in Atmospheric Carbon Dioxide, Mauna Loa, Hawaii; Dr. Pieter Tans, NOAA/ESRL, and Dr. Ralph Keeling, Scripps Institution of Oceanography.

In [7]:
co2_raw.columns
Out[7]:
Index(['year', 'month', 'decimal date', 'interpolated', 'trend season corr',
       '#days', 'sdev', 'unc'],
      dtype='object')

At the time of download, the data set has 779 rows and 8 columns with the following variables: 'year', 'month', 'decimal date', 'interpolated', 'trend season corr', '#days', 'sdev', 'unc'. A ninth column, 'Date', is added below.

The interpolated column contains the monthly mean CO2 mole fraction determined from daily averages. The mole fraction of CO2, expressed in parts per million (ppm), is the number of molecules of CO2 in every one million molecules of dried air (water vapor removed). Missing months are denoted by $-99.99$ in the source data; the interpolated column holds the measured monthly averages and, where data are missing, interpolated values.
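
The first rows of the table above show -1 in #days, -9.99 in sdev and -0.99 in unc. The following is a minimal sketch, assuming that these negative entries are placeholders for unavailable values (an assumption based on the printed rows, since counts, standard deviations and uncertainties cannot be negative). The small data frame demo reuses two rows from the outputs in this section.

```python
import pandas as pd

# Two rows copied from the printed outputs: April 1958 and February 2023
demo = pd.DataFrame(
    {
        "#days": [-1, 25],
        "sdev": [-9.99, 0.64],
        "unc": [-0.99, 0.25],
    }
)

# Replace every negative entry with NaN so it is ignored by mean(), std(), etc.
cleaned = demo.mask(demo < 0)
print(cleaned.isna().sum())
```

Treating the placeholders as NaN keeps them out of summary statistics; whether this is appropriate for the full data set should be checked against the description in the header of the source file.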

Let us create a pd.Series object with the interpolated CO2 concentrations and the corresponding dates. In the original data set the date is given by the year and the month. We can combine the two by converting them to strings and concatenating them with the + operator.

In [8]:
co2_raw["Date"] = co2_raw["year"].astype(str) + "-" + co2_raw["month"].astype(str)
co2_raw["Date"] = pd.to_datetime(co2_raw["Date"], format="%Y-%m")
In [9]:
co2 = pd.Series(co2_raw["interpolated"].values, index=co2_raw["Date"])
type(co2)
Out[9]:
pandas.core.series.Series
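
As an alternative to the string concatenation used above, pd.to_datetime can assemble dates directly from numeric columns named year, month and day. The following is a minimal sketch on a small illustrative data frame called parts; setting the day to 1 is an assumption that mirrors the first-of-month dates produced by the cell above.

```python
import pandas as pd

# Illustrative year/month pairs taken from the tables in this section
parts = pd.DataFrame({"year": [1958, 1958, 2023], "month": [4, 5, 2]})

# pd.to_datetime accepts a DataFrame with columns named year, month and day
dates = pd.to_datetime(parts.assign(day=1))
print(dates.dt.strftime("%Y-%m-%d").tolist())
```

This avoids the detour through strings and a format specification.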

Once the data is captured inside a pandas.Series object, we can easily plot the data.

In [10]:
plt.figure(figsize=(14, 4))
co2.plot()

plt.title("Keeling Curve")
plt.ylabel("$CO_{2}$ (ppm)")
plt.show()

This characteristic graph, showing the rising CO2 concentration over time, is often referred to as the Keeling Curve. Each year, when the terrestrial vegetation of the Northern Hemisphere expands with the seasons, it removes CO2 from the atmosphere in its productive growing phase, while it returns CO2 to the air when it dies and decomposes. This phenomenon creates a seasonal oscillation in the atmosphere's CO2 concentration.
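
The seasonal oscillation described above can be separated from the long-term rise with a 12-month rolling mean, because averaging over a full year cancels a yearly cycle. The following is a minimal sketch on synthetic data (a linear trend plus a sine wave with made-up coefficients, not the Mauna Loa measurements). The names synthetic, trend and seasonal are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: made-up linear rise plus a yearly sine cycle
dates = pd.date_range("2000-01-01", periods=48, freq="MS")
months = np.arange(48)
synthetic = pd.Series(
    370 + 0.15 * months + 3 * np.sin(2 * np.pi * months / 12),
    index=dates,
)

# Averaging over 12 consecutive months removes the yearly cycle
trend = synthetic.rolling(window=12).mean()

# What remains after subtracting the smoothed series is the seasonal part
# (plus a small constant offset, because the window looks backwards)
seasonal = synthetic - trend

print(trend.dropna().head(3).round(2))
```

The same two lines (rolling mean and subtraction) could be applied to the co2 series; the first 11 values of the rolling mean are NaN because the window is not yet full.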

Finally, we store the time series data set in a .json file using the to_json method for further processing.

In [11]:
co2_raw.to_json("../data/KeelingCurve.json", date_format="iso")

To read the .json file, use the following command.

In [12]:
co2_raw = pd.read_json("../data/KeelingCurve.json")
co2_raw
Out[12]:
     year  month  decimal date  interpolated  trend season corr  #days   sdev    unc        Date
  0  1958      4     1958.2877        317.45             315.16     -1  -9.99  -0.99  1958-04-01
  1  1958      5     1958.3699        317.51             314.71     -1  -9.99  -0.99  1958-05-01
  2  1958      6     1958.4548        317.24             315.14     -1  -9.99  -0.99  1958-06-01
  3  1958      7     1958.5370        315.86             315.18     -1  -9.99  -0.99  1958-07-01
  4  1958      8     1958.6219        314.93             316.18     -1  -9.99  -0.99  1958-08-01
...   ...    ...           ...           ...                ...    ...    ...    ...         ...
774  2022     10     2022.7917        415.78             419.13     30   0.27   0.10  2022-10-01
775  2022     11     2022.8750        417.51             419.51     25   0.52   0.20  2022-11-01
776  2022     12     2022.9583        418.95             419.64     24   0.50   0.20  2022-12-01
777  2023      1     2023.0417        419.47             419.14     31   0.40   0.14  2023-01-01
778  2023      2     2023.1250        420.41             419.49     25   0.64   0.25  2023-02-01

779 rows × 9 columns
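
Dates are stored as ISO-formatted strings in the .json file and have to be parsed again when the file is read. The following is a minimal sketch of the round trip, using an in-memory string instead of a file and a two-row data frame called demo for illustration. Passing convert_dates=["Date"] makes the date parsing explicit instead of relying on the default detection by column name.

```python
from io import StringIO

import pandas as pd

# Two illustrative rows taken from the table above
demo = pd.DataFrame(
    {
        "interpolated": [317.45, 317.51],
        "Date": pd.to_datetime(["1958-04-01", "1958-05-01"]),
    }
)

# Without a path, to_json returns the JSON text as a string
json_text = demo.to_json(date_format="iso")

# Parse the "Date" column back into datetime values while reading
restored = pd.read_json(StringIO(json_text), convert_dates=["Date"])
print(restored.dtypes)
```

After reading, it is worth checking the dtypes as done here, so that later date-based indexing or resampling works as expected.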


Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.