In this section, we download and preprocess a composite ice core record of atmospheric carbon dioxide (CO2) that extends back 800,000 years. The data set is provided by the World Data Center for Paleoclimatology, National Oceanic and Atmospheric Administration (NOAA). The data may be accessed directly here; if the data set is not available, you may download a copy here (downloaded on 2022-06-14).
The EPICA (European Project for Ice Coring in Antarctica) Dome C and Antarctic composite ice core atmospheric CO2 record is a new version of the CO2 composite that replaces the old version of Lüthi et al. (2008). For details see Bereiter et al. (2015).
Ages are given in years before present (yr BP), where present refers to 1950 AD. The composite is built from the following:
-51 - 1800 yr BP: Law Dome (Rubino et al. 2013)
1.8 - 2 kyr BP: Law Dome (MacFarling Meure et al. 2006)
2 - 11 kyr BP: Dome C (Monnin et al. 2001 + 2004)
11 - 22 kyr BP: West Antarctic Ice Sheet (WAIS) (Marcott et al. 2014)
22 - 40 kyr BP: Siple Dome (Ahn et al. 2014)
40 - 60 kyr BP: TALDICE (Bereiter et al. 2012)
60 - 115 kyr BP: EDML (Bereiter et al. 2012)
105 - 155 kyr BP: Dome C Sublimation (Schneider et al. 2013)
155 - 393 kyr BP: Vostok (Petit et al. 1999)
393 - 611 kyr BP: Dome C (Siegenthaler et al. 2005)
612 - 800 kyr BP: Dome C (Bereiter et al. 2014)
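To make the age convention and the structure of the composite more concrete, the following minimal sketch converts an age in yr BP to a calendar year (present = 1950 AD) and looks up which record(s) in the list cover a given age. The names `SEGMENTS`, `bp_to_calendar_year` and `sources_for_age` are hypothetical helpers introduced only for illustration. The interval bounds are copied from the list above, and the first segment is assumed to start at -51 yr BP, in line with the youngest age in the data.

```python
# Composite segments as (start_yr_BP, end_yr_BP, record), copied from the list above.
# Assumption: the first segment starts at -51 yr BP (i.e. 2001 AD).
SEGMENTS = [
    (-51, 1_800, "Law Dome (Rubino et al. 2013)"),
    (1_800, 2_000, "Law Dome (MacFarling Meure et al. 2006)"),
    (2_000, 11_000, "Dome C (Monnin et al. 2001 + 2004)"),
    (11_000, 22_000, "WAIS (Marcott et al. 2014)"),
    (22_000, 40_000, "Siple Dome (Ahn et al. 2014)"),
    (40_000, 60_000, "TALDICE (Bereiter et al. 2012)"),
    (60_000, 115_000, "EDML (Bereiter et al. 2012)"),
    (105_000, 155_000, "Dome C Sublimation (Schneider et al. 2013)"),
    (155_000, 393_000, "Vostok (Petit et al. 1999)"),
    (393_000, 611_000, "Dome C (Siegenthaler et al. 2005)"),
    (612_000, 800_000, "Dome C (Bereiter et al. 2014)"),
]


def bp_to_calendar_year(age_yr_bp):
    """Convert an age in years before present (present = 1950 AD) to a calendar year."""
    return 1950 - age_yr_bp


def sources_for_age(age_yr_bp):
    """Return all records whose listed interval contains the given age (bounds inclusive)."""
    return [name for start, end, name in SEGMENTS if start <= age_yr_bp <= end]


print(bp_to_calendar_year(-51))  # the youngest sample corresponds to 2001 AD
print(sources_for_age(110_000))  # two records, because their listed intervals overlap
```

Note that the listed intervals for EDML and Dome C Sublimation overlap between 105 and 115 kyr BP, which is why the lookup returns a list rather than a single record.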
To get a better intuition for the different data sources, we look at a map of Antarctica that includes the geographical locations of the drilling sites, using the plotting machinery in Python.
# First, let's import the needed libraries.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
# data derived from
# http://cdiac.ornl.gov/trends/co2/ice_core_co2.html and
# https://doi.pangaea.de/10.1594/PANGAEA.602127
antarctica = pd.DataFrame(
    {
        "Site": [
            "EDML",
            "Law Dome",
            "Dome C",
            "Taylor Dome",
            "Vostok",
            "Dome A",
            "South Pole station",
            "Siple Station",
        ],
        "Elevation masl": [2892, 1390, 3233, 2365, 3500, 4084, 2810, 1054],
        "Lat": [-75.003, -66.7333, -75.1, -77.8, -78.467, -80.367, -90, -75.917],
        "Lon": [0.068, 112.083, 123.4, 158.717, 106.867, 77.367, 0.0, 276.083],
    }
)
antarctica
| | Site | Elevation masl | Lat | Lon |
|---|---|---|---|---|
| 0 | EDML | 2892 | -75.0030 | 0.068 |
| 1 | Law Dome | 1390 | -66.7333 | 112.083 |
| 2 | Dome C | 3233 | -75.1000 | 123.400 |
| 3 | Taylor Dome | 2365 | -77.8000 | 158.717 |
| 4 | Vostok | 3500 | -78.4670 | 106.867 |
| 5 | Dome A | 4084 | -80.3670 | 77.367 |
| 6 | South Pole station | 2810 | -90.0000 | 0.000 |
| 7 | Siple Station | 1054 | -75.9170 | 276.083 |
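As a rough way to visualise these locations, the following sketch places the sites on a polar plot centred on the South Pole, using only matplotlib. This is a schematic without coastlines and not necessarily the map used in this tutorial; a proper map would need a cartographic library. The helper `to_polar` and the variable `sites` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same site coordinates as in the data frame above.
sites = pd.DataFrame(
    {
        "Site": ["EDML", "Law Dome", "Dome C", "Taylor Dome", "Vostok",
                 "Dome A", "South Pole station", "Siple Station"],
        "Lat": [-75.003, -66.7333, -75.1, -77.8, -78.467, -80.367, -90, -75.917],
        "Lon": [0.068, 112.083, 123.4, 158.717, 106.867, 77.367, 0.0, 276.083],
    }
)


def to_polar(lat_deg, lon_deg):
    """Return (theta, r): longitude in radians and angular distance from the South Pole in degrees."""
    theta = np.deg2rad(np.asarray(lon_deg, dtype=float))
    r = 90.0 + np.asarray(lat_deg, dtype=float)
    return theta, r


theta, r = to_polar(sites["Lat"], sites["Lon"])

fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={"projection": "polar"})
ax.scatter(theta, r, color="tab:red")
for t, dist, name in zip(theta, r, sites["Site"]):
    ax.annotate(name, (t, dist), textcoords="offset points", xytext=(5, 5))
ax.set_theta_zero_location("N")  # longitude 0° at the top
ax.set_theta_direction(-1)       # east longitudes increase clockwise, as seen from above the South Pole
ax.set_rlim(0, 30)               # from the pole (90°S) out to 60°S
ax.set_title("Drilling sites (radius: degrees from the South Pole)")
plt.show()
```

The radius is simply `90 + Lat`, so the South Pole station sits at the centre and Law Dome, the northernmost site, lies farthest out.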

Now we are ready to download the data. If the data sets are not available you may download a copy of the global data set here (downloaded on 2022-07-25).
current_time = datetime.now().strftime("%Y-%m-%d")
print("Date of download: ", current_time)
Date of download: 2023-04-03
co2_EPICA_raw = pd.read_csv(
    "ftp://ftp.ncdc.noaa.gov/pub/data/paleo/icecore/antarctica/antarctica2015co2composite.txt",
    skiprows=137,
    sep="\t",
)
co2_EPICA_raw.columns[0]
'age_gas_calBP'
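Since the remote file may not always be reachable, one option is to try the remote source first and fall back to a saved copy. The following is a minimal sketch of that idea; `read_first_available` is a hypothetical helper, and the local path in the usage comment is an assumed location for a downloaded copy, not a path defined in this tutorial.

```python
import pandas as pd


def read_first_available(sources, **read_csv_kwargs):
    """Try each source in order and return the first one that pd.read_csv can read."""
    errors = []
    for source in sources:
        try:
            return pd.read_csv(source, **read_csv_kwargs)
        except OSError as exc:  # covers missing files and most network failures
            errors.append(f"{source}: {exc}")
    raise OSError("No source could be read:\n" + "\n".join(errors))


# Usage (the second entry is a hypothetical path to a local copy):
# co2_EPICA_raw = read_first_available(
#     [
#         "ftp://ftp.ncdc.noaa.gov/pub/data/paleo/icecore/antarctica/antarctica2015co2composite.txt",
#         "../data/antarctica2015co2composite.txt",
#     ],
#     skiprows=137,
#     sep="\t",
# )
```

Keyword arguments such as `skiprows` and `sep` are passed through unchanged, so the local copy is assumed to have the same layout as the remote file.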
The data set consists of 1901 observations (rows) and 3 features (columns). The first column, age_gas_calBP, contains the age of the sample in calibrated years (calendar years) before present (BP). For detailed information on the Antarctic ice core chronology 2012 (AICC2012) refer to Bazin et al. (2013). The second column, co2_ppm, is our variable of interest, the concentration of carbon dioxide (CO2) in the atmosphere, given in parts per million (ppm). The third column, co2_1s_ppm, gives the standard deviations for the CO2 measurements.
For the sake of simplicity, we round the age given in age_gas_calBP to whole years and then divide by 1000 to express it in thousands of years before present (kyr BP).
# subset
co2_EPICA = co2_EPICA_raw[["age_gas_calBP", "co2_ppm"]].copy()
# round to whole years and convert to kyr BP
co2_EPICA["age_gas_calBP"] = np.round(co2_EPICA["age_gas_calBP"]) / 1000
# rename the age column (co2_ppm keeps its name)
co2_EPICA = co2_EPICA.rename(columns={"age_gas_calBP": "ky_BP_AICC2012"})
To avoid duplicate ages, we perform an aggregation step using the groupby() function in combination with agg() and take the mean of all CO2 values that share the same age.
co2_EPICA = co2_EPICA.groupby(["ky_BP_AICC2012"], as_index=False).agg(
    {"co2_ppm": "mean"}
)
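To see what the rounding and aggregation steps do, consider the following small sketch on made-up values (they are not taken from the ice core record). Two of the three toy samples fall on the same year after rounding and are therefore averaged.

```python
import numpy as np
import pandas as pd

# Toy data (made-up values): the first two ages round to the same year.
toy = pd.DataFrame(
    {"age_gas_calBP": [100.2, 100.4, 250.7], "co2_ppm": [280.0, 282.0, 290.0]}
)

# Round to whole years, then express the age in kyr BP.
toy["ky_BP"] = np.round(toy["age_gas_calBP"]) / 1000

# Average all CO2 values that share the same age.
toy_agg = toy.groupby("ky_BP", as_index=False).agg({"co2_ppm": "mean"})
print(toy_agg)
```

The result has two rows instead of three: the samples at 100.2 and 100.4 yr BP are merged into a single entry at 0.1 kyr BP with the mean of their two CO2 values.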
Finally, we transform the data frame into a pd.Series object and plot the data. The pd.Series constructor takes a vector of data values and, as the index, a vector of the corresponding ages.
co2_EPICA_ts = pd.Series(co2_EPICA["co2_ppm"].values, index=co2_EPICA["ky_BP_AICC2012"])
type(co2_EPICA_ts)
pandas.core.series.Series
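One benefit of using the ages as the index is that time windows can be selected by label. The following sketch illustrates this on a toy series with made-up values; `toy_ts` and `last_11_kyr` are hypothetical names used only here.

```python
import pandas as pd

# Toy series (made-up values) indexed by age in kyr BP, sorted by age.
toy_ts = pd.Series(
    [280.0, 265.0, 190.0, 230.0],
    index=pd.Index([0.5, 8.0, 20.0, 60.0], name="ky_BP_AICC2012"),
)

# Label-based slicing on a sorted index; both end points are inclusive.
last_11_kyr = toy_ts.loc[0:11]
print(last_11_kyr)

# Age (index label) at which the toy series reaches its minimum.
print(toy_ts.idxmin())
```

Here the slice keeps the two entries at 0.5 and 8.0 kyr BP, and `idxmin()` returns the age label of the lowest value rather than its position.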
Now we plot our data set.
plt.figure(figsize=(10, 4))
co2_EPICA_ts.plot().invert_xaxis() ## reverse the x-axis
plt.title("Antarctic Ice Cores Revised 800 kyr $CO_{2}$ Data")
plt.ylabel("$CO_{2}$ (ppm)")
plt.xlabel("kyr BP")
plt.show()
Finally, we store the time series data set in a .json file using the to_json command for further processing.
co2_EPICA_ts = co2_EPICA_ts.to_frame(name="co2_ppm").reset_index()
co2_EPICA_ts.to_json("../data/Antartica_Ice_Core.json", date_format="iso")
To read the .json file, use the following command.
co2_EPICA_ts = pd.read_json("../data/Antartica_Ice_Core.json")
co2_EPICA_ts
| | ky_BP_AICC2012 | co2_ppm |
|---|---|---|
| 0 | -0.051 | 368.02 |
| 1 | -0.048 | 361.78 |
| 2 | -0.046 | 359.65 |
| 3 | -0.044 | 357.11 |
| 4 | -0.043 | 353.95 |
| ... | ... | ... |
| 1837 | 803.925 | 202.92 |
| 1838 | 804.010 | 207.50 |
| 1839 | 804.523 | 204.86 |
| 1840 | 805.132 | 202.23 |
| 1841 | 805.669 | 207.29 |
1842 rows × 2 columns
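To check that nothing is lost when writing and reading the file, a simple round trip can be tested. The following sketch does this on a small toy frame with made-up values and a temporary file; the file name `toy_ice_core.json` is hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Toy frame (made-up values) with the same column names as above.
toy = pd.DataFrame(
    {
        "ky_BP_AICC2012": [-0.051, 0.137, 805.669],
        "co2_ppm": [368.02, 277.5, 207.29],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_ice_core.json")  # hypothetical file name
    toy.to_json(path)
    restored = pd.read_json(path)

# Compare with a tolerance, since JSON stores numbers as text.
same_columns = list(restored.columns) == list(toy.columns)
same_values = np.allclose(restored.to_numpy(), toy.to_numpy())
print(same_columns, same_values)
```

Comparing with `np.allclose` rather than exact equality keeps the check robust against small differences introduced by the text representation of floating point numbers.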
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via e-mail at soga[at]zedat.fu-berlin.de.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.