Exercise 1: Did the mean January temperature differ significantly between the two roughly 30-year periods 1958-1988 and 1989-2018?

First, we import a suitable data set into the Python environment using the pandas package. The pd.read_csv() function stores the data directly as a DataFrame object. Afterwards, we create a subset of the January temperature data for each of the two time periods.

Note: Ensure pandas and numpy are installed in your mamba environment!

In [1]:
import pandas as pd
import numpy as np

potsdam_data = pd.read_csv("https://userpage.fu-berlin.de/soga/data/raw-data/SaekularPotsdammonthmeantemp1893-2018.txt",
                           skiprows = 2)

potsdam_january_means_1958_1988 = potsdam_data.loc[(potsdam_data.Year >= 1958) & (potsdam_data.Year <= 1988),
                                                   ["Year", "Jan"]]

potsdam_january_means_1989_2018 = potsdam_data.loc[(potsdam_data.Year >= 1989) & (potsdam_data.Year <= 2018),
                                                   ["Year", "Jan"]]
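
The boolean-mask subsetting used above can be tried out without downloading the data. The following is a minimal sketch on a tiny made-up table (the name toy and its values are invented for illustration and are not Potsdam measurements). It also shows Series.between(), which selects the same inclusive range of years.

```python
import pandas as pd

# Hypothetical toy table; the values are NOT real measurements.
toy = pd.DataFrame({"Year": [1956, 1957, 1958, 1959, 1960],
                    "Jan": [-1.0, 0.5, -2.0, 1.5, 0.0],
                    "Feb": [0.0, 1.0, -1.0, 2.0, 0.5]})

# Boolean mask combined with &, as in the cell above
subset_mask = toy.loc[(toy.Year >= 1958) & (toy.Year <= 1959), ["Year", "Jan"]]

# Equivalent selection with between(), which includes both end points by default
subset_between = toy.loc[toy.Year.between(1958, 1959), ["Year", "Jan"]]

print(subset_mask)
print(subset_mask.equals(subset_between))
```

Both selections keep only the rows for the requested years and only the "Year" and "Jan" columns.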

Exercise 1: Perform a two-sided t-test using the pandas dataframes potsdam_january_means_1958_1988 and potsdam_january_means_1989_2018 created above!

In [2]:
### your solution
In [3]:
from scipy import stats

alpha = 0.05

# One-sample t-test: the 1989-2018 January means are tested against the
# 1958-1988 mean, which is treated as a fixed reference value.
test_result = stats.ttest_1samp(potsdam_january_means_1989_2018["Jan"],
                                np.mean(potsdam_january_means_1958_1988["Jan"]),
                                alternative = "two-sided")

text = "Because the p-value ({}) is lower than the error level alpha ({}),\nthe test result indicates" + " a significant difference between these\ntwo time periods at a confidence level of {}."

print(text.format(round(test_result.pvalue, 4), alpha, 1 - alpha))
Because the p-value (0.0005) is lower than the error level alpha (0.05),
the test result indicates a significant difference between these
two time periods at a confidence level of 0.95.
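
The solution above uses stats.ttest_1samp(), which treats the mean of the earlier period as a fixed reference value and therefore ignores the sampling variability of that period. A possible alternative, not part of the original solution, is a two-sample test with stats.ttest_ind(). The sketch below contrasts both calls on synthetic stand-in data (random numbers drawn with an arbitrary seed, not the Potsdam series), so the printed values say nothing about the real result.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two January series (NOT the Potsdam data).
rng = np.random.default_rng(42)
period_a = rng.normal(loc=-1.0, scale=3.0, size=31)  # plays the role of 1958-1988
period_b = rng.normal(loc=1.0, scale=3.0, size=30)   # plays the role of 1989-2018

# One-sample test: the mean of period_a is taken as a known constant.
one_sample = stats.ttest_1samp(period_b, np.mean(period_a),
                               alternative="two-sided")

# Two-sample Welch test: both periods are treated as samples,
# without assuming equal variances.
welch = stats.ttest_ind(period_b, period_a, equal_var=False,
                        alternative="two-sided")

print("one-sample:", one_sample.statistic, one_sample.pvalue)
print("Welch     :", welch.statistic, welch.pvalue)
```

Because the two-sample test also accounts for the variance of the earlier period, its test statistic is never larger in absolute value than that of the one-sample test on the same data.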

Exercise 2: A new type of potato seed is on the market. The following crop yield data were collected from a test field. Conduct a statistical test to check whether the new seeds yield an output significantly different from the old ones!

Note: You will first have to perform a significance test to determine whether the variances of both samples can be assumed equal. Use the bartlett() function provided by the stats module of the scipy package. Additional information on using this function is available in the SciPy documentation.

In [4]:
yield_seed_1 = [48, 45, 47, 43, 59, 51, 49, 55, 47, 56, 47, 54]
yield_seed_2 = [50, 48, 44, 52, 42, 42, 47, 43, 55, 45, 51, 42]

### your solution
In [5]:
from scipy import stats

alpha = 0.05

test_result = stats.bartlett(yield_seed_1, yield_seed_2)
text = "Because the p-value ({}) is greater than the error level alpha ({})\nwe do not reject H0 and assume equal variances."
print(text.format(round(test_result.pvalue, 2), alpha))

test_result = stats.ttest_ind(yield_seed_1, yield_seed_2, equal_var = True) 
text = "Because the p-value ({}) is greater than the error level alpha ({})\nwe do not reject H0 and cannot confirm significant differences in crop yield between the two seeds."
print(text.format(round(test_result.pvalue, 2), alpha))
Because the p-value (0.75) is greater than the error level alpha (0.05)
we do not reject H0 and assume equal variances.
Because the p-value (0.1) is greater than the error level alpha (0.05)
we do not reject H0 and cannot confirm significant differences in crop yield between the two seeds.
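
The two-step procedure used above (Bartlett test first, then a t-test whose equal_var setting follows from the Bartlett result) can be wrapped in a small helper. This is only a sketch: the function name compare_two_samples and the keys of the returned dictionary are made up for illustration and are not part of SciPy or of the original solution.

```python
from scipy import stats


def compare_two_samples(sample_a, sample_b, alpha=0.05):
    """Hypothetical helper: Bartlett test, then Student's or Welch's t-test."""
    bartlett_p = stats.bartlett(sample_a, sample_b).pvalue
    # Equal variances are assumed unless the Bartlett test rejects H0.
    equal_var = bool(bartlett_p > alpha)
    ttest_p = stats.ttest_ind(sample_a, sample_b, equal_var=equal_var).pvalue
    return {"bartlett_p": bartlett_p,
            "equal_var": equal_var,
            "ttest_p": ttest_p,
            "reject_h0": bool(ttest_p < alpha)}


yield_seed_1 = [48, 45, 47, 43, 59, 51, 49, 55, 47, 56, 47, 54]
yield_seed_2 = [50, 48, 44, 52, 42, 42, 47, 43, 55, 45, 51, 42]

result = compare_two_samples(yield_seed_1, yield_seed_2)
print(result)
```

For the potato data the helper takes the equal-variance branch and does not reject H0, in line with the printed result above.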

Exercise 3: Trilobites were found in two layers of a claystone. The size of the animals is to be used to determine whether the same depositional conditions prevailed in both layers (alpha = 5 %).

Note: You will first have to perform a significance test to determine whether the variances of both samples can be assumed equal. Use the bartlett() function provided by the stats module of the scipy package. Additional information on using this function is available in the SciPy documentation.

In [6]:
layer_1 = [3.13, 2.92, 2.71, 4, 3.62, 3.87]
layer_2 = [3.03, 3.18, 2.91, 2.75, 3.14]

### your solution
In [7]:
from scipy import stats

alpha = 0.05

test_result = stats.bartlett(layer_1, layer_2)
text = "Because the p-value ({}) is smaller than the error level alpha ({})\nwe reject H0 and assume that the variances of both samples are not equal."
print(text.format(round(test_result.pvalue, 3), alpha))

test_result = stats.ttest_ind(layer_1, layer_2, equal_var = False)
text = "Because the p-value ({}) is greater than the error level alpha ({})\nwe do not reject H0 and cannot confirm significant differences in depositional conditions of the two layers."
print(text.format(round(test_result.pvalue, 3), alpha))
Because the p-value (0.049) is smaller than the error level alpha (0.05)
we reject H0 and assume that the variances of both samples are not equal.
Because the p-value (0.154) is greater than the error level alpha (0.05)
we do not reject H0 and cannot confirm significant differences in depositional conditions of the two layers.
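
Setting equal_var = False makes stats.ttest_ind() perform Welch's t-test, which does not pool the two variances. To make that step more transparent, the sketch below recomputes the test by hand: the statistic is the difference of the sample means divided by the square root of the summed squared standard errors, and the degrees of freedom follow the Welch-Satterthwaite approximation. The variable names (se2_1, t_manual, and so on) are chosen here for illustration only.

```python
import numpy as np
from scipy import stats

layer_1 = np.array([3.13, 2.92, 2.71, 4, 3.62, 3.87])
layer_2 = np.array([3.03, 3.18, 2.91, 2.75, 3.14])

# Squared standard error of each sample mean (sample variance uses ddof=1)
se2_1 = layer_1.var(ddof=1) / layer_1.size
se2_2 = layer_2.var(ddof=1) / layer_2.size

# Welch's t statistic
t_manual = (layer_1.mean() - layer_2.mean()) / np.sqrt(se2_1 + se2_2)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual = (se2_1 + se2_2) ** 2 / (se2_1 ** 2 / (layer_1.size - 1)
                                    + se2_2 ** 2 / (layer_2.size - 1))

# Two-sided p-value from the t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)

# Reference result from SciPy for comparison
reference = stats.ttest_ind(layer_1, layer_2, equal_var=False)

print("manual:", t_manual, df_manual, p_manual)
print("scipy :", reference.statistic, reference.pvalue)
```

The manual values should agree with the SciPy result up to floating-point precision.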

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.