Exercise 1: Did the mean January temperature differ significantly between the two roughly three-decade periods 1958-1988 and 1989-2018?
First, we import a suitable data set into our Python environment using the pandas package and store it directly as a dataframe object with the pd.read_csv() function. Afterwards, we create a subset of January temperature data for each of the two time periods.
Note: Ensure pandas and numpy are installed in your mamba environment!
import pandas as pd
import numpy as np
potsdam_data = pd.read_csv("https://userpage.fu-berlin.de/soga/data/raw-data/SaekularPotsdammonthmeantemp1893-2018.txt",
skiprows = 2)
potsdam_january_means_1958_1988 = potsdam_data.loc[(potsdam_data.Year >= 1958) & (potsdam_data.Year <= 1988),
["Year", "Jan"]]
potsdam_january_means_1989_2018 = potsdam_data.loc[(potsdam_data.Year >= 1989) & (potsdam_data.Year <= 2018),
["Year", "Jan"]]
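The loading and subsetting steps can be tried out offline on a tiny stand-in table. The sketch below is only an illustration: the content of raw_text is hypothetical and merely assumes that the real file starts with two lines to skip, followed by a comma-separated table with Year and Jan columns, as the call above suggests.

```python
import io

import pandas as pd

# Hypothetical miniature file (NOT the Potsdam data): two lines to skip,
# then a comma-separated table with a header row.
raw_text = """hypothetical title line
hypothetical comment line
Year,Jan,Feb
1957,0.5,1.0
1958,-1.2,0.3
1988,2.1,1.8
1989,3.0,2.2
"""

# skiprows=2 drops the first two lines, so the third line becomes the header.
demo = pd.read_csv(io.StringIO(raw_text), skiprows=2)

# Boolean mask for the years 1958 to 1988 (both bounds inclusive);
# .loc keeps the matching rows and only the Year and Jan columns.
mask = (demo.Year >= 1958) & (demo.Year <= 1988)
subset = demo.loc[mask, ["Year", "Jan"]]
print(subset)
```

Both comparisons use inclusive bounds, so the boundary years 1958 and 1988 stay in the subset while 1957 and 1989 are dropped.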
Exercise 1: Perform a two-sided t-test using the pandas dataframes potsdam_january_means_1958_1988 and potsdam_january_means_1989_2018 created above!
### your solution
from scipy import stats
alpha = 0.05
test_result = stats.ttest_1samp(potsdam_january_means_1989_2018["Jan"],
np.mean(potsdam_january_means_1958_1988["Jan"]),
alternative = "two-sided")
text = "Because the p value ({}) is lower than the error level alpha ({})\nthe test result indicates" + " a significant difference between these\ntwo time periods to a confidence level of {}."
print(text.format(round(test_result.pvalue, 4), alpha, 1 - alpha))
Because the p value (0.0005) is lower than the error level alpha (0.05)
the test result indicates a significant difference between these
two time periods to a confidence level of 0.95.
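The solution above uses a one-sample test: it treats the 1958-1988 mean as a fixed reference value and tests the 1989-2018 sample against it. A possible alternative, sketched here under assumptions, is to treat both periods as samples and compare them with a two-sample Welch t-test via stats.ttest_ind(). The helper name compare_periods is hypothetical, and the values below are synthetic placeholders drawn from a random generator, not Potsdam measurements, so that the block runs without a download.

```python
import numpy as np
from scipy import stats


def compare_periods(sample_a, sample_b, alpha=0.05):
    """Two-sided Welch t-test (hypothetical helper for illustration)."""
    # equal_var=False selects Welch's test, which does not assume
    # equal variances in the two samples.
    result = stats.ttest_ind(sample_a, sample_b, equal_var=False)
    return {
        "statistic": float(result.statistic),
        "pvalue": float(result.pvalue),
        "reject_h0": bool(result.pvalue < alpha),
    }


# Synthetic placeholder values, NOT Potsdam measurements.
rng = np.random.default_rng(seed=0)
synthetic_early = rng.normal(loc=-1.0, scale=3.0, size=31)
synthetic_late = rng.normal(loc=1.0, scale=3.0, size=30)

summary = compare_periods(synthetic_early, synthetic_late)
print(summary)
```

With the dataframes created above, the call would be compare_periods(potsdam_january_means_1958_1988["Jan"], potsdam_january_means_1989_2018["Jan"]). Its p-value will generally differ from the one-sample result, because the uncertainty of the earlier period's mean is now taken into account.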
Exercise 2: A new type of potato seed is on the market. The following crop yield data were collected from a test field. Conduct a statistical test to determine whether the new seeds yield an output significantly different from the old ones!
Note: You will have to perform a significance test beforehand to determine whether the variances of both samples can be assumed equal. Use the bartlett() function provided in the stats module of the scipy package. Additional information on using this function is available in the scipy documentation.
yield_seed_1 = [48, 45, 47, 43, 59, 51, 49, 55, 47, 56, 47, 54]
yield_seed_2 = [50, 48, 44, 52, 42, 42, 47, 43, 55, 45, 51, 42]
### your solution
from scipy import stats
alpha = 0.05
test_result = stats.bartlett(yield_seed_1, yield_seed_2)
text = "Because the p-value ({}) is greater than the error level alpha ({})\nwe do not reject H0 and assume equal variances."
print(text.format(round(test_result.pvalue, 2), alpha))
test_result = stats.ttest_ind(yield_seed_1, yield_seed_2, equal_var = True)
text = "Because the p-value ({}) is greater than the error level alpha ({})\nwe do not reject H0 and cannot confirm significant differences in crop yield between the two seeds."
print(text.format(round(test_result.pvalue, 2), alpha))
Because the p-value (0.75) is greater than the error level alpha (0.05)
we do not reject H0 and assume equal variances.
Because the p-value (0.1) is greater than the error level alpha (0.05)
we do not reject H0 and cannot confirm significant differences in crop yield between the two seeds.
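The two-step procedure used here (first Bartlett's test for equal variances, then a t-test with the matching equal_var setting) can be wrapped in one small function. This is only a sketch of the workflow described in the note, not part of scipy: the name t_test_with_variance_check and the returned dictionary keys are hypothetical.

```python
from scipy import stats


def t_test_with_variance_check(sample_a, sample_b, alpha=0.05):
    """Bartlett's test first, then a two-sided t-test (hypothetical helper)."""
    bartlett_result = stats.bartlett(sample_a, sample_b)
    # Equal variances are assumed only if Bartlett's H0 is not rejected.
    equal_var = bool(bartlett_result.pvalue >= alpha)
    # equal_var=True gives the pooled (Student) t-test,
    # equal_var=False gives Welch's t-test.
    t_result = stats.ttest_ind(sample_a, sample_b, equal_var=equal_var)
    return {
        "bartlett_pvalue": float(bartlett_result.pvalue),
        "equal_var": equal_var,
        "t_pvalue": float(t_result.pvalue),
        "reject_h0": bool(t_result.pvalue < alpha),
    }


# Crop yield data from the exercise
yield_seed_1 = [48, 45, 47, 43, 59, 51, 49, 55, 47, 56, 47, 54]
yield_seed_2 = [50, 48, 44, 52, 42, 42, 47, 43, 55, 45, 51, 42]

outcome = t_test_with_variance_check(yield_seed_1, yield_seed_2)
print(outcome)
```

For the potato data the helper takes the equal-variance branch and does not reject H0, in line with the printed solution above.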
Exercise 3: Trilobites were found in two layers of a claystone. The size of the animals is now to be used to determine whether the same depositional conditions prevailed (alpha = 5 %).
Note: You will have to perform a significance test beforehand to determine whether the variances of both samples can be assumed equal. Use the bartlett() function provided in the stats module of the scipy package. Additional information on using this function is available in the scipy documentation.
layer_1 = [3.13, 2.92, 2.71, 4, 3.62, 3.87]
layer_2 = [3.03, 3.18, 2.91, 2.75, 3.14]
### your solution
from scipy import stats
alpha = 0.05
test_result = stats.bartlett(layer_1, layer_2)
text = "Because the p-value ({}) is smaller than the error level alpha ({})\nwe reject H0 and assume that the variances of both samples are not equal."
print(text.format(round(test_result.pvalue, 3), alpha))
test_result = stats.ttest_ind(layer_1, layer_2, equal_var = False)
text = "Because the p-value ({}) is greater than the error level alpha ({})\nwe do not reject H0 and cannot confirm significant differences in depositional conditions of the two layers."
print(text.format(round(test_result.pvalue, 3), alpha))
Because the p-value (0.049) is smaller than the error level alpha (0.05)
we reject H0 and assume that the variances of both samples are not equal.
Because the p-value (0.154) is greater than the error level alpha (0.05)
we do not reject H0 and cannot confirm significant differences in depositional conditions of the two layers.
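To see what equal_var = False changes, the Welch test statistic and its degrees of freedom can be computed by hand and compared with the scipy result. This is a minimal sketch using the standard Welch-Satterthwaite approximation; the variable names are chosen for illustration only.

```python
import numpy as np
from scipy import stats

# Trilobite sizes from the exercise
layer_1 = np.array([3.13, 2.92, 2.71, 4, 3.62, 3.87])
layer_2 = np.array([3.03, 3.18, 2.91, 2.75, 3.14])

# Squared standard error of each sample mean (sample variance uses ddof=1)
se2_1 = layer_1.var(ddof=1) / layer_1.size
se2_2 = layer_2.var(ddof=1) / layer_2.size

# Welch t statistic: difference of means over the combined standard error
t_manual = (layer_1.mean() - layer_2.mean()) / np.sqrt(se2_1 + se2_2)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch = (se2_1 + se2_2) ** 2 / (
    se2_1 ** 2 / (layer_1.size - 1) + se2_2 ** 2 / (layer_2.size - 1)
)

# Two-sided p-value from the t distribution with df_welch degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df_welch)

scipy_result = stats.ttest_ind(layer_1, layer_2, equal_var=False)
print(t_manual, df_welch, p_manual)
print(scipy_result.statistic, scipy_result.pvalue)
```

The degrees of freedom are no longer the integer n1 + n2 - 2 of the pooled test but a smaller, non-integer value that reflects the unequal variances.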
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.