Exercise 1: Did the mean January temperature differ significantly between the two roughly three-decade periods 1958-1988 and 1989-2018?

First, we import a suitable data set into the Python environment with the `pandas` package, using the `pd.read_csv()` function to store the information directly as a `DataFrame` object. Afterwards, we create a subset of January temperature data for each of the two time periods.

Note: Ensure `pandas` and `numpy` are installed in your `mamba` environment!

In [1]:

```
import pandas as pd
import numpy as np

# Monthly mean temperatures for Potsdam; the first two rows of the file are skipped
potsdam_data = pd.read_csv(
    "https://userpage.fu-berlin.de/soga/data/raw-data/SaekularPotsdammonthmeantemp1893-2018.txt",
    skiprows=2,
)

# January means for the two periods (both bounds inclusive)
potsdam_january_means_1958_1988 = potsdam_data.loc[
    (potsdam_data.Year >= 1958) & (potsdam_data.Year <= 1988), ["Year", "Jan"]
]
potsdam_january_means_1989_2018 = potsdam_data.loc[
    (potsdam_data.Year >= 1989) & (potsdam_data.Year <= 2018), ["Year", "Jan"]
]
```

Exercise 1: Perform a two-sided t-test using the `pandas` dataframes `potsdam_january_means_1958_1988` and `potsdam_january_means_1989_2018` created above!

In [2]:

```
### your solution
```

In [3]:

```
from scipy import stats

alpha = 0.05
# One-sample t-test: the 1989-2018 January means are tested against the
# 1958-1988 mean, which is treated as a fixed reference value
test_result = stats.ttest_1samp(
    potsdam_january_means_1989_2018["Jan"],
    np.mean(potsdam_january_means_1958_1988["Jan"]),
    alternative="two-sided",
)
text = ("Because the p value ({}) is lower than the error level alpha ({})\n"
        "the test result indicates a significant difference between these\n"
        "two time periods at a confidence level of {}.")
print(text.format(round(test_result.pvalue, 4), alpha, 1 - alpha))
```
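The solution cells on this page print a conclusion whose wording is fixed in advance, so the text only holds if the p-value actually falls on the expected side of alpha. As a minimal sketch, the decision can instead be derived from the p-value itself. The helper name `conclusion` and the two p-values passed to it are hypothetical and serve only as an illustration; they are not results from the Potsdam data.

```python
def conclusion(pvalue, alpha=0.05):
    """Return a one-sentence test decision derived from the p-value.

    Hypothetical helper for illustration; not part of scipy or pandas.
    """
    if pvalue < alpha:
        return (f"Because the p-value ({pvalue:.4f}) is lower than alpha ({alpha}), "
                "we reject H0: the difference is significant.")
    return (f"Because the p-value ({pvalue:.4f}) is not lower than alpha ({alpha}), "
            "we do not reject H0: no significant difference can be confirmed.")


# Illustrative p-values only (assumed, not computed from any data set)
print(conclusion(0.0312))
print(conclusion(0.2))
```

In the cells of this page, the same helper could be called with `test_result.pvalue` in place of the illustrative numbers.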

Exercise 2: A new type of potato seed is on the market. The following crop yield data were collected from a test field. Conduct a statistical test to determine whether the new seeds yield an output significantly different from the old ones!

Note: You will first have to perform a significance test to determine whether the variances of both samples can be assumed equal. Use the `bartlett()` function provided by the `stats` module of the `scipy` package. Additional information on using this function is available in the SciPy documentation.

In [4]:

```
yield_seed_1 = [48, 45, 47, 43, 59, 51, 49, 55, 47, 56, 47, 54]
yield_seed_2 = [50, 48, 44, 52, 42, 42, 47, 43, 55, 45, 51, 42]
### your solution
```

In [5]:

```
from scipy import stats
alpha = 0.05
test_result = stats.bartlett(yield_seed_1, yield_seed_2)
text = "Because the p-value ({}) is greater than the error level alpha ({})\nwe do not reject H0 and assume equal variances."
print(text.format(round(test_result.pvalue, 2), alpha))
test_result = stats.ttest_ind(yield_seed_1, yield_seed_2, equal_var = True)
text = "Because the p-value ({}) is greater than the error level alpha ({})\nwe do not reject H0 and cannot confirm significant differences in crop yield between the two seeds."
print(text.format(round(test_result.pvalue, 2), alpha))
```
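To make the variance check less of a black box, the following sketch computes Bartlett's test statistic by hand for the two yield samples and compares it with `stats.bartlett()`. It assumes the standard textbook form of the statistic (pooled variance, log-variance differences and a small-sample correction factor) with a chi-squared reference distribution on k - 1 degrees of freedom. The function name `bartlett_statistic` is introduced here for illustration only.

```python
import math
from statistics import variance

from scipy import stats

yield_seed_1 = [48, 45, 47, 43, 59, 51, 49, 55, 47, 56, 47, 54]
yield_seed_2 = [50, 48, 44, 52, 42, 42, 47, 43, 55, 45, 51, 42]


def bartlett_statistic(*samples):
    """Bartlett's statistic for k samples (illustrative helper, not a scipy function)."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    variances = [variance(s) for s in samples]  # sample variances (n - 1 denominator)
    n_total = sum(sizes)
    # Pooled variance, weighted by each sample's degrees of freedom
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    # Small-sample correction factor
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


manual_stat = bartlett_statistic(yield_seed_1, yield_seed_2)
manual_p = stats.chi2.sf(manual_stat, df=1)  # k - 1 = 1 degree of freedom

scipy_result = stats.bartlett(yield_seed_1, yield_seed_2)
print("by hand:", manual_stat, manual_p)
print("scipy:  ", scipy_result.statistic, scipy_result.pvalue)
```

Both lines should report the same statistic and p-value up to floating-point rounding.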

Exercise 3: Trilobites were found in two layers of a claystone. The size of the animals is now to be used to determine whether the same depositional conditions prevailed in both layers (alpha = 5 %).

Note: You will first have to perform a significance test to determine whether the variances of both samples can be assumed equal. Use the `bartlett()` function provided by the `stats` module of the `scipy` package. Additional information on using this function is available in the SciPy documentation.

In [6]:

```
layer_1 = [3.13, 2.92, 2.71, 4, 3.62, 3.87]
layer_2 = [3.03, 3.18, 2.91, 2.75, 3.14]
### your solution
```

In [7]:

```
from scipy import stats
alpha = 0.05
test_result = stats.bartlett(layer_1, layer_2)
text = "Because the p-value ({}) is smaller than the error level alpha ({})\nwe reject H0 and assume that the variances of both samples are not equal."
print(text.format(round(test_result.pvalue, 3), alpha))
test_result = stats.ttest_ind(layer_1, layer_2, equal_var = False)
text = "Because the p-value ({}) is greater than the error level alpha ({})\nwe do not reject H0 and cannot confirm significant differences in depositional conditions of the two layers."
print(text.format(round(test_result.pvalue, 3), alpha))
```
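Setting `equal_var = False` makes `stats.ttest_ind()` run Welch's t-test, which does not pool the two variances. The sketch below reproduces that calculation by hand for the two layers, assuming the usual Welch statistic and the Welch-Satterthwaite approximation for the degrees of freedom. The variable names are chosen here for illustration.

```python
import math
from statistics import mean, variance

from scipy import stats

layer_1 = [3.13, 2.92, 2.71, 4, 3.62, 3.87]
layer_2 = [3.03, 3.18, 2.91, 2.75, 3.14]

n1, n2 = len(layer_1), len(layer_2)

# Squared standard error of each sample mean (sample variance / sample size)
se1_sq = variance(layer_1) / n1
se2_sq = variance(layer_2) / n2

# Welch's t statistic: difference of means over the unpooled standard error
t_manual = (mean(layer_1) - mean(layer_2)) / math.sqrt(se1_sq + se2_sq)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual = (se1_sq + se2_sq) ** 2 / (
    se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
)

# Two-sided p-value from the t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)

scipy_result = stats.ttest_ind(layer_1, layer_2, equal_var=False)
print("by hand:", t_manual, df_manual, p_manual)
print("scipy:  ", scipy_result.statistic, scipy_result.pvalue)
```

The degrees of freedom are generally not an integer here, which is expected for Welch's test.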

**Citation**

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: *Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.*