`my_experiment`

and in addition print out the mean, $\bar x$, for each particular sample.

In [2]:

```
# First, let's import all the needed libraries.
import numpy as np
import random
```

In [3]:

```
population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_experiment = []
```

In [4]:

```
for i in np.arange(1, 6):
    my_sample = random.sample(population, 3)
    my_experiment.append(np.mean(my_sample))
    print(f"Sample number {i} has a mean of {round(np.mean(my_sample), 2)}.")
```

Obviously, different samples (of the same length) selected from the same population yield different sample statistics, because they contain different elements. Moreover, any statistic computed from a sample, such as the sample mean $\bar x$, will generally differ from the corresponding population parameter, here the population mean $\mu$. The difference between the value of a sample statistic and the value of the corresponding population parameter is called the **sampling error**. In the case of the mean the sampling error can be written as

$$\text{sampling error} = \bar x - \mu.$$
Due to the nature of random sampling, that is, the process of drawing a set of values from the population, the sampling error occurs by chance; in other words, the sampling error is a random variable. However, note that besides this randomness there are other sources of error. These errors are often related to the data generation process and are subsumed under the term **non-sampling error**. Such errors are introduced, for example, by human handling of the data or by calibration errors of the measuring devices, among others.
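As a minimal illustration of the definition above (a sketch reusing the small population from the previous cells), the sampling error of the mean for a single sample is simply $\bar x - \mu$:

```python
import random

import numpy as np

population = list(range(1, 11))  # population {1, ..., 10}
mu = np.mean(population)         # population mean

my_sample = random.sample(population, 3)    # draw one sample of size 3
sampling_error = np.mean(my_sample) - mu    # sampling error of the mean

print(f"Sample mean: {np.mean(my_sample):.2f}, sampling error: {sampling_error:.2f}")
```

Running the cell repeatedly yields a different sampling error each time, which is exactly what makes it a random variable.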

In order to gain some intuition on the nature of the sampling error we conduct an experiment. For this experiment the population of interest consists of the first 100 integers $\{1, 2, 3, ..., 100\}$. We want to analyse the effect of the sample size, $n$, on the sampling error. For the sake of simplicity we choose the sample mean as the statistic of interest. For a sufficiently large number of trials (`trials = 1000`) we calculate the sampling error for samples of sizes $n = 10, 25, 50, 75$.

In [5]:

```
pop = list(np.arange(1, 101))
pop_mean = np.mean(pop)
vector_error_sample_10 = []
vector_error_sample_25 = []
vector_error_sample_50 = []
vector_error_sample_75 = []
```

In [6]:

```
trials = np.arange(1, 1001)
for trial in trials:
    my_sample_10 = random.sample(pop, 10)
    my_sample_25 = random.sample(pop, 25)
    my_sample_50 = random.sample(pop, 50)
    my_sample_75 = random.sample(pop, 75)
    error_sample_10 = abs(np.mean(my_sample_10) - pop_mean)
    error_sample_25 = abs(np.mean(my_sample_25) - pop_mean)
    error_sample_50 = abs(np.mean(my_sample_50) - pop_mean)
    error_sample_75 = abs(np.mean(my_sample_75) - pop_mean)
    vector_error_sample_10.append(error_sample_10)
    vector_error_sample_25.append(error_sample_25)
    vector_error_sample_50.append(error_sample_50)
    vector_error_sample_75.append(error_sample_75)
print(f"Sampling Error, n = 10: {round(np.mean(vector_error_sample_10),3)}")
print(f"Sampling Error, n = 25: {round(np.mean(vector_error_sample_25),3)}")
print(f"Sampling Error, n = 50: {round(np.mean(vector_error_sample_50),3)}")
print(f"Sampling Error, n = 75: {round(np.mean(vector_error_sample_75),3)}")
```
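The four nearly identical code paths above can also be collapsed into a single loop over the sample sizes. This is a compact sketch of the same experiment, not the notebook's original code; the dictionary `mean_errors` is an added convenience for collecting the results:

```python
import random

import numpy as np

pop = list(range(1, 101))
pop_mean = np.mean(pop)

trials = 1000
sample_sizes = [10, 25, 50, 75]

# mean absolute sampling error of the sample mean, per sample size
mean_errors = {}
for n in sample_sizes:
    errors = [abs(np.mean(random.sample(pop, n)) - pop_mean) for _ in range(trials)]
    mean_errors[n] = np.mean(errors)
    print(f"Sampling Error, n = {n}: {round(mean_errors[n], 3)}")
```

Whichever version you run, the average absolute sampling error shrinks as the sample size grows: larger samples represent the population better.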


**Citation**

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: *Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis
using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.*