A very common problem that scientists face is the **assessment
of significance** in scattered statistical data. Because observational
data are limited, scientists apply inferential statistical methods to
decide whether the observed data contain significant information or
whether the scatter is nothing more than a manifestation of the
inherently probabilistic nature of the data generation process.

Generally speaking, a scientist states such a problem as follows. The
scientist builds a model, which is just a simplification of the data
generation process, and considers a particular assumption of this
model - a so-called **hypothesis**. Given the data, the scientist wants
to evaluate this tentative hypothesis.

The framework of hypothesis testing is all about making
**statistical inferences about populations based on samples taken
from the population**. One way to estimate a population parameter
is the construction of confidence intervals. Another way is to make a
decision about a parameter in the form of a test. Any hypothesis test
involves the collection of data (sampling). If the hypothesis is assumed
to be correct, the scientist can calculate the expected results of an
experiment. If the observed data differ significantly from the expected
results, one considers the assumption to be incorrect. Thus, based on an
analysis of the observed data, the scientist decides whether there is
sufficient evidence to reject the model - the hypothesis - or whether
the evidence is insufficient to reject the stated hypothesis.
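The decision logic described above can be sketched in a few lines of R. This is only a minimal illustration under stated assumptions, not the procedure of any particular study. The sample is simulated, and the hypothesized mean `mu0`, the significance level `alpha` and all variable names are hypothetical choices made for this example. The sketch computes a one-sample t statistic by hand, derives a two-sided p-value and a confidence interval, and cross-checks the result against `t.test()`.

```r
# Simulated sample (assumption: no real data set is used here)
set.seed(42)                           # make the simulation reproducible
x <- rnorm(30, mean = 10.5, sd = 2)    # n = 30 observations

mu0   <- 10      # hypothesized population mean (hypothetical value)
alpha <- 0.05    # significance level (hypothetical choice)

# Expected result under the hypothesis: the sample mean should be close to mu0.
# The test statistic measures the deviation in units of the standard error.
n      <- length(x)
se     <- sd(x) / sqrt(n)
t_stat <- (mean(x) - mu0) / se

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Confidence interval for the population mean at level 1 - alpha
ci <- mean(x) + c(-1, 1) * qt(1 - alpha / 2, df = n - 1) * se

# Decision: reject the hypothesis only if the evidence is sufficient
reject <- p_value < alpha

# Cross-check with the built-in one-sample t-test
res <- t.test(x, mu = mu0, conf.level = 1 - alpha)

cat("t statistic:", t_stat, "\n")
cat("p-value:", p_value, "\n")
cat("confidence interval:", ci, "\n")
cat("reject hypothesis:", reject, "\n")
```

The two routes mentioned in the text lead to the same decision here: for a two-sided test, the hypothesis is rejected at level `alpha` exactly when `mu0` lies outside the corresponding confidence interval.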

The following functions and R-packages are used in this section (in roughly alphabetical order):

**R-packages**

- ggplot2
- RColorBrewer

**Functions**

- abs()
- as.vector()
- aov()
- cbind()
- chisq.test()
- coef()
- col.names()
- colnames()
- complete.cases()
- cor()
- cor.test()
- cov()
- data.frame()
- head()
- lm()
- length()
- levels()
- margin.table()
- matrix()
- max()
- mean()
- min()
- ncol()
- nrow()
- pairwise.t.test()
- paste()
- pchisq()
- pf()
- pnorm()
- print()
- pt()
- qnorm()
- qt()
- rbind()
- RColorBrewer::brewer.pal()
- read.csv()
- row.names()
- rownames()
- sample()
- sd()
- sigma()
- sqrt()
- str()
- subset()
- sum()
- summary()
- table()
- tapply()
- t.test()
- TukeyHSD()
- var()
- var.test()

**Plotting Functions**

- abline()
- barplot()
- boxplot()
- ggplot()
- hist()
- qqline()
- qqnorm()
- qplot()
- par()
- pie()
- plot()

**Citation**

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: *Hartmann,
K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis
using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.*