A very common problem that scientists face is assessing the significance of scattered statistical data. Because observational data are limited, scientists apply inferential statistical methods to decide whether the observed data contain meaningful information or whether the scatter is merely a manifestation of the inherently probabilistic nature of the data-generating process.

Generally speaking, a scientist states such a problem as follows. The scientist builds a model, which is a simplification of the data-generating process, and considers a particular assumption about this model, a so-called hypothesis. Given the data, the scientist then wants to evaluate this tentative hypothesis.

The framework of hypothesis testing is about making statistical inferences about populations based on samples taken from those populations. One way to draw inferences about a population parameter is to construct a confidence interval; another is to make a decision about the parameter in the form of a test. Any hypothesis test involves collecting data (sampling). If the hypothesis is assumed to be correct, the scientist can calculate the expected results of an experiment. If the observed data differ significantly from these expected results, the assumption is considered incorrect. Thus, based on an analysis of the observed data, the scientist decides either that there is sufficient evidence to reject the hypothesis about the model, or that there is not sufficient evidence to reject it.
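The decision logic described above can be sketched in R with a one-sample t-test. This is a minimal sketch, assuming a small set of made-up measurements: the values in `x`, the hypothesized mean `mu0` and the significance level `alpha` are illustrative assumptions, not data from this course. The test statistic is computed by hand and cross-checked against `t.test()`, whose 95 % confidence interval illustrates the other inferential route mentioned above.

```r
# Illustrative (made-up) sample of 10 measurements
x <- c(10.2, 9.8, 10.5, 10.9, 10.1, 10.7, 10.4, 9.9, 10.6, 10.3)
mu0   <- 10    # hypothesized population mean under H0 (assumption)
alpha <- 0.05  # significance level chosen before looking at the data

# Test statistic: distance of the sample mean from mu0 in standard errors
n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Cross-check with the built-in one-sample t-test (also returns a 95 % CI)
res <- t.test(x, mu = mu0)

# Decision: reject H0 only if the data are sufficiently unlikely under H0
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"

print(c(t = t_stat, p = p_value))
print(res$conf.int)
print(decision)
```

For a two-sided test at level `alpha`, rejecting H0 corresponds to `mu0` lying outside the `1 - alpha` confidence interval, so both approaches lead to the same conclusion here.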

The following functions and R packages are used in this section (in alphabetical order):

R packages

• ggplot2
• RColorBrewer

Functions

• abs()
• aov()
• as.vector()
• cbind()
• chisq.test()
• coef()
• col.names()
• colnames()
• complete.cases()
• cor()
• cor.test()
• cov()
• data.frame()
• lm()
• length()
• levels()
• margin.table()
• matrix()
• max()
• mean()
• min()
• ncol()
• nrow()
• pairwise.t.test()
• paste()
• pchisq()
• pf()
• pnorm()
• print()
• pt()
• qnorm()
• qt()
• rbind()
• RColorBrewer::brewer.pal()
• row.names()
• rownames()
• sample()
• sd()
• sigma()
• sqrt()
• str()
• subset()
• sum()
• summary()
• table()
• tapply()
• t.test()
• TukeyHSD()
• var()
• var.test()

Plotting Functions

• abline()
• barplot()
• boxplot()
• ggplot()
• hist()
• par()
• pie()
• plot()
• qplot()
• qqline()
• qqnorm()

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by email at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.