The basic logic behind a one-way ANOVA is to take an independent random sample from each group, compute the sample mean of each group, and then compare the variation of the sample means between the groups to the variation within the groups. Based on a test statistic, we then decide whether the group means are all equal or whether at least one of them differs.
Following that logic, we need quantitative measures of variability. Therefore, we partition the total variability into two components: the between-group variability and the within-group variability.
We introduce three quantitative measures of variation:
The sum of squares total (SST) is a measure for the total variability of the variable. It is given by
\[SST = \sum_{i=1}^n(x_i-\bar x)^2\text{,}\] where \(x_i\) corresponds to the observations in the samples and \(\bar x\) to the overall mean of all samples.
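To make the formula concrete, here is a short sketch in Python (the course itself uses R; the arithmetic is the same in either language) with hypothetical data, three made-up groups of three observations each:

```python
# Hypothetical data: three groups of three observations each
groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]

# Pool all observations and compute the overall mean
observations = [x for g in groups for x in g]
overall_mean = sum(observations) / len(observations)  # 5.0

# SST: squared deviation of every observation from the overall mean
sst = sum((x - overall_mean) ** 2 for x in observations)
print(sst)  # 30.0
```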
The sum of squares groups (SSG) is a measure for the variability between groups and corresponds to the squared deviation of the group means from the overall mean, weighted by the sample size:
\[SSG = \sum_{j=1}^k n_j(\bar x_j-\bar x)^2\] Here, \(n_j\) denotes the sample size for group \(j\), \(\bar x_j\) denotes the mean of group \(j\) and \(\bar x\) denotes the overall mean of the sample.
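For the same hypothetical data (three groups of three observations), the between-group sum of squares can be sketched as:

```python
# Hypothetical data: three groups of three observations each
groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]

observations = [x for g in groups for x in g]
overall_mean = sum(observations) / len(observations)  # 5.0

# SSG: squared deviation of each group mean from the overall mean,
# weighted by that group's sample size n_j
ssg = sum(len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups)
print(ssg)  # 24.0
```

The group means here are 3, 5 and 7, so each outer group contributes \(3 \cdot 2^2 = 12\) and the middle group contributes nothing.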
Finally, the sum of squares error (SSE) is a measure for the variability within groups. It is associated with the unexplained variability, which is the variability that cannot be explained by the group variable. The sum of squares error is given by
\[SSE = \sum_{j=1}^k (n_j-1)s_j^2\text{,}\]
where \(n_j\) denotes the sample size for group \(j\) and \(s_j^2\) the variance of group \(j\). Alternatively, one may calculate the \(SSE\) as the difference between \(SST\) and \(SSG\):
\[SSE = SST-SSG\text{.}\]
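Both routes to \(SSE\) can be checked against each other on the hypothetical example data; computing it from the group variances and as the difference \(SST - SSG\) must give the same number:

```python
# Hypothetical data: three groups of three observations each
groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]

def sample_variance(g):
    """Sample variance s_j^2 with denominator n_j - 1."""
    m = sum(g) / len(g)
    return sum((x - m) ** 2 for x in g) / (len(g) - 1)

# SSE via the group variances: sum of (n_j - 1) * s_j^2
sse = sum((len(g) - 1) * sample_variance(g) for g in groups)

# Cross-check via SSE = SST - SSG
observations = [x for g in groups for x in g]
overall_mean = sum(observations) / len(observations)
sst = sum((x - overall_mean) ** 2 for x in observations)
ssg = sum(len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups)
print(sse, sst - ssg)  # 6.0 6.0
```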
So far, we have calculated measures of total variability \((SST)\), between-group variability \((SSG)\) and within-group variability \((SSE)\). In the next step, in order to obtain an average variability, we scale these measures by the sample size (more precisely, by the degrees of freedom, \(df\)).
The degrees of freedom are defined for each partition of variability (total, between-group and within-group variability).
\[df_T = n-1\text{,}\]
where \(n\) denotes the overall sample size.
\[df_G=k-1\text{,}\]
where \(k\) denotes the number of groups.
\[df_E = n-k\text{.}\]
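The degrees of freedom follow directly from the sample sizes. For the hypothetical three-groups-of-three example:

```python
# Hypothetical data: three groups of three observations each
groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]
n = sum(len(g) for g in groups)  # overall sample size: 9
k = len(groups)                  # number of groups: 3

df_T = n - 1  # 8
df_G = k - 1  # 2
df_E = n - k  # 6

# The degrees of freedom partition just like the sums of squares
assert df_T == df_G + df_E
```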
Now, we may calculate the mean squares for the between-group and within-group variability. The average variability between and within groups is obtained by scaling the corresponding sum of squares by its associated degrees of freedom.
\[MSG = \frac{SSG}{df_G}\]
\[MSE = \frac{SSE}{df_E}\]
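With the hypothetical values \(SSG = 24\) and \(SSE = 6\) for \(k = 3\) groups and \(n = 9\) observations, the mean squares come out as:

```python
# Hypothetical sums of squares for k = 3 groups, n = 9 observations
ssg, sse = 24.0, 6.0
k, n = 3, 9

msg = ssg / (k - 1)  # MSG = SSG / df_G
mse = sse / (n - k)  # MSE = SSE / df_E
print(msg, mse)  # 12.0 1.0
```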
Finally, we compare the mean variation between the groups, \(MSG\), to the mean variation within the groups, \(MSE\). For that purpose we calculate the ratio of the average between-group variability \((MSG)\) to the average within-group variability \((MSE)\), which is denoted \(F\):
\[F= \frac{MSG}{MSE}\]
The \(F\)-statistic follows the \(F\)-distribution (named after Sir Ronald A. Fisher) with
\[df = (k-1, n-k)\text{,}\]
where \(k\) corresponds to the number of groups and \(n\) to the sample size. Large \(F\)-values indicate that the variation between the group sample means is large relative to the variation within the groups. Further, we may calculate the p-value for any given \(F\)-value. If the p-value is small, the data provides convincing evidence that at least one pair of group means differs. If the p-value is large, the data does not provide convincing evidence that any pair of group means differs, and the observed differences in sample means are attributable to sampling variability (or chance).
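Continuing the hypothetical example (\(MSG = 12\), \(MSE = 1\), so \(F = 12\) on \((2, 6)\) degrees of freedom), the p-value can be checked numerically. For \(df_G = 2\) the upper tail of the \(F\)-distribution happens to have a simple closed form, \(P(F > f) = (1 + 2f/df_E)^{-df_E/2}\); in general one would use an \(F\)-distribution routine such as R's `pf()`:

```python
# Hypothetical mean squares: MSG = 12, MSE = 1 (k = 3 groups, n = 9)
msg, mse = 12.0, 1.0
f_value = msg / mse  # 12.0
df_e = 6             # n - k

# Closed-form upper-tail probability of F(2, df_E); valid only for df_G = 2
p_value = (1 + 2 * f_value / df_e) ** (-df_e / 2)
print(f_value, p_value)  # 12.0 0.008
```

Since \(p = 0.008 < 0.05\), the hypothetical data would provide convincing evidence that at least one pair of group means differs.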
As seen above, the one-way analysis of variance includes several analytic steps. Therefore, a common way to display a one-way ANOVA is the so-called one-way ANOVA table. The general design of such a table is shown below:
\[ \begin{array}{|l|c|c|c|c|c|} \hline \text{Source} & df & \text{Sum of Squares }(SS) & \text{Mean Squares }(MS) & F\text{-statistic} & p\text{-value}\\ \hline \text{Group/Class} & k-1 & SSG & MSG=\frac{SSG}{k-1} & F = \frac{MSG}{MSE} & p\\ \text{Error/Residuals} & n-k & SSE & MSE=\frac{SSE}{n-k} & & \\ \hline \text{Total} & n-1 & SST & & & \\ \hline \end{array} \]
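Such a table can be filled in by hand from the quantities defined above. A sketch with hypothetical values (\(SSG = 24\), \(SSE = 6\) for \(k = 3\) groups, \(n = 9\) observations):

```python
# One-way ANOVA table for hypothetical sums of squares (k = 3, n = 9)
k, n = 3, 9
ssg, sse = 24.0, 6.0
sst = ssg + sse

msg, mse = ssg / (k - 1), sse / (n - k)
f_value = msg / mse
# Closed-form p-value, valid only for the df_G = 2 special case
p_value = (1 + 2 * f_value / (n - k)) ** (-(n - k) / 2)

print("Source     df    SS    MS     F      p")
print(f"Group     {k - 1:3d}  {ssg:4.1f}  {msg:4.1f}  {f_value:4.1f}  {p_value:.3f}")
print(f"Error     {n - k:3d}  {sse:4.1f}  {mse:4.1f}")
print(f"Total     {n - 1:3d}  {sst:4.1f}")
```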
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.