The basic logic behind a one-way ANOVA is as follows. Take independent random samples from each group and compute the sample mean of each group. Then compare the variation of the sample means between the groups to the variation within the groups. Finally, use a test statistic to decide whether or not the group means are all equal.

This logic requires quantitative measures of variability. We therefore partition the total variability into two components: one for the between-group variability and one for the within-group variability.


Measures of variability

We introduce three quantitative measures of the variation:

The sum of squares total (SST) is a measure of the total variability of the variable. It is given by

\[SST = \sum_{i=1}^n(x_i-\bar x)^2\text{,}\] where \(x_i\) denotes the \(i\)-th of the \(n\) observations pooled over all samples and \(\bar x\) the overall mean of all samples.
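The following minimal Python sketch makes the formula concrete. It computes \(SST\) for a small, hypothetical data set of three groups stored as a dictionary of lists. The data and the name `sum_of_squares_total` are illustrative assumptions and are not part of any library.

```python
# Hypothetical example data: three groups of three observations each.
groups = {
    "A": [4, 5, 6],
    "B": [6, 7, 8],
    "C": [8, 9, 10],
}


def sum_of_squares_total(groups):
    """SST: squared deviations of every observation from the overall mean."""
    observations = [x for sample in groups.values() for x in sample]
    grand_mean = sum(observations) / len(observations)
    return sum((x - grand_mean) ** 2 for x in observations)


print(sum_of_squares_total(groups))  # 30.0
```

The group labels are ignored when computing \(SST\). Only the pooled observations and their overall mean matter.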

The sum of squares groups (SSG) is a measure of the variability between groups. It is the sum of the squared deviations of the group means from the overall mean, each weighted by the sample size of the corresponding group.

\[SSG = \sum_{j=1}^k n_j(\bar x_j-\bar x)^2\] Here, \(k\) denotes the number of groups, \(n_j\) the sample size of group \(j\), \(\bar x_j\) the mean of group \(j\), and \(\bar x\) the overall mean of the sample.
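Below is a minimal Python sketch of \(SSG\). It assumes the groups are given as a hypothetical dictionary that maps group labels to lists of observations. The function name `sum_of_squares_groups` is illustrative.

```python
# Hypothetical example data: three groups of three observations each.
groups = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [8, 9, 10]}


def sum_of_squares_groups(groups):
    """SSG: squared deviation of each group mean from the overall mean,
    weighted by that group's sample size n_j."""
    observations = [x for sample in groups.values() for x in sample]
    grand_mean = sum(observations) / len(observations)
    return sum(
        len(sample) * (sum(sample) / len(sample) - grand_mean) ** 2
        for sample in groups.values()
    )


print(sum_of_squares_groups(groups))  # 24.0
```

If every group has the same mean, each term vanishes and \(SSG = 0\), whatever the spread inside the groups.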

Finally, the sum of squares error (SSE) is a measure of the variability within groups. It is associated with the unexplained variability, that is, the variability that cannot be explained by the group variable. The sum of squares error is given by

\[SSE = \sum_{j=1}^k (n_j-1)s_j^2\text{,}\]

where \(n_j\) denotes the sample size of group \(j\) and \(s_j^2\) the sample variance of group \(j\). Alternatively, \(SSE\) may be calculated as the difference between \(SST\) and \(SSG\):

\[SSE = SST-SSG\text{.}\]
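The sketch below uses hypothetical data. It computes \(SSE\) from the group variances and checks that the result agrees with \(SST - SSG\). The helper names `sample_variance` and `sum_of_squares_error` are illustrative. `sample_variance` divides by \(n_j - 1\), which matches \(s_j^2\) in the formula.

```python
# Hypothetical example data: three groups of three observations each.
groups = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [8, 9, 10]}


def sample_variance(sample):
    """Sample variance s_j^2 with divisor n_j - 1."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)


def sum_of_squares_error(groups):
    """SSE: within-group variability, the sum of (n_j - 1) * s_j^2."""
    return sum((len(s) - 1) * sample_variance(s) for s in groups.values())


# SST and SSG, needed for the alternative route SSE = SST - SSG.
observations = [x for s in groups.values() for x in s]
grand_mean = sum(observations) / len(observations)
sst = sum((x - grand_mean) ** 2 for x in observations)
ssg = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in groups.values())

sse = sum_of_squares_error(groups)
print(sse, sst - ssg)  # 6.0 6.0
```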


Measures of mean variability

So far we have calculated measures of the total variability \((SST)\), the between-group variability \((SSG)\) and the within-group variability \((SSE)\). Next, to obtain an average variability, we divide each of these measures by the sample size, or more precisely, by the associated degrees of freedom \((df)\).

The degrees of freedom are defined for each partition of variability (total, between-group and within-group variability).

\[df_T = n-1\text{,}\]

where \(n\) denotes the overall sample size.

\[df_G=k-1\text{,}\]

where \(k\) denotes the number of groups.

\[df_E = n-k\text{,}\]

where \(n\) and \(k\) are defined as above.
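In code, the three degrees of freedom follow directly from the group sizes. Here is a short sketch that uses a hypothetical helper `degrees_of_freedom` on made-up data:

```python
# Hypothetical example data: three groups of three observations each.
groups = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [8, 9, 10]}


def degrees_of_freedom(groups):
    """Return (df_T, df_G, df_E) for a one-way ANOVA."""
    n = sum(len(sample) for sample in groups.values())  # overall sample size
    k = len(groups)                                      # number of groups
    return n - 1, k - 1, n - k


df_total, df_groups, df_error = degrees_of_freedom(groups)
print(df_total, df_groups, df_error)  # 8 2 6
```

Because \(df_G + df_E = (k-1) + (n-k) = n-1 = df_T\), the degrees of freedom split the same way as the sums of squares.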

We can now calculate the mean squares for the between-group and the within-group variability. Each mean square is the corresponding sum of squares divided by its degrees of freedom.

\[MSG = \frac{SSG}{df_G}\]

\[MSE = \frac{SSE}{df_E}\]
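A minimal sketch of the two mean squares as a plain Python function follows. The inputs \(SSG = 24\), \(SSE = 6\), \(n = 9\) and \(k = 3\) are hypothetical values chosen only for illustration, and `mean_squares` is an illustrative name.

```python
def mean_squares(ssg, sse, n, k):
    """Return (MSG, MSE): each sum of squares divided by its degrees of freedom."""
    msg = ssg / (k - 1)  # df_G = k - 1
    mse = sse / (n - k)  # df_E = n - k
    return msg, mse


# Hypothetical inputs: SSG = 24, SSE = 6 from n = 9 observations in k = 3 groups.
msg, mse = mean_squares(ssg=24.0, sse=6.0, n=9, k=3)
print(msg, mse)  # 12.0 1.0
```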


Test statistic and p-value

Finally, we compare the mean variation between the groups, \(MSG\), to the mean variation within the groups, \(MSE\). To do so, we compute the ratio of the average between-group variability \((MSG)\) to the average within-group variability \((MSE)\), denoted by \(F\).

\[F= \frac{MSG}{MSE}\]
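The following sketch computes \(F\) directly from raw group samples. It assumes a hypothetical dictionary of groups. It obtains \(SSE\) by summing the squared deviations from each group mean, which is equivalent to \(\sum_{j}(n_j-1)s_j^2\). The function `f_statistic` is an illustrative name. In practice, an established routine such as SciPy's `scipy.stats.f_oneway` reports the same statistic.

```python
# Hypothetical example data: three groups of three observations each.
groups = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [8, 9, 10]}


def f_statistic(groups):
    """One-way ANOVA F = MSG / MSE computed from raw group samples."""
    samples = list(groups.values())
    observations = [x for s in samples for x in s]
    n, k = len(observations), len(samples)
    grand_mean = sum(observations) / n
    means = [sum(s) / len(s) for s in samples]
    # Between-group sum of squares, weighted by group size.
    ssg = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Within-group sum of squares: deviations from each group's own mean.
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (ssg / (k - 1)) / (sse / (n - k))


print(f_statistic(groups))  # 12.0
```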

If all group means are equal (the null hypothesis) and the usual ANOVA assumptions hold, the \(F\)-statistic follows the \(F\)-distribution, named after Sir Ronald A. Fisher, with

\[df = (k-1, n-k)\text{,}\]

where \(k\) corresponds to the number of groups and \(n\) to the overall sample size. Large \(F\)-values indicate that the variation between the group sample means is large relative to the variation within the groups. For any given \(F\)-value we can calculate the p-value. This is the probability of observing an \(F\)-value at least this large if all group means were equal. A small p-value means the data provide convincing evidence that at least one pair of group means differs. A large p-value means the data do not provide convincing evidence that any pair of group means differs. In that case, the observed differences in sample means are consistent with sampling variability (chance).
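Statistical software usually obtains the p-value from the upper tail of the \(F\)-distribution. In SciPy, for instance, this is `scipy.stats.f.sf(F, dfn, dfd)`. The sketch below avoids any dependency and relies on the standard identity \(P(F > f) = I_x(df_E/2,\, df_G/2)\) with \(x = df_E/(df_E + df_G\, f)\), where \(I_x\) is the regularized incomplete beta function. It evaluates \(I_x\) with a Numerical Recipes style continued fraction. The function names are illustrative, and the code is a sketch rather than a validated numerical library.

```python
from math import exp, lgamma, log


def _beta_cont_frac(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300

    def clamp(v):
        # Avoid division by zero in the Lentz recurrences.
        return v if abs(v) > tiny else tiny

    c, d = 1.0, 1.0 / clamp(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a - 1.0 + 2 * m) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 1.0 + 2 * m))
        for aa in (even, odd):
            d = 1.0 / clamp(1.0 + aa * d)
            c = clamp(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor close to 1: converged
            break
    return h


def regularized_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cont_frac(a, b, x) / a
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) for faster convergence.
    return 1.0 - front * _beta_cont_frac(b, a, 1.0 - x) / b


def f_p_value(f, df_groups, df_error):
    """Upper-tail probability P(F > f) for F ~ F(df_groups, df_error)."""
    if f <= 0.0:
        return 1.0
    x = df_error / (df_error + df_groups * f)
    return regularized_beta(df_error / 2.0, df_groups / 2.0, x)


print(round(f_p_value(12.0, 2, 6), 6))  # 0.008
```

As a sanity check, the upper tail has a closed form when \(df_G = 2\): \((1 + 2f/df_E)^{-df_E/2}\). With \(F = 12\) and \(df_E = 6\), this gives \(5^{-3} = 0.008\), matching the sketch.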


One-Way ANOVA Tables

As seen above, a one-way analysis of variance involves several analytic steps. A common way to summarize it is therefore the so-called one-way ANOVA table, whose general layout is shown below.

\[ \begin{array}{|l|c|c|c|c|c|} \hline \ \text{Source} & df & \text{Sum of Squares }(SS) & \text{Mean Squares }(MS) & F\text{-statistic} & p\text{-value}\\ \hline \ \text{Group/Class} & k-1 & SSG & MSG=\frac{SSG}{k-1} & F = \frac{MSG}{MSE} & p\\ \ \text{Error/Residuals} & n-k & SSE & MSE=\frac{SSE}{n-k} & & \\ \hline \ \text{Total} & n-1 & SST & & & \\ \hline \end{array} \]
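The sketch below combines all the steps. It builds the rows of a one-way ANOVA table from raw samples and prints them. Everything here is a hypothetical, dependency-free illustration:

* the data;
* the function names `one_way_anova` and `print_table`;
* the p-value helper, which evaluates the upper tail of the \(F\)-distribution through the regularized incomplete beta function using a Numerical Recipes style continued fraction.

Statistical packages produce such tables directly (for example `anova_lm` in statsmodels), and for real analyses they are likely the better choice.

```python
from math import exp, lgamma, log

# Hypothetical example data: three groups of three observations each.
groups = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [8, 9, 10]}


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300

    def clamp(v):
        return v if abs(v) > tiny else tiny

    c, d = 1.0, 1.0 / clamp(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a - 1.0 + 2 * m) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 1.0 + 2 * m))
        for aa in (even, odd):
            d = 1.0 / clamp(1.0 + aa * d)
            c = clamp(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def f_upper_tail(f, d1, d2):
    """P(F > f) for F ~ F(d1, d2) via I_x(d2/2, d1/2), x = d2 / (d2 + d1 * f)."""
    if f <= 0.0:
        return 1.0
    a, b, x = d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f)
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def one_way_anova(groups):
    """Return ANOVA table rows as (df, SS, MS, F, p); None marks empty cells."""
    samples = list(groups.values())
    observations = [x for s in samples for x in s]
    n, k = len(observations), len(samples)
    grand_mean = sum(observations) / n
    means = [sum(s) / len(s) for s in samples]
    sst = sum((x - grand_mean) ** 2 for x in observations)
    ssg = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    sse = sst - ssg
    msg, mse = ssg / (k - 1), sse / (n - k)
    f = msg / mse
    return {
        "Group": (k - 1, ssg, msg, f, f_upper_tail(f, k - 1, n - k)),
        "Error": (n - k, sse, mse, None, None),
        "Total": (n - 1, sst, None, None, None),
    }


def print_table(rows):
    """Print the rows in the layout of the one-way ANOVA table."""
    print(f"{'Source':<8}{'df':>4}{'SS':>9}{'MS':>9}{'F':>9}{'p-value':>10}")
    for source, (df, ss, ms, f_value, p) in rows.items():
        cells = [f"{v:9.3f}" if v is not None else " " * 9 for v in (ss, ms, f_value)]
        p_cell = f"{p:10.4f}" if p is not None else ""
        print(f"{source:<8}{df:>4}" + "".join(cells) + p_cell)


print_table(one_way_anova(groups))
```

The helper computes \(SSE\) as \(SST - SSG\), using the identity stated earlier. It also assumes \(MSE > 0\), because identical values within every group would make \(F\) undefined.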