From the three quartiles (\(Q1, Q2, Q3\)) we can obtain a measure of center (the median, \(Q2\)) and measures of variation of the two middle quarters of the data, \(Q2 - Q1\) for the second quarter and \(Q3 - Q2\) for the third quarter. But the three quartiles do not tell us anything about the variation of the first and fourth quarters.

To gain that information, we include the minimum and maximum observations as well. The variation of the first quarter can be measured as the difference between the minimum and the first quartile, \(Q1 - Min\). The variation of the fourth quarter can be measured as the difference between the third quartile and the maximum, \(Max - Q3\). Thus, the minimum, maximum and quartiles together provide, among other things, information on center and variation (Weiss 2010).

The so-called Tukey five-number summary (named after the mathematician John Wilder Tukey) of a data set consists of the \(Min\), \(Q1\), \(Q2\), \(Q3\) and \(Max\) of the data set.

The five-number summary is easily calculated with the built-in fivenum() function in R. For demonstration purposes we calculate the five-number summary for the nc.score variable of the students data set.

# Load the students data set and extract the nc.score column
students <- read.csv("https://userpage.fu-berlin.de/soga/data/raw-data/students.csv")
nc_score <- students$nc.score
# Tukey's five-number summary
fivenum(nc_score)
## [1] 1.00 1.46 2.04 2.78 4.00

This function returns, in this order, the minimum, lower hinge, median, upper hinge and maximum of the input data. The hinges are Tukey's versions of the first and third quartiles.
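The variation of the four quarters described above can be read directly off these five values. The following is a minimal sketch, assuming the five numbers printed above for nc_score are typed in by hand; the names five and quarter_spread are chosen for illustration only.

```r
# Five-number summary of nc_score, copied from the fivenum() output above
five <- c(Min = 1.00, Q1 = 1.46, Q2 = 2.04, Q3 = 2.78, Max = 4.00)

# Differences between consecutive values give the spread of each quarter
quarter_spread <- diff(five)
names(quarter_spread) <- c("Q1 - Min", "Q2 - Q1", "Q3 - Q2", "Max - Q3")
quarter_spread
```

The four differences are 0.46, 0.58, 0.74 and 1.22, so for this variable the fourth quarter of the data is the most spread out and the first quarter the least.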

R also provides the summary() function, which, when called on a numeric vector, returns similar statistics and additionally includes the arithmetic mean.

summary(nc_score)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.460   2.040   2.166   2.780   4.000
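For nc_score the quartiles reported by summary() coincide with the hinges reported by fivenum(). This need not hold in general, because summary() obtains its quartiles from quantile() with the default interpolation rule, whereas fivenum() uses hinges. The following sketch illustrates the difference on a small toy vector (the vector x is an assumption made for illustration and is not part of the students data set).

```r
# Toy data: the integers 1 to 10 (illustrative only)
x <- 1:10

# Hinges: medians of the lower and upper half of the sorted data
hinges <- fivenum(x)[c(2, 4)]

# Quartiles as used by summary(): quantile() with its default type
quartiles <- unname(quantile(x, probs = c(0.25, 0.75)))

hinges     # 3 and 8
quartiles  # 3.25 and 7.75
```

For large data sets the two definitions usually give very similar values, so the choice rarely changes the interpretation.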

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.