Quartiles divide a ranked data set into four equal parts. The three dividing values are the first quartile (denoted by \(Q1\)), the second quartile (denoted by \(Q2\)) and the third quartile (denoted by \(Q3\)). The second quartile is the same as the median of a data set. The first quartile is the value of the middle term among the observations that are less than the median, and the third quartile is the value of the middle term among the observations that are greater than the median (Mann 2012).
Approximately 25 % of the values in a ranked data set are less than \(Q1\) and about 75 % are greater than \(Q1\). The second quartile, \(Q2\), divides a ranked data set into two equal parts; hence, the second quartile and the median are the same. Approximately 75 % of the data values are less than \(Q3\) and about 25 % are greater than \(Q3\). The difference between the third quartile and the first quartile of a data set is called the interquartile range (\(IQR\)) (Mann 2012).
\[ IQR = Q3-Q1\]
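The definition above can be traced by hand on a small example. The following is a minimal sketch that assumes a made-up vector x (not part of the students data set) and applies the median-of-halves rule described by Mann (2012) directly:

```r
# Toy data: hypothetical values chosen for illustration only
x <- c(7, 3, 9, 1, 5, 11, 13, 15)

# Rank the data
x_sorted <- sort(x)                 # 1 3 5 7 9 11 13 15

# Second quartile = median
q2 <- median(x_sorted)              # 8

# Observations below and above the median
lower <- x_sorted[x_sorted < q2]    # 1 3 5 7
upper <- x_sorted[x_sorted > q2]    # 9 11 13 15

# First and third quartile = middle terms of the two halves
q1 <- median(lower)                 # 4
q3 <- median(upper)                 # 12

# Interquartile range
iqr <- q3 - q1                      # 8
```

Note that this hand calculation follows one particular definition of quartiles; software may use a slightly different rule and therefore return slightly different values for the same data.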
Let us switch to R and test its functionality for computing quantiles/quartiles. We will use the nc.score variable of the students data set to calculate quartiles and the \(IQR\). The nc.score variable corresponds to the Numerus Clausus score of each particular student.
First, we subset the data and plot a histogram to further inspect the variable’s distribution.
students <- read.csv("https://userpage.fu-berlin.de/soga/data/raw-data/students.csv")
nc_score <- students$nc.score
hist(nc_score, breaks = "sturges")
To calculate the quartiles for the nc_score variable, we apply the function quantile(). If you call the help() function on quantile(), you see that the default values for the argument probs are 0, 0.25, 0.5, 0.75 and 1. Thus, in order to calculate the quartiles for the nc_score variable we just write:
quantile(nc_score)
## 0% 25% 50% 75% 100%
## 1.00 1.46 2.04 2.78 4.00
which gives the same result as:
quantile(nc_score, probs = c(0, 0.25, 0.5, 0.75, 1))
## 0% 25% 50% 75% 100%
## 1.00 1.46 2.04 2.78 4.00
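The interpretation of the quartiles (about 25 % of the values below \(Q1\), about 75 % below \(Q3\)) can be checked numerically. The sketch below uses simulated scores rather than the students data set, so that it runs without a download; the vector sim_score and its uniform distribution are assumptions made for illustration only.

```r
# Simulated scores between 1 and 4 (hypothetical data, for illustration only)
set.seed(1)
sim_score <- runif(1000, min = 1, max = 4)

# Quartiles with the default settings of quantile()
q <- quantile(sim_score, names = FALSE)

# Share of observations below the first and the third quartile
share_below_q1 <- mean(sim_score < q[2])
share_below_q3 <- mean(sim_score < q[4])

print(c(share_below_q1, share_below_q3))
```

The two shares should be close to 0.25 and 0.75, respectively.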
Note: Not all statisticians define quartiles in exactly the same way. For a detailed discussion of the different methods for computing quartiles, see the online article “Quartiles in Elementary Statistics” by E. Langford (2006). In addition, you may find the help page help(quantile) and the description of the type argument of quantile() helpful.
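To see how the choice of definition affects the result, the type argument of quantile() can be varied. The following sketch compares three of the available types on a small made-up vector; the vector x and the selection of types 2, 6 and 7 are arbitrary choices for illustration.

```r
# Toy data: hypothetical values chosen for illustration only
x <- c(1, 3, 5, 7, 9, 11, 13, 15)

# A few of the quantile definitions offered by quantile();
# type = 7 is the default in R
types <- c(2, 6, 7)

# One column per type, one row per quartile
res <- sapply(types, function(t) {
  quantile(x, probs = c(0.25, 0.5, 0.75), type = t)
})
colnames(res) <- paste0("type", types)

print(res)
```

For this vector the median is the same for all three types, whereas the first and third quartiles differ. Such differences are most noticeable in small samples.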
In order to calculate the \(IQR\) for the nc_score variable we either write…
nc_score_quart <- quantile(nc_score, names = FALSE)
nc_score_quart[4] - nc_score_quart[2]
## [1] 1.32
…or we apply the built-in function IQR():
IQR(nc_score)
## [1] 1.32
We can visualize the partitioning of the nc_score variable into quartiles by plotting a histogram and by adding a couple of additional lines of code.
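# compute the histogram without drawing it
h <- hist(nc_score, breaks = 50, plot = FALSE)
# assign each bin break to one of the quartile intervals
cuts <- cut(h$breaks, c(0, nc_score_quart))
# color the bars by quartile interval
plot(h,
  col = c(4, 4, 3, 2, 1)[cuts],
  main = "Quartiles",
  xlab = "Numerus Clausus score"
)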
# add legend
legend("topright",
legend = c("1st", "2nd", "3rd", "4th"),
col = c(4, 3, 2, 1),
pch = 15
)
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.