Another very important measure of central tendency is the median. The median is the value of the middle term in a data set that has been ranked in increasing order. Thus, the median divides a ranked data set into two equal parts.

The calculation of the median consists of the following two steps:

  1. Rank the data set in increasing order.
  2. Find the middle term. The value of this term is the median.

Note that if the number of observations in a data set is odd, the median is the value of the middle term in the ranked data. If the number of observations is even, the median is the average of the values of the two middle terms (Mann 2012).
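The two steps and the odd/even rule can be sketched in a few lines of R. This is only an illustration: the helper name manual_median and the two small age vectors are made up for this example and are not part of the students data set. In practice the built-in median() function does the same job.

```r
# Hypothetical helper for illustration only; base R's median() is the usual choice.
manual_median <- function(x) {
  x_sorted <- sort(x)                     # step 1: rank the data in increasing order
  n <- length(x_sorted)
  if (n %% 2 == 1) {
    x_sorted[(n + 1) / 2]                 # odd n: value of the single middle term
  } else {
    mean(x_sorted[c(n / 2, n / 2 + 1)])   # even n: average of the two middle terms
  }
}

odd_sample <- c(23, 19, 35, 21, 20)       # made-up ages, n = 5
even_sample <- c(23, 19, 35, 21, 20, 22)  # made-up ages, n = 6

manual_median(odd_sample)   # 21
manual_median(even_sample)  # 21.5
```

For both vectors the result agrees with median(odd_sample) and median(even_sample).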

Let us evaluate the median for the age variable of the students data set.

students <- read.csv("https://userpage.fu-berlin.de/soga/data/raw-data/students.csv")
stud_age <- students$age # extract age vector
hist(stud_age, breaks = 30, xlim = c(min(stud_age), max(stud_age)))

Plotting the age variable immediately shows that some students are much older than the rest.

Let us calculate the median…

median(stud_age)
## [1] 21

…and compare it to the arithmetic mean.

mean(stud_age)
## [1] 22.54157

Now, for visualization we add the median and the arithmetic mean to the plot.

hist(stud_age, breaks = 30, xlim = c(min(stud_age), max(stud_age))) # plot figure
abline(
  v = mean(stud_age),
  col = "red",
  lwd = 3
) # add vertical line at the arithmetic mean

abline(
  v = median(stud_age),
  col = "green",
  lwd = 3
) # add vertical line at the median

legend("topright",
  legend = c("Median", "Arithmetic mean"),
  col = c("green", "red"),
  lty = "solid"
) # add legend

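As we can see, the median is far less influenced by the outliers than the mean: the few much older students pull the arithmetic mean (22.54) above the median (21). Consequently, the median is preferred over the mean as a measure of central tendency for data sets that contain outliers (Mann 2012).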
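The effect of a single outlier can also be checked on a toy example. The ages below are made up for illustration and are not taken from the students data set; the sketch simply compares both measures before and after one much older student is added.

```r
ages <- c(19, 20, 21, 22, 23)   # made-up ages without an outlier
ages_outlier <- c(ages, 60)     # same ages plus one much older student

mean(ages)            # 21
median(ages)          # 21

mean(ages_outlier)    # 27.5
median(ages_outlier)  # 21.5
```

The single extreme value shifts the mean by 6.5 years, while the median moves by only half a year.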


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.