Percentiles divide a ranked data set into 100 equal parts, so each ranked data set has 99 percentiles. The \(k^{th}\) percentile is denoted by \(P_k\), where \(k\) is an integer in the range 1 to 99. For instance, the 25th percentile is denoted by \(P_{25}\).

Thus, the \(k^{th}\) percentile, \(P_k\), can be defined as a value in a data set such that about \(k\%\) of the measurements are smaller than the value of \(P_k\) and about \((100 - k)\%\) of the measurements are greater than the value of \(P_k\).

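The approximate position of the \(k^{th}\) percentile, \(P_k\), in the ranked data set is \[ \text{Position of } P_k = \frac{kn}{100}\] where \(k\) denotes the number of the percentile and \(n\) represents the sample size. The approximate value of \(P_k\) is then the observation found at that position (rounded to the nearest integer) in the ranked data set.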
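To make the formula concrete, here is a minimal sketch on a small made-up vector. The names `scores`, `position` and `value_at_position` are illustrative and not part of the students data set, and the sketch assumes that a non-integer position is rounded to the nearest integer.

```r
# Hypothetical toy data: ten made-up scores (not taken from students.csv)
scores <- c(2.1, 1.3, 3.4, 2.8, 1.9, 2.5, 1.1, 3.0, 2.2, 1.7)

k <- 30                      # number of the percentile
n <- length(scores)          # sample size, here 10

# approximate position of the 30th percentile in the ranked data
position <- round(k * n / 100)
position
## [1] 3

# the observation at that position in the sorted vector
value_at_position <- sort(scores)[position]
value_at_position
## [1] 1.7
```

Note that `quantile()` by default (`type = 7`) interpolates between neighbouring observations, so on such a small sample it can return a slightly different value than this positional rule. On large data sets the two approaches tend to agree closely.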

As an exercise we calculate the 38th, the 50th and the 73rd percentile of the nc.score variable in R. First, we calculate the 38th percentile according to the equation given above. Then we apply the quantile() function of R to find the 38th, 50th and 73rd percentile of the nc.score variable.

students <- read.csv("https://userpage.fu-berlin.de/soga/200/2010_data_sets/students.csv")
nc.score <- students$nc.score
k <- 38 # number of the percentile
n <- length(nc.score) # sample size
sprintf("The %sth percentile's position is number %s.", k, round(k*n/100))
## [1] "The 38th percentile's position is number 3131."
# select the value at that position in the ordered vector
sort(nc.score)[round(k*n/100)]
## [1] 1.74
quantile(nc.score, probs = c(0.38, .50, .73))
##  38%  50%  73% 
## 1.74 2.04 2.71

That worked out fine! You may check whether the median of the nc.score variable corresponds to the 50th percentile (2.04), as calculated above.
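The suggested check can be sketched as follows. To keep the snippet runnable without a download, it uses a small made-up vector `x` (an assumption for illustration); with the data loaded as above, `x` could be replaced by `nc.score`.

```r
# Hypothetical toy data (made up for illustration)
x <- c(2.1, 1.3, 3.4, 2.8, 1.9, 2.5, 1.1, 3.0, 2.2, 1.7)

med <- median(x)
p50 <- quantile(x, probs = 0.5, names = FALSE)  # drop the "50%" label

# compare with a tolerance rather than ==, to be safe with floating point
isTRUE(all.equal(med, p50))
## [1] TRUE
```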

We can also calculate the percentile rank for a particular value \(x_i\) of a data set by the following equation: \[\text{Percentile rank of } x_i =\frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100\] The percentile rank of \(x_i\) gives the percentage of values in the data set that are less than \(x_i\).

Base R has no dedicated built-in function to calculate the percentile rank. However, it is fairly easy to write such a function ourselves.

# user-defined function based on the equation given above
percentile.ranked <- function(a.vector, value) {
  numerator <- sum(a.vector < value)       # number of values less than `value`
  denominator <- length(a.vector)          # total number of values
  round(numerator / denominator, 3) * 100  # percentage with one decimal place
}

For instance, we now calculate the percentile rank for a numerus clausus of 2.5.

# calculate the percentile rank
value <- 2.5
percentile.ranked(nc.score, value) 
## [1] 66.3

Rounding the result to the nearest integer, we can state that about 66% of the students in our data set had a numerus clausus better (that is, lower) than 2.5.
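As a variation, the percentile rank can be computed for several values at once, using the fact that the mean of a logical vector is the share of `TRUE` values. The sketch below is one possible alternative rather than the only way to do it; `percentile_rank()` and `scores` are hypothetical names, and the toy data are made up.

```r
# Hypothetical toy data (made up for illustration)
scores <- c(2.1, 1.3, 3.4, 2.8, 1.9, 2.5, 1.1, 3.0, 2.2, 1.7)

# percentage of values in x strictly less than each entry of `values`
percentile_rank <- function(x, values) {
  sapply(values, function(v) mean(x < v) * 100)
}

ranks <- percentile_rank(scores, c(1.1, 2.5, 4.0))
ranks
## [1]   0  60 100
```

No value lies below the smallest score, six of the ten scores lie below 2.5, and all of them lie below 4.0, which gives the ranks 0, 60 and 100.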