**Percentiles** divide a ranked data set into 100 equal parts. Each ranked data set therefore has 99 percentiles. The \(k^{th}\) percentile is denoted by \(P_k\), where \(k\) is an integer in the range 1 to 99. For instance, the \(25^{th}\) percentile is denoted by \(P_{25}\).

Thus, the \(k^{th}\) percentile, \(P_k\), can be defined as a value in a data set such that about \(k\)% of the measurements are smaller than the value of \(P_k\) and about \((100 - k)\)% of the measurements are greater than the value of \(P_k\).

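The approximate position of the \(k^{th}\) percentile, \(P_k\), in the ranked data set is \[ \text{Position of } P_k = \frac{kn}{100}\] where \(k\) denotes the number of the percentile and \(n\) represents the sample size. The approximate value of \(P_k\) is the value found at that position, rounded to the nearest integer, in the ranked data set.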
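To make the position formula concrete, here is a minimal sketch on a small made-up vector. The values in `scores` are purely illustrative and are not taken from the data set used below. The sketch computes the position \(kn/100\) for \(k = 40\), picks the corresponding value from the ranked data, and checks which share of the values lies below it.

```r
# hypothetical toy data, not part of the students data set
scores <- c(3.1, 1.2, 2.4, 1.8, 2.9, 1.5, 3.6, 2.0, 2.7, 1.0)

k <- 40                          # percentile of interest
n <- length(scores)              # sample size, here 10
position <- round(k * n / 100)   # position in the ranked data, here 4
p40 <- sort(scores)[position]    # value found at that position

position
p40
mean(scores < p40) * 100         # share of values below p40, in percent
```

With only ten values, three of them (30%) lie below the selected value and six (60%) lie above it, which illustrates why the definition speaks of "about" \(k\)%: in small samples the shares can deviate noticeably from \(k\)% and \((100 - k)\)%.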

As an exercise, we calculate the \(38^{th}\), the \(50^{th}\) and the \(73^{rd}\) percentile of the `nc.score` variable in R. First, we calculate the \(38^{th}\) percentile according to the equation given above. Then we apply the `quantile()` function of R to find the \(38^{th}\), \(50^{th}\) and \(73^{rd}\) percentiles of the `nc.score` variable.

```
students <- read.csv("https://userpage.fu-berlin.de/soga/200/2010_data_sets/students.csv")
nc.score <- students$nc.score
```

```
k <- 38 # set k
n <- length(nc.score) # set n
sprintf("The %sth percentile's position is number %s.", k, round(k*n/100))
```

`## [1] "The 38th percentile's position is number 3131."`

```
# select value based on number in the ordered vector
sort(nc.score)[round(k*n/100)]
```

`## [1] 1.74`

`quantile(nc.score, probs = c(0.38, .50, .73))`

```
## 38% 50% 73%
## 1.74 2.04 2.71
```

That worked out fine! You may check whether the median of the `nc.score` variable corresponds to the \(50^{th}\) percentile (2.04), as calculated above.
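Such agreement between the manual calculation and `quantile()` is not guaranteed in general. `quantile()` supports several estimation rules through its `type` argument, and its default (`type = 7`) interpolates linearly between neighbouring ranked values. The sketch below uses a made-up vector `x` (an assumption for illustration only) to show how the results can differ in a small sample, and that the median coincides with the default 50th percentile.

```r
# hypothetical toy data for illustration only
x <- 1:10
k <- 38
n <- length(x)

# position rule: value at the rounded position k * n / 100
manual <- sort(x)[round(k * n / 100)]
# default rule (type = 7): linear interpolation between ranked values
default <- quantile(x, probs = k / 100, names = FALSE)
# type = 1: inverse of the empirical distribution function, no interpolation
stepwise <- quantile(x, probs = k / 100, type = 1, names = FALSE)

c(manual = manual, default = default, stepwise = stepwise)

# the median and the default 50th percentile agree
all.equal(median(x), quantile(x, probs = 0.5, names = FALSE))
```

For this toy vector the position rule and `type = 1` both return an observed value, whereas the default returns an interpolated value between two observations. In a large data set such as `nc.score`, these differences tend to be small.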

We can also calculate the **percentile rank** for a particular value \(x_i\) of a data set by the following equation: \[\text{Percentile rank of } x_i =\frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100\] The percentile rank of \(x_i\) gives the percentage of values in the data set that are less than \(x_i\).

Base R does not provide a dedicated function for the percentile rank as defined above. However, it is fairly easy to write such a function on our own.

```
# user-defined function based on the equation given above
percentile.ranked <- function(a.vector, value) {
  numerator <- sum(a.vector < value)  # number of values less than value
  denominator <- length(a.vector)     # total number of values
  round(numerator / denominator, 3) * 100
}
```

Now, for instance, we calculate the percentile rank for a *numerus clausus* of 2.5.

```
# calculate the percentile rank
value <- 2.5
percentile.ranked(nc.score, value)
```

`## [1] 66.3`

Rounding the result to the nearest integer value, we can state that about 66% of the students in our data set had a *numerus clausus* below, and thus better than, 2.5.
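As a compact alternative, the same quantity can be obtained with `mean()` on a logical vector, since the mean of `TRUE`/`FALSE` values is the proportion of `TRUE`s. The sketch below uses a made-up vector `grades` (illustrative values only) and contrasts the result with `ecdf()`, which counts values less than *or equal to* the given value and therefore differs whenever the value itself occurs in the data.

```r
# hypothetical toy data for illustration only
grades <- c(1.0, 1.3, 1.7, 2.0, 2.3, 2.5, 2.7, 3.0, 3.3, 4.0)
value <- 2.5

# percentile rank as defined above: share of values strictly below value
rank_strict <- mean(grades < value) * 100
# empirical distribution function: share of values less than or equal to value
rank_ecdf <- ecdf(grades)(value) * 100

rank_strict
rank_ecdf
```

Here five of the ten values are strictly below 2.5, while six are less than or equal to it, so the two approaches give different percentages. Which convention is appropriate depends on how the percentile rank is defined; the equation above uses the strict version.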