**Percentiles** divide a ranked data set into 100 equal parts, so every ranked data set has 99 percentiles. The \(k^{th}\) percentile
is denoted by \(P_k\), where \(k\) is an integer in the range 1 to 99. For
instance, the 25th percentile is denoted by \(P_{25}\).

Thus, the \(k^{th}\) percentile, \(P_k\), can be defined as a value in a data set such that about \(k\) % of the measurements are smaller than the value of \(P_k\) and about \((100 - k)\) % of the measurements are greater than the value of \(P_k\).

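The approximate position (rank) of the \(k^{th}\) percentile, \(P_k\), in the ranked data set is \[ \text{position of } P_k = \frac{k \cdot n}{100}\] where \(k\) denotes the number of the percentile and \(n\) represents the sample size. The value found at this position, rounded to the nearest integer if necessary, approximates \(P_k\).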
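To make the position formula concrete, here is a minimal sketch on a small hypothetical vector `x` (made up for illustration, not part of the students data). The helper `percentile_by_position()` is our own illustrative name, not a base R function. It rounds \(k \cdot n / 100\) to the nearest integer and, as an added safeguard, clamps the result to the range 1 to \(n\), because a small \(k\) could otherwise produce position 0.

```r
# Hypothetical toy data (not from the students data set), n = 10
# sorted: 1 3 5 6 7 9 10 12 15 18
x <- c(12, 7, 3, 18, 5, 9, 15, 1, 10, 6)

# Hypothetical helper: approximate the k-th percentile via its position k * n / 100
percentile_by_position <- function(x, k) {
  n <- length(x)
  # round the position to the nearest integer and keep it within 1..n
  pos <- min(max(round(k * n / 100), 1), n)
  # return the value at that position in the sorted data
  sort(x)[pos]
}

percentile_by_position(x, 30)  # position 3    -> 5
percentile_by_position(x, 72)  # position 7.2 rounds to 7 -> 10
```

Note that R's `round()` rounds halves to the even integer (for example, `round(2.5)` is 2), so positions ending in .5 do not always round up.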

As an exercise, we calculate the 38th, the 50th
and the 73rd percentile of the `nc_score` variable
in R. First, we use the equation given above to find the position of the 38th percentile in the sorted data and then look up the value at that position.

```
students <- read.csv("https://userpage.fu-berlin.de/soga/data/raw-data/students.csv")
nc_score <- students$nc.score
```

```
k <- 38 # set k
n <- length(nc_score) # set n
sprintf("The %sth percentile's position is number %s.", k, round(k * n / 100))
```

`## [1] "The 38th percentile's position is number 3131."`

```
# select value based on number in the ordered vector
sort(nc_score)[round(k * n / 100)]
```

`## [1] 1.74`

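Alternatively, we apply R's `quantile()` function to find
the 38th, 50th and 73rd percentile of
the `nc_score` variable. By default (`type = 7`), `quantile()` interpolates linearly between neighbouring values of the sorted data, so its results may differ slightly from those of the simple position-based approach above.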

`quantile(nc_score, probs = c(0.38, 0.50, 0.73))`

```
## 38% 50% 73%
## 1.74 2.04 2.71
```
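The effect of interpolation can be seen on a small hypothetical vector `x`. This sketch compares the default `type = 7` with `type = 1` (the inverse of the empirical distribution function, which always returns an observed value); both are documented options of `quantile()`, while the data are invented for illustration.

```r
# Hypothetical toy data; sorted: 1 3 5 6 7 9 10 12 15 18 (n = 10)
x <- c(12, 7, 3, 18, 5, 9, 15, 1, 10, 6)

# Default (type = 7): index 1 + (n - 1) * 0.38 = 4.42,
# so interpolate between the 4th (6) and 5th (7) sorted values: 6.42
q7 <- quantile(x, probs = 0.38)

# type = 1: returns an observed value, here the 4th sorted value: 6
q1 <- quantile(x, probs = 0.38, type = 1)

q7
q1
```

For this toy vector, the position-based approach (\(0.38 \cdot 10 = 3.8\), rounded to 4) also returns 6, the same as `type = 1`; this agreement is not guaranteed in general.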

That worked out fine: the 38th percentile (1.74) agrees with our manual calculation. You may check whether the median of the
`nc_score` variable corresponds to the 50th
percentile (2.04) calculated above.
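One way to run this check is to compare `median()` with `quantile(..., 0.5)`. The sketch below does so on a small hypothetical vector rather than on `nc_score`; with the default `type = 7`, the 50th percentile and the median coincide for any numeric vector without missing values.

```r
# Hypothetical toy data; sorted: 1 3 5 6 7 9 10 12 15 18 (n = 10)
x <- c(12, 7, 3, 18, 5, 9, 15, 1, 10, 6)

med <- median(x)                 # mean of the 5th and 6th sorted values: (7 + 9) / 2 = 8
p50 <- unname(quantile(x, 0.5))  # drop the "50%" name before comparing
all.equal(med, p50)              # TRUE if both agree up to numerical tolerance
```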

We can also calculate the **percentile rank** of a
particular value \(x_i\) of a data set
using the following equation: \[\text{Percentile rank of } x_i = \frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100\] The percentile rank of \(x_i\) gives the percentage of values in the
data set that are less than \(x_i\).
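As a worked example of the formula, consider a small hypothetical vector `x` and the value 9. In R, the comparison `x < value` yields a logical vector, and `sum()` counts its `TRUE` entries, which is exactly the numerator of the formula.

```r
# Hypothetical toy data; sorted: 1 3 5 6 7 9 10 12 15 18 (n = 10)
x <- c(12, 7, 3, 18, 5, 9, 15, 1, 10, 6)
value <- 9

# values strictly less than 9: 1, 3, 5, 6, 7 (9 itself is not counted)
n_less <- sum(x < value)         # 5
pr <- n_less / length(x) * 100   # 5 / 10 * 100 = 50
pr
```

The same result can be written more compactly as `mean(x < value) * 100`, since the mean of a logical vector is the proportion of `TRUE` values.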

Base R has no built-in function for the percentile rank as defined above. However, it is fairly easy to write such a function ourselves:

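```
# user-defined function based on the equation given above
percentile.ranked <- function(a_vector, value) {
  # number of values strictly less than `value`
  numerator <- sum(a_vector < value)
  # total number of values in the data set
  denominator <- length(a_vector)
  # proportion rounded to three decimals, expressed as a percentage
  round(numerator / denominator, 3) * 100
}
```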

Now, we can calculate, for instance, the percentile rank for a
*numerus clausus* of 2.5.

```
# calculate the percentile rank
value <- 2.5
percentile.ranked(nc_score, value)
```

`## [1] 66.3`

Rounding the result to the nearest integer, we can state that
about 66 % of the students in our data set had a *numerus
clausus* lower (that is, better) than 2.5.
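Base R's `ecdf()` comes close, but the empirical distribution function it returns counts values less than *or equal to* the given value, so it can give a different answer when the data contain ties at that value. The following sketch uses a small hypothetical vector with three entries equal to 2.5 to show the difference.

```r
# Hypothetical toy data with three values equal to 2.5
x <- c(1.7, 2.5, 2.5, 3.1, 1.9, 2.2, 2.8, 2.5, 1.3, 3.4)
value <- 2.5

# strictly less than 2.5: 1.7, 1.9, 2.2, 1.3 -> 4 / 10 -> 40
pr_strict <- mean(x < value) * 100

# ecdf() also counts the three values equal to 2.5 -> 7 / 10 -> 70
pr_ecdf <- ecdf(x)(value) * 100

c(strict = pr_strict, ecdf = pr_ecdf)
```

Without ties at the chosen value, both approaches give the same percentage.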

**Citation**

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by email at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: *Hartmann,
K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis
using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.*