The range is a measure of dispersion that is simple to calculate. It is defined as the difference between the largest and the smallest value of a data set:
$$ \text{Range} = \text{Largest value} - \text{Smallest value} $$
Let us calculate the range for our students data set. We subset the data frame to include numerical data only.
# First, let's import all the needed libraries.
import pandas as pd
students = pd.read_csv(
"https://userpage.fu-berlin.de/soga/200/2010_data_sets/students.csv"
)
quant_vars = ["age", "nc.score", "height", "weight"]
students_quant = students[quant_vars]
students_quant.head(10)
| | age | nc.score | height | weight |
|---|---|---|---|---|
| 1 | 19 | 1.91 | 160 | 64.8 |
| 2 | 19 | 1.56 | 172 | 73.0 |
| 3 | 22 | 1.24 | 168 | 70.6 |
| 4 | 19 | 1.37 | 183 | 79.7 |
| 5 | 21 | 1.46 | 175 | 71.4 |
| 6 | 19 | 1.34 | 189 | 85.8 |
| 7 | 21 | 1.11 | 156 | 65.9 |
| 8 | 21 | 2.03 | 167 | 65.7 |
| 9 | 18 | 1.29 | 195 | 94.4 |
| 10 | 18 | 1.19 | 165 | 66.0 |
We define the function `get_range`, which takes a sequence of values as input and returns their range. Combined with the data frame's `apply` method, it finds the minimum and maximum of each column (that is, each variable) of the data set and returns the difference between them.
def get_range(val_list):
    # Range = largest value minus smallest value
    return max(val_list) - min(val_list)
students[quant_vars].apply(get_range)
age         46.0
nc.score     3.0
height      71.0
weight      64.6
dtype: float64
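The same per-column result can also be obtained without a custom function by subtracting the column-wise minimum from the column-wise maximum. The following is a minimal sketch of that alternative. To stay self-contained it uses a small made-up data frame named `toy`; both the name and the values are hypothetical and are not taken from the students data set.

```python
import pandas as pd

# Hypothetical toy data, used only to keep the example self-contained.
toy = pd.DataFrame(
    {
        "age": [19, 22, 18, 21],
        "height": [160, 172, 195, 156],
    }
)


def get_range(val_list):
    # Range = largest value minus smallest value
    return max(val_list) - min(val_list)


# Approach from the text: apply the custom function to every column.
range_apply = toy.apply(get_range)

# Alternative: column-wise maximum minus column-wise minimum.
range_vectorized = toy.max() - toy.min()

print(range_vectorized)
```

Both approaches return one range value per column. The second form avoids the Python-level function call and should work on the `students_quant` data frame in the same way.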
The range, like the mean, has the disadvantage of being influenced by outliers. Consequently, the range is not a good measure of dispersion to use for a data set that contains outliers. Another disadvantage of using the range as a measure of dispersion is that its calculation is based on two values only: the largest and the smallest. All other values in a data set are ignored when calculating the range. Thus, the range is not a very satisfactory measure of dispersion (Mann 2012).
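Both disadvantages can be illustrated with a short sketch. All values below are made up for illustration and do not come from the students data set. The first part shows how a single extreme value changes the range, and the second part shows two data sets with clearly different spread that nevertheless share the same range.

```python
def get_range(val_list):
    # Range = largest value minus smallest value
    return max(val_list) - min(val_list)


# Hypothetical heights in cm (not taken from the students data set).
heights = [160, 165, 167, 168, 172, 175]
heights_with_outlier = heights + [250]  # one implausibly large value

range_clean = get_range(heights)                 # 175 - 160
range_outlier = get_range(heights_with_outlier)  # 250 - 160

print(range_clean, range_outlier)

# Two hypothetical data sets with the same minimum and maximum:
# the values in between differ, but the range cannot tell them apart.
clustered = [0, 50, 50, 50, 100]
spread_out = [0, 0, 50, 100, 100]

print(get_range(clustered), get_range(spread_out))
```

In the first part, adding a single value raises the range from 15 to 90. In the second part, both data sets have a range of 100 even though their values are distributed quite differently.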
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.