The Range

The range is a measure of dispersion that is simple to calculate. It is defined as the difference between the largest and the smallest value of a data set:

$$ \text{Range} = \text{Largest value} - \text{Smallest value} $$
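As a quick illustration of the formula, the following minimal sketch computes the range of a handful of values by hand. The list `heights` and the variable names are assumptions made for this example only; they are not part of the data-loading code used in this section.

```python
# Illustrative values only (body heights in cm); not loaded from the data set.
heights = [160, 172, 168, 183, 175]

largest = max(heights)    # largest value: 183
smallest = min(heights)   # smallest value: 160

# Range = largest value - smallest value
value_range = largest - smallest
print(value_range)  # prints 23
```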

Let us calculate the range for the variables of the students data set. First, we subset the data frame to include only the numerical variables.

In [2]:

# First, let's import all the needed libraries.
import pandas as pd

In [3]:

students = pd.read_csv(
    "https://userpage.fu-berlin.de/soga/200/2010_data_sets/students.csv"
)
quant_vars = ["age", "nc.score", "height", "weight"]

students_quant = students[quant_vars]
students_quant.head(10)

Out[3]:

	age	nc.score	height	weight
1	19	1.91	160	64.8
2	19	1.56	172	73.0
3	22	1.24	168	70.6
4	19	1.37	183	79.7
5	21	1.46	175	71.4
6	19	1.34	189	85.8
7	21	1.11	156	65.9
8	21	2.03	167	65.7
9	18	1.29	195	94.4
10	18	1.19	165	66.0

We define the get_range function, which takes a sequence of values and returns their range, that is, the maximum minus the minimum. Combined with the pandas apply method, which calls the function once for each column of the data frame, this gives the range of every variable in the data set.

In [4]:

def get_range(val_list):
    """Return the range (largest minus smallest value) of val_list."""
    return max(val_list) - min(val_list)

In [5]:

students[quant_vars].apply(get_range)

Out[5]:

age         46.0
nc.score     3.0
height      71.0
weight      64.6
dtype: float64
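The same per-column result can probably be obtained without a helper function by subtracting the column-wise minima from the column-wise maxima. The sketch below is one possible alternative, not the approach used in this section. To stay self-contained it does not download the file; instead, the small data frame `sample` (a name assumed here) holds the first five rows of the preview shown earlier.

```python
import pandas as pd

# First five rows of the quantitative variables, copied from the preview
# of the students data set; "sample" is a name chosen for this sketch.
sample = pd.DataFrame(
    {
        "age": [19, 19, 22, 19, 21],
        "nc.score": [1.91, 1.56, 1.24, 1.37, 1.46],
        "height": [160, 172, 168, 183, 175],
        "weight": [64.8, 73.0, 70.6, 79.7, 71.4],
    }
)


def get_range(val_list):
    """Return the range (largest minus smallest value) of val_list."""
    return max(val_list) - min(val_list)


# Variant 1: call the helper once per column, as in the text.
range_apply = sample.apply(get_range)

# Variant 2: column-wise maxima minus column-wise minima.
range_vectorized = sample.max() - sample.min()

print(range_vectorized)
```

Both variants should agree up to floating-point rounding. The second one relies only on the built-in `DataFrame.max` and `DataFrame.min` methods, which skip missing values by default.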

The range, like the mean, has the disadvantage of being influenced by outliers. Consequently, the range is not a good measure of dispersion to use for a data set that contains outliers. Another disadvantage of using the range as a measure of dispersion is that its calculation is based on two values only: the largest and the smallest. All other values in a data set are ignored when calculating the range. Thus, the range is not a very satisfactory measure of dispersion (Mann 2012).
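Both disadvantages can be made concrete with a small sketch. The weights below are the first five values from the preview of the data set; the extreme value of 250.0 and the altered middle values are hypothetical and were added purely for illustration.

```python
def get_range(val_list):
    """Return the range (largest minus smallest value) of val_list."""
    return max(val_list) - min(val_list)


# First five weights (kg) from the preview of the students data set.
weights = [64.8, 73.0, 70.6, 79.7, 71.4]

# Hypothetical: the same values plus one extreme observation.
weights_with_outlier = weights + [250.0]

# Hypothetical: same smallest and largest value, different values in between.
weights_changed_middle = [64.8, 65.0, 65.0, 79.7, 65.0]

range_original = get_range(weights)
range_outlier = get_range(weights_with_outlier)
range_changed_middle = get_range(weights_changed_middle)

print(round(range_original, 1))        # about 14.9
print(round(range_outlier, 1))         # about 185.2
print(round(range_changed_middle, 1))  # about 14.9
```

A single extreme observation increases the range more than tenfold, while replacing every value between the minimum and the maximum leaves it unchanged.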

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.