In statistics, the mode is the most common value in a data set, that is, the value that occurs with the highest frequency (Mann 2012). In a graphical frequency distribution, the mode corresponds to the peak(s) of the graph.
A major shortcoming of the mode is that a data set may have no mode or more than one mode, whereas it has exactly one mean and one median. For instance, a data set in which each value occurs only once has no mode. A data set in which a single value occurs with the highest frequency has one mode and is called unimodal. A data set in which two values share the highest frequency has two modes, and the distribution is said to be bimodal. If more than two values share the highest frequency, the data set has more than two modes and is said to be multimodal (Mann 2012).
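The cases described above can be made concrete with a short sketch. The helper names find_modes and classify are hypothetical and introduced only for illustration; the function assumes the convention stated above, namely that a data set in which every value occurs only once has no mode.

```python
from collections import Counter


def find_modes(data):
    """Return every value that shares the highest frequency (hypothetical helper).

    Follows the convention in the text: if each value occurs only once,
    the data set has no mode and an empty list is returned.
    """
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]


def classify(data):
    """Name the modality of a data set from its number of modes."""
    k = len(find_modes(data))
    if k == 0:
        return "no mode"
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"


print(classify([1, 2, 3, 4]))           # every value occurs once
print(classify([1, 2, 2, 3]))           # 2 occurs most often
print(classify([1, 1, 2, 2, 3]))        # 1 and 2 tie
print(classify([1, 1, 2, 2, 3, 3, 4]))  # 1, 2 and 3 tie
```

The four calls correspond, in order, to the no-mode, unimodal, bimodal and multimodal cases.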
# First, let's import all the needed libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
import statistics as stats
# Create a unimodal, a bimodal and a multimodal distribution
def modalDistFunc(m, n, mu, sig):
    """Draw a sample from a mixture of m normal distributions.

    m   int: number of modes
    n   int: number of random values drawn per mode
    mu  vector: mean values, one per mode
    sig vector: standard deviations, one per mode
    """
    # start from an empty array so that only sampled values are returned
    out = np.array([])
    for idx in range(m):
        dist = np.random.normal(mu[idx], sig[idx], n)
        out = np.concatenate([out, dist])
    return out
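### parameters ###
# NOTE: n, mu, sig and breaks are not defined in this section;
# the values below are illustrative assumptions.
n = 10000  # values drawn per mode
mu = [0, 4, 10, 16]  # mean of each mode
sig = [1, 1.5, 2.5, 2.5]  # standard deviation of each mode
breaks = 50  # number of histogram bins
### generate plot ###
fig, axs = plt.subplots(1, 3, figsize=(15, 5))
## arrow style ##
col = "red"
wd = 0.05
hln = 0.02
## unimodal
m = 1
# keep the histogram output: h_uni[0] holds the bin heights (densities)
h_uni = axs[0].hist(
    modalDistFunc(m, n, mu, sig), bins=breaks, density=True, color="black"
)
axs[0].title.set_text("unimodal")
peak = max(h_uni[0])
axs[0].arrow(
    mu[0],  # x: the arrow sits above the mode
    peak * 1.15,  # y: start just above the highest bar
    0,  # dx: vertical arrow
    -peak * 0.1,  # dy: point down towards the peak
    length_includes_head=True,
    width=wd,
    color=col,
    head_length=hln,
    head_width=6 * wd,
)
axs[0].axis("off")
axs[0].text(0.7, 0.4, "one mode", style="italic", color="red")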
# bimodal
m = 2
axs[1].hist(modalDistFunc(m, n, mu, sig), bins=breaks, density=1, color="black")
axs[1].title.set_text("bimodal")
axs[1].arrow(
mu[0],
0.26,
0,
-0.045,
length_includes_head=True,
width=wd,
color=col,
head_length=hln,
head_width=6 * wd,
)
axs[1].arrow(
mu[1],
0.2,
0,
-0.045,
length_includes_head=True,
width=wd,
color=col,
head_length=hln,
head_width=6 * wd,
)
axs[1].axis("off")
axs[1].text(6, 0.2, "two modes", style="italic", color="red")
# multimodal
m = 4
axs[2].hist(modalDistFunc(m, n, mu, sig), bins=breaks, density=1, color="black")
axs[2].title.set_text("multimodal")
axs[2].arrow(
mu[0],
0.15,
0,
-0.025,
length_includes_head=True,
width=wd,
color=col,
head_length=0.015,
head_width=16 * wd,
)
axs[2].arrow(
mu[1],
0.11,
0,
-0.025,
length_includes_head=True,
width=wd,
color=col,
head_length=0.015,
head_width=16 * wd,
)
axs[2].arrow(
mu[2],
0.07,
0,
-0.025,
length_includes_head=True,
width=wd,
color=col,
head_length=0.015,
head_width=16 * wd,
)
axs[2].arrow(
mu[3],
0.07,
0,
-0.025,
length_includes_head=True,
width=wd,
color=col,
head_length=0.015,
head_width=16 * wd,
)
axs[2].axis("off")
axs[2].text(6, 0.1, "more than two modes", style="italic", color="red")
plt.show()
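The arrows in the figure mark the peaks of each histogram. For continuous data such as these, exact values rarely repeat, so counting identical values does not locate the peak. One possible approach, sketched here under stated assumptions, is to bin the values and take the centre of the fullest bin as an estimate of the mode. The function name modal_bin_centre, the sample parameters and the bin width are all assumptions made for this illustration.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the sketch is reproducible

# Assumed example sample: 5000 draws from a normal distribution (mean 10, sd 1)
sample = [random.gauss(10, 1) for _ in range(5000)]

bin_width = 0.5  # assumed bin width; the estimate depends on this choice


def modal_bin_centre(values, width):
    """Return the centre of the histogram bin that holds the most values."""
    # Assign each value to the bin [k * width, (k + 1) * width)
    bins = Counter(int(v // width) for v in values)
    k, _ = bins.most_common(1)[0]
    return (k + 0.5) * width


estimate = modal_bin_centre(sample, bin_width)
print(f"estimated mode: {estimate}")
```

The estimate can only be as precise as the bin width allows: narrower bins give a finer location but a noisier count per bin.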
Unlike the mean and the median, the mode can be applied to quantitative (numeric) and qualitative (categorical) data.
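A brief sketch of this point: counting frequencies works the same way for category labels as for numbers, whereas an arithmetic mean of labels is not defined. The values below are made up for illustration and are not taken from any data set used in this section.

```python
from collections import Counter

# Made-up example values, used only for illustration
religions = ["Catholic", "Muslim", "Catholic", "Protestant", "Other", "Catholic"]
ages = [21, 23, 21, 22, 25, 21, 23]

# most_common(1) returns a list with the single most frequent (value, count) pair
label, label_count = Counter(religions).most_common(1)[0]
age, age_count = Counter(ages).most_common(1)[0]

print(f"most frequent religion: {label} ({label_count} times)")
print(f"most frequent age: {age} ({age_count} times)")
```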
The Python library statistics provides the mode function to calculate the mode. This function takes a sequence of values (such as a list or a pandas column) as input and returns the mode as output.
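One detail worth knowing, shown here as a minimal sketch assuming Python 3.8 or later: when several values share the highest frequency, statistics.mode returns only the first one it encounters, while statistics.multimode returns all of them as a list. The example list is made up for illustration.

```python
import statistics as stats

bimodal = [1, 1, 2, 2, 3]  # 1 and 2 both occur twice

first_mode = stats.mode(bimodal)      # only the first mode encountered
all_modes = stats.multimode(bimodal)  # every value that shares the top frequency

print(first_mode)
print(all_modes)

# Empty input: mode() raises StatisticsError, multimode() returns an empty list
try:
    stats.mode([])
except stats.StatisticsError as err:
    print(f"mode([]) failed: {err}")
print(stats.multimode([]))
```

Note that multimode treats a data set in which every value occurs once as having every value as a mode, which differs from the convention that such a data set has no mode.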
You can now test this function on the students data set.
Use the very handy apply function of a pandas DataFrame to apply the function mode to every variable of interest.
If you struggle with the apply function, you may type help(pd.DataFrame.apply) or look up the pandas documentation, which includes examples.
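# Load the students data set
students = pd.read_csv(
    "https://userpage.fu-berlin.de/soga/200/2010_data_sets/students.csv"
)
# Variables of interest (named cols to avoid shadowing the built-in vars)
cols = ["gender", "age", "religion", "nc.score", "semester", "height", "weight"]
# Apply statistics.mode to each selected column
students[cols].apply(stats.mode)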
gender          Male
age               21
religion    Catholic
nc.score        1.18
semester         1st
height           174
weight          67.1
dtype: object
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.