In statistics, the mode is the most common value in a data set, that is, the value that occurs with the highest frequency (Mann 2012). In a graphical frequency distribution, the mode corresponds to the peak(s) of the graph.

A major shortcoming of the mode is that a data set may have no mode or more than one mode, whereas it will have only one mean and only one median. For instance, a data set in which each value occurs only once has no mode. A data set in which exactly one value occurs with the highest frequency has one mode and is called unimodal. A data set in which two values share the highest frequency has two modes, and its distribution is said to be bimodal. If more than two values share the highest frequency, the data set contains more than two modes and is said to be multimodal (Mann 2012).
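
The four cases above can be told apart by counting how often each value occurs. The following is a minimal sketch and not part of the original notebook: the helper name `classify_modality` is hypothetical, and it relies only on `collections.Counter` from the standard library. Following the convention stated above, a data set in which every value occurs only once is treated as having no mode.

```python
from collections import Counter


def classify_modality(data):
    """Return (modes, label) for a sequence of hashable values.

    Hypothetical helper, for illustration only.
    """
    counts = Counter(data)
    if not counts:
        return [], "no mode"
    top = max(counts.values())
    # Every value occurs exactly once: by the convention above, no mode.
    if top == 1:
        return [], "no mode"
    # Collect every value that reaches the highest frequency.
    modes = [value for value, count in counts.items() if count == top]
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label


print(classify_modality([1, 2, 3, 4]))           # every value occurs once
print(classify_modality([1, 2, 2, 3]))           # one most frequent value
print(classify_modality([1, 1, 2, 2, 3]))        # two most frequent values
print(classify_modality([1, 1, 2, 2, 3, 3, 4]))  # three most frequent values
```

The sketch works for categorical values as well, since it only requires the values to be hashable.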

In [2]:
# First, let's import all the needed libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
import statistics as stats
In [3]:
# Create a unimodal, a bimodal and a multimodal distribution
def modalDistFunc(m, n, mu, sig):
    """Draw n normally distributed values around each of the first m modes.

    m int: number of modes
    n int: number of random values drawn per mode
    mu vector: mean values, one per mode
    sig vector: standard deviations, one per mode
    """
    out = np.empty(0)  # start empty so that no artificial value enters the sample
    for idx in range(m):
        # One normal sample centred on the idx-th mode
        dist = np.random.normal(mu[idx], sig[idx], n)
        out = np.concatenate([out, dist])
    return out
In [6]:
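### generate plot ###
# The values of n, mu, sig and breaks are not shown in this section.
# The ones below are assumptions, chosen to fit the fixed arrow and text positions.
n = 10000  # sample size per mode (assumed)
mu = [0, 4, -6, 10]  # locations of the modes (assumed)
sig = [1, 1.4, 2.5, 2.5]  # standard deviations (assumed)
breaks = 60  # number of histogram bins (assumed)

fig, axs = plt.subplots(1, 3, figsize=(15, 5))

## arrow style ##
col = "red"
wd = 0.05
hln = 0.02

## unimodal
m = 1
axs[0].hist(modalDistFunc(m, n, mu, sig), bins=breaks, density=1, color="black")
axs[0].title.set_text("unimodal")
# Vertical arrow pointing down at the peak, drawn like the arrows in the other panels
axs[0].arrow(
    mu[0],
    0.47,
    0,
    -0.045,
    length_includes_head=True,
    width=wd,
    color=col,
    head_length=hln,
    head_width=6 * wd,
)
axs[0].axis("off")
axs[0].text(0.7, 0.4, "one mode", style="italic", color="red")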

# bimodal
m = 2
axs[1].hist(modalDistFunc(m, n, mu, sig), bins=breaks, density=1, color="black")
axs[1].title.set_text("bimodal")
axs[1].arrow(
    mu[0],
    0.26,
    0,
    -0.045,
    length_includes_head=True,
    width=wd,
    color=col,
    head_length=hln,
    head_width=6 * wd,
)
axs[1].arrow(
    mu[1],
    0.2,
    0,
    -0.045,
    length_includes_head=True,
    width=wd,
    color=col,
    head_length=hln,
    head_width=6 * wd,
)
axs[1].axis("off")
axs[1].text(6, 0.2, "two modes", style="italic", color="red")


# multimodal
m = 4
axs[2].hist(modalDistFunc(m, n, mu, sig), bins=breaks, density=1, color="black")
axs[2].title.set_text("multimodal")
axs[2].arrow(
    mu[0],
    0.15,
    0,
    -0.025,
    length_includes_head=True,
    width=wd,
    color=col,
    head_length=0.015,
    head_width=16 * wd,
)
axs[2].arrow(
    mu[1],
    0.11,
    0,
    -0.025,
    length_includes_head=True,
    width=wd,
    color=col,
    head_length=0.015,
    head_width=16 * wd,
)
axs[2].arrow(
    mu[2],
    0.07,
    0,
    -0.025,
    length_includes_head=True,
    width=wd,
    color=col,
    head_length=0.015,
    head_width=16 * wd,
)
axs[2].arrow(
    mu[3],
    0.07,
    0,
    -0.025,
    length_includes_head=True,
    width=wd,
    color=col,
    head_length=0.015,
    head_width=16 * wd,
)
axs[2].axis("off")
axs[2].text(6, 0.1, "more than two modes", style="italic", color="red")


plt.show()
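
For continuous samples such as those drawn above, exact values almost never repeat, so the peak of the histogram is usually located by finding the bin with the highest count. The following is a minimal sketch with NumPy. It assumes a single normal sample with an arbitrarily chosen mean of 5, a fixed seed and 50 bins; none of these values come from the original notebook.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, chosen arbitrarily
sample = rng.normal(loc=5.0, scale=1.0, size=10_000)

# Count the observations per bin and pick the fullest bin.
counts, edges = np.histogram(sample, bins=50)
peak = np.argmax(counts)

# Use the midpoint of the fullest bin as the estimate of the mode.
mode_estimate = 0.5 * (edges[peak] + edges[peak + 1])

print(f"estimated mode: {mode_estimate:.2f}")
```

The estimate depends on the chosen number of bins: very narrow bins make it noisy, very wide bins make it coarse.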

Unlike the mean and the median, the mode can be applied to both quantitative (numeric) and qualitative (categorical) data. The Python library statistics provides the mode function to calculate the mode. This function takes a sequence of values as input and returns the mode as output.
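
As a small illustration of the point above, `statistics.mode` accepts both numeric and categorical values. The toy lists below are made up for this sketch. Note that, since Python 3.8, `statistics.mode` returns the first mode encountered when several values tie, so `statistics.multimode` is the safer choice if a data set may be bimodal or multimodal.

```python
import statistics as stats

ages = [21, 22, 21, 23, 21, 22]            # quantitative (made-up values)
colors = ["red", "blue", "blue", "green"]  # qualitative (made-up values)

print(stats.mode(ages))
print(stats.mode(colors))

tied = [1, 1, 2, 2, 3]
print(stats.mode(tied))       # only the first of the tied modes
print(stats.multimode(tied))  # all tied modes, in order of first occurrence
```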

You can now test this function on the students data set. Use the very handy pandas apply method to apply the mode function to every variable of interest. If you struggle with apply, you may type help(pd.DataFrame.apply) or look up the pandas documentation, which includes examples.

In [7]:
students = pd.read_csv(
    "https://userpage.fu-berlin.de/soga/200/2010_data_sets/students.csv"
)
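# Variables of interest (named "variables" to avoid shadowing the built-in vars)
variables = ["gender", "age", "religion", "nc.score", "semester", "height", "weight"]

# Apply stats.mode column by column
students[variables].apply(stats.mode)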
Out[7]:
gender          Male
age               21
religion    Catholic
nc.score        1.18
semester         1st
height           174
weight          67.1
dtype: object
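
One caveat with the approach above is that `stats.mode` reports only one value per column, even when several values tie. As a hedged sketch on a small made-up data frame (not the students data set), the pandas method `DataFrame.mode` returns every mode of each column and pads shorter columns with `NaN`.

```python
import statistics as stats

import pandas as pd

# Made-up toy data; the column names only mimic the students data set.
toy = pd.DataFrame(
    {
        "gender": ["Male", "Female", "Male", "Female"],  # two tied modes
        "age": [21, 22, 21, 23],                         # one mode
    }
)

# One value per column: the first mode encountered in each column.
first_modes = toy.apply(stats.mode)
print(first_modes)

# All modes per column, sorted; rows beyond a column's last mode are NaN.
all_modes = toy.mode()
print(all_modes)
```

Comparing the two results shows which columns have more than one mode.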

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.