In the inverse distance weighting (IDW) approach, also referred to as inverse distance-based weighted interpolation, the estimation of the value $z$ at location $\mathbf x$ is a weighted mean of nearby observations.
$$ \hat z(\mathbf x) = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i}\text{,}$$where
$w_i = \vert \mathbf x-\mathbf x_i \vert^{-\beta}$ and
$\beta \ge0$ and $\vert \cdot \vert$ corresponds to the euclidean distance. The inverse distance power, $\beta$, determines the degree to which the nearer point(s) are preferred over more distant points. Typically $\beta =1$ or $\beta =2$ corresponding to an inverse or inverse squared relationship. The number of surrounding points, $n$, to be included decides whether a global or local weighting is applied. Both parameters $\beta$ and $n$ may be fine tuned by cross-validation, a technique we discuss in the subsequent sections in more detail. If the point $\mathbf x$ coincides with an observation location $(\mathbf x= \mathbf x_i)$, then the observed value,$\mathbf x$, is returned to avoid infinite weights.
Let us gain intuition on the IDW by applying it on a toy example. Given below are five points $z_1, z_2, z_3, z_4, z_5$ and one point $z_0$, which we want to predict, where $z \in \mathbb R^2$. The planar coordinates of each point $z$ are given in the columns x
and y
, and the actual value associated to each point is given in column z
.
# First, let's import the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import random
random.seed(3)
x = [1, 0.5, 1.5, 1, 0.5, 1.2]
y = [1, 0.9, 1.5, 0.5, 1.4, 1]
z = [np.nan, 1, 3, 5, 7, 7]
p = pd.DataFrame({"x": x, "y": y, "z": z})
We plot the points for a better understanding.
p0 = p.iloc[0]
p1 = p.iloc[1:]
plt.scatter(p1.x, p1.y, color="black", s=40)
plt.scatter(p0.x, p0.y, color="orangered", s=40)
label = 0
# add labels for each point
for x, y in zip(p1.x, p1.y):
label = label + 1
plt.annotate(
f"z{label}", # label for each pont
(x, y), # coordinates to position the label
textcoords="offset points", # how to position the text
xytext=(0, 10), # distance from text to points (x,y)
ha="center",
) # horizontal alignment can be left, right or center
plt.grid()
plt.show()
The euclidean distance, $d$, between two points $(\mathbf u,\mathbf v)$ in a plane $(\mathbb R^2)$ is given by
$$d(\mathbf u,\mathbf v) = \sqrt{(u_1-v_1)^2 +(u_2-v_2)^2}$$Hence, we can compute $w_i$ as follows
$$\left[ \begin{array}\\ w_1\\ w_2\\ ...\\ w_5 \end{array} \right] = \left[ \begin{array}\\ \vert \mathbf z-\mathbf z_1 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_2 \vert^{-\beta}\\ \cdots\\ \vert \mathbf z-\mathbf z_5 \vert^{-\beta} \end{array} \right]$$Consider $\beta = 2$ and $n=5$.
beta = 2
n = 5
Plugging in our toy data results in
$$\left[ \begin{array}\\ w_1\\ w_2\\ w_3\\ w_4\\ w_5 \end{array} \right]= \left[ \begin{array}\\ \vert \mathbf z-\mathbf z_1 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_2 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_3 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_4 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_5 \vert^{-\beta} \end{array} \right] = \left[ \begin{array}\\ \sqrt{(1-0.5)^2 +(1-0.9)^2}^{- 2}\\ \sqrt{(1-1)^2 +(1-1.5)^2}^{- 2}\\ \sqrt{(1-1)^2 +(1-0.5)^2}^{- 2}\\ \sqrt{(1-0.5)^2 +(1-1.4)^2}^{- 2}\\ \sqrt{(1-1.2)^2 +(1-1)^2}^{- 2} \end{array} \right] \approx \left[ \begin{array}\\ 3.846\\ 2\\ 4\\ 2.439\\ 25 \end{array} \right]$$w = np.sqrt((p0.x - p1.x) ** 2 + (p0.y - p1.y) ** 2) ** -beta
w = np.round(w, 3)
w
1 3.846 2 2.000 3 4.000 4 2.439 5 25.000 dtype: float64
Hence,
$$\sum_i^nw_i \approx 37.285$$np.round(np.sum(w), 3)
37.285
Following that we may write
$$w_iz_i = \left[ \begin{array}\\ 3.846\\ 2\\ 4\\ 2.439\\ 25 \end{array} \right] \left[ \begin{array}\\ 1\\ 3\\ 5\\ 7\\ 7 \end{array} \right]$$resulting in
$$\sum_i^n w_iz_i = 221.919$$round(sum(w * p1.z), 3)
221.919
Plugging the numerator $(\sum_i^n w_iz_i)$ and the denominator $(\sum_i^n w_i)$ into the equation from above yields
$$ \hat z_0 = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i} \approx \frac{221.919}{37.285} = 5.952 $$hat_z0 = round(sum(w * p1.z) / sum(w), 3)
hat_z0
5.952
Hence, we conclude that given $beta = 2$ and $n = 5$, the value for location $z_0\approx 5.952$.
Note that usually interpolation is not done for one particular point in space but on a regular grid.
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.