The inverse distance weighting (IDW) approach is also known as inverse distance-based weighted interpolation. It is the estimation of the value $$z$$ at location $$\mathbf x$$ by a weighted mean of nearby observations.

$\hat z(\mathbf x) = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i}\text{,}$

where

$w_i = \vert \mathbf x-\mathbf x_i \vert^{-\beta}$ and where $$\beta \ge0$$ and $$\vert \cdot \vert$$ corresponds to the euclidean distance. The inverse distance power $$\beta$$ determines the degree to which nearer points are preferred over more distant points. Typically $$\beta =1$$ or $$\beta =2$$, corresponding to an inverse or inverse squared relationship. The number of surrounding points $$n$$ that are included in the interpolation decides whether a global or local weighting is applied. Both parameters $$\beta$$ and $$n$$ may be fine-tuned by cross-validation, a technique we discuss in more detail in the subsequent sections. If the point $$\mathbf x$$ coincides with an observation location $$(\mathbf x= \mathbf x_i)$$, then the observed value $$\mathbf x$$ is returned to avoid infinite weights.

Let us gain intuition on IDW by applying it on a toy example. Given below are five points ($$z_1, z_2, z_3, z_4, z_5$$) and one point $$z_0$$, which we want to predict, $$z \in \mathbb R^2$$. The coordinates of each point $$z$$ are given in the columns x and y, and the actual value associated to each point is given in column z.

z0 <- c(1, 1, NA)
z1 <- c(0.5, 0.9, 1)
z2 <- c(1.5, 1.5, 3)
z3 <- c(1, 0.5, 5)
z4 <- c(0.5, 1.4, 7)
z5 <- c(1.2, 1, 7)
p <- as.data.frame(rbind(z0, z1, z2, z3, z4, z5))
colnames(p) <- c("x", "y", "z")
p
##      x   y  z
## z0 1.0 1.0 NA
## z1 0.5 0.9  1
## z2 1.5 1.5  3
## z3 1.0 0.5  5
## z4 0.5 1.4  7
## z5 1.2 1.0  7

We plot the points for a better understanding.

library(tidyverse)
p0 <- p[1, ]
p1 <- p[2:nrow(p), ]
ggplot(p1, aes(x = x, y = y, label = rownames(p1))) +
geom_point() +
geom_point(data = p0, aes(
x = x, y = y,
label = "rownames(p0)", color = "red",
cex = 5
)) +
geom_text(vjust = 0, nudge_y = 0.05) +
coord_fixed(ratio = 1) +
theme_bw() +
theme(legend.position = "none") +
coord_fixed(ratio = 1)

The euclidean distance $$d$$ between two points $$(\mathbf u,\mathbf v)$$ in a plane $$(\mathbb R^2)$$ is given by

$d(\mathbf u,\mathbf v) := \sqrt{(u_1-v_1)^2 +(u_2-v_2)^2}$

Now, we can compute $$\mathbf w$$ as follows

$\left[ \begin{array}\\ w_1\\ w_2\\ ...\\ w_5 \end{array} \right] = \left[ \begin{array}\\ \vert \mathbf z-\mathbf z_1 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_2 \vert^{-\beta}\\ \cdots\\ \vert \mathbf z-\mathbf z_5 \vert^{-\beta} \end{array} \right], \; where \, |\mathbf z - \mathbf z_i| = d(\mathbf z, \mathbf z_i)$

Exercise: Calculate the weigths $$w_1, ..., w_5$$ for our toy data with the formula above. We want closest points to have an increased impact on our interpolation. Which value for $$\beta$$ should we take in order to achieve this, $$\beta = 1$$ or $$\beta = 2$$ ?

## Your code here...
beta <- NULL
w <- NULL
Show code

If we use a squared inverse relationship, i.e. $$\beta = 2$$, the weights for more distant points become smaller.

beta <- 2

w <- sqrt((as.numeric(p0["x"]) - p1["x"])^2 +
(as.numeric(p0["y"]) - p1["y"])^2)^ -beta

colnames(w) <- "w"

round(w, 3)
##         w
## z1  3.846
## z2  2.000
## z3  4.000
## z4  2.439
## z5 25.000

$\left[ \begin{array}\\ w_1\\ w_2\\ w_3\\ w_4\\ w_5 \end{array} \right]= \left[ \begin{array}\\ \vert \mathbf z-\mathbf z_1 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_2 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_3 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_4 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_5 \vert^{-\beta} \end{array} \right] = \left[ \begin{array}\\ \sqrt{(1-0.5)^2 +(1-0.9)^2}^{- 2}\\ \sqrt{(1-1.5)^2 +(1-1.5)^2}^{- 2}\\ \sqrt{(1-1)^2 +(1-0.5)^2}^{- 2}\\ \sqrt{(1-0.5)^2 +(1-1.4)^2}^{- 2}\\ \sqrt{(1-1.2)^2 +(1-1)^2}^{- 2} \end{array} \right] \approx \left[ \begin{array}\\ 3.846\\ 2\\ 4\\ 2.439\\ 25 \end{array} \right]$

Now, we can compute $$\sum_i^nw_i$$ and $$\sum_i^nw_i z_i$$ to get $$\hat z(\mathbf x) = \hat z_0 = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i}$$.

Exercise: Predict $$z_0$$ of our toy data.

## Your code here...
hat_z0 <- NULL
Show code
round(sum(w), 3)
## [1] 37.285

$\sum_i^n w_i \approx 37.285$

Following that we may write

$w_iz_i = \left[ \begin{array}\\ 3.846\\ 2\\ 4\\ 2.439\\ 25 \end{array} \right] \bigodot \left[ \begin{array}\\ 1\\ 3\\ 5\\ 7\\ 7 \end{array} \right]$ where $$\bigodot$$ denotes the element-wise vector product.

round(sum(w * p1[, 3]), 3)
## [1] 221.919

$\sum_i^n w_iz_i \approx 221.919$

Plugging the numerator $$(\sum_i^n w_iz_i)$$ and the denominator $$(\sum_i^n w_i)$$ into the equation from above yields

hat_z0 <- round(sum(w * p1[, 3]) / sum(w), 3)
hat_z0
## [1] 5.952

$\hat z_0 = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i} \approx \frac{221.919}{37.285} = 5.952$

Hence, we conclude that given $$\beta = 2$$ and $$n = 5$$, the value for $$z_0$$ at location $$x_0,y_0$$ is $$z_0\approx 5.952$$.

Usually, interpolation is not done only for one particular point in space but on a regular grid.

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.