The inverse distance weighting (IDW) approach is also known as inverse distance-based weighted interpolation, estimates the value \(z\) at location \(\mathbf x\) as a weighted mean of nearby observations.
\[ \hat z(\mathbf x) = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i}\text{,}\]
where the weights are defined as:
\[w_i = \vert \mathbf x-\mathbf x_i \vert^{-\beta}\] and where \(\beta \ge0\) and \(\vert \cdot \vert\) corresponds to the euclidean distance. The inverse distance power, \(\beta\), determines the degree to which nearer points are preferred over more distant points. Typically, \(\beta =1\) or \(\beta =2\), which correspond to an inverse or inverse squared relationship. The difference is that with \(\beta =2\), weights decrease faster with distance than with \(\beta =1\), meaning points farther away have much less influence on the estimate. The number of surrounding points, \(n\), included in the interpolation decides whether a global or local weighting is applied. Both parameters \(\beta\) and \(n\) may be fine-tuned using cross-validation, a technique we discuss in more detail in the following sections. If the point \(\mathbf x\) coincides with an observation location \((\mathbf x= \mathbf x_i)\), then the observed value \(\mathbf z\) is returned to avoid infinite weights.
Let us gain intuition on IDW by applying it to a toy example. We are
given five known points (\(z_1, z_2, z_3, z_4,
z_5\)) and one point \(z_0\),
which we want to predict, \(z \in \mathbb
R^2\). The coordinates of each point \(z\) are given in the columns x
and y, and the actual value associated to each point is
given in column z.
z0 <- c(1, 1, NA)
z1 <- c(0.5, 0.9, 1)
z2 <- c(1.5, 1.5, 3)
z3 <- c(1, 0.5, 5)
z4 <- c(0.5, 1.4, 7)
z5 <- c(1.2, 1, 7)
p <- as.data.frame(rbind(z0, z1, z2, z3, z4, z5))
colnames(p) <- c("x", "y", "z")
p##      x   y  z
## z0 1.0 1.0 NA
## z1 0.5 0.9  1
## z2 1.5 1.5  3
## z3 1.0 0.5  5
## z4 0.5 1.4  7
## z5 1.2 1.0  7We plot the points to gain a better understanding of their spatial relations.
library(tidyverse)
p0 <- p[1, ]
p1 <- p[2:nrow(p), ]
ggplot(p1, aes(x = x, y = y, label = rownames(p1))) +
  geom_point() +
  geom_point(data = p0, aes(
    x = x, y = y,
    label = "rownames(p0)", color = "red",
    cex = 5
  )) +
  geom_text(vjust = 0, nudge_y = 0.05) +
  coord_fixed(ratio = 1) +
  theme_bw() +
  theme(legend.position = "none") +
  coord_fixed(ratio = 1)The euclidean distance \(d\) between two points \((\mathbf u,\mathbf v)\) in a plane \((\mathbb R^2)\) is given by
\[d(\mathbf u,\mathbf v) := \sqrt{(u_1-v_1)^2 +(u_2-v_2)^2}\]
Now, we can compute \(\mathbf w\) as follows
\[\left[ \begin{array}\\ w_1\\ w_2\\ ...\\ w_5 \end{array} \right] = \left[ \begin{array}\\ \vert \mathbf z-\mathbf z_1 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_2 \vert^{-\beta}\\ \cdots\\ \vert \mathbf z-\mathbf z_5 \vert^{-\beta} \end{array} \right], \; where \, |\mathbf z - \mathbf z_i| = d(\mathbf z, \mathbf z_i)\]
Exercise: Calculate the weigths \(w_1, ..., w_5\) for our toy data with the formula above. We want the closest points to have an increased impact on our interpolation. Which value for \(\beta\) should we take in order to achieve this, \(\beta = 1\) or \(\beta = 2\) ?
## Your code here...
beta <- NULL
w <- NULLIf we use a squared inverse relationship, i.e. \(\beta = 2\), the weights assigned to more distant points become smaller.
beta <- 2
w <- sqrt((as.numeric(p0["x"]) - p1["x"])^2 +
  (as.numeric(p0["y"]) - p1["y"])^2)^ -beta
colnames(w) <- "w"
round(w, 3)##         w
## z1  3.846
## z2  2.000
## z3  4.000
## z4  2.439
## z5 25.000\[\left[ \begin{array}\\ w_1\\ w_2\\ w_3\\ w_4\\ w_5 \end{array} \right]= \left[ \begin{array}\\ \vert \mathbf z-\mathbf z_1 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_2 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_3 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_4 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_5 \vert^{-\beta} \end{array} \right] = \left[ \begin{array}\\ \sqrt{(1-0.5)^2 +(1-0.9)^2}^{- 2}\\ \sqrt{(1-1.5)^2 +(1-1.5)^2}^{- 2}\\ \sqrt{(1-1)^2 +(1-0.5)^2}^{- 2}\\ \sqrt{(1-0.5)^2 +(1-1.4)^2}^{- 2}\\ \sqrt{(1-1.2)^2 +(1-1)^2}^{- 2} \end{array} \right] \approx \left[ \begin{array}\\ 3.846\\ 2\\ 4\\ 2.439\\ 25 \end{array} \right]\]
Now, we can compute \(\sum_i^nw_i\) and \(\sum_i^nw_i z_i\) to get \(\hat z(\mathbf x) = \hat z_0 = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i}\).
Exercise: Predict \(z_0\) of our toy data.
## Your code here...
hat_z0 <- NULLround(sum(w), 3)## [1] 37.285\[\sum_i^n w_i \approx 37.285\]
Following that we may write
\[w_iz_i = \left[ \begin{array}\\ 3.846\\ 2\\ 4\\ 2.439\\ 25 \end{array} \right] \bigodot \left[ \begin{array}\\ 1\\ 3\\ 5\\ 7\\ 7 \end{array} \right]\] where \(\bigodot\) denotes the element-wise vector product.
round(sum(w * p1[, 3]), 3)## [1] 221.919\[\sum_i^n w_iz_i \approx 221.919\]
Plugging the numerator \((\sum_i^n w_iz_i)\) and the denominator \((\sum_i^n w_i)\) into the equation from above yields:
hat_z0 <- round(sum(w * p1[, 3]) / sum(w), 3)
hat_z0## [1] 5.952\[ \hat z_0 = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i} \approx \frac{221.919}{37.285} = 5.952 \]
Hence, we conclude that given \(\beta = 2\) and \(n = 5\), the value for \(z_0\) at location \(x_0,y_0\) is \(z_0\approx 5.952\).
Usually, interpolation is performed not just at a single point in space but over a regular grid.
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.