30930_Inverse_Distance_Weighting.knit

The inverse distance weighting (IDW) approach is also known as inverse distance-based weighted interpolation. It is the estimation of the value \(z\) at location \(\mathbf x\) by a weighted mean of nearby observations.

\[ \hat z(\mathbf x) = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i}\text{,}\]

where

\[w_i = \vert \mathbf x-\mathbf x_i \vert^{-\beta}\] and where \(\beta \ge0\) and \(\vert \cdot \vert\) corresponds to the euclidean distance. The inverse distance power \(\beta\) determines the degree to which nearer points are preferred over more distant points. Typically \(\beta =1\) or \(\beta =2\), corresponding to an inverse or inverse squared relationship. The number of surrounding points \(n\) that are included in the interpolation decides whether a global or local weighting is applied. Both parameters \(\beta\) and \(n\) may be fine-tuned by cross-validation, a technique we discuss in more detail in the subsequent sections. If the point \(\mathbf x\) coincides with an observation location \((\mathbf x= \mathbf x_i)\), then the observed value \(\mathbf x\) is returned to avoid infinite weights.

Let us gain intuition on IDW by applying it on a toy example. Given below are five points (\(z_1, z_2, z_3, z_4, z_5\)) and one point \(z_0\), which we want to predict, \(z \in \mathbb R^2\). The coordinates of each point \(z\) are given in the columns x and y, and the actual value associated to each point is given in column z.

z0 <- c(1, 1, NA)
z1 <- c(0.5, 0.9, 1)
z2 <- c(1.5, 1.5, 3)
z3 <- c(1, 0.5, 5)
z4 <- c(0.5, 1.4, 7)
z5 <- c(1.2, 1, 7)
p <- as.data.frame(rbind(z0, z1, z2, z3, z4, z5))
colnames(p) <- c("x", "y", "z")
p

##      x   y  z
## z0 1.0 1.0 NA
## z1 0.5 0.9  1
## z2 1.5 1.5  3
## z3 1.0 0.5  5
## z4 0.5 1.4  7
## z5 1.2 1.0  7

We plot the points for a better understanding.

library(tidyverse)
p0 <- p[1, ]
p1 <- p[2:nrow(p), ]
ggplot(p1, aes(x = x, y = y, label = rownames(p1))) +
  geom_point() +
  geom_point(data = p0, aes(
    x = x, y = y,
    label = "rownames(p0)", color = "red",
    cex = 5
  )) +
  geom_text(vjust = 0, nudge_y = 0.05) +
  coord_fixed(ratio = 1) +
  theme_bw() +
  theme(legend.position = "none") +
  coord_fixed(ratio = 1)

The euclidean distance \(d\) between two points \((\mathbf u,\mathbf v)\) in a plane \((\mathbb R^2)\) is given by

\[d(\mathbf u,\mathbf v) := \sqrt{(u_1-v_1)^2 +(u_2-v_2)^2}\]

Now, we can compute \(\mathbf w\) as follows

\[\left[ \begin{array}\\ w_1\\ w_2\\ ...\\ w_5 \end{array} \right] = \left[ \begin{array}\\ \vert \mathbf z-\mathbf z_1 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_2 \vert^{-\beta}\\ \cdots\\ \vert \mathbf z-\mathbf z_5 \vert^{-\beta} \end{array} \right], \; where \, |\mathbf z - \mathbf z_i| = d(\mathbf z, \mathbf z_i)\]

Exercise: Calculate the weigths \(w_1, ..., w_5\) for our toy data with the formula above. We want closest points to have an increased impact on our interpolation. Which value for \(\beta\) should we take in order to achieve this, \(\beta = 1\) or \(\beta = 2\) ?

## Your code here...
beta <- NULL
w <- NULL

Show code

If we use a squared inverse relationship, i.e. \(\beta = 2\), the weights for more distant points become smaller.

beta <- 2

w <- sqrt((as.numeric(p0["x"]) - p1["x"])^2 +
  (as.numeric(p0["y"]) - p1["y"])^2)^ -beta

colnames(w) <- "w"

round(w, 3)

##         w
## z1  3.846
## z2  2.000
## z3  4.000
## z4  2.439
## z5 25.000

\[\left[ \begin{array}\\ w_1\\ w_2\\ w_3\\ w_4\\ w_5 \end{array} \right]= \left[ \begin{array}\\ \vert \mathbf z-\mathbf z_1 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_2 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_3 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_4 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_5 \vert^{-\beta} \end{array} \right] = \left[ \begin{array}\\ \sqrt{(1-0.5)^2 +(1-0.9)^2}^{- 2}\\ \sqrt{(1-1.5)^2 +(1-1.5)^2}^{- 2}\\ \sqrt{(1-1)^2 +(1-0.5)^2}^{- 2}\\ \sqrt{(1-0.5)^2 +(1-1.4)^2}^{- 2}\\ \sqrt{(1-1.2)^2 +(1-1)^2}^{- 2} \end{array} \right] \approx \left[ \begin{array}\\ 3.846\\ 2\\ 4\\ 2.439\\ 25 \end{array} \right]\]

Now, we can compute \(\sum_i^nw_i\) and \(\sum_i^nw_i z_i\) to get \(\hat z(\mathbf x) = \hat z_0 = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i}\).

Exercise: Predict \(z_0\) of our toy data.

## Your code here...
hat_z0 <- NULL

Show code

round(sum(w), 3)

## [1] 37.285

\[\sum_i^n w_i \approx 37.285\]

Following that we may write

\[w_iz_i = \left[ \begin{array}\\ 3.846\\ 2\\ 4\\ 2.439\\ 25 \end{array} \right] \bigodot \left[ \begin{array}\\ 1\\ 3\\ 5\\ 7\\ 7 \end{array} \right]\] where \(\bigodot\) denotes the element-wise vector product.

round(sum(w * p1[, 3]), 3)

## [1] 221.919

\[\sum_i^n w_iz_i \approx 221.919\]

Plugging the numerator \((\sum_i^n w_iz_i)\) and the denominator \((\sum_i^n w_i)\) into the equation from above yields

hat_z0 <- round(sum(w * p1[, 3]) / sum(w), 3)
hat_z0

## [1] 5.952

\[ \hat z_0 = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i} \approx \frac{221.919}{37.285} = 5.952 \]

Hence, we conclude that given \(\beta = 2\) and \(n = 5\), the value for \(z_0\) at location \(x_0,y_0\) is \(z_0\approx 5.952\).

Usually, interpolation is not done only for one particular point in space but on a regular grid.

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.