In the inverse distance weighting (IDW) approach, also referred to as inverse distance-based weighted interpolation, the estimation of the value \(z\) at location \(\mathbf x\) is a weighted mean of nearby observations.

\[ \hat z(\mathbf x) = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i}\text{,}\]

where

\[w_i = \vert \mathbf x-\mathbf x_i \vert^{-\beta}\] and where \(\beta \ge0\) and \(\vert \cdot \vert\) corresponds to the euclidean distance. The inverse distance power, \(\beta\), determines the degree to which the nearer point(s) are preferred over more distant points. Typically \(\beta =1\) or \(\beta =2\) corresponding to an inverse or inverse squared relationship. The number of surrounding points, \(n\), to be included decides whether a global or local weighting is applied. Both parameters \(\beta\) and \(n\) may be fine tuned by cross-validation, a technique we discuss in the subsequent sections in more detail. If the point \(\mathbf x\) coincides with an observation location \((\mathbf x= \mathbf x_i)\), then the observed value, \(\mathbf x\), is returned to avoid infinite weights.


Let us gain intuition on the IDW by applying it on a toy example. Given below are five points \(z_1\), \(z_2\), \(z_3\), \(z_4\), \(z_5\) and one point \(z_0\), which we want to predict, where \(z \in \mathbb R^2\). The planar coordinates of each point \(z\) are given in the columns x and y, and the actual value associated to each point is given in column z.

z0 <- c(1, 1, NA)
z1 <- c(0.5, 0.9, 1)
z2 <- c(1.5, 1.5, 3)
z3 <- c(1, 0.5, 5)
z4 <- c(0.5, 1.4, 7)
z5 <- c(1.2, 1, 7)
p <- as.data.frame(rbind(z0, z1, z2, z3, z4, z5))
colnames(p) <- c('x', 'y', 'z')
p
##      x   y  z
## z0 1.0 1.0 NA
## z1 0.5 0.9  1
## z2 1.5 1.5  3
## z3 1.0 0.5  5
## z4 0.5 1.4  7
## z5 1.2 1.0  7

We plot the points for a better understanding.

library(tidyverse)
p0 <- p[1,]
p1 <- p[2:nrow(p),]
ggplot(p1, aes(x = x, y = y, label = rownames(p1))) + 
  geom_point() + 
  geom_point(data = p0, aes(x = x, y = y, 
                            label = rownames(p0), color = 'red', 
                            cex = 5)) + 
  geom_text(vjust = 0, nudge_y = 0.05) + 
  theme_bw() + 
  theme(legend.position = "none")

The euclidean distance, \(d\), between two points \((\mathbf u,\mathbf v)\) in a plane \((\mathbb R^2)\) is given by

\[d(\mathbf u,\mathbf v) = \sqrt{(u_1-v_1)^2 +(u_2-v_2)^2}\]

Hence, we can compute \(w_i\) as follows

\[\left[ \begin{array}\\ w_1\\ w_2\\ ...\\ w_5 \end{array} \right] = \left[ \begin{array}\\ \vert \mathbf z-\mathbf z_1 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_2 \vert^{-\beta}\\ \cdots\\ \vert \mathbf z-\mathbf z_5 \vert^{-\beta} \end{array} \right]\]

Consider \(\beta = 2\) and \(n=5\).

beta = 2
n = 5

Plugging in our toy data results in

\[\left[ \begin{array}\\ w_1\\ w_2\\ w_3\\ w_4\\ w_5 \end{array} \right]= \left[ \begin{array}\\ \vert \mathbf z-\mathbf z_1 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_2 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_3 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_4 \vert^{-\beta}\\ \vert \mathbf z-\mathbf z_5 \vert^{-\beta} \end{array} \right] = \left[ \begin{array}\\ \sqrt{(1-0.5)^2 +(1-0.9)^2}^{- 2}\\ \sqrt{(1-1.5)^2 +(1-1.5)^2}^{- 2}\\ \sqrt{(1-1)^2 +(1-0.5)^2}^{- 2}\\ \sqrt{(1-0.5)^2 +(1-1.4)^2}^{- 2}\\ \sqrt{(1-1.2)^2 +(1-1)^2}^{- 2} \end{array} \right] \approx \left[ \begin{array}\\ 3.846\\ 2\\ 4\\ 2.439\\ 25 \end{array} \right]\]

w <- sqrt((as.numeric(p0['x'])-p1['x'])^2 +
            (as.numeric(p0['y'])-p1['y'])^2)^-beta
colnames(w) <- 'w'
round(w,3)
##         w
## z1  3.846
## z2  2.000
## z3  4.000
## z4  2.439
## z5 25.000

Hence,

\[\sum_i^nw_i \approx 37.285\]

round(sum(w),3)
## [1] 37.285

Following that, we may write

\[w_iz_i = \left[ \begin{array}\\ 3.846\\ 2\\ 4\\ 2.439\\ 25 \end{array} \right] \left[ \begin{array}\\ 1\\ 3\\ 5\\ 7\\ 7 \end{array} \right]\]

resulting in

\[\sum_i^n w_iz_i = 221.919\]

round(sum(w*p1[,3]),3)
## [1] 221.919

Plugging the numerator \((\sum_i^n w_iz_i)\) and the denominator \((\sum_i^n w_i)\) into the equation from above yields

\[ \hat z_0 = \frac{\sum_i^n w_iz_i}{\sum_i^nw_i} \approx \frac{221.919}{37.285} = 5.952 \]

hat.z0 <- round(sum(w*p1[,3])/ sum(w),3)
hat.z0
## [1] 5.952

Hence, we conclude that given \(beta = 2\) and \(n = 5\), the value for location \(z_0\approx 5.952\).

Note that usually interpolation is not done for one particular point in space but on a regular grid.