The inverse distance weighting (IDW) approach is also known as inverse distance-based weighted interpolation. It is the estimation of the value z at location x by a weighted mean of nearby observations.
ˆz(x)=∑niwizi∑niwi,
where
wi=|x−xi|−β and where β≥0 and |⋅| corresponds to the euclidean distance. The inverse distance power β determines the degree to which nearer points are preferred over more distant points. Typically β=1 or β=2, corresponding to an inverse or inverse squared relationship. The number of surrounding points n that are included in the interpolation decides whether a global or local weighting is applied. Both parameters β and n may be fine-tuned by cross-validation, a technique we discuss in more detail in the subsequent sections. If the point x coincides with an observation location (x=xi), then the observed value x is returned to avoid infinite weights.
Let us gain intuition on IDW by applying it on a toy example. Given
below are five points (z1,z2,z3,z4,z5) and one point z0,
which we want to predict, z∈R2. The coordinates of each point z are given in the columns x
and y
, and the actual value associated to each point is
given in column z
.
z0 <- c(1, 1, NA)
z1 <- c(0.5, 0.9, 1)
z2 <- c(1.5, 1.5, 3)
z3 <- c(1, 0.5, 5)
z4 <- c(0.5, 1.4, 7)
z5 <- c(1.2, 1, 7)
p <- as.data.frame(rbind(z0, z1, z2, z3, z4, z5))
colnames(p) <- c("x", "y", "z")
p
## x y z
## z0 1.0 1.0 NA
## z1 0.5 0.9 1
## z2 1.5 1.5 3
## z3 1.0 0.5 5
## z4 0.5 1.4 7
## z5 1.2 1.0 7
We plot the points for a better understanding.
library(tidyverse)
p0 <- p[1, ]
p1 <- p[2:nrow(p), ]
ggplot(p1, aes(x = x, y = y, label = rownames(p1))) +
geom_point() +
geom_point(data = p0, aes(
x = x, y = y,
label = "rownames(p0)", color = "red",
cex = 5
)) +
geom_text(vjust = 0, nudge_y = 0.05) +
coord_fixed(ratio = 1) +
theme_bw() +
theme(legend.position = "none") +
coord_fixed(ratio = 1)
The euclidean distance d between two points (u,v) in a plane (R2) is given by
d(u,v):=√(u1−v1)2+(u2−v2)2
Now, we can compute w as follows
[w1w2...w5]=[|z−z1|−β|z−z2|−β⋯|z−z5|−β],where|z−zi|=d(z,zi)
Exercise: Calculate the weigths w1,...,w5 for our toy data with the formula above. We want closest points to have an increased impact on our interpolation. Which value for β should we take in order to achieve this, β=1 or β=2 ?
## Your code here...
beta <- NULL
w <- NULL
If we use a squared inverse relationship, i.e. β=2, the weights for more distant points become smaller.
beta <- 2
w <- sqrt((as.numeric(p0["x"]) - p1["x"])^2 +
(as.numeric(p0["y"]) - p1["y"])^2)^ -beta
colnames(w) <- "w"
round(w, 3)
## w
## z1 3.846
## z2 2.000
## z3 4.000
## z4 2.439
## z5 25.000
[w1w2w3w4w5]=[|z−z1|−β|z−z2|−β|z−z3|−β|z−z4|−β|z−z5|−β]=[√(1−0.5)2+(1−0.9)2−2√(1−1.5)2+(1−1.5)2−2√(1−1)2+(1−0.5)2−2√(1−0.5)2+(1−1.4)2−2√(1−1.2)2+(1−1)2−2]≈[3.846242.43925]
Now, we can compute ∑niwi and ∑niwizi to get ˆz(x)=ˆz0=∑niwizi∑niwi.
Exercise: Predict z0 of our toy data.
## Your code here...
hat_z0 <- NULL
round(sum(w), 3)
## [1] 37.285
n∑iwi≈37.285
Following that we may write
wizi=[3.846242.43925]⨀[13577] where ⨀ denotes the element-wise vector product.
round(sum(w * p1[, 3]), 3)
## [1] 221.919
n∑iwizi≈221.919
Plugging the numerator (∑niwizi) and the denominator (∑niwi) into the equation from above yields
hat_z0 <- round(sum(w * p1[, 3]) / sum(w), 3)
hat_z0
## [1] 5.952
ˆz0=∑niwizi∑niwi≈221.91937.285=5.952
Hence, we conclude that given β=2 and n=5, the value for z0 at location x0,y0 is z0≈5.952.
Usually, interpolation is not done only for one particular point in space but on a regular grid.
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.