The interaction between two arbitrary points is measured by second-order properties, which reflect the tendency of the events to appear clustered (points tend to be close together), independently (the Poisson process) or regularly spaced (points tend to avoid each other).
We start this section by loading the required data sets and packages.
library(spatstat)
library(rasterVis)
load(url("https://userpage.fu-berlin.de/soga/data/r-data/ppp_data_berlin.RData"))
par(mar = c(0, 0, 2, 2),
mfrow = c(1, 3),
cex.main = 2)
plot(ppp_random, main = "independant")
plot(ppp_regular, main = "regular")
plot(ppp_locations, main = "clustered")
A simple diagnostic option or dependence between points is the
Fry plot. It is a scatterplot of the vector differences \(x_j - x_i\) between all pairs of distinct
points in the pattern. In spatstat
, the Fry plot is
computed by the command fryplot()
.
par(mar = c(0, 0, 3, 2),
mfrow = c(1, 3),
cex.main = 2)
fryplot(ppp_random, main = "independant", pch = 16, cex = 0.2)
fryplot(ppp_regular, main = "regular", pch = 16, cex = 0.2)
fryplot(ppp_locations, main = "clustered", pch = 16, cex = 0.2)
Classical techniques for investigating inter-point interaction are distance methods, based on measuring the distances between points (Baddeley 2010):
pairwise distances \(s_{ij} = \Vert x_i - x_j \Vert\) between all distinct pairs of points \(x_i\) and \(x_j\) \((i \ne j)\) in the pattern
nearest neighbor distances \(t_i = \min_{j \ne i}s_{ij}\), the distance from each point \(x_i\) to its nearest neighbor
empty space distances \(d(u) = \min \Vert u-x_i \Vert\), the distance from a fixed reference location \(u\) in the window to the nearest data point
These methods are available in the spatstat
package
using the functions pairdist()
, nndist()
and
distmap()
respectively.
par(mar = c(2, 2, 2, 2), mfrow = c(1, 3), oma = c(0, 0, 2, 0))
hist(pairdist(ppp_locations),
freq = FALSE,
breaks = nrow(pairdist(ppp_locations)),
main = "independant",
xlim = c(0, 40000)
)
hist(pairdist(ppp_regular),
freq = FALSE,
breaks = nrow(pairdist(ppp_regular)),
main = "regular"
)
hist(pairdist(ppp_random),
freq = FALSE, breaks = nrow(pairdist(ppp_random)), main = "clustered"
)
mtext("Pairwise Distances Histogram", outer = TRUE, cex = 1.5)
levelplot(pairdist(ppp_random), main = "Pairwise Distances: independant", par.settings=BTCTheme())
levelplot(pairdist(ppp_regular), main = "Pairwise Distances: regular", par.settings=BTCTheme())
levelplot(pairdist(ppp_locations), main = "Pairwise Distances: clustered", par.settings=BTCTheme())
par(mar = c(2, 2, 2, 2), mfrow = c(1, 3), oma = c(0, 0, 2, 0))
plot(nndist(ppp_random), main = "independant", pch = 16, cex = 0.8)
plot(nndist(ppp_regular), main = "regular", pch = 16, cex = 0.8)
plot(nndist(ppp_locations), main = "clustered", pch = 16, cex = 0.8)
mtext("Nearest Neighbor Distances", outer = TRUE, cex = 1.5)
par(mar = c(0, 0, 4, 0),
mfrow = c(1, 3),
oma = c(0, 0, 4, 0),
cex.main = 2)
plot(distmap(ppp_random), main = "independant", pch = 16, cex = 0.2)
plot(distmap(ppp_regular), main = "regular", pch = 16, cex = 0.2)
plot(distmap(ppp_locations), main = "clustered", pch = 16, cex = 0.2)
mtext("Empty Space Distances", outer = TRUE, cex = 2.8)
Recall that under the assumption of a homogeneous Poisson process (HPP), the location of one point in space does not affect the probabilities of points appearing nearby. The intensity of the point process in area \(W\) is a constant \(\lambda(u) = \lambda > 0\), \(\forall u \in W\).
This property of a HPP is often referred to as Complete Spatial Randomness (CSR). In the context of distance methods for point patterns, the HPP serves as a demarcation line between spatial point processes whose realizations can be described as being more regular or less regular than those of the Poisson process, which is a point process pattern that is distributed independently and uniformly over a study area (Baddeley 2010).
The distance
\[d(u, \mathbf x) = \min\{\Vert u - x_i \Vert : x_i \in \mathbf x\}\]
from a fixed location \(u \in \mathbb R^2\) to the nearest point in a point pattern \(\mathbf x\) is called the empty space distance or void distance.
The \(F\) function measures the distribution of all distances from an arbitrary reference location \(u\) in the plane to the nearest observed event \(x\). On a grid of locations \(u_j\) , \(j = 1,..., m\), the empirical distribution function of the observed empty space distances is
\[\hat F(r)= \frac{1}{m}\sum_{j}\mathbf{1}\{d(u_j, \mathbf x) \le r\}\]
For a homogeneous Poisson process of intensity \(\lambda\), hence under CSR, the value of the \(F\) function becomes:
\[F_\text{pois}(r)=1 - \exp(\lambda\pi r^2)\]
The empirical function \(\hat{F}(r)\) is compared against its theoretical expectation \(F_\text{pois}(r)\) by plotting them.
Values \(\hat{F}(r) > F_\text{pois}(r)\) suggest that empty space distances in the point pattern are shorter than for a Poisson process, suggesting a regularly spaced pattern, while values \(\hat{F}(r) < F_\text{pois}(r)\) suggest a clustered pattern.
In spatstat
, the \(F\)
function is computed by the Fest()
command. Recall that a
point process \(X\) extends throughout
two-dimensional space but is observed only inside a window \(W\). Therefore, the empirical distribution
of distances depends on the geometry of the window \(W\) as well as on the characteristics of
the point process \(X\). This leads to
a bias in the distance measurements, often referred to as edge
effects.
Owing the fact the \(\hat{F}(r)\)
depends on the geometry of the window \(W\), spatstat
provides several
edge corrections, such as the Chiu-Stoyan estimation
(correction = "cs"
), the reduced sample estimation
(correction = "rs"
), often referred to as border
correction, and the Baddeley-Gill Kaplan-Meier estimation
(correction = "km"
).
Let us plot the \(F\) function for
the random point data set (ppp_random
), for the clustered
point data set (ppp_locations
), and the regular point data
set (ppp_regular
).
plot(Fest(ppp_random), main = "independant")
The function Fest()
returns by default an object of
class fv
(function value table). By plotting such
an object, the graph includes by default \(F_\text{pois}(r)\) for homogeneous Poisson
processes, the Chiu-Stoyan estimate \(F_\text{cs}(r)\), the reduced
sample estimate (border correction) \(F_\text{board}(r)\), and the
Baddeley-Gill Kaplan-Meier estimate \(F_\text{km}(r)\).
The plot shows that, up to a distance of approximately 1000 meters, the estimates behave like a homogeneous Poisson processes \(\hat{F}(r) \approx F_\text{pois}(r)\). At larger distances, there is some deviation from spatial randomness.
Now, let us examine the clustered data set.
plot(Fest(ppp_locations), main = "clustered")
\(\hat{F}(r) < F_\text{pois}(r)\) for all of the three edge corrected estimates, thus suggesting a clustered pattern.
Finally, we plot the \(F\) function for the regular spaced data set.
plot(Fest(ppp_regular), main = "regular")
Values \(\hat{F}(r) > F_\text{pois}(r)\) indicate a regularly spaced pattern.
The nearest neighbor distances are given by
\[d_i = \text{min}_{j \ne i}\Vert x_i-x_j\Vert\]
The empirical distribution of the nearest neighbour distances depends on the geometry of the window \(W\) as well as on the characteristics of the point process \(X\). Hence, corrections for edge effects are warranted.
Â
The \(G\) function measures the distribution of distances from an arbitrary event to its nearest neighbors.
\[\hat{G}(r) = \frac{1}{n(\mathbf x)}\sum_{i}\mathbf 1\{d_i \le r\}\]
The \(G\) function represents the number of elements in the set of distances up to some threshold \(r\), normalized by the total number of points \(n\) in point pattern \(\mathbf x\).
For a homogeneous Poisson point process of intensity \(\lambda\), the nearest-neighbour distance distribution function becomes
\[G_\text{pois}(r)=1 - \exp(-\lambda\pi r^2)\text{,}\]
where \(\lambda\) is the mean number of events per unit area. Note that this expression is identical to the empty space function \(F\) for the Poisson process.
The interpretation of \(\hat G(r)\) is the reverse of \(\hat F(r)\). Values \(\hat G(r) > G_\text{pois}(r)\) suggest that nearest neighbour distances in the point pattern are shorter than for a Poisson process, suggesting a clustered pattern, while values \(\hat G(r) < G_\text{pois}(r)\) suggest a regular pattern.
The function Gest()
of the spatstat
package
computes estimates of \(\hat G(r)\)
using several edge corrections. By default the plotted graph includes
\(G_\text{pois}(r)\) for homogeneous
Poisson processes, \(G_\text{han}(r)\),
the Hanisch estimate, \(G_\text{board}(r)\), the reduced
sample estimate (border correction) and \(F_\text{km}(r)\), the Baddeley-Gill
Kaplan-Meier estimate.
Let us plot the \(G\) function for
the random point data set (ppp_random
), for the clustered
point data set (ppp_locations
), and the regular point data
set (ppp_regular
).
plot(Gest(ppp_random), main = "independant")
The plot shows that the various estimates \(\hat{G}(r)\), are not too far from \(G_\text{pois}(r)\), corresponding to a homogeneous Poisson process, thus indicating spatial randomness.
Now let us examine the clustered point data set.
plot(Gest(ppp_locations), main = "clustered")
Again, by default, the graph includes three estimates. For a clustered pattern, observed locations should be closer to each other than expected under CSR. All estimates are \(\hat{G}(r) > G_\text{pois}(r)\), thus suggesting a clustered pattern.
Finally, let us examine the regular data set.
plot(Gest(ppp_regular), main = "regular")
A regular pattern should have higher nearest-neighbour distances than expected under CSR. \(\hat{G}(r)\) is zero for \(r \le 2600\), which shows no nearest-neighbor distances shorter than 2600 m.
The observed pairwise distances \(s_{ij} = \Vert x_i-x_j\Vert\) in the data pattern \(\mathbf x\) constitute a biased sample of pairwise distances, with a bias in favour of smaller distances. For example, we can never observe a pairwise distance greater than the diameter of the window.
Ripley (1977) defined the \(K\) function for a stationary point process so that \(\lambda K(r)\) is the expected number of other points of the process within a distance \(r\) of a typical point of the process.
\[K(r)=\frac{1}{\lambda}\mathbb E[n(\mathbf X \cap b(u,r)/\{u\}\vert u \in \mathbf X)]\]
The \(K\) function for a HPP is
\[K_{\text{pois}}(r) = \pi r^2\] regardless of the intensity.
Numerous estimators of \(K\) have been proposed. Most of them are weighted and normalized empirical distribution functions of the pairwise distances, of the general form
\[\hat K(r) = \frac{1}{\hat\lambda \text{ area}(W)}\sum_i\sum_{i \ne j}\mathbf 1\{\Vert x_i-x_j \Vert \le r\}e(x_i. x_j,r)\]
where \(e(u, v, r)\) is an edge correction weight (Baddeley 2010).
We compare the estimate \(\hat K(r)\) with the Poisson \(K\) function. Values \(\hat K(r) >\pi r^2\) suggest clustering, while \(\hat K(r) <\pi r^2\) suggests a regular pattern.
In spatstat
the function Kest()
computes
several estimates of the \(K\)
function. By default the plotted graph includes \(K_\text{pois}(r)\) for homogeneous Poisson
processes, \(K_\text{board}(r)\), the
border-corrected estimate, \(K_\text{trans}(r)\), the
translation-corrected estimate, and \(K_\text{iso}(r)\), the Ripley isotropic
correction estimate.
Let us plot the \(K\) function for
the random point data set (ppp_random
), for the clustered
point data set (ppp_locations
), and for the regular point
data set (ppp_regular
).
plot(Kest(ppp_random), main = "independant")
The plot shows that the various estimates, \(\hat{K}(r)\), are close to \(K_\text{pois}(r)\), corresponding to a homogeneous Poisson processes and thus indicating complete spatial randomness.
Now let us examine the clustered point data set.
plot(Kest(ppp_locations), main = "clustered")
Values of \(\hat K(r) > K_{\text(pois)}(r)\) suggest clustering.
Finally let us examine the regular data set.
plot(Kest(ppp_regular), main = "regular")
Values of \(\hat K(r) < K_{\text(pois)}(r)\) suggest a regular pattern.
A commonly-used transformation of \(K\) is the \(L\) function
\[L(r) = \sqrt{\frac{K(r)}{\pi}}\]
which transforms the Poisson \(K\)
function to the straight line \(L_\text{pois}(r) = r\), making visual
assessment of the graph easier. To compute the estimated \(L\) function, we may use the
Lest()
function from the spatstat
package.
Let us plot the \(L\) function for
the random point data set (ppp_random
), for the clustered
point data set (ppp_locations
), and the regular point data
set (ppp_regular
).
plot(Lest(ppp_random), main = "independant")
Now let us examine the clustered data set.
plot(Lest(ppp_locations), main = "clustered")
plot(Lest(ppp_regular), main = "regular")
We apply the same logic as above, values \(\hat L(r) > L_\text{pois}(r)\) suggest clustering, while \(\hat L(r) < L_\text{pois}(r)\) suggests a regular pattern.
Another summary function is the pair correlation function:
\[g(r)=\frac{K'(r)}{2\pi r}\]
where \(K'(r)\) is the derivative of \(K\). The pair correlation \(g(r)\) is the probability of observing a pair of points separated by a distance \(r\), divided by the corresponding probability for a Poisson process. The value \(g(r) = 1\) corresponds to complete randomness. For other processes, values \(g(r) > 1\) suggest clustering or attraction at a distance \(r\), while values \(g(r) < 1\) suggest inhibition or regularity.
To compute the estimated pair correlation function, use
pcf()
. Let us plot the pair correlation function for the
random point data set (ppp_random
), for the clustered point
data set (ppp_locations
), and the regular point data set
(ppp_regular
).
plot(pcf(ppp_random), main = "independant")
Now let us examine the clustered data set.
plot(pcf(ppp_locations), main = "clustered")
Finally we examine the regular data set.
plot(pcf(ppp_regular), main = "regular")
The \(J\) function is a combination of the \(F\) and \(G\) the function.
\[J(r) = \frac{1-G(r)}{1-F(r)}\]
defined for all \(r \ge 0\) such that \(F(r) < 1\). For a homogeneous Poisson process, \(F_\text{pois} = G_\text{pois}\), so that
\[J_\text{pois}(r) \equiv 1\]
Values \(J(r) > 1\) suggest regularity, and \(J(r) < 1\) suggest clustering.
The \(J\) function is computed by
Jest()
. Let us plot \(J\)
function for the random point data set (ppp_random
), for
the clustered point data set (ppp_locations
), and for the
regular point data set (ppp_regular
).
plot(Jest(ppp_random), main = "independant")
Now let us examine the clustered data set.
plot(Jest(ppp_locations), main = "clustered")
Finally we examine the regular data set.
plot(Jest(ppp_regular), main = "regular")
allstats()
functionNote that the spatstat
package offers the convenient
function allstats()
to compute the \(F\), \(G\), \(J\)
and \(K\) functions for a data set all
at once.
Let us plot the allstats()
function for the random point
data set (ppp_random
), for the clustered point data set
(ppp_locations
), and for the regular point data set
(ppp_regular
).
plot(allstats(ppp_random), main = "independant")
Now let us examine the clustered point data set.
plot(allstats(ppp_locations), main = "clustered")
Finally we examine the regular point data set.
plot(allstats(ppp_regular), main = "regular")
Exercise: Consider the
lansing
data set, which includes the locations and botanical classification of trees in the Lansing Woods. Subset the data such that only the maple trees are included and apply theallstats()
function. Do maple trees appear to grow clustered, independently or spaced regularly?
data("lansing")
lansing
## Marked planar point pattern: 2251 points
## Multitype, with levels = blackoak, hickory, maple, misc, redoak, whiteoak
## window: rectangle = [0, 1] x [0, 1] units (one unit = 924 feet)
## Your code here...
maple <- lansing[lansing$marks == "maple", ]
maple$marks <- droplevels(maple$marks)
maple <- unmark(maple)
plot(allstats(maple))
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.