The interaction between two arbitrary points is measured by second-order properties, which reflect the tendency of the events to appear clustered (points tend to be close together), independently (the Poisson process) or regularly spaced (points tend to avoid each other).

We start this section by loading the required data sets and packages.

library(spatstat)
library(rasterVis)
load(url("https://userpage.fu-berlin.de/soga/data/r-data/ppp_data_berlin.RData"))
par(mar = c(0, 0, 2, 2), 
    mfrow = c(1, 3), 
    cex.main = 2)

plot(ppp_random, main = "independant")
plot(ppp_regular, main = "regular")
plot(ppp_locations, main = "clustered")

A simple diagnostic option or dependence between points is the Fry plot. It is a scatterplot of the vector differences \(x_j - x_i\) between all pairs of distinct points in the pattern. In spatstat, the Fry plot is computed by the command fryplot().

par(mar = c(0, 0, 3, 2), 
    mfrow = c(1, 3), 
    cex.main = 2)

fryplot(ppp_random, main = "independant", pch = 16, cex = 0.2)
fryplot(ppp_regular, main = "regular", pch = 16, cex = 0.2)
fryplot(ppp_locations, main = "clustered", pch = 16, cex = 0.2)


Distance methods for point patterns

Classical techniques for investigating inter-point interaction are distance methods, based on measuring the distances between points (Baddeley 2010):

These methods are available in the spatstat package using the functions pairdist(), nndist() and distmap() respectively.

par(mar = c(2, 2, 2, 2), mfrow = c(1, 3), oma = c(0, 0, 2, 0))
hist(pairdist(ppp_locations),
  freq = FALSE,
  breaks = nrow(pairdist(ppp_locations)),
  main = "independant",
  xlim = c(0, 40000)
)
hist(pairdist(ppp_regular),
  freq = FALSE,
  breaks = nrow(pairdist(ppp_regular)),
  main = "regular"
)
hist(pairdist(ppp_random),
  freq = FALSE, breaks = nrow(pairdist(ppp_random)), main = "clustered"
)
mtext("Pairwise Distances Histogram", outer = TRUE, cex = 1.5)

levelplot(pairdist(ppp_random), main = "Pairwise Distances: independant", par.settings=BTCTheme())

levelplot(pairdist(ppp_regular), main = "Pairwise Distances: regular", par.settings=BTCTheme())

levelplot(pairdist(ppp_locations), main = "Pairwise Distances: clustered", par.settings=BTCTheme())

par(mar = c(2, 2, 2, 2), mfrow = c(1, 3), oma = c(0, 0, 2, 0))
plot(nndist(ppp_random), main = "independant", pch = 16, cex = 0.8)
plot(nndist(ppp_regular), main = "regular", pch = 16, cex = 0.8)
plot(nndist(ppp_locations), main = "clustered", pch = 16, cex = 0.8)
mtext("Nearest Neighbor Distances", outer = TRUE, cex = 1.5)

par(mar = c(0, 0, 4, 0), 
    mfrow = c(1, 3),
    oma = c(0, 0, 4, 0),
    cex.main = 2)

plot(distmap(ppp_random), main = "independant", pch = 16, cex = 0.2)
plot(distmap(ppp_regular), main = "regular", pch = 16, cex = 0.2)
plot(distmap(ppp_locations), main = "clustered", pch = 16, cex = 0.2)

mtext("Empty Space Distances", outer = TRUE, cex = 2.8)


Complete Spatial Randomness

Recall that under the assumption of a homogeneous Poisson process (HPP), the location of one point in space does not affect the probabilities of points appearing nearby. The intensity of the point process in area \(W\) is a constant \(\lambda(u) = \lambda > 0\), \(\forall u \in W\).

This property of a HPP is often referred to as Complete Spatial Randomness (CSR). In the context of distance methods for point patterns, the HPP serves as a demarcation line between spatial point processes whose realizations can be described as being more regular or less regular than those of the Poisson process, which is a point process pattern that is distributed independently and uniformly over a study area (Baddeley 2010).


Empty space distances

The distance

\[d(u, \mathbf x) = \min\{\Vert u - x_i \Vert : x_i \in \mathbf x\}\]

from a fixed location \(u \in \mathbb R^2\) to the nearest point in a point pattern \(\mathbf x\) is called the empty space distance or void distance.


The empty space function \(F\)

The \(F\) function measures the distribution of all distances from an arbitrary reference location \(u\) in the plane to the nearest observed event \(x\). On a grid of locations \(u_j\) , \(j = 1,..., m\), the empirical distribution function of the observed empty space distances is

\[\hat F(r)= \frac{1}{m}\sum_{j}\mathbf{1}\{d(u_j, \mathbf x) \le r\}\]

For a homogeneous Poisson process of intensity \(\lambda\), hence under CSR, the value of the \(F\) function becomes:

\[F_\text{pois}(r)=1 - \exp(\lambda\pi r^2)\]

The empirical function \(\hat{F}(r)\) is compared against its theoretical expectation \(F_\text{pois}(r)\) by plotting them.

Values \(\hat{F}(r) > F_\text{pois}(r)\) suggest that empty space distances in the point pattern are shorter than for a Poisson process, suggesting a regularly spaced pattern, while values \(\hat{F}(r) < F_\text{pois}(r)\) suggest a clustered pattern.


In spatstat, the \(F\) function is computed by the Fest() command. Recall that a point process \(X\) extends throughout two-dimensional space but is observed only inside a window \(W\). Therefore, the empirical distribution of distances depends on the geometry of the window \(W\) as well as on the characteristics of the point process \(X\). This leads to a bias in the distance measurements, often referred to as edge effects.

Owing the fact the \(\hat{F}(r)\) depends on the geometry of the window \(W\), spatstat provides several edge corrections, such as the Chiu-Stoyan estimation (correction = "cs"), the reduced sample estimation (correction = "rs"), often referred to as border correction, and the Baddeley-Gill Kaplan-Meier estimation (correction = "km").

Let us plot the \(F\) function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and the regular point data set (ppp_regular).

plot(Fest(ppp_random), main = "independant")

The function Fest() returns by default an object of class fv (function value table). By plotting such an object, the graph includes by default \(F_\text{pois}(r)\) for homogeneous Poisson processes, the Chiu-Stoyan estimate \(F_\text{cs}(r)\), the reduced sample estimate (border correction) \(F_\text{board}(r)\), and the Baddeley-Gill Kaplan-Meier estimate \(F_\text{km}(r)\).

The plot shows that, up to a distance of approximately 1000 meters, the estimates behave like a homogeneous Poisson processes \(\hat{F}(r) \approx F_\text{pois}(r)\). At larger distances, there is some deviation from spatial randomness.

Now, let us examine the clustered data set.

plot(Fest(ppp_locations), main = "clustered")

\(\hat{F}(r) < F_\text{pois}(r)\) for all of the three edge corrected estimates, thus suggesting a clustered pattern.

Finally, we plot the \(F\) function for the regular spaced data set.

plot(Fest(ppp_regular), main = "regular")

Values \(\hat{F}(r) > F_\text{pois}(r)\) indicate a regularly spaced pattern.


Nearest neighbor distances

The nearest neighbor distances are given by

\[d_i = \text{min}_{j \ne i}\Vert x_i-x_j\Vert\]

The empirical distribution of the nearest neighbour distances depends on the geometry of the window \(W\) as well as on the characteristics of the point process \(X\). Hence, corrections for edge effects are warranted.

 

The \(G\) function

The \(G\) function measures the distribution of distances from an arbitrary event to its nearest neighbors.

\[\hat{G}(r) = \frac{1}{n(\mathbf x)}\sum_{i}\mathbf 1\{d_i \le r\}\]

The \(G\) function represents the number of elements in the set of distances up to some threshold \(r\), normalized by the total number of points \(n\) in point pattern \(\mathbf x\).

For a homogeneous Poisson point process of intensity \(\lambda\), the nearest-neighbour distance distribution function becomes

\[G_\text{pois}(r)=1 - \exp(-\lambda\pi r^2)\text{,}\]

where \(\lambda\) is the mean number of events per unit area. Note that this expression is identical to the empty space function \(F\) for the Poisson process.

The interpretation of \(\hat G(r)\) is the reverse of \(\hat F(r)\). Values \(\hat G(r) > G_\text{pois}(r)\) suggest that nearest neighbour distances in the point pattern are shorter than for a Poisson process, suggesting a clustered pattern, while values \(\hat G(r) < G_\text{pois}(r)\) suggest a regular pattern.

The function Gest() of the spatstat package computes estimates of \(\hat G(r)\) using several edge corrections. By default the plotted graph includes \(G_\text{pois}(r)\) for homogeneous Poisson processes, \(G_\text{han}(r)\), the Hanisch estimate, \(G_\text{board}(r)\), the reduced sample estimate (border correction) and \(F_\text{km}(r)\), the Baddeley-Gill Kaplan-Meier estimate.

Let us plot the \(G\) function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and the regular point data set (ppp_regular).

plot(Gest(ppp_random), main = "independant")

The plot shows that the various estimates \(\hat{G}(r)\), are not too far from \(G_\text{pois}(r)\), corresponding to a homogeneous Poisson process, thus indicating spatial randomness.

Now let us examine the clustered point data set.

plot(Gest(ppp_locations), main = "clustered")

Again, by default, the graph includes three estimates. For a clustered pattern, observed locations should be closer to each other than expected under CSR. All estimates are \(\hat{G}(r) > G_\text{pois}(r)\), thus suggesting a clustered pattern.

Finally, let us examine the regular data set.

plot(Gest(ppp_regular), main = "regular")

A regular pattern should have higher nearest-neighbour distances than expected under CSR. \(\hat{G}(r)\) is zero for \(r \le 2600\), which shows no nearest-neighbor distances shorter than 2600 m.


Pairwise distances and the \(K\) function

The observed pairwise distances \(s_{ij} = \Vert x_i-x_j\Vert\) in the data pattern \(\mathbf x\) constitute a biased sample of pairwise distances, with a bias in favour of smaller distances. For example, we can never observe a pairwise distance greater than the diameter of the window.

Ripley (1977) defined the \(K\) function for a stationary point process so that \(\lambda K(r)\) is the expected number of other points of the process within a distance \(r\) of a typical point of the process.

\[K(r)=\frac{1}{\lambda}\mathbb E[n(\mathbf X \cap b(u,r)/\{u\}\vert u \in \mathbf X)]\]

The \(K\) function for a HPP is

\[K_{\text{pois}}(r) = \pi r^2\] regardless of the intensity.

Numerous estimators of \(K\) have been proposed. Most of them are weighted and normalized empirical distribution functions of the pairwise distances, of the general form

\[\hat K(r) = \frac{1}{\hat\lambda \text{ area}(W)}\sum_i\sum_{i \ne j}\mathbf 1\{\Vert x_i-x_j \Vert \le r\}e(x_i. x_j,r)\]

where \(e(u, v, r)\) is an edge correction weight (Baddeley 2010).

We compare the estimate \(\hat K(r)\) with the Poisson \(K\) function. Values \(\hat K(r) >\pi r^2\) suggest clustering, while \(\hat K(r) <\pi r^2\) suggests a regular pattern.

In spatstat the function Kest() computes several estimates of the \(K\) function. By default the plotted graph includes \(K_\text{pois}(r)\) for homogeneous Poisson processes, \(K_\text{board}(r)\), the border-corrected estimate, \(K_\text{trans}(r)\), the translation-corrected estimate, and \(K_\text{iso}(r)\), the Ripley isotropic correction estimate.

Let us plot the \(K\) function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and for the regular point data set (ppp_regular).

plot(Kest(ppp_random), main = "independant")

The plot shows that the various estimates, \(\hat{K}(r)\), are close to \(K_\text{pois}(r)\), corresponding to a homogeneous Poisson processes and thus indicating complete spatial randomness.

Now let us examine the clustered point data set.

plot(Kest(ppp_locations), main = "clustered")

Values of \(\hat K(r) > K_{\text(pois)}(r)\) suggest clustering.

Finally let us examine the regular data set.

plot(Kest(ppp_regular), main = "regular")

Values of \(\hat K(r) < K_{\text(pois)}(r)\) suggest a regular pattern.


The \(L\) function

A commonly-used transformation of \(K\) is the \(L\) function

\[L(r) = \sqrt{\frac{K(r)}{\pi}}\]

which transforms the Poisson \(K\) function to the straight line \(L_\text{pois}(r) = r\), making visual assessment of the graph easier. To compute the estimated \(L\) function, we may use the Lest() function from the spatstat package.

Let us plot the \(L\) function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and the regular point data set (ppp_regular).

plot(Lest(ppp_random), main = "independant")

Now let us examine the clustered data set.

plot(Lest(ppp_locations), main = "clustered")

plot(Lest(ppp_regular), main = "regular")

We apply the same logic as above, values \(\hat L(r) > L_\text{pois}(r)\) suggest clustering, while \(\hat L(r) < L_\text{pois}(r)\) suggests a regular pattern.


Other measures of point interaction

The pair correlation function

Another summary function is the pair correlation function:

\[g(r)=\frac{K'(r)}{2\pi r}\]

where \(K'(r)\) is the derivative of \(K\). The pair correlation \(g(r)\) is the probability of observing a pair of points separated by a distance \(r\), divided by the corresponding probability for a Poisson process. The value \(g(r) = 1\) corresponds to complete randomness. For other processes, values \(g(r) > 1\) suggest clustering or attraction at a distance \(r\), while values \(g(r) < 1\) suggest inhibition or regularity.

To compute the estimated pair correlation function, use pcf(). Let us plot the pair correlation function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and the regular point data set (ppp_regular).

plot(pcf(ppp_random), main = "independant")

Now let us examine the clustered data set.

plot(pcf(ppp_locations), main = "clustered")

Finally we examine the regular data set.

plot(pcf(ppp_regular), main = "regular")


The \(J\) function

The \(J\) function is a combination of the \(F\) and \(G\) the function.

\[J(r) = \frac{1-G(r)}{1-F(r)}\]

defined for all \(r \ge 0\) such that \(F(r) < 1\). For a homogeneous Poisson process, \(F_\text{pois} = G_\text{pois}\), so that

\[J_\text{pois}(r) \equiv 1\]

Values \(J(r) > 1\) suggest regularity, and \(J(r) < 1\) suggest clustering.

The \(J\) function is computed by Jest(). Let us plot \(J\) function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and for the regular point data set (ppp_regular).

plot(Jest(ppp_random), main = "independant")

Now let us examine the clustered data set.

plot(Jest(ppp_locations), main = "clustered")

Finally we examine the regular data set.

plot(Jest(ppp_regular), main = "regular")


The allstats() function

Note that the spatstat package offers the convenient function allstats() to compute the \(F\), \(G\), \(J\) and \(K\) functions for a data set all at once.

Let us plot the allstats() function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and for the regular point data set (ppp_regular).

plot(allstats(ppp_random), main = "independant")

Now let us examine the clustered point data set.

plot(allstats(ppp_locations), main = "clustered")

Finally we examine the regular point data set.

plot(allstats(ppp_regular), main = "regular")

Exercise: Consider the lansing data set, which includes the locations and botanical classification of trees in the Lansing Woods. Subset the data such that only the maple trees are included and apply the allstats() function. Do maple trees appear to grow clustered, independently or spaced regularly?

data("lansing")
lansing
## Marked planar point pattern: 2251 points
## Multitype, with levels = blackoak, hickory, maple, misc, redoak, whiteoak 
## window: rectangle = [0, 1] x [0, 1] units (one unit = 924 feet)
## Your code here...
Show code
maple <- lansing[lansing$marks == "maple", ]
maple$marks <- droplevels(maple$marks)
maple <- unmark(maple)
plot(allstats(maple))



Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.