To describe the spatial distribution of points in space, it is essential to consider the various factors that may influence the point pattern:

First-order properties refer to characteristics of the point process that are determined by external variables. In this context, the presence or location of other points does not influence the pattern. Instead, the spatial variation is explained entirely by these external factors.

The interaction between two arbitrary points is measured by second-order properties, which reflect the tendency of the events to appear clustered (points tend to be close together), independently (the Poisson process) or regularly spaced (points tend to avoid each other).

Third-order properties describe phenomena in which the points are influenced by the overall structure or configuration of the point pattern (see Nakoinz, O. & Knitter, D. (2016)).

On this side, we deal with the second-order properties.

Table of contents
Chapter
Distance Methods for point patterns Pairwise distances
Pairwise vector differences
Nearest neighbor distances
Empty Space Distances
Functions Empty space function F
Nearest neighbour function G
Pairwise Distances and the K function
The L function
Other measures of point interaction The pair correlation function
The J-function
Conclusion The allstats() function

We start in this section by loading the required data sets and packages.

library(spatstat)
library(rasterVis)
load(url("https://userpage.fu-berlin.de/soga/data/r-data/ppp_data_berlin.RData"))
par(mar = c(0, 0, 2, 2), 
    mfrow = c(1, 3), 
    cex.main = 2)

plot(ppp_random, main = "independent")
plot(ppp_regular, main = "regular")
plot(ppp_locations, main = "clustered")


Distance methods for point patterns

Classical techniques for investigating inter-point interaction are distance methods, based on measuring the distances between points (Baddeley 2010):

These methods are available in the spatstat package using the functions pairdist(), nndist() and distmap() respectively.


Pairwise distances

par(mar = c(2, 2, 2, 2), mfrow = c(1, 3), oma = c(0, 0, 2, 0))
hist(pairdist(ppp_locations),
  freq = FALSE,
  breaks = nrow(pairdist(ppp_locations)),
  main = "independent",
  xlim = c(0, 40000)
)
hist(pairdist(ppp_regular),
  freq = FALSE,
  breaks = nrow(pairdist(ppp_regular)),
  main = "regular"
)
hist(pairdist(ppp_random),
  freq = FALSE, 
  breaks = nrow(pairdist(ppp_random)), 
  main = "clustered"
)
mtext("Pairwise Distances Histogram", outer = TRUE, cex = 1.5)

Pairwise vector differences

A simple diagnostic option or dependence between points is the Fry plot. It is a scatterplot of the vector differences \(x_j - x_i\) between all pairs of distinct points in the pattern. In spatstat, the Fry plot is computed by the command fryplot().

par(mar = c(0, 0, 3, 2), 
   mfrow = c(1, 3), 
    cex.main = 2)

fryplot(ppp_random, main = "independent", pch = 16, cex = 0.2)
fryplot(ppp_regular, main = "regular", pch = 16, cex = 0.2)
fryplot(ppp_locations, main = "clustered", pch = 16, cex = 0.2)

In contrast to the distances, the differences have a direction. To contribute to these directions, a histogram is not sufficient anymore. Instead, we visualize them in the fryplot.


Nearest neighbor distances

par(mar = c(2, 2, 2, 2), mfrow = c(1, 3), oma = c(0, 0, 2, 0))
plot(nndist(ppp_random), main = "independent", pch = 16, cex = 0.8)
plot(nndist(ppp_regular), main = "regular", pch = 16, cex = 0.8)
plot(nndist(ppp_locations), main = "clustered", pch = 16, cex = 0.8)

As we can see, it is difficult to distinguish between the first and the third point pattern, by just interpreting frequency plots. Therefore we need a closer look at completely random processes and their function to get an objective decision tool.

Empty Space Distances

par(mar = c(0, 0, 4, 0), 
    mfrow = c(1, 3),
    oma = c(0, 0, 4, 0),
    cex.main = 2)

plot(distmap(ppp_random), main = "independant", pch = 16, cex = 0.2)
plot(distmap(ppp_regular), main = "regular", pch = 16, cex = 0.2)
plot(distmap(ppp_locations), main = "clustered", pch = 16, cex = 0.2)


Complete Spatial Randomness

Recall that under the assumption of a homogeneous Poisson process (HPP), the location of one point in space does not affect the probabilities of points appearing nearby. The intensity of the point process in area \(W\) is a constant \(\lambda(u) = \lambda > 0\), \(\forall u \in W\).

This property of a HPP is often referred to as Complete Spatial Randomness (CSR). In the context of distance methods for point patterns, the HPP serves as a demarcation line between spatial point processes whose realizations can be described as being more regular or less regular than those of the Poisson process, which is a point process pattern that is distributed independently and uniformly over a study area (Baddeley 2010).


Empty space distances

The distance

\[d(u, \mathbf x) = \min\{\Vert u - x_i \Vert : x_i \in \mathbf x\}\]

from a fixed location \(u \in \mathbb R^2\) to the nearest point in a point pattern \(\mathbf x\) is called the empty space distance or void distance.


The empty space function \(F\)

The \(F\) function measures the distribution of all distances from an arbitrary reference location \(u\) in the plane to the nearest observed event \(x\). On a grid of locations \(u_j\) , \(j = 1,..., m\), the empirical distribution function of the observed empty space distances is

\[\hat F(r)= \frac{1}{m}\sum_{j}\mathbf{1}\{d(u_j, \mathbf x) \le r\}\]

For a homogeneous Poisson process of intensity \(\lambda\), hence under CSR, the value of the \(F\) function becomes:

\[F_\text{pois}(r)=1 - \exp(\lambda\pi r^2)\]

The empirical function \(\hat{F}(r)\) is compared against its theoretical expectation \(F_\text{pois}(r)\) by plotting them.

Values \(\hat{F}(r) > F_\text{pois}(r)\) suggest that empty space distances in the point pattern are shorter than for a Poisson process, suggesting a regularly spaced pattern, while values \(\hat{F}(r) < F_\text{pois}(r)\) suggest a clustered pattern.


In spatstat, the \(F\) function is computed by the Fest() command. Recall that a point process \(X\) extends throughout two-dimensional space but is observed only inside a window \(W\). Therefore, the empirical distribution of distances depends on the geometry of the window \(W\) as well as on the characteristics of the point process \(X\). This leads to a bias in the distance measurements, often referred to as edge effects.

Owing the fact the \(\hat{F}(r)\) depends on the geometry of the window \(W\), spatstat provides several edge corrections, such as the Chiu-Stoyan estimation (correction = "cs"), the reduced sample estimation (correction = "rs"), often referred to as border correction, and the Baddeley-Gill Kaplan-Meier estimation (correction = "km").

Let us plot the \(F\) function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and for the regular point data set (ppp_regular).

plot(Fest(ppp_random), main = "independant", xlim = c(0,2000), ylim = c(0,1))
plot(Fest(ppp_locations), main = "clustered", xlim = c(0,2000), ylim = c(0,1))
plot(Fest(ppp_regular), main = "regular", xlim = c(0,2000), ylim = c(0,1))

####Independent:

The function Fest() returns by default an object of class fv (function value table). By plotting such an object, the graph includes by default \(F_\text{pois}(r)\) for homogeneous Poisson processes, the Chiu-Stoyan estimate \(F_\text{cs}(r)\), the reduced sample estimate (border correction) \(F_\text{board}(r)\), and the Baddeley-Gill Kaplan-Meier estimate \(F_\text{km}(r)\).

The plot shows that, up to a distance of approximately 1000 meters, the estimates behave like a homogeneous Poisson processes \(\hat{F}(r) \approx F_\text{pois}(r)\). At larger distances, there is some deviation from spatial randomness.

#####Clustered:

\(\hat{F}(r) < F_\text{pois}(r)\) for all of the three edge corrected estimates, thus suggesting a clustered pattern.

#####Regular:

Values \(\hat{F}(r) > F_\text{pois}(r)\) indicate a regularly spaced pattern.


Nearest neighbor distances

The nearest neighbor distances are given by

\[d_i = \text{min}_{j \ne i}\Vert x_i-x_j\Vert\]

The empirical distribution of the nearest neighbor distances depends on the geometry of the window \(W\) as well as on the characteristics of the point process \(X\). Hence, corrections for edge effects are warranted.

 

The \(G\) function

The \(G\) function measures the distribution of distances from an arbitrary event to its nearest neighbours.

\[\hat{G}(r) = \frac{1}{n(\mathbf x)}\sum_{i}\mathbf 1\{d_i \le r\}\]

The \(G\) function represents the number of elements in the set of distances up to some threshold \(r\), normalized by the total number of points \(n\) in point pattern \(\mathbf x\).

For a homogeneous Poisson point process of intensity \(\lambda\), the nearest-neighbour distance distribution function becomes

\[G_\text{pois}(r)=1 - \exp(-\lambda\pi r^2)\text{,}\]

where \(\lambda\) is the mean number of events per unit area. Note that this expression is identical to the empty space function \(F\) for the Poisson process.

The interpretation of \(\hat G(r)\) is the reverse of \(\hat F(r)\). Values \(\hat G(r) > G_\text{pois}(r)\) suggest that nearest neighbor distances in the point pattern are shorter than for a Poisson process, suggesting a clustered pattern, while values \(\hat G(r) < G_\text{pois}(r)\) suggest a regular pattern.

The function Gest() of the spatstat package computes estimates of \(\hat G(r)\) using several edge corrections. By default the plotted graph includes \(G_\text{pois}(r)\) for homogeneous Poisson process, \(G_\text{han}(r)\), the Hanisch estimate, \(G_\text{board}(r)\), the reduced sample estimate (border correction) and \(F_\text{km}(r)\), the Baddeley-Gill Kaplan-Meier estimate.

Let us plot the \(G\) function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and the regular point data set (ppp_regular).

plot(Gest(ppp_random), main = "independant", xlim = c(0,2000), ylim = c(0,1))
plot(Gest(ppp_locations), main = "clustered", xlim = c(0,2000), ylim = c(0,1))
plot(Gest(ppp_regular), main = "regular", xlim = c(0,2000), ylim = c(0,1))

Independent: The plot shows that the various estimates \(\hat{G}(r)\), are not too far from \(G_\text{pois}(r)\), corresponding to a homogeneous Poisson process, thus indicating spatial randomness.

Clustered: Also graph includes three estimates by default. For a clustered pattern, observed locations should be closer to each other than expected under CSR. All estimates are \(\hat{G}(r) > G_\text{pois}(r)\), thus suggesting a clustered pattern.

Regular: A regular pattern should have higher nearest-neighbour distances than expected under CSR. \(\hat{G}(r)\) is zero for \(r \le 2600\), which shows no nearest-neighbor distances shorter than 2600 m.


Pairwise distances and the \(K\) function

The observed pairwise distances \(s_{ij} = \Vert x_i-x_j\Vert\) in the data pattern \(\mathbf x\) constitute a biased sample of pairwise distances, with a bias in favour of smaller distances. For example, we can never observe a pairwise distance greater than the diameter of the window.

Ripley (1977) defined the \(K\) function for a stationary point process so that \(\lambda K(r)\) is the expected number of other points of the process within a distance \(r\) of a typical point of the process.

\[K(r)=\frac{1}{\lambda}\mathbb E[n(\mathbf X \cap b(u,r)/\{u\}\vert u \in \mathbf X)]\]

The \(K\) function for a HPP is

\[K_{\text{pois}}(r) = \pi r^2\] regardless of the intensity.

Numerous estimators of \(K\) have been proposed. Most of them are weighted and normalized empirical distribution functions of the pairwise distances, of the general form

\[\hat K(r) = \frac{1}{\hat\lambda \text{ area}(W)}\sum_i\sum_{i \ne j}\mathbf 1\{\Vert x_i-x_j \Vert \le r\}e(x_i. x_j,r)\]

where \(e(u, v, r)\) is an edge correction weight (Baddeley 2010).

We compare the estimate \(\hat K(r)\) with the Poisson \(K\) function. Values \(\hat K(r) >\pi r^2\) suggest clustering, while \(\hat K(r) <\pi r^2\) suggests a regular pattern.

In spatstat the function Kest() computes several estimates of the \(K\) function. By default the plotted graph includes \(K_\text{pois}(r)\) for homogeneous Poisson processes, \(K_\text{board}(r)\), the border-corrected estimate, \(K_\text{trans}(r)\), the translation-corrected estimate, and \(K_\text{iso}(r)\), the Ripley isotropic correction estimate.

Let us plot the \(K\) function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and for the regular point data set (ppp_regular).

plot(Kest(ppp_random), main = "independant", xlim = c(0,10000))
plot(Kest(ppp_locations), main = "clustered", xlim = c(0,10000))
plot(Kest(ppp_regular), main = "regular", xlim = c(0,10000))

Independant: The plot shows that the various estimates, \(\hat{K}(r)\), are close to \(K_\text{pois}(r)\), corresponding to a homogeneous Poisson processes and thus indicating complete spatial randomness.

Clustered: Values of \(\hat K(r) > K_{\text(pois)}(r)\) suggest clustering.

Regular: Values of \(\hat K(r) < K_{\text(pois)}(r)\) suggest a regular pattern.


The \(L\) function

A commonly-used transformation of \(K\) is the \(L\) function

\[L(r) = \sqrt{\frac{K(r)}{\pi}}\]

which transforms the Poisson \(K\) function to the straight line \(L_\text{pois}(r) = r\), making visual assessment of the graph easier. To compute the estimated \(L\) function, we may use the Lest() function from the spatstat package.

Let us plot the \(L\) function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and the regular point data set (ppp_regular).

plot(Lest(ppp_random), main = "independant", xlim = c(0,10000),ylim = c(0,15000))
plot(Lest(ppp_locations), main = "clustered", xlim = c(0,10000),ylim = c(0,15000))
plot(Lest(ppp_regular), main = "regular", xlim = c(0,10000),ylim = c(0,15000))

We apply the same logic as above, values \(\hat L(r) > L_\text{pois}(r)\) suggest clustering, while \(\hat L(r) < L_\text{pois}(r)\) suggests a regular pattern.


Other measures of point interaction

The pair correlation function

Another summary function is the pair correlation function:

\[g(r)=\frac{K'(r)}{2\pi r}\]

where \(K'(r)\) is the derivative of \(K\). The pair correlation \(g(r)\) is the probability of observing a pair of points separated by a distance \(r\), divided by the corresponding probability for a Poisson process. The value \(g(r) = 1\) corresponds to complete randomness. For other processes, values \(g(r) > 1\) suggest clustering or attraction at a distance \(r\), while values \(g(r) < 1\) suggest inhibition or regularity.

To compute the estimated pair correlation function, use pcf(). Let us plot the pair correlation function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and the regular point data set (ppp_regular).

plot(pcf(ppp_random), main = "regular", xlim = c(0,10000), ylim = c(0,15))
plot(pcf(ppp_locations), main = "clustered", xlim = c(0,10000), ylim = c(0,15))
plot(pcf(ppp_regular), main = "regular", xlim = c(0,10000), ylim = c(0,15))


The \(J\) function

The \(J\) function is a combination of the \(F\) and \(G\) the function.

\[J(r) = \frac{1-G(r)}{1-F(r)}\]

defined for all \(r \ge 0\) such that \(F(r) < 1\). For a homogeneous Poisson process, \(F_\text{pois} = G_\text{pois}\), so that

\[J_\text{pois}(r) \equiv 1\]

Values \(J(r) > 1\) suggest regularity, and \(J(r) < 1\) suggest clustering.

The \(J\) function is computed by Jest(). Let us plot \(J\) function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and for the regular point data set (ppp_regular).

plot(Jest(ppp_random), main = "independant")
plot(Jest(ppp_locations), main = "clustered")
plot(Jest(ppp_regular), main = "regular")


The allstats() function

Note that the spatstat package offers the convenient function allstats() to compute the \(F\), \(G\), \(J\) and \(K\) functions for a data set all at once.

Let us plot the allstats() function for the random point data set (ppp_random), for the clustered point data set (ppp_locations), and for the regular point data set (ppp_regular).

plot(allstats(ppp_random), main = "independant")
plot(allstats(ppp_locations), main = "clustered")
plot(allstats(ppp_regular), main = "regular")

Exercise: Consider the lansing data set, which includes the locations and botanical classification of trees in the Lansing Woods. Subset the data such that only the maple trees are included and apply the allstats() function. Do maple trees appear to grow clustered, independently or spaced regularly?

data("lansing")
lansing
## Marked planar point pattern: 2251 points
## Multitype, with levels = blackoak, hickory, maple, misc, redoak, whiteoak 
## window: rectangle = [0, 1] x [0, 1] units (one unit = 924 feet)
## Your code here...
Show code
maple <- lansing[lansing$marks == "maple", ]
maple$marks <- droplevels(maple$marks)
maple <- unmark(maple)
plot(allstats(maple))



Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.