In the previous sections, we examined several distance measures to assess whether a point pattern data set is completely random. The general approach was to compare the observed point pattern with a completely random pattern, the Poisson point process.

Based on the graphical representation, it was relatively easy to distinguish between the clustered data set (ppp_locations) and the theoretical Poisson process, as well as between the regular data set (ppp.regular) and the theoretical Poisson process. However, in some cases, even the random point pattern data set (ppp_random) showed large discrepancies from the theoretical Poisson process. Due to random variability, the random point data set did not appear like a completely independent pattern. That means that even a truly random process can sometimes look clustered, just by chance. That’s why we simulate many random patterns and check whether our observed pattern behaves similarly, or stands out.

We load the required data sets and packages.

library(spatstat.geom)
library(spatstat.explore)
load(url("https://userpage.fu-berlin.de/soga/data/r-data/ppp_data_berlin.RData"))

Monte Carlo tests

In order to assess if an observed point pattern is significantly different from a Poisson process, we use hypothesis testing. The null hypothesis \((H_0)\) assumes complete spatial randomness (CSR), and the alternative \((H_A)\) suggests some kind of structure, such as clustering or regularity.

To make this more intuitive, imagine analyzing the locations of trees in a forest. Are the trees randomly scattered, or do they tend to grow in groups (clusters)? To answer such questions, we use a simulation-based approach known as a Monte Carlo test, which uses many random simulations to check whether our observed pattern looks different from what we’d expect if the points were placed completely at random (CSR).

Consider one of the distance measure functions we discussed in the previous section, e.g. the \(K\) function. We generate \(M\) independent simulations of Complete Spatial Randomness (CSR) inside the study region \(W\). We compute the estimated \(K\) functions for each of these realizations, \(\hat K^{(j)}(r)\) for \(j = 1,2,3,...M\), and we obtain the point-wise upper and lower envelopes of these simulated curves, given by

\[L(r) = \min_j \hat K^{(j)}(r)\] \[L(r) = \max_j \hat K^{(j)}(r).\]

For any fixed value of \(r\), if the data came from a uniform Poisson process, the probability of \(\hat K(r)\) lying outside the interval \([L(r),U(r)]\) is \(2/(M + 1)\). Instead of the point-wise maximum and minimum, one could use the point-wise order statistics (the point-wise kth largest and kth smallest values) giving a test of exact size \(2k/(M + 1)\).


Simultaneous Monte Carlo tests

In newer versions of R, the original spatstat package has been split into several sub-packages. To perform a Monte Carlo test, we now use envelope() from the spatstat.explore package. In order to compute the point-wise envelopes we need to provide an object containing point pattern data (Y), a function that computes the desired summary statistic for a point pattern (fun), the number of simulated point patterns to be generated when computing the envelopes (nsim) and the rank of the envelope value (nrank). A rank of 1 means that the minimum and maximum simulated values will be used.

Let us apply the envelope() function using Kest() as our desired summary statistic, on the random point data set (ppp_random). First, we make sure that we set the right rank. Recall that we can design the size of the envelope by the equation

\[2k/(M + 1)\]

where \(M\) is the number of simulations and \(k\) is the rank. Consider that we want the envelopes to correspond to the critical value of 5%, then

\[2k/(M+1) = 2\cdot1/(39+1) = 0.05\]

M <- 39
k <- 1
2 * k / (M + 1)
## [1] 0.05

Note that simulation envelopes are not confidence intervals, they are the critical values for a test of the hypothesis that the process is of CSR (Baddeley 2010). The width of the envelopes reflect the variability of the process under the null hypothesis of CSR.

E <- envelope(Y = ppp_random, fun = Kest, nsim = 39, nrank = 1)
## Generating 39 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
## 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 
## 39.
## 
## Done.
E
## Pointwise critical envelopes for K(r)
## and observed value for 'ppp_random'
## Edge correction: "iso"
## Obtained from 39 simulations of CSR
## Alternative: two.sided
## Significance level of pointwise Monte Carlo test: 2/40 = 0.05
## .....................................................................
##      Math.label     Description                                      
## r    r              distance argument r                              
## obs  hat(K)[obs](r) observed value of K(r) for data pattern          
## theo K[theo](r)     theoretical value of K(r) for CSR                
## lo   hat(K)[lo](r)  lower pointwise envelope of K(r) from simulations
## hi   hat(K)[hi](r)  upper pointwise envelope of K(r) from simulations
## .....................................................................
## Default plot formula:  .~r
## where "." stands for 'obs', 'theo', 'hi', 'lo'
## Columns 'lo' and 'hi' will be plotted as shading (by default)
## Recommended range of argument r: [0, 9185.9]
## Available range of argument r: [0, 9185.9]
## Unit of length: 1 meters

Once we have created our object, we may plot it.

plot(E, main = "Pointwise Envelopes")

Please note that by default the envelop() function computes the point-wise critical bands. Simultaneous critical bands can be computed by setting the flag global = TRUE.

If so, we compute for each estimate \(\hat K(r)\), its maximum deviation from the Poisson \(K\) function, \(D = \max_r \vert \hat K(r) - K_{pois}(r) \vert\), and we obtain for each of the \(M\) simulated data sets the maximum value \(D_{max}\). Then the upper and lower limits are

\[L(r) = \pi r^2 - D_{max}\] \[U(r) = \pi r^2 + D_{max}\]

Under \(H_0\), this occurs with probability \(k/(M + 1)\). Thus, critical bands of \(5\)% are constructed by for instance taking \(M = 19\) and \(k=1\).

E <- envelope(ppp_random, Kest, nsim = 19, nrank = 1, global = TRUE)
## Generating 19 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 
## 19.
## 
## Done.
plot(E, main = "Global Envelopes")


The envelope() function is flexible and can be used to compute Monte Carlo envelopes for other types of functions as well.

Exercise: Apply the envelop() function on the random point data set ppp_random to compute the simultaneous critical bands of 10%, using the \(G\) function as summary statistics. Base your analysis on more than 20 simulations.

## Your code here...
Show code
M <- 79
k <- 4
2 * k / (M + 1)
## [1] 0.1
E <- envelope(ppp_random, Gest, nsim = M, nrank = k, global = TRUE)
## Generating 79 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
## 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
## 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
## 73, 74, 75, 76, 77, 78, 
## 79.
## 
## Done.
plot(E, main = "Global Envelopes")

Exercise: Apply the envelope() function on the random point data set ppp_random to compute the simultaneous critical bands of 1%, using the \(F\) function as the summary statistic. Base your analysis on more than 20 simulations.

## Your code here...
Show code
M <- 199
k <- 1
2 * k / (M + 1)
## [1] 0.01
E <- envelope(ppp_locations, Fest, nsim = M, nrank = k, global = TRUE)
## Generating 199 simulations of CSR  ...
## 1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34.36.38.40.42
## .44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74.76.78.80.82.84.86.88.90
## .92.94.96.98.100.102.104.106.108.110.112.114.116.118.120.122.124.126.128.130.132.134.136.138
## .140.142.144.146.148.150.152.154.156.158.160.162.164.166.168.170.172.174.176.178.180.182.184.186
## .188.190.192.194.196.198
## 199.
## 
## Done.
plot(E, main = "Global Envelopes")

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.