In the previous sections we examined several distance measures to judge whether a point pattern data set is completely random. The general procedure was to compare the observed point pattern with a completely random pattern, the Poisson point process.

Based on the graphical representation, it was fairly easy to distinguish between the clustered data set (ppp_locations) and the theoretical Poisson process, as well as between the regular data set (ppp.regular) and the theoretical Poisson process. However, in some cases even the random data point data set (ppp_random) shows large discrepancies from the theoretical Poisson process. Because of random variability the random point data set did not appear like a completely independent pattern.

We load the required data sets and packages.

library(spatstat)
load(url("https://userpage.fu-berlin.de/soga/data/r-data/ppp_data_berlin.RData"))

Monte Carlo tests

In order to assess if an observed point pattern is significantly different from a Poisson process, we may use the language of hypothesis testing. The null hypothesis \((H_0)\) is that the data point pattern is a realization of complete spatial randomness. The alternative hypothesis \((H_A)\) is that the data pattern is a realization of another, unspecified point process.

In order to test the hypothesis we apply a randomization test based on simulations, denoted as Monte Carlo test.

Consider one of the distance measure functions we discussed in the previous section, e.g. the \(K\) function. We generate \(M\) independent simulations of Complete Spatial Randomness (CSR) inside the study region \(W\). We compute the estimated \(K\) functions for each of these realizations, \(\hat K^{(j)}(r)\) for \(j = 1,2,3,...M\), and we obtain the point-wise upper and lower envelopes of these simulated curves, given by

\[L(r) = \min_j \hat K^{(j)}(r)\] \[L(r) = \max_j \hat K^{(j)}(r).\]

For any fixed value of \(r\), if the data came from a uniform Poisson process, the probability of \(\hat K(r)\) lying outside the interval \([L(r),U(r)]\) is \(2/(M + 1)\). Instead of the point-wise maximum and minimum, one could use the point-wise order statistics (the point-wise kth largest and kth smallest values) giving a test of exact size \(2k/(M + 1)\).


Simultaneous Monte Carlo tests

In spatstat we can perform a Monte Carlo test by applying the envelope() function. In order to compute the point-wise envelopes we need to provide an object containing point pattern data (Y), a function that computes the desired summary statistic for a point pattern (fun), the number of simulated point patterns to be generated when computing the envelopes (nsim) and the rank of the envelope value (nrank). A rank of 1 means that the minimum and maximum simulated values will be used.

Let us apply the envelope() function, using Kest() as our desired summary statistic, on the random point data set (ppp_random). First we make sure that we set the right rank. Recall, that we can design the size of the envelope by the equation

\[2k/(M + 1)\]

where \(M\) is the number of simulations and \(k\) is the rank. Consider that we want the envelopes to correspond to the critical value of 5%, then

\[2k/(M+1) = 2\cdot1/(39+1) = 0.05\]

M <- 39
k <- 1
2 * k / (M + 1)
## [1] 0.05

Note that the simulation envelopes are no confidence intervals, they are the critical values for a test of the hypothesis that the process is of CSR (Baddeley 2010). The width of the envelopes reflect the variability of the process under the null hypothesis of CSR.

E <- envelope(Y = ppp_random, fun = Kest, nsim = 39, nrank = 1)
## Generating 39 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,  39.
## 
## Done.
E
## Pointwise critical envelopes for K(r)
## and observed value for 'ppp_random'
## Edge correction: "iso"
## Obtained from 39 simulations of CSR
## Alternative: two.sided
## Significance level of pointwise Monte Carlo test: 2/40 = 0.05
## .....................................................................
##      Math.label     Description                                      
## r    r              distance argument r                              
## obs  hat(K)[obs](r) observed value of K(r) for data pattern          
## theo K[theo](r)     theoretical value of K(r) for CSR                
## lo   hat(K)[lo](r)  lower pointwise envelope of K(r) from simulations
## hi   hat(K)[hi](r)  upper pointwise envelope of K(r) from simulations
## .....................................................................
## Default plot formula:  .~r
## where "." stands for 'obs', 'theo', 'hi', 'lo'
## Columns 'lo' and 'hi' will be plotted as shading (by default)
## Recommended range of argument r: [0, 9185.9]
## Available range of argument r: [0, 9185.9]
## Unit of length: 1 meters

Once we have created our object we may plot it.

plot(E, main = "Pointwise Envelopes")

Please note that by default the envelop() function computes the point-wise critical bands. Simultaneous critical bands can be computed by setting the flag global = TRUE.

If so, we compute for each estimate \(\hat K(r)\), its maximum deviation from the Poisson \(K\) function, \(D = \max_r \vert \hat K(r) - K_{pois}(r) \vert\), and we obtain for each of the \(M\) simulated data sets the maximum value \(D_{max}\). Then the upper and lower limits are

\[L(r) = \pi r^2 - D_{max}\] \[U(r) = \pi r^2 + D_{max}\]

Under \(H_0\) this occurs with probability \(k/(M + 1)\). Thus, critical bands of \(5\)% are constructed by for instance taking \(M = 19\) and \(k=1\).

E <- envelope(ppp_random, Kest, nsim = 19, nrank = 1, global = TRUE)
## Generating 19 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,  19.
## 
## Done.
plot(E, main = "Global Envelopes")


The envelope() function is flexible and can be used to compute Monte Carlo envelopes for other types of functions as well.

Exercise: Apply the envelop() function on the random point data set ppp_random to compute the simultaneous critical bands of 10%, using the \(G\) function as summary statistics. Base your analysis on more then 20 simulations.

## Your code here...
Show code
M <- 79
k <- 4
2 * k / (M + 1)
## [1] 0.1
E <- envelope(ppp_random, Gest, nsim = M, nrank = k, global = TRUE)
## Generating 79 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,  79.
## 
## Done.
plot(E, main = "Global Envelopes")

Exercise: Apply the envelop() function on the random point data set ppp_random to compute the simultaneous critical bands of 1%, using the \(F\) function as summary statistics. Base your analysis on more then 20 simulations.

## Your code here...
Show code
M <- 199
k <- 1
2 * k / (M + 1)
## [1] 0.01
E <- envelope(ppp_locations, Fest, nsim = M, nrank = k, global = TRUE)
## Generating 199 simulations of CSR  ...
## 1, 2, 3, 4.6.8.10.12.14.16.18.20.22.24.26.28.30.32.34.36.38.40
## .42.44.46.48.50.52.54.56.58.60.62.64.66.68.70.72.74.76.78.80
## .82.84.86.88.90.92.94.96.98.100.102.104.106.108.110.112.114.116.118.120
## .122.124.126.128.130.132.134.136.138.140.142.144.146.148.150.152.154.156.158.160
## .162.164.166.168.170.172.174.176.178.180.182.184.186.188.190.192.194.196.198 199.
## 
## Done.
plot(E, main = "Global Envelopes")

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.