Intensity is the average density of points or, in other words, the expected number of points per unit area. The intensity may be constant (uniform) or may vary from location to location (non-uniform or inhomogeneous).

One approach to evaluate the intensity is to divide the study area into quadrats and count the numbers of points falling into each quadrat. If the points have uniform intensity, and are completely random, then the quadrat counts should be Poisson random numbers with constant mean. We may test that hypothesis by using the \(\chi^2\) goodness-of-fit test statistic

\[\chi^2 = \sum\frac{(\text{observed}-\text{expected})^2}{\text{expected}}.\]

In R, we use the quadratcount() and quadrat.test() functions of the spatstat package. First, we load the Berlin city data sets we created in the previous section.

load(url("https://userpage.fu-berlin.de/soga/data/r-data/ppp_data_berlin.RData"))
load(url("https://userpage.fu-berlin.de/soga/data/r-data/berlin_district.RData"))
library(spatstat)

par(mar = c(0, 0, 1.5, 1.5),
    cex.main = 2)

qc_locations <- quadratcount(ppp_locations, nx = 3, ny = 4)
plot(ppp_locations, pch = 3, cex = 0.6)
plot(qc_locations, add = TRUE, textargs = list(col = "red"))

quadrat.test(qc_locations)
## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  
## X2 = 1659.4, df = 10, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## Quadrats: 11 tiles (irregular windows)
## Tessellation is marked

The very low p-value indicates that the intensity is non-constant. Such a point process is referred to as an inhomogeneous Poisson process (IPP), and its intensity can be estimated non-parametrically, e.g. with kernel smoothing (Bivand et al. 2008).

Let us repeat this approach for the ppp_random data set.

Exercise: Evaluate the intensity of the data set ppp_random by dividing the area into quadrats, counting the points and performing the \(\chi^2\)-Test. Which kind of Poisson point process do you expect?

## Your code here...
qc_random <- NULL
Show code
## configurate figure extends accordingly
par(mar = c(0, 0, 1.5, 1.5),
    cex.main = 2)

## create ppp object
qc_random <- quadratcount(ppp_random, nx = 3, ny = 4)
plot(ppp_random, pch = 3, cex = 0.6)
plot(qc_random, add = TRUE, textargs = list(col = "red"))

## perform Chi-squared test
quadrat.test(qc_random)
## Warning: Some expected counts are small; chi^2 approximation may be inaccurate
## 
##  Chi-squared test of CSR using quadrat counts
## 
## data:  
## X2 = 12.402, df = 10, p-value = 0.5181
## alternative hypothesis: two.sided
## 
## Quadrats: 11 tiles (irregular windows)
## Tessellation is marked

The high p-value \((p > 0.05)\) indicates a homogeneous Poisson process (HPP).


The intensity of homogeneous Poisson processes is constant with \(\lambda(u) = \lambda = \frac{n}{\vert A \vert}\), where \(n\) is the number of points observed in region \(A\) and \(\vert A \vert\) is the area of region \(A\).


Kernel density estimate

There are different ways to estimate the intensity of an IPP. One of them is a non-parametrically estimation by means of kernel smoothing. If we observe \(n\) points \(\{x_i\}_{i=1}^n\), then the form of a kernel smoothing estimator is

\[\hat{\lambda}(x) = \frac{1}{h^2}\sum_{i=1}^n\frac{\kappa(\frac{\Vert x-x_i\Vert}{h})}{q(\Vert x \Vert)}\text{,} \]

where \(x_i \in \{x_1, x_2,..., x_n\}\) is the observed point, \(h\) is the bandwidth, \(q(\Vert x \Vert)\) is a border correction to compensate for observations missing due to edge effects, and \(\kappa(u)\) is a bivariate and symmetrical kernel function.

R currently implements a two-dimensional quartic kernel function:

\[ \kappa(u) = \begin{cases} \frac{3}{\pi}(1-\Vert u \Vert^2)^2 & \text{if } u \in (-1, 1) \\ 0 & \text{otherwise} \end{cases} \]

where \(\Vert u \Vert^2=u_1^2+u_2^2\) is the squared norm of point \(u=(u_1, u_2)\).

There is no general rule for selecting the bandwidth \(h\), which governs the level of smoothing. Small bandwidths result in spikier maps, and large bandwidths result in smoother maps. It is reasonable to use several values depending on the process under consideration and then to choose a value that seems plausible (Bivand et al. 2008).


Kernel density plot (2-D)

A kernel density plot is the graphical representation of kernel smoothing. Several functions are available in R and contributed packages to visualise kernel density plots. Here, we use the density.ppp() function from the spatstat package.

The density.ppp() function estimates the intensity function of the point process that generated the point pattern data. The argument sigma is taken as the standard deviation of the isotropic Gaussian kernel, corresponding to the bandwidth parameter. A small standard deviation results in a spikier density plot and a large standard deviation results in a smoother density plot.

We code a loop to apply the density.ppp() function on our three data sets ppp_locations, ppp_regular and ppp_random with three different values for sigma \((1000, 2500, 5000)\), respectively.

par(mfrow = c(3, 3), mar = c(0, 0, 1, 2))
sigma <- c(1000, 2500, 5000)
data <- list(ppp_locations, ppp_regular, ppp_random)
main <- c("Locations", "Regular", "Random")
for (i in 1:3) {
  for (j in 1:3) {
    ds <- density.ppp(data[[i]], sigma = sigma[j])
    plot(ds,
      main = paste0(main[i], ", sigma: ", sigma[j])
    )
    plot(data[[i]], add = TRUE, cex = 0.01, regular = FALSE)
  }
}


Likelihood methods for modelling intensity

Alternatively, to the non-parametrically estimation of the intensity by means of kernel smoothing we may propose a specific function for the intensity whose parameters are estimated by maximizing the likelihood of the point process (Baddeley 2010).

The log-likelihood for the homogeneous Poisson process with intensity \(\lambda\) is

\[\log L(\lambda; x) = n(x)\log \lambda-\lambda \text{ area}(W)\] where \(n(x)\) is the number of points in the data set \(x\). The maximum likelihood estimator of \(\lambda\) is \[\hat{\lambda} = \frac{n(x)}{\text{area}(W)}\] The variance of \(\hat{\lambda}\) is

\[\text{var}[\hat{\lambda}] = \frac{\lambda}{\text{area}(W)}\]

For an inhomogeneous Poisson process with intensity function \(\lambda_\theta(u)\) depending on a parameter \(\theta\), the log-likelihood for \(\theta\) is

\[\log L(\theta;x) = \sum_{i=1}^n \log \lambda_\theta(x_i)- \int_W \lambda_\theta(u)du\;,\]

where \(\int_W \lambda_\theta(u)du\) is the expected number of cases of the IPP with intensity \(\lambda_\theta(u)\) in region \(W\).

The maximum likelihood estimation \(\text{MLE } \hat{\theta}\) is not analytically tractable, so it must be computed using numerical algorithms such as Newton’s method (Baddeley 2010).


Fitting Poisson processes in spatstat

By fitting a model, we need to describe how the point pattern intensity \(\lambda(u)\) depends on the spatial location \(u\) or spatial covariates \(Z(u)\). The intensity function \(\lambda_\theta(u)\) is log-linear in the parameter \(\theta\):

\[\log\lambda_\theta(u)= \theta \cdot S(u)\;,\] where \(S(u)\) is a real-valued or vector-valued function of location \(u\). \(S(u)\) could be a function of the spatial coordinates of \(u\), or an observed covariate \(Z(u)\), or a mixture of both.

The spatstat package provides the ppm() function (point process model) to fit a point process model to an observed point pattern. The first argument in the ppm() function is a formula in the R language describing the spatial trend model to be fitted. It has the general form pattern ~ trend, where the left-hand side, pattern, is usually the name of a spatial point pattern to which the model should be fitted, and the right-hand side, trend, is an expression specifying the spatial trend of the model.

The table outlines the basic formulations of a point process model.

\[ \begin{array}{ll} \mathtt{R} & \text{Intensity} \\ \hline \mathtt{ppm(P, \tilde{}1)} & \log \lambda((x,y)) = \theta_0\\ \mathtt{ppm(P, \tilde{}x)} & \log \lambda((x,y)) = \theta_0+\theta_1x\\ \mathtt{ppm(P, \tilde{}x+y)} & \log \lambda((x,y)) = \beta_0+\theta_1x+\theta_2y\\ \mathtt{ppm(P, \tilde{}polynom(x,y,3))} & \text{3rd order polynomial in } x \text{ and } y\\ \mathtt{ppm(P, \tilde{}I(y > 23))} & \text{different constants above and below the line } y = 23\\ \end{array} \]

The ppm() function documentation is quite extensive: Type help(ppm) into your console for further information.

Now let us fit some models.

The base model

A homogeneous Poisson model, e.g. \(\lambda_\theta((x,y)) = \exp(\theta_0)\) can be fitted by

base_loc <- ppm(ppp_locations ~1)
base_loc
## Stationary Poisson process
## Fitted to point pattern dataset 'ppp_locations'
## Intensity: 1.04584e-06
##              Estimate     S.E.   CI95.lo   CI95.hi Ztest      Zval
## log(lambda) -13.77069 0.032721 -13.83482 -13.70656   *** -420.8518

The log-linear model

An inhomogeneous Poisson model with an intensity that is log-linear, the means linear in the Cartesian coordinates, e.g. \(\lambda_\theta((x, y)) = \exp(\theta_0 + \theta_1x + \theta_2y)\), is fitted by

loglin_loc <- ppm(ppp_locations ~ x + y)

We can use the coef() function to extract the model coefficients.

coef(loglin_loc)
##   (Intercept)             x             y 
## -1.493209e+02  2.852400e-06  2.287357e-05

This specific model can be written in the form

\(\lambda_\theta((x, y)) = \exp(-149.3209128 + 2.8524004\times 10^{-6}x + 2.2873568\times 10^{-5}y)\)


The log-quadratic model

An inhomogeneous Poisson model with an intensity that is log-quadratic, that means it is quadratic in the Cartesian coordinates \(x\) and \(y\); e.g. \(\lambda_\theta((x, y)) = \exp(\theta_0 + \theta_1x + \theta_2y+ \theta_3x^2 + \theta_4y^2 +\theta_5x y)\), is fitted by

logquad_loc <- ppm(ppp_locations ~ polynom(x, y, 2))
coef(logquad_loc)
##   (Intercept)             x             y        I(x^2)      I(x * y) 
## -1.379287e+06  8.132716e-02  4.622414e-01 -2.583714e-08 -6.873808e-09 
##        I(y^2) 
## -3.919100e-08

So far, we fitted three different models for the ppp_locations data set. For the purpose of comparison, repeat this procedure and fit the same models on the random point data set ppp_random.

Exercise: Fit the previous three models on the random point data set ppp_random and compare them with the models for the ppp_locations data set.

## Your code here...

base_rand <- NULL
loglin_rand <- NULL
logquad_rand <- NULL
Show code
base_rand <- ppm(ppp_random ~1)
base_rand
## Stationary Poisson process
## Fitted to point pattern dataset 'ppp_random'
## Intensity: 2.138709e-07
##              Estimate       S.E.   CI95.lo   CI95.hi Ztest      Zval
## log(lambda) -15.35789 0.07235746 -15.49971 -15.21608   *** -212.2503
loglin_rand <- ppm(ppp_random ~ x + y)
coef(loglin_rand)
##   (Intercept)             x             y 
## -1.230230e+02 -1.440396e-06  1.867609e-05
logquad_rand <- ppm(ppp_random ~ polynom(x, y, 2))
coef(logquad_rand)
##   (Intercept)             x             y        I(x^2)      I(x * y) 
## -1.621871e+04  7.177928e-03  4.559182e-03  8.747276e-11 -1.256704e-09 
##        I(y^2) 
## -3.034431e-10



Model prediction

Now that we fitted a bunch of models on two different data sets, we use these models for prediction.

The value returned by the model-fitting function ppm is an object of class ppm representing the fitted model, analogous to other model objects in R. Thus, standard operations such as plot, predict or coef can be applied.

We combine the plot() command with the predict() command to visualize the log-linear and the log-quadratic models for the ppp_random data set. We add the point coordinates to enhance the visualization.

par(mar = c(0, 0, 1.5, 1.5),
    mfrow = c(1, 2),
    cex.main = 2)

plot(predict(loglin_rand), main = "log-linear")
points(ppp_random, pch = 16, cex = 0.2)

plot(predict(logquad_rand), main = "log-quadratic")
points(ppp_random, pch = 16, cex = 0.2)