Spatial statistics on point patterns are often concerned with determining how regular a distribution of points is, for example whether the points are clustered or evenly spread. Questions such as these relate to the intensity or dispersion of the point pattern as a whole.

**Intensity** is the average density of points or in other words the expected number of points per unit area. The intensity may be constant (*uniform*) or may vary from location to location (*non-uniform* or *inhomogeneous*).

One approach to evaluating the intensity is to divide the study area into quadrats and count the number of points falling into each quadrat. If the points have uniform intensity and are completely random, then the quadrat counts should be Poisson random numbers with constant mean. We may test that hypothesis using the $\chi^2$ goodness-of-fit test statistic

$$\chi^2 = \sum\frac{(\text{observed}-\text{expected})^2}{\text{expected}}$$

In Python we make use of the `pointpats.quadrat_statistics` module, including the `qs.QStatistic()` function. First, we load the *Berlin city data sets* we created in the previous section.

In [2]:

```
# First, let's import the needed libraries.
import matplotlib.pyplot as plt
import numpy as np
import random
import pandas as pd
import geopandas as gpd
import folium
```

In [3]:

```
# load Berlin shapefile
berlin_district = gpd.read_file(
    "http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_district.zip"
)
# load data from the previous section
berlin_locations = gpd.read_file(
    "http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_locations.geojson"
)
berlin_random = gpd.read_file(
    "http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_random.geojson"
)
berlin_regular = gpd.read_file(
    "http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_regular.geojson"
)
```
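Before turning to the Berlin data, the quadrat-count logic can be sketched in a self-contained way: simulate a completely random pattern on the unit square, count points in a 3x3 grid, and apply the $\chi^2$ goodness-of-fit test with `scipy.stats.chisquare`. All numbers here are illustrative; the actual analysis below uses `pointpats` instead.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(42)

# simulate a completely random point pattern on the unit square
n = 900
x = rng.uniform(0, 1, n)
y = rng.uniform(0, 1, n)

# count points per quadrat on a 3x3 grid
counts, _, _ = np.histogram2d(x, y, bins=3, range=[[0, 1], [0, 1]])
observed = counts.ravel()

# under complete spatial randomness every quadrat has the same
# expected count, n / 9; a uniform expectation is chisquare's default
result = chisquare(observed)
print(observed)
print(f"chi2 = {result.statistic:.2f}, p-value = {result.pvalue:.3f}")
```

For a simulated random pattern we would typically see a large *p*-value here, i.e., no evidence against uniform intensity.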

**Quadrat statistics**

The first set of techniques, quadrat statistics, takes its name from its approach of splitting the study area into small regions (quadrats). Once created, these “buckets” are used to examine the uniformity of counts across them. We employ the `pointpats.quadrat_statistics` module for this type of analysis.

See the `pointpats` documentation for details.

In [4]:

```
import pointpats.quadrat_statistics as qs
```

For the `qs.QStatistic()` function, the coordinates must be provided as an array.

In [5]:

```
## reshape coordinates to the right input format ([x1, y1], [x2, y2], ..., [xn, yn])
## extract coordinates from GeoPandas as array
coord_list_loc = np.array(
    [(x, y) for x, y in zip(berlin_locations.geometry.x, berlin_locations.geometry.y)]
)
coord_list_ran = np.array(
    [(x, y) for x, y in zip(berlin_random.geometry.x, berlin_random.geometry.y)]
)
coord_list_reg = np.array(
    [(x, y) for x, y in zip(berlin_regular.geometry.x, berlin_regular.geometry.y)]
)
```
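Equivalently, `np.column_stack` produces the same `[[x1, y1], [x2, y2], ...]` layout with one row per point; a minimal sketch with synthetic coordinates standing in for the GeoPandas geometry columns:

```python
import numpy as np

# synthetic coordinates standing in for geometry.x and geometry.y
xs = [390000.0, 391500.0, 392250.0]
ys = [5820000.0, 5821000.0, 5819500.0]

coords = np.column_stack([xs, ys])
print(coords.shape)  # one row per point: (3, 2)
```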

We can visualize the result using the `QStatistic.plot()` method.

In [6]:

```
qstat = qs.QStatistic(coord_list_ran)
qstat.plot()
```

By default, the `qs.QStatistic()` function uses a 3x3 grid. We see that the total count of points per quadrat varies a lot. The $\chi^2$ statistic tests how likely this distribution is if the cell counts are uniform, so we expect the test to be statistically significant, indicated by a very small *p*-value:

In [7]:

```
print(f"chi2 ={qstat.chi2}, df ={qstat.df}, p-value ={qstat.chi2_pvalue} ")
```

chi2 =39.6822429906542, df =8, p-value =3.6711911449602187e-06

In contrast, our regular point process has nearly the same number of points in each quadrat:

In [8]:

```
qstat = qs.QStatistic(coord_list_reg)
qstat.plot()
```

In [9]:

```
print(f"chi2 ={qstat.chi2}, df ={qstat.df}, p-value ={qstat.chi2_pvalue} ")
```

chi2 =8.666666666666668, df =8, p-value =0.37119089551026835

The high *p*-value $(p > 0.05)$ indicates a homogeneous Poisson process (HPP), where the intensity is a constant $\lambda(u) = \lambda = \frac{n}{\vert A \vert}$, with $n$ being the number of points observed in region $A$, and $\vert A \vert$ being the area of region $A$.
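For a homogeneous process the intensity estimate is simple arithmetic; a sketch with made-up numbers (the actual point count comes from the data, and the area of Berlin, roughly 892 km², is used here only for illustration):

```python
n = 2000          # hypothetical number of observed points
area_km2 = 891.7  # approximate area of Berlin in km^2 (illustrative)

# constant intensity: expected number of points per km^2
lam = n / area_km2
print(f"lambda = {lam:.3f} points per km^2")
```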

Let us repeat that once more for the `berlin_locations` data set, which seems to be highly clustered.

In [10]:

```
qstat = qs.QStatistic(coord_list_loc)
qstat.plot()
```

In [11]:

```
print(f"chi2 ={qstat.chi2}, df ={qstat.df}, p-value ={qstat.chi2_pvalue} ")
```

chi2 =3997.276414087514, df =8, p-value =0.0

The vanishing *p*-value indicates that the intensity is non-constant. Such a point process is referred to as an inhomogeneous Poisson process (IPP), and its intensity can be estimated non-parametrically, e.g., with kernel smoothing (Bivand et al. 2008).

There are different ways to estimate the intensity of an IPP. One of them is non-parametric estimation by means of kernel smoothing. If we observe $n$ points $\{x_i\}_{i=1}^n$, then the kernel smoothing estimator takes the form

$$\hat{\lambda}(x) = \frac{1}{h^2}\sum_{i=1}^n\frac{\kappa(\frac{\Vert x-x_i\Vert}{h})}{q(\Vert x \Vert)}\text{,}$$

where $x_i \in \{x_1, x_2, \ldots, x_n\}$ is an observed point, $h$ is the bandwidth, $q(\Vert x \Vert)$ is a border correction to compensate for observations missed due to edge effects, and $\kappa(u)$ is a bivariate, symmetric kernel function.

R, for example, implements a two-dimensional quartic kernel function:

$$ \kappa(u) = \begin{cases} \frac{3}{\pi}(1-\Vert u \Vert^2)^2 & \text{if } \Vert u \Vert < 1 \\ 0 & \text{otherwise} \end{cases} $$

where $\Vert u \Vert^2 = u_1^2 + u_2^2$ is the squared norm of point $u = (u_1, u_2)$.
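A minimal NumPy sketch of this estimator, using the quartic kernel and ignoring the border correction (i.e., setting $q \equiv 1$, which is reasonable far from the study-area boundary); the toy points are made up for illustration:

```python
import numpy as np

def quartic_kernel(u_norm):
    """Bivariate quartic kernel, evaluated at the norm ||u||."""
    return np.where(u_norm < 1, (3 / np.pi) * (1 - u_norm**2) ** 2, 0.0)

def intensity_estimate(x, points, h):
    """Kernel intensity estimate at location x (no edge correction)."""
    dists = np.linalg.norm(points - x, axis=1)
    return np.sum(quartic_kernel(dists / h)) / h**2

# toy pattern: a tight cluster near the origin plus one distant point
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])

at_cluster = intensity_estimate(np.array([0.0, 0.0]), pts, h=1.0)
at_outlier = intensity_estimate(np.array([5.0, 5.0]), pts, h=1.0)
print(at_cluster, at_outlier)  # the estimate is higher inside the cluster
```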

There is no general rule for selecting the bandwidth $h$, which governs the level of smoothing. Small bandwidths result in spikier maps and large bandwidths in smoother maps. It is reasonable to try several values depending on the process under consideration, and choose one that seems plausible (Bivand et al. 2008).

A kernel density plot is the graphical representation of kernel smoothing. In Python, the `seaborn` package provides a nice implementation for visualizing spatial kernel density plots. Here we use the `kdeplot()` function.

The `seaborn.kdeplot()` function provides an estimate of the intensity function of the point process that generated the point pattern data. The argument `bw_method` determines the smoothing bandwidth; we will pass this argument as `sigma`. A scalar `bw_method` scales the standard deviation of the isotropic Gaussian kernel and thus plays the role of the bandwidth parameter: a small value results in a spikier density plot and a large value in a smoother one.

We code a loop to apply the `sns.kdeplot()` function to our three data sets, `berlin_locations`, `berlin_regular`, and `berlin_random`, with three different values for sigma $(0.2, 0.4, 0.9)$.

In [12]:

```
import seaborn as sns
```

In [13]:

```
berlin_locations.name
```

Out[13]:

```
0                  Albert's
1            Bellboy Berlin
2                Newton Bar
3        No Limit Shishabar
4                en passant
               ...
932              uni keller
933       Zur Schmalzstulle
934        spindler & klatt
935                    800A
936              Bon Vivant
Name: name, Length: 937, dtype: object
```

In [16]:

```
# set up figure and axes: one row per data set, one column per sigma
f, axs = plt.subplots(3, 3, figsize=(14, 10))
axs = axs.ravel()
sigma = [0.2, 0.4, 0.9]
plotaxis = [0, 3, 6]
# labels in the same order as the data sets below
main = ["Locations", "Random", "Regular"]
for i, name, my_df in zip(
    plotaxis, main, [berlin_locations, berlin_random, berlin_regular]
):
    for ax, s in zip(range(0, 3), sigma):
        sns.kdeplot(
            x=my_df.geometry.x.values,
            y=my_df.geometry.y.values,
            levels=30,
            bw_method=s,
            fill=True,
            alpha=0.6,
            cmap="viridis_r",
            ax=axs[ax + i],
            legend=True,
            cbar=True,
        )
        berlin_district.plot(ax=axs[ax + i], color="none", edgecolor="grey")
        axs[ax + i].set_axis_off()
        axs[ax + i].set_title(f"{name}, sigma = {s}")
```