We start this section with loading the required data sets and packages.
# First, let's import the needed libraries.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
import pointpats
# load berlin shapefile
berlin_district = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_district.zip"
)
# load data from the previous section
berlin_locations = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_locations.geojson"
)
berlin_random = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_random.geojson"
)
berlin_regular = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_regular.geojson"
)
The interaction between two arbitrary points is measured by second order properties, which reflect any tendency of the events to appear:
fig, ax = plt.subplots(1, 3, figsize=(15, 5))
berlin_district.plot(ax=ax[0], edgecolor="black", linewidth=1, color="white")
berlin_random.plot(
ax=ax[0], edgecolor="black", color="lightgrey", markersize=5, aspect=1
)
ax[0].set_title("Independent")
ax[0].set_axis_off()
berlin_district.plot(ax=ax[1], edgecolor="black", linewidth=1, color="white")
berlin_regular.plot(
ax=ax[1], edgecolor="black", color="lightgrey", markersize=5, aspect=1
)
ax[1].set_title("Regular")
ax[1].set_axis_off()
berlin_district.plot(ax=ax[2], edgecolor="black", linewidth=1, color="white")
berlin_locations.plot(
ax=ax[2], edgecolor="black", color="lightgrey", markersize=5, aspect=1
)
ax[2].set_title("Clustered")
ax[2].set_axis_off()
plt.show()
A simple diagnostic option or dependence between points is the 2D Histogram. 2D Histogram is used to analyze the relationship among two variables and to display the two variables as a kind of density heatmap. Heatmaps are able to describe the density or intensity of variables, visualize patterns and even anomalies. In Python we use the plt.hist2d
function.
fig, ax = plt.subplots(1, 3, figsize=(15, 4))
ax[0].hist2d(
berlin_random.geometry.x.values,
berlin_random.geometry.y.values,
bins=20,
cmap="Blues",
)
ax[0].set_title("Independent")
ax[0].set_axis_off()
ax[1].hist2d(
berlin_regular.geometry.x.values,
berlin_regular.geometry.y.values,
bins=20,
cmap="Blues",
)
ax[1].set_title("Regular")
ax[1].set_axis_off()
ax[2].hist2d(
berlin_locations.geometry.x.values,
berlin_locations.geometry.y.values,
bins=20,
cmap="Blues",
)
ax[2].set_title("Clustered")
ax[2].set_axis_off()
plt.show()
Classical techniques for investigating inter-point interaction are distance methods, based on measuring the distances between points (Baddeley 2010).
pairwise distances $s_{ij} = \Vert xi - xj \Vert$ between all distinct pairs of points $x_i$ and $x_j$ $(i \ne j)$ in the pattern
nearest neighbor distances $t_i = \min_{j \ne i}s_{ij}$, the distance from each point $x_i$ to its nearest neighbor
empty space distances $d(u) = \min \Vert u-x_i \Vert$, the distance from a fixed reference location $u$ in the window to the nearest data point
These methods are available in the scipy.spatial
package using the functions distance_matrix
and distance.pdist()
. But once again, also the pointpats
comes in handy for distance calculations, employing the PointPattern()
method, which comes with many different functionalities (see here for more).
distance_matrix
provides a pairwise distances as a distance matrixdistance.pdist()
provides a pairwise distances as an arrayPointPattern().knn(1)
returns the nearest neighbour distances as two arrays (index and distance)from scipy.spatial.distance import pdist
from scipy.spatial import distance_matrix
## reshape coordinates to right input format ([x1,y1], [x2,y2],....[xn, yn])
## extract coordinates from GeoPandas as array
coord_list_loc = np.array(
[(x, y) for x, y in zip(berlin_locations.geometry.x, berlin_locations.geometry.y)]
)
coord_list_ran = np.array(
[(x, y) for x, y in zip(berlin_random.geometry.x, berlin_random.geometry.y)]
)
coord_list_reg = np.array(
[(x, y) for x, y in zip(berlin_regular.geometry.x, berlin_regular.geometry.y)]
)
## calculate euclidian distance using pdist()
berlin_locations_dist = pdist(coord_list_loc, "euclidean")
berlin_random_dist = pdist(coord_list_ran, "euclidean")
berlin_regular_dist = pdist(coord_list_reg, "euclidean")
fig, ax = plt.subplots(1, 3, figsize=(15, 6))
plt.suptitle("Pairwise Distances Histogram", size=16)
ax[0].hist(
berlin_random_dist, bins=400, color="lightgrey", edgecolor="black", density=True
)
ax[0].set_title("Independent")
ax[1].hist(
berlin_regular_dist, bins=50, color="lightgrey", edgecolor="black", density=True
)
ax[1].set_title("Regular")
ax[2].hist(
berlin_locations_dist, bins=200, color="lightgrey", edgecolor="black", density=True
)
ax[2].set_title("Clustered")
plt.tight_layout()
plt.show()
Next, we want to plot the pairwise distances as a distance matrix. Therefore, we will use the distance_matrix
function from scipy.spatial
. The resulting distance matrix will be plotted using matshow()
plot function from matshow
.
from scipy.spatial import distance_matrix
locations_dist_matrix = distance_matrix(x=coord_list_loc, y=coord_list_loc)
random_dist_matrix = distance_matrix(x=coord_list_ran, y=coord_list_ran)
regular_dist_matrix = distance_matrix(x=coord_list_reg, y=coord_list_reg)
fig, ax = plt.subplots(1, 3, figsize=(15, 6))
plt.suptitle("Pairwise Distances", size=16)
fig0 = ax[0].matshow(random_dist_matrix)
ax[0].set_title("Independent")
fig1 = ax[1].matshow(regular_dist_matrix)
ax[1].set_title("Regular")
fig2 = ax[2].matshow(locations_dist_matrix)
ax[2].set_title("Clustered")
# add colorbars
fig.colorbar(fig0, ax=ax[0])
fig.colorbar(fig1, ax=ax[1])
fig.colorbar(fig2, ax=ax[2])
plt.tight_layout()
plt.show()
Here, we will explore two two methods to determine the nearest neighbour distances. One uses the distance matrix from above, the other employs a method from the pointpats
package. Both yield in the same result.
If we want to use a distance matrix, we will have to find the minimum distance (that is >0) for each pair of distances.
random_nearest = []
for dist in range(len(random_dist_matrix)):
i = min(i for i in random_dist_matrix[dist, :] if i > 0)
random_nearest.append(i)
regular_nearest = []
for dist in range(len(regular_dist_matrix)):
i = min(i for i in regular_dist_matrix[dist, :] if i > 0)
regular_nearest.append(i)
locations_nearest = []
for dist in range(len(locations_dist_matrix)):
i = min(i for i in locations_dist_matrix[dist, :] if i > 0)
locations_nearest.append(i)
fig, ax = plt.subplots(1, 3, figsize=(15, 6))
plt.suptitle("Nearest Neighbour Histogram", size=16)
ax[0].scatter(range(len(random_nearest)), random_nearest)
ax[0].set_title("Independent")
ax[1].scatter(range(len(regular_nearest)), regular_nearest)
ax[1].set_title("Regular")
ax[2].scatter(range(len(locations_nearest)), locations_nearest)
ax[2].set_title("Clustered")
plt.tight_layout()
plt.show()
The pointpats
package provides the PointPattern().knn(1)
method, to determine the nearest neighbour distances, which yields in the same result using much less code.
from pointpats import PointPattern
nn_ixs_ra, nn_ds_ra = PointPattern(coord_list_ran).knn(1)
nn_ixs_re, nn_ds_re = PointPattern(coord_list_reg).knn(1)
nn_ixs_l, nn_ds_l = PointPattern(coord_list_loc).knn(1)
fig, ax = plt.subplots(1, 3, figsize=(15, 6))
plt.suptitle("Nearest Neighbour Histogram", size=16)
ax[0].scatter(nn_ixs_ra, nn_ds_ra)
ax[0].set_title("Independent")
ax[1].scatter(nn_ixs_re, nn_ds_re)
ax[1].set_title("Regular")
ax[2].scatter(nn_ixs_l, nn_ds_l)
ax[2].set_title("Clustered")
plt.tight_layout()
plt.show()
C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1492: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1208: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1492: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1208: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1492: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1208: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning)
Recall that under the assumption of a homogeneous Poisson process (HPP) the location of one point in space does not affect the probabilities of other points appearing nearby. The intensity of the point process in area $W$ is a constant $\lambda(u) = \lambda > 0$, $\forall u \in W$.
Hence, the properties of HPP is often referred to as Complete Spatial Randomness (CSR). In the context of distance methods for point patterns, the HPP serves as a demarcation line between spatial point processes whose realizations can be described as being more regular, or less regular, than those of the Poisson process, which is a point process pattern that is distributed independently and uniformly over a study area (Baddeley 2010).
The distance
$$d(u, \mathbf x) = \min\{\Vert u - x_i \Vert : x_i \in \mathbf x\}$$from a fixed location $u \in \mathbb R^2$ to the nearest point in a point pattern $\mathbf x$, is called the empty space distance or void distance.
The $F$ function measures the distribution of all distances from an arbitrary reference location $u$ in the plane to the nearest observed event $j$. On a grid of locations $u_j$ , $j = 1,..., m$ the empirical distribution function of the observed empty space distances is
$$\hat F(r)= \frac{1}{m}\sum_{j}\mathbf{1}\{d(u_j, \mathbf x) \le r\}$$For a homogeneous Poisson process of intensity $\lambda$, hence under CSR, the value of the $F$ function becomes:
$$F_\text{pois}(r)=1 - \exp(\lambda\pi r^2)$$The empirical function $\hat{F}(r)$ is compared against its theoretical expectation $F_\text{pois}(r)$ by plotting them.
Values $\hat{F}(r) > F_\text{pois}(r)$ suggest that empty space distances in the point pattern are shorter than for a Poisson process, suggesting a regularly space pattern, while values $\hat{F}(r) < F_\text{pois}(r)$ suggest a clustered pattern.
The $F$ function is implemented in pointpats
through the f_test
function in pointpats.
Let us plot the $F$ function for the random point data set (berlin_random
), for the clustered point data set (berlin_locations
), and for the regular point data set (berlin_regular
).
from pointpats import distance_statistics
from pointpats.distance_statistics import K, L, J
## random:
f_test_ran = distance_statistics.f_test(
np.array(coord_list_ran), support=40, keep_simulations=True
)
## regular:
f_test_reg = distance_statistics.f_test(
np.array(coord_list_reg), support=40, keep_simulations=True
)
## clustered:
f_test_loc = distance_statistics.f_test(
np.array(coord_list_loc), support=40, keep_simulations=True
)
f, ax = plt.subplots(3, 1, figsize=(9, 9))
## random
# plot all the simulations with very fine lines
ax[0].fill_between(
f_test_ran.support,
np.min(f_test_ran.simulations, axis=0),
np.max(f_test_ran.simulations, axis=0),
color="blue",
alpha=0.1,
label="simulations",
)
ax[0].plot(
f_test_ran.support,
np.median(f_test_ran.simulations, axis=0),
color="cyan",
label="median simulation",
)
# observed pattern's G function
ax[0].plot(f_test_ran.support, f_test_ran.statistic, label="observed", color="red")
ax[0].legend()
ax[0].set_xlim(0, 2000)
ax[0].set_title(r"Random $F(r)$ function (berlin_locations)")
#################################################
## regular
ax[1].fill_between(
f_test_reg.support,
np.min(f_test_reg.simulations, axis=0),
np.max(f_test_reg.simulations, axis=0),
color="blue",
alpha=0.1,
label="simulations",
)
ax[1].plot(
f_test_reg.support,
np.median(f_test_reg.simulations, axis=0),
color="cyan",
label="median simulation",
)
ax[1].plot(f_test_reg.support, f_test_reg.statistic, label="observed", color="red")
ax[1].set_ylabel("% of nearest neighbor\ndistances shorter")
ax[1].legend()
# ax[1].set_xlim(0,2000)
ax[1].set_title(r"Regular $F(r)$ function (berlin_locations)")
#################################
# clustered
ax[2].fill_between(
f_test_loc.support,
np.min(f_test_loc.simulations, axis=0),
np.max(f_test_loc.simulations, axis=0),
color="blue",
alpha=0.1,
label="simulations",
)
ax[2].plot(
f_test_loc.support,
np.median(f_test_loc.simulations, axis=0),
color="cyan",
label="median simulation",
)
ax[2].plot(f_test_loc.support, f_test_loc.statistic, label="observed", color="red")
# # clean up labels and axes
ax[2].set_xlabel("distance")
ax[2].legend()
ax[2].set_xlim(0, 2000)
ax[2].set_title(r"Clustered $F(r)$ function (berlin_locations)")
f.tight_layout()
plt.show()
For the random dataset, the plot shows that up to a distance of approximately 1000 meters the estimates behave like a homogeneous Poisson processes $\hat{F}(r) \approx F_\text{pois}(r)$, at larger distances there is same deviation from spatial randomness.
Now let us examine the clustered data set. We see that for all of the three edge corrected estimates $\hat{F}(r) < F_\text{pois}(r)$, thus suggesting a clustered pattern. Since the function estimated for the observed pattern increases much more slowly than the functions for the simulated patterns, we can be confident that there are many gaps in our pattern; i.e. the pattern is clustered.
Finally we look at the $F$ function for the regular spaced data set. Values $\hat{F}(r) > F_\text{pois}(r)$ indicate a regularly spaced pattern.
Recall: the nearest neighbor distances summarizes the distances between each point in the pattern and its nearest neighbor. The nearest neighbor distances are given by
$$t_i = \text{min}_{j \ne i}\Vert x_i-x_j\Vert$$The empirical distribution of the nearest neighbor distances depends on the geometry of the window $W$ as well as on characteristics of the point process $X$. Hence, corrections for edge effects are warranted.
The $G$ function measures the distribution of distances from an arbitrary event to its nearest neighbors.
$$\hat{G}(r) = \frac{1}{n(\mathbf x)}\sum_{i}\mathbf 1\{t_i \le r\}$$The $G$ function represents the number of elements in the set of distances up to some threshold $r$, normalized by the total number of points $n$ in point pattern $\mathbf x$. The idea behind the $G$ function: we want to learn how similar our pattern is to a spatially random one by computing the cumulative distribution of nearest neighbor distances over increasing distance thresholds $r$. Next, we compare that to that of a set of simulated patterns that follow a known spatially-random process. Usually, the used reference distribution is a spatial Poisson point process (Ripley 1991).
For a homogeneous Poisson point process of intensity $\lambda$, the nearest-neighbor distance distribution function becomes
$$G_\text{pois}(r)=1 - \exp(-\lambda\pi r^2)\text{,}$$where $\lambda$ is the mean number of events per unit area. Note that this expression is identical to the empty space function, the $F$ function, for the Poisson process (Ripley 1991).
The interpretation of $\hat G(r)$ is the reverse of $\hat F(r)$. Values $\hat G(r) > G_\text{pois}(r)$ suggest that nearest neighbor distances in the point pattern are shorter than for a Poisson process, suggesting a clustered pattern, while values $\hat G(r) < G_\text{pois}(r)$ suggest a regular pattern.
To do this in the pointpats
package, we can use the g_test
function, which computes both the $G$ function for the empirical data and these hypothetical replications under a completely spatially random process.
Let us plot the $G$ function for the random point data set (berlin_random
), for the clustered point data set (berlin_locations
), and for the regular point data set (berlin_regular
).
## random:
g_test_ran = distance_statistics.g_test(
np.array(coord_list_ran), support=40, keep_simulations=True
)
## regular:
g_test_reg = distance_statistics.g_test(
np.array(coord_list_reg), support=40, keep_simulations=True
)
## clustered:
g_test_loc = distance_statistics.g_test(
np.array(coord_list_loc), support=40, keep_simulations=True
)
f, ax = plt.subplots(3, 1, figsize=(9, 9))
## random
# plot all the simulations with very fine lines
ax[0].fill_between(
g_test_ran.support,
np.min(g_test_ran.simulations, axis=0),
np.max(g_test_ran.simulations, axis=0),
color="blue",
alpha=0.1,
label="simulations",
)
ax[0].plot(
g_test_ran.support,
np.median(g_test_ran.simulations, axis=0),
color="cyan",
label="median simulation",
)
# observed pattern's G function
ax[0].plot(g_test_ran.support, g_test_ran.statistic, label="observed", color="red")
ax[0].legend()
ax[0].set_xlim(0, 2000)
ax[0].set_title(r"Random $G(r)$ function (berlin_locations)")
#################################################
## regular
ax[1].fill_between(
g_test_reg.support,
np.min(g_test_reg.simulations, axis=0),
np.max(g_test_reg.simulations, axis=0),
color="blue",
alpha=0.1,
label="simulations",
)
ax[1].plot(
g_test_reg.support,
np.median(g_test_reg.simulations, axis=0),
color="cyan",
label="median simulation",
)
ax[1].plot(g_test_reg.support, g_test_reg.statistic, label="observed", color="red")
ax[1].set_ylabel("% of nearest neighbor\ndistances shorter")
ax[1].legend()
# ax[1].set_xlim(0,2000)
ax[1].set_title(r"Regular $G(r)$ function (berlin_locations)")
#################################
# clustered
ax[2].fill_between(
g_test_loc.support,
np.min(g_test_loc.simulations, axis=0),
np.max(g_test_loc.simulations, axis=0),
color="blue",
alpha=0.1,
label="simulations",
)
ax[2].plot(
g_test_loc.support,
np.median(g_test_loc.simulations, axis=0),
color="cyan",
label="median simulation",
)
ax[2].plot(g_test_loc.support, g_test_loc.statistic, label="observed", color="red")
# # clean up labels and axes
ax[2].set_xlabel("distance")
ax[2].legend()
ax[2].set_xlim(0, 2000)
ax[2].set_title(r"Clustered $G(r)$ function (berlin_locations)")
f.tight_layout()
plt.show()
Let's make sense of these distribution plots of distances. We expect a clustered pattern to have more points near one another. On the contrary a dispersed pattern, the distances should be equally far. For a completely random pattern we expect something in between.
The first plot shows that the various estimates, $\hat{G}(r)$, are not too far of from $G_\text{pois}(r)$, corresponding to a homogeneous Poisson processes, thus indicating spatial randomness.
Now let us examine the clustered point data set. Again, by default the graph includes three estimates. For a clustered pattern, observed locations should be closer to each other than expected under CSR. All estimates are $\hat{G}(r) > G_\text{pois}(r)$, thus suggesting a clustered pattern.
Finally, let us examine the regular data set. A regular pattern should have higher nearest-neighbor distances than expected under CSR. $\hat{G}(r)$ is zero for $r \le 3600$, which shows that there are no nearest-neighbor distances shorter than 3600 m.
Looking at the $G$ functions, we conclude the following:
The observed pairwise distances $s_{ij} = \Vert x_i-x_j\Vert$ in the data pattern $\mathbf x$ constitute a biased sample of pairwise distances, with a bias in favor of smaller distances. For example, we can never observe a pairwise distance greater than the diameter of the window.
Ripley (1977) defined the $K$ function for a stationary point process so that $\lambda K(r)$ is the expected number of other points of the process within a distance $r$ of a typical point of the process.
$$K(r)=\frac{1}{\lambda}\mathbb E[n(\mathbf X \cap b(u,r)/\{u\}\vert u \in \mathbf X)]$$The $K$ function for a HPP is
$$K_{\text{pois}}(r) = \pi r^2$$regardless of the intensity.
Numerous estimators of $K$ have been proposed. Most of them are weighted and normalized empirical distribution functions of the pairwise distances, of the general form
$$\hat K(r) = \frac{1}{\hat\lambda \text{ area}(W)}\sum_i\sum_{i \ne j}\mathbf 1\{\Vert x_i-x_j \Vert \le r\}e(x_i. x_j,r)$$where $e(u, v, r)$ is an edge correction weight (Baddeley 2010{target="_blank"}).
We compare the estimate $\hat K(r)$ with the Poisson $K$ function. Values $\hat K(r) >\pi r^2$ suggest clustering, while $\hat K(r) <\pi r^2$ suggests a regular pattern.
In the pointpats
package, we can use the K
function, computes several estimates of the $K$ function.
Let us plot the $K$ function for the random point data set (berlin_random
), for the clustered point data set (berlin_locations
), and for the regular point data set (berlin_regular
).
## extract coordinates from GeoPandas as PointPattern
pp_ran = PointPattern(coord_list_ran)
pp_loc = PointPattern(coord_list_loc)
pp_reg = PointPattern(coord_list_reg)
C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1492: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1208: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1492: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1208: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1492: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1208: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning)
k_ran = K(pp_ran)
k_loc = K(pp_loc)
k_reg = K(pp_reg)
k_ran.plot()
plt.xlabel("r [m]")
plt.ylabel("K(r)")
plt.title(f"$K$ distance function for the independent pattern")
plt.show()
The plot shows that the various estimates, $\hat{K}(r)$, are close to $K_\text{pois}(r)$, corresponding to a homogeneous Poisson processes and thus indicating complete spatial randomness.
Now let us examine the clustered point data set.
k_loc.plot()
plt.xlabel("r [m]")
plt.ylabel("K(r)")
plt.title(f"$K$ distance function for the clustered pattern")
plt.show()
Values of $\hat K(r) > K_{\text(pois)}(r)$ suggest clustering.
Finally let us examine the regular data set.
k_reg.plot()
plt.xlabel("r [m]")
plt.ylabel("K(r)")
plt.title(f"$K$ distance function for the regular pattern")
plt.show()
Values of $\hat K(r) < K_{\text(pois)}(r)$ suggest a regular pattern.
A commonly-used transformation of $K$ is the $L$ function
$$L(r) = \sqrt{\frac{K(r)}{\pi}}$$which transforms the Poisson $K$ function to the straight line $L_\text{pois}(r) = r$, making visual assessment of the graph easier. To compute the estimated $L$ function, we may use the L()
function from the pointpats
package.
Let us plot the $L$ function for the random point data set (berlin_random
), for the clustered point data set (berlin_locations
), and for the regular point data set (berlin_regular
).
l_ran = L(pp_ran)
l_loc = L(pp_loc)
l_reg = L(pp_reg)
l_ran.plot()
plt.xlabel("r [m]")
plt.ylabel("L(r)")
plt.title(f"$L$ distance function for the independent pattern")
plt.show()
l_loc.plot()
plt.xlabel("r [m]")
plt.ylabel("L(r)")
plt.title(f"$L$ distance function for the clustered pattern")
plt.show()
l_reg.plot()
plt.xlabel("r [m]")
plt.ylabel("L(r)")
plt.title(f"$L$ distance function for the regular pattern")
plt.show()
We apply the same logic as above, values $\hat L(r) > L_\text{pois}(r)$ suggest clustering, while $\hat L(r) < L_\text{pois}(r)$ suggests a regular pattern.
The $J$ function is a combination of the $F$ and $G$ the function.
$$J(r) = \frac{1-G(r)}{1-F(r)}$$defined for all $r \ge 0$ such that $F(r) < 1$. For a homogeneous Poisson process, $F_\text{pois} = G_\text{pois}$, so that
$$J_\text{pois}(r) \equiv 1$$Values $J(r) > 1$ suggest regularity, and $J(r) < 1$ suggest clustering.
The $J$ function is computed by J()
. Let us plot $J$ function for the random point data set (berlin_random
), for the clustered point data set (berlin_locations
), and for the regular point data set (berlin_regular
).
J_ran = J(pp_ran, intervals=40)
J_reg = J(pp_reg, intervals=40)
J_loc = J(pp_loc, intervals=40)
C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1923: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:103: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\pointpats\_deprecated_distance_statistics.py:239: RuntimeWarning: invalid value encountered in divide self.ev = self.j / self.j C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1923: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:103: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\pointpats\_deprecated_distance_statistics.py:239: RuntimeWarning: invalid value encountered in divide self.ev = self.j / self.j C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1923: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:103: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\pointpats\_deprecated_distance_statistics.py:239: RuntimeWarning: invalid value encountered in divide self.ev = self.j / self.j
J_ran.plot()
plt.xlabel("r [m]")
plt.ylabel("L(r)")
plt.title(f"$J$ distance function for the independent pattern")
plt.show()
Now let us examine the clustered data set.
J_loc.plot()
plt.xlabel("r [m]")
plt.ylabel("L(r)")
plt.title(f"$J$ distance function for the clustered pattern")
plt.show()
Finally we examine the regular data set.
J_reg.plot()
plt.xlabel("r [m]")
plt.ylabel("L(r)")
plt.title(f"$J$ distance function for the regular pattern")
plt.show()
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.