Centrography is a set of descriptive statistics that provide summary descriptions for point patterns with a fokus on centrality in a pattern (location and dispersion). In this chapter, will have a closer look at three types of centrography analysis for point patterns:
Centrography includes different measures to provide an indication of the location of a point pattern, how densely the points cluster around a center and how irregular the shape of the pattern might be.
First, we load the Berlin city data sets we created in the previous section.
# First, let's import the needed libraries.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
import seaborn as sns
import contextily
from pointpats import centrography, PointPattern, skyum
# load berlin shapefile
berlin_district = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_district.zip"
)
# load data from the previous section
berlin_locations = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_locations.geojson"
)
berlin_random = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_random.geojson"
)
berlin_regular = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30800_spatial_point_patterns/berlin_regular.geojson"
)
When using the pointpats
package, we will have to reshape our coodinate data and convert it to a PointPattern
data type.
## reshape coordinates to right input format ([x1,y1], [x2,y2],....[xn, yn])
## extract coordinates from GeoPandas as array
coord_list_loc = np.array(
[(x, y) for x, y in zip(berlin_locations.geometry.x, berlin_locations.geometry.y)]
)
## extract coordinates from GeoPandas as PointPattern
pp_loc = PointPattern(coord_list_loc)
C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1492: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning) C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\libpysal\cg\shapes.py:1208: FutureWarning: Objects based on the `Geometry` class will deprecated and removed in a future version of libpysal. warnings.warn(dep_msg, FutureWarning)
The Tendency deals with the center point of the two-dimensional distribution. There are several different tendency measure. Which one to use depends on our objective and data (click here for more information). Here, we will explore two examples of tendency measures.
Mean Center ($x_{mc}$,$y_{mc}$)
$$ x_{mc} = \frac{1}{n} \sum_{i=1}^n x_i$$$$ y_{mc} = \frac{1}{n} \sum_{i=1}^n y_i$$The median center works jsut like the median measure that we already know: it represents a point where half of the data is above or below and left or right, respectively.
Euclidean Median ($x_{em}$,$y_{em}$)
$$ minf(x_{em},y_{em})=\sum_{i=1}^n \sqrt{(x_i−x_{em})^2+(y_i−y_{em})^2}$$Euclidean Median describes the point, from which the sum of the Euclidean distances to all other points is at its minimum. It is an optimization problem and thus follow an iterative approach (Kuhn and Kuenne, 1962) to approximate.
mean_center = centrography.mean_center(pp_loc.points)
euclid_median = centrography.euclidean_median(pp_loc.points)
# scatter plot
scatterplot = sns.jointplot(
x=berlin_locations.geometry.x,
y=berlin_locations.geometry.y,
data=berlin_locations,
s=6,
height=10,
color="grey",
)
# mean point
scatterplot.ax_joint.scatter(
*mean_center, color="seagreen", marker="x", s=50, label="Mean Center"
)
# marginal lines
scatterplot.ax_marg_x.axvline(mean_center[0], color="seagreen")
scatterplot.ax_marg_y.axhline(mean_center[1], color="seagreen")
######
# euclidian median
scatterplot.ax_joint.scatter(
*euclid_median, color="red", marker=".", s=50, label="Euclidian Median"
)
scatterplot.ax_marg_x.axvline(euclid_median[0], color="red")
scatterplot.ax_marg_y.axhline(euclid_median[1], color="red")
# Legend
scatterplot.ax_joint.legend()
# Add basemap
contextily.add_basemap(
scatterplot.ax_joint, source=contextily.providers.CartoDB.Positron
)
plt.show()
The dispersion of a point pattern can be measured using the standard distance, which provides the average distance from the point pattern center.
$$ SD = \sqrt{ \frac{\sum_{i=1}^n (x_i−x_{em})^2}{n} + \frac{\sum_{i=1}^n (y_i−y_{em})^2}{n} } $$The standard distance is implemented in pointpats.centrography
through the std_distance()
function:
stdist_loc = centrography.std_distance(pp_loc.points)
stdist_loc
5780.60268493892
On average the berlin nightlife locations are 5780 m from the center.
We can visualize the dispersion with the help of the standard deviational ellipse or standard ellipse. This ellipse depicts the center, dispersion and orientation of the point pattern.
## calculate axis of ellipse
axis_major, axis_minor, rotation = centrography.ellipse(pp_loc.points)
from matplotlib.patches import Ellipse
f, ax = plt.subplots(1, figsize=(9, 9))
# Scatterplot + Mean Center + Euclidean Median
ax.scatter(
x=berlin_locations.geometry.x, y=berlin_locations.geometry.y, s=2, color="grey"
)
ax.scatter(*mean_center, color="seagreen", marker="x", label="Mean Center")
ax.scatter(*euclid_median, color="red", marker="o", label="Euclidean Median")
# standard ellipse using matplotlib
ellipse = Ellipse(
xy=mean_center, # center the ellipse on our mean center
width=axis_major * 2, # centrography.ellipse only gives half the axis
height=axis_minor * 2,
angle=np.rad2deg(rotation), # convert Angles in degrees, not radians
facecolor="none",
edgecolor="red",
linestyle="--",
label="standard ellipse",
)
ax.add_patch(ellipse)
ax.legend()
contextily.add_basemap(ax, source=contextily.providers.CartoDB.Positron)
plt.show()
The extent of a point cloud can be described using four shapes. Each of them reflects varying levels of how tightly the point in a pattern are bound together. If you want to learn more about shape and extent see here for more information on different shapes, which this page is based on or here for more information on shape analysis.
Convex Hull
The convex hull of a point pattern is the smallest/ tightest convex shape. It comes with some characteristics, since it has no indentations, dips, pinnacles or holes and all interior angles are smaller than 180°.
The convex hull is implemented in pointpats.centrography
through the hull()
function.
convex_hull = centrography.hull(coord_list_loc)
Set hull = True
in PointPattern class method plot to easily plot the convex hull.
Minimum Bounding Rectangle
Minimum Bounding Rectangle is the tightest rectangle that can be drawn around the pattern that contains all of points, which makes it almost always bigger than convex hull. There are two kinds of Minimum Bounding Rectangles. First, lets look at the minimum bounding rectangle that can be drawn by using vertical and horizontal lines.
The Minimum Bounding Rectangle is implemented in pointpats.centrography
through the minimum_bounding_rectangle()
function. It calculates maximum left, right, up and down value of the vertices of minimum bounding rectangle.
mbr = centrography.minimum_bounding_rectangle(coord_list_loc)
mbr
(780782.004452564, 5813040.7318415325, 814411.719518311, 5841533.01896305)
Set window = True
in PointPattern class method plot to easily plot the Minimum Bounding Rectangle.
plt.figure(figsize=(9, 9))
pp_loc.plot(
title="Berlin Nightlife Locations", hull=True, window=True
) # plot point pattern "pp", convex hull, and Minimum Bounding Rectangle
plt.scatter(*mean_center, color="green", marker="x", label="Mean Center")
plt.scatter(*euclid_median, color="red", marker="+", label="Euclidean Median")
plt.legend()
plt.show()
<Figure size 900x900 with 0 Axes>
Minimum Rotated Rectangle
Rotating the outline of a rectangle might yield in a tighter rectangular bound around the point pattern.
The Minimum Rotated Rectangle is implemented in pointpats.centrography
through the minimum_rotated_rectangle()
function.
mrr = centrography.minimum_rotated_rectangle(coord_list_loc)
mrr
array([[ 778003.4 , 5838746.5 ], [ 780982.5 , 5811185.5 ], [ 815061.9 , 5814869.5 ], [ 812082.75, 5842430.5 ]], dtype=float32)
Minimum Bounding Circle
The Minimum Bounding Circle is the smallest circle that can be drawn around the point pattern.
The Minimum Bounding Circle is implemented in pointpats.centrography
through the minimum_bounding_circle()
function.
coord_list_loc
array([[ 793357.02215207, 5832557.61648877], [ 797981.40855913, 5827060.34345127], [ 797927.06102323, 5827157.15705207], ..., [ 800919.32625033, 5826552.44394573], [ 797391.9179632 , 5831688.91623683], [ 795508.68296604, 5824958.26911632]])
# (center_x, center_y), radius = centrography.minimum_bounding_circle(coord_list_loc)
Now we will plot all of these bounding boxes together using matplotlib.patches
from matplotlib.patches import Polygon, Circle, Rectangle
# convex hull
convex_hull_patch = Polygon(
convex_hull,
closed=True,
edgecolor="navy",
facecolor="none",
linestyle=":",
linewidth=2,
label="Convex Hull",
)
# minimum bounding rectangle
min_rect_patch = Rectangle(
mbr[0:2],
width=mbr[2] - mbr[0],
height=mbr[2] - mbr[0],
edgecolor="magenta",
facecolor="none",
linestyle="dashed",
linewidth=2,
label="Minimum Bounding Rectangle",
)
## minimum rotatad rect
min_rot_rect_patch = Polygon(
mrr,
closed=True,
edgecolor="limegreen",
facecolor="none",
linestyle="dashed",
label="Minimum Rotated Rectangle",
linewidth=2,
)
# minimum bounding circle
# circle_patch = Circle(
# (center_x, center_y),
# radius=radius,
# edgecolor='crimson',
# facecolor='none',
# linewidth=2,
# label='Min Bounding Circle'
# )
f, ax = plt.subplots(1, figsize=(10, 10))
ax.add_patch(convex_hull_patch)
ax.add_patch(min_rot_rect_patch)
ax.add_patch(min_rect_patch)
# ax.add_patch(circle_patch)
ax.scatter(
x=berlin_locations.geometry.x, y=berlin_locations.geometry.y, s=2, color="grey"
)
ax.legend()
# Add basemap
contextily.add_basemap(ax, source=contextily.providers.CartoDB.Positron)
plt.show()
The broadest outlines are the minimum bounding rectangle and circle, as they include an area much bigger than the point pattern itself. One they are also quite simple to understand and to draw.
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.