In this section we will learn how to apply variography and kriging in Python. We will build variograms and then perform ordinary kriging to decide which of the models results in the lowest error.
# First, let's import the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import rasterio
import requests
import io
from io import StringIO
import zipfile
First we will load the data sets we will use in this section:

- `VG5000_LAN.shp`
- `dwd_sp`
- `srtm_germany_masked_ETRS89`

For the shapefiles of Germany we rely on shapefiles provided by the German Federal Agency for Cartography and Geodesy (under this licence). You may directly download the shapefiles here. For the purpose of this tutorial, we also provide the data here.
We load these data sets into memory:
### LOAD SHAPEFILE DATA ###
url = "https://daten.gdz.bkg.bund.de/produkte/vg/vg5000_0101/aktuell/vg5000_01-01.utm32s.shape.ebenen.zip"
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall(path="../data")
boundary_germany = gpd.read_file(
"../data/vg5000_01-01.utm32s.shape.ebenen/vg5000_ebenen_0101/VG5000_LAN.shp"
)
boundary_germany = boundary_germany.to_crs("epsg:3035")
### LOAD DWD DATA ###
# "http://www.userpage.fu-berlin.de/~soga/300/30100_data_sets/dwd_sp.geojson"
dwd_sp = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30900_spatial_interpolation/dwd_sp.geojson"
)
dwd_sp
| | MEAN.ANNUAL.RAINFALL | ALTITUDE | geometry |
| --- | --- | --- | --- |
| 0 | 755.0 | 478.0 | POINT (4234819.61426 2748192.18467) |
| 1 | 820.0 | 202.0 | POINT (4045677.70989 3081917.63415) |
| 2 | 759.0 | 44.0 | POINT (4202462.72690 3315312.39845) |
| 3 | 919.0 | 759.0 | POINT (4245047.74819 2789643.85319) |
| 4 | 790.0 | 340.0 | POINT (4545939.25005 2838265.72134) |
| ... | ... | ... | ... |
| 581 | 657.0 | 308.0 | POINT (4552333.97346 3109603.91660) |
| 582 | 875.0 | 565.0 | POINT (4297120.95508 2761375.13048) |
| 583 | 567.0 | 62.0 | POINT (4523829.16908 3341356.30168) |
| 584 | 997.0 | 530.0 | POINT (4074689.48322 3010038.15064) |
| 585 | 734.0 | 29.0 | POINT (4319117.24777 3382112.45555) |

586 rows × 3 columns
####################################
### LOAD RASTER DATA ###
import requests, zipfile, io, os
url = "http://www.userpage.fu-berlin.de/~soga/300/30100_data_sets/spatial/srtm_germany_ETRS89.zip"
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("../data")
## open file
srtm_germany_masked_ETRS89 = rasterio.open(
"../data/srtm_germany_ETRS89.tif", driver="GTiff"
)
# plot with rasterio.plot, which provides matplotlib functionality
from rasterio.plot import show
fig, ax = plt.subplots(figsize=(5, 15))
srtm_plot = show(
srtm_germany_masked_ETRS89,
ax=ax,
cmap="RdYlGn_r",
title="SRTM DEM and DWD weather stations\n",
)
dwd_sp.plot(ax=ax, markersize=1, color="black")
## add colorbar
dummy_colorbar = srtm_plot.get_images()[0]
fig.colorbar(dummy_colorbar, ax=ax, fraction=0.046, pad=0.04, label="elevation [m]")
plt.show()
Before we proceed, we make sure that all of our data shares the same coordinate reference system.
# check coordinate systems
print(f"dwd_sp crs = {dwd_sp.crs}")
print(f"srtm_germany_masked_ETRS89 crs = {srtm_germany_masked_ETRS89.crs}")
dwd_sp crs = epsg:4326
srtm_germany_masked_ETRS89 crs = EPSG:3035
# reproject spatial data sets
newProj = srtm_germany_masked_ETRS89.crs
# the dwd_sp coordinates are already projected metres (see the POINT values
# in the table above); only the CRS metadata is mislabelled as EPSG:4326,
# so we override the label instead of reprojecting
dwd_sp = dwd_sp.set_crs(newProj, allow_override=True)
boundary_germany = boundary_germany.to_crs(newProj)
We can explore the spatial correlation of our predictor variable by plotting the variogram cloud. The variogram cloud is obtained by plotting all possible squared differences of observation pairs $(Z(s_i)−Z(s_j))^2$ against their separation distance $h_{ij}$. As any point in the variogram cloud refers to a pair of points in the data set, the variogram cloud can point us to areas with unusually high or low variability (Bivand et al. 2008).
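To make this definition concrete, here is a minimal sketch that computes the variogram cloud by hand with `scipy` (the `distance_difference_plot()` used below provides a similar view):
from scipy.spatial.distance import pdist

# all pairwise separation distances h_ij ...
pts = np.column_stack([dwd_sp.geometry.x, dwd_sp.geometry.y])
h_ij = pdist(pts)
# ... and all pairwise squared differences (Z(s_i) - Z(s_j))^2
z = dwd_sp["MEAN.ANNUAL.RAINFALL"].to_numpy().reshape(-1, 1)
sq_diff = pdist(z, metric="sqeuclidean")

plt.scatter(h_ij, sq_diff, s=1, alpha=0.3)
plt.xlabel("separation distance h [m]")
plt.ylabel("squared difference")
plt.show()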
In Python we use the `scikit-gstat` library, which provides a straightforward way to calculate a variogram. The call `skg.Variogram(coordinates, values)` passes coordinates and values to the `Variogram` class.
import skgstat as skg
coords = list(zip(dwd_sp.geometry.x, dwd_sp.geometry.y))
values = dwd_sp["MEAN.ANNUAL.RAINFALL"]
V = skg.Variogram(coords, values)
When working with the `Variogram` class from `scikit-gstat`, we have to understand how this class works. Therefore we need to know some basic things about variography; otherwise it can be hard to understand what we are doing in this chapter and how to interpret the resulting figures. The presented method comes with some subtleties, so let's start to understand a few basic things about variograms with the help of the following figures.
V.distance_difference_plot()
plt.show()
What actually is a variogram?
The variogram describes the relationship between a dependent variable and an independent variable with a statistical model. This model captures spatial properties and can further be used for interpolation. The variogram relates the separation distance between two points to a measure of their similarity at that distance. Recall Tobler's first law of geography:
"Everything is related to everything else, but near things are more related than distant things."
In geostatistics, the independent variable is usually a measure of Euclidean distance. This can be traced in the figure above: the scattering of the pairwise differences increases with the separation distance between two points. In this form, however, it is not possible to estimate the behavior of the distribution at different distances. We have to group all point pairs at approximately the same distance lag together into one group, or bin. Binning the separation distances is the most important task in variogram analysis.
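As a minimal sketch of this binning step, we can reuse the `h_ij` and `sq_diff` arrays from the variogram cloud sketch above (`scikit-gstat` performs this grouping for us):
bins = np.linspace(0, h_ij.max(), 11)  # 10 equally wide distance bins
bin_idx = np.digitize(h_ij, bins)
# average the squared differences per bin; the factor 1/2 turns the mean
# squared difference into an experimental semivariance
gamma = np.array([sq_diff[bin_idx == i].mean() / 2 for i in range(1, len(bins))])
print(gamma)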
If you want to learn more about variograms, the `scikit-gstat` documentation is a helpful resource.
V.plot()
plt.show()
The distance calculation of the `Variogram` class can be controlled by the `dist_func` argument. The default value is `'euclidean'`. For more information see here.
The `scikit-gstat` module comes with the `Variogram()` class, which allows us to specify a wide range of potential variogram models and specific parameters. A list of available models is printed by the `help(skg.Variogram)` command.

A more detailed description of each model with some examples can be found in the documentation of the variogram models.
# help(skg.Variogram)
The `skg.Variogram()` function is specified by a range of additional arguments such as:

- `model`, which specifies the model type, e.g. "exponential", "spherical", ... (see above),
- `n_lags`, which is the number of lag bins,
- `use_nugget`, which adds a nugget component to the model,
- `maxlag`, which is the maximum lag distance to be considered in this Variogram instance.

Depending on the data set and on the model of choice we have to specify initial values for the different parameters. If we choose initial values too far off from reasonable values, the fit will most likely not be sufficient.
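For illustration, a hypothetical call using some of these arguments could look like this (a sketch only; `V_demo` and its argument values are arbitrary and not used further below):
# a sketch: a spherical model with 25 lag bins and a nugget component;
# the values here are for illustration only
V_demo = skg.Variogram(
    coords,
    values,
    model="spherical",
    n_lags=25,
    use_nugget=True,
    maxlag=300000,
)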
If we consider the sample variogram from above, an exponential model appears to be a good fit up to a lag distance of about 300,000 m. For simplicity we will just look at that range of values. By eyeballing the sample variogram we set `model="exponential"` and `maxlag=300000`.
Additionally you may set the `fit_method` argument to fit the variogram manually. You can then set the effective range, sill and nugget either directly in the `fit` function, or as `fit_`-prefixed keyword arguments on the `Variogram` instance. We will set:
V = skg.Variogram(
coords,
values,
model="exponential",
maxlag=300000,
fit_range=90000,
fit_sill=40000,
fit_nugget=15000,
)
To verify our parameter choice we plot the sample variogram (blue) and overlay it with the variogram model (green) as specified above.
# plot
V.plot()
plt.show()
The variogram model looks reasonable, but certainly not perfect.
We may also select the variogram model and its parameters statistically, by minimizing some goodness-of-fit criterion. The fitted parameters are stored in the `V.parameters` attribute. There are three numbers in the output, which describe the following parameters:

- the effective range,
- the sill,
- the nugget.

Let us inspect `V.parameters` to obtain the parameters of our exponential variogram model.
print(V.parameters)
[183916.33172896618, 46850.44487117823, 0]
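The three values can be unpacked directly in that order:
# unpack the fitted parameters: effective range [m], sill, nugget
eff_range, sill, nugget = V.parameters
print(f"effective range: {eff_range:.0f} m, sill: {sill:.0f}, nugget: {nugget}")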
Compare the result with our eyeballed parametrization: we were not too far off.
Note that we may specify the fitting method used by `skg.Variogram()` by setting the additional argument `fit_method`. You may set the method directly in the `fit` function, or as a keyword argument on `Variogram` instantiation. The following four options are implemented:

- `trf`: Trust Region Reflective (the default),
- `lm`: Levenberg-Marquardt,
- `ml`: maximum likelihood,
- `manual`: manual fitting.
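For example, we could refit the sample variogram with the Levenberg-Marquardt optimizer instead of the default (a sketch; `V_lm` is only used for this comparison):
# a sketch: the same exponential model, fitted with Levenberg-Marquardt
V_lm = skg.Variogram(
    coords,
    values,
    model="exponential",
    maxlag=300000,
    fit_method="lm",
)
print(V_lm.parameters)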
Now let us try out two more model types, the spherical variogram model and the Gaussian variogram model, and visualize them as well as the exponential model.
V_sph = skg.Variogram(
    coords,
    values,
    model="spherical",
    maxlag=300000,
    fit_range=90000,
    fit_sill=40000,
    fit_nugget=15000,
)
V_sph.plot()
plt.title("Spherical variogram model")
plt.show()
V_gau = skg.Variogram(
    coords,
    values,
    model="gaussian",
    maxlag=300000,
    fit_range=90000,
    fit_sill=40000,
    fit_nugget=15000,
)
V_gau.plot()
plt.title("Gaussian variogram model")
plt.show()
V_exp = skg.Variogram(
coords,
values,
model="exponential",
# use_nugget = True,
# n_lags = 30,
maxlag=300000,
fit_range=90000,
fit_sill=40000,
fit_nugget=15000,
)
V_exp.plot()
plt.title("Exponential variogram model")
plt.show()
Based on visual inspection, the exponential variogram model (`V_exp`) appears to be the best fit.
So far, we assumed that the spatial correlation structure is the same in all directions, or isotropic. Hence, the covariance function depends only on the magnitude of the lag vector, $h$, and not on its direction. In many cases, however, the autocorrelation shows a structure which differs between directions. This property of a spatial process is referred to as anisotropy.
The most commonly employed model for anisotropy is geometric anisotropy, with the semivariogram reaching the same sill in all directions, but at different ranges. To check for directional dependence we compute directional empirical semivariograms, which include only those data pairs falling within certain directional bands as well as within the prescribed lag limits.
With the `scikit-gstat` package we can account for anisotropy by specifying directional bands via an azimuthal direction. The `skg.DirectionalVariogram()` function accepts the `azimuth` parameter to indicate the direction in the plane, in positive degrees clockwise. For example we may specify the main directions to be $0^\circ$, $45^\circ$, $90^\circ$, and $135^\circ$ with an angular tolerance of $22.5^\circ$.
azimuth = np.arange(0, 4, 1) * 45
azimuth
array([ 0, 45, 90, 135])
We plug the additional arguments `azimuth` and `tolerance` into the `skg.DirectionalVariogram()` function call and plot all graphs on a panel for better comparison. Each plot corresponds to a certain directional band. Note that directional semivariograms are often noisier due to the reduced number of data pairs used for estimation.
We will store our different Variogram instances in a dictionary, so that we can create them in a loop.
## store as dictionary
d = {}
for x in azimuth:
    d["V_{0}".format(x)] = skg.DirectionalVariogram(
        coords,
        values,
        azimuth=x,
        model="exponential",
        tolerance=22.5,
        maxlag=300000,
        fit_range=90000,
        fit_sill=40000,
        fit_nugget=15000,
    )
fig, axes = plt.subplots(2, 2, figsize=(8, 4), sharey=True)
for V, ax in zip(d.keys(), axes.ravel()):
    ax.plot(d[V].bins, d[V].experimental, ".", label=f"{V}")
    ax.set_title(V)
    ax.set_ylabel("semivariance")
    ax.set_xlabel("distance lag [m]")
plt.tight_layout()
plt.show()
The plot does not show overwhelming evidence of anisotropy. Let us add our exponential variogram model we fitted in the previous section to see if it fits the data in all four directional bands.
import gstools as gs
fig, axes = plt.subplots(2, 2, figsize=(8, 4), sharey=True)
exp_model = V_exp.to_gstools()  ## convert the fitted variogram to a gstools CovModel
for V, ax in zip(d.keys(), axes.ravel()):
    ax.plot(d[V].bins, d[V].experimental, ".", label=f"{V}")
    exp_model.plot(ax=ax)
    ax.set_title(V)
    ax.set_ylabel("semivariance")
    ax.set_xlabel("distance lag [m]")
    ax.get_legend().remove()
    ax.set_xlim(0, 305000)
plt.tight_layout()
plt.show()
Not too bad! However, we can explicitly specify the geometry of the covariance structure by adding the `azimuth` and `tolerance` arguments to the `skg.DirectionalVariogram()` function call. Note that we plug in the same parameters for nugget, sill and range as obtained for the exponential variogram model `V_exp`.

The value of the `azimuth` argument gives the angle of the major direction, measured in degrees clockwise from 0 (= North-South). Hence, we model a more ellipse-like spatial correlation structure.
fig, axes = plt.subplots(2, 2, figsize=(8, 4), sharey=True)
for V, ax in zip(d.keys(), axes.ravel()):
    d[V].plot(axes=ax, hist=False)
    ax.set_title(V)
    ax.set_ylabel("semivariance")
    ax.set_xlabel("distance lag [m]")
plt.tight_layout()
plt.show()
Once we have defined a variogram model, we would like to evaluate its predictive power. We test our model using the leave-one-out cross-validation (LOOCV) approach, with the root-mean-square error (RMSE) as the model assessment metric.
$$RMSE = \sqrt{\frac{\sum_{i=1}^n(\hat y_i-y_i)^2}{n}}$$

First, we will define our own `rmse` function:
def rmse(residuals):
    return np.sqrt(np.sum(residuals**2) / len(residuals))
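A quick sanity check of the helper on a tiny residual vector:
# residuals (3, -4): sqrt((9 + 16) / 2) = sqrt(12.5) ≈ 3.54
print(rmse(np.array([3.0, -4.0])))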
Now we are ready to perform the kriging interpolation combined with LOOCV. Note that we are using the isotropic exponential variogram model `V_exp`. The `cross_validate()` method provides a ready-to-use implementation of the LOOCV of the variogram model by means of kriging; we set `metric="rmse"` to get the RMSE as our model assessment metric.
V_exp.cross_validate(metric="rmse")
116.13995314666386
Now let us plug in one of the anisotropic exponential variogram models and check its performance.
d["V_0"].cross_validate(metric="rmse")
116.29184381891159
Well, slightly worse.
Let us evaluate one more model, the Matérn model. For this we have to build a new Variogram instance first.
V_matern = skg.Variogram(
coords,
values,
model="matern",
maxlag=300000,
fit_range=90000,
fit_sill=30000,
fit_nugget=15000,
)
V_matern.cross_validate(metric="rmse")
122.09615008314385
OK, still a bit worse! Hence, we proceed with the exponential model.
A nice visualization of the model residuals is the bubble plot. We can produce a bubble plot with the help of the `seaborn` library, which provides the `scatterplot()` function. We need to extract the residuals from the kriging, so we will build a custom function `loocv_kriging()`. Each point in our data set will be subject to kriging interpolation. Then the difference between the true data value and the estimated data value will be calculated as a residual. The predictive power of the model will be estimated with the help of the RMSE, using our custom `rmse()` function defined above.
def loocv_kriging(
    x, y, z, model="exponential", direction=False, azimuth=0, tolerance=22.5
):
    """
    Ordinary kriging combined with a leave-one-out cross-validation (LOOCV) approach.

    The function iterates through each point of the x, y, z data, each time
    leaving one point out. An ordinary kriging based on the specified variogram
    is performed for the left-out point and the difference between the
    interpolated and the true value is calculated.
    loocv_kriging returns the interpolated values and the residuals as arrays.

    INPUTS:
    x, y
        arrays containing the coordinates of the point data
    z
        array containing the data values that are of interest for interpolation
    model
        variogram model to be used, e.g. "exponential" (default), "spherical", "gaussian"
        For more information, see the documentation of scikit-gstat:
        https://scikit-gstat.readthedocs.io/en/latest/userguide/variogram.html#variogram-models
    direction
        whether to use a DirectionalVariogram (default: False)
        https://scikit-gstat.readthedocs.io/en/latest/reference/directionalvariogram.html#skgstat.DirectionalVariogram
        if True, remember to set azimuth and tolerance!
    azimuth
        default = 0
    tolerance
        default = 22.5
    """
    from skgstat import OrdinaryKriging

    ## initialise two empty vectors to be filled within the loop
    residuals = []
    rainfall_krig = []
    # combine the data arrays in one structure
    kriging_data = pd.DataFrame({"x_val": x, "y_val": y, "rainfall": z})
    for index, row in kriging_data.iterrows():
        # drop one point from the data set
        loocv = kriging_data.drop([index])
        # form coordinates and values for the variogram
        loocv_coords = list(zip(loocv.x_val, loocv.y_val))
        loocv_values = loocv.rainfall
        # set up the variogram model
        if not direction:
            V = skg.Variogram(
                loocv_coords,
                loocv_values,
                model=model,
                # maxlag= 300000,
                fit_range=90000,
                fit_sill=40000,
                fit_nugget=15000,
            )
        else:
            V = skg.DirectionalVariogram(
                loocv_coords,
                loocv_values,
                model=model,
                azimuth=azimuth,
                tolerance=tolerance,
            )
        # set up the kriging instance
        ok = OrdinaryKriging(V, min_points=2, max_points=10, mode="exact")
        # interpolate the left-out point based on kriging
        loocv_rainfall_point = ok.transform(
            row["x_val"].flatten(), row["y_val"].flatten()
        )
        # append the interpolated rainfall to one array
        rainfall_krig = np.append(rainfall_krig, loocv_rainfall_point)
        # calculate the difference between interpolated and true rainfall
        diff = loocv_rainfall_point - row["rainfall"]
        # append the residuals to one array
        residuals = np.append(residuals, diff)
    return rainfall_krig, residuals
rainfall_krig, residuals = loocv_kriging(
dwd_sp.geometry.x,
dwd_sp.geometry.y,
dwd_sp["MEAN.ANNUAL.RAINFALL"],
model="exponential",
)
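Feeding the residuals into our custom `rmse()` function gives the overall LOOCV error. It will not match the `cross_validate()` value from above exactly, since `loocv_kriging()` refits the variogram (without a `maxlag`) in every iteration, but it should be of a similar magnitude.
# RMSE of the manual LOOCV residuals
print(rmse(residuals))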
import seaborn as sns
## data prep for nice plotting
data = pd.DataFrame(
{"x": dwd_sp.geometry.x, "y": dwd_sp.geometry.y, "residuals": residuals}
)
data["LOOCV_residuals"] = np.where(data["residuals"] > 0, "neg", "pos")
data_pos = data[(data["residuals"] > 0)]
data_neg = data[(data["residuals"] < 0)]
### PLOTTING ####
fig, ax = plt.subplots(figsize=(7, 6))
# seaborn bubble plot
sns.scatterplot(
x=data["x"],
y=data["y"],
size=abs(data["residuals"]),
hue=data["LOOCV_residuals"],
sizes=(1, 300),
legend=True,
ax=ax,
alpha=0.7,
)
plt.title("Ordinary Kriging rainfall: LOOCV residuals\n")
plt.xlabel("")
plt.ylabel("")
## Legend outside of the plot
plt.legend(title=" ", bbox_to_anchor=(1.01, 1), borderaxespad=0)
plt.tight_layout()
plt.show()
The bubble plot above shows that the model residuals are quite low in the North German Lowlands, indicating a reasonable model performance. In contrast, the residuals are much larger in the Central German Uplands, the Alpine foreland and the Alps.
Finally, we may use our model to predict the annual mean rainfall over the region of Germany from observations at DWD weather stations.
The `scikit-gstat` package provides the `OrdinaryKriging()` class for kriging. The ordinary kriging estimator uses a given Variogram class to calculate estimations for unobserved locations. The following parameters are implemented and should be set:

- `variogram` – Variogram used to build the kriging estimation (contains the known data).
- `min_points` – Minimum amount of points to be considered within the variogram's range.
- `max_points` – Maximum amount of points that will be considered.

There are a few more additional parameters, but for our application the ones named above are sufficient.
Consequently the spatial prediction model can be written as below. Please note that depending on your computing resources and on the spatial resolution of the `meshgrid`, this step can take some time.
First, we will have to build a grid to interpolate on. A procedure we already know from previous sections.
## meshgrid to interpolate on
## extent
xmin, xmax = min(dwd_sp.geometry.x), max(dwd_sp.geometry.x)
ymin, ymax = min(dwd_sp.geometry.y), max(dwd_sp.geometry.y)
# size of the grid
nx, ny = 50, 50
# generate two arrays of evenly spaced data between the ends of the previous arrays
xi = np.linspace(xmin, xmax, nx)
yi = np.linspace(ymin, ymax, ny)
# generate grid
xx, yy = np.meshgrid(xi, yi)
Set up the kriging instance.
from skgstat import OrdinaryKriging
ok = OrdinaryKriging(V_exp, min_points=2, max_points=10, mode="exact")
Apply the kriging instance to the meshgrid as flattened arrays.
kriging_grid = ok.transform(xx.flatten(), yy.flatten()).reshape(xx.shape)
# We calculate the kriging error on our grid:
sigma2d = ok.sigma.reshape(xx.shape)
fig, ax = plt.subplots(1, 2, figsize=(15, 10), sharey=True)
### plot 1: Interpolation grid
contour_levels = np.arange(400, 1900, 50)
krig = ax[0].contourf(
xx,
yy,
kriging_grid,
levels=contour_levels,
# range(150,200,5),
cmap="viridis_r",
alpha=0.7,
)
boundary_germany.plot(ax=ax[0], facecolor="none", edgecolor="black")
fig.colorbar(krig, ax=ax[0])
ax[0].set_title("Predicted annual rainfall [mm]")
### plot 2: sigma error
contour_levels = np.arange(1000, 35000, 50)
sigma = ax[1].contourf(xx, yy, sigma2d, cmap="hot_r", levels=contour_levels, alpha=0.7)
boundary_germany.plot(ax=ax[1], facecolor="none", edgecolor="black")
fig.colorbar(sigma, ax=ax[1])
ax[1].set_title("Kriging error $sigma^2$")
plt.show()
Fortunately, our data set (`dwd_sp`) not only contains rainfall data, but also the altitude of each particular weather station. We might be able to improve our interpolation by including the covariate `ALTITUDE`. Let us make a scatter plot to verify the dependence of rainfall on elevation as indicated by the bubble plot.
fig, ax = plt.subplots(figsize=(6, 4))
plt.scatter(dwd_sp["ALTITUDE"], dwd_sp["MEAN.ANNUAL.RAINFALL"], color="black", s=3)
plt.title("Rainfall and Altitude")
plt.xlabel("Elevation")
plt.ylabel("Mean Annual Rainfall")
plt.show()
Although there are a number of weather stations at low elevations, probably located close to the northern German coast, which receive a considerable amount of annual rainfall, it appears that in general rainfall increases with altitude, as the scatter plot indicates.
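To put a number on this visual impression, we can compute the Pearson correlation between altitude and rainfall (a quick check; the coefficient quantifies only the linear part of the relationship):
# Pearson correlation between station altitude and mean annual rainfall
print(dwd_sp["ALTITUDE"].corr(dwd_sp["MEAN.ANNUAL.RAINFALL"]))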
Consequently, we build a model using universal kriging, where the predicted value (mean annual rainfall) is based on the covariate `ALTITUDE` plus a weighted function of neighboring measurements. To apply universal kriging with LOOCV to our `dwd_sp` data set, we import a new library, `PyKrige`, which allows for universal kriging.
from pykrige.uk3d import UniversalKriging3D
Apply the kriging method with an exponential model.
fit_sill, fit_range, fit_nugget = 65000, 500000, 0
ok3d = UniversalKriging3D(
dwd_sp.geometry.x,
dwd_sp.geometry.y,
dwd_sp.ALTITUDE,
dwd_sp["MEAN.ANNUAL.RAINFALL"],
variogram_model="exponential",
variogram_parameters=[fit_sill, fit_range, fit_nugget],
enable_plotting=True,
)
Again we visualize the model residuals using a bubble plot and compute the RMSE.
import seaborn as sns
## data prep for nice plotting
resid = np.append(np.nan, ok3d.delta)
data = pd.DataFrame(
{"x": dwd_sp.geometry.x, "y": dwd_sp.geometry.y, "residuals": resid}
)
data["LOOCV_residuals"] = np.where(data["residuals"] > 0, "neg", "pos")
data_pos = data[(data["residuals"] > 0)]
data_neg = data[(data["residuals"] < 0)]
### PLOTTING ####
fig, ax = plt.subplots(figsize=(7, 6))
# seaborn bubble plot
sns.scatterplot(
x=data["x"],
y=data["y"],
size=abs(data["residuals"]),
hue=data["LOOCV_residuals"],
sizes=(1, 300),
legend=True,
ax=ax,
alpha=0.7,
)
plt.title("Ordinary Kriging rainfall: LOOCV residuals\n")
plt.xlabel("")
plt.ylabel("")
## Legend outside of the plot
plt.legend(title=" ", bbox_to_anchor=(1.01, 1), borderaxespad=0)
plt.tight_layout()
plt.show()
rmse(ok3d.delta)
148.0164746390339
The bubble plot above shows that the model residuals are still quite high in elevated terrain.
Finally, we may use our model to predict the annual mean rainfall over the region of Germany from observations at DWD weather stations. Spatial prediction with the `PyKrige` package is fairly straightforward. We may use the `execute()` function on our 3D kriging instance, which calculates a kriged grid and the associated variance.

The `execute()` function takes as input a `style` argument that specifies how to treat the input kriging points. Here we specify `"grid"`. In addition the function expects the `xpoints`, `ypoints`, `zpoints` arguments, which are input arrays describing the grid.
Consequently the spatial prediction model can be written as below. Please note that depending on your computing resources and depending on the spatial resolution, this step can take some time.
First, we will have to build a grid to interpolate on. A procedure we already know from previous sections.
xmin, xmax = min(dwd_sp.geometry.x), max(dwd_sp.geometry.x)
ymin, ymax = min(dwd_sp.geometry.y), max(dwd_sp.geometry.y)
zmin, zmax = min(dwd_sp.ALTITUDE), max(dwd_sp.ALTITUDE)
## meshgrid to interpolate on
# # size of the grid to interpolate
nx, ny = 50, 50
# generate two arrays of evenly spaced data between the ends of the previous arrays
xi = np.linspace(xmin, xmax, nx)
yi = np.linspace(ymin, ymax, ny)
zi = np.linspace(zmin, zmax, ny)
# generate grid
xx, yy, zz = np.meshgrid(xi, yi, zi)
interpol_values_3d, sigma3d = ok3d.execute(
"grid", xi.flatten(), yi.flatten(), zi.flatten()
)
Finally we plot the spatial predictions using the `contourf()` function.
fig, ax = plt.subplots(1, 2, figsize=(15, 10), sharey=True)
contour_levels = np.arange(400, 1900, 50)
krig = ax[0].contourf(
xi,
yi,
interpol_values_3d[0, :, :],
cmap="viridis_r",
levels=contour_levels,
alpha=0.7,
)
boundary_germany.plot(ax=ax[0], facecolor="none", edgecolor="black")
fig.colorbar(krig, ax=ax[0])
ax[0].set_title("Predicted annual rainfall [mm]")
contour_levels = np.arange(1000, 35000, 50)
sigma = ax[1].contourf(
xi, yi, sigma3d[0, :, :], cmap="hot_r", levels=contour_levels, alpha=0.7
)
boundary_germany.plot(ax=ax[1], facecolor="none", edgecolor="black")
fig.colorbar(sigma, ax=ax[1])
ax[1].set_title("Kriging error $sigma^2$")
plt.show()
np.mean(np.sqrt(sigma2d)), np.mean(np.sqrt(sigma3d[0, :, :]))
(126.18677098523185, 95.20714054394963)
Comparing the mean kriging standard errors, we can see that the introduction of a covariate has improved the kriging result: the kriging error for universal kriging is lower than for ordinary kriging. In general we can see that the kriging error is much lower in areas of higher sampling density.
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.