In this section apply ordinary kriging to predict the spatial distribution of zinc in Lake Rangsdorf based on sediment samples taken during a field survey in 2017.
%load_ext lab_black
# First, let's import the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import rasterio
import skgstat as skg
In this section we will work with custom functions, that can be imported just like any other package. We will have to download the .py
file to our working directory prior to importing it. The kriging_helper_func.py
file is provided for downloading here.
## load custom helper func
from kriging_helper_func import *
lake_data_chem = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30900_spatial_interpolation/lake_data_chem.geojson"
)
lake = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30900_spatial_interpolation/lake.geojson"
)
lake_data_chem.head(5)
ID | x | y | depth_m | visual.depth.cm | elec.cond.muS | pH | LOI550 | LOI990 | LOI | ... | K.mg.g | Pb.mug.g | Mg.mg.g | Mn.mug.g | Na.mg.g | PO4.mg.g | S.mg.g | Sr.mug/g | Zi mug/g | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10a | 391829 | 5794052 | 0.68 | 34 | 305.0 | 7.19 | 1.04 | 0.20 | 98.51 | ... | 0.22 | 1.37 | 0.15 | 15.39 | 0.26 | 0.30 | 0.77 | 7.87 | 11.46 | POINT (13.41414 52.28617) |
1 | 11a | 391824 | 5794187 | 0.66 | 28 | 274.0 | 7.33 | 1.27 | 0.19 | 98.29 | ... | 0.13 | 1.81 | 0.13 | 12.02 | 0.22 | 0.37 | 0.43 | 6.77 | 17.41 | POINT (13.41402 52.28738) |
2 | 12a | 391184 | 5795337 | 1.93 | 15 | 680.0 | 7.25 | 27.32 | 17.46 | 32.99 | ... | 0.34 | 18.49 | 1.41 | 481.61 | 0.54 | 4.83 | 8.24 | 238.28 | 84.19 | POINT (13.40427 52.29759) |
3 | 13b | 391312 | 5795462 | 3.98 | 15 | 825.0 | 7.22 | 36.93 | 18.25 | 21.60 | ... | 0.43 | 34.19 | 1.47 | 454.44 | 0.70 | 4.93 | 17.18 | 261.05 | 94.92 | POINT (13.40611 52.29874) |
4 | 14b | 391729 | 5795736 | 4.95 | 18 | 903.0 | 7.05 | 38.33 | 15.50 | 26.44 | ... | 0.50 | 31.00 | 1.42 | 546.69 | 0.88 | 5.31 | 20.03 | 245.53 | 98.52 | POINT (13.41213 52.30128) |
5 rows × 24 columns
Here we are only interested in the Zi mug/g
variable. Hence we subset the data set. Note that we have to specify the geometry
column as well, since it has to be included for further processing.
### Your code here ...
lake_zinc_sp = lake_data_chem[["Zi mug/g", "geometry"]]
Now we build the simple exponential variogram using the variogram()
function. Therefore we define our coordinates as a zipped array coords
and the values for interpolation as values
### Your code here ...
coords = list(zip(lake_zinc_sp.geometry.x, lake_zinc_sp.geometry.y))
values = lake_zinc_sp["Zi mug/g"]
V_lake = skg.Variogram(coords, values, model="exponential")
# plot the simple variogram
fig, ax = plt.subplots(figsize=(5, 4))
plt.plot(V_lake.bins, V_lake.experimental, "o", label=f"{V_lake}")
# ax.set_title(V)
plt.ylabel("semivariance")
plt.xlabel("distane Lag (-)")
plt.show()
We fit a variogram model and fit the parameters through eyeballing and plot the resulting variogram. We will set:
### Your code here ...
V_lake = skg.Variogram(
coords, values, model="exponential", fit_range=950, fit_sill=2100, fit_nugget=30
)
V_lake.plot(hist=False)
plt.show()
C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\skgstat\plotting\variogram_plot.py:123: UserWarning: Matplotlib is currently using module://matplotlib_inline.backend_inline, which is a non-GUI backend, so cannot show the figure. fig.show()
We evaluate the variogram model by applying leave-one-out cross-validation (LOOCV), using the our custom loocv_kriging()
and rmse()
function. If you just want to evaluate the model quality, the cross_validate()
command will be sufficient!
### Your code here ...
lake_krig, residuals = loocv_kriging(
lake_data_chem.geometry.x,
lake_data_chem.geometry.y,
lake_data_chem["Zi mug/g"],
model="exponential",
)
rmse(residuals)
29.020431993960315
V_lake.cross_validate(metric="rmse")
29.174393216091467
We plot the model residuals by using our own custom function bubbleplot
, which is based on the seaborn
library which provides the scatterplot()
function. We add the lake shore line to the plot.
from kriging_helper_func import bubbleplot
### Your code here ...
fig, ax = plt.subplots(figsize=(7, 6))
bubbleplot(
x=lake_data_chem.geometry.x,
y=lake_data_chem.geometry.y,
residuals=residuals,
main="Ordinary Kriging rainfall: LOOCV residuals\n",
)
## add lake shoreline
lake.plot(ax=ax, facecolor="none", edgecolor="black")
plt.show()
Finally, we may use our model to predict the zinc concentration in Lake Rangsdorf. Note that we apply the OrdinaryKriging()
function from scikit-gstat
. We will build a meshgrid to interpolate on based on the extent of our lake
shapefile.
### Your code here ...
## meshgrid to interpolate on
## extent
xmin, ymin, xmax, ymax = lake.bounds.values[0]
# size of the grid
nx, ny = 50, 50
# generate two arrays of evenly space data between ends of previous arrays
xi = np.linspace(xmin, xmax, nx)
yi = np.linspace(ymin, ymax, ny)
# generate grid
xx, yy = np.meshgrid(xi, yi)
from skgstat import OrdinaryKriging
### Your code here ...
# set up the kriging instance
ok = OrdinaryKriging(V_lake, min_points=2, max_points=10, mode="exact")
kriging_grid = ok.transform(xx.flatten(), yy.flatten()).reshape(xx.shape)
# We calculate the kriging error on our grid:
sigma2d = ok.sigma.reshape(xx.shape)
Next, we are able to plot the kriging result and the associated kriging error next to each other.
fig, ax = plt.subplots(1, 2, figsize=(15, 10), sharey=True)
### plot 1: Interpolation grid
contour_levels = np.arange(0, 150, 100)
krig = ax[0].contourf(
xx,
yy,
kriging_grid, # levels = contour_levels,
# range(150,200,5),
cmap="viridis_r",
alpha=0.7,
)
lake.plot(ax=ax[0], facecolor="none", edgecolor="black")
fig.colorbar(krig, ax=ax[0])
ax[0].set_title("Zinc concentration Lake Rangsdorf [$\mu g/g$]")
### plot 2: sigma error
contour_levels = np.arange(0, 3000, 100)
sigma = ax[1].contourf(xx, yy, sigma2d, cmap="hot_r", levels=contour_levels, alpha=0.7)
lake.plot(ax=ax[1], facecolor="none", edgecolor="black")
fig.colorbar(sigma, ax=ax[1])
ax[1].set_title("Kriging error $sigma^2$")
plt.show()
Save interpolated grid into geoTIFF
raster
The resulting interpolated grid can be processed into a raster data set. It might come in handy to export the resulting raster as geoTIFF
file. Hence we mask out the area outside the lake shoreline. Then we plot it using the rasterio.plot.show()
function. If you want to learn more about this method, checkout this tutorial by Attard & Mälicke 2021.
from osgeo import gdal, osr
# pixel size in x and y directions.
dx = abs(xmax - xmin) / nx
dy = abs(ymax - ymin) / ny
params = (xmin - dx / 2, dx, 0, ymax + dy / 2, 0, -dy)
# file name
tif_name = "lake_rangsdorf_kriging.tif"
# Create the raster
output_raster = gdal.GetDriverByName("GTiff").Create(
tif_name, nx + 1, ny + 1, 1, gdal.GDT_Float32
)
# coordinates
output_raster.SetGeoTransform(params)
# coordinate encoding
srs = osr.SpatialReference()
# Our projection system is specified:
srs.ImportFromEPSG(4326)
# Exports the coordinate system to the file
output_raster.SetProjection(srs.ExportToWkt())
# Writes my array to the raster after some transformation due to the resulting shape of the kriging:
output_raster.GetRasterBand(1).WriteArray(np.transpose(np.flip(kriging_grid[::-1])))
output_raster.GetRasterBand(1).SetNoDataValue(0)
output_raster.FlushCache()
Use this code to read the geoTIFF
back into the workspace.
import rasterio
lake_rangsdorf_kriging = rasterio.open("lake_rangsdorf_kriging.tif")
In the previous chapters we explored methods to mask rasters with shapefiles.
# def convert_2_json(geodataframe):
# """
# Convert to GeoDataFrame features to rasterio friendly format
# Input is a GeoDataFrame
# """
# import json
# return [json.loads(geodataframe.to_json())["features"][0]["geometry"]]
from kriging_helper_func import convert_2_json
## apply
coords = convert_2_json(lake)
from rasterio.mask import mask
lake_rangsdorf_masked, lake_rangsdorf_masked_transform = mask(
dataset=lake_rangsdorf_kriging, shapes=coords, crop=True, nodata=np.nan
)
# plot with rasterio.plot, which provides matplotlib functionality
from rasterio.plot import show
fig, ax = plt.subplots(figsize=(10, 5))
kriging_plot = show(
lake_rangsdorf_masked,
ax=ax,
cmap="viridis_r",
title="Zinc concentration Lake Rangsdorf [$\mu g/g$]",
)
lake.plot(ax=ax, facecolor="none", edgecolor="lightgrey")
## add colorbar
dummy_colorbar = kriging_plot.get_images()[0]
fig.colorbar(
dummy_colorbar, ax=ax, fraction=0.046, pad=0.04, label="Zinc conc. [$\mu g/g$]"
)
plt.show()
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.