Exercise: Lake Rangsdorf¶

In this section apply ordinary kriging to predict the spatial distribution of zinc in Lake Rangsdorf based on sediment samples taken during a field survey in 2017.

In [1]:
%load_ext lab_black
In [2]:
# First, let's import the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import rasterio
import skgstat as skg

In this section we will work with custom functions, that can be imported just like any other package. We will have to download the .py file to our working directory prior to importing it. The kriging_helper_func.py file is provided for downloading here.

In [3]:
## load custom helper func
from kriging_helper_func import *

Data preparation¶

We start with our data set, lake_data_chem, a object of class GeoDataFrame.

In [4]:
lake_data_chem = gpd.read_file(
    "http://userpage.fu-berlin.de/soga/soga-py/300/30900_spatial_interpolation/lake_data_chem.geojson"
)
lake = gpd.read_file(
    "http://userpage.fu-berlin.de/soga/soga-py/300/30900_spatial_interpolation/lake.geojson"
)
In [5]:
lake_data_chem.head(5)
Out[5]:
ID x y depth_m visual.depth.cm elec.cond.muS pH LOI550 LOI990 LOI ... K.mg.g Pb.mug.g Mg.mg.g Mn.mug.g Na.mg.g PO4.mg.g S.mg.g Sr.mug/g Zi mug/g geometry
0 10a 391829 5794052 0.68 34 305.0 7.19 1.04 0.20 98.51 ... 0.22 1.37 0.15 15.39 0.26 0.30 0.77 7.87 11.46 POINT (13.41414 52.28617)
1 11a 391824 5794187 0.66 28 274.0 7.33 1.27 0.19 98.29 ... 0.13 1.81 0.13 12.02 0.22 0.37 0.43 6.77 17.41 POINT (13.41402 52.28738)
2 12a 391184 5795337 1.93 15 680.0 7.25 27.32 17.46 32.99 ... 0.34 18.49 1.41 481.61 0.54 4.83 8.24 238.28 84.19 POINT (13.40427 52.29759)
3 13b 391312 5795462 3.98 15 825.0 7.22 36.93 18.25 21.60 ... 0.43 34.19 1.47 454.44 0.70 4.93 17.18 261.05 94.92 POINT (13.40611 52.29874)
4 14b 391729 5795736 4.95 18 903.0 7.05 38.33 15.50 26.44 ... 0.50 31.00 1.42 546.69 0.88 5.31 20.03 245.53 98.52 POINT (13.41213 52.30128)

5 rows × 24 columns

Here we are only interested in the Zi mug/g variable. Hence we subset the data set. Note that we have to specify the geometry column as well, since it has to be included for further processing.

In [6]:
### Your code here ...
In [7]:
lake_zinc_sp = lake_data_chem[["Zi mug/g", "geometry"]]

Sample variogram¶

Now we build the simple exponential variogram using the variogram() function. Therefore we define our coordinates as a zipped array coords and the values for interpolation as values

In [8]:
### Your code here ...
In [9]:
coords = list(zip(lake_zinc_sp.geometry.x, lake_zinc_sp.geometry.y))
values = lake_zinc_sp["Zi mug/g"]

V_lake = skg.Variogram(coords, values, model="exponential")
In [10]:
# plot the simple variogram
fig, ax = plt.subplots(figsize=(5, 4))
plt.plot(V_lake.bins, V_lake.experimental, "o", label=f"{V_lake}")
# ax.set_title(V)
plt.ylabel("semivariance")
plt.xlabel("distane Lag (-)")
plt.show()

Variogram modelling¶

We fit a variogram model and fit the parameters through eyeballing and plot the resulting variogram. We will set:

  • fit_range = 950
  • fit_sill = 2100
  • fit_nugget = 30
In [11]:
### Your code here ...
In [12]:
V_lake = skg.Variogram(
    coords, values, model="exponential", fit_range=950, fit_sill=2100, fit_nugget=30
)
V_lake.plot(hist=False)
plt.show()
C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\skgstat\plotting\variogram_plot.py:123: UserWarning: Matplotlib is currently using module://matplotlib_inline.backend_inline, which is a non-GUI backend, so cannot show the figure.
  fig.show()

Model evaluation¶

We evaluate the variogram model by applying leave-one-out cross-validation (LOOCV), using the our custom loocv_kriging() and rmse() function. If you just want to evaluate the model quality, the cross_validate() command will be sufficient!

In [13]:
### Your code here ...
In [14]:
lake_krig, residuals = loocv_kriging(
    lake_data_chem.geometry.x,
    lake_data_chem.geometry.y,
    lake_data_chem["Zi mug/g"],
    model="exponential",
)
In [15]:
rmse(residuals)
Out[15]:
29.020431993960315
In [16]:
V_lake.cross_validate(metric="rmse")
Out[16]:
29.174393216091467

We plot the model residuals by using our own custom function bubbleplot, which is based on the seaborn library which provides the scatterplot() function. We add the lake shore line to the plot.

In [17]:
from kriging_helper_func import bubbleplot
In [18]:
### Your code here ...
In [19]:
fig, ax = plt.subplots(figsize=(7, 6))
bubbleplot(
    x=lake_data_chem.geometry.x,
    y=lake_data_chem.geometry.y,
    residuals=residuals,
    main="Ordinary Kriging rainfall: LOOCV residuals\n",
)
## add lake shoreline
lake.plot(ax=ax, facecolor="none", edgecolor="black")

plt.show()

Spatial prediction¶

Finally, we may use our model to predict the zinc concentration in Lake Rangsdorf. Note that we apply the OrdinaryKriging() function from scikit-gstat. We will build a meshgrid to interpolate on based on the extent of our lake shapefile.

In [20]:
### Your code here ...
In [21]:
## meshgrid to interpolate on

## extent
xmin, ymin, xmax, ymax = lake.bounds.values[0]
# size of the grid
nx, ny = 50, 50

# generate two arrays of evenly space data between ends of previous arrays
xi = np.linspace(xmin, xmax, nx)
yi = np.linspace(ymin, ymax, ny)

# generate grid
xx, yy = np.meshgrid(xi, yi)
In [22]:
from skgstat import OrdinaryKriging

### Your code here ...
In [23]:
# set up the kriging instance
ok = OrdinaryKriging(V_lake, min_points=2, max_points=10, mode="exact")
In [24]:
kriging_grid = ok.transform(xx.flatten(), yy.flatten()).reshape(xx.shape)

# We calculate the kriging error on our grid:
sigma2d = ok.sigma.reshape(xx.shape)

Next, we are able to plot the kriging result and the associated kriging error next to each other.

In [25]:
fig, ax = plt.subplots(1, 2, figsize=(15, 10), sharey=True)


### plot 1: Interpolation grid
contour_levels = np.arange(0, 150, 100)
krig = ax[0].contourf(
    xx,
    yy,
    kriging_grid,  # levels = contour_levels,
    # range(150,200,5),
    cmap="viridis_r",
    alpha=0.7,
)

lake.plot(ax=ax[0], facecolor="none", edgecolor="black")
fig.colorbar(krig, ax=ax[0])

ax[0].set_title("Zinc concentration Lake Rangsdorf [$\mu g/g$]")

### plot 2: sigma error
contour_levels = np.arange(0, 3000, 100)

sigma = ax[1].contourf(xx, yy, sigma2d, cmap="hot_r", levels=contour_levels, alpha=0.7)

lake.plot(ax=ax[1], facecolor="none", edgecolor="black")
fig.colorbar(sigma, ax=ax[1])
ax[1].set_title("Kriging error $sigma^2$")

plt.show()

Save interpolated grid into geoTIFF raster

The resulting interpolated grid can be processed into a raster data set. It might come in handy to export the resulting raster as geoTIFF file. Hence we mask out the area outside the lake shoreline. Then we plot it using the rasterio.plot.show() function. If you want to learn more about this method, checkout this tutorial by Attard & Mälicke 2021.

In [26]:
from osgeo import gdal, osr
In [27]:
# pixel size in x and y directions.
dx = abs(xmax - xmin) / nx
dy = abs(ymax - ymin) / ny
In [28]:
params = (xmin - dx / 2, dx, 0, ymax + dy / 2, 0, -dy)

# file name
tif_name = "lake_rangsdorf_kriging.tif"

# Create the raster
output_raster = gdal.GetDriverByName("GTiff").Create(
    tif_name, nx + 1, ny + 1, 1, gdal.GDT_Float32
)

# coordinates
output_raster.SetGeoTransform(params)

# coordinate encoding
srs = osr.SpatialReference()

# Our projection system is specified:
srs.ImportFromEPSG(4326)

# Exports the coordinate system to the file
output_raster.SetProjection(srs.ExportToWkt())

# Writes my array to the raster after some transformation due to the resulting shape of the kriging:
output_raster.GetRasterBand(1).WriteArray(np.transpose(np.flip(kriging_grid[::-1])))
output_raster.GetRasterBand(1).SetNoDataValue(0)
output_raster.FlushCache()

Use this code to read the geoTIFF back into the workspace.

In [29]:
import rasterio

lake_rangsdorf_kriging = rasterio.open("lake_rangsdorf_kriging.tif")

In the previous chapters we explored methods to mask rasters with shapefiles.

In [30]:
# def convert_2_json(geodataframe):
#     """
#     Convert to GeoDataFrame features to rasterio friendly format
#     Input is a GeoDataFrame
#     """
#     import json

#     return [json.loads(geodataframe.to_json())["features"][0]["geometry"]]

from kriging_helper_func import convert_2_json

## apply
coords = convert_2_json(lake)
In [31]:
from rasterio.mask import mask

lake_rangsdorf_masked, lake_rangsdorf_masked_transform = mask(
    dataset=lake_rangsdorf_kriging, shapes=coords, crop=True, nodata=np.nan
)
In [32]:
# plot with rasterio.plot, which provides matplotlib functionality
from rasterio.plot import show

fig, ax = plt.subplots(figsize=(10, 5))

kriging_plot = show(
    lake_rangsdorf_masked,
    ax=ax,
    cmap="viridis_r",
    title="Zinc concentration Lake Rangsdorf [$\mu g/g$]",
)
lake.plot(ax=ax, facecolor="none", edgecolor="lightgrey")

## add colorbar
dummy_colorbar = kriging_plot.get_images()[0]
fig.colorbar(
    dummy_colorbar, ax=ax, fraction=0.046, pad=0.04, label="Zinc conc. [$\mu g/g$]"
)


plt.show()

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.