In this section we apply the IDW approach to interpolate weather data on the basis of measurements taken at 158 DWD weather stations in East Germany. The data set we use (data_east
) contains the mean annual rainfall and the mean annual temperature for the period from 1981-2022.
First, we load the required data sets and packages:
# First, let's import the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
When loading the data, make sure to get the crs
(coordinate reference system) right:
data_east = pd.read_csv(
"http://userpage.fu-berlin.de/soga/soga-py/300/30900_spatial_interpolation/East_Germany.csv"
)
data_east = gpd.GeoDataFrame(
data_east,
geometry=gpd.points_from_xy(data_east.longitude, data_east.latitude),
crs=4326,
)
For the shapefiles of Germany we rely on shapefiles provided by the German Federal Agency for Cartography and Geodesy (under this licence). You may directly download the shapefiles here. For the purpose of this tutorial, we also provided the data here.
boundary_east = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30900_spatial_interpolation/east_germany.geojson"
).set_crs(
"epsg:3035", allow_override=True
) ## set the correct crs
boundary_east_states = gpd.read_file(
"http://userpage.fu-berlin.de/soga/soga-py/300/30900_spatial_interpolation/east_germany_states.geojson"
)
# convert to crs 3035
data_east = data_east.to_crs("epsg:3035")
data_east_2021 = data_east[data_east.date == 2021]
fig, ax = plt.subplots(figsize=(12, 6))
boundary_east_states.plot(ax=ax, color="none")
data_east_2021.plot(
ax=ax,
markersize=3.5,
color="red",
)
plt.show()
In most real life use cases we do not interpolate one particular point in space, but we compute the values of our predictor variable on a regular grid. For our use case we build a regular grid with 50x50 cells covering the area of East Germany.
In order to set the geographical extend of our grid we take the bounding box of the boundary_east_states
spatial data set using total_bounds()
.
xmin, ymin, xmax, ymax = boundary_east_states.total_bounds
Based on the extent of the boundary box we use the meshgrid()
function from the NumPy
package to generate a data frame with x
and y
coordinates. We will call the x and y coordinate arrays xi
and yi
. These mark the points that we want to interpolate.
# # size of the grid to interpolate
nx, ny = 50, 50
# generate two arrays of evenly space data between ends of previous arrays
xi = np.linspace(xmin, xmax, nx)
yi = np.linspace(ymin, ymax, ny)
# generate grid
xi, yi = np.meshgrid(xi, yi)
For a better understating we plot the generated grid points.
plt.scatter(xi, yi, marker="o", color="none", edgecolor="grey", s=0.5)
<matplotlib.collections.PathCollection at 0x21097bd4b80>
In this section we interpolate the temperature data, given in data_east
, using the IDW method. To apply this method we use the custom funtions, which can be traced here. The advantage of this method is that we can see exactly what is happening and have direct control over the underlying formulas of our IDW. It's a very simple approach and therefore easy to understand.
Note: This simple method will consume very large ammounts of memory if a large number of points is to be calculated!
There are several more advanced packages dealing with IDW interpolation, such as GDAL
, pyidw
or arcpy
just to name a few. Alternatives methods might be a lot faster, so if you are looking for a more advanced method, the packages named here might be worth a try!
Consequently, we specify our IDW model function as follows:
def distance_matrix(x0, y0, x1, y1):
"""
Calculate distance matrix.
Note: from <http://stackoverflow.com/questions/1871536>
"""
obs = np.vstack((x0, y0)).T
interp = np.vstack((x1, y1)).T
d0 = np.subtract.outer(obs[:, 0], interp[:, 0])
d1 = np.subtract.outer(obs[:, 1], interp[:, 1])
# calculate hypotenuse
return np.hypot(d0, d1)
def simple_idw(x, y, z, xi, yi, beta=2):
"""
Simple inverse distance weighted (IDW) interpolation
x`, `y`,`z` = known data arrays containing coordinates and data used for interpolation
`xi`, `yi` = two arrays of grid coordinates
`beta` = determines the degree to which the nearer point(s) are preferred over more distant points.
Typically 1 or 2 (inverse or inverse squared relationship)
"""
dist = distance_matrix(x, y, xi, yi)
# In IDW, weights are 1 / distance
# weights = 1.0/(dist+1e-12)**power
weights = dist ** (-beta)
# Make weights sum to one
weights /= weights.sum(axis=0)
# Multiply the weights for each interpolated point by all observed Z-values
return np.dot(weights.T, z)
simple_idw
provides the arguments x
, y
and z
, which are the known data arrays containing the coordinates and the data used for interpolation. xi
and yi
are two arrays of grid coordinates, where zi
will be calculated. beta
is an additional argument, which determines the degree to which the nearer point(s) are preferred over more distant points. Typically $\beta =1$ or $\beta =2$ corresponding to an inverse or inverse squared relationship. We perform the IDW as described in the following steps.
Step 1: Generate the grid for interpolation with np.meshgrid()
.
# # size of the grid to interpolate
nx, ny = 50, 50
# generate two arrays of evenly space data between ends of previous arrays
xi = np.linspace(xmin, xmax, nx)
yi = np.linspace(ymin, ymax, ny)
# generate grid
xi, yi = np.meshgrid(xi, yi)
# colapse grid into 1D
xi, yi = xi.flatten(), yi.flatten()
Sanity test: Does the generated grid overlay the data?
fig, ax = plt.subplots(figsize=(12, 6))
boundary_east_states.plot(ax=ax, color="none")
data_east_2021.plot(
ax=ax,
markersize=3.5,
color="red",
)
ax.scatter(xi, yi, s=0.1)
plt.show()
Step 2: Define the known data arrays for interpolation. Here: known points somewhere within the grid and the corresponding temperature values.
x = data_east_2021.geometry.x
y = data_east_2021.geometry.y
z = data_east_2021.Temperature
Step 3: Calculate the IDW with the help of our custom functions.
# Calculate IDW
temp_grid = simple_idw(x, y, z, xi, yi, beta=2)
temp_grid = temp_grid.reshape((nx, ny))
Step 4: Finally we plot our grid, the administrative borders of the federal states within East Germany and the DWD weather stations.
fig, ax = plt.subplots(figsize=(12, 6))
grid = ax.imshow(
temp_grid,
extent=(xmin, xmax, ymin, ymax),
cmap="rainbow",
interpolation="gaussian",
vmin=275,
vmax=285,
)
# plt.scatter(x,y,c=z, cmap='rainbow', edgecolors='black')
ax.scatter(x, y, color="black", s=2)
plt.title("Simple IDW on Temperature [°C] (2021) \n")
boundary_east_states.plot(ax=ax, color="none")
fig.colorbar(grid, ax=ax, label="Temperature [°C]")
plt.show()
Converting NumPy array to raster
Our IDW is stored in a grid format in an NumPy array. There is the handy option to convert this array into a raster
format with the help of the rasterio
package. A This will allow is to perform raster calculations and modifications.
A raster
is a georeferenced grid. Therefore we need to feed quite a lot of information into the function.
driver
: file format
height
: Number of rows
width
: Number of columns
count
: Number of bands
nodata
: The value which represents “No Data”
dtype
: The raster data type
crs
: coordinate reference system
transform
: the origin and resolution of the grid
As we assign a spatial reference system to our grid, we have to define crs
and transform
. Therefore we may simply provide the EPSG identifier to crs
.
The rasterio
documentation is very helpful if you want to learn more about the function.
We already have all the necessary information. We want to store our raster as a GeoTIFF file (GTiff
) in writing mode (w
).
import rasterio
We need to define the transformation of our grid as a georeferenced raster. The function rasterio.transform.from_origin(xmin, ymax, x_step, y_step)
takes the coordinates of the upper left corner west, north (xmin
, ymax
) and pixel sizes (x_step
, y_step
) in x/y direction.
x_step = xi[1] - xi[0]
y_step = yi[51] - yi[0]
transform = rasterio.transform.from_origin(xmin, ymax, x_step, y_step)
East_Germay_Temp_IDW = rasterio.open(
"../East_Germay_Temp_IDW.tif",
"w",
driver="GTiff",
height=temp_grid.shape[0],
width=temp_grid.shape[1],
count=1,
dtype=temp_grid.dtype,
crs=3035,
nodata=np.nan,
transform=transform,
)
Once the file connection is established, one last thing is left to do: we have to use the .write
method to actually write the data into the file. We specify the data to write (temp_grid
) and the band index where the data will be written (Note: the band index starts from 1). Here, we only specified one band.
Finally, we have to .close
the file to make sure that writing is completed.
East_Germay_Temp_IDW.write(temp_grid, 1)
East_Germay_Temp_IDW.close()
Let's see if everything worked out. We open the file and plot it again combined with boundary_east_states
.
## open file
temp_grid_IDW = rasterio.open("../East_Germay_Temp_IDW.tif")
print(temp_grid_IDW.meta)
{'driver': 'GTiff', 'dtype': 'float64', 'nodata': nan, 'width': 50, 'height': 50, 'count': 1, 'crs': CRS.from_epsg(3035), 'transform': Affine(7356.962752158754, 0.0, 4312023.741846806, 0.0, -10285.788742817938, 3513990.204065572)}
# plot with rasterio.plot, which provides matplotlib functionality
from rasterio.plot import show
fig, ax = plt.subplots(figsize=(5, 5))
# use imshow so that we have something to map the colorbar to
colorbar_dummy = ax.imshow(
temp_grid,
extent=(xmin, xmax, ymin, ymax),
cmap="rainbow",
interpolation="gaussian",
vmin=275,
vmax=285,
)
plot = show(
temp_grid_IDW,
ax=ax,
cmap="rainbow",
title="East Germany and \n Temperature Interpolation Grid 2021 \n",
vmin=275,
vmax=285,
)
boundary_east_states.plot(ax=ax, facecolor="none", edgecolor="grey")
# add colorbar using the now hidden image
fig.colorbar(colorbar_dummy, ax=ax, label="Temperature [°C]")
plt.show()
Perfect! Sanity test passed.
An useful function, that we already know from the previous chapters, is the mask()
function from from rasterio.mask
, which creates a new raster
object that has the same values as the input raster
object, except for the masked cells, which are set to 0
by default. We apply the mask()
function to mask out the area outside the borders of East Germany. We already loaded the boundary shapefile in the very beginning of this script (boundary_east
).
The mask()
function requires the coordinates of the GeoDataFrame
to be in a .json
format. We will achieve that through a custom function.
def convert_2_json(geodataframe):
"""
Convert to GeoDataFrame features to rasterio friendly format
Input is a GeoDataFrame
"""
import json
return [json.loads(geodataframe.to_json())["features"][0]["geometry"]]
coords = convert_2_json(boundary_east)
from rasterio.mask import mask
temp_grid_IDW_masked, temp_grid_IDW_transform = mask(
dataset=temp_grid_IDW, shapes=coords, crop=True
)
We plot the clipped interpolation raster.
fig, ax = plt.subplots(figsize=(5, 5))
# use imshow so that we have something to map the colorbar to
colorbar_dummy = ax.imshow(
temp_grid,
extent=(xmin, xmax, ymin, ymax),
cmap="rainbow",
interpolation="gaussian",
vmin=275,
vmax=285,
)
plot = show(
temp_grid_IDW_masked,
ax=ax,
cmap="rainbow",
title="East Germany and \n Temperature Interpolation Grid 2021 \n",
vmin=275,
vmax=285,
)
# add colorbar using the now hidden image
fig.colorbar(colorbar_dummy, ax=ax, label="Temperature [°C]")
plt.show()
In the previous section we learned all necessary ingredients to apply the IDW approach. Let us advance and compute the IDW interpolation for the mean annual rainfall in East Germany, using the data_east
data set again.
##### interpolation
# Define the known data arrays for interpolation
x = data_east_2021.geometry.x
y = data_east_2021.geometry.y
z = data_east_2021.Rainfall
# Calculate IDW
rain_grid = simple_idw(x, y, z, xi, yi, beta=2)
rain_grid = rain_grid.reshape((nx, ny))
### Convert NumPy Array to Raster
East_Germay_Rain_IDW = rasterio.open(
"../East_Germay_Rain_IDW.tif",
"w",
driver="GTiff",
height=rain_grid.shape[0],
width=rain_grid.shape[1],
count=1,
dtype=rain_grid.dtype,
crs=3035,
nodata=np.nan,
transform=transform,
)
East_Germay_Rain_IDW.write(rain_grid, 1)
East_Germay_Rain_IDW.close()
## open file
rain_grid_IDW = rasterio.open("../East_Germay_Rain_IDW.tif")
print(rain_grid_IDW.meta)
{'driver': 'GTiff', 'dtype': 'float64', 'nodata': nan, 'width': 50, 'height': 50, 'count': 1, 'crs': CRS.from_epsg(3035), 'transform': Affine(7356.962752158754, 0.0, 4312023.741846806, 0.0, -10285.788742817938, 3513990.204065572)}
fig, ax = plt.subplots(figsize=(5, 5))
# use imshow so that we have something to map the colorbar to
colorbar_dummy = ax.imshow(
rain_grid,
extent=(xmin, xmax, ymin, ymax),
cmap="rainbow",
interpolation="gaussian",
vmin=110,
vmax=1280,
)
plot = show(
rain_grid_IDW,
ax=ax,
cmap="rainbow",
title="East Germany and \n Rainfall [mm] Interpolation Grid 2021 \n",
vmin=110,
vmax=1280,
)
boundary_east_states.plot(ax=ax, facecolor="none", edgecolor="grey")
# add colorbar using the now hidden image
fig.colorbar(colorbar_dummy, ax=ax, label="rainfall [mm]")
ax.set_ylim(ymin, ymax)
ax.set_xlim(xmin, xmax)
plt.show()
##### masking
## apply
coords = convert_2_json(boundary_east)
rain_grid_IDW_masked, rain_grid_IDW_transform = mask(
dataset=rain_grid_IDW, shapes=coords, crop=True
)
## plot the interpolation raster
fig, ax = plt.subplots(figsize=(5, 5))
# use imshow so that we have something to map the colorbar to
colorbar_dummy = ax.imshow(
rain_grid,
extent=(xmin, xmax, ymin, ymax),
cmap="rainbow",
interpolation="gaussian",
vmin=110,
vmax=1280,
)
plot = show(
rain_grid_IDW_masked,
ax=ax,
cmap="rainbow",
title="East Germany and \n Rainfall [mm] Interpolation Grid 2021 \n",
vmin=110,
vmax=1280,
)
# add colorbar using the now hidden image
fig.colorbar(colorbar_dummy, ax=ax, label="rainfall [mm]")
plt.show()
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.