In this section we apply the IDW approach to interpolate weather data on the basis of measurements taken at 158 DWD weather stations in East Germany. The data set we use (data_east) contains the mean annual rainfall and the mean annual temperature for the period from 1981-2022.

First, we load the required data sets and packages:

In [2]:
# First, let's import the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd

When loading the data, make sure to get the crs (coordinate reference system) right:

In [3]:
data_east = pd.read_csv(
    "http://userpage.fu-berlin.de/soga/soga-py/300/30900_spatial_interpolation/East_Germany.csv"
)
data_east = gpd.GeoDataFrame(
    data_east,
    geometry=gpd.points_from_xy(data_east.longitude, data_east.latitude),
    crs=4326,
)

For the shapefiles of Germany we rely on shapefiles provided by the German Federal Agency for Cartography and Geodesy (under this licence). You may directly download the shapefiles here. For the purpose of this tutorial, we also provided the data here.

In [4]:
boundary_east = gpd.read_file(
    "http://userpage.fu-berlin.de/soga/soga-py/300/30900_spatial_interpolation/east_germany.geojson"
).set_crs(
    "epsg:3035", allow_override=True
)  ## set the correct crs
boundary_east_states = gpd.read_file(
    "http://userpage.fu-berlin.de/soga/soga-py/300/30900_spatial_interpolation/east_germany_states.geojson"
)

# convert to crs 3035
data_east = data_east.to_crs("epsg:3035")
In [5]:
data_east_2021 = data_east[data_east.date == 2021]
In [6]:
fig, ax = plt.subplots(figsize=(12, 6))
boundary_east_states.plot(ax=ax, color="none")
data_east_2021.plot(
    ax=ax,
    markersize=3.5,
    color="red",
)

plt.show()

Meshgrid for interpolation¶

In most real life use cases we do not interpolate one particular point in space, but we compute the values of our predictor variable on a regular grid. For our use case we build a regular grid with 50x50 cells covering the area of East Germany.

In order to set the geographical extend of our grid we take the bounding box of the boundary_east_states spatial data set using total_bounds().

In [7]:
xmin, ymin, xmax, ymax = boundary_east_states.total_bounds

Based on the extent of the boundary box we use the meshgrid() function from the NumPy package to generate a data frame with x and y coordinates. We will call the x and y coordinate arrays xi and yi. These mark the points that we want to interpolate.

In [8]:
# # size of the grid to interpolate
nx, ny = 50, 50

# generate two arrays of evenly space data between ends of previous arrays
xi = np.linspace(xmin, xmax, nx)
yi = np.linspace(ymin, ymax, ny)

# generate grid
xi, yi = np.meshgrid(xi, yi)

For a better understating we plot the generated grid points.

In [9]:
plt.scatter(xi, yi, marker="o", color="none", edgecolor="grey", s=0.5)
Out[9]:
<matplotlib.collections.PathCollection at 0x21097bd4b80>

Interpolation of temperature data using IDW¶

In this section we interpolate the temperature data, given in data_east, using the IDW method. To apply this method we use the custom funtions, which can be traced here. The advantage of this method is that we can see exactly what is happening and have direct control over the underlying formulas of our IDW. It's a very simple approach and therefore easy to understand.

Note: This simple method will consume very large ammounts of memory if a large number of points is to be calculated!

There are several more advanced packages dealing with IDW interpolation, such as GDAL, pyidw or arcpy just to name a few. Alternatives methods might be a lot faster, so if you are looking for a more advanced method, the packages named here might be worth a try!

Consequently, we specify our IDW model function as follows:

In [10]:
def distance_matrix(x0, y0, x1, y1):
    """
    Calculate distance matrix.
    Note: from <http://stackoverflow.com/questions/1871536>
    """

    obs = np.vstack((x0, y0)).T
    interp = np.vstack((x1, y1)).T

    d0 = np.subtract.outer(obs[:, 0], interp[:, 0])
    d1 = np.subtract.outer(obs[:, 1], interp[:, 1])

    # calculate hypotenuse
    return np.hypot(d0, d1)


def simple_idw(x, y, z, xi, yi, beta=2):
    """
    Simple inverse distance weighted (IDW) interpolation
    x`, `y`,`z` = known data arrays containing coordinates and data used for interpolation
    `xi`, `yi` =  two arrays of grid coordinates
    `beta` = determines the degree to which the nearer point(s) are preferred over more distant points.
            Typically 1 or 2 (inverse or inverse squared relationship)
    """

    dist = distance_matrix(x, y, xi, yi)

    # In IDW, weights are 1 / distance
    # weights = 1.0/(dist+1e-12)**power
    weights = dist ** (-beta)

    # Make weights sum to one
    weights /= weights.sum(axis=0)

    # Multiply the weights for each interpolated point by all observed Z-values
    return np.dot(weights.T, z)

simple_idw provides the arguments x, y and z, which are the known data arrays containing the coordinates and the data used for interpolation. xi and yi are two arrays of grid coordinates, where zi will be calculated. beta is an additional argument, which determines the degree to which the nearer point(s) are preferred over more distant points. Typically $\beta =1$ or $\beta =2$ corresponding to an inverse or inverse squared relationship. We perform the IDW as described in the following steps.

Step 1: Generate the grid for interpolation with np.meshgrid().

In [11]:
# # size of the grid to interpolate
nx, ny = 50, 50

# generate two arrays of evenly space data between ends of previous arrays
xi = np.linspace(xmin, xmax, nx)
yi = np.linspace(ymin, ymax, ny)

# generate grid
xi, yi = np.meshgrid(xi, yi)

# colapse grid into 1D
xi, yi = xi.flatten(), yi.flatten()

Sanity test: Does the generated grid overlay the data?

In [12]:
fig, ax = plt.subplots(figsize=(12, 6))
boundary_east_states.plot(ax=ax, color="none")
data_east_2021.plot(
    ax=ax,
    markersize=3.5,
    color="red",
)
ax.scatter(xi, yi, s=0.1)

plt.show()

Step 2: Define the known data arrays for interpolation. Here: known points somewhere within the grid and the corresponding temperature values.

In [13]:
x = data_east_2021.geometry.x
y = data_east_2021.geometry.y
z = data_east_2021.Temperature

Step 3: Calculate the IDW with the help of our custom functions.

In [14]:
# Calculate IDW
temp_grid = simple_idw(x, y, z, xi, yi, beta=2)
temp_grid = temp_grid.reshape((nx, ny))

Step 4: Finally we plot our grid, the administrative borders of the federal states within East Germany and the DWD weather stations.

In [15]:
fig, ax = plt.subplots(figsize=(12, 6))

grid = ax.imshow(
    temp_grid,
    extent=(xmin, xmax, ymin, ymax),
    cmap="rainbow",
    interpolation="gaussian",
    vmin=275,
    vmax=285,
)
# plt.scatter(x,y,c=z, cmap='rainbow', edgecolors='black')
ax.scatter(x, y, color="black", s=2)
plt.title("Simple IDW on Temperature [°C] (2021) \n")
boundary_east_states.plot(ax=ax, color="none")

fig.colorbar(grid, ax=ax, label="Temperature [°C]")
plt.show()

Converting NumPy array to raster

Our IDW is stored in a grid format in an NumPy array. There is the handy option to convert this array into a raster format with the help of the rasterio package. A This will allow is to perform raster calculations and modifications.

A raster is a georeferenced grid. Therefore we need to feed quite a lot of information into the function.

  • driver: file format

  • height: Number of rows

  • width: Number of columns

  • count: Number of bands

  • nodata: The value which represents “No Data”

  • dtype: The raster data type

  • crs: coordinate reference system

  • transform: the origin and resolution of the grid

As we assign a spatial reference system to our grid, we have to define crs and transform. Therefore we may simply provide the EPSG identifier to crs.

The rasterio documentation is very helpful if you want to learn more about the function.

We already have all the necessary information. We want to store our raster as a GeoTIFF file (GTiff) in writing mode (w).

In [16]:
import rasterio

We need to define the transformation of our grid as a georeferenced raster. The function rasterio.transform.from_origin(xmin, ymax, x_step, y_step) takes the coordinates of the upper left corner west, north (xmin, ymax) and pixel sizes (x_step, y_step) in x/y direction.

In [17]:
x_step = xi[1] - xi[0]
y_step = yi[51] - yi[0]

transform = rasterio.transform.from_origin(xmin, ymax, x_step, y_step)
In [18]:
East_Germay_Temp_IDW = rasterio.open(
    "../East_Germay_Temp_IDW.tif",
    "w",
    driver="GTiff",
    height=temp_grid.shape[0],
    width=temp_grid.shape[1],
    count=1,
    dtype=temp_grid.dtype,
    crs=3035,
    nodata=np.nan,
    transform=transform,
)

Once the file connection is established, one last thing is left to do: we have to use the .write method to actually write the data into the file. We specify the data to write (temp_grid) and the band index where the data will be written (Note: the band index starts from 1). Here, we only specified one band. Finally, we have to .close the file to make sure that writing is completed.

In [19]:
East_Germay_Temp_IDW.write(temp_grid, 1)
East_Germay_Temp_IDW.close()

Let's see if everything worked out. We open the file and plot it again combined with boundary_east_states.

In [20]:
## open file
temp_grid_IDW = rasterio.open("../East_Germay_Temp_IDW.tif")
print(temp_grid_IDW.meta)
{'driver': 'GTiff', 'dtype': 'float64', 'nodata': nan, 'width': 50, 'height': 50, 'count': 1, 'crs': CRS.from_epsg(3035), 'transform': Affine(7356.962752158754, 0.0, 4312023.741846806,
       0.0, -10285.788742817938, 3513990.204065572)}
In [21]:
# plot with rasterio.plot, which provides matplotlib functionality
from rasterio.plot import show
In [22]:
fig, ax = plt.subplots(figsize=(5, 5))

# use imshow so that we have something to map the colorbar to
colorbar_dummy = ax.imshow(
    temp_grid,
    extent=(xmin, xmax, ymin, ymax),
    cmap="rainbow",
    interpolation="gaussian",
    vmin=275,
    vmax=285,
)

plot = show(
    temp_grid_IDW,
    ax=ax,
    cmap="rainbow",
    title="East Germany and \n Temperature Interpolation Grid 2021 \n",
    vmin=275,
    vmax=285,
)
boundary_east_states.plot(ax=ax, facecolor="none", edgecolor="grey")

# add colorbar using the now hidden image
fig.colorbar(colorbar_dummy, ax=ax, label="Temperature [°C]")

plt.show()

Perfect! Sanity test passed.

An useful function, that we already know from the previous chapters, is the mask() function from from rasterio.mask, which creates a new raster object that has the same values as the input raster object, except for the masked cells, which are set to 0 by default. We apply the mask() function to mask out the area outside the borders of East Germany. We already loaded the boundary shapefile in the very beginning of this script (boundary_east).

The mask() function requires the coordinates of the GeoDataFrame to be in a .json format. We will achieve that through a custom function.

In [23]:
def convert_2_json(geodataframe):
    """
    Convert to GeoDataFrame features to rasterio friendly format
    Input is a GeoDataFrame
    """
    import json

    return [json.loads(geodataframe.to_json())["features"][0]["geometry"]]
In [24]:
coords = convert_2_json(boundary_east)
In [25]:
from rasterio.mask import mask
In [26]:
temp_grid_IDW_masked, temp_grid_IDW_transform = mask(
    dataset=temp_grid_IDW, shapes=coords, crop=True
)

We plot the clipped interpolation raster.

In [27]:
fig, ax = plt.subplots(figsize=(5, 5))

# use imshow so that we have something to map the colorbar to
colorbar_dummy = ax.imshow(
    temp_grid,
    extent=(xmin, xmax, ymin, ymax),
    cmap="rainbow",
    interpolation="gaussian",
    vmin=275,
    vmax=285,
)
plot = show(
    temp_grid_IDW_masked,
    ax=ax,
    cmap="rainbow",
    title="East Germany and \n Temperature Interpolation Grid 2021 \n",
    vmin=275,
    vmax=285,
)
# add colorbar using the now hidden image
fig.colorbar(colorbar_dummy, ax=ax, label="Temperature [°C]")

plt.show()

Interpolation of rainfall data using IDW¶

In the previous section we learned all necessary ingredients to apply the IDW approach. Let us advance and compute the IDW interpolation for the mean annual rainfall in East Germany, using the data_east data set again.

In [28]:
##### interpolation
# Define the known data arrays for interpolation
x = data_east_2021.geometry.x
y = data_east_2021.geometry.y
z = data_east_2021.Rainfall

# Calculate IDW
rain_grid = simple_idw(x, y, z, xi, yi, beta=2)
rain_grid = rain_grid.reshape((nx, ny))
In [29]:
### Convert NumPy Array to Raster
East_Germay_Rain_IDW = rasterio.open(
    "../East_Germay_Rain_IDW.tif",
    "w",
    driver="GTiff",
    height=rain_grid.shape[0],
    width=rain_grid.shape[1],
    count=1,
    dtype=rain_grid.dtype,
    crs=3035,
    nodata=np.nan,
    transform=transform,
)

East_Germay_Rain_IDW.write(rain_grid, 1)
East_Germay_Rain_IDW.close()
In [30]:
## open file
rain_grid_IDW = rasterio.open("../East_Germay_Rain_IDW.tif")
print(rain_grid_IDW.meta)
{'driver': 'GTiff', 'dtype': 'float64', 'nodata': nan, 'width': 50, 'height': 50, 'count': 1, 'crs': CRS.from_epsg(3035), 'transform': Affine(7356.962752158754, 0.0, 4312023.741846806,
       0.0, -10285.788742817938, 3513990.204065572)}
In [31]:
fig, ax = plt.subplots(figsize=(5, 5))

# use imshow so that we have something to map the colorbar to
colorbar_dummy = ax.imshow(
    rain_grid,
    extent=(xmin, xmax, ymin, ymax),
    cmap="rainbow",
    interpolation="gaussian",
    vmin=110,
    vmax=1280,
)

plot = show(
    rain_grid_IDW,
    ax=ax,
    cmap="rainbow",
    title="East Germany and \n Rainfall [mm] Interpolation Grid 2021 \n",
    vmin=110,
    vmax=1280,
)
boundary_east_states.plot(ax=ax, facecolor="none", edgecolor="grey")

# add colorbar using the now hidden image
fig.colorbar(colorbar_dummy, ax=ax, label="rainfall [mm]")
ax.set_ylim(ymin, ymax)
ax.set_xlim(xmin, xmax)
plt.show()
In [32]:
##### masking

## apply
coords = convert_2_json(boundary_east)
rain_grid_IDW_masked, rain_grid_IDW_transform = mask(
    dataset=rain_grid_IDW, shapes=coords, crop=True
)
In [33]:
## plot the interpolation raster
fig, ax = plt.subplots(figsize=(5, 5))

# use imshow so that we have something to map the colorbar to
colorbar_dummy = ax.imshow(
    rain_grid,
    extent=(xmin, xmax, ymin, ymax),
    cmap="rainbow",
    interpolation="gaussian",
    vmin=110,
    vmax=1280,
)
plot = show(
    rain_grid_IDW_masked,
    ax=ax,
    cmap="rainbow",
    title="East Germany and \n Rainfall [mm] Interpolation Grid 2021 \n",
    vmin=110,
    vmax=1280,
)
# add colorbar using the now hidden image
fig.colorbar(colorbar_dummy, ax=ax, label="rainfall [mm]")

plt.show()

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.