In the following sections we work with weather data provided by the Deutscher Wetterdienst (DWD), the German Weather Service. We provide a preprocessed data set of DWD weather stations located across Germany. The data was downloaded from the DWD data portal on April 21, 2017. You may find a detailed description of the data set here. Please note that for the purpose of this tutorial the data set was preprocessed and columns have been renamed.

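You may download the DWD.csv file here. We load the data set with the read_csv() function from the pandas library and assign it to the variable dwd. Because the file uses semicolons as field separators and commas as decimal marks, we pass sep=";" and decimal="," to the function call.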

In [1]:
# First, let's import the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
In [2]:
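import io

import requests

# Download the CSV file; fail early if the request was not successful.
url = "http://www.userpage.fu-berlin.de/soga/300/30100_data_sets/DWD.csv"
response = requests.get(url)
response.raise_for_status()

# The file uses ";" as field separator and "," as decimal mark.
dwd = pd.read_csv(io.StringIO(response.text), sep=";", decimal=",")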
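
The sep and decimal arguments are what make the numeric columns usable. The following minimal sketch illustrates their effect on a made-up two-row sample; the sample string and the names parsed and unparsed are illustrative assumptions and are not taken from DWD.csv.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample in the same layout as DWD.csv: semicolon-separated
# fields and a decimal comma. The values are made up for illustration.
sample = "STATION NAME;LAT;ALTITUDE\nA;47,8413;478\nB;50,7827;202\n"

# With the matching separator and decimal mark, LAT is parsed as float.
parsed = pd.read_csv(StringIO(sample), sep=";", decimal=",")

# Without decimal=",", values such as "47,8413" are not parsed as numbers.
unparsed = pd.read_csv(StringIO(sample), sep=";")

print(parsed.dtypes)
print(unparsed.dtypes)
```

If a numeric column unexpectedly shows a non-numeric dtype after loading, a mismatched decimal mark is a likely cause.
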
In [3]:
dwd.columns
Out[3]:
Index(['id', 'DWD_ID', 'STATION NAME', 'FEDERAL STATE', 'LAT', 'LON',
       'ALTITUDE', 'PERIOD', 'RECORD LENGTH', 'MEAN ANNUAL AIR TEMP',
       'MEAN MONTHLY MAX TEMP', 'MEAN MONTHLY MIN TEMP',
       'MEAN ANNUAL WIND SPEED', 'MEAN CLOUD COVER', 'MEAN ANNUAL SUNSHINE',
       'MEAN ANNUAL RAINFALL', 'MAX MONTHLY WIND SPEED', 'MAX AIR TEMP',
       'MAX WIND SPEED', 'MAX RAINFALL', 'MIN AIR TEMP',
       'MEAN RANGE AIR TEMP'],
      dtype='object')

The dwd data set consists of 599 rows, each representing a particular weather station in Germany, and 22 columns, each corresponding to a variable (feature) recorded for that station. The variable names, listed in the output above, are largely self-explanatory.

For the purpose of this tutorial we are only interested in the variables MEAN ANNUAL RAINFALL and ALTITUDE, together with the station coordinates LAT and LON, which we need to build the spatial object. Hence, we subset our data set to these four variables. Further, we exclude all rows that contain missing values.

In [4]:
dwd = dwd[["LAT", "LON", "MEAN ANNUAL RAINFALL", "ALTITUDE"]].dropna()
dwd.shape[0]
Out[4]:
586
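
Called without arguments, dropna() removes every row that has a missing value in at least one of the selected columns, which is why 13 of the 599 stations disappear here. The following minimal sketch shows this behaviour on a toy table; the values and the names toy, missing_per_column and cleaned are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the DWD table; the values are made up for illustration.
toy = pd.DataFrame(
    {
        "LAT": [47.8, 50.8, 52.9, 48.2],
        "LON": [8.8, 6.1, 8.2, 9.0],
        "MEAN ANNUAL RAINFALL": [755.0, np.nan, 759.0, 919.0],
        "ALTITUDE": [478.0, 202.0, np.nan, 759.0],
    }
)

# Number of missing values in each column before cleaning.
missing_per_column = toy.isna().sum()

# dropna() drops every row that has a NaN in any of the columns.
cleaned = toy.dropna()

print(missing_per_column)
print(f"{len(toy) - len(cleaned)} of {len(toy)} rows dropped")
```

Counting the missing values per column first makes it easy to see which variable is responsible for the dropped rows.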

After cleaning up, 586 observations are left in our data set. In the next step we create a GeoDataFrame from the data set. Note that we pass the additional argument crs=4326 to the function call, as the coordinates in the data set are given as geographic coordinates in decimal degrees (WGS 84). Thereafter, we transform the GeoDataFrame into the ETRS89/LAEA coordinate reference system (European Terrestrial Reference System 1989 with a Lambert Azimuthal Equal-Area projection), identified by the EPSG code $3035$.

In [5]:
# Create a GeoDataFrame with point geometries built from LON (x) and LAT (y)
dwd_geo = gpd.GeoDataFrame(dwd, geometry=gpd.points_from_xy(dwd.LON, dwd.LAT), crs=4326)
dwd_geo = dwd_geo.to_crs("epsg:3035")

dwd_geo.head()
Out[5]:
       LAT      LON  MEAN ANNUAL RAINFALL  ALTITUDE                         geometry
0  47.8413   8.8493                 755.0     478.0  POINT (4234819.614 2748192.185)
1  50.7827   6.0941                 820.0     202.0  POINT (4045677.710 3081917.634)
2  52.9335   8.2370                 759.0      44.0  POINT (4202462.727 3315312.398)
3  48.2156   8.9784                 919.0     759.0  POINT (4245047.748 2789643.853)
4  48.6159  13.0506                 790.0     340.0  POINT (4545939.250 2838265.721)
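
As a quick sanity check on the reprojection, a single station can be transformed directly with pyproj, the library GeoPandas relies on for coordinate transformations. This is a minimal sketch, assuming pyproj is installed; the names transformer, easting and northing are illustrative.

```python
from pyproj import Transformer

# always_xy=True fixes the axis order to (longitude, latitude) on input
# and (easting, northing) on output, regardless of the CRS definition.
transformer = Transformer.from_crs("EPSG:4326", "EPSG:3035", always_xy=True)

# Coordinates of the first station in the table above (LON, LAT).
lon, lat = 8.8493, 47.8413
easting, northing = transformer.transform(lon, lat)

print(f"easting = {easting:.3f} m, northing = {northing:.3f} m")
```

The printed values should closely match the geometry column of row 0 in the output above. Note that the projected coordinates are in metres, not in degrees.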

Before we continue, we should remind ourselves that the data set has a spatial component: our observations are point measurements of rainfall spread across Germany. Let us plot a simple map to visualize the spatial distribution of the observations. For this we rely on the GeoPandas package. The boundaries of the German federal states come from shapefiles provided by the German Federal Agency for Cartography and Geodesy (under this licence). You may directly download the shapefiles here. For the purpose of this tutorial, we also provide the data here.

In [6]:
# Retrieve Federal States

import zipfile

url = "https://daten.gdz.bkg.bund.de/produkte/vg/vg5000_0101/aktuell/vg5000_01-01.utm32s.shape.ebenen.zip"

r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall(path="../data")
G1 = gpd.read_file(
    "../data/vg5000_01-01.utm32s.shape.ebenen/vg5000_ebenen_0101/VG5000_LAN.shp"
)

G1 = G1.to_crs("epsg:3035")
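
The path passed to gpd.read_file() depends on the folder layout inside the downloaded archive, which may change between releases. The following minimal sketch shows how an archive held in memory can be inspected with namelist() before extraction. It uses a small dummy archive in place of r.content so that it runs without a network connection; all member names are hypothetical.

```python
import io
import tempfile
import zipfile
from pathlib import Path

# Build a small dummy archive in memory; it stands in for r.content.
# The member names are hypothetical and only mimic a shapefile layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("layers/states.shp", b"dummy")
    archive.writestr("layers/states.dbf", b"dummy")
    archive.writestr("readme.txt", b"dummy")
content = buffer.getvalue()

# Open the bytes as a ZIP archive without writing them to disk first.
z = zipfile.ZipFile(io.BytesIO(content))

# List the shapefiles contained in the archive before extracting.
shp_members = [name for name in z.namelist() if name.endswith(".shp")]
print(shp_members)

# Extract into a temporary directory and build the path to the first .shp.
with tempfile.TemporaryDirectory() as tmp:
    z.extractall(path=tmp)
    shp_path = Path(tmp) / shp_members[0]
    shp_exists = shp_path.exists()

print(shp_exists)
```

Applied to the real download, the same list comprehension over z.namelist() reveals where the VG5000_LAN.shp file is located inside the archive.
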
In [8]:
# plot the map
fig, ax = plt.subplots(1, 1, figsize=(8, 5))

ax.ticklabel_format(useOffset=False)

G1.plot(ax=ax, color="#fff7bc", edgecolor="black", linewidth=0.7)
dwd_geo.plot(ax=ax, facecolor="none", edgecolor="darkgrey", markersize=6)
# Both layers are in EPSG:3035, so the axes show projected coordinates in metres.
ax.set_xlabel("Easting (m)")
ax.set_ylabel("Northing (m)")

plt.show()

For later usage we store the spatial object dwd_geo on disk.

In [9]:
dwd_geo.to_file("../data/dwd_geo.geojson", driver="GeoJSON")

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.