In the previous section we downloaded a preprocessed data set of weather station data for Germany. In this section, however, we download and summarize the weather data provided by Deutscher Wetterdienst (DWD, the German Weather Service) ourselves.

The goal is to obtain a data set of mean annual rainfall and mean annual temperature for the 30-year period 1981–2010 for weather stations located in East Germany.

The data is available on the DWD data portal. Here we make use of the rdwd package, a package that simplifies downloading of data from the DWD.

First, we load the required packages, both of which are available on CRAN.

library(rdwd)
library(tidyverse)

The rdwd package ships with useful helpers such as the metaIndex data frame, which gives an overview of the available data sets and their structure.

data(metaIndex)
sample_n(metaIndex, 10)
##       Stations_id von_datum bis_datum Stationshoehe geoBreite geoLaenge
## 1746          158  20040701  20181124         210.0   50.7840    8.9545
## 61321        5335  20060614  20181125         348.0   50.8963   10.5484
## 58677        5099  18910101  20181125         131.5   49.7326    6.6131
## 34632        3015  19921013  20181125          98.0   52.2085   14.1180
## 7735          645  19470101  19520131          75.0   52.4083    7.9682
## 50292        4330  19520101  19800331         449.0   48.8026    8.9557
## 28784        2517  19860101  19920630          62.0   51.8833   13.7167
## 7368          613  20041101  20181124         206.0   51.5677    9.2324
## 12229        1016  19410101  20180930          25.0   54.7949    9.6429
## 1105          105  19000101  20181125         195.0   50.4844    6.9712
##                 Stationsname          Bundesland        res
## 1746   Amoeneburg-Ruedigheim              Hessen     hourly
## 61321          Waltershausen          Thueringen 10_minutes
## 58677            Trier-Zewen     Rheinland-Pfalz     annual
## 34632             Lindenberg         Brandenburg 10_minutes
## 7735                Bramsche       Niedersachsen      daily
## 50292              Rutesheim  Baden-Wuerttemberg     annual
## 28784 Karche-Zaacko-Schollen         Brandenburg     annual
## 7368            Borgentreich Nordrhein-Westfalen     hourly
## 12229             Langballig  Schleswig-Holstein      daily
## 1105               Ahrbrueck     Rheinland-Pfalz     annual
##                       var        per hasfile
## 1746           cloud_type     recent   FALSE
## 61321     air_temperature historical    TRUE
## 58677         more_precip     recent    TRUE
## 34632 extreme_temperature     recent    TRUE
## 7735                   kl     recent   FALSE
## 50292                  kl     recent   FALSE
## 28784         more_precip historical    TRUE
## 7368        precipitation historical    TRUE
## 12229         more_precip     recent    TRUE
## 1105          more_precip historical    TRUE

At the time of writing, this data frame lists 76,772 data sets. Here, we focus only on weather stations located in East Germany. Furthermore, we restrict our search to weather stations that provide data on a monthly basis, and more specifically to historical records of the kl variable set for which a file is available.

states <- c('Sachsen', 'Sachsen-Anhalt', 'Mecklenburg-Vorpommern',
            'Brandenburg', 'Thueringen', 'Berlin')

res <- "monthly"
var <- "kl"
per <- "historical"
df.station <- metaIndex[metaIndex$Bundesland %in% states &
                       metaIndex$res==res &
                       metaIndex$per==per &
                       metaIndex$var==var &
                       metaIndex$hasfile==TRUE,
                       ]
# Exclusion of station Kaltennordheim due to a corrupt file
df.station <- df.station[df.station$Stationsname != 'Kaltennordheim',]

# number of weather stations
nrow(df.station)
## [1] 262
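
Before moving on, it can help to see how the selected stations are distributed across the federal states. The following is a minimal sketch, not part of the original workflow: toy.station is a small stand-in for df.station with only a few illustrative rows, so the counts are not the real ones. With the real data, the same table() call would be applied to df.station$Bundesland.

```r
# Toy stand-in for df.station (assumption: only four illustrative rows)
toy.station <- data.frame(
  Stations_id  = c(433, 3015, 2517, 5335),
  Stationsname = c("Berlin-Tempelhof", "Lindenberg",
                   "Karche-Zaacko-Schollen", "Waltershausen"),
  Bundesland   = c("Berlin", "Brandenburg", "Brandenburg", "Thueringen"),
  stringsAsFactors = FALSE
)

# number of stations per federal state, largest count first
station.count <- sort(table(toy.station$Bundesland), decreasing = TRUE)
station.count
```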

Based on these constraints, we end up with a data set of 262 weather stations. DWD provides the data as monthly values, but we are only interested in the mean annual rainfall and the mean annual temperature for the period 1981–2010, so some preprocessing is required. To build intuition, we first showcase the processing pipeline for one example weather station. Later, we automate this pipeline and apply it to all 262 selected weather stations.


Download example data set

Here we focus on the weather station Berlin-Tempelhof. Detailed information on the structure of the DWD data sets is given in the DWD documentation, and further information on the rdwd package can be found in the package documentation.

The general procedure is outlined in five steps:

  1. Select the weather station of interest based on the station identifier (id), on the temporal resolution of the record (res), on the variable type (var) and on the observation period (per).

  2. Download the data using the functionality of the rdwd package.

  3. Check the data record for completeness.

  4. Extract the statistic of interest, which in our case is the mean annual temperature and mean annual rainfall for the period 1981–2010.

  5. Repeat steps 1 to 4 for all stations of interest.
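
To make steps 3 and 4 more concrete, the following sketch computes the two statistics from a monthly record. It is an illustration under stated assumptions, not the definitive implementation: toy.clim is made-up data that only mimics the DWD column names, annual_means() is a hypothetical helper, MESS_DATUM_BEGINN is assumed to be numeric in the form YYYYMMDD, and MO_TT and MO_RR are assumed to hold the monthly mean temperature and the monthly rainfall total.

```r
# Made-up monthly record using the DWD column names from this section
set.seed(1)
months <- seq(as.Date("1979-01-01"), as.Date("2012-12-01"), by = "month")
toy.clim <- data.frame(
  MESS_DATUM_BEGINN = as.integer(format(months, "%Y%m%d")),
  MO_TT = rnorm(length(months), mean = 9.5, sd = 7),  # monthly mean temperature
  MO_RR = runif(length(months), min = 0, max = 120)   # monthly rainfall total
)

# Hypothetical helper: long-term means for a reference period
annual_means <- function(clim, from = 1981, to = 2010) {
  year <- clim$MESS_DATUM_BEGINN %/% 10000            # 19810101 -> 1981
  keep <- year >= from & year <= to
  sel <- clim[keep, ]
  sel$year <- year[keep]
  # step 3: a complete record has 12 non-missing months for every year
  complete <- nrow(sel) == 12 * (to - from + 1) &&
    !anyNA(sel$MO_TT) && !anyNA(sel$MO_RR)
  if (!complete) {
    return(c(temperature = NA_real_, rainfall = NA_real_))
  }
  # step 4: average the annual means (temperature) and annual totals (rainfall)
  c(temperature = mean(tapply(sel$MO_TT, sel$year, mean)),
    rainfall    = mean(tapply(sel$MO_RR, sel$year, sum)))
}

annual_means(toy.clim)
```

Returning NA for incomplete records is one possible design choice; it keeps stations with gaps from silently contributing biased averages when the helper is later applied to many stations.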


Step 1

We subset the weather station of interest from the catalog of DWD stations.

sample.station <- df.station[df.station$Stationsname=='Berlin-Tempelhof',]
sample.station
##      Stations_id von_datum bis_datum Stationshoehe geoBreite geoLaenge
## 5357         433  19380101  20181125            48   52.4675   13.4021
##          Stationsname Bundesland     res var        per hasfile
## 5357 Berlin-Tempelhof     Berlin monthly  kl historical    TRUE

Step 2

We download the data using the functionality of the rdwd package. First, we generate the link pointing to the file in the online database by applying the selectDWD() function.

link <- selectDWD(id = sample.station$Stations_id,
                  res = sample.station$res,
                  var = sample.station$var,
                  per = sample.station$per)
link
## [1] "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/monthly/kl/historical/monatswerte_KL_00433_19380101_20171231_hist.zip"
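
The file name in the link encodes the station identifier and the period covered by the archive. As a small sketch, the pieces can be extracted with base R. The assumed layout (prefix, variable, station id, start date, end date, period) is inferred from this one file name only, and example.link and link.info are names introduced here for illustration.

```r
# the link returned above, copied as a plain string
example.link <- "ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/monthly/kl/historical/monatswerte_KL_00433_19380101_20171231_hist.zip"

# file name without directory and extension
fname <- sub("\\.zip$", "", basename(example.link))
parts <- strsplit(fname, "_", fixed = TRUE)[[1]]

# assumed layout: prefix, variable, station id, start date, end date, period
link.info <- list(
  id   = as.integer(parts[3]),
  from = as.Date(parts[4], format = "%Y%m%d"),
  to   = as.Date(parts[5], format = "%Y%m%d")
)
link.info
```

A quick check of this kind shows whether the archive covers the period of interest before anything is downloaded.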

Then we download the file by applying the dataDWD() function. Once the file is downloaded, we apply the readDWD() function, which reads the file, processes it and returns it as a data.frame object.

tdir <- tempdir()
file <- dataDWD(link, read = FALSE, dir = tdir, quiet = TRUE, force = FALSE)
clim <- readDWD(file)

Step 3

Now we check the data set for completeness by using standard R functions, such as summary(). A description of the variables is provided in the DWD data set documentation.

summary(clim)
##   STATIONS_ID  MESS_DATUM_BEGINN  MESS_DATUM_ENDE         QN_4       
##  Min.   :433   Min.   :19380101   Min.   :19380131   Min.   : 3.000  
##  1st Qu.:433   1st Qu.:19577876   1st Qu.:19577906   1st Qu.: 5.000  
##  Median :433   Median :19775651   Median :19775681   Median : 5.000  
##  Mean   :433   Mean   :19775651   Mean   :19775680   Mean   : 6.633  
##  3rd Qu.:433   3rd Qu.:19973426   3rd Qu.:19973456   3rd Qu.:10.000  
##  Max.   :433   Max.   :20171201   Max.   :20171231   Max.   :10.000  
##                                                      NA's   :28      
##       MO_N           MO_TT            MO_TX            MO_TN         
##  Min.   :2.640   Min.   :-9.400   Min.   :-6.000   Min.   :-12.7000  
##  1st Qu.:4.593   1st Qu.: 3.605   1st Qu.: 6.133   1st Qu.:  0.8325  
##  Median :5.255   Median : 9.490   Median :13.670   Median :  5.5400  
##  Mean   :5.257   Mean   : 9.685   Mean   :13.425   Mean   :  5.8589  
##  3rd Qu.:5.970   3rd Qu.:16.203   3rd Qu.:20.802   3rd Qu.: 11.5850  
##  Max.   :7.440   Max.   :24.380   Max.   :30.300   Max.   : 17.8700  
##  NA's   :230     NA's   :28       NA's   :36       NA's   :28        
##      MO_FK           MX_TX           MX_FX           MX_TN         
##  Min.   :1.630   Min.   : 1.60   Min.   :11.30   Min.   :-22.5000  
##  1st Qu.:2.490   1st Qu.:13.50   1st Qu.:17.00   1st Qu.: -5.6000  
##  Median :2.670   Median :22.05   Median :19.10   Median : -0.4000  
##  Mean   :2.706   Mean   :21.39   Mean   :19.95   Mean   : -0.3687  
##  3rd Qu.:2.880   3rd Qu.:29.30   3rd Qu.:22.70   3rd Qu.:  6.5000  
##  Max.   :4.060   Max.   :38.10   Max.   :40.20   Max.   : 13.8000  
##  NA's   :134     NA's   :120     NA's   :396     NA's   :120       
##     MO_SD_S            QN_6            MO_RR            MX_RS        
##  Min.   :  8.00   Min.   : 3.000   Min.   :  0.60   Min.   :  0.400  
##  1st Qu.: 64.65   1st Qu.: 5.000   1st Qu.: 27.38   1st Qu.:  7.275  
##  Median :133.00   Median : 5.000   Median : 43.05   Median : 11.200  
##  Mean   :140.38   Mean   : 6.631   Mean   : 48.26   Mean   : 13.980  
##  3rd Qu.:206.62   3rd Qu.: 9.000   3rd Qu.: 62.80   3rd Qu.: 17.325  
##  Max.   :377.60   Max.   :10.000   Max.   :193.40   Max.   :119.500  
##  NA's   :396                                        NA's   :120      
##   eor     
##  eor:960  
##           
##           
##           
##           
##           
## 
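
Note that summary() counts missing values over the whole record, whereas only the years 1981–2010 matter for our purpose. The sketch below counts missing values inside the reference period only. It uses a made-up record, toy.gaps, that mimics the relevant columns of clim, and it assumes that MESS_DATUM_BEGINN is numeric in the form YYYYMMDD, as the summary above suggests.

```r
# Made-up record standing in for clim (values are illustrative only)
toy.gaps <- data.frame(
  MESS_DATUM_BEGINN = c(19801201, 19810101, 19810201, 20101201, 20110101),
  MO_TT = c(1.2, NA, 0.5, -4.1, 1.0),
  MO_RR = c(40.1, 35.0, NA, 60.2, NA)
)

# keep only the months that fall into the reference period
year <- toy.gaps$MESS_DATUM_BEGINN %/% 10000
in.period <- year >= 1981 & year <= 2010

# number of missing values per variable within 1981-2010
na.count <- colSums(is.na(toy.gaps[in.period, c("MO_TT", "MO_RR")]))
na.count
```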

Further, we make use of the xts and lubridate packages for time series analysis and plot the monthly rainfall data.

library(xts)
library(lubridate)
# create time series object of monthly rainfall
rainfall.sample <- xts(x = clim$MO_RR, 
                       order.by = as.yearmon(ymd(clim$MESS_DATUM_BEGINN)))
# plot the data
barplot(rainfall.sample, 
        ylab = "Monthly rainfall", 
        main = paste0('DWD Weather Station ', 
                      sample.station$Stationsname, 
                      ', ', 
                      sample.station$Bundesland))