In the previous section we downloaded a preprocessed data set of weather station data for Germany. In this section, we download and summarize weather data provided by the Deutscher Wetterdienst (DWD, German Weather Service) on our own.

The goal is to obtain a data set of mean annual rainfall and mean annual temperature for the period 1981–2021 for weather stations located in East Germany.

In [2]:
# First, let's import the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd

from datetime import datetime

The data is available on the DWD data portal. Here we make use of the wetterdienst package, which simplifies downloading data from the DWD. wetterdienst is the successor of the dwdweather2 library and can be installed either from PyPI or directly from its GitHub repository:

pip install wetterdienst

or via GitHub:

pip install git+https://github.com/earthobservations/wetterdienst

In [44]:
# pip install wetterdienst
In [4]:
from wetterdienst import Wetterdienst, Resolution, Period

from wetterdienst.provider.dwd.observation import (
    DwdObservationRequest,
    DwdObservationDataset,
    DwdObservationPeriod,
    DwdObservationResolution,
)
WARNING:pint.util:Redefining 'siemens' (<class 'pint.definitions.UnitDefinition'>)
WARNING:pint.util:Redefining 'S' (<class 'pint.definitions.UnitDefinition'>)

The Python API available within the wetterdienst package offers access to different data products. Available APIs can be accessed through the top-level API Wetterdienst. Here we focus on the DWD APIs, accessible through the following command.

In [5]:
API = Wetterdienst(provider="dwd", network="observation")

Download example data set¶

Here we focus on the weather station Berlin-Tempelhof. Detailed information on the structure of the DWD data sets can be found here, and further information on the wetterdienst package can be found here.

The general procedure is outlined in five steps:

  1. Select the weather station of interest based on the station identifier (station_id or name), the temporal resolution of the record (resolution), the variable type (parameter) and the observation period (period or start_date/end_date).

  2. Download the meta data with the wetterdienst function DwdObservationRequest().

  3. Access the desired data through values.all() and write it to a data frame using values.all().df. Check the data record for completeness and drop NaN values (dropna()).

  4. Extract the statistic of interest, which in our case is the mean annual temperature and mean annual rainfall for the period 1981–2021.

  5. Repeat step 1 to 4 for all stations of interest.


Steps 1 and 2

Download the meta data with DwdObservationRequest(). We subset the weather station of interest from the catalog of DWD stations using the filter_by_name() function.

We access the data by using request arguments. Typical requests are defined by the following arguments:

  • parameter
  • resolution
  • period or start_date/ end_date

Here, we focus only on the weather parameters precipitation and mean air temperature. We further restrict our search to these parameters.

In [6]:
dwd_tempelhof = DwdObservationRequest(
    parameter=["temperature_air_mean_200", "precipitation_height"],
    resolution=DwdObservationResolution.MONTHLY,
    start_date=datetime(1981, 1, 1),
    end_date=datetime(
        2021, 12, 31
    ),  # or specify #period=DwdObservationPeriod.HISTORICAL to get all data
).filter_by_name(name="Berlin-Tempelhof")
In [7]:
## access the meta data
dwd_tempelhof.df
Out[7]:
station_id from_date to_date height latitude longitude name state
0 00433 1938-01-01 00:00:00+00:00 2023-05-31 00:00:00+00:00 48.0 52.4675 13.4021 Berlin-Tempelhof Berlin

Step 3

Access the desired climate data.

In [8]:
dwd_tempelhof_data = dwd_tempelhof.values.all().df  # .dropna()

dwd_tempelhof_data.head()
Out[8]:
station_id dataset parameter date value quality
0 00433 climate_summary temperature_air_mean_200 1981-01-01 00:00:00+00:00 272.67 10.0
1 00433 climate_summary temperature_air_mean_200 1981-02-01 00:00:00+00:00 274.11 10.0
2 00433 climate_summary temperature_air_mean_200 1981-03-01 00:00:00+00:00 280.29 10.0
3 00433 climate_summary temperature_air_mean_200 1981-04-01 00:00:00+00:00 281.66 10.0
4 00433 climate_summary temperature_air_mean_200 1981-05-01 00:00:00+00:00 288.67 10.0
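Note that the temperature values are around 273 rather than 0: wetterdienst reports observations in SI units, i.e. temperatures in Kelvin (at least in the version used here). If you prefer degrees Celsius, subtract 273.15. A minimal sketch using the first values from the output above:

```python
import pandas as pd

# First monthly temperature values from the output above, in Kelvin
kelvin = pd.Series([272.67, 274.11, 280.29])

# Convert to degrees Celsius
celsius = kelvin - 273.15
print(celsius.round(2).tolist())  # [-0.48, 0.96, 7.14]
```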

Next, we subset the temperature data and the precipitation data.

In [9]:
dwd_tempelhof_temp = dwd_tempelhof_data[
    dwd_tempelhof_data["parameter"] == "temperature_air_mean_200"
]
dwd_tempelhof_rain = dwd_tempelhof_data[
    dwd_tempelhof_data["parameter"] == "precipitation_height"
]
In [10]:
print(
    f"The temperature time series at Berlin-Tempelhof consists of {dwd_tempelhof_temp.shape[0]} monthly observations spanning a period from January 1981 to December 2021."
)
print(
    f"The rainfall time series at Berlin-Tempelhof consists of {dwd_tempelhof_rain.shape[0]} monthly observations spanning a period from January 1981 to December 2021."
)
The temperature time series at Berlin-Tempelhof consists of 492 monthly observations spanning a period from January 1981 to December 2021.
The rainfall time series at Berlin-Tempelhof consists of 492 monthly observations spanning a period from January 1981 to December 2021.
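A count of 492 is exactly what we expect for a complete monthly record from January 1981 to December 2021 (41 years × 12 months), which we can verify with pandas:

```python
import pandas as pd

# Number of month starts between January 1981 and December 2021 (inclusive)
expected = len(pd.date_range("1981-01-01", "2021-12-01", freq="MS"))
print(expected)  # 492
```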

Check the data record for completeness.

In [11]:
fig, ax = plt.subplots(2, 1, figsize=(10, 6))

ax[0].plot(dwd_tempelhof_temp["date"], dwd_tempelhof_temp["value"])
ax[0].set_title("Temperature at Berlin-Tempelhof")
ax[0].set_ylabel("Temperature [K]")
ax[1].bar(
    dwd_tempelhof_rain["date"], dwd_tempelhof_rain["value"], align="center", width=50
)
ax[1].set_title("Rainfall at Berlin-Tempelhof")
ax[1].set_ylabel("Rainfall [mm]")

ax[1].set_ylim(0, 200)

plt.tight_layout()
plt.show()

Both parameters seem to be fairly complete. We can check that by using isna().

In [12]:
print(f"Count of NA in temperature dataset ={sum(dwd_tempelhof_temp.value.isna())}")
print(f"Count of NA in rain dataset ={sum(dwd_tempelhof_rain.value.isna())}")
Count of NA in temperature dataset =0
Count of NA in rain dataset =2
In [13]:
## drop NA's
dwd_tempelhof_rain = dwd_tempelhof_rain.dropna()

Step 4

Next, we compute the annual rainfall and the mean annual temperature by applying functions from the pandas package, such as groupby() and agg() or just mean().

In [14]:
tempelhof_temp_mean = dwd_tempelhof_temp.groupby(dwd_tempelhof_temp["date"].dt.year)[
    "value"
].mean()
tempelhof_rain_mean = dwd_tempelhof_rain.groupby(dwd_tempelhof_rain["date"].dt.year)[
    "value"
].sum()
In [15]:
fig, ax = plt.subplots(2, 1, figsize=(10, 6))

ax[0].plot(tempelhof_temp_mean.index, tempelhof_temp_mean.values)
ax[0].set_title("Mean annual temperature at Berlin-Tempelhof")
ax[0].set_ylabel("Temperature [K]")
ax[1].bar(tempelhof_rain_mean.index, tempelhof_rain_mean.values)
ax[1].set_title("Annual rainfall at Berlin-Tempelhof")
ax[1].set_ylabel("Rainfall [mm]")

ax[1].set_ylim(0, 1000)

plt.tight_layout()
plt.show()

The graph shows the annual rainfall for the period 1981 to 2021. One final step is still pending: We need to take the average of the annual series to get the mean annual rainfall at the weather station Berlin-Tempelhof for the period 1981-2021.

In [16]:
print(
    f"Mean annual rainfall at station {dwd_tempelhof.df.name[0]}: {round(np.mean(tempelhof_rain_mean))} mm"
)
Mean annual rainfall at station Berlin-Tempelhof: 563 mm

Done! Now we are aware of the procedure to download data from the DWD and extract the statistic of interest. In the next section we will automate this procedure to download all weather stations of interest and extract the statistic of interest for each of them.


Download data set of interest¶

In this section we automate the procedure of downloading a data set from DWD and extracting a particular statistic of interest. Recall that the general procedure may be outlined in five steps:

  1. Select the weather station of interest.

  2. Download the meta data.

  3. Download the data record and check it for completeness.

  4. Extract the statistic of interest.

  5. Repeat step 1–4 for all weather stations of interest.

In the previous section we already defined a subset with one weather station of interest. The code below extracts data for multiple stations based on the federal states of Germany. Basically, we are interested in all weather stations in East Germany that provide weather data on a monthly basis.

Steps 1 and 2

We access the data by using request arguments. Here, we focus only on the weather parameters precipitation and mean air temperature. We further restrict our search to these parameters.

In [17]:
dwd_stations = DwdObservationRequest(
    parameter=["temperature_air_mean_200", "precipitation_height"],
    resolution=DwdObservationResolution.MONTHLY,
    start_date=datetime(1981, 1, 1),
    end_date=datetime(
        2021, 12, 31
    ),  # or specify #period=DwdObservationPeriod.HISTORICAL to get all data
).all()


dwd_stations_meta = dwd_stations.df
dwd_stations_meta
Out[17]:
station_id from_date to_date height latitude longitude name state
0 00001 1931-01-01 00:00:00+00:00 1986-06-30 00:00:00+00:00 478.0 47.8413 8.8493 Aach Baden-Württemberg
1 00003 1851-01-01 00:00:00+00:00 2011-03-31 00:00:00+00:00 202.0 50.7827 6.0941 Aachen Nordrhein-Westfalen
2 00044 1971-03-01 00:00:00+00:00 2023-05-31 00:00:00+00:00 44.0 52.9336 8.2370 Großenkneten Niedersachsen
3 00052 1973-01-01 00:00:00+00:00 2001-12-31 00:00:00+00:00 46.0 53.6623 10.1990 Ahrensburg-Wulfsdorf Schleswig-Holstein
4 00061 1975-07-01 00:00:00+00:00 1978-08-31 00:00:00+00:00 339.0 48.8443 12.6171 Aiterhofen Bayern
... ... ... ... ... ... ... ... ...
1141 19510 1950-01-01 00:00:00+00:00 1970-12-31 00:00:00+00:00 159.0 51.1000 12.3300 Lucka Thüringen
1142 19579 1971-01-01 00:00:00+00:00 2005-10-31 00:00:00+00:00 875.0 47.8194 8.3348 Bonndorf Baden-Württemberg
1143 19580 1975-01-01 00:00:00+00:00 2005-12-31 00:00:00+00:00 471.0 48.4951 9.4003 Urach, Bad/Erms Baden-Württemberg
1144 19581 1974-06-01 00:00:00+00:00 2005-03-31 00:00:00+00:00 600.0 48.6671 8.4700 Enzklösterle/Schwarzwald Baden-Württemberg
1145 19582 1945-01-01 00:00:00+00:00 2000-12-31 00:00:00+00:00 754.0 48.1419 8.4255 Königsfeld Baden-Württemberg

1146 rows × 8 columns

Now we have a nice overview of the available stations' meta data and of the data structure. The data provided by the DWD is monthly data; however, we are only interested in the mean annual rainfall and mean annual temperature for the period 1981–2021 for East Germany. Hence, we have to do some preprocessing: access all available data and filter by the parameters and states in question.

Step 3

We will download the available data as follows. This step may take a while.

In [18]:
dwd_stations_data = dwd_stations.values.all().df.dropna()

dwd_stations_data.head()
Out[18]:
station_id dataset parameter date value quality
0 00001 climate_summary temperature_air_mean_200 1981-01-01 00:00:00+00:00 269.63 10.0
1 00001 climate_summary temperature_air_mean_200 1981-02-01 00:00:00+00:00 271.37 10.0
2 00001 climate_summary temperature_air_mean_200 1981-03-01 00:00:00+00:00 280.05 10.0
3 00001 climate_summary temperature_air_mean_200 1981-04-01 00:00:00+00:00 282.50 10.0
4 00001 climate_summary temperature_air_mean_200 1981-05-01 00:00:00+00:00 285.25 10.0

As we can see, the station data is now encoded only by station_id. We therefore have to find all station_ids of stations in East Germany, which we do by filtering on the states of East Germany.

In [19]:
## states of East Germany
states = [
    "Sachsen",
    "Sachsen-Anhalt",
    "Mecklenburg-Vorpommern",
    "Brandenburg",
    "Thüringen",
    "Berlin",
]

## filter for East Germany meta data
dwd_stations_meta_east = dwd_stations_meta[dwd_stations_meta["state"].isin(states)]
dwd_stations_meta_east
Out[19]:
station_id from_date to_date height latitude longitude name state
12 00096 2019-05-01 00:00:00+00:00 2023-05-31 00:00:00+00:00 50.0 52.9437 12.8518 Neuruppin-Alt Ruppin Brandenburg
15 00116 1899-06-01 00:00:00+00:00 1973-05-31 00:00:00+00:00 213.0 50.9833 12.4333 Altenburg Thüringen
17 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern
18 00131 2004-11-01 00:00:00+00:00 2023-05-31 00:00:00+00:00 296.0 51.0881 12.9326 Geringswalde-Altgeringswalde Sachsen
25 00164 1908-06-01 00:00:00+00:00 2023-05-31 00:00:00+00:00 54.0 53.0316 13.9908 Angermünde Brandenburg
... ... ... ... ... ... ... ... ...
1137 19318 1946-11-01 00:00:00+00:00 1978-12-31 00:00:00+00:00 292.0 50.7180 10.4310 Schmalkalden Thüringen
1138 19364 1937-12-01 00:00:00+00:00 1977-12-31 00:00:00+00:00 720.0 50.6167 10.8167 Schmiedefeld/Rennsteig Thüringen
1139 19378 1958-01-01 00:00:00+00:00 1977-12-31 00:00:00+00:00 505.0 50.8333 10.5833 Finsterbergen Thüringen
1140 19433 1887-01-01 00:00:00+00:00 1961-12-31 00:00:00+00:00 316.0 50.8188 10.3443 Liebenstein, Bad Thüringen
1141 19510 1950-01-01 00:00:00+00:00 1970-12-31 00:00:00+00:00 159.0 51.1000 12.3300 Lucka Thüringen

279 rows × 8 columns

In [20]:
## filter the meta data for East Germany station_id's
station_id_east = set(dwd_stations_meta_east["station_id"])
len(station_id_east)
Out[20]:
279

There are 279 stations available in our East Germany meta data set. Now we are able to filter our downloaded data for the required station_ids.

In [21]:
# filter the downloaded data
dwd_east_data = dwd_stations_data[dwd_stations_data["station_id"].isin(station_id_east)]
In [22]:
len(set(dwd_east_data["station_id"]))
Out[22]:
207

Perfect! 207 stations in East Germany match our search criteria.

Step 4

Extract the statistic and data of interest. Additionally, we compute the annual rainfall and the mean annual temperature by applying functions from the pandas package, such as groupby() and agg().

In [23]:
# filter by parameter
east_data_temp = dwd_east_data[dwd_east_data["parameter"] == "temperature_air_mean_200"]
east_data_rain = dwd_east_data[dwd_east_data["parameter"] == "precipitation_height"]

# keep specific column
east_data_temp = east_data_temp[["station_id", "date", "value"]]
east_data_rain = east_data_rain[["station_id", "date", "value"]]

# rename "value" column
east_data_temp = east_data_temp.rename(columns={"value": "Temperature"})
east_data_rain = east_data_rain.rename(columns={"value": "Rainfall"})

## merge both dataframes
east_data = pd.merge(
    east_data_temp,
    east_data_rain[["station_id", "date", "Rainfall"]],
    how="outer",
    on=["station_id", "date"],
)
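The how="outer" argument keeps observations for which only one of the two parameters was recorded, filling the missing one with NaN. A minimal toy illustration (the frames and values are made up):

```python
import pandas as pd

left = pd.DataFrame(
    {"station_id": ["A", "A"], "date": [1, 2], "Temperature": [273.0, 274.0]}
)
right = pd.DataFrame({"station_id": ["A"], "date": [2], "Rainfall": [35.0]})

# Outer merge: the row for date 1 is kept with Rainfall = NaN
merged = pd.merge(left, right, how="outer", on=["station_id", "date"])
print(merged)
```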
In [24]:
east_data.head()
Out[24]:
station_id date Temperature Rainfall
0 00096 2019-05-01 00:00:00+00:00 285.50 15.9
1 00096 2019-06-01 00:00:00+00:00 294.81 47.8
2 00096 2019-07-01 00:00:00+00:00 292.19 68.6
3 00096 2019-08-01 00:00:00+00:00 293.08 18.3
4 00096 2019-09-01 00:00:00+00:00 287.53 72.5

We now have one data frame containing only rainfall and temperature data from East German weather stations. Looking at the date column, we realize that we have to aggregate the monthly data to annual data. We will achieve that conveniently with a for-loop.

In [25]:
annual_east_data = pd.DataFrame()

for i in set(east_data["station_id"]):
    ## aggregate on annual scale for each station
    station = east_data.loc[east_data["station_id"] == i]
    id_east_data = station.groupby(station["date"].dt.year).agg(
        {"station_id": "first", "Temperature": "mean", "Rainfall": "sum"}
    )
    id_east_data["date"] = id_east_data.index
    ## append them again to one data frame
    annual_east_data = pd.concat([annual_east_data, id_east_data], ignore_index=True)
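As a side note, the same aggregation can be expressed without an explicit loop by grouping on station and year simultaneously. A minimal sketch with toy data (the column names match the tutorial, the values are made up):

```python
import pandas as pd

# Toy monthly data for two hypothetical stations
east_data = pd.DataFrame(
    {
        "station_id": ["A"] * 4 + ["B"] * 2,
        "date": pd.to_datetime(
            ["2020-01-01", "2020-02-01", "2021-01-01", "2021-02-01",
             "2020-01-01", "2020-02-01"]
        ),
        "Temperature": [273.0, 275.0, 274.0, 276.0, 280.0, 282.0],
        "Rainfall": [30.0, 40.0, 50.0, 60.0, 10.0, 20.0],
    }
)

# Group by station and year in one call instead of looping over stations
annual = (
    east_data.groupby(["station_id", east_data["date"].dt.year])
    .agg({"Temperature": "mean", "Rainfall": "sum"})
    .reset_index()
)
print(annual)
```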
In [26]:
annual_east_data.head()
Out[26]:
station_id Temperature Rainfall date
0 07343 283.581429 569.5 2006
1 07343 280.102500 1069.1 2007
2 07343 279.665833 884.8 2008
3 07343 279.225833 974.4 2009
4 07343 277.920833 1032.2 2010

Merge the meta data and the climate data based on the station_id to add the geospatial information.

In [27]:
east_data_complete_ann = pd.merge(
    dwd_stations_meta_east, annual_east_data, on="station_id"
)
east_data_complete_ann.head()
Out[27]:
station_id from_date to_date height latitude longitude name state Temperature Rainfall date
0 00096 2019-05-01 00:00:00+00:00 2023-05-31 00:00:00+00:00 50.0 52.9437 12.8518 Neuruppin-Alt Ruppin Brandenburg 286.592500 396.6 2019
1 00096 2019-05-01 00:00:00+00:00 2023-05-31 00:00:00+00:00 50.0 52.9437 12.8518 Neuruppin-Alt Ruppin Brandenburg 283.888333 507.3 2020
2 00096 2019-05-01 00:00:00+00:00 2023-05-31 00:00:00+00:00 50.0 52.9437 12.8518 Neuruppin-Alt Ruppin Brandenburg 282.621667 632.8 2021
3 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 281.313333 435.0 1991
4 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 277.042857 291.3 1992
In [28]:
east_data_complete_ann.to_csv("../data/East_Germany.csv", index=False)
## read the csv like so:
## pd.read_csv('../data/East_Germany.csv')

Perfect, we downloaded the desired data manually. Now let's automate this process for easy downloading of DWD data and saving it to disk.


In the next step we download data for each of these weather stations, check it for completeness and extract the observations for the period 1981–2021. Finally, we store the annual rainfall sum and the mean annual temperature together with the weather station identifier (station_id), the altitude (height), the geographical coordinates (latitude and longitude), the name of the station (name) and the federal state (state) in which the weather station is situated.

For our convenience we wrap the repetitive part of this procedure into a function, denoted as download_dwd_data. Go through the function defined below, read it line for line and try to figure out what is happening.

In [29]:
def download_dwd_data(
    parameter=["precipitation_height"],
    resolution="monthly",
    start_date="1981-01-01",
    end_date="2021-01-01",
    RUN=True,
):

    """This function provides easy downloading of DWD data via the DWD API.
    Returns the station meta data, the station data and the merged data frame
    (or, if RUN=False, a single preprocessed data frame).
    arguments
        parameter: list, e.g. ["temperature_air_mean_200", "precipitation_height"], ...
                    default = ["precipitation_height"]
        resolution: str, e.g. 'annual', 'monthly', 'daily', 'hourly', '1_minute', '10_minutes', ...
                    default = "monthly"
        start_date: str, insert as "YYYY-MM-DD", e.g. "1981-01-01" (default)
        end_date: str, insert as "YYYY-MM-DD", e.g. "2021-01-01" (default)
        RUN: boolean, default = True
    """
    ## import wetterdienst package
    from wetterdienst import Wetterdienst
    from wetterdienst.provider.dwd.observation import DwdObservationRequest

    if RUN:

        ## set API
        API = Wetterdienst(provider="dwd", network="observation")
        ## step 1 and 2, request the data
        dwd_stations_meta = DwdObservationRequest(
            parameter=parameter,
            resolution=resolution,
            start_date=start_date,
            end_date=end_date,
        ).all()

        ## step 3: download the actual data
        dwd_data = dwd_stations_meta.values.all().df.dropna()

        ## step 4: filter by parameter
        dwd_stations_data = pd.DataFrame(
            {"station_id": dwd_data["station_id"], "date": dwd_data["date"]}
        )

        for i in set(dwd_data["parameter"]):
            temporary = dwd_data[dwd_data["parameter"] == i]
            # drop unnecessary columns
            temporary = temporary[["station_id", "date", "value"]]
            temporary = temporary.rename(columns={"value": i})
            dwd_stations_data = pd.merge(
                dwd_stations_data, temporary, how="outer", on=["station_id", "date"]
            )

        # merge metadata and data
        data_complete = pd.merge(
            dwd_stations_meta.df, dwd_stations_data, on="station_id"
        )

        ## step 5: save to disk
        data_complete.to_csv("../data/DWD_data.csv", index=False)

        return dwd_stations_meta.df, dwd_stations_data, data_complete
    else:
        ## DOWNLOAD PREPROCESSED DATA ##
        import requests
        import io
        from io import StringIO

        url = "http://www.userpage.fu-berlin.de/~soga/300/30100_data_sets/dwd_data_1981-2010.csv"
        s = requests.get(url).text
        dwd_stations = pd.read_csv(StringIO(s), sep=",", decimal=",")

        print("########################################################")
        print("### NOTE THAT YOU ARE DOWNLOADING PREPROCESSED DATA  ###")
        print("### THE ARGUMENTS TO THE FUNCTION CALL ARE DISMISSED ###")
        print("########################################################")

        return dwd_stations

For ease of use we included the RUN argument in the function. If RUN = True, the main function body is executed and the data is downloaded and processed accordingly; depending on your internet connection this may take some time. If RUN = False, the function only downloads a prepared DataFrame object fitting the purpose of this tutorial. This second option may be of interest if you just want to follow along with this tutorial, as it consumes much less time.

In [30]:
## test it!
meta_data, _, data_complete = download_dwd_data(
    parameter=["temperature_air_mean_200", "precipitation_height"],
    resolution="annual",
    start_date="1981-01-01",
    end_date="2021-12-01",
)

If the download takes too much time, use the code below.

In [31]:
#### add RUN = False argument in the function

# data_complete = download_dwd_data(RUN = False)

Filter the downloaded data for East Germany.

In [32]:
states = [
    "Sachsen",
    "Sachsen-Anhalt",
    "Mecklenburg-Vorpommern",
    "Brandenburg",
    "Thüringen",
    "Berlin",
]
In [33]:
## filter for East Germany meta data
meta_data_east = meta_data[meta_data["state"].isin(states)]
## filter for East Germany data

data_complete_east = data_complete[data_complete["state"].isin(states)]

The resulting data frame (data_complete_east) contains data for 203 weather stations, with the features 'station_id', 'from_date', 'to_date', 'height', 'latitude', 'longitude', 'name', 'state', 'date', 'precipitation_height' and 'temperature_air_mean_200'.
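From a frame with this structure, the long-term station means can be obtained with a single groupby. A sketch using toy values shaped like data_complete_east (the station ids and numbers are illustrative):

```python
import pandas as pd

# Toy stand-in for data_complete_east: annual values per station
df = pd.DataFrame(
    {
        "station_id": ["00096", "00096", "00129"],
        "temperature_air_mean_200": [283.89, 282.62, 281.31],
        "precipitation_height": [507.3, 632.8, 435.0],
    }
)

# Long-term mean per station over all available years
station_means = df.groupby("station_id").agg(
    {"temperature_air_mean_200": "mean", "precipitation_height": "mean"}
)
print(station_means)
```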


Data review¶

Now that we have downloaded the weather data of interest from the DWD, we may inspect the data set.

In [34]:
data_complete_east.head(15)
Out[34]:
station_id from_date to_date height latitude longitude name state date temperature_air_mean_200 precipitation_height
522 00096 2019-04-09 00:00:00+00:00 2023-05-31 00:00:00+00:00 50.0 52.9437 12.8518 Neuruppin-Alt Ruppin Brandenburg 2020-01-01 00:00:00+00:00 283.89 507.3
523 00096 2019-04-09 00:00:00+00:00 2023-05-31 00:00:00+00:00 50.0 52.9437 12.8518 Neuruppin-Alt Ruppin Brandenburg 2020-01-01 00:00:00+00:00 283.89 507.3
524 00096 2019-04-09 00:00:00+00:00 2023-05-31 00:00:00+00:00 50.0 52.9437 12.8518 Neuruppin-Alt Ruppin Brandenburg 2021-01-01 00:00:00+00:00 282.62 632.8
525 00096 2019-04-09 00:00:00+00:00 2023-05-31 00:00:00+00:00 50.0 52.9437 12.8518 Neuruppin-Alt Ruppin Brandenburg 2021-01-01 00:00:00+00:00 282.62 632.8
540 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 1991-01-01 00:00:00+00:00 281.31 435.0
541 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 1991-01-01 00:00:00+00:00 281.31 435.0
542 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 1993-01-01 00:00:00+00:00 281.20 628.4
543 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 1993-01-01 00:00:00+00:00 281.20 628.4
544 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 1994-01-01 00:00:00+00:00 282.34 565.9
545 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 1994-01-01 00:00:00+00:00 282.34 565.9
546 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 1995-01-01 00:00:00+00:00 281.57 537.9
547 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 1995-01-01 00:00:00+00:00 281.57 537.9
548 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 1996-01-01 00:00:00+00:00 279.90 443.5
549 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 1996-01-01 00:00:00+00:00 279.90 443.5
550 00129 1991-01-01 00:00:00+00:00 2006-12-31 00:00:00+00:00 28.0 53.6893 13.2420 Altentreptow Mecklenburg-Vorpommern 1997-01-01 00:00:00+00:00 281.77 558.0

By applying the describe() function we print basic statistical measures of the variables temperature_air_mean_200 and precipitation_height.

In [35]:
data_complete_east[["temperature_air_mean_200", "precipitation_height"]].describe()
Out[35]:
temperature_air_mean_200 precipitation_height
count 9479.000000 9602.000000
mean 282.058983 652.022308
std 1.494405 229.804514
min 275.100000 220.200000
25% 281.370000 517.700000
50% 282.400000 606.200000
75% 283.050000 717.700000
max 285.080000 2725.000000

By means of visual inspection, we can check for outliers or invalid data points.

In [36]:
import seaborn as sns

sns.pairplot(
    data_complete_east[["height", "temperature_air_mean_200", "precipitation_height"]]
)
Out[36]:
<seaborn.axisgrid.PairGrid at 0x1e45d12b790>