In the previous section we downloaded a preprocessed data set of weather station data for Germany. In this section, however, we download and summarize weather data provided by the Deutscher Wetterdienst (DWD, the German Weather Service) on our own.
The goal is to obtain a data set of mean annual rainfall and mean annual temperature for the period 1981–2021 for weather stations located in East Germany.
# First, let's import the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
from datetime import datetime
The data is available on the DWD data portal. Here we make use of the wetterdienst package, which simplifies downloading data from the DWD. wetterdienst is the successor library of dwdweather2; it can also be installed directly from its GitHub repository.
You can install wetterdienst as follows:
pip install wetterdienst
or via GitHub:
pip install git+https://github.com/earthobservations/wetterdienst
# pip install wetterdienst
from wetterdienst import Wetterdienst, Resolution, Period
from wetterdienst.provider.dwd.observation import (
DwdObservationRequest,
DwdObservationDataset,
DwdObservationPeriod,
DwdObservationResolution,
)
The Python API available within the wetterdienst package offers access to different data products. Available APIs can be accessed via the top-level API Wetterdienst. Here we focus on the DWD APIs, accessible through the following command.
API = Wetterdienst(provider="dwd", network="observation")
Here we focus on the weather station Berlin-Tempelhof. Detailed information on the structure of the DWD data sets can be found here, and further information on the wetterdienst package can be found here.
The general procedure is outlined in five steps:
1. Select the weather station of interest based on the station identifier (station_id or name), the temporal resolution of the record (resolution), the variable type (parameter) and the observation period (period or start_date/end_date).
2. Download the meta data using the functionality of the wetterdienst package via DwdObservationRequest().
3. Access the desired data through values.all() and write it to a data frame using values.all().df. Check the data record for completeness and drop NaN values (dropna()).
4. Extract the statistic of interest, which in our case is the mean annual temperature and the mean annual rainfall for the period 1981–2021.
5. Repeat steps 1 to 4 for all stations of interest.
Step 1 and 2
Download the meta data using the functionality of the wetterdienst package via DwdObservationRequest(). We subset the weather station of interest from the catalog of DWD stations using the filter_by_name() function.
We access the data by using request arguments; a typical request is defined by the five arguments outlined above.
Here, we focus only on the weather parameters precipitation and mean air temperature. We further restrict our search to these parameters.
dwd_tempelhof = DwdObservationRequest(
parameter=["temperature_air_mean_200", "precipitation_height"],
resolution=DwdObservationResolution.MONTHLY,
start_date=datetime(1981, 1, 1),
end_date=datetime(
2021, 12, 31
), # or specify #period=DwdObservationPeriod.HISTORICAL to get all data
).filter_by_name(name="Berlin-Tempelhof")
## access the meta data
dwd_tempelhof.df
| | station_id | from_date | to_date | height | latitude | longitude | name | state |
|---|---|---|---|---|---|---|---|---|
| 0 | 00433 | 1938-01-01 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 48.0 | 52.4675 | 13.4021 | Berlin-Tempelhof | Berlin |
Step 3
Access the desired climate data.
dwd_tempelhof_data = dwd_tempelhof.values.all().df # .dropna()
dwd_tempelhof_data.head()
| | station_id | dataset | parameter | date | value | quality |
|---|---|---|---|---|---|---|
| 0 | 00433 | climate_summary | temperature_air_mean_200 | 1981-01-01 00:00:00+00:00 | 272.67 | 10.0 |
| 1 | 00433 | climate_summary | temperature_air_mean_200 | 1981-02-01 00:00:00+00:00 | 274.11 | 10.0 |
| 2 | 00433 | climate_summary | temperature_air_mean_200 | 1981-03-01 00:00:00+00:00 | 280.29 | 10.0 |
| 3 | 00433 | climate_summary | temperature_air_mean_200 | 1981-04-01 00:00:00+00:00 | 281.66 | 10.0 |
| 4 | 00433 | climate_summary | temperature_air_mean_200 | 1981-05-01 00:00:00+00:00 | 288.67 | 10.0 |
Next, we subset the temperature data and the precipitation data.
dwd_tempelhof_temp = dwd_tempelhof_data[
dwd_tempelhof_data["parameter"] == "temperature_air_mean_200"
]
dwd_tempelhof_rain = dwd_tempelhof_data[
dwd_tempelhof_data["parameter"] == "precipitation_height"
]
print(
f"The temperature time series at Berlin-Tempelhof consists of {dwd_tempelhof_temp.shape[0]} monthly observations spanning a period from January 1981 to December 2021."
)
print(
f"The rainfall time series at Berlin-Tempelhof consists of {dwd_tempelhof_rain.shape[0]} monthly observations spanning a period from January 1981 to December 2021."
)
The temperature time series at Berlin-Tempelhof consists of 492 monthly observations spanning a period from January 1981 to December 2021. The rainfall time series at Berlin-Tempelhof consists of 492 monthly observations spanning a period from January 1981 to December 2021.
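A quick sanity check on the record length: January 1981 through December 2021 spans 41 years, so a complete monthly series should indeed hold 492 values.

```python
# A complete monthly record for January 1981 through December 2021
# spans 41 years, i.e. 41 * 12 monthly observations
n_years = 2021 - 1981 + 1
expected_obs = n_years * 12
print(expected_obs)  # 492
```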
Check the data record for completeness.
fig, ax = plt.subplots(2, 1, figsize=(10, 6))
ax[0].plot(dwd_tempelhof_temp["date"], dwd_tempelhof_temp["value"])
ax[0].set_title("Temperature at Berlin-Tempelhof")
ax[0].set_ylabel("Temperature [K]")  # wetterdienst reports SI units (kelvin)
ax[1].bar(
dwd_tempelhof_rain["date"], dwd_tempelhof_rain["value"], align="center", width=50
)
ax[1].set_title("Rainfall at Berlin-Tempelhof")
ax[1].set_ylabel("Rainfall [mm]")
ax[1].set_ylim(0, 200)
plt.tight_layout()
plt.show()
Both parameters seem to be fairly complete. We can verify that by using isna().
print(f"Count of NA in temperature dataset ={sum(dwd_tempelhof_temp.value.isna())}")
print(f"Count of NA in rain dataset ={sum(dwd_tempelhof_rain.value.isna())}")
Count of NA in temperature dataset =0 Count of NA in rain dataset =2
## drop NA's
dwd_tempelhof_rain = dwd_tempelhof_rain.dropna()
Step 4
Next, we compute the annual rainfall and the mean annual temperature by applying functions from the pandas package, such as groupby() and agg(), or just mean().
tempelhof_temp_mean = dwd_tempelhof_temp.groupby(dwd_tempelhof_temp["date"].dt.year)[
"value"
].mean()
tempelhof_rain_mean = dwd_tempelhof_rain.groupby(dwd_tempelhof_rain["date"].dt.year)[
"value"
].sum()
fig, ax = plt.subplots(2, 1, figsize=(10, 6))
ax[0].plot(tempelhof_temp_mean.index, tempelhof_temp_mean.values)
ax[0].set_title("Mean annual temperature at Berlin-Tempelhof")
ax[0].set_ylabel("Temperature [K]")
ax[1].bar(tempelhof_rain_mean.index, tempelhof_rain_mean.values)
ax[1].set_title("Annual rainfall at Berlin-Tempelhof")
ax[1].set_ylabel("Rainfall [mm]")
ax[1].set_ylim(0, 1000)
plt.tight_layout()
plt.show()
The graph shows the annual rainfall for the period 1981 to 2021. One final step is still pending: We need to take the average of the annual series to get the mean annual rainfall at the weather station Berlin-Tempelhof for the period 1981-2021.
print(
f"Mean annual rainfall at station {dwd_tempelhof.df.name[0]}: {round(np.mean(tempelhof_rain_mean))} mm"
)
Mean annual rainfall at station Berlin-Tempelhof: 563 mm
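Note that wetterdienst returns temperatures in SI units, i.e. kelvin (the monthly values above lie around 280 K). If the temperature counterpart is preferred in degrees Celsius, subtract 273.15; a minimal sketch with a made-up value:

```python
# Hypothetical mean annual temperature in kelvin (wetterdienst reports SI units)
mean_temp_k = 282.60
# Convert from kelvin to degrees Celsius
mean_temp_c = mean_temp_k - 273.15
print(f"{mean_temp_c:.2f} °C")  # 9.45 °C
```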
Done! Now we know the procedure for downloading data from the DWD and extracting the statistic of interest. In the next section we will automate this procedure, downloading all weather stations of interest and extracting the statistic for each of them.
In this section we automate the procedure of downloading a data set from DWD and extracting a particular statistic of interest. Recall that the general procedure may be outlined in five steps:
Select the weather station of interest.
Download the meta data.
Download the data record and check it for completeness.
Extract the statistic of interest.
Repeat step 1–4 for all weather stations of interest.
In the previous section we already defined a subset of one weather station of interest. The code below extracts data for multiple stations based on the states of Germany. Basically, we are interested in all weather stations in East Germany that provide weather data on a monthly basis.
Step 1 and 2
We access the data by using request arguments. Here, we focus only on the weather parameters precipitation and mean air temperature. We further restrict our search to these parameters.
dwd_stations = DwdObservationRequest(
parameter=["temperature_air_mean_200", "precipitation_height"],
resolution=DwdObservationResolution.MONTHLY,
start_date=datetime(1981, 1, 1),
end_date=datetime(
2021, 12, 31
), # or specify #period=DwdObservationPeriod.HISTORICAL to get all data
).all()
dwd_stations_meta = dwd_stations.df
dwd_stations_meta
| | station_id | from_date | to_date | height | latitude | longitude | name | state |
|---|---|---|---|---|---|---|---|---|
| 0 | 00001 | 1931-01-01 00:00:00+00:00 | 1986-06-30 00:00:00+00:00 | 478.0 | 47.8413 | 8.8493 | Aach | Baden-Württemberg |
| 1 | 00003 | 1851-01-01 00:00:00+00:00 | 2011-03-31 00:00:00+00:00 | 202.0 | 50.7827 | 6.0941 | Aachen | Nordrhein-Westfalen |
| 2 | 00044 | 1971-03-01 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 44.0 | 52.9336 | 8.2370 | Großenkneten | Niedersachsen |
| 3 | 00052 | 1973-01-01 00:00:00+00:00 | 2001-12-31 00:00:00+00:00 | 46.0 | 53.6623 | 10.1990 | Ahrensburg-Wulfsdorf | Schleswig-Holstein |
| 4 | 00061 | 1975-07-01 00:00:00+00:00 | 1978-08-31 00:00:00+00:00 | 339.0 | 48.8443 | 12.6171 | Aiterhofen | Bayern |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1141 | 19510 | 1950-01-01 00:00:00+00:00 | 1970-12-31 00:00:00+00:00 | 159.0 | 51.1000 | 12.3300 | Lucka | Thüringen |
| 1142 | 19579 | 1971-01-01 00:00:00+00:00 | 2005-10-31 00:00:00+00:00 | 875.0 | 47.8194 | 8.3348 | Bonndorf | Baden-Württemberg |
| 1143 | 19580 | 1975-01-01 00:00:00+00:00 | 2005-12-31 00:00:00+00:00 | 471.0 | 48.4951 | 9.4003 | Urach, Bad/Erms | Baden-Württemberg |
| 1144 | 19581 | 1974-06-01 00:00:00+00:00 | 2005-03-31 00:00:00+00:00 | 600.0 | 48.6671 | 8.4700 | Enzklösterle/Schwarzwald | Baden-Württemberg |
| 1145 | 19582 | 1945-01-01 00:00:00+00:00 | 2000-12-31 00:00:00+00:00 | 754.0 | 48.1419 | 8.4255 | Königsfeld | Baden-Württemberg |

1146 rows × 8 columns
Now we have a nice meta data overview of the available stations and the data structure. The data provided by the DWD is monthly data; however, we are only interested in the mean annual rainfall and mean annual temperature for the period 1981–2021 for the east of Germany. Hence, we have to do some preprocessing: we access all available data and filter by the parameters and states in question.
Step 3
We will download the available data as follows. This step may take a while.
dwd_stations_data = dwd_stations.values.all().df.dropna()
dwd_stations_data.head()
| | station_id | dataset | parameter | date | value | quality |
|---|---|---|---|---|---|---|
| 0 | 00001 | climate_summary | temperature_air_mean_200 | 1981-01-01 00:00:00+00:00 | 269.63 | 10.0 |
| 1 | 00001 | climate_summary | temperature_air_mean_200 | 1981-02-01 00:00:00+00:00 | 271.37 | 10.0 |
| 2 | 00001 | climate_summary | temperature_air_mean_200 | 1981-03-01 00:00:00+00:00 | 280.05 | 10.0 |
| 3 | 00001 | climate_summary | temperature_air_mean_200 | 1981-04-01 00:00:00+00:00 | 282.50 | 10.0 |
| 4 | 00001 | climate_summary | temperature_air_mean_200 | 1981-05-01 00:00:00+00:00 | 285.25 | 10.0 |
As we can see, the station data is now only encoded via station_id. We therefore have to find all station_ids referring to stations in East Germany, which we do by filtering by the states of East Germany.
## states of East Germany
states = [
"Sachsen",
"Sachsen-Anhalt",
"Mecklenburg-Vorpommern",
"Brandenburg",
"Thüringen",
"Berlin",
]
## filter for East Germany meta data
dwd_stations_meta_east = dwd_stations_meta[dwd_stations_meta["state"].isin(states)]
dwd_stations_meta_east
| | station_id | from_date | to_date | height | latitude | longitude | name | state |
|---|---|---|---|---|---|---|---|---|
| 12 | 00096 | 2019-05-01 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 50.0 | 52.9437 | 12.8518 | Neuruppin-Alt Ruppin | Brandenburg |
| 15 | 00116 | 1899-06-01 00:00:00+00:00 | 1973-05-31 00:00:00+00:00 | 213.0 | 50.9833 | 12.4333 | Altenburg | Thüringen |
| 17 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern |
| 18 | 00131 | 2004-11-01 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 296.0 | 51.0881 | 12.9326 | Geringswalde-Altgeringswalde | Sachsen |
| 25 | 00164 | 1908-06-01 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 54.0 | 53.0316 | 13.9908 | Angermünde | Brandenburg |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1137 | 19318 | 1946-11-01 00:00:00+00:00 | 1978-12-31 00:00:00+00:00 | 292.0 | 50.7180 | 10.4310 | Schmalkalden | Thüringen |
| 1138 | 19364 | 1937-12-01 00:00:00+00:00 | 1977-12-31 00:00:00+00:00 | 720.0 | 50.6167 | 10.8167 | Schmiedefeld/Rennsteig | Thüringen |
| 1139 | 19378 | 1958-01-01 00:00:00+00:00 | 1977-12-31 00:00:00+00:00 | 505.0 | 50.8333 | 10.5833 | Finsterbergen | Thüringen |
| 1140 | 19433 | 1887-01-01 00:00:00+00:00 | 1961-12-31 00:00:00+00:00 | 316.0 | 50.8188 | 10.3443 | Liebenstein, Bad | Thüringen |
| 1141 | 19510 | 1950-01-01 00:00:00+00:00 | 1970-12-31 00:00:00+00:00 | 159.0 | 51.1000 | 12.3300 | Lucka | Thüringen |

279 rows × 8 columns
## filter the meta data for East Germany station_id's
station_id_east = set(dwd_stations_meta_east["station_id"])
len(station_id_east)
279
There are 279 stations available in our East Germany meta data set. Now we are able to filter our downloaded data for the required station_ids.
# filter the downloaded data
dwd_east_data = dwd_stations_data[dwd_stations_data["station_id"].isin(station_id_east)]
len(set(dwd_east_data["station_id"]))
207
Perfect! 207 stations in East Germany match our search criteria.
Step 4
Extract the statistic and data of interest. Additionally, we compute the annual rainfall and the mean annual temperature by applying functions from the pandas package, such as groupby() and agg().
# filter by parameter
east_data_temp = dwd_east_data[dwd_east_data["parameter"] == "temperature_air_mean_200"]
east_data_rain = dwd_east_data[dwd_east_data["parameter"] == "precipitation_height"]
# keep specific column
east_data_temp = east_data_temp[["station_id", "date", "value"]]
east_data_rain = east_data_rain[["station_id", "date", "value"]]
# rename "value" column
east_data_temp = east_data_temp.rename(columns={"value": "Temperature"})
east_data_rain = east_data_rain.rename(columns={"value": "Rainfall"})
## merge both dataframes
east_data = pd.merge(
east_data_temp,
east_data_rain[["station_id", "date", "Rainfall"]],
how="outer",
on=["station_id", "date"],
)
east_data.head()
| | station_id | date | Temperature | Rainfall |
|---|---|---|---|---|
| 0 | 00096 | 2019-05-01 00:00:00+00:00 | 285.50 | 15.9 |
| 1 | 00096 | 2019-06-01 00:00:00+00:00 | 294.81 | 47.8 |
| 2 | 00096 | 2019-07-01 00:00:00+00:00 | 292.19 | 68.6 |
| 3 | 00096 | 2019-08-01 00:00:00+00:00 | 293.08 | 18.3 |
| 4 | 00096 | 2019-09-01 00:00:00+00:00 | 287.53 | 72.5 |
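The how="outer" merge used above keeps a row whenever either parameter is present for a given station and month; the missing side is filled with NaN. A minimal sketch on made-up values:

```python
import pandas as pd

# Two hypothetical single-parameter frames, keyed by station and month
temp = pd.DataFrame({"station_id": ["00096"] * 2, "date": [1, 2], "Temperature": [280.0, 281.0]})
rain = pd.DataFrame({"station_id": ["00096"] * 2, "date": [2, 3], "Rainfall": [30.0, 40.0]})

# how="outer" keeps a row if either frame has the key;
# the missing side becomes NaN
merged = pd.merge(temp, rain, how="outer", on=["station_id", "date"])
print(merged)
```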
We now have one data frame containing only rainfall and temperature from East Germany weather stations. Looking at the date column, we realize that we have to aggregate the monthly data to annual data. We will achieve that conveniently with a for-loop.
annual_east_data = pd.DataFrame()

for i in set(east_data["station_id"]):
    ## aggregate on an annual scale for each station
    id_east_data = (
        east_data.loc[east_data["station_id"] == i]
        .groupby(east_data["date"].dt.year)
        .agg({"station_id": "first", "Temperature": "mean", "Rainfall": "sum"})
    )
    id_east_data["date"] = id_east_data.index
    ## append them again to one data frame
    annual_east_data = pd.concat([annual_east_data, id_east_data], ignore_index=True)
annual_east_data.head()
| | station_id | Temperature | Rainfall | date |
|---|---|---|---|---|
| 0 | 07343 | 283.581429 | 569.5 | 2006 |
| 1 | 07343 | 280.102500 | 1069.1 | 2007 |
| 2 | 07343 | 279.665833 | 884.8 | 2008 |
| 3 | 07343 | 279.225833 | 974.4 | 2009 |
| 4 | 07343 | 277.920833 | 1032.2 | 2010 |
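As an aside, the per-station for-loop can also be written as a single groupby over station and year, which avoids the explicit concatenation. A sketch on a made-up miniature frame:

```python
import pandas as pd

# Hypothetical miniature stand-in for the east_data frame
east_data = pd.DataFrame(
    {
        "station_id": ["00096"] * 4,
        "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2021-01-01", "2021-02-01"]),
        "Temperature": [275.0, 277.0, 274.0, 276.0],
        "Rainfall": [40.0, 30.0, 50.0, 20.0],
    }
)

# One groupby over station and year replaces the per-station loop
annual = (
    east_data.groupby(["station_id", east_data["date"].dt.year.rename("year")])
    .agg({"Temperature": "mean", "Rainfall": "sum"})
    .reset_index()
)
print(annual)
```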
Merge the meta data and the climate data based on the station_id to add the geospatial information.
east_data_complete_ann = pd.merge(
dwd_stations_meta_east, annual_east_data, on="station_id"
)
east_data_complete_ann.head()
| | station_id | from_date | to_date | height | latitude | longitude | name | state | Temperature | Rainfall | date |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 00096 | 2019-05-01 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 50.0 | 52.9437 | 12.8518 | Neuruppin-Alt Ruppin | Brandenburg | 286.592500 | 396.6 | 2019 |
| 1 | 00096 | 2019-05-01 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 50.0 | 52.9437 | 12.8518 | Neuruppin-Alt Ruppin | Brandenburg | 283.888333 | 507.3 | 2020 |
| 2 | 00096 | 2019-05-01 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 50.0 | 52.9437 | 12.8518 | Neuruppin-Alt Ruppin | Brandenburg | 282.621667 | 632.8 | 2021 |
| 3 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 281.313333 | 435.0 | 1991 |
| 4 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 277.042857 | 291.3 | 1992 |
east_data_complete_ann.to_csv("../data/East_Germany.csv", index=False)
## read the csv like so:
## pd.read_csv("../data/East_Germany.csv")
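As an aside, pathlib composes such paths without hard-coding an operating-system-specific separator; the directory layout here is assumed:

```python
from pathlib import Path

# Compose the output location without hard-coding "\\" or "/"
csv_path = Path("..") / "data" / "East_Germany.csv"
print(csv_path.name)  # East_Germany.csv
```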
Perfect, we downloaded the desired data manually. Now let's automate this process to easily download DWD data and save it to disk.
In the next step we download each of these weather stations, check the data for completeness and extract the observational data for the period 1981–2021. Finally, we store the annual rainfall sum and the mean annual temperature together with the weather station identifier (station_id), the altitude (height), the geographical coordinates (latitude and longitude), the name of the station (name) and the name of the federal state (state) in which the weather station is situated.
For our convenience we wrap the repetitive part of this procedure into a function, denoted as download_dwd_data. Go through the function defined below, read it line by line and try to figure out what is happening.
def download_dwd_data(
    parameter=["precipitation_height"],
    resolution="monthly",
    start_date="1981-01-01",
    end_date="2021-01-01",
    RUN=True,
):
    """Provide easy downloading of DWD data from the DWD API.

    Returns the meta data, the station data and the merged data frame.

    Arguments:
    parameter: list, e.g. ["temperature_air_mean_200", "precipitation_height"], ...
        default = ["precipitation_height"]
    resolution: str, e.g. 'annual', 'monthly', 'daily', 'hourly', '1_minute', '10_minutes', ...
        default = "monthly"
    start_date: str, insert as "YYYY-MM-DD", e.g. "1981-01-01" (default)
    end_date: str, insert as "YYYY-MM-DD", e.g. "2021-01-01" (default)
    RUN: boolean, default = True
    """
    ## import the wetterdienst package
    from wetterdienst import Wetterdienst
    from wetterdienst.provider.dwd.observation import DwdObservationRequest

    if RUN:
        ## set the API
        API = Wetterdienst(provider="dwd", network="observation")
        ## step 1 and 2: request the data
        dwd_stations_meta = DwdObservationRequest(
            parameter=parameter,
            resolution=resolution,
            start_date=start_date,
            end_date=end_date,
        ).all()
        ## step 3: download the actual data
        dwd_data = dwd_stations_meta.values.all().df.dropna()
        ## step 4: filter by parameter
        dwd_stations_data = pd.DataFrame(
            {"station_id": dwd_data["station_id"], "date": dwd_data["date"]}
        )
        for i in set(dwd_data["parameter"]):
            temporary = dwd_data[dwd_data["parameter"] == i]
            # drop unnecessary columns
            temporary = temporary[["station_id", "date", "value"]]
            temporary = temporary.rename(columns={"value": i})
            dwd_stations_data = pd.merge(
                dwd_stations_data, temporary, how="outer", on=["station_id", "date"]
            )
        # merge the meta data and the data
        data_complete = pd.merge(
            dwd_stations_meta.df, dwd_stations_data, on="station_id"
        )
        ## step 5: save to disk
        data_complete.to_csv("../data/DWD_data.csv", index=False)
        return dwd_stations_meta.df, dwd_stations_data, data_complete
    else:
        ## DOWNLOAD PREPROCESSED DATA ##
        import requests
        from io import StringIO

        url = "http://www.userpage.fu-berlin.de/~soga/300/30100_data_sets/dwd_data_1981-2010.csv"
        s = requests.get(url).text
        dwd_stations = pd.read_csv(StringIO(s), sep=",", decimal=",")
        print("########################################################")
        print("### NOTE THAT YOU ARE DOWNLOADING PREPROCESSED DATA  ###")
        print("### THE ARGUMENTS TO THE FUNCTION CALL ARE IGNORED   ###")
        print("########################################################")
        return dwd_stations
For ease of application we included the RUN argument in the function. If RUN = True, the main function body is executed and the data is downloaded and processed accordingly. Depending on your internet connection this may take some time. However, if RUN = False, the function only downloads a prepared DataFrame object fitting the purpose of this tutorial. This second option may be of interest if you just want to follow along with this tutorial, as it consumes much less time.
## test it!
meta_data, _, data_complete = download_dwd_data(
parameter=["temperature_air_mean_200", "precipitation_height"],
resolution="annual",
start_date="1981-01-01",
end_date="2021-12-01",
)
If the download takes too much time, use the code below.
#### add RUN = False argument in the function
# data_complete = download_dwd_data(RUN = False)
Filter the downloaded data for East Germany.
states = [
"Sachsen",
"Sachsen-Anhalt",
"Mecklenburg-Vorpommern",
"Brandenburg",
"Thüringen",
"Berlin",
]
## filter for East Germany meta data
meta_data_east = meta_data[meta_data["state"].isin(states)]
## filter for East Germany data
data_complete_east = data_complete[data_complete["state"].isin(states)]
The resulting data frame (data_complete_east) contains data of 203 weather stations, with the features 'station_id', 'from_date', 'to_date', 'height', 'latitude', 'longitude', 'name', 'state', 'date', 'precipitation_height' and 'temperature_air_mean_200'.
Now that we downloaded the weather data of interest from DWD we may inspect the data set.
data_complete_east.head(15)
| | station_id | from_date | to_date | height | latitude | longitude | name | state | date | temperature_air_mean_200 | precipitation_height |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 522 | 00096 | 2019-04-09 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 50.0 | 52.9437 | 12.8518 | Neuruppin-Alt Ruppin | Brandenburg | 2020-01-01 00:00:00+00:00 | 283.89 | 507.3 |
| 523 | 00096 | 2019-04-09 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 50.0 | 52.9437 | 12.8518 | Neuruppin-Alt Ruppin | Brandenburg | 2020-01-01 00:00:00+00:00 | 283.89 | 507.3 |
| 524 | 00096 | 2019-04-09 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 50.0 | 52.9437 | 12.8518 | Neuruppin-Alt Ruppin | Brandenburg | 2021-01-01 00:00:00+00:00 | 282.62 | 632.8 |
| 525 | 00096 | 2019-04-09 00:00:00+00:00 | 2023-05-31 00:00:00+00:00 | 50.0 | 52.9437 | 12.8518 | Neuruppin-Alt Ruppin | Brandenburg | 2021-01-01 00:00:00+00:00 | 282.62 | 632.8 |
| 540 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 1991-01-01 00:00:00+00:00 | 281.31 | 435.0 |
| 541 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 1991-01-01 00:00:00+00:00 | 281.31 | 435.0 |
| 542 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 1993-01-01 00:00:00+00:00 | 281.20 | 628.4 |
| 543 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 1993-01-01 00:00:00+00:00 | 281.20 | 628.4 |
| 544 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 1994-01-01 00:00:00+00:00 | 282.34 | 565.9 |
| 545 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 1994-01-01 00:00:00+00:00 | 282.34 | 565.9 |
| 546 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 1995-01-01 00:00:00+00:00 | 281.57 | 537.9 |
| 547 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 1995-01-01 00:00:00+00:00 | 281.57 | 537.9 |
| 548 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 1996-01-01 00:00:00+00:00 | 279.90 | 443.5 |
| 549 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 1996-01-01 00:00:00+00:00 | 279.90 | 443.5 |
| 550 | 00129 | 1991-01-01 00:00:00+00:00 | 2006-12-31 00:00:00+00:00 | 28.0 | 53.6893 | 13.2420 | Altentreptow | Mecklenburg-Vorpommern | 1997-01-01 00:00:00+00:00 | 281.77 | 558.0 |
By applying the describe() function we print basic statistical measures of the variables temperature_air_mean_200 and precipitation_height.
data_complete_east[["temperature_air_mean_200", "precipitation_height"]].describe()
| | temperature_air_mean_200 | precipitation_height |
|---|---|---|
| count | 9479.000000 | 9602.000000 |
| mean | 282.058983 | 652.022308 |
| std | 1.494405 | 229.804514 |
| min | 275.100000 | 220.200000 |
| 25% | 281.370000 | 517.700000 |
| 50% | 282.400000 | 606.200000 |
| 75% | 283.050000 | 717.700000 |
| max | 285.080000 | 2725.000000 |
By means of visual inspection, we can check for outliers or invalid data points.
import seaborn as sns
sns.pairplot(
data_complete_east[["height", "temperature_air_mean_200", "precipitation_height"]]
)
The graph looks fine. We notice the expected pattern: Rainfall tends to increase with altitude, whereas temperature tends to decrease with altitude.
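Such visual relationships can also be quantified with the pandas corr() method; a sketch on made-up values (not the DWD data) that mimic the pattern:

```python
import pandas as pd

# Made-up station values mimicking the observed pattern:
# rainfall increases and temperature decreases with altitude
demo = pd.DataFrame(
    {
        "height": [50.0, 300.0, 600.0, 900.0],
        "temperature_air_mean_200": [283.0, 281.75, 280.25, 278.75],
        "precipitation_height": [550.0, 675.0, 825.0, 975.0],
    }
)
# Pairwise Pearson correlation coefficients
print(demo.corr().round(2))
```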
Finally, we contextualize our data set by plotting the weather stations on an interactive folium map.
# Retrieve Federal States
import zipfile, requests, io
url = "https://biogeo.ucdavis.edu/data/diva/adm/DEU_adm.zip"
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall(path="..")
G1 = gpd.read_file("../DEU_adm1.shp")
G1 = G1.to_crs("epsg:3035")
east_germany_states = G1[G1["NAME_1"].isin(states)]
import folium
m = folium.Map(
location=[
np.mean(meta_data_east["latitude"]),
np.mean(meta_data_east["longitude"]),
],
zoom_start=6,
tiles="OpenStreetMap",
)
folium.GeoJson(
east_germany_states, style_function=lambda feature: {"color": "grey"}
).add_to(m)
folium.GeoJson(
east_germany_states.unary_union,
style_function=lambda feature: {"color": "darkgrey"},
).add_to(m)
for _, row in meta_data_east.iterrows():
    lat = row["latitude"]
    lon = row["longitude"]
    folium.CircleMarker(
        location=[lat, lon], popup=row["name"], radius=2, color="darkblue"
    ).add_to(m)
m
For later usage we create GeoDataFrame objects from the DataFrame objects and save them to disk.
data_complete_east = gpd.GeoDataFrame(
data_complete_east,
geometry=gpd.points_from_xy(
data_complete_east.longitude, data_complete_east.latitude
),
)
east_germany = east_germany_states.unary_union
east_germany = gpd.GeoDataFrame(geometry=gpd.GeoSeries(east_germany.geoms))
data_complete_east.to_file("../data/dwd_data_1981-21_east.geojson", driver="GeoJSON")
east_germany_states.to_file("../data/east_germany_states.geojson", driver="GeoJSON")
east_germany.to_file("../data/east_germany.geojson", driver="GeoJSON")
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.