Lake Rangsdorf is located at the southern margin of the Teltow Plateau, within a former proglacial drainage system, the Rangsdorf-Thyrow-Abflussbahn. Its basin originally emerged as part of a subglacial drainage system, still visible in the panhandle Krumme Lanke and its northern continuation, the Glasow-Bach valley, whereas the main basin formed as a kettle hole. The northern part, Krumme Lanke, still exhibits the basin shape of a glacial tunnel lake (maximum depth 9 m, average depth roughly 3.5 m), while the southern main lake shows the typical morphology of a kettle lake silting up, with an average depth of approx. 0.9 m (maximum 2.7 m). Like most other shallow lakes in Brandenburg, Lake Rangsdorf is a eutrophic to hypertrophic lake suffering from heavy algal blooms in summer.
Lake bathymetry and chemical sediment data for Lake Rangsdorf were obtained during a field campaign in 2017. The as yet unpublished data were kindly provided by Dr. Kai Hartmann of the Department of Earth Sciences at Freie Universität Berlin.
The spatial data for Lake Rangsdorf is provided as a zipped file, `LakeRangsdorf.zip`. The archive contains several ESRI shapefiles, which we download from the server.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import requests, zipfile, io
url = "https://userpage.fu-berlin.de/soga/300/30100_data_sets/spatial/LakeRangsdorf.zip"
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("../data")
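Before extracting, you can inspect what an archive contains with `ZipFile.namelist()`. A minimal stdlib sketch (the archive here is built in memory rather than downloaded, and the member names are illustrative):

```python
import io
import zipfile

# Build a tiny zip archive in memory to stand in for the download
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("shoreline_poly.shp", b"")
    z.writestr("shoreline.shp", b"")
    z.writestr("sonar.shp", b"")

# List the archive members before deciding where to extract them
names = zipfile.ZipFile(buf).namelist()
print(names)
```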
The file `shoreline_poly.shp` is a polygon shapefile of Lake Rangsdorf. We load the file into memory using the `read_file()` function from the GeoPandas package and plot its geometry to get a first visual impression of the data.
lake = gpd.read_file("../data/shoreline_poly.shp")
lake
| | OBJECTID | Shape_Leng | Shape_Area | geometry |
|---|---|---|---|---|
| 0 | 1 | 12345.703746 | 2.448172e+06 | POLYGON ((392031.818 5796095.736, 392039.438 5... |
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
lake.plot(color="none", ax=ax)
plt.show()
When working with geospatially referenced data, we need to remember to check the coordinate reference system (CRS). When working with multiple `GeoDataFrames`, make sure they all share the same CRS! You can check the projection of a `GeoDataFrame` by calling `.crs`.
## check for crs
lake.crs
<Derived Projected CRS: EPSG:32633> Name: WGS 84 / UTM zone 33N Axis Info [cartesian]: - E[east]: Easting (metre) - N[north]: Northing (metre) Area of Use: - name: Between 12°E and 18°E, northern hemisphere between equator and 84°N, onshore and offshore. Austria. Bosnia and Herzegovina. Cameroon. Central African Republic. Chad. Congo. Croatia. Czechia. Democratic Republic of the Congo (Zaire). Gabon. Germany. Hungary. Italy. Libya. Malta. Niger. Nigeria. Norway. Poland. San Marino. Slovakia. Slovenia. Svalbard. Sweden. Vatican City State. - bounds: (12.0, 0.0, 18.0, 84.0) Coordinate Operation: - name: UTM zone 33N - method: Transverse Mercator Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
You can also set the projection yourself. Note that `set_crs()` only assigns a CRS label to the data without transforming the coordinate values:
lake.set_crs("EPSG:4326")
If you want to actually reproject the coordinates into a different CRS, use `to_crs()`:
lake.to_crs("EPSG:32633")
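Under the hood, `to_crs()` reprojects the coordinate values via pyproj, while `set_crs()` merely relabels them. A small sketch using `pyproj` directly (the UTM coordinates below are illustrative values near the lake, not taken from the data set):

```python
from pyproj import Transformer

# Reproject a UTM zone 33N (EPSG:32633) coordinate pair to WGS 84 (EPSG:4326).
# always_xy=True keeps the (x, y) / (lon, lat) axis order.
transformer = Transformer.from_crs(32633, 4326, always_xy=True)
lon, lat = transformer.transform(392000.0, 5795000.0)
print(lon, lat)  # roughly 13.4°E, 52.3°N
```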
We will have to transform the data into EPSG:4326, the CRS required for later plotting with the `folium` package.
lake = lake.to_crs("EPSG:4326")
In addition to the polygon shapefile, the zip archive contains the file `shoreline.shp`, a shapefile of the shoreline of Lake Rangsdorf. We load the file into memory using the `read_file()` function.
shoreline = gpd.read_file("../data/shoreline.shp")
Then we convert the `MULTILINESTRING` geometry into `POINT` geometries using a for-loop. First we extract every individual line, then we store the x and y coordinates in separate lists. This allows us to assign each point on the shoreline a depth of zero, which becomes useful later when we compute a bathymetric model of the lake.
shoreline_points_lon = []
shoreline_points_lat = []

for line in shoreline.geometry[0].geoms:
    for coord in line.coords:
        shoreline_points_lon.append(coord[0])
        shoreline_points_lat.append(coord[1])
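As an aside, with shapely 2.x the same flattening can be done without an explicit loop via `shapely.get_coordinates()`, which returns an (n, 2) array of all vertices. A small sketch on a toy geometry:

```python
import shapely
from shapely.geometry import MultiLineString

# A toy MULTILINESTRING with two line parts
mls = MultiLineString([[(0.0, 0.0), (1.0, 1.0)], [(2.0, 2.0), (3.0, 3.0)]])

coords = shapely.get_coordinates(mls)  # array of shape (4, 2)
xs, ys = coords[:, 0], coords[:, 1]
print(xs.tolist())  # [0.0, 1.0, 2.0, 3.0]
```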
We add a feature vector denoted `depth_m` to each `POINT` and assign it values of zero.
Note: We have to add the EPSG identifier 32633 to the function call, accounting for the fact that the coordinates are given as Universal Transverse Mercator, zone 33 coordinates. Make use of the `crs` argument.
##construct dataframe
shoreline_points = pd.DataFrame(
{
"depth_m": [0] * len(shoreline_points_lat),
"Latitude": shoreline_points_lat,
"Longitude": shoreline_points_lon,
}
)
## create geopandas
shoreline_points = gpd.GeoDataFrame(
shoreline_points,
geometry=gpd.points_from_xy(shoreline_points.Longitude, shoreline_points.Latitude),
crs=32633,
)
shoreline_points = shoreline_points.to_crs("EPSG:4326")
# check for crs
shoreline_points.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
Perfect, the transformation worked!
There is one more spatial data set to load into memory. The file `sonar.shp` contains point measurements of the lake depth obtained during a sonar survey. We load the file and plot it. Note that we make sure the data does not contain any invalid values, such as zero depth.
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
sonar = gpd.read_file("../data/sonar.shp")
sonar = sonar.to_crs("EPSG:4326")
sonar = sonar[sonar["depth_m"] > 0]
sonar.plot(c=sonar["depth_m"], markersize=3, ax=ax, legend=True)
plt.show()
In addition to the spatial data we provide a chemical sediment data set (`LakeRangsdorf.csv`).
url = "https://userpage.fu-berlin.de/soga/300/30100_data_sets/LakeRangsdorf.csv"
lakeRangsdorf = pd.read_csv(url)
lakeRangsdorf.head(8)
| | ID | x | y | depth_m | visual.depth.cm | elec.cond.muS | pH | LOI550 | LOI990 | LOI | ... | Fe.mg.g | K.mg.g | Pb.mug.g | Mg.mg.g | Mn.mug.g | Na.mg.g | PO4.mg.g | S.mg.g | Sr.mug/g | Zi mug/g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10a | 391829 | 5794052 | 0.68 | 34 | 305.0 | 7.19 | 1.04 | 0.20 | 98.51 | ... | 0.52 | 0.22 | 1.37 | 0.15 | 15.39 | 0.26 | 0.30 | 0.77 | 7.87 | 11.46 |
| 1 | 11a | 391824 | 5794187 | 0.66 | 28 | 274.0 | 7.33 | 1.27 | 0.19 | 98.29 | ... | 0.42 | 0.13 | 1.81 | 0.13 | 12.02 | 0.22 | 0.37 | 0.43 | 6.77 | 17.41 |
| 2 | 12a | 391184 | 5795337 | 1.93 | 15 | 680.0 | 7.25 | 27.32 | 17.46 | 32.99 | ... | 4.73 | 0.34 | 18.49 | 1.41 | 481.61 | 0.54 | 4.83 | 8.24 | 238.28 | 84.19 |
| 3 | 13b | 391312 | 5795462 | 3.98 | 15 | 825.0 | 7.22 | 36.93 | 18.25 | 21.60 | ... | 8.58 | 0.43 | 34.19 | 1.47 | 454.44 | 0.70 | 4.93 | 17.18 | 261.05 | 94.92 |
| 4 | 14b | 391729 | 5795736 | 4.95 | 18 | 903.0 | 7.05 | 38.33 | 15.50 | 26.44 | ... | 10.12 | 0.50 | 31.00 | 1.42 | 546.69 | 0.88 | 5.31 | 20.03 | 245.53 | 98.52 |
| 5 | 15b | 391892 | 5795771 | 0.93 | 20 | 416.0 | 7.31 | 2.33 | 0.08 | 97.49 | ... | 0.81 | 0.14 | 4.14 | 0.20 | 61.32 | 0.32 | 1.18 | 1.26 | 8.78 | 30.30 |
| 6 | 16a | 391430 | 5795473 | 3.92 | 18 | 1060.0 | 6.82 | 37.79 | 17.67 | 22.06 | ... | 8.07 | 0.55 | 30.92 | 1.46 | 479.68 | 0.74 | 5.34 | 17.21 | 261.54 | 87.90 |
| 7 | 17a | 391846 | 5795878 | 8.40 | 18 | 1278.0 | 7.02 | 36.74 | 16.42 | 25.94 | ... | 9.26 | 0.64 | 28.96 | 1.45 | 653.76 | 0.72 | 5.97 | 17.07 | 257.63 | 90.17 |

8 rows × 23 columns
The data set contains 32 measurements on sediment samples taken at different locations within the lake. We make use of the `gpd.GeoDataFrame()` function again to convert the pandas `DataFrame` object into a `GeoDataFrame` object.

Note: We have to add the EPSG identifier 32633 to the function call, because no CRS is set yet. This step is mandatory for the subsequent conversion into the correct CRS.
## create geopandas
lake_data_chem = gpd.GeoDataFrame(
lakeRangsdorf,
geometry=gpd.points_from_xy(lakeRangsdorf.x, lakeRangsdorf.y),
crs=32633,
)
lake_data_chem = lake_data_chem.to_crs("epsg:4326")
lake_data_chem.head()
| | ID | x | y | depth_m | visual.depth.cm | elec.cond.muS | pH | LOI550 | LOI990 | LOI | ... | K.mg.g | Pb.mug.g | Mg.mg.g | Mn.mug.g | Na.mg.g | PO4.mg.g | S.mg.g | Sr.mug/g | Zi mug/g | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10a | 391829 | 5794052 | 0.68 | 34 | 305.0 | 7.19 | 1.04 | 0.20 | 98.51 | ... | 0.22 | 1.37 | 0.15 | 15.39 | 0.26 | 0.30 | 0.77 | 7.87 | 11.46 | POINT (13.41414 52.28617) |
| 1 | 11a | 391824 | 5794187 | 0.66 | 28 | 274.0 | 7.33 | 1.27 | 0.19 | 98.29 | ... | 0.13 | 1.81 | 0.13 | 12.02 | 0.22 | 0.37 | 0.43 | 6.77 | 17.41 | POINT (13.41402 52.28738) |
| 2 | 12a | 391184 | 5795337 | 1.93 | 15 | 680.0 | 7.25 | 27.32 | 17.46 | 32.99 | ... | 0.34 | 18.49 | 1.41 | 481.61 | 0.54 | 4.83 | 8.24 | 238.28 | 84.19 | POINT (13.40427 52.29759) |
| 3 | 13b | 391312 | 5795462 | 3.98 | 15 | 825.0 | 7.22 | 36.93 | 18.25 | 21.60 | ... | 0.43 | 34.19 | 1.47 | 454.44 | 0.70 | 4.93 | 17.18 | 261.05 | 94.92 | POINT (13.40611 52.29874) |
| 4 | 14b | 391729 | 5795736 | 4.95 | 18 | 903.0 | 7.05 | 38.33 | 15.50 | 26.44 | ... | 0.50 | 31.00 | 1.42 | 546.69 | 0.88 | 5.31 | 20.03 | 245.53 | 98.52 | POINT (13.41213 52.30128) |

5 rows × 24 columns
Let's perform a sanity check:
lake_data_chem.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
Perfect, we are able to proceed!
Finally, in order to contextualize the spatial data for Lake Rangsdorf, we make use of the `folium` library.
import folium
m = folium.Map(
location=[
np.mean(shoreline_points.geometry.y),
np.mean(shoreline_points.geometry.x),
],
zoom_start=13,
tiles="OpenStreetMap",
)
folium.GeoJson(shoreline, style_function=lambda feature: {"color": "red"}).add_to(m)
for _, row in sonar.iterrows():
    lat = row.geometry.y
    lon = row.geometry.x
    folium.CircleMarker(location=[lat, lon], radius=0.02, color="black").add_to(m)

for _, row in lake_data_chem.iterrows():
    lat = row.geometry.y
    lon = row.geometry.x
    folium.CircleMarker(
        location=[lat, lon], popup=row["ID"], radius=2, color="grey"
    ).add_to(m)
m
Now that we have loaded the data, we can do some interesting computations. Let us compute a bathymetric model by interpolating the measured depths. First we prepare the data. The structure of the `GeoDataFrame` objects makes this easy: we can simply combine the data sets row-wise by calling the `concat()` function from `pandas`.
lake_depth = pd.concat(
[
sonar,
lake_data_chem[["depth_m", "geometry"]],
shoreline_points[["depth_m", "geometry"]],
],
ignore_index=True,
)
lake_depth.head(8)
| | depth_m | geometry |
|---|---|---|
| 0 | 1.62 | POINT (13.41039 52.29152) |
| 1 | 1.57 | POINT (13.41033 52.29156) |
| 2 | 1.57 | POINT (13.41027 52.29162) |
| 3 | 1.77 | POINT (13.41021 52.29165) |
| 4 | 1.59 | POINT (13.41017 52.29169) |
| 5 | 1.59 | POINT (13.41012 52.29172) |
| 6 | 1.64 | POINT (13.41006 52.29175) |
| 7 | 1.64 | POINT (13.41002 52.29179) |
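The row-wise stacking performed by `concat()` can be sketched with two plain `pandas` data frames; `ignore_index=True` rebuilds a fresh 0..n-1 index for the combined frame (the toy values below are illustrative):

```python
import pandas as pd

a = pd.DataFrame({"depth_m": [1.62, 1.57]})
b = pd.DataFrame({"depth_m": [0.0]})

# Stack the frames row-wise and rebuild the index from scratch
combined = pd.concat([a, b], ignore_index=True)
print(combined["depth_m"].tolist())  # [1.62, 1.57, 0.0]
print(list(combined.index))          # [0, 1, 2]
```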
If we want to model the bathymetry in Python, we have to impute missing depth values. If we print the depth values in our data set, we see that exactly one of them is missing, indicated by `NaN`. We want to make sure that we do not have any missing values in our data set, so we fill the `NaN` values using the `df.interpolate()` function.
lake_depth[lake_depth["depth_m"].isna()]
| | depth_m | geometry |
|---|---|---|
| 3231 | NaN | POINT (13.39906 52.28882) |
## first fill na values from chemistry dataset with mean of neighboring values
lake_depth["depth_m"] = lake_depth["depth_m"].interpolate()
lake_depth["depth_m"].isna().sum()
0
Now the missing value is imputed!
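By default, `Series.interpolate()` fills each `NaN` linearly from its neighbours in row order, which is the behaviour we rely on here. A toy example:

```python
import pandas as pd

# A series with a single missing value between two known depths
s = pd.Series([1.0, None, 3.0])
filled = s.interpolate()
print(filled.tolist())  # [1.0, 2.0, 3.0]
```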
The depth values are stored as positive numbers, which is fine, but for the computation of a bathymetric model we want them as negative numbers, indicating the depth below the lake surface.
lake_depth.depth_m = lake_depth.depth_m * -1
Recall that we want to compute a bathymetric model for Lake Rangsdorf. Therefore we need to provide a set of coordinates at which to interpolate the depth. We create a regular grid of points using a for-loop and the `box()` function from `shapely.geometry`. We specify the grid based on the bounding box of the lake polygon (`lake`) and set the grid step size (cell size) to 0.00025 degrees, which corresponds to roughly 20-30 m at this latitude.
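As a quick reminder, `box(minx, miny, maxx, maxy)` constructs an axis-aligned rectangular polygon; a toy cell:

```python
from shapely.geometry import box

# A single grid cell with a side length of 0.25
cell = box(0.0, 0.0, 0.25, 0.25)
print(cell.bounds)  # (0.0, 0.0, 0.25, 0.25)
print(cell.area)    # 0.0625
```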
from shapely.geometry import box
from scipy.interpolate import griddata
# grid boundary
minx, miny, maxx, maxy = lake.total_bounds
# cell size
xdelta = 0.00025  # cell size in degrees (roughly 20-30 m at this latitude)
ydelta = 0.00025
# empty array for saving the grid
grid = np.array([])
# nested for-loop for building the grid
for x in np.arange(minx, maxx, xdelta):  # min, max, step
    for y in np.arange(miny, maxy, ydelta):
        cell = box(x, y, x + xdelta, y + ydelta)
        grid = np.append(grid, cell)
# convert to geopandas
grid_lake = gpd.GeoDataFrame(grid, columns=["geometry"], crs="EPSG:4326")
grid_lake.head()
| | geometry |
|---|---|
| 0 | POLYGON ((13.39244 52.27644, 13.39244 52.27669... |
| 1 | POLYGON ((13.39244 52.27669, 13.39244 52.27694... |
| 2 | POLYGON ((13.39244 52.27694, 13.39244 52.27719... |
| 3 | POLYGON ((13.39244 52.27719, 13.39244 52.27744... |
| 4 | POLYGON ((13.39244 52.27744, 13.39244 52.27769... |
# Clip cells to shapefile boundary
grid_lake_clip = gpd.clip(grid_lake, lake["geometry"].iloc[0])
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.ticklabel_format(useOffset=False)
lake.plot(ax=ax, facecolor="None", edgecolor="r")
grid_lake_clip.plot(ax=ax, facecolor="None", edgecolor="k")
plt.show()
Perfect, we built a grid within the Lake Rangsdorf boundaries! Now we want to add a vector of zeros to the data frame, representing the yet unknown depth.
grid_lake_clip["depth_m"] = [0] * len(grid_lake_clip)
We are ready for the interpolation. We use the `griddata()` function from the `scipy.interpolate` module, which interpolates values at given query points from irregularly spaced input data. We request piecewise cubic interpolation via the `method="cubic"` argument; see the `scipy.interpolate` documentation for the other available interpolation methods.
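The call pattern of `griddata()` — scattered points, values at those points, and query locations — can be sketched on the toy plane z = x + y (using `method="linear"` here, since it behaves predictably with only four input points):

```python
import numpy as np
from scipy.interpolate import griddata

# Four scattered sample points on the plane z = x + y
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([0.0, 1.0, 1.0, 2.0])

# Interpolate at the centre of the square; for a linear surface the
# linear interpolant is exact, so we expect z = 0.5 + 0.5 = 1.0
z = griddata(points, values, np.array([[0.5, 0.5]]), method="linear")
print(z)  # [1.]
```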
# Interpolate the values of observed values
x = lake_depth.geometry.x
y = lake_depth.geometry.y
# get center of each raster box
grid_x = grid_lake_clip.centroid.x
grid_y = grid_lake_clip.centroid.y
# Arrange variables in griddata input format
points = (x, y)
values = lake_depth["depth_m"]
depth_inter = griddata(points, values, (grid_x, grid_y), method="cubic")
# Save interpolated data values into the geodataframe
grid_lake_clip["depth_m"] = depth_inter.round(3)
C:\Users\mceck\AppData\Local\Temp\ipykernel_2352\3987311070.py:6: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. grid_x = grid_lake_clip.centroid.x C:\Users\mceck\AppData\Local\Temp\ipykernel_2352\3987311070.py:7: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. grid_y = grid_lake_clip.centroid.y
Done!
The `griddata()` function returned an array. We may simply overwrite the `depth_m` column of the `grid_lake_clip` data frame and apply the `plot()` function for `GeoDataFrames`. Please note that the cubic interpolation produced some values > 0, which makes no sense for a lake bottom below the water surface. Therefore, we mask these values as `NA`.
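Such boolean masking works the same way on any `pandas` series: a condition selects the offending entries, which are then overwritten with `NaN`. A toy example:

```python
import numpy as np
import pandas as pd

# One physically impossible positive depth among the negative values
depths = pd.Series([-2.0, -0.5, 0.3])
depths[depths > 0] = np.nan
print(depths.tolist())  # [-2.0, -0.5, nan]
```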
## mask values >0
grid_lake_clip["depth_m"].values[grid_lake_clip["depth_m"].values > 0] = np.nan
# Plot
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.ticklabel_format(useOffset=False)
lake.plot(ax=ax, facecolor="none", edgecolor="black")
grid_lake_clip.plot(
ax=ax,
column="depth_m",
edgecolor="none",
cmap="Reds_r",
legend=True,
legend_kwds={"label": "interpolated lake depth [m]", "orientation": "vertical"},
)
plt.show()
Nice plot! But we can even plot the interpolated data in 3D, using the `plot_trisurf()` function from `matplotlib`. For this we import the `mpl_toolkits.mplot3d` module. Please note that `mpl_toolkits.mplot3d` is a very versatile module, providing a lot of different optional arguments.
from mpl_toolkits.mplot3d import axes3d
x, y, z = grid_lake_clip.centroid.x, grid_lake_clip.centroid.y, grid_lake_clip.depth_m
C:\Users\mceck\AppData\Local\Temp\ipykernel_2352\4124280058.py:3: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. x, y, z = grid_lake_clip.centroid.x, grid_lake_clip.centroid.y, grid_lake_clip.depth_m C:\Users\mceck\AppData\Local\Temp\ipykernel_2352\4124280058.py:3: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation. x, y, z = grid_lake_clip.centroid.x, grid_lake_clip.centroid.y, grid_lake_clip.depth_m
# Plot X,Y,Z
# Creating color map
my_cmap = plt.get_cmap("viridis")
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection="3d")
depth = ax.plot_trisurf(x, y, z, cmap=my_cmap, antialiased=True, alpha=0.8)
fig.colorbar(depth, ax=ax, shrink=0.5, aspect=8, label = "interpolated lake depth [m]")
ax.set_title("Bathymetric model Lake Rangsdorf")
ax.view_init(50, -50)
plt.show()
lake_rangsdorf_area = round(lake.to_crs(32633).area[0])
print(f"{lake_rangsdorf_area} m^2")
2448172 m^2
Based on the bathymetric model we computed in the previous section, we may as well calculate the mean and maximum depth.
grid_lake_clip.depth_m.mean() * -1
1.4172447512767163
grid_lake_clip.depth_m.min() * -1
9.746
In addition, we may look at the histogram of depth values by applying the `hist()` function.
fig = plt.figure(figsize=(8, 4))
plt.hist(grid_lake_clip.depth_m, density=False, bins=30)
plt.vlines(
grid_lake_clip.depth_m.mean(),
0,
1000,
colors="blue",
linestyles="solid",
label="mean depth",
)
plt.vlines(
grid_lake_clip.depth_m.min(),
0,
1000,
colors="red",
linestyles="solid",
label="max depth",
)
plt.xlabel("depth [m]")
plt.ylabel("counts")
plt.ylim(0, 1000)
plt.title("Depth Histogram Lake Rangsdorf")
plt.legend(loc="upper center")
plt.show()
For later usage we write the several objects to disk. You can write a `.geojson` file using `to_file()` on the `GeoDataFrames`. Remember to set `driver='GeoJSON'` for geospatially referenced data. The command will convert the data frames to a JSON representation.
shoreline.to_file("../data/shoreline.geojson", driver="GeoJSON")
lake.to_file("../data/lake.geojson", driver="GeoJSON")
lake_data_chem.to_file("../data/lake_data_chem.geojson", driver="GeoJSON")
grid_lake_clip.to_file("../data/grid_lake_clip.geojson", driver="GeoJSON")
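A GeoJSON file is plain JSON, so the result can be verified with the standard library. A minimal round-trip sketch (a throw-away one-point `GeoDataFrame` written to a temporary directory, with illustrative values):

```python
import json
import os
import tempfile

import geopandas as gpd
from shapely.geometry import Point

# A throw-away GeoDataFrame with a single shoreline-style point
gdf = gpd.GeoDataFrame(
    {"depth_m": [0.0]}, geometry=[Point(13.41, 52.29)], crs="EPSG:4326"
)

path = os.path.join(tempfile.mkdtemp(), "demo.geojson")
gdf.to_file(path, driver="GeoJSON")

# Read it back as plain JSON: a GeoJSON file is just a FeatureCollection
with open(path) as f:
    data = json.load(f)
print(data["type"])  # FeatureCollection
```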
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.