Berlin is famous worldwide for its nightlife. The data set we will work with stores the geographical locations of clubs and bars in Berlin. The data come from OpenStreetMap (OSM): they were downloaded from GEOFABRIK on August 15, 2022, and contain OpenStreetMap data as of August 14, 2022 (see here).
We download the data and read the osm_pois_p.shp file, which contains the points of interest in Berlin, using the read_file() function from the GeoPandas package.
# First, let's import the needed libraries.
import matplotlib.pyplot as plt
import numpy as np
import random
import pandas as pd
import geopandas as gpd
import folium
GeoPandas package
GeoPandas provides two main data structures, namely GeoSeries and GeoDataFrame, which extend pandas.Series and pandas.DataFrame, respectively. A GeoSeries is a vector in which each entry is a set of shapes corresponding to one observation. A GeoDataFrame is a tabular data structure that holds the spatial information in a GeoSeries column and additional attributes in the other columns. This spatial column is often called geometry and can be accessed through the geometry attribute (gdf.geometry).
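As a minimal sketch of these two structures, the following builds a small GeoDataFrame by hand. The three points are copied from the first rows of the data set used in this section; the variable names (`points`, `toy`) and the choice of EPSG:4326 as the coordinate reference system are assumptions made for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point

# Shapely uses (x, y) order, i.e. (longitude, latitude).
points = gpd.GeoSeries(
    [
        Point(13.34544, 52.54644),
        Point(13.32282, 52.50691),
        Point(13.32214, 52.50645),
    ],
    crs="EPSG:4326",  # assumed CRS (WGS 84)
)

# A GeoDataFrame is an ordinary table plus one GeoSeries column.
toy = gpd.GeoDataFrame(
    {"fclass": ["camera_surveillance", "restaurant", "telephone"]},
    geometry=points,
)

print(type(toy.geometry).__name__)  # GeoSeries
print(toy.geom_type.unique())       # geometry type(s) present in the column
print(toy.geometry.x.tolist())      # longitudes of the three points
```

The non-spatial columns behave like any pandas column, while the geometry column adds spatial accessors such as `x`, `y`, and `geom_type`.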
The package provides three basic classes of geometric objects: points/multi-points, lines/multi-lines, and polygons/multi-polygons.
See the documentation of the GeoPandas package here.
Reading Files to GeoPandas
GeoPandas can read almost any vector-based spatial data format (e.g. ESRI shapefile, GeoJSON files, ...). For this purpose, we use geopandas.read_file(). Here, we read a shapefile stored as a .zip file into a GeoDataFrame as follows.
berlin_features = gpd.read_file("C:/Users/mceck/soga/Soga-Py/300/data/osm_pois_p.zip")
berlin_features.head()
| | osm_id | code | fclass | name | geometry |
|---|---|---|---|---|---|
| 0 | 16541597 | 2907 | camera_surveillance | Aral | POINT (13.34544 52.54644) |
| 1 | 26735749 | 2301 | restaurant | Aida | POINT (13.32282 52.50691) |
| 2 | 26735753 | 2006 | telephone | None | POINT (13.32214 52.50645) |
| 3 | 26735759 | 2301 | restaurant | Madame Ngo | POINT (13.31808 52.50621) |
| 4 | 26735763 | 2301 | restaurant | Thanh Long | POINT (13.32078 52.50732) |
berlin_features.dtypes
osm_id        object
code           int64
fclass        object
name          object
geometry    geometry
dtype: object
There are 82491 records (denoted as features), represented as rows, and 5 attributes (denoted as fields), represented as columns.
The column names show that the category of each point is stored in the fclass column. Note that berlin_features also has a geometry column, which holds the point geometries. Even if that column were not named geometry, GeoPandas would still return it through the .geometry attribute, which always refers to the active geometry column.
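To illustrate that the .geometry attribute does not depend on the column name, here is a minimal sketch on a two-row toy GeoDataFrame. The coordinates are taken from the rows shown earlier; the column name `location` is a hypothetical choice made only for this example.

```python
import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(
    {"name": ["Aida", "Madame Ngo"]},
    geometry=[Point(13.32282, 52.50691), Point(13.31808, 52.50621)],
)

# Rename the active geometry column; "location" is a hypothetical name.
renamed = gdf.rename_geometry("location")

print(list(renamed.columns))  # ['name', 'location']
print(renamed.geometry.name)  # location
```

`rename_geometry()` returns a new GeoDataFrame by default, so `gdf` itself keeps its original column name.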
berlin_features.geometry
0        POINT (13.34544 52.54644)
1        POINT (13.32282 52.50691)
2        POINT (13.32214 52.50645)
3        POINT (13.31808 52.50621)
4        POINT (13.32078 52.50732)
                   ...
82486    POINT (13.29562 52.43827)
82487    POINT (13.29558 52.43829)
82488    POINT (13.31360 52.47638)
82489    POINT (13.63331 52.52282)
82490    POINT (13.49256 52.54817)
Name: geometry, Length: 82491, dtype: geometry
berlin_features.columns
Index(['osm_id', 'code', 'fclass', 'name', 'geometry'], dtype='object')
By applying the set() function, we get an overview of the different categories represented in the data set.
set(berlin_features["fclass"]) ## get unique values
{'archaeological', 'arts_centre', 'artwork', 'atm', 'attraction', 'bakery', 'bank', 'bar', 'battlefield', 'beauty_shop', 'bench', 'beverages', 'bicycle_rental', 'bicycle_shop', 'biergarten', 'bookshop', 'butcher', 'cafe', 'camera_surveillance', 'camp_site', 'car_dealership', 'car_rental', 'car_sharing', 'car_wash', 'caravan_site', 'chalet', 'chemist', 'cinema', 'clinic', 'clothes', 'college', 'comms_tower', 'community_centre', 'computer_shop', 'convenience', 'courthouse', 'dentist', 'department_store', 'doctors', 'dog_park', 'doityourself', 'drinking_water', 'embassy', 'fast_food', 'fire_station', 'florist', 'food_court', 'fountain', 'furniture_shop', 'garden_centre', 'general', 'gift_shop', 'golf_course', 'graveyard', 'greengrocer', 'guesthouse', 'hairdresser', 'hospital', 'hostel', 'hotel', 'hunting_stand', 'jeweller', 'kindergarten', 'kiosk', 'laundry', 'library', 'mall', 'market_place', 'memorial', 'mobile_phone_shop', 'monument', 'motel', 'museum', 'newsagent', 'nightclub', 'nursing_home', 'observation_tower', 'optician', 'outdoor_shop', 'park', 'pharmacy', 'picnic_site', 'pitch', 'playground', 'police', 'post_box', 'post_office', 'prison', 'pub', 'recycling', 'recycling_clothes', 'recycling_glass', 'recycling_paper', 'restaurant', 'ruins', 'school', 'shelter', 'shoe_shop', 'sports_centre', 'sports_shop', 'stadium', 'stationery', 'supermarket', 'swimming_pool', 'telephone', 'theatre', 'theme_park', 'toilet', 'tourist_info', 'tower', 'town_hall', 'toy_shop', 'track', 'travel_agent', 'university', 'vending_any', 'vending_cigarette', 'vending_machine', 'vending_parking', 'veterinary', 'video_shop', 'viewpoint', 'waste_basket', 'water_mill', 'water_tower', 'water_well', 'water_works', 'wayside_cross', 'wayside_shrine', 'zoo'}
Now we subset our data set to include only the categories nightclub and bar, combining the two conditions with the | (element-wise OR) operator.
berlin_locations = berlin_features[
    (berlin_features["fclass"] == "nightclub") | (berlin_features["fclass"] == "bar")
]
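When more than two categories are involved, chaining conditions with | becomes unwieldy. The sketch below shows the same selection with pandas' `isin()` on a toy stand-in for berlin_features; the rows and the names `features` and `wanted` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for berlin_features; the rows are made up for illustration.
features = pd.DataFrame(
    {"fclass": ["bar", "restaurant", "nightclub", "bar", "pub"]}
)

wanted = ["nightclub", "bar"]

# Selection with two conditions joined by the element-wise OR operator.
with_or = features[
    (features["fclass"] == "nightclub") | (features["fclass"] == "bar")
]

# Equivalent selection with a membership test.
with_isin = features[features["fclass"].isin(wanted)]

print(with_isin["fclass"].tolist())  # ['bar', 'nightclub', 'bar']
print(with_or.equals(with_isin))     # True
```

Note that the parentheses around each comparison in the | version are required, because | binds more tightly than == in Python.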
We plot the relative frequency of the categories in our data set by combining value_counts(normalize=True) with the pandas plot() method.
plt.figure(figsize=(8, 6))
berlin_locations["fclass"].value_counts(normalize=True).plot(
    kind="bar"
)  ## plot relative frequency
plt.xticks(rotation=0)
(array([0, 1]), [Text(0, 0, 'bar'), Text(1, 0, 'nightclub')])
As expected, the data set contains considerably more locations tagged as bar than as nightclub. To get the absolute counts, we call value_counts() again, this time without normalize=True.
berlin_locations["fclass"].value_counts()
bar          806
nightclub    131
Name: fclass, dtype: int64
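The absolute and relative frequencies can also be shown side by side. The following sketch starts from the two counts reported above rather than from the full data set; the names `counts` and `summary` are hypothetical.

```python
import pandas as pd

# Absolute counts as reported by value_counts() above.
counts = pd.Series({"bar": 806, "nightclub": 131}, name="count")

# Relative frequency = count / total number of selected locations.
summary = pd.DataFrame(
    {
        "count": counts,
        "share": (counts / counts.sum()).round(3),
    }
)

print(counts.sum())  # 937
print(summary)
```

With 937 selected locations in total, bars account for roughly 86 % and nightclubs for roughly 14 %.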
To visualize the data, we plot it on an interactive map using the folium package.
# Base map centred on Berlin
berlinmap = folium.Map([52.5464450, 13.34544], zoom_start=10, tiles="cartodbpositron")
# One feature group per category, so that each layer can be toggled separately
group0 = folium.FeatureGroup(name='<span style="color: red;">nightclub</span>')
group1 = folium.FeatureGroup(name='<span style="color: darkblue;">bar</span>')
for _, row in berlin_locations.iterrows():
    if row["fclass"] == "nightclub":
        folium.CircleMarker(
            [row.geometry.y, row.geometry.x],  # folium expects [latitude, longitude]
            popup=row["name"],
            radius=3,
            color="red",
        ).add_to(group0)
    elif row["fclass"] == "bar":
        folium.CircleMarker(
            [row.geometry.y, row.geometry.x],
            popup=row["name"],
            radius=3,
            color="darkblue",
        ).add_to(group1)
group0.add_to(berlinmap)
group1.add_to(berlinmap)
# Layer control to switch the two groups on and off
folium.map.LayerControl("topright", collapsed=False).add_to(berlinmap)
berlinmap