In this section we will learn how to read and process raster images and multiband raster images. We will process Landsat 8 imagery of the Berlin region downloaded from the U.S. Geological Survey (USGS) EarthExplorer on October, 14 2022. The observation data of the Landsat 8 scene we will work with is the August, 3 2022. You may easily access the data here.
# First, let's import the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import rasterio
from glob import glob
A tutorial on how to use the USGS EarthExplorer Website is given here. Note that downloading data products from the USGS might take some time and requires and account!
Once we downloaded the .tar
folder containing our desired scene from the USGS EartExplorer we will untar it with the help of python. For the sake of this tutorial, you may download the Landsat scene here and put it in your respective directory.
import os
import tarfile
# create output folder
newFolder = "../data/LandsatScene_Berlin"
# check if folder exist already
if os.path.isdir(newFolder) == False:
os.mkdir(newFolder)
# extract files
tar = tarfile.open("../data/LC08_L1TP_193023_20220803_20220806_02_T1.tar", "r:tar")
tar.extractall(newFolder)
tar.close()
You may want to check for he file size prior to loading the file. Processing large raster images can result in very long computing times.
filepath = "../data/LandsatScene_Berlin/LC08_L1TP_193023_20220803_20220806_02_T1_B4.TIF"
print(f"The file size is {os.path.getsize(filepath)/1e6} MB")
The file size is 94.067522 MB
When working with rasterdata of huge file sizes it can be very helpful to reduce the resolution for lower computation times. We will now plot a low-resolution overview of the Landsat 8 scene. Remember to set the no-data value to NaN for an improved colormap, since pixels outside of the scene are automatically set to "0". If you want to learn about this method, click here.
# acces raster as grid values as a numpy array and plot:
# https://geohackweek.github.io/raster/04-workingwithrasters/
with rasterio.open(filepath) as L8_src:
oviews = L8_src.overviews(1) # list of overviews from biggest to smallest
oview = oviews[-1] # smallest thumbnail
print("Decimation factor= {}".format(oview))
# NOTE this is using a 'decimated read' (http://rasterio.readthedocs.io/en/latest/topics/resampling.html)
low_res = L8_src.read(
1, out_shape=(1, int(L8_src.height // oview), int(L8_src.width // oview))
)
plt.imshow(low_res, cmap="terrain")
plt.colorbar(label="reflectance * 1000")
plt.title("L8 2022-08-03 - Band 4")
plt.show()
Decimation factor= 64
EarthPy
library is makes it very easy to work with spatial raster and vector data using open source tools. Especially when it comes to working with remotely sensed imagery, e.g. multiband raster images it is a nice addition to the rasterio
package. EarthPy
is also based on GeoPandas
for the vector data while rasterio
for the raster data. It also requires matplotlib
for plotting operations. A nice tutorial about the earthpy package is given in the documentation.
Another way to reduce the size of the raster data is to crop it to the area of interest. Here, we are only interested in the Berlin region. We will load a shape file covering Berlin, convert to the right crs
, get the total_bounds
of the Berlin shapefile and cut the window of the L8_band4
raster to size with the help of the EarthPy
package.
Note: It is crucial to convert all geospatial data to the same coordinate reference system (
crs
)!
# Retrieve Federal States
import zipfile, requests, io
url = "https://daten.gdz.bkg.bund.de/produkte/vg/vg5000_0101/aktuell/vg5000_01-01.utm32s.shape.ebenen.zip"
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall(path="../data")
germany_admin = gpd.read_file(
"../data/vg5000_01-01.utm32s.shape.ebenen/vg5000_ebenen_0101/VG5000_LAN.shp"
)
# load shapefile of Berlin
berlin = germany_admin[germany_admin["GEN"] == "Berlin"]
berlin = berlin.to_crs("epsg:4326")
from rasterio.plot import plotting_extent
# open L8 band 4
with rasterio.open(filepath) as src:
L8 = src.read(masked=True)[0]
extent = rasterio.plot.plotting_extent(src)
soap_profile = src.profile
# check for crs of landsat 8 image
setup_crs = src.crs
setup_crs
# convert to Landsat crs
berlin = berlin.to_crs(setup_crs)
# get total_bounds of berlin shapefile
xmin, ymin, xmax, ymax = berlin.total_bounds
xmin, ymin, xmax, ymax
(369991.8490680457, 5799545.725492317, 415748.2566591516, 5837135.521504402)
import earthpy as et
import earthpy.plot as ep
import earthpy.spatial as es
fig, ax = plt.subplots(figsize=(5, 5))
ep.plot_bands(
L8,
cmap="Greys",
vmin=0,
vmax=20000,
extent=extent,
ax=ax,
cbar=True,
title="L8 2022-08-03 - Band 4",
)
berlin.plot(ax=ax, alpha=0.9, color="red",)
<Axes: title={'center': 'L8 2022-08-03 - Band 4'}>
We use the crop_image
function from earthpy.spatial
to crop the raster layer to the total bounds of a shapefile. If you want to learn more about this method, you may click here.
with rasterio.open(filepath) as L8:
L8_crop, L8_meta = es.crop_image(L8, berlin)
# Update the metadata to have the new shape (x and y and affine information)
L8_meta.update(
{
"driver": "GTiff",
"height": L8_crop.shape[0],
"width": L8_crop.shape[1],
"transform": L8_meta["transform"],
}
)
berlin_ext = [xmin, ymin, xmax, ymax]
bound_order = [0, 2, 1, 3]
berlin_ext = [berlin_ext[b] for b in bound_order]
berlin_ext, berlin.total_bounds
([369991.8490680457, 415748.2566591516, 5799545.725492317, 5837135.521504402], array([ 369991.84906805, 5799545.72549232, 415748.25665915, 5837135.5215044 ]))
# mask the nodata and plot the newly cropped raster layer
L8_crop_ma = np.ma.masked_equal(L8_crop[0], -9999.0)
fig, ax = plt.subplots(figsize=(8, 8))
ep.plot_bands(
L8_crop_ma,
title="Landsat 8 - Band 4 \n Berlin August, 3 2022",
cmap="Greys",
vmin=0,
vmax=20000,
cbar=True,
ax=ax,
)
plt.show()