In this section we will learn how to process multiband raster images and calculate vegetation indices. We will process Landsat 8 imagery of the Berlin region downloaded from the U.S. Geological Survey (USGS) EarthExplorer on October, 14 2022. The observation data of the Landsat 8 scene we will work with is the August, 3 2022. You may easily access the data here.
# First, let's import the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import os
import rasterio
import earthpy as et
import earthpy.plot as ep
import earthpy.spatial as es
from glob import glob
from rasterio.plot import plotting_extent
Load the required shapefile data, convert to desired crs
("epsg:32633"
).
# Retrieve Federal States
import zipfile, requests, io
url = "https://daten.gdz.bkg.bund.de/produkte/vg/vg5000_0101/aktuell/vg5000_01-01.utm32s.shape.ebenen.zip"
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall(path="../data")
germany_admin = gpd.read_file(
"../data/vg5000_01-01.utm32s.shape.ebenen/vg5000_ebenen_0101/VG5000_LAN.shp"
)
# load shapefile of Berlin
berlin = germany_admin[germany_admin["GEN"] == "Berlin"]
berlin = berlin.to_crs("epsg:32633")
As we saw in the previous section plotting a color composite might be a good qualitative indication for spectral properties. Charateristics of the area of interest a highlighted, e.g. vegetation, urban areas or agriculture.
# Get list of bands and sort by ascending band number (1-7)
landsat_bands_data_path = glob(
"../data/LandsatScene_Berlin/LC08_L1TP_193023_20220803_20220806_02_T1_B*[2-7].TIF"
)
# Create output directory and the output path
output_dir = "../data/LandsatScene_cropped"
# if path doesnt exist, built direcotry
if os.path.isdir(output_dir) == False:
os.mkdir(output_dir)
# crop to boundary of Berlin
band_paths_list = es.crop_all(
landsat_bands_data_path, output_dir, berlin, overwrite=True
)
# stack all cropped images
landsat_cropped_array, landsat_raster_profile = es.stack(band_paths_list, nodata=-9999)
crop_extent = plotting_extent(
landsat_cropped_array[0], landsat_raster_profile["transform"]
)
### Plot ####
fig, ax = plt.subplots(figsize=(12, 6))
berlin.boundary.plot(ax=ax, color="black", zorder=10)
ep.plot_rgb(
landsat_cropped_array,
rgb=(3, 2, 1),
ax=ax,
stretch=True,
extent=crop_extent,
title="RGB Infrared composite and Boundary Berlin \n August, 3 2022",
)
plt.show()
C:\Users\mceck\miniconda3\envs\rasterdata\lib\site-packages\numpy\lib\function_base.py:4691: UserWarning: Warning: 'partition' will ignore the 'mask' of the MaskedArray. arr.partition(
Red areas indicate a higher amounts of vegetation.
A useful application for the spectral bands is the calculation of Vegetation Indices (VI), since they provide a somewhat quantitative measure of vegetation density and productativity. There are many different indices varying in complexity, accuracy and range of application. In this section, we will calculate the NDVI, to show how to perform raster calculation of multiband raster datasets.
The NDVI stands for Normalized Difference Vegetation Index and is an indicator of the greenness of the biomes. It is used as a rough indicator of vegetation, since it is associated with vegetation density and productivity (Tucker& Sellers 1986).
$$NDVI = \frac{NIR - VIS_{red}}{NIR + VIS_{red}} $$where:
$NIR$ = spectral reflectances measured in the near infrared
$VIS_{red}$ = spectral reflectances measured red waveband
The values of the NDVI range between 0 and 1. See the table below for a NDVI classification after Rutkay et al. 2020.
NDVI range | classification |
---|---|
>0 | water |
0 - 0.03 | bare soil |
0.03 - 0.3 | sparse vegetation |
0.03 - 0.5 | moderate vegetation |
<0.5 | dense vegetation |
Exercise: Judging by the RGB Infrared composite we generated above, where would we expect high and low values for the NDVI?
Calculation the NDVI in Python
We can easily calculate NDVI for our raster stack using the normalized_diff()
function from the earthpy.spatial
. The first argument is the nIR band and the second the red band. See a tutorial on the function here.
Exercise: What are the band indices for the niR and the red band in this case?
### Your code here ...
landsat_bands_data_path
# nIR is band 5 at [3]
# red is band 4 at [2]
['../data/LandsatScene_Berlin\\LC08_L1TP_193023_20220803_20220806_02_T1_B2.TIF', '../data/LandsatScene_Berlin\\LC08_L1TP_193023_20220803_20220806_02_T1_B3.TIF', '../data/LandsatScene_Berlin\\LC08_L1TP_193023_20220803_20220806_02_T1_B4.TIF', '../data/LandsatScene_Berlin\\LC08_L1TP_193023_20220803_20220806_02_T1_B5.TIF', '../data/LandsatScene_Berlin\\LC08_L1TP_193023_20220803_20220806_02_T1_B6.TIF', '../data/LandsatScene_Berlin\\LC08_L1TP_193023_20220803_20220806_02_T1_B7.TIF']
ndvi = es.normalized_diff(landsat_cropped_array[3], landsat_cropped_array[2])
## plot the result##
ep.plot_bands(
ndvi,
cmap="RdYlGn",
cols=1,
title="Landsat 8 - NDVI August, 3 2022",
vmin=-0.1,
vmax=1,
)
plt.show()
The NDVI was not able to detect water areas as no vegetation, resulting in artefacts. But the NDVI worked really well for detection urban areas. Even better would be a discrete color coding instead of a colorbar, to be able to further interpret the results.
Classify NDVI
We will classify the NDVI results into useful classes. Values under 0 will be classified together as no vegetation. Additional classes will be created for bare soil and sparse, moderate, and dense vegetation (see table above). See a tutorial on the method here.
First, we will define our discrete bins.
# Create classes and apply to NDVI results
ndvi_class_bins = [0, 0.03, 0.3, 0.5, np.inf]
ndvi_landsat_class = np.digitize(ndvi, ndvi_class_bins)
# Apply the nodata mask to the newly classified NDVI data
ndvi_landsat_class = np.ma.masked_where(np.ma.getmask(ndvi), ndvi_landsat_class)
np.unique(ndvi_landsat_class)
masked_array(data=[1, 2, 3, 4], mask=False, fill_value=999999, dtype=int64)
Now, we will define the classes and the colors for plotting.
from matplotlib.colors import ListedColormap
# Define color map
nbr_colors = ["gray", "yellow", "yellowgreen", "darkgreen"]
nbr_cmap = ListedColormap(nbr_colors)
# Define class names
ndvi_names = [
"Bare Soil",
"Sparse Vegetation",
"Moderate Vegetation",
"Dense Vegetation",
]
# Get list of classes
classes = np.unique(ndvi_landsat_class)
classes = classes.tolist()
# The mask returns a value of none in the classes. remove that
classes = classes[0:5]
# Plot your data
fig, ax = plt.subplots(figsize=(12, 12))
image = ax.imshow(ndvi_landsat_class, cmap=nbr_cmap)
ep.draw_legend(im_ax=image, classes=classes, titles=ndvi_names)
ax.set_title(
"Landsat 8 - (NDVI) Classes, August, 3 2022",
fontsize=14,
)
ax.set_axis_off()
plt.tight_layout()
plt.show()
Does the classified image meet your expectations? We can see that artifacts still exist, but the classification worked quiet well. But we can clearly see that our algorithm was not able to detect water surfaces. A more detailed classification using more precisely divided bins would yield in more sophisticated results. However, with the NDVI we can only very roughly determine the quality of the vegetation cover. As we could see, his method does not work if we want to classify other land covers.
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.