Raster data is data represented as a grid of values. Each pixel (or cell) represents an area on the Earth’s surface and is thus georeferenced. The value of a pixel can be continuous (e.g. temperature, elevation) or categorical (e.g. land cover, land use).
Because raster datasets are georeferenced, each cell in the raster corresponds to a specific area on Earth. The georeferencing information is stored in the metadata, which describes the spatial extent and the resolution. The spatial extent is the geographic area covered by the dataset. It is usually defined by a so-called bounding box, which stores the minimum and maximum of the x and y coordinates. The resolution of a raster is the area on the Earth that each pixel of the raster covers. Other attributes included in the metadata are the datum, the projection, and additional parameters required to position the raster in geographic space.
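To make the relationship between extent, resolution, and grid size concrete, here is a minimal sketch in plain Python. The bounding box values, the 10 m resolution, and the helper names (BoundingBox, grid_shape, pixel_center) are made up for illustration, and the sketch assumes a north-up raster with square pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Spatial extent: minimum and maximum x and y coordinates."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def grid_shape(bbox, resolution):
    """Return (rows, cols) of a raster covering bbox at the given pixel size."""
    cols = round((bbox.xmax - bbox.xmin) / resolution)
    rows = round((bbox.ymax - bbox.ymin) / resolution)
    return rows, cols


def pixel_center(bbox, resolution, row, col):
    """Return the (x, y) coordinate of the center of pixel (row, col).

    Row 0 is the northernmost row, as is usual for north-up rasters.
    """
    x = bbox.xmin + (col + 0.5) * resolution
    y = bbox.ymax - (row + 0.5) * resolution
    return x, y


# Hypothetical extent in a projected coordinate system (units: metres)
bbox = BoundingBox(xmin=500000.0, ymin=5800000.0, xmax=500100.0, ymax=5800050.0)
resolution = 10.0  # each pixel covers 10 m x 10 m

rows, cols = grid_shape(bbox, resolution)
print(rows, cols)                            # 5 10
print(pixel_center(bbox, resolution, 0, 0))  # (500005.0, 5800045.0)
```

The bounding box and the resolution together fix the number of rows and columns, and any pixel index can be converted to a map coordinate from these two pieces of metadata alone.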
There are many different file formats for working with raster data. Here, we will use the GeoTIFF format, which has the file extension .tif. A GeoTIFF is a standard .tif image format with additional spatial (georeferencing) information embedded in the file as tags within the metadata. This information includes, for example, the spatial extent, the resolution, and the coordinate reference system.
There are several dedicated Python packages for spatial raster analysis, some of which are:
GDAL and Rasterio: reading, writing and working with raster datasets
NumPy: fundamental for scientific computing, as it provides array (and therefore raster) calculations
EarthPy: helpful when working with multiband rasters
elevation and richDEM: useful when working with digital elevation models (DEMs)
rasterstats: summarizes geospatial raster datasets based on vector geometries
Which package to use depends mainly on your desired output and your skill level. GDAL is a powerful library for reading, writing and warping raster datasets. However, because it is written in C++ with bindings to other languages, its interface might not be as comfortable to use. If you are comfortable with the terminal, GDAL might be a good option. Many of the geospatial libraries available in Python are built on GDAL. One of them is rasterio, which provides a more Pythonic interface and thus might be more comfortable to use. This is why you will be given an introduction to rasterio in the upcoming chapters.
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by email at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.