Data structures

Now that we know the built-in data types in Python, this chapter introduces the most important data structures. In contrast to the basic data types, data structures are arrangements and combinations of values of the built-in data types. They are used to store more complex data and are therefore essential for data science and statistics. We discuss lists, tuples and dictionaries. Indexing and slicing, as described in the previous chapter, carry over to lists and tuples, while dictionaries are indexed by key.

Note: Per definition, every object is also a data structure.

Note: It is helpful to imagine data structures as a collection of things.

Lists

Lists are probably the handiest and most flexible type of container. A list consists of individual elements of potentially different data types. Lists are declared with square brackets []. Individual elements of a list can be selected using the syntax <list_name>[<index>].

Note: List indexing also starts at 0, so the first element has the index 0.

Let's have a look at an example. We define a list that contains different data types:

In [1]:
a_list = ["blueberry", "strawberry", "pineapple", 1, True]
type(a_list)
Out[1]:
list

To access the first element:

In [2]:
a_list[0]
Out[2]:
'blueberry'

To select the last element of the list, use the index -1:

In [3]:
a_list[-1]
Out[3]:
True

We can compare the data types of these two elements with each other:

In [4]:
type(a_list[0]) == type(a_list[-1])
Out[4]:
False

As you can see, even though all elements are stored in the same list, the data types of the individual elements can differ:

In [5]:
type(a_list[0])
Out[5]:
str
In [6]:
type(a_list[-1])
Out[6]:
bool

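To invert the order of a list, you can use slice notation with a step of -1, written [::-1]. The slice returns a reversed copy and leaves the original list unchanged: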

In [7]:
a_list[::-1]
Out[7]:
[True, 1, 'pineapple', 'strawberry', 'blueberry']
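
The following is a minimal sketch of what the slice does, using a hypothetical list `fruits` that is not part of the session above. The general slice syntax is `[start:stop:step]`; leaving out `start` and `stop` selects the whole list, and the step decides the direction and stride.

```python
# Hypothetical example list, not part of the notebook session above.
fruits = ["blueberry", "strawberry", "pineapple"]

# A step of -1 walks the list from the end to the beginning.
reversed_copy = fruits[::-1]
print(reversed_copy)   # ['pineapple', 'strawberry', 'blueberry']

# The slice created a new list; the original is unchanged.
print(fruits)          # ['blueberry', 'strawberry', 'pineapple']

# A step of 2 keeps every second element, starting with the first.
every_second = fruits[::2]
print(every_second)    # ['blueberry', 'pineapple']
```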

To add an element to a list that is already defined, you do not have to create the list again. The <list>.append(<element>) method adds an element to the end of the list and modifies the list in place:

In [8]:
a_list
Out[8]:
['blueberry', 'strawberry', 'pineapple', 1, True]
In [9]:
len(a_list)
Out[9]:
5
In [10]:
a_list.append("a new thing")
In [11]:
a_list
Out[11]:
['blueberry', 'strawberry', 'pineapple', 1, True, 'a new thing']
In [12]:
len(a_list)
Out[12]:
6

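Sometimes it is useful to read the last element of a list and remove it at the same time. The <list>.pop() method does exactly that: it removes the last element and returns it. In the following cell pop() is called three times, so three elements are removed; the notebook only displays the return value of the last call: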

In [13]:
a_list.pop()
a_list.pop()
a_list.pop()
Out[13]:
1
In [14]:
a_list
Out[14]:
['blueberry', 'strawberry', 'pineapple']
In [15]:
len(a_list)
Out[15]:
3
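
As a small sketch of how the return value of pop() can be used, the example below stores each removed element in a variable. The list `basket` and the variable names are hypothetical. Passing an index, as in `pop(0)`, removes and returns the element at that position instead of the last one.

```python
# Hypothetical example list with the same content as a_list at the start.
basket = ["blueberry", "strawberry", "pineapple", 1, True]

last = basket.pop()          # removes and returns the last element
second_last = basket.pop()   # the list is shorter now, so this returns 1
first = basket.pop(0)        # an index selects which element to remove

print(last, second_last, first)  # True 1 blueberry
print(basket)                    # ['strawberry', 'pineapple']
```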

If all elements of your list can be compared with each other (for example, if they are all strings), you can sort the list in place, in ascending order, with the <list>.sort() method:

In [16]:
a_list.sort()

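Note that the <list>.reverse() method does not sort: it only flips the current order in place. Because a_list was sorted in ascending order in the previous cell, reversing it yields a descending order (calling <list>.sort(reverse=True) would do both in one step):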

In [17]:
a_list.reverse()
In [18]:
a_list
Out[18]:
['strawberry', 'pineapple', 'blueberry']
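
The sketch below contrasts the sorting options, using hypothetical lists `fruits` and `mixed`. It assumes nothing beyond the built-in list methods and the built-in function `sorted()`, which returns a sorted copy instead of changing the list.

```python
# Hypothetical example list, deliberately unsorted.
fruits = ["pineapple", "blueberry", "strawberry"]

fruits.sort()                 # in place, ascending
print(fruits)                 # ['blueberry', 'pineapple', 'strawberry']

fruits.sort(reverse=True)     # in place, descending, in one step
print(fruits)                 # ['strawberry', 'pineapple', 'blueberry']

ascending_copy = sorted(fruits)   # new list; fruits stays as it is
print(ascending_copy)             # ['blueberry', 'pineapple', 'strawberry']

# Elements that cannot be compared with each other cannot be sorted.
mixed = ["blueberry", 1, True]
sort_failed = False
try:
    mixed.sort()
except TypeError:
    sort_failed = True
print(sort_failed)            # True
```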

Note: There are some more methods that are associated with the list object. A full overview is given in the Python documentation.

Tuples

We won't say a whole lot about tuples except to mention that they basically work just like lists, with two major exceptions:

  1. You declare tuples using () instead of []
  2. Once you make a tuple, you can't change what's in it (referred to as immutable)

You'll see tuples come up throughout the Python language, and over time you'll develop a feel for when to use them.

In general, they're often used instead of lists:

  1. to group items when the position in the collection is critical, such as coord = (x, y)
  2. when you want to prevent accidental modification of the items, e.g. shape = (12, 23)
  3. when we need a hashable object (as key in a mapping/dict) (explained later)
In [19]:
a_tuple = (1, 2, 3, 4, 5)
type(a_tuple)
Out[19]:
tuple

If you try to change an element of a tuple, Python raises a TypeError:

In [20]:
a_tuple[2] = 15
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [20], in <cell line: 1>()
----> 1 a_tuple[2] = 15

TypeError: 'tuple' object does not support item assignment
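
Two of the use cases listed above can be sketched as follows. The coordinate values and the station name are made up for illustration. The example unpacks a tuple into separate variables and uses a tuple as a dictionary key, which works because tuples of hashable values are themselves hashable, whereas lists are not.

```python
# Hypothetical coordinate pair; the position in the tuple carries the meaning.
coord = (52.45, 13.29)

# Unpacking assigns the elements to variables by position.
lat, lon = coord
print(lat, lon)   # 52.45 13.29

# A tuple can serve as a dictionary key.
station_names = {coord: "hypothetical station"}
name = station_names[(52.45, 13.29)]
print(name)       # hypothetical station

# A list cannot, because lists are mutable and therefore not hashable.
list_key_failed = False
try:
    bad = {[52.45, 13.29]: "hypothetical station"}
except TypeError:
    list_key_failed = True
print(list_key_failed)   # True
```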

Dictionaries

Finally, we introduce dictionaries. This data structure is widely used in Python because looking up an element by its key is very fast, even in a large collection. Another difference from lists is that dictionaries organize their data as key-value pairs: the integer index used in lists is replaced by a key. To access an element you have to know the associated key:

key $\to$ value

Let's have a look at an example:

In [21]:
my_dict = {"Marry": 22, "Frank": 33}

Note: Dictionaries are defined with curly braces {}. In this example the keys are strings and are therefore written in quotes, but any hashable object (for example a number or a tuple) can serve as a key. Each key is separated from its value by a colon.

To display the content of a dictionary, just type its name:

In [22]:
my_dict
Out[22]:
{'Marry': 22, 'Frank': 33}

To select a specific value of a dictionary, put its key in square brackets, [<key>]:

In [23]:
my_dict["Marry"]
Out[23]:
22
In [24]:
my_dict["Frank"]
Out[24]:
33

To add a new key-value pair to an already defined dictionary, assign a value to the new key:

In [25]:
my_dict["Anne"] = 13
In [26]:
my_dict
Out[26]:
{'Marry': 22, 'Frank': 33, 'Anne': 13}
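
The same assignment syntax also overwrites the value of a key that already exists, since each key occurs only once in a dictionary. The sketch below uses a hypothetical dictionary `ages` with made-up values and additionally shows the `del` statement, which removes a key-value pair.

```python
# Hypothetical dictionary with made-up values.
ages = {"Marry": 22, "Frank": 33}

ages["Anne"] = 13     # new key: the pair is added
ages["Frank"] = 34    # existing key: the old value is overwritten
del ages["Marry"]     # removes the key and its value

print(ages)           # {'Frank': 34, 'Anne': 13}
print(len(ages))      # 2
```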

If you try to access a key that does not exist with the [<key>] syntax, Python raises a KeyError. To return a fallback value of your choice instead, use the <dictionary>.get(<key>, <default>) method:

In [27]:
my_dict.get("Heidi", "Danger no entry found!")
Out[27]:
'Danger no entry found!'
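
A minimal sketch of the different ways to deal with a missing key, again with a hypothetical dictionary `ages`: square brackets raise a KeyError, get() returns the default (or None if no default is given), and the `in` operator tests whether a key exists. Note that get() does not add the missing key to the dictionary.

```python
# Hypothetical dictionary mirroring the example above.
ages = {"Marry": 22, "Frank": 33, "Anne": 13}

# Square brackets raise a KeyError for an unknown key.
missing_key = None
try:
    ages["Heidi"]
except KeyError as error:
    missing_key = error.args[0]
print(missing_key)    # Heidi

# get() returns the default instead of raising an error.
fallback = ages.get("Heidi", "Danger no entry found!")
no_default = ages.get("Heidi")   # without a default, None is returned
print(fallback)       # Danger no entry found!
print(no_default)     # None

# The in operator checks for a key without retrieving the value.
print("Heidi" in ages)   # False
print("Anne" in ages)    # True
```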

To retrieve all keys of a dictionary, use the <dictionary>.keys() method. It returns a view object rather than a list; wrap it in list() if you need an actual list:

In [28]:
my_dict.keys()
Out[28]:
dict_keys(['Marry', 'Frank', 'Anne'])

Accordingly, the <dictionary>.values() method returns a view of all values in a dictionary:

In [29]:
my_dict.values()
Out[29]:
dict_values([22, 33, 13])
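
The sketch below illustrates how these view objects behave, using a hypothetical dictionary `ages` and a made-up additional entry. A view reflects later changes to the dictionary, list() turns a view into an independent list, and the related items() method yields key-value pairs, which is convenient in loops.

```python
# Hypothetical dictionary mirroring the example above.
ages = {"Marry": 22, "Frank": 33, "Anne": 13}

keys_view = ages.keys()
ages["Heidi"] = 41            # made-up entry added after the view was created

# The view reflects the change to the dictionary.
print(list(keys_view))        # ['Marry', 'Frank', 'Anne', 'Heidi']

# list() creates an independent list of the values.
value_list = list(ages.values())
print(value_list)             # [22, 33, 13, 41]

# items() yields (key, value) pairs.
lines = [f"{name} is {age} years old" for name, age in ages.items()]
print(lines[0])               # Marry is 22 years old
```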

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by email at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.