Python Container Data Types You Should Know
I’m sure you already know the basic collection data types in Python, such as list, tuple and dictionary. There are plenty of resources online covering these data structures already. However, have you noticed that there are 6 “high-level” data structure tools in the collections module that is built into Python?
- Named Tuple
- Ordered Dict (Ordered Dictionary)
- Chain Map
- Counter
- Deque (Double-Ended Queue)
- Default Dict
Don’t be scared by their names. I promise that these are things you are already familiar with, and they provide some extremely convenient features out of the box.
Let’s walk through these container data types to see what they are and what they can do. For convenience, all the demonstration code assumes that everything in the collections module has been imported.
from collections import *
1. Named Tuple
A tuple is an important sequence data type in Python. If you have ever used Python, you should know it already. But what is a “Named Tuple”?
Suppose we are developing an application that needs to use coordinates (latitude and longitude), the two decimal numbers that represent a place on the earth, as we usually see on Google Maps. A coordinate can naturally be represented as a tuple as follows.
c1 = (-37.814288, 144.963122)
However, if we’re dealing with coordinates all over the world, sometimes it might not be easy to tell which number is the latitude and which is the longitude. This makes the code harder to read.
Rather than the values only, named tuples assign meaningful names to each position in a tuple and allow for more readable, self-documenting code. They can be used wherever regular tuples are used, and they add the ability to access fields by name instead of position index.
Before using the named tuple, we can define it as follows.
Coordinate = namedtuple('Coordinate', ['latitude', 'longitude'])
Then, we can use the named tuple to define a coordinate.
c1 = Coordinate(-37.814288, 144.963122)
This helps not only readability but also convenience, such as accessing the values by name.
print(f'The latitude is {c1.latitude} and the longitude is {c1.longitude}')
If we want to get the field names, we can simply read its _fields attribute. Note that it is a tuple of names, not a function, so it is not called with parentheses.
c1._fields
You may start to think this somehow overlaps with the class and dictionary. However, this is much simpler and neater than defining a class if you don’t need any class methods. Also, if necessary, you can easily convert a named tuple to a dictionary anytime.
c1._asdict()
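In Python 3.7 and earlier, _asdict() returns an OrderedDict (since Python 3.8 it returns a regular dictionary). Hold on, what is an OrderedDict? It is indeed a dictionary, but a little bit different. Please refer to the next section.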
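To pull this section together, here is a minimal sketch of the named tuple features discussed above. The names c2, lat, lon, field_names and as_dict are only illustrative, and _replace() is an extra method not covered in the text.

```python
from collections import namedtuple

# Same definition as in the text.
Coordinate = namedtuple('Coordinate', ['latitude', 'longitude'])
c1 = Coordinate(-37.814288, 144.963122)

# Access by name and access by position give the same value.
same_value = (c1.latitude == c1[0])

# A named tuple still unpacks like a regular tuple.
lat, lon = c1

# _fields is an attribute holding a tuple of the field names.
field_names = c1._fields

# Named tuples are immutable; _replace() returns a new instance instead.
c2 = c1._replace(latitude=0.0)

# _asdict() gives a mapping from field name to value.
as_dict = c1._asdict()
```

Because c1 is still a tuple, it can be passed to any code that expects a plain (latitude, longitude) pair.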
2. Ordered Dict (Ordered Dictionary)
An ordered dictionary is a sub-class of the dictionary that inherits everything from it. The only difference is that the items in an ordered dictionary are “order sensitive”.
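The _asdict() call from the previous section gives us an ordered dictionary on Python 3.7 and earlier. On newer versions it returns a regular dictionary, so we wrap the result in OrderedDict, which works on both. Let’s keep using this as an example.
od = OrderedDict(c1._asdict())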
Because it inherits everything from a normal dictionary, we can expect it has all the features that a normal dictionary should have, such as accessing value by key.
print(f"The latitude is {od['latitude']} and the longitude is {od['longitude']}")
However, because it is order-sensitive, it has some particular features that a normal dictionary wouldn’t have. For example, we can change the order of the items by calling the move_to_end() function.
od.move_to_end('latitude')
So, the latitude was moved to the end of the ordered dictionary.
Also, we can pop the last item out of the ordered dictionary.
lat = od.popitem()
An ordered dictionary could be very useful in some circumstances. For example, we can use it to memorise the order of the keys that were last inserted.
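As a small sketch of what “order sensitive” means in practice, the snippet below reorders and pops items and then compares two ordered dictionaries. The variable names and the x/y sample data are made up for illustration.

```python
from collections import OrderedDict

od = OrderedDict([('latitude', -37.814288), ('longitude', 144.963122)])

od.move_to_end('latitude')               # latitude becomes the last key
order_after_move = list(od)

od.move_to_end('latitude', last=False)   # move it back to the front
order_restored = list(od)

last_item = od.popitem()                 # pops from the end by default
first_item = od.popitem(last=False)      # pops from the front instead

# Equality between two OrderedDicts takes the order into account,
# while plain dictionaries with the same items compare equal.
a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])
ordered_equal = (a == b)
plain_equal = (dict(a) == dict(b))
```

Note that popitem() returns a (key, value) pair rather than the value alone.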
3. Chain Map
Next, let’s have a look at the Chain Map. A Chain Map is very useful when we want to combine multiple dictionaries into one logical whole without physically merging them, which may consume more resources and forces us to resolve key conflicts when there are duplicated keys.
Suppose we are developing an application that relies on some configurations. We define the system default configurations in the app, while users are allowed to pass some specific settings to override the default ones. Let’s make up an example as follows.
usr_config = {'name': 'Chris', 'language': 'Python'}
sys_config = {'name': 'admin', 'language': 'Shell Script', 'editor': 'vm'}
What would you do? Write a for-loop to update sys_config based on usr_config? What if there are hundreds of items, or there are multiple layers rather than only two of them? It is quite common to have multi-layer configurations such as user-level > application-level > system-level and so on.
Using a Chain Map can solve this problem on the fly.
cm = ChainMap(usr_config, sys_config)
From its printed representation, it looks like the Chain Map simply puts the dictionaries together. In fact, there is some magic behind it. If we convert it into a list, we can see that there are only 3 keys: the 3 unique keys out of the 5 key-value pairs in total.
What if we try to access the value of the “name” key?
Let’s also try the “editor” key.
OK. The magic is that usr_config always overrides the settings in sys_config. However, if the key we are accessing is not defined in usr_config, the one in sys_config will be used. That is exactly what we want.
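The lookup behaviour described here can be sketched as follows. The variable names unique_keys, name and editor are only for illustration.

```python
from collections import ChainMap

usr_config = {'name': 'Chris', 'language': 'Python'}
sys_config = {'name': 'admin', 'language': 'Shell Script', 'editor': 'vm'}
cm = ChainMap(usr_config, sys_config)

# Duplicated keys are reported only once.
unique_keys = sorted(cm)

# 'name' exists in both dictionaries, so the first one (usr_config) wins.
name = cm['name']

# 'editor' exists only in sys_config, so the lookup falls through to it.
editor = cm['editor']

# The underlying dictionaries are referenced, not copied.
shares_usr_config = cm.maps[0] is usr_config
```

Since nothing is copied, later changes to usr_config or sys_config are visible through cm right away.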
What if we want to update the key “editor” in the Chain Map?
It can be seen that usr_config is the one actually updated. This makes sense because it overrides the same item in sys_config. Of course, if we then delete the “editor” key, it will be deleted from usr_config and the default one in sys_config will be used again.
del cm['editor']
cm['editor']
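Here is a minimal sketch of the update and delete behaviour, assuming we set the editor to 'nano' (a made-up value just for this example). Writes and deletions only ever touch the first dictionary in the chain.

```python
from collections import ChainMap

usr_config = {'name': 'Chris', 'language': 'Python'}
sys_config = {'name': 'admin', 'language': 'Shell Script', 'editor': 'vm'}
cm = ChainMap(usr_config, sys_config)

# Assigning through the Chain Map writes into usr_config only.
cm['editor'] = 'nano'
overridden = cm['editor']
system_default = sys_config['editor']    # untouched

# Deleting removes the user-level entry, so the default shows through again.
del cm['editor']
restored = cm['editor']

# A second delete fails: the key no longer exists in the first dictionary,
# and the Chain Map never deletes from the later ones.
try:
    del cm['editor']
    second_delete_failed = False
except KeyError:
    second_delete_failed = True
```

This is why the system defaults stay safe no matter what the user-level layer does.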
See, using a container type in Python correctly can save us a huge amount of time!
4. Counter
The next one is “Counter”. It doesn’t sound like a container type, but it is kind of similar to a dictionary in terms of its presentation. However, it is more like a tool for “counting problems”.
Suppose we have a list with many items in it. Some items are identical and we want to count how many times each of them occurs. The list is as follows.
my_list = ['a', 'b', 'c', 'a', 'c', 'a', 'a', 'a', 'c', 'b', 'c', 'c', 'b', 'b', 'c']
Then, we can use Counter to perform this task very easily.
counter = Counter()
for letter in my_list:
    counter[letter] += 1
It tells us there are 5 “a”, 4 “b” and 6 “c” in the list.
Counter also provides a number of convenient features that are related. For example, we can get the “n” most common ones.
counter.most_common(2)
We can still get all the elements back into a list.
list(counter.elements())
Also, we can define a Counter on the fly.
another_counter = Counter(a=1, b=4, c=3)
When we have two counters, we can even perform operations between them.
counter - another_counter
Finally, if we want to know the total number, we can always sum them up.
sum(counter.values())
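As a compact sketch of the operations above: a Counter can also be built directly from the list, which replaces the explicit loop. The variable names top_two, difference and total are only for illustration.

```python
from collections import Counter

my_list = ['a', 'b', 'c', 'a', 'c', 'a', 'a', 'a',
           'c', 'b', 'c', 'c', 'b', 'b', 'c']

# Passing an iterable counts every element in one step.
counter = Counter(my_list)

# The two most frequent items with their counts, most frequent first.
top_two = counter.most_common(2)

# Subtraction keeps only the positive results, so 'b' (4 - 4 = 0) is dropped.
another_counter = Counter(a=1, b=4, c=3)
difference = counter - another_counter

# Total number of counted items.
total = sum(counter.values())
```

The subtraction result only contains 'a' and 'c', which is handy when you want to know what is left over after removing one bag of items from another.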
Don’t look down on such a small tool in Python. In some circumstances, it can simplify problems to a very large extent. If you are interested in the recipes of this tool, please keep an eye on my updates.
5. Deque (Double-Ended Queue)
If you have a computer science background, you must know many common data structures such as the queue and the stack. The difference is that a queue is FIFO (first in, first out) while a stack is LIFO (last in, first out).
There is another data structure called deque, which is short for double-ended queue. It is implemented in Python and ready to be used out of the box.
Let’s define a deque first.
dq = deque('bcd')
Because it is a “double-ended” queue, we can append either from the left or the right side.
dq.append('e')
dq.appendleft('a')
We can also append multiple elements at one time using the extend() or extendleft() function. Please note the order when we extend to the left: “210” becomes “012”, because the elements are appended one by one on the left side.
dq.extend('fgh')
dq.extendleft('210')
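To see why “210” ends up as “012”, here is a small sketch that tracks the deque after each step and compares extendleft() with a manual loop. The names state and manual are only for illustration.

```python
from collections import deque

dq = deque('bcd')
dq.append('e')          # b c d e
dq.appendleft('a')      # a b c d e
dq.extend('fgh')        # a b c d e f g h
dq.extendleft('210')    # '2' goes in first, then '1', then '0'
state = ''.join(dq)

# extendleft() behaves like calling appendleft() once per element.
manual = deque('abcdefgh')
for ch in '210':
    manual.appendleft(ch)
same_result = (manual == dq)
```

Each new element lands in front of the previous one, so the last character of the input ends up at the far left.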
There are also some very useful operations that are particular to a deque, such as rotation. It moves elements from the right end to the left end or the other way around. Please note that the argument of the rotate() function can be any integer: positive values rotate to the right, negative values rotate to the left, and the default is 1.
dq.rotate()
dq.rotate(-1)
Of course, we can also pop an element off either end of the deque.
dq.pop()
dq.popleft()
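The rotation and pop operations can be sketched like this, starting from the deque built in the previous steps. The variable names are only for illustration.

```python
from collections import deque

dq = deque('012abcdefgh')

dq.rotate()                  # one step to the right: 'h' moves to the front
after_right = ''.join(dq)

dq.rotate(-1)                # one step to the left: back to the start
after_left = ''.join(dq)

dq.rotate(3)                 # three steps to the right
after_three = ''.join(dq)
dq.rotate(-3)                # undo it

right_item = dq.pop()        # removes and returns the rightmost element
left_item = dq.popleft()     # removes and returns the leftmost element
remaining = ''.join(dq)
```

Using append() with popleft() gives queue (FIFO) behaviour, while append() with pop() gives stack (LIFO) behaviour.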
6. Default Dict
Finally, the default dictionary is a bit difficult to understand from its name. However, that doesn’t prevent it from being a useful tool, once you really understand it :)
A default dictionary is a sub-class of the dictionary. The “default” does not refer to a fixed default value but to a “default factory”: a callable that is invoked to create the value whenever a missing key is accessed. Most importantly, the default dictionary is handy for collecting objects (of the type the factory produces) under some common keys.
Don’t be confused. Let me show you an example. Suppose we have a list of names as follows.
my_list = ['Alice', 'Bob', 'Chris', 'Bill', 'Ashley', 'Anna']
What we want to do is to collect the names with the same starting letter together in a list. For example, ['Alice', 'Ashley', 'Anna'] should be one of the lists because they all start with “A”.
In this case, we want the value to be “list”. So, the default factory will be “list”.
dd = defaultdict(list)
for name in my_list:
    dd[name[0]].append(name)
dd.items()
We have used the default dictionary to separate the names very easily! Then, of course, we can use it as a normal dictionary to get the values.
print(f'''
Names start with "A":
{dd["A"]}
Names start with "B":
{dd["B"]}
Names start with "C":
{dd["C"]}
''')
The default dictionary is very flexible because the default factory can be any callable that takes no arguments, such as a data type. For example, we can define the default factory as int and use it for counting the number of names for each starting letter.
dd = defaultdict(int)
for name in my_list:
    dd[name[0]] += 1
dd.items()
We have just re-invented what the Counter already does. Take it easy, this is just an example :)
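To make the “default factory” idea more concrete, here is a small sketch of what happens when a missing key is touched. The keys 'Z' and 'Y' and the variable names are made up for this example.

```python
from collections import Counter, defaultdict

my_list = ['Alice', 'Bob', 'Chris', 'Bill', 'Ashley', 'Anna']

groups = defaultdict(list)
for name in my_list:
    # A missing key gets a fresh empty list from the factory first.
    groups[name[0]].append(name)

# Merely reading a missing key with [] calls the factory and stores the result.
missing = groups['Z']
z_was_inserted = 'Z' in groups

# get() does not call the factory and does not insert anything.
y_value = groups.get('Y')
y_was_inserted = 'Y' in groups

# With int as the factory, missing keys start at 0, which matches Counter.
counts = defaultdict(int)
for name in my_list:
    counts[name[0]] += 1
matches_counter = dict(counts) == dict(Counter(name[0] for name in my_list))
```

The side effect of reading a missing key is worth remembering: use get() or the in operator when you only want to check whether a key exists.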
Summary
In this article, I have introduced 6 container types in the collections module of Python. The named tuple helps us to write more readable code, the ordered dictionary gives us an order-sensitive dictionary, the chain map lets us define a multi-layered dictionary, the counter helps us to count everything easily, the deque provides a double-ended queue and, finally, the default dictionary helps us to collect objects based on some common keys.
Life is short, I use Python!