A Guide to Python Itertools Like No Other
Crystallise your understanding of this amazing library through animated GIFs and learn how to write more elegant code
Table of Contents
- Introduction
- itertools.product()
- itertools.permutations()
- itertools.combinations()
- itertools.combinations_with_replacement()
- itertools.count()
- itertools.cycle()
- itertools.repeat()
- itertools.accumulate()
- itertools.chain()
- itertools.compress()
- itertools.dropwhile()
- itertools.takewhile()
- itertools.filterfalse()
- itertools.starmap()
- itertools.tee()
- itertools.zip_longest()
- itertools.pairwise()
- itertools.groupby()
- itertools.islice()
- Conclusion
Introduction
itertools is a module in Python’s standard library for handling iterables. It provides a number of fast, memory-efficient ways of looping through iterables to achieve different results. It is a powerful yet underrated module that every data scientist should know in order to write clean, elegant and readable code in Python.
While there are plenty of resources about itertools and its functionality, they often focus on the code, making it difficult for unfamiliar readers to immediately grasp the inner workings of each function. This article takes a different approach: we will walk you through each itertools function using animated GIFs to illustrate how it actually works. We hope this guide helps you better visualise and appreciate how itertools can be used.
Note: Because we have taken this approach, many of the animated illustrations have been deliberately simplified to aid readers’ understanding. For example, if the output in the GIF is shown as “ABC”, it does not mean that the code output is the string “ABC”. Instead, it represents the code output [('A', 'B', 'C')]. Also, itertools functions generally return a lazy iterator (which does not immediately display the resulting elements). In the GIFs, however, we have represented the output as what you’d get after passing that iterator to the list() function.
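To see the difference for yourself, here is a minimal sketch (the variable names are ours, purely for illustration) that looks at the raw return value first and then at the result of passing it to list():

```python
from itertools import combinations

combos = combinations("ABC", 3)
print(combos)  # an itertools.combinations object, not a list

# Passing the iterator to list() is what reveals the elements.
first_pass = list(combos)
print(first_pass)  # [('A', 'B', 'C')]

# The iterator is now exhausted, so a second pass finds nothing left.
second_pass = list(combos)
print(second_pass)  # []
```

The second call returning an empty list is worth remembering: these iterators can only be consumed once.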
With that said, let’s get into the action!
itertools.product()
itertools.product() is a combinatoric iterator that gives you the Cartesian product of the input iterables. Whenever you have nested for-loops in your code, it is a good opportunity to use itertools.product().
To compute the product of an iterable with itself, you can specify the number of repetitions with the optional repeat argument.
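As a rough illustration of the nested-loop equivalence, the sketch below uses two short strings that we picked arbitrarily as inputs:

```python
from itertools import product

# Two nested loops over the inputs...
nested = [(letter, digit) for letter in "AB" for digit in "12"]

# ...give the same pairs, in the same order, as product().
pairs = list(product("AB", "12"))
print(pairs)  # [('A', '1'), ('A', '2'), ('B', '1'), ('B', '2')]

# repeat=2 is equivalent to product("AB", "AB").
squares = list(product("AB", repeat=2))
print(squares)  # [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]
```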
itertools.permutations()
itertools.permutations() gives you all possible permutations of an iterable, i.e., all possible orderings with no repeated elements.
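Here is a small sketch with 'ABC' as an example input. It also uses the optional r argument, which limits each permutation to r elements:

```python
from itertools import permutations

# All 3! = 6 orderings of the three letters.
full = list(permutations("ABC"))
print(full)
# [('A', 'B', 'C'), ('A', 'C', 'B'), ('B', 'A', 'C'),
#  ('B', 'C', 'A'), ('C', 'A', 'B'), ('C', 'B', 'A')]

# With r=2, each ordering contains only two of the letters.
partial = list(permutations("ABC", 2))
print(partial)
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
```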
itertools.combinations()
For a given iterable, itertools.combinations() returns all possible combinations of length r with no repeated elements.
The GIF in Figure 3 assumes r=3 and therefore returns the single combination ('A', 'B', 'C'). If r=2, itertools.combinations('ABC', 2) will yield [('A', 'B'), ('A', 'C'), ('B', 'C')].
itertools.combinations_with_replacement()
For a given iterable, itertools.combinations_with_replacement() returns all possible combinations of length r, with each element allowed to appear more than once.
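A side-by-side sketch may help show how this differs from itertools.combinations(). The input 'ABC' and r=2 are simply example values:

```python
from itertools import combinations, combinations_with_replacement

# Without replacement: an element appears at most once per combination.
without = list(combinations("ABC", 2))
print(without)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]

# With replacement: pairs such as ('A', 'A') are allowed too.
with_repl = list(combinations_with_replacement("ABC", 2))
print(with_repl)
# [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]
```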
itertools.count()
itertools.count() returns evenly spaced values, starting from a given number and continuing indefinitely. It is therefore known as an “infinite iterator”. By default, the values are spaced by 1, but this can be changed with the step argument.
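Because the iterator never ends on its own, you need some way of stopping it. The sketch below uses itertools.islice() for that, which is just one option; the start and step values are arbitrary:

```python
from itertools import count, islice

# Take only the first five values; list(count(...)) would never finish.
evens = list(islice(count(start=10, step=2), 5))
print(evens)  # [10, 12, 14, 16, 18]

# The default step is 1, and the default start is 0.
defaults = list(islice(count(), 3))
print(defaults)  # [0, 1, 2]
```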
itertools.cycle()
itertools.cycle() is another infinite iterator that “cycles” through an iterable continuously, producing an infinite sequence.
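As a quick sketch (again cutting the infinite sequence short with itertools.islice(), and using 'ABC' as an example input):

```python
from itertools import cycle, islice

# After 'C', the iterator starts again from 'A'.
first_seven = list(islice(cycle("ABC"), 7))
print(first_seven)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
```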
itertools.repeat()
itertools.repeat() is the third type of infinite iterator. It repeats an object over and over again, producing an infinite sequence unless the times argument is specified. For example, itertools.repeat('ABC', times=3) will yield ['ABC', 'ABC', 'ABC'].
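One place where this comes in handy is supplying a constant argument to map(). The following is only a sketch of that idea, with example values of our choosing:

```python
from itertools import repeat

# The whole object is repeated, not its individual characters.
three_times = list(repeat("ABC", times=3))
print(three_times)  # ['ABC', 'ABC', 'ABC']

# repeat(2) feeds the same exponent to pow() on every call;
# map() stops when the shorter input, range(5), runs out.
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```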
itertools.accumulate()
itertools.accumulate() generates an iterator that yields the running totals of the elements in an iterable.
By default, it accumulates by addition (or concatenation, for strings). You can also specify a custom function that takes two arguments using the func argument. For example, itertools.accumulate('ABCD', func=lambda x, y: y.lower()+x) will yield ['A', 'bA', 'cbA', 'dcbA'].
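For a numeric flavour of the same idea, here is a short sketch with example lists of our own, showing the default behaviour alongside two other two-argument functions:

```python
import operator
from itertools import accumulate

# Default: running sum.
running_sum = list(accumulate([1, 2, 3, 4]))
print(running_sum)  # [1, 3, 6, 10]

# Any function of two arguments works, e.g. multiplication...
running_product = list(accumulate([1, 2, 3, 4], operator.mul))
print(running_product)  # [1, 2, 6, 24]

# ...or max(), which gives the largest value seen so far.
running_max = list(accumulate([3, 1, 4, 1, 5], max))
print(running_max)  # [3, 3, 4, 4, 5]
```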
itertools.chain()
itertools.chain() takes multiple iterables and chains them together to produce a single iterable.
A slight variation of this is itertools.chain.from_iterable(), which takes a single iterable of iterables and chains its individual elements together in an iterable. Hence, itertools.chain.from_iterable(['ABC', 'DEF']) will yield the same results as itertools.chain('ABC', 'DEF'), which is ['A', 'B', 'C', 'D', 'E', 'F'].
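The equivalence can be checked with a few lines. The nested list at the end is an extra example of ours, showing a common use: flattening one level of nesting.

```python
from itertools import chain

chained = list(chain("ABC", "DEF"))
flattened = list(chain.from_iterable(["ABC", "DEF"]))
print(chained)    # ['A', 'B', 'C', 'D', 'E', 'F']
print(flattened)  # ['A', 'B', 'C', 'D', 'E', 'F']

# from_iterable() is handy for flattening a list of lists by one level.
flat_numbers = list(chain.from_iterable([[1, 2], [3], [4, 5]]))
print(flat_numbers)  # [1, 2, 3, 4, 5]
```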
itertools.compress()
itertools.compress() filters an iterable based on another iterable of Boolean values (known as the “selector”). The resulting iterator consists only of the elements of the input iterable whose positions correspond to True values in the selector.
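A minimal sketch, with a made-up input string and selector:

```python
from itertools import compress

letters = "ABCDEF"
selector = [True, False, True, False, True, True]

# Only the letters paired with True survive.
kept = list(compress(letters, selector))
print(kept)  # ['A', 'C', 'E', 'F']
```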
itertools.dropwhile()
In itertools.dropwhile(), you “drop” elements “while” a condition is True and “take” all elements once the condition first becomes False.
For the example shown in Figure 10:
- 1st element: condition is True, so drop
- 2nd element: condition is True, so drop
- 3rd element: condition is False, so keep all elements henceforth
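The same three steps can be reproduced in code. The numbers and the condition below are our own stand-ins and are not necessarily the values shown in the GIF:

```python
from itertools import dropwhile

values = [1, 2, 3, 1, 2]

# 1 and 2 satisfy the condition and are dropped; 3 does not,
# so it and everything after it is kept.
kept = list(dropwhile(lambda x: x < 3, values))
print(kept)  # [3, 1, 2]
```

Notice that the trailing 1 and 2 are kept even though they satisfy the condition: once the first False is seen, the condition is never checked again.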
itertools.takewhile()
itertools.takewhile() works in the opposite way: you “take” elements “while” a condition is True and “drop” all elements once the condition first becomes False.
For the example shown in Figure 11:
- 1st element: condition is True, so keep
- 2nd element: condition is True, so keep
- 3rd element: condition is False, so drop all elements henceforth
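In code, those steps might look like the sketch below. As before, the numbers and the condition are example values of ours rather than the ones in the GIF:

```python
from itertools import takewhile

values = [1, 2, 3, 1, 2]

# 1 and 2 satisfy the condition and are kept; 3 does not,
# so it and everything after it is dropped.
taken = list(takewhile(lambda x: x < 3, values))
print(taken)  # [1, 2]
```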
itertools.filterfalse()
itertools.filterfalse(), as its name suggests, only keeps the elements of an input iterable for which the condition is False.
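It can be helpful to think of it as the mirror image of the built-in filter(). A short sketch, using an example range and condition:

```python
from itertools import filterfalse

def is_even(n):
    return n % 2 == 0

numbers = range(1, 9)

# filter() keeps the elements where the condition is True...
evens = list(filter(is_even, numbers))
print(evens)  # [2, 4, 6, 8]

# ...while filterfalse() keeps the ones where it is False.
odds = list(filterfalse(is_even, numbers))
print(odds)  # [1, 3, 5, 7]
```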
itertools.starmap()
Typically, you can use map to apply a function to each element of an iterable, like a list. For example, map(lambda x: x*x, [1, 2, 3, 4]) will yield [1, 4, 9, 16]. However, if you have an iterable of iterables, like a list of tuples, and your function needs to take each element of the inner iterable as a separate argument, you can use itertools.starmap().
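Here is a small sketch of that situation, using the built-in pow() and a list of (base, exponent) tuples that we made up:

```python
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]

# Each tuple is unpacked into separate arguments: pow(2, 5), pow(3, 2), ...
powers = list(starmap(pow, pairs))
print(powers)  # [32, 9, 1000]

# Roughly the same thing written as a list comprehension.
unpacked = [pow(*args) for args in pairs]
print(unpacked)  # [32, 9, 1000]
```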
itertools.tee()
Given an iterable, itertools.tee() produces multiple independent iterators, as specified by its n argument.
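The sketch below, with an example list of three numbers, shows what “independent” means in practice: advancing one copy does not move the other.

```python
from itertools import tee

source = iter([1, 2, 3])
first, second = tee(source, 2)

# Advance the first copy by two elements.
head = [next(first), next(first)]
print(head)  # [1, 2]

# The second copy still starts from the beginning.
all_second = list(second)
print(all_second)  # [1, 2, 3]

# The first copy carries on from where it left off.
rest_first = list(first)
print(rest_first)  # [3]
```

One caveat worth knowing: after calling tee(), it is best not to use the original iterator (source here) directly, since advancing it would not be seen by the copies.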
itertools.zip_longest()
The built-in zip() function takes multiple iterables as arguments and returns an iterator that generates a series of tuples, each consisting of one element from every iterable. It works best when the input iterables are of equal length: for iterables of differing lengths, zip() stops at the shortest one, which results in some loss of information. For example, zip('ABCD', '12') will yield [('A', '1'), ('B', '2')] only.
itertools.zip_longest() mitigates this limitation. It behaves in the same way as zip(), except that it “zips” based on the longest input iterable. By default, unmatched elements are filled with None, unless otherwise specified using the fillvalue argument.
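Putting the two side by side, with the same 'ABCD' and '12' inputs as above (the '-' fill value is just an example):

```python
from itertools import zip_longest

# zip() stops at the shorter input, so 'C' and 'D' are lost.
short = list(zip("ABCD", "12"))
print(short)  # [('A', '1'), ('B', '2')]

# zip_longest() keeps going and pads the gaps with None...
padded = list(zip_longest("ABCD", "12"))
print(padded)  # [('A', '1'), ('B', '2'), ('C', None), ('D', None)]

# ...or with a fill value of your choice.
dashed = list(zip_longest("ABCD", "12", fillvalue="-"))
print(dashed)  # [('A', '1'), ('B', '2'), ('C', '-'), ('D', '-')]
```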
itertools.pairwise()
Newly introduced in Python 3.10, itertools.pairwise() generates successive overlapping pairs from an input iterable. This is useful if you have an iterable such as a list or a string, and you want to iterate over it with a rolling window of two elements.
Here’s a bonus! If you’re not using Python 3.10 (yet), you can define your own pairwise function (credits: Rodrigo).
>>> from itertools import tee
>>> def pairwise(it):
...     """Mimics the `itertools.pairwise()` function in Python 3.10."""
...     prev_, next_ = tee(it, 2)     # Split `it` into two independent iterators.
...     next(next_, None)             # Advance the second one; the default avoids an error on empty input.
...     yield from zip(prev_, next_)  # Yield the overlapping pairs.
itertools.groupby()
Given an input iterable, itertools.groupby() returns consecutive keys and an iterator over the elements of each corresponding group.
itertools.groupby() starts a new group every time the value of the key changes. For the example in Figure 17, it treats the single “A” (in green) as a separate group, rather than grouping the 4 “A”s together. If the desired behaviour is to group by unique elements in an iterable, then the input iterable first needs to be sorted.
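The effect of sorting is easy to see in a short sketch. The string below is an example input of ours containing four “A”s, one of them separated from the rest; it may not be exactly what the GIF uses:

```python
from itertools import groupby

data = "AAABBACCC"

# Unsorted: the lone 'A' after 'BB' forms its own group.
runs = [(key, "".join(group)) for key, group in groupby(data)]
print(runs)  # [('A', 'AAA'), ('B', 'BB'), ('A', 'A'), ('C', 'CCC')]

# Sorted first: all four 'A's end up in a single group.
merged = [(key, "".join(group)) for key, group in groupby(sorted(data))]
print(merged)  # [('A', 'AAAA'), ('B', 'BB'), ('C', 'CCC')]
```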
itertools.islice()
itertools.islice() returns an iterator over selected elements of an input iterable, given the start, stop and step arguments.
You might be thinking, “The same can be done using regular index slicing!” For example, 'AAABBACCC'[1:8:2] will return 'ABAC'. Well, it turns out there are differences between itertools.islice() and regular index slicing:
- Regular index slicing supports negative values for start, stop and step, but itertools.islice() does not.
- Regular index slicing creates a new sequence, whereas itertools.islice() creates an iterator that iterates over the existing iterable.
- For this reason, itertools.islice() is much more memory-efficient, especially for large iterables.
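A short sketch of both points follows. The generator of square numbers is our own example of something that cannot be indexed at all, yet can still be sliced lazily:

```python
from itertools import count, islice

text = "AAABBACCC"

# On a string, islice() selects the same elements as index slicing.
sliced = "".join(islice(text, 1, 8, 2))
print(sliced)  # ABAC

# A generator has no indices (and this one never ends),
# but islice() can still take its first five values.
squares = (n * n for n in count())
first_squares = list(islice(squares, 5))
print(first_squares)  # [0, 1, 4, 9, 16]
```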
Conclusion
Congratulations on making it this far! That was plenty of GIFs, but I hope they have helped you gain a better appreciation of the amazing itertools library and that you’re on your way to writing elegant Python code!
If you’ve found this post useful, feel free to let me know in the comments. I welcome discussions, questions and constructive feedback too.