A Guide to Python Itertools Like No Other

Crystallise your understanding of this amazing library through animated GIFs and learn how to write more elegant code

Introduction

itertools is a built-in module in Python for handling iterables. It provides a number of fast, memory-efficient ways of looping through iterables to achieve different results. It is a powerful yet underrated module that every data scientist should know in order to write clean, elegant and readable code in Python.

While there are plenty of resources about itertools and its functionalities, they often focus on the code, making it difficult for unfamiliar readers to immediately comprehend the inner-workings of each method. This article takes a different approach — we will walk you through each itertools method using animated GIFs to illustrate how they actually work. It is hoped that this guide can help you better visualise and appreciate how itertools can be used.

Note: Because we have taken this approach, many animated illustrations have been deliberately simplified to aid readers’ understanding. For example, if the output in the GIF shows “ABC”, it does not mean that the code output is the string 'ABC'. Instead, it represents the code output [('A', 'B', 'C')]. Also, itertools functions generally return a lazy iterator, which does not display the resulting elements until you consume it. In the GIFs, however, we have represented the output as what you’d get after passing that iterator to the list() function.

With that said, let’s get into the action!

itertools.product()

itertools.product() is a type of combinatoric iterator that gives you the Cartesian product of the given iterables. Whenever you have nested for-loops in your code, it is a good opportunity to use itertools.product().

To compute the product of an iterable with itself, you can specify the number of repetitions with the optional repeat argument.
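As a small sketch (the inputs "AB" and "12" are made up for illustration), the nested-loop equivalence and the repeat argument might look like this:

```python
from itertools import product

# Two nested for-loops over a pair of iterables...
nested = [(letter, digit) for letter in "AB" for digit in "12"]

# ...yield the same tuples, in the same order, as one product() call.
pairs = list(product("AB", "12"))
print(pairs)  # [('A', '1'), ('A', '2'), ('B', '1'), ('B', '2')]

# repeat=2 is shorthand for product("AB", "AB").
squares = list(product("AB", repeat=2))
print(squares)  # [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]
```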

itertools.permutations()

itertools.permutations() gives you all possible permutations of an iterable, i.e., all possible orderings with no repeated elements.
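A quick illustration, assuming the input string "ABC" (chosen here only as an example). The optional second argument limits the length of each ordering:

```python
from itertools import permutations

# All orderings of the three letters: 3! = 6 tuples.
orderings = ["".join(p) for p in permutations("ABC")]
print(orderings)  # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']

# Orderings of length 2 only.
pairs = ["".join(p) for p in permutations("ABC", 2)]
print(pairs)  # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']
```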

itertools.combinations()

For a given iterable, itertools.combinations() returns all possible combinations of length r with no repeated elements.

The GIF in Figure 3 assumes r=3 and therefore returns a unique combination of ('A','B','C'). If r=2 , itertools.combinations('ABC', 2) will return [('A','B'), ('A','C'),('B','C')].
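The two cases described above can be checked with a short sketch. Unlike permutations, order does not matter here, so ('A', 'B') and ('B', 'A') count as the same combination:

```python
from itertools import combinations

# r=3: only one way to pick three items out of three.
triples = list(combinations("ABC", 3))
print(triples)  # [('A', 'B', 'C')]

# r=2: three unordered pairs.
pairs = list(combinations("ABC", 2))
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```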

itertools.combinations_with_replacement()

For a given iterable, itertools.combinations_with_replacement() returns all possible combinations of length r with each element allowed to be repeated more than once.
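For instance, with the illustrative input "ABC" and r=2, the result also includes pairs such as ('A', 'A'), which plain combinations() would leave out:

```python
from itertools import combinations_with_replacement

pairs = list(combinations_with_replacement("ABC", 2))
print(pairs)
# [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]
```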

itertools.count()

itertools.count() returns evenly spaced values, beginning at a given start number and continuing indefinitely. Thus, it is known as an “infinite iterator”. By default, it starts at 0 and the values are spaced by 1, but both can be changed with the start and step arguments.
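Because the iterator never ends, calling list() on it directly would never finish. A minimal sketch that takes only the first few values, using itertools.islice() as one possible way to cap the output:

```python
from itertools import count, islice

# Defaults: start at 0, step by 1.
default_values = list(islice(count(), 4))
print(default_values)  # [0, 1, 2, 3]

# Start at 10 and step by 5 (values picked for illustration).
stepped_values = list(islice(count(10, 5), 4))
print(stepped_values)  # [10, 15, 20, 25]
```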

itertools.cycle()

itertools.cycle() is another infinite iterator that “cycles” through an iterable continuously, producing an infinite sequence.
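A short sketch with the example input "ABC", again capped with itertools.islice() so that the loop terminates:

```python
from itertools import cycle, islice

# Take the first seven items of the endless A, B, C, A, B, C, ... sequence.
first_seven = list(islice(cycle("ABC"), 7))
print(first_seven)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
```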

itertools.repeat()

itertools.repeat() is the third type of infinite iterator. It repeats an object over and over again, producing an infinite sequence, unless the times argument is specified. For example, itertools.repeat('ABC', times=3) will yield ['ABC', 'ABC', 'ABC'].
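The example above can be run as follows. The second half is an additional illustration, not from the original article, of one common use: supplying a constant argument to map():

```python
from itertools import repeat

# A bounded repeat: the same object three times.
repeated = list(repeat("ABC", times=3))
print(repeated)  # ['ABC', 'ABC', 'ABC']

# An unbounded repeat(2) feeds the exponent to pow();
# map() stops when the shorter input, range(5), runs out.
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```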

itertools.accumulate()

itertools.accumulate() generates an iterator that yields the running totals of the elements in an iterable.

By default, it accumulates by addition (or concatenation, for strings). You can also specify a custom function that takes two arguments using the func argument. For example, itertools.accumulate('ABCD', func=lambda x, y: y.lower()+x) will yield ['A', 'bA', 'cbA', 'dcbA'].
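A few variations, sketched with made-up inputs. In each call the function receives the accumulated value so far and the next element:

```python
import operator
from itertools import accumulate

# Default behaviour: running sums for numbers, running concatenation for strings.
running_sum = list(accumulate([1, 2, 3, 4]))
running_text = list(accumulate("ABCD"))
print(running_sum)   # [1, 3, 6, 10]
print(running_text)  # ['A', 'AB', 'ABC', 'ABCD']

# Any two-argument function works, e.g. multiplication or max().
running_product = list(accumulate([1, 2, 3, 4], operator.mul))
running_max = list(accumulate([3, 1, 4, 1, 5], max))
print(running_product)  # [1, 2, 6, 24]
print(running_max)      # [3, 3, 4, 4, 5]
```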

itertools.chain()

itertools.chain() takes multiple iterables and chains them together to produce a single iterable.

A slight variation of this is itertools.chain.from_iterable(), which takes a single iterable of iterables and chains its individual elements together into one iterable. Hence, itertools.chain.from_iterable(['ABC', 'DEF']) will yield the same results as itertools.chain('ABC', 'DEF'), which is ['A', 'B', 'C', 'D', 'E', 'F'].
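The equivalence between the two forms can be verified with a short sketch:

```python
from itertools import chain

# Several iterables passed as separate arguments.
chained = list(chain("ABC", "DEF"))
print(chained)  # ['A', 'B', 'C', 'D', 'E', 'F']

# One iterable that contains the iterables to be chained.
flattened = list(chain.from_iterable(["ABC", "DEF"]))
print(flattened)  # ['A', 'B', 'C', 'D', 'E', 'F']
```

chain.from_iterable() is the handier of the two when the number of inner iterables is not known in advance, for example when flattening a list of lists.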

itertools.compress()

itertools.compress() filters an iterable based on another iterable of Boolean (or, more generally, truthy and falsy) values, known as the “selector”. The resulting iterable will only consist of elements from the input iterable whose positions correspond to True values of the selector.
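A minimal sketch with an invented selector. Any truthy or falsy values (such as 1 and 0) would work in place of True and False:

```python
from itertools import compress

letters = "ABCDE"
selector = [True, False, True, False, True]

# Keep only the letters whose selector entry is truthy.
kept = list(compress(letters, selector))
print(kept)  # ['A', 'C', 'E']
```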

itertools.dropwhile()

In itertools.dropwhile, you “drop” elements “while” a condition is True and “take” elements after the condition first becomes False.

For the example shown in Figure 10:

1st element: condition is True — drop
2nd element: condition is True — drop
3rd element: condition is False — keep all elements henceforth
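The same three steps can be sketched in code. The values and the condition x < 3 are assumptions for illustration and are not necessarily those shown in the figure:

```python
from itertools import dropwhile

values = [1, 2, 5, 1, 7]

# 1 and 2 satisfy x < 3 and are dropped; 5 fails the test,
# so it and everything after it is kept, including the later 1.
remaining = list(dropwhile(lambda x: x < 3, values))
print(remaining)  # [5, 1, 7]
```

Note that the condition is no longer checked once it has failed, which is why the second 1 survives.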

itertools.takewhile()

itertools.takewhile() works in the opposite way — you “take” elements “while” a condition is True and “drop” elements after the condition first becomes False.

For the example shown in Figure 11:

1st element: condition is True — keep
2nd element: condition is True — keep
3rd element: condition is False — drop all elements henceforth
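In code, using illustrative values and the condition x < 3 (both assumptions, not necessarily what the figure shows):

```python
from itertools import takewhile

values = [1, 2, 5, 1, 7]

# 1 and 2 satisfy x < 3 and are taken; 5 fails the test,
# so iteration stops there and the later 1 is never reached.
taken = list(takewhile(lambda x: x < 3, values))
print(taken)  # [1, 2]
```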

itertools.filterfalse()

itertools.filterfalse(), as its name suggests, only keeps elements of an input iterable if the condition is False.
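A small sketch contrasting it with the built-in filter(). The helper is_odd is a hypothetical predicate defined just for this example:

```python
from itertools import filterfalse


def is_odd(number):
    """Return True for odd integers."""
    return number % 2 == 1


# filter() keeps elements for which the predicate is True...
odds = list(filter(is_odd, range(6)))
print(odds)  # [1, 3, 5]

# ...while filterfalse() keeps those for which it is False.
evens = list(filterfalse(is_odd, range(6)))
print(evens)  # [0, 2, 4]
```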

itertools.starmap()

Typically, you can use map to map a function to an iterable, like a list. For example, map(lambda x: x*x, [1, 2, 3, 4]) will yield [1, 4, 9, 16]. However, if you have an iterable of iterables, like a list of tuples, and your function needs to use each element of the inner iterable as argument, you can use itertools.starmap() .

If you’re interested, check out the article “Exploring Map() vs. Starmap() in Python” by Indhumathy Chelliah (betterprogramming.pub), which breaks down the differences between map and starmap.
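To make the difference between map and starmap concrete, here is a minimal sketch with made-up (base, exponent) tuples. starmap() unpacks each tuple into separate arguments, much like calling pow(*args):

```python
from itertools import starmap

base_exponent_pairs = [(2, 3), (3, 2), (10, 0)]

# Each tuple is unpacked: pow(2, 3), pow(3, 2), pow(10, 0).
powers = list(starmap(pow, base_exponent_pairs))
print(powers)  # [8, 9, 1]

# The equivalent list comprehension, for comparison.
same_powers = [pow(*args) for args in base_exponent_pairs]
```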

itertools.tee()

Given an iterable, itertools.tee() produces multiple independent iterators as specified by its n argument.
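A short sketch showing that each iterator produced by tee() can be consumed on its own. The source iterator here is just an example:

```python
from itertools import tee

source = iter("ABC")

# Split one iterator into two independent ones.
first, second = tee(source, 2)

# Exhausting `first` does not affect `second`.
first_items = list(first)
second_items = list(second)
print(first_items)   # ['A', 'B', 'C']
print(second_items)  # ['A', 'B', 'C']
```

Once tee() has been called, it is safest not to use the original iterator (source above) any more, because advancing it directly would make the copies miss those elements.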

itertools.zip_longest()

The built-in zip() function takes multiple iterables as arguments and returns an iterator that generates a series of tuples, each consisting of one element from every iterable. It works best when the input iterables are of equal length: for iterables of differing lengths, zip() stops at the shortest one, so some information is lost. For example, list(zip('ABCD', '12')) will return [('A', '1'), ('B', '2')] only.

itertools.zip_longest() mitigates this limitation. It behaves the same way as zip(), except that it “zips” based on the longest input iterable. By default, unmatched elements are filled with None, unless otherwise specified using the fillvalue argument.
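Using the same inputs as the zip() example, the difference might be sketched like this (the fill value "-" is an arbitrary choice):

```python
from itertools import zip_longest

# zip() stops at the shorter input.
truncated = list(zip("ABCD", "12"))
print(truncated)  # [('A', '1'), ('B', '2')]

# zip_longest() keeps going and pads the gaps with None...
padded = list(zip_longest("ABCD", "12"))
print(padded)  # [('A', '1'), ('B', '2'), ('C', None), ('D', None)]

# ...or with a custom fill value.
dashed = list(zip_longest("ABCD", "12", fillvalue="-"))
print(dashed)  # [('A', '1'), ('B', '2'), ('C', '-'), ('D', '-')]
```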

Figure 16: Animated illustration of `itertools.tee()`

itertools.pairwise()

Newly introduced in Python 3.10, itertools.pairwise() generates successive overlapping pairs from an input iterable. This is useful if you have an iterable such as a list or a string, and you want to iterate over it with a rolling window of two elements.

Here’s a bonus! If you’re not using Python 3.10 (yet), you can define your own pairwise function (credits: Rodrigo).

from itertools import tee

def pairwise(it):
    """Mimic `itertools.pairwise()` from Python 3.10."""
    prev_, next_ = tee(it, 2)  # Split `it` into two independent iterators.
    next(next_, None)  # Advance the second one once; the default avoids an error on empty input.
    yield from zip(prev_, next_)  # Yield the overlapping pairs.
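A usage sketch that runs on either side of Python 3.10: it imports the built-in function when available and otherwise falls back to a tee()-based helper in the same spirit as the one above. The prices list is invented for illustration:

```python
from itertools import tee

try:
    from itertools import pairwise  # Available from Python 3.10.
except ImportError:
    def pairwise(iterable):
        """Fallback for older Python versions."""
        first, second = tee(iterable, 2)
        next(second, None)  # Shift the second iterator by one element.
        return zip(first, second)

pairs = list(pairwise("ABCD"))
print(pairs)  # [('A', 'B'), ('B', 'C'), ('C', 'D')]

# A rolling window of two is handy for step-to-step differences.
prices = [10, 12, 9, 15]
changes = [later - earlier for earlier, later in pairwise(prices)]
print(changes)  # [2, -3, 6]
```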

itertools.groupby()

Given an input iterable, itertools.groupby() yields consecutive keys, each paired with an iterator over the elements of the corresponding group.

By default, itertools.groupby() generates a break or new group every time the value of the key changes. For the example in Figure 17, it groups the single “A” (in green) as a separate group, rather than grouping the 4 “A”s together. If the desired behaviour is to group by unique elements in an iterable, then the input iterable will first need to be sorted.
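The effect of sorting first can be sketched as follows, assuming the input string "AAABBACCC" (which has three leading “A”s and one isolated “A”, matching the description above):

```python
from itertools import groupby

data = "AAABBACCC"

# Unsorted input: a new group starts whenever the key changes,
# so the lone "A" after "BB" forms its own group.
runs = [(key, "".join(group)) for key, group in groupby(data)]
print(runs)  # [('A', 'AAA'), ('B', 'BB'), ('A', 'A'), ('C', 'CCC')]

# Sorted input: equal elements become adjacent, giving one group per letter.
merged = [(key, "".join(group)) for key, group in groupby(sorted(data))]
print(merged)  # [('A', 'AAAA'), ('B', 'BB'), ('C', 'CCC')]
```

Each group is a lazy iterator tied to the underlying groupby object, so it is consumed here (with "".join) before moving on to the next key.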

itertools.islice()

itertools.islice() is an iterator that returns desired elements within an input iterable given the start, stop and step arguments.

You might be thinking, “The same can be done using regular index slicing!” For example, 'AAABBACCC'[1:8:2] will return 'ABAC'. Well, it turns out there are differences between itertools.islice() and regular index slicing:

Regular index slicing supports negative values for start, stop and step, but itertools.islice() does not.
Regular index slicing creates a new sequence (a copy of the selected elements), whereas itertools.islice() creates an iterator that walks over the existing iterable.
Because of this, itertools.islice() is much more memory-efficient, especially for large iterables.
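These differences can be seen in a short sketch. The use of count() as an endless input is an extra illustration of something regular slicing cannot handle at all:

```python
from itertools import count, islice

text = "AAABBACCC"

# Same start, stop and step as the slice 'AAABBACCC'[1:8:2].
sliced = "".join(islice(text, 1, 8, 2))
print(sliced)  # ABAC

# islice() also works on iterators that cannot be indexed, even infinite ones.
first_five = list(islice(count(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# Negative arguments are rejected with a ValueError.
try:
    islice(text, -3, None)
except ValueError as error:
    negative_error = error
else:
    negative_error = None
print(type(negative_error).__name__)  # ValueError
```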

Conclusion

Congratulations on making it this far! That was plenty of GIFs, but I hope they have helped you gain a better appreciation of the amazing itertools library and that you’re on your way to writing elegant Python code!

If you’ve found this post useful, feel free to let me know in the comments. I welcome discussions, questions and constructive feedback too. Here are more related resources to further reinforce your understanding:


Subrat's Technical Blog

Friday, August 12, 2022