Friday, August 12, 2022

A Guide to Python Itertools Like No Other

 

Crystallise your understanding of this amazing library through animated GIFs and learn how to write more elegant code

Photo by Elena Rouame on Unsplash

Introduction

`itertools` is a built-in module in Python for handling iterables. It provides a number of fast, memory-efficient ways of looping through iterables to achieve different desired results. It is a powerful yet underrated module that every data scientist should know in order to write clean, elegant and readable code in Python.

While there are plenty of resources about `itertools` and its functionalities, they often focus on the code, making it difficult for unfamiliar readers to immediately comprehend the inner workings of each method. This article takes a different approach: we will walk you through each `itertools` method using animated GIFs to illustrate how they actually work. It is hoped that this guide can help you better visualise and appreciate how `itertools` can be used.

Note: Because we have taken this approach, many animated illustrations have been deliberately over-simplified to aid readers’ understanding. For example, if the output in the GIF shows as “ABC”, it does not mean that the code output is the string “ABC”. Instead, it represents the code output `['A', 'B', 'C']`. Also, `itertools` methods generally return an iterator (which does not immediately display the resulting elements) as an output. However, in the GIFs, we have represented the output as what you’d get after wrapping it in the `list()` function.

With that said, let’s get into the action!

itertools.product()

`itertools.product()` is a combinatoric iterator that gives you the Cartesian product of the given iterables. Whenever you have nested for-loops in your code, it is a good opportunity to use `itertools.product()`.

Figure 1: Animated illustration of `itertools.product()`

To compute the product of an iterable with itself, you can specify the number of repetitions with the optional `repeat` argument.
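To make this concrete, here is a minimal sketch (the `colours` and `sizes` lists are made up for illustration) showing that `product()` yields the same pairs as a nested loop, and how the `repeat` argument behaves:

```python
from itertools import product

colours = ["red", "blue"]
sizes = ["S", "M", "L"]

# Two nested for-loops that build every (colour, size) pair...
nested = []
for colour in colours:
    for size in sizes:
        nested.append((colour, size))

# ...produce the same pairs, in the same order, as a single product() call.
flat = list(product(colours, sizes))
print(flat == nested)  # True

# repeat=2 computes the product of "AB" with itself, i.e. product("AB", "AB").
pairs = list(product("AB", repeat=2))
print(pairs)  # [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]
```

The rightmost iterable advances fastest, just like the innermost loop, which is why the two orders match.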

itertools.permutations()

`itertools.permutations()` gives you all possible permutations of an iterable, i.e., all possible orderings with no repeated elements.

Figure 2: Animated illustration of `itertools.permutations()`
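A short sketch of `permutations()` on the string "ABC" (the inputs are illustrative), including the optional length argument `r`:

```python
from itertools import permutations

# All orderings of three letters: 3! = 6 tuples.
all_orders = list(permutations("ABC"))
print(all_orders)
# [('A', 'B', 'C'), ('A', 'C', 'B'), ('B', 'A', 'C'),
#  ('B', 'C', 'A'), ('C', 'A', 'B'), ('C', 'B', 'A')]

# The optional second argument r sets the length of each ordering.
two_letter = list(permutations("ABC", 2))
print(two_letter)
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

# Elements are unique by position, not by value, so duplicate values in the
# input lead to duplicate tuples in the output.
dupes = list(permutations("AAB"))
print(len(dupes), len(set(dupes)))  # 6 3
```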

itertools.combinations()

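For a given iterable, `itertools.combinations()` returns all possible combinations of length r with no repeated elements. Unlike permutations, the order of elements within a combination does not matter.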

Figure 3: Animated illustration of `itertools.combinations()`

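The GIF in Figure 3 assumes that r equals the length of the input iterable, and therefore returns a single, unique combination containing every element. With a smaller r, more combinations are returned: for example, `combinations('ABC', 2)` will return `('A', 'B')`, `('A', 'C')` and `('B', 'C')`.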
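A minimal sketch of how the choice of r changes the result, using "ABC" as an illustrative input:

```python
from itertools import combinations

# r equal to the input length: exactly one combination.
full = list(combinations("ABC", 3))
print(full)  # [('A', 'B', 'C')]

# Smaller r: more combinations. Order inside a combination does not matter,
# so ('B', 'A') is never produced alongside ('A', 'B').
pairs = list(combinations("ABC", 2))
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]

# r larger than the input length yields nothing at all.
too_long = list(combinations("ABC", 4))
print(too_long)  # []
```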

itertools.combinations_with_replacement()

For a given iterable, `itertools.combinations_with_replacement()` returns all possible combinations of length r, where each element may be repeated more than once.

Figure 4: Animated illustration of `itertools.combinations_with_replacement()`
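A side-by-side sketch (with "ABC" as an illustrative input) contrasting `combinations_with_replacement()` with plain `combinations()`:

```python
from itertools import combinations, combinations_with_replacement

# Each element may be reused, so pairs such as ('A', 'A') appear.
with_rep = list(combinations_with_replacement("ABC", 2))
print(with_rep)
# [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]

# Plain combinations() on the same input never repeats an element.
without_rep = list(combinations("ABC", 2))
print(without_rep)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```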

itertools.count()

`itertools.count()` returns evenly spaced values, starting from a given number and continuing indefinitely. For this reason, it is known as an “infinite iterator”. By default, the values are spaced by 1, but this can be changed with the `step` argument.

Figure 5: Animated illustration of `itertools.count()`
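Because `count()` never ends on its own, any example has to stop it somehow. This sketch uses `islice()` (covered later in this guide) and the built-in `zip()`; the start, step and `labels` values are made up for illustration:

```python
from itertools import count, islice

# count() never stops on its own, so take a finite slice with islice().
evens = list(islice(count(0, 2), 5))  # start=0, step=2
print(evens)  # [0, 2, 4, 6, 8]

# A common idiom: number items with zip(), which stops when `labels` runs out.
labels = ["a", "b", "c"]
numbered = list(zip(count(10, 10), labels))
print(numbered)  # [(10, 'a'), (20, 'b'), (30, 'c')]
```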

itertools.cycle()

`itertools.cycle()` is another infinite iterator that “cycles” through an iterable continuously, producing an infinite sequence.

Figure 6: Animated illustration of `itertools.cycle()`
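A minimal sketch of `cycle()`, capped with `islice()`, plus a round-robin assignment (the players and team colours are hypothetical):

```python
from itertools import cycle, islice

# cycle() repeats "ABC" forever, so cap it with islice().
first_seven = list(islice(cycle("ABC"), 7))
print(first_seven)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']

# Round-robin assignment of hypothetical team colours to players;
# zip() stops when the finite `players` list is exhausted.
players = ["p1", "p2", "p3", "p4", "p5"]
teams = dict(zip(players, cycle(["red", "blue"])))
print(teams)
# {'p1': 'red', 'p2': 'blue', 'p3': 'red', 'p4': 'blue', 'p5': 'red'}
```

Note that `cycle()` keeps a copy of every element it has seen so that it can replay them, which may need significant memory for a long input.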

itertools.repeat()

`itertools.repeat()` is the third type of infinite iterator. It returns the same object over and over again, producing an infinite sequence, unless the `times` argument is specified. For example, `repeat('A', 3)` will yield `'A'`, `'A'`, `'A'`.

Figure 7: Animated illustration of `itertools.repeat()`
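A short sketch of `repeat()` in its finite and infinite forms; the second part shows one common idiom (supplying a constant argument to `map()`), with illustrative numbers:

```python
from itertools import repeat

# With a `times` argument, repeat() is finite.
three = list(repeat("A", 3))
print(three)  # ['A', 'A', 'A']

# Without it, repeat() is infinite; map() stops at its shortest input,
# so the constant exponent 2 is paired with each base in the list.
squares = list(map(pow, [1, 2, 3, 4], repeat(2)))
print(squares)  # [1, 4, 9, 16]
```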

itertools.accumulate()

`itertools.accumulate()` generates an iterator of accumulated sums (running totals) of the elements in an iterable.

Figure 8: Animated illustration of `itertools.accumulate()`

By default, it accumulates by addition (or concatenation, for strings). You can also specify a custom two-argument function using the `func` argument. For example, `accumulate([3, 1, 4, 1, 5], max)` will yield the running maximum `3, 3, 4, 4, 5`.
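A minimal sketch comparing the default behaviour with a few custom functions (the data list is illustrative):

```python
import operator
from itertools import accumulate

data = [3, 1, 4, 1, 5]

# Default: running sums.
running = list(accumulate(data))
print(running)  # [3, 4, 8, 9, 14]

# Running maximum, using the built-in max() as the two-argument function.
running_max = list(accumulate(data, max))
print(running_max)  # [3, 3, 4, 4, 5]

# Running product via operator.mul.
running_prod = list(accumulate(data, operator.mul))
print(running_prod)  # [3, 3, 12, 12, 60]

# Strings are accumulated by concatenation.
prefixes = list(accumulate("abc"))
print(prefixes)  # ['a', 'ab', 'abc']
```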

itertools.chain()

`itertools.chain()` takes multiple iterables and chains them together to produce a single iterator.

Figure 9: Animated illustration of `itertools.chain()`

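A slight variation of this is `itertools.chain.from_iterable()`, which takes a single iterable of iterables and chains their elements together into one iterator. Hence, `chain.from_iterable(['ABC', 'DEF'])` will yield the same results as `chain('ABC', 'DEF')`, which is `'A', 'B', 'C', 'D', 'E', 'F'`.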
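A quick sketch of both forms, with illustrative inputs; a common use of `chain.from_iterable()` is flattening a list of lists by one level:

```python
from itertools import chain

# chain() takes the iterables as separate arguments, of any type.
mixed = list(chain("ABC", [1, 2], ("x",)))
print(mixed)  # ['A', 'B', 'C', 1, 2, 'x']

# chain.from_iterable() takes one iterable whose items are iterables.
nested = [[1, 2], [3], [], [4, 5]]
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]

# For the same inputs, both forms give the same result.
same = list(chain.from_iterable(["ABC", "DEF"])) == list(chain("ABC", "DEF"))
print(same)  # True
```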

itertools.compress()

`itertools.compress()` filters an iterable based on another iterable of Boolean values (known as the “selectors”). The resulting iterator only consists of the elements of the input iterable whose positions correspond to `True` values in the selectors.

Figure 10: Animated illustration of `itertools.compress()`
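A minimal sketch of `compress()`; the names and scores are made up for illustration. Any truthy or falsy values work as selectors, not just `True` and `False`:

```python
from itertools import compress

data = "ABCDEF"
selectors = [1, 0, 1, 0, 1, 1]  # truthy keeps, falsy drops

picked = list(compress(data, selectors))
print(picked)  # ['A', 'C', 'E', 'F']

# Selectors are often built from a condition on a parallel sequence.
scores = [55, 82, 47, 91]
names = ["ann", "bob", "cat", "dan"]
passed = list(compress(names, (s >= 60 for s in scores)))
print(passed)  # ['bob', 'dan']
```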

itertools.dropwhile()

In `itertools.dropwhile()`, you “drop” elements “while” a condition is `True` and “take” all elements after the condition first becomes `False`.

Figure 11: Animated illustration of `itertools.dropwhile()`

For the example shown in Figure 11:

  • 1st element: condition is `True`, so it is dropped
  • 2nd element: condition is `True`, so it is dropped
  • 3rd element: condition is `False`, so it and all elements henceforth are kept
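A minimal sketch with a made-up list and “is odd” as the condition. Note that once the condition fails, later elements are kept even if they would satisfy it again, which is what distinguishes `dropwhile()` from an ordinary filter:

```python
from itertools import dropwhile

nums = [1, 3, 5, 2, 7, 4]

# Drop while the number is odd. Once 2 fails the condition, every later
# element is kept, including the odd 7.
kept = list(dropwhile(lambda x: x % 2 == 1, nums))
print(kept)  # [2, 7, 4]

# An ordinary filter, by contrast, tests every element independently.
evens = [x for x in nums if x % 2 == 0]
print(evens)  # [2, 4]
```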

itertools.takewhile()

`itertools.takewhile()` works the opposite way: you “take” elements “while” a condition is `True` and “drop” all elements after the condition first becomes `False`.

Figure 12: Animated illustration of `itertools.takewhile()`

For the example shown in Figure 12:

  • 1st element: condition is `True`, so it is kept
  • 2nd element: condition is `True`, so it is kept
  • 3rd element: condition is `False`, so it and all elements henceforth are dropped
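The same made-up list and “is odd” condition, this time with `takewhile()`. Because it stops at the first failure, `takewhile()` can also terminate an infinite iterator such as `count()`:

```python
from itertools import count, takewhile

nums = [1, 3, 5, 2, 7, 4]

# Take while the number is odd; stop for good at the first even number (2).
taken = list(takewhile(lambda x: x % 2 == 1, nums))
print(taken)  # [1, 3, 5]

# Stop an infinite count() once n * n reaches 30.
small = list(takewhile(lambda n: n * n < 30, count(1)))
print(small)  # [1, 2, 3, 4, 5]
```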

itertools.filterfalse()

`itertools.filterfalse()`, as its name suggests, only keeps the elements of an input iterable for which the condition is `False`.

Figure 13: Animated illustration of `itertools.filterfalse()`
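A minimal sketch contrasting `filterfalse()` with the built-in `filter()` on an illustrative list; passing `None` as the predicate keeps the falsy elements:

```python
from itertools import filterfalse

nums = [1, 2, 3, 4, 5, 6]

# Keep the elements for which the predicate is False (here: the odd numbers).
odds = list(filterfalse(lambda x: x % 2 == 0, nums))
print(odds)  # [1, 3, 5]

# The built-in filter() does the opposite, keeping elements where it is True.
evens = list(filter(lambda x: x % 2 == 0, nums))
print(evens)  # [2, 4, 6]

# With None as the predicate, filterfalse() keeps the falsy elements.
falsy = list(filterfalse(None, [0, 1, "", "a", None]))
print(falsy)  # [0, '', None]
```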

itertools.starmap()

Typically, you can use the built-in `map()` to apply a function to each element of an iterable, like a list. For example, `map(abs, [-1, 2, -3])` will yield `1, 2, 3`. However, if you have an iterable of iterables, like a list of tuples, and your function needs each element of the inner iterable as a separate argument, you can use `itertools.starmap()`.

Figure 14: Animated illustration of `itertools.starmap()`
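A minimal sketch with illustrative `(base, exponent)` tuples, showing that `starmap()` unpacks each tuple into separate arguments, whereas `map()` needs a wrapper to do the same:

```python
from itertools import starmap

pairs = [(2, 3), (3, 2), (10, 2)]

# starmap() calls pow(2, 3), pow(3, 2), pow(10, 2).
powers = list(starmap(pow, pairs))
print(powers)  # [8, 9, 100]

# With map(), the function receives each tuple whole, so unpack it manually.
via_map = list(map(lambda p: pow(*p), pairs))
print(via_map)  # [8, 9, 100]
```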

If you’re interested, check out the following article, which breaks down the differences between `map()` and `starmap()`:

itertools.tee()

Given an iterable, `itertools.tee()` produces multiple independent iterators, as many as specified by its `n` argument (2 by default).

Figure 15: Animated illustration of `itertools.tee()`
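A minimal sketch of `tee()` on an illustrative iterator, showing that consuming one copy does not affect the other:

```python
from itertools import tee

source = iter([1, 2, 3, 4])

# Split one iterator into two independent ones (n defaults to 2).
a, b = tee(source)

first = list(a)   # consumes only `a`
second = list(b)  # `b` still yields every element
print(first, second)  # [1, 2, 3, 4] [1, 2, 3, 4]

# n=3 gives three independent iterators.
x, y, z = tee("AB", 3)
copies = [list(x), list(y), list(z)]
print(copies)  # [['A', 'B'], ['A', 'B'], ['A', 'B']]
```

Once `tee()` has been called, the original iterator (`source` here) should not be advanced directly, because the copies would not see those values. Also, `tee()` has to store any values that one copy has consumed but another has not, so if one copy will be exhausted before the others start, simply calling `list()` once may be cheaper.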

itertools.zip_longest()

The built-in `zip()` function takes multiple iterables as arguments and returns an iterator of tuples, each consisting of one element from every iterable. It works as expected when the input iterables are of equal length. For iterables of differing lengths, `zip()` stops as soon as the shortest one is exhausted, so the remaining elements of the longer iterables are lost. For example, `zip('ABC', 'XY')` will return `('A', 'X')` and `('B', 'Y')` only.

`itertools.zip_longest()` mitigates this limitation. It behaves the same way as `zip()`, except that it “zips” up to the longest input iterable. By default, missing values are filled with `None`, unless otherwise specified using the `fillvalue` argument.

Figure 16: Animated illustration of `itertools.zip_longest()`
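A side-by-side sketch of `zip()` and `zip_longest()` on two illustrative inputs of unequal length:

```python
from itertools import zip_longest

letters = "ABC"
numbers = [1, 2]

# zip() stops at the shortest input, so 'C' is lost.
zipped = list(zip(letters, numbers))
print(zipped)  # [('A', 1), ('B', 2)]

# zip_longest() continues to the longest input and pads with None...
longest = list(zip_longest(letters, numbers))
print(longest)  # [('A', 1), ('B', 2), ('C', None)]

# ...or with a custom fillvalue.
padded = list(zip_longest(letters, numbers, fillvalue=0))
print(padded)  # [('A', 1), ('B', 2), ('C', 0)]
```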

itertools.pairwise()

Newly introduced in Python 3.10, `itertools.pairwise()` generates successive overlapping pairs from an input iterable. This is useful if you have an iterable, such as a list or a string, and you want to iterate over it with a rolling window of two elements.

Figure 17: Animated illustration of `itertools.pairwise()`

Here’s a bonus! If you’re not using Python 3.10 (yet), you can define your own `pairwise()` function (credits: Rodrigo).

from itertools import tee

def pairwise(it):
    """Mimics `itertools.pairwise()` from Python 3.10."""
    prev_, next_ = tee(it, 2)  # Split `it` into two independent iterators.
    next(next_, None)  # Advance the second one once (no error if `it` is empty).
    yield from zip(prev_, next_)  # Yield the overlapping pairs.

print(list(pairwise("ABCD")))  # [('A', 'B'), ('B', 'C'), ('C', 'D')]

itertools.groupby()

Given an input iterable, `itertools.groupby()` returns consecutive keys, each paired with an iterator over the corresponding group of elements.

Figure 18: Animated illustration of `itertools.groupby()`

`itertools.groupby()` starts a new group every time the value of the key changes. For the example in Figure 18, it puts the single “A” (in green) in a separate group, rather than grouping all 4 “A”s together. If the desired behaviour is to group by unique elements in an iterable, the input iterable first needs to be sorted (by the same key).
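A minimal sketch of this behaviour on an illustrative string with four “A”s, one of which is separated from the rest, followed by the sorted version and a custom `key` function:

```python
from itertools import groupby

data = "AAABBCA"

# Without sorting, a new group starts whenever the key changes,
# so the final 'A' forms its own group.
runs = [(key, list(group)) for key, group in groupby(data)]
print(runs)
# [('A', ['A', 'A', 'A']), ('B', ['B', 'B']), ('C', ['C']), ('A', ['A'])]

# Sorting first puts equal elements next to each other: one group per value.
counts = [(key, len(list(group))) for key, group in groupby(sorted(data))]
print(counts)  # [('A', 4), ('B', 2), ('C', 1)]

# A key function groups by a derived value, here word length.
words = ["hi", "yo", "cat", "dog", "a"]
by_len = [(k, list(g)) for k, g in groupby(words, key=len)]
print(by_len)  # [(2, ['hi', 'yo']), (3, ['cat', 'dog']), (1, ['a'])]
```

Each group shares the underlying iterator with `groupby()` itself, so a group should be consumed (for example with `list()`) before moving on to the next key, which is why the comprehensions above convert each group immediately.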

itertools.islice()

`itertools.islice()` is an iterator that returns selected elements of an input iterable, given the `start`, `stop` and `step` arguments.

Figure 19: Animated illustration of `itertools.islice()`

You might be thinking, “The same can be done using regular index slicing!” For example, `'ABCDEFG'[2:6]` will return `'CDEF'`. Well, it turns out there are differences between `islice()` and regular index slicing:

  1. Regular index slicing supports negative values for start, stop and step, but `islice()` does not.
  2. Regular index slicing creates a new sequence (a copy), whereas `islice()` creates an iterator that iterates over the existing iterable.
  3. Because of the previous point, `islice()` is much more memory-efficient, especially for large iterables.
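A minimal sketch of these differences, using an illustrative string and a lazy generator of squares:

```python
from itertools import islice

letters = "ABCDEFG"

# islice(iterable, stop) and islice(iterable, start, stop[, step]) mirror slicing.
first_three = list(islice(letters, 3))           # like letters[:3]
middle = list(islice(letters, 2, 6))             # like letters[2:6]
every_other = list(islice(letters, 0, None, 2))  # like letters[::2]
print(first_three, middle, every_other)
# ['A', 'B', 'C'] ['C', 'D', 'E', 'F'] ['A', 'C', 'E', 'G']

# islice() also works on generators, which cannot be indexed or sliced.
# Only the first 8 squares are ever computed here.
squares = (n * n for n in range(1, 10**12))
some_squares = list(islice(squares, 5, 8))
print(some_squares)  # [36, 49, 64]

# Negative indices are rejected with a ValueError.
try:
    islice(letters, -3, None)
    negative_ok = True
except ValueError:
    negative_ok = False
print(negative_ok)  # False
```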

Conclusion

Congratulations on making it this far! That was plenty of GIFs, but I hope they have helped you gain a better appreciation of the amazing `itertools` library and that you’re on your way to writing elegant Python code!

If you’ve found this post useful, feel free to let me know in the comments. I welcome discussions, questions and constructive feedback too. Here are more related resources to further reinforce your understanding:

  1. Official documentation of `itertools`
  2. Iterables vs Iterators in Python
  3. Advanced Python: Itertools Library — The Gem Of Python Language
  4. How — and why — you should use Python Generators
