8 Python Iteration Skills That Data Scientists Shouldn’t Miss Out
One of the most useful things a program does for us is repeat a particular task automatically. In Python, this is usually achieved with the for loop, the most basic form of which is the following:
for item in iterable:
    # Your tasks go here
Theoretically, we can utilize the basic form to address all iteration-related needs, but in many cases, our code can become more concise if we take advantage of existing functionalities that Python has to offer. In this article, I’d like to review 8 useful techniques that we should consider when we conduct our data science projects.
To illustrate the usefulness of these techniques, I’ll contrast them with the code that only uses the most basic form. From these comparisons, you can see noticeable improvement in code readability.
1. Track Iteration With enumerate()
Suppose that we need to keep a count as we iterate. In other words, we want to know which iteration of the loop we are currently on. In this case, we should consider the enumerate() function.
- To get the index of each item in a sequence, the basic approach involves creating a range object, because the typical form (i.e., for item in iterable) carries no index information. Although we can look up the index with a list’s index() method, it returns the index of the first matching element. Thus, when there are duplicate items, it gives misleading results.
- The enumerate() function creates an enumerate object, which is an iterator. It takes an optional argument, start, which specifies where the counter begins. By default, counting starts from 0; passing start=1 counts the first element as 1. On each iteration, enumerate() gives us the counter and the element together.
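The comparison described above can be sketched as follows. The students list and the label format are hypothetical sample data chosen for illustration, not taken from the original article.

```python
# Hypothetical sample data; note the duplicate name.
students = ["John", "Mike", "John", "Jennifer"]

# Basic form: build the index with range() and look each item up.
basic = []
for i in range(len(students)):
    basic.append(f"#{i + 1}: {students[i]}")

# With enumerate(): the counter and the item arrive together.
# start=1 makes the counter begin at 1 instead of the default 0.
numbered = []
for count, student in enumerate(students, start=1):
    numbered.append(f"#{count}: {student}")

print(numbered)

# list.index() reports only the first match, so it cannot
# tell the two "John" entries apart.
first_john = students.index("John")  # 0, even for the second "John"
```

Both loops build the same list, but the enumerate() version avoids the manual indexing.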
2. Pair Iterables With zip()
When we have several iterables and need to retrieve the items at the same position from each of them, we should consider the zip() function.
- To get the elements at the same index, the basic approach creates the index with the range() function, as in the previous section. Retrieving the element from each iterable by index is a little tedious.
- The zip() function joins multiple iterables and, in each loop, produces a tuple that comprises the elements from each iterable at the same index. We can unpack the tuple to retrieve the elements conveniently, which makes the code much cleaner.
- Note that zip() stops at the end of the shortest iterable. If you want the zipping to match the longest iterable instead, use the zip_longest() function in the itertools module.
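A minimal sketch of the points above, using made-up student and score lists of unequal length so that the difference between zip() and zip_longest() is visible:

```python
from itertools import zip_longest

# Hypothetical sample data; scores is deliberately one item shorter.
students = ["John", "Mike", "Sam"]
scores = [95, 88]

# Basic form: index into each list, stopping at the shorter one.
by_index = []
for i in range(min(len(students), len(scores))):
    by_index.append((students[i], scores[i]))

# With zip(): each loop yields a tuple that we unpack directly.
zipped = []
for student, score in zip(students, scores):
    zipped.append((student, score))

# zip_longest() pads the shorter iterable with fillvalue.
padded = list(zip_longest(students, scores, fillvalue=None))

print(zipped)  # [('John', 95), ('Mike', 88)]
print(padded)  # [('John', 95), ('Mike', 88), ('Sam', None)]
```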
3. Reverse Iteration With reversed()
When you need to iterate over a sequence in reverse order, it’s best to use the reversed() function. Suppose that students arrive at the classroom at slightly different times, and you want to check their assignments in reverse order of arrival, so that the first student who arrived gets checked last.
- If you stick with the range() function, you’ll use negative indexing on the sequence. In other words, -1 refers to the last item of the list, -2 to the one before it, and so on.
- Alternatively, we can reverse the list with the slice [::-1] and then iterate over the newly created list object.
- The simplest way is to use the reversed() function. It is flexible, because it also accepts other sequence types, such as tuples and strings.
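The three approaches can be compared side by side in a short sketch. The arrivals list is hypothetical sample data in arrival order.

```python
# Hypothetical sample data: students in order of arrival.
arrivals = ["John", "Mike", "Jennifer"]

# Option 1: negative indexing driven by range().
by_index = []
for i in range(len(arrivals)):
    by_index.append(arrivals[-(i + 1)])

# Option 2: slice with [::-1], which builds a reversed copy first.
by_slice = []
for student in arrivals[::-1]:
    by_slice.append(student)

# Option 3: reversed() returns an iterator over the original list.
by_reversed = []
for student in reversed(arrivals):
    by_reversed.append(student)

# reversed() also works on other sequences, such as strings.
reversed_text = "".join(reversed("abc"))

print(by_reversed)    # ['Jennifer', 'Mike', 'John']
print(reversed_text)  # cba
```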
4. Filter Elements With filter()
You don’t always need to use all the items in an iterable. In these cases, we usually check whether each item satisfies particular criteria before we apply the needed operations. The condition check and the creation of the needed iterator can be combined into one function call: filter(). Let’s compare it with the typical way.
- The typical way involves evaluating each element inside the loop.
- The filter() function takes a predicate function and an iterable, and returns an iterator that yields only the elements for which the predicate is true, so it can be used directly in the for loop.
- Depending on your needs, you can consider other filter functions, such as filterfalse() in the itertools module, which does the opposite (i.e., keeps the elements for which the predicate evaluates as false).
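As a sketch of the comparison above, assume a list of scores and a passing threshold of 60. Both the data and the is_passing() helper are hypothetical names introduced for illustration.

```python
from itertools import filterfalse

# Hypothetical sample data and threshold.
scores = [95, 58, 76, 43, 88]


def is_passing(score):
    """Return True when the score meets the assumed threshold of 60."""
    return score >= 60


# Typical way: test every element inside the loop body.
passing_basic = []
for score in scores:
    if is_passing(score):
        passing_basic.append(score)

# With filter(): the loop only ever sees elements that pass.
passing = []
for score in filter(is_passing, scores):
    passing.append(score)

# filterfalse() keeps the elements for which the predicate is false.
failing = list(filterfalse(is_passing, scores))

print(passing)  # [95, 76, 88]
print(failing)  # [58, 43]
```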
5. Chain Iterables With chain()
In a previous section, we talked about working with multiple iterables using the zip() function, which you can think of as joining iterables side by side. If you want to join iterables head to tail instead, you should use the chain() function in the itertools module. Specifically, when you have multiple iterables and want to iterate over each of them in sequence, chain() is a good fit.
- The typical way involves concatenating the iterables manually, such as using an intermediate list. If you work with other iterables, such as dictionaries and sets, you need to know how to concatenate them.
- The chain() function can chain any number of iterables and make another iterator that produces elements sequentially from each of the iterables. You don’t need to manage another temporary object that holds these elements.
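A minimal sketch of the two approaches, using three hypothetical containers of different types (a list, a tuple, and a dict, whose iteration yields its keys):

```python
from itertools import chain

# Hypothetical sample data in three different container types.
group_a = ["John", "Mike"]
group_b = ("Jennifer",)
group_c = {"Zoe": 90}  # iterating a dict yields its keys

# Typical way: convert each iterable and build a temporary list.
combined = list(group_a) + list(group_b) + list(group_c)
manual = []
for name in combined:
    manual.append(name)

# With chain(): no intermediate list; items are produced lazily.
chained = []
for name in chain(group_a, group_b, group_c):
    chained.append(name)

print(chained)  # ['John', 'Mike', 'Jennifer', 'Zoe']
```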
6. Iterate Dictionaries
Dictionaries are a very common data type that stores data in the form of key-value pairs. Because of the implementation using hashes, it’s very fast to look up and retrieve items from dictionaries, and thus they’re the favorite data structure for many developers. The storage of key-value pairs gives us different options to iterate dictionaries.
- To iterate the keys, we can use the keys() method on the dictionary object. Alternatively, we can use the dictionary object itself as the iterable, which yields the same keys as the view object created by the keys() method.
- To iterate the values, we use the values() method.
- To iterate the items in the form of key-value pairs, we use the items() method.
- Notably, the objects created by these methods are dictionary view objects, which behave much like SQL views. In other words, these view objects reflect any later updates to the dict object. For example, a keys view obtained before a new key is added will include that key afterward.
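The options above can be sketched with a small, hypothetical grades dictionary. The last part shows a keys view picking up a key that was added after the view was created.

```python
# Hypothetical sample data.
grades = {"John": 95, "Mike": 88}

# Keys: iterate the dict itself, or its keys() view.
keys_direct = [name for name in grades]
keys_view = [name for name in grades.keys()]

# Values and key-value pairs.
values = [score for score in grades.values()]
pairs = [(name, score) for name, score in grades.items()]

# Views are live: they reflect later changes to the dict.
keys = grades.keys()
count_before = len(keys)
grades["Jennifer"] = 91
count_after = len(keys)

print(pairs)                      # [('John', 95), ('Mike', 88)]
print(count_before, count_after)  # 2 3
```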
7. Consider Comprehensions As Alternatives
If the purpose of the iteration is to create a new list, dictionary, or set object from the iterable, we should consider the comprehension technique, which is more concise and usually somewhat faster than the equivalent for loop.
- The list comprehension has the following format: [expr for item in iterable], which is the preferred way to create a list object compared to the for loop.
- The dictionary comprehension has the following format: {key_expr: value_expr for item in iterable}. Similarly, it’s the preferred way to create a dict object from an iterable.
- The set comprehension has the following format: {expr for item in iterable}, which is the preferred way to create a set object from an iterable compared to the for loop.
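Here is a short sketch of the three formats next to the loop they replace. The numbers list is hypothetical and contains a duplicate to show how the dict and set versions collapse it.

```python
# Hypothetical sample data with a duplicate value.
numbers = [1, 2, 3, 2]

# Basic form: build the list with a for loop and append().
squares_loop = []
for n in numbers:
    squares_loop.append(n * n)

# List comprehension: same result in one expression.
squares = [n * n for n in numbers]

# Dictionary comprehension: duplicate keys collapse to one entry.
square_by_number = {n: n * n for n in numbers}

# Set comprehension: duplicate results are stored once.
unique_squares = {n * n for n in numbers}

print(squares)           # [1, 4, 9, 4]
print(square_by_number)  # {1: 1, 2: 4, 3: 9}
```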
8. Consider the else Clause
Last but not least, consider using the else clause in the for loop. It’s not the most intuitive technique, as many people don’t even know that a for loop can take an else clause.
Contrary to what some people mistakenly think, the code in the else clause runs after the for loop whenever the loop finishes normally. However, if execution encounters a break statement, the code in the else clause is skipped. A typical use is a search loop: break when the item is found, and handle the not-found case in the else clause.
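The behavior can be sketched with a small search function. The name find_first_negative() and its return strings are hypothetical and serve only to show which branch ran.

```python
def find_first_negative(numbers):
    """Report the first negative number, or that none was found."""
    for n in numbers:
        if n < 0:
            result = f"found {n}"
            break  # leaving via break skips the else clause
    else:
        # Runs only when the loop finished without hitting break.
        result = "no negative number"
    return result


first_call = find_first_negative([1, -2, 3])  # break hit, else skipped
second_call = find_first_negative([1, 2, 3])  # no break, else runs

print(first_call)   # found -2
print(second_call)  # no negative number
```

In the first call the loop exits through break, so the else clause does not run. In the second call the loop finishes normally, so it does.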
Conclusions
In this article, we reviewed eight techniques that we can consider using in the for loop beyond its basic form. Applying these techniques can make your code more concise and, in some cases, more performant.
Thanks for reading this piece.