Friday, August 28, 2020

8 Python Iteration Skills That Data Scientists Shouldn’t Miss Out On

One major thing our programs automate for us is repeating particular tasks. This is usually done with the for loop, whose most basic form is the following:

for item in iterable:
    # Your tasks go here
    pass

In theory, the basic form can address every iteration-related need, but in many cases our code becomes more concise if we take advantage of the functionality Python already offers. In this article, I’d like to review 8 useful techniques that we should consider when we work on our data science projects.

To illustrate the usefulness of these techniques, I’ll contrast them with code that only uses the most basic form. From these comparisons, you can see a noticeable improvement in code readability.


1. Track Iteration With enumerate()

Suppose that we need to keep track of the iteration count. In other words, we want to know how many loops we have completed. In this case, we should consider the enumerate() function.

Use enumerate()
  • To get the index of each item in the sequence, the basic way involves creating a range object, because the typical form (i.e., for item in iterable) doesn’t provide any index information. Although we can find an item’s index using a list’s index() method, it returns the index of the first matching element. Thus, when there are duplicate items, it gives misleading results.
  • The enumerate() function creates an enumerate object, which is an iterator. It takes an optional start argument, which specifies the starting value of the counter and defaults to 0. In our case, we start counting the first element from 1. As you can see, the enumerate() function gives us the counter and the element directly.
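The original comparison code isn't available here, so below is a minimal sketch of the two approaches described above, using a hypothetical list of student names (with one deliberate duplicate) to show why index() can mislead.

```python
# Hypothetical data for illustration; "Jennifer" appears twice on purpose.
students = ["John", "Jennifer", "Aaron", "Jennifer"]

# Basic way: build a range object to get each item's index.
basic = []
for i in range(len(students)):
    basic.append(f"{i + 1}: {students[i]}")

# Pitfall: list.index() returns the position of the first match,
# so both "Jennifer" entries report index 1.
first_matches = []
for name in students:
    first_matches.append(students.index(name))

# Preferred way: enumerate() yields (counter, item) pairs; start=1 counts from 1.
with_enumerate = []
for number, name in enumerate(students, start=1):
    with_enumerate.append(f"{number}: {name}")

print(with_enumerate)  # ['1: John', '2: Jennifer', '3: Aaron', '4: Jennifer']
print(first_matches)   # [0, 1, 2, 1]
```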

2. Pair Iterables With zip()

When we have several iterables and need to retrieve the items at the same position from each of them, we should consider the zip() function, as shown in this example.

Use zip()
  • To get the elements at the same index, we create the index using the range() function, as we did in the previous section. It’s a little tedious to index into each iterable to retrieve its element.
  • The zip() function can join multiple iterables, and in each loop, it produces a tuple that comprises the elements from each iterable at the same index. We can unpack the tuple to retrieve the elements very conveniently. The code looks much cleaner, doesn’t it?
  • Another thing to note is that zip() stops as soon as the shortest iterable is exhausted. If you want the zipping to continue until the longest iterable is exhausted, use the zip_longest() function from the itertools module, which fills in missing values with None unless you pass a different fillvalue.
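Here is a minimal sketch of the index-based approach versus zip(), plus the shortest/longest behavior, using hypothetical names and scores.

```python
from itertools import zip_longest

# Hypothetical parallel lists for illustration.
names = ["John", "Danny", "Jennifer"]
scores = [95, 99, 100]

# Basic way: index into each list with a shared range() index.
basic = []
for i in range(len(names)):
    basic.append((names[i], scores[i]))

# Preferred way: zip() yields one tuple per position, which we unpack directly.
zipped = []
for name, score in zip(names, scores):
    zipped.append((name, score))

# zip() stops at the shortest iterable, so "Jennifer" is dropped here.
truncated = list(zip(names, [95, 99]))
# zip_longest() runs to the longest iterable and pads with fillvalue.
padded = list(zip_longest(names, [95, 99], fillvalue=0))

print(truncated)  # [('John', 95), ('Danny', 99)]
print(padded)     # [('John', 95), ('Danny', 99), ('Jennifer', 0)]
```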

3. Reverse Iteration With reversed()

When you need to iterate over a sequence in reverse order, it’s best to use the reversed() function. Suppose that students arrive at the classroom at slightly different times and you want to check their assignments in reverse order, so the first student to arrive gets checked last.

Use reversed()
  • If you stick with the range() function, you’ll use the reverse indexing of the sequence. In other words, we use -1 to refer to the last item of the list and so on.
  • Alternatively, we can reverse the list using [::-1] and then iterate over the newly created list object.
  • The best way is simply to use the reversed() function, which returns an iterator rather than building a new list. It’s also very flexible, because it accepts other sequences, such as tuples and strings.
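The three options above can be sketched as follows, assuming a hypothetical list of students in arrival order.

```python
# Hypothetical arrival order for illustration.
arrivals = ["John", "Jennifer", "Aaron", "Zack"]

# Option 1: range() with negative indices (-1 is the last item, -2 the one before).
by_index = []
for i in range(1, len(arrivals) + 1):
    by_index.append(arrivals[-i])

# Option 2: [::-1] creates a new, reversed copy of the list, then we iterate it.
by_slice = []
for student in arrivals[::-1]:
    by_slice.append(student)

# Option 3: reversed() returns an iterator over the original list, no copy needed.
by_reversed = []
for student in reversed(arrivals):
    by_reversed.append(student)

# reversed() also accepts other sequences, such as strings and tuples.
word = "".join(reversed("data"))   # 'atad'
numbers = tuple(reversed((1, 2, 3)))  # (3, 2, 1)
```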

4. Filter Elements With filter()

You don’t always need to use all the items in an iterable. In these cases, we usually check whether items satisfy particular criteria before applying the needed operations. Such condition evaluation and the creation of the needed iterator can be combined into one function call: filter(). Let’s see how it compares with the typical way.

Use filter()
  • The typical way involves evaluating each element with an if statement inside the loop.
  • The filter() function takes a function (the criterion) and an iterable, and it evaluates the elements and renders the passing ones as an iterator at the same time. In other words, it returns an iterator that yields only the elements for which the criterion returns a truthy value, so it can be used directly in the for loop.
  • Depending on your needs, you can consider other filtering functions, such as filterfalse() in the itertools module, which does the opposite (i.e., keeps the elements for which the criterion evaluates to False).
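A minimal sketch comparing the typical if-in-loop approach with filter() and filterfalse(); the scores and the passing threshold of 80 are hypothetical.

```python
from itertools import filterfalse

# Hypothetical scores for illustration.
scores = [72, 95, 88, 54, 100]

def is_passing(score):
    # Hypothetical criterion: a score of 80 or above passes.
    return score >= 80

# Typical way: evaluate each element with an if statement inside the loop.
basic = []
for score in scores:
    if is_passing(score):
        basic.append(score)

# filter() yields only the elements for which is_passing returns a truthy value.
filtered = []
for score in filter(is_passing, scores):
    filtered.append(score)

# filterfalse() does the opposite: it keeps elements for which is_passing is falsy.
failing = list(filterfalse(is_passing, scores))

print(filtered)  # [95, 88, 100]
print(failing)   # [72, 54]
```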

5. Chain Iterables With chain()

In a previous section, we talked about working with multiple iterables using the zip() function, which you can think of as combining iterables side by side (in the vertical direction). If you want to join iterables head to tail instead, you should use the chain() function from the itertools module. Specifically, when you have multiple iterables and want to iterate over each of them in sequence, chain() is the ideal tool.

Use chain()
  • The typical way involves concatenating the iterables manually, such as using an intermediate list. If you work with other iterables, such as dictionaries and sets, you need to know how to concatenate them.
  • The chain() function can chain any number of iterables and returns a new iterator that yields elements from each of the iterables in turn. You don’t need to manage a temporary object that holds these elements.
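The sketch below contrasts building an intermediate list with using chain(), assuming a few hypothetical iterables of different types (a list, a tuple, a single-element set, and a dict whose iteration yields its keys).

```python
from itertools import chain

# Hypothetical iterables of different types.
list_names = ["John", "Danny"]
tuple_names = ("Jennifer",)
set_names = {"Aaron"}
dict_scores = {"Zack": 90}  # iterating a dict yields its keys

# Typical way: convert each iterable and concatenate into an intermediate list.
combined = list_names + list(tuple_names) + list(set_names) + list(dict_scores)
basic = []
for name in combined:
    basic.append(name)

# chain() iterates over each iterable in turn, with no temporary list to manage.
chained = []
for name in chain(list_names, tuple_names, set_names, dict_scores):
    chained.append(name)

print(chained)  # ['John', 'Danny', 'Jennifer', 'Aaron', 'Zack']
```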

6. Iterate Dictionaries

Dictionaries are a very common data type that stores data in the form of key-value pairs. Because they’re implemented as hash tables, looking up and retrieving items by key is very fast, which makes them a favorite data structure for many developers. Storing key-value pairs also gives us several options for iterating over dictionaries.

Iterate Dictionaries
  • To iterate over the keys, we can use the keys() method on the dictionary object. Alternatively, we can use the dictionary object itself as the iterable, because iterating over a dictionary directly yields its keys, just as iterating over the view returned by keys() does.
  • To iterate the values, we’ll just use the values() method.
  • To iterate the items in the form of key-value pairs, we’ll use the items() method.
  • Notably, the objects returned by these methods are dictionary view objects, which are much like SQL views. In other words, these view objects get updated when the dict object is updated, and a trivial example is shown below.
Dictionary View Object
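Here is a minimal sketch of the three iteration options and the live behavior of view objects, using a hypothetical dictionary of scores.

```python
# Hypothetical data for illustration.
scores = {"John": 95, "Danny": 99}

# Keys: use keys(), or iterate over the dict itself (which also yields keys).
for name in scores.keys():
    print(name)
for name in scores:
    print(name)

# Values: use values().
for score in scores.values():
    print(score)

# Key-value pairs: use items() and unpack each pair.
for name, score in scores.items():
    print(f"{name}: {score}")

# Views are live: they reflect changes made to the dict after they were created.
keys_view = scores.keys()
items_view = scores.items()
print(items_view)  # dict_items([('John', 95), ('Danny', 99)])
scores["Jennifer"] = 100
print(items_view)  # dict_items([('John', 95), ('Danny', 99), ('Jennifer', 100)])
```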

7. Consider Comprehensions As Alternatives

If the purpose of the iteration is to create a new list, dictionary, or set object from the iterable, we should consider comprehensions, which are more concise and usually faster than building the same object with a for loop.

Comprehensions
  • The list comprehension has the following format: [expr for item in iterable], which is the preferred way to create a list object compared to the for loop.
  • The dictionary comprehension has the following format: {key_expr: value_expr for item in iterable}. Similarly, it’s the preferred way to create a dict object from an iterable.
  • The set comprehension has the following format: {expr for item in iterable}, which is the preferred way to create a set object from an iterable compared to the for loop.
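The three formats above can be sketched side by side with a hypothetical list of names; the first pair shows the for loop that a list comprehension replaces.

```python
# Hypothetical data for illustration; "Danny" appears twice on purpose.
names = ["John", "Danny", "Jennifer", "Danny"]

# for loop version, for comparison.
lengths_loop = []
for name in names:
    lengths_loop.append(len(name))

# List comprehension: [expr for item in iterable]
lengths = [len(name) for name in names]

# Dict comprehension: {key_expr: value_expr for item in iterable}
# (the duplicate key "Danny" simply overwrites itself)
name_lengths = {name: len(name) for name in names}

# Set comprehension: {expr for item in iterable}; duplicates collapse
initials = {name[0] for name in names}

print(lengths)       # [4, 5, 8, 5]
print(name_lengths)  # {'John': 4, 'Danny': 5, 'Jennifer': 8}
```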

8. Consider the else Clause

Last but not least, consider using the else clause with the for loop. It’s not the most intuitive technique, as many people don’t even know that a for loop can have an else clause. The following case shows a trivial example.

Else Clause in For Loop
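A minimal sketch, assuming a hypothetical helper that looks for the first failing score; its first call hits break, so the else clause is skipped, while its second call finishes the loop normally and runs the else clause.

```python
def find_first_failing(scores, threshold=60):
    """Hypothetical helper: report the first score below threshold, if any."""
    for score in scores:
        if score < threshold:
            result = f"Found failing score: {score}"
            break  # leaving via break skips the else clause
    else:
        # Runs only when the loop finished without hitting break.
        result = "All scores passed"
    return result

print(find_first_failing([72, 45, 90]))  # Found failing score: 45
print(find_first_failing([72, 85, 90]))  # All scores passed
```

Note that an empty iterable also runs the else clause, because the loop ends without ever reaching break.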

Contrary to what some people mistakenly think, the code in the else clause runs after the for loop finishes normally, that is, when the loop exhausts its iterable. However, if the loop exits through a break statement, the code in the else clause is skipped. As shown in the first function call, the else clause didn’t execute.


Conclusions

In this article, we reviewed eight techniques that we can consider using in the for loop beyond its basic form. Applying these techniques can make your code much more concise and, in many cases, more performant.

Thanks for reading this piece.
