Data Structures in Python

Get proficient with lists, Python’s most versatile data structure

This article is about lists, the most versatile built-in data structure in Python. A single list can hold heterogeneous data (integers, floats, strings, NaN, Booleans, functions, etc.) at the same time. Lists are ordered sequences, which means the order of the elements is preserved when you access them. They are mutable, i.e., you can add, delete, or modify any item in the list. Unlike “sets”, another data structure in Python, they can hold duplicate items.

After reading this article, you will have a clear understanding of Python lists and be able to work with them confidently.

I will cover the following topics:

  • Creating a list and adding elements
  • Accessing list elements
  • Removing list elements
  • Inserting elements
  • List arithmetic
  • Reversing a list
  • Sorting a list
  • Index of an item
  • Counting item frequency in a list
  • List comprehensions
  • Copying a list
  • Nested lists

1) Creating a list and adding elements

First, we initialize an empty list called “data”. The list is created using square brackets. Naturally, the length of an empty list is zero.

data = []
len(data)
>>> 0
# Check the type of the variable 'data'
type(data)
>>> list

Now let’s add our very first element to this list. This is done using the append() function. You will notice that its length now becomes one.

data.append(100)
data
>>> [100]
len(data)
>>> 1

Let’s add a second element. It will be appended (added) at the end of the list. Similarly, you can append as many elements as you want.

data.append(200)
data
>>> [100, 200]
len(data)
>>> 2
data.append(300)
data
>>> [100, 200, 300]

The append function is useful when you do not know beforehand how many elements will be in your list. For example, to store the number of people entering a shop every hour, you need to append the number of customers on an hourly basis. However, if you just hosted an exam, you know exactly how many students wrote the exam. Now, if you want to store their grades in a list, instead of appending, you can just initialize your list altogether.

grades = [70, 100, 97, 67, 85]
len(grades)
>>> 5

Don’t worry! You can still add more elements to your already initialized list using append. Simply use grades.append(80) to add the grade of a sixth student afterward and it will be appended at the end of the list.

How to add the marks of two or more students at once?

Suppose you want to append marks of three students simultaneously. You cannot use grades.append(99, 100, 95) because “append” takes exactly one argument. You will have to use the “append” function three times.

Rather than appending three times, you can use extend() in such cases. You need to put the three elements in a tuple form (an iterable).

Note that you cannot use extend() to append a single element by writing grades.extend((90)). This won't work because (90) is just the integer 90, not a tuple, and extend() needs an iterable.

grades = [70, 100, 97, 67, 85]
grades.extend((99, 100, 95))
print (grades)
>>> [70, 100, 97, 67, 85, 99, 100, 95]
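
As a small sketch of the note about single elements: extend() accepts any iterable, so one grade can still be added by wrapping it in a one-element tuple (note the trailing comma) or in a list. The grades 90 and 88 below are made up for illustration.

```python
grades = [70, 100, 97, 67, 85]

# (90) is just the integer 90, so extend() raises a TypeError
try:
    grades.extend((90))
except TypeError as err:
    print("TypeError:", err)

# A one-element tuple needs a trailing comma; a one-element list also works
grades.extend((90,))
grades.extend([88])
print(grades)  # [70, 100, 97, 67, 85, 90, 88]
```

For a single element, append() is usually the more readable choice. This sketch only shows why the parentheses alone are not enough.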

Now you will ask, “Why can’t we append three grades at once?”

You can, but there is a catch. As shown below, the three grades inserted together show up as a list inside the main list. Such lists are called “Nested Lists”. I will show more examples in the last section of this article.

grades = [70, 100, 97, 67, 85]
grades.append([99, 100, 95])
print (grades)
>>> [70, 100, 97, 67, 85, [99, 100, 95]] # A nested list

2) Accessing list elements

If you are working with data structures, it is very useful to know the concept of indexing. You can think of indexing as a serial number assigned to each element of the list. Simply put, it is similar to your roll numbers in a class.

The most important thing to know is that indexing in Python starts at 0.

So, the first element will have an index of 0, the second element will have an index of 1, and so on. In a list of five elements, the last (fifth) element will have an index value of 4.

grades = [70, 100, 97, 67, 85]
# First element (index 0)
grades[0]
>>> 70
# Second element (index 1)
grades[1]
>>> 100
# Last element (index 4)
grades[4]
>>> 85

Do not cross the limits. If you try to use an index value that is equal to or greater than the length of the list, you will get an IndexError. So, in a list of 5 elements, you cannot use an index of 5 (since it would refer to a sixth element).

grades[5]
-----------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-29-d8836f1h2p9> in <module>
----> 1 grades[5]
IndexError: list index out of range
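
If an index might fall outside the list, one possible way to avoid a crash is to catch the IndexError. The sketch below uses a hypothetical helper called get_grade (not a built-in; the name is my own) that returns None for an index that is out of range.

```python
grades = [70, 100, 97, 67, 85]

def get_grade(values, index):
    """Return values[index], or None when the index is out of range."""
    try:
        return values[index]
    except IndexError:
        return None

print(get_grade(grades, 4))  # 85
print(get_grade(grades, 5))  # None
```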

Accessing multiple elements of the list

If you want the first three elements, you can access them using slicing. The general format is list[start_index:end_index]. The tricky part here is that this notation returns the values only up to index end_index - 1. So, to get the first three elements of the list, which have indices 0, 1, and 2, you need to write the following.

grades = [70, 100, 97, 67, 85]
grades[0:3]
>>> [70, 100, 97]

If you want to only access the first element having index 0, you can get it using the slicing notation.

grades[0:1]
>>> [70]

Reverse Indexing

It is also time to learn the helpful concept of what I call “Reverse indexing” or “Negative indexing”.

To access the last element, you first need to know the length of the list. So, in a list of 5 elements, you need to use index 4 (= 5 - 1 because indexing starts at 0). So, grades[4] will be your last element. Similarly, the second last element will be grades[3], and so on.

As you can see, this computation is cumbersome. Negative indexing will help you here. Simply count the number of elements from the end. The last element cannot be at index -0 because -0 is the same as 0, which already refers to the first element. Hence, you need to access it using the index value of -1. Similarly, the second last element can be accessed using the index value of -2, and so on.

grades = [70, 100, 97, 67, 85]
grades[-1]
>>> 85
grades[-2]
>>> 67

“Can the negative indices be used to access the last 3 elements?”.

Yes. You need to specify the starting negative indexing. To get all the elements from the third last element onwards, you don’t need to specify the end index.

grades = [70, 100, 97, 67, 85]
grades[-3:]
>>> [97, 67, 85]

However, suppose you want to get the third last and the second last element but not the last element. You can restrict the end index as:

grades[-3:-1]
>>> [97, 67]
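
To connect negative indexing with the length-based computation described earlier, here is a minimal sketch. It checks that grades[-k] refers to the same element as grades[len(grades) - k], and that the same holds for the start of a slice. The variable names n, last_three, and same_slice are my own.

```python
grades = [70, 100, 97, 67, 85]
n = len(grades)

# grades[-k] refers to the same element as grades[n - k]
for k in range(1, n + 1):
    assert grades[-k] == grades[n - k]

last_three = grades[-3:]
same_slice = grades[n - 3:]
print(last_three, same_slice)  # [97, 67, 85] [97, 67, 85]
```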

2.1) Accessing elements at regular intervals

So far you learned how to access either a single element or several consecutive elements. Suppose you want to get every n-th item from the list. The general syntax to do so is list[start_index : stop_index : step].

Example: If you want to access every second element starting from the first element until the seventh element, i.e., from [1, 2, 3, 4, 5, 6, 7], you need [1, 3, 5, 7].

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data[0:7:2]
>>> [1, 3, 5, 7]

If you want every second element from the whole list starting from the first element, you can skip both, the start_index and the stop_index.

data[::2]
>>> [1, 3, 5, 7, 9]

If you want every second element from the whole list starting from the second element, use the following code.

data[1::2]
>>> [2, 4, 6, 8, 10]

Note: You can also access the whole list using data[::1] because this will return every element from the start until the end.

Traversing a list backward at regular interval
You need to use a negative step value. To get every element starting from the end i.e., the reverse of the whole list, use the following code.

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data[::-1]
>>> [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

To access every second element starting from the last, use the following code.

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data[::-2]
>>> [10, 8, 6, 4, 2]

Let’s look at a more complex reverse traversing example. Suppose you want to start from the third last element, go backward toward the fourth element, and select every second element. Your start index now becomes -3 and the stop index becomes 3 (the index of the fourth element, since indexing starts at 0). Because the stop index itself is excluded, the fourth element is not part of the result.

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data[-3:3:-2]
>>> [8, 6]
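
If a slice with negative values is hard to follow, one way to inspect it is Python's built-in slice object and its indices() method, which converts the slice into plain non-negative start, stop, and step values for a given list length. The sketch below applies it to the example above. The variable names s and visited are my own.

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# slice(start, stop, step) is the object behind the data[start:stop:step] notation
s = slice(-3, 3, -2)

# indices() resolves the slice for a list of this length
start, stop, step = s.indices(len(data))
print(start, stop, step)  # 7 3 -2

# The positions that are actually visited (the stop index 3 is excluded)
visited = list(range(start, stop, step))
print(visited)                     # [7, 5]
print([data[i] for i in visited])  # [8, 6]
print(data[s])                     # [8, 6]
```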

3) Removing list elements

There are three ways to remove elements from a list. All three methods perform the in-place deletion i.e., after the deletion, you do not need to reassign the list to a new variable.

a) del (built-in statement): can delete multiple items at once
b) remove() (list’s method): can delete one item at a time
c) pop() (list’s method): can delete one item at a time

Let’s study them one by one.

a) del

While using del, you need to pass the index or a slice of indices of the elements to delete. You can use all the above-introduced concepts of indexing/slicing to delete the elements using del.

# Deleting first element
data = [79, 65, 100, 85, 94]
del data[0]
print (data)
>>> [65, 100, 85, 94]
##############################################################
# Deleting second last element
data = [79, 65, 100, 85, 94]
del data[-2]
print (data)
>>> [79, 65, 100, 94]
#############################################################
# Deleting multiple consecutive elements using slice
data = [79, 65, 100, 85, 94]
del data[0:3]
print (data)
>>> [85, 94]
##############################################################
# Deleting multiple elements at regular interval using slice
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
del data[1::2]
data
# [1, 3, 5, 7, 9]

b) remove()

This method is used to remove a specific element from a list. If an element appears more than once, then only the first occurrence of this element will be deleted. For example, in the list below, 1 appears 3 times. So using remove(1) would remove the very first value and keep the rest.

data = [1, 1, 4, 4, 3, 1, 3, 2, 4, 2]
data.remove(1)
print (data)
>>> [1, 4, 4, 3, 1, 3, 2, 4, 2]

BONUS: You can use a while loop to remove all the occurrences of 1.

data = [1, 1, 4, 4, 3, 1, 3, 2, 4, 2]
to_del = 1
while to_del in data:
    data.remove(to_del)

print (data)
>>> [4, 4, 3, 3, 2, 4, 2]
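
As an alternative to the while loop, which scans the list again on every pass, you could build a filtered list in a single pass. This is only a sketch of one possible approach. It uses a list comprehension (covered in section 10), and the name filtered is my own.

```python
data = [1, 1, 4, 4, 3, 1, 3, 2, 4, 2]
to_del = 1

# Build a new list that keeps every item except to_del
filtered = [item for item in data if item != to_del]
print(filtered)  # [4, 4, 3, 3, 2, 4, 2]

# Slice assignment writes the result back into the existing list object
data[:] = filtered
print(data)  # [4, 4, 3, 3, 2, 4, 2]
```

The slice assignment in the last step matters only if other variables refer to the same list. Otherwise, data = filtered would do.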

c) pop( )

The syntax of this method is list.pop(i), which pops (deletes) the element at index ‘i’ from the list and returns it. The following code demonstrates how it works when applied successively to a list.

Note: If you don’t specify an index, the last element will be removed.

# Using pop(i) by specifying the index i
data = [1, 2, 3, 4, 5]
data.pop(0)
print (data)
>>> [2, 3, 4, 5]
data.pop(1)
print (data)
>>> [2, 4, 5]
data.pop(1)
print (data)
>>> [2, 5]
##############################################################
# Using pop() without specifying the index i
data = [1, 2, 3, 4, 5]
data.pop()
print (data)
>>> [1, 2, 3, 4]
data.pop()
print (data)
>>> [1, 2, 3]
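
Unlike del and remove(), pop() also hands back the element it removes, so you can store it in a variable. Here is a short sketch. The names last, first, and empty are my own.

```python
data = [1, 2, 3, 4, 5]

last = data.pop()    # removes and returns the last element
first = data.pop(0)  # removes and returns the element at index 0
print(last, first)   # 5 1
print(data)          # [2, 3, 4]

# pop() on an empty list raises an IndexError
empty = []
try:
    empty.pop()
except IndexError as err:
    print("IndexError:", err)
```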

4) Inserting elements

An element can be inserted at a specified location using the function list.insert(i, element). Here ‘i’ is the index of the existing element in the list before which you want to insert the element. As you saw earlier, the append() function inserts the element at the end of the list.

Note: This is an in-place operation so you don’t have to re-assign the list.

# Inserting value of 4 at the start, before the element at index 0
data = [1, 2, 3]
data.insert(0, 4)
print (data)
>>> [4, 1, 2, 3]
##############################################################
# Inserting value of 4 before the element at index 1
data = [1, 2, 3]
data.insert(1, 4)
print (data)
>>> [1, 4, 2, 3]

If you want to insert at the end of the list, i.e., kind of append, then simply use the length of the list as the place to insert.

data = [1, 2, 3]
data.insert(len(data), 4)
print (data)
>>> [1, 2, 3, 4]
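
Two further cases of insert() are worth a quick sketch: a negative index counts from the end, and an index beyond the end of the list does not raise an error but simply appends. The variable names before_last and past_end are my own.

```python
# A negative index inserts before the element it refers to
before_last = [1, 2, 3]
before_last.insert(-1, 4)  # before the last element
print(before_last)  # [1, 2, 4, 3]

# An index larger than the length behaves like append()
past_end = [1, 2, 3]
past_end.insert(100, 4)
print(past_end)  # [1, 2, 3, 4]
```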

5) List arithmetic

What happens when you add two or more lists? Suppose you have the following two lists.

list_A = [1, 2, 3, 4, 5]
list_B = [6, 7, 8, 9, 10]

If you add them, you would expect an element-wise addition of the two lists. However, you will get a single list with the elements of both the lists appended (concatenated) in the order of addition.

list_A + list_B 
>>> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

The order is important. For lists, A + B is not equal to B + A. So, if you reverse the order of addition, you will get a different result.

list_B + list_A
>>> [6, 7, 8, 9, 10, 1, 2, 3, 4, 5]

Can you subtract, multiply, or divide two lists?

No, you can’t. For example, if you try to subtract the above two lists, you will get a TypeError. Similarly, an error will be thrown for multiplication or division. Try it out to convince yourself.

Multiplying a list by an integer is allowed, though. If you multiply a list by a positive integer, the list will be repeated (replicated) that many times. For example, multiplying a list by 3 repeats the list 3 times. Multiplication by a float (3.0) yields an error. Multiplying the list by either 0 or a negative integer yields an empty list.

# Multiplication by positive integer
data = [1, 2, 3]
data * 3 # equivalent to data + data + data
>>> [1, 2, 3, 1, 2, 3, 1, 2, 3]
# Multiplication by 0
data * 0
>>> []

Note: An error will be reported if you try to multiply by [3] instead of 3.
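
If what you actually want is element-wise arithmetic, one possible approach in plain Python is to pair up the elements with the built-in zip() and combine them in a list comprehension (see section 10). This is a sketch rather than the only way. The names sums and diffs are my own.

```python
list_A = [1, 2, 3, 4, 5]
list_B = [6, 7, 8, 9, 10]

# zip() pairs the elements position by position
sums = [a + b for a, b in zip(list_A, list_B)]
diffs = [b - a for a, b in zip(list_A, list_B)]
print(sums)   # [7, 9, 11, 13, 15]
print(diffs)  # [5, 5, 5, 5, 5]

# Subtracting the lists directly raises a TypeError
try:
    list_B - list_A
except TypeError as err:
    print("TypeError:", err)
```

Note that zip() stops at the shorter list, so lists of unequal length are truncated silently.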


6) Reversing a list

There are two ways to reverse a list.

a) Using slicing as [::-1]. This will not change the list in-place. You will have to re-assign the list to reflect the changes in the original list.

data = [1, 2, 3]
data[::-1]
>>> [3, 2, 1]
print (data)
>>> [1, 2, 3] # The original list does not change
# You have to re-assign the list after reversing
data = [1, 2, 3]
data = data[::-1] # Re-assign
print (data)
>>> [3, 2, 1]

b) Using the list.reverse() function. Here, you do not need to reassign as the list is reversed in-place.

data = [1, 2, 3]
data.reverse() # reverses the list in-place
print (data)
>>> [3, 2, 1] # The original list has changed

7) Sorting a list

There are two direct ways to sort a list.

a) Using the sorted() function. This will not sort the list in-place.
b) Using the list.sort() function. This performs in-place sorting.

In both the functions, you can choose to sort in ascending or descending order by using the keyword “reverse”. If “reverse=True”, the list is sorted in descending order. By default, the list is sorted in ascending order.

# First method using sorted()
data = [7, 4, 1, 3, 8, 5, 9, 6, 2]
sorted(data, reverse=False) # Same as sorted(data) due to default
>>> [1, 2, 3, 4, 5, 6, 7, 8, 9]
##############################################################
# Second method using list.sort()
data = [7, 4, 1, 3, 8, 5, 9, 6, 2]
data.sort(reverse=True)
print (data)
>>> [9, 8, 7, 6, 5, 4, 3, 2, 1]

How are the strings sorted?

  • Strings are compared character by character, so strings of different lengths are still sorted alphabetically by default.
  • Uppercase letters sort before lowercase letters, so if two strings start with the same letter in different cases, the uppercase one comes first.
  • If two or more strings have the same first letter in the same case, they are sorted based on the second letter, and so on.

Let us look at some string examples using the second method.

data = ['pineapple', 'kiwi', 'apple', 'azure', 'Apricot', 'mango']
data.sort() # in-place sorting
print (data)
>>> ['Apricot', 'apple', 'azure', 'kiwi', 'mango', 'pineapple']
#############################################################
data = ['pineapple', 'kiwi', 'apple', 'Apricot', 'mango']
data.sort(reverse=True)
print (data)
>>> ['pineapple', 'mango', 'kiwi', 'apple', 'Apricot']

How to sort the strings based on their lengths?

You need to use the keyword key=len. If you want a descending order of lengths, use an additional keyword reverse=True.

data = ['pineapple', 'kiwi', 'apple', 'Apricot', 'orange', 'mango']
data.sort(key=len, reverse=True)
print (data)
>>> ['pineapple', 'Apricot', 'orange', 'apple', 'mango', 'kiwi']
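
The key keyword accepts any function of one argument, not just len. As a sketch, passing key=str.lower compares the lowercase versions of the strings, which gives a case-insensitive sort in which 'Apricot' no longer jumps ahead of 'apple'. The name case_insensitive is my own.

```python
data = ['pineapple', 'kiwi', 'apple', 'azure', 'Apricot', 'mango']

# Default sort: uppercase letters come before lowercase ones
print(sorted(data))
# ['Apricot', 'apple', 'azure', 'kiwi', 'mango', 'pineapple']

# Case-insensitive sort: compare the lowercase form of each string
case_insensitive = sorted(data, key=str.lower)
print(case_insensitive)
# ['apple', 'Apricot', 'azure', 'kiwi', 'mango', 'pineapple']
```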

8) Index of an item

If you want to get the index of an item in a given list, you can do so using the command list.index(item). It searches for the item in the whole list. If the same item appears more than once, you will get only the index of its first occurrence.

grades = [70, 100, 97, 70, 85]
grades.index(100)
>>> 1
grades.index(70) # 70 appears twice at indices 0 and 3
>>> 0 # Only the first index returns

Suppose your list is quite large and you want to search for an element only in a particular subset of the list. You can specify the “start” and the “end” index for the subset.

grades = [70, 100, 97, 70, 85, 100, 400, 200, 32]
# Search in the whole list
grades.index(100)
>>> 1
# Search in the partial list from index 3 until index 8
grades.index(100, 3, 8)
>>> 5 # Now the index of the second 100 is returned
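
One thing to keep in mind is that index() raises a ValueError when the item is not in the list (or not in the searched range). A possible way to handle this is sketched below with a hypothetical helper called find_index (my own name, not a list method) that returns -1 instead.

```python
grades = [70, 100, 97, 70, 85]

def find_index(values, item):
    """Return the index of the first occurrence of item, or -1 if absent."""
    try:
        return values.index(item)
    except ValueError:
        return -1

print(find_index(grades, 97))  # 2
print(find_index(grades, 50))  # -1
```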

9) Counting item frequency in a list

You can count the frequency of a given item in a list using list.count(item). Let us consider the following example.

data = [6, 4, 1, 4, 4, 3, 4, 8, 5, 4, 6, 2, 6]
data.count(4)
>>> 5

Example usage of the function count()

Suppose you want to count and print the frequency of all the elements. For this, we first need the unique items in the list. I will use NumPy’s unique().

import numpy as np
data = [1, 1, 4, 4, 3, 1, 3, 2, 4, 2]
for item in np.unique(data):
    print("{} occurs {} times in the list".format(item, data.count(item)))
>>> 1 occurs 3 times in the list
>>> 2 occurs 2 times in the list
>>> 3 occurs 2 times in the list
>>> 4 occurs 3 times in the list
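
If you would rather not depend on NumPy for this, the standard library's collections.Counter can produce the same frequencies in one pass. This is an alternative sketch, not the approach used above. The name counts is my own.

```python
from collections import Counter

data = [1, 1, 4, 4, 3, 1, 3, 2, 4, 2]

# Counter maps each distinct item to the number of times it occurs
counts = Counter(data)

for item in sorted(counts):
    print("{} occurs {} times in the list".format(item, counts[item]))
```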

10) List comprehensions

Suppose you want to compute the cube of numbers from 0 to 5 and store them in a list. You first need to initialize an empty list, create a for loop, and then append the cubes of individual numbers to this list.

cubes = []
for i in range(6):
    cubes.append(i**3)

print (cubes)
>>> [0, 1, 8, 27, 64, 125]

The above code is too much for such a simple task, right? Well, you can simplify things using “List Comprehensions” as shown below.

cubes = [i**3 for i in range(6)]
print (cubes)
>>> [0, 1, 8, 27, 64, 125]
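
A list comprehension can also filter items by adding an if clause at the end. As a small sketch, the following keeps only the cubes of the even numbers from 0 to 5. The name even_cubes is my own.

```python
# Cube only the even numbers in range(6), i.e., 0, 2, and 4
even_cubes = [i**3 for i in range(6) if i % 2 == 0]
print(even_cubes)  # [0, 8, 64]
```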

11) Copying a list

Suppose you have a list called ‘list_A’ and you assign this list to another list called ‘list_B’. If you delete an element from ‘list_A’, you would expect that ‘list_B’ will not change. The code below shows that this is not the case. Deleting an element from ‘list_A’ also removed it from ‘list_B’.

list_A = [1, 2, 3, 4, 5]
list_B = list_A
del list_A[0] # Delete an element from list_A
print (list_A, list_B)
# [2, 3, 4, 5] [2, 3, 4, 5]

Why does list_B get affected?

It is because when you write list_B = list_A, you do not create a new list; ‘list_B’ becomes just another reference to the same list object as ‘list_A’. Hence, the changes in ‘list_A’ will also be reflected in ‘list_B’.

How to avoid changes in list_B?

The answer is to create a shallow copy. I will explain two ways to do it.

a) Using list.copy()
b) Using list[:]

The following example shows that now deleting an element from ‘list_A’ does not affect the shallow copy i.e., the ‘list_B’.

# First method using list.copy()
list_A = [1, 2, 3, 4, 5]
list_B = list_A.copy()
del list_A[0] # Delete an element from list_A
print (list_A, list_B)
# [2, 3, 4, 5] [1, 2, 3, 4, 5]
#############################################################
# Second method using list[:]
list_A = [1, 2, 3, 4, 5]
list_B = list_A[:]
del list_A[0] # Delete an element from list_A
print (list_A, list_B)
# [2, 3, 4, 5] [1, 2, 3, 4, 5]
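
One caveat: a shallow copy duplicates only the outer list. If the list contains sublists (see section 12), the copy still shares them with the original. The sketch below contrasts this with copy.deepcopy() from the standard library, which also copies the inner lists. The names shallow and deep are my own.

```python
import copy

list_A = [1, 2, [3, 4]]
shallow = list_A.copy()
deep = copy.deepcopy(list_A)

list_A[2].append(5)  # modify the inner list
list_A[0] = 100      # replace a top-level element

print(list_A)   # [100, 2, [3, 4, 5]]
print(shallow)  # [1, 2, [3, 4, 5]]  (the inner list is shared)
print(deep)     # [1, 2, [3, 4]]     (fully independent)
```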

12) Nested lists

A list that contains another sublist as an element is called a nested list. The element sublist can contain further sublists. The elements inside the sublists can also be accessed using indexing and slicing. Let us consider the following example.

data = [1, 2, 3, [4, 5, 6]]
data[2]   # Single element
>>> 3
data[3] # Sublist
>>> [4, 5, 6]

Now the question is, “How to access the elements of the sublist?”. You can access them using double indices. For example, data[3] returns the sublist. So the first element of this sublist can be accessed using data[3][0].

data[3][0]
>>> 4
data[3][1]
>>> 5
data[3][2]
>>> 6

Now consider the following list that has a sublist within the sublist. The length of the list is 4 where the first 3 elements are 1, 2, and 3, and the last element is [4, 5, 6, [7, 8, 9]]. The length of the last element, which is a sublist, is 4. This way, you keep going deeper and deeper into the nested lists.

To access the elements of the sublists, you need to use double indices, triple indices, etc. as exemplified below.

data = [1, 2, 3, [4, 5, 6, [7, 8, 9]]] # A nested list
# Length of the list
len(data)
>>> 4
# The last element
data[3]
>>> [4, 5, 6, [7, 8, 9]]
# Length of the last element
len(data[3])
>>> 4
#############################################################
# Accessing the elements of the first sublist
data[3][1] # Double indices
>>> 5
data[3][3] # Double indices
>>> [7, 8, 9]
#############################################################
# Accessing the elements of the second sublist
data[3][3][0] # Triple indices
>>> 7
data[3][3][-1] # Triple indices
>>> 9
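
When the nesting depth is not known in advance, chaining indices by hand becomes impractical. One possible approach is a small recursive function that walks through every level. The sketch below uses a hypothetical helper called flatten (my own, not a built-in) that collects all the non-list items into a single flat list.

```python
def flatten(nested):
    """Return a flat list with all non-list items of nested, in order."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            # Recurse into the sublist and add its items one by one
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

data = [1, 2, 3, [4, 5, 6, [7, 8, 9]]]
print(flatten(data))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```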

This brings me to the end of this article. I covered the majority of the operations related to lists, and you should now be more familiar with lists in Python. If you are interested in learning about the new features in the upcoming version 3.10 of Python and in Matplotlib 3.0, refer to the following posts.
