Lists vs. Arrays
We’re all familiar with the standard Python list — a mutable object that has great flexibility in that not all elements of the list need to be of a homogeneous data type. That is, you can have a list containing integers, strings, floats, and even other objects.
my_list = [2, {'dog': ['Rex', 3]}, 'John', 3.14]
The above is a perfectly valid list containing multiple data types as elements — even a dictionary which contains another list!
However, to support mixed data types, each list element must carry its own type information. The list itself only stores pointers, and each pointer refers to a full Python object. This overhead makes lists increasingly costly to store and process as they grow larger.
>>> for element in my_list:
...     print(type(element))
...
<class 'int'>
<class 'dict'>
<class 'str'>
<class 'float'>
With an array, we give up the flexibility of lists and instead get a multidimensional table of elements that all share the same data type (generally numeric, such as integers or floats).
This allows for much more efficient storage and manipulation of large data sets.
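As a rough illustration of the storage difference, the sketch below compares the memory used by a list of 1,000 integers with an array holding the same values. The exact list figures are an assumption about CPython on a 64-bit machine and will vary by interpreter and version; the variable names are purely illustrative.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores one pointer per element, and each pointer refers to a
# separate int object that carries its own header.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores the raw values back to back: 8 bytes per int64.
array_bytes = np_array.nbytes

print("list :", list_bytes, "bytes (approximate)")
print("array:", array_bytes, "bytes")
```

The array figure is exactly n * 8 bytes for the data buffer, while the list total includes both the pointer table and every individual int object.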
Properties of Arrays
Each dimension of a NumPy array is known as an axis. For example, if we declare an array as follows, we get an array with 2 axes:
>>> import numpy as np
>>> a = np.arange(10).reshape(2,5)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
The above code uses the arange() function to create an array of the integers 0 through 9 and reshapes it into a 2 x 5 array. arange() is simply the array version of the standard range() function.
You can inspect the properties of your array using the built-in array attributes shape, ndim, dtype, itemsize, and size.
>>> # Returns a tuple of the size of each dimension
>>> a.shape
(2, 5)
>>> # Returns the number of axes of the array
>>> a.ndim
2
>>> # Returns the description of the data type of the array
>>> a.dtype
dtype('int32')
>>> # Returns the byte size of each element in the array
>>> a.itemsize
4
>>> # Returns the total number of elements in the array
>>> a.size
10
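These attributes are related to one another, which the short sketch below tries to make explicit. It requests dtype=np.int32 explicitly, because the default integer type depends on your platform and NumPy version; nbytes is an additional array attribute not used elsewhere in this article.

```python
import numpy as np

# Ask for int32 explicitly so itemsize is 4 on every platform.
a = np.arange(10, dtype=np.int32).reshape(2, 5)

# ndim is simply the length of the shape tuple.
ndim_from_shape = len(a.shape)

# size is the product of the entries in shape.
size_from_shape = int(np.prod(a.shape))

# Total data bytes = number of elements * bytes per element.
bytes_from_parts = a.size * a.itemsize

print(a.shape, a.ndim, a.size, a.itemsize, a.nbytes)  # (2, 5) 2 10 4 40
```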
Creating Arrays
There is a whole host of ways you can create NumPy arrays from scratch — it will generally depend on your application as to which method to use, but some of the more common techniques are outlined below.
Note: the dtype parameter is optional. Use it when you want to specify the type explicitly; otherwise NumPy falls back to a default (float64 for functions such as zeros()) or to a type inferred from the data you pass at creation time.
>>> # Create an array of specified size filled with 0's
>>> np.zeros((3,3), dtype=int)
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])
>>> # Create an array of specified size filled with the given value
>>> np.full((4,2), 1.23)
array([[1.23, 1.23],
       [1.23, 1.23],
       [1.23, 1.23],
       [1.23, 1.23]])
>>> # Create a linear array with values from the arange() function
>>> np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> # Create an array of N evenly spaced values between x and y (inclusive)
>>> # General form: np.linspace(x, y, N)
>>> np.linspace(0, 1, 10)
array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])
>>> # Create an array of random values between 0 and 1
>>> np.random.random((2,2))
array([[0.90416154, 0.56633881],
       [0.09384551, 0.23539769]])
>>> # Create an array of random integers between a given range
>>> np.random.randint(0, 5, (2,2))
array([[4, 3],
       [3, 3]])
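To see how the optional dtype parameter behaves, here is a small sketch using np.array(), which builds an array from an existing Python sequence. The variable names are illustrative only, and the integer width you get by default depends on your platform.

```python
import numpy as np

ints = np.array([1, 2, 3])                  # all ints -> an integer dtype
mixed = np.array([1, 2.5, 3])               # int and float mixed -> float64
forced = np.array([1, 2, 3], dtype=float)   # explicit dtype overrides inference
zeros_default = np.zeros(3)                 # zeros() defaults to float64

print(ints.dtype.kind)      # 'i' (integer; exact width is platform-dependent)
print(mixed.dtype)          # float64
print(forced.dtype)         # float64
print(zeros_default.dtype)  # float64
```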
Reshaping Arrays
You have already seen the reshape() method in use, and it can be extremely useful for manipulating your arrays. With functions such as arange() and linspace(), which return one-dimensional arrays, you can chain reshape() to get an array of whatever shape you need.
However, it is very important to note that to be able to reshape an array, the size of the new array must match that of the original array. For example, the following is not a valid reshape:
>>> b = np.arange(0, 6)
>>> b
array([0, 1, 2, 3, 4, 5])
>>> b.reshape((3,3))
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    b.reshape((3,3))
ValueError: cannot reshape array of size 6 into shape (3,3)
We have 6 elements, but are trying to reshape into a 3 x 3, which would require 9 elements. We could, however, reshape into a 3 x 2:
>>> b.reshape((3,2))
array([[0, 1],
       [2, 3],
       [4, 5]])
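We can also ‘flatten’ a multidimensional array into a single dimension using the ravel() method. Because reshape() returns a new array and leaves b itself unchanged, we reshape first and then flatten the result:
>>> b.reshape((3,2)).ravel()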
array([0, 1, 2, 3, 4, 5])
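The size rule can be checked before calling reshape(). The sketch below uses a hypothetical helper, can_reshape(), which is not part of NumPy; it only compares the element counts. It also shows that reshape() hands back a new array object and leaves the original shape alone.

```python
import math
import numpy as np

def can_reshape(arr, new_shape):
    """Hypothetical helper: True if arr has exactly the number of
    elements that new_shape requires."""
    return arr.size == math.prod(new_shape)

b = np.arange(6)

print(can_reshape(b, (3, 3)))  # False: 6 elements cannot fill 9 slots
print(can_reshape(b, (3, 2)))  # True: 3 * 2 == 6

reshaped = b.reshape((3, 2))
flattened = reshaped.ravel()

print(b.shape)          # (6,)   the original is untouched
print(reshaped.shape)   # (3, 2)
print(flattened.shape)  # (6,)
```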
Indexing/Slicing Arrays
Array indexing works very similarly to list indexing and slicing; we just need to be mindful of the dimensionality of our data.
For example, suppose we were working with the array as shown below:
>>> x = np.arange(12).reshape((4,3))
>>> x
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
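We have a 4 x 3 array containing 12 elements in total. To access a single element, we need one index per axis, either chained one after the other or, more idiomatically, written as a comma-separated tuple such as x[3, 1]. So, if we wanted to return a value in our multidimensional array, we would do it as such: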
>>> # Return an entire row (index 0 along the first axis)
>>> x[0]
array([0, 1, 2])
>>> # Return a specific element
>>> x[3][1]
10
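For comparison, here is a minimal sketch of the chained form next to the tuple form of indexing. Both return the same element; the tuple form does it in a single lookup rather than selecting the row first.

```python
import numpy as np

x = np.arange(12).reshape((4, 3))

chained = x[3][1]        # x[3] selects the last row, then [1] picks from it
direct = x[3, 1]         # one lookup using a (row, column) tuple
explicit = x[(3, 1)]     # the same lookup with the tuple written out

print(chained, direct, explicit)  # 10 10 10
```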
We can use this to modify values in our array as well.
Note that if you assign a value whose data type differs from the array's, you can run into issues! NumPy silently converts the value to the array's type, so our float is truncated to an int when we assign it into an array of type int.
>>> # Modify a value using a correct data type
>>> x[1][1] = 30
>>> x
array([[ 0,  1,  2],
       [ 3, 30,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
>>> # Modify a value using an incorrect data type
>>> x[0][1] = 3.14
>>> x
array([[ 0,  3,  2],
       [ 3, 30,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
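If you do need to store the float, one option is to convert the array first. The sketch below uses the astype() method, which returns a new array with the requested type; whether converting the whole array is appropriate depends on your data, so treat this as one possible approach rather than the only fix.

```python
import numpy as np

x = np.arange(12).reshape((4, 3))

# Assigning a float into an integer array truncates it silently.
x[0, 1] = 3.14
print(x[0, 1])  # 3

# astype() returns a new float array; the original stays integer.
x_float = x.astype(float)
x_float[0, 1] = 3.14
print(x_float[0, 1])  # 3.14

print(x.dtype.kind, x_float.dtype)  # i float64
```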
For slicing, we again use the familiar list slicing notation, but for arrays of more than one dimension we separate the slice for each dimension with a comma.
We select rows and columns using the usual start:stop:step format for each dimension, as in x[row_start:row_stop:row_step, col_start:col_stop:col_step].
>>> x
array([[ 0,  3,  2],
       [ 3, 30,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
>>> # Return a slice of the first 2 rows and all columns
>>> x[:2, :]
array([[ 0,  3,  2],
       [ 3, 30,  5]])
>>> # Return a slice of all rows, keeping only the first column
>>> x[:, :1]
array([[0],
       [3],
       [6],
       [9]])
>>> # Return every other row, every column
>>> x[::2, :]
array([[0, 3, 2],
       [6, 7, 8]])
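A few more combinations may help make the row and column slices concrete. This sketch starts from a fresh, unmodified 4 x 3 array and also contrasts a slice (:1) with a plain integer index (0) on the column axis, which explains why the first-column result comes back as a 4 x 1 array rather than a flat one.

```python
import numpy as np

x = np.arange(12).reshape((4, 3))

middle_rows = x[1:3, :]   # rows 1 and 2, all columns
outer_cols = x[:, ::2]    # all rows, every other column (0 and 2)
corner = x[:2, 1:]        # first 2 rows, last 2 columns

first_col_2d = x[:, :1]   # a slice keeps the axis: shape (4, 1)
first_col_1d = x[:, 0]    # an integer index drops the axis: shape (4,)

print(middle_rows.tolist())  # [[3, 4, 5], [6, 7, 8]]
print(outer_cols.tolist())   # [[0, 2], [3, 5], [6, 8], [9, 11]]
print(corner.tolist())       # [[1, 2], [4, 5]]
print(first_col_2d.shape, first_col_1d.shape)  # (4, 1) (4,)
```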
Now that you have the basics of arrays under your belt, you’re well on your way to understanding one of the most powerful data structures in Python!
Next up, we’ll look at basic operations on arrays, so be sure to follow along with this series as it progresses.
Links to future stories will be added — along with an index page — as they are released.