I once walked into a company completely unprepared as a data scientist. While I expected to be training models, my role turned out to be software engineering and the app made the heaviest use of numpy I’d ever seen.

While I’d used np.array() to convert a list to an array many times, I wasn’t prepared for line after line of linspace, meshgrid and vsplit.

I needed to get comfortable with numpy fast if I was going to be able to read and write code.

This is a curated list of numpy array functions and examples I’ve built for myself.

We’ll cover background info on arrays in the first section, then get to the more advanced functions that will help you work with data faster.

Table of Contents:
1. Array Overview
2. Generating Arrays
3. Manipulating Arrays

1) Array Overview

What are Arrays?

Arrays are a data structure for storing homogeneous data. That means all elements are the same type.

Numpy’s Array class is ndarray, meaning “N-dimensional array”.

import numpy as np
arr = np.array([[1,2],[3,4]])
type(arr)
#=> numpy.ndarray

It’s n-dimensional because the number of dimensions isn’t fixed: it’s determined by the shape you pass when initializing the array.

For example: np.zeros((2)) generates a 1D array. np.zeros((2,2)) generates a 2D array. np.zeros((2,2,2)) generates a 3D array. np.zeros((2,2,2,2)) generates a 4D array. And so on…

np.zeros((2))
#=> array([0., 0.])
np.zeros((2,2))
#=> array([[0., 0.],
#=>        [0., 0.]])
np.zeros((2,2,2))
#=> array([[[0., 0.],
#=>         [0., 0.]],
#=> 
#=>        [[0., 0.],
#=>         [0., 0.]]])
...

Arrays vs Lists

Arrays use less memory than lists
Arrays have significantly more functionality
Arrays require data to be homogeneous; lists do not
Arithmetic on arrays operates element-wise; on lists, + concatenates and * repeats
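
Here’s a minimal sketch of how arithmetic differs between the two. The variable names are just for illustration. Array operators act element by element, and matrix multiplication needs the separate @ operator.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
lst = [1, 2]

# Element-wise product: each element is multiplied by its counterpart
elementwise = arr * arr
print(elementwise.tolist())  # [[1, 4], [9, 16]]

# Matrix product: rows of the left operand times columns of the right
matrix_product = arr @ arr
print(matrix_product.tolist())  # [[7, 10], [15, 22]]

# Lists treat the same operators as concatenation and repetition
print(lst + lst)  # [1, 2, 1, 2]
print(lst * 2)    # [1, 2, 1, 2]
```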

Important Parameters

shape: a tuple representing the dimensions of an array. An array of shape (2,3,2) is a 2x3x2 array, and looks like the one below.

np.zeros((2,3,2))
#=> array([[[0., 0.],
#=>         [0., 0.],
#=>         [0., 0.]],
#=> 
#=>        [[0., 0.],
#=>         [0., 0.],
#=>         [0., 0.]]])

dtype: the type of value stored in an array. Arrays are homogeneous, so we can’t mix multiple data types like strings and integers. The value of dtype can be np.float64, np.int8, int, str or one of several other types.
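
A quick sketch of dtype in practice, using made-up variable names. It shows setting a dtype explicitly, converting with astype, and what numpy does when the input mixes types: it picks one common type for everything rather than keeping the mix.

```python
import numpy as np

# Set the dtype explicitly when creating the array
ints = np.array([1, 2, 3], dtype=np.int8)
print(ints.dtype)  # int8

# astype returns a converted copy; the original keeps its dtype
floats = ints.astype(np.float64)
print(floats.dtype)  # float64

# Mixed int/float input is upcast to a single float type
mixed_numbers = np.array([1, 2.5])
print(mixed_numbers.dtype)  # float64

# Mixed int/str input becomes an array of unicode strings
mixed_text = np.array([1, 'a'])
print(mixed_text.dtype.kind)  # U
```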

2) Generating Arrays

zeros

Generate an array of zeros with a specified shape.

This is useful when you want to initialize weights in an ML model to 0 before beginning training. This is also often used to initialize an array with a specific shape and then overwrite it with your own values.

np.zeros((2,3))
#=> array([[0., 0., 0.],
#=>        [0., 0., 0.]])

ones

Generate an array of ones with a specified shape.

Useful if you need to initialize values to 1 before incrementally subtracting from them.

np.ones((2,3))
#=> array([[1., 1., 1.],
#=>        [1., 1., 1.]])

empty

np.empty() is a little different from zeros and ones, as it doesn’t preset any values in the array: the contents are whatever happened to be in that memory. It can be slightly faster to initialize, but the difference is usually negligible.

This is sometimes used when initializing an array in advance of filling it with data for the sake of readable code.

arr = np.empty((2,2))
arr
#=> array([[1.00000000e+000, 1.49166815e-154],
#=>        [4.44659081e-323, 0.00000000e+000]])
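
As a rough sketch of the “allocate first, fill later” pattern, assuming a toy task of storing square numbers (the names n and squares are only for illustration):

```python
import numpy as np

n = 5

# Allocate the array up front; its initial contents are arbitrary
squares = np.empty(n, dtype=np.int64)

# Overwrite every slot before reading anything back
for i in range(n):
    squares[i] = i * i

print(squares.tolist())  # [0, 1, 4, 9, 16]
```

The important part is that every element gets written before it’s read, since np.empty makes no promise about the starting values.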

full

Initialize an array with a given value.

Below we initialize an array with 10. And then another array with ['a','b'] pairs.

np.full((3,2), 10)
#=> array([[10, 10],
#=>        [10, 10],
#=>        [10, 10]])
np.full((3,2), ['a','b'])
#=> array([['a', 'b'],
#=>        ['a', 'b'],
#=>        ['a', 'b']], dtype='<U1')

array

This is probably what you’ve seen the most in real life. It initializes an array from an “array-like” object.

Useful if you’re storing data in another data structure but need to convert it into a numpy object so it can be passed to sklearn.

li = ['a','b','c']
np.array(li)
#=> array(['a', 'b', 'c'], dtype='<U1')

Note: np.array also has a parameter called copy. It defaults to True, which guarantees a new array object is generated rather than pointing to an existing one.

_like

There are several _like functions corresponding to the functions we’ve discussed: empty_like, ones_like, zeros_like and full_like.

They generate an array with the same shape as the passed-in array but with their own values. So ones_like generates an array of ones, but you pass it an existing array and it takes the shape of that, instead of you specifying the shape directly.

a1 = np.array([[1,2],[3,4]])
a1
#=> array([[1, 2],
#=>        [3, 4]])
np.ones_like(a1)
#=> array([[1, 1],
#=>        [1, 1]])

Notice how the 2nd array of 1’s took on the shape of the first array.

rand

Generate an array of random values, drawn uniformly from the interval [0, 1).

This is useful when you want to initialize the untrained weights in a model to random values, which is likely more common than initializing them to zero.

np.random.rand(3,2)
#=> array([[0.94664048, 0.76616114],
#=>        [0.395549  , 0.84680126],
#=>        [0.42873   , 0.77736086]])
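
Random values change on every run, which makes results hard to reproduce. One way to handle that, sketched here with numpy’s Generator interface rather than np.random.rand, is to seed the generator. The seed value 42 and the variable names are arbitrary choices for the example.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers
rng = np.random.default_rng(seed=42)
weights = rng.random((3, 2))

rng_again = np.random.default_rng(seed=42)
same_weights = rng_again.random((3, 2))

print(weights.shape)                          # (3, 2)
print(np.array_equal(weights, same_weights))  # True
```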

asarray

np.asarray converts its input to an array like np.array does, but it avoids making a copy when the input is already an array of a suitable type. See np.array above.
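
A small sketch of the difference, with illustrative variable names. When the input is already an ndarray and no dtype change is requested, np.asarray hands back the very same object, while np.array builds a new one.

```python
import numpy as np

existing = np.array([1, 2, 3])

# asarray returns the input itself when it is already an ndarray
via_asarray = np.asarray(existing)
print(via_asarray is existing)  # True

# array copies by default, so this is a separate object
via_array = np.array(existing)
print(via_array is existing)  # False

# For a list there is nothing to reuse, so a new array is built
from_list = np.asarray([1, 2, 3])
print(from_list.tolist())  # [1, 2, 3]
```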

arange

Generates an array of values with a set interval between a lower and upper limit. It’s numpy’s equivalent of list(range(50,60,2)): the lower limit is included and the upper limit is excluded.

Below we generate an array of every second value between 50 and 60.

np.arange(50,60,2)
#=> array([50, 52, 54, 56, 58])

linspace

Generates an array of numbers with equal intervals between 2 other numbers. Instead of specifying the interval directly like arange, we specify how many numbers to generate between the upper and lower limit.

Below we return an array of 6 numbers between 10 and 20, and 5 numbers between 0 and 2.

np.linspace(10, 20, 6)
#=> array([10., 12., 14., 16., 18., 20.])
np.linspace(0, 2, 5)
#=> array([0. , 0.5, 1. , 1.5, 2. ])

Notice how we specify the number of elements in the array instead of stating the interval itself.
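
If you want to know what interval linspace ended up using, the retstep parameter returns it alongside the array. Here’s a short sketch (variable names are mine) that also shows endpoint=False, which leaves out the upper limit the way arange does.

```python
import numpy as np

# retstep=True returns the array together with the spacing it used
values, step = np.linspace(10, 20, 6, retstep=True)
print(values.tolist())  # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
print(step)             # 2.0

# endpoint=False excludes the upper limit
half_open = np.linspace(0, 2, 4, endpoint=False)
print(half_open.tolist())  # [0.0, 0.5, 1.0, 1.5]
```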

meshgrid

Generates a matrix of coordinates based on 2 input arrays.

This can be a little tricky to wrap your head around. So let’s walk through an example. Generate 2 arrays and pass those to np.meshgrid.

x = np.array([1,2,3])
y = np.array([-3,-2,-1])
 
xcors, ycors = np.meshgrid(x, y)
print(xcors)
#=> [[1 2 3]
#=>  [1 2 3]
#=>  [1 2 3]]
print(ycors)
#=> [[-3 -3 -3]
#=>  [-2 -2 -2]
#=>  [-1 -1 -1]]

Here we can see 2 different matrices outputted, based on the values and shape of inputted arrays.

But don’t imagine this as 2 separate matrices. Those are actually pairs of (x,y) coordinates representing points in a plane. I’ve combined them below.

[[(1, -3), (2, -3), (3, -3)],
 [(1, -2), (2, -2), (3, -2)],
 [(1, -1), (2, -1), (3, -1)]]
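
One way to build that combined grid in code, as a sketch: stack the two meshgrid outputs along a new last axis, so each grid cell holds its (x, y) pair. The name pairs is just for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([-3, -2, -1])
xcors, ycors = np.meshgrid(x, y)

# Stack along a new trailing axis: pairs[row, col] is [x, y]
pairs = np.stack((xcors, ycors), axis=-1)

print(pairs.shape)           # (3, 3, 2)
print(pairs[0, 1].tolist())  # [2, -3]
print(pairs[2, 0].tolist())  # [1, -1]
```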

3) Manipulating Arrays

copy

Make a copy of an existing array.

Assigning an array to a new variable name will point back to the original array. You need to be careful with this behaviour so you don’t unintentionally modify existing variables.

Consider this example. Although we modify a2, the value of a1 also changes.

a1 = np.array([1,2,3])
a2 = a1
a2[0] = 10
a1
#=> array([10,  2,  3])

Now compare that to this. We modify a2 but a1 does not change… because we made a copy!

a1 = np.array([1,2,3])
a2 = a1.copy()
a2[0] = 10
a1
#=> array([1, 2, 3])
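
If you’re ever unsure whether two variables refer to the same data, you can check instead of guessing. A minimal sketch using np.shares_memory (the variable names are illustrative):

```python
import numpy as np

a1 = np.array([1, 2, 3])
alias = a1                # plain assignment: same object
independent = a1.copy()   # copy: separate data

print(alias is a1)                        # True
print(np.shares_memory(a1, alias))        # True
print(np.shares_memory(a1, independent))  # False
```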

shape

Get the shape of an array.

Very useful when dealing with massive multi-dimensional arrays where it’s not possible to eyeball the dimensions.

a = np.array([[1,2],[3,4],[5,6]])
a.shape
#=> (3, 2)

reshape

Reshapes an array.

This is insanely useful and I can’t imagine using a library like Keras without it. Let’s walk through an example of creating and reshaping an array.

Generate an array.

a = np.array([[1,2],[3,4],[5,6]])
a
#=> array([[1, 2],
#=>        [3, 4],
#=>        [5, 6]])

Check its shape.

a.shape
#=> (3, 2)

Reshape the array from 3x2 to 2x3.

a.reshape(2,3)
#=> array([[1, 2, 3],
#=>        [4, 5, 6]])

Flatten the array into 1 dimension.

a.reshape(6)
#=> array([1, 2, 3, 4, 5, 6])

Reshape the array into a 6x1 matrix.

a.reshape(6,1)
#=> array([[1],
#=>        [2],
#=>        [3],
#=>        [4],
#=>        [5],
#=>        [6]])

Reshape the array into 3 dimensions, 2x3x1.

a.reshape(2,3,1)
#=> array([[[1],
#=>         [2],
#=>         [3]],
#=> 
#=>        [[4],
#=>         [5],
#=>         [6]]])
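
A related trick, sketched below: passing -1 for one dimension asks numpy to work out that size from the total number of elements, which helps when you don’t want to hard-code it. The last line shows one way to add a leading dimension with np.newaxis. Variable names are only for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# -1 means "infer this dimension from the number of elements"
flat = a.reshape(-1)
print(flat.shape)  # (6,)

column = a.reshape(-1, 1)
print(column.shape)  # (6, 1)

# Add a new leading axis of size 1 in front of the existing shape
batched = a[np.newaxis, ...]
print(batched.shape)  # (1, 3, 2)
```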

resize

Similar to reshape but it mutates the original array.

a = np.array([['a','b'],['c','d']])
a
#=> array([['a', 'b'],
#=>        ['c', 'd']], dtype='<U1')
a.reshape(1,4)
#=> array([['a', 'b', 'c', 'd']], dtype='<U1')
a
#=> array([['a', 'b'],
#=>        ['c', 'd']], dtype='<U1')
a.resize(1,4)
a
#=> array([['a', 'b', 'c', 'd']], dtype='<U1')

Notice how calling reshape didn’t change a, but calling resize permanently changed its shape.

transpose

Transposes an array.

Can be useful for swapping rows and columns before generating a pandas data frame or doing aggregate calculations like count or sum.

a = np.array([['s','t','u'],['x','y','z']])
a
#=> array([['s', 't', 'u'],
#=>        ['x', 'y', 'z']], dtype='<U1')
a.T
#=> array([['s', 'x'],
#=>        ['t', 'y'],
#=>        ['u', 'z']], dtype='<U1')

Notice how the rows have become columns: the element at row i, column j has moved to row j, column i.
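
Transposing also works beyond 2 dimensions. As a rough sketch on a made-up 3D array: .T reverses the order of all axes, while np.transpose lets you pick the new order yourself.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# .T reverses the axis order
print(a.T.shape)  # (4, 3, 2)

# np.transpose with an explicit order: swap only the first two axes
swapped = np.transpose(a, (1, 0, 2))
print(swapped.shape)  # (3, 2, 4)

# The element at [i, j, k] in the original is at [j, i, k] after the swap
print(swapped[2, 1, 3] == a[1, 2, 3])  # True
```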

flatten

Flattens an array into 1 dimension and returns a copy.

This achieves the same result as reshape(6) below. But flatten can be useful when you don’t know the size of an array in advance.

a = np.array([[1,2,3],['a','b','c']])
a.flatten()
#=> array(['1', '2', '3', 'a', 'b', 'c'], dtype='<U21')
a.reshape(6)
#=> array(['1', '2', '3', 'a', 'b', 'c'], dtype='<U21')

ravel

Flattens an array-like object into 1 dimension. Similar to flatten, but it returns a view of the array where possible instead of a copy.

Another benefit is that np.ravel is a top-level function, so it can be used on non-arrays like lists, whereas flatten only exists as an array method.

np.ravel([[1,2,3],[4,5,6]])
#=> array([1, 2, 3, 4, 5, 6])
np.flatten([[1,2,3],[4,5,6]])
#=> AttributeError: module 'numpy' has no attribute 'flatten'
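
The view-versus-copy difference matters once you start modifying the result. A minimal sketch with illustrative names: writing to the output of ravel changes the original array here, while writing to the output of flatten does not.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# ravel returns a view for this contiguous array, so writes reach a
raveled = a.ravel()
raveled[0] = 99
print(a[0, 0])  # 99

# flatten always returns a copy, so a is left alone
flattened = a.flatten()
flattened[1] = -1
print(a[0, 1])  # 2
```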

hsplit

Horizontally splits an array into subarrays.

You can imagine this like splitting each column in a matrix into its own array.

Useful in ML for splitting out time-series data if each column describes an object, and each row is a time period for those objects.

a = np.array(
    [[1,2,3],
     [4,5,6]])
a
#=> array([[1, 2, 3],
#=>        [4, 5, 6]])
np.hsplit(a,3)
#=> [array([[1],[4]]), 
#=>  array([[2],[5]]), 
#=>  array([[3],[6]])]

vsplit

Vertically splits an array into subarrays.

You can imagine this as splitting off each row into its own array.

Useful in ML if each row represents an object and each column is a different feature of those objects.

a = np.array(
    [[1,2,3],
     [4,5,6]])
a
#=> array([[1, 2, 3],
#=>        [4, 5, 6]])
np.vsplit(a,2)
#=> [array([[1, 2, 3]]), 
#=>  array([[4, 5, 6]])]

stack

Joins arrays along a new axis.

This is loosely the opposite of vsplit and hsplit in that it combines separate arrays into a single array.

Along axis=0

a = np.array(['a', 'b', 'c'])
b = np.array(['d', 'e', 'f'])
np.stack((a, b), axis=0)
#=> array([['a', 'b', 'c'],
#=>        ['d', 'e', 'f']], dtype='<U1')

Along axis=1

a = np.array(['a', 'b', 'c'])
b = np.array(['d', 'e', 'f'])
np.stack((a, b), axis=1)
#=> array([['a', 'd'],
#=>        ['b', 'e'],
#=>        ['c', 'f']], dtype='<U1')
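
If you want to exactly undo an hsplit or vsplit, np.hstack and np.vstack are the closer match, since they join along an existing axis rather than adding one. A short sketch of the round trip, with illustrative variable names:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

columns = np.hsplit(a, 3)  # three arrays of shape (2, 1)
rows = np.vsplit(a, 2)     # two arrays of shape (1, 3)

# hstack and vstack put the pieces back together
print(np.array_equal(np.hstack(columns), a))  # True
print(np.array_equal(np.vstack(rows), a))     # True

# stack adds a new axis instead, so the result gains a dimension
stacked = np.stack(rows, axis=0)
print(stacked.shape)  # (2, 1, 3)
```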

Conclusion

I consider this the basics of numpy. You’ll come across these functions repeatedly when reading existing code at work or doing tutorials online.

Comfort with the above means you won’t get stuck understanding how meshgrid is used to generate a matplotlib chart. Or how to quickly add a dimension so your data conforms with input requirements to a Keras model.

Are there any numpy functions you can’t live without?

Subrat's Technical Blog

Saturday, April 11, 2020

Numpy Array Cookbook: Generating and Manipulating Arrays in Python