3) Manipulating Arrays
copy
Make a copy of an existing array.
Assigning an array to a new variable name doesn't create a new array. Both names point to the same array in memory. You need to be careful with this behaviour so you don’t unintentionally modify existing variables.
Consider this example. Although we modify a2, the value of a1 also changes.
import numpy as np

a1 = np.array([1,2,3])
a2 = a1
a2[0] = 10
a1
#=> array([10, 2, 3])
Now compare that to this. We modify a2 but a1 does not change… because we made a copy!
a1 = np.array([1,2,3])
a2 = a1.copy()
a2[0] = 10
a1
#=> array([1, 2, 3])
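The same trap shows up with slicing. As a rough sketch (the names `original`, `view` and `safe` are just for illustration), basic slicing returns a view that shares memory with the source array, and `.copy()` is what breaks that link. `np.shares_memory` lets you check which case you're in.

```python
import numpy as np

original = np.array([1, 2, 3, 4])

# Basic slicing returns a view: it shares memory with `original`.
view = original[1:3]
view[0] = 99
print(original)                          # [ 1 99  3  4]
print(np.shares_memory(original, view))  # True

# .copy() allocates new memory, so edits don't propagate back.
safe = original[1:3].copy()
safe[0] = -1
print(original)                          # [ 1 99  3  4]
print(np.shares_memory(original, safe))  # False
```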
shape
Get the shape of an array.
Very useful when dealing with massive multi-dimensional arrays where it’s not possible to eyeball the dimensions.
a = np.array([[1,2],[3,4],[5,6]])
a.shape
#=> (3, 2)
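For big multi-dimensional arrays, `shape` pairs nicely with `ndim` (number of dimensions) and `size` (total number of elements). Here's a small sketch using a hypothetical batch of 10 RGB images of 32x32 pixels, the kind of 4-D array you'd feed into a deep learning library.

```python
import numpy as np

# A hypothetical batch of 10 RGB images, 32x32 pixels each.
batch = np.zeros((10, 32, 32, 3))

print(batch.shape)  # (10, 32, 32, 3)
print(batch.ndim)   # 4, the number of dimensions (len(batch.shape))
print(batch.size)   # 30720, the total number of elements

# Unpack the shape tuple to give each dimension a name.
n_images, height, width, channels = batch.shape
print(n_images, channels)  # 10 3
```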
reshape
Reshapes an array.
This is insanely useful and I can’t imagine using a library like Keras without it. Let’s walk through an example of creating and reshaping an array.
Generate an array.
a = np.array([[1,2],[3,4],[5,6]])
a
#=> array([[1, 2],
#=> [3, 4],
#=> [5, 6]])
Check its shape.
a.shape
#=> (3, 2)
Reshape the array from 3x2 to 2x3.
a.reshape(2,3)
#=> array([[1, 2, 3],
#=> [4, 5, 6]])
Flatten the array into 1 dimension.
a.reshape(6)
#=> array([1, 2, 3, 4, 5, 6])
Reshape the array into a 6x1 matrix.
a.reshape(6,1)
#=> array([[1],
#=> [2],
#=> [3],
#=> [4],
#=> [5],
#=> [6]])
Reshape the array into 3 dimensions, 2x3x1.
a.reshape(2,3,1)
#=> array([[[1],
#=> [2],
#=> [3]],
#=>
#=> [[4],
#=> [5],
#=> [6]]])
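Two more things worth knowing, shown in a short sketch below. First, you can pass `-1` for one dimension and NumPy will infer it from the total number of elements, which saves you from hard-coding sizes. Second, the new shape must hold exactly the same number of elements, or `reshape` raises a `ValueError`. As a bonus, the sketch shows that `reshape` leaves `a` itself alone but usually returns a view, so writing into the result also changes `a`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# -1 asks NumPy to infer that one dimension from the total size (6 here).
print(a.reshape(-1).shape)     # (6,)
print(a.reshape(2, -1).shape)  # (2, 3)
print(a.reshape(-1, 1).shape)  # (6, 1)

# The new shape must hold exactly the same number of elements.
try:
    a.reshape(4, 2)
except ValueError as err:
    print("ValueError:", err)

# reshape doesn't change a's shape, but the result is usually a view,
# so writing into it also changes the data in `a`.
b = a.reshape(2, 3)
b[0, 0] = 100
print(a[0, 0])  # 100
```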
resize
Similar to reshape, but it changes the shape of the original array in place instead of returning a new one.
a = np.array([['a','b'],['c','d']])
a
#=> array([['a', 'b'],
#=> ['c', 'd']], dtype='<U1')
a.reshape(1,4)
#=> array([['a', 'b', 'c', 'd']], dtype='<U1')
a
#=> array([['a', 'b'],
#=> ['c', 'd']], dtype='<U1')
a.resize(1,4)
a
#=> array([['a', 'b', 'c', 'd']], dtype='<U1')
Notice how calling reshape didn’t change a, but calling resize permanently changed its shape.
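Resizing can also change the number of elements, which reshape never allows. The sketch below uses the function `np.resize`, which returns a new array and, when growing, fills the extra slots by repeating the original data (the method `a.resize` used above pads with zeros instead). This is only meant to illustrate the difference, not to suggest one over the other.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# reshape must keep the element count (4 here), so a.reshape(3, 3)
# would raise ValueError.

# np.resize returns a NEW array and allows a different size.
# When growing, it repeats the data in row-major order.
b = np.resize(a, (3, 3))
print(b.tolist())  # [[1, 2, 3], [4, 1, 2], [3, 4, 1]]

# When shrinking, it keeps the first elements in row-major order.
c = np.resize(a, (1, 3))
print(c.tolist())  # [[1, 2, 3]]

# The original array is untouched by np.resize.
print(a.tolist())  # [[1, 2], [3, 4]]
```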
transpose
Transposes an array.
Can be useful for swapping rows and columns before generating a pandas DataFrame or doing aggregate calculations like count or sum.
a = np.array([['s','t','u'],['x','y','z']])
a
#=> array([['s', 't', 'u'],
#=> ['x', 'y', 'z']], dtype='<U1')
a.T
#=> array([['s', 'x'],
#=> ['t', 'y'],
#=> ['u', 'z']], dtype='<U1')
Notice how everything has been flipped over the main diagonal: s and y stay in place, and each element a[i, j] moves to a.T[j, i].
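Here's a quick sketch that checks the a[i, j] to a.T[j, i] rule, shows why transposing is handy for aggregates (summing columns of `m` is the same as summing rows of `m.T`), and shows that for arrays with more than two dimensions `np.transpose` accepts an explicit axis order. The arrays `m` and `cube` are made-up examples.

```python
import numpy as np

a = np.array([['s', 't', 'u'], ['x', 'y', 'z']])
t = a.T

print(a.shape, t.shape)  # (2, 3) (3, 2)

# Every element a[i, j] ends up at t[j, i].
print(all(a[i, j] == t[j, i]
          for i in range(a.shape[0])
          for j in range(a.shape[1])))  # True

# Summing the columns of m equals summing the rows of m.T.
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.sum(axis=0))    # [5 7 9]
print(m.T.sum(axis=1))  # [5 7 9]

# For 3+ dimensions, np.transpose takes the new axis order explicitly.
cube = np.zeros((2, 3, 4))
print(np.transpose(cube, (2, 0, 1)).shape)  # (4, 2, 3)
```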
flatten
Flattens an array into 1 dimension and returns a copy.
This achieves the same result as reshape(6) below. But flatten can be useful when you don’t know the size of an array in advance.
a = np.array([[1,2,3],['a','b','c']])
a.flatten()
#=> array(['1', '2', '3', 'a', 'b', 'c'], dtype='<U21')
a.reshape(6)
#=> array(['1', '2', '3', 'a', 'b', 'c'], dtype='<U21')
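A small sketch of the "returns a copy" part: editing the flattened result leaves the original alone. It also shows the `order='F'` option, which reads column by column, and `reshape(-1)`, which is another way to flatten without knowing the size in advance (unlike `flatten`, it may return a view).

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# flatten always returns a copy, so editing it leaves `a` alone.
flat = a.flatten()
flat[0] = 100
print(a[0, 0])  # 1

# order='F' reads column by column instead of row by row.
print(a.flatten(order='F'))  # [1 4 2 5 3 6]

# reshape(-1) also flattens without hard-coding the size.
print(a.reshape(-1))  # [1 2 3 4 5 6]
```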
ravel
Flattens an array-like object into 1 dimension. Similar to flatten, but it returns a view of the original array whenever possible instead of always making a copy.
Another benefit is that np.ravel is a function that accepts non-arrays like lists, whereas flatten only exists as a method on arrays.
np.ravel([[1,2,3],[4,5,6]])
#=> array([1, 2, 3, 4, 5, 6])
np.flatten([[1,2,3],[4,5,6]])
#=> AttributeError: module 'numpy' has no attribute 'flatten'
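The "view whenever possible" detail matters, so here's a sketch that checks it with `np.shares_memory`. Writing into the result of `np.ravel(a)` changes `a`, while `flatten` never shares memory. When the input isn't laid out contiguously in row-major order (like the transpose `a.T`), ravel has no choice but to copy.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# ravel returns a view here, so writes show up in `a`.
r = np.ravel(a)
r[0] = 100
print(a[0, 0])  # 100

print(np.shares_memory(a, np.ravel(a)))    # True
print(np.shares_memory(a, a.flatten()))    # False: flatten always copies

# a.T is not C-contiguous, so ravel has to make a copy.
print(np.shares_memory(a, np.ravel(a.T)))  # False
```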
hsplit
Horizontally splits an array into subarrays.
You can imagine this like splitting each column in a matrix into its own array.
Useful in ML for splitting out time-series data if each column describes an object, and each row is a time period for those objects.
a = np.array(
[[1,2,3],
[4,5,6]])
a
#=> array([[1, 2, 3],
#=> [4, 5, 6]])
np.hsplit(a,3)
#=> [array([[1],[4]]),
#=> array([[2],[5]]),
#=> array([[3],[6]])]
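Besides an integer number of equal pieces, `np.hsplit` also accepts a list of column indices to split before. A rough sketch of both, plus `np.array_split`, which (unlike `hsplit`) tolerates a number of pieces that doesn't divide the columns evenly:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# A list of indices splits *before* each listed column.
left, right = np.hsplit(a, [1])
print(left.tolist())   # [[1], [4]]
print(right.tolist())  # [[2, 3], [5, 6]]

# For 2-D arrays, hsplit is np.split along axis=1.
parts = np.split(a, 3, axis=1)
print([p.tolist() for p in parts])  # [[[1], [4]], [[2], [5]], [[3], [6]]]

# np.hsplit(a, 2) would raise ValueError (3 columns can't split evenly);
# np.array_split allows unequal pieces instead.
pieces = np.array_split(a, 2, axis=1)
print([p.shape for p in pieces])  # [(2, 2), (2, 1)]
```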
vsplit
Vertically splits an array into subarrays.
You can imagine this as splitting off each row into its own array.
Useful in ML if each row represents an object and each column is a different feature of those objects.
a = np.array(
[[1,2,3],
[4,5,6]])
a
#=> array([[1, 2, 3],
#=> [4, 5, 6]])
np.vsplit(a,2)
#=> [array([[1, 2, 3]]),
#=> array([[4, 5, 6]])]
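Following the rows-as-objects idea, one plausible use is holding out the last few rows of a dataset. The sketch below uses a hypothetical dataset of 5 objects with 2 features each and splits before row index 4; the names `train_rows` and `test_rows` are just illustrative.

```python
import numpy as np

# Hypothetical dataset: 5 objects (rows) x 2 features (columns).
data = np.arange(10).reshape(5, 2)

# Split before row index 4: the first 4 rows vs. the last row.
train_rows, test_rows = np.vsplit(data, [4])
print(train_rows.shape, test_rows.shape)  # (4, 2) (1, 2)
print(test_rows.tolist())                 # [[8, 9]]
```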
stack
Joins arrays on an axis.
This is essentially the opposite of vsplit and hsplit in that it combines separate arrays into a single array. Note that stack joins its inputs along a new axis, so they must all have the same shape.
Along axis=0
a = np.array(['a', 'b', 'c'])
b = np.array(['d', 'e', 'f'])
np.stack((a, b), axis=0)
#=> array([['a', 'b', 'c'],
#=> ['d', 'e', 'f']], dtype='<U1')
Along axis=1
a = np.array(['a', 'b', 'c'])
b = np.array(['d', 'e', 'f'])
np.stack((a, b), axis=1)
#=> array([['a', 'd'],
#=> ['b', 'e'],
#=> ['c', 'f']], dtype='<U1')
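Because stack always adds a new axis, it doesn't exactly undo hsplit or vsplit. If you want the original array back, `np.hstack` and `np.vstack` join along an existing axis instead. A short sketch of the difference:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
cols = np.hsplit(a, 3)  # three (2, 1) arrays
rows = np.vsplit(a, 2)  # two (1, 3) arrays

# hstack/vstack join along an EXISTING axis, so they undo the splits.
print(np.array_equal(np.hstack(cols), a))  # True
print(np.array_equal(np.vstack(rows), a))  # True

# stack creates a NEW axis: two (1, 3) rows become shape (2, 1, 3).
print(np.stack(rows, axis=0).shape)  # (2, 1, 3)
```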