3) Manipulating Arrays
copy
Make a copy of an existing array.
Assigning an array to a new variable name will point back to the original array. You need to be careful with this behaviour so you don’t unintentionally modify existing variables.
Consider this example. Although we modify a2, the value of a1 also changes.
a1 = np.array([1,2,3])
a2 = a1
a2[0] = 10
a1
#=> array([10, 2, 3])
Now compare that to this. We modify a2 but a1 does not change… because we made a copy!
a1 = np.array([1,2,3])
a2 = a1.copy()
a2[0] = 10
a1
#=> array([1, 2, 3])
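If you are ever unsure whether two names share the same data, NumPy can check for you. Here is a minimal sketch using np.shares_memory; the names alias, independent and view are just illustrative. It also shows that slices behave like plain assignment: they are views, not copies.

```python
import numpy as np

a1 = np.array([1, 2, 3])
alias = a1                # same object, no data copied
independent = a1.copy()   # new array with its own data

# Plain assignment shares memory with a1; copy() does not.
shares_alias = np.shares_memory(a1, alias)
shares_copy = np.shares_memory(a1, independent)

# A slice is a view, so writing through it changes a1 as well.
view = a1[:2]
view[0] = 99

print(shares_alias, shares_copy)  # True False
print(a1)                         # first element is now 99
print(independent)                # still holds the original values
```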
shape
Get the shape of an array.
Very useful when dealing with massive multi-dimensional arrays where it’s not possible to eyeball the dimensions.
a = np.array([[1,2],[3,4],[5,6]])
a.shape
#=> (3, 2)
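Because shape is a plain tuple, you can unpack it straight into variables. A small sketch, with n_rows and n_cols as illustrative names, alongside the related ndim and size attributes:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

n_rows, n_cols = a.shape  # unpack the shape tuple
n_dims = a.ndim           # number of axes
n_items = a.size          # total number of elements

print(n_rows, n_cols, n_dims, n_items)  # 3 2 2 6
```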
reshape
Reshapes an array.
This is insanely useful and I can’t imagine using a library like Keras without it. Let’s walk through an example of creating and reshaping an array.
Generate an array.
a = np.array([[1,2],[3,4],[5,6]])
a
#=> array([[1, 2],
#=> [3, 4],
#=> [5, 6]])
Check its shape.
a.shape
#=> (3, 2)
Reshape the array from 3x2 to 2x3.
a.reshape(2,3)
#=> array([[1, 2, 3],
#=> [4, 5, 6]])
Flatten the array into 1 dimension.
a.reshape(6)
#=> array([1, 2, 3, 4, 5, 6])
Reshape the array into a 6x1 matrix.
a.reshape(6,1)
#=> array([[1],
#=> [2],
#=> [3],
#=> [4],
#=> [5],
#=> [6]])
Reshape the array into 3 dimensions, 2x3x1.
a.reshape(2,3,1)
#=> array([[[1],
#=> [2],
#=> [3]],
#=>
#=> [[4],
#=> [5],
#=> [6]]])
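One more trick worth knowing: you can pass -1 for a single dimension and NumPy will work out its size from the total number of elements. The sketch below reuses the same array; asking for a shape that doesn't fit the element count raises a ValueError.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

two_rows = a.reshape(2, -1)  # -1 is inferred as 3
one_col = a.reshape(-1, 1)   # -1 is inferred as 6
flat = a.reshape(-1)         # 1 dimension, length inferred

print(two_rows.shape, one_col.shape, flat.shape)  # (2, 3) (6, 1) (6,)

# 6 elements cannot be arranged into 4 equal rows.
try:
    a.reshape(4, -1)
    bad_shape_failed = False
except ValueError:
    bad_shape_failed = True
```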
resize
Similar to reshape, but it mutates the original array in place instead of returning a new one.
a = np.array([['a','b'],['c','d']])
a
#=> array([['a', 'b'],
#=> ['c', 'd']], dtype='<U1')
a.reshape(1,4)
#=> array([['a', 'b', 'c', 'd']], dtype='<U1')
a
#=> array([['a', 'b'],
#=> ['c', 'd']], dtype='<U1')
a.resize(1,4)
a
#=> array([['a', 'b', 'c', 'd']], dtype='<U1')
Notice how calling reshape didn’t change a, but calling resize permanently changed its shape.
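A gotcha to watch for: because resize works in place, it returns None, so don't assign its result back to your variable. The sketch below also passes refcheck=False, which skips NumPy's check for other references to the array. That check can raise a ValueError in interactive sessions where the shell keeps extra references around; skipping it is only safe when no other array shares a's data, which is an assumption of this example.

```python
import numpy as np

a = np.array([['a', 'b'], ['c', 'd']])

# resize changes a itself and returns None.
# refcheck=False assumes nothing else shares a's data.
result = a.resize((1, 4), refcheck=False)

print(result)   # None
print(a.shape)  # (1, 4)
```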
transpose
Transposes an array.
Can be useful for swapping rows and columns before generating a pandas data frame or doing aggregate calculations like count or sum.
a = np.array([['s','t','u'],['x','y','z']])
a
#=> array([['s', 't', 'u'],
#=> ['x', 'y', 'z']], dtype='<U1')
a.T
#=> array([['s', 'x'],
#=> ['t', 'y'],
#=> ['u', 'z']], dtype='<U1')
Notice how everything has been flipped over the diagonal running from s toward z, so rows have become columns.
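With more than 2 dimensions, .T reverses the order of all the axes, which isn't always what you want. As a rough sketch, the transpose method lets you pick the new axis order yourself; the array here is made up purely for illustration.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

reversed_axes = a.T             # axes reversed: (2, 3, 4) -> (4, 3, 2)
swapped = a.transpose(1, 0, 2)  # swap only the first two axes

print(reversed_axes.shape)  # (4, 3, 2)
print(swapped.shape)        # (3, 2, 4)

# The element at a[j, i, k] now lives at swapped[i, j, k].
same_element = swapped[2, 1, 3] == a[1, 2, 3]
```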
flatten
Flattens an array into 1 dimension and returns a copy.
This achieves the same result as reshape(6) below. But flatten can be useful when you don’t know the size of an array in advance.
a = np.array([[1,2,3],['a','b','c']])
a.flatten()
#=> array(['1', '2', '3', 'a', 'b', 'c'], dtype='<U21')
a.reshape(6)
#=> array(['1', '2', '3', 'a', 'b', 'c'], dtype='<U21')
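Since flatten returns a copy, you can modify the result without touching the original. A quick sketch with a made-up numeric array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

flat = a.flatten()
flat[0] = 100  # only the copy changes

print(flat[0])  # 100
print(a[0, 0])  # 1
```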
ravel
Flattens an array-like object into 1 dimension. Similar to flatten, but it returns a view of the original array where possible instead of always making a copy.
The big benefit though is that np.ravel can be used on non-arrays like lists, where flatten would fail because it only exists as an array method.
np.ravel([[1,2,3],[4,5,6]])
#=> array([1, 2, 3, 4, 5, 6])
np.flatten([[1,2,3],[4,5,6]])
#=> AttributeError: module 'numpy' has no attribute 'flatten'
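To see the view versus copy difference in practice, here is a small sketch using np.shares_memory. It also shows a case where ravel has to fall back to a copy: a transposed array is not laid out contiguously in row order, so NumPy can't return a view of it.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

r = a.ravel()    # view of a (a is contiguous)
f = a.flatten()  # always a copy

ravel_shares = np.shares_memory(a, r)
flatten_shares = np.shares_memory(a, f)
print(ravel_shares, flatten_shares)  # True False

# Writing through the view changes the original array.
r[0] = 100
print(a[0, 0])  # 100

# Raveling the transpose in row order needs a copy.
t = a.T.ravel()
transposed_shares = np.shares_memory(a, t)
print(transposed_shares)  # False
```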
hsplit
Horizontally splits an array into subarrays.
You can imagine this like splitting each column in a matrix into its own array.
Useful in ML for splitting out time-series data if each column describes an object, and each row is a time period for those objects.
a = np.array(
[[1,2,3],
[4,5,6]])
a
#=> array([[1, 2, 3],
#=> [4, 5, 6]])
np.hsplit(a,3)
#=> [array([[1],[4]]),
#=> array([[2],[5]]),
#=> array([[3],[6]])]
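The second argument doesn't have to be a count. As a sketch, you can also pass a list of column indices to split at, which is handy when the pieces aren't the same width. Passing a count that doesn't divide the columns evenly raises a ValueError. The array and the names left, middle and right are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(2, 6)

# Split before column 2 and before column 5.
left, middle, right = np.hsplit(a, [2, 5])
print(left.shape, middle.shape, right.shape)  # (2, 2) (2, 3) (2, 1)

# 6 columns cannot be divided into 4 equal sections.
try:
    np.hsplit(a, 4)
    uneven_failed = False
except ValueError:
    uneven_failed = True
```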
vsplit
Vertically splits an array into subarrays.
You can imagine this as splitting off each row into its own array.
Useful in ML if each row represents an object and each column is a different feature of those objects.
a = np.array(
[[1,2,3],
[4,5,6]])
a
#=> array([[1, 2, 3],
#=> [4, 5, 6]])
np.vsplit(a,2)
#=> [array([[1, 2, 3]]),
#=> array([[4, 5, 6]])]
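Note the double brackets in that output: each piece from vsplit is still 2-dimensional. A short sketch comparing it with simply looping over the array, which gives you 1-dimensional rows instead:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

pieces = np.vsplit(a, 2)
piece_shapes = [p.shape for p in pieces]  # pieces keep 2 dimensions
row_shapes = [row.shape for row in a]     # iterating drops a dimension

print(piece_shapes)  # [(1, 3), (1, 3)]
print(row_shapes)    # [(3,), (3,)]
```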
stack
Joins arrays along a new axis.
This is essentially the opposite of vsplit and hsplit in that it combines separate arrays into a single array.
Along axis=0
a = np.array(['a', 'b', 'c'])
b = np.array(['d', 'e', 'f'])
np.stack((a, b), axis=0)
#=> array([['a', 'b', 'c'],
#=> ['d', 'e', 'f']], dtype='<U1')
Along axis=1
a = np.array(['a', 'b', 'c'])
b = np.array(['d', 'e', 'f'])
np.stack((a, b), axis=1)
#=> array([['a', 'd'],
#=> ['b', 'e'],
#=> ['c', 'f']], dtype='<U1')
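One caveat: stack always adds a dimension, so it won't exactly undo a split. If you want to rebuild the original array from the pieces returned by vsplit or hsplit, np.vstack and np.hstack are the closer match. A sketch of the round trip:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

rows = np.vsplit(a, 2)  # two arrays of shape (1, 3)
cols = np.hsplit(a, 3)  # three arrays of shape (2, 1)

# vstack and hstack join along an existing axis.
rebuilt_from_rows = np.vstack(rows)
rebuilt_from_cols = np.hstack(cols)

# stack inserts a new axis instead.
stacked = np.stack(rows, axis=0)

print(rebuilt_from_rows.shape)  # (2, 3)
print(rebuilt_from_cols.shape)  # (2, 3)
print(stacked.shape)            # (2, 1, 3)
```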