6 Costly NumPy Mistakes to Avoid in Python
NumPy is one of the most central libraries in Python, but we all make simple mistakes with it, including the ones we know we shouldn’t make but still haven’t worked out how to avoid.
I’m a pretty average programmer, and even now I still fumble around with many of the problems I face in Python. So with that in mind, I decided to write down the common problems we run into and how to deal with them.
It’s embarrassing to have lived through these, but here we go!
1: Lists or NumPy Arrays?
When I started programming, I couldn’t figure out the difference, and only when I began to use matrix methods did I really appreciate it.
Simply put, a list is an ordered sequence of elements, which may be of mixed types, whereas an array is a grid of values of a single type that also carries information about the raw data, how to locate an element, and how to interpret an element.
Lists are declared between a pair of square brackets, ['this is a list'], whereas a NumPy array is defined as follows: np.array([1,2,3,4]).
The key difference between them is that NumPy arrays perform better: they take up less memory, they are faster than lists for numerical work, and other libraries (e.g. SciPy) have routines optimised for NumPy arrays.
The printed output is different too. For example,
import numpy as np
list1 = [1, 2, 3, 4]
print("List:", list1)
a = np.array([1, 2, 3, 4])
print("NumPy Array:", a)
The output is:
List: [1, 2, 3, 4]
NumPy Array: [1 2 3 4]
(Note the missing commas!)
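The difference goes beyond printing. As a minimal sketch (the variable names here are my own, not from any library), the same operators behave very differently on lists and arrays, which is an easy way to get silently wrong results:

```python
import numpy as np

list1 = [1, 2, 3, 4]
arr1 = np.array([1, 2, 3, 4])

# "+" concatenates lists but adds arrays element by element.
list_sum = list1 + list1
arr_sum = arr1 + arr1

# "*" repeats a list but scales every element of an array.
list_twice = list1 * 2
arr_twice = arr1 * 2

print("list + list:", list_sum)
print("array + array:", arr_sum)
print("list * 2:", list_twice)
print("array * 2:", arr_twice)
```

If you ever get a result twice as long as you expected, check whether you are holding a list where you thought you had an array.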
2: Miscalculating Reshapes
Sometimes you need to reshape a matrix into a vector, and other times you need to do the opposite and turn a vector back into a matrix.
Where I usually mess up is when I compute the new shape programmatically: my code ends up with a bug in which the reshaped array does not have the same number of elements as the original matrix, and NumPy refuses to reshape it.
The following example shows how to reshape an array the right way.
import numpy as np
a = np.array([1, 3, 5, 7, 9, 11])
# 6 elements fit exactly into 3 rows x 2 columns
b = a.reshape(3, 2)
print("a:", a)
print("b:", b, sep="\n")
The output is:
a: [ 1  3  5  7  9 11]
b:
[[ 1  3]
 [ 5  7]
 [ 9 11]]
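For contrast, here is a small sketch of what the mistake itself looks like, and one way to sidestep it. The flag name reshape_failed is just for illustration. Passing -1 for one dimension asks NumPy to infer it from the array's size, which removes one place to miscount:

```python
import numpy as np

a = np.array([1, 3, 5, 7, 9, 11])

# 6 elements cannot fill a 4 x 2 grid, so NumPy raises a ValueError.
try:
    a.reshape(4, 2)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Reshape failed:", err)

# Passing -1 lets NumPy work out that dimension from the array's size.
b = a.reshape(-1, 2)
print("Inferred shape:", b.shape)

# Flatten back into a vector; the element count is unchanged.
c = b.reshape(-1)
print("Same number of elements:", c.size == a.size)
```

Note that -1 only helps when the remaining dimensions divide the size evenly; a.reshape(-1, 4) would still fail for six elements.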
3: Indexing Badly
NumPy arrays make it easy to index, but even then I’ve still made so many stupid mistakes. Firstly, NumPy indices start at 0, so make sure you’re counting from there. Also, when slicing, the stop index itself is not included: the slice ends at the element just before it.
So an array may be of length 79, but to index into the final item you’ll have to use the index 78 (as indexing starts at 0).
The following example shows the correct way to slice a new array out of an existing one.
import numpy as np
a = np.array([1, 3, 5, 7, 9, 11])
# indices 1, 2 and 3 are selected; the stop index 4 is excluded
b = a[1:4]
print("Original Array, a:", a)
print("New Array, b:", b)
The output is:
Original Array, a: [ 1  3  5  7  9 11]
New Array, b: [3 5 7]
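To make the off-by-one trap concrete, here is a short sketch using the same array (the variable names are mine). It shows two ways of reaching the last element and what happens when you step one past the end:

```python
import numpy as np

a = np.array([1, 3, 5, 7, 9, 11])

last_by_position = a[len(a) - 1]  # the last valid index is len(a) - 1
last_by_negative = a[-1]          # negative indices count from the end

# Indexing one past the end raises an IndexError.
try:
    a[len(a)]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

# Slices are more forgiving: a stop past the end is clipped silently.
tail = a[4:100]

print(last_by_position, last_by_negative, out_of_bounds, tail)
```

The silent clipping in slices is worth remembering: a wrong stop index in a slice will not raise an error, it will just give you fewer elements than you expected.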
4: PATH variable issue
PATH variable problems are a dime a dozen and usually take me all day to fix.
The usual scenario is that you run:
import numpy
and you’re returned:
ModuleNotFoundError: No module named 'numpy'
despite having already installed NumPy (pip3 install numpy) in your terminal. This usually occurs when the PATH variable is not set correctly, so the Python interpreter running your code is not the one that pip3 installed NumPy into.
If you see this error, first check whether the PATH variable is set correctly and try to fix it. I usually use PyCharm’s built-in package manager to install libraries, but if all else fails, ask a software engineering friend. Keep them close!
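Before reinstalling anything, it can help to ask Python which interpreter is actually running and whether it can see NumPy. This is only a diagnostic sketch using the standard library, not a fix, and it assumes the mismatch is between interpreters rather than something else:

```python
import importlib.util
import sys

# The interpreter that is actually running this script.
print("Interpreter:", sys.executable)

# find_spec returns None when the module cannot be found by this interpreter.
spec = importlib.util.find_spec("numpy")
if spec is None:
    print("numpy is not visible to this interpreter")
    # Installing through the same interpreter avoids the pip/python mismatch.
    print("Try:", sys.executable, "-m pip install numpy")
else:
    print("numpy found at:", spec.origin)
```

Running pip as a module of the interpreter you intend to use (python -m pip install numpy) guarantees that the package lands where that interpreter will look for it.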
5: Sorting
Another common mistake happens when you try to sort an array. Sorting is super easy in Python, but I’ll often sort the wrong way and struggle to diagnose it until later.
You should always decide up front whether you want ascending or descending order; np.sort itself always returns ascending order, so a descending sort means reversing the result. However, a NumPy array can be sorted in many more ways than that: along a particular axis, with a particular algorithm, or by the fields of a structured array.
Naturally, you should think about which way you want to sort your array (given the problem you’re faced with), but for the reader, here are some examples:
# Python program to demonstrate sorting in NumPy
import numpy as np
a = np.array([[1, 4, 2], [3, 4, 6], [0, -1, 5]])
# sort the flattened array
print("Array elements in sorted order:\n", np.sort(a, axis=None))
# sort each row
print("Row-wise sorted array:\n", np.sort(a, axis=1))
# sort each column, specifying the sort algorithm
print("Column wise sort by applying merge-sort:\n", np.sort(a, axis=0, kind='mergesort'))
# sorting a structured array: set field names and dtypes
dtypes = [('name', 'S10'), ('grad_year', int), ('cgpa', float)]
# values to be put in the array
values = [('Hrithik', 2009, 8.5), ('Ajay', 2008, 8.7), ('Pankaj', 2008, 7.9), ('Aakash', 2009, 9.0)]
# create the structured array
arr = np.array(values, dtype=dtypes)
print("\nArray sorted by names:\n", np.sort(arr, order='name'))
print("Array sorted by graduation year and then cgpa:\n", np.sort(arr, order=['grad_year', 'cgpa']))
The output is:
Array elements in sorted order:
 [-1  0  1  2  3  4  4  5  6]
Row-wise sorted array:
 [[ 1  2  4]
 [ 3  4  6]
 [-1  0  5]]
Column wise sort by applying merge-sort:
 [[ 0 -1  2]
 [ 1  4  5]
 [ 3  4  6]]

Array sorted by names:
 [(b'Aakash', 2009, 9. ) (b'Ajay', 2008, 8.7) (b'Hrithik', 2009, 8.5)
 (b'Pankaj', 2008, 7.9)]
Array sorted by graduation year and then cgpa:
 [(b'Pankaj', 2008, 7.9) (b'Ajay', 2008, 8.7) (b'Hrithik', 2009, 8.5)
 (b'Aakash', 2009, 9. )]
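The examples above all sort in ascending order. As a hedged sketch of the descending case (the array values and variable names are made up for illustration), one common approach is to sort ascending and reverse the result, and np.argsort is handy when you need the ordering rather than the sorted values:

```python
import numpy as np

a = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# np.sort returns a new, ascending array and leaves a untouched.
ascending = np.sort(a)

# Reversing the ascending result gives descending order.
descending = np.sort(a)[::-1]

# argsort returns the indices that would sort the array,
# useful for reordering a second array in the same way.
order = np.argsort(a)

print("ascending: ", ascending)
print("descending:", descending)
print("a[order]:  ", a[order])
```

Also keep in mind that a.sort() sorts in place and returns None, so writing a = a.sort() throws your data away. That one has bitten me more than once.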
6: Views vs Copies
This is quite a technical problem but a really interesting one. In the world of Python and NumPy, we have something called a view and something called a copy. A view is a window onto the original object’s data, so it shares memory with the original, but a copy is an entirely different object with its own data.
When you work with a copy, even though you’ve indexed into the original object, NumPy copies what you’ve selected and that copy is what you’ll be seeing and using. It’ll be slower, as it takes time to copy the required part of the object, and changes made to the copy never reach the original.
The following example should clarify this:
import numpy as np
a = np.random.randn(5,2)
print("Array is: ", a)
# basic slicing returns a view whose base is a
av = a[:3, :]
print(av.base is a)
print("av is a View and returns: ", av)
# indexing with a list of integers returns a copy
ac = a[[0,1,2], :]
print(ac.base is a)
print("ac is a Copy and returns: ", ac)
The output is:
Array is:  [[-9.04167793e-02 -9.86453934e-01]
 [ 5.73769512e-01  1.56332206e+00]
 [ 1.25860275e-01 -1.01739258e-03]
 [-1.36741893e+00  5.46968242e-01]
 [ 1.77061813e+00  1.19694848e+00]]
True
av is a View and returns:  [[-9.04167793e-02 -9.86453934e-01]
 [ 5.73769512e-01  1.56332206e+00]
 [ 1.25860275e-01 -1.01739258e-03]]
False
ac is a Copy and returns:  [[-9.04167793e-02 -9.86453934e-01]
 [ 5.73769512e-01  1.56332206e+00]
 [ 1.25860275e-01 -1.01739258e-03]]
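Speed is only half of the story. The other half is that writing to a view changes the original, while writing to a copy does not. Here is a minimal sketch with a small integer array of my own choosing so the effect is easy to see; it uses np.shares_memory to check whether two arrays overlap:

```python
import numpy as np

a = np.arange(10).reshape(5, 2)

av = a[:3, :]          # basic slicing returns a view
ac = a[[0, 1, 2], :]   # indexing with a list of integers returns a copy

av[0, 0] = 100   # writes through to a
ac[0, 1] = -1    # only the copy changes

print("a[0] after both writes:", a[0])
print("a and av share memory:", np.shares_memory(a, av))
print("a and ac share memory:", np.shares_memory(a, ac))
```

If you want an independent array from a slice, ask for it explicitly with a[:3, :].copy(), so that later edits cannot leak back into the original.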
More information on this topic can be found here:
There are loads of ways to mess up your code by making silly mistakes, but for what it’s worth, I’ve pretty much made them all. The final mistake is the one I’d pay most attention to, because unnecessary copies can really slow down your programs.
The above mistakes are relatively simple, but if you’re cognisant of them, you’ll spend much less time debugging than I did. I may have reinstalled Python like 10 times because of problem 4 above!
Good luck!
Hopefully you found this interesting. Keep in touch!
WRITTEN BY
AI and ML. Helper at Towards Data Science. Formerly at Cambridge University ML. Get my Introduction To Data Science eBook for free!