Becoming a Self-Taught Data Scientist
Nevertheless, if we look only at job titles from the last 15–20 years, it's a whole other story. Stay with me and you'll discover some of the best resources for breaking into the data science field. Whether or not you have a degree in some other field is completely irrelevant.
Some time ago I published an article about the ins and outs of being self-taught vs. getting a degree in data science:
That article went in-depth on both options, so check it out if you’re interested in the topic. Today, however, I want to further explore the self-taught route and provide you with some amazing resources to get started.
Who is this article for?
- Those without any degree
- Those who finished college some time ago and want to switch to data science
Okay, let’s not waste any more time with the intro — now we’ll dive into the real stuff.
What does it mean to be self-taught?
Good question. In a nutshell, if you don't hold a college degree in your field of interest (let's say data science) but you work in that field anyway, you're considered self-taught in it.
You're still free to finish online courses and read books; you just didn't spend a couple of years behind college desks getting a formal education.
Okay, now that that's out of the way, let's dive into the first way of becoming a self-taught data scientist.
From 0 Route
You would be in this category if:
- You don’t have a college degree
- Your level of knowledge of data and programming is minimal or non-existent
So what should you do? It’s a tough question to answer. To start out, you’ll need to get the basics covered — and those are math and stats skills. And yeah, you’ll need to learn how to code, preferably in Python.
Some time ago I wrote an article listing my favorite resources for every prerequisite needed to break into the field:
It's a lot of stuff, I know, but no one said it was going to be easy. Take some time to cover the basics. You don't have to do extensive calculations by hand; a solid visual understanding should be more than enough.
I don't advise doing extensive calculations by hand for one reason: they're easy for a computer. What's hard for a computer is framing the problem and knowing what to do in which situation (no, I'm not talking about conditional statements). That's why a visual approach to math and stats is a gold mine.
If you take the time to study math, stats, and programming in depth, I'd say you're in as good a position to start learning real data science as anyone who sat through those subjects in college a couple of years ago (math graduates excluded). In other words, someone with a finished degree is by no means ahead of you, at least as far as data science goes.
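To make the "let the computer do the arithmetic" point concrete, here is a minimal sketch using only the Python standard library. It approximates a derivative numerically as the slope of a secant line through two nearby points, which is exactly the picture a visual understanding of calculus gives you. The function names `central_difference` and `cube` and the step size `h` are illustrative choices, not taken from any particular course.

```python
def central_difference(f, x, h=1e-5):
    """Approximate f'(x) as the slope of the secant through (x - h) and (x + h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def cube(x):
    # Example function f(x) = x^3, whose exact derivative is 3x^2
    return x ** 3


# At x = 2 the exact derivative is 3 * 2^2 = 12; the approximation lands very close to it
print(central_difference(cube, 2.0))
```

You supply the understanding (a derivative is a slope); the computer grinds through the numbers.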
So that's the from 0 route. Let's explore one more route before diving into the resources.
Switching Careers Route
Switching careers can be tough. You’ve probably worked a couple of years in one field and decided it’s not for you. And that’s fine. Maybe you got bored, maybe the job wasn’t motivating enough… The reasons are endless and I don’t want to discuss them.
What you'll want to do as soon as possible is honestly evaluate your math and stats skills. And I mean honestly. There's no shame in admitting you're rusty in a subject you studied 10 years ago.
If you're not 100% confident in your understanding of the following topics:
- Linear Algebra
- Calculus
- Probability
- Statistics
- Programming
refer back to the resources article mentioned earlier and select the resource that suits your needs, whether it's a book or an online course.
Okay, everything covered? You may proceed to my personal selection of resources.
Resources for Self-Taught Data Scientists
So college is not an option for whatever reason, but you can spend an hour or two per day pursuing the data science world. The next step will differ from person to person, depending largely on whether you're a book person or a video person. I'm more of a video person; I just don't feel like reading a book after an 8-hour shift.
I’ll simply start from my personal favorite — my first exposure to data science:
Jose is an amazing instructor. There's a quick refresher on essential Python libraries, and pretty soon you'll get to data analysis with Pandas and NumPy and some data visualization with Matplotlib and Seaborn. And yeah, you'll do machine learning: not much and not in depth, but enough to get you started.
If you’re more of a book person, then I recommend this one:
It's called the Python Data Science Handbook. It's about 550 pages long and covers the same ideas as the video course: NumPy, Pandas, Matplotlib, and Scikit-Learn, all crucial to making it in data science.
Once you’re done with the basics, it’s time to dive deeper into machine learning. I have two great books to recommend, and one of them is free. Let’s dive in.
Introduction to Statistical Learning is an excellent book for studying machine learning in more depth, and it's free. It gets a bit mathematical at times, but it's still manageable to read. For a field as broad as machine learning, it does a pretty good job of keeping everything concise in about 400 pages. The only disadvantage is that the code is written in R, not Python. But hey, try "translating" the R code into Python; it's a great exercise.
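As a taste of that translation exercise, here is a minimal sketch of what a typical R call such as `lm(y ~ x)` (simple linear regression) could look like in plain Python. It is not code from the book: the function name `fit_simple_linear_regression` and the toy data are hypothetical, and the fit uses the standard closed-form ordinary least squares formulas rather than any library.

```python
# R version, for comparison:
#   fit <- lm(y ~ x)
#   coef(fit)

def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up toy data, purely for illustration
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Once the by-hand version makes sense, a natural next step is to reproduce the same numbers with a library such as Scikit-Learn and compare them with R's output.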
The next book I'd recommend is called Hands-On Machine Learning with Scikit-Learn and TensorFlow. It's around 700 pages, if I remember correctly, but boy is it a good read. You'll go deeper into machine learning algorithms and also get into some deep learning concepts.
You won't go wrong with either one; both will serve you well (actually, read both).
As for an online course, I have to recommend Coursera's Machine Learning course by the one and only Andrew Ng, Mr. "don't worry if you don't understand it." It runs for 10+ weeks and gets tough pretty quickly if your foundations are shaky. But hey, an average rating of 4.9 out of 5 from around 120K users really says something.
The labs aren't in Python, or even in R; they're written in Octave, a free language that is largely compatible with MATLAB, so that's something to consider.
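If you'd rather practice in Python while following along, one option is to rewrite the lab ideas yourself. Below is a minimal sketch, not the course's own code, of batch gradient descent for linear regression with one feature, the kind of algorithm such a course covers early on. The function name, the learning rate `alpha`, the iteration count, and the toy data are all hypothetical choices for illustration.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Fit y ~ theta0 + theta1 * x by batch gradient descent on the mean squared error."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors for the current parameters, over the whole batch
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the cost J = (1 / 2m) * sum(error^2)
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously, using gradients from the same step
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Toy data generated from y = 1 + 2x, so the fit should approach theta0 = 1, theta1 = 2
theta0, theta1 = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(round(theta0, 4), round(theta1, 4))
```

The loops make each step explicit; once that clicks, vectorizing the update (as the Octave labs encourage) is a good follow-up exercise.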
Next Steps
You've gone through the books or courses (or both), and now you're wondering what to do next. It will vary a lot depending on your situation, but ideally you should build up a GitHub profile.
Find 5 good datasets and do your best. Do an extensive analysis, write up your conclusions and thought process in Markdown cells, create a README file, and, you know, pour your heart and soul into it.
This is essential to do for two reasons:
- You’re practicing newly acquired skills
- You're showcasing your ability to produce good-quality code and conclusions to potential employers
With regard to the latter, it's important for potential employers to see your best work. You don't have a college degree, at least not a relevant one, so you need to show them somehow that you know how to get stuff done. GitHub is the way to go.
Take a month or two and make something you’ll be proud of.