Shakespeare wrote King Lear while in quarantine. Newton laid the foundations for his laws of motion while in quarantine. People are making funny memes while in quarantine.
What can you do? Changing the world of theatre forever or revolutionizing the world of physics — I expect that bar is too high. But you could enhance your skills in Data Science, and kickstart your career when this pandemic is over.
This is no complete guide to all classes that are out there. If you do a simple Google search, you will see that there are more than you can take in a lifetime…
Rather, this is a compendium of classes that I and others have found particularly useful. Which classes are right for you depends on your current level of skills in programming and statistics. Even though in practice the lines between programming and statistics are blurred, this guide is split between the two disciplines to make it easier to navigate.
In terms of programming, three languages are essential: Python, R and shell script. The latter is often underestimated by beginners. But if you’re a Data Scientist, it’s your bread and butter. You’ll also want to know a thing or two about Git and GitHub.
Regarding Python and R, you might get away with knowing only one in depth. This depends on where you want to work later on.
You’ll also need some knowledge about databases. (You want to be a Data Scientist, right?) The state of the art is SQL, so I suggest you get cracking on that before you start applying for jobs.
In terms of statistics, any general class on Data Science will do. Many of them also provide introductions to the most important languages and technical environments. But I still recommend taking a dedicated language class too. Because. Programming. Is. Important!
Since many of us face insecurities regarding our jobs and our budgets in these times, I’ve decided only to list free resources. After all, your discipline is a thousand times more important for your progress than the money you invest.
Programming classes
Depending on your current skillset, you may need some or all of the classes listed here. I’ve deliberately only listed one, maximum two, classes per language to avoid confusion about which is the best for you.
If you’re an absolute beginner, I advise you to follow all of these before moving on with the Data Science classes listed below.
1. Python
The materials at learnpython.org are a great starting point. It’s all interactive, and there’s no need to install anything — you can type your code directly in the browser.
You’ll probably find that you’re able to play around with it after a few hours. Invest a few days, and you’ll be able to write your own basic programs.
Sooner or later, you’ll need to be able to install Python and related packages. If you feel ready for that, you can try Python’s official resources. However, I discourage you from doing so if you have no prior coding experience since that can be quite intimidating.
If you have some money to spend, you can visit the Python classes by DataCamp. You get access to most courses for $25 per month, and they’re also code-in-browser classes. For most purposes, however, the free resources from Python and learnpython.org will be completely sufficient.
2. R
The edX course by Harvard University will give you all you need to get started with R. The class is 8 weeks long, but you’ll only need to put in 1–2 hours a week. So this should work even if you have a busy schedule.
If you want to put in a little more effort and get deeper knowledge, there is also the coursera class by Johns Hopkins University. It’s four weeks long, with 25 hours of effort in the first week and 10–12 hours a week after that. This course also covers the installation and gives you some background knowledge.
3. SQL
While smaller datasets are quite manageable in Python alone, SQL is the way to go with large volumes. The tutorial by w3schools.com covers pretty much all the basics.
With lots of code-in-browser examples, this tutorial is very suitable for beginners. It should take about 10–15 hours to complete.
If you’re more of a video course type, you could also try the class by Khan Academy. There are more challenging text-in-browser examples in here, but the overall course time will be shorter — it should take you 5–10 hours to complete that one.
4. Shell script
A great way to get started is the Learn Code The Hard Way book. It’s very systematic and without any fancy UI. But that’s exactly what makes it so good: later on, you’ll also sit at a command line without the usual graphical frills.
Following this class requires some discipline. But it’s worth it. Completing it should take 5–10 hours, depending on how deep you dive.
5. GitHub
If you’ve never used GitHub before, I recommend reading and following Anne Bonner’s guide.
It’s an 18-minute read, but I suggest you invest at least a couple of hours to get started. It is a good idea to get everything up and running as you read the guide since you’ll be needing that stuff over and over again.
If you already know a thing or two, then GitHub’s guides can help you fill the gaps.
Data Science & statistics classes
If you have acquired some skills in programming, you can use these classes to deepen your knowledge in statistics. Taking one or two classes will be enough — what is the best fit for you will depend on your time budget and what language you prefer.
6. Johns Hopkins University / coursera: Data Science specialization
time: ~200 hours (or 6 hours weekly for 8 months), self-paced
language: R, but Python is a prerequisite
This class is one of the highest-recommended for aspiring Data Scientists. It’s comprised of 10 sub-courses which you can mix and match as you like. But if you want to earn a certificate to show your future employer, you’ll have to do the whole thing.
7. Harvard School of Engineering: CS109 course material
time: ~100 hours (or 8 hours weekly for 13 weeks), self-paced
language: Python
This is an actual class as taught at the Harvard John A. Paulson School of Engineering and Applied Sciences. Even though it’s not an online class like those on coursera, edX and the like, the wealth of material is a joy to look through.
8. Harvard University / edX: Statistics and R
time: 8–16 hours (or 2–4 hours weekly for 4 weeks), self-paced
language: R, basic programming is a prerequisite
Given the small time scope, this class covers the very basics of Data Science. It focuses more on data analysis and visualization. This is an option if you’re short on time but still want to learn something.
9. udacity: Introduction to Data Science
time: ~100 hours (or 10 hours weekly for 2 months), self-paced
language: Python
This class covers everything from data acquisition to analysis and visualization. The number of hours and the interactive format might be ideal for those who are stuck at home at the moment. If you work at it at a higher intensity, you could be done with it within a few weeks!
10. University of Texas / edX: Foundations of Data Analysis
time: 18–36 hours (or 3–6 hours weekly for 6 weeks), self-paced
language: R, basic programming is a prerequisite
This course markets itself as a typical undergraduate statistics course with an added twist of modeling. However, since the time investment is pretty low, I would argue that it is more an introduction to statistics and modeling.
11. University of Michigan / coursera: Applied Data Science
time: ~120 hours (or 8 hours weekly for 4 months), self-paced
language: Python
This course is similar to number 6 in its makeup, but it’s a lot more hands-on. It’s comprised of 5 sub-courses that you can mix and match as you wish. This might be especially useful if you already know what industry you would like to work in. For example, two exciting parts of the course are machine learning and social network analysis.
Final note: Use your domain knowledge. And your network
All these courses are available for everyone with an internet connection. Therefore, while they might give you good foundations to build upon, they will not be the edge that separates you from other candidates.
Think about what makes you unique as a Data Scientist. Which domains have you worked in before: health, mathematics, biology, chemistry, physics or something completely different?
And who in your network could give you a good introduction? Who knows someone who knows somebody else in Data Science? Let your connections play.
And finally, take advice. Don’t take it from me, take it from other people who have made the transition to becoming a Data Scientist. Reading stories in Towards Data Science is a great starting point.
But your goal should also be to get in direct contact and set up informal interviews with the people you look up to. This way, you can learn from their experiences first-hand.
For those of you who are in quarantine, I hope this guide helps to keep you sane and informed. Quarantined or not, I hope we all can use these times to set up a brighter future. Learning something new is only the beginning.