The opportunity to leverage insight from data has never been greater. However, at this point, we do not have enough skilled employees to help us make sense of it all. So, if you want to be a data scientist, working with data in the next 10–15 years, then this is your time. Go win all those jobs!

This blog briefly covers the concepts you must know before you appear for an entry-level Data Science interview. Over the course of the blog, I will cover a list of topics that help in an entry-level Data Science interview, how much domain knowledge you need for each of them, and a BONUS: questions on each topic from my own interviews!!!

Traditionally, Data Science focuses on programming, mathematics, computer science and Machine Learning domain knowledge, and we will take on exactly that!

With tons of opportunities lying around for a data scientist, here’s my guide for a beginner Data Scientist.

1. Programming Languages

Python and R are the most popular ones I’ve seen in the Data Science space to date. However, a good working knowledge of C/C++ and Java never hurts!

Python

Most Data Scientists have a hidden love for Python, given that Python has some amazing libraries such as NumPy, SciPy, pandas, statsmodels, Matplotlib, Seaborn and Plotly that make the work easier. Machine Learning libraries such as scikit-learn, TensorFlow and PyTorch are good ones to know about.

R

For R, consider ggplot2 your hero! R makes data visualization easy, so for all the Data Analysts here, R is something you would want to know how to build visualizations in. If your preferred language for Data Visualization is R, packages like ggplot2, lattice, highcharter, plotly, leaflet and dygraphs can definitely come in handy during an interview, for example when you are asked to explain how to visualize a heat map in R.

Questions:

Can you name some Python libraries you can use for logistic regression?
If you have two variables x and y, how would you check whether they depend on each other?
How would you know if a variable is independent?
Write code to see the first 10 records of a data set in Python.
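Here is one possible way to approach the last question, plus a quick first check for the x and y question. This is a minimal sketch, assuming pandas is installed. The data frame and its column names are made up for illustration; in an interview you would load your own data, for example with pd.read_csv.

```python
import pandas as pd

# Hypothetical toy data set (not from the article): y is a linear function of x.
df = pd.DataFrame({
    "x": range(1, 21),
    "y": [2 * v + 1 for v in range(1, 21)],
})

# First 10 records of the data set.
first_ten = df.head(10)
print(first_ten)

# Pearson correlation as a quick first check of linear dependence.
corr_xy = df["x"].corr(df["y"])
print(corr_xy)
```

Keep in mind that a correlation close to 0 only rules out a linear relationship, so a scatter plot is a good follow-up before you call two variables independent.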

2. Mathematics

Now Math might not be the strongest point for everyone, but being a data scientist requires you to be good at statistics, linear algebra, probability and differential calculus. That’s it!

Statistics

Statistics is the field of math you would mostly use to analyze and visualize data, in order to discover, infer and propose helpful insights from it. Box plots, scatter plots, density plots, bar plots and histograms all help. R makes it a whole lot easier to visualize data and derive relations than calculating the mean of 120 records on a mobile calculator :/

Linear Algebra

Used mainly in machine learning and deep learning, linear algebra helps you understand how algorithms work in the back-end. Basically, it’s all about vector and matrix operations, and data representations through matrices and vector spaces. I am not a great fan of people asking math questions in interviews, but you gotta answer what you gotta answer.
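To make "data representations through matrices" concrete, here is a small sketch, assuming NumPy is installed. The numbers and the weight vector are made up for illustration; the point is only that a data set becomes a matrix and a simple linear model becomes a matrix-vector product.

```python
import numpy as np

# Hypothetical data: 3 records (rows) with 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Hypothetical weight vector, one weight per feature.
w = np.array([0.5, -1.0])

# Matrix-vector product: one prediction per record.
predictions = X @ w
print(predictions)

# Transpose times matrix: a 2x2 matrix of inner products between features.
gram = X.T @ X
print(gram)
```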

Calculus

Calculus in machine learning (and deep learning as well) is primarily used to formulate the functions that train models and algorithms toward their objective, to compute loss/cost/objective functions, and to derive scalar/vector relationships.
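A tiny example of where the calculus shows up: gradient descent uses the derivative of a loss function to train a model. The sketch below is plain Python under made-up assumptions (toy data generated from y = 3x, a one-parameter model y_hat = w * x, and a step size I picked by hand); it is an illustration, not how any particular library does it.

```python
# Hypothetical toy data generated from y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def loss(w):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0               # starting guess
learning_rate = 0.01  # assumed step size, chosen by hand for this toy data

# Repeatedly step against the gradient to reduce the loss.
for _ in range(500):
    w -= learning_rate * gradient(w)

print(w, loss(w))
```

The learned w ends up very close to 3, the slope used to generate the data.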

Probability

Bayes’ Theorem. As a Data Scientist, that is something you should know the answer to even in your sleep. And then, some random variables and probability distributions.

I was asked to write down Bayes’ Theorem on paper!
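For reference, Bayes’ Theorem says P(A|B) = P(B|A) * P(A) / P(B). Here is a small worked example in Python. The numbers (a condition with 1% prevalence, a test that catches 95% of cases and gives a false positive 5% of the time) are made up for illustration, and the function name is my own.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    P(B) is expanded with the law of total probability:
    P(B) = P(B|A) * P(A) + P(B|not A) * P(not A).
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: P(A) = 0.01, P(B|A) = 0.95, P(B|not A) = 0.05.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.161
```

Even with a positive test, the probability of actually having the condition is only about 16%, because the condition is rare to begin with.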

3. Database Management

For Data Scientists who play with data all day and all year long, database knowledge is a prerequisite. Data Scientists often take one of two diverging roads: Mathematics or Database Management. A doubly nested SQL query is something you might be asked to write in an utter nightmare of an interview :D.

That being said, it is important to have some knowledge of query optimization, relational database schemas, database management, cost of evaluation and indexing. Working knowledge of both SQL and NoSQL systems helps in an interview.

For some strange reason, in three of my interviews, I was asked a similar question on database indexing.

Questions:

Suppose you have a dataset with 56,000 citizen records and you have to find citizens with age > 80 years. What would you do?
Do you know about the ACID properties in DBMS?
What is a correlated query?
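Since indexing kept coming up, here is one way the first question could be approached: put an index on the age column so the database does not have to scan every record. This is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, index name and generated ages are all made up for illustration, and other database systems will report their query plans differently.

```python
import sqlite3

# In-memory database with a hypothetical citizens table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE citizens (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)

# 56,000 made-up records; ages simply cycle through 0..99.
rows = [(i, f"citizen_{i}", i % 100) for i in range(56000)]
conn.executemany("INSERT INTO citizens VALUES (?, ?, ?)", rows)

# Index on the column used in the filter.
conn.execute("CREATE INDEX idx_citizens_age ON citizens (age)")

query = "SELECT COUNT(*) FROM citizens WHERE age > 80"
over_80 = conn.execute(query).fetchone()[0]
print(over_80)

# Ask SQLite how it plans to run the query; the last column is the description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

The plan description should mention idx_citizens_age, which tells you the filter is answered from the index rather than by scanning the whole table.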

4. Machine Learning

The starting point for a beginner Data Scientist is to understand that machine learning is a part of data science. It draws on statistics and algorithms to work on data generated and extracted from multiple sources. Whenever you visit a website, you generate data; across all users, that data piles up in massive volumes and has to be processed later. That is when machine learning comes into action.

Machine learning is the ability of a system to learn from and process data sets autonomously, without explicit human instructions for every case. This is achieved by building algorithms and techniques like regression, supervised learning, clustering, Naïve Bayes and more.

Topics to consider for starters:

Classification
Regression
Reinforcement Learning
Deep Learning
Clustering
Segmentation
Recommender Systems (maybe)
Dimensional Modelling

A preliminary knowledge of the above topics would be good for a start. You could always think of examples and applications where you would use Machine Learning algorithms: say, Netflix recommender systems, spam email filtering, fraud and risk detection, advanced image recognition, and so on.
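To make "classification" a little more concrete, here is a toy spam filter written as a nearest-centroid classifier in plain Python. Everything in it is an assumption made for illustration: the two features (number of links and number of ALL-CAPS words per email), the training examples and the function names. Real spam filters are far more involved; this only shows the idea of learning from labelled examples.

```python
# Hypothetical labelled training data: (number of links, number of ALL-CAPS words).
training = {
    "spam": [(8, 10), (7, 12), (9, 9)],
    "ham": [(1, 0), (0, 1), (2, 1)],
}


def centroid(points):
    """Average position of a list of equally sized feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


# "Training": compute one centroid per class.
centroids = {label: centroid(points) for label, points in training.items()}


def classify(point):
    """Return the label whose centroid is closest to the given point."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(point, c))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


print(classify((6, 8)))  # spam
print(classify((1, 1)))  # ham
```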

5. Basic Definitions

Sometimes, even the most basic concepts come up and lead to a detailed conversation. In my interview with PepsiCo, I was asked what I could say about telling stories through data. I took it to mean visualizing data for intuitive results, and the next 15 minutes were a discussion of Data Visualization, the basics, data cleaning and data perception.

I would recommend knowing exactly what the following data terms mean, so that when a question peripheral to them comes up, you are ready to fire!

a. Data Analytics

Data analysis is a process of learning, exploring, inspecting, transforming and modeling data with the goal of discovering useful information and drawing insightful results. Data Analytics is essential for businesses as it makes decision-making smart and quick, with facts and visual representations of the company’s standing to support it.

b. Data Wrangling

Data wrangling is the process of transforming and mapping data from a raw data form into a better format with the intent of making it more understandable, appropriate and ready for further processing for a variety of downstream purposes such as analytics.

c. Data Cleaning

Just as it sounds, data cleaning is where you:

Remove unusual data occurrences (an unusually large number in a years-of-experience field)
Detect errors in data types (say, 127 written in a Gender input field)
Correct corrupt or inaccurate records from a record set or database (#$#@% instead of 34325)
Fill in or fix incomplete, incorrect, inaccurate or irrelevant parts of the data (missing data in a gender field)
Replace, modify, or delete the noisy or coarse data
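The cleaning steps listed above can be sketched with pandas roughly as follows. This is a minimal sketch, assuming pandas is installed. The column names, the set of valid gender codes, the "Unknown" placeholder and the upper bound of 60 years are all assumptions I made for illustration; the right rules always depend on your data.

```python
import pandas as pd

# Hypothetical raw records showing the kinds of problems listed above.
raw = pd.DataFrame({
    "gender": ["F", "127", None, "M", "F"],
    "years_experience": [5, 3, 8, 250, 12],
    "amount": ["34325", "10001", "60614", "94105", "#$#@%"],
})

clean = raw.copy()

# Wrong type / invalid category: keep only the expected gender codes
# (the set of valid codes is an assumption for this sketch).
valid_genders = ["F", "M"]
clean["gender"] = clean["gender"].where(clean["gender"].isin(valid_genders))

# Missing data: fill the gaps with an explicit placeholder.
clean["gender"] = clean["gender"].fillna("Unknown")

# Corrupt records: anything that is not a number becomes NaN.
clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce")

# Unusual occurrences: drop values above an assumed plausible upper bound.
clean = clean[clean["years_experience"] <= 60]

# Delete the rows whose amount could not be repaired.
clean = clean.dropna(subset=["amount"])

print(clean)
```

Whether you drop, fill or correct a bad value is a judgment call, so be ready to explain why you chose one over the other.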

d. Exploratory Data Analysis

EDA is an approach to reduce data to a smaller set of summary variables and use those results to:

Maximize insights from a data set
Uncover the underlying relationship among variables
Extract variables important for visualization
Detect outliers and anomalies (very important!)
Test the underlying assumptions
Determine optimal model construction
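As a small illustration of the summary and outlier points above, here is a sketch that uses only Python's built-in statistics module. The data is made up, and the 1.5 * IQR rule used to flag outliers is one common convention that I am assuming here, not the only option.

```python
import statistics

# Hypothetical "years of experience" column with one suspicious value.
years_experience = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 45]

# Reduce the data to a few summary values.
summary = {
    "mean": statistics.mean(years_experience),
    "median": statistics.median(years_experience),
    "stdev": statistics.stdev(years_experience),
}
print(summary)

# Quartiles and the interquartile range (IQR).
q1, _, q3 = statistics.quantiles(years_experience, n=4)
iqr = q3 - q1

# Flag values that fall outside 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in years_experience if v < lower_fence or v > upper_fence]
print(outliers)
```

Notice how far the mean sits from the median here; that gap alone is a hint that an outlier is pulling the average up.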

Along with knowledge of all these, you would also have to know about the IDEs and tools you work with.

IDE

PyCharm
Jupyter
Google Colab
Spyder
RStudio

Tools

Tableau
Power BI
SAS
Apache Spark
MATLAB
TensorFlow
AWS
Azure

You might not necessarily know about it all, but make sure you know what you do know down to the last grain in the bag. Entry-level data science interviews expect you to have all the required knowledge. My advice would be to build the right foundation and focus on the basics.

For practice, make yourself visible online. Write blogs, create repos, participate in data challenges and hackathons, and contribute to open source. With a portfolio tailored to the job you want and the passions you have, plus the interview guide you just read, I am sure you will get the best job out there!

Thank you for reading! If you’ve enjoyed this article, hit the clap button and let me know how you would answer the questions I was asked! Happy Data tenting!

Know your author

Rashi is a graduate student and a Data Analyst, User Experience Analyst and Consultant, a Tech Speaker, and a Blogger! She aspires to form an organization connecting the Women in Business with an ocean of resources to be fearless and passionate about work and world. Feel free to drop her a message here!

Subrat's Technical Blog

Tuesday, November 26, 2019

A Beginner’s Guide To Entry-Level Data Science Interview

Towards Data Science
