Friday, November 29, 2019

15 Data Science Books You Should Read

This year, we’ve seen a 56% increase in data science jobs, according to TechRepublic. Data science roles such as data scientist, data engineer, data analyst, and machine learning engineer are booming. Increasingly, software engineers and developers are working side by side with these data professionals, and it’s essential for anyone on a development team to understand some of the basics of data science, statistics, and machine learning. Picking up any one of the books below will give you knowledge and understanding of important areas of data science such as statistics, machine learning, and deep learning.
Disclaimer: There are no affiliate links in this post. This post is for information purposes only.

Math & Statistics

This is a beginner’s introduction to statistical analysis that will also give you a practical understanding of the process of data analysis. You work through a case study to understand the process, gaining an understanding of probability and statistics by writing code as you go.
This is a comprehensive reference guide for many of the concepts in statistics for data science. It’s a good book to bridge the gap between statistics and data science. Although the book assumes familiarity with R, it’s still a good book to learn statistical concepts for Python programmers.
If you want a good math book that goes over all the main concepts of statistics without making it heavy, then this is the book for you. You learn statistics by looking at real-world examples. Charles Wheelan is brilliant and funny. Not only will you walk away understanding core statistical concepts such as inference, correlation, and regression analysis, but he will also inspire you to learn more.
This is one of the must-read books on any data scientist’s bookshelf. John Allen Paulos takes you on a journey of examining the consequences of numbers. It gives you compelling examples of the impact of data in the real world by looking at election stats, sports stats, drug testing, stock scams, and more.
This is a fascinating book about real applications of data and the insights generated from Big Data. The fun part is navigating the witty writing, but the practical understanding you gain of how data is used, perceived, and processed by people will leave a lasting impression on you. It’s very easy to detach yourself from the human aspect of data science work; this book puts the human aspect back into it, inspiring anyone who works with data to think about the insights they generate, and to keep a skeptical mindset while doing so.

Python & Data Science

This is a comprehensive beginner’s crash course in statistics, Python, and data science. It contains implementations of models, covers the fundamentals of machine learning, and even explores the database side of things. It’s an overview book for anyone who works with data scientists and wants to see the big picture of the entire process from beginning to end. If you only have time to read one data science book, then this is probably the book for you.
This is a beginner’s guide to tackling the day-to-day issues of data science, from using Jupyter notebooks to working with Python libraries such as NumPy, Pandas, Matplotlib, and Scikit-Learn. It will get you started working with Python for data science without taking a course. It’s perfect for someone who has experience with Python but needs a guide to the tools available for data science work. It will save you time Googling for answers.

Just Python

If you come from a programming background other than Python, then this is a great book to sharpen your Python skills before delving into any Python for Data Science books. Not only will you become proficient in Python 3, you will also learn to write simple and effective code. Luciano Ramalho takes you on a Python journey that allows you to be productive very quickly.
In data science, Python tricks are frequently used to efficiently explore the data. This book will allow you to discover a lot of the best practices to make use of the power and the simplicity of Python code. The fun part is discovering all the hidden gems in the Python standard library. Many levels of Python programmers, beginner to advanced, can appreciate this book.

Machine Learning

When you work with machine learning engineers, it’s often necessary to have an appreciation for algorithm selection, parameter tuning, and the methods they use to apply machine learning concepts in their daily work. This book gives you an inside look at machine learning models and the process of training models, selecting algorithms, and tuning parameters. For machine learning engineers, it’s a must-read and one of the first books to pick up when entering the field of machine learning.
This is a great overview of machine learning concepts, techniques, and workflows using all of the popular Python libraries for machine learning. It’s often used by machine learning engineers as a reference book or a starting guide. It takes you through the process of training models, selecting algorithms, and tuning parameters.

Deep Learning

This is a great book to get a deep understanding of neural networks. From the math to the implementation, the book takes you on a fun journey that simplifies the math behind neural networks. It allows you to code your own neural network and appreciate both the big picture and the technicals of neural networks. It’s a good book to read for comprehensive learning of the concepts behind neural networks.
This is a deep dive into the area of deep learning. It gives you both the mathematical and conceptual background, and discusses deep learning techniques, applications, and research perspectives. It’s probably the deep learning bible for any machine learning engineer.
This is a good beginner’s guide to deep learning with Python. Google AI researcher Francois Chollet takes you on a journey through the concepts, principles, and methods of deep learning that is both intuitive and in-depth. He explains with practical examples of different applications of deep learning in computer vision, natural-language processing, and generative modeling. It’s a good practical book on deep learning.
This is a good practical book on using TensorFlow and Scikit-Learn to train deep learning models. It also covers the essentials of other types of machine learning models, and working through the examples teaches you techniques to train and scale deep neural nets. This book makes use of the newest tools to make the machine learning process easier.

Towards Data Science

Sharing concepts, ideas, and codes.

Written by

Writer, Technologist, Poet: Tech|Future|Leadership, Signup: http://bit.ly/2Wv02me, http://bit.ly/34mkjhe, http://bit.ly/33oLxSM(Forbes-AI, Behind the Code)


Thursday, November 28, 2019

Oracle Linux Training at Your Own Pace

Knowing that taking training at your own pace, when you have time, suits many people's schedules and learning styles, Oracle has just released new Training-on-Demand courses for those aspiring to build their Linux administration skills.

Why not take advantage of the newly released training to build your Linux skills?
Start your Linux learning with the Oracle Linux System Administration I course. This course covers a range of skills including installation, using the Unbreakable Enterprise Kernel, configuring Linux services, preparing the system for the Oracle Database, monitoring and troubleshooting.
After gaining essential knowledge and skills from taking the Oracle Linux System Administration I course, students are encouraged to continue their Linux learning with Oracle Linux System Administration II.
The Oracle Linux System Administration II course teaches you how to automate the installation of the operating system, implement advanced software package management, and configure advanced networking and authentication services.
Resources:


Multiple Node.js Applications on Oracle Always Free Cloud

What if you want to host multiple Oracle JET applications? You can do it easily on Oracle Always Free Cloud. The solution is described in the below diagram:


You should wrap the Oracle JET application in Node.js and deploy it to an Oracle Compute Instance through a Docker container. This is described in my previous post - Running Oracle JET in Oracle Cloud Free Tier.

Make sure to create the Docker container with a port other than 80. To host multiple Oracle JET apps, you will need to create multiple containers, each assigned a unique port. For example, I'm using port 5000:

docker run -p 5000:3000 -d --name appname dockeruser/dockerimage

This will map standard Node port 3000 to port 5000, accessible internally within Oracle Compute Instance. We can direct external traffic from port 80 to port 5000 (or any other port, mapped with Docker container) through Nginx.

Install Nginx:

yum install nginx

Go to Nginx folder:

cd /etc/nginx

Edit configuration file:

nano nginx.conf

Add a context root configuration for the Oracle JET application, directing it to local port 5000:

location /invoicingdemoui/ {
     proxy_pass http://127.0.0.1:5000/;
}
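To host a second Oracle JET app behind the same Nginx, you would add one more location block per container. A sketch, assuming a hypothetical second app whose container maps its Node port to 5001, under the context root /secondappui/:

```nginx
# Hypothetical second app: its container maps the Node port to 5001.
location /secondappui/ {
     proxy_pass http://127.0.0.1:5001/;
}
```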

To allow HTTP call from Nginx to port 5000 (or other port), run this command (more about it on Stackoverflow):

setsebool -P httpd_can_network_connect 1

Reload Nginx:

systemctl reload nginx

Check Nginx status:

systemctl status nginx

That's all. Your Oracle JET app (demo URL) is now accessible from the outside.

Oracle Cloud : Free Tier and Article

Oracle Cloud Free Tier was announced a couple of months ago at Oracle OpenWorld 2019. It was mentioned in one of my posts at the time (here). So what do you get for your zero dollars?
  • 2 Autonomous Databases : Autonomous Transaction Processing (ATP) and/or Autonomous Data Warehouse (ADW). Each has 1 OCPU and 20 GB of user data storage.
  • 2 virtual machines with 1/8 OCPU and 1 GB memory each.
  • Storage : 2 Block Volumes, 100 GB total. 10 GB Object Storage. 10 GB Archive Storage.
  • Load Balancer : 1 instance, 10 Mbps bandwidth.
  • Some other stuff…
I’ve been using Oracle Cloud for a few years now. Looking back at my articles, the first was written over 4 years ago. Since then I’ve written more as new stuff has come out, including the release of OCI, and the Autonomous Database (ADW and ATP) services. As a result of my history, it was a little hard to get excited about the free tier. Don’t get me wrong, I think it’s a great idea. Part of the battle with any service is to get people using it. Once people get used to it, they can start to see opportunities and it sells itself. The issue for me was I already had access to the Oracle Cloud, so the free tier didn’t bring anything new to the table *for me*. Of course, it’s opened the door for a bunch of other people.
More recently I’ve received a few messages from people using the free tier who have followed my articles to set things up, and I’ve found myself cringing somewhat, as aspects of the articles were very out of date. They still gave you the general flow, but the screen shots were old. The interface has come a long way, which is great, but as a content creator it’s annoying that every three months things get tweaked and your posts are out of date. 🙂 I promised myself some time ago I would stop re-capturing the screen shots, and even put a note in most articles saying things might look a little different, but now seemed a good time to do some spring cleaning.
First things first, I signed up to the free tier with a new account. I didn’t need to, but I thought it would make sense to work within the constraints of the free tier service.
With that done I set about revamping some of my old articles. In most cases it was literally just capturing new screen shots, but there were a few little changes. Here are the articles I’ve revamped as part of this process.
There are some other things I’m probably going to revisit and new things I’m going to add, but for now I feel a little happier about this group of posts. They’ve been nagging at the back of my mind for a while now.
If you haven’t already signed up for a Free Tier account, there is literally nothing to lose here. If you get stuck, the chat support has been pretty good in my experience, and please send feedback to Oracle. The only way services get better is if there is constructive feedback.

Tuesday, November 26, 2019

A Beginner’s Guide To Entry-Level Data Science Interview

The opportunity to leverage insight from data has never been greater. However, at this point, we do not have enough skilled employees to help us make sense of it all. So, if you want to be a data scientist working with data over the next 10–15 years, then this is your time. Go win all those jobs!
This blog briefly focuses on the concepts you must know before you appear for an entry-level Data Science interview. Over the course of the blog, I will cover a list of topics that help in an entry-level Data Science interview, how much domain knowledge you need for each, and, as a BONUS, questions on these topics from my own interviews!
Traditionally, Data Science focuses on programming, mathematics, computer science, and Machine Learning domain knowledge, and that is exactly what we will take on!
With tons of opportunities lying around for a data scientist, here’s my guide for a beginner Data Scientist.

1. Programming Languages

Python and R are the most popular ones I’ve seen in the Data Science space to date. However, a good working knowledge of C/C++ and Java never hurts!
Most Data Scientists have a hidden love for Python, given that Python has some amazing libraries such as NumPy, SciPy, Pandas, StatsModels, Matplotlib, Seaborn, and Plotly that make it easier to work. Machine learning modules such as Scikit-learn, TensorFlow, and PyTorch are also good resources to know about.
For R, consider ggplot2 your hero! R is easy when it comes to data visualization, and that is where all Data Analysts will want to know it. If your preferred language for data visualization is R, packages like ggplot2, Lattice, Highcharter, Plotly, Leaflet, and dygraphs can definitely be handy during an interview, for example when you are asked to explain how to visualize a heat map in R.
  1. Tell me some libraries in Python you can use for logistic regression.
  2. If you have two variables x and y, how would you know their dependence?
  3. How would you know if a variable is independent?
  4. Write code to see the first 10 records of a data set in Python.
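For questions 2 and 4 above, here is a minimal sketch in plain Python. The data set is made up for illustration; with pandas you would typically load a CSV and call df.head(10) instead.

```python
# Made-up "data set": a list of records. In practice you would more likely
# load a CSV with pandas and call df.head(10).
records = [{"x": i, "y": 2 * i + 1} for i in range(100)]

# Question 4: see the first 10 records of the data set.
for row in records[:10]:
    print(row)

# Question 2: a quick check of linear dependence between x and y is the
# Pearson correlation coefficient, computed here by hand.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

xs = [r["x"] for r in records]
ys = [r["y"] for r in records]
print(pearson(xs, ys))  # perfectly linear made-up data gives 1.0
```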

2. Mathematics

Now, math might not be the strongest point for everyone, but being a data scientist requires you to be good at statistics, linear algebra, probability, and differential calculus. That’s it!
Statistics is the math field you will use the most, to analyze and visualize data in order to discover, infer, and propose helpful insights. Box plots, scatter plots, density plots, bar plots, and histograms all help. R makes the task a whole lot easier, letting you visualize data and derive relations rather than calculating the mean of 120 records on a mobile calculator :/
Used mainly in machine learning and deep learning, linear algebra helps you understand how algorithms work in the back-end. Basically, it’s all about vector and matrix operations, and representing data through matrices and vector spaces. I am not a great fan of interviewers asking math questions, but you gotta answer what you gotta answer.
Calculus in machine learning (and deep learning as well) is primarily used to formulate the functions used to train models and algorithms toward their objective, to compute loss/cost/objective functions, and to derive scalar/vector relationships.
Bayes’ Theorem. That is something you should know the answer to even in your sleep as a Data Scientist. And then there are random variables and probability distributions.
I was asked to write down the Bayes’ Theorem on a paper!
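In its simplest form, Bayes' theorem says P(A|B) = P(B|A) * P(A) / P(B). A quick numeric sketch in Python, using the classic diagnostic-test example; the prevalence, sensitivity, and false-positive numbers here are all made up for illustration:

```python
# Hypothetical diagnostic-test numbers.
p_d = 0.01                # P(disease): prevalence
p_pos_given_d = 0.99      # P(positive | disease): sensitivity
p_pos_given_not_d = 0.05  # P(positive | no disease): false-positive rate

# Total probability of a positive test (law of total probability).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(f"P(disease | positive) = {p_d_given_pos:.3f}")  # ≈ 0.167
```

Even with an accurate test, a positive result here means only about a 17% chance of disease, which is exactly the kind of counterintuitive result interviewers like to probe.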

3. Database Management

For Data Scientists, who play with data all day and year long, database knowledge is a prerequisite. Data Scientists often travel one of two diverging roads: Mathematics or Database Management. A doubly nested SQL query is something you might be asked to write in an utter nightmare of an interview :D.
That being said, it is important to have some knowledge of query optimization, relational database schemas, database management, evaluation cost, and indexing. Working knowledge of both SQL and NoSQL systems helps in an interview.
For some strange reason, in three of my interviews, I was asked a similar question on database indexing.
  1. Suppose, you have a dataset with 56000 citizen records and you have to find citizens with age > 80 years. What would you do?
  2. Do you know about the ACID property in DBMS?
  3. What is a correlated query?
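For the first question above, the answer interviewers are usually fishing for is an index on the age column. A small sketch using Python's built-in sqlite3 module; the table, names, and ages are all made up:

```python
import sqlite3

# Hypothetical citizens table, roughly matching the interview question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizens (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO citizens (name, age) VALUES (?, ?)",
    [(f"citizen{i}", i % 100) for i in range(56000)],  # made-up ages 0..99
)

# Without an index, WHERE age > 80 forces a full table scan.
# An index on age lets the engine seek straight to the qualifying rows.
conn.execute("CREATE INDEX idx_citizens_age ON citizens (age)")

count = conn.execute("SELECT COUNT(*) FROM citizens WHERE age > 80").fetchone()[0]
print(count)  # 10640 of the made-up rows have age > 80

# The query plan should now report a search using idx_citizens_age.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM citizens WHERE age > 80"
).fetchall()
print(plan)
```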

4. Machine Learning

It is a starting point for a beginner Data Scientist to fathom that machine learning is part of data science. It draws on statistics and algorithms to work with data generated and extracted from multiple sources. Every time you visit a website, you generate data; that data accumulates in massive volumes and must then be processed, and that is where machine learning comes into action.
Machine learning is the ability of a system to learn from and process data sets autonomously, without human bias. This is achieved by building complex algorithms and using techniques like regression, supervised learning, clustering, Naïve Bayes, and more. The main areas to know are:
  1. Classification
  2. Regression
  3. Reinforcement Learning
  4. Deep Learning
  5. Clustering
  6. Segmentation
  7. Recommender Systems (maybe)
  8. Dimensional Modelling
Preliminary knowledge of the above topics is good for a start. You can always think of examples and applications where you would use machine learning algorithms: say, Netflix's recommender system, spam email filtering, fraud and risk detection, or advanced image recognition.
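As a concrete illustration of the first item, classification, here is a minimal sketch in plain Python: a nearest-centroid classifier on a tiny made-up two-class data set. All labels and numbers are hypothetical, and real work would use a library like Scikit-learn instead.

```python
# Tiny made-up two-class data set: (feature1, feature2) points.
train = {
    "spam":     [(5.0, 1.0), (4.5, 0.8), (5.5, 1.2)],
    "not_spam": [(1.0, 4.0), (0.8, 4.5), (1.2, 3.8)],
}

def centroid(points):
    # Component-wise mean of a list of points.
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def predict(point, centroids):
    # Assign the point to the class whose centroid is closest (squared Euclidean).
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist2(point, centroids[label]))

centroids = {label: centroid(pts) for label, pts in train.items()}
print(predict((5.2, 0.9), centroids))  # → spam
```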

5. Basic Definitions

Sometimes concepts as basic as common definitions can come up and entail a detailed conversation. In my interview with PepsiCo, I was asked if I could speak about telling stories through data. I perceived it in terms of visualizing data for intuitive results, and the next 15 minutes were a discussion of Data Visualization: the basics, data cleaning, and data perception.
I would recommend knowing what exactly the following data terms mean so when a question peripheral to that comes up, you are ready to fire!
Data analysis is the process of learning, exploring, inspecting, transforming, and modeling data with the goal of discovering useful information and drawing insightful results. Data analytics is essential for businesses, as it makes decision-making smart and quick, with facts to support a visual representation of the company’s standing.
Data wrangling is the process of transforming and mapping data from a raw data form into a better format with the intent of making it more understandable, appropriate and ready for further processing for a variety of downstream purposes such as analytics.
Just as it sounds, data cleaning is where you
  1. Remove unusual data occurrences (unusually large number in a years of experience field)
  2. Detect errors in data types (say 127 written in Gender input field)
  3. Correct corrupt or inaccurate records from a record set or database (#$#@% instead of 34325)
  4. Fill in incomplete, incorrect, inaccurate or irrelevant parts of the data (Missing data in gender field)
  5. Replace, modify, or delete the noisy or coarse data
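A tiny sketch of some of the steps above on made-up records; the field names and validity rules here are assumptions for illustration only:

```python
# Made-up raw records with the kinds of problems listed above.
raw = [
    {"name": "Ana", "gender": "F",   "years_experience": 7},
    {"name": "Bo",  "gender": "127", "years_experience": 3},    # error in a categorical field
    {"name": "Cy",  "gender": None,  "years_experience": 250},  # missing gender, unusually large value
]

VALID_GENDERS = {"F", "M"}  # assumed validity rule for this sketch

cleaned = []
for row in raw:
    row = dict(row)  # don't mutate the original record
    # Steps 2 and 4: detect an invalid or missing categorical value and fill it.
    if row["gender"] not in VALID_GENDERS:
        row["gender"] = "unknown"
    # Step 1: remove an unusually large number in the years-of-experience field.
    if row["years_experience"] is None or not 0 <= row["years_experience"] <= 60:
        row["years_experience"] = None
    cleaned.append(row)

print(cleaned)
```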
EDA (exploratory data analysis) is an approach that reduces data to a smaller set of summary variables and uses those results to:
  1. Maximize insights from a data set
  2. Uncover the underlying relationship among variables
  3. Extract variables important for visualization
  4. Detect outliers and anomalies (very important!)
  5. Test the underlying assumptions
  6. Determine optimal model construction
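A minimal sketch of the first and fourth steps, summary variables and outlier detection, using Python's standard statistics module on a made-up sample:

```python
import statistics

# Made-up sample with one obvious anomaly.
data = [12, 14, 13, 15, 14, 13, 95, 14, 12, 13]

# Summary variables.
mean = statistics.mean(data)
stdev = statistics.stdev(data)

# Flag outliers: points more than 2 standard deviations from the mean.
outliers = [x for x in data if abs(x - mean) > 2 * stdev]
print(mean, outliers)  # the anomaly 95 inflates the mean and gets flagged
```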

Along with knowledge of all these areas, you will also need to know the IDEs and tools you work with.

IDE

  1. PyCharm
  2. Jupyter
  3. Google CoLab
  4. Spyder
  5. R-Studio

Tools

  1. Tableau
  2. PowerBI
  3. SAS
  4. Apache Spark
  5. MATLAB
  6. TensorFlow
  7. AWS
  8. Azure

You might not necessarily know it all, but make sure you know what you do know down to the last grain of the bag. Entry-level data science interviews expect you to have all the required knowledge. My advice would be to build the right foundation and focus on the basics.
For practice, make yourself visible online. Write blogs, create repos, participate in data challenges and hackathons, and contribute to open source. With a portfolio tailored to the job you want and the passions you have, plus the interview guide you just read, I am sure you will get the best job out there!
Thank you for reading! If you’ve enjoyed this article, hit the clap button and let me know what you would answer to the questions I was asked! Happy data hunting!

Know your author

Rashi is a graduate student and a Data Analyst, User Experience Analyst and Consultant, a Tech Speaker, and a Blogger! She aspires to form an organization connecting the Women in Business with an ocean of resources to be fearless and passionate about work and world. Feel free to drop her a message here!

