Friday, November 29, 2019

15 Data Science Books You Should Read

This year, we’ve seen a 56% increase in data science jobs, according to TechRepublic. Data science roles such as data scientist, data engineer, data analyst, and machine learning engineer are booming. Increasingly, software engineers and developers are working side by side with these data professionals, and it’s essential for anyone on a development team to understand some of the basics of data science, statistics, and machine learning. Picking up any one of the books below will give you knowledge and understanding of important areas of data science such as statistics, machine learning, and deep learning.
Disclaimer: There are no affiliate links in this post. This post is for information purposes only.

Math & Statistics

This is a beginner’s introduction to statistical analysis that will also give you a practical understanding of the process of data analysis. You work through a case study to understand the process, gaining an understanding of probability and statistics by writing code as you go.
This is a comprehensive reference guide for many of the concepts in statistics for data science. It’s a good book to bridge the gap between statistics and data science. Although the book assumes familiarity with R, it’s still a good book to learn statistical concepts for Python programmers.
If you want a good math book that goes over all the main concepts of statistics without making it heavy, then this is the book for you. You learn statistics by looking at real-world examples. Charles Wheelan is brilliant and funny. Not only will you walk away understanding core statistical concepts such as inference, correlation, and regression analysis, but he will also inspire you to learn more.
This is one of the must-read books on any data scientist’s bookshelf. John Allen Paulos takes you on a journey of examining the consequences of numbers. It gives you compelling examples of the impact of data in the real world by looking at election stats, sports stats, drug testing, stock scams, and more.
This is a fascinating book about real applications of data and the insights generated from Big Data. The fun part is navigating the witty writing, but the practical understanding you gain of how data is used, perceived, and processed by people will leave a lasting impression on you. It’s very easy to detach yourself from the human aspect of data science work; this book puts the human aspect back into it, inspiring anyone who works with data to think about the insights they generate, and to keep a skeptical mindset while doing so.

Python & Data Science

This is a comprehensive beginner’s crash course in statistics, Python, and data science. It contains implementations of models, covers the fundamentals of machine learning, and even explores the database side of things. It’s an overview book for anyone who works with data scientists and wants to see the big picture of the entire process from beginning to end. If you only have time to read one data science book, then this is probably the book for you.
This is a beginner’s guide to tackling the day-to-day issues of data science, from using Jupyter notebooks to working with Python libraries such as NumPy, Pandas, Matplotlib, and Scikit-Learn. It will get you started working with Python for data science without taking a course. It’s perfect for someone who has experience with Python but needs a guide to the tools available for data science work. It will save you time Googling for answers.

Just Python

If you come from a programming background other than Python, then this is a great book to sharpen your Python skills before delving into any Python for Data Science books. Not only will you become proficient in Python 3, you will also learn to write simple and effective code. Luciano Ramalho takes you on a Python journey that allows you to be productive very quickly.
In data science, Python tricks are frequently used to efficiently explore the data. This book will allow you to discover a lot of the best practices to make use of the power and the simplicity of Python code. The fun part is discovering all the hidden gems in the Python standard library. Many levels of Python programmers, beginner to advanced, can appreciate this book.

Machine Learning

When you work with machine learning engineers, it’s often necessary to have an appreciation for algorithm selection, parameter tuning, and the methods they use to apply machine learning concepts in their daily work. This book gives you an inside look at machine learning models and the process of training models, selecting algorithms, and tuning parameters. For machine learning engineers, it’s a must-read and one of the first books to pick up when entering the field of machine learning.
This is a great overview of machine learning concepts, techniques, and workflows using all of the popular Python libraries for machine learning. It’s often used by machine learning engineers as a reference book or a starting guide. It takes you through the process of training models, selecting algorithms, and tuning parameters.

Deep Learning

This is a great book to get a deep understanding of neural networks. From the math to the implementation, the book takes you on a fun journey that simplifies the math behind neural networks. It allows you to code your own neural network and appreciate both the big picture and the technicals of neural networks. It’s a good book to read for comprehensive learning of the concepts behind neural networks.
This is a deep dive into the area of deep learning. It gives you both the mathematical and conceptual background, and discusses deep learning techniques, applications, and research perspectives. It’s probably the deep learning bible for any machine learning engineer.
This is a good beginner’s guide to deep learning with Python. Google AI researcher Francois Chollet takes you on a journey through the concepts, principles, and methods of deep learning that is both intuitive and in-depth. He explains with practical examples of different applications of deep learning in computer vision, natural-language processing, and generative modeling. It’s a good practical book on deep learning.
This is a good practical book on using TensorFlow and Scikit-Learn to train deep learning models. It also covers the essentials of other types of machine learning models, and working through the examples teaches you techniques to train and scale deep neural nets. This book makes use of the newest tools to make the machine learning process easier.

Towards Data Science

Sharing concepts, ideas, and codes.

Written by

Writer, Technologist, Poet: Tech|Future|Leadership, Signup: http://bit.ly/2Wv02me, http://bit.ly/34mkjhe, http://bit.ly/33oLxSM(Forbes-AI, Behind the Code)


Thursday, November 28, 2019

Oracle Linux Training at Your Own Pace

Knowing that taking training at your own pace, when you have time, suits many people's schedules and learning styles, Oracle has just released new Training-on-Demand courses for those aspiring to build their Linux administration skills.

Why not take advantage of the newly released training to build your Linux skills?
Start your Linux learning with the Oracle Linux System Administration I course. This course covers a range of skills including installation, using the Unbreakable Enterprise Kernel, configuring Linux services, preparing the system for the Oracle Database, monitoring and troubleshooting.
After gaining essential knowledge and skills from taking the Oracle Linux System Administration I course, students are encouraged to continue their Linux learning with Oracle Linux System Administration II.
The Oracle Linux System Administration II course teaches you how to automate the installation of the operating system, implement advanced software package management, and configure advanced networking and authentication services.
Resources:


Multiple Node.js Applications on Oracle Always Free Cloud

What if you want to host multiple Oracle JET applications? You can do it easily on Oracle Always Free Cloud. The solution is described in the below diagram:


You should wrap the Oracle JET application in Node.js and deploy it to an Oracle Compute Instance through a Docker container. This is described in my previous post - Running Oracle JET in Oracle Cloud Free Tier.

Make sure to create the Docker container with a port other than 80. To host multiple Oracle JET apps, you will need to create multiple containers, each assigned a unique port. For example, I'm using port 5000:

docker run -p 5000:3000 -d --name appname dockeruser/dockerimage

This will map standard Node port 3000 to port 5000, accessible internally within Oracle Compute Instance. We can direct external traffic from port 80 to port 5000 (or any other port, mapped with Docker container) through Nginx.

Install Nginx:

yum install nginx

Go to Nginx folder:

cd /etc/nginx

Edit configuration file:

nano nginx.conf

Add a context root configuration for the Oracle JET application, directing it to local port 5000:

location /invoicingdemoui/ {
     proxy_pass http://127.0.0.1:5000/;
}
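To host a second Oracle JET app behind the same Nginx, you would add one more location block per container. A sketch, assuming a hypothetical second app whose container maps its Node port to 5001, under the context root /secondappui/:

```nginx
# Hypothetical second app: its container maps the Node port to 5001.
location /secondappui/ {
     proxy_pass http://127.0.0.1:5001/;
}
```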

To allow HTTP call from Nginx to port 5000 (or other port), run this command (more about it on Stackoverflow):

setsebool -P httpd_can_network_connect 1

Reload Nginx:

systemctl reload nginx

Check Nginx status:

systemctl status nginx

That's all. Your Oracle JET app (demo URL) is now accessible from the outside.

Oracle Cloud : Free Tier and Article

Oracle Cloud Free Tier was announced a couple of months ago at Oracle OpenWorld 2019. It was mentioned in one of my posts at the time (here). So what do you get for your zero dollars?
  • 2 Autonomous Databases : Autonomous Transaction Processing (ATP) and/or Autonomous Data Warehouse (ADW). Each has 1 OCPU and 20 GB of user data storage.
  • 2 virtual machines with 1/8 OCPU and 1 GB memory each.
  • Storage : 2 Block Volumes, 100 GB total. 10 GB Object Storage. 10 GB Archive Storage.
  • Load Balancer : 1 instance, 10 Mbps bandwidth.
  • Some other stuff…
I’ve been using Oracle Cloud for a few years now. Looking back at my articles, the first was written over 4 years ago. Since then I’ve written more as new stuff has come out, including the release of OCI, and the Autonomous Database (ADW and ATP) services. As a result of my history, it was a little hard to get excited about the free tier. Don’t get me wrong, I think it’s a great idea. Part of the battle with any service is to get people using it. Once people get used to it, they can start to see opportunities and it sells itself. The issue for me was I already had access to the Oracle Cloud, so the free tier didn’t bring anything new to the table *for me*. Of course, it’s opened the door for a bunch of other people.
More recently I’ve received a few messages from people using the free tier who have followed my articles to set things up, and I’ve found myself cringing somewhat, as aspects of the articles were very out of date. They still gave you the general flow, but the screen shots were old. The interface has come a long way, which is great, but as a content creator it’s annoying that every three months things get tweaked and your posts are out of date. 🙂 I promised myself some time ago I would stop re-capturing the screen shots, and even put a note in most articles saying things might look a little different, but now seemed a good time to do some spring cleaning.
First things first, I signed up to the free tier with a new account. I didn’t need to, but I thought it would make sense to work within the constraints of the free tier service.
With that done I set about revamping some of my old articles. In most cases it was literally just capturing new screen shots, but there were a few little changes. Here are the articles I’ve revamped as part of this process.
There are some other things I’m probably going to revisit and new things I’m going to add, but for now I feel a little happier about this group of posts. They’ve been nagging at the back of my mind for a while now.
If you haven’t already signed up for a Free Tier account, there is literally nothing to lose here. If you get stuck, the chat support has been pretty good in my experience, and please send feedback to Oracle. The only way services get better is if there is constructive feedback.

Tuesday, November 26, 2019

A Beginner’s Guide To Entry-Level Data Science Interview

The opportunity to leverage insight from data has never been greater. However, at this point, we do not have enough skilled employees to help us make sense of it all. So, if you want to be a data scientist working with data over the next 10–15 years, then this is your time. Go win all those jobs!
This blog briefly focuses on the concepts you must know before you appear for an entry-level Data Science interview. Over the course of the blog, I will cover a list of topics that help in an entry-level Data Science interview, how much domain knowledge you need for each, and, as a BONUS, questions on these topics from my own interviews!
Traditionally, Data Science focuses on programming, mathematics, computer science, and Machine Learning domain knowledge, and that is exactly what we will take on!
With tons of opportunities lying around for a data scientist, here’s my guide for a beginner Data Scientist.

1. Programming Languages

Python and R are the most popular ones I’ve seen in the Data Science space to date. However, a good working knowledge of C/C++ and Java never hurts!
Most Data Scientists have a hidden love for Python, given that Python has some amazing libraries such as NumPy, SciPy, Pandas, StatsModels, Matplotlib, Seaborn, and Plotly that make it easier to work. Machine learning modules such as Scikit-learn, TensorFlow, and PyTorch are also good resources to know about.
For R, consider ggplot2 your hero! R is easy when it comes to data visualization, and that is where all Data Analysts will want to know it. If your preferred language for data visualization is R, packages like ggplot2, Lattice, Highcharter, Plotly, Leaflet, and dygraphs can definitely be handy during an interview, for example when you are asked to explain how to visualize a heat map in R.
  1. Tell me some libraries in Python you can use for logistic regression.
  2. If you have two variables x and y, how would you know their dependence?
  3. How would you know if a variable is independent?
  4. Write code to see the first 10 records of a data set in Python.
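For questions 2 and 4 above, here is a minimal sketch in plain Python. The data set is made up for illustration; with pandas you would typically load a CSV and call df.head(10) instead.

```python
# Made-up "data set": a list of records. In practice you would more likely
# load a CSV with pandas and call df.head(10).
records = [{"x": i, "y": 2 * i + 1} for i in range(100)]

# Question 4: see the first 10 records of the data set.
for row in records[:10]:
    print(row)

# Question 2: a quick check of linear dependence between x and y is the
# Pearson correlation coefficient, computed here by hand.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

xs = [r["x"] for r in records]
ys = [r["y"] for r in records]
print(pearson(xs, ys))  # perfectly linear made-up data gives 1.0
```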

2. Mathematics

Now, math might not be the strongest point for everyone, but being a data scientist requires you to be good at statistics, linear algebra, probability, and differential calculus. That’s it!
Statistics is the math field you will use the most, to analyze and visualize data in order to discover, infer, and propose helpful insights. Box plots, scatter plots, density plots, bar plots, and histograms all help. R makes the task a whole lot easier, letting you visualize data and derive relations rather than calculating the mean of 120 records on a mobile calculator :/
Used mainly in machine learning and deep learning, linear algebra helps you understand how algorithms work in the back-end. Basically, it’s all about vector and matrix operations, and representing data through matrices and vector spaces. I am not a great fan of interviewers asking math questions, but you gotta answer what you gotta answer.
Calculus in machine learning (and deep learning as well) is primarily used to formulate the functions used to train models and algorithms toward their objective, to compute loss/cost/objective functions, and to derive scalar/vector relationships.
Bayes’ Theorem. That is something you should know the answer to even in your sleep as a Data Scientist. And then there are random variables and probability distributions.
I was asked to write down the Bayes’ Theorem on a paper!
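In its simplest form, Bayes' theorem says P(A|B) = P(B|A) * P(A) / P(B). A quick numeric sketch in Python, using the classic diagnostic-test example; the prevalence, sensitivity, and false-positive numbers here are all made up for illustration:

```python
# Hypothetical diagnostic-test numbers.
p_d = 0.01                # P(disease): prevalence
p_pos_given_d = 0.99      # P(positive | disease): sensitivity
p_pos_given_not_d = 0.05  # P(positive | no disease): false-positive rate

# Total probability of a positive test (law of total probability).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(f"P(disease | positive) = {p_d_given_pos:.3f}")  # ≈ 0.167
```

Even with an accurate test, a positive result here means only about a 17% chance of disease, which is exactly the kind of counterintuitive result interviewers like to probe.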

3. Database Management

For Data Scientists, who play with data all day and year long, database knowledge is a prerequisite. Data Scientists often travel one of two diverging roads: Mathematics or Database Management. A doubly nested SQL query is something you might be asked to write in an utter nightmare of an interview :D.
That being said, it is important to have some knowledge of query optimization, relational database schemas, database management, evaluation cost, and indexing. Working knowledge of both SQL and NoSQL systems helps in an interview.
For some strange reason, in three of my interviews, I was asked a similar question on database indexing.
  1. Suppose, you have a dataset with 56000 citizen records and you have to find citizens with age > 80 years. What would you do?
  2. Do you know about the ACID property in DBMS?
  3. What is a correlated query?
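For the first question above, the answer interviewers are usually fishing for is an index on the age column. A small sketch using Python's built-in sqlite3 module; the table, names, and ages are all made up:

```python
import sqlite3

# Hypothetical citizens table, roughly matching the interview question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizens (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO citizens (name, age) VALUES (?, ?)",
    [(f"citizen{i}", i % 100) for i in range(56000)],  # made-up ages 0..99
)

# Without an index, WHERE age > 80 forces a full table scan.
# An index on age lets the engine seek straight to the qualifying rows.
conn.execute("CREATE INDEX idx_citizens_age ON citizens (age)")

count = conn.execute("SELECT COUNT(*) FROM citizens WHERE age > 80").fetchone()[0]
print(count)  # 10640 of the made-up rows have age > 80

# The query plan should now report a search using idx_citizens_age.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM citizens WHERE age > 80"
).fetchall()
print(plan)
```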

4. Machine Learning

It is a starting point for a beginner Data Scientist to fathom that machine learning is part of data science. It draws on statistics and algorithms to work with data generated and extracted from multiple sources. Every time you visit a website, you generate data; that data accumulates in massive volumes and must then be processed, and that is where machine learning comes into action.
Machine learning is the ability of a system to learn from and process data sets autonomously, without human bias. This is achieved by building complex algorithms and using techniques like regression, supervised learning, clustering, Naïve Bayes, and more. The main areas to know are:
  1. Classification
  2. Regression
  3. Reinforcement Learning
  4. Deep Learning
  5. Clustering
  6. Segmentation
  7. Recommender Systems (maybe)
  8. Dimensional Modelling
Preliminary knowledge of the above topics is good for a start. You can always think of examples and applications where you would use machine learning algorithms: say, Netflix's recommender system, spam email filtering, fraud and risk detection, or advanced image recognition.
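As a concrete illustration of the first item, classification, here is a minimal sketch in plain Python: a nearest-centroid classifier on a tiny made-up two-class data set. All labels and numbers are hypothetical, and real work would use a library like Scikit-learn instead.

```python
# Tiny made-up two-class data set: (feature1, feature2) points.
train = {
    "spam":     [(5.0, 1.0), (4.5, 0.8), (5.5, 1.2)],
    "not_spam": [(1.0, 4.0), (0.8, 4.5), (1.2, 3.8)],
}

def centroid(points):
    # Component-wise mean of a list of points.
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def predict(point, centroids):
    # Assign the point to the class whose centroid is closest (squared Euclidean).
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist2(point, centroids[label]))

centroids = {label: centroid(pts) for label, pts in train.items()}
print(predict((5.2, 0.9), centroids))  # → spam
```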

5. Basic Definitions

Sometimes concepts as basic as common definitions can come up and entail a detailed conversation. In my interview with PepsiCo, I was asked if I could speak about telling stories through data. I perceived it in terms of visualizing data for intuitive results, and the next 15 minutes were a discussion of Data Visualization: the basics, data cleaning, and data perception.
I would recommend knowing what exactly the following data terms mean so when a question peripheral to that comes up, you are ready to fire!
Data analysis is the process of learning, exploring, inspecting, transforming, and modeling data with the goal of discovering useful information and drawing insightful results. Data analytics is essential for businesses, as it makes decision-making smart and quick, with facts to support a visual representation of the company’s standing.
Data wrangling is the process of transforming and mapping data from a raw data form into a better format with the intent of making it more understandable, appropriate and ready for further processing for a variety of downstream purposes such as analytics.
Just as it sounds, data cleaning is where you
  1. Remove unusual data occurrences (unusually large number in a years of experience field)
  2. Detect errors in data types (say 127 written in Gender input field)
  3. Correct corrupt or inaccurate records from a record set or database (#$#@% instead of 34325)
  4. Fill in incomplete, incorrect, inaccurate or irrelevant parts of the data (Missing data in gender field)
  5. Replace, modify, or delete the noisy or coarse data
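A tiny sketch of some of the steps above on made-up records; the field names and validity rules here are assumptions for illustration only:

```python
# Made-up raw records with the kinds of problems listed above.
raw = [
    {"name": "Ana", "gender": "F",   "years_experience": 7},
    {"name": "Bo",  "gender": "127", "years_experience": 3},    # error in a categorical field
    {"name": "Cy",  "gender": None,  "years_experience": 250},  # missing gender, unusually large value
]

VALID_GENDERS = {"F", "M"}  # assumed validity rule for this sketch

cleaned = []
for row in raw:
    row = dict(row)  # don't mutate the original record
    # Steps 2 and 4: detect an invalid or missing categorical value and fill it.
    if row["gender"] not in VALID_GENDERS:
        row["gender"] = "unknown"
    # Step 1: remove an unusually large number in the years-of-experience field.
    if row["years_experience"] is None or not 0 <= row["years_experience"] <= 60:
        row["years_experience"] = None
    cleaned.append(row)

print(cleaned)
```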
EDA (exploratory data analysis) is an approach that reduces data to a smaller set of summary variables and uses those results to:
  1. Maximize insights from a data set
  2. Uncover the underlying relationship among variables
  3. Extract variables important for visualization
  4. Detect outliers and anomalies (very important!)
  5. Test the underlying assumptions
  6. Determine optimal model construction
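A minimal sketch of the first and fourth steps, summary variables and outlier detection, using Python's standard statistics module on a made-up sample:

```python
import statistics

# Made-up sample with one obvious anomaly.
data = [12, 14, 13, 15, 14, 13, 95, 14, 12, 13]

# Summary variables.
mean = statistics.mean(data)
stdev = statistics.stdev(data)

# Flag outliers: points more than 2 standard deviations from the mean.
outliers = [x for x in data if abs(x - mean) > 2 * stdev]
print(mean, outliers)  # the anomaly 95 inflates the mean and gets flagged
```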

Along with knowledge of all these areas, you will also need to know the IDEs and tools you work with.

IDE

  1. PyCharm
  2. Jupyter
  3. Google CoLab
  4. Spyder
  5. R-Studio

Tools

  1. Tableau
  2. PowerBI
  3. SAS
  4. Apache Spark
  5. MATLAB
  6. TensorFlow
  7. AWS
  8. Azure

You might not necessarily know it all, but make sure you know what you do know down to the last grain of the bag. Entry-level data science interviews expect you to have all the required knowledge. My advice would be to build the right foundation and focus on the basics.
For practice, make yourself visible online. Write blogs, create repos, participate in data challenges and hackathons, and contribute to open source. With a portfolio tailored to the job you want and the passions you have, plus the interview guide you just read, I am sure you will get the best job out there!
Thank you for reading! If you’ve enjoyed this article, hit the clap button and let me know what you would answer to the questions I was asked! Happy data hunting!

Know your author

Rashi is a graduate student and a Data Analyst, User Experience Analyst and Consultant, a Tech Speaker, and a Blogger! She aspires to form an organization connecting the Women in Business with an ocean of resources to be fearless and passionate about work and world. Feel free to drop her a message here!

