Tuesday, November 12, 2019

Data Analysis Practice Guide: How to Begin?


Lewis Chou
Nov 12 · 6 min read
Many beginners are unsure how to start learning data analysis. In this article I walk through the whole data analysis process to answer common questions and, I hope, open up some new ideas.
You probably already know how important data analysis is in modern society. To master the data is to master the patterns behind it. When you analyze market data, you can uncover the rules of the market. When you analyze the data of a product, you can understand where its users come from, what its user personas look like, and so on. Data analysis has become the “data structures + algorithms” of the new era, and a high ground on which companies compete for talent.

1. What is the process of data analysis?

Data analysis is mainly divided into three steps.
  • Data Collection
This is where we gather the raw material; without data there is nothing to analyze.
  • Data Mining
Data mining is where the business value lies. Its core is to extract the commercial value of data, which is what we call business intelligence.
  • Data Visualization
Simply put, it lets us understand the results of data analysis intuitively.
That summary may be too brief, so let me go through the three steps in detail.

1.1 Data collection

In the data collection stage, you usually deal with several different data sources and use tools to collect data from them.
The web offers a wide variety of data sets, and many tools can scrape data for you automatically. If you write your own crawler in Python, collection can be even more efficient, and mastering Python crawlers is a lot of fun. A crawler can fetch trending comments from social media, download posters that match a keyword, or even add followers to your account automatically, giving you the thrill of automation.
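To make the crawler idea concrete, here is a minimal sketch of the parsing half of a scraper, using only Python's standard library. It is an illustration under assumptions, not a complete crawler: the HTML is a hypothetical inline sample (a real crawler would first download the page), and the `review` class name is invented for the example.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect the text of every <p class="review"> element."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._in_review = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "p" and ("class", "review") in attrs:
            self._in_review = True
            self.reviews.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a review paragraph.
        if self._in_review:
            self.reviews[-1] += data

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_review = False


# Hypothetical page content; a real crawler would download this instead.
SAMPLE_HTML = """
<html><body>
  <p class="review">Great product, fast delivery.</p>
  <p class="ad">Buy now!</p>
  <p class="review">Stopped working after a week.</p>
</body></html>
"""


def extract_reviews(html):
    """Return the stripped text of each review paragraph in the page."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.reviews]


reviews = extract_reviews(SAMPLE_HTML)
print(reviews)
```

In practice the download step, rate limiting, and the target site's terms of use matter as much as the parsing, and dedicated scraping libraries are usually more convenient than the bare standard library.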

1.2 Data mining

The second step is data mining, which can be thought of as the “algorithm” part of the whole data analysis process. You first need to know its basic workflow, the ten classic algorithms, and the mathematics behind them. Along the way you will meet concepts such as association analysis and the AdaBoost algorithm.
Mastering data mining is like holding a crystal ball: it uses historical data to tell you what is likely to happen in the future, and it also tells you how reliable that prediction is.
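As a small taste of association analysis, the sketch below computes the two basic measures for a rule such as “customers who buy bread also buy milk”: support (how often the items appear together) and confidence (how often the rule holds when its left side is present). The shopping baskets are made-up sample data, and the function names are chosen for this example only.

```python
# Made-up shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count_containing(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that
    also contain consequent."""
    both = count_containing(set(antecedent) | set(consequent), transactions)
    return both / count_containing(antecedent, transactions)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

Full association-rule algorithms such as Apriori build on these same two measures; they add an efficient search over candidate itemsets, which this sketch leaves out.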

1.3 Data visualization

The third step is data visualization, a very important step that we are particularly interested in. The patterns in data are often hidden, especially when the data set is large, and visualization is a good way both to understand the structure of the data and to present the results. How can we visualize data? There are two ways.
The first is to use Python. While cleaning and mining data in Python, we can draw charts with third-party libraries such as Matplotlib and Seaborn.
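For the Python route, a minimal Matplotlib sketch might look like the following. The monthly figures are made-up sample values and the output file name is arbitrary; the example assumes Matplotlib is installed and uses the off-screen `Agg` backend so that it also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt

# Made-up sample data for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
visits = [120, 150, 90, 180]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(months, visits)  # one bar per month
ax.set_title("Monthly visits (sample data)")
ax.set_xlabel("Month")
ax.set_ylabel("Visits")
fig.tight_layout()

# Write the chart to a PNG file in the current directory.
fig.savefig("monthly_visits.png")
```

Seaborn builds on Matplotlib, so the same figure and axes objects can be reused if you later switch to its higher-level plotting functions.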
The second is to use third-party tools. If you have already generated a CSV file and want to render it in a WYSIWYG way, you can use tools such as Data GIF Maker, Tableau, or FineReport, which make it easy to process the data and build the presentation. For more information about data visualization tools, you can read this article.
Of course, all of this is fairly abstract, so I think the best way to learn data analysis is to apply these ideas with real tools and deepen your understanding through projects.

2. Practice guide

So far we have covered the big picture of data analysis: data collection, data mining, and data visualization. You may feel that there is too much to learn and no obvious place to start, or that data mining involves many algorithms, some of them hard to master. In fact, these worries are unnecessary.
Here I introduce the MAS (Multi-dimension, Ask, Share) learning method. With this method, learning data analysis becomes a progression from “thinking” to “tools” to “practice”.
Today I will share my learning experience with you from several angles. We can call today’s content a “practice guide”. Only when we put knowledge into our own words does it truly become our own, and this transformation is the process of cognition.
So how do you improve your ability to learn? Simply put, you have to both “know and do”.
If cognition is the brain, then tools are our hands, and data engineers and algorithm scientists work with tools every day. If you are starting a data analysis project and already have a data mining model in mind, please keep the following two principles in mind.

2.1 Don’t reinvent the wheel

I have seen many companies with data collection needs decide that existing tools could not meet their specific requirements, so they hired people to build their own. What happened? After more than a year of work they had invested a lot of money, run into a lot of bugs, and finally switched to third-party tools anyway. A timely assessment of their needs at the outset would have let them cut their losses much earlier. For example, data reporting tools like FineReport provide solutions for various industries and can also help you with your needs assessment.

2.2 Tools determine efficiency

Not reinventing the wheel means you first need to find a wheel you can use, that is, a tool. So how do we choose?
It depends on the work you are going to do. Tools are not good or bad in themselves, only suitable or unsuitable. Except for research-type work, engineers will in most cases choose the most user-friendly tool. For example, Python has many third-party libraries for data mining. These libraries have large user communities and plenty of documentation to help you get started.
If you are looking for a suitable data analysis tool, you can refer to this article.
Once you have chosen a good tool, all that remains is to accumulate “assets”. It is hard to remember a long list of knowledge points, and nobody can memorize a tool’s manual, but we can usually remember stories, the projects we have worked on, and the problems we have solved. These problems and projects are your first “assets”.
How do we accumulate these “assets” quickly? The answer is proficiency. Solving a problem is only the first step; the key is to build proficiency with our tools. As proficiency grows, your mental model gradually improves, and efficiency increases naturally.

Conclusion

The three-step path from cognition to tools to hands-on practice is the learning advice I most want to share with you. I hope this article is helpful to you!

Written by Lewis Chou
Data Analyst at FanRuan Data Institute. Love data, football, jazz, running, cooking and Spanish. https://www.linkedin.com/in/lewis-chou-a54585181/
