Learn how to read data into a Pandas DataFrame in 5 minutes
Overview
It is often said that data scientists spend 80% of their time preprocessing data, so let's dive into the data preprocessing pipeline, also known as the ETL (extract, transform, load) pipeline, and find out which stage takes the most time. In this blog post, we will learn how to extract data from different data sources.
Let's work with a real-life dataset so it's easier to follow. This lesson uses data from the World Bank, which comes from two sources:
World Bank Indicator Data: this data set contains socio-economic indicators for countries around the world. A few example indicators are population, arable land, and central government debt.
World Bank Project Data: this data set contains information about World Bank project lending since 1947.
Types of Data files
CSV: CSV stands for comma-separated values. This is how the file looks:
id,regionname,countryname,prodline,lendinginstr
P162228,Other,World;World,RE,Investment Project Financing
P163962,Africa,Democratic Repub...
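Here is a minimal sketch of loading that CSV sample into a pandas DataFrame. It assumes pandas is installed. To keep it self-contained, it reads only the header and the first complete row of the sample through `io.StringIO` instead of a real file. The variable names `csv_text` and `df` are illustrative.

```python
import io

import pandas as pd

# Header plus the first complete data row from the CSV sample above.
# io.StringIO stands in for a file on disk so the example runs on its own.
csv_text = """id,regionname,countryname,prodline,lendinginstr
P162228,Other,World;World,RE,Investment Project Financing
"""

# read_csv splits each line on commas (the default sep=",") and infers
# the column names from the first line (the default header="infer").
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                  # (1, 5)
print(list(df.columns))          # ['id', 'regionname', 'countryname', 'prodline', 'lendinginstr']
print(df.loc[0, "countryname"])  # World;World  (a semicolon is not the separator, so it stays in the value)
```

To read an actual file, pass its path to `pd.read_csv` in place of the `StringIO` object, for example `pd.read_csv("projects_data.csv")`, where the file name is hypothetical.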