Posts

Showing posts from January, 2024

Snowflake Links

https://www.snowflake.com/en/resources/?mkt_tok=MjUyLVJGTy0yMjcAAAGQ08tpDkt2fpnCNQlKaKa6nf4S9XQefOMQGc7qP3HVmWNQqyjp13DWSax8hZBzV1RQtUsyJdPLpYDc3dAYGqlqsJY5kEvKAEa7ZnLbIzZcBF1oWudQ1w&utm_campaign=asia-snowflake-discover-email-20240123&utm_source=snowflake&utm_medium=Email&mkt_tok=MjUyLVJGTy0yMjcAAAGQ08tpDkt2fpnCNQlKaKa6nf4S9XQefOMQGc7qP3HVmWNQqyjp13DWSax8hZBzV1RQtUsyJdPLpYDc3dAYGqlqsJY5kEvKAEa7ZnLbIzZcBF1oWudQ1w
https://www.snowflake.com/virtual-hands-on-lab/?mkt_tok=MjUyLVJGTy0yMjcAAAGQ08tpDkt2fpnCNQlKaKa6nf4S9XQefOMQGc7qP3HVmWNQqyjp13DWSax8hZBzV1RQtUsyJdPLpYDc3dAYGqlqsJY5kEvKAEa7ZnLbIzZcBF1oWudQ1w

Python Data Engineering: Comprehensive Workflow for Data Modeling, Analytics with DuckDB

A Complete Data Engineering Workflow, Data Modelling and Advanced Analytics using Python and DuckDB

Project Overview

Our primary goal is to convert the raw dataset into structured Dimension and Fact tables, allowing for efficient analysis and modelling. This process involves data cleaning and creating specific dimension tables covering attributes like date-time, passenger count, trip distance, payment type, and more. We will touch on the fundamentals of data modelling, granularity, and basic data engineering terminology in simple, human-friendly terms. Additionally, we'll explore automation using Python libraries such as pandas and DuckDB, highlight the advantages of the Parquet file format, and run analytical queries to derive meaningful insights.

Prerequisites

Python 3
pandas
DuckDB
Jupyter Notebook (optional)
VS Code: VS Code supports notebooks out of the box; opening a file with the .ipynb extension displays it in a notebook-like interface.

Introduction to the Data

The …