Posts

Showing posts from December, 2022

GCP data eng

  How to prepare for the GCP Professional Data Engineer certification #googlecloud #dataengineering #gcp #bigdata Hi! I recently passed the GCP Professional Data Engineer certification exam, and since several people asked me for exam tips and study materials, I decided to write this post explaining the path I took. This is certainly not the only possible way to prepare for this test, but it is the one that worked for me, so keep an open mind, as each person studies and learns differently. Expectations alignment Unlike some other certifications, the Professional Data Engineer (PDE) is definitely not a simple exam where you can study for 8 hours and be ready. Google writes its questions in a way that only someone with hands-on experience and an understanding of its services can answer. It is important to note that a certification is a validation of the knowledge you have gained. The goal is not to memorize questions, but to actually understand the service

Setting Up Apache Airflow with Docker-Compose in 5 Minutes

  Create a development environment and start building DAGs Photo by Fabio Ballasina on Unsplash Although I am pretty late to the party (Airflow became an Apache Top-Level Project in 2019), I still had trouble finding an easy-to-understand, up-to-date, and lightweight solution for installing Airflow. Today, we're about to change all that. In the following sections, we will create a lightweight, standalone, and easily deployed Apache Airflow development environment in just a few minutes. Docker-Compose will be our close companion, allowing us to create a smooth development workflow with quick iteration cycles. Simply spin up a few Docker containers and we can start to create our own workflows. Note: The following setup is not suitable for production purposes and is intended to be used in a development environment only. Why Airflow? Apache Airflow is a batch-oriented framework that allows us to easily build scheduled data pipelines in Python. Think of "workflow as code" ca
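
The preview above cuts off at "workflow as code", so here is a minimal sketch of what such a DAG could look like once the Docker-Compose environment from the post is running. It assumes Airflow 2.x; the dag_id, task names, and daily schedule are illustrative placeholders, not taken from the article.

```python
# Minimal "workflow as code" sketch: two Python tasks chained into a daily DAG.
# All names here (example_pipeline, extract, transform) are hypothetical examples.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder for pulling data from a source system
    print("extracting data")


def transform():
    # Placeholder for transforming the extracted data
    print("transforming data")


with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2022, 12, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Run extract before transform
    extract_task >> transform_task
```

Dropping a file like this into the DAGs folder mounted by the Docker-Compose setup is typically enough for the scheduler to pick it up and show it in the Airflow UI.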