Posts

Showing posts from December, 2023

DP-203 at One Place

  What is the Microsoft DP-203 Exam? It is a professional exam for certification as a Data Engineer working with Azure technologies, such as Azure Synapse Analytics, Azure Data Factory, Azure Stream Analytics, Azure Event Hubs, Azure Data Lake Storage, Azure Databricks, and other related technologies.

  What Does a Data Engineer Do? Data Engineers are responsible for designing how data is collected, stored, processed, and analyzed. They build data pipelines, integrate data, work with Big Data technologies, help improve scalability and performance, protect the data, automate processes, and coordinate with Data Scientists.

  What are the Main Differences Between a Data Engineer and a Data Scientist? The main difference is that the Data Scientist focuses on data analysis, while the Data Engineer focuses on extracting, processing, and protecting the data. The engineer is in charge of data maintenance, updates, etc., and the scientist derives insights and knowledge from the data.

  Is t

Implementing data quality with Databricks

  Data quality is one of the key factors to consider when designing a data platform. It is one of the core pillars of data governance and should be at the center of platform and pipeline design. The purpose of this article is to provide practical recommendations for implementing data quality, based on samples and my personal experience. The focus is not on discussing the concept of data quality itself. This article is written in the context of Databricks' flavor of the Lakehouse platform, with Unity Catalog for governance on top. However, the general ideas can be applied to any Lakehouse platform. Databricks published a comprehensive article that provides a deep dive into data quality principles and how features of Delta and Databricks can help you achieve them. I highly recommend reading it through. The article begins with a diagram that showcases all the parts and the Databricks Lakehouse features that address these principles. This diagram is the perfect starting poin