Posts

Showing posts from July, 2024

Developing Data Engineering Solutions with Databricks

Data Solutions Architecture (image created by the author)

The goal of a data engineering (DE) solution is to provide the right stakeholders with the data they need, in the format they need, when they need it. I emphasise “solution” over “pipeline” because data processing code is just one part of a data engineering solution. In my opinion, coding transformation and processing logic is not the same thing as developing the overarching solution that ultimately delivers the value.

There is a lot of content online about how to develop pipelines with Spark and Databricks. In this article, I take a broader view of the general data engineering challenges in developing solutions and how to address them in Databricks.

Why We Need Environments and Tests

Data is not inherently valuable. It’s only valuable when people use it. People only use data that they trust and that provides the information they need. Consequently, the most important aspects of every data product a