Posts

Showing posts from January, 2022

Apache Spark — Multi-part Series: What is Apache Spark?

  The main driving goal of Apache Spark is to enable users to build big data applications via a unified platform in an accessible and familiar way. Spark is designed so that traditional data engineers and analytical developers can integrate their current skill sets, whether that be coding languages or data structures, with ease. But what does all that mean? And you still haven't answered the question! (https://www.datanami.com/2019/03/08/a-decade-later-apache-spark-still-going-strong/) Apache Spark is a computing engine which contains multiple APIs (Application Programming Interfaces); these APIs allow a user to interact with the back-end Spark engine using traditional methods. One key aspect of Apache Spark is that it does not store data for long periods of time. Data can be notoriously expensive to move from one location to another, so Apache Spark applies its compute functionality to the data wherever it resides. Within the Apache Spark user interfaces, Spar…
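To make the "unified engine with familiar APIs" idea concrete, here is a minimal PySpark sketch: it connects to the Spark engine through the DataFrame API, reads data where it already lives, and computes over it without Spark itself storing anything. The file path and the event_type column are hypothetical, chosen only for illustration.

```python
# Minimal sketch: talking to the Spark engine through the DataFrame API.
# The path and column name below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WhatIsSpark").getOrCreate()

# Spark reads the data wherever it resides and runs the computation
# over it; nothing is persisted by Spark unless we explicitly ask.
df = spark.read.csv("/data/events.csv", header=True, inferSchema=True)
df.groupBy("event_type").count().show()

spark.stop()
```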

Apache Spark — Multi-part Series: Spark Architecture

  Spark Architecture was one of the toughest elements to grasp when initially learning about Spark. I think one of the main reasons is that there is a vast amount of information out there, but nothing which gives insight into all aspects of the Spark ecosystem. This is most likely because it's complicated! There are many fantastic resources out there, but not all are intuitive or easy to follow. I am hoping this part of the series will help people who have very little knowledge of the topic understand how Spark Architecture is built, from the foundations up. We will also investigate how we can provide our Spark system with work, and how that work is consumed and completed by the system in the most efficient way possible. As promised, this section is going to be a little heavier, so buckle yourself in; this is going to be fun! Physical Spark Hierarchy: To understand how Spark programs work, we need to understand how a Spark system is built, brick by brick (see what I did there).
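As a taste of "providing the Spark system with work", here is a minimal sketch of a driver program handing a job to a cluster: the script below acts as the driver, connects to a cluster manager, and triggers an action that the executors run in parallel. The master URL, core count, and workload are all hypothetical, assuming a standalone cluster manager.

```python
# Minimal sketch of submitting work to a Spark cluster. This script is
# the driver; the master URL points at a hypothetical standalone
# cluster manager that schedules tasks onto executors.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ArchitectureDemo")
    .master("spark://cluster-manager:7077")  # hypothetical address
    .config("spark.cores.max", "4")          # cap the cores this app may use
    .getOrCreate()
)

# Calling an action (sum) triggers a job: the driver breaks it into
# tasks, and the executors compute the partitions in parallel.
print(spark.sparkContext.parallelize(range(1_000_000)).sum())

spark.stop()
```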