Posts

Showing posts from November, 2022

10 Best Practices For Using Kafka In Your Architecture

  Key lessons I have learned while using Kafka

  Apache Kafka, also known as Kafka, is an enterprise-level messaging and streaming broker system. Kafka is a great technology for architecting and building real-time data pipelines and streaming applications. I highly recommend that architects familiarise themselves with the Kafka ecosystem, in particular the concepts of Kafka cluster, broker, topics, partitions, consumer, producer and offsets.

  Article Aim

  This article will highlight 12 of the important lessons I have learned whilst using Kafka.

  To enable parallel processing of messages, create multiple partitions in a topic. Within a consumer group, each partition can be consumed by only one consumer, so when a group has multiple consumers, they consume messages from different partitions in parallel. E
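  The partition-per-consumer rule described above can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Kafka client code: the function `assign_partitions` and the consumer names are hypothetical, and the round-robin strategy is an assumption chosen for simplicity rather than a reproduction of any specific Kafka assignor.

```python
def assign_partitions(partitions, consumers):
    """Simulate a consumer-group assignment (hypothetical helper).

    Each partition is handed to exactly one consumer in the group,
    using a simple round-robin over the consumer list.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Six partitions shared by three consumers: each consumer reads two partitions.
balanced = assign_partitions(range(6), ["consumer-a", "consumer-b", "consumer-c"])
print(balanced)

# Two partitions shared by four consumers: two consumers receive nothing,
# because a partition is never split between members of the same group.
oversized = assign_partitions(range(2), ["c1", "c2", "c3", "c4"])
idle = [name for name, owned in oversized.items() if not owned]
print(idle)
```

  In this sketch the number of partitions caps the useful parallelism: adding consumers beyond the partition count only produces idle group members.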

DB cert

  10 Questions To Practice Before Your Databricks Apache Spark 3.0 Developer Exam | by AnBento | Towards Data Science
  Study guide for clearing “Databricks Certified Associate Developer for Apache Spark 3.0” exam (python) | by Shruti Bhawsar | Medium
  4 Full Practice Tests To Prepare Databricks Associate Certification (PySpark | 2022) | by AnBento | CodeX | Medium

Transformations on a JSON file using Pandas

  A set of useful pandas tools to successfully load and transform a JSON file

  Loading and transforming a JSON (JavaScript Object Notation) file is a common task in the Data Engineering/Science world. JSON is a widely used format for storing and exchanging data. For example, NoSQL databases like MongoDB store data in JSON format, and REST API responses are mostly available in JSON. Although JSON works great for exchanging data over a network, if we intend to process the data, we need to convert it into a tabular form, meaning something with columns and rows.

  In general terms, we can encounter two types of JSON structures in a file: a JSON object, or a list of JSON objects.

  JSON object vs list of JSON objects (image by author)

  In this article, we will focus on the second one (a list of JSON objects), as I understand it’s the most common scenario and, more importantly, by learning to deal with the list scenario, we can then easily deal with the single J
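  As a rough illustration of turning a list of JSON objects into rows and columns, the sketch below uses `pandas.json_normalize`. The sample records and field names (`id`, `name`, `address.city`) are invented for this example and do not come from the article; the article's own approach may differ.

```python
import json

import pandas as pd

# Hypothetical sample data: a list of JSON objects with one nested field.
raw = """
[
  {"id": 1, "name": "Ana", "address": {"city": "Lisbon"}},
  {"id": 2, "name": "Ben", "address": {"city": "Porto"}}
]
"""

records = json.loads(raw)

# json_normalize flattens nested objects into dotted column names,
# producing one row per JSON object in the list.
df = pd.json_normalize(records)
print(df)

# A single JSON object can be handled the same way by wrapping it in a list.
single_df = pd.json_normalize([records[0]])
print(single_df)
```

  Wrapping a lone object in a list keeps one code path for both structures, which is one reason to treat the list case as the general one.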