Posts

Showing posts from January, 2023

Kafka: An Overview

Explain Kafka Like I Am 5: Kafka is like a big post office where people can send messages to different rooms (called “topics”) and other people can come and read the messages. The messages are saved in a big notebook (called a “log”), so even if the rooms get too full, the messages don’t get lost. The log keeps track of all the messages that have been received, in the order that they were received. And if more people want to read the messages, we can just make more rooms (more “topics”) out of thin air.

Definition: Kafka is a distributed streaming platform that is used for building real-time data pipelines and streaming applications. It is designed to handle high volumes of data and provide a fault-tolerant way of storing and processing streams of records in real time. Kafka is based on a publish-subscribe model, where producers write data to topics, and consumers read data from those topics. Topics are partitioned and replicated across a cluster of servers, which al...
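To make the “notebook” idea concrete, here is a minimal sketch of an append-only log with topics and offsets. It is a toy, in-memory stand-in and not Kafka's client API: the `TinyLog` class and its `produce` and `consume` methods are hypothetical names invented for this illustration, and partitions, replication, and networking are left out entirely.

```python
from collections import defaultdict


class TinyLog:
    """Toy, in-memory stand-in for a broker: one append-only list per topic.

    Hypothetical illustration only; this is not how a Kafka client is used.
    """

    def __init__(self):
        self._topics = defaultdict(list)

    def produce(self, topic, message):
        """Append a message to the topic's log and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def consume(self, topic, offset=0):
        """Return every message at or after `offset`, in arrival order."""
        return list(self._topics[topic][offset:])


broker = TinyLog()
broker.produce("orders", "order-1")
broker.produce("orders", "order-2")
broker.produce("payments", "payment-1")

# Two independent readers see the same records; reading does not remove them.
reader_a = broker.consume("orders")
reader_b = broker.consume("orders", offset=1)
print(reader_a)  # ['order-1', 'order-2']
print(reader_b)  # ['order-2']
```

The point of the sketch is that messages stay in the log after they are read, so each reader only needs to remember its own offset.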

How to send tabular time series data to Apache Kafka with Python and Pandas

Time-series data comes in all shapes and sizes, and it’s often produced at high frequency in the form of sensor data and transaction logs. It’s also produced in huge volumes, where the records are separated by milliseconds rather than hours or days. But what kind of system can handle such a constant stream of data? An older approach would be to dump the raw data in a data lake and process it in huge batches with a long-running process. Nowadays, many companies prefer to process the raw data in real time and write the aggregated results to a database. For example, an online retailer could continuously aggregate transactional data by product and day rather than running expensive database queries on demand. But how would this work in practice? Let’s find out! In this tutorial, we’ll use Python and Apache Kafka to process large volumes of time series data that comes from a real online retailer.

What you’ll learn

By the end of this tutorial you’ll understand: Why startups and online...
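As a rough illustration of the retailer example, the sketch below turns each row into a JSON message and folds it into a running total per product and day. It uses only the standard library rather than Pandas or a Kafka client, and the rows, the field names (`timestamp`, `product`, `amount`), and the helper functions are all assumptions made up for this example, not the tutorial's actual data or code.

```python
import json
from collections import defaultdict
from datetime import datetime

# Hypothetical rows standing in for a retailer's transaction log.
rows = [
    {"timestamp": "2023-01-05T09:15:00", "product": "mug", "amount": 2},
    {"timestamp": "2023-01-05T09:15:01", "product": "lamp", "amount": 1},
    {"timestamp": "2023-01-05T17:40:12", "product": "mug", "amount": 3},
    {"timestamp": "2023-01-06T08:02:45", "product": "mug", "amount": 1},
]


def to_message(row):
    """Serialize one row to UTF-8 JSON bytes, a common message value format."""
    return json.dumps(row).encode("utf-8")


def update_totals(totals, message):
    """Fold one message into the running (product, day) totals."""
    record = json.loads(message.decode("utf-8"))
    day = datetime.fromisoformat(record["timestamp"]).date().isoformat()
    totals[(record["product"], day)] += record["amount"]


# In a real pipeline the messages would travel through a topic; here they are
# handed straight from the "producer" side to the "consumer" side.
totals = defaultdict(int)
for row in rows:
    update_totals(totals, to_message(row))

print(dict(totals))
```

Because the totals are updated one record at a time, the aggregate is always current and no batch query over the raw data is needed.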