Key Lessons I Have Learned While Using Kafka
Apache Kafka is an enterprise-grade messaging and streaming broker system, and a great technology for architecting and building real-time data pipelines and streaming applications.
I highly recommend that architects familiarise themselves with the Kafka ecosystem, in particular the concepts of clusters, brokers, topics, partitions, producers, consumers, and offsets.
Article Aim
This article highlights 12 important lessons I have learned whilst using Kafka.
- To enable parallel processing of messages, create multiple partitions in a topic. Each partition can be consumed by only one consumer within a consumer group, so multiple consumers in the same group can consume messages from different partitions in parallel.
- Each message is delivered to every consumer group that has subscribed to a topic, but within a group, a message from a partition goes to only one consumer. So, if you want to broadcast a message to multiple consumers, assign them to different consumer groups.
- The default maximum message size in Kafka is 1 MB. Messages can be compressed before they are delivered to Kafka. To store more data in a single topic, we can create multiple partitions across multiple servers.
- Ensure that the messages to be published or consumed are serializable. Take special care with date-time values and nested structures.
- To re-read or skip messages, use the consumer function seek(TopicPartition, long) to move the consumer's position to a specific offset.
- If we are designing an application where the order of messages is important, ensure those messages are routed to the same partition, because Kafka's ordering guarantee applies at the partition level. So, if you have more than one partition in a topic, give the messages that must appear in order the same key (or explicit partition assignment) so they all land in the same partition.
- If we want global ordering across all messages in a topic, use a topic with a single partition.
- Keep your logs manageable and monitor disk space on a regular basis.
- To design a durable system, set a high replication factor in the Kafka settings. Kafka replicates the log for each topic's partitions across multiple servers, which allows automatic failover to a replica when a server fails, so messages remain available in the presence of failures. The replication factor can be set on a topic-by-topic basis. Additionally, we can set the producer batch size to 1 so that each message is saved to disk individually rather than flushed in batches, although this will impact performance. For durable and highly available systems, high topic replication is important; a minimum of 3 brokers is usually recommended for reliable failover.
- If we only need the latest event per key, use compacted topics, where older events for a key are removed as newer events for that key are published to the topic.
- To secure Kafka, use TLS client certificates for authentication, encrypt traffic, and configure access control lists for user permissions.
- We could also use the Kafka Streams Java DSL, or Kafka's SQL-like streaming language, to create and process the streams of data that are stored in Kafka.
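To make the partitioning and replication lessons concrete, here is how a topic with multiple partitions and a replication factor of 3 could be created with the standard `kafka-topics.sh` tool. This is a config fragment, not something runnable here; the topic name, partition count, and broker address are illustrative placeholders.

```shell
# Create a topic with 6 partitions (for parallel consumption) and
# a replication factor of 3 (for durability/failover).
kafka-topics.sh --create \
  --topic orders \
  --partitions 6 \
  --replication-factor 3 \
  --bootstrap-server localhost:9092
```

Note that the replication factor cannot exceed the number of brokers in the cluster, which is one reason a minimum of 3 brokers is commonly recommended.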
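The ordering lesson can be sketched without a broker. Kafka's default partitioner hashes the message key to pick a partition, so all messages with the same key land in the same partition and are ordered relative to each other. The sketch below uses MD5 as a stand-in hash (Kafka's Java client actually uses murmur2); the function and topic details are illustrative.

```python
import hashlib

NUM_PARTITIONS = 6

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition, mimicking hash-based partitioning.

    MD5 is a stand-in for Kafka's murmur2 hash; the point is that the
    same key always maps to the same partition.
    """
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event for order "order-42" hashes to one partition, so the
# per-partition ordering guarantee keeps these events in sequence.
partitions = {partition_for(b"order-42") for _ in ("created", "paid", "shipped")}
assert len(partitions) == 1
```

Because the hash is deterministic, publishing all of an entity's events under one key is the standard way to get per-entity ordering while still spreading different entities across partitions.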
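On the serialization lesson: a common pitfall is that `json.dumps` fails on `datetime` values nested inside a payload. A hedged sketch of a value serializer follows; the helper name is my own illustration, not part of any Kafka client API.

```python
import json
from datetime import datetime, timezone

def serialize_value(payload: dict) -> bytes:
    """Serialize a (possibly nested) payload to UTF-8 JSON bytes.

    datetime objects are not JSON-serializable by default, so they are
    converted to ISO-8601 strings via the `default` hook.
    """
    def encode(obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        raise TypeError(f"Cannot serialize {type(obj).__name__}")

    return json.dumps(payload, default=encode).encode("utf-8")

msg = {"order": {"id": 42, "placed_at": datetime(2024, 1, 1, tzinfo=timezone.utc)}}
data = serialize_value(msg)
# The nested timestamp round-trips as an ISO-8601 string.
assert json.loads(data)["order"]["placed_at"] == "2024-01-01T00:00:00+00:00"
```

A function like this could be passed as the `value_serializer` when constructing a producer, keeping the date-time handling in one place.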
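The seek lesson can be illustrated with a toy in-memory model of one partition's log (this class is a teaching sketch, not the real consumer API): `seek` moves the consumer's position so the next poll starts from that offset.

```python
class ToyPartitionConsumer:
    """Toy model of a consumer's position within one partition's log."""

    def __init__(self, log):
        self.log = log        # list of messages; list index == offset
        self.position = 0     # offset of the next message to read

    def seek(self, offset: int) -> None:
        """Move the position, like seek(TopicPartition, long)."""
        self.position = offset

    def poll(self, max_records: int = 2):
        records = self.log[self.position:self.position + max_records]
        self.position += len(records)
        return records

consumer = ToyPartitionConsumer(["m0", "m1", "m2", "m3"])
assert consumer.poll() == ["m0", "m1"]
consumer.seek(1)              # rewind to re-read from offset 1
assert consumer.poll() == ["m1", "m2"]
```

The same idea applies with the real client: because messages stay in the log until retention removes them, seeking backwards lets you replay history, and seeking forwards lets you skip it.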
Summary
This article highlighted 12 important lessons I have learned whilst using Kafka.