One of the primary building blocks of Apache Kafka is the topic. In simple terms, a Kafka topic is a category or storage unit that an Apache Kafka cluster uses to store a stream of events.
To understand an Apache Kafka topic, think of it as a log of events, similar to a log folder within a system: each file in the folder corresponds to a single event in the cluster.
A Kafka topic has the following properties:
- A Kafka topic is append-only. This means that the new events are appended at the end of the log.
- Events in a given topic are immutable. Hence, once an event is written to a topic, it cannot be modified.
- A Kafka consumer reads a log by searching for a specific offset and then reads the entries from that offset sequentially.
- Kafka topics are multi-producer and multi-consumer. Each topic can have zero, one, or many producers and consumers.
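These behaviors can be seen with the console tools that ship with Kafka. The sketch below assumes a broker at localhost:9092 and an existing topic named "events" (both are placeholder values):

```shell
# Append an event to the topic's log (append-only; the broker assigns the offset).
echo "user_signed_up" | kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic events

# Read partition 0 starting at offset 5; events are then consumed sequentially.
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic events --partition 0 --offset 5
```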
Before we can write or read any event to a Kafka topic, we must ensure that the topic exists. The most common method of creating a Kafka topic is using a set of provided CLI tools.
However, you may encounter a scenario where you need to modify the properties of an existing Kafka topic.
In this tutorial, we will learn how to use the Kafka CLI tools to modify the properties of an existing Kafka topic.
Properties of a Kafka Topic
When creating a Kafka topic, we specify a set of properties that determine how the topic is stored in the cluster and how the brokers handle its data.
Note that Kafka falls back to its default values for any properties that are not set during topic creation.
There are many parameters that you can configure for your topic; the full list is available at the following link: https://kafka.apache.org/documentation/#topicconfigs
However, the most common ones include:
- The topic replication factor – This property determines the number of replicas of each partition that are maintained across the Kafka cluster. Each partition in a Kafka topic is replicated across multiple brokers, which improves fault tolerance and availability. For example, if we set the replication factor of a topic to 3, each partition's data is stored on 3 different brokers within the cluster.
- The number of partitions – This parameter governs the topic's parallelism and throughput. Think of a partition as a separate data stream within a Kafka topic. Kafka consumers can read from each partition independently, which allows for parallel processing of the events across multiple consumers.
- The maximum message size (max.message.bytes) – This parameter defines the maximum size of a message that can be stored in a given Kafka topic. Any message that exceeds the specified size is rejected by the cluster. It is therefore worth considering this parameter when creating a Kafka topic, as it can heavily impact the cluster's performance, storage, and network requirements.
- The compression type (compression.type) – This parameter determines the compression algorithm that is used when compressing the messages within a Kafka topic. Message compression can dramatically reduce the amount of data that is stored in Kafka. However, heavier compression algorithms increase CPU usage when compressing and decompressing the messages.
- The log cleanup policy (cleanup.policy) – This parameter dictates the policy that is used to clean up old log segments in a Kafka topic: delete removes segments past the retention limits, while compact retains only the latest value for each message key.
We should weigh these parameters carefully when creating a Kafka topic.
Reasons Why You Need to Change a Kafka Topic Parameter
There are several reasons why we may need to change the various Kafka topic parameters:
- Changing the replication factor of a topic can increase the fault tolerance and availability of its data.
- Modifying the number of partitions in a topic can improve parallelism and throughput or reduce overhead costs. Note that Kafka only allows the partition count to be increased, never decreased.
- We can adjust the maximum message size to accommodate larger or smaller messages in the topic.
- We can change the compression type to improve the cluster's performance or reduce the storage requirements.
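One caveat: the partition count is changed with the "kafka-topics.sh" tool rather than a topic configuration override, and Kafka only allows it to grow. A sketch, assuming a broker at localhost:9092 and an existing topic (the topic name is a placeholder):

```shell
# Increase the partition count of an existing topic to 6.
# Kafka rejects any attempt to reduce the number of partitions.
kafka-topics.sh --bootstrap-server localhost:9092 \
  --alter --topic my_topic --partitions 6
```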
Create a Sample Kafka Topic
Let us set up a basic Kafka topic to demonstrate how to modify the various parameters of a given topic.
For this tutorial, we will use the Kafka CLI tools.
Run the following command:
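A create command matching this description might look like the following (the broker address localhost:9092 is an assumption; 3 MB is written out as 3145728 bytes):

```shell
kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic sample_topic \
  --replication-factor 3 \
  --partitions 3 \
  --config compression.type=snappy \
  --config max.message.bytes=3145728 \
  --config cleanup.policy=compact
```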
The previous command creates a topic called "sample_topic" with a replication factor of 3, 3 partitions, snappy compression, a maximum message size of 3 MB, and a cleanup policy of compact.
Modifying the Kafka Topic Parameters
To modify an existing topic's parameters, we can use the "kafka-configs.sh" utility. The command syntax is as follows:
kafka-configs.sh --bootstrap-server <broker-host:port> \
--entity-type topics --entity-name <topic-name> \
--alter --add-config <topic_parameter=value>
For example, to change the compression type of the sample_topic that we created earlier, we can use the following command (assuming a broker at localhost:9092):
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics --entity-name sample_topic \
--alter --add-config compression.type=gzip
The previous command should change the compression type from snappy to gzip.
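To confirm the change, we can list the topic's current configuration overrides (again assuming a broker at localhost:9092):

```shell
# Describe the non-default configurations that are set on the topic.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name sample_topic \
  --describe
```

The output should now include compression.type=gzip among the topic's overrides.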
We can also alter a Kafka topic to set a parameter that was not defined when the topic was created. The syntax is the same:
kafka-configs.sh --bootstrap-server <broker-host:port> \
--entity-type topics --entity-name <topic-name> \
--alter --add-config <parameter=value>
For example, to add a "delete.retention.ms" value for the sample_topic, we can run the following command:
kafka-configs.sh --bootstrap-server localhost:9092 \
--entity-type topics --entity-name sample_topic \
--alter --add-config delete.retention.ms=86400000
The previous command should modify the sample_topic and add the "delete.retention.ms" parameter with a value of 86400000 ms (24 hours).
NOTE: If the configuration already exists on a topic, running the --add-config command overwrites the parameter's existing value.
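The opposite operation is also available: the --delete-config flag removes an override so that the topic falls back to the broker's default value. For example:

```shell
# Remove the delete.retention.ms override from the topic.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name sample_topic \
  --alter --delete-config delete.retention.ms
```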
Conclusion
In this tutorial, you learned about Kafka topic configuration and some of the most common configuration parameters. We also discussed how to modify the parameters of an existing Kafka topic using the "kafka-configs.sh" utility.