Consumers subscribe to topics and read records from the partitions in a balanced manner, which allows for parallel processing and scaling of data streams.
We have been talking about partitions in Kafka. But what exactly are partitions?
In Kafka, a partition is a subset of a given topic’s data that is stored on a single node (broker) within the Kafka cluster. That subset can then be replicated to other nodes in the cluster, according to the topic’s replication factor, which allows Kafka to provide fault tolerance for the data that is stored in it.
Each partition in a cluster is an ordered, immutable sequence of records that is continually appended to. Each record within a partition is assigned a unique, sequential offset, which acts as an identifier for that record within the partition.
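To make the role of offsets more concrete, the following sketch wraps the console consumer that ships with Kafka so that it reads a single partition starting at a chosen offset. The helper name read_from_offset, the broker address localhost:9092, and the topic name in the usage note are assumptions made for illustration.

```shell
# Assumption: kafka-console-consumer.sh is on PATH and a broker listens at this address.
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# Hypothetical helper: print up to $4 records from partition $2 of topic $1,
# starting at offset $3. A record is addressed by its (partition, offset) pair.
read_from_offset() {
  kafka-console-consumer.sh \
    --bootstrap-server "$BOOTSTRAP_SERVER" \
    --topic "$1" \
    --partition "$2" \
    --offset "$3" \
    --max-messages "$4"
}
```

With a broker running, calling read_from_offset testing_topic 0 5 10 would print up to ten records from partition 0 of a hypothetical topic, beginning with the record at offset 5.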
Partitions allow multiple consumers to process the data within a topic in parallel, since each consumer can read from a different partition. This helps increase the throughput and overall capacity of the Kafka cluster and enables the consumers to work independently and scale horizontally.
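As a rough illustration of this parallelism, the sketch below uses a consumer group, which is the mechanism Kafka uses to divide a topic's partitions among cooperating consumers. The helper names, the group name in the usage note, and the broker address localhost:9092 are assumptions, not part of the original tutorial.

```shell
# Assumption: the Kafka CLI tools are on PATH and a broker listens at this address.
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# Hypothetical helper: start one consumer for topic $1 that joins consumer group $2.
start_group_consumer() {
  kafka-console-consumer.sh \
    --bootstrap-server "$BOOTSTRAP_SERVER" \
    --topic "$1" \
    --group "$2"
}

# Hypothetical helper: show which member of group $1 currently owns each partition.
show_group_assignment() {
  kafka-consumer-groups.sh \
    --bootstrap-server "$BOOTSTRAP_SERVER" \
    --describe \
    --group "$1"
}
```

Running start_group_consumer testing_topic demo_group in several terminals lets Kafka spread the topic's partitions across those consumers, and show_group_assignment demo_group lists the resulting assignment. Consumers beyond the number of partitions stay idle, which is one reason the partition count limits parallelism.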
This tutorial explores how to create partitions in an Apache Kafka cluster. The commands shown are for demonstration purposes and should not be copied to production environments without review. Always consider other factors as well, such as available resources, data management, source connectors, etc.
Create a Partition in Apache Kafka
By default, Apache Kafka creates the partitions for a given topic when the topic is created. The default number of partitions is determined by the “num.partitions” property in the server configuration file.
You can check the default number of partitions as shown in the following:
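One way to run this check is sketched below. It assumes that Kafka is unpacked in a kafka directory relative to the current working directory. The helper name show_default_partitions is hypothetical, and it accepts a different path as its first argument in case your installation differs.

```shell
# Assumption: Kafka lives in ./kafka; pass another path as $1 if it does not.
# Hypothetical helper: print the lines of server.properties that mention num.partitions.
show_default_partitions() {
  cat "${1:-kafka/config/server.properties}" | grep "num.partitions"
}
```

Calling show_default_partitions with no arguments is equivalent to running cat kafka/config/server.properties | grep "num.partitions" directly in the terminal.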
The given command uses cat to print the contents of the “server.properties” file, which is located in the kafka/config directory. The output is then filtered to display only the lines that contain the “num.partitions” string.
The num.partitions configuration setting specifies the default number of partitions that a new topic will have if no value is specified when the topic is created.
The previous command should return output similar to the following:
num.partitions=1
In this case, if the number of partitions is not specified during the topic creation, Kafka assigns a single partition to the topic.
To define a custom number of partitions during topic creation, we need to use the --partitions option with the “kafka-topics.sh” utility.
An example command is as follows:
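A minimal sketch of such a command follows, wrapped in a small helper so that the topic name and partition count are easy to change. The helper name create_topic and the broker address localhost:9092 are assumptions. Older Kafka releases used a --zookeeper option instead of --bootstrap-server, so adjust this to your version.

```shell
# Assumption: kafka-topics.sh is on PATH and a broker listens at this address.
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# Hypothetical helper: create topic $1 with $2 partitions.
create_topic() {
  kafka-topics.sh \
    --create \
    --topic "$1" \
    --partitions "$2" \
    --bootstrap-server "$BOOTSTRAP_SERVER"
}
```

With a broker running, create_topic testing_topic 3 runs kafka-topics.sh --create --topic testing_topic --partitions 3 --bootstrap-server localhost:9092.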
The previous command creates a new topic called “testing_topic” with three partitions.
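To confirm that the topic was created with the expected number of partitions, you can describe it. The sketch below makes the same assumptions as before (a broker at localhost:9092 and the Kafka tools on PATH), and describe_topic is a hypothetical helper name.

```shell
# Assumption: kafka-topics.sh is on PATH and a broker listens at this address.
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# Hypothetical helper: print the partition count and per-partition details of topic $1.
describe_topic() {
  kafka-topics.sh \
    --describe \
    --topic "$1" \
    --bootstrap-server "$BOOTSTRAP_SERVER"
}
```

Calling describe_topic testing_topic reports the topic's partition count and replication factor, followed by one line per partition with its leader, replicas, and in-sync replicas.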
It is good to remember that increasing the number of partitions allows more consumers to read in parallel and, in turn, raises the throughput that the topic can handle.
However, it also increases the complexity of managing the topic. Choosing the number of partitions carefully is essential, considering factors such as the volume of data, the desired level of parallelism, and the number of consumers.
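If the initial choice turns out to be too small, the partition count of an existing topic can be raised later. The sketch below shows one way to do so. The helper name increase_partitions and the broker address localhost:9092 are assumptions. Note that Kafka only allows the count to be increased, not reduced, and that adding partitions can change which partition a given message key maps to.

```shell
# Assumption: kafka-topics.sh is on PATH and a broker listens at this address.
BOOTSTRAP_SERVER="${BOOTSTRAP_SERVER:-localhost:9092}"

# Hypothetical helper: raise the partition count of existing topic $1 to $2.
# The new count must be larger than the current one.
increase_partitions() {
  kafka-topics.sh \
    --alter \
    --topic "$1" \
    --partitions "$2" \
    --bootstrap-server "$BOOTSTRAP_SERVER"
}
```

For example, increase_partitions testing_topic 6 would grow the three-partition topic created earlier to six partitions.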
Conclusion
We outlined how you can configure the number of partitions of a given Kafka topic using the “kafka-topics.sh” command.