Apache Kafka

Get the First Offset for the Given Partitions in Apache Kafka Consumer

We can define a Kafka partition as a single unit of parallelism and scalability. Think of it as a way to divide a topic into multiple, smaller parts that can be spread across multiple servers. Each Kafka partition is an ordered, immutable sequence of messages that are appended over time.

Once you manually write or a producer application writes the messages to a Kafka topic, the Kafka broker chooses the partition on which to assign the message using a default partitioning technique, mainly using the round-robin algorithm. In some cases, Kafka allows you to manually specify the target partition using the key parameter in the message.

During message consumption, the consumer application subscribes to the topic. The consumer can choose to read from any available partitions in the topic. Multiple consumers can also consume from different partitions in parallel which allows for efficient parallel processing of the messages.

It is good to remember that the order of messages within a partition is guaranteed. However, the order of messages across multiple partitions is not guaranteed. This means that the consumer cannot “hop” from one topic to another and expect to read the messages in the same order.

If you need to process the messages in a specific order, you must send them to the same partition.

What Is an Offset Value in a Partition?

Next, let us talk about the offset value.

In a Kafka partition, each message is assigned with a unique identifier value which is known as a message offset. This identifier is a 64-bit integer which is allocated to each message by the broker when it is produced.

The broker then uses it to keep track of the consumer’s position within a given partition. This offset value allows the consumer to read the messages in a given order, keep track of where it left off to resume the operations, and such. This is necessary for the reliability of the consumer since it allows it to pick up from where it left off in case of failures or rebalances.

A consumer can also explicitly state from which offset it wishes to start reading using the seek method. A consumer can also start at the latest offset which is known as the partition’s end. Similarly, the consumer can start reading from the earliest offset in the partition which is also known as the beginning of the partition.

In addition to providing reliability, the offset allows for flexible consumption patterns. For example, a consumer can rewind to an earlier offset in the partition and start consuming from there. Alternatively, it can skip over several offsets to consume only specific messages.

Get the First Offset for a Given Partition

In this tutorial, we will build a simple Python script that fetches the earliest offset from a Kafka partition using the confluent-kafka package.

from kafka import KafkaConsumer, TopicPartition

# configure consumer
consumer = KafkaConsumer(
bootstrap_servers=["localhost:9092"],
auto_offset_reset="earliest"
)

# fetch the first partition
partition = TopicPartition("users", 0)
consumer.assign([partition])

# # seek earliest offset
consumer.seek_to_beginning()
offset = consumer.position(partition)
print(f"The first offset for partition {partition}: {offset}")

In the previous example, we start by importing the KafkaConsumer and TopicPartition classes from the Kafka module.

These classes allow us to create a Kafka consumer and use the structs from the TopicPartition class. This allows us to specify the target partition for the topic from which we wish to consume.

Next, we define the details of the consumer including the broker address and the auto offset reset technique. In this case, we use the earliest offset.

The following section creates a TopicPartition instance with the topic name “users” and with a partition of 0. We then call the assign method on a KafkaConsumer instance. Calling the assign method allows us to assign a specific partition of a topic to a consumer, as opposed to using the subscribe method, which allows the consumer to subscribe to multiple topics and automatically load the balance partitions among the consumers in a group.

This is useful when you want a consumer that only consumes from a specific partition rather than having the Kafka Consumer Group protocol manage the partition assignment. In our case, this is useful as we only need the first offset in the first partition.

Finally, we seek the consumer at the beginning of the partition and read the first offset. Once we run the previous code, we should get the output as shown in the following:

The first offset for partition TopicPartition(topic='users', partition=0): 0

In this case, the first offset in the first partition of the “users” topic is 0.

Conclusion

We hope you learned something from this tutorial. You are free to extend the application to fetch the first offsets of various partitions in various topics of your cluster.

About the author

John Otieno

My name is John and am a fellow geek like you. I am passionate about all things computers from Hardware, Operating systems to Programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to LinuxHint mailing list