Apache Kafka

Apache Kafka Auto Offset Reset

In Kafka, a message offset is a unique identifier for a message within a Kafka partition. It tracks which messages are consumed by a consumer and which are still pending in consumption.

Each message in a Kafka partition is assigned with a monotonically increasing offset. The consumer uses the offset to determine which message to consume next. The offset is critical to ensure that the messages are processed in the correct order and enable a reliable message processing in a distributed system.

However, it can happen that a consumer does not have the information about the previously consumed offset. This can happen for various reasons such as consumer crashes, loss of broker data, or manual offset reset.

When that happens, we need a way to tell the consumer what action to take next. For example, we can tell the consumer to start reading the messages from the beginning of the partition or at the end and such.

The behavior of how a consumer behaves when the information of the previously committed offset is lost is controlled by the auto.offset.reset configuration property.

This property is responsible for telling the consumer what the next action is in case of such a phenomenon.

Supported Values for the Parameter

The auto.offset.reset parameter accepts three main values:

  1. Earliest – Setting the parameter’s value to “earliest” forces the consumer to start reading from the earliest offset in the partition, regardless of whether there is a committed offset.
  2. Latest – By setting the value to “latest”, the consumer starts reading from the latest offset in the partition, regardless of whether there is a committed offset.
  3. None – We can also set the value of the parameter to “none”. This causes the consumer to throw an OffsetOutOfRangeException exception if it tries to access a partition without a committed offset.

Other Parameters that Govern the Auto Offset Reset Functionality

Another parameter that plays in governing the auto offset reset functionality is the offset.retention.minutes parameter.

This property sets the retention period for committed offsets in the offset storage. It determines the length of time that the offset storage retains the committed offsets for a consumer group before discarding them.

When a consumer group consumes the messages from a partition, it periodically commits the offset of the last processed message. The committed offset is stored in the offset storage, typically a compacted topic named “__consumer_offsets” in the Kafka cluster.

The “offset.retention.minutes” property controls the time that the offset storage retains the committed offsets for a consumer group before discarding them.

Setting a high value for this parameter can ensure that the committed offsets are available for extended periods. This can be helpful in such scenarios where the consumers might fail and need to restore their offset from the offset storage.

However, it also means that the offset storage retains more data, which can increase the storage requirements for the Kafka cluster.

On the other hand, setting a low value can reduce the storage requirements for the Kafka cluster. Still, it also means that the committed offsets might not be available for long after they have been committed, which can make it difficult to restore a consumer’s offset after a failure.

Reset the Offset to the Earliest

If you have access to the Kafka cluster, you can reset the offset value, manually using the Kafka CLI as shown in the following example command:

./kafka-consumer-groups.sh --bootstrap-server <broker_address> --group <consumer-group-id> --reset-offsets --to-earliest --topic <topic-name> --execute

The previous command resets the offset for all partitions of the specified topic in the specified consumer group to the earliest offset. We use the –execute flag to confirm that we want to execute the reset operation. This helps to prevent accidental resets which can lead to data loss.

Resetting the Offset Using the Shift By Parameter

We can also use the –shift-by parameter to rewind the offset by a specific value. Specifying a negative value allows you to go back in the produced messages, while a positive value allows you to advance in the available messages.

./kafka-consumer-groups.sh --bootstrap-server  --group  --reset-offsets --topic  --shift-by  --execute

Conclusion

We learned about the auto.offset.reset parameter in the Kafka configuration file that determines the behavior that is taken by the consumer group once the offset information is corrupt. We also learned the commands that we can use to reset the offset values.

About the author

John Otieno

My name is John and am a fellow geek like you. I am passionate about all things computers from Hardware, Operating systems to Programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to LinuxHint mailing list