Configure the Apache Kafka Retention Time

In Kafka, the retention time is the amount of time that a message is kept in the Kafka cluster before it becomes eligible for deletion. Two broker configuration properties control retention: “log.retention.hours” and “log.retention.bytes”.

The “log.retention.hours” property specifies the number of hours that a message is kept. The “log.retention.bytes” property specifies the maximum size that the log of a single partition can reach. Whichever limit is reached first triggers the deletion of the oldest log segments in the topic partition.

It is possible to set a different retention policy per topic by using the topic-level “retention.ms” and “retention.bytes” settings. This allows more flexibility in storing and managing the data within the Kafka cluster.

In this tutorial, we will learn about the Kafka retention features and how they impact the functionality of a Kafka cluster.

Kafka Retention.Hours Parameter

The “log.retention.hours” parameter is the main setting that governs how long a message is retained in a Kafka cluster. It sets the retention duration in hours.

By default, Kafka stores messages for one week, or 168 hours.

You can change this value, but each direction has a cost. A higher value means that the brokers use more disk space for a given topic. A lower value shrinks the window of data that is available on the topic. Retention does not track consumer progress: if a consumer is unavailable for longer than the retention period, the data is removed regardless of whether the consumer has read it.

Kafka also supports finer-grained retention parameters: “log.retention.minutes” and “log.retention.ms”.

If more than one of these is set in the same configuration file, the finer-grained unit takes precedence: “log.retention.ms” overrides “log.retention.minutes”, which in turn overrides “log.retention.hours”.
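To make the precedence rule concrete, here is a minimal Python sketch that resolves the effective time-based retention from a set of broker properties. It only illustrates the rule and is not Kafka's own code. The function name effective_retention_ms and the dictionary input are our assumptions.

```python
from typing import Mapping

MS_PER_MINUTE = 60 * 1000
MS_PER_HOUR = 60 * MS_PER_MINUTE
DEFAULT_RETENTION_HOURS = "168"  # Kafka's default of one week


def effective_retention_ms(props: Mapping[str, str]) -> int:
    """Return the time-based retention in milliseconds, or -1 for no time limit.

    Hypothetical helper: the finest-grained property that is present wins.
    """
    if "log.retention.ms" in props:
        millis = int(props["log.retention.ms"])
    elif "log.retention.minutes" in props:
        millis = int(props["log.retention.minutes"]) * MS_PER_MINUTE
    else:
        hours = props.get("log.retention.hours", DEFAULT_RETENTION_HOURS)
        millis = int(hours) * MS_PER_HOUR
    # Any negative value is treated as "no time limit" in this sketch.
    return -1 if millis < 0 else millis
```

For example, passing both “log.retention.hours” set to 24 and “log.retention.minutes” set to 30 yields 30 minutes expressed in milliseconds, because the minutes value overrides the hours value.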

Kafka Retention.Bytes

We can also bound retention by size using the “log.retention.bytes” parameter. This parameter sets the maximum size that the log of a partition can reach before Kafka deletes the oldest log segments to free up space. The limit is applied per partition, not per topic, so a topic with several partitions can hold up to the limit multiplied by the number of partitions.

By default, the value is set to -1, which means that Kafka does not remove messages based on size.

You can combine “log.retention.hours” and “log.retention.bytes” to bound the logs by both age and size. Because Kafka deletes whole log segments during periodic checks, a partition can briefly exceed either limit. Keep in mind that the ideal combination depends on your storage requirements and on how your producers and consumers behave.
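The interaction between the two limits can be sketched with a small, simplified model in Python. It assumes that a partition is a list of segments ordered from oldest to newest, that the last segment is the active one and is never deleted, and that deletion always removes whole segments. The Segment class and the segments_to_delete function are hypothetical names used for illustration. They are not part of Kafka.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str
    size_bytes: int
    newest_record_ms: int  # timestamp of the newest record in the segment


def segments_to_delete(segments: List[Segment], now_ms: int,
                       retention_ms: int, retention_bytes: int) -> List[str]:
    """Return the names of the segments that either limit marks for deletion.

    Simplified model: segments are ordered oldest first and the last one is
    the active segment, which is kept. A limit of -1 disables that check.
    """
    closed = segments[:-1]
    doomed = set()

    # Time limit: a segment expires once its newest record is too old.
    if retention_ms >= 0:
        for seg in closed:
            if now_ms - seg.newest_record_ms > retention_ms:
                doomed.add(seg.name)

    # Size limit: drop the oldest segments only while the partition would
    # still hold at least retention_bytes after each removal.
    if retention_bytes >= 0:
        excess = sum(seg.size_bytes for seg in segments) - retention_bytes
        for seg in closed:
            if excess < seg.size_bytes:
                break
            doomed.add(seg.name)
            excess -= seg.size_bytes

    return [seg.name for seg in closed if seg.name in doomed]
```

With both limits enabled, a segment is removed as soon as either check marks it, which is the "whichever is reached first" behavior. Setting one limit to -1 leaves only the other check in effect.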

Remove Logs by Time

To allow Kafka to remove the messages based only on duration, we can set the “log.retention.hours” to a given value and the “log.retention.bytes” to -1:

log.retention.hours = 168
log.retention.bytes = -1

Remove Logs Based on Size

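To remove the messages based only on a size limit, disable the time limit and set “log.retention.bytes” to the target size. The documented way to disable the time limit is to set “log.retention.ms” to -1, which also takes precedence over any hours or minutes value:

log.retention.ms = -1
log.retention.bytes = 1073741824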

Command

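The following command shows how you can configure the retention values for a single topic using the kafka-configs.sh utility. Topic-level overrides use the “retention.ms” and “retention.bytes” keys. There is no topic-level “retention.hours” key, so the time limit is given in milliseconds:

kafka-configs.sh --bootstrap-server=localhost:9092 --alter --entity-type topics --entity-name sample_d --add-config retention.ms=-1,retention.bytes=1073741824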
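Since topic-level retention is expressed in milliseconds and bytes, the values are easy to get wrong by a factor of 1000 or 1024. As a minimal sketch, the Python helper below converts hours and gibibytes into the string that follows --add-config. The keys “retention.ms” and “retention.bytes” are Kafka's topic-level settings. The function name retention_add_config and its parameters are our assumptions.

```python
from typing import Optional


def retention_add_config(hours: Optional[float], max_gib: Optional[float]) -> str:
    """Build a value for --add-config from human-friendly units.

    Hypothetical helper: passing None for a limit disables it (-1).
    """
    retention_ms = -1 if hours is None else int(hours * 60 * 60 * 1000)
    retention_bytes = -1 if max_gib is None else int(max_gib * 1024 ** 3)
    return f"retention.ms={retention_ms},retention.bytes={retention_bytes}"
```

Calling retention_add_config(None, 1) produces a size-only policy of 1 GiB per partition, while retention_add_config(168, None) produces a time-only policy of one week.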

Conclusion

We discussed how to use the “log.retention” parameters to control how long messages are retained and how much data each partition can hold in the Kafka cluster.

About the author

John Otieno