In Kafka, the “log.retention.hours” property specifies how many hours a message is kept, and the “log.retention.bytes” property specifies how much disk space each partition log can consume. Whichever limit is reached first triggers the deletion of the oldest messages in the topic partition.
You can also set a different retention policy per topic, using the topic-level “retention.ms” and “retention.bytes” overrides, which gives more flexibility in storing and managing data within the Kafka cluster.
In this tutorial, we will learn about the Kafka retention features and how they impact the functionality of a Kafka cluster.
Kafka Retention.Hours Parameter
The main setting that determines how long a message is retained in a Kafka cluster is the “log.retention.hours” parameter, which sets the retention period in hours.
By default, Kafka retains messages for one week (168 hours).
You can change this value, but the choice has consequences. A higher value makes the brokers use more disk space for the topic. A lower value reduces how much data is available on the topic: if a consumer is offline for longer than the retention period, the messages are deleted regardless of whether that consumer has read them.
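To make this trade-off concrete, here is a minimal Python sketch of time-based retention. It assumes a simplified model in which a partition log is a list of segments, each tagged with the timestamp of its newest message, and a closed segment becomes eligible for deletion once that timestamp is older than the retention period. The `Segment` class and `expired_segments` function are hypothetical names for illustration, not Kafka APIs.

```python
from dataclasses import dataclass

HOUR_MS = 60 * 60 * 1000


@dataclass
class Segment:
    """Hypothetical stand-in for one log segment of a partition."""
    base_offset: int
    max_timestamp_ms: int  # timestamp of the newest message in the segment


def expired_segments(segments, now_ms, retention_hours):
    """Return the closed segments that time-based retention would delete.

    `segments` is ordered oldest first; the last entry is treated as the
    active segment and is never deleted in this simplified model.
    """
    if retention_hours < 0:  # negative value: no time limit
        return []
    retention_ms = retention_hours * HOUR_MS
    expired = []
    for seg in segments[:-1]:
        if now_ms - seg.max_timestamp_ms > retention_ms:
            expired.append(seg)
        else:
            break  # segments are time-ordered, so later ones are newer
    return expired


now = 1_000 * HOUR_MS
log = [
    Segment(0, now - 200 * HOUR_MS),    # older than one week
    Segment(100, now - 170 * HOUR_MS),  # just past one week
    Segment(200, now - 10 * HOUR_MS),   # written 10 hours ago
    Segment(300, now),                  # active segment
]
print([s.base_offset for s in expired_segments(log, now, 168)])  # [0, 100]
print([s.base_offset for s in expired_segments(log, now, 1)])    # [0, 100, 200]
```

With a one-hour retention, the segment written 10 hours ago is also removed, even if a slow consumer has not read it yet.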
Kafka also supports finer-grained retention parameters: “log.retention.minutes” and “log.retention.ms”.
If more than one of these is set, the one with the smallest unit takes precedence: “log.retention.ms” overrides “log.retention.minutes”, which in turn overrides “log.retention.hours”.
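The precedence rule can be sketched in Python as follows. The function name `effective_retention_ms` is hypothetical, and the properties are passed as a plain dictionary rather than read from a real broker configuration file. Treating any negative result as “no time limit” (normalized to -1) is an assumption of this sketch.

```python
def effective_retention_ms(config):
    """Resolve the time-based retention period from broker properties.

    Precedence: log.retention.ms, then log.retention.minutes, then
    log.retention.hours (default 168). A negative result means no time
    limit and is normalized to -1.
    """
    if "log.retention.ms" in config:
        millis = int(config["log.retention.ms"])
    elif "log.retention.minutes" in config:
        millis = int(config["log.retention.minutes"]) * 60 * 1000
    else:
        millis = int(config.get("log.retention.hours", 168)) * 60 * 60 * 1000
    return -1 if millis < 0 else millis


print(effective_retention_ms({}))  # 604800000 (7 days)
print(effective_retention_ms({"log.retention.hours": "24",
                              "log.retention.minutes": "30"}))  # 1800000
print(effective_retention_ms({"log.retention.hours": "24",
                              "log.retention.minutes": "30",
                              "log.retention.ms": "5000"}))  # 5000
```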
Kafka Retention.Bytes
We can also control retention by size using the “log.retention.bytes” parameter. It sets the maximum size of a partition log; once a partition grows beyond that size, Kafka removes its oldest messages. The limit applies to each partition separately, not to the topic as a whole.
By default, the value is -1, which means that Kafka does not delete messages based on size.
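The following Python sketch models size-based retention for a single partition. It is a simplified model under stated assumptions, not Kafka's implementation: segments are listed oldest first, the last one is the active segment and is never deleted, and an old segment is removed only if the partition would still hold at least “log.retention.bytes” afterwards. The function name `size_breached_segments` is hypothetical.

```python
def size_breached_segments(segment_sizes, retention_bytes):
    """Return indices of the oldest segments that size-based retention deletes.

    `segment_sizes` lists one partition's segment sizes in bytes, oldest
    first; the last entry is the active segment and is never deleted here.
    A segment is removed only if the partition stays at or above
    `retention_bytes` after removing it.
    """
    if retention_bytes < 0:  # -1 (the default): no size limit
        return []
    excess = sum(segment_sizes) - retention_bytes
    deleted = []
    for i, size in enumerate(segment_sizes[:-1]):
        if excess - size < 0:
            break
        excess -= size
        deleted.append(i)
    return deleted


MIB = 1024 * 1024
sizes = [100 * MIB, 100 * MIB, 100 * MIB, 40 * MIB]  # 340 MiB in one partition
print(size_breached_segments(sizes, 200 * MIB))  # [0]
print(size_breached_segments(sizes, -1))         # []
```

In the first call, the partition keeps 240 MiB, still above the 200 MiB limit, because deleting the next 100 MiB segment would take it below the limit. This is why the size limit behaves as an approximate bound rather than an exact cap.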
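You can combine “log.retention.hours” and “log.retention.bytes” to bound a log by both age and size; whichever limit is reached first triggers deletion. Because Kafka deletes whole log segments rather than individual messages, both limits are enforced approximately, so a partition can hold slightly more data, or slightly older data, than the configured values. The ideal combination depends on your storage capacity and on how quickly your consumers read the data.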
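To see which limit is likely to apply first, you can compare how long a partition takes to fill up to “log.retention.bytes” with the retention period. The sketch below assumes a constant, positive write rate into a single partition, which real workloads rarely have, so treat its answer as a rough estimate. The function name `first_limit_reached` is hypothetical.

```python
def first_limit_reached(retention_hours, retention_bytes, bytes_per_second):
    """Estimate which retention limit a partition hits first.

    Assumes a constant, positive write rate into the partition. Returns
    "time", "size", or "none" when both limits are disabled (set to -1).
    """
    time_limit_s = retention_hours * 3600 if retention_hours >= 0 else None
    size_limit_s = (retention_bytes / bytes_per_second
                    if retention_bytes >= 0 else None)
    if time_limit_s is None and size_limit_s is None:
        return "none"
    if size_limit_s is None:
        return "time"
    if time_limit_s is None:
        return "size"
    return "size" if size_limit_s < time_limit_s else "time"


GIB = 1024 ** 3
print(first_limit_reached(168, GIB, 1024 ** 2))  # size (1 GiB fills in about 17 minutes)
print(first_limit_reached(168, GIB, 1024))       # time (1 GiB would take about 12 days)
print(first_limit_reached(168, -1, 1024 ** 2))   # time
```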
Remove Logs by Time
To allow Kafka to remove messages based only on time, we can set “log.retention.hours” to the desired value and “log.retention.bytes” to -1. For example, to keep the default one-week retention:
log.retention.hours = 168
log.retention.bytes = -1
Remove Logs Based on Size
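To remove messages based only on size, set “log.retention.hours” to -1, which disables the time limit, and “log.retention.bytes” to the target size per partition. For example, for a limit of 1 GiB (1,073,741,824 bytes) per partition:
log.retention.hours = -1
log.retention.bytes = 1073741824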
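Because the size limit applies to each partition, the disk space a topic can use grows with its partition count and, since every replica keeps its own copy of the log, with its replication factor. The arithmetic below is a rough estimate only; the partition count of 6 and replication factor of 3 are hypothetical values, and actual usage can be higher (for example, by up to about one segment per partition) because deletion works on whole segments. The function name `approx_topic_disk_bytes` is also hypothetical.

```python
GIB = 1024 ** 3  # 1073741824 bytes


def approx_topic_disk_bytes(retention_bytes, partitions, replication_factor):
    """Rough estimate of cluster-wide disk used by a topic under size retention.

    The size limit applies to each partition replica, so the total scales
    with the number of partitions and the replication factor. Segment
    granularity can push real usage somewhat higher.
    """
    return retention_bytes * partitions * replication_factor


print(approx_topic_disk_bytes(1073741824, 6, 3) / GIB)  # 18.0
```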
Command
You can also configure the retention values with the kafka-configs command-line tool (kafka-configs.sh in the Apache Kafka distribution), which supports per-topic overrides.
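As a hedged illustration, the Python helper below assembles a kafka-configs.sh invocation that sets per-topic retention overrides and prints it instead of running it. Note that the topic-level properties are named “retention.ms” and “retention.bytes”, without the broker-level “log.” prefix. The bootstrap address `localhost:9092`, the topic name `orders`, and the helper name `kafka_configs_alter_cmd` are hypothetical placeholders.

```python
import shlex


def kafka_configs_alter_cmd(bootstrap, topic, retention_ms=None,
                            retention_bytes=None):
    """Build a kafka-configs.sh command that sets topic-level retention.

    Topic-level properties are retention.ms and retention.bytes
    (no "log." prefix, unlike the broker-level settings).
    """
    overrides = []
    if retention_ms is not None:
        overrides.append(f"retention.ms={retention_ms}")
    if retention_bytes is not None:
        overrides.append(f"retention.bytes={retention_bytes}")
    if not overrides:
        raise ValueError("no retention override given")
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap,
        "--entity-type", "topics",
        "--entity-name", topic,
        "--alter",
        "--add-config", ",".join(overrides),  # several overrides, comma-separated
    ]


cmd = kafka_configs_alter_cmd("localhost:9092", "orders",
                              retention_ms=24 * 60 * 60 * 1000,  # one day
                              retention_bytes=1073741824)        # 1 GiB
print(shlex.join(cmd))
# kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name orders --alter --add-config retention.ms=86400000,retention.bytes=1073741824
```

Running the printed command requires a Kafka installation with kafka-configs.sh on the PATH and a reachable broker.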
Conclusion
We discussed how to use the “log.retention” parameters to control how long messages are retained in a Kafka cluster and how much data each partition may hold.