Apache Kafka

Apache Kafka Streams Tumbling Time Window Transformations

In Apache Kafka Streams, a session gap is the period of inactivity that defines the end of a session in a session window transformation.

When events occur in a session window, they are grouped based on how closely they occur in time. The session gap parameter determines how long a session can be inactive before it is over.

For example, if the session gap duration is set to 5 minutes, any event that is produced in the period equal to or less than the session gap value is grouped as a single session. However, any event that is produced after the session gap duration starts a new session.

Setting the session gap too short can create too many small windows which increases the processing overhead. Similarly, placing it too long can result in unrelated combined sessions which leads to incorrect processing results.

Hence, it is good to find an optimal session gap depending on the specific use and the characteristics of the data that are stored in the cluster.

Kafka Streams Tumbling Time Window Transformations

One of the most prevalent features of Kafka Streams is the ability to perform the windowed operations on a given stream. A time window is a fixed-size, sliding window which is defined based on time.

There are two main types of time windows in Kafka:

  1. Tumbling Time Windows
  2. Hopping Time Windows

In this tutorial, we will focus on the tumbling time window transformations, what they are, how they work, and how to use them in a Kafka application.

What Are Tumbling Time Windows?

In a Kafka Stream, a tumbling time window is a fixed-size, non-overlapping window which is defined based on time. This means that the window size is fixed, and each window starts and ends at a specific duration.

We can use the tumbling time windows to group the events from a stream within a specific time. The duration of a given window is defined in the window size parameter.

Kafka Streams WindowedBy Method

To use a tumbling time window transformation in a Kafka stream, we need to define a windowed stream using the WindowedBy() method. The method takes a TimeWindows object which contains the duration of the window.

An example definition is as shown in the following:

TimeWindows tumblingWindow = TimeWindows.of(Duration.ofMinutes(3));

KStream<String, Integer> windowedStream = stream.windowedBy(tumblingWindow);

The given example defines a WindowedStream with a window duration of 3 minutes.

Once we define a windowed stream, we can perform a windowed operation using the reduce() method. An example code is as follows:

KTable<Windowed, Integer> sumByWindow = windowedStream
  .reduce(
    (value1, value2) -> value1 + value2,
    Materialized.with(Serdes.String(), Serdes.Integer())
  );

The previous example creates a KTable which contains the sum of values within each tumbling window.

Conclusion

We explored the fundamentals of working with tumbling window transformations in Kafka Streams. You can explore the documentation to learn more.

About the author

John Otieno

My name is John and am a fellow geek like you. I am passionate about all things computers from Hardware, Operating systems to Programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to LinuxHint mailing list