A session window groups events based on how closely they occur in time. The session gap parameter determines how long a session can be inactive before it is closed.
For example, if the session gap is set to 5 minutes, an event that arrives within 5 minutes of the previous event for the same key joins that event's session. An event that arrives after more than 5 minutes of inactivity starts a new session.
Setting the session gap too short can create many small windows, which increases the processing overhead. Similarly, setting it too long can merge unrelated activity into a single session, which leads to incorrect processing results.
Hence, it is best to choose the session gap based on the specific use case and the characteristics of the data that is stored in the cluster.
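To make the effect of the gap concrete, the following is a minimal sketch in plain Java that applies the rule described above to a list of timestamps for a single key. The class and method names (SessionGapSketch, sessionize) are hypothetical and are not part of Kafka Streams; the sketch only mimics the grouping rule and assumes that the timestamps are already sorted.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper, not part of Kafka Streams: groups sorted event
// timestamps for one key into sessions using an inactivity gap.
class SessionGapSketch {

    // Splits ascending timestamps (epoch millis) into sessions. An event joins
    // the current session when it arrives no more than `gap` after the
    // previous event; otherwise it starts a new session.
    static List<List<Long>> sessionize(List<Long> sortedTimestamps, Duration gap) {
        List<List<Long>> sessions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            boolean gapExceeded = !current.isEmpty()
                    && ts - current.get(current.size() - 1) > gap.toMillis();
            if (gapExceeded) {
                sessions.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            sessions.add(current);
        }
        return sessions;
    }

    public static void main(String[] args) {
        long minute = Duration.ofMinutes(1).toMillis();
        // Events at minutes 0, 2, 4, 12, and 13.
        List<Long> events = List.of(0L, 2 * minute, 4 * minute, 12 * minute, 13 * minute);

        // 5-minute gap: prints 2 (minutes 0-4 and minutes 12-13).
        System.out.println(sessionize(events, Duration.ofMinutes(5)).size());
        // 1-minute gap: prints 4 (the first three events each become a session).
        System.out.println(sessionize(events, Duration.ofMinutes(1)).size());
    }
}
```

With the same five events, shrinking the gap from 5 minutes to 1 minute doubles the number of sessions, which illustrates why the gap should be tuned to the data.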
Kafka Streams Tumbling Time Window Transformations
One of the most widely used features of Kafka Streams is the ability to perform windowed operations on a given stream. A time window is a fixed-size window which is defined based on time; consecutive windows may or may not overlap.
Kafka Streams provides two main types of fixed-size time windows:
- Tumbling Time Windows
- Hopping Time Windows
In this tutorial, we will focus on tumbling time window transformations: what they are, how they work, and how to use them in a Kafka application.
What Are Tumbling Time Windows?
In Kafka Streams, a tumbling time window is a fixed-size, non-overlapping window which is defined based on time. This means that the window size is fixed and that a new window starts exactly where the previous one ends, so every event belongs to exactly one window.
We can use tumbling time windows to group the events from a stream that fall within the same time interval. The duration of each window is defined by the window size parameter.
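The following is a small sketch in plain Java of how an event timestamp maps to a single tumbling window. It assumes that windows are aligned to the epoch with an inclusive start and an exclusive end, which is how the Kafka Streams documentation describes tumbling windows. The class and method names (TumblingWindowSketch, windowStart, windowEnd) are hypothetical and are not part of the Kafka Streams API.

```java
import java.time.Duration;

// Illustrative helper, not a Kafka Streams class: computes which tumbling
// window a timestamp falls into, assuming windows are aligned to the epoch.
class TumblingWindowSketch {

    // Start (inclusive) of the window that contains timestampMs.
    static long windowStart(long timestampMs, Duration size) {
        long sizeMs = size.toMillis();
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    // End (exclusive) of the window that contains timestampMs.
    static long windowEnd(long timestampMs, Duration size) {
        return windowStart(timestampMs, size) + size.toMillis();
    }

    public static void main(String[] args) {
        Duration size = Duration.ofMinutes(3);
        // An event 7 minutes and 30 seconds after the epoch.
        long ts = Duration.ofMinutes(7).plusSeconds(30).toMillis();

        // Prints 360000 (minute 6) and 540000 (minute 9).
        System.out.println(windowStart(ts, size));
        System.out.println(windowEnd(ts, size));
    }
}
```

Because each timestamp yields exactly one window start, no event can land in two tumbling windows at once.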
Kafka Streams windowedBy() Method
To use a tumbling time window transformation in Kafka Streams, we first group the stream by key and then call the windowedBy() method on the grouped stream. The method takes a TimeWindows object which contains the duration of the window.
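An example definition is shown in the following:
TimeWindows tumblingWindow = TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(3));
TimeWindowedKStream<String, Integer> windowedStream = stream.groupByKey().windowedBy(tumblingWindow);
The given example defines a TimeWindowedKStream with a window duration of 3 minutes. Note that TimeWindows.ofSizeWithNoGrace() requires Kafka Streams 3.0 or later.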
Once we define a windowed stream, we can perform a windowed aggregation using the reduce() method. An example is as follows:
KTable<Windowed<String>, Integer> sums = windowedStream.reduce(Integer::sum);
The previous example creates a KTable which contains the sum of the values for each key within each tumbling window.
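Putting the pieces together, a complete topology might look like the following sketch. It assumes Kafka Streams 3.0 or later on the classpath (for TimeWindows.ofSizeWithNoGrace()) and String keys with Integer values. The class name TumblingSumTopology and the topic names that are passed in are hypothetical and serve only as an illustration.

```java
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

// Hypothetical example class: sums Integer values per key in 3-minute
// tumbling windows and writes the results to an output topic.
class TumblingSumTopology {

    static Topology build(String inputTopic, String outputTopic) {
        StreamsBuilder builder = new StreamsBuilder();

        // Read the input topic as a stream of String keys and Integer values.
        KStream<String, Integer> stream =
                builder.stream(inputTopic, Consumed.with(Serdes.String(), Serdes.Integer()));

        // Group by key, apply a 3-minute tumbling window, and sum the values.
        KTable<Windowed<String>, Integer> sums = stream
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Integer()))
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(3)))
                .reduce(Integer::sum);

        // Flatten the windowed key into "key@windowStartMillis" so that the
        // result can be written with plain String and Integer serdes.
        sums.toStream()
                .map((windowedKey, sum) -> KeyValue.pair(
                        windowedKey.key() + "@" + windowedKey.window().start(), sum))
                .to(outputTopic, Produced.with(Serdes.String(), Serdes.Integer()));

        return builder.build();
    }
}
```

The returned Topology can be passed to a KafkaStreams instance together with the application properties. Flattening the windowed key into a string is only one possible choice; a windowed serde could be used instead.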
Conclusion
We explored the fundamentals of working with tumbling window transformations in Kafka Streams. You can explore the documentation to learn more.