Tumbling Window

A Tumbling Window is a specific type of time window used in stream processing that divides a data stream into a series of fixed-size, non-overlapping, and contiguous time intervals. Each event in the stream belongs to exactly one tumbling window.

Think of them as a series of adjacent, fixed-duration "buckets" where events are collected for processing (e.g., aggregation).

Key Characteristics

  1. Fixed Size (Duration): All tumbling windows have the same, predefined duration (e.g., 5 minutes, 1 hour, 1 day).
  2. Non-overlapping: No instant belongs to two windows. Each window ends exactly where the next one begins.
    • If a window is defined as [T, T + duration), the next window will be [T + duration, T + 2*duration), and so on.
  3. Contiguous: There are no gaps between consecutive windows. The entire timeline is covered.
  4. Assignment: Each event from the stream is assigned to a single window based on its timestamp (event time or processing time).
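
The assignment rule above reduces to floor division on the timestamp. The following is a minimal Python sketch of that arithmetic, not RisingWave's implementation; the function name `tumbling_window` and the choice to align windows to the Unix epoch are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed alignment origin: windows are counted from the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def tumbling_window(ts: datetime, size: timedelta) -> tuple[datetime, datetime]:
    """Return the half-open window [start, end) that contains ts."""
    # Number of whole windows that have elapsed since the origin.
    index = (ts - EPOCH) // size
    start = EPOCH + index * size
    return start, start + size

ts = datetime(2024, 1, 1, 10, 7, 30, tzinfo=timezone.utc)
start, end = tumbling_window(ts, timedelta(minutes=10))
print(start.time(), end.time())  # 10:00:00 10:10:00
```

Because the start is inclusive and the end is exclusive, a timestamp of exactly 10:10:00 maps to the window that starts at 10:10:00, so every timestamp lands in exactly one window.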

Example

If you define a 10-minute tumbling window:

  • Events with timestamps from 10:00:00 up to, but not including, 10:10:00 fall into the window [10:00:00, 10:10:00).
  • Events with timestamps from 10:10:00 up to, but not including, 10:20:00 fall into the window [10:10:00, 10:20:00).
  • And so on.

Use Cases

Tumbling windows are commonly used for:

  • Periodic Reporting: Generating reports at regular intervals (e.g., total sales per minute, number of errors per hour, average sensor reading per day).
  • Fixed-Interval Analysis: Analyzing data within distinct, non-overlapping time segments.
  • Simple Aggregations: When you need to aggregate data over fixed chunks of time without overlap. For example, counting the number of tweets every 5 minutes.
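
To make the tweet-counting case concrete, here is a small Python sketch of a per-window count over an in-memory list. It only illustrates the grouping logic; the helper name `count_per_window`, the sample timestamps, and the epoch-aligned origin are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def count_per_window(timestamps, size):
    """Count events per tumbling window, keyed by window start."""
    origin = datetime(1970, 1, 1)  # assumed alignment origin (naive timestamps)
    counts = Counter()
    for ts in timestamps:
        # Floor the timestamp to the start of its window.
        start = origin + ((ts - origin) // size) * size
        counts[start] += 1
    return dict(sorted(counts.items()))

tweets = [
    datetime(2024, 1, 1, 10, 1),
    datetime(2024, 1, 1, 10, 4, 59),
    datetime(2024, 1, 1, 10, 5),   # exactly on a boundary: starts the next window
    datetime(2024, 1, 1, 10, 12),
]
for start, n in count_per_window(tweets, timedelta(minutes=5)).items():
    print(start.strftime("%H:%M"), n)
# 10:00 2
# 10:05 1
# 10:10 1
```

Note that a window with no events produces no entry in this sketch; if a report needs a row for every interval, the empty windows have to be filled in separately.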

RisingWave SQL Implementation

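In RisingWave, tumbling windows are typically implemented with the TUMBLE() time window function, a table-valued function (TVF) that is called in the FROM clause. It adds window_start and window_end columns to each row, and these columns are then used in the GROUP BY clause.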

-- Example: Calculate the sum of 'amount' for each 15-minute tumbling window
-- based on the 'event_timestamp' column from 'transactions_stream'.

SELECT
    window_start,    -- The start timestamp of the window
    window_end,      -- The end timestamp of the window
    SUM(amount) AS total_amount
FROM TUMBLE(transactions_stream, event_timestamp, INTERVAL '15' MINUTE)
GROUP BY window_start, window_end;

-- Example: Count the number of user logins every hour
SELECT
    window_start AS hour_bucket,  -- The start timestamp of each 1-hour window
    COUNT(user_id) AS login_count
FROM TUMBLE(user_logins, login_time, INTERVAL '1' HOUR)
GROUP BY window_start;

Advantages

  • Simplicity: Easy to understand and implement.
  • Clear Boundaries: Each event belongs to one and only one window, avoiding ambiguity.
  • Efficiency: Each event updates the state of exactly one window, so the work per event is small and a window's state can be finalized once the window closes.

Considerations

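  • Boundary Effects: Which window receives an event that falls exactly on a boundary depends on whether the boundaries are inclusive or exclusive. Typically the start is inclusive and the end is exclusive, so an event stamped exactly 10:10:00 belongs to the window that starts at 10:10:00, not the one that ends there.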
  • Event Time vs. Processing Time: The choice of time characteristic (event time or processing time) significantly impacts how events are assigned to windows and the accuracy of results, especially with out-of-order data or processing delays. Using event time with watermarks is generally recommended for accuracy.
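
The interaction between event time, out-of-order arrival, and watermarks can be sketched in a few lines of Python. This is a simplified model for illustration only and does not reflect how RisingWave or any other engine implements watermarks; the class `TumblingCounter`, its `allowed_lateness` parameter, and the rule "watermark = highest event time seen minus allowed lateness" are assumptions.

```python
from datetime import datetime, timedelta

class TumblingCounter:
    """Event-time tumbling count with a simple watermark (illustrative only)."""

    def __init__(self, size, allowed_lateness):
        self.size = size
        self.allowed_lateness = allowed_lateness
        self.origin = datetime(1970, 1, 1)  # assumed alignment origin
        self.open_windows = {}              # window start -> running count
        self.max_event_time = None          # highest event time seen so far

    def add(self, event_time):
        """Ingest one event; return [(window_start, count)] for windows that just closed."""
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        watermark = self.max_event_time - self.allowed_lateness

        start = self.origin + ((event_time - self.origin) // self.size) * self.size
        if start + self.size > watermark:
            # Window is still open: count the event, even if it arrived out of order.
            self.open_windows[start] = self.open_windows.get(start, 0) + 1
        # Otherwise the window has already closed and the late event is dropped.

        closed = sorted((s, c) for s, c in self.open_windows.items()
                        if s + self.size <= watermark)
        for s, _ in closed:
            del self.open_windows[s]
        return closed

counter = TumblingCounter(timedelta(minutes=10), timedelta(minutes=1))
arrivals = [(10, 2), (10, 10, 30), (10, 8), (10, 12), (10, 9)]  # arrival order
for hms in arrivals:
    for start, n in counter.add(datetime(2024, 1, 1, *hms)):
        print("closed", start.strftime("%H:%M"), n)
# closed 10:00 2
```

In this run the 10:08 event arrives out of order but before the watermark passes 10:10, so it is counted. The 10:09 event arrives after the window [10:00, 10:10) has closed and is dropped, which is the accuracy trade-off that the allowed lateness controls.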

Tumbling windows are a fundamental building block for many stream processing analytics and are well-supported in systems like RisingWave.
