How to Monitor and Debug Streaming SQL Pipelines in Production

You deployed your streaming SQL pipelines to production. The materialized views are running, data is flowing, and your dashboards look great. Then something changes. A view falls behind. Query latency spikes. Storage grows faster than expected. How do you figure out what went wrong?

Monitoring streaming pipelines is fundamentally different from monitoring batch jobs. A batch job either finishes or fails. A streaming pipeline runs continuously, and problems manifest as gradual degradation: increasing latency, growing state size, or subtle data staleness. Without the right observability, these issues go undetected until they become outages.

This guide walks you through the complete monitoring and debugging toolkit built into RisingWave. You will learn how to use system catalog tables to inspect every running pipeline, read EXPLAIN plans to understand execution topology, track barrier latency to catch processing bottlenecks, and analyze storage statistics to manage state growth. Every query in this article has been verified against RisingWave 2.8.0.

What Makes Streaming Pipeline Monitoring Different from Batch

In batch processing, monitoring is straightforward. You track job duration, success/failure status, and output row counts. A job is either running or done.

Streaming pipelines introduce a different set of challenges:

  • Continuous execution - Pipelines run indefinitely. There is no "completion" state to check.
  • Latency over throughput - What matters is not how many rows you process per hour, but how far behind real-time your results are.
  • Stateful operators - Joins, aggregations, and windows accumulate internal state that grows over time.
  • Cascading dependencies - One slow materialized view can propagate backpressure to every downstream view in its dependency chain.
  • Barrier-based consistency - RisingWave uses barriers (similar to Flink's checkpointing mechanism) to maintain exactly-once consistency. If barriers slow down, your entire pipeline's freshness degrades.

RisingWave exposes all of this through SQL-queryable system tables in the rw_catalog and pg_catalog schemas. No external monitoring agents are needed. You query your pipeline health the same way you query your data.

Setting Up a Sample Pipeline to Monitor

Before diving into the monitoring tools, let's create a realistic pipeline that we can inspect. We will build an order analytics pipeline with multiple materialized views.

CREATE TABLE order_events (
  order_id VARCHAR,
  customer_id VARCHAR,
  product_id VARCHAR,
  category VARCHAR,
  amount DECIMAL,
  region VARCHAR,
  event_time TIMESTAMPTZ
);

CREATE TABLE web_events (
  session_id VARCHAR,
  user_id VARCHAR,
  page_url VARCHAR,
  event_type VARCHAR,
  event_time TIMESTAMPTZ
);

Now insert sample data to give the pipeline something to process:

INSERT INTO order_events VALUES
  ('ord-001', 'cust-10', 'prod-A', 'electronics', 299.99, 'us-east', '2026-03-29 10:00:00+00'),
  ('ord-002', 'cust-11', 'prod-B', 'clothing', 59.99, 'us-west', '2026-03-29 10:01:00+00'),
  ('ord-003', 'cust-12', 'prod-C', 'electronics', 149.99, 'eu-west', '2026-03-29 10:02:00+00'),
  ('ord-004', 'cust-10', 'prod-D', 'home', 89.99, 'us-east', '2026-03-29 10:03:00+00'),
  ('ord-005', 'cust-13', 'prod-A', 'electronics', 299.99, 'us-west', '2026-03-29 10:04:00+00'),
  ('ord-006', 'cust-14', 'prod-E', 'clothing', 45.00, 'eu-west', '2026-03-29 10:05:00+00'),
  ('ord-007', 'cust-15', 'prod-F', 'home', 199.99, 'us-east', '2026-03-29 10:06:00+00'),
  ('ord-008', 'cust-16', 'prod-A', 'electronics', 299.99, 'ap-east', '2026-03-29 10:07:00+00');

Create three materialized views with different characteristics:

-- Aggregation with two group-by keys
CREATE MATERIALIZED VIEW mv_order_metrics AS
SELECT
  region,
  category,
  COUNT(*) AS order_count,
  SUM(amount) AS total_revenue,
  AVG(amount) AS avg_order_value
FROM order_events
GROUP BY region, category;

-- Time-bucketed aggregation
CREATE MATERIALIZED VIEW mv_hourly_revenue AS
SELECT
  date_trunc('hour', event_time) AS hour,
  region,
  COUNT(*) AS orders,
  SUM(amount) AS revenue
FROM order_events
GROUP BY date_trunc('hour', event_time), region;

-- Multi-column aggregation
CREATE MATERIALIZED VIEW mv_category_stats AS
SELECT
  o.category,
  COUNT(DISTINCT o.customer_id) AS unique_customers,
  COUNT(*) AS total_orders,
  SUM(o.amount) AS total_revenue
FROM order_events o
GROUP BY o.category;

Verify the pipeline is producing correct results:

SELECT * FROM mv_order_metrics ORDER BY region, category;
 region  |  category   | order_count | total_revenue | avg_order_value
---------+-------------+-------------+---------------+-----------------
 ap-east | electronics |           1 |        299.99 |          299.99
 eu-west | clothing    |           1 |         45.00 |           45.00
 eu-west | electronics |           1 |        149.99 |          149.99
 us-east | electronics |           1 |        299.99 |          299.99
 us-east | home        |           2 |        289.98 |          144.99
 us-west | clothing    |           1 |         59.99 |           59.99
 us-west | electronics |           1 |        299.99 |          299.99
(7 rows)

The pipeline is running. Now let's learn how to monitor it.

How to Inspect Running Pipelines with System Catalog Tables

RisingWave's rw_catalog schema contains over 70 system tables that expose every aspect of your cluster. Here are the most important ones for pipeline monitoring.

Listing All Materialized Views

The rw_materialized_views table is your starting point. It shows every MV in the system with its creation time and current status:

SELECT id, name, status, created_at
FROM rw_catalog.rw_materialized_views
ORDER BY created_at DESC;
 id  |       name        | status  |        created_at
-----+-------------------+---------+---------------------------
 387 | mv_category_stats | Created | 2026-03-31 07:35:43+00:00
 384 | mv_hourly_revenue | Created | 2026-03-31 07:35:41+00:00
 381 | mv_order_metrics  | Created | 2026-03-31 07:35:39+00:00
(3 rows)

The status field tells you whether the MV is fully created and serving results. During initial backfill of a large table, you would see a Creating status instead.

For a unified view across all object types (tables, views, materialized views, sinks), use rw_relations:

SELECT id, name, relation_type
FROM rw_catalog.rw_relations
WHERE name LIKE 'mv_%' OR name LIKE 'order_%' OR name LIKE 'web_%'
ORDER BY relation_type, name;
 id  |       name        |   relation_type
-----+-------------------+-------------------
 387 | mv_category_stats | materialized view
 384 | mv_hourly_revenue | materialized view
 381 | mv_order_metrics  | materialized view
 379 | order_events      | table
 380 | web_events        | table
(5 rows)

Checking Cluster Health

The rw_worker_nodes table shows every node in your cluster with its role, status, and version:

SELECT
  id,
  type,
  host || ':' || port AS address,
  state,
  parallelism,
  rw_version,
  started_at
FROM rw_catalog.rw_worker_nodes
ORDER BY type;
 id |           type           |    address     |  state  | parallelism | rw_version |        started_at
----+--------------------------+----------------+---------+-------------+------------+---------------------------
  1 | WORKER_TYPE_COMPACTOR    | 127.0.0.1:6660 | RUNNING |          30 | 2.8.0      | 2026-03-30 17:46:41+00:00
  2 | WORKER_TYPE_COMPUTE_NODE | 127.0.0.1:5688 | RUNNING |          10 | 2.8.0      | 2026-03-30 17:46:41+00:00
  3 | WORKER_TYPE_FRONTEND     | 0.0.0.0:4566   | RUNNING |             | 2.8.0      | 2026-03-30 17:46:41+00:00
  0 | WORKER_TYPE_META         | 127.0.0.1:5690 | RUNNING |             | 2.8.0      | 2026-03-30 17:46:40+00:00
(4 rows)

A healthy cluster shows all nodes in RUNNING state. If a compute node disappears or shows a non-RUNNING state, your streaming pipelines will be affected. This is the first thing to check when something goes wrong.

Monitoring Streaming Job Status

The rw_streaming_job_info table provides a higher-level view of each streaming job's health:

SELECT id, name, status, parallelism, max_parallelism, resource_group
FROM rw_catalog.rw_streaming_job_info
WHERE name LIKE 'mv_%'
ORDER BY name;
 id  |       name        | status  | parallelism | max_parallelism | resource_group
-----+-------------------+---------+-------------+-----------------+----------------
 387 | mv_category_stats | CREATED | ADAPTIVE    |             256 | default
 384 | mv_hourly_revenue | CREATED | ADAPTIVE    |             256 | default
 381 | mv_order_metrics  | CREATED | ADAPTIVE    |             256 | default
(3 rows)

The parallelism column shows how many parallel workers are processing data for each MV. ADAPTIVE means RisingWave automatically scales parallelism based on the available compute nodes. The max_parallelism of 256 means the MV can scale up to 256 parallel tasks if you add more compute nodes.

How to Read EXPLAIN Plans for Streaming Queries

Understanding how RisingWave executes a materialized view is critical for debugging performance issues. The EXPLAIN CREATE MATERIALIZED VIEW statement shows the streaming execution plan without actually creating the MV.

Reading a Simple Aggregation Plan

EXPLAIN CREATE MATERIALIZED VIEW test_plan AS
SELECT
  region,
  category,
  COUNT(*) AS order_count,
  SUM(amount) AS total_revenue,
  AVG(amount) AS avg_order_value
FROM order_events
GROUP BY region, category;
StreamMaterialize { columns: [region, category, order_count, total_revenue, avg_order_value],
                    stream_key: [category, region], pk_columns: [category, region],
                    pk_conflict: NoCheck }
└─StreamProject { exprs: [order_events.region, order_events.category, count,
                  sum(order_events.amount),
                  (sum(order_events.amount) / count(order_events.amount)::Decimal)] }
  └─StreamHashAgg { group_key: [order_events.category, order_events.region],
                    aggs: [count, sum(order_events.amount), count(order_events.amount)] }
    └─StreamExchange { dist: HashShard(order_events.category, order_events.region) }
      └─StreamTableScan { table: order_events, columns: [category, amount, region, _row_id] }

Reading from bottom to top:

  1. StreamTableScan - Reads rows from order_events, selecting only the columns needed (category, amount, region).
  2. StreamExchange - Redistributes data by hash of (category, region) so all rows for the same group land on the same worker.
  3. StreamHashAgg - Performs the aggregation. Note it computes three aggregates: count, sum(amount), and count(amount) (the latter is needed for AVG computation).
  4. StreamProject - Computes the final expressions, including dividing sum by count to produce avg_order_value.
  5. StreamMaterialize - Writes results to storage with primary key (category, region).
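The steps above can be sketched in miniature. The following is a hypothetical, simplified Python model of the state that StreamHashAgg maintains and the expression that StreamProject computes (real operators also handle deletions, barriers, and persistent state); it shows why AVG is decomposed into sum(amount) and count(amount):

```python
from collections import defaultdict

# Per-(category, region) group state: [count(*), sum(amount), count(amount)].
state = defaultdict(lambda: [0, 0.0, 0])

def apply_insert(category, region, amount):
    s = state[(category, region)]
    s[0] += 1                # count(*)
    if amount is not None:
        s[1] += amount       # sum(amount)
        s[2] += 1            # count(amount), kept only to derive AVG

def project(category, region):
    count_star, sum_amt, count_amt = state[(category, region)]
    avg = sum_amt / count_amt if count_amt else None  # AVG = SUM / COUNT
    return count_star, round(sum_amt, 2), round(avg, 2)

apply_insert("electronics", "us-east", 299.99)
apply_insert("electronics", "us-east", 149.99)
print(project("electronics", "us-east"))  # (2, 449.98, 224.99)
```

Note that count(amount) differs from count(*) when amount can be NULL, which is why the plan tracks both.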

Reading a Join Plan

Join-based MVs have more complex plans. Let's examine one:

EXPLAIN CREATE MATERIALIZED VIEW test_join_plan AS
SELECT
  o.order_id,
  o.product_id,
  o.amount,
  w.event_type AS web_action,
  o.event_time
FROM order_events o
JOIN web_events w ON o.customer_id = w.user_id;
StreamMaterialize { columns: [order_id, product_id, amount, web_action, event_time, ...],
                    stream_key: [...], pk_conflict: NoCheck }
└─StreamExchange { dist: HashShard(...) }
  └─StreamHashJoin { type: Inner, predicate: order_events.customer_id = web_events.user_id }
    ├─StreamExchange { dist: HashShard(order_events.customer_id) }
    │ └─StreamTableScan { table: order_events, columns: [order_id, customer_id, ...] }
    └─StreamExchange { dist: HashShard(web_events.user_id) }
      └─StreamTableScan { table: web_events, columns: [user_id, event_type, ...] }

The key operator here is StreamHashJoin. Both inputs are redistributed by their join key (customer_id and user_id respectively) before the join. This means the join maintains state for both sides. In production, this state can grow large if your join keys have high cardinality. This is one of the most common sources of state growth.
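To see why join state grows with both inputs, here is a hypothetical, insert-only Python sketch of a symmetric hash join (the real StreamHashJoin also handles deletes, barriers, and persists state to storage). Each side must retain every row it has seen, keyed by the join key, so future rows arriving on the other side can find their matches:

```python
from collections import defaultdict

left_state = defaultdict(list)    # customer_id -> retained order rows
right_state = defaultdict(list)   # user_id -> retained web event rows

def on_left(customer_id, row):
    left_state[customer_id].append(row)   # retained indefinitely
    # new left row joins against every buffered right row with the same key
    return [(row, r) for r in right_state[customer_id]]

def on_right(user_id, row):
    right_state[user_id].append(row)      # retained indefinitely
    return [(l, row) for l in left_state[user_id]]

on_left("cust-10", "ord-001")
matches = on_right("cust-10", "click")
print(matches)  # [('ord-001', 'click')]
```

Because neither side can ever discard a row (a matching row might still arrive), bounding the join in time is the usual way to let old state be cleaned up instead of retained forever.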

What to Look For in EXPLAIN Plans

When debugging performance, focus on these patterns:

 Pattern                               | What It Means                  | Potential Issue
---------------------------------------+--------------------------------+--------------------------------------------------
 StreamHashJoin                        | Two-sided stateful join        | State grows with both input sizes
 StreamHashAgg with many group keys    | High-cardinality aggregation   | Large state per unique key combination
 Multiple StreamExchange nodes         | Data shuffling between workers | Network overhead if cluster is large
 StreamTableScan reading many columns  | Full or wide table scan        | Consider projecting fewer columns in the source

How to Track Barrier Latency and Pipeline Freshness

Barrier latency is the single most important metric for streaming pipeline health. RisingWave uses barriers (epoch boundaries) to ensure consistency across all operators. A barrier flows through the entire pipeline, and when it completes, all operators have processed data up to that point.

Understanding Barriers

Think of a barrier as a timestamp marker that flows through your pipeline. Every second (by default), RisingWave injects a new barrier. When the barrier reaches the end of every pipeline branch, it is "completed" and the results become visible to queries.

If barriers start taking longer to complete, your pipeline is falling behind real-time. This is the streaming equivalent of "query is slow."
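A toy model makes the "slowest branch wins" behavior concrete. In this hypothetical sketch (the operator names and delays are invented for illustration), a barrier's end-to-end latency is the cumulative delay along the slowest path through the pipeline:

```python
def barrier_latency(dag):
    """dag: operator -> (processing delay in sec, downstream operators)."""
    def path_delay(op):
        delay, downstream = dag[op]
        # a barrier is complete only when every branch has seen it,
        # so the effective delay is the maximum over downstream paths
        return delay + max((path_delay(d) for d in downstream), default=0.0)
    return path_delay("source")

pipeline = {
    "source":     (0.001, ["agg", "join"]),
    "agg":        (0.005, ["mv_metrics"]),
    "join":       (0.200, ["mv_joined"]),   # slow join branch dominates
    "mv_metrics": (0.002, []),
    "mv_joined":  (0.002, []),
}
print(round(barrier_latency(pipeline), 3))  # 0.203
```

Even though the aggregation branch finishes in 8ms, the barrier is not complete until the 200ms join branch catches up, so the whole pipeline's freshness is gated by its slowest operator.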

Querying Barrier Latency from Event Logs

The rw_event_logs table records every barrier completion with its duration:

SELECT
  event_type,
  COUNT(*) AS total_barriers,
  ROUND(AVG((info->'barrierComplete'->>'durationSec')::DECIMAL), 4) AS avg_duration_sec,
  ROUND(MAX((info->'barrierComplete'->>'durationSec')::DECIMAL), 4) AS max_duration_sec,
  ROUND(MIN((info->'barrierComplete'->>'durationSec')::DECIMAL), 4) AS min_duration_sec
FROM rw_catalog.rw_event_logs
WHERE event_type = 'BARRIER_COMPLETE'
  AND timestamp > NOW() - INTERVAL '5 minutes'
GROUP BY event_type;
    event_type    | total_barriers | avg_duration_sec | max_duration_sec | min_duration_sec
------------------+----------------+------------------+------------------+------------------
 BARRIER_COMPLETE |             10 |           0.0278 |           0.0347 |           0.0236
(1 row)

In a healthy system, barrier duration stays well under 100ms. Here our barriers complete in about 25-35ms, which is excellent.

Warning signs to watch for:

  • Average barrier duration > 1 second - Pipeline is processing slowly. Check for hot keys, insufficient parallelism, or compute node resource pressure.
  • Max barrier duration >> average - Occasional spikes indicate intermittent bottlenecks, often caused by garbage collection pauses, storage compaction, or network issues.
  • Fewer barriers than expected - If barrier_interval_ms is 1000 (default) and you see fewer than 60 barriers per minute, some barriers are getting delayed.
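The warning signs above can be wired into a simple check. This is an illustrative Python sketch, not an official RisingWave tool — the thresholds mirror the bullets, and you would feed it the values returned by the barrier-latency query:

```python
def classify_barriers(avg_sec, max_sec, count, window_minutes=5,
                      barrier_interval_ms=1000):
    """Flag the three warning signs from a window of BARRIER_COMPLETE events."""
    expected = window_minutes * 60 * 1000 / barrier_interval_ms
    issues = []
    if avg_sec > 1.0:
        issues.append("avg barrier duration > 1s: pipeline processing slowly")
    if avg_sec > 0 and max_sec > 10 * avg_sec:
        issues.append("max >> avg: intermittent bottleneck (GC, compaction, network)")
    if count < expected:
        issues.append(f"{count} barriers seen, ~{int(expected)} expected: barriers delayed")
    return issues or ["healthy"]

# values from the sample query output above (5-minute window)
print(classify_barriers(avg_sec=0.0278, max_sec=0.0347, count=300))
```

Run it on a schedule and page when the returned list contains anything other than "healthy".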

Checking the Barrier Interval Configuration

SELECT name, setting
FROM pg_catalog.pg_settings
WHERE name = 'barrier_interval_ms';
        name         | setting
---------------------+---------
 barrier_interval_ms | 1000
(1 row)

A 1000ms barrier interval means RisingWave targets one barrier per second. For lower-latency use cases, you can decrease this to 100ms, though it increases the overhead per barrier.

Monitoring All Event Types

The event log captures more than just barriers. Here's how to see what types of events are being recorded:

SELECT
  event_type,
  COUNT(*) AS count
FROM rw_catalog.rw_event_logs
GROUP BY event_type
ORDER BY count DESC;
       event_type        | count
-------------------------+-------
 BARRIER_COMPLETE        |    10
 GLOBAL_RECOVERY_START   |     1
 GLOBAL_RECOVERY_SUCCESS |     1
 CREATE_STREAM_JOB_FAIL  |     1
 META_NODE_START         |     1
(5 rows)

The CREATE_STREAM_JOB_FAIL event is especially useful for debugging. It records the DDL statement, the error message, and the job name:

SELECT
  timestamp,
  info->'createStreamJobFail'->>'name' AS job_name,
  info->'createStreamJobFail'->>'error' AS error_message
FROM rw_catalog.rw_event_logs
WHERE event_type = 'CREATE_STREAM_JOB_FAIL';

This lets you see historical failures even if you were not watching the terminal when the CREATE statement ran.

How to Analyze Storage and State Size

Every stateful operator in your pipeline (joins, aggregations, deduplication) maintains internal state. Monitoring state size helps you predict storage growth and catch runaway state before it becomes a problem.

Table-Level Storage Statistics

The rw_table_stats table shows the size of every internal and user-facing table:

SELECT
  r.name,
  ts.total_key_count,
  ts.total_key_size,
  ts.total_value_size
FROM rw_catalog.rw_table_stats ts
JOIN rw_catalog.rw_relations r ON ts.id = r.id
WHERE r.name IN ('mv_order_metrics', 'mv_hourly_revenue', 'mv_category_stats')
ORDER BY ts.total_value_size DESC;
       name        | total_key_count | total_key_size | total_value_size
-------------------+-----------------+----------------+------------------
 mv_order_metrics  |               7 |            288 |              491
 mv_hourly_revenue |               4 |            136 |              192
 mv_category_stats |               3 |             84 |              146
(3 rows)

The total_key_count tells you how many unique keys each MV maintains. For mv_order_metrics, there are 7 unique (region, category) combinations. As your data grows, this number shows you whether your aggregation cardinality is bounded (good) or unbounded (watch out).

Internal State Tables

Every materialized view creates internal state tables for its operators. You can inspect these to understand where state is accumulating:

SELECT id, name
FROM rw_catalog.rw_internal_table_info
WHERE name LIKE '%order_metrics%';
 id  |                       name
-----+--------------------------------------------------
 382 | __internal_mv_order_metrics_216_hashaggstate_286
 383 | __internal_mv_order_metrics_217_streamscan_287
(2 rows)

The mv_order_metrics view has two internal tables: one for the HashAgg operator's state (storing partial aggregation results) and one for the StreamScan operator. You can query rw_table_stats for these internal tables too, to see exactly which operator is consuming the most storage.

Hummock Storage Engine Statistics

RisingWave uses Hummock, an LSM-tree-based storage engine. You can inspect its health through system tables:

SELECT
  level_id,
  COUNT(*) AS sstable_count,
  SUM(file_size) AS total_file_size,
  SUM(total_key_count) AS total_keys
FROM rw_catalog.rw_hummock_sstables
GROUP BY level_id
ORDER BY level_id;
 level_id | sstable_count | total_file_size | total_keys
----------+---------------+-----------------+------------
        0 |            25 |         1013048 |      24480
(1 row)

Data sitting at level 0 means it has not been compacted yet. In a healthy system with steady load, you should see data distributed across multiple levels. If level 0 accumulates too many SSTables, it means compaction is falling behind, which will eventually slow down reads.

How to Debug Common Production Issues

Issue 1: Materialized View Stuck in "Creating" State

When you create an MV on a large existing table, RisingWave backfills historical data before serving fresh results. Monitor backfill progress:

SELECT
  ddl_id,
  ddl_statement,
  progress,
  initialized_at,
  backfill_type
FROM rw_catalog.rw_ddl_progress;

If this returns rows, a DDL operation is actively backfilling. The progress field shows how far along it is. If progress stops advancing, check:

  1. Compute node resource usage (CPU, memory)
  2. Barrier latency (slow barriers slow everything)
  3. Whether the source table is receiving a burst of writes that compete with backfill

Issue 2: Unexpected State Growth

If storage grows faster than expected, identify which MVs are contributing:

SELECT
  r.name,
  r.relation_type,
  ts.total_key_count,
  ts.total_value_size
FROM rw_catalog.rw_table_stats ts
JOIN rw_catalog.rw_relations r ON ts.id = r.id
WHERE ts.total_value_size > 0
ORDER BY ts.total_value_size DESC
LIMIT 10;

Common causes of unexpected state growth:

  • High-cardinality GROUP BY - If you group by a column with millions of unique values (like user_id), the MV state will grow with every new user.
  • Unbounded joins - A streaming join between two tables without time-based bounds keeps all historical data in state.
  • Missing time filters - If your MV processes all historical data without a WHERE event_time > ... predicate, state accumulates indefinitely.
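A quick synthetic experiment shows the difference between bounded and unbounded GROUP BY cardinality: grouping by a fixed category set keeps the key count flat no matter how many events arrive, while grouping by a per-user identifier grows with every new user (and so does the MV's state):

```python
categories = ["electronics", "clothing", "home"]

by_category, by_user = set(), set()
for i in range(10_000):
    by_category.add(categories[i % len(categories)])  # bounded key space
    by_user.add(f"user-{i}")                          # unbounded key space

print(len(by_category), len(by_user))  # 3 10000
```

This is exactly what total_key_count in rw_table_stats surfaces over time: a flat count means bounded cardinality, a count that tracks input volume means unbounded state.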

Issue 3: Tuning Parallelism

If a specific MV is processing slowly, check its current parallelism:

SELECT id, name, parallelism, max_parallelism
FROM rw_catalog.rw_streaming_parallelism
WHERE name = 'mv_order_metrics';
 id  |       name       | parallelism | max_parallelism
-----+------------------+-------------+-----------------
 381 | mv_order_metrics | ADAPTIVE    |             256
(1 row)

You can manually set parallelism for a specific MV:

ALTER MATERIALIZED VIEW mv_order_metrics SET PARALLELISM = 4;

After the change, verify:

SELECT id, name, parallelism, max_parallelism
FROM rw_catalog.rw_streaming_parallelism
WHERE name = 'mv_order_metrics';
 id  |       name       | parallelism | max_parallelism
-----+------------------+-------------+-----------------
 381 | mv_order_metrics | FIXED(4)    |             256
(1 row)

The parallelism changed from ADAPTIVE to FIXED(4). To revert to automatic scaling:

ALTER MATERIALIZED VIEW mv_order_metrics SET PARALLELISM = ADAPTIVE;

Issue 4: Inspecting the Fragment Graph

Each MV is broken into fragments (units of parallel execution). Understanding the fragment graph helps diagnose which stage of a pipeline is the bottleneck:

SELECT
  f.fragment_id,
  f.distribution_type,
  f.parallelism,
  f.flags,
  f.upstream_fragment_ids
FROM rw_catalog.rw_fragment_infos f
WHERE f.table_id IN (
  SELECT id FROM rw_catalog.rw_materialized_views
  WHERE name = 'mv_order_metrics'
)
ORDER BY f.fragment_id;
 fragment_id | distribution_type | parallelism |                    flags                    | upstream_fragment_ids
-------------+-------------------+-------------+---------------------------------------------+-----------------------
         216 | HASH              |          10 | {MVIEW}                                     | {217}
         217 | HASH              |          10 | {STREAM_SCAN,SNAPSHOT_BACKFILL_STREAM_SCAN} | {212}
(2 rows)

Fragment 216 is the materialization fragment (marked with MVIEW flag) and receives data from fragment 217, which is the stream scan. Both run with parallelism 10. If one fragment's parallelism is much lower than another, it could be a bottleneck.

Building a Production Monitoring Dashboard

Here is a practical monitoring query you can run on a schedule or wire into a Grafana dashboard using the PostgreSQL data source:

-- Pipeline health overview
SELECT
  mv.name AS materialized_view,
  mv.status,
  sp.parallelism,
  ts.total_key_count AS state_keys,
  ts.total_value_size AS state_bytes,
  mv.created_at
FROM rw_catalog.rw_materialized_views mv
LEFT JOIN rw_catalog.rw_streaming_parallelism sp ON mv.id = sp.id
LEFT JOIN rw_catalog.rw_table_stats ts ON mv.id = ts.id
ORDER BY ts.total_value_size DESC NULLS LAST;

For alerting on barrier latency, run this query periodically and alert when avg_duration_sec exceeds your SLA threshold:

SELECT
  ROUND(AVG((info->'barrierComplete'->>'durationSec')::DECIMAL), 4) AS avg_barrier_sec,
  ROUND(MAX((info->'barrierComplete'->>'durationSec')::DECIMAL), 4) AS max_barrier_sec,
  COUNT(*) AS barrier_count
FROM rw_catalog.rw_event_logs
WHERE event_type = 'BARRIER_COMPLETE'
  AND timestamp > NOW() - INTERVAL '1 minute';

Key thresholds to monitor:

 Metric                | Healthy     | Warning                | Critical
-----------------------+-------------+------------------------+---------------------
 Avg barrier latency   | < 100ms     | 100ms - 1s             | > 1s
 Level-0 SSTable count | < 50        | 50 - 200               | > 200
 DDL progress stalled  | N/A         | No progress for 5m     | No progress for 30m
 Worker node state     | All RUNNING | Any restarted recently | Any not RUNNING
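As an illustration, the thresholds above can be folded into a single alerting rule. This Python sketch is hypothetical — the argument names are made up, and you would populate them from the SQL queries in this article via any PostgreSQL client (the DDL-progress check is omitted for brevity):

```python
def evaluate(avg_barrier_sec, level0_sstables, workers_running):
    """Map raw metrics to the healthy/warning/critical bands above."""
    if avg_barrier_sec > 1.0 or level0_sstables > 200 or not workers_running:
        return "critical"
    if avg_barrier_sec > 0.1 or level0_sstables > 50:
        return "warning"
    return "healthy"

# values from the sample outputs earlier in this article
print(evaluate(avg_barrier_sec=0.0278, level0_sstables=25,
               workers_running=True))  # healthy
```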

FAQ

What is barrier latency in RisingWave?

Barrier latency is the time it takes for a consistency checkpoint to flow through all operators in your streaming pipeline. RisingWave injects barriers at regular intervals (default: every 1 second). Each barrier ensures all operators have processed data up to that point. Lower barrier latency means fresher query results.

How do I find which materialized view is using the most storage?

Query the rw_table_stats system table joined with rw_relations to see storage consumption per object. Sort by total_value_size descending to find the largest state. Check internal tables (prefixed with __internal_) to identify which specific operator within a view is accumulating state.

Can I change the parallelism of a running materialized view?

Yes. Use ALTER MATERIALIZED VIEW <name> SET PARALLELISM = <number> to set a fixed parallelism, or SET PARALLELISM = ADAPTIVE to let RisingWave auto-scale. The change takes effect without recreating the view. This is useful when a specific view needs more processing capacity.

How do I tell if my streaming pipeline is falling behind?

Check the barrier latency in rw_event_logs. If the average durationSec for BARRIER_COMPLETE events is consistently above 1 second, your pipeline is struggling. Also check rw_ddl_progress for backfill operations that might be consuming resources, and rw_worker_nodes to confirm all compute nodes are healthy.

Conclusion

Monitoring streaming SQL pipelines requires a different mental model than monitoring batch jobs. Instead of checking "did it finish," you are continuously watching latency, state size, and cluster health.

Key takeaways:

  • Use rw_catalog system tables to inspect every aspect of your pipeline through standard SQL queries.
  • Read EXPLAIN plans before deploying MVs to understand their execution topology and state requirements.
  • Track barrier latency as your primary freshness metric, alerting when duration exceeds your SLA.
  • Monitor state growth via rw_table_stats to catch unbounded state before it becomes a storage problem.
  • Use ALTER PARALLELISM to scale individual MVs up or down without recreating them.

Ready to try this yourself? Get started with RisingWave in 5 minutes with the Quickstart.

Join our Slack community to ask questions and connect with other stream processing developers.
