CDC with RisingWave: Capture Database Changes in Real Time

Introduction

Your application writes to PostgreSQL, but your analytics team needs that data in a warehouse. Your search service needs to index every product update. Your cache layer needs to invalidate entries when the underlying data changes. All of these problems share one root cause: getting data out of your primary database and into other systems, reliably and in real time.

CDC with RisingWave solves this by capturing every insert, update, and delete from your PostgreSQL database and making those changes available as a continuous SQL stream. Instead of building custom polling scripts or scheduled ETL jobs, you define a CDC source in RisingWave and write SQL to transform, filter, and route the changes wherever they need to go.

This guide covers how to set up Change Data Capture (CDC) from PostgreSQL to RisingWave, build real-time transformations on top of the change stream, and sink the results to downstream systems. All examples target RisingWave v2.8.0.

What Is Change Data Capture?

Change Data Capture (CDC) is a technique that identifies and captures changes made to data in a database, then delivers those changes to downstream systems. Rather than querying the full table periodically, CDC reads the database's transaction log (WAL in PostgreSQL) to detect individual row-level changes as they happen.

CDC captures three types of events:

  • INSERT: A new row was added
  • UPDATE: An existing row was modified
  • DELETE: A row was removed

The main advantage over batch extraction is latency. Batch ETL might poll the database every 15 minutes, missing intermediate states and creating lag. CDC captures every change within seconds, preserving the full history of modifications.

How RisingWave Handles CDC

RisingWave supports two approaches for CDC ingestion:

  1. Direct CDC source: RisingWave connects directly to your PostgreSQL database's replication slot. No middleware required. This is the simplest setup for PostgreSQL.

  2. Kafka-based CDC: Use Debezium to capture changes from your database into Kafka, then create a Kafka source in RisingWave with the Debezium format. This approach works with any database that Debezium supports (MySQL, SQL Server, MongoDB, etc.).
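For the Kafka-based approach, ingestion looks roughly like the sketch below. The topic name, broker address, and columns are placeholders for your own setup; the topic follows Debezium's usual `server.database.table` naming convention:

```sql
-- Hypothetical example: ingest Debezium-formatted changes from Kafka.
-- Topic, broker, and column names are illustrative.
CREATE TABLE orders_from_kafka (
    order_id INT PRIMARY KEY,
    customer_id INT,
    amount DECIMAL,
    order_status VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'dbserver1.inventory.orders',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT DEBEZIUM ENCODE JSON;
```

The `FORMAT DEBEZIUM` clause tells RisingWave to interpret each Kafka message as a Debezium change event, so upstream updates and deletes are applied to the table rather than appended as new rows.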

For more about supported CDC sources, see the RisingWave CDC documentation.

Setting Up PostgreSQL CDC with RisingWave

Prerequisites

Before creating a CDC source, your PostgreSQL database needs the following configuration:

  1. WAL level set to logical: In postgresql.conf, set wal_level = logical. This enables the database to output logical replication events.

  2. Replication role: The database user RisingWave connects with must have the REPLICATION privilege.

  3. Publication: Create a publication for the tables you want to capture:

-- Run this in your source PostgreSQL database (not in RisingWave)
ALTER SYSTEM SET wal_level = logical;
-- Restart PostgreSQL after this change
ALTER USER replication_user WITH REPLICATION;
CREATE PUBLICATION my_publication FOR ALL TABLES;
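You can confirm all three prerequisites from a psql session on the source database (`replication_user` and `my_publication` here match the names used in this guide):

```sql
-- Run on the source PostgreSQL to verify the setup
SHOW wal_level;                      -- should return 'logical' after the restart
SELECT rolname, rolreplication
FROM pg_roles
WHERE rolname = 'replication_user';  -- rolreplication should be 't'
SELECT pubname FROM pg_publication;  -- should list 'my_publication'
```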

Creating a CDC Source in RisingWave

Use the CREATE SOURCE command with the PostgreSQL CDC connector:

-- This creates a CDC source pointing to an external PostgreSQL
-- Note: This requires a running PostgreSQL with logical replication enabled
-- For this tutorial, we demonstrate the syntax and then show
-- transformation patterns using regular tables

CREATE SOURCE pg_cdc_source WITH (
    connector = 'postgres-cdc',
    hostname = 'postgres-host',
    port = '5432',
    username = 'replication_user',
    password = 'secure_password',
    database.name = 'production_db',
    slot.name = 'risingwave_slot',
    publication.name = 'my_publication'
);

Once the source is created, create tables that map to your PostgreSQL tables:

-- Map to the upstream 'users' table
CREATE TABLE cdc_users (
    user_id INT PRIMARY KEY,
    email VARCHAR,
    plan VARCHAR,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
) FROM pg_cdc_source TABLE 'public.users';

RisingWave will start capturing all changes from the users table: inserts, updates, and deletes are all reflected in cdc_users in real time.
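To verify that capture is actually running, check on the source side that the replication slot exists and is being consumed. `pg_replication_slots` is a standard PostgreSQL catalog view; the slot name matches the one declared in the source definition above:

```sql
-- Run on the source PostgreSQL
SELECT slot_name, active, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'risingwave_slot';
```

An `active` value of `t` and an advancing `confirmed_flush_lsn` indicate that RisingWave is consuming the stream.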

Building Transformations on CDC Data

The real power of CDC with RisingWave is what you do after capturing the changes. Let's walk through practical transformation patterns using tables that simulate CDC data.

Setting Up Example Data

Create tables that represent data captured from an upstream database:

CREATE TABLE users (
    user_id INT,
    email VARCHAR,
    plan VARCHAR,
    signup_date DATE
);
CREATE TABLE user_events (
    event_id INT,
    user_id INT,
    event_type VARCHAR,
    page VARCHAR,
    event_time TIMESTAMP
);
INSERT INTO users VALUES
    (1, 'alice@example.com', 'pro', '2026-01-15'),
    (2, 'bob@example.com', 'free', '2026-02-01'),
    (3, 'charlie@example.com', 'enterprise', '2026-02-20'),
    (4, 'diana@example.com', 'pro', '2026-03-01'),
    (5, 'eve@example.com', 'free', '2026-03-10');
INSERT INTO user_events VALUES
    (1, 1, 'page_view', '/dashboard', '2026-03-28 08:00:00'),
    (2, 1, 'click', '/settings', '2026-03-28 08:05:00'),
    (3, 2, 'page_view', '/pricing', '2026-03-28 08:10:00'),
    (4, 3, 'page_view', '/dashboard', '2026-03-28 08:15:00'),
    (5, 3, 'click', '/api-docs', '2026-03-28 08:20:00'),
    (6, 4, 'page_view', '/dashboard', '2026-03-28 08:25:00'),
    (7, 2, 'click', '/pricing', '2026-03-28 08:30:00'),
    (8, 5, 'page_view', '/blog', '2026-03-28 08:35:00'),
    (9, 1, 'page_view', '/billing', '2026-03-28 08:40:00'),
    (10, 4, 'click', '/integrations', '2026-03-28 08:45:00');

Pattern 1: Enriched Change Stream

Join CDC data with reference tables to add context. This is useful for feeding enriched events to downstream systems:

CREATE MATERIALIZED VIEW mv_enriched_events AS
SELECT
    e.event_id,
    e.event_type,
    e.page,
    e.event_time,
    u.email,
    u.plan
FROM user_events e
JOIN users u ON e.user_id = u.user_id;
SELECT event_id, event_type, page, email, plan FROM mv_enriched_events ORDER BY event_id LIMIT 5;

Expected output:

 event_id | event_type |    page    |        email        |    plan
----------+------------+------------+---------------------+------------
        1 | page_view  | /dashboard | alice@example.com   | pro
        2 | click      | /settings  | alice@example.com   | pro
        3 | page_view  | /pricing   | bob@example.com     | free
        4 | page_view  | /dashboard | charlie@example.com | enterprise
        5 | click      | /api-docs  | charlie@example.com | enterprise

When the upstream users table gets an update (say, Bob upgrades from "free" to "pro"), the materialized view automatically reflects the change for all of Bob's events.
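You can simulate that upstream change directly against the demo tables and watch the view update (the final statement reverts the change so later examples match their expected output):

```sql
-- Simulate the upstream change: Bob upgrades to 'pro'
UPDATE users SET plan = 'pro' WHERE user_id = 2;
-- Once the change propagates, Bob's events carry the new plan
SELECT event_id, email, plan
FROM mv_enriched_events
WHERE email = 'bob@example.com'
ORDER BY event_id;
-- Revert so the remaining examples stay consistent
UPDATE users SET plan = 'free' WHERE user_id = 2;
```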

Pattern 2: Real-Time Activity Aggregation

Track user engagement metrics that update as new events arrive:

CREATE MATERIALIZED VIEW mv_user_activity AS
SELECT
    u.user_id,
    u.email,
    u.plan,
    COUNT(*) AS total_events,
    COUNT(*) FILTER (WHERE e.event_type = 'click') AS total_clicks,
    MAX(e.event_time) AS last_active
FROM users u
JOIN user_events e ON u.user_id = e.user_id
GROUP BY u.user_id, u.email, u.plan;
SELECT user_id, email, plan, total_events, total_clicks FROM mv_user_activity ORDER BY total_events DESC;

Expected output:

 user_id |        email        |    plan     | total_events | total_clicks
---------+---------------------+-------------+--------------+--------------
       1 | alice@example.com   | pro         |            3 |            1
       3 | charlie@example.com | enterprise  |            2 |            1
       4 | diana@example.com   | pro         |            2 |            1
       2 | bob@example.com     | free        |            2 |            1
       5 | eve@example.com     | free        |            1 |            0

Pattern 3: Filtered Replication

Not every downstream system needs every change. Create materialized views that filter for specific conditions, then sink only the relevant data:

CREATE MATERIALIZED VIEW mv_enterprise_activity AS
SELECT
    u.user_id,
    u.email,
    e.event_type,
    e.page,
    e.event_time
FROM user_events e
JOIN users u ON e.user_id = u.user_id
WHERE u.plan = 'enterprise';
SELECT * FROM mv_enterprise_activity ORDER BY event_time;

Expected output:

 user_id |        email        | event_type |    page    |     event_time
---------+---------------------+------------+------------+---------------------
       3 | charlie@example.com | page_view  | /dashboard | 2026-03-28 08:15:00
       3 | charlie@example.com | click      | /api-docs  | 2026-03-28 08:20:00

This view only includes activity from enterprise users. You could sink this filtered stream to a CRM system or a dedicated analytics table without cluttering it with data from free-tier users.
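Deletes propagate through the same pipeline. As a quick check on the demo tables, removing one of Charlie's events shrinks the filtered view once the change is applied (the final statement restores the row):

```sql
-- Delete one enterprise event upstream
DELETE FROM user_events WHERE event_id = 5;
-- The filtered view should now show a single row
SELECT * FROM mv_enterprise_activity ORDER BY event_time;
-- Restore the demo data
INSERT INTO user_events VALUES
    (5, 3, 'click', '/api-docs', '2026-03-28 08:20:00');
```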

Sinking CDC Results Downstream

After transforming your CDC data, sink the results to the systems that need them. RisingWave supports a variety of sink connectors:

Sink to PostgreSQL (for a read replica pattern):

CREATE SINK sink_user_activity FROM mv_user_activity
WITH (
    connector = 'jdbc',
    jdbc.url = 'jdbc:postgresql://analytics-db:5432/analytics',
    table.name = 'user_activity_realtime',
    type = 'upsert',
    primary_key = 'user_id'
);

Sink to Kafka (for event distribution):

CREATE SINK sink_enriched_events FROM mv_enriched_events
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'kafka:9092',
    topic = 'enriched-user-events',
    type = 'append-only'
);

Sink to Apache Iceberg (for the data lake):

CREATE SINK sink_activity_iceberg FROM mv_user_activity
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'user_id',
    catalog.type = 'rest',
    catalog.uri = 'http://iceberg-catalog:8181',
    database.name = 'user_analytics',
    table.name = 'user_activity'
);

For the full list of supported sink connectors, see the RisingWave sink documentation.

Monitoring Your CDC Pipeline

Once your CDC pipeline is running, monitor it to catch issues early:

-- Check source throughput
SELECT * FROM rw_catalog.rw_sources;
-- List all materialized views and their definitions
SELECT name, definition FROM rw_catalog.rw_materialized_views;

Key things to watch:

  • Replication lag: If RisingWave falls behind the source database, the replication slot in PostgreSQL will accumulate WAL. Monitor WAL size on the source to prevent disk issues.
  • Materialized view freshness: Query your materialized views and compare timestamps with the source to verify latency meets your requirements.
  • Sink errors: Check RisingWave logs for sink connector errors, especially after schema changes in the downstream system.
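The replication-lag point above can be quantified on the source database. `pg_wal_lsn_diff`, `pg_current_wal_lsn`, and `pg_size_pretty` are standard PostgreSQL functions:

```sql
-- Run on the source PostgreSQL: WAL retained per replication slot
SELECT slot_name, active,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```

A steadily growing `retained_wal` for the RisingWave slot is the early warning sign that the consumer has fallen behind or stopped.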

FAQ

What is CDC with RisingWave?

CDC with RisingWave captures row-level changes (inserts, updates, deletes) from databases like PostgreSQL and makes them available as a continuous stream. You can then transform these changes with SQL using materialized views and route the results to downstream systems through sink connectors.

Does RisingWave require Debezium for CDC?

No, not for PostgreSQL. RisingWave has a native PostgreSQL CDC connector that reads directly from the database's logical replication stream without any middleware. For other databases (MySQL, SQL Server, MongoDB), you can use Debezium to publish changes to Kafka, then ingest from Kafka into RisingWave.

How does RisingWave handle schema changes in the source database?

Schema changes like adding columns require recreating the CDC source and associated tables in RisingWave. Dropping a column in the source that the RisingWave table references will cause the CDC pipeline to fail. Plan schema migrations carefully and update RisingWave objects to match.
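As a sketch of such a migration, assuming the upstream `users` table gained a hypothetical `country` column, the RisingWave side would be recreated to match. Note that materialized views built on the table must be dropped (or cascaded) first:

```sql
-- Hypothetical: upstream added a 'country' column to public.users
DROP TABLE cdc_users;  -- use CASCADE if materialized views depend on it
CREATE TABLE cdc_users (
    user_id INT PRIMARY KEY,
    email VARCHAR,
    plan VARCHAR,
    country VARCHAR,   -- newly added upstream column
    created_at TIMESTAMP,
    updated_at TIMESTAMP
) FROM pg_cdc_source TABLE 'public.users';
```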

What happens if the connection to the source database is lost?

RisingWave automatically reconnects and resumes from the last committed position in the replication stream. The PostgreSQL replication slot retains WAL entries until RisingWave confirms they have been consumed, so no data is lost during temporary disconnections.

Conclusion

CDC with RisingWave provides a straightforward path from database changes to real-time downstream systems, using standard SQL instead of custom application code. Key takeaways:

  • RisingWave's native PostgreSQL CDC connector requires no middleware, connecting directly to the logical replication stream
  • Materialized views let you join, filter, and aggregate CDC data in real time before routing it downstream
  • Sink connectors deliver transformed data to PostgreSQL, Kafka, Iceberg, and other systems
  • Filtered replication reduces noise by sinking only the data each downstream system needs
  • The pipeline is fully defined in SQL, making it easy to version, review, and modify

For a step-by-step setup guide with PostgreSQL CDC, see the RisingWave PostgreSQL CDC quickstart.


Ready to try this yourself? Get started with RisingWave in five minutes with the quickstart guide.

Join our Slack community to ask questions and connect with other stream processing developers.
