The Live Index: Why Vector Search Should Be a Streaming Problem

There is a quiet assumption baked into almost every vector database in production today: that the corpus you search is, in some practical sense, frozen.

You batch up your documents. You run them through an embedding model. You bulk insert the resulting vectors into Pinecone, Weaviate, Milvus, pgvector, or one of a dozen others. You build an HNSW or IVF index over that snapshot. Then you point your retrieval layer at it and serve queries. When new data shows up, you queue it for the next ingestion run. The index, meanwhile, keeps answering questions based on the world as it existed an hour ago, or a day ago, or whenever the last cron job finished.

For static corpora, this is fine. Most of Wikipedia does not change in the next ten minutes. But the workloads people actually want to build now are not static. They are product catalogs that change with every price update. They are conversation histories that grow with every user message. They are fraud signals where the embedding of a session is only useful if you can query it within seconds of the session happening. The data is alive. The index is not.

We think this gap, between the data that exists and the data you can search, is the most underappreciated problem in vector retrieval. And we think it is fundamentally a streaming problem.

How traditional vector databases got here

The conventional architecture is straightforward to draw and harder to operate. A source system, usually a transactional database or an event bus, holds the canonical data. A pipeline, usually a batch job or a workflow orchestrator, reads from that source, runs each row through an embedding model, and writes the resulting vectors to a separate vector store. The vector store builds an approximate nearest neighbor index, typically HNSW for low-latency reads or IVF-PQ for memory-constrained deployments. Application code then queries the vector store at serve time.

There are three failure modes baked into this shape.

The first is the lag. There is always a window between a row being written to the source and the corresponding vector being queryable. For batch ingestion, this window is hours. For micro-batch, it is minutes. Even with the most aggressive pipelines, it is rarely less than tens of seconds, because building an HNSW index incrementally is hard, and most systems amortize index updates over batches.
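The size of that window is easy to estimate. The sketch below is a back-of-the-envelope model, not a measurement: it assumes each ingestion run snapshots the source when it starts, runs start on a fixed interval, and the new vectors become queryable when the run finishes. The function name and the sample numbers are illustrative.

```python
def staleness_bounds(batch_interval_s, run_time_s):
    """Return (best, average, worst) seconds between a source write and queryability.

    Assumes each run snapshots the source when it starts, runs start every
    batch_interval_s seconds, and results become queryable run_time_s later.
    """
    best = run_time_s                            # row written just before a run starts
    worst = batch_interval_s + run_time_s        # row written just after a run starts
    average = batch_interval_s / 2 + run_time_s  # writes spread uniformly over time
    return best, average, worst


# Illustrative numbers only: an hourly job that needs 10 minutes to embed and index.
print(staleness_bounds(3600, 600))
```

Under these assumptions, shrinking the interval helps only until the run time dominates, which is why micro-batching stalls at tens of seconds.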

The second is the consistency problem. The source has the authoritative row. The vector store has a derived representation of that row. When the row is deleted or updated, the vector becomes stale, but the vector store does not know that until the pipeline tells it. If the pipeline misses an event, you have a phantom vector that keeps showing up in query results forever, or until someone notices.
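A toy reproduction makes the phantom concrete. This is a minimal sketch in which two in-memory dictionaries stand in for the source database and the vector store; the names `source_rows`, `vector_store`, and `search` are hypothetical and do not correspond to any real client API.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, for non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical stand-ins: the source of truth and the derived vector store.
source_rows = {1: [1.0, 0.0], 2: [0.0, 1.0]}
vector_store = dict(source_rows)  # state after the last successful sync

# Row 2 is deleted at the source, but the delete event never reaches the pipeline.
del source_rows[2]


def search(store, query, k):
    """Brute-force nearest neighbors over the derived store."""
    ranked = sorted(store, key=lambda doc_id: cosine_distance(store[doc_id], query))
    return ranked[:k]


hits = search(vector_store, [0.1, 0.9], k=1)
phantoms = [doc_id for doc_id in hits if doc_id not in source_rows]
print(hits, phantoms)  # the top hit is a row that no longer exists at the source
```

Nothing in the vector store signals that the hit is stale. Detecting it requires a second lookup against the source, which is exactly the round trip the pipeline was supposed to make unnecessary.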

The third is operational. You are now running two stateful systems, with two replication topologies, two backup strategies, and two scaling stories. The pipeline between them either has to be exactly-once or you have to accept duplicates. This is real engineering work, and it shows up in postmortems.

The streaming community has spent a decade solving exactly these problems for tabular data. Change data capture, incremental view maintenance, exactly-once sinks. The interesting question is whether the same machinery applies to vectors.

What changes when the index is streaming

A streaming database treats data as a continuous flow of changes rather than a static table. When you write a materialized view in RisingWave, you are not asking for a snapshot; you are asking the system to keep that view correct as the underlying data evolves. New rows update the view. Deleted rows update the view. Joins, aggregations, and window functions all participate in this incremental maintenance.

If you can put a vector column into that flow, and if you can put a vector index on the output, then the gap between data and queryable index collapses to whatever the streaming engine's processing latency is. For most workloads, that is well under a second.

This is what RisingWave's native vector support is for. We added a vector(n) type to the system, similarity operators that work in both ad-hoc queries and continuous queries, and HNSW indexes that are built and maintained as part of the streaming dataflow. The result is that an embedding written to a source table at time T is searchable through an HNSW index at time T plus the streaming latency, with no separate pipeline, no separate index build, no separate consistency model.

What this looks like in SQL

The vector type is a first-class column type. Dimensions are declared up front and enforced on write:

CREATE TABLE documents (
  doc_id BIGINT PRIMARY KEY,
  title TEXT,
  body TEXT,
  embedding vector(384),
  updated_at TIMESTAMPTZ
);

If you try to insert a vector of the wrong length, the system rejects it at write time. This sounds obvious until you have spent an afternoon debugging an inference service that started returning 768 dimensions instead of 384 because someone changed the model.
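It is still worth failing early on the client, before the write reaches the database. The helper below is a minimal sketch: `to_vector_literal` is a hypothetical function name, and the bracketed text format is an assumption based on the `'[...]'` literals used in this post and on the pgvector convention.

```python
EXPECTED_DIM = 384  # must match the vector(384) column declaration


def to_vector_literal(embedding, expected_dim=EXPECTED_DIM):
    """Validate the length, then format as a bracketed literal such as '[0.1,0.2]'."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, expected {expected_dim}"
        )
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# A model swap that doubles the output size is caught here, not in the query path.
try:
    to_vector_literal([0.0] * 768)
except ValueError as err:
    print(err)
```

The database-side check remains the real guarantee. The client-side check just moves the error closer to the inference service that caused it.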

The similarity operators follow the same conventions as pgvector, which keeps the porting cost low:

Distance type	Operator	Notes
L2 (Euclidean)	`<->`	Default for most embedding models
Cosine	`<=>`	Common for normalized text embeddings
Inner product	`<#>`	Returns the negative inner product
L1 (Manhattan)	`<+>`	Less common, but supported
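For reference, here is what each operator computes, written out in plain Python. The formulas follow the pgvector definitions that the operators are said to mirror; the function names are illustrative and not part of any API.

```python
import math


def l2_distance(a, b):
    """<-> : Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """<=> : 1 - cosine similarity, so 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def negative_inner_product(a, b):
    """<#> : negated dot product, so that smaller still means closer."""
    return -sum(x * y for x, y in zip(a, b))


def l1_distance(a, b):
    """<+> : Manhattan distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = [1.0, 2.0], [3.0, 4.0]
print(l2_distance(a, b), cosine_distance(a, b))
print(negative_inner_product(a, b), l1_distance(a, b))
```

The sign flip on the inner product is the detail that most often trips people up: sorting ascending by `<#>` returns the rows with the largest inner product first.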

A nearest neighbor query is what you would expect:

SELECT doc_id, title, embedding <-> '[...]'::vector(384) AS dist
FROM documents
ORDER BY embedding <-> '[...]'::vector(384)
LIMIT 10;
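Semantically, ORDER BY distance with a LIMIT asks for the exact k nearest rows, and an approximate index trades some of that exactness for speed. A brute-force reference is handy as ground truth when testing. The sketch below assumes rows arrive as `(doc_id, title, embedding)` tuples; `exact_top_k` is a hypothetical helper, not a database function.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(rows, query, k):
    """Return the k nearest rows as (doc_id, title, dist), closest first.

    rows is an iterable of (doc_id, title, embedding) tuples.
    """
    scored = ((l2_distance(emb, query), doc_id, title) for doc_id, title, emb in rows)
    return [(doc_id, title, dist) for dist, doc_id, title in heapq.nsmallest(k, scored)]


rows = [
    (1, "alpha", [0.0, 0.0]),
    (2, "beta", [3.0, 4.0]),
    (3, "gamma", [1.0, 0.0]),
]
print(exact_top_k(rows, [0.0, 0.0], k=2))
```

This scans every row, so it is only practical on small samples, but that is enough to check an index's answers against.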

The interesting part is the index. Today, RisingWave supports HNSW indexes on append-only inputs, which means append-only tables or append-only materialized views. The syntax is recognizable to anyone who has used pgvector, with a few RisingWave-specific knobs:

CREATE TABLE doc_events (
  doc_id BIGINT,
  title TEXT,
  embedding vector(384),
  created_at TIMESTAMPTZ
) APPEND ONLY;

CREATE INDEX idx_doc_hnsw ON doc_events
USING HNSW (embedding)
INCLUDE (title, created_at)
WITH (
  distance_type = 'cosine',
  m = 32,
  ef_construction = 40
);

INCLUDE is worth pausing on. In a serving system, you rarely want just the vector key. You want the title, the URL, the tenant ID, anything you need to render the result. Including those columns in the index avoids a round trip back to the base table during retrieval, which helps keep vector search latency in the single-digit milliseconds rather than tens.

The query-time search width, ef_search, is set per session via batch_hnsw_ef_search. It is the standard HNSW knob: a larger value explores more candidates, which raises recall at the cost of query latency.
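To tune that knob you need a number for recall. A common definition is recall@k: the fraction of the true k nearest neighbors that the approximate search returned. The sketch below assumes you already have both ID lists for the same query; `recall_at_k` is an illustrative helper, not a built-in.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k IDs that also appear in the approximate result."""
    exact = set(exact_ids)
    if not exact:
        return 1.0  # nothing to find, so nothing was missed
    return len(exact & set(approx_ids)) / len(exact)


# Illustrative IDs: the index found three of the four true neighbors.
print(recall_at_k(approx_ids=[1, 2, 3, 9], exact_ids=[1, 2, 3, 4]))
```

Averaging this over a sample of real queries at a few ef_search settings gives the recall and latency curve for your own data, which is more useful than any default.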

A live retrieval pipeline in one file

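Here is the shape of a real pipeline. We have a stream of document updates flowing in from upstream, and we want a continuously maintained, queryable HNSW index over their embeddings. Because the index currently requires append-only input, each update is indexed as a new row, and choosing the latest version of a document is left to the query or the application.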

-- Source: document events from your application
CREATE TABLE doc_events (
  doc_id BIGINT,
  title TEXT,
  body TEXT,
  embedding vector(384),
  event_time TIMESTAMPTZ
) APPEND ONLY;

-- HNSW index, maintained as events arrive
CREATE INDEX idx_docs_live ON doc_events
USING HNSW (embedding)
INCLUDE (doc_id, title)
WITH (distance_type = 'cosine', m = 32, ef_construction = 40);

Application code, written against the same database that owns the source table, queries the index directly:

SELECT doc_id, title,
       embedding <=> $1::vector(384) AS dist
FROM doc_events
ORDER BY embedding <=> $1::vector(384)
LIMIT 20;

When an upstream event arrives, the embedding is written to the source table. The HNSW index incorporates the new vector as part of the same streaming dataflow that maintains every other materialized view in the system, so a query issued once that dataflow has caught up sees it.
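One consequence of indexing an append-only table is that a document updated three times contributes three rows, and possibly three hits. A simple way to handle that in application code is to over-fetch and collapse by doc_id. The sketch below assumes rows come back as `(doc_id, title, dist)` tuples sorted by ascending distance, as in the query above; `collapse_by_doc_id` is a hypothetical helper.

```python
def collapse_by_doc_id(rows, k):
    """Keep the closest hit per doc_id and return at most k results.

    rows must already be sorted by ascending distance.
    """
    seen = set()
    results = []
    for doc_id, title, dist in rows:
        if doc_id in seen:
            continue  # a closer row for this document was already kept
        seen.add(doc_id)
        results.append((doc_id, title, dist))
        if len(results) == k:
            break
    return results


# Illustrative result set: document 7 appears twice because it was updated once.
hits = [(7, "Pricing v2", 0.05), (7, "Pricing v1", 0.07), (3, "FAQ", 0.20), (9, "Terms", 0.31)]
print(collapse_by_doc_id(hits, k=2))
```

Note that this keeps the closest row per document, which is not necessarily the newest one. Preferring the latest version would require the event time in the result set as well.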

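You can compose vector search with normal SQL too. A materialized view can pre-filter by tenant, freshness window, or any other predicate, and the index can sit on the filtered output as long as that output is append-only. A sliding time window like the one below retracts rows as they age out, so under the current restriction it works as a filtered view to query rather than as an index target: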

CREATE MATERIALIZED VIEW recent_docs AS
SELECT *
FROM doc_events
WHERE event_time > NOW() - INTERVAL '7 days';

Vector retrieval is now also a SQL operation, which means you get joins, window functions, watermarks, and the rest of the streaming engine's toolbox in the same query plane.

The architectural shift

The standard architecture is two systems with a pipeline between them. The streaming architecture is one system with a materialized view inside it.

Traditional:
  OLTP DB  -->  ETL  -->  Embedding service  -->  Vector DB
                                                       |
                                                       v
                                                  Application

RisingWave:
  Source  -->  Streaming DB (vector type + HNSW + MVs)  -->  Application

The thing you lose by collapsing the boxes is the freedom to pick a vector database that is purpose-built for vectors and only vectors. Specialized systems often have better recall at the high end, more sophisticated quantization options, and tighter latency under extreme query load. If you are running a billion-vector deduplication index and your only job is retrieval, a dedicated vector database is probably still the right answer.

The thing you gain is freshness, consistency, and operational simplicity. The same database that ingests your events maintains the index over those events. There is no second system to keep in sync. Deletes and updates to tables and materialized views flow through the same exactly-once machinery as every other write, although the HNSW index itself currently accepts only append-only input. The retrieval query is a SQL query, which means your existing observability, access control, and connection pooling all just work.

When to choose which

We try to be honest about this. Native vector search in a streaming database is not a universal replacement for dedicated vector databases. It is a different shape with different tradeoffs.

Reach for a dedicated vector database when:

The corpus is essentially static and the read path is the only thing that matters.
You need billion-scale vectors with aggressive quantization (PQ, OPQ, scalar quantization at extreme ratios).
You have specialized recall requirements that need a system tuned end to end for ANN.

Reach for a streaming database with native vector support when:

Freshness matters. The data feeding the embeddings is changing continuously, and you need search results to reflect that within seconds.
You are already maintaining materialized views over the same source data and want to keep the embeddings in the same place.
Operational simplicity matters more than the last 1 percent of recall.
Retrieval is one step in a larger pipeline that also involves filters, joins, aggregations, or watermarks.

The agentic and AI-native workloads we have been talking to teams about almost all fall into the second bucket. The embeddings represent things that just happened: a message in a conversation, an event in a session, a row in a constantly updating product table. The half-life of relevance is short. A search system that sees the world as it was an hour ago is not useful.

What live indexes make possible

Static indexes shaped what we built on top of them. Retrieval-augmented generation, recommendation systems, semantic deduplication: these all assume that the corpus, once indexed, is the corpus. When you remove that assumption, some new things become straightforward.

Real-time personalization where the user's last interaction is searchable on the next query. Anomaly detection where a session embedding is checked against the last hour of sessions, not yesterday's. Multi-agent systems where each agent's memory is a continuously updated, queryable embedding stream. Tool selection where the set of available tools changes per request and the index has to keep up.

None of these are impossible with a static vector database. They just require a pipeline that the application team has to build, operate, and debug forever. We think that pipeline belongs inside the database, where the rest of the data already lives. A vector is just another column. An index is just another materialized view. Similarity is just another operator. Treat it that way, and a lot of the architecture gets simpler.

The data is live. The index should be too.