Usage Guide

StreamForge is strongest when you want to move a selected subset of Kafka data into downstream systems that need a different shape, a lower-trust payload, or a different topic layout.

This guide focuses on the high-value usage patterns that match the current product position: selective replication, shaping, redaction, and routing. It does not treat StreamForge as a full active-active mirroring tool or a general stream processing engine.

Start with a Validated Example

If you want a working starting point instead of building from scratch:

QUICKSTART.md for the five-minute local demo
EXAMPLES.md for validated example packs
YAML_CONFIGURATION.md for the configuration structure
ADVANCED_DSL_GUIDE.md for filter and transform expressions

Core Usage Patterns

Filtered Replication to Analytics or Lake

Use StreamForge when an operational topic contains more fields than downstream analytics systems should receive.

Typical flow:

consume an application topic
filter to the event types you actually want downstream
project only the analytics-safe fields
publish to a dedicated analytics topic or cluster

Good fit:

CDC or event streams feeding a warehouse or lake
domain event cleanup before analytics ingestion
splitting operational and analytical contracts

See:

PII-Safe Replication Across Trust Boundaries

Use StreamForge when the destination system should receive business events but not raw identifiers or sensitive fields.

Typical flow:

keep only approved fields
hash, mask, or drop sensitive values
publish to a downstream topic with a tighter contract

Good fit:

staging or partner environments
lower-trust analytics consumers
internal topics that should not expose raw customer data

See:

Topic Fan-Out with Consumer-Specific Shapes

Use one input topic with multiple outputs when different consumers need different subsets or payload shapes.

Typical flow:

read one source topic
apply per-destination filters
publish consumer-specific payloads to separate topics

Good fit:

one source topic feeding operations, analytics, and audits
service-specific integration topics
separating high-volume raw streams from narrower downstream contracts

Derived Metrics with Aggregations

Use an aggregation: block inside a routing destination when the downstream system needs a compact metrics topic instead of every raw event.

Typical flow:

filter to the events that should count toward the rollup
reshape the value so aggregation reads a stable payload
emit tumbling-window metrics to a dedicated Kafka topic

Good fit:

per-region or per-tenant rollups for analytics
low-overhead operational metrics streams derived from business events
approximate distinct counts or quantile summaries without standing up a separate streaming stack

Boundaries:

aggregations run after the destination filter and value transform
aggregated outputs go to the destination output topic
windowing is processing-time and in-memory in v1
emit_interval_seconds controls how often StreamForge checks for completed windows
commit_strategy.manual_commit: true is not supported for aggregation destinations
this mode is for lightweight rollups, not joins, SQL, sliding windows, session windows, or durable state

See:

Cross-Cluster Replication with Shaping

Use StreamForge when you need to move data between clusters but do not want to mirror whole topics unchanged.

Good fit:

regional or environment replication with filtering
topic migrations where downstream contracts are changing
Redpanda or Kafka targets that only need a selected portion of the source stream

See:

Build a Pipeline

1. Choose the Source and Destinations

Define the input topic and decide whether you are publishing to:

one destination topic
multiple destination topics in the same cluster
a different Kafka-compatible target cluster

2. Add Selection Logic

Use filters when only part of the source stream should move downstream.

Common selectors:

event type
region or tenant
presence or value of a field
metadata in key or headers

3. Shape the Payload

Use transforms to:

keep only downstream-safe fields
rename or restructure fields
construct smaller consumer-specific payloads
hash or drop sensitive data

4. Validate Before Running

Prefer validated YAML configs over ad hoc inline examples.

cargo run --quiet --bin streamforge-validate -- path/to/config.yaml

5. Deploy in the Right Mode

Use the standalone binary when you want the lightest operational path. Use the operator and Helm chart when you want pipelines managed as Kubernetes resources.

See:

When StreamForge Is the Wrong Tool

Do not use StreamForge as your primary answer for:

MirrorMaker 2 active-active replication
consumer offset synchronization across clusters
general SQL stream processing
joins, session/sliding windows, and broad stateful event computation

That boundary is intentional. StreamForge is the selective replication and shaping layer. Heavier stateful analytics belongs in tools built for that purpose.

Usage Guide

Start with a Validated Example

Core Usage Patterns

Filtered Replication to Analytics or Lake

PII-Safe Replication Across Trust Boundaries

Topic Fan-Out with Consumer-Specific Shapes

Derived Metrics with Aggregations

Cross-Cluster Replication with Shaping

Build a Pipeline

1. Choose the Source and Destinations

2. Add Selection Logic

3. Shape the Payload

4. Validate Before Running

5. Deploy in the Right Mode

When StreamForge Is the Wrong Tool

Recommended Reading Order

Table of contents