StreamForge Recording Packages
Use these packages to record the seven campaign videos with human narration. Each package includes a recording objective, screen plan, terminal commands, narration script, expected proof points, YouTube metadata, and publishing notes.
Human audio is recommended for every video. Read the narration naturally rather than word for word, and adjust the pacing if the terminal takes longer than expected. Keep the recording practical: show commands, configs, output topics, and verification.
Repo link to use before a YouTube URL exists: https://github.com/rahulbsw/streamforge
Shared Recording Setup
Prepare once before recording:
cd /Users/rajain5/dev/tools/cisco-git/wap-mirrormaker-rust
git status --short
docker version
cargo --version
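The preflight commands above can be wrapped in a small check so that a missing tool is caught before a take starts. This is a minimal sketch: the function name require_cmds is hypothetical and is not part of the StreamForge repo.

```shell
# Hypothetical helper (not part of the StreamForge repo): report every
# required tool that is missing from PATH and return non-zero if any is.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example for the tools used in the shared setup:
#   require_cmds git docker cargo || echo "install the missing tools first"
```

Running the check at the start of each recording session keeps tool errors out of the final video.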
Terminal setup:
- Use a large readable font.
- Use two terminal panes for local demos: left for StreamForge, right for Kafka/Redpanda commands.
- Clear terminal scrollback before each take.
- Keep secrets and AWS account identifiers out of frame.
- Start every demo with the repo root visible so viewers trust the commands are reproducible.
- Record full-screen app windows only.
- Do not record the desktop, laptop wallpaper, personal dock, notifications, or unrelated windows.
- Use one full-screen terminal scene and one full-screen browser scene instead of dragging windows around on the desktop.
- Disable notifications and hide the menu bar/dock if your recorder does not crop them.
- When switching apps, pause recording or use a clean scene transition so personal windows are never visible.
Recommended capture setup:
- Use OBS, Screen Studio, or another recorder that supports window capture.
- Capture the terminal window directly for command-heavy sections.
- Capture the browser window directly for UI sections.
- Use full-screen terminal tabs instead of desktop-level screen capture.
- Keep a separate private desktop space for notes, AWS console, chat, and credentials.
Human narration tone:
- Plain technical explanation.
- Avoid hype-only language.
- Say “selective replication” and “data shaping” consistently.
- Say “Kafka-compatible broker” when using Redpanda.
- Say “production-style” for AWS unless every production hardening step is actually included.
Package 1: Local Redpanda Quickstart
Objective: Show the shortest reproducible StreamForge path: one source topic replicated into analytics-safe and PII-safe destination topics.
Target Viewer: Data engineers, platform engineers, and developers evaluating the project for the first time.
Screen Plan:
- Show README.md tagline.
- Show examples/redpanda/selective-replication.yaml.
- Terminal pane 1: start Redpanda and run StreamForge.
- Terminal pane 2: create topics, produce one event, consume both outputs.
- Optional browser shot: GitHub README demo section after recording.
Terminal Commands:
Pre-build before recording so compile output does not dominate the video:
cargo build --release --bin streamforge
Main recording commands:
docker compose -f examples/redpanda/docker-compose.yml up -d
docker compose -f examples/redpanda/docker-compose.yml ps
cargo run --quiet --bin streamforge-validate -- examples/redpanda/selective-replication.yaml
CONFIG_FILE=examples/redpanda/selective-replication.yaml ./target/release/streamforge
In a second terminal:
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic delete raw-orders analytics-orders pii-safe-orders || true
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic create raw-orders analytics-orders pii-safe-orders
printf '%s\n' \
'{"order_id":"ord-1001","customer":{"id":"cust-42","email":"alice@example.com"},"amount":125,"region":"us","created_at":"2026-05-12T15:04:05Z"}' \
| docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic produce raw-orders
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume analytics-orders -n 1 --offset start
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume pii-safe-orders -n 1 --offset start
curl http://localhost:8080/health
curl http://localhost:8080/metrics | rg "streamforge_messages_(consumed|produced)|streamforge_consumer_lag"
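To read a single metric value instead of scanning the filtered output by eye, a small filter over the Prometheus text format may help. This is a sketch under assumptions: the helper name metric_value is hypothetical, and the produced-counter name and its destination label in the sample are guesses for illustration, not confirmed StreamForge output.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# the value of every sample whose name is exactly $1, with or without labels.
# Assumes samples carry no trailing timestamp, so the value is the last field.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }   # skip HELP and TYPE comment lines
    $1 == name || index($1, name "{") == 1 { print $NF }
  '
}

# Sample input shaped like the dry-run output. The produced counter name and
# its label are assumptions for illustration only.
sample='# TYPE streamforge_messages_consumed_total counter
streamforge_messages_consumed_total 1
streamforge_messages_produced_total{destination="analytics-orders"} 1
streamforge_consumer_lag 0'

printf '%s\n' "$sample" | metric_value streamforge_consumer_lag
```

During a recording the same function could read live metrics, for example `curl -s http://localhost:8080/metrics | metric_value streamforge_consumer_lag`.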
Cleanup:
docker compose -f examples/redpanda/docker-compose.yml down
Expected Proof Points:
- Config validation succeeds.
- StreamForge starts with examples/redpanda/selective-replication.yaml.
- analytics-orders contains order_id, customer_id, amount, region, and created_at.
- pii-safe-orders contains order_id, amount, region, and created_at.
- Raw email is absent from the pii-safe-orders value.
- Metrics show one consumed message, one produced message for analytics-orders, one produced message for pii-safe-orders, and zero lag.
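The "raw email is absent" proof point can be checked mechanically rather than by reading the consumed value on screen. The sketch below uses a hypothetical helper, assert_absent, and a sample value in the shape this package expects for pii-safe-orders. It is a literal substring check, so it would not catch encoded or hashed forms of the same data.

```shell
# Hypothetical helper: succeed only if none of the forbidden literal strings
# appear anywhere in the payload passed as the first argument.
assert_absent() {
  payload=$1
  shift
  for needle in "$@"; do
    if printf '%s' "$payload" | grep -qF -- "$needle"; then
      echo "FAIL: payload contains $needle" >&2
      return 1
    fi
  done
  echo "ok: no forbidden values found"
}

# Sample value in the shape expected for pii-safe-orders.
value='{"amount":125,"created_at":"2026-05-12T15:04:05Z","order_id":"ord-1001","region":"us"}'
assert_absent "$value" 'alice@example.com'
```

In a live run, the payload argument would be the value printed by the rpk topic consume command for pii-safe-orders.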
Human Audio Script:
“This is StreamForge, a selective replication and data-shaping layer for Kafka-compatible brokers. I am using Redpanda locally so the demo is easy to reproduce.”
“The source topic is raw-orders. Instead of mirroring the whole topic unchanged, this config writes two downstream contracts: one for analytics and one that keeps raw email out of the payload.”
“I validate the config first because the YAML is the contract. If the config is wrong, I want to know before the pipeline starts.”
“Now StreamForge is running. I will produce one raw order event with a customer ID and email address.”
“The analytics topic keeps the fields analytics needs. The PII-safe topic keeps the business facts but does not expose the raw email in the value.”
“That is the core use case: move only the records and fields downstream systems actually need.”
YouTube Metadata:
- Title: StreamForge in 5 Minutes: Selective Kafka Replication with Redpanda
- Description: Run StreamForge locally with Redpanda and replicate one raw orders topic into analytics-safe and PII-safe downstream topics.
- Thumbnail text: One Kafka Topic -> Safe Outputs
- Pinned comment: Commands are in docs/QUICKSTART.md and examples/redpanda/selective-replication.yaml. Try the demo and open an issue if any command does not work in your environment.
Publish Copy: Use Demo 1 from social-posts.md.
Dry-Run Result: 2026-05-25
- Config validation passed.
- Redpanda started from examples/redpanda/docker-compose.yml.
- The three demo topics were reset and recreated before producing the sample event.
- StreamForge started cleanly from ./target/release/streamforge.
- Produced sample event at raw-orders offset 0.
- analytics-orders output value:
{"amount":125,"created_at":"2026-05-12T15:04:05Z","customer_id":"cust-42","order_id":"ord-1001","region":"us"}
- pii-safe-orders output value:
{"amount":125,"created_at":"2026-05-12T15:04:05Z","order_id":"ord-1001","region":"us"}
- pii-safe-orders output key was SHA-256 hash ff8d9819fc0e12bf0d24892e45987e249a28dce836a85cad60e28eaaa8c6d976.
- Health endpoint returned OK.
- Metrics showed streamforge_messages_consumed_total 1, one produced message for each destination, and streamforge_consumer_lag at 0.
- Cleaned up with docker compose -f examples/redpanda/docker-compose.yml down.
V2 DSL Re-Check: 2026-05-26
- Re-ran the local quickstart with v2 filter and transform syntax in examples/redpanda/selective-replication.yaml.
- StreamForge logs showed and($region == 'us', $amount >= 100), regex(field('/customer/email'), ...), construct(...), $order_id, and hash('SHA256', $customer.email) parsed at runtime.
- analytics-orders and pii-safe-orders produced the same shaped outputs as the earlier dry run.
- Health returned OK; metrics showed one consumed source message, one produced message per destination, and lag 0.
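To sanity-check a hashed output key during a take, a candidate digest can be computed locally and compared with the consumed key. The sketch below assumes the hash('SHA256', ...) key transform hashes the raw UTF-8 bytes of the field, with no salt, and hex-encodes the result. That behavior is not confirmed here, so a mismatch may only mean the assumption is wrong. The helper name sha256_hex is hypothetical.

```shell
# Hypothetical helper: print the lowercase hex SHA-256 of the exact bytes of $1
# (no trailing newline), using whichever standard tool is installed.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | awk '{ print $1 }'
  else
    # macOS ships shasum rather than sha256sum.
    printf '%s' "$1" | shasum -a 256 | awk '{ print $1 }'
  fi
}

# Candidate key for the sample event, to compare with the consumed key.
# Assumption: the key is the unsalted hash of the raw email string.
sha256_hex 'alice@example.com'
```

The function uses printf rather than echo so that no trailing newline is included in the hashed bytes.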
Package 2: Kubernetes UI and Operator Demo
Objective: Show StreamForge as a Kubernetes-native pipeline workflow with a UI front door and YAML/CRD control plane.
Target Viewer: Platform engineers, Kubernetes operators, and data platform teams.
Screen Plan:
- Terminal: show cluster health.
- Terminal: install Helm chart.
- Browser: open UI and create a pipeline.
- Browser: show generated YAML preview.
- Terminal: show CRD and pods.
- Terminal: produce and consume verification event.
Terminal Commands:
kubectl get nodes
helm lint ./helm/streamforge-operator
helm template streamforge-operator ./helm/streamforge-operator \
--namespace streamforge-system \
--set ui.enabled=true | rg 'apiGroups: \["streamforge.io"\]'
Build and load the current StreamForge pipeline image before recording. This is required for the v2 filter and transform syntax used in the demo form:
docker build -t streamforge:local .
docker save streamforge:local -o /private/tmp/streamforge-local.tar
minikube image load /private/tmp/streamforge-local.tar
On Apple Silicon Minikube, also build and load a local UI image because the published ghcr.io/rahulbsw/streamforge-ui:latest image may not include a linux/arm64 manifest:
docker build -f ui/Dockerfile -t streamforge-ui:local ui
docker save streamforge-ui:local -o /private/tmp/streamforge-ui-local.tar
minikube image load /private/tmp/streamforge-ui-local.tar
Install with UI enabled and the current local pipeline image. Use the local UI image override on Apple Silicon:
helm upgrade --install streamforge-operator ./helm/streamforge-operator \
--namespace streamforge-system \
--create-namespace \
--set defaults.image.repository=streamforge \
--set defaults.image.tag=local \
--set defaults.image.pullPolicy=IfNotPresent \
--set ui.enabled=true \
--set ui.image.repository=streamforge-ui \
--set ui.image.tag=local \
--set ui.image.pullPolicy=IfNotPresent
On an amd64 recording machine where the published UI image pulls successfully, omit the three ui.image.* overrides but keep the local pipeline image override:
helm upgrade --install streamforge-operator ./helm/streamforge-operator \
--namespace streamforge-system \
--create-namespace \
--set defaults.image.repository=streamforge \
--set defaults.image.tag=local \
--set defaults.image.pullPolicy=IfNotPresent \
--set ui.enabled=true
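The choice between the two install commands above depends only on the machine architecture, so it can be sketched as a small function that prints the UI image overrides when they are needed. The helper name ui_image_flags is hypothetical. Treating every non-arm architecture like amd64 is an assumption, not something the dry run verified.

```shell
# Hypothetical helper: print the extra Helm flags for the UI image, given a
# machine architecture string such as the output of `uname -m`.
ui_image_flags() {
  case "$1" in
    arm64|aarch64)
      # Apple Silicon or other arm64 hosts: use the locally built UI image.
      printf '%s\n' \
        '--set ui.image.repository=streamforge-ui' \
        '--set ui.image.tag=local' \
        '--set ui.image.pullPolicy=IfNotPresent'
      ;;
    *)
      # amd64 (and, by assumption, anything else): no UI overrides needed.
      ;;
  esac
}

ui_image_flags "$(uname -m)"
```

The output could be appended, unquoted so that the shell splits it into separate arguments, to the shared part of the helm upgrade --install command.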
Verify operator, UI, service, CRD, and RBAC:
kubectl get pods -n streamforge-system
kubectl get svc -n streamforge-system
kubectl get crd | rg streamforge
kubectl auth can-i list streamforgepipelines.streamforge.io \
--as=system:serviceaccount:streamforge-system:streamforge-ui \
-n streamforge-system
If using local port-forward:
kubectl port-forward -n streamforge-system svc/streamforge-operator-ui 3001:3001
For a self-contained local Kubernetes recording, deploy a small in-cluster Redpanda broker:
kubectl create namespace redpanda
kubectl create deployment redpanda -n redpanda \
--image=docker.redpanda.com/redpandadata/redpanda:v25.1.2 \
-- /entrypoint.sh redpanda start \
--overprovisioned \
--smp 1 \
--memory 1G \
--reserve-memory 0M \
--check=false \
--node-id 0 \
--kafka-addr PLAINTEXT://0.0.0.0:9092 \
--advertise-kafka-addr PLAINTEXT://redpanda.redpanda.svc.cluster.local:9092
kubectl expose deployment redpanda -n redpanda \
--port=9092 \
--target-port=9092 \
--name=redpanda
kubectl rollout status deployment/redpanda -n redpanda --timeout=180s
kubectl exec -n redpanda deployment/redpanda -- rpk cluster info
kubectl exec -n redpanda deployment/redpanda -- \
rpk topic create raw-orders-ui-demo analytics-orders-ui-demo
UI values for the pipeline form:
- Pipeline name: ui-orders-demo
- Namespace: streamforge-system
- Application ID: ui-orders-demo
- Source bootstrap: redpanda.redpanda.svc.cluster.local:9092
- Source topic: raw-orders-ui-demo
- Consumer group: streamforge-ui-demo
- Offset: earliest
- Destination bootstrap: redpanda.redpanda.svc.cluster.local:9092
- Destination topic: analytics-orders-ui-demo
- Filter: $region == 'us'
- Transform: construct(order_id=$order_id, amount=$amount, region=$region)
- Replicas: 1
- Threads: 2
After the UI create step, verify Kubernetes resources:
kubectl get sfp ui-orders-demo -n streamforge-system -o yaml
kubectl get pods -n streamforge-system -l streamforge.io/pipeline=ui-orders-demo
kubectl get configmap ui-orders-demo-config -n streamforge-system -o jsonpath='{.data.config\.yaml}'
Produce and consume one event:
printf '%s\n' \
'{"order_id":"ord-ui-demo-1001","customer":{"id":"cust-42","email":"alice@example.com"},"amount":125,"region":"us","created_at":"2026-05-25T20:35:00Z"}' \
| kubectl exec -i -n redpanda deployment/redpanda -- \
rpk topic produce raw-orders-ui-demo
kubectl exec -n redpanda deployment/redpanda -- \
rpk topic consume analytics-orders-ui-demo -n 1 --offset start
The pipeline pod should use streamforge:local in this recording so the current v2 DSL support is present. If the pod falls back to ghcr.io/rahulbsw/streamforge:0.3.0, stop and fix the image override before recording transform verification.
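The image check described above can be turned into a hard gate so that a wrong image stops the take. This is a sketch: the helper name check_pipeline_image is hypothetical, and the kubectl query in the comment assumes the pipeline container is the first container in the first matching pod.

```shell
# Hypothetical helper: pass only when the image reference is the locally built
# streamforge:local image expected for this recording.
check_pipeline_image() {
  case "$1" in
    streamforge:local|*/streamforge:local)
      echo "ok: $1"
      ;;
    *)
      echo "stop: pod is running $1, fix the image override first" >&2
      return 1
      ;;
  esac
}

# During the recording the argument could come from kubectl, for example:
#   kubectl get pods -n streamforge-system -l streamforge.io/pipeline=ui-orders-demo \
#     -o jsonpath='{.items[0].spec.containers[0].image}'
check_pipeline_image 'streamforge:local'
```

A non-zero exit status, for example on ghcr.io/rahulbsw/streamforge:0.3.0, signals that transform verification should not be recorded yet.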
Cleanup:
kubectl delete sfp ui-orders-demo -n streamforge-system
helm uninstall streamforge-operator -n streamforge-system
kubectl delete namespace streamforge-system
kubectl delete namespace redpanda
Expected Proof Points:
- Helm install succeeds.
- UI is reachable.
- Pipeline can be created in form mode.
- Generated YAML is visible before deploy.
- Pipeline becomes Kubernetes state.
- Kafka output verifies that the deployed pipeline is consuming, filtering, transforming, and producing.
- Pipeline pod image is streamforge:local, not the chart default ghcr.io/rahulbsw/streamforge:0.3.0.
Human Audio Script:
“This demo is for platform teams. A pipeline workflow should be usable, but it should also be reviewable and operable.”
“I am installing StreamForge with Helm. The operator gives us Kubernetes-native pipeline management, and the UI gives users a guided way to create pipelines.”
“The important part is the YAML preview. The UI is not a black box; the pipeline becomes declarative state.”
“After deploying the CRD, I verify behavior from Kafka. A successful UI save is not enough. The proof is input event in, pipeline output out.”
“This is the platform story: guided creation for users, Kubernetes control for operators.”
YouTube Metadata:
- Title: StreamForge Kubernetes Demo: Helm, Operator, UI, and Kafka Pipeline
- Description: Install StreamForge with Helm, create a pipeline in the UI, preview generated YAML, deploy the CRD, and verify Kafka output.
- Thumbnail text: UI -> YAML -> CRD
- Pinned comment: The UI demo path is documented in docs/UI_MINIKUBE_DEMO.md. The Helm chart lives under helm/streamforge-operator.
Publish Copy: Use Demo 2 from social-posts.md.
Dry-Run Result: 2026-05-25
- Minikube started and kubectl context switched to minikube.
- helm lint ./helm/streamforge-operator passed.
- Helm rendering initially exposed a UI RBAC bug: the UI ClusterRole granted streaming.streamforge.dev, while the CRD and UI API use streamforge.io.
- Patched helm/streamforge-operator/templates/ui-rbac.yaml to grant streamforge.io.
- kubectl auth can-i list streamforgepipelines.streamforge.io --as=system:serviceaccount:streamforge-system:streamforge-ui -n streamforge-system changed from no to yes.
- On Apple Silicon Minikube, ghcr.io/rahulbsw/streamforge-ui:latest failed with no matching manifest for linux/arm64/v8.
- Built streamforge-ui:local, exported it to /private/tmp/streamforge-ui-local.tar, loaded it into Minikube, and installed the chart with the local UI image override.
- Operator and UI pods reached Running.
- UI login worked with admin/admin.
- UI pipeline listing worked after the RBAC fix.
- The repo’s examples/kubernetes/kafka/kafka-standalone.yaml did not work cleanly in this dry run; the Kafka container failed with an advertised.listeners error.
- An in-cluster Redpanda broker worked when deployed through /entrypoint.sh redpanda start ....
- Created raw-orders-ui-demo and analytics-orders-ui-demo.
- Created ui-orders-demo through the UI form and previewed the generated YAML.
- StreamforgePipeline resource was created in streamforge-system.
- Pipeline pod reached Running with ghcr.io/rahulbsw/streamforge:0.3.0.
- Produced one event to raw-orders-ui-demo.
- Consumed one event from analytics-orders-ui-demo.
- The consumed event verified runtime deployment and Kafka output, but it was a raw mirrored event with the chart default 0.3.0 image.
- For the v2-syntax recording, build and load streamforge:local, install the chart with the defaults.image.* overrides above, and verify transformed output before claiming transform behavior.
Package 3: PII-Safe Data Engineering Pipeline
Objective: Show how StreamForge creates a safer downstream analytics contract by filtering, projecting, and hashing or removing sensitive fields.
Target Viewer: Data engineers, analytics engineers, privacy-aware platform teams, and compliance-adjacent teams.
Screen Plan:
- Show a raw event with customer identifiers.
- Show examples/production/pii-redaction.yaml.
- Show docs/marketing/streamforge-launch/configs/pii-redaction-local.yaml for the local recording.
- Validate both configs.
- Run StreamForge.
- Produce the raw event.
- Consume analytics, marketing, third-party, and compliance outputs.
- Point out where raw email, name, and IP are absent, and where full internal compliance data is intentionally retained.
Terminal Commands:
Validate the configs and pre-build before recording:
cargo run --quiet --bin streamforge-validate -- examples/production/pii-redaction.yaml
cargo run --quiet --bin streamforge-validate -- docs/marketing/streamforge-launch/configs/pii-redaction-local.yaml
cargo build --release --bin streamforge
Main recording commands:
docker compose -f examples/redpanda/docker-compose.yml up -d
docker compose -f examples/redpanda/docker-compose.yml ps
In a second terminal, reset and create topics before starting StreamForge:
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic delete user-events-raw user-events-analytics user-events-marketing \
events-third-party user-events-compliance pii-redaction-dlq || true
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic create user-events-raw user-events-analytics user-events-marketing \
events-third-party user-events-compliance pii-redaction-dlq
Back in the first terminal:
CONFIG_FILE=docs/marketing/streamforge-launch/configs/pii-redaction-local.yaml ./target/release/streamforge
Continue in the second terminal:
printf '%s\n' \
'{"event_type":"account_created","timestamp":"2026-05-25T21:00:00Z","region":"us","device_type":"ios","email":"alice@example.com","user":{"id":"user-42","email":"alice@example.com","name":"Alice Example"},"consent":{"marketing":true,"third_party":true},"properties":{"non_pii":{"plan":"pro","source":"mobile"},"pii":{"ip":"203.0.113.10"}},"data":{"user_id":"user-42","email":"alice@example.com","name":"Alice Example","event_type":"account_created","region":"us"}}' \
| docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic produce user-events-raw
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume user-events-analytics -n 1 --offset start
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume user-events-marketing -n 1 --offset start
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume events-third-party -n 1 --offset start
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume user-events-compliance -n 1 --offset start
curl http://localhost:8080/health
curl http://localhost:8080/metrics | rg "streamforge_messages_(consumed|produced)|streamforge_consumer_lag"
Cleanup:
docker compose -f examples/redpanda/docker-compose.yml down
Expected Proof Points:
- Production PII redaction config validates unchanged.
- Local recording config validates and points at localhost:9092.
- Raw event contains sensitive fields.
- Analytics output contains only approved analytics fields and uses a SHA-256 user ID key.
- Marketing output contains only event metadata and uses an MD5 email hash key.
- Third-party output contains only event type and properties.non_pii.
- Compliance output intentionally contains full internal compliance data.
- Raw email, raw name, and IP address are absent from the analytics, marketing, and third-party values.
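The "only approved fields" proof points can be checked against an explicit allow-list instead of by inspection. The sketch below uses a hypothetical helper, json_keys, and takes the approved list from the field names in the sample analytics value, so both are illustrative rather than an official contract. It works at the text level and is not a JSON parser: it suits flat objects like the analytics value, would also list the keys of nested objects, and can be fooled by string values that contain quoted text followed by a colon.

```shell
# Hypothetical helper: list the keys of a flat JSON object as a sorted,
# comma-separated string. Text-level sketch only, not a JSON parser.
json_keys() {
  printf '%s' "$1" \
    | grep -o '"[A-Za-z_][A-Za-z0-9_]*":' \
    | tr -d '":' \
    | LC_ALL=C sort \
    | paste -sd, -
}

# Illustrative allow-list and sample value for the analytics output.
approved='device,event_type,region,timestamp,user_id'
value='{"device":"ios","event_type":"account_created","region":"us","timestamp":"2026-05-25T21:00:00Z","user_id":"user-42"}'

if [ "$(json_keys "$value")" = "$approved" ]; then
  echo "analytics value matches the approved field list"
else
  echo "unexpected fields: $(json_keys "$value")" >&2
fi
```

For nested payloads such as the third-party output, a real JSON tool would be a safer choice than this text match.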
Human Audio Script:
“Operational topics often contain more than analytics systems should receive.”
“This config is explicit about what leaves the source boundary. It filters the event, projects approved fields, and keeps raw email, name, and IP out of the lower-trust outputs.”
“Projection is safer than relying on every downstream consumer to ignore fields.”
“There is also a compliance destination. That one keeps the fuller internal payload, which is useful for audit or regulated workflows, but it is separate from analytics and third-party outputs.”
“StreamForge does not replace governance, access control, or audit logs. It gives data teams a concrete enforcement point in the Kafka path.”
YouTube Metadata:
- Title: PII-Safe Kafka Pipelines: Filter, Transform, and Redact with StreamForge
- Description: Use StreamForge to create analytics-safe Kafka topics by filtering events, projecting approved fields, and keeping raw PII out of lower-trust outputs.
- Thumbnail text: PII Out
- Pinned comment: The production example is examples/production/pii-redaction.yaml. Review your own governance requirements before using any data minimization pattern in production.
Publish Copy: Use Demo 3 from social-posts.md.
Dry-Run Result: 2026-05-25
- examples/production/pii-redaction.yaml validation passed unchanged.
- docs/marketing/streamforge-launch/configs/pii-redaction-local.yaml validation passed with four destinations and no warnings.
- Redpanda started from examples/redpanda/docker-compose.yml.
- The six demo topics were reset and recreated before producing the sample event.
- StreamForge started cleanly from ./target/release/streamforge.
- Produced sample event at user-events-raw offset 0.
- user-events-analytics output key was SHA-256 hash 6d894aa3ee802549d7f340e7c1cf0d1c1cb14cd84f768d92ffaa6785337c4997.
- user-events-analytics output value:
{"device":"ios","event_type":"account_created","region":"us","timestamp":"2026-05-25T21:00:00Z","user_id":"user-42"}
- user-events-marketing output key was MD5 hash c160f8cc69a4f0bf2b0362752353d060.
- user-events-marketing output value:
{"anonymous_id":"user-42","event":"account_created","timestamp":"2026-05-25T21:00:00Z"}
- events-third-party output key was SHA-256 hash 6d894aa3ee802549d7f340e7c1cf0d1c1cb14cd84f768d92ffaa6785337c4997.
- events-third-party output value:
{"event":"account_created","properties":{"plan":"pro","source":"mobile"}}
- user-events-compliance output key was user-42.
- user-events-compliance output value:
{"email":"alice@example.com","event_type":"account_created","name":"Alice Example","region":"us","user_id":"user-42"}
- Health endpoint returned OK.
- Metrics showed streamforge_messages_consumed_total 1, one produced message for each destination, and streamforge_consumer_lag at 0.
- Verified that raw alice@example.com, Alice Example, and 203.0.113.10 are absent from the values in the analytics, marketing, and third-party topics.
- If streamforge-validate prints xcrun cache warnings on macOS, treat those as local toolchain noise when validation still exits 0.
V2 DSL Re-Check: 2026-05-26
- Re-ran the local PII-safe recording config with v2 filter, transform, and key transform syntax.
- StreamForge logs showed regex(field('/user/id'), '.+'), $consent.marketing == true, construct(...), and hash(...) parsed at runtime.
- Analytics, marketing, third-party, and compliance outputs matched the earlier dry-run payloads and keys.
- Health returned OK; metrics showed one consumed source message, one produced message per destination, and lag 0.
Package 4: CDC to Data Lake Pipeline
Objective: Show StreamForge as a lightweight shaping layer between CDC topics and lake or warehouse consumers.
Target Viewer: Data engineers, lakehouse pipeline owners, analytics platform teams.
Screen Plan:
- Show a Debezium-style CDC envelope.
- Show examples/production/cdc-to-datalake.yaml.
- Show docs/marketing/streamforge-launch/configs/cdc-to-datalake-local.yaml for the local recording.
- Explain create/update/delete/schema-change routing.
- Validate both configs.
- Run StreamForge.
- Produce sample CDC records.
- Consume shaped outputs.
Terminal Commands:
Validate the configs and pre-build before recording:
cargo run --quiet --bin streamforge-validate -- examples/production/cdc-to-datalake.yaml
cargo run --quiet --bin streamforge-validate -- docs/marketing/streamforge-launch/configs/cdc-to-datalake-local.yaml
cargo build --release --bin streamforge
Main recording commands:
docker compose -f examples/redpanda/docker-compose.yml up -d
docker compose -f examples/redpanda/docker-compose.yml ps
In a second terminal, reset and create topics before starting StreamForge:
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic delete dbserver.inventory.orders datalake-orders \
datalake-orders-deleted datalake-schema-changes cdc-datalake-dlq || true
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic create dbserver.inventory.orders datalake-orders \
datalake-orders-deleted datalake-schema-changes cdc-datalake-dlq
Back in the first terminal:
CONFIG_FILE=docs/marketing/streamforge-launch/configs/cdc-to-datalake-local.yaml ./target/release/streamforge
Continue in the second terminal:
printf '%s\n' \
'{"payload":{"op":"c","ts_ms":1779742500000,"after":{"id":"ord-2001","customer_id":"cust-101","status":"created","amount":199.5,"updated_at":"2026-05-25T21:05:00Z"},"before":null,"source":{"db":"inventory","table":"orders"}}}' \
'{"payload":{"op":"u","ts_ms":1779742560000,"after":{"id":"ord-2001","customer_id":"cust-101","status":"paid","amount":199.5,"updated_at":"2026-05-25T21:06:00Z"},"before":{"id":"ord-2001","customer_id":"cust-101","status":"created","amount":199.5,"updated_at":"2026-05-25T21:05:00Z"},"source":{"db":"inventory","table":"orders"}}}' \
'{"payload":{"op":"d","ts_ms":1779742620000,"after":null,"before":{"id":"ord-2002","customer_id":"cust-202","status":"cancelled","amount":49.95,"updated_at":"2026-05-25T21:07:00Z"},"source":{"db":"inventory","table":"orders"}}}' \
'{"payload":{"op":"s","ts_ms":1779742680000,"ddl":"ALTER TABLE orders ADD COLUMN coupon_code VARCHAR(32)","source":{"db":"inventory","table":"orders"}}}' \
| docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic produce dbserver.inventory.orders
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume datalake-orders -n 2 --offset start
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume datalake-orders-deleted -n 1 --offset start
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume datalake-schema-changes -n 1 --offset start
curl http://localhost:8080/health
curl http://localhost:8080/metrics | rg "streamforge_messages_(consumed|produced)|streamforge_consumer_lag"
Cleanup:
docker compose -f examples/redpanda/docker-compose.yml down
Expected Proof Points:
- Production and local CDC configs validate.
- Create and update records route to datalake-orders and emit only the payload.after row.
- Delete records route to datalake-orders-deleted and emit id plus deleted_at.
- Schema-change records route to datalake-schema-changes.
- Metrics show four consumed source messages, two produced lake order rows, one produced delete row, one produced schema-change row, and zero lag.
- Viewer understands the lake sink remains separate from StreamForge.
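The routing in these proof points can be summarized as a lookup from the CDC op code to the destination topic. The function below is an illustration for narration prep only. It is not StreamForge's implementation, the name route_for_op is hypothetical, and treating op code s as the schema-change marker follows the sample event rather than a confirmed filter expression.

```shell
# Illustration only: mirrors the routing described in the proof points.
# Prints the destination topic for a CDC op code, or returns non-zero when
# the op code has no route in this demo.
route_for_op() {
  case "$1" in
    c|u) echo "datalake-orders" ;;          # create and update rows
    d)   echo "datalake-orders-deleted" ;;  # delete rows
    s)   echo "datalake-schema-changes" ;;  # schema change (per sample event)
    *)   return 1 ;;
  esac
}

for op in c u d s; do
  printf '%s -> %s\n' "$op" "$(route_for_op "$op")"
done
```

What happens to records with any other op code depends on the config and is not modeled here.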
Human Audio Script:
“CDC topics are powerful, but raw envelopes are not always the right downstream contract.”
“This demo treats StreamForge as the Kafka-side shaping layer. It extracts the payload shape that lake, warehouse, Spark, Flink, or custom consumers can read more easily.”
“Create and update records usually care about the after state. Delete records often need the before state. Schema changes can be routed separately.”
“StreamForge is not the data lake sink. It makes the Kafka contract cleaner before the sink reads it.”
YouTube Metadata:
- Title: Kafka CDC to Data Lake: Shape Debezium Events with StreamForge
- Description: Shape Debezium-style Kafka CDC events with StreamForge before downstream lake, warehouse, Spark, Flink, or custom consumers read them.
- Thumbnail text: CDC -> Clean Topics
- Pinned comment: The reference config is examples/production/cdc-to-datalake.yaml. This demo focuses on Kafka-side shaping, not lake sink configuration.
Publish Copy: Use Demo 4 from social-posts.md.
Dry-Run Result: 2026-05-25
- examples/production/cdc-to-datalake.yaml validation passed.
- docs/marketing/streamforge-launch/configs/cdc-to-datalake-local.yaml validation passed with three destinations and no warnings.
- The dry run caught and fixed a config issue in the production example: EXTRACT:/payload/after,order validated but did not transform at runtime because the supported extraction syntax is /payload/after.
- Redpanda started from examples/redpanda/docker-compose.yml.
- The five CDC demo topics were reset and recreated before producing sample events.
- StreamForge started cleanly from ./target/release/streamforge.
- Produced four sample CDC events at dbserver.inventory.orders offsets 0 through 3.
- datalake-orders received the create row with key ord-2001:
{"amount":199.5,"customer_id":"cust-101","id":"ord-2001","status":"created","updated_at":"2026-05-25T21:05:00Z"}
- datalake-orders received the update row with key ord-2001:
{"amount":199.5,"customer_id":"cust-101","id":"ord-2001","status":"paid","updated_at":"2026-05-25T21:06:00Z"}
- datalake-orders-deleted received the delete row with key ord-2002:
{"deleted_at":1779742620000,"id":"ord-2002"}
- datalake-schema-changes received the schema-change payload:
{"ddl":"ALTER TABLE orders ADD COLUMN coupon_code VARCHAR(32)","op":"s","source":{"db":"inventory","table":"orders"},"ts_ms":1779742680000}
- Health endpoint returned OK.
- Metrics showed streamforge_messages_consumed_total 4, produced counts of 2 for datalake-orders, 1 for datalake-orders-deleted, 1 for datalake-schema-changes, and streamforge_consumer_lag at 0.
- If streamforge-validate prints xcrun cache warnings on macOS, treat those as local toolchain noise when validation still exits 0.
V2 DSL Re-Check: 2026-05-26
- Re-ran the local CDC recording config with v2 filter, transform, and key transform syntax.
- StreamForge logs showed or($payload.op == 'c', $payload.op == 'u'), $payload.op == 'd', field('/payload/after'), and construct(...) parsed at runtime.
- Create, update, delete, and schema-change outputs matched the earlier dry-run payloads and keys.
- Metrics showed four consumed source messages, produced counts of 2/1/1, and lag 0.
Package 5: AI-Ready Event Stream
Objective: Show a practical AI infrastructure use case: create a safe real-time topic for AI or ML systems without exposing raw operational payloads.
Target Viewer: AI infrastructure teams, MLOps teams, platform engineers, and data engineers.
Screen Plan:
- Show raw business event.
- Explain why raw operational topics are too broad for AI systems.
- Show docs/marketing/streamforge-launch/configs/ai-ready-events-local.yaml.
- Validate the config.
- Run StreamForge.
- Produce one rich operational order event.
- Consume ai-features-orders and model-monitoring-events.
- Explain downstream consumers: features, model monitoring, RAG/event context, experimentation.
Terminal Commands:
Validate the config and pre-build before recording:
cargo run --quiet --bin streamforge-validate -- docs/marketing/streamforge-launch/configs/ai-ready-events-local.yaml
cargo build --release --bin streamforge
Main recording commands:
docker compose -f examples/redpanda/docker-compose.yml up -d
docker compose -f examples/redpanda/docker-compose.yml ps
In a second terminal, reset and create topics before starting StreamForge:
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic delete raw-ai-orders ai-features-orders model-monitoring-events ai-ready-dlq || true
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic create raw-ai-orders ai-features-orders model-monitoring-events ai-ready-dlq
Back in the first terminal:
CONFIG_FILE=docs/marketing/streamforge-launch/configs/ai-ready-events-local.yaml ./target/release/streamforge
Continue in the second terminal:
printf '%s\n' \
'{"event_type":"order_completed","event_time":"2026-05-25T21:15:00Z","region":"us","ai_approved":true,"model_monitoring":true,"customer":{"id":"cust-ai-42","email":"alice@example.com","name":"Alice Example","tier":"gold","ip":"203.0.113.42"},"order":{"id":"ord-ai-9001","amount":249.99,"currency":"USD","product_count":3,"status":"completed"},"decision_context":{"risk_bucket":"low","channel":"mobile","experiment":"checkout-v3"},"internal_notes":"VIP customer requested callback","payment":{"card_last4":"4242"}}' \
| docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic produce raw-ai-orders
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume ai-features-orders -n 1 --offset start
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume model-monitoring-events -n 1 --offset start
curl http://localhost:8080/health
curl http://localhost:8080/metrics | rg "streamforge_messages_(consumed|produced)|streamforge_consumer_lag"
Cleanup:
docker compose -f examples/redpanda/docker-compose.yml down
Expected Proof Points:
- Raw event contains more fields than the AI-facing contract.
- ai-features-orders contains approved business fields for feature pipelines.
- model-monitoring-events contains approved monitoring context.
- Both destination values exclude raw email, raw name, IP address, internal notes, and payment fields.
- Both destination keys use the SHA-256 hash of the customer ID.
- Metrics show one consumed source message, one produced message for each AI-facing destination, and zero lag.
- Narration avoids implying that StreamForge runs model inference.
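The exclusion proof point can be checked mechanically instead of by eye. The sketch below assumes the five raw literals from the sample event in this package. assert_no_pii is a hypothetical helper name, not a StreamForge or rpk command, and the -f '%v\n' consume format in the usage comment is assumed so that only record values are scanned, not record metadata.

```shell
#!/usr/bin/env bash
# Sketch: fail if any raw PII literal from the sample event appears on stdin.
# assert_no_pii is a hypothetical helper, not part of StreamForge or rpk.
assert_no_pii() {
  local input needle
  input="$(cat)"
  for needle in 'alice@example.com' 'Alice Example' '203.0.113.42' \
                'VIP customer requested callback' '4242'; do
    # -F treats the needle as a fixed string, -q suppresses output.
    if printf '%s' "$input" | grep -Fq -- "$needle"; then
      echo "PII leak: found '$needle'" >&2
      return 1
    fi
  done
  echo "no raw PII literals found"
}

# Usage (assumes the consume commands from this package):
# docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
#   rpk topic consume ai-features-orders -n 1 --offset start -f '%v\n' | assert_no_pii
```

Scanning only the value avoids false positives from offsets or timestamps that happen to contain 4242.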
Human Audio Script:
“The AI use case here is not an LLM wrapper. It is safer real-time data infrastructure.”
“AI and ML systems need fresh business events, but they should not automatically consume raw operational Kafka topics.”
“StreamForge creates AI-facing contracts close to Kafka: approved event types, stable fields, and raw PII removed from values.”
“In this recording, one output is shaped for feature pipelines and one output is shaped for model monitoring.”
“Those topics can feed feature pipelines, model monitoring, event context, experimentation, or analytics.”
“The key idea is simple: AI infrastructure still needs good data engineering.”
YouTube Metadata:
- Title: Build PII-Safe Real-Time Streams for AI Systems with Kafka and StreamForge
- Description: Create an AI-facing Kafka topic by filtering raw events, projecting stable business fields, and keeping raw PII out of downstream AI and ML systems.
- Thumbnail text: AI-Ready Kafka
- Pinned comment: This demo is about data contracts for AI infrastructure. It does not run model inference; it prepares safer real-time streams for downstream AI systems.
Publish Copy: Use Demo 5 from social-posts.md.
Dry-Run Result: 2026-05-25
- docs/marketing/streamforge-launch/configs/ai-ready-events-local.yaml validation passed with two destinations and no warnings.
- Redpanda started from examples/redpanda/docker-compose.yml.
- The four AI demo topics were reset and recreated before producing the sample event.
- StreamForge started cleanly from ./target/release/streamforge.
- Produced one rich operational event at raw-ai-orders offset 0.
- ai-features-orders output key was SHA-256 hash d58bbe4912473f6bd3841146c8c4170ca12f1801c05e854d1f0230e93f3b2ed9.
- ai-features-orders output value:
{"amount":249.99,"created_at":"2026-05-25T21:15:00Z","currency":"USD","customer_tier":"gold","event_type":"order_completed","order_id":"ord-ai-9001","product_count":3,"region":"us"}
- model-monitoring-events output key was SHA-256 hash d58bbe4912473f6bd3841146c8c4170ca12f1801c05e854d1f0230e93f3b2ed9.
- model-monitoring-events output value:
{"amount":249.99,"created_at":"2026-05-25T21:15:00Z","decision_context":{"channel":"mobile","experiment":"checkout-v3","risk_bucket":"low"},"event_type":"order_completed","region":"us"}
- Health endpoint returned OK.
- Metrics showed streamforge_messages_consumed_total 1, one produced message for ai-features-orders, one produced message for model-monitoring-events, and streamforge_consumer_lag at 0.
- AI-facing values verified absent raw alice@example.com, Alice Example, 203.0.113.42, VIP customer requested callback, and 4242.
- If streamforge-validate prints xcrun cache warnings on macOS, treat those as local toolchain noise when validation still exits 0.
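If StreamForge needs a moment before the health endpoint answers, a small retry loop keeps the take clean. This is a minimal sketch. wait_until is a hypothetical helper name, and the attempt count and delay in the usage comment are arbitrary choices, not values from the dry run.

```shell
#!/usr/bin/env bash
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 or ATTEMPTS is exhausted.
# Hypothetical helper for rehearsals; not part of StreamForge.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" >/dev/null 2>&1 && return 0
    # Sleep only between attempts, not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage (port taken from the health command in this package):
# wait_until 30 1 curl -fsS http://localhost:8080/health && echo "StreamForge is healthy"
```

curl -f makes HTTP error statuses count as failures, so the loop continues until the endpoint returns a success response.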
V2 DSL Re-Check: 2026-05-26
- Re-ran the local AI-ready recording config with v2 filter, transform, and key transform syntax.
- StreamForge logs showed and($event_type == 'order_completed', ...), construct(...), and hash('SHA256', $customer.id) parsed at runtime.
- ai-features-orders and model-monitoring-events matched the earlier dry-run payloads and keys.
- Metrics showed one consumed source message, one produced message per destination, and lag 0.
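The hashed key can be spot-checked outside StreamForge. The sketch below assumes the key is the lowercase hex SHA-256 digest of the customer ID string with no salt and no trailing newline. The source does not state the encoding, so treat a mismatch as a prompt to check that assumption, not as a bug. sha256_hex is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: compute a hex SHA-256 digest with whichever tool is installed.
# sha256_hex is a hypothetical helper, not part of StreamForge.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$1" | sha256sum | awk '{print $1}'   # common on Linux
  else
    printf '%s' "$1" | shasum -a 256 | awk '{print $1}'   # common on macOS
  fi
}

# Key recorded in the dry run for both destinations.
expected='d58bbe4912473f6bd3841146c8c4170ca12f1801c05e854d1f0230e93f3b2ed9'
actual="$(sha256_hex 'cust-ai-42')"

if [ "$actual" = "$expected" ]; then
  echo "key matches the recorded dry-run hash"
else
  echo "key differs; check the encoding assumption (salt, newline, case)"
fi
```

printf '%s' is used instead of echo so that no newline is appended to the hashed input.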
Package 6: AWS Production Deployment
Objective: Show a production-style AWS path for StreamForge using EKS, MSK or a clearly labeled Kafka-compatible fallback, Helm/operator deployment, verification, and cleanup.
Target Viewer: Cloud platform teams, production data infrastructure teams, and evaluators who need cloud credibility.
Screen Plan:
- Show architecture and cost controls.
- Show the preflight gate before provisioning.
- Show AWS identity and region without exposing sensitive details.
- Show EKS cluster.
- Show MSK bootstrap or fallback broker.
- Install StreamForge with Helm.
- Apply pipeline config.
- Produce and consume verification events.
- Show metrics/logs.
- Run cleanup commands.
Terminal Commands:
Use docs/marketing/streamforge-launch/aws-demo-runbook.md as the source of truth. Preflight commands to run before creating paid resources:
export AWS_REGION=us-west-2
export EKS_CLUSTER=streamforge-demo
export NAMESPACE=streamforge
export AWS_PROFILE=<demo-profile>
aws --version
kubectl version --client
eksctl version
helm version
docker version
aws sts get-caller-identity --profile "$AWS_PROFILE" --query Account --output text | sed 's/[0-9]/*/g'
aws configure get region --profile "$AWS_PROFILE"
helm lint ./helm/streamforge-operator
helm show crds ./helm/streamforge-operator | rg 'usernameSecret|passwordSecret|caSecret'
Do not create resources until the preflight passes. Minimum visible deployment checks after provisioning:
kubectl get nodes
helm install streamforge ./helm/streamforge-operator --namespace streamforge --create-namespace
kubectl get pods -n streamforge
kubectl get streamforgepipeline -n streamforge
Cleanup:
helm uninstall streamforge -n streamforge
kubectl delete namespace streamforge
eksctl delete cluster --name streamforge-demo --region us-west-2
Expected Proof Points:
- AWS region and account context are controlled.
- Preflight passes before provisioning.
- EKS is running StreamForge.
- Kafka broker is reachable from the cluster.
- Produce/consume verification succeeds.
- Cleanup plan is shown before resources are created.
Human Audio Script:
“Local demos are useful, but production teams also need to see the cloud pattern.”
“This is a production-style temporary setup: EKS runs StreamForge, MSK or a clearly labeled Kafka-compatible broker provides topics, and Helm installs the operator and UI.”
“I am showing the cleanup path up front because cloud demos should not leave expensive resources behind.”
“The important verification is Kafka-level. Pods running is not enough. I want to see a raw input event become the expected downstream output.”
“This is not a complete production hardening guide. It is the deploy-and-verify path.”
YouTube Metadata:
- Title: Deploy StreamForge on AWS: EKS, MSK, Helm, and Kafka Pipeline Verification
- Description: Run a production-style StreamForge demo on AWS with EKS, MSK or a Kafka-compatible fallback, Helm/operator deployment, Kafka verification, metrics, and cleanup.
- Thumbnail text: Kafka Pipelines on AWS
- Pinned comment: AWS resources can incur cost. Follow docs/marketing/streamforge-launch/aws-demo-runbook.md and clean up resources after recording.
Publish Copy: Use Demo 6 from social-posts.md.
Dry-Run Preflight Result: 2026-05-25
- No AWS resources were provisioned in this dry run.
- AWS CLI was installed.
- Docker, kubectl, and Helm were installed.
- Homebrew was installed.
- eksctl was not installed, so the EKS creation path is blocked until it is installed.
- Default AWS credentials were not available: aws sts get-caller-identity returned NoCredentials.
- One named AWS profile was present, but its token was expired: ExpiredToken.
- The named profile did not have a region configured.
- kubectl had no current context, so no EKS cluster was reachable from this machine.
- helm lint ./helm/streamforge-operator passed.
- helm template streamforge ./helm/streamforge-operator --namespace streamforge --create-namespace --set ui.enabled=true rendered successfully.
- The Helm chart CRD schema was updated so the documented secure Kafka fields caSecret, usernameSecret, and passwordSecret are accepted by the Kubernetes API schema instead of relying only on the operator model.
- Next live recording prerequisites: install eksctl, refresh AWS credentials for the recording profile, set the profile region, confirm budget/cost controls, then run the EKS/MSK provisioning flow and cleanup in one recording session.
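A missing CLI such as eksctl is easier to catch with a single gate than with separate version commands. The sketch below assumes the tool list from the preflight commands in this package. require_tools is a hypothetical helper name, and it checks only that each binary is on PATH, not that credentials or regions are valid.

```shell
#!/usr/bin/env bash
# Sketch: report every missing CLI and return non-zero if any is absent.
# require_tools is a hypothetical helper, not part of the runbook.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage before provisioning any paid resources:
# require_tools aws kubectl eksctl helm docker || echo "Preflight failed; do not provision."
```

The loop reports all missing tools in one pass instead of stopping at the first, so a single rehearsal run shows the full list to install.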
Package 7: Observability and Scaling
Objective: Show the operational signals for StreamForge: health, metrics, lag, throughput, latency, scaling considerations, retry, and DLQ behavior.
Target Viewer: SREs, platform engineers, Kafka operators, and production owners.
Screen Plan:
- Show docs/marketing/streamforge-launch/configs/observability-local.yaml.
- Start StreamForge.
- Curl health and initial metrics endpoints.
- Generate keyed traffic across three source partitions.
- Show consumed, produced, filtered, error, latency, and lag metrics.
- Discuss partition-aware scaling and key distribution.
- Show retry and DLQ configuration; keep the recording to a clean run unless a controlled failure is prepared.
Terminal Commands:
Pre-build before recording:
cargo run --quiet --bin streamforge-validate -- docs/marketing/streamforge-launch/configs/observability-local.yaml
cargo build --release --bin streamforge
Main recording commands:
docker compose -f examples/redpanda/docker-compose.yml up -d
docker compose -f examples/redpanda/docker-compose.yml ps
In a second terminal, reset and create three-partition topics:
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic delete observability-orders premium-events standard-events observability-dlq || true
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic create observability-orders -p 3
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic create premium-events -p 3
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic create standard-events -p 3
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic create observability-dlq -p 3
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic describe observability-orders
Back in the first terminal:
CONFIG_FILE=docs/marketing/streamforge-launch/configs/observability-local.yaml ./target/release/streamforge
Capture the zero-traffic baseline:
curl http://localhost:9090/health
curl http://localhost:9090/metrics | rg "streamforge_messages_(consumed|produced|filtered)|streamforge_filter_evaluations_total"
Produce keyed traffic:
printf '%s\n' \
'cust-premium-1 {"order_id":"obs-1001","customer":{"id":"cust-premium-1","tier":"premium"},"amount":125,"region":"us","created_at":"2026-05-25T21:30:01Z"}' \
'cust-standard-1 {"order_id":"obs-1002","customer":{"id":"cust-standard-1","tier":"standard"},"amount":64,"region":"us","created_at":"2026-05-25T21:30:02Z"}' \
'cust-premium-2 {"order_id":"obs-1003","customer":{"id":"cust-premium-2","tier":"premium"},"amount":250,"region":"eu","created_at":"2026-05-25T21:30:03Z"}' \
'cust-standard-2 {"order_id":"obs-1004","customer":{"id":"cust-standard-2","tier":"standard"},"amount":80,"region":"apac","created_at":"2026-05-25T21:30:04Z"}' \
'cust-premium-3 {"order_id":"obs-1005","customer":{"id":"cust-premium-3","tier":"premium"},"amount":310,"region":"us","created_at":"2026-05-25T21:30:05Z"}' \
'cust-standard-3 {"order_id":"obs-1006","customer":{"id":"cust-standard-3","tier":"standard"},"amount":42,"region":"eu","created_at":"2026-05-25T21:30:06Z"}' \
| docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic produce observability-orders -f '%k %v{json}\n'
Consume outputs:
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume premium-events -n 3 --offset start
docker compose -f examples/redpanda/docker-compose.yml exec -T redpanda \
rpk topic consume standard-events -n 3 --offset start
Capture final metrics after one lag interval:
curl http://localhost:9090/metrics | rg "streamforge_messages_(consumed|produced|filtered)|streamforge_filter_evaluations_total|streamforge_consumer_(lag|offset|high_watermark)|streamforge_processing_duration_seconds_(count|sum)|streamforge_messages_in_flight"
Cleanup:
docker compose -f examples/redpanda/docker-compose.yml down
Prometheus queries to prepare:
rate(streamforge_messages_consumed_total[5m])
sum(rate(streamforge_messages_produced_total[5m])) by (destination)
sum(streamforge_consumer_lag)
histogram_quantile(0.99, rate(streamforge_processing_duration_seconds_bucket[5m]))
Expected Proof Points:
- Health endpoint responds.
- Metrics endpoint exposes StreamForge counters/gauges/histograms.
- Traffic changes consumed, produced, filtered, and filter-evaluation metrics.
- Consumer lag is visible per source partition.
- Source and destination topics have three partitions.
- Scaling discussion is tied to Kafka partitions, key distribution, and consumer groups.
- DLQ and retry are configured; the clean run shows zero error-filtered messages.
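Per-partition lag can be collapsed into one number without a Prometheus server, mirroring the sum(streamforge_consumer_lag) query. This is a minimal sketch that assumes the metrics endpoint emits plain Prometheus text lines with the sample value as the last field and no trailing timestamps. total_lag is a hypothetical helper name, and the label names in the test data are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: sum all streamforge_consumer_lag samples from Prometheus text on stdin.
# Assumes the value is the last field on each sample line (no timestamps).
# total_lag is a hypothetical helper, not part of StreamForge.
total_lag() {
  # The bracket expression requires "{" or a space right after the metric
  # name, so comment lines and other metrics with the same prefix are skipped.
  awk '/^streamforge_consumer_lag[{ ]/ { sum += $NF } END { print sum + 0 }'
}

# Usage (port taken from the metrics commands in this package):
# curl -s http://localhost:9090/metrics | total_lag
```

A result of 0 after the lag interval corresponds to the zero-lag proof point across all three source partitions.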
Human Audio Script:
“A Kafka pipeline is not production-ready just because it processed one message.”
“For selective replication, I need to know whether the service is healthy, whether it is keeping up, how many messages are filtered, how many are produced, and whether errors are increasing.”
“Consumer lag is the basic production question: is the pipeline falling behind the source topic?”
“Scaling is tied to Kafka partitions. More replicas help only when there are enough partitions to assign.”
“Retry and DLQ behavior are the safety rails for bad records or downstream failures.”
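The partition limit on scaling can be shown with one line of arithmetic. The sketch below assumes all replicas share one consumer group and read a single source topic, so at most one replica is assigned to each partition. active_replicas is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# active_replicas REPLICAS PARTITIONS
# Prints how many replicas can be assigned at least one partition,
# assuming one consumer group and one source topic.
# Hypothetical helper for the scaling discussion only.
active_replicas() {
  if [ "$1" -lt "$2" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

active_replicas 2 3   # prints 2: both replicas receive partitions
active_replicas 5 3   # prints 3: two of the five replicas sit idle
```

With the three-partition observability-orders topic from this package, a fourth replica would add no throughput under these assumptions.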
YouTube Metadata:
- Title: Operating StreamForge: Kafka Lag, Prometheus Metrics, Scaling, Retry, and DLQ
- Description: Inspect StreamForge health, Prometheus metrics, consumer lag, throughput, latency, scaling behavior, retry, and DLQ signals for selective Kafka replication.
- Thumbnail text: Operate It
- Pinned comment: Start with docs/OBSERVABILITY_QUICKSTART.md for metrics and lag monitoring. Scale replicas with Kafka partition count in mind.
Publish Copy: Use Demo 7 from social-posts.md.
Dry-Run Result: 2026-05-25
- docs/marketing/streamforge-launch/configs/observability-local.yaml validation passed with two destinations and no warnings.
- Redpanda started from examples/redpanda/docker-compose.yml.
- Created observability-orders, premium-events, standard-events, and observability-dlq with three partitions each.
- StreamForge started cleanly from ./target/release/streamforge.
- Initial health endpoint returned OK.
- Initial metrics showed consumed, produced, filtered, and filter-evaluation counters at 0.
- Produced six keyed records to observability-orders; records landed across source partitions 0, 1, and 2.
- premium-events received three records for cust-premium-1, cust-premium-2, and cust-premium-3.
- standard-events received three records for cust-standard-1, cust-standard-2, and cust-standard-3.
- Final metrics showed streamforge_messages_consumed_total 6.
- Final metrics showed produced counts of 3 for premium-events and 3 for standard-events.
- Final metrics showed filter pass/fail counts of 3/3 for each destination.
- Final metrics showed streamforge_messages_filtered_total{reason="filter_failed"} 3 for each destination and reason="error" at 0.
- Final lag gauges showed partition 0 lag 0, partition 1 lag 0, and partition 2 lag 0.
- Final metrics showed streamforge_messages_in_flight 0.
- Final metrics exposed processing duration histogram counts and sums for both destinations.
- observability-dlq was created and ready; no controlled failure was injected during this clean recording run.
- If streamforge-validate prints xcrun cache warnings on macOS, treat those as local toolchain noise when validation still exits 0.
V2 DSL Re-Check: 2026-05-26
- Re-ran the local observability recording config with v2 filter, transform, and key transform syntax.
- StreamForge logs showed $customer.tier == 'premium', not($customer.tier == 'premium'), construct(...), and $customer.id parsed at runtime.
- Produced six keyed records; source records landed across partitions 0, 1, and 2.
- premium-events received three records and standard-events received three records with the expected shaped values and keys.
- Metrics showed streamforge_messages_consumed_total 6, produced counts of 3 per destination, filter pass/fail counts of 3/3, streamforge_messages_in_flight 0, and lag 0 on all three source partitions.