StreamForge Troubleshooting Guide

Version: 1.0.0
Last Updated: 2026-04-18

This guide covers common issues, symptoms, causes, and solutions for StreamForge operations.

Quick Diagnosis
Startup Issues
Performance Issues
Data Issues
Connectivity Issues
Resource Issues
Configuration Issues
Kafka Issues
Debug Commands

Quick Diagnosis

Health Check Commands

# 1. Check pod status
kubectl get pods -n streamforge

# 2. Check logs
kubectl logs -f deployment/streamforge -n streamforge --tail=50

# 3. Check metrics
curl http://streamforge:8080/metrics | grep -E "(up|error|lag)"

# 4. Check consumer group
kafka-consumer-groups --bootstrap-server kafka:9092 \
  --describe --group <appid>

# 5. Check resource usage
kubectl top pods -n streamforge

Common Symptoms Quick Reference

Symptom	Likely Cause	Quick Fix
Pod stuck in `CrashLoopBackOff`	Config error or missing secret	Check logs, validate config
Consumer lag growing	Insufficient replicas or CPU	Scale up
High error rate	Bad data or config mismatch	Check DLQ headers
Zero throughput	Kafka connection failure	Check connectivity
High memory usage	Large messages or memory leak	Increase limits, restart
Slow processing	Complex filters or transforms	Optimize DSL, add threads

Startup Issues

Issue: Pod stuck in `Pending`

Symptoms:

NAME                          READY   STATUS    RESTARTS   AGE
streamforge-7c8f9d4b6-abc12   0/1     Pending   0          5m

Diagnosis:

kubectl describe pod streamforge-7c8f9d4b6-abc12 -n streamforge

Common causes:

1. Insufficient resources:

Events:
  Warning  FailedScheduling  5m  default-scheduler  0/3 nodes are available: insufficient cpu.

Solution:

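Reduce resource requests (the scheduler places pods by requests, not limits)
Add more nodes to cluster
If only limits are set, Kubernetes defaults requests to the limits; set explicit, smaller requests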

kubectl patch deployment streamforge -n streamforge --patch '
spec:
  template:
    spec:
      containers:
      - name: streamforge
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
'

2. ImagePullBackOff:

Events:
  Warning  Failed  5m  kubelet  Failed to pull image "streamforge:1.0.0": rpc error: code = Unknown

Solution:

Check image exists: docker pull streamforge:1.0.0
Check image pull secret: kubectl get secret -n streamforge
Use correct image registry

3. PVC not bound:

Events:
  Warning  FailedMount  5m  kubelet  Unable to attach or mount volumes

Solution:

Check PVC status: kubectl get pvc -n streamforge
Create missing PV/PVC
Remove volume if not needed

Issue: Pod stuck in `Init:Error`

Symptoms:

NAME                          READY   STATUS       RESTARTS   AGE
streamforge-7c8f9d4b6-abc12   0/1     Init:Error   0          2m

Diagnosis:

kubectl logs streamforge-7c8f9d4b6-abc12 -c init-container -n streamforge
kubectl describe pod streamforge-7c8f9d4b6-abc12 -n streamforge

Common causes:

Init container failed:

Check init container logs
Verify dependencies (e.g., Kafka must be reachable)
Fix init script

Solution: Remove init container if not essential, or fix the init logic.

Issue: Pod `CrashLoopBackOff`

Symptoms:

NAME                          READY   STATUS             RESTARTS   AGE
streamforge-7c8f9d4b6-abc12   0/1     CrashLoopBackOff   5          5m

Diagnosis:

# Check current logs
kubectl logs streamforge-7c8f9d4b6-abc12 -n streamforge

# Check previous crashed instance
kubectl logs streamforge-7c8f9d4b6-abc12 -n streamforge --previous

Common causes:

1. Config parse error:

ERROR Failed to parse config: invalid YAML at line 10

Solution:

# Validate config
streamforge-validate config.yaml

# Fix ConfigMap
kubectl edit configmap streamforge-config -n streamforge

# Restart
kubectl rollout restart deployment/streamforge -n streamforge

2. Missing environment variable:

ERROR Environment variable KAFKA_BOOTSTRAP not set

Solution:

kubectl set env deployment/streamforge KAFKA_BOOTSTRAP=kafka:9092 -n streamforge

3. Kafka connection failure:

ERROR Failed to connect to Kafka broker at kafka:9092: Connection refused

Solution:

Check Kafka is running
Verify bootstrap servers in config
Check network policies
Test connectivity: kubectl exec -it streamforge-xxx -n streamforge -- nc -zv kafka 9092 (prefer a TCP check over ping; Service IPs usually do not answer ICMP)

4. OOMKilled (Out of Memory):

kubectl describe pod streamforge-xxx -n streamforge

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137

Solution:

# Increase memory limits
kubectl patch deployment streamforge -n streamforge --patch '
spec:
  template:
    spec:
      containers:
      - name: streamforge
        resources:
          limits:
            memory: 4Gi
'
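Exit code 137 follows the shell convention of 128 plus the signal number, here 9 (SIGKILL), which is how the kernel OOM killer stops the process. The `Reason: OOMKilled` field is what ties it to memory, because any SIGKILL (for example one sent after a termination grace period expires) also produces 137. A small bash helper decodes any container exit code the same way; the name `describe_exit_code` is illustrative, not part of StreamForge:

```bash
#!/usr/bin/env bash
# Decode a container exit code. By shell convention, codes above 128
# mean "terminated by signal (code - 128)".
describe_exit_code() {
  local code=$1
  if (( code > 128 && code <= 192 )); then
    local sig=$(( code - 128 ))
    # `kill -l N` prints the signal name for number N (9 -> KILL)
    echo "killed by signal ${sig} (SIG$(kill -l "$sig"))"
  else
    echo "exited with status ${code}"
  fi
}

describe_exit_code 137   # prints: killed by signal 9 (SIGKILL)
```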

Issue: Container exits immediately with code 1

Diagnosis:

kubectl logs streamforge-xxx -n streamforge --previous

Common causes:

Invalid filter/transform syntax:

ERROR Failed to parse filter: /status,==,active,extra-arg

Solution:

Fix DSL syntax
Use streamforge-validate to check config
Review docs/DSL_SPEC.md for correct syntax

Permission denied (TLS certs):

ERROR Failed to read TLS certificate: Permission denied

Solution:

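# Confirm the expected keys exist (describe does not show file permissions)
kubectl describe secret kafka-tls -n streamforge

# Check ownership and permissions of the mounted files
kubectl exec -it streamforge-xxx -n streamforge -- ls -la /certs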

# Ensure securityContext allows reading
kubectl patch deployment streamforge -n streamforge --patch '
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1000
'

Performance Issues

Issue: High Consumer Lag

Symptoms:

Lag > 10000 messages
Lag growing over time
Alert: “StreamForgeHighLag”

Diagnosis:

# Check lag
kafka-consumer-groups --bootstrap-server kafka:9092 \
  --describe --group <appid>

# Check throughput
curl http://streamforge:8080/metrics | grep messages_consumed_total

# Check CPU usage
kubectl top pods -n streamforge
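Lag is easier to track as a single number. A bash sketch that sums the LAG column of the `kafka-consumer-groups --describe` output; it locates the column from the header row (the layout printed by recent Kafka releases) and counts `-` (no committed offset yet) as 0. The function name is illustrative:

```bash
#!/usr/bin/env bash
# Sum the LAG column of `kafka-consumer-groups --describe` output on stdin.
# The column is found via the header row, so extra columns do not matter;
# "-" (no committed offset yet) counts as 0.
total_lag() {
  awk '
    $1 == "GROUP" { for (i = 1; i <= NF; i++) if ($i == "LAG") col = i; next }
    col && NF >= col { if ($col != "-") sum += $col }
    END { print sum + 0 }
  '
}

# Usage (run twice a minute apart to see whether lag is growing):
#   kafka-consumer-groups --bootstrap-server kafka:9092 \
#     --describe --group <appid> | total_lag
```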

Common causes:

1. Insufficient parallelism (too few replicas):

Partitions: 16
Replicas: 2
Result: each replica consumes 8 partitions; throughput is capped at what 2 pods can process

Solution:

kubectl scale deployment streamforge --replicas=8 -n streamforge

2. CPU saturation:

CPU: 1900m/2000m (95%)

Solution:

# Increase CPU limits
kubectl patch deployment streamforge -n streamforge --patch '
spec:
  template:
    spec:
      containers:
      - name: streamforge
        resources:
          limits:
            cpu: 4000m
'

3. Complex filters/transforms:

Filter: REGEX:/email,.*@[a-z]+\.[a-z]{2,}$
Transform: CONSTRUCT with 20 fields

Solution:

Simplify filters (for example, KEY_PREFIX instead of REGEX when the match only depends on a message-key prefix)
Move complex logic upstream
Increase threads:
```
threads: 8  # increase from 4
```

4. Kafka broker slow:

Fetch wait time: 500ms (high)

Solution:

Scale Kafka brokers
Add partitions to topic

Tune fetch_max_wait_ms:

performance:
  fetch_max_wait_ms: 50  # reduce wait time
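For cause 1 (too few replicas), each replica's share is a ceiling division of partitions by replicas, and replicas beyond the partition count receive nothing. A sketch that assumes one Kafka consumer per replica, consistent with the examples in this guide; adjust if StreamForge runs several consumers per pod:

```bash
#!/usr/bin/env bash
# Given a topic's partition count and the number of consumer replicas
# (one consumer per replica assumed), report the busiest replica's
# partition count and how many replicas get no partitions at all.
partition_spread() {
  local partitions=$1 replicas=$2
  local busiest=$(( (partitions + replicas - 1) / replicas ))  # ceiling division
  local idle=0
  if (( replicas > partitions )); then
    idle=$(( replicas - partitions ))
  fi
  echo "busiest replica: ${busiest} partition(s); idle replicas: ${idle}"
}

partition_spread 16 2   # prints: busiest replica: 8 partition(s); idle replicas: 0
```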

Issue: High Processing Latency

Symptoms:

p95 latency > 100ms (was 20ms)
Slow end-to-end processing

Diagnosis:

curl http://streamforge:8080/metrics | grep processing_duration
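A Prometheus histogram exposes cumulative `_bucket` counters rather than a p95 value. This sketch estimates a quantile from one scrape by finding the first bucket whose cumulative count reaches it. It assumes a single series with buckets in ascending order; `streamforge_processing_duration_seconds` in the test data is a hypothetical name, since the guide only shows the `processing_duration` substring. Counters accumulate since pod start, so the result covers the pod's lifetime, not just recent traffic:

```bash
#!/usr/bin/env bash
# Estimate a quantile from a Prometheus histogram (text format) on stdin.
# $1: substring identifying the histogram, $2: quantile between 0 and 1.
# Prints the "le" bound of the first bucket whose cumulative count reaches
# the quantile: an upper-bound estimate (PromQL's histogram_quantile()
# interpolates inside the bucket instead).
# Assumes one series (no extra labels) and ascending "le" order.
bucket_quantile() {
  awk -v name="$1" -v q="$2" '
    /^#/ { next }
    index($0, name) && index($0, "_bucket{") && match($0, /le="[^"]*"/) {
      n++
      le[n] = substr($0, RSTART + 4, RLENGTH - 5)   # value inside le="..."
      cum[n] = $NF + 0                              # cumulative count
    }
    END {
      if (n == 0 || cum[n] == 0) { print "no observations"; exit 1 }
      for (i = 1; i <= n; i++)
        if (cum[i] >= q * cum[n]) { print le[i]; exit 0 }   # cum[n] is +Inf
    }
  '
}

# Usage:
#   curl -s http://streamforge:8080/metrics | bucket_quantile processing_duration 0.95
```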

Common causes:

1. Large batch sizes:

performance:
  batch_size: 10000  # too large

Solution:

performance:
  batch_size: 500  # smaller for lower latency
  linger_ms: 0     # send immediately

2. Slow producer (destination Kafka slow):

ERROR Producer timeout after 30000ms

Solution:

Check destination Kafka health
Increase producer timeout
Use async producer (default)

3. Commit overhead:

commit_strategy: "per-message"  # high overhead

Solution:

commit_strategy: "time-based"
commit_interval_ms: 1000  # commit every 1 second

Issue: Low Throughput

Symptoms:

Throughput < 10K msg/s (expected 50K msg/s)
CPU usage low (< 30%)

Diagnosis:

# Check threading
kubectl logs deployment/streamforge -n streamforge | grep "threads"

# Check batch sizes
kubectl logs deployment/streamforge -n streamforge | grep "batch"

Common causes:

1. Too few threads:

threads: 1  # only using 1 CPU core

Solution:

threads: 8  # match CPU cores

2. Small batches:

performance:
  fetch_min_bytes: 1       # wait for 1 byte
  batch_size: 10           # small producer batches

Solution:

performance:
  fetch_min_bytes: 10240   # 10 KB
  batch_size: 5000         # large batches
  linger_ms: 50            # allow batching

3. Too many commits:

commit_strategy: "per-message"

Solution:

commit_strategy: "time-based"
commit_interval_ms: 5000  # commit every 5 seconds

4. Compression overhead:

performance:
  compression: "gzip"  # slow

Solution:

performance:
  compression: "zstd"    # faster than gzip at a comparable ratio
  # compression: "none"  # alternative: no compression CPU cost, more network and disk use

Data Issues

Issue: Messages going to DLQ

Symptoms:

DLQ accumulating messages
Error rate > 1/s
Alert: “StreamForgeHighDLQRate”

Diagnosis:

# Sample DLQ messages
kafka-console-consumer --bootstrap-server kafka:9092 \
  --topic streamforge-dlq \
  --from-beginning \
  --property print.headers=true \
  --max-messages 10

# Check error types
kubectl logs deployment/streamforge -n streamforge | grep "Sending to DLQ"
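To see which error types dominate, count the `x-streamforge-error-type` header across a DLQ sample. The console consumer prints headers as comma-separated `key:value` pairs before the message value; this sketch relies only on that header key and on error-type names consisting of letters:

```bash
#!/usr/bin/env bash
# Count DLQ records per x-streamforge-error-type value, most frequent first.
# Input: kafka-console-consumer output with --property print.headers=true.
dlq_error_types() {
  grep -o 'x-streamforge-error-type:[A-Za-z]*' \
    | cut -d: -f2 \
    | sort | uniq -c | sort -rn
}

# Usage (--timeout-ms stops the consumer once the topic is drained):
#   kafka-console-consumer --bootstrap-server kafka:9092 \
#     --topic streamforge-dlq --from-beginning \
#     --property print.headers=true --timeout-ms 10000 | dlq_error_types
```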

Common error types:

1. FilterEvaluation error:

Headers:
  x-streamforge-error-type: FilterEvaluation
  x-streamforge-filter: /status,==,active
  x-streamforge-source-topic: users

Cause: Message missing /status field or field is not a string.

Solution:

# Make filter more lenient
filter: "OR:/status,==,active:/status,==,null"

# Or skip messages without field
filter: "AND:EXISTS:/status:/status,==,active"

2. TransformError:

Headers:
  x-streamforge-error-type: TransformError
  x-streamforge-transform: /user/nonexistent

Cause: Transform path does not exist in message.

Solution:

# Use default value
transform: "EXTRACT:/user/id,user-id,default-value"

# Or use CONSTRUCT with fallback
transform: "CONSTRUCT:id=/user/id:name=/user/name:fallback=unknown"

3. SerializationError:

Headers:
  x-streamforge-error-type: SerializationError

Cause: Transform produced invalid JSON.

Solution:

Review transform logic
Validate transform output
Use simpler transform (e.g., /data instead of CONSTRUCT)

4. Producer timeout (reported as RetryExhausted):

Headers:
  x-streamforge-error-type: RetryExhausted
  x-streamforge-retry-attempts: 3

Cause: Destination Kafka slow or unavailable, retries exhausted.

Solution:

Check destination Kafka health

Increase retry attempts:

retry:
  max_attempts: 5
  max_delay_ms: 60000

Issue: Missing Messages (Data Loss)

Symptoms:

Messages consumed but not produced
No DLQ entries
No errors logged

Diagnosis:

# Check filter logic
kubectl logs deployment/streamforge -n streamforge | grep "filtered out"

# Check transform logic
kubectl logs deployment/streamforge -n streamforge | grep "transform result: null"

# Compare consume vs produce counts
curl http://streamforge:8080/metrics | grep messages_consumed_total
curl http://streamforge:8080/metrics | grep messages_produced_total
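Comparing the two counters by eye is error-prone when they carry labels. This sketch sums every labeled sample of a counter whose name ends with the given suffix, assuming samples have no trailing timestamp. Consumed minus produced should roughly match filtered plus DLQ'd messages, but routing one message to several destinations can push produced above consumed, and a request through the Service reaches only one pod:

```bash
#!/usr/bin/env bash
# Sum all samples (every label combination) of counters whose metric name
# ends with the given suffix. Reads Prometheus text format on stdin and
# skips # HELP / # TYPE lines.
sum_counter() {
  awk -v suffix="$1" '
    /^#/ { next }
    {
      name = $1
      sub(/\{.*/, "", name)                  # strip the {label="..."} part
      start = length(name) - length(suffix) + 1
      if (start >= 1 && substr(name, start) == suffix) sum += $NF
    }
    END { printf "%.0f\n", sum }
  '
}

# Usage:
#   m=$(curl -s http://streamforge:8080/metrics)
#   echo "consumed=$(sum_counter messages_consumed_total <<<"$m")" \
#        "produced=$(sum_counter messages_produced_total <<<"$m")"
```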

Common causes:

1. Overly restrictive filter:

filter: "/status,==,active"
# If most messages have status != "active", they're filtered out

Solution:

Review filter logic
Check sample messages to verify filter correctness
Add logging to see filtered messages. In dev/staging, enable debug logging:
```yaml
env:
- name: RUST_LOG
  value: "streamforge=debug"
```

2. Transform returns null:

transform: "/user/optional-field"
# If field doesn't exist, transform returns null, message skipped

Solution:

transform: "EXTRACT:/user/optional-field,field,default-value"

3. Partition mismatch:

partitioning: "field:/user/region"
# If /user/region doesn't exist, message sent to partition -1 (error)

Solution:

Use default partitioning
Or ensure partition key field always exists

Issue: Duplicate Messages

Symptoms:

Same message ID appears multiple times in destination
At-least-once delivery expected but duplicates excessive

Diagnosis:

# Check consumer group stability
kafka-consumer-groups --bootstrap-server kafka:9092 \
  --describe --group <appid>

# Check for rebalances
kubectl logs deployment/streamforge -n streamforge | grep "rebalance"
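To measure duplicates rather than just spot them, extract message IDs from a destination sample and count repeats. A sketch that reads one ID per line; the `jq -r '.id'` extraction in the usage comment assumes a top-level `id` field, which is hypothetical:

```bash
#!/usr/bin/env bash
# Read one message ID per line on stdin; report how many distinct IDs
# appear more than once and how many surplus copies exist in total.
duplicate_stats() {
  sort | uniq -c | awk '
    $1 > 1 { dup_ids++; extra += $1 - 1 }
    END { printf "duplicated_ids=%d extra_copies=%d\n", dup_ids, extra }
  '
}

# Usage (the .id field is a placeholder for your message ID field):
#   kafka-console-consumer --bootstrap-server kafka:9092 --topic dest-topic \
#     --from-beginning --timeout-ms 10000 | jq -r '.id' | duplicate_stats
```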

Common causes:

1. Consumer group rebalancing:

Pod restarts trigger rebalance
New replicas trigger rebalance
Partitions redistributed, some messages re-consumed

Solution:

Reduce pod churn (avoid frequent restarts)
Use stable replica count

Commit more frequently:

commit_strategy: "time-based"
commit_interval_ms: 1000  # commit every 1 second

2. Producer retries:

Producer sends message
Kafka acknowledges
Acknowledgment lost (network blip)
Producer retries, message duplicated

Solution:

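This is expected with at-least-once semantics
Use an idempotent producer: librdkafka's enable.idempotence defaults to false, so confirm StreamForge enables it (it prevents duplicates from producer retries, not re-consumption after rebalances)
Implement deduplication downstream (use message ID)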

3. Manual offset reset:

Offsets reset to earlier position
Messages re-consumed

Solution:

Avoid manual offset resets
If needed, reset to specific timestamp, not “earliest”

Connectivity Issues

Issue: Cannot connect to Kafka

Symptoms:

ERROR Failed to connect to Kafka broker: Connection refused

Diagnosis:

# Test connectivity from pod
kubectl exec -it streamforge-xxx -n streamforge -- nc -zv kafka 9092

# Check DNS resolution
kubectl exec -it streamforge-xxx -n streamforge -- nslookup kafka

# Check network policies
kubectl get networkpolicy -n streamforge
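Container images often ship without `nc`. If the image has bash and coreutils `timeout`, bash's built-in `/dev/tcp` redirection can test a TCP connection instead. A sketch (the function name is illustrative):

```bash
#!/usr/bin/env bash
# Try to open a TCP connection to HOST:PORT via bash's /dev/tcp
# redirection; coreutils `timeout` aborts the attempt after 3 seconds.
can_connect() {
  local host=$1 port=$2
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OK: ${host}:${port} accepted a TCP connection"
  else
    echo "FAIL: ${host}:${port} (DNS, network policy, or nothing listening)"
    return 1
  fi
}

# Inside the pod, the same check as a one-liner:
#   kubectl exec streamforge-xxx -n streamforge -- \
#     timeout 3 bash -c 'exec 3<>/dev/tcp/kafka/9092' && echo reachable
```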

Common causes:

1. Wrong bootstrap servers:

bootstrap: "kafka:9092"  # but Kafka is at kafka.kafka.svc:9092

Solution:

bootstrap: "kafka.kafka.svc.cluster.local:9092"

2. Network policy blocking traffic:

kubectl describe networkpolicy -n streamforge

Solution:

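Add egress rules for Kafka and DNS:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: streamforge-netpol
  namespace: streamforge
spec:
  podSelector:
    matchLabels:
      app: streamforge
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          # label set automatically on every namespace (Kubernetes 1.21+)
          kubernetes.io/metadata.name: kafka
    ports:
    - protocol: TCP
      port: 9092
  # Once egress is restricted, DNS must be allowed explicitly,
  # otherwise the bootstrap hostname cannot be resolved
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```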

3. Kafka not running:

kubectl get pods -n kafka

Solution:

Start Kafka cluster
Wait for Kafka to be ready

4. TLS certificate error:

ERROR SSL handshake failed: certificate verify failed

Solution:

Check TLS config:

kafka:
  ssl:
    ca_location: "/certs/ca.crt"  # must exist

Verify secret mounted:

kubectl exec -it streamforge-xxx -n streamforge -- ls -la /certs

Check certificate validity:

kubectl exec -it streamforge-xxx -n streamforge -- openssl x509 -in /certs/ca.crt -noout -dates

Issue: SASL authentication failure

Symptoms:

ERROR SASL authentication failed: Invalid credentials

Diagnosis:

# Check SASL config
kubectl get configmap streamforge-config -n streamforge -o yaml

# Check credentials
kubectl get secret kafka-credentials -n streamforge -o yaml

Common causes:

1. Wrong SASL mechanism:

kafka:
  security:
    sasl_mechanism: "PLAIN"  # but Kafka uses SCRAM-SHA-512

Solution:

kafka:
  security:
    sasl_mechanism: "SCRAM-SHA-512"

2. Incorrect username/password:

sasl_username: "${KAFKA_USER}"  # env var not set

Solution:

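kubectl set env deployment/streamforge --from=secret/kafka-credentials -n streamforge
# Loads every key of the kafka-credentials secret as an environment variable (keys must be named KAFKA_USER and KAFKA_PASSWORD) and keeps the password off the command line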

3. Secret not mounted:

kubectl exec -it streamforge-xxx -n streamforge -- env | grep KAFKA

Solution:

spec:
  containers:
  - name: streamforge
    envFrom:
    - secretRef:
        name: kafka-credentials

Resource Issues

Issue: Out of Memory (OOMKilled)

Symptoms:

Last State: Terminated
  Reason: OOMKilled
  Exit Code: 137

Diagnosis:

kubectl describe pod streamforge-xxx -n streamforge
kubectl top pod streamforge-xxx -n streamforge

Common causes:

1. Memory limit too low:

resources:
  limits:
    memory: 512Mi  # too small

Solution:

resources:
  limits:
    memory: 4Gi

2. Large messages:

Average message size: 10 MB
Batch size: 1000
Total: 10 GB in memory

Solution:

performance:
  batch_size: 100  # reduce batch size
  fetch_max_bytes: 10485760  # 10 MB limit

3. Memory leak (rare):

Memory usage grows over time
Not correlated with load

Solution:

Restart pods periodically
Report issue to StreamForge GitHub
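The arithmetic behind cause 2 (large messages) is average message size times the number of messages buffered. A sketch that assumes `batch_size` bounds how many messages StreamForge holds at once (an assumption about its internals); it counts payload only, ignoring client buffers and transform copies, so real usage is higher:

```bash
#!/usr/bin/env bash
# Estimate payload memory (MiB) for one batch: average message size in
# bytes x batch size. Payload only; actual process memory is higher.
batch_memory_mib() {
  local avg_bytes=$1 batch_size=$2
  echo $(( avg_bytes * batch_size / 1024 / 1024 ))
}

batch_memory_mib 10485760 1000   # 10 MiB x 1000 -> prints 10000 (far above a 4Gi limit)
batch_memory_mib 10485760 100    # 10 MiB x 100  -> prints 1000
```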

Issue: CPU Throttling

Symptoms:

CPU usage at limit (100%)
Slow processing despite high CPU request

Diagnosis:

kubectl top pods -n streamforge

# Check throttling (cgroup v2 path; on cgroup v1 nodes use /sys/fs/cgroup/cpu/cpu.stat)
kubectl exec streamforge-xxx -n streamforge -- cat /sys/fs/cgroup/cpu.stat
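`cpu.stat` counters are cumulative: `nr_periods` counts CFS enforcement periods and `nr_throttled` the periods in which the container hit its CPU limit. A sketch that turns them into a percentage; both the cgroup v1 and v2 files contain these two keys:

```bash
#!/usr/bin/env bash
# Summarize CFS throttling from a cgroup cpu.stat file read on stdin.
throttle_ratio() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods == 0) { print "no CFS periods recorded (no CPU limit?)"; exit }
      printf "throttled in %.1f%% of periods (%d of %d)\n",
             100 * throttled / periods, throttled, periods
    }
  '
}

# Usage (values are cumulative; compare two readings for the current rate):
#   kubectl exec streamforge-xxx -n streamforge -- cat /sys/fs/cgroup/cpu.stat | throttle_ratio
```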

Common causes:

1. CPU limit too low:

resources:
  limits:
    cpu: 1000m  # 1 core, but workload needs 4

Solution:

resources:
  limits:
    cpu: 4000m

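2. Requests far below limits (Burstable QoS):

resources:
  requests:
    cpu: 2000m
  limits:
    cpu: 4000m  # CPU above 2000m is available only when the node has spare capacity

Solution: set requests == limits (Guaranteed QoS)

resources:
  requests:
    cpu: 2000m
  limits:
    cpu: 2000m  # predictable allocation, but still throttled above 2000m

Guaranteed QoS also requires memory requests to equal memory limits. It makes CPU allocation predictable but does not remove CFS throttling, so size the limit to what the workload actually needs.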

Issue: Disk Space Full

Symptoms:

ERROR Failed to write log: No space left on device

Diagnosis:

kubectl exec -it streamforge-xxx -n streamforge -- df -h
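To flag only the filesystems that are nearly full, parse `df -P` (POSIX output, one line per filesystem) rather than scanning `df -h` by eye. A sketch with a threshold argument:

```bash
#!/usr/bin/env bash
# Print mount points at or above a usage threshold (default 90%).
# Expects POSIX `df -P` output on stdin (Capacity in column 5).
full_filesystems() {
  local threshold=${1:-90}
  awk -v t="$threshold" '
    NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= t) print $6 " " $5 }
  '
}

# Usage:
#   kubectl exec streamforge-xxx -n streamforge -- df -P | full_filesystems 90
```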

Common causes:

1. Excessive logging:

env:
- name: RUST_LOG
  value: "debug"  # too verbose

Solution:

env:
- name: RUST_LOG
  value: "info"

2. DLQ messages accumulating locally (if local DLQ):

Solution:

Send DLQ to Kafka topic (default)
Increase volume size

3. Persistent volume full:

kubectl get pvc -n streamforge

Solution:

Increase PVC size (if storage class supports expansion)
Clean up old data

Configuration Issues

Issue: Invalid DSL Syntax

Symptoms:

ERROR Failed to parse filter: unexpected token at position 10

Diagnosis:

streamforge-validate config.yaml

Common syntax errors:

1. Missing colon:

filter: "AND/status,==,active/age,>,18"  # wrong
filter: "AND:/status,==,active:/age,>,18"  # correct

2. Unescaped special characters:

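filter: "REGEX:/email,.*@.*\.com"    # wrong ("\." is not a valid escape in a double-quoted YAML string)
filter: 'REGEX:/email,.*@.*\.com'    # correct (single quotes pass the backslash through, so the regex sees \.)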

3. Wrong operator:

filter: "/age,>=,18"  # wrong (>= not supported)
filter: "/age,>,17"   # correct (use > instead; equivalent only for integer values)

4. Mismatched quotes:

filter: "REGEX:/name,^(John|Jane)$'  # wrong (opens with a double quote, closes with a single quote)
filter: 'REGEX:/name,^(John|Jane)$'  # correct

Solution:

Use streamforge-validate before deploying
Review docs/DSL_SPEC.md for syntax
Test config locally first

Issue: Deprecated Syntax Warning

Symptoms:

WARNING Deprecated syntax: KEY_SUFFIX is deprecated, use KEY_MATCHES instead

Diagnosis:

streamforge-validate config.yaml

Solution:

# Old (deprecated)
filter: "KEY_SUFFIX:-prod"

# New
filter: 'KEY_MATCHES:.*-prod$'

Migration guide: docs/DSL_SPEC.md (Backward Compatibility section)

Issue: Config not reloading

Symptoms:

Updated ConfigMap
Pods still using old config

Diagnosis:

kubectl get configmap streamforge-config -n streamforge -o yaml
kubectl exec -it streamforge-xxx -n streamforge -- cat /app/config.yaml
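Reading both outputs by eye misses small changes. A sketch that diffs them; the ConfigMap key `config.yaml` in the usage comment is an assumption, so match it to your ConfigMap:

```bash
#!/usr/bin/env bash
# Diff the ConfigMap content against the file mounted in the pod.
# Both arguments are paths (process substitution works). Prints a unified
# diff and returns non-zero when they differ.
config_drift() {
  if diff -u "$1" "$2"; then
    echo "in sync"
  else
    echo "DRIFT: the pod is not seeing the current ConfigMap content" >&2
    return 1
  fi
}

# Usage (a difference that is only "\ No newline at end of file" is harmless):
#   config_drift \
#     <(kubectl get configmap streamforge-config -n streamforge \
#         -o jsonpath='{.data.config\.yaml}') \
#     <(kubectl exec streamforge-xxx -n streamforge -- cat /app/config.yaml)
```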

Causes:

1. ConfigMap not propagated:

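Kubelet refreshes mounted ConfigMap volumes periodically, so an update can take a minute or more to appear; files mounted with subPath and environment variables sourced from a ConfigMap are never updated in a running pod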

Solution:

# Force restart
kubectl rollout restart deployment/streamforge -n streamforge

2. Hot-reload not enabled:

StreamForge requires restart for config changes

Solution:

Always restart after ConfigMap update

Kafka Issues

Issue: Consumer group lag not decreasing

Symptoms:

StreamForge running, no errors
Lag stays at 10000, not decreasing

Diagnosis:

kafka-consumer-groups --bootstrap-server kafka:9092 \
  --describe --group <appid>

Common causes:

1. More replicas than partitions:

Partitions: 4
Replicas: 8
Result: 4 replicas consume, 4 are idle

Solution:

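Scale replicas to match partitions: kubectl scale deployment streamforge --replicas=4 -n streamforge
Or add more partitions: kafka-topics --bootstrap-server kafka:9092 --alter --topic source-topic --partitions 8 (partition counts can only grow, and keyed messages may map to different partitions afterwards)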

2. Consumer group rebalancing:

Consumer rebalancing...

Solution:

Wait for rebalance to complete (30-60 seconds)
Reduce pod churn

3. Kafka brokers overloaded:

Fetch latency: 5000ms

Solution:

Scale Kafka brokers
Tune Kafka performance

Issue: Topic does not exist

Symptoms:

ERROR Topic 'nonexistent-topic' does not exist

Diagnosis:

kafka-topics --bootstrap-server kafka:9092 --list

Solution:

Option 1: Create topic

kafka-topics --bootstrap-server kafka:9092 \
  --create --topic output-topic \
  --partitions 16 \
  --replication-factor 3

Option 2: Enable auto-create

# Kafka broker config
auto.create.topics.enable=true

Option 3: Fix topic name in config

routing:
  destinations:
    - output: "output-topic"  # ensure spelling is correct

Issue: Partition count mismatch

Symptoms:

Some partitions have high lag
Others have zero lag
Unbalanced consumption

Diagnosis:

kafka-consumer-groups --bootstrap-server kafka:9092 \
  --describe --group <appid>
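Skew shows up as one partition carrying most of the lag. A sketch that reports the partition with the largest LAG and its share of the total, locating TOPIC, PARTITION, and LAG from the header row of the `--describe` output (function name illustrative):

```bash
#!/usr/bin/env bash
# Report the most-lagging partition and its share of total lag, from
# `kafka-consumer-groups --describe` output on stdin. "-" lag counts as 0.
hottest_partition() {
  awk '
    $1 == "GROUP" {
      for (i = 1; i <= NF; i++) {
        if ($i == "TOPIC") tc = i
        if ($i == "PARTITION") pc = i
        if ($i == "LAG") lc = i
      }
      next
    }
    lc && NF >= lc {
      lag = ($lc == "-") ? 0 : $lc + 0
      total += lag
      if (lag > max) { max = lag; hot = $tc "-" $pc }
    }
    END {
      if (total == 0) { print "no lag"; exit }
      printf "%s holds %d of %d lagging messages (%.0f%%)\n",
             hot, max, total, 100 * max / total
    }
  '
}

# Usage:
#   kafka-consumer-groups --bootstrap-server kafka:9092 \
#     --describe --group <appid> | hottest_partition
```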

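Cause:

Producer uses key-based partitioning
Keys are skewed (e.g., 80% have key "default")
Most messages go to one partition

The partitioning options below control how StreamForge distributes its own output; skew in a topic that StreamForge consumes has to be fixed by that topic's producer.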

Solution:

Option 1: Use random partitioning (spreads load evenly but gives up per-key ordering)

partitioning: "random"

Option 2: Use field-based partitioning with uniform distribution

partitioning: "field:/user/id"  # if user IDs are uniformly distributed

Option 3: Add more partitions (helps when many distinct keys collide; a single hot key still maps to one partition)

kafka-topics --bootstrap-server kafka:9092 \
  --alter --topic source-topic \
  --partitions 32

Debug Commands

Enable Debug Logging

Temporarily (current pod):

kubectl exec -it streamforge-xxx -n streamforge -- kill -USR1 1
# Toggles debug logging for duration of pod lifetime

Permanently (all pods):

kubectl set env deployment/streamforge RUST_LOG=streamforge=debug -n streamforge

Restore info logging:

kubectl set env deployment/streamforge RUST_LOG=streamforge=info -n streamforge

Inspect Message Contents

Sample source topic:

kafka-console-consumer --bootstrap-server kafka:9092 \
  --topic source-topic \
  --property print.key=true \
  --property print.headers=true \
  --property print.timestamp=true \
  --max-messages 10

Sample destination topic:

kafka-console-consumer --bootstrap-server kafka:9092 \
  --topic dest-topic \
  --property print.key=true \
  --max-messages 10

Sample DLQ:

kafka-console-consumer --bootstrap-server kafka:9092 \
  --topic streamforge-dlq \
  --from-beginning \
  --property print.headers=true \
  --max-messages 10

Profile Performance

CPU profiling:

kubectl exec -it streamforge-xxx -n streamforge -- kill -SIGUSR2 1
# Outputs CPU profile to /tmp/cpu-profile.txt
kubectl cp streamforge-xxx:/tmp/cpu-profile.txt ./cpu-profile.txt -n streamforge

Memory profiling:

kubectl exec streamforge-xxx -n streamforge -- cat /proc/1/status
# Assumes StreamForge runs as PID 1, as the kill commands above do; VmRSS is current resident memory, VmHWM the peak

Test Filters/Transforms Locally

Test config:

# Use dry-run mode (if available)
docker run --rm \
  -v $(pwd)/config.yaml:/app/config.yaml:ro \
  streamforge:1.0.0 \
  --config /app/config.yaml \
  --dry-run

Validate config:

streamforge-validate config.yaml --verbose

Force Consumer Rebalance

Restart single pod:

kubectl delete pod streamforge-xxx -n streamforge

Restart all pods:

kubectl rollout restart deployment/streamforge -n streamforge

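Start over with a new consumer group (this does not rebalance the existing group; it abandons that group's committed offsets):

appid: "streamforge-prod-v2"  # new group ID
offset: "latest"  # start from latest; messages the old group had not yet consumed are skipped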

Check Kafka Broker Health

Broker API versions:

kafka-broker-api-versions --bootstrap-server kafka:9092

Topic metadata:

kafka-topics --bootstrap-server kafka:9092 \
  --describe --topic source-topic

Consumer group state:

kafka-consumer-groups --bootstrap-server kafka:9092 \
  --describe --group <appid> \
  --state

Capture Metrics Snapshot

Export all metrics:

curl http://streamforge:8080/metrics > metrics-$(date +%s).txt

Query specific metrics:

curl -s http://streamforge:8080/metrics | grep -E "(lag|error|duration)"

Getting Help

Check Documentation

DSL Specification - Filter/transform syntax
Deployment Guide - Deployment options
Operations Guide - Day-to-day operations
Architecture - System design

Enable Verbose Logging

env:
- name: RUST_LOG
  value: "streamforge=debug,rdkafka=info"
- name: RUST_BACKTRACE
  value: "full"

Collect Diagnostic Bundle

#!/usr/bin/env bash
# collect-diagnostics.sh
# Usage: APP_ID=<appid> ./collect-diagnostics.sh
# Keeps going if one command fails, so a partial bundle is still collected.
set -uo pipefail

NS="${NS:-streamforge}"
APP_ID="${APP_ID:?set APP_ID to the StreamForge consumer group (appid)}"
OUT_DIR="diagnostics/$(date +%Y-%m-%d)"

mkdir -p "$OUT_DIR"
cd "$OUT_DIR" || exit 1

# Pod status
kubectl get pods -n "$NS" -o wide > pods.txt

# Logs
kubectl logs deployment/streamforge -n "$NS" --tail=1000 > logs.txt

# Config
kubectl get configmap streamforge-config -n "$NS" -o yaml > config.yaml

# Metrics
curl -s http://streamforge:8080/metrics > metrics.txt

# Consumer group
kafka-consumer-groups --bootstrap-server kafka:9092 \
  --describe --group "$APP_ID" > consumer-group.txt

# Events
kubectl get events -n "$NS" --sort-by='.lastTimestamp' > events.txt

# Resource usage
kubectl top pods -n "$NS" > resources.txt

echo "Diagnostics collected in $OUT_DIR/"

Report Issues

GitHub Issues: https://github.com/rahulbsw/streamforge/issues

Include:

StreamForge version
Kubernetes version
Kafka version
Config file (redact sensitive data)
Logs (last 100 lines)
Error messages
Steps to reproduce

Issue Decision Tree

Is StreamForge running?
  ├─ No → Check startup issues
  │        └─ CrashLoopBackOff? → Check logs for config errors
  │        └─ Pending? → Check resource availability
  │        └─ ImagePullBackOff? → Check image registry
  │
  └─ Yes → Check metrics
           ├─ High lag? → Check performance issues
           │             └─ CPU high? → Scale up or add threads
           │             └─ CPU low? → Increase batch sizes
           │
           ├─ High errors? → Check DLQ headers
           │               └─ FilterEvaluation? → Fix filter logic
           │               └─ ProducerTimeout? → Check destination Kafka
           │
           ├─ Zero throughput? → Check connectivity
           │                    └─ Kafka connection error? → Check network
           │                    └─ SASL error? → Check credentials
           │
           └─ Duplicates? → Check commit strategy
                           └─ Frequent rebalances? → Reduce pod churn
                           └─ Manual offset reset? → Avoid resets

Document Version: 1.0.0
Last Updated: 2026-04-18
Feedback: https://github.com/rahulbsw/streamforge/issues
