StreamForge Troubleshooting Guide
Version: 1.0.0
Last Updated: 2026-04-18
This guide covers common issues, symptoms, causes, and solutions for StreamForge operations.
Table of Contents
- Quick Diagnosis
- Startup Issues
- Performance Issues
- Data Issues
- Connectivity Issues
- Resource Issues
- Configuration Issues
- Kafka Issues
- Debug Commands
Quick Diagnosis
Health Check Commands
# 1. Check pod status
kubectl get pods -n streamforge
# 2. Check logs
kubectl logs -f deployment/streamforge -n streamforge --tail=50
# 3. Check metrics
curl http://streamforge:8080/metrics | grep -E "(up|error|lag)"
# 4. Check consumer group
kafka-consumer-groups --bootstrap-server kafka:9092 \
--describe --group <appid>
# 5. Check resource usage
kubectl top pods -n streamforge
Common Symptoms Quick Reference
| Symptom | Likely Cause | Quick Fix |
|---|---|---|
| Pod stuck in CrashLoopBackOff | Config error or missing secret | Check logs, validate config |
| Consumer lag growing | Insufficient replicas or CPU | Scale up |
| High error rate | Bad data or config mismatch | Check DLQ headers |
| Zero throughput | Kafka connection failure | Check connectivity |
| High memory usage | Large messages or memory leak | Increase limits, restart |
| Slow processing | Complex filters or transforms | Optimize DSL, add threads |
Startup Issues
Issue: Pod stuck in Pending
Symptoms:
NAME READY STATUS RESTARTS AGE
streamforge-7c8f9d4b6-abc12 0/1 Pending 0 5m
Diagnosis:
kubectl describe pod streamforge-7c8f9d4b6-abc12 -n streamforge
Common causes:
1. Insufficient resources:
Events:
Warning FailedScheduling 5m default-scheduler 0/3 nodes are available: insufficient cpu.
Solution:
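- Reduce resource requests (the scheduler places pods based on requests, not limits)
- Add more nodes to the cluster
- If only limits are set, Kubernetes uses them as the requests; set smaller, explicit requests instead
kubectl patch deployment streamforge -n streamforge --patch '
spec:
  template:
    spec:
      containers:
      - name: streamforge
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
'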
2. ImagePullBackOff:
Events:
Warning Failed 5m kubelet Failed to pull image "streamforge:1.0.0": rpc error: code = Unknown
Solution:
- Check that the image exists:
docker pull streamforge:1.0.0
- Check the image pull secret:
kubectl get secret -n streamforge
- Use the correct image registry
3. PVC not bound:
Events:
Warning FailedMount 5m kubelet Unable to attach or mount volumes
Solution:
- Check PVC status:
kubectl get pvc -n streamforge
- Create the missing PV/PVC
- Remove the volume if it is not needed
Issue: Pod stuck in Init:Error
Symptoms:
NAME READY STATUS RESTARTS AGE
streamforge-7c8f9d4b6-abc12 0/1 Init:Error 0 2m
Diagnosis:
kubectl logs streamforge-7c8f9d4b6-abc12 -c init-container -n streamforge
kubectl describe pod streamforge-7c8f9d4b6-abc12 -n streamforge
Common causes:
Init container failed:
- Check init container logs
- Verify dependencies (e.g., Kafka must be reachable)
- Fix init script
Solution: Remove init container if not essential, or fix the init logic.
Issue: Pod CrashLoopBackOff
Symptoms:
NAME READY STATUS RESTARTS AGE
streamforge-7c8f9d4b6-abc12 0/1 CrashLoopBackOff 5 5m
Diagnosis:
# Check current logs
kubectl logs streamforge-7c8f9d4b6-abc12 -n streamforge
# Check previous crashed instance
kubectl logs streamforge-7c8f9d4b6-abc12 -n streamforge --previous
Common causes:
1. Config parse error:
ERROR Failed to parse config: invalid YAML at line 10
Solution:
# Validate config
streamforge-validate config.yaml
# Fix ConfigMap
kubectl edit configmap streamforge-config -n streamforge
# Restart
kubectl rollout restart deployment/streamforge -n streamforge
2. Missing environment variable:
ERROR Environment variable KAFKA_BOOTSTRAP not set
Solution:
kubectl set env deployment/streamforge KAFKA_BOOTSTRAP=kafka:9092 -n streamforge
3. Kafka connection failure:
ERROR Failed to connect to Kafka broker at kafka:9092: Connection refused
Solution:
- Check Kafka is running
- Verify bootstrap servers in config
- Check network policies
- Test TCP connectivity (Service IPs usually do not answer ping, so test the port instead):
kubectl exec -it streamforge-xxx -n streamforge -- nc -zv kafka 9092
4. OOMKilled (Out of Memory):
kubectl describe pod streamforge-xxx -n streamforge
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Solution:
# Increase memory limits
kubectl patch deployment streamforge -n streamforge --patch '
spec:
  template:
    spec:
      containers:
      - name: streamforge
        resources:
          limits:
            memory: 4Gi
'
Issue: Container exits immediately with code 1
Diagnosis:
kubectl logs streamforge-xxx -n streamforge --previous
Common causes:
Invalid filter/transform syntax:
ERROR Failed to parse filter: /status,==,active,extra-arg
Solution:
- Fix the DSL syntax
- Use streamforge-validate to check the config
- Review docs/DSL_SPEC.md for the correct syntax
Permission denied (TLS certs):
ERROR Failed to read TLS certificate: Permission denied
Solution:
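# Check that the secret exists and contains the expected keys
kubectl describe secret kafka-tls -n streamforge
# Check ownership and permissions of the mounted files
kubectl exec -it streamforge-xxx -n streamforge -- ls -la /certs
# Ensure securityContext allows reading
kubectl patch deployment streamforge -n streamforge --patch '
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1000
'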
Performance Issues
Issue: High Consumer Lag
Symptoms:
- Lag > 10000 messages
- Lag growing over time
- Alert: “StreamForgeHighLag”
Diagnosis:
# Check lag
kafka-consumer-groups --bootstrap-server kafka:9092 \
--describe --group <appid>
# Check throughput
curl http://streamforge:8080/metrics | grep messages_consumed_total
# Check CPU usage
kubectl top pods -n streamforge
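To turn the consumer-group output into a single lag figure, a small awk filter can sum the LAG column and report the busiest partition. This is a sketch that assumes the column layout printed by recent Kafka releases (GROUP, TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG, ...); `lag_summary` is a hypothetical helper, not a StreamForge or Kafka tool.

```bash
#!/usr/bin/env bash
# lag_summary (hypothetical helper): summarize `kafka-consumer-groups --describe`
# output read from stdin. Assumes PARTITION is column 3 and LAG is column 6.
lag_summary() {
  awk '
    BEGIN { total = 0; max = -1; top = "" }
    # Data rows have a numeric PARTITION; LAG is "-" when nothing has been
    # committed yet, so only numeric lag values are counted.
    $3 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {
      total += $6
      if ($6 + 0 > max) { max = $6 + 0; top = $2 "-" $3 }
    }
    END {
      if (total == 0) { print "total_lag=0"; exit }
      # A large top_share means one partition holds most of the lag (key skew).
      printf "total_lag=%.0f max_lag=%.0f top_partition=%s top_share=%.0f%%\n",
             total, max, top, 100 * max / total
    }'
}
```

Usage: `kafka-consumer-groups --bootstrap-server kafka:9092 --describe --group <appid> | lag_summary`. Running it a few minutes apart shows whether total lag is growing or shrinking.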
Common causes:
1. Insufficient parallelism (too few replicas):
Partitions: 16
Replicas: 2
Result: each replica consumes 8 partitions; only 2 consumers share the load
Solution:
kubectl scale deployment streamforge --replicas=8 -n streamforge
2. CPU saturation:
CPU: 1900m/2000m (95%)
Solution:
# Increase CPU limits
kubectl patch deployment streamforge -n streamforge --patch '
spec:
template:
spec:
containers:
- name: streamforge
resources:
limits:
cpu: 4000m
'
3. Complex filters/transforms:
Filter: REGEX:/email,.*@[a-z]+\.[a-z]{2,}$
Transform: CONSTRUCT with 20 fields
Solution:
- Simplify filters where possible (e.g., KEY_PREFIX instead of REGEX when matching on the message key)
- Move complex logic upstream
- Increase threads:
threads: 8 # increase from 4
4. Kafka broker slow:
Fetch wait time: 500ms (high)
Solution:
- Scale Kafka brokers
- Add partitions to topic
- Tune fetch_max_wait_ms:
performance:
  fetch_max_wait_ms: 50  # reduce wait time
Issue: High Processing Latency
Symptoms:
- p95 latency > 100ms (was 20ms)
- Slow end-to-end processing
Diagnosis:
curl http://streamforge:8080/metrics | grep processing_duration
Common causes:
1. Large batch sizes:
performance:
  batch_size: 10000  # too large
Solution:
performance:
  batch_size: 500  # smaller for lower latency
  linger_ms: 0     # send immediately
2. Slow producer (destination Kafka slow):
ERROR Producer timeout after 30000ms
Solution:
- Check destination Kafka health
- Increase producer timeout
- Use async producer (default)
3. Commit overhead:
commit_strategy: "per-message" # high overhead
Solution:
commit_strategy: "time-based"
commit_interval_ms: 1000 # commit every 1 second
Issue: Low Throughput
Symptoms:
- Throughput < 10K msg/s (expected 50K msg/s)
- CPU usage low (< 30%)
Diagnosis:
# Check threading
kubectl logs deployment/streamforge -n streamforge | grep "threads"
# Check batch sizes
kubectl logs deployment/streamforge -n streamforge | grep "batch"
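Because `threads` should track the CPU the container is actually allowed to use, it helps to read the limit from inside the pod; `nproc` reports the CPUs visible to the process, not the container's CPU quota. A minimal sketch, assuming cgroup v2, where `/sys/fs/cgroup/cpu.max` holds `<quota> <period>` in microseconds, or `max <period>` when no limit is set; `threads_for_cpu_limit` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# threads_for_cpu_limit (hypothetical helper): read a cgroup v2 cpu.max line
# from stdin and print the CPU limit in whole cores, rounded up.
threads_for_cpu_limit() {
  local quota period
  read -r quota period
  if [ "$quota" = "max" ]; then
    echo "unlimited"  # no CPU limit set on the container
  else
    # Ceiling division: a 1.5-core limit (150000/100000) suggests 2 threads.
    echo $(( (quota + period - 1) / period ))
  fi
}
```

Usage: `kubectl exec streamforge-xxx -n streamforge -- cat /sys/fs/cgroup/cpu.max | threads_for_cpu_limit`. Rounding up is a judgment call; rounding down avoids oversubscribing a fractional limit.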
Common causes:
1. Too few threads:
threads: 1 # only using 1 CPU core
Solution:
threads: 8 # match CPU cores
2. Small batches:
performance:
  fetch_min_bytes: 1  # broker responds as soon as 1 byte is available
  batch_size: 10      # small producer batches
Solution:
performance:
  fetch_min_bytes: 10240  # 10 KB
  batch_size: 5000        # large batches
  linger_ms: 50           # allow batching
3. Too many commits:
commit_strategy: "per-message"
Solution:
commit_strategy: "time-based"
commit_interval_ms: 5000  # commit every 5 seconds
4. Compression overhead:
performance:
  compression: "gzip"  # slow
Solution:
performance:
  compression: "zstd"  # faster
# or
performance:
  compression: "none"  # no compression overhead
Data Issues
Issue: Messages going to DLQ
Symptoms:
- DLQ accumulating messages
- Error rate > 1/s
- Alert: “StreamForgeHighDLQRate”
Diagnosis:
# Sample DLQ messages
kafka-console-consumer --bootstrap-server kafka:9092 \
--topic streamforge-dlq \
--property print.headers=true \
--max-messages 10
# Check error types
kubectl logs deployment/streamforge -n streamforge | grep "Sending to DLQ"
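With more than a handful of DLQ messages, counting them by `x-streamforge-error-type` shows which of the error types below dominates. A sketch that reads `kafka-console-consumer --property print.headers=true` output on stdin; it matches the header with a regex instead of splitting on commas, because header values such as filter expressions contain commas. `dlq_error_counts` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# dlq_error_counts (hypothetical helper): count DLQ messages per error type.
# Input: console-consumer output with headers printed, one message per line.
dlq_error_counts() {
  grep -oE 'x-streamforge-error-type:[A-Za-z]+' \
    | cut -d: -f2 \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'  # reorder to "<type> <count>"
}
```

Usage: `kafka-console-consumer --bootstrap-server kafka:9092 --topic streamforge-dlq --property print.headers=true --max-messages 1000 --timeout-ms 10000 | dlq_error_counts`. The timeout keeps the consumer from waiting indefinitely when the DLQ holds fewer messages than requested.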
Common error types:
1. FilterEvaluation error:
Headers:
x-streamforge-error-type: FilterEvaluation
x-streamforge-filter: /status,==,active
x-streamforge-source-topic: users
Cause: Message missing /status field or field is not a string.
Solution:
# Make filter more lenient
filter: "OR:/status,==,active:/status,==,null"
# Or skip messages without field
filter: "AND:EXISTS:/status:/status,==,active"
2. TransformError:
Headers:
x-streamforge-error-type: TransformError
x-streamforge-transform: /user/nonexistent
Cause: Transform path does not exist in message.
Solution:
# Use default value
transform: "EXTRACT:/user/id,user-id,default-value"
# Or use CONSTRUCT with fallback
transform: "CONSTRUCT:id=/user/id:name=/user/name:fallback=unknown"
3. SerializationError:
Headers:
x-streamforge-error-type: SerializationError
Cause: Transform produced invalid JSON.
Solution:
- Review transform logic
- Validate transform output
- Use simpler transform (e.g., /data instead of CONSTRUCT)
4. ProducerTimeout (reported as RetryExhausted):
Headers:
x-streamforge-error-type: RetryExhausted
x-streamforge-retry-attempts: 3
Cause: Destination Kafka slow or unavailable, retries exhausted.
Solution:
- Check destination Kafka health
- Increase retry attempts:
retry:
  max_attempts: 5
  max_delay_ms: 60000
Issue: Missing Messages (Data Loss)
Symptoms:
- Messages consumed but not produced
- No DLQ entries
- No errors logged
Diagnosis:
# Check filter logic
kubectl logs deployment/streamforge -n streamforge | grep "filtered out"
# Check transform logic
kubectl logs deployment/streamforge -n streamforge | grep "transform result: null"
# Compare consume vs produce counts
curl http://streamforge:8080/metrics | grep messages_consumed_total
curl http://streamforge:8080/metrics | grep messages_produced_total
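Comparing the two counters by eye gets awkward when they carry per-topic labels. A sketch that sums each counter across label sets from Prometheus text output and prints the gap; it assumes the metric names used in the grep commands above and no trailing timestamps on sample lines (pass the full name to `sum_counter` if your metrics are prefixed). Both helpers are hypothetical.

```bash
#!/usr/bin/env bash
# sum_counter (hypothetical helper): sum one Prometheus counter across all
# label sets. Reads text-format metrics on stdin; assumes the value is the
# last field on each line (no trailing timestamps).
sum_counter() {
  local name=$1
  awk -v name="$name" '
    # Match "name" or "name{...}" exactly, not longer names like "name_created".
    index($1, name) == 1 && (length($1) == length(name) || substr($1, length(name) + 1, 1) == "{") {
      sum += $NF
    }
    END { printf "%.0f\n", sum + 0 }'
}

# consume_produce_gap (hypothetical helper): consumed minus produced.
consume_produce_gap() {
  local metrics consumed produced
  metrics=$(cat)
  consumed=$(printf '%s\n' "$metrics" | sum_counter messages_consumed_total)
  produced=$(printf '%s\n' "$metrics" | sum_counter messages_produced_total)
  echo "consumed=$consumed produced=$produced gap=$(( consumed - produced ))"
}
```

Usage: `curl -s http://streamforge:8080/metrics | consume_produce_gap`. A positive gap is expected when filters drop messages; with fan-out to several destinations, produced can exceed consumed.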
Common causes:
1. Overly restrictive filter:
filter: "/status,==,active"
# If most messages have status != "active", they're filtered out
Solution:
- Review filter logic
- Check sample messages to verify filter correctness
- Add logging to see filtered messages:
```yaml
# In dev/staging, enable debug logging
env:
  - name: RUST_LOG
    value: "streamforge=debug"
```
2. Transform returns null:
transform: "/user/optional-field"
# If field doesn't exist, transform returns null, message skipped
Solution:
transform: "EXTRACT:/user/optional-field,field,default-value"
3. Partition mismatch:
partitioning: "field:/user/region"
# If /user/region doesn't exist, message sent to partition -1 (error)
Solution:
- Use default partitioning
- Or ensure partition key field always exists
Issue: Duplicate Messages
Symptoms:
- Same message ID appears multiple times in destination
- At-least-once delivery expected but duplicates excessive
Diagnosis:
# Check consumer group stability
kafka-consumer-groups --bootstrap-server kafka:9092 \
--describe --group <appid>
# Check for rebalances
kubectl logs deployment/streamforge -n streamforge | grep "rebalance"
Common causes:
1. Consumer group rebalancing:
- Pod restarts trigger rebalance
- New replicas trigger rebalance
- Partitions redistributed, some messages re-consumed
Solution:
- Reduce pod churn (avoid frequent restarts)
- Use stable replica count
- Commit more frequently:
commit_strategy: "time-based"
commit_interval_ms: 1000  # commit every 1 second
2. Producer retries:
- Producer sends message
- Kafka acknowledges
- Acknowledgment lost (network blip)
- Producer retries, message duplicated
Solution:
- This is expected with at-least-once semantics
- Use the idempotent producer (librdkafka's enable.idempotence; it defaults to false in librdkafka, so confirm it is turned on)
- Implement deduplication downstream (use message ID)
3. Manual offset reset:
- Offsets reset to earlier position
- Messages re-consumed
Solution:
- Avoid manual offset resets
- If needed, reset to specific timestamp, not “earliest”
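Downstream deduplication by message ID, suggested above for producer retries, comes down to remembering which IDs have been seen and dropping repeats. A minimal sketch in awk, assuming the message key is the message ID and records arrive as `key<TAB>value` lines (the default output of `kafka-console-consumer --property print.key=true`); `dedupe_by_key` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# dedupe_by_key (hypothetical helper): keep the first record seen for each key.
# Input: "key<TAB>value" lines; the key is assumed to be the message ID.
dedupe_by_key() {
  # seen[key]++ evaluates to 0 (false) the first time a key appears, so
  # !seen[key]++ prints only the first occurrence. The set is never pruned.
  awk -F '\t' '!seen[$1]++'
}
```

The in-memory set grows without bound, so a long-running consumer would need to expire old IDs (for example, keep only IDs from a recent time window). On a sampled batch, comparing `wc -l` before and after the filter gives a quick duplicate count.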
Connectivity Issues
Issue: Cannot connect to Kafka
Symptoms:
ERROR Failed to connect to Kafka broker: Connection refused
Diagnosis:
# Test connectivity from pod
kubectl exec -it streamforge-xxx -n streamforge -- nc -zv kafka 9092
# Check DNS resolution
kubectl exec -it streamforge-xxx -n streamforge -- nslookup kafka
# Check network policies
kubectl get networkpolicy -n streamforge
Common causes:
1. Wrong bootstrap servers:
bootstrap: "kafka:9092" # but Kafka is at kafka.kafka.svc:9092
Solution:
bootstrap: "kafka.kafka.svc.cluster.local:9092"
2. Network policy blocking traffic:
kubectl describe networkpolicy -n streamforge
Solution:
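- Add an egress rule for Kafka (plus DNS, which an Egress policy otherwise blocks):
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: streamforge-netpol
  namespace: streamforge
spec:
  podSelector:
    matchLabels:
      app: streamforge
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              # assumes the kafka namespace is labeled name=kafka; recent clusters
              # also set kubernetes.io/metadata.name=kafka automatically
              name: kafka
      ports:
        - protocol: TCP
          port: 9092
    # Allow DNS lookups so service names like kafka.kafka.svc still resolve
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```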
3. Kafka not running:
kubectl get pods -n kafka
Solution:
- Start Kafka cluster
- Wait for Kafka to be ready
4. TLS certificate error:
ERROR SSL handshake failed: certificate verify failed
Solution:
- Check TLS config:
kafka:
  ssl:
    ca_location: "/certs/ca.crt"  # must exist
- Verify the secret is mounted:
kubectl exec -it streamforge-xxx -n streamforge -- ls -la /certs
- Check certificate validity:
kubectl exec -it streamforge-xxx -n streamforge -- openssl x509 -in /certs/ca.crt -noout -dates
Issue: SASL authentication failure
Symptoms:
ERROR SASL authentication failed: Invalid credentials
Diagnosis:
# Check SASL config
kubectl get configmap streamforge-config -n streamforge -o yaml
# Check credentials
kubectl get secret kafka-credentials -n streamforge -o yaml
Common causes:
1. Wrong SASL mechanism:
kafka:
  security:
    sasl_mechanism: "PLAIN"  # but Kafka uses SCRAM-SHA-512
Solution:
kafka:
  security:
    sasl_mechanism: "SCRAM-SHA-512"
2. Incorrect username/password:
sasl_username: "${KAFKA_USER}" # env var not set
Solution:
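kubectl set env deployment/streamforge KAFKA_USER=myuser KAFKA_PASSWORD=mypass -n streamforge
# Preferred: load the variables from a Secret so the password is not stored in the Deployment spec
kubectl set env deployment/streamforge --from=secret/kafka-credentials -n streamforge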
3. Secret not mounted:
kubectl exec -it streamforge-xxx -n streamforge -- env | grep KAFKA
Solution:
spec:
  containers:
  - name: streamforge
    envFrom:
    - secretRef:
        name: kafka-credentials
Resource Issues
Issue: Out of Memory (OOMKilled)
Symptoms:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Diagnosis:
kubectl describe pod streamforge-xxx -n streamforge
kubectl top pod streamforge-xxx -n streamforge
Common causes:
1. Memory limit too low:
resources:
  limits:
    memory: 512Mi  # too small
Solution:
resources:
  limits:
    memory: 4Gi
2. Large messages:
Average message size: 10 MB
Batch size: 1000
Total: 10 GB in memory
Solution:
performance:
  batch_size: 100            # reduce batch size
  fetch_max_bytes: 10485760  # 10 MB limit
3. Memory leak (rare):
- Memory usage grows over time
- Not correlated with load
Solution:
- Restart pods periodically
- Report issue to StreamForge GitHub
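The large-message arithmetic above (10 MB x 1000 messages, about 10 GB) generalizes to a quick check of batch size against the memory limit. A rough sketch: it counts only message payloads and ignores client buffers and allocator overhead, so treat the result as a lower bound. `batch_memory_check` is a hypothetical helper, and sizes are whole MiB.

```bash
#!/usr/bin/env bash
# batch_memory_check (hypothetical helper): rough payload memory for one batch.
# Args: <avg message size MiB> <batch_size> <memory limit MiB>
batch_memory_check() {
  local msg_mib=$1 batch_size=$2 limit_mib=$3
  if [ "$msg_mib" -lt 1 ]; then
    echo "message size must be at least 1 MiB" >&2
    return 1
  fi
  local needed=$(( msg_mib * batch_size ))
  # Largest batch whose payload alone fits under the limit (no headroom left).
  local max_batch=$(( limit_mib / msg_mib ))
  echo "needed_mib=$needed limit_mib=$limit_mib max_batch_size=$max_batch"
}
```

For example, `batch_memory_check 10 100 4096` shows that the reduced `batch_size: 100` needs about 1000 MiB of payload, which fits in a 4Gi limit, while 1000 messages would not.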
Issue: CPU Throttling
Symptoms:
- CPU usage at limit (100%)
- Slow processing despite high CPU request
Diagnosis:
kubectl top pods -n streamforge
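# Check throttling (compare nr_throttled with nr_periods); cgroup v1 path:
kubectl exec -it streamforge-xxx -n streamforge -- cat /sys/fs/cgroup/cpu/cpu.stat
# cgroup v2 path:
kubectl exec -it streamforge-xxx -n streamforge -- cat /sys/fs/cgroup/cpu.stat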
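The raw counters are easier to judge as a ratio: the share of CFS scheduling periods in which the container was throttled. A sketch, assuming the `nr_periods` and `nr_throttled` fields that both cgroup v1 and v2 `cpu.stat` files contain; `throttle_pct` is a hypothetical helper, and the counters are cumulative since the container started.

```bash
#!/usr/bin/env bash
# throttle_pct (hypothetical helper): percentage of CFS periods that were
# throttled, read from cpu.stat content on stdin (cgroup v1 or v2).
throttle_pct() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%.1f\n", 100 * throttled / periods
      else print "0.0"  # no CPU limit enforced, or no periods recorded yet
    }'
}
```

Usage: `kubectl exec streamforge-xxx -n streamforge -- cat /sys/fs/cgroup/cpu.stat | throttle_pct` (omit `-it` when piping). A persistently high percentage suggests the CPU limit is too low for the workload.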
Common causes:
1. CPU limit too low:
resources:
  limits:
    cpu: 1000m  # 1 core, but workload needs 4
Solution:
resources:
  limits:
    cpu: 4000m
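2. Requests lower than limits (Burstable QoS):
resources:
  requests:
    cpu: 2000m
  limits:
    cpu: 4000m  # only the 2000m request is reserved on the node
Solution:
- Set requests == limits (Guaranteed QoS) so the full amount is reserved; CPU use above the limit is still throttled, so size the limit for peak load
resources:
  requests:
    cpu: 2000m
  limits:
    cpu: 2000m  # Guaranteed QoS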
Issue: Disk Space Full
Symptoms:
ERROR Failed to write log: No space left on device
Diagnosis:
kubectl exec -it streamforge-xxx -n streamforge -- df -h
Common causes:
1. Excessive logging:
env:
  - name: RUST_LOG
    value: "debug"  # too verbose
Solution:
env:
  - name: RUST_LOG
    value: "info"
2. DLQ messages accumulating locally (if local DLQ):
Solution:
- Send DLQ to Kafka topic (default)
- Increase volume size
3. Persistent volume full:
kubectl get pvc -n streamforge
Solution:
- Increase PVC size (if storage class supports expansion)
- Clean up old data
Configuration Issues
Issue: Invalid DSL Syntax
Symptoms:
ERROR Failed to parse filter: unexpected token at position 10
Diagnosis:
streamforge-validate config.yaml
Common syntax errors:
1. Missing colon:
filter: "AND/status,==,active/age,>,18" # wrong
filter: "AND:/status,==,active:/age,>,18" # correct
2. Unescaped special characters:
filter: 'REGEX:/email,.*@.*\.com' # wrong (. not escaped)
filter: 'REGEX:/email,.*@.*\\.com' # correct
3. Wrong operator:
filter: "/age,>=,18" # wrong (>= not supported)
filter: "/age,>,17" # correct (use > instead)
4. Unbalanced parentheses:
filter: 'REGEX:/name,^(John|Jane'    # wrong (unclosed parenthesis)
filter: 'REGEX:/name,^(John|Jane)$'  # correct
Solution:
- Use streamforge-validate before deploying
- Review docs/DSL_SPEC.md for syntax
- Test config locally first
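Escaping matters because an unescaped `.` matches any character, so a pattern meant to match `.com` also accepts `xcom`. The difference can be seen with `grep -E`; this sketch assumes StreamForge's REGEX treats `.` and `\.` the way extended regular expressions do, and `count_matches` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# count_matches (hypothetical helper): count stdin lines matching an extended regex.
count_matches() {
  grep -cE "$1"  # prints 0 (and exits 1) when nothing matches
}

input=$(printf 'user@example.com\nuser@examplexcom\n')
# Escaped dot: only the real .com address matches.
printf '%s\n' "$input" | count_matches '^.*@.*\.com$'  # prints 1
# Unescaped dot: "examplexcom" matches too, because . matches the "x".
printf '%s\n' "$input" | count_matches '^.*@.*.com$'   # prints 2
```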
Issue: Deprecated Syntax Warning
Symptoms:
WARNING Deprecated syntax: KEY_SUFFIX is deprecated, use KEY_MATCHES instead
Diagnosis:
streamforge-validate config.yaml
Solution:
# Old (deprecated)
filter: "KEY_SUFFIX:-prod"
# New
filter: 'KEY_MATCHES:.*-prod$'
Migration guide: docs/DSL_SPEC.md (Backward Compatibility section)
Issue: Config not reloading
Symptoms:
- Updated ConfigMap
- Pods still using old config
Diagnosis:
kubectl get configmap streamforge-config -n streamforge -o yaml
kubectl exec -it streamforge-xxx -n streamforge -- cat /app/config.yaml
Causes:
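1. ConfigMap not propagated:
- Kubernetes updates mounted ConfigMap volumes eventually (typically within a minute or so, depending on the kubelet sync period); files mounted with subPath and environment variables sourced from a ConfigMap are never updated
Solution:
# Force restart
kubectl rollout restart deployment/streamforge -n streamforge
2. No hot-reload support:
- StreamForge requires a restart to pick up config changes
Solution:
- Always restart after a ConfigMap update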
Kafka Issues
Issue: Consumer group lag not decreasing
Symptoms:
- StreamForge running, no errors
- Lag stays at 10000, not decreasing
Diagnosis:
kafka-consumer-groups --bootstrap-server kafka:9092 \
--describe --group <appid>
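Kafka assigns each partition to exactly one consumer in a group, so the number of active consumers is capped by the partition count. A rough model, assuming one consumer per StreamForge replica, a single source topic, and a balanced assignor; `partition_spread` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# partition_spread (hypothetical helper): rough view of how a group's
# partitions spread over replicas, assuming one consumer per replica.
partition_spread() {
  local partitions=$1 replicas=$2
  # Each partition goes to exactly one consumer, so at most
  # min(partitions, replicas) consumers receive any partitions.
  local active=$(( partitions < replicas ? partitions : replicas ))
  local idle=$(( replicas - active ))
  # Ceiling division: the busiest replica owns this many partitions.
  local max_per_replica=$(( (partitions + replicas - 1) / replicas ))
  echo "active=$active idle=$idle max_partitions_per_replica=$max_per_replica"
}
```

For example, `partition_spread 4 8` reports four idle replicas, while `partition_spread 16 2` shows each of two replicas carrying 8 partitions.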
Common causes:
1. More replicas than partitions:
Partitions: 4
Replicas: 8
Result: 4 replicas consume, 4 are idle
Solution:
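- Scale replicas to match partitions:
kubectl scale deployment streamforge --replicas=4 -n streamforge
- Or add more partitions (this changes which partition each key maps to):
kafka-topics --bootstrap-server kafka:9092 --alter --topic source-topic --partitions 8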
2. Consumer group rebalancing:
Consumer rebalancing...
Solution:
- Wait for rebalance to complete (30-60 seconds)
- Reduce pod churn
3. Kafka brokers overloaded:
Fetch latency: 5000ms
Solution:
- Scale Kafka brokers
- Tune Kafka performance
Issue: Topic does not exist
Symptoms:
ERROR Topic 'nonexistent-topic' does not exist
Diagnosis:
kafka-topics --bootstrap-server kafka:9092 --list
Solution:
Option 1: Create topic
kafka-topics --bootstrap-server kafka:9092 \
--create --topic output-topic \
--partitions 16 \
--replication-factor 3
Option 2: Enable auto-create
# Kafka broker config
auto.create.topics.enable=true
Option 3: Fix topic name in config
routing:
  destinations:
    - output: "output-topic"  # ensure spelling is correct
Issue: Partition skew (unbalanced partitions)
Symptoms:
- Some partitions have high lag
- Others have zero lag
- Unbalanced consumption
Diagnosis:
kafka-consumer-groups --bootstrap-server kafka:9092 \
--describe --group <appid>
Cause:
- Producer uses key-based partitioning
- Keys are skewed (e.g., 80% have key “default”)
- Most messages go to one partition
Solution:
Option 1: Use random partitioning (gives up per-key ordering)
partitioning: "random"
Option 2: Use field-based partitioning with uniform distribution
partitioning: "field:/user/id"  # if user IDs are uniformly distributed
Option 3: Add more partitions (does not help when a single hot key dominates)
kafka-topics --bootstrap-server kafka:9092 \
--alter --topic source-topic \
--partitions 32
Debug Commands
Enable Debug Logging
Temporarily (current pod):
kubectl exec -it streamforge-xxx -n streamforge -- kill -USR1 1
# Toggles debug logging for duration of pod lifetime
Permanently (all pods; changing the Deployment's environment triggers a rolling restart):
kubectl set env deployment/streamforge RUST_LOG=streamforge=debug -n streamforge
Restore info logging:
kubectl set env deployment/streamforge RUST_LOG=streamforge=info -n streamforge
Inspect Message Contents
Sample source topic:
kafka-console-consumer --bootstrap-server kafka:9092 \
--topic source-topic \
--property print.key=true \
--property print.headers=true \
--property print.timestamp=true \
--max-messages 10
Sample destination topic:
kafka-console-consumer --bootstrap-server kafka:9092 \
--topic dest-topic \
--property print.key=true \
--max-messages 10
Sample DLQ:
kafka-console-consumer --bootstrap-server kafka:9092 \
--topic streamforge-dlq \
--property print.headers=true \
--max-messages 10
Profile Performance
CPU profiling:
kubectl exec -it streamforge-xxx -n streamforge -- kill -SIGUSR2 1
# Outputs CPU profile to /tmp/cpu-profile.txt
kubectl cp streamforge-xxx:/tmp/cpu-profile.txt ./cpu-profile.txt -n streamforge
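Memory usage:
# $(...) would expand on your machine, not in the pod; StreamForge is PID 1 in the container (the kill commands above target it)
kubectl exec -it streamforge-xxx -n streamforge -- cat /proc/1/status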
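The status file lists many fields; for OOM investigations, current and peak resident memory are usually the ones to compare against the memory limit. A sketch that converts `VmRSS` and `VmHWM` (reported in kB) to MiB; `mem_from_status` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# mem_from_status (hypothetical helper): extract current and peak RSS from
# /proc/<pid>/status content on stdin and print them in MiB.
mem_from_status() {
  awk '
    $1 == "VmRSS:" { rss = $2 }   # current resident set size, kB
    $1 == "VmHWM:" { peak = $2 }  # peak resident set size ("high water mark"), kB
    END { printf "rss_mib=%d peak_mib=%d\n", rss / 1024, peak / 1024 }'
}
```

Usage, assuming StreamForge runs as PID 1: `kubectl exec streamforge-xxx -n streamforge -- cat /proc/1/status | mem_from_status` (omit `-it` when piping so no terminal carriage returns are added).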
Test Filters/Transforms Locally
Test config:
# Use dry-run mode (if available)
docker run --rm \
-v $(pwd)/config.yaml:/app/config.yaml:ro \
streamforge:1.0.0 \
--config /app/config.yaml \
--dry-run
Validate config:
streamforge-validate config.yaml --verbose
Force Consumer Rebalance
Restart single pod:
kubectl delete pod streamforge-xxx -n streamforge
Restart all pods:
kubectl rollout restart deployment/streamforge -n streamforge
Start a fresh consumer group by changing the group ID:
appid: "streamforge-prod-v2"  # new group ID
offset: "latest"  # start from latest; messages the old group had not yet processed are skipped
Check Kafka Broker Health
Broker API versions:
kafka-broker-api-versions --bootstrap-server kafka:9092
Topic metadata:
kafka-topics --bootstrap-server kafka:9092 \
--describe --topic source-topic
Consumer group state:
kafka-consumer-groups --bootstrap-server kafka:9092 \
--describe --group <appid> \
--state
Capture Metrics Snapshot
Export all metrics:
curl http://streamforge:8080/metrics > metrics-$(date +%s).txt
Query specific metrics:
curl -s http://streamforge:8080/metrics | grep -E "(lag|error|duration)"
Getting Help
Check Documentation
- DSL Specification - Filter/transform syntax
- Deployment Guide - Deployment options
- Operations Guide - Day-to-day operations
- Architecture - System design
Enable Verbose Logging
env:
  - name: RUST_LOG
    value: "streamforge=debug,rdkafka=info"
  - name: RUST_BACKTRACE
    value: "full"
Collect Diagnostic Bundle
#!/bin/bash
# collect-diagnostics.sh
# Usage: APP_ID=<appid> ./collect-diagnostics.sh
set -u
# A literal <appid> in the script would be parsed as a redirection, so take it from the environment
APP_ID="${APP_ID:?set APP_ID to the StreamForge appid (consumer group)}"
# Compute the output directory once so it cannot change mid-run
OUT_DIR="diagnostics/$(date +%Y-%m-%d)"
mkdir -p "$OUT_DIR"
cd "$OUT_DIR" || exit 1
# Pod status
kubectl get pods -n streamforge -o wide > pods.txt
# Logs
kubectl logs deployment/streamforge -n streamforge --tail=1000 > logs.txt
# Config
kubectl get configmap streamforge-config -n streamforge -o yaml > config.yaml
# Metrics
curl http://streamforge:8080/metrics > metrics.txt
# Consumer group
kafka-consumer-groups --bootstrap-server kafka:9092 \
--describe --group "$APP_ID" > consumer-group.txt
# Events
kubectl get events -n streamforge --sort-by='.lastTimestamp' > events.txt
# Resource usage
kubectl top pods -n streamforge > resources.txt
echo "Diagnostics collected in $OUT_DIR/"
Report Issues
GitHub Issues: https://github.com/rahulbsw/streamforge/issues
Include:
- StreamForge version
- Kubernetes version
- Kafka version
- Config file (redact sensitive data)
- Logs (last 100 lines)
- Error messages
- Steps to reproduce
Issue Decision Tree
Is StreamForge running?
├─ No → Check startup issues
│ └─ CrashLoopBackOff? → Check logs for config errors
│ └─ Pending? → Check resource availability
│ └─ ImagePullBackOff? → Check image registry
│
└─ Yes → Check metrics
├─ High lag? → Check performance issues
│ └─ CPU high? → Add replicas or raise CPU limits
│ └─ CPU low? → Add threads or increase batch sizes
│
├─ High errors? → Check DLQ headers
│ └─ FilterEvaluation? → Fix filter logic
│ └─ ProducerTimeout? → Check destination Kafka
│
├─ Zero throughput? → Check connectivity
│ └─ Kafka connection error? → Check network
│ └─ SASL error? → Check credentials
│
└─ Duplicates? → Check commit strategy
└─ Frequent rebalances? → Reduce pod churn
└─ Manual offset reset? → Avoid resets
Document Version: 1.0.0
Last Updated: 2026-04-18
Feedback: https://github.com/rahulbsw/streamforge/issues