Scaling Guide

Complete guide to scaling StreamForge for high-throughput production deployments.

Scaling Fundamentals
Horizontal Scaling
Vertical Scaling
Kafka Partition Strategy
Consumer Group Coordination
Load Balancing
Scaling Patterns
Monitoring and Tuning
Best Practices

Scaling Fundamentals

Understanding the Architecture

┌────────────────────────────────────────────────────────────┐
│                    Source Kafka Cluster                     │
│  Topic: events (10 partitions, 100K msg/s)                 │
└─────────────────┬──────────────────────────────────────────┘
                  │
    ┌─────────────┼─────────────┐
    │             │             │
    ▼             ▼             ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│Instance1│ │Instance2│ │Instance3│  Consumer Group: "streamforge"
│P0-P3    │ │P4-P6    │ │P7-P9    │  Each gets a subset of partitions
└────┬────┘ └────┬────┘ └────┬────┘
     │           │           │
     └───────────┼───────────┘
                 │
                 ▼
┌────────────────────────────────────────────────────────────┐
│                   Target Kafka Cluster                      │
│  Multiple topics (filtered/transformed)                     │
└────────────────────────────────────────────────────────────┘

Key Concepts

Partition-Based Parallelism:

Kafka partitions are the unit of parallelism
Each instance consumes specific partitions
Consumers beyond the partition count receive no partitions and sit idle

Consumer Group:

All instances share the same appid (consumer group ID)
Kafka automatically assigns partitions to instances
Rebalancing happens when instances join/leave

Throughput Formula:

Total Throughput = (Partitions × Per-Partition Throughput) / Replication Factor
Instance Throughput = Total Throughput / Number of Instances
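
As a worked example of the two formulas above, here is a minimal Rust sketch. The function names are illustrative and not part of StreamForge, and the sample inputs (30 partitions, 10K msg/s per partition, replication factor 3, 5 instances) are assumed values.

```rust
/// Aggregate throughput across all partitions, per the formula above.
fn total_throughput(partitions: u32, per_partition_msgs: f64, replication_factor: u32) -> f64 {
    (f64::from(partitions) * per_partition_msgs) / f64::from(replication_factor)
}

/// Share of the total each instance handles when load is spread evenly.
fn instance_throughput(total_msgs: f64, instances: u32) -> f64 {
    total_msgs / f64::from(instances)
}
```

With the assumed inputs, `total_throughput(30, 10_000.0, 3)` gives 100,000 msg/s, and `instance_throughput(100_000.0, 5)` gives 20,000 msg/s per instance.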

Horizontal Scaling

Adding More Instances

When to Scale Horizontally:

Consumer lag increasing
CPU usage > 80% across instances
Want higher availability
Input topic has many partitions

Maximum Instances:

Max Instances = Number of Source Topic Partitions

Example: 10-partition topic

1 instance: Consumes all 10 partitions
5 instances: Each consumes 2 partitions
10 instances: Each consumes 1 partition
11+ instances: Some instances idle (wasted resources)
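
The partition counts in the example above can be reproduced with a short Rust sketch. It only models how many partitions each instance ends up with under an even split; which partitions go where depends on the assignor Kafka uses. The function names are hypothetical.

```rust
/// Number of partitions owned by each instance when `partitions` are spread
/// as evenly as possible over `instances`. An entry of 0 is an idle instance.
fn partitions_per_instance(partitions: u32, instances: u32) -> Vec<u32> {
    if instances == 0 {
        return Vec::new();
    }
    let base = partitions / instances;
    let extra = partitions % instances; // the first `extra` instances get one more
    (0..instances)
        .map(|i| if i < extra { base + 1 } else { base })
        .collect()
}

/// Instances that receive no partitions at all.
fn idle_instances(partitions: u32, instances: u32) -> u32 {
    instances.saturating_sub(partitions)
}
```

For a 10-partition topic, 5 instances yield two partitions each, 3 instances yield a 4/3/3 split, and 11 instances leave one instance idle.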

Configuration for Horizontal Scaling

Same config for all instances:

{
  "appid": "streamforge",
  "bootstrap": "kafka:9092",
  "input": "events",
  "output": "events-mirror",
  "threads": 4
}

Key points:

✅ Same appid (consumer group)
✅ Same topic configuration
✅ Same filter/transform logic
✅ Kafka handles partition assignment

Deployment Strategies

Docker Compose

version: '3.8'
services:
  mirrormaker:
    image: streamforge:latest
    deploy:
      replicas: 5  # Scale to 5 instances
    environment:
      - CONFIG_FILE=/app/config/config.json
    volumes:
      - ./config.json:/app/config/config.json:ro

Scale dynamically:

docker-compose up -d --scale mirrormaker=5

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: streamforge
spec:
  replicas: 5  # Number of instances
  selector:
    matchLabels:
      app: streamforge
  template:
    metadata:
      labels:
        app: streamforge
    spec:
      containers:
      - name: mirrormaker
        image: streamforge:latest
        resources:
          requests:
            memory: "256Mi"
            cpu: "1000m"
          limits:
            memory: "512Mi"
            cpu: "2000m"
        env:
        - name: CONFIG_FILE
          value: /app/config/config.json
        volumeMounts:
        - name: config
          mountPath: /app/config
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: mirrormaker-config

Scale with kubectl:

kubectl scale deployment streamforge --replicas=10

Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: streamforge-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: streamforge
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

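Auto-scales based on CPU/memory usage. Keep maxReplicas at or below the source topic's partition count, since replicas beyond it sit idle.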

Consumer Group Rebalancing

What happens when scaling:

New instance joins:

Before: Instance1 (P0-P9)
Add Instance2
Rebalance...
After: Instance1 (P0-P4), Instance2 (P5-P9)

Instance removed:

Before: Inst1 (P0-P4), Inst2 (P5-P9)
Remove Instance2
Rebalance...
After: Instance1 (P0-P9)

Rebalance Settings:

{
  "consumer_properties": {
    "session.timeout.ms": "30000",
    "heartbeat.interval.ms": "3000",
    "max.poll.interval.ms": "300000"
  }
}
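
Kafka's consumer documentation requires heartbeat.interval.ms to be lower than session.timeout.ms and recommends keeping it at or below one third of that value. The settings above (3000 vs 30000) satisfy both. The sketch below is a hypothetical pre-deployment check written in Rust; it is not a StreamForge feature.

```rust
/// Returns findings for a heartbeat/session timeout pair; empty means OK.
fn check_heartbeat(session_timeout_ms: u64, heartbeat_interval_ms: u64) -> Vec<&'static str> {
    let mut findings = Vec::new();
    if heartbeat_interval_ms >= session_timeout_ms {
        // Hard requirement from the Kafka consumer configuration docs.
        findings.push("heartbeat.interval.ms must be lower than session.timeout.ms");
    } else if heartbeat_interval_ms * 3 > session_timeout_ms {
        // Recommendation: no higher than one third of the session timeout.
        findings.push("heartbeat.interval.ms is above one third of session.timeout.ms");
    }
    findings
}
```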

During rebalancing:

Processing pauses briefly (1-5 seconds)
Partitions reassigned
Consumer lag may spike temporarily
No message loss (consumers resume from the last committed offsets, so messages processed but not yet committed may be delivered again)

Vertical Scaling

Adding More Resources Per Instance

When to Scale Vertically:

CPU bottleneck (>90% usage)
Memory pressure
Simple scaling without coordination
Topic has too few partitions to add more instances

Thread Scaling

Configuration:

{
  "threads": 8
}

Guidelines:

Threads = CPU Cores × (1 to 2)

Examples:
- 2 cores → threads: 2-4
- 4 cores → threads: 4-8
- 8 cores → threads: 8-16

Impact:

More threads = Higher throughput
Too many threads = CPU thrashing
Monitor CPU usage to find optimal
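
The thread guideline can be turned into a starting range with a few lines of Rust. This is a sketch under the 1x to 2x rule above; `thread_range` and `detected_cores` are illustrative names, and the core count reported by the standard library may differ from the CPU limit configured for a container.

```rust
use std::thread;

/// Suggested (minimum, maximum) thread count: 1x to 2x the core count.
fn thread_range(cpu_cores: usize) -> (usize, usize) {
    (cpu_cores, cpu_cores * 2)
}

/// Cores visible to this process, falling back to 1 if the count is unavailable.
fn detected_cores() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

Calling `thread_range(detected_cores())` gives a range to start from; the final value should still come from watching CPU usage under load.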

Memory Scaling

Configuration:

{
  "consumer_properties": {
    "fetch.max.bytes": "52428800",
    "max.poll.records": "1000"
  },
  "producer_properties": {
    "buffer.memory": "67108864",
    "batch.size": "131072"
  }
}

Memory Sizing:

Base Memory: ~50MB
Per Thread: +10-20MB
Buffer Memory: (producer.buffer.memory / 1MB)
Total ≈ 50 + (threads × 15) + (buffer / 1MB)

Example (8 threads, 64MB buffer):
Memory = 50 + (8 × 15) + 64 = 234MB
Allocate 512MB for safety
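
The sizing arithmetic above can be expressed as a small Rust function. The constants come straight from the estimate (50MB base, 15MB per thread as the midpoint of 10-20MB); the function name is hypothetical and the result is a rough estimate, not a measured figure.

```rust
const BASE_MB: u64 = 50;
const PER_THREAD_MB: u64 = 15; // midpoint of the 10-20MB per-thread range
const BYTES_PER_MB: u64 = 1024 * 1024;

/// Estimated memory in MB for a given thread count and producer buffer.memory.
fn estimate_memory_mb(threads: u64, producer_buffer_bytes: u64) -> u64 {
    BASE_MB + threads * PER_THREAD_MB + producer_buffer_bytes / BYTES_PER_MB
}
```

For 8 threads and a 64MB buffer (`67108864` bytes), `estimate_memory_mb(8, 67_108_864)` reproduces the 234MB figure from the example.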

CPU Allocation

Docker:

docker run \
  --cpus="4.0" \
  --memory="1g" \
  streamforge

Kubernetes:

resources:
  requests:
    cpu: "2000m"      # 2 cores guaranteed
    memory: "512Mi"
  limits:
    cpu: "4000m"      # Can burst to 4 cores
    memory: "1Gi"

Kafka Partition Strategy

Optimal Partition Count

Formula:

Partitions = (Target Throughput × Replication) / Per-Partition Throughput

Example:
Target: 100K msg/s
Per-Partition: 10K msg/s
Replication: 3x
Partitions = (100K × 3) / 10K = 30 partitions
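
A minimal Rust sketch of the same formula, rounding up so that a fractional result never under-provisions. The function name is illustrative, and returning `None` for a zero per-partition rate is a choice made for this example.

```rust
/// Partitions needed for a target rate, per the formula above, rounded up.
/// Returns None when the per-partition rate is zero.
fn required_partitions(
    target_msgs_per_s: u64,
    per_partition_msgs_per_s: u64,
    replication: u64,
) -> Option<u64> {
    if per_partition_msgs_per_s == 0 {
        return None;
    }
    let numerator = target_msgs_per_s.saturating_mul(replication);
    // Integer ceiling division.
    Some((numerator + per_partition_msgs_per_s - 1) / per_partition_msgs_per_s)
}
```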

Recommendations:

Minimum: 3 partitions (basic parallelism)
Good: 10-20 partitions (room to scale)
High throughput: 50-100 partitions
Maximum: 1000s (diminishing returns)

Partition Key Strategy

Important for ordering:

{
  "partition": "/userId"
}

Effects:

Same key → Same partition → Same consumer → Ordering preserved
Different keys → Different partitions → Parallel processing

Example:

User123 messages → Partition 5 → Instance 2
User456 messages → Partition 8 → Instance 3
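
The key-to-partition mapping can be illustrated with a short Rust sketch. Kafka's default partitioner hashes the key bytes with murmur2; the standard library's `DefaultHasher` is used here only as a stand-in, so the partition numbers will not match a real cluster. What carries over is the property that matters: the same key always maps to the same partition.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maps a key to a partition index in 0..partitions.
/// Stand-in hash: Kafka itself uses murmur2, not DefaultHasher.
fn partition_for(key: &str, partitions: u32) -> u32 {
    assert!(partitions > 0, "a topic has at least one partition");
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % u64::from(partitions)) as u32
}
```

Because the result depends on the partition count, the mapping changes when partitions are added to the topic.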

Rebalancing Impact

Adding partitions (requires restart):

# Add partitions to topic
kafka-topics.sh --alter \
  --topic events \
  --partitions 20 \
  --bootstrap-server kafka:9092

# Restart consumers to pick up new partitions
kubectl rollout restart deployment streamforge

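Note: Cannot reduce partition count (Kafka limitation). Adding partitions also changes the key-to-partition mapping, so per-key ordering is not preserved across the change.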

Consumer Group Coordination

Consumer Group Settings

{
  "appid": "streamforge",
  "consumer_properties": {
    "group.id": "streamforge",
    "enable.auto.commit": "false",
    "auto.offset.reset": "latest",
    "session.timeout.ms": "30000",
    "heartbeat.interval.ms": "3000",
    "max.poll.interval.ms": "300000"
  }
}

Multiple Consumer Groups

Use Case: Different Processing Logic

Group 1 (appid: "mirrormaker-archive")
  - All messages to archive

Group 2 (appid: "mirrormaker-realtime")
  - Filtered messages to real-time topics

Group 3 (appid: "mirrormaker-analytics")
  - Transformed messages to analytics

Each group processes the topic independently and tracks its own offsets.

Offset Management

Automatic, application-managed (Recommended):

{
  "consumer_properties": {
    "enable.auto.commit": "false"
  }
}

Application commits offsets after successful processing.

Manual check: inspect committed offsets and lag:

kafka-consumer-groups.sh \
  --bootstrap-server kafka:9092 \
  --group streamforge \
  --describe

Load Balancing

Kafka Native Load Balancing

Kafka automatically balances partitions:

10 partitions, 5 instances:
Instance 1: P0, P1
Instance 2: P2, P3
Instance 3: P4, P5
Instance 4: P6, P7
Instance 5: P8, P9

Automatic rebalancing when:

Instance added
Instance removed
Instance fails
Network partition

Uneven Load Distribution

Problem:

Partition 0: 50K msg/s (hot partition)
Partition 1-9: 5K msg/s each

Solution 1: Better Partition Keys

{
  "partition": "/userId"
}

Use high-cardinality fields to distribute load.

Solution 2: Increase Partitions

# More partitions = Better distribution
kafka-topics.sh --alter --topic events --partitions 20 --bootstrap-server kafka:9092

Solution 3: Custom Partitioning

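// Build the key from multiple fields so one hot user_id spreads over many partitions.
// Trade-off: messages for that user_id no longer share a partition, so per-user ordering is lost.
let partition_key = format!("{}-{}", user_id, timestamp % 1000);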

Scaling Patterns

Pattern 1: Linear Scaling

Start: 1 instance, 10 partitions, 10K msg/s

Scale:

2 instances → 20K msg/s
5 instances → 50K msg/s
10 instances → 100K msg/s

Configuration:

Same for all instances
Let Kafka balance partitions
Monitor throughput and CPU

Pattern 2: Staged Scaling

Progressive scaling with monitoring:

# Stage 1: Start with 3 instances
kubectl scale deployment streamforge --replicas=3

# Monitor for 30 minutes
# Check: CPU, memory, lag, throughput

# Stage 2: Scale to 5 instances
kubectl scale deployment streamforge --replicas=5

# Monitor...

# Stage 3: Scale to 10 instances
kubectl scale deployment streamforge --replicas=10

Pattern 3: Auto-Scaling

Based on consumer lag (the External metric requires a metrics adapter that exposes kafka_consumer_lag to the HPA):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mirrormaker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: streamforge
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: kafka_consumer_lag
        selector:
          matchLabels:
            topic: events
            group: streamforge
      target:
        type: AverageValue
        averageValue: "1000"  # Scale when lag > 1000
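
To get a feel for how this HPA reacts, the replica count for an AverageValue target can be approximated as the total lag divided by the target, rounded up and clamped to the min/max bounds. The Rust sketch below is a simplification under that assumption: the real controller also applies a tolerance band and stabilization windows, and the function name is hypothetical.

```rust
/// Approximate replica count for an AverageValue target:
/// ceil(total_lag / target_avg_lag), clamped to [min_replicas, max_replicas].
fn desired_replicas(total_lag: u64, target_avg_lag: u64, min_replicas: u32, max_replicas: u32) -> u32 {
    assert!(target_avg_lag > 0, "target must be positive");
    assert!(min_replicas <= max_replicas, "min must not exceed max");
    let raw = (total_lag + target_avg_lag - 1) / target_avg_lag; // ceiling division
    raw.clamp(u64::from(min_replicas), u64::from(max_replicas)) as u32
}
```

With the values from the manifest (target 1000, bounds 3 and 10), a total lag of 4,500 maps to 5 replicas, 25,000 is capped at 10, and 500 stays at the minimum of 3.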

Pattern 4: Geographic Distribution

Multi-region deployment:

Region US-EAST:
  - 5 instances
  - Consume from local Kafka
  - Produce to central Kafka

Region US-WEST:
  - 5 instances
  - Consume from local Kafka
  - Produce to central Kafka

Region EU:
  - 5 instances
  - Consume from local Kafka
  - Produce to central Kafka

Each region processes independently, using its own consumer group.

Monitoring and Tuning

Key Metrics

Consumer Lag:

kafka-consumer-groups.sh --bootstrap-server kafka:9092 --describe --group streamforge

Watch for:

Lag increasing → Need more capacity
Lag stable → Adequate capacity
Lag decreasing → Catching up
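
The three cases above can be sketched as a simple classifier over periodic lag samples. This Rust example compares only the first and last sample against an assumed tolerance; the type and function names are illustrative, and a production check would likely smooth over more points.

```rust
#[derive(Debug, PartialEq)]
enum LagTrend {
    Increasing, // need more capacity
    Stable,     // adequate capacity
    Decreasing, // catching up
}

/// Classifies a series of lag samples (oldest first).
/// Changes within `tolerance` messages count as stable.
fn lag_trend(samples: &[u64], tolerance: u64) -> Option<LagTrend> {
    if samples.len() < 2 {
        return None; // not enough data to call a trend
    }
    let first = samples[0];
    let last = samples[samples.len() - 1];
    if last > first.saturating_add(tolerance) {
        Some(LagTrend::Increasing)
    } else if first > last.saturating_add(tolerance) {
        Some(LagTrend::Decreasing)
    } else {
        Some(LagTrend::Stable)
    }
}
```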

Application Metrics:

Stats: processed=10000 (1000.0/s), filtered=100 (10.0/s),
       completed=9900 (990.0/s), errors=0 (0.0/s)

Per Instance:

Throughput: 1K-10K msg/s typical range
CPU: 50-80% optimal
Memory: <80% of limit

Scaling Triggers

Scale UP when:

Consumer lag > 10K messages for 5+ minutes
CPU > 80% sustained
Memory > 80% sustained
Throughput below target

Scale DOWN when:

Consumer lag < 1K messages sustained
CPU < 30% sustained
Memory < 50% sustained
Cost optimization needed
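
The triggers above can be combined into one decision function. The Rust sketch below makes two assumptions that the lists do not state: any single scale-up trigger is enough to scale up, while all scale-down conditions must hold before scaling down. It also assumes the inputs are already sustained averages (for example over 5+ minutes). All names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum ScaleAction {
    Up,
    Down,
    Hold,
}

/// Sustained averages for one deployment.
struct Metrics {
    lag_messages: u64,
    cpu_percent: f64,
    memory_percent: f64,
}

/// Applies the thresholds from the lists above.
fn decide(m: &Metrics) -> ScaleAction {
    if m.lag_messages > 10_000 || m.cpu_percent > 80.0 || m.memory_percent > 80.0 {
        ScaleAction::Up
    } else if m.lag_messages < 1_000 && m.cpu_percent < 30.0 && m.memory_percent < 50.0 {
        ScaleAction::Down
    } else {
        ScaleAction::Hold
    }
}
```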

Tuning After Scaling

After adding instances:

Monitor rebalancing (1-5 minutes)

Verify partition distribution

kafka-consumer-groups.sh --bootstrap-server kafka:9092 --describe --group streamforge

Check per-instance throughput
Adjust thread count if needed

Optimization loop:

Scale horizontally (add instances)
Monitor for 30 minutes
Tune threads per instance
Monitor for 30 minutes
Adjust batch sizes if needed
Repeat until optimal

Best Practices

1. Start Small, Scale Gradually

# Day 1: 2 instances
# Day 2: Monitor, maybe 3 instances
# Week 1: Stable at 5 instances
# Month 1: Auto-scaling with HPA

2. Match Partitions to Scale

Planning for 10 instances?
→ Create topic with 10-20 partitions

Planning for 100 instances?
→ Create topic with 100-200 partitions

3. Monitor Before Scaling

Don’t scale blindly:

Check consumer lag trend
Verify actual bottleneck (CPU/memory/network)
Review instance utilization
Test with smaller scale first

4. Use Health Checks

# Kubernetes
livenessProbe:
  exec:
    command: ["pgrep", "streamforge"]
  initialDelaySeconds: 10
  periodSeconds: 30

readinessProbe:
  exec:
    command: ["pgrep", "streamforge"]
  initialDelaySeconds: 5
  periodSeconds: 10

5. Plan for Failures

Total Capacity: 10 instances
Plan for: 2 instance failures
Deploy: 12 instances
Result: 83% utilization, 20% overhead
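
The headroom arithmetic above works out as follows in a small Rust sketch. Utilization is the required capacity divided by the deployed count, and overhead is the spare count divided by the required capacity. The struct and function names are illustrative.

```rust
#[derive(Debug, PartialEq)]
struct CapacityPlan {
    deploy: u32,          // instances to run
    utilization_pct: u32, // required / deployed, rounded
    overhead_pct: u32,    // spares / required, rounded
}

/// Sizes a deployment so it still meets `required` after `tolerated_failures`.
fn plan_for_failures(required: u32, tolerated_failures: u32) -> Option<CapacityPlan> {
    if required == 0 {
        return None;
    }
    let deploy = required + tolerated_failures;
    let pct = |part: u32, whole: u32| (f64::from(part) * 100.0 / f64::from(whole)).round() as u32;
    Some(CapacityPlan {
        deploy,
        utilization_pct: pct(required, deploy),
        overhead_pct: pct(tolerated_failures, required),
    })
}
```

The spare instances only help if the source topic has at least as many partitions as deployed instances.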

6. Use Resource Requests and Limits

resources:
  requests:    # Guaranteed resources
    cpu: "1000m"
    memory: "256Mi"
  limits:      # Maximum resources
    cpu: "2000m"
    memory: "512Mi"

7. Network Optimization

Same datacenter/region:

Lower latency
Higher throughput
Better reliability

Cross-region:

Increase timeouts
Enable compression
Use dedicated network

8. Partition Strategy

Good partition keys:

User ID (high cardinality)
Order ID (unique)
Device ID (distributed)

Poor partition keys:

Country (low cardinality)
Day of week (very low cardinality)
Constant value (all to one partition)

Scaling Examples

Example 1: Small Deployment

Scenario:

1K msg/s throughput
5 partitions
Single datacenter

Configuration:

replicas: 2
resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "512Mi"

{
  "threads": 2,
  "consumer_properties": {
    "fetch.min.bytes": "1048576"
  }
}

Example 2: Medium Deployment

Scenario:

25K msg/s throughput
20 partitions
Multi-AZ

Configuration:

replicas: 5
resources:
  requests:
    cpu: "1000m"
    memory: "512Mi"
  limits:
    cpu: "2000m"
    memory: "1Gi"

{
  "threads": 4,
  "consumer_properties": {
    "fetch.min.bytes": "1048576",
    "max.poll.records": "500"
  },
  "producer_properties": {
    "batch.size": "65536",
    "linger.ms": "10"
  }
}

Example 3: Large Deployment

Scenario:

100K msg/s throughput
50 partitions
Multi-region

Configuration:

replicas: 20
resources:
  requests:
    cpu: "2000m"
    memory: "1Gi"
  limits:
    cpu: "4000m"
    memory: "2Gi"

{
  "threads": 8,
  "consumer_properties": {
    "fetch.min.bytes": "2097152",
    "max.poll.records": "1000"
  },
  "producer_properties": {
    "batch.size": "131072",
    "linger.ms": "10",
    "compression.type": "snappy"
  }
}

Troubleshooting Scaling Issues

Issue: Instances Not Consuming

Symptoms:

Some instances idle
Uneven partition distribution

Check:

# Verify consumer group
kafka-consumer-groups.sh --bootstrap-server kafka:9092 --describe --group streamforge

# Check instance count vs partitions
kubectl get pods -l app=streamforge --no-headers | wc -l

Solution:

# Ensure instances <= partitions
# Or add more partitions to topic

Issue: Rebalancing Loops

Symptoms:

Constant rebalancing
High lag
No progress

Check:

# Check logs for rebalance messages
kubectl logs -f deployment/streamforge | grep rebalance

Solution:

{
  "consumer_properties": {
    "session.timeout.ms": "45000",
    "max.poll.interval.ms": "600000"
  }
}

Issue: Uneven Load

Symptoms:

Some instances 100% CPU
Other instances idle

Solution:

Better partition keys
Increase partition count
Check for hot partitions
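
One way to check for hot partitions is to compare each partition's message rate with the mean across the topic. The Rust sketch below flags partitions above an assumed multiple of the mean; the function name and the factor of 2.0 used in the usage note are illustrative choices, and the rates would come from whatever per-partition metrics are available.

```rust
/// Indices of partitions whose rate exceeds `factor` times the mean rate.
fn hot_partitions(rates_msgs_per_s: &[f64], factor: f64) -> Vec<usize> {
    if rates_msgs_per_s.is_empty() {
        return Vec::new();
    }
    let mean = rates_msgs_per_s.iter().sum::<f64>() / rates_msgs_per_s.len() as f64;
    let threshold = mean * factor;
    rates_msgs_per_s
        .iter()
        .enumerate()
        .filter_map(|(i, &rate)| (rate > threshold).then_some(i))
        .collect()
}
```

For a topic where partition 0 carries 50K msg/s and partitions 1-9 carry 5K msg/s each, the mean is 9.5K msg/s, so a factor of 2.0 flags only partition 0.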

Summary

Quick Reference

Throughput    Partitions    Instances    Threads/instance    Memory/instance
1K msg/s      3-5           1-2          2                   256Mi
10K msg/s     5-10          2-5          4                   512Mi
25K msg/s     10-20         5-10         4                   512Mi
50K msg/s     20-40         10-20        4-8                 1Gi
100K msg/s    50-100        20-50        8                   1Gi

Scaling Checklist

Next Steps

PERFORMANCE.md - Performance tuning
USAGE.md - Use cases and patterns
DOCKER.md - Deployment options

Remember: Start small, monitor closely, scale gradually. It’s easier to scale up than down!
