MLOps

Event-Driven ML Pipelines: When Batch Isn't Fast Enough and Real-Time Is Too Expensive

A deep guide to event-driven ML pipelines as the middle ground between batch and always-on real-time inference, using Kafka or Pulsar for near-real-time scoring without wasting GPU capacity.

12 min read · 2,387 words

Most ML architecture debates get framed as a false binary.

Either:

  • run batch scoring every few hours and accept staleness

or

  • build a fully online inference stack with strict latency targets and always-on compute

That framing leaves out the architecture many teams actually need.

There is a middle ground between real-time and batch ML: event-driven pipelines that react to business events, score close enough to the moment of change, and avoid paying for permanently hot inference capacity when the workload does not justify it.

This is where an event-driven ML pipeline makes sense.

Instead of serving every prediction as a synchronous request behind an API, you process event streams from systems such as Kafka or Pulsar, run inference asynchronously, and write the output back into operational systems, caches, feature stores, or downstream topics.

That pattern is not universally better. But for many workloads, it is operationally simpler and economically smarter than pretending every model must be a real-time microservice.

The Problem With the Binary

Batch is often too slow for:

  • fraud rules that should react within seconds or minutes
  • inventory and pricing signals that become stale quickly
  • user behavior updates that should influence recommendations in the same session
  • operational alerts that need fast but not instant classification

Real-time serving is often too expensive for:

  • bursty workloads with large idle windows
  • GPU-backed models that are expensive to keep warm
  • moderate-latency use cases that do not truly require sub-100ms responses
  • pipelines where the prediction is not directly blocking a user request

This is why a streaming ML inference architecture has become a practical option for many teams. It lets you get closer to real time without inheriting all the cost and complexity of a fully synchronous serving path.

What Event-Driven ML Pipelines Actually Are

In an event-driven pipeline, inference is triggered by domain events rather than direct API calls.

Examples:

  • a new transaction lands on a Kafka topic
  • a customer profile update lands on Pulsar
  • a sensor reading crosses a threshold and emits an event
  • a product-view stream updates user state and triggers re-ranking

The model consumer reads those events, enriches them if necessary, runs inference, and publishes the output somewhere useful.

A simple pattern looks like this:

Application Events
   |
   v
Kafka / Pulsar Topic
   |
   v
Feature Enrichment Consumer
   |
   v
Model Inference Worker Pool
   |
   +----> prediction-results topic
   +----> online cache / database
   +----> alerting / downstream service

The point is not to eliminate latency. The point is to move from:

  • "must respond before the HTTP request returns"

to

  • "must process within an acceptable near-real-time window"

That distinction changes the economics of the system.

When This Model Fits Best

Event-driven inference works best when three conditions are true.

1. The prediction matters soon, but not instantly

If the output is still useful when delivered in a few seconds or a few minutes, you likely do not need synchronous real-time serving.

Examples:

  • transaction risk scoring for post-authorization review
  • recommendation refresh after a user action
  • dynamic lead scoring
  • operations triage classification
  • personalization updates that influence the next page or next email

2. Workload arrival is bursty or uneven

This is where always-on real-time systems get wasteful.

If traffic arrives in spikes, keeping dedicated online model servers hot can mean:

  • idle GPU capacity most of the day
  • overprovisioning for peak
  • poor cost efficiency outside narrow traffic windows

Streaming consumers handle this better because you can scale processing against lag and event throughput rather than pretending every burst needs a full low-latency service footprint.

3. The model output is consumed asynchronously

If another system, queue, cache, or workflow is already involved, you often do not need a blocking prediction call.

That is common in:

  • fraud and abuse review systems
  • recommendation candidate generation
  • document classification
  • personalization state updates
  • demand or anomaly pipelines

Why This Can Be Cheaper Than "Real-Time"

The biggest economic advantage is straightforward: you stop paying for strict latency you do not actually need.

A synchronous online model service usually requires:

  • permanently warm replicas
  • aggressive autoscaling headroom
  • low queue depth tolerance
  • fast dependency paths
  • often, reserved GPU capacity

An event-driven pipeline lets you trade absolute latency for utilization.

That means you can:

  • batch events opportunistically
  • scale on topic lag instead of peak request-per-second guesses
  • let workers process short bursts without exposing every spike to end users
  • schedule GPU-backed consumers only when traffic exists

This is why the architecture is so attractive when "real time" was mostly an inherited assumption rather than a true product requirement.

Kafka and Pulsar as the Backbone

Kafka and Pulsar are both strong fits for this pattern because they give you:

  • durable event streams
  • consumer groups
  • replayability
  • backpressure tolerance
  • decoupling between event producers and model consumers

The choice between them is usually platform-specific.

Kafka is often the default when teams already run:

  • JVM-heavy platforms
  • Kafka Connect
  • stream processing around Flink, Kafka Streams, or ksqlDB

Pulsar is attractive when teams want:

  • stronger multi-tenant isolation
  • topic and subscription flexibility
  • built-in tiered storage patterns
  • geo-replication features

For the ML design itself, the core pattern is the same. Events go in, model workers consume, predictions flow out.

Reference Architecture

A practical reference architecture for near-real-time scoring looks like this:

Order Service / App / Sensor / CRM
              |
              v
       input-events topic
              |
     +--------+---------+
     |                  |
     v                  v
feature-enricher   stream-validator
     |                  |
     +--------+---------+
              |
              v
      model-input topic
              |
      inference-consumer group
              |
     +--------+--------+
     |                 |
     v                 v
predictions topic   serving cache / DB
     |
     v
downstream app / rules engine / analyst queue

This separation matters.

Do not cram:

  • validation
  • feature enrichment
  • inference
  • output writing

into one giant consumer unless the pipeline is truly tiny. Breaking these stages apart gives you better observability, replay control, and operational isolation.

Designing the Topics and Event Contracts

Event-driven ML systems get brittle when teams treat event payloads casually.

At minimum, define:

  • stable event schemas
  • model input versions
  • traceable event IDs
  • timestamps for event creation and processing
  • optional feature completeness markers

Example event schema:

{
  "event_id": "txn_8f3c12",
  "event_type": "card_transaction_created",
  "created_at": "2026-04-10T09:32:18Z",
  "customer_id": "cust_1049",
  "merchant_id": "merch_774",
  "amount": 138.42,
  "currency": "USD",
  "country": "US",
  "model_input_version": "fraud-v3",
  "features_complete": false
}

If you skip this discipline, the pipeline becomes hard to replay safely and nearly impossible to reason about when models change.

Schema management matters more in streaming ML than in notebook workflows because the operational cost of broken contracts is immediate.
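A lightweight guard at the consumer boundary can enforce that discipline. This sketch uses only the standard library; the required-field list mirrors the example schema above, and the helper name is illustrative rather than part of any framework:

```python
# Contract check for incoming events: field names follow the example
# schema above; extend with whatever your model input version requires.
REQUIRED_FIELDS = {
    "event_id": str,
    "event_type": str,
    "created_at": str,
    "model_input_version": str,
}

def validate_event(event: dict) -> list[str]:
    """Return a list of contract violations; an empty list means usable."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(f"wrong type for {field}: {type(event[field]).__name__}")
    return errors

# A valid event passes cleanly; a malformed one names every problem at once.
good = {"event_id": "txn_8f3c12", "event_type": "card_transaction_created",
        "created_at": "2026-04-10T09:32:18Z", "model_input_version": "fraud-v3"}
bad = {"event_id": 42}

print(validate_event(good))       # → []
print(len(validate_event(bad)))   # → 4
```

Returning all violations at once, rather than failing on the first, makes dead-letter payloads far easier to triage later.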

A Minimal Kafka Consumer for Inference

Here is a simplified Python example using Kafka:

from confluent_kafka import Consumer, Producer
import json
import time

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "fraud-inference-workers",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,  # commit only after the result is produced
})

producer = Producer({"bootstrap.servers": "kafka:9092"})

consumer.subscribe(["model-input"])

def run_model(event: dict) -> dict:
    # Replace with your actual model call.
    score = 0.91
    return {
        "event_id": event["event_id"],
        "model_name": "fraud-risk",
        "model_version": "2026-04-10",
        "score": score,
        "processed_at": int(time.time()),
    }

while True:
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        # In production, log and alert here instead of silently skipping.
        continue

    try:
        event = json.loads(msg.value())
    except json.JSONDecodeError:
        # Malformed payload: route to a dead-letter topic in production.
        consumer.commit(msg)
        continue

    result = run_model(event)

    producer.produce("prediction-results", json.dumps(result).encode("utf-8"))
    producer.flush()  # per-message flush is simple but slow; batch it in production
    consumer.commit(msg)

This is intentionally minimal. In production you would also want:

  • dead-letter handling
  • retries with clear idempotency rules
  • metrics for lag, batch size, throughput, and failures
  • model version labeling in telemetry

But the core idea is already there: event consumption instead of synchronous request serving.
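The dead-letter and retry pieces can be factored into a small wrapper around the handler. This is a sketch, not a library API: `process_with_retries`, `publish`, and `dead_letter` are illustrative names, with the callbacks standing in for thin wrappers over producer calls to the results and DLQ topics:

```python
import json

def process_with_retries(msg_value: bytes, handler, publish, dead_letter,
                         max_attempts: int = 3) -> bool:
    """Decode an event, run `handler` on it, and retry transient failures.

    `publish` and `dead_letter` are caller-supplied callbacks. Returns
    True if the result was published, False if the event was dead-lettered.
    """
    try:
        event = json.loads(msg_value)
    except json.JSONDecodeError:
        dead_letter(msg_value, reason="malformed_json")
        return False

    for attempt in range(1, max_attempts + 1):
        try:
            publish(handler(event))
            return True
        except Exception as exc:  # in production, catch narrower error types
            if attempt == max_attempts:
                dead_letter(msg_value, reason=f"handler_failed: {exc}")
                return False
    return False

# With stub callbacks, a malformed payload routes straight to the DLQ:
published, dead = [], []
process_with_retries(b"not json", lambda e: e, published.append,
                     lambda value, reason: dead.append(reason))
print(dead)  # → ['malformed_json']
```

Keeping the retry policy out of the consumer loop also makes it testable without a broker.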

Where GPUs Actually Fit

The phrase "without wasting GPU capacity" matters because many teams attach GPUs to workflows that do not need permanently online GPU-backed endpoints.

If your model is:

  • large enough that CPU inference is too slow
  • valuable enough that acceleration matters
  • but not user-blocking enough to require instant response

then an event-driven worker pool can be a better fit than an always-on online inference service.

Instead of keeping GPUs hot behind a low-latency API, you can:

  • run a small inference worker deployment
  • scale it based on stream lag
  • batch work into short windows
  • absorb bursts through the message broker

That usually improves utilization because the queue smooths the workload.

With always-on real-time serving, the GPU often sits idle waiting for the next request while still costing money. With event-driven inference, the queue gives the hardware something closer to a continuous workload.

Scaling on Lag Instead of Guesswork

One of the strongest patterns here is autoscaling consumers on topic lag.

If the acceptable processing window is, for example, 30 seconds, then lag becomes the operational signal that matters.

A Kubernetes pattern with KEDA might look like:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: fraud-inference-workers
spec:
  scaleTargetRef:
    name: fraud-inference-workers
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: fraud-inference-workers
        topic: model-input
        lagThreshold: "500"

This is a better scaling model for many near-real-time systems than CPU-based autoscaling because it aligns directly with business freshness.

The right question is not:

  • how busy is the pod?

It is:

  • are we keeping up with the event stream within the target window?
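That question can be made quantitative. A back-of-the-envelope sizing helper, assuming you know each worker's sustained throughput (all numbers below are illustrative):

```python
import math

def workers_needed(current_lag: int, inflow_per_s: float,
                   per_worker_rate: float, window_s: float) -> int:
    """Workers required to drain the backlog within `window_s` seconds
    while also keeping up with new arrivals."""
    required_rate = inflow_per_s + current_lag / window_s
    return max(1, math.ceil(required_rate / per_worker_rate))

# 10k events behind, 200 events/s arriving, each worker handles 150
# events/s, and the freshness target is a 30-second window:
print(workers_needed(10_000, 200, 150, 30))  # → 4
```

This is essentially the arithmetic a lag-based autoscaler performs implicitly; making it explicit helps you choose a sane `lagThreshold` and `maxReplicaCount`.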

Batch Inside the Stream

An event-driven ML pipeline does not mean processing one event at a time forever.

In practice, many of the best systems use micro-batching:

  • pull events for 250ms to 2s
  • bundle them into a single model execution batch
  • write results independently

That gives you a powerful middle ground:

  • fresher than hourly or nightly batch
  • cheaper than strict per-request real-time inference
  • better accelerator utilization than single-event processing

This matters especially for transformer models, ranking models, and embedding generation, where small batches can materially improve throughput without creating unacceptable freshness delays.
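The collection step above can be sketched as a small window-based batcher. Here `poll_fn` stands in for a non-blocking broker poll, and the timing thresholds are illustrative:

```python
import time

def collect_batch(poll_fn, max_batch: int = 32, max_wait_s: float = 0.5) -> list:
    """Pull events until the batch is full or the wait window closes.

    `poll_fn()` should return one event or None (like a non-blocking
    broker poll). Returns a possibly-empty list to feed the model as
    a single batched inference call.
    """
    batch = []
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch and time.monotonic() < deadline:
        event = poll_fn()
        if event is None:
            time.sleep(0.01)  # avoid a hot loop when the topic is quiet
            continue
        batch.append(event)
    return batch

# With a stub source, the batcher fills up and returns early:
events = iter(range(100))
print(collect_batch(lambda: next(events), max_batch=8, max_wait_s=1.0))
# → [0, 1, 2, 3, 4, 5, 6, 7]
```

The two knobs map directly to the trade-off in the list above: `max_wait_s` bounds the freshness cost, `max_batch` bounds the accelerator's batch size.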

The Hard Parts

This architecture is not free.

The complexity shifts from request latency to stream operations.

1. Ordering and idempotency

If events replay or get retried, your inference side effects must tolerate duplication.

That usually means:

  • event IDs
  • idempotent output writes
  • clear replay semantics
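Idempotent output writes usually reduce to a conditional write keyed on the event ID. A sketch with an in-memory dict standing in for any conditional-write backend (Redis `SET NX`, a unique-key insert, and so on):

```python
def write_if_new(store: dict, event_id: str, result: dict) -> bool:
    """Write the prediction only if this event has not been scored before.

    Returns True on first write, False on a duplicate, so replays and
    retries become harmless no-ops at the output layer.
    """
    if event_id in store:
        return False
    store[event_id] = result
    return True

cache = {}
print(write_if_new(cache, "txn_8f3c12", {"score": 0.91}))  # → True
print(write_if_new(cache, "txn_8f3c12", {"score": 0.91}))  # → False (replay)
```

With a real backend the check-and-set must be a single atomic operation, not two calls, or concurrent workers can both "win".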

2. Feature freshness

The stream may be near real time while the feature source is not.

If enrichment depends on stale lookup tables or delayed feature pipelines, the architecture can still produce low-quality predictions quickly.

3. Failure visibility

A synchronous API failure is obvious. A streaming failure can hide as:

  • increasing lag
  • dead-letter volume
  • silently dropped outputs
  • stale prediction consumers downstream

Observability has to be designed in deliberately.

4. Backpressure and cost spikes

Queues absorb spikes, but they also hide them until lag grows.

If your autoscaling, partitioning, or consumer throughput is poorly tuned, the system can quietly drift from "near real time" into "eventually maybe."

What to Monitor

If you build a streaming ML inference architecture, monitor the system by freshness and throughput, not just service uptime.

Track:

  • topic lag by consumer group
  • event age at prediction time
  • events processed per second
  • inference batch size
  • prediction failure rate
  • dead-letter rate
  • feature enrichment latency
  • per-model throughput and GPU utilization

These metrics tell you whether the pipeline is actually delivering value inside the freshness window the business needs.

Example Prometheus-style metrics:

metrics:
  - ml_stream_consumer_lag
  - ml_event_age_seconds
  - ml_inference_batch_size
  - ml_predictions_total
  - ml_prediction_failures_total
  - ml_dead_letter_total
  - gpu_utilization_percent

Notice what is missing: p95 request latency as the primary KPI. In this model, freshness and lag matter more.

When Not to Use This Pattern

Event-driven inference is the wrong answer when:

  • the user must wait for the result immediately
  • the action cannot proceed without synchronous scoring
  • compliance or audit requirements demand tight inline decisioning
  • the event flow is so small and steady that a normal online service is simpler

It is also a bad fit when teams reach for Kafka or Pulsar because it sounds modern, not because they have a real near-real-time requirement.

If a simple batch job every 15 minutes solves the business problem, do that.

The middle ground only helps when the middle ground is actually needed.

A Practical Decision Framework

Use these questions:

  1. Does the model output need to exist before the user request completes?
  2. Is the prediction still useful if delivered in 5 seconds, 30 seconds, or 2 minutes?
  3. Is workload arrival bursty enough that always-on serving wastes capacity?
  4. Can downstream systems consume the prediction asynchronously?
  5. Will topic lag be easier to operate than low-latency synchronous SLAs?

If the answers are mostly:

  • no
  • yes
  • yes
  • yes
  • yes

then the event-driven pattern is probably worth serious consideration.

A Concrete Example: Recommendations Without a Hot GPU API

Imagine an e-commerce team that wants session-aware recommendations.

Nightly batch features are too stale. A fully online LLM-style ranking service on GPU is too expensive for the traffic shape.

An event-driven design works like this:

  • page views and cart updates land on Kafka
  • a stream processor updates session state
  • model-input events are emitted every few user actions
  • GPU-backed ranking workers score micro-batches
  • top candidates are written into Redis
  • the storefront reads fresh-enough recommendations from cache

The user experiences updated recommendations within seconds, but the team avoids building a hard real-time synchronous ranking service for every interaction.
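The cache-write step in that flow reduces to picking the top candidates per session and attaching a TTL so stale recommendations expire on their own. The key format, function names, and scores below are all illustrative:

```python
def top_candidates(scores: dict[str, float], k: int = 5) -> list[str]:
    """Select the k highest-scoring product IDs for the session cache."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def cache_record(session_id: str, scores: dict[str, float],
                 ttl_s: int = 300) -> dict:
    """Build the payload a worker would write to Redis (SET with EX=ttl_s)."""
    return {
        "key": f"recs:{session_id}",
        "value": top_candidates(scores),
        "ttl_s": ttl_s,  # stale entries expire instead of lingering
    }

scores = {"sku_1": 0.42, "sku_2": 0.91, "sku_3": 0.77}
print(cache_record("sess_abc", scores, ttl_s=120))
# → {'key': 'recs:sess_abc', 'value': ['sku_2', 'sku_3', 'sku_1'], 'ttl_s': 120}
```

The TTL doubles as a safety net: if the ranking workers fall behind, the storefront degrades to a default experience instead of serving arbitrarily old recommendations.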

That is the middle ground in concrete terms.

Final Takeaway

The debate between batch and real time is often framed too rigidly.

For many teams, the better answer is neither.

An event-driven ML pipeline lets you react quickly enough for the business while avoiding the operational and financial burden of pretending every model needs a permanently hot online serving stack.

Kafka and Pulsar make this pattern viable because they absorb bursts, decouple producers from consumers, and let you scale inference around lag and freshness instead of raw request latency.

If your workload is too dynamic for batch but not truly strict enough for synchronous online inference, this architecture is often the right middle ground.

That is the real lesson of real-time vs. batch ML: the best design is often the one that matches the business freshness requirement without paying for unnecessary immediacy.


Resilio Tech Team

Building AI infrastructure tools and sharing knowledge to help companies deploy ML systems reliably.

Published 4/10/2026