Event-Driven ML Pipelines: When Batch Isn't Fast Enough and Real-Time Is Too Expensive

A deep guide to event-driven ML pipelines as the middle ground between batch and always-on real-time inference, using Kafka or Pulsar for near-real-time scoring without wasting GPU capacity.

Most ML architecture debates get framed as a false binary. You either run batch scoring every few hours and accept staleness, or you build a fully online inference stack with strict latency targets and always-on compute.

This framing ignores the architecture many teams actually need: Event-Driven ML Pipelines. By reacting to business events (via Kafka or Pulsar), you can achieve near-real-time scoring without the massive overhead of always-on GPU clusters.

Why the Middle Ground Matters

Batch is too slow for fraud detection or session-based recommendations, where a score that arrives hours after the event has lost most of its value. Conversely, synchronous real-time serving is overkill for workloads where a 2-second delay is acceptable.

Event-driven architectures decouple the trigger from the response, allowing you to smooth out traffic spikes and maximize hardware utilization.
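
To make the smoothing effect concrete, here is a minimal simulation, assuming a consumer with a fixed per-tick capacity and a hypothetical traffic pattern with one burst. The function name and all numbers are illustrative only. The queue absorbs the burst and the backlog drains over the following ticks, so the consumer never needs capacity sized for the peak.

```python
def simulate_backlog(arrivals_per_tick, capacity_per_tick):
    """Return the queue depth after each tick for a fixed-capacity consumer."""
    backlog = 0
    depths = []
    for arrivals in arrivals_per_tick:
        backlog += arrivals                      # new events land in the queue
        backlog -= min(backlog, capacity_per_tick)  # consumer drains what it can
        depths.append(backlog)
    return depths


# Hypothetical traffic: steady 50 events per tick with one burst of 500.
arrivals = [50, 50, 500, 50, 50, 50, 50, 50]
depths = simulate_backlog(arrivals, capacity_per_tick=150)
print(depths)  # [0, 0, 350, 250, 150, 50, 0, 0]
```

A synchronous service would need to handle 500 events in the burst tick. Here a consumer sized for 150 per tick catches up four ticks later, at the price of added latency for the events that waited.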

Technical Implementation: Micro-Batching with Kafka

To get the most out of your accelerators, you shouldn't process events one by one. Micro-batching allows you to group events together for a single model execution.

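# Kafka consumer with micro-batching (at-least-once delivery)
import time

from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'ml-workers',
    'enable.auto.commit': False,  # commit manually, only after a batch is processed
})
consumer.subscribe(['input-events'])

def process_batch(events):
    # Perform inference on a batch for better GPU utilization
    print(f"Processing batch of {len(events)} events")
    # results = model.predict(events)

batch = []
batch_started = None  # monotonic time at which the first event entered the batch

try:
    while True:
        msg = consumer.poll(0.1)  # returns None if no message arrives within 100 ms
        if msg is not None and msg.error():
            print(f"Consumer error: {msg.error()}")
        elif msg is not None:
            if not batch:
                batch_started = time.monotonic()
            batch.append(msg.value())

        # Trigger batch processing on size or time threshold
        if batch and (len(batch) >= 100 or time.monotonic() - batch_started > 1.0):
            process_batch(batch)
            consumer.commit(asynchronous=False)  # offsets advance only after processing
            batch = []
finally:
    consumer.close()  # leave the group cleanly so partitions are reassigned promptly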
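
The flush rule in the loop above (size threshold or time threshold) can be pulled out into a small helper so it can be unit tested without a broker. The following is a sketch under stated assumptions: `MicroBatcher` and its parameters are hypothetical names, not part of any Kafka client, and this version measures the wait from the moment the first event enters an empty batch.

```python
import time


class MicroBatcher:
    """Buffers items and reports when a size or age limit has been reached."""

    def __init__(self, max_size=100, max_wait=1.0, clock=time.monotonic):
        self.max_size = max_size
        self.max_wait = max_wait
        self._clock = clock      # injectable so tests can control time
        self._items = []
        self._first_at = None    # arrival time of the oldest buffered item

    def add(self, item):
        if not self._items:
            self._first_at = self._clock()
        self._items.append(item)

    def ready(self):
        if not self._items:
            return False
        if len(self._items) >= self.max_size:
            return True
        return self._clock() - self._first_at >= self.max_wait

    def drain(self):
        items, self._items = self._items, []
        self._first_at = None
        return items


# Usage with a fake clock (hypothetical values) instead of real time.
fake_now = [0.0]
batcher = MicroBatcher(max_size=3, max_wait=1.0, clock=lambda: fake_now[0])

batcher.add("a")
ready_before_timeout = batcher.ready()   # one item, no time elapsed
fake_now[0] = 1.5
ready_after_timeout = batcher.ready()    # same item, now older than max_wait
first_batch = batcher.drain()

for event in ("b", "c", "d"):
    batcher.add(event)
ready_on_size = batcher.ready()          # size limit reached immediately
```

Timing from the first buffered event bounds the extra latency any single event can pick up from batching, which is usually the number the business cares about.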

Scaling on Lag with KEDA

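Instead of scaling based on CPU or memory, event-driven pipelines should scale based on consumer lag: the gap between the newest offset in each partition and the offset your consumer group has committed. If your consumer group is falling behind, you need more workers.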
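
As a rough illustration of what "falling behind" means numerically, lag can be computed per partition as the log-end offset minus the committed offset. The sketch below uses plain dictionaries with made-up offsets; `consumer_lag` is a hypothetical helper, and treating a partition with no committed offset as starting from 0 is an assumption (real behavior depends on the consumer's offset reset policy).

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: log-end offset minus committed offset, never negative."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no committed offset is read from 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1200, 1: 980, 2: 1500}
committed_offsets = {0: 1150, 1: 980, 2: 1100}

lag = consumer_lag(end_offsets, committed_offsets)
total_lag = sum(lag.values())
print(lag, total_lag)  # {0: 50, 1: 0, 2: 400} 450
```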

Using KEDA (Kubernetes Event-driven Autoscaling), you can scale your inference pods to zero when there's no traffic. The trade-off is a cold start (pod scheduling plus model loading) when events resume.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: ml-inference-scaler
spec:
  scaleTargetRef:
    name: ml-inference-worker
  minReplicaCount: 0  # Scale to zero!
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: ml-workers
        topic: input-events
        lagThreshold: "100"
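
To build intuition for how `lagThreshold`, `minReplicaCount`, and `maxReplicaCount` interact, here is a simplified approximation of the replica calculation. It is a sketch, not KEDA's actual code: `desired_replicas` is a hypothetical function, the partition cap reflects the assumption that idle consumers are not allowed, and real autoscaling also applies stabilization windows and tolerances that are ignored here.

```python
import math


def desired_replicas(total_lag, lag_threshold, min_replicas, max_replicas, partitions):
    """Approximate replica count for a lag-based trigger (simplified)."""
    if total_lag <= 0:
        return min_replicas
    wanted = math.ceil(total_lag / lag_threshold)  # one worker per lag_threshold of backlog
    wanted = min(wanted, partitions)               # extra consumers in a group would sit idle
    return max(min_replicas, min(wanted, max_replicas))


# Values mirror the manifest above; the 12-partition topic is hypothetical.
idle = desired_replicas(0, 100, 0, 20, partitions=12)
moderate = desired_replicas(450, 100, 0, 20, partitions=12)
overloaded = desired_replicas(5000, 100, 0, 20, partitions=12)
print(idle, moderate, overloaded)  # 0 5 12
```

The last case shows why partition count matters: with 12 partitions, a `maxReplicaCount` of 20 is never reached, so the topic needs enough partitions to match the parallelism you expect at peak.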

Data Consistency and Feature Stores

A major challenge in streaming ML is ensuring that the features used during inference are as fresh as the event itself. This makes feature freshness in the feature store as important as its availability. If your model reacts to an event in 100 ms but uses features that are 1 hour old, the prediction quality will suffer.
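
One way to make this measurable is to compare the event timestamp with the timestamp of the feature values at scoring time. The following is a minimal sketch; the function names, the 5-minute staleness budget, and the timestamps are all hypothetical and do not come from any particular feature store.

```python
from datetime import datetime, timedelta, timezone


def feature_staleness(event_time, feature_time):
    """How much older the feature values are than the event being scored."""
    return event_time - feature_time


def is_fresh(event_time, feature_time, max_staleness):
    """True if the features fall within the allowed staleness budget."""
    return feature_staleness(event_time, feature_time) <= max_staleness


# Hypothetical timestamps and budget.
event_time = datetime(2026, 4, 10, 12, 0, 0, tzinfo=timezone.utc)
budget = timedelta(minutes=5)

recent_features = event_time - timedelta(seconds=30)
old_features = event_time - timedelta(hours=1)

recent_ok = is_fresh(event_time, recent_features, budget)
old_ok = is_fresh(event_time, old_features, budget)
print(recent_ok, old_ok)  # True False
```

A worker could log or count events that fail this check, which turns "features are stale" from a suspicion into a metric you can alert on.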

Final Takeaway

The choice between batch and real-time is not binary. Event-driven ML pipelines offer a cost-effective, scalable middle ground for workloads that can tolerate a second or two of latency. By combining Kafka, micro-batching, and lag-based autoscaling, you can build systems that are "fast enough" for the business without paying for idle GPU capacity.

Resilio Tech specializes in designing these high-throughput, event-driven AI architectures. We help companies bridge the gap between batch processing and real-time serving, ensuring your models are integrated seamlessly into your existing data streams while optimizing for both performance and cost.

Looking to move beyond batch processing? Talk to Resilio Tech about building an event-driven ML pipeline that scales with your business.

Resilio Tech Team

Building AI infrastructure tools and sharing knowledge to help companies deploy ML systems reliably.

Article Info

Published 4/10/2026