Model Deployment
8 min read
Serving Open-Source LLMs with vLLM on Kubernetes
A practical guide to deploying open-source LLMs with vLLM on Kubernetes — covering GPU sizing, request routing, autoscaling, batching, and safe rollouts.
We share everything we learn — real use cases, real production lessons. Technical deep-dives on MLOps, model deployment, AI reliability, and more.
📝 Building in public
Posts authored by the Resilio Tech Team. More in-depth tutorials and case studies coming soon.
A practical guide to deploying open-source LLMs with vLLM on Kubernetes — covering GPU sizing, request routing, autoscaling, batching, and safe rollouts.
3/30/2026 • 6 min read
3/29/2026 • 8 min read
3/28/2026 • 6 min read
3/27/2026 • 5 min read