# Host LLM in Kubernetes

## Overview
In this workshop, you'll deploy a complete AI stack on Kubernetes using Kuberise.io:
- Ollama — serve LLMs (like DeepSeek, Llama) via a REST API
- Open-WebUI — a ChatGPT-like web interface for your self-hosted models
- K8sGPT — an AI assistant that diagnoses Kubernetes cluster issues
All three tools are available as Kuberise.io platform components and can be enabled with a single configuration change.
## Part 1: Ollama — Self-Hosted LLM Server

### Enable Ollama

In your platform values file, enable the Ollama component:
```yaml
# app-of-apps/values-{name}.yaml (e.g. values-shared.yaml)
ArgocdApplications:
  ollama:
    enabled: true
```
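Commit and push the change, and ArgoCD deploys Ollama. A quick way to confirm the rollout (a sketch; the Application name and the `ollama` namespace are assumptions based on the component name):

```bash
# Commit the one-line change; ArgoCD syncs it automatically
git add app-of-apps/values-shared.yaml
git commit -m "Enable Ollama platform component"
git push

# Application name and namespace are assumptions based on the component name
kubectl get application ollama -n argocd
kubectl get pods -n ollama
```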
### Try It Locally First
Install the Ollama CLI on your machine and run a model locally:
```bash
ollama run deepseek-r1:1.5b
```
Example prompt: "Write a love letter from a smartphone to its charger"
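If the CLI isn't installed yet, Ollama's documented install paths cover Linux and macOS:

```bash
# Linux: official install script from the Ollama docs
curl -fsSL https://ollama.com/install.sh | sh

# macOS: Homebrew
brew install ollama
```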
### Try Your Kubernetes-Hosted Model
Point the Ollama CLI at your cluster's Ollama instance:
```bash
OLLAMA_HOST=https://ollama.onprem.kuberise.dev ollama run deepseek-r1:1.5b
```
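If the CLI can't connect, a quick reachability check against the same endpoint helps separate networking problems from model problems (`/api/version` is a standard Ollama endpoint):

```bash
curl https://ollama.onprem.kuberise.dev/api/version
```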
### Compare and Observe
- Compare the speed — is local inference faster or slower than the cluster?
- Check resource usage — open the Grafana dashboard to see CPU and memory consumption of the Ollama pod
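If you prefer the terminal over Grafana, metrics-server gives the same headline numbers (assumes metrics-server is installed and Ollama runs in the `ollama` namespace):

```bash
# Assumes metrics-server is installed; the namespace is an assumption
kubectl top pod -n ollama
```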
### Configuration Options
You can fine-tune Ollama's deployment through the values file:
| Configuration | Description |
|---|---|
| `replicaCount` | Scale horizontally for higher throughput |
| GPU resources | Enable GPU passthrough for faster inference |
| Persistent volume | Avoid re-downloading models on pod restart |
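A values sketch covering all three knobs. The key names follow the upstream Ollama Helm chart; treat them as assumptions and verify against the chart version your platform pins:

```yaml
# Sketch only: key names from the upstream Ollama Helm chart,
# not verified against your pinned chart version.
replicaCount: 2            # scale horizontally for throughput

ollama:
  gpu:
    enabled: true          # GPU passthrough for faster inference
    type: nvidia
    number: 1

persistentVolume:
  enabled: true            # keep downloaded models across pod restarts
  size: 30Gi
```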
### Working with Multiple Models

- Pull models: Ollama can host multiple models simultaneously (see the API sketch below)
- Switch models: Use the `/show info` command in the Ollama CLI to verify which model is running
- GitOps workflow: Change the model in your values file, push to Git, and ArgoCD automatically restarts the pod with the new model
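Pulling an extra model doesn't require a redeploy; Ollama's standard REST API does it at runtime (the hostname reuses the Ingress URL from earlier in this part):

```bash
# /api/pull and /api/tags are standard Ollama endpoints;
# the hostname assumes the Ingress URL used earlier in this workshop
curl https://ollama.onprem.kuberise.dev/api/pull -d '{"model": "llama3.2:3b"}'

# List every model the server now hosts
curl https://ollama.onprem.kuberise.dev/api/tags
```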
This workshop defaults to a small model (`deepseek-r1:1.5b`) to conserve resources. You can switch to larger models in production clusters with more capacity.

## Part 2: Open-WebUI — Chat Interface
### Enable Open-WebUI
```yaml
# app-of-apps/values-{name}.yaml (e.g. values-shared.yaml)
ArgocdApplications:
  open-webui:
    enabled: true
```
### Access the Chat Interface
- Check that ArgoCD has deployed Open-WebUI successfully
- Verify a new Ingress resource has been created (commands sketched below)
- Open the URL: https://webui.onprem.kuberise.dev
- Select the model you want to chat with from the dropdown
- Start a conversation — your chat history is saved automatically
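A quick check for the first two steps (the Application name and namespace are assumptions based on the component name):

```bash
# Application name and namespace are assumptions based on the component name
kubectl get application open-webui -n argocd
kubectl get ingress -n open-webui
```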
Open-WebUI provides a familiar ChatGPT-like experience, but your data stays on your infrastructure. This is especially valuable for:
- Data privacy — sensitive prompts never leave your cluster
- Cost control — no per-token API charges
- Customization — fine-tune models for your specific use cases
## Part 3: K8sGPT — AI-Powered Cluster Diagnostics
K8sGPT connects your Kubernetes cluster to an LLM to automatically diagnose issues and provide human-readable explanations.
### Enable K8sGPT
```yaml
# app-of-apps/values-{name}.yaml (e.g. values-shared.yaml)
ArgocdApplications:
  k8sgpt:
    enabled: true
```
### Configure the K8sGPT Custom Resource
Ensure the model specified in the K8sGPT CR matches a model running in your Ollama instance:
```yaml
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-ollama
spec:
  ai:
    enabled: true
    model: llama3.2:3b
    backend: localai
    baseUrl: http://ollama.ollama.svc.cluster.local:11434/
```
Enable the Ollama backend in the K8sGPT values:
```yaml
enable_ollama: true
```
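Once ArgoCD has synced, confirm the operator picked up the custom resource and is running (the `k8sgpt` namespace is an assumption):

```bash
# Namespace is an assumption; adjust to where the component is deployed
kubectl get k8sgpt -n k8sgpt
kubectl get pods -n k8sgpt
```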
### Test It
- Check existing results — K8sGPT creates `Result` resources in the `k8sgpt` namespace for any issues it detects:

  ```bash
  kubectl get results -n k8sgpt
  ```
- Create a faulty pod to trigger a diagnostic:

  ```bash
  kubectl run faulty-pod --image=fakeimage:latest
  ```
- Watch K8sGPT detect the issue — a new `Result` resource will appear, containing:
  - The error description
  - An AI-generated explanation of the problem
  - Suggested remediation steps

  ```bash
  kubectl get results -n k8sgpt -o yaml
  ```
- Clean up and verify the report is removed:

  ```bash
  kubectl delete pod faulty-pod
  ```
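The corresponding `Result` should disappear after K8sGPT's next scan; re-running the earlier query confirms the cleanup:

```bash
kubectl get results -n k8sgpt
```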
K8sGPT continuously monitors your cluster and provides actionable insights — like having an AI SRE watching over your infrastructure.
## Key Takeaways
- Ollama makes it easy to self-host LLMs on Kubernetes with minimal configuration
- Open-WebUI provides a polished chat interface for team-wide access to AI
- K8sGPT turns your LLM into a Kubernetes debugging assistant
- All three integrate seamlessly with the GitOps workflow — enable, commit, push, done
- Self-hosting AI gives you data privacy, cost predictability, and full control