DeepSeek on Kubernetes: AI-Powered Reasoning at Scale
![DeepSeek on Kubernetes: AI-Powered Reasoning at Scale](https://media2.dev.to/dynamic/image/width%3D1000,height%3D500,fit%3Dcover,gravity%3Dauto,format%3Dauto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9duol8zy727moju348rv.png)
Introduction
As artificial intelligence continues to evolve, deploying AI-powered applications efficiently and at scale has become critical. Kubernetes, the de facto orchestration platform, plays a crucial role in managing containerized AI workloads, ensuring scalability, resilience, and ease of management. In this article, we explore DeepSeek on Kubernetes, a deployment that integrates DeepSeek-R1, a powerful reasoning AI model, with Open WebUI for seamless interaction.
Why Kubernetes for DeepSeek?
DeepSeek is an advanced reasoning model that benefits significantly from a containerized deployment within a Kubernetes cluster. Kubernetes provides:
- Scalability: Effortlessly scale AI workloads across multiple nodes (see the example after this list).
- Resilience: Automatic pod rescheduling in case of failures.
- Service Discovery: Manage microservices effectively using Kubernetes Services.
- Persistent Storage: Use PVCs to store and manage AI model data across restarts.
- Load Balancing: Distribute requests efficiently across multiple replicas.
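As a quick illustration of the scalability point, any Deployment in this article can be scaled with a single command. A sketch using the ollama Deployment defined later (note that, with the emptyDir volume used below, each new replica has to pull its own copy of the model):

```bash
kubectl scale deployment/ollama --replicas=3
```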
Deploying DeepSeek on Kubernetes
Kubernetes Cluster Setup
In our setup, we have a three-node Kubernetes cluster with the following nodes:
```
$ kubectl get nodes
NAME                     STATUS   ROLES           AGE    VERSION
deepseek-control-plane   Ready    control-plane   6d5h   v1.32.0
deepseek-worker          Ready    <none>          6d5h   v1.32.0
deepseek-worker2         Ready    <none>          6d5h   v1.32.0
```
Even if the Kubernetes nodes have no GPUs, DeepSeek-R1 will still function, although response times will be slower. GPU acceleration is recommended for optimal performance, especially for complex reasoning tasks.
Kubernetes clusters can be set up locally using tools like:
- KIND (Kubernetes IN Docker)
- Minikube
- MicroK8s
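For example, the three-node cluster shown above can be reproduced with KIND. A minimal sketch (the file name kind-config.yaml is arbitrary):

```yaml
# kind-config.yaml: one control-plane node and two workers,
# matching the node layout shown earlier
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

Running kind create cluster --name deepseek --config kind-config.yaml produces nodes named deepseek-control-plane, deepseek-worker, and deepseek-worker2, matching the kubectl get nodes output above.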
If deployed on a cloud provider, the setup can be exposed securely through an Ingress object, adding proper authentication and TLS to the web interface.
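A sketch of such an Ingress, assuming an NGINX ingress controller, a pre-created TLS secret named deepseek-tls, and a ClusterIP Service named openweb-ui-service in front of Open WebUI (none of these are defined elsewhere in this article):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openweb-ui-ingress
spec:
  ingressClassName: nginx          # assumes the NGINX ingress controller is installed
  tls:
    - hosts:
        - deepseek.gheware.com
      secretName: deepseek-tls     # hypothetical, pre-created TLS secret
  rules:
    - host: deepseek.gheware.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: openweb-ui-service  # hypothetical Service for the Open WebUI Pods
                port:
                  number: 8080
```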
Deploying DeepSeek-R1 with Ollama
DeepSeek-R1 is deployed within Kubernetes using Ollama, which handles AI model inference. Below is the Kubernetes manifest for the Ollama Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  labels:
    app: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          volumeMounts:
            - mountPath: /root/.ollama
              name: ollama-storage
          env:
            - name: OLLAMA_MODEL
              value: deepseek-r1:1.5b
            - name: OLLAMA_KEEP_ALIVE
              value: "-1"
            - name: OLLAMA_NO_THINKING
              value: "true"
            - name: OLLAMA_SYSTEM_PROMPT
              value: "You are DeepSeek-R1, a reasoning model. Provide direct answers without detailed reasoning steps or tags."
      volumes:
        - name: ollama-storage
          emptyDir: {}
```
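Note that the manifest stores models on an emptyDir volume, so a pulled model does not survive Pod rescheduling. If the model is missing after a restart, it can be pulled again with Ollama's standard pull command:

```bash
kubectl exec -it deploy/ollama -- ollama pull deepseek-r1:1.5b
```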
Exposing Ollama as a Service
To allow other services to communicate with Ollama, we define a Service of type NodePort. A ClusterIP Service would suffice for in-cluster traffic; NodePort additionally exposes Ollama on a port of every node, which is convenient for testing:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
spec:
  selector:
    app: ollama
  ports:
    - protocol: TCP
      port: 11434
      targetPort: 11434
  type: NodePort
```
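Once the Service is up, in-cluster connectivity can be verified with a throwaway curl Pod against Ollama's /api/tags endpoint, which lists the locally available models (a sketch; the Pod name is arbitrary):

```bash
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl http://ollama-service:11434/api/tags
```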
Deploying Open WebUI
For an interactive experience, we integrate Open WebUI, which connects to Ollama and provides a user-friendly chat interface. The deployment is as follows:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openweb-ui
  labels:
    app: openweb-ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openweb-ui
  template:
    metadata:
      labels:
        app: openweb-ui
    spec:
      containers:
        - name: openweb-ui
          image: ghcr.io/open-webui/open-webui:main
          env:
            - name: WEBUI_NAME
              value: "DeepSeek India - Hardware Software Gheware"
            - name: OLLAMA_BASE_URL
              value: "http://ollama-service:11434"
            - name: OLLAMA_DEFAULT_MODEL
              value: "deepseek-r1:1.5b"
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: openweb-data
              mountPath: /app/backend/data
      volumes:
        - name: openweb-data
          persistentVolumeClaim:
            claimName: openweb-ui-pvc
```
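The Deployment above references a PersistentVolumeClaim named openweb-ui-pvc, which is not shown in this article and must be created first. A minimal sketch, assuming the cluster's default StorageClass (the 2Gi size is an assumption; adjust for your chat history and settings):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openweb-ui-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi  # assumed size; not specified in the original setup
```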
Running Inference on DeepSeek-R1
To test the deployment, we can start an interactive session inside the Ollama container:
```bash
kubectl exec -it deploy/ollama -- bash
ollama run deepseek-r1:1.5b
```
These commands open a shell in the Ollama Pod and start an interactive session with the model, allowing queries to be typed directly.
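The model can also be queried over Ollama's REST API, for example via the /api/generate endpoint (a sketch; run it from any Pod that can reach ollama-service, such as the curl Pod shown earlier):

```bash
curl http://ollama-service:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```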
Accessing Open WebUI
After deployment, Open WebUI is accessible at:
http://deepseek.gheware.com/auth
This interface allows users to interact with DeepSeek-R1 through a chat-based environment.
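If no public domain or Ingress is available, the UI can also be reached locally through a port-forward. This sketch targets the Deployment directly, since no Service for Open WebUI is defined in this article; the UI then appears at http://localhost:8080:

```bash
kubectl port-forward deploy/openweb-ui 8080:8080
```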
Conclusion
By deploying DeepSeek on Kubernetes, we achieve a scalable, resilient, and production-ready AI reasoning system. Kubernetes efficiently orchestrates DeepSeek-R1, ensuring smooth model execution and user interaction through Open WebUI. This architecture can be further extended by adding GPU acceleration, auto-scaling, and monitoring with Prometheus and Grafana.
For AI practitioners, Kubernetes offers the perfect foundation for deploying and managing reasoning models like DeepSeek-R1, paving the way for efficient, scalable AI-powered solutions.
Ready to explore AI on Kubernetes? Let’s deploy DeepSeek together!