ServingStack Custom Resource
A ServingStack installs the serving substrate (LeaderWorkerSet, Gateway API, cert-manager, Prometheus) on a Kubernetes cluster.
#Metadata
#Example
Manifest
apiVersion: infrastructure.modelplane.ai/v1alpha1
kind: ServingStack
metadata:
name: west-gke-stack
namespace: platform
spec:
secrets:
- type: Kubeconfig
name: west-gke-kubeconfig
key: kubeconfig
- type: GCPServiceAccountKey
name: west-gke-sa-key
key: private_key
versions:
gatewayApi: "v1.5.1"
certManager: "v1.17.1"
envoyGateway: "v1.8.1"
gateway:
listeners:
- name: http
port: 80
protocol: HTTP
#Spec
ServingStackSpec defines the desired state of ServingStack.
Configuration for the cluster’s inference traffic gateway.
GatewayClass name. Override if the cluster already has a GatewayClass named envoy.
Host path where the NVIDIA driver is installed, passed to the DRA driver as nvidiaDriverRoot. Defaults to / (the upstream default), which suits EKS and self-managed clusters. Set it for platforms that install the driver elsewhere — GKE uses /home/kubernetes/bin/nvidia. A non-default value also makes the serving stack compose a ResourceQuota permitting the DRA driver’s system-critical pods, which GKE requires. The cluster composition sets this; the serving stack never inspects its own cloud.
Key within the Secret that holds the credential data.
Name of the Secret.
The type of credential this secret contains. Kubeconfig is required. Cloud-specific types are optional and determine how the ProviderConfigs authenticate.
Version pins for each component. Defaults are the latest tested combination. Override individual versions to upgrade components independently.
cert-manager chart version.
Envoy Gateway chart version. Must support InferencePool backend resources (the disaggregated-serving routing path), which requires v1.8.x or newer; older releases lack the Gateway API CRDs (ListenerSet) the AI Gateway needs.
Gateway API CRD version.
LeaderWorkerSet chart version.
Node Feature Discovery chart version. NFD labels GPU nodes so the NVIDIA DRA driver targets its kubelet plugin to them.
NVIDIA DRA driver chart version. Publishes GPUs as DRA ResourceSlices and the gpu.nvidia.com DeviceClass that ModelReplica ResourceClaims bind through.
kube-prometheus-stack chart version.