ServingStack Custom Resource

A ServingStack installs the serving substrate (LeaderWorkerSet, Gateway API, cert-manager, Prometheus) on a Kubernetes cluster.

#Metadata

API version: infrastructure.modelplane.ai/v1alpha1
Kind: ServingStack
Scope: Namespaced
Short names: ss

#Example

Manifest

apiVersion: infrastructure.modelplane.ai/v1alpha1
kind: ServingStack
metadata:
  name: west-gke-stack
  namespace: platform
spec:
  secrets:
    - type: Kubeconfig
      name: west-gke-kubeconfig
      key: kubeconfig
    - type: GCPServiceAccountKey
      name: west-gke-sa-key
      key: private_key
  versions:
    gatewayApi: "v1.5.1"
    certManager: "v1.17.1"
    envoyGateway: "v1.8.1"
  gateway:
    listeners:
      - name: http
        port: 80
        protocol: HTTP

#Spec

ServingStackSpec defines the desired state of ServingStack.

gateway: Configuration for the cluster’s inference traffic gateway.

GatewayClass name. Override if the cluster already has a GatewayClass named envoy.

gateway.listeners[].name: Unique listener name.

gateway.listeners[].port: Port number for this listener.

gateway.listeners[].protocol: Protocol for this listener.
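As a rough illustration of the listener fields, the fragment below declares two listeners with distinct names. It reuses only the field names and the HTTP protocol value from the example manifest on this page; the second listener (http-alt on port 8080) is a hypothetical addition, and whether your cluster's load balancer exposes that port is an assumption.

```yaml
# Sketch only: a ServingStack spec fragment with two listeners.
spec:
  gateway:
    listeners:
      - name: http          # listener names must be unique
        port: 80
        protocol: HTTP
      - name: http-alt      # hypothetical second listener
        port: 8080
        protocol: HTTP
```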

Host path where the NVIDIA driver is installed, passed to the DRA driver as nvidiaDriverRoot. Defaults to / (the upstream default), which suits EKS and self-managed clusters. Set it on platforms that install the driver elsewhere; GKE, for example, uses /home/kubernetes/bin/nvidia. A non-default value also causes the serving stack to compose a ResourceQuota that permits the DRA driver’s system-critical pods, which GKE requires. The cluster composition sets this value; the serving stack does not detect which cloud it runs on.
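For orientation, a ResourceQuota of the kind described above might look roughly like the following. This is a sketch, not the object the serving stack actually composes: the object name, the namespace, and the pod limit are all hypothetical. It relies on the standard Kubernetes PriorityClass quota scope, which is how a namespace outside kube-system can be allowed to run pods with system-critical priority classes.

```yaml
# Sketch only: permits system-critical pods in the DRA driver's namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dra-driver-critical-pods     # hypothetical name
  namespace: nvidia-dra-driver       # hypothetical namespace
spec:
  hard:
    pods: "100"                      # hypothetical limit
  scopeSelector:
    matchExpressions:
      - scopeName: PriorityClass
        operator: In
        values:
          - system-node-critical
          - system-cluster-critical
```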

secrets[].key: Key within the Secret that holds the credential data.

secrets[].name: Name of the Secret.

secrets[].type: The type of credential this secret contains. Kubeconfig is required. Cloud-specific types are optional and determine how the ProviderConfigs authenticate.
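To make the name and key references concrete, here is a sketch of a Secret that the Kubeconfig entry in the example manifest could point at. It assumes the Secret lives in the same namespace as the ServingStack, which this page does not state; the cluster name and the bracketed values are placeholders to fill in.

```yaml
# Sketch only: a Secret matching name "west-gke-kubeconfig" and key "kubeconfig".
apiVersion: v1
kind: Secret
metadata:
  name: west-gke-kubeconfig
  namespace: platform        # assumption: same namespace as the ServingStack
type: Opaque
stringData:
  # The map key below is what spec.secrets[].key refers to.
  kubeconfig: |
    apiVersion: v1
    kind: Config
    clusters:
      - name: west-gke
        cluster:
          server: https://<cluster-endpoint>
          certificate-authority-data: <base64-encoded-ca>
    contexts:
      - name: west-gke
        context:
          cluster: west-gke
          user: west-gke
    current-context: west-gke
    users:
      - name: west-gke
        user: {}             # credentials omitted in this sketch
```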

versions: Version pins for each component. Defaults are the latest tested combination. Override individual versions to upgrade components independently.

versions.certManager: cert-manager chart version.

versions.envoyGateway: Envoy Gateway chart version. Must support InferencePool backend resources (the disaggregated-serving routing path), which requires v1.8.x or newer; older releases lack the Gateway API CRDs (ListenerSet) the AI Gateway needs.

versions.gatewayApi: Gateway API CRD version.

LeaderWorkerSet chart version.

Node Feature Discovery chart version. NFD labels GPU nodes so the NVIDIA DRA driver targets its kubelet plugin to them.

NVIDIA DRA driver chart version. Publishes GPUs as DRA ResourceSlices and the gpu.nvidia.com DeviceClass that ModelReplica ResourceClaims bind through.

kube-prometheus-stack chart version.
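Because each pin can be overridden on its own, a spec that upgrades a single component might look like the fragment below. It is a sketch that assumes omitted pins fall back to their defaults, as the description of the version pins suggests; the envoyGateway field name and value are taken from the example manifest on this page.

```yaml
# Sketch only: pin Envoy Gateway and leave every other component at its default.
spec:
  versions:
    envoyGateway: "v1.8.1"   # must be v1.8.x or newer
```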

#Status

Status of the cluster’s inference gateway.

The gateway’s external address, once assigned by the cloud load balancer.