Modelplane docs

ServingStack Custom Resource

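A ServingStack installs the serving substrate on the Kubernetes cluster reached through its Kubeconfig secret. The substrate consists of LeaderWorkerSet, Gateway API with Envoy Gateway, cert-manager, Prometheus (kube-prometheus-stack), Node Feature Discovery, and the NVIDIA DRA driver.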

Metadata

API version: infrastructure.modelplane.ai/v1alpha1
Kind: ServingStack
Scope: Namespaced
Short names: ss

Example

```yaml
apiVersion: infrastructure.modelplane.ai/v1alpha1
kind: ServingStack
metadata:
  name: west-gke-stack
  namespace: platform
spec:
  secrets:
    - type: Kubeconfig
      name: west-gke-kubeconfig
      key: kubeconfig
    - type: GCPServiceAccountKey
      name: west-gke-sa-key
      key: private_key
  versions:
    gatewayApi: "v1.5.1"
    certManager: "v1.17.1"
    envoyGateway: "v1.8.1"
  gateway:
    listeners:
      - name: http
        port: 80
        protocol: HTTP
```

Spec

ServingStackSpec defines the desired state of ServingStack.
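
The only required spec field is secrets, and it must include a Kubeconfig entry. A minimal ServingStack can therefore be sketched as follows. This sketch assumes the optional fields (gateway, nvidiaDriverRoot, versions) are simply omitted and fall back to whatever defaults the controller applies. The names minimal-stack and cluster-kubeconfig are hypothetical.

```yaml
apiVersion: infrastructure.modelplane.ai/v1alpha1
kind: ServingStack
metadata:
  name: minimal-stack            # hypothetical name
  namespace: platform
spec:
  # secrets is the only required field; a Kubeconfig entry must be present.
  secrets:
    - type: Kubeconfig
      name: cluster-kubeconfig   # hypothetical Secret name
      key: kubeconfig
```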

gateway optional object

Configuration for the cluster’s inference traffic gateway.

gateway.className optional string 1–63 chars default: envoy

Name of the GatewayClass the gateway uses. Override it if the cluster already has a GatewayClass named envoy.

gateway.listeners optional object[] ≤ 8 items

Listeners exposed by the gateway.

gateway.listeners[].name required string 1–63 chars

Unique listener name.

gateway.listeners[].port required integer 1–65535

Port number for this listener.

gateway.listeners[].protocol required enum: HTTP | TCP

Protocol for this listener.
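
A gateway block that overrides className and declares two listeners might look like the fragment below. The GatewayClass name modelplane-envoy, the listener name raw-tcp, and port 9000 are hypothetical values chosen for illustration. They respect the documented constraints: names of 1–63 chars, ports in 1–65535, protocol HTTP or TCP, and at most 8 listeners. The fragment shows only spec.gateway, so a real manifest still needs its secrets.

```yaml
spec:
  gateway:
    # Hypothetical name; override only if a GatewayClass named "envoy" already exists.
    className: modelplane-envoy
    listeners:
      - name: http         # listener names must be unique
        port: 80
        protocol: HTTP
      - name: raw-tcp      # hypothetical TCP listener
        port: 9000
        protocol: TCP
```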

nvidiaDriverRoot optional string ≤ 512 chars default: /

Host path where the NVIDIA driver is installed, passed to the DRA driver as nvidiaDriverRoot. The default, / (the upstream default), suits EKS and self-managed clusters. Set a different path on platforms that install the driver elsewhere; GKE, for example, uses /home/kubernetes/bin/nvidia. Any non-default value also makes the serving stack compose a ResourceQuota that permits the DRA driver’s system-critical pods, which GKE requires. The cluster composition sets this field; the serving stack never inspects which cloud it runs on.
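
On GKE, this field would carry the path quoted above. The fragment assumes nvidiaDriverRoot sits directly under spec, as its place in this reference suggests. In practice the cluster composition sets it, not a user editing the manifest by hand.

```yaml
spec:
  # GKE installs the NVIDIA driver under this host path.
  # Any non-default value also makes the stack compose the ResourceQuota GKE requires.
  nvidiaDriverRoot: /home/kubernetes/bin/nvidia
```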

secrets required object[] 1–8 items

References to the Secrets that hold the credentials the serving stack uses.

secrets[].key required string ≤ 253 chars

Key within the Secret that holds the credential data.

secrets[].name required string ≤ 253 chars

Name of the Secret.

secrets[].type required enum: Kubeconfig | GCPServiceAccountKey

The type of credential the Secret contains. An entry of type Kubeconfig is required. Cloud-specific types such as GCPServiceAccountKey are optional and determine how the ProviderConfigs authenticate.
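
Each secrets entry points at a key inside an ordinary Kubernetes Secret. The sketch below shows a Secret that would satisfy the Kubeconfig entry from the example manifest. It assumes the Secret lives in the same namespace as the ServingStack (platform), which this reference does not state. The kubeconfig body is an incomplete placeholder, not a working config.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: west-gke-kubeconfig   # matches secrets[].name in the ServingStack
  namespace: platform         # assumption: same namespace as the ServingStack
type: Opaque
stringData:
  # This key must match secrets[].key ("kubeconfig").
  kubeconfig: |
    apiVersion: v1
    kind: Config
    placeholder: clusters, users, and contexts for the target cluster go here
```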

versions optional object

Version pins for each component. Defaults are the latest tested combination. Override individual versions to upgrade components independently.

versions.certManager optional string 1–32 chars default: v1.17.1

cert-manager chart version.

versions.envoyGateway optional string 1–32 chars default: v1.8.1

Envoy Gateway chart version. The version must support InferencePool backend resources (the disaggregated-serving routing path), which requires v1.8.x or newer. Older releases lack the Gateway API CRDs (ListenerSet) that the AI Gateway needs.

versions.gatewayApi optional string 1–32 chars default: v1.5.1

Gateway API CRD version.

versions.leaderWorkerSet optional string 1–32 chars default: v0.8.0

LeaderWorkerSet chart version.

versions.nodeFeatureDiscovery optional string 1–32 chars default: 0.18.3

Node Feature Discovery chart version. NFD labels GPU nodes so that the NVIDIA DRA driver can target its kubelet plugin at them.

versions.nvidiaDraDriver optional string 1–32 chars default: 0.4.0

NVIDIA DRA driver chart version. The driver publishes GPUs as DRA ResourceSlices and provides the gpu.nvidia.com DeviceClass through which ModelReplica ResourceClaims bind.
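
Here is a standalone ResourceClaim requesting one GPU, to show what binding through the gpu.nvidia.com DeviceClass looks like. This is a sketch of the upstream Kubernetes DRA API. It assumes a cluster that serves resource.k8s.io/v1 (GA in Kubernetes 1.34); resource.k8s.io/v1beta1 places deviceClassName directly on the request instead of under exactly. It is not the claim ModelReplica actually generates, and the name single-gpu is hypothetical.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: single-gpu          # hypothetical name
  namespace: platform
spec:
  devices:
    requests:
      - name: gpu
        exactly:
          # DeviceClass published by the NVIDIA DRA driver.
          deviceClassName: gpu.nvidia.com
          allocationMode: ExactCount
          count: 1
```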

versions.prometheus optional string 1–32 chars default: 72.6.2

kube-prometheus-stack chart version.
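
Each pin can be overridden independently. A versions block can therefore pin only selected components and leave the rest at their defaults. The fragment below pins two components to the defaults listed above. To upgrade one, replace its value with a newer release you have tested, keeping envoyGateway at v1.8.x or newer.

```yaml
spec:
  versions:
    # Only these two components are pinned; all others use their defaults.
    leaderWorkerSet: "v0.8.0"
    nvidiaDraDriver: "0.4.0"
```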

Status

gateway optional object

Status of the cluster’s inference gateway.

gateway.address optional string ≤ 256 chars

The gateway’s external address, once assigned by the cloud load balancer.
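
Once the load balancer assigns an address, the status portion of the object would take roughly the shape below. The address 203.0.113.10 comes from the documentation-reserved TEST-NET-3 range and stands in for a real value. On some clouds the load balancer may report a hostname instead of an IP.

```yaml
status:
  gateway:
    # Unset until the cloud load balancer assigns an external address.
    address: 203.0.113.10
```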