Register a Cluster

API: modelplane.ai/v1alpha1 · InferenceCluster

An InferenceCluster represents a Kubernetes cluster configured for model serving. Platform teams create these to provide GPU capacity.

Each cluster has:

A cluster source: GKE or EKS (Modelplane provisions the full cluster) or Existing (bring a cluster you manage yourself). See Supported Providers for the clouds and neoclouds Modelplane runs on.
One or more node pools, each referencing an InferenceClass for its hardware capabilities and provisioning recipe.
Labels for organizational metadata: tier, region, provider. These are the matching surface for ModelDeployment.clusterSelector.
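
As a rough sketch of how labels and selectors line up, the fragments below pair a labeled InferenceCluster with a selector on the deployment side. Only the modelplane.ai/region key appears in the examples on this page. The tier and provider keys, and the matchLabels shape of clusterSelector, are assumptions made for illustration, so check the ModelDeployment reference for the actual schema.

```yaml
# Cluster side: labels on the InferenceCluster (spec omitted for brevity).
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: gke-us-central
  labels:
    modelplane.ai/region: us-central   # key used in the examples on this page
    modelplane.ai/tier: production     # assumed key and value
    modelplane.ai/provider: gcp        # assumed key and value
---
# Deployment side: fragment of a ModelDeployment.
# ASSUMPTION: clusterSelector is shown in the common Kubernetes
# matchLabels form; the real field layout may differ.
clusterSelector:
  matchLabels:
    modelplane.ai/region: us-central
```

Under these assumptions, the deployment would be eligible only for clusters carrying modelplane.ai/region: us-central.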

Modelplane installs the serving stack it needs on every cluster it manages, including existing clusters, which it assumes are solely for its use.

Ownership and requirements

Modelplane assumes exclusive ownership of every InferenceCluster. The fleet scheduler’s capacity accounting relies on Modelplane being the only system placing GPU workloads on the cluster, so dedicate each cluster to Modelplane rather than sharing it with other workloads.

Modelplane also has opinions about how a cluster is set up: its Kubernetes version, the components it installs, and required features such as Dynamic Resource Allocation (DRA) for binding GPUs to pods. On provisioned clusters Modelplane handles this for you. On an existing cluster the platform team must meet the requirements.

Provisioned and existing clusters

The cluster.source discriminator picks one of two models:

Provisioned (GKE, EKS). Modelplane creates the cluster and its GPU node pools from each pool’s InferenceClass, labels the pool’s nodes so the scheduler’s placement is enforced, and provisions the storage class for model weights. It also injects a non-GPU system pool with opinionated defaults to run the inference stack, so you only declare the GPU pools you want.
Existing (Existing). A kubeconfig Secret provides access to a cluster you run yourself. Modelplane installs the serving stack it needs but doesn’t provision infrastructure, and each pool’s InferenceClass provides hardware capabilities for scheduling only. You’re responsible for the cluster meeting Modelplane’s requirements, including labeling each pool’s nodes modelplane.ai/pool=<pool-name> (see how scheduling pins placement).
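
The two prerequisites for an Existing cluster can be sketched as follows. The first manifest is a standard Kubernetes Secret holding the kubeconfig, created in the control plane cluster. The name and key match the Existing example on this page. The namespace is left out because this page does not say which one Modelplane reads from, so treat that as something to confirm.

```yaml
# Applied to the control plane cluster, before creating the InferenceCluster.
apiVersion: v1
kind: Secret
metadata:
  name: byo-cluster-kubeconfig   # referenced by cluster.existing.secretRef.name
type: Opaque
stringData:
  kubeconfig: |
    # Replace this line with the full kubeconfig for the existing cluster.
```

The second fragment shows what a GPU node on the existing cluster looks like once it carries the pool label. The node name is hypothetical. In practice the label is usually added with kubectl label node gpu-node-1 modelplane.ai/pool=gpu-h100, or set by whatever tooling provisions the nodes, rather than by applying a Node manifest.

```yaml
# State of a node on the existing (workload) cluster after labeling.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1                 # hypothetical node name
  labels:
    modelplane.ai/pool: gpu-h100   # must equal the nodePools entry's name
```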

Examples

# An InferenceCluster backed by a GKE cluster.
#
# Modelplane provisions the full GKE cluster (VPC, subnet, system pool,
# GPU pools, service account, IAM bindings) and installs the inference
# stack (cert-manager, Envoy Gateway, Prometheus, LeaderWorkerSet,
# Gateway API).
#
# The system pool that hosts control-plane components is provisioned
# automatically and is not declared here. Only GPU pools - each
# referencing an InferenceClass that describes the hardware shape and
# how to provision it - need to be declared.
#
# Add labels to this InferenceCluster to control which deployments land on it
# via a ModelDeployment's clusterSelector.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: gke-us-central
  labels:
    modelplane.ai/region: us-central
spec:
  cluster:
    source: GKE
    gke:
      project: my-gcp-project  # Replace with your GCP project ID.
      region: us-central1

  nodePools:
  - name: gpu-l4
    className: gke-l4-1x-g2
    nodeCount: 1
    minNodeCount: 1
    maxNodeCount: 4
    zones:
    - us-central1-a
    - us-central1-c

# An InferenceCluster backed by an EKS cluster.
#
# Modelplane provisions the full EKS cluster (VPC, subnets, internet
# gateway, IAM roles for the cluster and nodes, system + GPU node
# groups, vpc-cni / kube-proxy / coredns addons) and installs the
# inference stack (cert-manager, Traefik, Prometheus, KEDA,
# LeaderWorkerSet).
#
# The system node group that hosts control-plane components is
# provisioned automatically and is not declared here. Only GPU node
# groups - each referencing an InferenceClass that describes the
# hardware shape and how to provision it - need to be declared.
#
# Modelplane provisions EFS RWX storage for ModelCache on EKS: an
# Elastic-throughput file system, mount targets, the EFS CSI driver, and
# a 'modelplane-rwx-efs' StorageClass pinned to it. No admin action is
# required, and provisioned EKS clusters take no StorageClass override.
#
# Delete this InferenceCluster with foreground cascading deletion for a
# clean teardown:
#
#   kubectl delete inferencecluster eks-us-west --cascade=foreground
#
# The inference stack runs on the EKS cluster and must uninstall while
# the cluster's API server and kubeconfig still exist - otherwise its
# Helm releases hang, and a load balancer one of them created can leak
# its security group and block the VPC from deleting. Foreground
# deletion holds the cluster until the stack is uninstalled. Background
# deletion (the kubectl default) tears everything down at once and can
# orphan cloud resources.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: eks-us-west
  labels:
    modelplane.ai/region: us-west
spec:
  cluster:
    source: EKS
    eks:
      region: us-west-2

  nodePools:
  - name: gpu-l4
    className: eks-l4-1x-g6
    nodeCount: 1
    minNodeCount: 1
    maxNodeCount: 4
    zones:
    - us-west-2a
    - us-west-2b

# An InferenceCluster using an existing cluster you manage yourself.
#
# Provide a kubeconfig Secret so Modelplane can install the inference
# stack and deploy models. Each GPU pool references an InferenceClass
# that describes the hardware - used by the scheduler to know what
# capacity is available.
#
# The kubeconfig Secret must exist in the control plane cluster before
# creating this InferenceCluster.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: byo-us-east
  labels:
    modelplane.ai/region: us-east
spec:
  cluster:
    source: Existing
    existing:
      secretRef:
        name: byo-cluster-kubeconfig
        key: kubeconfig

      # Optional: a cloud identity Secret for pulling images from private
      # registries or accessing cloud APIs from the remote cluster.
      # identitySecretRef:
      #   name: byo-cluster-sa-key
      #   key: private_key

  # Each pool's nodes must be labeled modelplane.ai/pool=<name> (here
  # modelplane.ai/pool=gpu-h100). The scheduler pins a worker to its pool by
  # this label; Modelplane provisions and labels EKS/GKE pools itself, but on a
  # BYO cluster you label the nodes. Without it worker pods stay Pending.
  nodePools:
  - name: gpu-h100
    className: h100-8x-byo
    nodeCount: 2

Cache storage

A ModelCache stages model weights on a ReadWriteMany (RWX) StorageClass on the workload cluster. Where that comes from depends on the source:

GKE (Filestore Enterprise) and EKS (EFS): auto-provisioned. Modelplane creates the storage and its StorageClass; those classes are fixed, so there is nothing for the admin to do.
Existing: bring your own. Create an RWX StorageClass on the cluster, with any backend that supports automatic PVC provisioning (WekaIO, NetApp Trident, FSx for NetApp, and similar), and name it in cluster.existing.cache.storageClassName.
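
For an Existing cluster, the field path above would sit in the InferenceCluster spec roughly as shown here. This is a sketch: the StorageClass name shared-rwx is hypothetical, and the class itself must already exist on the workload cluster, backed by a provisioner that supports ReadWriteMany volumes.

```yaml
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: byo-us-east
spec:
  cluster:
    source: Existing
    existing:
      secretRef:
        name: byo-cluster-kubeconfig
        key: kubeconfig
      cache:
        # Hypothetical name of an RWX StorageClass you created on the
        # existing cluster.
        storageClassName: shared-rwx
  nodePools:
  - name: gpu-h100
    className: h100-8x-byo
    nodeCount: 2
```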

The ML team’s ModelCache and ModelDeployment specs are the same regardless of which backing storage a cluster uses.