
InferenceClass Custom Resource


Hardware recipe defining GPU type, count, and provisioning for a node pool.

Concept guide: Define Hardware Classes →

#Metadata

API version: modelplane.ai/v1alpha1
Kind: InferenceClass
Scope: Cluster
Short names: icl

#Example

Manifest

apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: gke-l4-1x-g2
spec:
  description: "GKE g2-standard-8, 1x NVIDIA L4"
  provisioning:
    provider: GKE
    gke:
      machineType: g2-standard-8
      diskSizeGb: 100
      accelerator:
        type: nvidia-l4
        count: 1
  devices:
    - name: gpu
      claim: DRA
      driver: gpu.nvidia.com
      deviceClassName: gpu.nvidia.com
      count: 1
      attributes:
        architecture: { string: Ada Lovelace }
      capacity:
        memory: { value: "23034Mi" }

#Spec

spec.description: Human-readable description of the class.

spec.devices[].attributes: DRA-style typed attributes for this device. Keys are bare names (e.g. architecture); the domain comes from the device’s driver. Each value sets exactly one typed field.

spec.devices[].capacity: DRA-style capacity quantities for this device. Keys are bare names (e.g. memory); values are Kubernetes Quantities.

spec.devices[].claim: How Modelplane treats this device. With DRA, Modelplane emits the device as a request in a ResourceClaim, so DRA binds a matching device to the pod at admission time; use it for hardware that a real DRA driver exposes. With Synthetic, Modelplane describes the device for fleet scheduling only and never claims it; use it for hardware that matters for placement but has no DRA driver yet, such as an InfiniBand fabric.
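As an illustration of the two claim modes, a devices list might pair a DRA-claimed GPU with a Synthetic entry for an InfiniBand fabric. This is a sketch rather than a manifest taken from the Modelplane docs: the driver value `fabric.example.com` and the `fabric` attribute key are hypothetical placeholders, and it is an assumption that a Synthetic device carries a driver at all.

```yaml
devices:
  # Claimed through DRA: emitted as a request in a ResourceClaim.
  - name: gpu
    claim: DRA
    driver: gpu.nvidia.com
    deviceClassName: gpu.nvidia.com
    count: 1
  # Described for fleet scheduling only; never claimed.
  - name: nic
    claim: Synthetic
    driver: fabric.example.com        # hypothetical domain, no real DRA driver implied
    count: 1
    attributes:
      fabric: { string: InfiniBand }  # hypothetical attribute key
```

No deviceClassName is set on the Synthetic entry, since that field only applies to devices claimed through DRA.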

spec.devices[].count: How many of this device a node has.

spec.devices[].deviceClassName: Name of the cluster-scoped DRA DeviceClass to claim this device through. Required when claim is DRA; installing the DRA driver creates the DeviceClass (e.g. gpu.nvidia.com). Ignored for Synthetic devices.

spec.devices[].driver: DRA driver that owns this device (e.g. gpu.nvidia.com). Becomes the attribute/capacity domain a nodeSelector reads as device.attributes["<driver>"].<name>.
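For orientation, this is roughly what a hand-written DRA request against the example device could look like, using the upstream Kubernetes `resource.k8s.io/v1beta1` API. It is an assumption that Modelplane generates something of this shape. The claim name is hypothetical, and newer Kubernetes releases also serve a `resource.k8s.io/v1` API with a slightly different request layout. The point is only to show the driver name acting as the domain in `device.attributes[...]` and `device.capacity[...]`.

```yaml
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: l4-gpu-claim   # hypothetical name
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
        allocationMode: ExactCount
        count: 1
        selectors:
          # The driver name (gpu.nvidia.com) is the domain for both lookups.
          - cel:
              expression: >-
                device.attributes["gpu.nvidia.com"].architecture == "Ada Lovelace" &&
                device.capacity["gpu.nvidia.com"].memory.compareTo(quantity("23034Mi")) >= 0
```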

spec.devices[].name: Name of this device within the class (e.g. gpu, nic).

spec.provisioning: How to provision a node pool of this class. Omit for classes that describe bring-your-own (BYO) node pools that already exist.
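A class for a node pool that already exists would then carry only the device description. The sketch below reuses the device block from the example manifest and simply drops provisioning; the name `byo-l4-1x` and the description text are hypothetical placeholders.

```yaml
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: byo-l4-1x   # hypothetical name
spec:
  description: "Existing node pool, 1x NVIDIA L4"
  # No provisioning block: the node pool already exists.
  devices:
    - name: gpu
      claim: DRA
      driver: gpu.nvidia.com
      deviceClassName: gpu.nvidia.com
      count: 1
      attributes:
        architecture: { string: Ada Lovelace }
      capacity:
        memory: { value: "23034Mi" }
```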

GPU accelerator to attach when provisioning the node group. Provisioning input only: the scheduler matches against spec.devices, not this block.

GPU accelerator type (e.g. nvidia-a10g, nvidia-h100). Informational - reported on the consuming InferenceCluster’s status.

EC2 instance type (e.g. g6.xlarge, p4d.24xlarge). The instance family determines the GPU model; the accelerator block below is informational.

spec.provisioning.gke.accelerator: GPU accelerator to attach when provisioning the node pool. Provisioning input only: the scheduler matches against spec.devices, not this block, so count here is the GCP machine’s GPU count and need not be restated in devices.

spec.provisioning.gke.accelerator.type: GPU accelerator type passed to GCP (e.g. nvidia-l4, nvidia-h100-80gb).
