InferenceClass Custom Resource
Hardware recipe defining GPU type, count, and provisioning for a node pool.
Concept guide: Define Hardware Classes
#Metadata
#Example
Manifest
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: gke-l4-1x-g2
spec:
  description: "GKE g2-standard-8, 1x NVIDIA L4"
  provisioning:
    provider: GKE
    gke:
      machineType: g2-standard-8
      diskSizeGb: 100
      accelerator:
        type: nvidia-l4
        count: 1
  devices:
    - name: gpu
      claim: DRA
      driver: gpu.nvidia.com
      deviceClassName: gpu.nvidia.com
      count: 1
      attributes:
        architecture: { string: Ada Lovelace }
      capacity:
        memory: { value: "23034Mi" }
#Spec
spec.description: Human-readable description of the class.
spec.devices[].attributes: DRA-style typed attributes for this device. Keys are bare names (e.g. architecture); the domain comes from the device’s driver. Each value sets exactly one typed field.
spec.devices[].capacity: DRA-style capacity quantities for this device. Keys are bare names (e.g. memory); values are Kubernetes Quantities.
spec.devices[].claim: How Modelplane treats this device. With DRA, the device is emitted as a request in a ResourceClaim, so DRA binds a matching device to the pod at admission time; use it for hardware that a real DRA driver exposes. With Synthetic, the device is described for fleet scheduling only and is never claimed; use it for hardware that matters for placement but has no DRA driver yet, like an InfiniBand fabric.
spec.devices[].count: How many of this device a node has.
spec.devices[].deviceClassName: Name of the cluster-scoped DRA DeviceClass to claim this device through. Required for claim: DRA devices; the DRA driver install creates the DeviceClass (e.g. gpu.nvidia.com). Ignored for Synthetic devices.
spec.devices[].driver: DRA driver that owns this device (e.g. gpu.nvidia.com). Becomes the attribute/capacity domain a nodeSelector reads as device.attributes["
spec.devices[].name: Name of this device within the class (e.g. gpu, nic).
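As a sketch of how the device fields above combine, the fragment below lists a claimed GPU next to a Synthetic device. The gpu entry is copied from the example manifest. The nic entry is illustrative only: its driver domain (fabric.example.com) and its linkType attribute key are hypothetical names, not values a real driver publishes. It also assumes that deviceClassName may be left out of a Synthetic device, since the field is described as ignored there; check the schema before relying on that.

```yaml
# Fragment of spec.devices: one claimed device and one synthetic device.
devices:
  - name: gpu
    claim: DRA                          # emitted as a request in a ResourceClaim
    driver: gpu.nvidia.com
    deviceClassName: gpu.nvidia.com     # required for claim: DRA
    count: 1
    attributes:
      architecture: { string: Ada Lovelace }
    capacity:
      memory: { value: "23034Mi" }
  - name: nic
    claim: Synthetic                    # used for fleet scheduling only, never claimed
    driver: fabric.example.com          # hypothetical domain for the attribute below
    count: 1
    attributes:
      linkType: { string: InfiniBand }  # hypothetical attribute key and value
```

Because the Synthetic entry is never turned into a claim, it only affects placement decisions; the GPU entry is the one that results in a device being bound to the pod.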
spec.provisioning: How to provision a node pool of this class. Omit it for classes that describe bring-your-own (BYO) node pools that already exist.
GPU accelerator to attach when provisioning the node group. Provisioning input only: the scheduler matches against spec.devices, not this block.
EC2 instance type (e.g. g6.xlarge, p4d.24xlarge). The instance family determines the GPU model; the accelerator block below is informational.
spec.provisioning.gke.accelerator: GPU accelerator to attach when provisioning the node pool. Provisioning input only: the scheduler matches against spec.devices, not this block. The count here is the GCP machine’s GPU count and need not be restated in devices.
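For a node pool that already exists, the provisioning description above says to omit the block. A minimal sketch of such a class follows, assuming the existing nodes carry the same single L4 as the example manifest. The name byo-l4-1x and the description string are hypothetical; the devices entry is copied from the example.

```yaml
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: byo-l4-1x                              # hypothetical class name
spec:
  description: "BYO node pool, 1x NVIDIA L4"   # hypothetical description
  # No provisioning block: the node pool already exists and is not created here.
  devices:
    - name: gpu
      claim: DRA
      driver: gpu.nvidia.com
      deviceClassName: gpu.nvidia.com
      count: 1
      attributes:
        architecture: { string: Ada Lovelace }
      capacity:
        memory: { value: "23034Mi" }
```

Since the scheduler matches against spec.devices rather than the provisioning block, a class written this way carries the same device information for scheduling as the provisioned GKE example.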