Define Hardware Classes
API: modelplane.ai/v1alpha1 · InferenceClass
An InferenceClass is a tested recipe for a GPU node pool. It bundles:
- Devices: the node’s hardware as a list of Dynamic Resource Allocation (DRA)
  style devices, each with a driver, count, typed attributes, and capacity. The
  scheduler matches a member’s nodeSelector against these devices, and GPUs
  bind to pods through DRA.
- Provisioning (optional): how to create a node pool of this class on a
  specific cloud. Classes without provisioning are for existing clusters where
  the pool already exists.
Each combination of cloud and GPU type gets its own class. For example, a GKE
L4 pool uses the class gke-l4-1x-g2, and a bare-metal H100 pool uses
h100-8x-byo (no provisioning).
Describing devices
A class’s devices follow Kubernetes
Dynamic Resource Allocation
(DRA), the mechanism modern Kubernetes uses to match GPUs to pods. Each device
has a driver (the vendor that owns it, such as gpu.nvidia.com), a count
(how many a node has), typed attributes (such as architecture), and
capacity (quantities, such as memory). This mirrors the shape the GPU’s DRA
driver publishes on a real node, so what you declare here is what an ML team’s
nodeSelector matches against and what DRA binds at runtime.
You author the attribute and capacity keys; there’s no fixed list. Pick the ones an ML team would reasonably select on, such as GPU memory, architecture, and compute capability, and use the same names the driver reports.
DRA and synthetic devices
Each device sets a claim discriminator:
- DRA (the default) is hardware a real DRA driver exposes, today GPUs. Modelplane both schedules against it and binds it to pods.
- Synthetic is described for scheduling only, never claimed. Use it for hardware that matters for placement but has no DRA driver yet, like an InfiniBand fabric.
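The split between the two discriminators can be sketched in a few lines of Python. This is an illustration only, not Modelplane’s implementation: the Device dataclass and both helper functions are hypothetical names, and the device values are borrowed from the h100-8x-byo example on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for one entry in a class's devices list."""
    name: str
    driver: str
    count: int
    claim: str = "DRA"  # DRA is the default discriminator


def scheduling_devices(devices):
    """Every device takes part in scheduling, whatever its claim value."""
    return list(devices)


def claimed_devices(devices):
    """Only claim: DRA devices are bound to pods; Synthetic ones are skipped."""
    return [d for d in devices if d.claim == "DRA"]


devices = [
    Device(name="gpu", driver="gpu.nvidia.com", count=8),
    Device(name="nic", driver="nic.nvidia.com", count=8, claim="Synthetic"),
]

print([d.name for d in scheduling_devices(devices)])  # ['gpu', 'nic']
print([d.name for d in claimed_devices(devices)])     # ['gpu']
```

Under these assumptions the InfiniBand NIC still influences placement, but it never shows up among the devices that get claimed.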
The device contract
The driver, attribute keys, and capacity keys a class declares are a contract
with the ML team: a ModelDeployment’s nodeSelector matches a pool only if the
class publishes the attributes and capacity it asks for. ML teams write those
matches as CEL selectors over the keys you publish here. For
GPUs, these keys should mirror what the DRA driver reports, so the same selector
that places a deployment on the pool also binds the right device.
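One way to picture the contract is as a key check: a selector can only match a device that publishes every key it refers to. The sketch below is plain Python rather than CEL, and the publishes_keys helper and the dictionary layout are hypothetical; the attribute and capacity values come from the examples on this page.

```python
def publishes_keys(device: dict, attribute_keys: set, capacity_keys: set) -> bool:
    """Return True only if the device publishes every key a selector refers to."""
    return (
        attribute_keys.issubset(device.get("attributes", {}))
        and capacity_keys.issubset(device.get("capacity", {}))
    )


# Hypothetical in-memory view of the gpu device from two classes.
h100_gpu = {
    "driver": "gpu.nvidia.com",
    "attributes": {"architecture": "Hopper", "cudaComputeCapability": "9.0.0"},
    "capacity": {"memory": "81559Mi"},
}
l4_gpu = {
    "driver": "gpu.nvidia.com",
    "attributes": {"architecture": "Ada Lovelace"},
    "capacity": {"memory": "23034Mi"},
}

# A selector that refers to cudaComputeCapability and memory.
print(publishes_keys(h100_gpu, {"cudaComputeCapability"}, {"memory"}))  # True
print(publishes_keys(l4_gpu, {"cudaComputeCapability"}, {"memory"}))    # False
```

In this sketch the L4 class cannot match a selector on cudaComputeCapability, because it never publishes that attribute.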
Publish a device’s real usable capacity, not its nominal spec. An 80GB H100
reports about 81559Mi of usable memory, so a class that declares 80Gi would
let a nodeSelector asking for >= 80Gi match the pool but then fail to bind the
GPU.
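The arithmetic behind this is easy to check. The following Python sketch converts binary-suffixed quantities to bytes and compares them. The helper names are hypothetical, and it handles only whole numbers with Ki, Mi, Gi, or Ti suffixes, not the full Kubernetes quantity syntax.

```python
import re

# Binary (power-of-two) suffixes, in bytes.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '81559Mi' or '80Gi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _BINARY_SUFFIXES[suffix]


def satisfies_minimum(published: str, requested_minimum: str) -> bool:
    """True if the published capacity is at least the requested minimum."""
    return quantity_to_bytes(published) >= quantity_to_bytes(requested_minimum)


# 80Gi is 81920Mi, which is more than the 81559Mi the H100 actually offers.
print(quantity_to_bytes("80Gi") // 2**20)       # 81920
print(satisfies_minimum("80Gi", "80Gi"))        # True: the nominal figure matches
print(satisfies_minimum("81559Mi", "80Gi"))     # False: the real device falls short
```

The same check explains the L4 examples: 23Gi is 23552Mi, which exceeds the 23034Mi of usable memory.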
Examples
# An InferenceClass describing GKE g2-standard-8 with one NVIDIA L4 GPU.
#
# The provisioning block tells Modelplane how to create a node pool of
# this class on GKE. The devices block describes what hardware a node of
# this class has, DRA-style - used by the scheduler to match models to
# clusters, and to form DRA ResourceClaims for claim: DRA devices.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: gke-l4-1x-g2
spec:
  description: "GKE g2-standard-8, 1x NVIDIA L4"
  provisioning:
    provider: GKE
    gke:
      machineType: g2-standard-8
      diskSizeGb: 100
      accelerator:
        type: nvidia-l4
        count: 1
  devices:
    - name: gpu
      claim: DRA
      driver: gpu.nvidia.com
      deviceClassName: gpu.nvidia.com
      count: 1
      attributes:
        architecture: { string: Ada Lovelace }
      capacity:
        memory: { value: "23034Mi" }

# An InferenceClass describing EKS g6.xlarge with one NVIDIA L4 GPU.
#
# The provisioning block tells Modelplane how to create a node group of
# this class on EKS. The devices block describes what hardware a node of
# this class has, DRA-style - used by the scheduler to match models to
# clusters, and to form DRA ResourceClaims for claim: DRA devices.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: eks-l4-1x-g6
spec:
  description: "EKS g6.xlarge, 1x NVIDIA L4"
  provisioning:
    provider: EKS
    eks:
      instanceType: g6.xlarge
      diskSizeGb: 100
      accelerator:
        type: nvidia-l4
        count: 1
  devices:
    - name: gpu
      claim: DRA
      driver: gpu.nvidia.com
      deviceClassName: gpu.nvidia.com
      count: 1
      attributes:
        architecture: { string: Ada Lovelace }
      capacity:
        # The L4's real usable VRAM, what the NVIDIA DRA driver reports in the
        # node's ResourceSlice, not its nominal 24GB. The scheduler matches a
        # nodeSelector against this, so declaring the marketing number would let
        # it place a model that wants 23Gi onto a node DRA then can't satisfy.
        memory: { value: "23034Mi" }

# An InferenceClass describing a BYO 8x H100 node pool.
#
# No provisioning block: this class describes hardware that already
# exists on a bring-your-own cluster. Modelplane copies the devices block
# onto the cluster's status.gpuPools for the scheduler to match against.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: h100-8x-byo
spec:
  description: "BYO 8x NVIDIA H100 80GB"
  # DRA-style devices. These are the contract a ModelDeployment's
  # nodeSelector matches against. Keys are bare names; the domain comes
  # from each device's driver. claim: DRA devices are emitted as requests
  # in a DRA ResourceClaim; claim: Synthetic devices (here the InfiniBand
  # fabric, which has no DRA driver) are matched for scheduling only.
  devices:
    - name: gpu
      claim: DRA
      driver: gpu.nvidia.com
      deviceClassName: gpu.nvidia.com
      count: 8
      attributes:
        architecture: { string: Hopper }
        cudaComputeCapability: { version: "9.0.0" }
      capacity:
        # The H100 80GB's real usable VRAM, what the NVIDIA DRA driver reports,
        # not its nominal 80GB. A nodeSelector asking for >= 80Gi would never bind.
        memory: { value: "81559Mi" }
    - name: nic
      claim: Synthetic
      driver: nic.nvidia.com
      count: 8
      attributes:
        linkType: { string: infiniband }