Define Hardware Classes

API: modelplane.ai/v1alpha1 · InferenceClass

An InferenceClass is a tested recipe for a GPU node pool. It bundles:

Devices: the node’s hardware as a list of Dynamic Resource Allocation (DRA) style devices, each with a driver, count, typed attributes, and capacity. The scheduler matches a member’s nodeSelector against these devices, and GPUs bind to pods through DRA.
Provisioning (optional): how to create a node pool of this class on a specific cloud. Classes without provisioning are for existing clusters where the pool already exists.

Each combination of cloud and GPU type gets its own class. For example, a GKE L4 pool is gke-l4-1x-g2, and a bare-metal H100 pool is h100-8x-byo (no provisioning).

Describing devices

A class’s devices follow Kubernetes Dynamic Resource Allocation (DRA), the mechanism modern Kubernetes uses to match GPUs to pods. Each device has a driver (the vendor that owns it, such as gpu.nvidia.com), a count (how many a node has), typed attributes (such as architecture), and capacity (quantities, such as memory). This mirrors the shape the GPU’s DRA driver publishes on a real node, so what you declare here is what an ML team’s nodeSelector matches against and what DRA binds at runtime.

You author the attribute and capacity keys, and there’s no fixed list. Pick the ones an ML team would reasonably select on (GPU memory, architecture, compute capability), using the same names the driver reports.
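
As a rough sketch of the shape described above, a single device entry might look like the following. The field layout is borrowed from the examples later on this page; the particular keys chosen here (architecture, cudaComputeCapability, memory) are illustrative, not a required set.

```yaml
# One entry in spec.devices (a sketch; the attribute and capacity keys are
# illustrative choices, since there is no fixed list).
devices:
- name: gpu
  claim: DRA
  driver: gpu.nvidia.com          # the vendor driver that owns the device
  deviceClassName: gpu.nvidia.com
  count: 8                        # how many of this device each node has
  attributes:                     # typed attributes an ML team can select on
    architecture: { string: Hopper }
    cudaComputeCapability: { version: "9.0.0" }
  capacity:                       # quantities, named as the driver reports them
    memory: { value: "81559Mi" }
```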

DRA and synthetic devices

Each device sets a claim discriminator:

DRA (the default) is hardware that a real DRA driver exposes, which today means GPUs. Modelplane both schedules against it and binds it to pods.
Synthetic is hardware described for scheduling only and never claimed. Use it for hardware that matters for placement but has no DRA driver yet, such as an InfiniBand fabric.
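
To make the distinction concrete, the fragment below sketches one device of each kind side by side. It reuses the GPU and InfiniBand entries from the bare-metal H100 example on this page; treat it as an illustration of the claim field rather than a complete class.

```yaml
devices:
# Claimed: exposed by a real DRA driver, so Modelplane schedules against it
# and binds it to pods.
- name: gpu
  claim: DRA
  driver: gpu.nvidia.com
  deviceClassName: gpu.nvidia.com
  count: 8
  attributes:
    architecture: { string: Hopper }
# Scheduling only: no DRA driver yet, so it is matched for placement but
# never claimed.
- name: nic
  claim: Synthetic
  driver: nic.nvidia.com
  count: 8
  attributes:
    linkType: { string: infiniband }
```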

The device contract

The driver, attribute keys, and capacity keys a class declares are a contract with the ML team: a ModelDeployment’s nodeSelector matches a pool only if the class publishes the attributes and capacity it asks for. ML teams write those matches as CEL selectors over the keys you publish here. For GPUs, these keys should mirror what the DRA driver reports, so the same selector that places a deployment on the pool also binds the right device.

Publish a device’s real usable capacity, not its nominal spec. An 80GB H100 reports about 81559Mi of usable memory. A class that declares 80Gi would let a nodeSelector asking for >= 80Gi match the pool, after which the GPU would fail to bind.
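
The unit arithmetic behind this is worth spelling out, since Gi and Mi differ by a factor of 1024. The fragment below is a sketch of the capacity block for the H100 case with the conversion written out in comments; the 81559Mi figure is the one quoted above.

```yaml
capacity:
  # Nominal: 80Gi = 80 * 1024Mi = 81920Mi, which is 361Mi more than the
  # device actually has. Declaring it would overstate the pool:
  # memory: { value: "80Gi" }
  #
  # Usable, as the DRA driver reports it:
  memory: { value: "81559Mi" }
```

With the usable figure declared, a selector asking for >= 80Gi does not match this pool at all, so the mismatch surfaces at scheduling time instead of at bind time.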

Examples

# An InferenceClass describing GKE g2-standard-8 with one NVIDIA L4 GPU.
#
# The provisioning block tells Modelplane how to create a node pool of
# this class on GKE. The devices block describes what hardware a node of
# this class has, DRA-style - used by the scheduler to match models to
# clusters, and to form DRA ResourceClaims for claim: DRA devices.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: gke-l4-1x-g2
spec:
  description: "GKE g2-standard-8, 1x NVIDIA L4"
  provisioning:
    provider: GKE
    gke:
      machineType: g2-standard-8
      diskSizeGb: 100
      accelerator:
        type: nvidia-l4
        count: 1
  devices:
  - name: gpu
    claim: DRA
    driver: gpu.nvidia.com
    deviceClassName: gpu.nvidia.com
    count: 1
    attributes:
      architecture: { string: Ada Lovelace }
    capacity:
      memory: { value: "23034Mi" }

# An InferenceClass describing EKS g6.xlarge with one NVIDIA L4 GPU.
#
# The provisioning block tells Modelplane how to create a node group of
# this class on EKS. The devices block describes what hardware a node of
# this class has, DRA-style - used by the scheduler to match models to
# clusters, and to form DRA ResourceClaims for claim: DRA devices.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: eks-l4-1x-g6
spec:
  description: "EKS g6.xlarge, 1x NVIDIA L4"
  provisioning:
    provider: EKS
    eks:
      instanceType: g6.xlarge
      diskSizeGb: 100
      accelerator:
        type: nvidia-l4
        count: 1
  devices:
  - name: gpu
    claim: DRA
    driver: gpu.nvidia.com
    deviceClassName: gpu.nvidia.com
    count: 1
    attributes:
      architecture: { string: Ada Lovelace }
    capacity:
      # The L4's real usable VRAM, as the NVIDIA DRA driver reports it in the
      # node's ResourceSlice, not its nominal 24GB. The scheduler matches a
      # nodeSelector against this, so declaring the marketing number would let
      # it place a model that wants 23Gi on a node where DRA can't satisfy the claim.
      memory: { value: "23034Mi" }

# An InferenceClass describing a BYO 8x H100 node pool.
#
# No provisioning block: this class describes hardware that already
# exists on a bring-your-own cluster. Modelplane copies the devices block
# onto the cluster's status.gpuPools for the scheduler to match against.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: h100-8x-byo
spec:
  description: "BYO 8x NVIDIA H100 80GB"
  # DRA-style devices. These are the contract a ModelDeployment's
  # nodeSelector matches against. Keys are bare names; the domain comes
  # from each device's driver. claim: DRA devices are emitted as requests
  # in a DRA ResourceClaim; claim: Synthetic devices (here the InfiniBand
  # fabric, which has no DRA driver) are matched for scheduling only.
  devices:
  - name: gpu
    claim: DRA
    driver: gpu.nvidia.com
    deviceClassName: gpu.nvidia.com
    count: 8
    attributes:
      architecture: { string: Hopper }
      cudaComputeCapability: { version: "9.0.0" }
    capacity:
      # The H100 80GB's real usable VRAM, as the NVIDIA DRA driver reports it,
      # not its nominal 80GB. Declaring 80Gi would let a nodeSelector asking
      # for >= 80Gi match the pool, but the GPU would never bind.
      memory: { value: "81559Mi" }
  - name: nic
    claim: Synthetic
    driver: nic.nvidia.com
    count: 8
    attributes:
      linkType: { string: infiniband }