Build the Inference Stack

Set Up the Gateway

API: modelplane.ai/v1alpha1 · InferenceGateway

The InferenceGateway is the control plane’s front door: a single, OpenAI-compatible address through which every ModelService is exposed. It routes each request to the inference cluster that serves it.

The InferenceGateway is a singleton: create exactly one, named default, on your Modelplane control plane. It fronts every inference cluster in the fleet, so you don’t create one per cluster.

The backend field selects the gateway implementation that runs behind the InferenceGateway. Traefik is the only supported value today.
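
As a minimal sketch, the singleton described above might look like the manifest below. The API group, kind, name, backend field, and Traefik value come from this page; placing backend under spec is an assumption, so check the API reference for the exact schema.

```yaml
# Sketch only: field placement under spec is an assumption.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceGateway
metadata:
  name: default      # the singleton must be named "default"
spec:
  backend: Traefik   # the only value available today
```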

Define Hardware Classes

API: modelplane.ai/v1alpha1 · InferenceClass

An InferenceClass is a tested recipe for a GPU node pool. It bundles:

Devices: the node’s hardware as a list of Dynamic Resource Allocation (DRA) style devices, each with a driver, count, typed attributes, and capacity. The scheduler matches a member’s nodeSelector against these devices, and GPUs bind to pods through DRA.
Provisioning (optional): how to create a node pool of this class on a specific cloud. Classes without provisioning are for existing clusters where the pool already exists.

Each combination of cloud and GPU type gets its own class. For example, a GKE L4 pool uses gke-l4-1x-g2, and a bare-metal H100 pool uses h100-8x-byo, which has no provisioning.
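
To make the shape of a class concrete, here is a hedged sketch of the h100-8x-byo class. Only the kind, the class name, and the per-device concepts (driver, count, attributes, capacity) come from this page. The exact field names, the driver string, the attribute keys, and the capacity value are illustrative assumptions modeled loosely on DRA conventions.

```yaml
# Sketch only: field names and values below are assumptions, not the confirmed schema.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: h100-8x-byo
spec:
  devices:
    - driver: gpu.nvidia.com        # assumed DRA driver name
      count: 8                      # eight GPUs per node, per the class name
      attributes:                   # hypothetical typed attributes
        productName:
          string: H100
      capacity:                     # hypothetical capacity entry
        memory: 80Gi
  # No provisioning block: this class targets an existing cluster
  # where the node pool already exists.
```

A class for a provisioned pool, such as gke-l4-1x-g2, would additionally carry a provisioning section. Its fields are not described on this page, so they are left out here.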

Register a Cluster

API: modelplane.ai/v1alpha1 · InferenceCluster

An InferenceCluster represents a Kubernetes cluster configured for model serving. Platform teams create these to provide GPU capacity.

Each cluster has:

A cluster source: GKE or EKS (Modelplane provisions the full cluster) or Existing (bring a cluster you manage yourself). See Supported Providers for the clouds and neoclouds Modelplane runs on.
One or more node pools, each referencing an InferenceClass for its hardware capabilities and provisioning recipe.
Labels for organizational metadata: tier, region, provider. These are the matching surface for ModelDeployment.clusterSelector.

Modelplane installs the serving stack it needs on every cluster it manages, including existing clusters, which it assumes are solely for its use.
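
The following sketch shows how the pieces above could fit together for a cluster you already run. The kind, the source: Existing setting, the label keys (tier, region, provider), and the referenced class name come from this documentation. The cluster name, the label values, the nodePools and className field names, and the use of metadata.labels are hypothetical. Connection details for the existing cluster are omitted because this page does not describe them.

```yaml
# Sketch only: names marked "hypothetical" are illustrative assumptions.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: onprem-h100               # hypothetical cluster name
  labels:                         # assumed location; matched by ModelDeployment.clusterSelector
    tier: production              # hypothetical value
    region: us-east               # hypothetical value
    provider: bare-metal          # hypothetical value
spec:
  source: Existing                # bring a cluster you manage yourself
  nodePools:                      # hypothetical field name
    - name: h100                  # hypothetical pool name
      className: h100-8x-byo      # hypothetical field; references an InferenceClass
```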

Supported Providers

Modelplane is built on Crossplane and shares its infrastructure providers, so the set of clouds and neoclouds it reaches grows alongside Crossplane itself. This page shows where Modelplane runs today and where it’s headed.

A provider can show up here in three ways:

Provisioning supported. Modelplane creates and manages the whole cluster from an InferenceCluster, selected through provisioning.provider. GKE and EKS work this way today.
Bring your own supported. Register a cluster you already run with source: Existing. This works on any provider whose Kubernetes meets Modelplane’s requirements (Dynamic Resource Allocation and a recent Kubernetes version), so you can run on the providers below now, ahead of native provisioning.
Crossplane provider exists. A Crossplane provider is published for the cloud. That provider is the path by which native provisioning lands, so it marks where Modelplane can grow next.

Clouds and neoclouds

Listed alphabetically, spanning hyperscalers and GPU-specialist neoclouds. Each runs a managed Kubernetes service with GPU node pools, so the bring-your-own path covers them all today. Where a Crossplane provider exists, it’s the path to native provisioning.