Build the platform

This is the platform team’s side of Modelplane. You set up the gateway that fronts your models, give the control plane cloud credentials, and register your first GPU cluster: a hardware profile published as an InferenceClass and an InferenceCluster that offers it.

In the next step, the ML team will create a model deployment that schedules against this capacity without knowing which cluster it runs on.

Prerequisites

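You need the prerequisites for one cloud, matching the cluster you plan to register.

For EKS on AWS:
An AWS account with permissions to create EKS clusters, VPCs, and IAM roles
An AWS access key ID and secret access key

For GKE on GCP:
A GCP account with permissions to create GKE clusters, VPCs, and IAM roles
A GCP service account JSON key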

Set up the InferenceGateway

The InferenceGateway installs Traefik Proxy on the control plane. When you set loadBalancer to MetalLB, it also installs MetalLB. Traefik routes inference traffic to model replicas. kind has no cloud load balancer, so on kind MetalLB gives Traefik’s LoadBalancer Service an external IP. Create exactly one InferenceGateway per control plane and name it default.

If you run the control plane on a cloud cluster with native LoadBalancer support, omit the loadBalancer and metallb fields.

# The InferenceGateway creates a unified, OpenAI-compatible endpoint on the
# control plane cluster. It installs Traefik Proxy and creates a Gateway that
# routes traffic to model replicas on remote inference clusters.
#
# Create one InferenceGateway per control plane. It must be named "default".
#
# For kind or bare-metal clusters, set loadBalancer to MetalLB and configure an
# address pool. For cloud clusters with native LoadBalancer support, omit the
# loadBalancer field entirely.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceGateway
metadata:
  name: default
spec:
  backend: Traefik
  traefik:
    version: "40.2.0"

    # Remove the loadBalancer section if your cluster supports LoadBalancer
    # services natively (e.g. GKE, EKS).
    loadBalancer: MetalLB
    metallb:
      addressPool: "172.18.255.200-172.18.255.250"
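
For the external IP to be reachable from your machine, the address pool has to sit inside the subnet of the Docker network that kind uses, which is named kind. The range above assumes the common 172.18.0.0/16 subnet. The sketch below uses hypothetical helper names. It reads the subnet from Docker and derives a pool the same way. It assumes an IPv4 /16 subnet and that Docker has not assigned addresses near the top of that subnet.

```shell
# Hypothetical helper: print the first IPv4 subnet of the Docker network
# named "kind" (the network may also carry an IPv6 subnet, which has no dots).
kind_ipv4_subnet() {
  docker network inspect -f '{{range .IPAM.Config}}{{.Subnet}} {{end}}' kind \
    | tr ' ' '\n' | grep -F . | head -n 1
}

# Hypothetical helper: turn an IPv4 /16 subnet such as 172.18.0.0/16 into a
# MetalLB range near its top, e.g. 172.18.255.200-172.18.255.250.
# Assumption: Docker hands out node IPs from the bottom of the subnet,
# so this range is free.
metallb_pool_from_subnet() {
  case "$1" in
    *.*.*.*/16) ;;
    *) echo "expected an IPv4 /16 subnet, got: '$1'" >&2; return 1 ;;
  esac
  prefix=$(printf '%s\n' "$1" | cut -d. -f1-2)
  printf '%s.255.200-%s.255.250\n' "$prefix" "$prefix"
}
```

Running `metallb_pool_from_subnet "$(kind_ipv4_subnet)"` prints a range you can paste into addressPool.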

Wait until the gateway is ready:

kubectl wait --for=condition=Ready ig/default --timeout=5m

Configure cloud credentials

Give the control plane credentials so it can provision clusters in your cloud account.

For EKS, create an AWS credentials file that contains your access key ID and secret access key:

[default]
aws_access_key_id = 
aws_secret_access_key =
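
If your keys are already in the standard AWS CLI environment variables, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, a small helper can write this file so you don't have to paste secrets into an editor. This is a sketch. The function name and the output path you pass to it are hypothetical.

```shell
# Hypothetical helper: write an AWS shared-credentials file with a [default]
# profile to the path given as $1. A newly created file is readable only by you.
write_aws_credentials() {
  if [ -z "$1" ] || [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "usage: set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, then write_aws_credentials PATH" >&2
    return 1
  fi
  # Subshell, so the restrictive umask does not leak into your shell.
  (
    umask 077
    printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
      "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" > "$1"
  )
}
```

For example, run `write_aws_credentials aws-credentials.txt` and then pass that path to --from-file when you create the secret.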

Create a Kubernetes secret from that file in the crossplane-system namespace:

kubectl create secret generic aws-creds \
  --from-file=credentials= \
  -n crossplane-system
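
Before you point a ClusterProviderConfig at the secret, you can confirm that the secret stores your file under the key credentials, since that is the key the ClusterProviderConfig references. Here is a minimal sketch with a hypothetical helper name:

```shell
# Hypothetical helper: decode the "credentials" entry of a Secret in
# crossplane-system and print its first line. For the AWS file above,
# this should be "[default]".
secret_first_line() {
  kubectl get secret "$1" -n crossplane-system \
    -o jsonpath='{.data.credentials}' | base64 --decode | head -n 1
}
```

If `secret_first_line aws-creds` prints nothing, the key name in --from-file probably isn't credentials.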

Apply the ClusterProviderConfig referencing your secret:

# Points the AWS provider at the credentials Secret you created. Named default,
# so InferenceClusters with an EKS source use it without further configuration.
apiVersion: aws.m.upbound.io/v1beta1
kind: ClusterProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: aws-creds
      key: credentials

For GKE, create a Kubernetes secret from your service account JSON key in the crossplane-system namespace:

kubectl create secret generic gcp-creds \
  --from-file=credentials=.json \
  -n crossplane-system
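
A service account key is a JSON file whose type field is service_account. It also records the project the key was created in, as project_id. The sketch below uses hypothetical helper names. It checks the type and prints the project_id with plain grep and sed rather than a JSON parser, so treat it as a rough check, not full validation. The key's project is often, but not necessarily, the one you want for projectID.

```shell
# Hypothetical helper: succeed if the file's "type" field is "service_account".
is_sa_key() {
  grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$1"
}

# Hypothetical helper: print the key's "project_id" field.
sa_key_project() {
  sed -n 's/.*"project_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1"
}
```

For example: `is_sa_key key.json && sa_key_project key.json`, where key.json stands in for your key file.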

Apply the ClusterProviderConfig, setting projectID to your GCP project:

# Points the GCP provider at the credentials Secret you created. Named default,
# so InferenceClusters with a GKE source use it without further configuration.
apiVersion: gcp.m.upbound.io/v1beta1
kind: ClusterProviderConfig
metadata:
  name: default
spec:
  projectID: my-gcp-project  # replace with your GCP project
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: gcp-creds
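      key: credentials

Alternatively, apply the example manifest straight from the docs site. Put your GCP project ID between the last two slashes of the sed expression:

curl -fsSL https://v0-1.docs.modelplane.ai/examples/getting-started/clusterproviderconfig-gke.yaml \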
  | sed 's/my-gcp-project//' \
  | kubectl apply -f -
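
The sed expression in that pipeline swaps the my-gcp-project placeholder for your project ID. If you'd rather keep the ID in a variable, a hypothetical helper like the one below performs the same substitution on any example manifest that uses the placeholder. It also refuses to run with an empty ID. GCP project IDs contain only lowercase letters, digits, and hyphens, so they are safe to splice into a sed expression.

```shell
# Hypothetical helper: read a manifest on stdin and replace every
# my-gcp-project placeholder with the project ID given as $1.
set_gcp_project() {
  if [ -z "$1" ]; then
    echo "usage: set_gcp_project PROJECT_ID < manifest.yaml" >&2
    return 1
  fi
  sed "s/my-gcp-project/$1/g"
}
```

For example, with a hypothetical GCP_PROJECT variable: `curl -fsSL https://v0-1.docs.modelplane.ai/examples/getting-started/clusterproviderconfig-gke.yaml | set_gcp_project "$GCP_PROJECT" | kubectl apply -f -`.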

Publish hardware and register the cluster

The InferenceClass describes a hardware profile and how to provision it. The InferenceCluster registers a cluster that offers that profile. For EKS, apply both:

apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: l4-1x-g6
spec:
  description: "EKS g6.xlarge, 1x NVIDIA L4"
  provisioning:
    provider: EKS
    eks:
      instanceType: g6.xlarge
      diskSizeGb: 50
      accelerator:
        type: nvidia-l4
        count: 1
  devices:
  - name: gpu
    claim: DRA
    driver: gpu.nvidia.com
    deviceClassName: gpu.nvidia.com
    count: 1
    attributes:
      architecture: { string: Ada Lovelace }
    capacity:
      memory: { value: "23034Mi" }   # L4's real reported VRAM (not the nominal 24GB)
---
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: eks-us-east
  labels:
    modelplane.ai/region: us-east
spec:
  cluster:
    source: EKS
    eks:
      region: us-east-1
  nodePools:
  - name: gpu-l4
    className: l4-1x-g6
    nodeCount: 1
    minNodeCount: 1
    maxNodeCount: 1
    zones:
    - us-east-1b
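
The capacity comment notes that 23034Mi is the memory an L4 actually reports, not its nominal 24 GB. If you read the nominal figure as 24 GiB, the gap is about 1.5 GiB, roughly 6% of the card. Keep that in mind when sizing a model against this class:

```latex
\begin{aligned}
24~\text{GiB} &= 24 \times 1024~\text{MiB} = 24576~\text{MiB} \\
24576~\text{MiB} - 23034~\text{MiB} &= 1542~\text{MiB} \approx 1.51~\text{GiB} \\
23034~\text{MiB} / 24576~\text{MiB} &\approx 0.937
\end{aligned}
```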

Modelplane provisions the cluster, which takes about 15 minutes. Wait for it to become Ready:

kubectl wait --for=condition=Ready ic/eks-us-east --timeout=20m

For GKE, apply the manifest, setting the cluster’s project to your GCP project:

apiVersion: modelplane.ai/v1alpha1
kind: InferenceClass
metadata:
  name: gke-l4-1x-g2
spec:
  description: "GKE g2-standard-8, 1x NVIDIA L4"
  provisioning:
    provider: GKE
    gke:
      machineType: g2-standard-8
      diskSizeGb: 100
      accelerator:
        type: nvidia-l4
        count: 1
  devices:
  - name: gpu
    claim: DRA
    driver: gpu.nvidia.com
    deviceClassName: gpu.nvidia.com
    count: 1
    attributes:
      architecture: { string: Ada Lovelace }
    capacity:
      memory: { value: "23034Mi" }   # L4's real reported VRAM (not the nominal 24GB)
---
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: starter
  labels:
    modelplane.ai/region: us-central
spec:
  cluster:
    source: GKE
    gke:
      project: my-gcp-project
      region: us-central1
  nodePools:
  - name: gpu-l4
    className: gke-l4-1x-g2
    nodeCount: 1
    minNodeCount: 1   # keep >=1; the autoscaler can't scale a GPU pool up from 0 for DRA pods
    maxNodeCount: 2
    zones:
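    - us-central1-a

Or apply the example manifest directly. Put your GCP project ID between the last two slashes of the sed expression:

curl -fsSL https://v0-1.docs.modelplane.ai/examples/getting-started/gke/platform.yaml \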
  | sed 's/my-gcp-project//' \
  | kubectl apply -f -

Modelplane provisions the cluster, which takes about 15 minutes. Wait for it to become Ready:

kubectl wait --for=condition=Ready ic/starter --timeout=20m
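
kubectl wait blocks until the Ready condition turns True and prints nothing in the meantime. This sketch assumes InferenceCluster reports the standard Kubernetes condition fields (type, status, message). kubectl wait --for=condition=Ready already relies on the type and status fields. Under that assumption, two hypothetical helpers can show where provisioning stands from a second terminal:

```shell
# Hypothetical helper: print the status of an InferenceCluster's Ready
# condition ("True", "False", or "Unknown"; empty if not reported yet).
ic_ready_status() {
  kubectl get ic "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Hypothetical helper: print the Ready condition's message, if the
# controller sets one. It may say what is still being reconciled.
ic_ready_message() {
  kubectl get ic "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'
}
```

For example, run `ic_ready_message starter` for the GKE cluster or `ic_ready_message eks-us-east` for the EKS one.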

Note

Modelplane is reconciling the infrastructure against the source of truth, the manifest you just applied.

While you wait, Modelplane creates the EKS or GKE cluster and its GPU node pool. It then installs the inference stack: LeaderWorkerSet for multi-node serving, llm-d for inference-aware routing, Envoy Gateway for traffic management, and a storage class for model weights. This is the same reconciliation loop Crossplane uses to manage other infrastructure, extended to the inference layer.

Once the cluster is Ready, the ML team can deploy a model on it.

Note

A cloud GPU cluster costs money while it runs. To stop the tour and resume later, follow Clean up.

Next step

Now that the platform is provisioned, the ML team can deploy a model by describing what the model needs, not the infrastructure.