Version

Get started

On this page

Modelplane is an open source control plane for AI inference. It separates two concerns: a platform team managing GPU capacity, and ML teams deploying models against it. Without it, every change on one side creates work for the other. When the platform team updates infrastructure, ML teams have to react. When model requirements change, the platform team gets a request.

With Modelplane, the platform team publishes hardware without knowing what models will run on it. The ML team declares what a model needs without knowing what clusters exist. The control plane resolves it and keeps it current as both sides change.

In this tour, you’ll switch between provisioning infrastructure and declaring a model to see how they interact. By the end you’ll have a GPU fleet across three regions and one OpenAI-compatible endpoint routing to a model served across two of them.

This is not a production setup and takes around 45 minutes to run.

What you’ll build

The platform team provisions a starter cluster and grows it to two A100 regions; the ML team serves a model on the L4, then scales it onto an A100, all behind one endpoint.

Before you begin

You’ll need kind, kubectl, and Helm installed, plus an AWS or GCP account with permission to create clusters. Each step covers what it needs as you reach it.

The tour

Installation: stand up the Modelplane control plane.
Build the platform: provision your first GPU cluster.
Deploying a model: serve a model and send it a request.
Scale the platform: grow to a multi-region fleet.
Scale the model: serve the model from two regions behind one endpoint.

First, follow the Installation guide.