# Modelplane Documentation

> Modelplane is the open-source control plane for AI model serving. It extends Crossplane to manage AI inference across a fleet of GPU clusters.

## Documentation

- [Overview](https://v0-1.docs.modelplane.ai/overview/): What Modelplane is, why it exists, and how it works.
- [Deploy a Model](https://v0-1.docs.modelplane.ai/models/model-deployment/): Deploy a model to the fleet, from a single pod to disaggregated prefill and decode.
- [Get started](https://v0-1.docs.modelplane.ai/getting-started/): A guided tour of Modelplane, from an empty control plane to a model served across regions.
- [Installation](https://v0-1.docs.modelplane.ai/getting-started/installation/): Stand up the Modelplane control plane on a local kind cluster.
- [Qwen3-8B](https://v0-1.docs.modelplane.ai/examples/qwen3-8b/): An 8.2B dense chat model on a single NVIDIA L4.
- [Set Up the Gateway](https://v0-1.docs.modelplane.ai/platform/inference-gateway/): Unified OpenAI-compatible endpoint on the control plane cluster.
- [Why Modelplane](https://v0-1.docs.modelplane.ai/overview/why/): The problem Modelplane solves and how it compares to the alternatives.
- [Build the platform](https://v0-1.docs.modelplane.ai/getting-started/build-the-platform/): Set up the gateway, give the control plane cloud credentials, and provision your first GPU cluster.
- [Define Hardware Classes](https://v0-1.docs.modelplane.ai/platform/inference-class/): Hardware recipe defining GPU type, count, and provisioning for a node pool.
- [Expose a Model](https://v0-1.docs.modelplane.ai/models/model-service/): Expose model endpoints via a unified OpenAI-compatible URL.
- [How it schedules](https://v0-1.docs.modelplane.ai/architecture/scheduling/): How Modelplane places a deployment's replicas across the fleet, and the limits of that placement.
- [How Modelplane works](https://v0-1.docs.modelplane.ai/overview/how-it-works/): The architecture, the resources, and what happens when you deploy a model.
- [Qwen3-Coder-480B](https://v0-1.docs.modelplane.ai/examples/qwen3-coder/): A 480B code MoE, served multi-node in BF16 over EFA or single-node in FP8 on SGLang.
- [Cache Model Weights](https://v0-1.docs.modelplane.ai/models/model-cache/): Stage model weights on cluster storage before serving.
- [Deploying a model](https://v0-1.docs.modelplane.ai/getting-started/deploying-a-model/): Declare what your model needs and serve it behind a unified endpoint.
- [Kimi-K2](https://v0-1.docs.modelplane.ai/examples/kimi-k2/): A 1T MoE served with disaggregated prefill and decode across two H200 nodes.
- [Register a Cluster](https://v0-1.docs.modelplane.ai/platform/inference-cluster/): A Kubernetes cluster registered with Modelplane for model serving.
- [FAQ](https://v0-1.docs.modelplane.ai/overview/faq/): Short answers to the questions practitioners ask about Modelplane first.
- [Glossary](https://v0-1.docs.modelplane.ai/overview/glossary/): Terms used throughout the Modelplane docs and what they mean.
- [AI tools](https://v0-1.docs.modelplane.ai/overview/ai-tools/): Connect AI assistants and coding agents to the Modelplane docs through MCP, Markdown, and llms.txt.
- [Architecture](https://v0-1.docs.modelplane.ai/architecture/): How Modelplane is built, the Crossplane foundation, the composition-function model, and the choices behind them.
- [Llama-3.1-8B](https://v0-1.docs.modelplane.ai/examples/llama-3.1-8b/): An 8B dense chat model on a single NVIDIA L4.
- [Route to External Providers](https://v0-1.docs.modelplane.ai/models/model-endpoint/): A reachable inference endpoint, composed per replica or created manually for external providers.
- [Scale the platform](https://v0-1.docs.modelplane.ai/getting-started/scale-the-platform/): Grow from one cluster to a multi-region fleet.
- [Supported Providers](https://v0-1.docs.modelplane.ai/platform/providers/): The clouds and neoclouds Modelplane runs on today, and the Crossplane providers it grows into.
- [API Reference](https://v0-1.docs.modelplane.ai/reference/): Every Modelplane API type, grouped by Platform, Models, and Composed.
- [Scale the model](https://v0-1.docs.modelplane.ai/getting-started/scale-the-model/): Serve the model from two regions behind a single endpoint.
- [Clean up](https://v0-1.docs.modelplane.ai/getting-started/clean-up/): Tear down everything you created during the tour.
- [EKSCluster](https://v0-1.docs.modelplane.ai/reference/eksclusters/)
- [GKECluster](https://v0-1.docs.modelplane.ai/reference/gkeclusters/)
- [ServingStack](https://v0-1.docs.modelplane.ai/reference/servingstacks/)

## Resources

- Full documentation as one file: https://v0-1.docs.modelplane.ai/llms-full.txt
- Connect an AI assistant (MCP, Markdown): https://v0-1.docs.modelplane.ai/overview/ai-tools/
- GitHub: https://github.com/modelplaneai/modelplane