<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Build the Inference Stack Platform on Modelplane Docs</title><link>https://v0-1.docs.modelplane.ai/platform/</link><description>Recent content in Build the Inference Stack Platform on Modelplane Docs</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Mon, 01 Jan 0001 00:00:00 +0000</lastBuildDate><atom:link href="https://v0-1.docs.modelplane.ai/platform/index.xml" rel="self" type="application/rss+xml"/><item><title>Set Up the Gateway</title><link>https://v0-1.docs.modelplane.ai/platform/inference-gateway/</link><pubDate/><guid>https://v0-1.docs.modelplane.ai/platform/inference-gateway/</guid><description>&lt;p&gt;&lt;strong&gt;API:&lt;/strong&gt; &lt;a href="https://v0-1.docs.modelplane.ai/reference/inferencegateways/"&gt;&lt;code&gt;modelplane.ai/v1alpha1&lt;/code&gt; · InferenceGateway&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;InferenceGateway&lt;/code&gt; sets up the control plane&amp;rsquo;s front door: one unified,
OpenAI-compatible address that exposes every &lt;code&gt;ModelService&lt;/code&gt; and routes
each request to the inference cluster serving it.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;InferenceGateway&lt;/code&gt; is a singleton: create exactly one, named &lt;code&gt;default&lt;/code&gt;, on
your Modelplane control plane. It fronts every inference cluster in the fleet, so
you don&amp;rsquo;t create one per cluster.&lt;/p&gt;
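&lt;p&gt;A minimal sketch of that singleton is shown here. The API group, kind, and name
come from this page. Placing &lt;code&gt;backend&lt;/code&gt; under &lt;code&gt;spec&lt;/code&gt; is an assumption, so
check the InferenceGateway reference for the exact schema.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-yaml"&gt;# Hypothetical manifest: field placement under spec is assumed, not confirmed.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceGateway
metadata:
  name: default      # the singleton must be named "default"
spec:
  backend: Traefik   # assumed location of the backend field
&lt;/code&gt;&lt;/pre&gt;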
&lt;p&gt;The &lt;code&gt;backend&lt;/code&gt; field selects the gateway implementation that serves this address. &lt;code&gt;Traefik&lt;/code&gt; is the only supported value
today.&lt;/p&gt;</description></item><item><title>Define Hardware Classes</title><link>https://v0-1.docs.modelplane.ai/platform/inference-class/</link><pubDate/><guid>https://v0-1.docs.modelplane.ai/platform/inference-class/</guid><description>&lt;p&gt;&lt;strong&gt;API:&lt;/strong&gt; &lt;a href="https://v0-1.docs.modelplane.ai/reference/inferenceclasses/"&gt;&lt;code&gt;modelplane.ai/v1alpha1&lt;/code&gt; · InferenceClass&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;An &lt;code&gt;InferenceClass&lt;/code&gt; is a tested recipe for a GPU node pool. It bundles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Devices&lt;/strong&gt;: the node&amp;rsquo;s hardware as a list of devices in the style of Dynamic
Resource Allocation (DRA), each with a driver, count, typed attributes, and
capacity. The scheduler matches a member&amp;rsquo;s &lt;code&gt;nodeSelector&lt;/code&gt; against these
devices, and GPUs bind to pods through DRA.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Provisioning&lt;/strong&gt; (optional): how to create a node pool of this class on a
specific cloud. Classes without provisioning are for existing clusters where
the pool already exists.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each combination of cloud and GPU type gets its own class. For example, a GKE L4 pool is
&lt;code&gt;gke-l4-1x-g2&lt;/code&gt;. A bare-metal H100 pool is &lt;code&gt;h100-8x-byo&lt;/code&gt; (no provisioning).&lt;/p&gt;</description></item><item><title>Register a Cluster</title><link>https://v0-1.docs.modelplane.ai/platform/inference-cluster/</link><pubDate/><guid>https://v0-1.docs.modelplane.ai/platform/inference-cluster/</guid><description>&lt;p&gt;&lt;strong&gt;API:&lt;/strong&gt; &lt;a href="https://v0-1.docs.modelplane.ai/reference/inferenceclusters/"&gt;&lt;code&gt;modelplane.ai/v1alpha1&lt;/code&gt; · InferenceCluster&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;An &lt;code&gt;InferenceCluster&lt;/code&gt; represents a Kubernetes cluster configured for model
serving. Platform teams create these to provide GPU capacity.&lt;/p&gt;
&lt;p&gt;Each cluster has:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;cluster source&lt;/strong&gt;: &lt;code&gt;GKE&lt;/code&gt; or &lt;code&gt;EKS&lt;/code&gt; (Modelplane provisions the full cluster)
or &lt;code&gt;Existing&lt;/code&gt; (bring a cluster you manage yourself). See
&lt;a href="https://v0-1.docs.modelplane.ai/platform/providers/"&gt;Supported Providers&lt;/a&gt; for the clouds and
neoclouds Modelplane runs on.&lt;/li&gt;
&lt;li&gt;One or more &lt;strong&gt;node pools&lt;/strong&gt;, each referencing an &lt;code&gt;InferenceClass&lt;/code&gt; for its
hardware capabilities and provisioning recipe.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Labels&lt;/strong&gt; for organizational metadata: tier, region, provider.
&lt;code&gt;ModelDeployment.clusterSelector&lt;/code&gt; matches against these labels.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Modelplane installs the serving stack it needs on every cluster it manages,
including existing clusters, which it assumes are solely for its use.&lt;/p&gt;</description></item><item><title>Supported Providers</title><link>https://v0-1.docs.modelplane.ai/platform/providers/</link><pubDate/><guid>https://v0-1.docs.modelplane.ai/platform/providers/</guid><description>&lt;p&gt;Modelplane is built on &lt;a href="https://crossplane.io"&gt;Crossplane&lt;/a&gt; and shares its
infrastructure providers, so the set of clouds and neoclouds it reaches grows
alongside Crossplane itself. This page shows where Modelplane runs today and
where it&amp;rsquo;s headed.&lt;/p&gt;
&lt;p&gt;A provider can show up here in three ways:&lt;/p&gt;
&lt;div class="admonition note"&gt;
&lt;div class="admonition-title"&gt;
&lt;span class="ps-1"&gt;Note&lt;/span&gt;
&lt;/div&gt;
&lt;div class="admonition-content"&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Provisioning supported.&lt;/strong&gt; Modelplane creates and manages the whole cluster
from an &lt;code&gt;InferenceCluster&lt;/code&gt;, selected through &lt;code&gt;provisioning.provider&lt;/code&gt;. GKE and
EKS work this way today.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Bring your own supported.&lt;/strong&gt; Register a cluster you already run with
&lt;code&gt;source: Existing&lt;/code&gt;. This works on any provider whose Kubernetes meets
Modelplane&amp;rsquo;s requirements (Dynamic Resource Allocation and a recent Kubernetes
version), so you can run on the providers below now, ahead of native
provisioning.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Crossplane provider exists.&lt;/strong&gt; A Crossplane provider is published for the
cloud. That provider is the path by which native provisioning lands, so it
marks where Modelplane can grow next.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="clouds-and-neoclouds"&gt;Clouds and neoclouds&lt;/h2&gt;
&lt;p&gt;Providers are listed alphabetically and span hyperscalers and GPU-specialist neoclouds. Each
runs a managed Kubernetes service with GPU node pools, so the bring-your-own path
covers them all today. Where a Crossplane provider exists, it&amp;rsquo;s the path to
native provisioning.&lt;/p&gt;</description></item></channel></rss>