Control Plane
Schedule workflows across a fleet of workers with a central gRPC control plane
The Blazen Control Plane (blazen-controlplane) is a central gRPC server that a fleet of workers connect into. It owns the authoritative view of running workflows, the connected-worker registry, and the assignment queue. Orchestrators submit workflows to it; the control plane decides which worker runs each one and streams the results back.
Where Distributed Workflows (blazen-peer) is a flat mesh of equal peers that dial each other directly, the control plane is hub-and-spoke: workers always open the connection outbound, so only the control plane needs a reachable address. That makes it NAT-friendly and a natural fit for a pool of GPU boxes phoning home to a coordinator.
:::note[Control Plane vs. ModelManager]
The control plane orchestrates workflows across workers. It does not centrally manage LLM provider instances or their credentials. Managing named provider instances inside one process is the job of the ModelManager — see Custom Providers. Each worker holds its own provider credentials locally.
:::
When to use it
- You have a pool of workers (GPU machines, agent runners, transcoders) and want to submit work to the pool without knowing which node will run it.
- You need capability-based routing — “run this on a worker that can do
inference:vision:depth” — instead of addressing a specific host. - You want a NAT-friendly topology where workers dial in and the coordinator stays put.
- You need priority scheduling, draining, idempotent submits, or human-in-the-loop input during a run.
If you just want one workflow to delegate a sub-step to one other machine, blazen-peer is simpler. Reach for the control plane when you’re scheduling across many workers.
The worker / orchestrator model
Two kinds of clients connect to one server:
orchestrator ──┐ submit / cancel / describe / subscribe /
│ list-workers / drain / respond-to-input
▼
ControlPlaneServer
(registry · queue · admission)
▲
worker ────────┘ opens a bidi session: Hello → Welcome,
then heartbeats, results, and events
- Server (
ControlPlaneServer) — the hub. Keeps the worker registry, the per-capability assignment queue (priority WFQ scheduler), and the admission/routing logic. It does not run workflows itself. - Worker (
Worker+AssignmentHandler) — dials in over a bidirectional stream, advertises its capabilities, tags, labels, and taints, and runs assignments it’s given. - Orchestrator (
Client, implementingOrchestratorClient) — submits workflow runs and subscribes to their events.
Run a server
use blazen_controlplane::ControlPlaneServer;
let server = ControlPlaneServer::new("cp-node-1");
// .with_tls(tls_config) // optional mTLS
// .with_store(arc_store) // optional durable AssignmentStore (Valkey)
server.serve("0.0.0.0:7445".parse()?).await?;
Connect a worker
A worker pairs a WorkerConfig (what it can do, how it authenticates, how its capacity is bounded) with an AssignmentHandler (what it actually does with an assignment):
use blazen_controlplane::{Worker, WorkerConfig, AssignmentHandler,
AssignmentContext, AssignmentFailure};
use blazen_controlplane::protocol::Assignment;
use blazen_core::distributed::{AdmissionMode, WorkerCapability};
use async_trait::async_trait;
use serde_json::Value;
struct MyHandler;
#[async_trait]
impl AssignmentHandler for MyHandler {
async fn handle(&self, assignment: Assignment, ctx: AssignmentContext)
-> Result<Value, AssignmentFailure>
{
// Run the workflow. Optionally emit progress with ctx.emit_event(...)
// or pause for input with ctx.request_input(...).
Ok(serde_json::json!({ "ok": true }))
}
}
let config = WorkerConfig::new("http://cp-host:7445", "worker-a")
.with_capability(WorkerCapability { kind: "inference:llm:ollama".into(), version: 1 })
.with_tag("region", "us-west")
.with_admission(AdmissionMode::Fixed { max_in_flight: 4 });
Worker::connect(config)?.run(MyHandler).await?;
The worker reconnects automatically under its configured retry policy if the connection drops.
Drive it from an orchestrator
Client implements blazen_core::distributed::OrchestratorClient:
use blazen_controlplane::Client;
use blazen_core::distributed::{OrchestratorClient, SubmitWorkflowRequest};
let client = Client::connect("http://cp-host:7445", None /* tls */, None /* bearer */).await?;
let snapshot = client.submit_workflow(SubmitWorkflowRequest {
workflow_name: "summarize".into(),
workflow_version: None,
input: serde_json::json!({ "url": "https://example.com" }),
required_tags: vec!["region=us-west".into()],
idempotency_key: Some("dedupe-1".into()),
deadline_ms: Some(30_000),
wait_for_worker: true,
resource_hint: None,
}).await?;
// Also available: cancel_workflow, describe_workflow, subscribe_run_events,
// list_workers — plus respond_to_input and drain_worker on Client directly.
Capability-based routing
Every assignment names a required capability { kind, version }. The control plane routes it through a stateless filter pipeline:
- Capability match — only workers advertising the exact
kind@version. - Tag predicate — drop workers whose tags don’t AND-match every
key=value(orkey=*) requirement. - Node selector + taints/tolerations —
requiredlabels must all match,forbiddennone,preferredadds to the tie-break score; un-toleratedNoScheduletaints exclude a worker,PreferNoScheduleonly penalizes it. - Capacity filter — drop workers whose per-worker admission strategy is full.
- Tie-break — least-loaded remaining worker (round-robin on ties), with a bonus for workers that already have the requested model resident.
The kind strings (inference:llm:*, inference:vision:*, media:*, agent:*, …) and the node-label / taint / priority conventions are the authoritative registry in the Capability Namespaces document — pick kinds from there rather than inventing strings.
Per-worker admission strategy
Each worker declares how its capacity is bounded:
| Strategy | Meaning |
|---|---|
Fixed { max_in_flight } | Hard count cap. Best for fungible CPU work. |
VramBudget { max_vram_mb } | VRAM-sum cap; assignments must carry a resource_hint.vram_mb. |
Reactive | Worker self-decides via offer / claim / decline. |
For Reactive workers the server sends an offer; the worker replies Claim or Decline { reason }, and the queue re-routes on decline.
Priority and fairness
Assignments carry a priority: u8 (lower = served first; default 128). The queue runs a deficit-round-robin weighted-fair-queueing scheduler across 8 priority bands (width 32, weights stepping 256 → 32), so high-priority work is favored but low-priority work never starves.
Authentication
The control plane uses the same auth surface as blazen-peer:
- Bearer token — a shared secret sent as
authorization: Bearer <token>metadata on every request. Read fromBLAZEN_PEER_TOKEN, or supplied explicitly: orchestrators pass it toClient::connect, workers viaWorkerConfig::with_bearer_token. The server verifies it on every RPC with a constant-time compare. If no token is configured server-side, auth is off — intended for local / loopback development only. - mTLS — configure the server with
with_tlsand clients with aClientTlsConfig(or thewith_mtlsPEM-loading helpers). Recommended for production fleets.
The crate also describes planned enrollment policies — Open, Allowlist, and Signed-handshake — but only the bearer + mTLS surface above is enforced today.
Human-in-the-loop input requests
A running assignment can pause, ask for input, and resume with the answer — useful for approvals, clarifying questions, or any human-in-the-loop step.
- The worker calls
ctx.request_input(prompt, metadata, timeout_ms). This emits an"input.request"event carrying{ request_id, prompt, metadata }and blocks. - The control plane fans that event out to subscribers (
subscribe_run_events/subscribe_all), so an orchestrator — or a UI behind it — sees the request. - The orchestrator answers with
client.respond_to_input(run_id, request_id, response). - The control plane delivers the answer to the worker assigned that run, and
request_inputreturns the JSON value to the handler so the workflow continues.
The call fails if the run is cancelled, the worker disconnects, or timeout_ms elapses with no answer.
Transports and durability
- gRPC (default) over HTTP/2, with optional mTLS.
- HTTP/SSE bridge (
http-transportfeature) for clients that can’t speak HTTP/2 — browsers and some serverless / wasi targets. - OpenAI-compatible REST (
http-restfeature) plus Blazen-admin endpoints. - Durable assignment queue (
valkey-storefeature) so queued and in-flight work survives a control-plane restart; the default store is in-memory.
Related
- Distributed Workflows — the flat peer-mesh alternative (
blazen-peer). - Custom Providers — the
ModelManagerfor managing provider instances inside one process. - Capability Namespaces — the authoritative
kind/ label / taint / priority registry.