Models & Providers

The canonical way to construct LLM providers and dispatch across many with ModelManager

Blazen talks to LLMs through providers — OpenAI, Anthropic, fal.ai, Groq, OpenRouter, and 15+ more, plus local in-process backends (mistral.rs, llama.cpp, candle). Every provider implements the same Model interface: build a list of ChatMessages, call complete() (or stream()), and read back a typed ModelResponse. Switching providers is a one-line change.
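A provider swap is a one-line change because calling code depends only on the shared interface, never on a concrete class. The minimal, self-contained sketch below shows that idea in plain Python. Message, Response, ChatModel, EchoProvider and ShoutProvider are hypothetical stand-ins, not Blazen types.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Protocol


# Hypothetical stand-ins for illustration only; these are NOT Blazen's classes.
@dataclass
class Message:
    role: str
    content: str


@dataclass
class Response:
    content: str


class ChatModel(Protocol):
    """The one method every provider is assumed to share in this sketch."""

    async def complete(self, messages: List[Message]) -> Response: ...


class EchoProvider:
    """Fake provider: echoes the last message."""

    async def complete(self, messages: List[Message]) -> Response:
        return Response(content=messages[-1].content)


class ShoutProvider:
    """Fake provider: upper-cases the last message."""

    async def complete(self, messages: List[Message]) -> Response:
        return Response(content=messages[-1].content.upper())


async def ask(model: ChatModel, text: str) -> str:
    # Caller code relies only on complete(), never on a concrete provider class.
    resp = await model.complete([Message(role="user", content=text)])
    return resp.content


async def main() -> None:
    print(await ask(EchoProvider(), "hello"))   # hello
    print(await ask(ShoutProvider(), "hello"))  # HELLO


asyncio.run(main())
```

Only the constructor call changes between the two print lines; ask() itself is untouched.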

There are two tiers to the API. Reach for Tier 1 by default; graduate to Tier 2 when you’re juggling several models.

Tier 1 — construct a provider, call `.complete()`

Instantiate the provider you want and call it directly. This is the headline pattern and covers the majority of use cases.

from blazen import OpenAiProvider, ProviderOptions, ChatMessage

model = OpenAiProvider(options=ProviderOptions(api_key="sk-..."))

resp = await model.complete([ChatMessage.user("Hello")])
print(resp.content)

import { OpenAiProvider, ChatMessage } from "blazen";

const model = OpenAiProvider.create({ apiKey: "sk-..." });

const resp = await model.complete([ChatMessage.user("Hello")]);
console.log(resp.content);

use blazen_llm::{Model, ModelRequest, ChatMessage};
use blazen_llm::providers::openai::OpenAiProvider;

let model = OpenAiProvider::new("sk-...");

let resp = model
    .complete(ModelRequest::new(vec![ChatMessage::user("Hello")]))
    .await?;
println!("{}", resp.content.unwrap_or_default());

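Every provider has its own class: swap OpenAiProvider for AnthropicProvider, FalProvider, GroqProvider, OpenRouterProvider, and so on. In Python, Anthropic takes the same ProviderOptions as OpenAI, while fal has its own FalOptions; in Node and Rust, every provider is constructed the same way: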

from blazen import AnthropicProvider, FalProvider, FalOptions, ProviderOptions, ChatMessage

claude = AnthropicProvider(options=ProviderOptions(api_key="sk-ant-..."))
fal = FalProvider(options=FalOptions(api_key="fal-..."))

resp = await claude.complete([ChatMessage.user("Hello")])

import { AnthropicProvider, FalProvider, ChatMessage } from "blazen";

const claude = AnthropicProvider.create({ apiKey: "sk-ant-..." });
const fal = FalProvider.create({ apiKey: "fal-..." });

const resp = await claude.complete([ChatMessage.user("Hello")]);

use blazen_llm::{Model, ModelRequest, ChatMessage};
use blazen_llm::providers::anthropic::AnthropicProvider;
use blazen_llm::providers::fal::FalProvider;

let claude = AnthropicProvider::new("sk-ant-...");
let fal = FalProvider::new("fal-...");

let resp = claude
    .complete(ModelRequest::new(vec![ChatMessage::user("Hello")]))
    .await?;

Omit the API key (or the whole options object) and Blazen reads the provider’s conventional environment variable — OPENAI_API_KEY, ANTHROPIC_API_KEY, FAL_KEY, and so on.
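A plausible shape for that fallback, as a sketch: an explicit key wins, otherwise the conventional variable is read. The resolve_api_key helper and the ENV_VARS table are hypothetical; only the variable names come from the text above.

```python
import os
from typing import Optional

# Hypothetical provider-to-variable table; the names match the text above,
# but neither this table nor the helper below is Blazen's API.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "fal": "FAL_KEY",
}


def resolve_api_key(provider: str, api_key: Optional[str] = None) -> str:
    """Return the explicit key if one was passed, else read the provider's env var."""
    if api_key:
        return api_key
    var = ENV_VARS[provider]
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"no API key passed and {var} is not set")
    return value


os.environ["FAL_KEY"] = "fal-from-env"           # simulate a configured environment
print(resolve_api_key("fal"))                    # fal-from-env
print(resolve_api_key("fal", api_key="fal-x"))   # fal-x
```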

Streaming

Every provider streams too. The Python and Rust surfaces yield chunks via async iteration; Node and WASM take an onChunk callback.

async for chunk in model.stream([ChatMessage.user("Count to five")]):
    if chunk.delta:  # skip chunks that carry no text, as the Node example does
        print(chunk.delta, end="", flush=True)

await model.stream([ChatMessage.user("Count to five")], (chunk) => {
  if (chunk.delta) process.stdout.write(chunk.delta);
});
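Both streaming styles deliver the same chunks; the only difference is who drives the loop. The sketch below fakes a stream with a hypothetical fake_stream async generator and Chunk type (not Blazen's) to show both styles, including skipping chunks whose delta is empty.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Callable, List, Optional


# Hypothetical chunk type; Blazen's real chunk objects may carry more than a delta.
@dataclass
class Chunk:
    delta: Optional[str]


async def fake_stream(text: str) -> AsyncIterator[Chunk]:
    """Stand-in for model.stream(): one word per chunk, then a chunk with no text."""
    for i, word in enumerate(text.split()):
        yield Chunk(delta=word if i == 0 else " " + word)
    yield Chunk(delta=None)


async def collect(stream: AsyncIterator[Chunk]) -> str:
    # Async-iteration style (Python, Rust): your code pulls each chunk in a loop.
    parts: List[str] = []
    async for chunk in stream:
        if chunk.delta:  # skip chunks without text, like the `if (chunk.delta)` guard
            parts.append(chunk.delta)
    return "".join(parts)


async def drive(stream: AsyncIterator[Chunk], on_chunk: Callable[[Chunk], None]) -> None:
    # Callback style (Node, WASM): the loop lives inside the library and pushes chunks to you.
    async for chunk in stream:
        on_chunk(chunk)


async def main() -> None:
    print(await collect(fake_stream("one two three")))  # one two three
    seen: List[Optional[str]] = []
    await drive(fake_stream("four five"), lambda c: seen.append(c.delta))
    print(seen)  # ['four', ' five', None]


asyncio.run(main())
```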

Tier 2 — many providers, one `ModelManager`

When an app grows past a single model — a fast cheap one for routing, a smart expensive one for the hard turns, a local model for offline work — register them all in a single ModelManager and dispatch by name. This is a multi-provider registry inspired by LlamaIndex’s model-settings pattern: name your models once, then refer to them by name everywhere.

register (register_provider in the Rust crate) is the unified entry point. It accepts both remote providers (which dispatch straight through and never count against a memory budget) and local in-process backends (which additionally participate in the load/unload lifecycle with per-pool LRU eviction). Once registered:

complete(id, messages) runs a completion against the named provider.
stream(id, messages, onChunk) streams against it.
get(id) hands back the provider instance so you can call it, pass it around, or compose it (e.g. wrap several in a fallback).
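Without memory budgets and local lifecycles, dispatch by name comes down to a dictionary lookup followed by a call to the provider's own complete(). A minimal sketch; MiniManager and EchoModel are hypothetical and only approximate the register / complete / get shape described in the list.

```python
import asyncio
from typing import Dict, List


# Hypothetical, stripped-down registry; Blazen's ModelManager also tracks memory
# budgets and local load/unload, which this sketch leaves out entirely.
class EchoModel:
    def __init__(self, tag: str) -> None:
        self.tag = tag

    async def complete(self, messages: List[str]) -> str:
        return f"[{self.tag}] {messages[-1]}"


class MiniManager:
    def __init__(self) -> None:
        self._models: Dict[str, EchoModel] = {}

    async def register(self, model_id: str, model: EchoModel) -> None:
        self._models[model_id] = model

    async def get(self, model_id: str) -> EchoModel:
        try:
            return self._models[model_id]
        except KeyError:
            raise KeyError(f"no model registered under {model_id!r}") from None

    async def complete(self, model_id: str, messages: List[str]) -> str:
        # Dispatch by name: look the provider up, then call its own complete().
        model = await self.get(model_id)
        return await model.complete(messages)


async def main() -> None:
    mgr = MiniManager()
    await mgr.register("fast", EchoModel("cheap"))
    await mgr.register("smart", EchoModel("expensive"))
    print(await mgr.complete("fast", ["Hello"]))  # [cheap] Hello
    smart = await mgr.get("smart")
    print(await smart.complete(["Explain"]))      # [expensive] Explain


asyncio.run(main())
```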

from blazen import ModelManager, FalProvider, FalOptions, AnthropicProvider, ProviderOptions, ChatMessage

mgr = ModelManager()
await mgr.register("fast", FalProvider(options=FalOptions(api_key="fal-...")))
await mgr.register("smart", AnthropicProvider(options=ProviderOptions(api_key="sk-ant-...")))

resp = await mgr.complete("fast", [ChatMessage.user("Hello")])

# Fetch an instance back to use or compose directly:
smart = await mgr.get("smart")
detailed = await smart.complete([ChatMessage.user("Explain in depth")])

import { ModelManager, FalProvider, AnthropicProvider, ChatMessage } from "blazen";

const mgr = new ModelManager();
await mgr.register("fast", FalProvider.create({ apiKey: "fal-..." }));
await mgr.register("smart", AnthropicProvider.create({ apiKey: "sk-ant-..." }));

const resp = await mgr.complete("fast", [ChatMessage.user("Hello")]);

// Fetch an instance back to use or compose directly:
const smart = await mgr.get("smart");
const detailed = await smart.complete([ChatMessage.user("Explain in depth")]);

use std::sync::Arc;
use blazen_manager::ModelManager;
use blazen_llm::{ChatMessage, ModelRequest};
use blazen_llm::providers::anthropic::AnthropicProvider;
use blazen_llm::providers::fal::FalProvider;

let mgr = ModelManager::with_budgets_gb(64.0, 24.0);
// `None` (no local lifecycle) + `0` bytes: remote providers own no local weights.
mgr.register_provider("fast", Arc::new(FalProvider::new("fal-...")), None, 0).await;
mgr.register_provider("smart", Arc::new(AnthropicProvider::new("sk-ant-...")), None, 0).await;

let resp = mgr
    .complete("fast", ModelRequest::new(vec![ChatMessage::user("Hello")]))
    .await?;

// Fetch the instance back to use or compose directly:
let smart = mgr.get("smart").await.expect("registered");
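The get bullet above mentions composing fetched instances, for example wrapping several in a fallback. One way such a wrapper could work, as a hypothetical sketch (FallbackModel, FlakyModel and EchoModel are not Blazen types): try each model in order and return the first success.

```python
import asyncio
from typing import Any, List


# Hypothetical classes illustrating "wrap several providers in a fallback";
# none of these are Blazen's API.
class FlakyModel:
    async def complete(self, messages: List[str]) -> str:
        raise ConnectionError("upstream unavailable")


class EchoModel:
    def __init__(self, tag: str) -> None:
        self.tag = tag

    async def complete(self, messages: List[str]) -> str:
        return f"[{self.tag}] {messages[-1]}"


class FallbackModel:
    """Try each wrapped model in order; return the first successful response."""

    def __init__(self, models: List[Any]) -> None:
        if not models:
            raise ValueError("need at least one model")
        self.models = models

    async def complete(self, messages: List[str]) -> str:
        errors = []
        for model in self.models:
            try:
                return await model.complete(messages)
            except Exception as exc:  # this sketch falls back on any error
                errors.append(exc)
        raise RuntimeError(f"all {len(errors)} models failed: {errors}")


async def main() -> None:
    chain = FallbackModel([FlakyModel(), EchoModel("backup")])
    print(await chain.complete(["Hello"]))  # [backup] Hello


asyncio.run(main())
```

Because the wrapper exposes the same complete() method as the models it wraps, it could itself be registered under a name. A production wrapper would more likely fall back only on transient errors (timeouts, rate limits) rather than on every exception.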

In the WASM SDK, register takes a generic Model plus a memory estimate — convert a provider instance with provider.toModel() and pass 0 for remote providers:

import { ModelManager, FalProvider, AnthropicProvider, ChatMessage } from "@blazen-dev/wasm";

const mgr = new ModelManager(0);
await mgr.register("fast", new FalProvider({ apiKey: "fal-..." }).toModel(), 0);
await mgr.register("smart", new AnthropicProvider({ apiKey: "sk-ant-..." }).toModel(), 0);

const resp = await mgr.complete("fast", [ChatMessage.user("Hello")]);

Because remote and local providers share the registry, you can mix a hosted API and a local GGUF model under the same manager and route between them by name — the manager handles loading, eviction, and budget accounting for the local entries while remote entries dispatch straight through.
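Budget accounting with LRU eviction can be sketched for a single pool: a zero-byte (remote) entry is never counted, and loading a local model evicts the least recently used ones until the new model fits. LruPool and the model names below are hypothetical. The Rust example passes two budgets to with_budgets_gb; this sketch models just one pool and does not show how Blazen assigns models to pools.

```python
from collections import OrderedDict
from typing import Dict, List


# Hypothetical single-pool sketch of LRU eviction under a byte budget;
# not Blazen's implementation.
class LruPool:
    def __init__(self, budget_bytes: int) -> None:
        self.budget = budget_bytes
        self.sizes: Dict[str, int] = {}                       # registered entries
        self.loaded: "OrderedDict[str, int]" = OrderedDict()  # loaded, oldest first

    def register(self, name: str, size_bytes: int) -> None:
        self.sizes[name] = size_bytes

    def used(self) -> int:
        return sum(self.loaded.values())

    def use(self, name: str) -> List[str]:
        """Mark `name` as used, loading it if needed; return the names evicted."""
        size = self.sizes[name]
        if size == 0:
            return []  # remote provider: no local weights, never counted or evicted
        if name in self.loaded:
            self.loaded.move_to_end(name)  # now the most recently used
            return []
        if size > self.budget:
            raise MemoryError(f"{name} ({size} B) exceeds the pool budget")
        evicted = []
        while self.used() + size > self.budget:
            victim, _ = self.loaded.popitem(last=False)  # least recently used
            evicted.append(victim)
        self.loaded[name] = size
        return evicted


GIB = 1024 ** 3  # this sketch counts in GiB
pool = LruPool(budget_bytes=24 * GIB)
pool.register("remote-claude", 0)
pool.register("llama-8b", 16 * GIB)
pool.register("qwen-7b", 10 * GIB)

print(pool.use("remote-claude"))  # []
print(pool.use("llama-8b"))       # []
print(pool.use("qwen-7b"))        # ['llama-8b']  (16 + 10 GiB would exceed 24 GiB)
```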

Shorthand: `Model.openai(...)`

Each provider also has a matching static factory on Model — Model.openai(...), Model.anthropic(...), Model.fal(...), Model.groq(...), and so on. Both surfaces wrap the exact same Rust provider, so they’re interchangeable; the factory is just an alternate constructor some people find tidier.

from blazen import Model, ProviderOptions, ChatMessage

model = Model.openai(options=ProviderOptions(api_key="sk-..."))   # same as OpenAiProvider(...)
resp = await model.complete([ChatMessage.user("Hello")])

import { Model, ChatMessage } from "blazen";

const model = Model.openai({ apiKey: "sk-..." });   // same as OpenAiProvider.create(...)
const resp = await model.complete([ChatMessage.user("Hello")]);

In the WASM SDK the Model.x() factories are zero-argument and read keys from environment variables (Model.openai() → OPENAI_API_KEY); use the provider constructor (new OpenAiProvider({ apiKey })) when you need to pass an explicit key.
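A factory like Model.openai(...) can be nothing more than a static method that forwards to the provider's constructor. The minimal sketch below uses hypothetical OpenAiLike and ToyModel classes (not Blazen's), including the environment-variable fallback used when no key is passed.

```python
import os
from typing import Optional


# Hypothetical classes mirroring the shape of OpenAiProvider(...) vs Model.openai(...);
# neither is Blazen's implementation.
class OpenAiLike:
    def __init__(self, api_key: Optional[str] = None) -> None:
        # An explicit key wins; otherwise fall back to the conventional env var.
        self.api_key = api_key or os.environ.get("OPENAI_API_KEY")


class ToyModel:
    @staticmethod
    def openai(api_key: Optional[str] = None) -> OpenAiLike:
        # The factory adds nothing: it forwards to the very same constructor.
        return OpenAiLike(api_key=api_key)


direct = OpenAiLike(api_key="sk-123")
via_factory = ToyModel.openai(api_key="sk-123")
print(type(direct) is type(via_factory), direct.api_key == via_factory.api_key)  # True True
```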
