Node.js API Reference
Complete API reference for blazen in Node.js
ChatMessage
A class for building typed chat messages. Supports text, multimodal (image) content, and content parts.
Constructor
new ChatMessage({ role?: string, content?: string, parts?: ContentPart[] })
Create a message from an options object. role defaults to "user" if omitted. Supply either content (text) or parts (multimodal), not both.
// Text message with explicit role
new ChatMessage({ role: "user", content: "Hello" })
// Using the Role enum
new ChatMessage({ role: Role.User, content: "Hello" })
// System message
new ChatMessage({ role: "system", content: "You are a helpful assistant." })
// Multimodal message with content parts
new ChatMessage({
role: "user",
parts: [
{ partType: "text", text: "Describe this image:" },
{ partType: "image", image: { source: { sourceType: "url", url: "https://example.com/photo.jpg" } } }
]
})
Static Factory Methods
| Method | Description |
|---|---|
| ChatMessage.system(content: string) | Create a system message |
| ChatMessage.user(content: string) | Create a user message |
| ChatMessage.assistant(content: string) | Create an assistant message |
| ChatMessage.tool(content: string) | Create a tool result message |
| ChatMessage.userImageUrl(text: string, url: string, mediaType?: string) | Create a user message with text and an image URL |
| ChatMessage.userImageBase64(text: string, data: string, mediaType: string) | Create a user message with text and a base64-encoded image |
| ChatMessage.userParts(parts: ContentPart[]) | Create a user message from an explicit list of content parts |
const msg = ChatMessage.user("What is 2 + 2?");
const imgMsg = ChatMessage.userImageUrl(
"What's in this image?",
"https://example.com/photo.jpg",
"image/jpeg"
);
const b64Msg = ChatMessage.userImageBase64(
"Describe this:",
base64Data,
"image/png"
);
Properties
| Property | Type | Description |
|---|---|---|
| .role | string | The message role: "system", "user", "assistant", or "tool" |
| .content | string \| null | The text content of the message, if any. For tool results that returned a plain string, the string lives here (and .toolResult is null). |
| .toolResult | ToolOutput \| null | Structured tool-result payload. null for non-tool messages or when the tool returned a plain string. When non-null, .toolResult.data is the full structured value the caller should consume; the LLM-facing wire form is derived from .toolResult.llmOverride (if set) or from .toolResult.data via the provider’s default conversion. |
Note on toolCallId: The Rust core stores a tool_call_id on tool messages so providers can correlate a tool result with the originating ToolCall.id. As of this writing, the Node binding does not expose a public .toolCallId getter on ChatMessage; correlation is handled internally by the agent loop and on the wire by each provider. If you need explicit access, file an issue and we can surface it.
Role
String enum for message roles.
Role.System // "system"
Role.User // "user"
Role.Assistant // "assistant"
Role.Tool // "tool"
Can be used interchangeably with plain strings in the ChatMessage constructor.
ContentPart
Types for multimodal message content, used in the parts field of the ChatMessage constructor and in ChatMessage.userParts().
interface ContentPart {
partType: "text" | "image";
text?: string; // Required when partType is "text"
image?: ImageContent; // Required when partType is "image"
}
interface ImageContent {
source: ImageSource;
mediaType?: string; // MIME type, e.g. "image/png"
}
interface ImageSource {
sourceType: "url" | "base64";
url?: string; // Required when sourceType is "url"
data?: string; // Required when sourceType is "base64"
}
// `MediaSource` is a type alias for `ImageSource` re-exported for compute APIs
// that accept any media (image / video frame / audio cover-art) using the same shape.
// All compute requests that take a `MediaSource` accept exactly the same value
// you would pass to `ImageSource` — `MediaSource` exists only as an aliasing affordance.
export type MediaSource = ImageSource;
Note on MediaSource: prefer MediaSource in compute / generation APIs (image upscaling, video frames, audio cover art) and ImageSource in chat-message APIs. The two are interchangeable at the type level: MediaSource is just the structurally identical alias for ImageSource.
// Text part
{ partType: "text", text: "Describe this image:" }
// Image from URL
{
partType: "image",
image: {
source: { sourceType: "url", url: "https://example.com/photo.jpg" },
mediaType: "image/jpeg"
}
}
// Image from base64
{
partType: "image",
image: {
source: { sourceType: "base64", data: "iVBORw0KGgo..." },
mediaType: "image/png"
}
}
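The "required when" comments on ContentPart and ImageSource are not enforced by the optional fields themselves, so a malformed part only shows up at runtime. A minimal sketch of a pre-flight check, assuming you want to validate parts before building a message; validatePart is a hypothetical helper (not part of blazen), and the interfaces are re-declared locally so the snippet stands alone.

```typescript
// Local copies of the shapes documented above, so the sketch is self-contained.
interface ImageSource {
  sourceType: "url" | "base64";
  url?: string;
  data?: string;
}
interface ImageContent {
  source: ImageSource;
  mediaType?: string;
}
interface ContentPart {
  partType: "text" | "image";
  text?: string;
  image?: ImageContent;
}

// Hypothetical helper: returns a list of problems, empty when the part is well-formed.
function validatePart(part: ContentPart): string[] {
  const problems: string[] = [];
  if (part.partType === "text") {
    if (typeof part.text !== "string") problems.push("text part is missing `text`");
  } else {
    const source = part.image?.source;
    if (!source) {
      problems.push("image part is missing `image.source`");
    } else if (source.sourceType === "url" && !source.url) {
      problems.push("url source is missing `url`");
    } else if (source.sourceType === "base64" && !source.data) {
      problems.push("base64 source is missing `data`");
    }
  }
  return problems;
}
```

Running every part through such a check before calling ChatMessage.userParts() keeps shape errors on the JavaScript side of the binding.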
Constructing a provider — direct
The canonical way to talk to a provider is to construct its class directly with
its static create() and call complete() on it. Each provider class wraps the
same Rust provider as the matching Model.x() shorthand, but the class form is
the preferred surface: it is explicit about which provider you are using and
exposes that provider’s full capability set (e.g. OpenAiProvider.textToSpeech,
FalProvider.generateImage).
All providers accept an optional options object containing an apiKey (and other
provider-specific fields). If options is omitted — or apiKey is not set within
it — the key is read from the provider’s standard environment variable
(OPENAI_API_KEY, ANTHROPIC_API_KEY, FAL_KEY, etc.). If model is not set, the
provider’s default model is used.
import { OpenAiProvider, ChatMessage } from "blazen";
// Read key from OPENAI_API_KEY env var
const openai = OpenAiProvider.create();
// Pass an explicit key and override the model
const openaiExplicit = OpenAiProvider.create({ apiKey: "sk-...", model: "gpt-4o" });
const response = await openai.complete([
ChatMessage.system("You are a helpful assistant."),
ChatMessage.user("What is 2 + 2?"),
]);
console.log(response.content); // "4"
Every LLM provider class exposes the same four methods:
provider.complete(messages: ChatMessage[]): Promise<ModelResponse>
provider.completeWithOptions(messages: ChatMessage[], options: ModelOptions): Promise<ModelResponse>
provider.stream(messages: ChatMessage[], onChunk: (chunk) => void): Promise<void>
provider.streamWithOptions(messages: ChatMessage[], onChunk: (chunk) => void, options: ModelOptions): Promise<void>
await openai.streamWithOptions(
[ChatMessage.user("Tell me a story")],
(chunk) => { if (chunk.delta) process.stdout.write(chunk.delta); },
{ temperature: 0.7 }
);
Provider classes
Construct any of these directly with Provider.create(options?):
| Class | Constructor |
|---|---|
| OpenAiProvider | OpenAiProvider.create(options?: ProviderOptions) |
| AnthropicProvider | AnthropicProvider.create(options?: ProviderOptions) |
| GeminiProvider | GeminiProvider.create(options?: ProviderOptions) |
| AzureOpenAiProvider | AzureOpenAiProvider.create(options: AzureOptions) |
| OpenRouterProvider | OpenRouterProvider.create(options?: ProviderOptions) |
| GroqProvider | GroqProvider.create(options?: ProviderOptions) |
| CohereProvider | CohereProvider.create(options?: ProviderOptions) |
| DeepSeekProvider | DeepSeekProvider.create(options?: ProviderOptions) |
| FireworksProvider | FireworksProvider.create(options?: ProviderOptions) |
| MistralProvider | MistralProvider.create(options?: ProviderOptions) |
| BedrockProvider | BedrockProvider.create(options: BedrockOptions) |
| FalProvider | FalProvider.create(options?: FalOptions) |
The OpenAiProvider and FalProvider classes carry extra capabilities beyond
chat — OpenAiProvider.textToSpeech(...), and the full FalProvider
image/video/audio/3D surface. See media generation
for those.
Model
Model is the generic, provider-agnostic chat handle. Its static Model.x()
factories are shorthand constructors that build the same underlying provider
as the classes above; reach for them when you want a uniform Model type (e.g. to
mix providers in Model.withFallback([...]) or to register remote providers in a
ModelManager). For single-provider code, prefer the
direct provider class.
Shorthand constructors (Model.x())
Each returns a generic Model. The same env-var / apiKey / model resolution
rules as the provider classes apply.
| Method | Signature |
|---|---|
| Model.openai | (options?: ProviderOptions) |
| Model.anthropic | (options?: ProviderOptions) |
| Model.gemini | (options?: ProviderOptions) |
| Model.azure | (options: AzureOptions) |
| Model.fal | (options?: FalOptions) |
| Model.openrouter | (options?: ProviderOptions) |
| Model.groq | (options?: ProviderOptions) |
| Model.together | (options?: ProviderOptions) |
| Model.mistral | (options?: ProviderOptions) |
| Model.deepseek | (options?: ProviderOptions) |
| Model.fireworks | (options?: ProviderOptions) |
| Model.perplexity | (options?: ProviderOptions) |
| Model.xai | (options?: ProviderOptions) |
| Model.cohere | (options?: ProviderOptions) |
| Model.bedrock | (options: BedrockOptions) |
| Model.ollama | (host: string, port: number, model: string) |
// Read key from OPENAI_API_KEY env var
const model = Model.openai();
// Pass an explicit key and override the model
const claude = Model.anthropic({ apiKey: "sk-ant-...", model: "claude-sonnet-4-20250514" });
const gemini = Model.gemini({ model: "gemini-2.5-flash" });
Properties
| Property | Type | Description |
|---|---|---|
| .modelId | string | The model identifier string |
await model.complete(messages: ChatMessage[]): ModelResponse
Perform a chat completion.
const response = await model.complete([
ChatMessage.system("You are a helpful assistant."),
ChatMessage.user("What is 2 + 2?"),
]);
console.log(response.content); // "4"
await model.completeWithOptions(messages: ChatMessage[], options: ModelOptions): ModelResponse
Perform a chat completion with additional options.
const response = await model.completeWithOptions(
[ChatMessage.user("Write a haiku about Rust.")],
{ temperature: 0.7, maxTokens: 100 }
);
await model.stream(messages: ChatMessage[], onChunk: (chunk) => void): void
Stream a chat completion. The callback receives each chunk as it arrives.
await model.stream(
[ChatMessage.user("Tell me a story")],
(chunk) => {
if (chunk.delta) process.stdout.write(chunk.delta);
}
);
Each chunk has the shape:
{
delta?: string; // Text content delta
finishReason?: string; // Set on the final chunk
toolCalls: ToolCall[]; // Tool calls, if any
}
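Because the callback only sees one chunk at a time, assembling the full reply is left to the caller. A minimal sketch of an accumulator, assuming each chunk carries only newly emitted tool calls (the source does not say whether tool calls repeat across chunks); ChunkCollector is a hypothetical name and the chunk shape is re-declared locally.

```typescript
// Local copy of the documented chunk shape; ToolCall is reduced to its documented fields.
interface ToolCall {
  id: string;
  name: string;
  arguments: object;
}
interface StreamChunk {
  delta?: string;
  finishReason?: string;
  toolCalls: ToolCall[];
}

// Hypothetical accumulator: feed it every chunk, then read the assembled result.
class ChunkCollector {
  text = "";
  finishReason: string | undefined = undefined;
  toolCalls: ToolCall[] = [];

  // Arrow property so it can be passed around without losing `this`.
  push = (chunk: StreamChunk): void => {
    if (chunk.delta) this.text += chunk.delta;
    if (chunk.finishReason) this.finishReason = chunk.finishReason;
    this.toolCalls.push(...chunk.toolCalls);
  };
}
```

Since push has the same (chunk) => void shape as onChunk, an instance's push can be handed to model.stream() as the callback and inspected once the returned promise settles.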
await model.streamWithOptions(messages: ChatMessage[], onChunk: (chunk) => void, options: ModelOptions): void
Stream a chat completion with additional options.
await model.streamWithOptions(
[ChatMessage.user("Explain quantum computing")],
(chunk) => { if (chunk.delta) process.stdout.write(chunk.delta); },
{ temperature: 0.5, maxTokens: 500 }
);
Middleware Decorators
Each decorator returns a new Model wrapping the original with additional behaviour.
model.withRetry(config?: RetryConfig): Model
Automatic retry with exponential backoff on transient failures.
const resilient = model.withRetry({ maxRetries: 5, initialDelayMs: 500, maxDelayMs: 15000 });
| Field | Type | Default | Description |
|---|---|---|---|
| maxRetries | number | 3 | Maximum retry attempts. |
| initialDelayMs | number | 1000 | Delay before first retry (ms). |
| maxDelayMs | number | 30000 | Upper bound on backoff delay (ms). |
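To get a feel for how the three fields interact, the delays can be tabulated ahead of time. A minimal sketch, assuming the delay doubles after every attempt and no jitter is applied (the multiplier is not stated in this reference, so treat both as assumptions); backoffSchedule is a hypothetical helper, not a blazen export.

```typescript
interface RetryConfig {
  maxRetries?: number;
  initialDelayMs?: number;
  maxDelayMs?: number;
}

// Hypothetical helper: the delay before each retry, assuming a doubling
// multiplier (an assumption) capped at maxDelayMs.
function backoffSchedule(config: RetryConfig = {}): number[] {
  const maxRetries = config.maxRetries ?? 3; // documented default
  const initialDelayMs = config.initialDelayMs ?? 1000; // documented default
  const maxDelayMs = config.maxDelayMs ?? 30000; // documented default
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(initialDelayMs * 2 ** attempt, maxDelayMs));
  }
  return delays;
}

// With the defaults: waits of 1000, 2000, then 4000 ms.
const defaultDelays = backoffSchedule();
```

Under these assumptions the cap only matters for long retry chains: with initialDelayMs of 500 and maxDelayMs of 15000, the sixth retry is the first to be clamped.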
model.withCache(config?: CacheConfig): Model
In-memory response cache for identical non-streaming requests.
const cached = model.withCache({ ttlSeconds: 600, maxEntries: 500 });
| Field | Type | Default | Description |
|---|---|---|---|
| ttlSeconds | number | 300 | Cache entry TTL in seconds. |
| maxEntries | number | 1000 | Maximum entries before eviction. |
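The two fields describe a bounded cache with time-based expiry. A minimal sketch of that data structure, assuming oldest-first eviction and a string cache key (neither the eviction policy nor how requests are keyed is documented here); TtlCache is a hypothetical class for illustration only.

```typescript
interface CacheConfig {
  ttlSeconds?: number;
  maxEntries?: number;
}

// Hypothetical in-memory TTL cache. Evicting the oldest insertion first is an
// assumption; the policy used by withCache() is not documented here.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private maxEntries: number;

  constructor(config: CacheConfig = {}) {
    this.ttlMs = (config.ttlSeconds ?? 300) * 1000;
    this.maxEntries = config.maxEntries ?? 1000;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key); // re-inserting moves the key to the newest position
    if (this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

The optional now parameter exists only to make expiry testable without waiting on a real clock.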
Model.withFallback(models: Model[]): Model
Static factory method. Tries providers in order; falls back on transient errors.
const model = Model.withFallback([
Model.openai(),
Model.anthropic(),
]);
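The ordering rule can be illustrated without any provider at all. A minimal sketch, assuming an error counts as transient when it carries a transient: true flag (a convention invented for this example; which real errors blazen treats as transient is not specified here). firstSuccessful and isTransient are hypothetical names.

```typescript
// Hypothetical stand-in for one provider call.
type Attempt = () => Promise<string>;

// Hypothetical convention: an error is transient when it carries `transient: true`.
function isTransient(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { transient?: boolean }).transient === true;
}

// Tries each attempt in order. A transient failure moves on to the next
// attempt; any other failure is rethrown immediately.
async function firstSuccessful(attempts: Attempt[]): Promise<string> {
  let lastError: unknown = new Error("no attempts were provided");
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      if (!isTransient(err)) throw err; // permanent failure: stop here
      lastError = err;
    }
  }
  throw lastError; // every attempt failed transiently
}
```

The practical consequence of this shape is that a permanent error from the first provider (for example a malformed request) is not masked by a later provider succeeding.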
ModelOptions
Options object for completeWithOptions() and streamWithOptions().
interface ModelOptions {
temperature?: number; // Sampling temperature (0.0 - 2.0)
maxTokens?: number; // Maximum tokens to generate
topP?: number; // Nucleus sampling parameter
model?: string; // Override the default model ID
tools?: ToolDefinition[]; // Tool definitions for function calling
}
ModelResponse
Returned by model.complete() and model.completeWithOptions().
interface ModelResponse {
content?: string; // The generated text
toolCalls: ToolCall[]; // Tool calls requested by the model
usage?: TokenUsage; // Token usage statistics
model: string; // Model name used for the completion
finishReason?: string; // Why generation stopped ("stop", "tool_calls", etc.)
cost?: number; // Cost in USD, if reported by the provider
timing?: RequestTiming; // Request timing breakdown
images: object[]; // Generated images, if any (provider-specific)
audio: object[]; // Generated audio, if any (provider-specific)
videos: object[]; // Generated videos, if any (provider-specific)
metadata: object; // Raw provider-specific metadata
}
ToolCall
A tool invocation requested by the model.
| Property | Type | Description |
|---|---|---|
| .id | string | Unique identifier for the tool call |
| .name | string | Name of the tool to invoke |
| .arguments | object | Parsed JSON arguments |
ToolOutput
Two-channel return shape for tool handlers. Tool results have two distinct audiences. The caller (your TypeScript code) wants the full structured data; the LLM, on the next turn, may need a different shape — sometimes shorter, sometimes provider-specific. ToolOutput carries both channels.
import type { ToolOutput, LlmPayload } from "blazen";
const out: ToolOutput = {
data: { items: [1, 2, 3] },
};
Properties
| Member | Type | Description |
|---|---|---|
| data | any | The structured value the caller sees programmatically: an object, array, scalar, or string. |
| llmOverride | LlmPayload \| undefined | Optional override for what the LLM sees on the next turn. undefined means each provider applies its default conversion from data. |
Both llmOverride (camelCase) and llm_override (snake_case) are accepted on input from a JS tool handler, so a hand-written object using either casing will deserialize correctly.
When the agent loop appends a tool result to the conversation, the resulting ChatMessage exposes the structured output via .toolResult:
const last = result.messages[result.messages.length - 1];
// last.toolResult?.data is the full structured payload from your handler.
// last.toolResult?.llmOverride is the override (if any) the LLM saw this turn.
For tool handlers that returned a plain string, last.toolResult is null and the string lives on last.content instead.
LlmPayload
A tagged union describing what the LLM sees for a tool result on the next turn. Used as the llmOverride field of ToolOutput.
import type { LlmPayload } from "blazen";
const text: LlmPayload = { kind: "text", text: "Found 3 results." };
const json: LlmPayload = { kind: "json", value: { items: [1, 2, 3] } };
const parts: LlmPayload = {
kind: "parts",
parts: [
{ partType: "text", text: "Here is the table:" },
{ partType: "text", text: "| col |\n| --- |\n| 1 |" },
],
};
const raw: LlmPayload = {
kind: "provider_raw",
provider: "anthropic",
value: [{ type: "text", text: "Custom Anthropic-shaped payload." }],
};
Variants
| kind | Required fields | Behavior |
|---|---|---|
| "text" | text | Plain text. Works on every provider. |
| "json" | value | Structured JSON. Anthropic and Gemini natively consume the structure; OpenAI-family stringifies once at the wire boundary. |
| "parts" | parts (ContentPart[]) | Multimodal content blocks; Anthropic supports natively, OpenAI falls back to text concatenation, Gemini wraps as { parts: [...] }. |
| "provider_raw" | provider, value | Provider-specific escape hatch. Only the named provider sees value; every other provider falls back to converting ToolOutput.data with its default. provider is one of "openai", "openai_compat", "azure", "anthropic", "gemini", "responses", "fal". |
Per-provider behavior
When a tool returns structured data and no llmOverride, each provider sends a sensible default to the LLM:
- OpenAI / OpenAI-compat / Azure / Responses / Fal: the data is JSON-stringified into the content field of the tool message.
- Anthropic: structured data becomes [{ type: "text", text: <stringified-json> }] inside tool_result.content.
- Gemini: structured object data is passed natively as functionResponse.response. Scalars wrap as { result: <scalar> }.
When llmOverride is set, that override always wins for the variants the provider understands; kind: "provider_raw" is the strictest — it’s only honoured when provider matches the active provider, otherwise the provider falls back to converting data with its default.
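The default conversions for a result with no llmOverride can be written down as a small function. A minimal sketch under the rules listed above; defaultWireForm is a hypothetical helper, the provider list is cut down to three representatives, and wrapping arrays like scalars on Gemini is an assumption (the text only covers objects and scalars).

```typescript
type Provider = "openai" | "anthropic" | "gemini";

// Hypothetical helper mirroring the documented defaults for a tool result
// that carries structured `data` and no llmOverride.
function defaultWireForm(provider: Provider, data: unknown): unknown {
  switch (provider) {
    case "openai":
      // OpenAI-family: JSON-stringified into the tool message's content field.
      return JSON.stringify(data);
    case "anthropic":
      // Anthropic: a single text block holding the stringified JSON.
      return [{ type: "text", text: JSON.stringify(data) }];
    case "gemini": {
      // Gemini: plain objects pass through natively; other values are wrapped.
      // Treating arrays like scalars is an assumption of this sketch.
      const isPlainObject = typeof data === "object" && data !== null && !Array.isArray(data);
      return isPlainObject ? data : { result: data };
    }
  }
}
```

Seeing the three outputs side by side makes it easier to decide when an explicit llmOverride is worth the extra code, for instance when the stringified JSON would be much longer than a one-line summary.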
ToolDefinition
Describes a tool that the model may invoke.
interface ToolDefinition {
name: string; // Unique tool name
description: string; // Human-readable description
parameters: object; // JSON Schema for the tool's parameters
}
const tools: ToolDefinition[] = [
{
name: "getWeather",
description: "Get the current weather for a city",
parameters: {
type: "object",
properties: { city: { type: "string" } },
required: ["city"]
}
}
];
Content Subsystem
Pluggable storage and handle plumbing for multimodal payloads (images, audio, video, documents, 3D models, CAD files). A ContentStore issues opaque handles; tools accept handle-id strings as arguments and Blazen substitutes the resolved typed content before the handler runs.
import {
ContentStore,
imageInput,
audioInput,
videoInput,
fileInput,
threeDInput,
cadInput,
} from "blazen";
import type {
ContentHandle,
ContentKind,
JsContentMetadata,
PutOptions,
} from "blazen";
ContentKind
String enum tagging what a handle refers to. Values match the serde-tag form so they round-trip across any Blazen API that takes a kind string.
const enum ContentKind {
Image = "image",
Audio = "audio",
Video = "video",
Document = "document",
ThreeDModel = "three_d_model",
Cad = "cad",
Archive = "archive",
Font = "font",
Code = "code",
Data = "data",
Other = "other",
}
ContentHandle
Opaque reference returned by ContentStore.put() and consumed by every other store method. Treat id as a black box — store-defined.
interface ContentHandle {
id: string; // Opaque, store-defined identifier
kind: ContentKind; // What kind of content this handle refers to
mimeType?: string; // MIME type if known
byteSize?: number; // Byte size if known (i64 -- napi has no u64)
displayName?: string; // Human-readable display name (e.g. original filename)
}
ContentHandle is a type alias for the underlying JsContentHandle interface.
ContentStore
Pluggable registry for multimodal content. Construct via the static factories; instances are cheap to clone (internally an Arc), so reusing one store across multiple agents and requests is fine.
class ContentStore {
// Factories
static inMemory(): ContentStore;
static localFile(root: string): ContentStore;
static openaiFiles(apiKey: string, baseUrl?: string | null): ContentStore;
static anthropicFiles(apiKey: string, baseUrl?: string | null): ContentStore;
static geminiFiles(apiKey: string, baseUrl?: string | null): ContentStore;
static falStorage(apiKey: string, baseUrl?: string | null): ContentStore;
static custom(options: CustomContentStoreOptions): ContentStore;
// Operations
put(body: Buffer | string, options: PutOptions): Promise<ContentHandle>;
resolve(handle: ContentHandle): Promise<any>; // serialized MediaSource
fetchBytes(handle: ContentHandle): Promise<Buffer>;
metadata(handle: ContentHandle): Promise<JsContentMetadata>;
delete(handle: ContentHandle): Promise<void>;
}
put() accepts either a Buffer (inline bytes uploaded to the store) or a string — interpreted as a URL when it contains "://" (the store records the reference) and as a local filesystem path otherwise (the store reads or copies the file as needed).
resolve() returns a serialized MediaSource — the same JSON shape Blazen’s request builders accept. fetchBytes() is for tools that need to operate on the raw payload (parse a PDF, transcribe audio); most tools reason over the handle and let resolve() produce the wire form. delete() is best-effort — default implementations on most stores are no-ops.
const store = ContentStore.inMemory();
const handle = await store.put(Buffer.from(pngBytes), {
mimeType: "image/png",
kind: ContentKind.Image,
displayName: "diagram.png",
});
const meta = await store.metadata(handle);
const bytes = await store.fetchBytes(handle);
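The rule put() applies to its body argument is simple enough to state as code. A minimal sketch, assuming the result maps onto the tagged body shapes listed under ContentStore.custom; classifyPutBody is a hypothetical helper, and Uint8Array stands in for Buffer (which is a Uint8Array subclass in Node) so the snippet needs no Node typings.

```typescript
// Three of the tagged body shapes a custom store callback can receive.
type PutBody =
  | { type: "bytes"; data: number[] }
  | { type: "url"; url: string }
  | { type: "local_path"; path: string };

// Hypothetical helper applying the documented rule: bytes are inline, a string
// containing "://" is a URL, and any other string is a local filesystem path.
function classifyPutBody(body: Uint8Array | string): PutBody {
  if (typeof body !== "string") {
    return { type: "bytes", data: Array.from(body) };
  }
  return body.includes("://")
    ? { type: "url", url: body }
    : { type: "local_path", path: body };
}
```

One consequence worth noticing: a string with no scheme, such as "example.com/photo.jpg", is read as a local path, so URLs should always be passed with their scheme.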
Subclassing ContentStore
ContentStore is subclassable from JavaScript / TypeScript. Override the methods your backend needs; napi-rs wraps your subclass in a Rust adapter that dispatches into your JS async functions via ThreadsafeFunction.
import { ContentStore } from "blazen";
import type { ContentHandle, ContentKind } from "blazen";
class S3ContentStore extends ContentStore {
bucket: string; // declared so the assignment below type-checks in TypeScript
constructor(bucket: string) {
super();
this.bucket = bucket;
}
async put(body, hint) {
// ...
return { id: "...", kind: "image" };
}
async resolve(handle) {
return { sourceType: "url", url: "..." };
}
async fetchBytes(handle) {
return Buffer.from("...");
}
// Optional:
async fetchStream(handle) { return Buffer.from("..."); }
async delete(handle) { /* no-op */ }
}
Subclasses MUST override put, resolve, and fetchBytes. The base-class default implementations throw an error, so a missing override fails loudly rather than recursing silently via super().
ContentStore.custom({...})
Callback-based factory. Direct JS mirror of Rust CustomContentStore::builder.
ContentStore.custom(options: {
put: (body: any, hint: any) => Promise<ContentHandle>;
resolve: (handle: ContentHandle) => Promise<any>; // serialized MediaSource
fetchBytes: (handle: ContentHandle) => Promise<Buffer>;
fetchStream?: (handle: ContentHandle) => Promise<Buffer>; // single-chunk for now
delete?: (handle: ContentHandle) => Promise<void>;
name?: string;
}): ContentStore
put, resolve, and fetchBytes are required; fetchStream and delete are optional. The body argument arrives as a JS object in one of these shapes:
- {type: "bytes", data: [...]}
- {type: "url", url}
- {type: "local_path", path}
- {type: "provider_file", provider, id}
- {type: "stream", stream: AsyncIterable<Uint8Array>, sizeHint: number | null}
The hint has optional mimeType / kindHint / displayName / byteSize.
resolve returns a serialized MediaSource JS object (e.g. {sourceType: "url", url: "..."}). fetchBytes returns a Buffer. fetchStream may return either Buffer / Uint8Array / number[] / base64 string (legacy, single-chunk) or an AsyncIterable<Uint8Array> for true chunk-by-chunk streaming — a Node Readable qualifies since it implements [Symbol.asyncIterator].
PutOptions
Optional hints attached to a put() call. Every field is optional; the store may auto-detect from the bytes when a hint is missing.
interface PutOptions {
mimeType?: string; // MIME type, if known
kind?: ContentKind; // Caller's preferred classification -- overrides any auto-detection
displayName?: string; // Human-readable display name (filename, caption)
byteSize?: number; // Byte size, if known up-front (i64 since napi has no u64)
}
ContentMetadata
Cheap metadata summary returned by ContentStore.metadata(). No bytes are materialized.
interface ContentMetadata {
kind: ContentKind;
mimeType?: string;
byteSize?: number;
displayName?: string;
}
ContentMetadata is a type alias for the underlying JsContentMetadata interface.
Built-in stores
| Factory | Purpose |
|---|---|
| ContentStore.inMemory() | Ephemeral in-memory store. Default for tests and short-lived sessions. |
| ContentStore.localFile(root) | Filesystem-backed store rooted at root. Directory is created if missing. |
| ContentStore.openaiFiles(apiKey, baseUrl?) | Backed by the OpenAI Files API. |
| ContentStore.anthropicFiles(apiKey, baseUrl?) | Backed by the Anthropic Files API. |
| ContentStore.geminiFiles(apiKey, baseUrl?) | Backed by the Gemini Files API. |
| ContentStore.falStorage(apiKey, baseUrl?) | Backed by fal.ai’s storage API. |
| ContentStore.custom({...}) | User-defined backend via async callbacks (see above). |
Tool-input schema helpers
Each helper builds a JSON Schema fragment declaring a single required handle-id input. The model emits a handle-id string; Blazen swaps it for the resolved typed content before your tool runs.
| Helper | Declares an input expecting a handle of kind |
|---|---|
| imageInput(name, description) | image |
| audioInput(name, description) | audio |
| videoInput(name, description) | video |
| fileInput(name, description) | document |
| threeDInput(name, description) | three_d_model |
| cadInput(name, description) | cad |
Each call returns the same shape (kind tag varies):
imageInput("photo", "The image to describe");
// =>
// {
// type: "object",
// properties: {
// photo: {
// type: "string",
// description: "The image to describe",
// "x-blazen-content-ref": { kind: "image" }
// }
// },
// required: ["photo"]
// }
The x-blazen-content-ref extension is invisible to providers — they see a plain string parameter — but Blazen’s resolver uses it as a marker for handle substitution.
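Since all six helpers differ only in the kind tag, their output can be reproduced by one parameterised function. A minimal sketch that rebuilds the documented shape by hand; contentRefInput is a hypothetical name (the real exports are imageInput, audioInput, and so on), and the tool name used below is made up for the example.

```typescript
type RefKind = "image" | "audio" | "video" | "document" | "three_d_model" | "cad";

// Hypothetical re-implementation of the documented output shape.
function contentRefInput(name: string, description: string, kind: RefKind) {
  return {
    type: "object",
    properties: {
      [name]: {
        type: "string",
        description,
        // Marker the resolver looks for; providers ignore it.
        "x-blazen-content-ref": { kind },
      },
    },
    required: [name],
  };
}

// The result is shaped to be used as a tool's `parameters` schema.
const describeTool = {
  name: "describeImage", // hypothetical tool name
  description: "Describe an image",
  parameters: contentRefInput("photo", "The image to describe", "image"),
};
```

Writing the schema out this way is mainly useful for tools that mix a handle input with ordinary parameters, where the single-input helpers would need to be merged by hand.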
How resolution works
When the model emits a tool call like {"photo": "blazen_xxx"}, Blazen scans the tool’s parameter schema for x-blazen-content-ref markers. For each marked field, it looks up the handle id in the active ContentStore and replaces the bare string with a typed object before invoking your handler:
{
kind: "image",
handleId: "blazen_xxx",
mimeType: "image/png",
byteSize: 24576,
displayName: "diagram.png",
source: { /* resolved MediaSource -- url, base64, or provider-native ref */ }
}
source is the same wire form ContentStore.resolve() returns, so your handler can forward it straight into a downstream provider request, or call store.fetchBytes(handle) if it needs the raw payload. If the handle is unknown to the store the call fails before the handler runs.
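The substitution step described above can be sketched in a few lines. This is a simplified model, assuming a flat parameter schema and using a Map as a stand-in for the active ContentStore; resolveHandles, StoredContent, and ParamSchema are hypothetical names, and whether the real resolver also checks the stored kind against the marker is not covered here.

```typescript
interface StoredContent {
  kind: string;
  mimeType?: string;
  byteSize?: number;
  displayName?: string;
  source: object; // resolved MediaSource
}
interface ParamSchema {
  properties: Record<string, { type: string; "x-blazen-content-ref"?: { kind: string } }>;
}

// Hypothetical resolver: replaces each marked handle-id string with a typed object.
function resolveHandles(
  schema: ParamSchema,
  args: Record<string, unknown>,
  store: Map<string, StoredContent>, // stand-in for the active ContentStore
): Record<string, unknown> {
  const resolved: Record<string, unknown> = { ...args };
  for (const name of Object.keys(schema.properties)) {
    const marker = schema.properties[name]?.["x-blazen-content-ref"];
    const handleId = args[name];
    if (!marker || typeof handleId !== "string") continue; // unmarked field: leave as-is
    const content = store.get(handleId);
    // Unknown handle: fail before any handler would run.
    if (!content) throw new Error(`unknown content handle: ${handleId}`);
    resolved[name] = { handleId, ...content };
  }
  return resolved;
}
```

Unmarked string arguments pass through untouched, which is why an ordinary parameter that happens to look like a handle id is never substituted.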
TokenUsage
Token usage statistics for a completion request.
| Property | Type | Description |
|---|---|---|
| .promptTokens | number | Tokens in the prompt |
| .completionTokens | number | Tokens in the completion |
| .totalTokens | number | Total tokens used |
RequestTiming
Timing metadata for a completion request.
| Property | Type | Description |
|---|---|---|
| .queueMs | number \| undefined | Time spent waiting in queue (ms) |
| .executionMs | number \| undefined | Time spent executing (ms) |
| .totalMs | number \| undefined | Total wall-clock time (ms) |
runAgent
Run an agentic tool execution loop. The agent repeatedly calls the model, executes tool calls via the handler callback, feeds results back, and repeats until the model stops calling tools or maxIterations is reached.
const result = await runAgent(model, messages, tools, toolHandler, options?);
Parameters
| Parameter | Type | Description |
|---|---|---|
| model | Model | The completion model to use |
| messages | ChatMessage[] | Initial conversation messages |
| tools | ToolDef[] | Tool definitions the agent can invoke |
| toolHandler | (toolName: string, args: object) => Promise<any \| ToolOutput> | Callback that executes tool calls. May return a bare value (auto-wrapped) or an explicit ToolOutput. |
| options | AgentRunOptions? | Optional configuration |
Example
import { Model, ChatMessage, runAgent } from "blazen";
const model = Model.openai();
const result = await runAgent(
model,
[ChatMessage.user("What is the weather in NYC?")],
[{
name: "getWeather",
description: "Get weather for a city",
parameters: {
type: "object",
properties: { city: { type: "string" } },
required: ["city"]
}
}],
async (toolName, args) => {
if (toolName === "getWeather") {
return { temp: 72, condition: "sunny" };
}
throw new Error(`Unknown tool: ${toolName}`);
},
{ maxIterations: 5 }
);
console.log(result.response); // final assistant text (AgentResult.response is a string)
console.log(`Took ${result.iterations} iterations`);
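The control flow behind runAgent (call the model, run any requested tools, feed results back, repeat) can be modelled with stubs. A deliberately simplified, synchronous sketch: the real API is asynchronous, agentLoop is a hypothetical name, the message shape is reduced to role and content, and the exact behaviour when maxIterations is hit is an assumption.

```typescript
interface ToolCall {
  id: string;
  name: string;
  arguments: object;
}
interface Turn {
  content?: string;
  toolCalls: ToolCall[];
}
type Message = { role: string; content: string };

// Hypothetical, synchronous stand-in for the loop. `callModel` and `handler`
// are stubs supplied by the caller; nothing here talks to a provider.
function agentLoop(
  callModel: (history: Message[]) => Turn,
  handler: (toolName: string, args: object) => unknown,
  messages: Message[],
  maxIterations = 10,
): { response: string; iterations: number; messages: Message[] } {
  const history = [...messages];
  let iterations = 0;
  while (true) {
    const turn = callModel(history);
    history.push({ role: "assistant", content: turn.content ?? "" });
    // Stop when the model asks for no tools or the iteration budget is spent.
    if (turn.toolCalls.length === 0 || iterations >= maxIterations) {
      return { response: turn.content ?? "", iterations, messages: history };
    }
    for (const call of turn.toolCalls) {
      const result = handler(call.name, call.arguments);
      history.push({ role: "tool", content: JSON.stringify(result) });
    }
    iterations++; // one completed round of tool calls
  }
}
```

In this model a single weather lookup yields four messages (user, assistant tool request, tool result, final assistant reply) and an iteration count of 1.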
Tool handler return shapes
The handler’s return value is checked for an explicit data key. If present and the value deserializes as a ToolOutput, it’s used directly. Otherwise the bare value is wrapped via ToolOutput { data: <value>, llmOverride: undefined }.
This means an arbitrary user object like { items: [1, 2, 3] } is treated as plain data, not as a ToolOutput. Only objects with a top-level data field are unpacked.
import { runAgent, type ToolDef } from "blazen";
// Simplest: return a value directly, auto-wrapped.
const search: ToolDef = {
name: "search",
description: "Search for items.",
parameters: { type: "object", properties: { q: { type: "string" } } },
};
async function handlerSimple(toolName: string, args: any) {
if (toolName === "search") {
return { items: [1, 2, 3] }; // wrapped as ToolOutput { data: { items: [1,2,3] } }
}
throw new Error(`Unknown tool: ${toolName}`);
}
// With override: structured ToolOutput so the LLM sees a summary,
// but the caller's `messages[messages.length-1].toolResult.data`
// still has the full list.
async function handlerWithOverride(toolName: string, args: any) {
if (toolName === "search") {
return {
data: { items: [1, 2, 3], rawResponse: "..." },
llmOverride: { kind: "text", text: "Found 3 items." },
};
}
throw new Error(`Unknown tool: ${toolName}`);
}
To return a string to the caller (so it lives on ChatMessage.content and toolResult is null), simply return a string from the handler:
async function handlerString(toolName: string, args: any) {
return "ok"; // appears as ChatMessage.content; toolResult is null
}
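The three return shapes above reduce to one decision procedure. A minimal sketch of it, assuming a check for a top-level data key is enough (the full "deserializes as a ToolOutput" validation is omitted); toToolResult is a hypothetical helper and the two interfaces are simplified local copies.

```typescript
interface LlmPayload {
  kind: string;
  [field: string]: unknown;
}
interface ToolOutput {
  data: unknown;
  llmOverride?: LlmPayload;
}

// Hypothetical helper: what ends up on ChatMessage.toolResult for a handler's
// return value. null means the value was a string and lives on .content instead.
function toToolResult(value: unknown): ToolOutput | null {
  if (typeof value === "string") return null;
  if (typeof value === "object" && value !== null && "data" in value) {
    const explicit = value as { data: unknown; llmOverride?: LlmPayload; llm_override?: LlmPayload };
    // Both casings of the override key are accepted on input.
    const override = explicit.llmOverride ?? explicit.llm_override;
    return override ? { data: explicit.data, llmOverride: override } : { data: explicit.data };
  }
  return { data: value }; // bare value: auto-wrapped
}
```

The edge case to keep in mind is a domain object that happens to have its own data field: under this rule it is unpacked as a ToolOutput, so wrapping it explicitly as { data: yourObject } avoids the ambiguity.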
See ToolOutput and LlmPayload for the full shape and per-provider wire behaviour.
ToolDef
Describes a tool that the agent may invoke.
interface ToolDef {
name: string; // Unique tool name
description: string; // Human-readable description
parameters: object; // JSON Schema for the tool's parameters
}
AgentRunOptions
Options for configuring an agent run.
interface AgentRunOptions {
maxIterations?: number; // Max tool-calling iterations (default: 10)
systemPrompt?: string; // System prompt prepended to the conversation
temperature?: number; // Sampling temperature (0.0 - 2.0)
maxTokens?: number; // Maximum tokens per completion call
addFinishTool?: boolean; // Add a built-in "finish" tool the model can call to signal completion
}
AgentResult
The result of an agent run, returned by runAgent(). AgentResult is a typed class with getter properties (not a plain object) so it carries identity across the FFI boundary and supports instanceof checks.
Properties
| Property | Type | Description |
|---|---|---|
| .response | string | Final assistant text from the last completion. |
| .messages | ChatMessage[] | Full message history (all tool calls and results). |
| .iterations | number | Number of tool-calling iterations that occurred. |
| .totalCost | number \| null | Aggregated cost in USD across all iterations, or null if pricing is unknown. |
| .toString() | string | Human-readable summary, mirrors the Python __repr__. |
const result = await runAgent(model, [ChatMessage.user("Hi")], tools, toolHandler);
console.log(result.response); // "Hello!"
console.log(result.messages.length); // e.g. 3
console.log(result.iterations); // e.g. 1
console.log(result.totalCost); // e.g. 0.00012 or null
BatchResult
Returned by completeBatch() / completeBatchConfig(). A typed class wrapping per-request outcomes plus aggregates.
Properties
| Property | Type | Description |
|---|---|---|
| .responses | (ModelResponse \| null)[] | One entry per input request. null for failed requests. |
| .errors | (string \| null)[] | Per-request error message, or null for successful requests. |
| .totalUsage | TokenUsage \| null | Aggregated token usage across all successful responses. |
| .totalCost | number \| null | Aggregated cost in USD across all successful responses. |
| .successCount | number | Number of requests that succeeded. |
| .failureCount | number | Number of requests that failed. |
| .length | number | Total number of requests in the batch (= responses.length = errors.length). |
| .toString() | string | Human-readable batch summary. |
const batch = await completeBatch(model, [
[ChatMessage.user("What's 2+2?")],
[ChatMessage.user("Capital of France?")],
]);
console.log(`${batch.successCount}/${batch.length} succeeded`);
for (let i = 0; i < batch.length; i++) {
if (batch.errors[i]) console.error(`req ${i}:`, batch.errors[i]);
else console.log(`req ${i}:`, batch.responses[i]?.content);
}
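When a batch partially fails, it is often convenient to keep the original request index next to each outcome so failed requests can be retried. The sketch below is a hypothetical helper, not part of blazen: `BatchLike`, `BatchSplit`, and `splitBatch` are illustrative names, and the structural type only mirrors the `responses`, `errors`, and `length` fields documented above.

```typescript
// Minimal structural type covering only the BatchResult fields used here.
interface BatchLike<R> {
  responses: (R | null)[];
  errors: (string | null)[];
  length: number;
}

interface BatchSplit<R> {
  succeeded: { index: number; response: R }[];
  failed: { index: number; error: string }[];
}

// Hypothetical helper: split a batch into successes and failures while
// preserving the index of the originating request.
function splitBatch<R>(batch: BatchLike<R>): BatchSplit<R> {
  const split: BatchSplit<R> = { succeeded: [], failed: [] };
  for (let i = 0; i < batch.length; i++) {
    const response = batch.responses[i];
    if (response !== null && response !== undefined) {
      split.succeeded.push({ index: i, response });
    } else {
      // Fall back to a generic message if no error string was recorded.
      split.failed.push({ index: i, error: batch.errors[i] ?? "unknown error" });
    }
  }
  return split;
}
```

The `failed` entries carry the indices needed to rebuild a smaller batch for a retry.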
Always-on agents (Bot)
A Bot is a persistent, event-driven agent: configure it once via BotBuilder, build() it into a running loop, drive it with send(), consume replies via responses(), and tear it down with shutdown(). Unlike runAgent — which runs a single bounded tool-calling loop and returns — a Bot stays alive across many turns, holding a conversation-memory window between messages.
Bot.builder(model: Model): BotBuilder
Static factory that starts a BotBuilder for the given chat Model handle. Equivalent to new BotBuilder(model).
import { Model, Bot } from "blazen";
const model = Model.openai({ apiKey: "sk-..." });
const bot = await Bot.builder(model)
.systemPrompt("You are concise.")
.idleTimeout(300)
.costBudgetUsd(1.0)
.build();
bot.send(text: string): void
Send a message into the bot, driving one agentic turn. Non-blocking: the turn runs on the bot’s event loop; its reply arrives on the responses stream. Throws if the bot’s loop has already exited (after shutdown(), an idle timeout, or a cost-budget breach).
bot.responses(onResponse: (response: BotResponse) => void): void
Stream the bot’s replies to onResponse until the bot shuts down. Returns immediately; the pump runs on the shared runtime, invoking the callback with a BotResponse per reply. Each call subscribes from the current point in time (replies emitted before the call are not replayed), so subscribe before sending the first message to avoid missing its reply.
bot.responses((response) => {
console.log("bot:", response.text);
});
bot.send("Hello!");
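Because replies arrive on a callback, request/response style code may want a promise per reply. The following is a minimal sketch under two assumptions that the reference does not state: each `send()` produces exactly one reply, and replies arrive in send order. `BotLike` and `ReplyQueue` are hypothetical names; `BotLike` only mirrors the `send()` and `responses()` signatures above.

```typescript
// Structural stand-in for the documented Bot methods used here.
interface BotLike {
  send(text: string): void;
  responses(onResponse: (response: { text: string }) => void): void;
}

// Hypothetical wrapper: subscribe once, up front, and hand replies out as promises.
class ReplyQueue {
  private bot: BotLike;
  private buffered: string[] = [];
  private waiting: ((text: string) => void)[] = [];

  constructor(bot: BotLike) {
    this.bot = bot;
    // Subscribe before any send() so the first reply is not missed.
    bot.responses((response) => {
      const resolve = this.waiting.shift();
      if (resolve) resolve(response.text);
      else this.buffered.push(response.text);
    });
  }

  // Resolve with the next reply, using a buffered one if it already arrived.
  next(): Promise<string> {
    const ready = this.buffered.shift();
    if (ready !== undefined) return Promise.resolve(ready);
    return new Promise<string>((resolve) => {
      this.waiting.push(resolve);
    });
  }

  // Register interest in the next reply, then send the message.
  ask(text: string): Promise<string> {
    const reply = this.next();
    this.bot.send(text);
    return reply;
  }
}
```

With a real bot this would be used as `const reply = await new ReplyQueue(bot).ask("Hello!")`; the pending promise never settles if the bot shuts down first, so production code would likely add a timeout.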
bot.shutdown(): void
Shut the bot down: terminate the live event loop. After this, send() will throw.
await bot.snapshot(): string
Capture a serializable snapshot of the bot’s current state (including its persisted conversation memory) as a JSON string, without stopping the loop.
import fs from "node:fs";
const json = await bot.snapshot();
await fs.promises.writeFile("bot-state.json", json);
BotBuilder
Fluent builder for a Bot. Obtain one with new BotBuilder(model) or Bot.builder(model). All configuration methods return this so they can be chained; the async build() consumes the builder.
import { Model, BotBuilder, type ToolDef } from "blazen";
const tools: ToolDef[] = [{
name: "getWeather",
description: "Get weather for a city",
parameters: {
type: "object",
properties: { city: { type: "string" } },
required: ["city"],
},
}];
const bot = await new BotBuilder(Model.openai())
.systemPrompt("You are a helpful weather assistant.")
.tools(tools, async (toolName, args) => {
if (toolName === "getWeather") return { temp: 72, condition: "sunny" };
throw new Error(`Unknown tool: ${toolName}`);
})
.maxIterations(8)
.historyTokens(4096)
.idleTimeout(300)
.costBudgetUsd(1.0)
.injectTime(true)
.summarize(false)
.build();
Methods
| Method | Signature | Description |
|---|---|---|
systemPrompt | systemPrompt(prompt: string): this | Set the system prompt prepended to every agentic turn. |
tools | tools(tools: ToolDef[], handler: (toolName: string, args: any) => any | Promise<any>): this | Register the tools the bot’s agent may call, dispatched through a single JS handler. The names in each ToolDef must match the names the handler dispatches on. |
maxIterations | maxIterations(n: number): this | Cap the number of agentic tool-call rounds per turn (default 10). |
historyTokens | historyTokens(n: number): this | Cap the conversation-memory window at this many estimated tokens (default 8192). |
idleTimeout | idleTimeout(secs: number): this | Shut the bot down after this many seconds elapse with no message processed. Omit to disable the idle timeout. |
costBudgetUsd | costBudgetUsd(ceiling: number): this | Abort the bot once accumulated LLM cost exceeds this USD ceiling. Omit to disable the budget guard. |
injectTime | injectTime(enabled: boolean): this | When true (the default), prepend the current UTC time to the system prompt each turn and add the current-time tool. |
summarize | summarize(enabled: boolean): this | When true, compact conversation history via LLM summarization before each turn (default false). |
build | await build(): Bot | Build the bot: wire the live workflow and start its event loop. Consumes the builder. |
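Since tools() takes a single handler for all tools, one common way to keep it manageable is a name-to-function registry. This is a sketch only: `ToolFn` and `makeToolHandler` are hypothetical names, not blazen exports, and the returned function simply matches the `(toolName, args)` handler shape listed in the table.

```typescript
type ToolFn = (args: any) => unknown;

// Hypothetical helper: build the single dispatch handler from a registry
// keyed by the same names used in the ToolDef list.
function makeToolHandler(registry: Record<string, ToolFn>) {
  return (toolName: string, args: any): unknown => {
    // Own-property check so names like "toString" are not resolved by accident.
    const known = Object.prototype.hasOwnProperty.call(registry, toolName);
    if (!known) throw new Error(`Unknown tool: ${toolName}`);
    return registry[toolName](args);
  };
}

const toolHandler = makeToolHandler({
  getWeather: ({ city }) => ({ city, temp: 72, condition: "sunny" }),
});
```

The result can be passed as the second argument to `.tools(tools, toolHandler)`; registry functions may also return promises, which the handler passes through unchanged.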
BotResponse
The bot’s reply for one turn, delivered to the responses() callback.
interface BotResponse {
text: string; // The assistant's reply text for this turn
}
Prompt caching
Many providers can cache a stable prompt prefix (a long system prompt, a shared document, a tool schema) so repeated requests skip re-processing it — cutting latency and input-token cost. Blazen exposes two surfaces: a per-request CachePolicy, and a CacheManager for providers that expose explicit, addressable managed caches.
Caching is on by default: when you omit the cache option, Blazen uses CachePolicy.auto(), which lets the provider apply caching wherever it supports it. Providers that don’t support a given mode silently fall back to auto.
CachePolicy
How a provider should apply prompt/context caching for a request. napi-rs cannot express a data-carrying enum directly, so the policy is a class with static factory methods plus discriminator getters.
| Factory | Description |
|---|---|
CachePolicy.auto() | Let the provider choose the best breakpoints automatically (the default; caching on where supported). |
CachePolicy.off() | Disable prompt caching for the request where the provider allows. |
CachePolicy.explicit(breakpoints: number) | Request the given number of explicit cache breakpoints (the breakpoints argument) at the end of the stable prefix. Providers cap this (e.g. Anthropic allows 4); providers without explicit-breakpoint support treat it as auto. |
CachePolicy.handle(id: string) | Reference a provider-managed cache by id (e.g. a Gemini cachedContents/{id} resource created via CacheManager.create). Providers without managed caches treat it as auto. |
Getters: .kind ("auto", "off", "explicit", or "handle"), .breakpoints (number | null, set when kind === "explicit"), .id (string | null, set when kind === "handle").
Pass a policy via the cache field on the request options. The field accepts either a CachePolicy instance or a plain tagged object such as { kind: "explicit", breakpoints: 2 } / { kind: "handle", id: "cachedContents/abc" }:
import { Model, ChatMessage, CachePolicy } from "blazen";
const model = Model.anthropic();
// Place 2 explicit cache breakpoints at the end of the stable prefix.
const response = await model.completeWithOptions(
[
ChatMessage.system(longReusableSystemPrompt),
ChatMessage.user("First question"),
],
{ cache: CachePolicy.explicit(2) },
);
// Disable caching for a one-off request.
await model.completeWithOptions(
[ChatMessage.user("ephemeral")],
{ cache: CachePolicy.off() },
);
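The plain tagged-object form of a policy can be modelled as a discriminated union, which also makes it easy to clamp explicit breakpoints to a provider cap before sending a request. The sketch below is illustrative: `CachePolicyLike` and `clampPolicy` are hypothetical names, and clamping to a minimum of 1 is an assumption of this example rather than documented behavior.

```typescript
// Plain tagged-object form of a cache policy, mirroring the documented kinds.
type CachePolicyLike =
  | { kind: "auto" }
  | { kind: "off" }
  | { kind: "explicit"; breakpoints: number }
  | { kind: "handle"; id: string };

// Hypothetical helper: keep explicit breakpoints within [1, maxBreakpoints].
// Non-explicit policies are returned unchanged.
function clampPolicy(policy: CachePolicyLike, maxBreakpoints: number): CachePolicyLike {
  if (policy.kind !== "explicit") return policy;
  const requested = Math.floor(policy.breakpoints);
  const breakpoints = Math.max(1, Math.min(requested, maxBreakpoints));
  return { kind: "explicit", breakpoints };
}
```

For example, `clampPolicy({ kind: "explicit", breakpoints: 9 }, 4)` yields a policy with 4 breakpoints, matching the Anthropic cap mentioned in the table.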
Reading cache token usage
When a provider reports cache activity, it shows up on the response’s TokenUsage. Two extra getters expose it:
| Getter | Type | Description |
|---|---|---|
.cachedInputTokens | number | Tokens served from a prompt cache (a cache hit), if the provider reports them. |
.cacheCreationTokens | number | Tokens written to a prompt cache (a cache write), if the provider reports them. |
const response = await model.complete([ChatMessage.user("Hi")]);
const usage = response.usage;
if (usage) {
console.log("cache hit tokens:", usage.cachedInputTokens);
console.log("cache write tokens:", usage.cacheCreationTokens);
}
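To see whether caching pays off over a session, the two getters can be summed across responses. This is a small sketch: `CacheUsageLike` and `summarizeCacheUsage` are hypothetical names, and the fields are treated as optional here on the assumption that a provider may not report them.

```typescript
// Only the two cache-related TokenUsage getters are modelled here.
interface CacheUsageLike {
  cachedInputTokens?: number | null;
  cacheCreationTokens?: number | null;
}

// Hypothetical helper: total cache hits and cache writes over many responses.
// Missing usage objects and unreported fields count as zero.
function summarizeCacheUsage(usages: (CacheUsageLike | null | undefined)[]) {
  let hits = 0;
  let writes = 0;
  for (const usage of usages) {
    if (!usage) continue;
    hits += usage.cachedInputTokens ?? 0;
    writes += usage.cacheCreationTokens ?? 0;
  }
  return { hits, writes };
}
```

A session where `writes` keeps growing while `hits` stays near zero suggests the prompt prefix is not stable between requests.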
CacheManager
Lifecycle manager for provider-managed (explicit) prompt caches — caches that live on the provider as addressable resources (e.g. Gemini cachedContents/{id}) rather than being inlined per request. Construct it from any Model handle. Call supports() first: providers that cache only automatically (or not at all) return false, surface an “unsupported” error from create() / get() / delete(), and return an empty list from list().
import { Model, CacheManager, CachePolicy, ChatMessage } from "blazen";
const model = Model.gemini({ apiKey: "..." });
const caches = new CacheManager(model);
if (await caches.supports()) {
const handle = await caches.create(
{ model: "gemini-2.5-flash", ttlSeconds: 3600 },
[ChatMessage.user("a very large reusable context")],
);
// Reference it on later completion requests via CachePolicy.handle.
const response = await model.completeWithOptions(
[ChatMessage.user("Question about the cached context")],
{ cache: CachePolicy.handle(handle.id) },
);
await caches.delete(handle.id);
}
| Method | Signature | Description |
|---|---|---|
supports | supports(): boolean | Whether the wrapped provider supports explicit managed caches (create/get/delete handles). |
create | await create(request: CacheCreateRequest, messages: ChatMessage[]): CacheHandle | Create a provider-managed cache from the given context. Returns the CacheHandle whose id can be referenced via CachePolicy.handle. Errors if the provider does not support explicit caches. |
get | await get(id: string): CacheHandle | null | Fetch a managed cache by id. Returns null when no cache with that id exists; errors if the provider does not support explicit caches. |
delete | await delete(id: string): void | Delete a managed cache by id. Errors if the provider does not support explicit caches. |
list | await list(): CacheHandle[] | List the managed caches the provider currently holds. Returns an empty list (not an error) for providers without explicit-cache support. |
Per-provider note: explicit managed caches (CacheManager) are a Gemini-style feature; supports() returns false for providers that don’t expose them. Anthropic’s caching is breakpoint-based: use CachePolicy.explicit(n) (it allows up to 4 breakpoints). OpenAI-family providers cache the stable prefix automatically under CachePolicy.auto() with no explicit knobs. When a policy mode isn’t supported, the provider falls back to auto rather than erroring.
CacheCreateRequest
Request to create a provider-managed cache, passed to CacheManager.create().
interface CacheCreateRequest {
model: string; // Model to bind the cache to
system?: string; // Optional system instruction to cache
ttlSeconds?: number; // TTL in seconds (provider default when omitted)
displayName?: string; // Optional human-friendly display name
}
CacheHandle
A handle to a provider-managed cached context. Returned by CacheManager.create(), .get(), and .list(). Reference the id on later completion requests via CachePolicy.handle(id).
interface CacheHandle {
id: string; // Provider resource id/name (e.g. "cachedContents/abc123")
model: string; // Model the cache is bound to
tokenCount?: number; // Tokens stored in the cache, if reported by the provider
expireTime?: string; // RFC3339 expiry timestamp, if reported by the provider
}
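Because managed caches expire, callers may want to check a handle before reusing it. The sketch below assumes `expireTime` parses with `Date.parse` (RFC3339 timestamps do) and treats an unknown expiry as "not expiring soon"; `CacheHandleLike` and `expiresSoon` are hypothetical names for illustration.

```typescript
// Mirrors the CacheHandle fields documented above.
interface CacheHandleLike {
  id: string;
  model: string;
  tokenCount?: number;
  expireTime?: string;
}

// Hypothetical helper: true when the handle is known to expire within
// `marginMs` milliseconds of `now` (or has already expired).
function expiresSoon(handle: CacheHandleLike, marginMs: number, now: Date = new Date()): boolean {
  if (!handle.expireTime) return false; // expiry not reported by the provider
  const expiresAt = Date.parse(handle.expireTime);
  if (Number.isNaN(expiresAt)) return false; // unparseable timestamp
  return expiresAt - now.getTime() <= marginMs;
}
```

A caller could recreate the cache through CacheManager.create() when this returns true, rather than risk a request that references an expired id.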
Pipeline
A Pipeline is a sequence of named Stages built with PipelineBuilder. Each stage runs as its own workflow; on completion, an optional persist callback fires with a typed snapshot so the caller can durably store progress for later resumption.
new PipelineBuilder(name: string)
import { PipelineBuilder } from "blazen";
const pipeline = new PipelineBuilder("ingest")
.stage(stageA)
.stage(stageB)
.timeoutPerStage(120)
.build();
.onPersist(callback)
Register a TSFN-based persist callback that receives a typed PipelineSnapshot after each stage completes. The callback must return Promise<void> (or be async). A rejected promise aborts the pipeline with a PipelineError.
import { PipelineBuilder, PipelineSnapshot } from "blazen";
const pipeline = new PipelineBuilder("ingest")
.stage(stage)
.onPersist(async (snapshot: PipelineSnapshot) => {
await db.put(`pipeline:${snapshot.runId}`, snapshot.toJsonPretty());
})
.build();
.onPersistJson(callback)
Same as onPersist, but the callback receives the snapshot pre-serialized as a JSON string. Useful for backends that store opaque blobs (IndexedDB, Redis, S3).
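// `idb` and `runId` are assumed to be defined by the caller;
// the callback itself only receives the serialized snapshot.
const pipeline = new PipelineBuilder("ingest")
.stage(stage)
.onPersistJson(async (json: string) => {
await idb.put("snapshots", json, runId);
})
.build();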
The snapshot can later be replayed via pipeline.resume(PipelineSnapshot.fromJson(json)).
If your onPersist (or onPersistJson) callback throws or returns a rejected promise, the rejection is wrapped as a PersistError (a BlazenError subclass) and propagated to the running pipeline, aborting it. Catch the error from await handler.result() and inspect with instanceof PersistError to distinguish persistence failures from stage failures.
import { PersistError } from "blazen";
try {
await handler.result();
} catch (e) {
if (e instanceof PersistError) {
console.error("snapshot persistence failed:", e.message);
}
}
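Since a single rejected persist call aborts the whole pipeline, a callback that talks to a flaky store may benefit from retrying before it gives up. The following is a generic sketch, not a blazen feature: `withRetry` is a hypothetical wrapper, and the fixed delay between attempts is an arbitrary choice for illustration.

```typescript
// Hypothetical wrapper: retry a persist callback up to `attempts` times,
// waiting `delayMs` between tries, and rethrow the last error on failure.
function withRetry<T>(
  persist: (snapshot: T) => Promise<void>,
  attempts: number,
  delayMs: number,
): (snapshot: T) => Promise<void> {
  const maxAttempts = Math.max(1, attempts);
  return async (snapshot: T): Promise<void> => {
    let lastError: unknown;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        await persist(snapshot);
        return; // persisted successfully
      } catch (err) {
        lastError = err;
        if (attempt < maxAttempts) {
          await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
        }
      }
    }
    throw lastError; // surfaces to the pipeline as a persistence failure
  };
}
```

The wrapped function has the same shape as the original callback, so it can be passed to onPersist or onPersistJson, for example `.onPersistJson(withRetry(saveJson, 3, 250))` where `saveJson` is your own function.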
Workflow
new Workflow(name: string)
Create a new workflow instance. Default timeout is 300 seconds (5 minutes).
const wf = new Workflow("my-workflow");
.addStep(name: string, eventTypes: string[], handler: StepHandler)
Register a step that listens for one or more event types.
wf.addStep("process", ["MyEvent"], async (event, ctx) => {
return { type: "blazen::StopEvent", result: { done: true } };
});
.setTimeout(seconds: number)
Set the maximum execution time for the workflow in seconds. Pass 0 or a negative value to disable the timeout.
wf.setTimeout(30);
await wf.run(input: object): WorkflowResult
Run the workflow to completion with the given input.
const result = await wf.run({ prompt: "Hello" });
await wf.runStreaming(input: object, callback: (event) => void): WorkflowResult
Run the workflow with a streaming callback invoked for each event published via ctx.writeEventToStream().
const result = await wf.runStreaming({ prompt: "Hello" }, (event) => {
console.log("stream:", event);
});
await wf.runWithHandler(input: object): WorkflowHandler
Run the workflow and return a handler for pause/resume and streaming control.
const handler = await wf.runWithHandler({ prompt: "Hello" });
await wf.resume(snapshotJson: string): WorkflowHandler
Resume a previously paused workflow from a snapshot JSON string.
import fs from "node:fs";
const snapshot = fs.readFileSync("snapshot.json", "utf-8");
const handler = await wf.resume(snapshot);
const result = await handler.result();
WorkflowResult
interface WorkflowResult {
type: string; // Event type of the final result (e.g. "blazen::StopEvent")
data: object; // Result data extracted from the StopEvent's result field
}
StepHandler
async (event: object, ctx: Context) => object | object[] | null
A step handler receives an event and a context. It can return:
- A single event object to emit one event.
- An array of event objects to fan-out multiple events.
- null for side-effect-only steps that emit no events.
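The three return shapes can be reduced to a single list of events, which is a convenient way to reason about what a step emits. This is a conceptual sketch only; `WorkflowEvent`, `StepOutput`, and `toEventList` are hypothetical names and do not describe how Blazen normalizes results internally.

```typescript
type WorkflowEvent = { type: string; [key: string]: unknown };
type StepOutput = WorkflowEvent | WorkflowEvent[] | null;

// Hypothetical helper: view any step return value as a list of emitted events.
function toEventList(output: StepOutput): WorkflowEvent[] {
  if (output === null) return []; // side-effect-only step
  return Array.isArray(output) ? output : [output]; // fan-out or single event
}
```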
WorkflowHandler
Returned by Workflow.runWithHandler() and Workflow.resume(). Provides control over a running workflow.
Important: result() consumes the handler internally — you can only call it once. The other control methods (pause, resumeInPlace, abort, respondToInput, snapshot) borrow the handler and can be called multiple times.
await handler.result(): WorkflowResult
Await the final workflow result.
await handler.pause(): void
Signal the running workflow to pause. After pausing, use snapshot() to get a serializable snapshot, or resumeInPlace() to continue execution.
await handler.snapshot(): string
Get a serializable snapshot of the paused workflow as a JSON string. Save this to resume later with Workflow.resume().
await handler.resumeInPlace(): void
Resume a paused workflow in place without creating a new handler.
await handler.streamEvents(callback: (event) => void): void
Subscribe to intermediate events published via ctx.writeEventToStream(). Must be called before result() or pause().
const handler = await wf.runWithHandler({ prompt: "Hello" });
// Subscribe to stream events
await handler.streamEvents((event) => console.log(event));
// Then await the result
const result = await handler.result();
Events
Events are plain objects with a type field.
{ type: "MyEvent", payload: "data" }
Start Event
{ type: "blazen::StartEvent", ...input }
The workflow begins by emitting a StartEvent containing the input data.
Stop Event
{ type: "blazen::StopEvent", result: { ... } }
Returning a StopEvent from a step handler completes the workflow.
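The event flow described above (a StartEvent goes in, steps react to matching types, a StopEvent ends the run) can be pictured with a toy in-memory router. This is a simplified mental model with synchronous handlers and a plain queue, not Blazen's actual scheduler; `Ev`, `Step`, and `runSketch` are hypothetical names.

```typescript
type Ev = { type: string; [key: string]: unknown };
type Step = {
  name: string;
  eventTypes: string[];
  handler: (event: Ev) => Ev | Ev[] | null;
};

// Toy router: deliver each queued event to every step listening for its type,
// and finish when a StopEvent is dequeued.
function runSketch(steps: Step[], input: Record<string, unknown>, maxEvents = 100): unknown {
  const queue: Ev[] = [{ ...input, type: "blazen::StartEvent" }];
  let processed = 0;
  while (queue.length > 0) {
    if (++processed > maxEvents) throw new Error("event limit exceeded");
    const event = queue.shift()!;
    if (event.type === "blazen::StopEvent") return event.result;
    for (const step of steps) {
      if (!step.eventTypes.includes(event.type)) continue;
      const out = step.handler(event);
      if (out === null) continue; // side-effect-only step
      queue.push(...(Array.isArray(out) ? out : [out]));
    }
  }
  return undefined; // no step produced a StopEvent
}
```

Real step handlers are async and receive a context as a second argument; both are left out here to keep the routing idea visible.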
Context
Shared workflow context accessible by all steps. All methods are async.
StateValue
All context values conform to the StateValue type:
type StateValue = string | number | boolean | null | Buffer | StateValue[] | { [key: string]: StateValue };
await ctx.set(key: string, value: Exclude<StateValue, Buffer>): void
Store a JSON-serializable value in the workflow context. Accepts strings, numbers, booleans, null, arrays, and nested objects. For binary data, use ctx.setBytes() instead.
The legacy ctx.set / ctx.get shortcuts still work and route values through the same 4-tier dispatch. For new code, prefer the explicit ctx.state / ctx.session namespaces documented below.
await ctx.get(key: string): StateValue | null
Retrieve a value from the workflow context. Returns null if not found. Returns data for all StateValue variants — strings, numbers, booleans, arrays, objects, and Buffer (if the key was stored via setBytes). No data is silently dropped.
await ctx.setBytes(key: string, buffer: Buffer): void
Store raw binary data in the workflow context. Use this for explicit binary storage (e.g., MessagePack, protobuf, raw buffers). Binary data persists through pause/resume/checkpoint.
await ctx.getBytes(key: string): Buffer | null
Retrieve raw binary data from the workflow context. Returns null if not found. Note that ctx.get() also returns binary data now, so getBytes is mainly useful when you want to assert that a key holds binary content.
await ctx.runId(): string
Get the unique run UUID for the current workflow execution.
await ctx.sendEvent(event: object): void
Manually route an event into the workflow event bus. The event will be delivered to any step whose eventTypes list includes its type.
await ctx.writeEventToStream(event: object): void
Publish an event to the external broadcast stream. Consumers that subscribed via runStreaming or handler.streamEvents() will receive this event. Unlike sendEvent, this does not route the event through the internal step registry.
get state(): StateNamespace
Persistable workflow state. Survives pause() / resume(), checkpoints, and durable storage. See StateNamespace below.
get session(): SessionNamespace
In-process-only values, excluded from snapshots. Use this for things that should not survive pause() / resume(). JS object identity is NOT preserved on Node — see the SessionNamespace caveat below.
StateNamespace
Namespace for persistable workflow state. Values stored via state.set / state.setBytes go into the underlying ContextInner.state map and survive snapshots, pause() / resume(), and checkpoint stores.
await state.set(key: string, value: Exclude<StateValue, Buffer>): Promise<void>
Store a JSON-serializable value under the given key.
await state.get(key: string): Promise<StateValue | null>
Retrieve a value previously stored under the given key. Returns null if not found.
await state.setBytes(key: string, data: Buffer): Promise<void>
Store raw binary data under the given key.
await state.getBytes(key: string): Promise<Buffer | null>
Retrieve raw binary data previously stored under the given key. Returns null if not found.
workflow.addStep("step", ["blazen::StartEvent"], async (event, ctx) => {
await ctx.state.set("counter", 5);
const count = await ctx.state.get("counter");
return { type: "blazen::StopEvent", result: { count } };
});
SessionNamespace
Namespace for in-process-only workflow values. Values stored via session.set are kept in the ContextInner.objects side-channel and are excluded from snapshots. Use this for state that should not survive a pause() / resume() round-trip (request IDs, rate-limit counters, ephemeral caches, …).
Important (napi-rs identity caveat): JS object identity is NOT preserved through ctx.session on the Node bindings. Values are routed through serde_json::Value because napi-rs’s Reference<T> is !Send: its Drop must run on the V8 main thread, and tokio worker threads cannot safely cross the napi boundary with live JS object references. As a result, await ctx.session.get(key) returns a plain object equal to the one you passed in, not the same object instance. For true JS class identity preservation, use the Python or WASM bindings, or keep the work inside a single Rust step. Full identity through events is tracked as a follow-up architectural refactor.
The session namespace is still functionally distinct from state: session values are excluded from snapshots, state values are not.
await session.set(key: string, value: unknown): Promise<void>
Store a JSON-serializable value under the given key. The value is excluded from snapshots.
await session.get(key: string): Promise<unknown>
Retrieve a value previously stored under the given key. Returns null if the key does not exist.
await session.has(key: string): Promise<boolean>
Check whether a value exists under the given key.
await session.remove(key: string): Promise<void>
Remove the value stored under the given key.
workflow.addStep("step", ["blazen::StartEvent"], async (event, ctx) => {
await ctx.session.set("reqId", "abc123");
if (await ctx.session.has("reqId")) {
const id = await ctx.session.get("reqId");
console.log("request id:", id);
}
return { type: "blazen::StopEvent", result: {} };
});
state and session use independent keyspaces — the same key can exist in both namespaces without colliding:
await ctx.state.set("k", "state-value");
await ctx.session.set("k", "session-value");
// Both are accessible; they don't collide.
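The difference between the two namespaces can be modelled with two maps, only one of which feeds the snapshot. The sketch below is a toy model under stated assumptions: `ContextSketch` is a hypothetical class, its methods are synchronous (the real ones are async), the snapshot format is invented for illustration, and the JSON round-trip in `session.set` only imitates the identity caveat described above.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

class ContextSketch {
  private stateMap = new Map<string, Json>();
  private sessionMap = new Map<string, Json>();

  // Persistable values: included in snapshot().
  readonly state = {
    set: (key: string, value: Json): void => { this.stateMap.set(key, value); },
    get: (key: string): Json | null => this.stateMap.get(key) ?? null,
  };

  // In-process values: copied through JSON (so object identity is lost)
  // and never written to the snapshot.
  readonly session = {
    set: (key: string, value: Json): void => {
      this.sessionMap.set(key, JSON.parse(JSON.stringify(value)));
    },
    get: (key: string): Json | null => this.sessionMap.get(key) ?? null,
    has: (key: string): boolean => this.sessionMap.has(key),
  };

  // Serialize only the state keyspace.
  snapshot(): string {
    const out: { [key: string]: Json } = {};
    this.stateMap.forEach((value, key) => { out[key] = value; });
    return JSON.stringify(out);
  }
}
```

Storing the same key in both namespaces and then calling `snapshot()` shows only the state value in the output, which is the behavior the two namespaces are meant to separate.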
BlazenState
Base class for typed state objects with per-field context storage. Extend this class to define structured workflow state that is automatically serialized and deserialized field by field.
static meta?: BlazenStateMeta
Optional static metadata that controls how fields are stored and which fields are transient.
interface BlazenStateMeta {
transient?: Set<string> | string[];
}
| Field | Type | Description |
|---|---|---|
transient | Set<string> | string[] | Field names to exclude from persistence. These fields are not saved by saveTo() and will not be present after loadFrom(). |
restore?(): void | Promise<void>
Optional instance method called by loadFrom() after all persisted fields have been restored. Use this to recreate transient fields (caches, connections, derived data) that were excluded from persistence.
await state.saveTo(ctx: Context, key: string): Promise<void>
Persist every non-transient field of the state instance into the workflow context. Each field is stored individually under a namespaced key derived from key.
static loadFrom<T>(ctx: Context, key: string): Promise<T>
Restore a state instance from the workflow context. Reads each persisted field, constructs a new instance, and calls restore() if defined.
const state = await AgentState.loadFrom<AgentState>(ctx, "state");
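Per-field persistence with transient exclusions can be illustrated with a plain map standing in for the workflow context. This is a sketch of the idea only: `saveFields` and `loadFields` are hypothetical functions, and the `key.field` naming scheme is an assumption, since the reference says only that keys are "derived from key".

```typescript
type Store = Map<string, unknown>;

// Hypothetical: store every non-transient field under "<key>.<field>".
function saveFields(
  obj: Record<string, unknown>,
  transient: Set<string> | string[],
  store: Store,
  key: string,
): void {
  const skip = new Set<string>(Array.from(transient));
  for (const field of Object.keys(obj)) {
    if (skip.has(field)) continue; // transient fields are never persisted
    store.set(`${key}.${field}`, obj[field]);
  }
}

// Hypothetical: collect all fields previously stored under the same key.
function loadFields(store: Store, key: string): Record<string, unknown> {
  const prefix = `${key}.`;
  const out: Record<string, unknown> = {};
  store.forEach((value, storedKey) => {
    if (storedKey.startsWith(prefix)) out[storedKey.slice(prefix.length)] = value;
  });
  return out;
}
```

After loading, transient fields are absent, which is the point at which a restore() hook would rebuild them.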
Compute Request Types
Typed request interfaces for compute operations (image generation, video, speech, music, transcription, 3D models).
ImageRequest
interface ImageRequest {
prompt: string; // Text prompt describing the desired image
negativePrompt?: string; // Things to avoid in the image
width?: number; // Desired image width in pixels
height?: number; // Desired image height in pixels
numImages?: number; // Number of images to generate
model?: string; // Model override (provider-specific)
parameters?: object; // Additional provider-specific parameters
}
UpscaleRequest
interface UpscaleRequest {
imageUrl: string; // URL of the image to upscale
scale: number; // Scale factor (e.g. 2.0, 4.0)
model?: string;
parameters?: object;
}
VideoRequest
interface VideoRequest {
prompt: string; // Text prompt describing the desired video
imageUrl?: string; // Source image URL for image-to-video
durationSeconds?: number; // Desired duration in seconds
negativePrompt?: string; // Things to avoid
width?: number; // Video width in pixels
height?: number; // Video height in pixels
model?: string;
parameters?: object;
}
SpeechRequest
interface SpeechRequest {
text: string; // Text to synthesize into speech
voice?: string; // Voice identifier (provider-specific)
voiceUrl?: string; // Reference voice sample URL for cloning
language?: string; // Language code (e.g. "en", "fr", "ja")
speed?: number; // Speech speed multiplier (1.0 = normal)
model?: string;
parameters?: object;
}
MusicRequest
interface MusicRequest {
prompt: string; // Text prompt describing the desired audio
durationSeconds?: number; // Desired duration in seconds
model?: string;
parameters?: object;
}
TranscriptionRequest
interface TranscriptionRequest {
audioUrl: string; // URL of the audio file to transcribe
language?: string; // Language hint (e.g. "en", "fr")
diarize?: boolean; // Whether to perform speaker diarization
model?: string;
parameters?: object;
}
ThreeDRequest
interface ThreeDRequest {
prompt?: string; // Text prompt describing the desired 3D model
imageUrl?: string; // Source image URL for image-to-3D
format?: string; // Output format (e.g. "glb", "obj", "usdz")
model?: string;
parameters?: object;
}
Compute Result Types
ImageResult
interface ImageResult {
images: GeneratedImage[]; // Generated or upscaled images
timing?: ComputeTiming; // Request timing breakdown
cost?: number; // Cost in USD
metadata: object; // Provider-specific metadata
}
VideoResult
interface VideoResult {
videos: GeneratedVideo[];
timing?: ComputeTiming;
cost?: number;
metadata: object;
}
AudioResult
interface AudioResult {
audio: GeneratedAudio[];
timing?: ComputeTiming;
cost?: number;
metadata: object;
}
TranscriptionResult
interface TranscriptionResult {
text: string; // Full transcribed text
segments: TranscriptionSegment[]; // Time-aligned segments
language?: string; // Detected or specified language code
timing?: ComputeTiming;
cost?: number;
metadata: object;
}
TranscriptionSegment
interface TranscriptionSegment {
text: string; // Transcribed text for this segment
start: number; // Start time in seconds
end: number; // End time in seconds
speaker?: string; // Speaker label (if diarization enabled)
}
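Segments are typically rendered as timestamped lines, optionally prefixed with the speaker label. The helper below is an illustrative sketch; `SegmentLike`, `formatTimestamp`, and `formatTranscript` are hypothetical names, and the `mm:ss` format is an arbitrary choice.

```typescript
// Mirrors the TranscriptionSegment fields documented above.
interface SegmentLike {
  text: string;
  start: number;
  end: number;
  speaker?: string;
}

// Format a time in seconds as zero-padded mm:ss, truncating fractions.
function formatTimestamp(seconds: number): string {
  const total = Math.max(0, Math.floor(seconds));
  const mm = String(Math.floor(total / 60)).padStart(2, "0");
  const ss = String(total % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}

// One line per segment: "[mm:ss] speaker: text" (speaker omitted if absent).
function formatTranscript(segments: SegmentLike[]): string {
  return segments
    .map((s) => `[${formatTimestamp(s.start)}] ${s.speaker ? s.speaker + ": " : ""}${s.text}`)
    .join("\n");
}
```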
ThreeDResult
interface ThreeDResult {
models: Generated3DModel[];
timing?: ComputeTiming;
cost?: number;
metadata: object;
}
Compute Job Types
Low-level types for generic compute jobs.
ComputeRequest
interface ComputeRequest {
model: string; // Model/endpoint to run (e.g. "fal-ai/flux/dev")
input: object; // Input parameters (model-specific)
webhook?: string; // Webhook URL for async completion notification
}
JobHandle
interface JobHandle {
id: string; // Provider-assigned job identifier
provider: string; // Provider name (e.g. "fal", "replicate", "runpod")
model: string; // Model/endpoint that was invoked
submittedAt: string; // ISO 8601 timestamp
}
JobStatus
String enum for compute job status.
JobStatus.Queued // "queued"
JobStatus.Running // "running"
JobStatus.Completed // "completed"
JobStatus.Failed // "failed"
JobStatus.Cancelled // "cancelled"
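When polling a job, the usual question is whether the status can still change. The sketch below assumes that completed, failed, and cancelled are the terminal states, which is the conventional reading but is not stated explicitly in this reference; `JobStatusValue` and `isTerminal` are hypothetical names using the string values listed above.

```typescript
type JobStatusValue = "queued" | "running" | "completed" | "failed" | "cancelled";

// Assumption: a job in one of these three states will not change again.
function isTerminal(status: JobStatusValue): boolean {
  return status === "completed" || status === "failed" || status === "cancelled";
}
```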
ComputeResult
interface ComputeResult {
job?: JobHandle; // Job handle that produced this result
output: object; // Output data (model-specific)
timing?: ComputeTiming; // Request timing breakdown
cost?: number; // Cost in USD
metadata: object; // Raw provider-specific metadata
}
ComputeTiming
interface ComputeTiming {
queueMs?: number; // Time spent waiting in queue (ms)
executionMs?: number; // Time spent executing (ms)
totalMs?: number; // Total wall-clock time (ms)
}
Media Output Types
MediaOutput
A single piece of generated media content.
interface MediaOutput {
url?: string; // URL where the media can be downloaded
base64?: string; // Base64-encoded media data
rawContent?: string; // Raw text content (SVG, OBJ, GLTF JSON)
mediaType: string; // MIME type (e.g. "image/png", "video/mp4")
fileSize?: number; // File size in bytes
metadata: object; // Provider-specific metadata
}
GeneratedImage
interface GeneratedImage {
media: MediaOutput;
width?: number; // Image width in pixels
height?: number; // Image height in pixels
}
GeneratedVideo
interface GeneratedVideo {
media: MediaOutput;
width?: number; // Video width in pixels
height?: number; // Video height in pixels
durationSeconds?: number; // Duration in seconds
fps?: number; // Frames per second
}
GeneratedAudio
interface GeneratedAudio {
media: MediaOutput;
durationSeconds?: number; // Duration in seconds
sampleRate?: number; // Sample rate in Hz
channels?: number; // Number of audio channels
}
Generated3DModel
interface Generated3DModel {
media: MediaOutput;
vertexCount?: number; // Total vertex count
faceCount?: number; // Total face/triangle count
hasTextures: boolean; // Whether the model includes textures
hasAnimations: boolean; // Whether the model includes animations
}
EmbeddingModel
Generate vector embeddings from text. Created via static factory methods. Keys are read from environment variables (OPENAI_API_KEY, TOGETHER_API_KEY, etc.) when options is omitted, or can be passed explicitly via { apiKey: "..." }.
import { EmbeddingModel } from "blazen";
const model = EmbeddingModel.openai();
const together = EmbeddingModel.together();
const cohere = EmbeddingModel.cohere({ apiKey: "co-..." });
const fireworks = EmbeddingModel.fireworks();
Provider Factory Methods
| Method | Default Model | Default Dimensions |
|---|---|---|
EmbeddingModel.openai(options?) | text-embedding-3-small | 1536 |
EmbeddingModel.together(options?) | togethercomputer/m2-bert-80M-8k-retrieval | 768 |
EmbeddingModel.cohere(options?) | embed-v4.0 | 1024 |
EmbeddingModel.fireworks(options?) | nomic-ai/nomic-embed-text-v1.5 | 768 |
Properties
| Property | Type | Description |
|---|---|---|
.modelId | string | The model identifier. |
.dimensions | number | Output vector dimensionality. |
await model.embed(texts: string[]): EmbeddingResponse
Embed one or more texts, returning one vector per input.
const response = await model.embed(["Hello", "World"]);
console.log(response.embeddings.length); // 2
console.log(response.embeddings[0].length); // 1536
EmbeddingResponse
Returned by EmbeddingModel.embed().
interface EmbeddingResponse {
embeddings: number[][]; // One vector per input text
model: string; // Model that produced the embeddings
usage?: TokenUsage; // Token usage statistics
cost?: number; // Estimated cost in USD
timing?: RequestTiming; // Request timing breakdown
metadata: object; // Provider-specific metadata
}
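A common next step with the returned vectors is comparing two of them. Cosine similarity is a standard choice; the function below is a plain implementation over `number[]` and is not part of blazen.

```typescript
// Cosine similarity of two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Given a response from embed(), `cosineSimilarity(response.embeddings[0], response.embeddings[1])` scores how close the first two inputs are.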
Token Estimation
Lightweight token counting functions. They use a heuristic (~3.5 characters per token) that is suitable for budget checks and needs no external data files.
estimateTokens(text: string, contextSize?: number): number
Estimate token count for a text string.
import { estimateTokens } from "blazen";
const count = estimateTokens("Hello, world!"); // 4
countMessageTokens(messages: ChatMessage[], contextSize?: number): number
Estimate total tokens for an array of chat messages, including per-message overhead.
import { countMessageTokens, ChatMessage } from "blazen";
const count = countMessageTokens([
ChatMessage.system("You are helpful."),
ChatMessage.user("Hello!"),
]);
contextSize defaults to 128000 if omitted.
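The character-based heuristic can be approximated in a few lines, which helps when reasoning about budgets without calling the library. This is a sketch, not the library's implementation: rounding up is an assumption (it agrees with the "Hello, world!" example above), and `roughTokenEstimate` and `fitsBudget` are hypothetical names.

```typescript
const CHARS_PER_TOKEN = 3.5;

// Assumed rounding: any partial token counts as a whole one.
function roughTokenEstimate(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True when the estimated token count is within the given budget.
function fitsBudget(text: string, budgetTokens: number): boolean {
  return roughTokenEstimate(text) <= budgetTokens;
}
```

Per-message overhead, which countMessageTokens adds, is not modelled here.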
Subclassable Providers
Model, EmbeddingModel, and Transcription can be subclassed to implement custom providers. Override the relevant methods and the framework will dispatch to your implementation.
Model
import { Model, ChatMessage } from "blazen";
class MyLLM extends Model {
constructor() {
super({ modelId: "my-llm" });
}
async complete(messages: ChatMessage[]) {
// Your inference logic here
return { content: "Hello from my custom model" };
}
async stream(messages: ChatMessage[], onChunk: (chunk: any) => void) {
onChunk({ delta: "Hello", finishReason: null, toolCalls: [] });
onChunk({ delta: null, finishReason: "stop", toolCalls: [] });
}
}
const model = new MyLLM();
const response = await model.complete([ChatMessage.user("Hi")]);
EmbeddingModel
import { EmbeddingModel } from "blazen";
class MyEmbedder extends EmbeddingModel {
constructor() {
super({ modelId: "my-embedder", dimensions: 128 });
}
async embed(texts: string[]) {
return {
embeddings: texts.map(() => new Array(128).fill(0.1)),
model: "my-embedder",
};
}
}
Transcription
import { Transcription } from "blazen";
class MyTranscriber extends Transcription {
constructor() {
super({ providerId: "my-stt" });
}
async transcribe(request: any) {
return { text: "transcribed text", segments: [] };
}
}
Per-Capability Provider Classes
Seven provider base classes let you implement a single compute capability without dealing with the full ComputeProvider interface. Subclass and override the relevant methods.
| Class | Methods to Override | Rust Trait |
|---|---|---|
TTSProvider | textToSpeech(request) | AudioGeneration |
MusicProvider | generateMusic(request), generateSfx(request) | AudioGeneration |
ImageProvider | generateImage(request), upscaleImage(request) | ImageGeneration |
VideoProvider | textToVideo(request), imageToVideo(request) | VideoGeneration |
ThreeDProvider | generate3d(request) | ThreeDGeneration |
BackgroundRemovalProvider | removeBackground(request) | BackgroundRemoval |
VoiceProvider | cloneVoice(request), listVoices(), deleteVoice(voice) | VoiceCloning |
Constructor
All provider classes share the same constructor config:
new TTSProvider({
providerId: string,
baseUrl?: string,
pricing?: ModelPricing,
memoryEstimateBytes?: number,
})
| Field | Type | Description |
|---|---|---|
providerId | string | Identifier for the provider instance. |
baseUrl | string? | Optional base URL for the provider API. |
pricing | ModelPricing? | Optional pricing info for cost tracking. |
memoryEstimateBytes | number? | Estimated memory footprint in bytes (host RAM if on CPU, GPU VRAM otherwise). Used by ModelManager to charge the appropriate pool. |
Example
import { TTSProvider } from "blazen";
class ElevenLabsTTS extends TTSProvider {
private apiKey: string;
constructor(apiKey: string) {
super({ providerId: "elevenlabs" });
this.apiKey = apiKey;
}
async textToSpeech(request: any) {
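// Call the ElevenLabs API with this.apiKey and collect the audio bytes.
const audioBuffer = Buffer.alloc(0); // placeholder for the bytes returned by the API
return { audio: audioBuffer, format: "mp3" };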
}
}
const tts = new ElevenLabsTTS("sk-...");
const result = await tts.textToSpeech({ text: "Hello world", voice: "alice" });
MemoryBackend
Base class for custom memory storage backends. Subclass to implement persistence backed by Postgres, SQLite, DynamoDB, or any other store.
import { MemoryBackend } from "blazen";
class PostgresBackend extends MemoryBackend {
async put(entry: any): Promise<void> {
// Insert or update entry in Postgres
}
async get(id: string): Promise<any | null> {
// Retrieve entry by id
}
async delete(id: string): Promise<boolean> {
// Delete entry, return true if it existed
}
async list(): Promise<any[]> {
// Return all entries
}
async len(): Promise<number> {
// Return count of entries
}
async searchByBands(bands: any, limit: number): Promise<any[]> {
// Return candidates sharing LSH bands with the query
}
}
Methods to Override
| Method | Signature | Description |
|---|---|---|
put | async put(entry): Promise<void> | Insert or update a stored entry. |
get | async get(id: string): Promise<any | null> | Retrieve a stored entry by id. |
delete | async delete(id: string): Promise<boolean> | Delete an entry by id. Returns true if it existed. |
list | async list(): Promise<any[]> | Return all stored entries. |
len | async len(): Promise<number> | Return the number of stored entries. |
searchByBands | async searchByBands(bands, limit): Promise<any[]> | Return candidate entries sharing at least one LSH band. |
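To make the contract of the six methods concrete, here is a minimal in-memory sketch. It runs standalone, so it does not extend MemoryBackend; in real code the class would. The entry shape (an id plus a list of band strings) and the string[] type for the bands argument are assumptions, because this section leaves both as any.

```typescript
// Hypothetical entry shape: the real entry type is not specified in this section.
interface SketchEntry {
  id: string;
  bands: string[]; // assumed LSH band identifiers
}

// In real code this class would extend MemoryBackend from "blazen".
class InMemoryBackendSketch {
  private entries = new Map<string, SketchEntry>();

  async put(entry: SketchEntry): Promise<void> {
    this.entries.set(entry.id, entry); // insert or overwrite by id
  }
  async get(id: string): Promise<SketchEntry | null> {
    return this.entries.get(id) ?? null;
  }
  async delete(id: string): Promise<boolean> {
    return this.entries.delete(id); // true only if the id existed
  }
  async list(): Promise<SketchEntry[]> {
    return Array.from(this.entries.values());
  }
  async len(): Promise<number> {
    return this.entries.size;
  }
  async searchByBands(bands: string[], limit: number): Promise<SketchEntry[]> {
    const wanted = new Set(bands);
    const hits: SketchEntry[] = [];
    for (const entry of this.entries.values()) {
      if (hits.length >= limit) break;
      // A candidate shares at least one band with the query.
      if (entry.bands.some((b) => wanted.has(b))) hits.push(entry);
    }
    return hits;
  }
}
```

A persistent backend would replace the Map with queries against its store while keeping the same return values, in particular null for a missing id and a boolean from delete.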
ProgressCallback
Subclassable base for download progress callbacks. Pass an instance to ModelCache.download() (and other download-capable APIs) to receive byte-count progress updates.
onProgress takes bigint byte counts so multi-gigabyte downloads keep full precision. total is null when the server does not send Content-Length.
import { ModelCache, ProgressCallback } from "blazen";
class MyProgress extends ProgressCallback {
onProgress(downloaded: bigint, total?: bigint | null): void {
if (total != null) {
const pct = Number((downloaded * 100n) / total);
console.log(`${pct}%`);
} else {
console.log(`${downloaded} bytes`);
}
}
}
const cache = ModelCache.create();
await cache.download("bert-base-uncased", "config.json", new MyProgress());
The base onProgress always throws — overriding it is mandatory. super() must be called from the subclass constructor.
ProgressCallback instances are accepted anywhere the SDK exposes a download hook — ModelCache.download(), the local-inference Provider.create() paths that pull weights from HuggingFace, and the ProgressCallback-aware variants of dataset loaders. Pass the same ProgressCallback instance to multiple downloads to centralise reporting (e.g. for a TUI progress bar).
ModelManager
A ModelManager is a named registry for providers — local and remote — that
you dispatch against by id. Register a provider once under a string id, then call
complete(id, ...) / stream(id, ...) from anywhere, or fetch it back with
get(id) to use or compose it directly. This is the natural way to scale a single
provider up to a fleet: one place that holds “the cheap model”, “the smart model”,
“the local model”, each addressable by name.
import { ModelManager, Model, FalProvider, ChatMessage } from "blazen";
const manager = new ModelManager();
// Remote providers register as dispatch-only entries (no memory budget).
await manager.register("fast", Model.openai());
await manager.register("smart", Model.anthropic());
await manager.register("image", FalProvider.create());
// Dispatch by name from anywhere.
const resp = await manager.complete("smart", [ChatMessage.user("Explain CRDTs.")]);
console.log(resp.content);
// Stream by name (callback per chunk).
await manager.stream("fast", [ChatMessage.user("Count to 5")], (chunk) => {
if (chunk.delta) process.stdout.write(chunk.delta);
});
// Fetch a registered provider to use or compose directly.
const smart = await manager.get("smart");
const fast = await manager.get("fast");
const fallback = Model.withFallback([fast, smart]);
The same registry is also a per-pool VRAM/RAM budgeter for local weights: when
you register a local provider (mistral.rs, llama.cpp, candle, or a Hugging Face
repo via loadFromHf), the manager tracks its memory footprint per pool (host RAM,
or per-GPU VRAM) and applies LRU eviction within a pool when a load would exceed the
budget. Remote providers own no local weights, so they never count against a budget.
A CPU embedder never evicts a GPU LLM, and vice versa.
ModelManager is a memory budget bookkeeper, not a performance scheduler. It answers “will this fit?”, not “will this run fast?”. Whether a 70B model loaded on CPU is useful at 1–3 tok/s is a workload-choice question the manager intentionally does not answer.
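The per-pool LRU behaviour described above can be illustrated with a small standalone model. This is a sketch of the idea only, not the manager's actual implementation: PoolLruSketch and its methods are hypothetical, and sizes are whole gigabytes for brevity, whereas the real API takes bigint byte counts.

```typescript
// Hypothetical stand-in for the manager's bookkeeping; not the blazen implementation.
interface Entry {
  id: string;
  pool: string; // e.g. "cpu" or "gpu:0"
  sizeGb: number;
}

class PoolLruSketch {
  // A Map preserves insertion order, so the first entry is the least recently used.
  private loaded = new Map<string, Entry>();
  private budgetsGb: Record<string, number>;

  constructor(budgetsGb: Record<string, number>) {
    this.budgetsGb = budgetsGb;
  }

  usedGb(pool: string): number {
    let total = 0;
    for (const e of this.loaded.values()) if (e.pool === pool) total += e.sizeGb;
    return total;
  }

  // Load (or touch) an entry and return the ids evicted from the same pool.
  load(entry: Entry): string[] {
    const budget = this.budgetsGb[entry.pool];
    if (budget === undefined) throw new Error(`unknown pool '${entry.pool}'`);
    if (entry.sizeGb > budget) throw new Error(`'${entry.id}' exceeds the ${entry.pool} budget`);
    this.loaded.delete(entry.id); // re-inserting moves it to the most-recent position
    const evicted: string[] = [];
    for (const candidate of Array.from(this.loaded.values())) {
      if (this.usedGb(entry.pool) + entry.sizeGb <= budget) break;
      if (candidate.pool !== entry.pool) continue; // other pools are never touched
      this.loaded.delete(candidate.id);
      evicted.push(candidate.id);
    }
    this.loaded.set(entry.id, entry);
    return evicted;
  }

  isLoaded(id: string): boolean {
    return this.loaded.has(id);
  }
}

const sketch = new PoolLruSketch({ cpu: 16, "gpu:0": 24 });
sketch.load({ id: "embedder", pool: "cpu", sizeGb: 2 });
sketch.load({ id: "llm-a", pool: "gpu:0", sizeGb: 14 });
const evicted = sketch.load({ id: "llm-b", pool: "gpu:0", sizeGb: 16 });
console.log(evicted); // [ 'llm-a' ]
console.log(sketch.isLoaded("embedder")); // true: the CPU pool was not affected
```

Loading the second GPU model pushes the GPU pool over its 24 GB budget, so the older GPU model is evicted while the CPU embedder stays loaded.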
Constructor
import { ModelManager } from "blazen";
// Common case: separate CPU and GPU budgets, in gigabytes.
const manager = new ModelManager({ cpuRamGb: 100, gpuVramGb: 24 });
// Explicit per-pool budgets in bytes (BigInt). Pool labels: "cpu", "gpu", "gpu:N".
const manager = new ModelManager({
poolBudgets: {
"cpu": 100_000_000_000n,
"gpu:0": 24_000_000_000n,
},
});
// No-arg / empty-object: BOTH the "cpu" and "gpu:0" pools default to
// u64::MAX (unlimited-budget sentinel). Useful for tests and ad-hoc scripts.
const manager = new ModelManager();
| Field | Type | Description |
|---|---|---|
cpuRamGb | number? | Host RAM budget in gigabytes for the "cpu" pool. |
gpuVramGb | number? | VRAM budget in gigabytes for the "gpu:0" pool. Omit if you don’t load GPU models. |
poolBudgets | Record<string, bigint>? | Explicit per-pool byte budgets. Keys are pool labels ("cpu", "gpu", "gpu:N"); values are bigint byte budgets. Use this when you need budgets above 4 GiB on a per-pool basis or multi-GPU setups. |
Methods
Registry / dispatch — the primary surface:
| Method | Signature | Description |
|---|---|---|
register | await manager.register(id, model: Model, memoryEstimateBytes?: bigint) | Register a provider under id. Remote providers (Model.openai(), FalProvider.create(), …) register as dispatch-only entries and never count against a budget; local in-process providers (mistral.rs, llama.cpp, candle) additionally participate in load/unload + per-pool LRU eviction, with memoryEstimateBytes reporting their footprint. The pool charged is determined by the model’s device() (defaults to "cpu"). |
complete | await manager.complete(id, messages): Promise<ModelResponse> | Run a chat completion against the provider registered under id. Local entries auto-load on first use; remote entries dispatch straight through. Throws if id is unknown or was registered for lifecycle only. |
stream | await manager.stream(id, messages, onChunk) | Streaming counterpart to complete. onChunk receives each chunk as a typed StreamChunk. |
get | await manager.get(id): Promise<Model | null> | Fetch the chat provider registered under id to use or compose directly. Returns null if id is unknown or was registered for lifecycle only. |
Lifecycle / budgeting — for local weights:
| Method | Signature | Description |
|---|---|---|
registerLocalModel | await manager.registerLocalModel(id, load, unload, isLoaded?, memoryEstimateBytes?: bigint, device?: string) | Register an arbitrary JS-managed resource from raw lifecycle closures (not dispatchable via complete). The optional device string (default "cpu") selects which pool to charge — e.g. "cuda:0" charges "gpu:0". |
load | await manager.load(id) | Load a model, evicting LRU models in the same pool if needed. |
unload | await manager.unload(id) | Unload a model and free its memory. |
isLoaded | await manager.isLoaded(id): boolean | Check if a model is currently loaded. |
ensureLoaded | await manager.ensureLoaded(id) | Alias for load(). |
usedBytes | await manager.usedBytes(pool?: string): bigint | Bytes currently used by loaded models in the given pool. pool defaults to "cpu". Invalid pool labels reject with invalid pool label '<x>': expected 'cpu', 'gpu', or 'gpu:N' where N is a non-negative integer. |
availableBytes | await manager.availableBytes(pool?: string): bigint | Bytes still available within the given pool’s budget. Same default and validation rules as usedBytes. |
pools | manager.pools(): Array<{ pool: string; budgetBytes: bigint }> | Sync. List every configured pool and its byte budget. |
status | await manager.status(): ModelStatus[] | Status of all registered models. |
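If pool labels come from configuration or user input, checking them before calling usedBytes or availableBytes avoids a rejected promise. The validator below is a hypothetical sketch: the regular expression is one reading of the documented grammar ("cpu", "gpu", or "gpu:N"), and the SDK's own parser may differ in edge cases such as leading zeros.

```typescript
// Hypothetical validator mirroring the documented label grammar: "cpu", "gpu", or "gpu:N".
const POOL_LABEL = /^(cpu|gpu|gpu:\d+)$/;

function assertPoolLabel(label: string): string {
  if (!POOL_LABEL.test(label)) {
    throw new Error(
      `invalid pool label '${label}': expected 'cpu', 'gpu', or 'gpu:N' where N is a non-negative integer`,
    );
  }
  return label;
}

console.log(assertPoolLabel("gpu:1")); // prints gpu:1
```

Device strings such as "cuda:0" are not pool labels, so this check rejects them; the manager maps devices to pools itself when a model is registered.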
ModelStatus
interface ModelStatus {
id: string; // Model identifier
loaded: boolean; // Whether the model is currently loaded
memoryEstimateBytes: bigint; // Estimated memory footprint in bytes
pool: string; // Pool label this model is charged to ("cpu", "gpu:0", ...)
}
Why bigint? The byte-budget surface (poolBudgets values, register’s memoryEstimateBytes, usedBytes(), availableBytes(), ModelStatus.memoryEstimateBytes) used to be number (u32 on the Rust side), which capped budgets at ~4 GiB and silently truncated larger inputs: a real footgun for 7B+ local models that need 8 GiB+ of memory. Pass values as BigInt literals (8n * 1_073_741_824n) or via BigInt(8 * 1024 ** 3). The cpuRamGb / gpuVramGb: number constructor path is unchanged for users who prefer plain numbers and gigabyte granularity.
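The arithmetic behind the ~4 GiB cap is easy to check. The helpers below are hypothetical and not part of blazen: gibToBytes converts whole GiB to a bigint byte count, and fitsInU32 tests whether a count would have fit the old 32-bit representation.

```typescript
// Hypothetical helpers; not part of blazen.
const BYTES_PER_GIB = BigInt(1024 ** 3); // 1073741824
const U32_MAX = BigInt(4294967295); // largest value a u32 can hold, just under 4 GiB

// Convert a whole number of GiB to a bigint byte count.
function gibToBytes(gib: number): bigint {
  if (!Number.isInteger(gib) || gib < 0) {
    throw new RangeError("gib must be a non-negative integer");
  }
  return BigInt(gib) * BYTES_PER_GIB;
}

// Would this byte count have fit the old 32-bit representation?
function fitsInU32(bytes: bigint): boolean {
  return bytes <= U32_MAX;
}

console.log(gibToBytes(8)); // 8589934592n
console.log(fitsInU32(gibToBytes(3))); // true
console.log(fitsInU32(gibToBytes(8))); // false
```

A value produced by gibToBytes can be passed wherever the API expects a bigint byte count, for example as a poolBudgets value.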
ModelRegistry
An abstract base class for model catalogs. Subclass it to advertise the models your code knows about. It is used by capability-discovery code and dynamic model menus, and it keeps parity with the Rust blazen_llm::traits::ModelRegistry trait.
export declare class ModelRegistry {
listModels(): Promise<ModelInfo[]>;
getModel(modelId: string): Promise<ModelInfo | null>;
}
| Method | Signature | Description |
|---|---|---|
listModels | async listModels(): Promise<ModelInfo[]> | Return every model the registry advertises. |
getModel | async getModel(modelId: string): Promise<ModelInfo | null> | Look up a single model by id, or return null if unknown. |
The base implementations of both methods throw — overriding them is mandatory. super() must be called from the subclass constructor.
import { ModelRegistry } from "blazen";
import type { ModelInfo } from "blazen";
class MyRegistry extends ModelRegistry {
async listModels(): Promise<ModelInfo[]> {
return [
{ id: "gpt-4o", provider: "openai" /* ...other ModelInfo fields */ },
{ id: "claude-sonnet-4", provider: "anthropic" /* ... */ },
];
}
async getModel(modelId: string): Promise<ModelInfo | null> {
const all = await this.listModels();
return all.find((m) => m.id === modelId) ?? null;
}
}
See the ModelInfo reference for the full set of fields each entry must populate (id, provider, capabilities, context window, pricing, etc.).
Mirrors PyModelRegistry (Python) and WasmModelRegistry exposed as ModelRegistry (WASM SDK) — subclassing ModelRegistry in any binding produces the same Rust-side blazen_llm::traits::ModelRegistry implementation.
ModelPricing and Pricing Functions
ModelPricing
Pricing metadata for cost tracking.
interface ModelPricing {
inputPerMillion?: number; // Cost per million input tokens (USD)
outputPerMillion?: number; // Cost per million output tokens (USD)
perImage?: number; // Cost per generated image (USD)
perSecond?: number; // Cost per second of compute (USD)
}
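As a rough illustration of how per-million rates translate into a dollar figure, the sketch below computes a token-based cost. estimateTokenCostUsd is a hypothetical helper based on a plain reading of the field names; how blazen itself computes the cost field is not described here, and perImage and perSecond are ignored.

```typescript
// Local copy of the documented shape so the sketch runs standalone.
interface ModelPricing {
  inputPerMillion?: number;
  outputPerMillion?: number;
  perImage?: number;
  perSecond?: number;
}

// Hypothetical helper: token-based cost in USD; missing rates count as zero.
function estimateTokenCostUsd(
  pricing: ModelPricing,
  inputTokens: number,
  outputTokens: number,
): number {
  const inputCost = (inputTokens / 1_000_000) * (pricing.inputPerMillion ?? 0);
  const outputCost = (outputTokens / 1_000_000) * (pricing.outputPerMillion ?? 0);
  return inputCost + outputCost;
}

const pricing: ModelPricing = { inputPerMillion: 1.0, outputPerMillion: 2.0 };
console.log(estimateTokenCostUsd(pricing, 500_000, 250_000)); // 1
```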
registerPricing()
Register custom pricing for a model. Overrides any existing pricing for the same model ID.
import { registerPricing } from "blazen";
registerPricing("my-model", { inputPerMillion: 1.0, outputPerMillion: 2.0 });
lookupPricing()
Look up pricing for a model by ID. Returns null if the model is unknown.
import { lookupPricing } from "blazen";
const pricing = lookupPricing("gpt-4o");
if (pricing) {
console.log(`Input: $${pricing.inputPerMillion}/M tokens`);
}
LocalModel Methods on Model
Model instances backed by local inference (not remote APIs) support explicit load/unload lifecycle management.
| Method | Signature | Description |
|---|---|---|
load | await model.load(): void | Load the model into memory (host RAM if on CPU, GPU VRAM otherwise). Idempotent. |
unload | await model.unload(): void | Free the model’s memory. Idempotent. |
isLoaded | await model.isLoaded(): boolean | Whether the model is currently loaded. |
memoryBytes | await model.memoryBytes(): number | null | Approximate memory footprint in bytes (host RAM if on CPU, GPU VRAM otherwise), or null if unknown. |
device | await model.device(): string | Return the device string this model targets ('cpu', 'cuda:0', 'metal', etc.). Determines which pool the manager charges. Subclasses must override — the base implementation throws. |
// For a local model:
await model.load();
console.log(await model.isLoaded()); // true
console.log(await model.memoryBytes()); // e.g. 4000000000
console.log(await model.device()); // e.g. "cuda:0"
await model.unload();
Error Handling
Errors thrown across the FFI boundary are surfaced as instances of typed BlazenError subclasses. Every error class extends the base BlazenError, which extends the standard JavaScript Error, so existing instanceof Error checks keep working while gaining structural classification.
The BlazenError hierarchy is what makes typed error routing possible — any caught value can be matched against BlazenError (catch-all for anything from the SDK), against the direct subclass (broad category like RateLimitError), or against a leaf class (specific failure like LlamaCppModelLoadError). Use whichever level of specificity your handler needs.
Root and direct subclasses
BlazenError is the root. The following 19 classes extend it directly:
| Class | When it’s thrown |
|---|---|
AuthError | Invalid or expired API key. |
RateLimitError | Provider rate limit reached. |
TimeoutError | Request exceeded its deadline. |
ValidationError | Invalid request parameters or option set. |
ContentPolicyError | Content moderated by the provider. |
ProviderError | Provider-specific error (HTTP status / endpoint detail attached — see below). |
UnsupportedError | Feature not supported by the chosen provider. |
ComputeError | Compute job failure (cancelled, quota exhausted, runtime failure). |
MediaError | Invalid or oversized media content. |
PeerEncodeError | Failed to encode a peer envelope. |
PeerTransportError | Network failure between peers. |
PeerEnvelopeVersionError | Peer protocol version mismatch. |
PeerWorkflowError | Remote peer workflow failed. |
PeerTlsError | TLS handshake or cert validation failed for a peer connection. |
PeerUnknownStepError | Peer requested an unknown workflow step. |
PersistError | Snapshot persistence backend failure. |
PromptError | Prompt registry / template failure. |
MemoryError | Memory store / embedder failure. |
CacheError | Model cache / download failure. |
ProviderError structured fields
ProviderError carries structured context in addition to the message string. All fields are nullable.
| Field | Type | Description |
|---|---|---|
.provider | string | null | Provider name (e.g. "openai", "anthropic"). |
.status | number | null | HTTP status code, when the call reached the provider. |
.endpoint | string | null | The endpoint that returned the error. |
.requestId | string | null | Provider-assigned request id (use this when filing support tickets). |
.detail | string | null | Provider-supplied error detail / body. |
.retryAfterMs | number | null | Suggested back-off when the provider returned a Retry-After hint. |
Per-backend ProviderError subclasses
Each local-inference and provider-side backend has its own ProviderError subclass with narrower variants. Use instanceof to route to backend-specific handling.
| Class | Backend | Representative narrower subclasses |
|---|---|---|
LlamaCppError | llama.cpp | LlamaCppInvalidOptionsError, LlamaCppModelLoadError, LlamaCppInferenceError, LlamaCppEngineNotAvailableError |
MistralRsError | mistral.rs | MistralRsInvalidOptionsError, MistralRsInitError, MistralRsInferenceError, MistralRsEngineNotAvailableError |
CandleLlmError | candle (LLM) | CandleLlmInvalidOptionsError, CandleLlmModelLoadError, CandleLlmInferenceError, CandleLlmEngineNotAvailableError |
CandleEmbedError | candle (embeddings) | CandleEmbedModelLoadError, CandleEmbedEmbeddingError, CandleEmbedEngineNotAvailableError, CandleEmbedTaskPanickedError |
WhisperError | whisper.cpp | WhisperModelLoadError, WhisperTranscriptionError, WhisperEngineNotAvailableError, WhisperIoError |
PiperError | Piper TTS | PiperModelLoadError, PiperSynthesisError, PiperEngineNotAvailableError |
DiffusionError | diffusion image gen | DiffusionModelLoadError, DiffusionGenerationError |
FastEmbedError | fastembed | EmbedUnknownModelError, EmbedInitError, EmbedEmbedError, EmbedMutexPoisonedError, EmbedTaskPanickedError |
TractError | tract ONNX runtime | (no narrower variants) |
PromptError similarly has narrower variants like PromptMissingVariableError, PromptNotFoundError, PromptVersionNotFoundError, PromptIoError, PromptYamlError, PromptJsonError, PromptValidationError. MemoryError exposes MemoryNoEmbedderError, MemoryEmbeddingError, MemoryNotFoundError, MemorySerializationError, MemoryIoError, MemoryBackendError. CacheError exposes DownloadError, CacheDirError, IoError. There are around 80 narrower subclasses in total — every public Rust error variant gets its own JS class.
enrichError(err: unknown): unknown
Plain Error instances thrown across the FFI boundary lose their original Rust type. Pass any caught value through enrichError to re-classify it into the proper BlazenError subclass before further inspection. It’s a no-op when the error is already typed.
import { enrichError, BlazenError, ChatMessage } from "blazen";
try {
await model.complete([ChatMessage.user("Hi")]);
} catch (raw) {
const err = enrichError(raw);
if (err instanceof BlazenError) {
// typed handling
}
throw err;
}
Example: routing on the typed hierarchy
import {
ChatMessage, RateLimitError, AuthError, TimeoutError,
ProviderError, LlamaCppEngineNotAvailableError,
} from "blazen";
// sleep() and refreshApiKey() are application-defined helpers, not part of blazen.
try {
const response = await model.complete([ChatMessage.user("Hello")]);
} catch (e) {
if (e instanceof RateLimitError) {
await sleep(1000); // fixed back-off; retryAfterMs is documented on ProviderError, which RateLimitError does not extend
} else if (e instanceof AuthError) {
refreshApiKey();
} else if (e instanceof TimeoutError) {
// safe to retry once
} else if (e instanceof LlamaCppEngineNotAvailableError) {
console.error("Build was compiled without llama.cpp support");
} else if (e instanceof ProviderError) {
console.error(`[${e.provider} ${e.status ?? "?"}] ${e.detail ?? e.message}`);
} else {
throw e;
}
}
RateLimitError, TimeoutError, transient ProviderErrors with status >= 500, and most PeerTransportErrors are safe to retry. AuthError, ValidationError, ContentPolicyError, and UnsupportedError are not.
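The retry guidance above can be folded into a single decision function. The sketch below uses local stand-in classes so that it runs without the SDK; in real code you would import the classes of the same names from "blazen". retryDelayMs, the 500 ms base, and the 30 s cap are hypothetical choices, and PeerTransportError is left out for brevity.

```typescript
// Local stand-ins so the sketch runs without the SDK.
// In real code, import the classes of the same names from "blazen".
class BlazenError extends Error {}
class RateLimitError extends BlazenError {}
class TimeoutError extends BlazenError {}
class AuthError extends BlazenError {}
class ProviderError extends BlazenError {
  status: number | null = null;
  retryAfterMs: number | null = null;
}

// Hypothetical policy: return the delay before the next attempt,
// or null when the error should not be retried.
function retryDelayMs(err: unknown, attempt: number): number | null {
  const backoff = Math.min(500 * 2 ** attempt, 30000); // capped exponential back-off
  if (err instanceof ProviderError) {
    if (err.status === null || err.status < 500) return null; // only 5xx is treated as transient
    return err.retryAfterMs ?? backoff; // prefer the provider's hint when present
  }
  if (err instanceof RateLimitError || err instanceof TimeoutError) return backoff;
  return null; // AuthError, ValidationError, and anything unrecognised
}

const overloaded = new ProviderError("upstream overloaded");
overloaded.status = 503;
overloaded.retryAfterMs = 1500;

console.log(retryDelayMs(new RateLimitError("slow down"), 2)); // 2000
console.log(retryDelayMs(overloaded, 0)); // 1500
console.log(retryDelayMs(new AuthError("bad key"), 0)); // null
```

Keeping the policy in a pure function makes it easy to unit-test separately from the code that actually sleeps and retries.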
When to call enrichError
The Rust core always emits typed BlazenError subclasses, but errors that originate in JS callbacks (tool handlers, persist callbacks, custom providers) and bubble back through Rust come out as plain Error instances. If you want uniform instanceof BlazenError matching everywhere, run every caught value through enrichError at the catch site:
import { enrichError, BlazenError, ProviderError } from "blazen";
async function safeCall<T>(fn: () => Promise<T>): Promise<T> {
try {
return await fn();
} catch (raw) {
const err = enrichError(raw);
if (err instanceof ProviderError && err.retryAfterMs) {
await new Promise(r => setTimeout(r, err.retryAfterMs!));
}
throw err;
}
}
enrichError is idempotent — passing it an already-typed BlazenError returns the same value unchanged, so it’s safe to layer.
Local Inference Types
Local backends (MistralRsProvider, LlamaCppProvider, CandleLlmProvider) expose typed input/output classes for direct use without going through the generic Model surface. Two parallel families exist: un-prefixed classes for mistral.rs (the canonical surface) and LlamaCpp*-prefixed classes for llama.cpp. The candle backend adds a single CandleInferenceResult.
mistral.rs (canonical, un-prefixed)
| Class / enum | Purpose |
|---|---|
ChatMessageInput | Inference-side chat message. Constructor: new ChatMessageInput(role, text, images?); static ChatMessageInput.fromText(role, text). Getters: .role, .text, .images, .hasImages. |
ChatRole | Const enum: System, User, Assistant, Tool. |
InferenceImage | Image attachment for vision-capable models. Static factories: fromBytes(buf), fromPath(path), fromSource(src). |
InferenceImageSource | Tagged-union source. Static factories: bytes(buf), path(p). Getters: .kind ("bytes" or "path"), .data, .filePath. |
InferenceResult | Non-streaming result. Getters: .content, .reasoningContent, .toolCalls, .finishReason, .model, .usage. |
InferenceChunk | Streaming chunk. Getters: .delta, .reasoningDelta, .toolCalls, .finishReason. |
InferenceChunkStream | Async chunk source. Pull with await stream.next(); returns null when exhausted. |
InferenceToolCall | Tool call requested by the model. Constructor new InferenceToolCall(id, name, arguments); getters .id, .name, .arguments (JSON string). |
InferenceUsage | Token usage. Getters: .promptTokens, .completionTokens, .totalTokens, .totalTimeSec. |
import { ChatMessageInput, ChatRole, InferenceImage, InferenceChunkStream, MistralRsProvider } from "blazen";
const provider = await MistralRsProvider.create({ modelId: "..." });
const stream: InferenceChunkStream = await provider.inferStream([
ChatMessageInput.fromText(ChatRole.User, "Describe this image"),
]);
for (let chunk = await stream.next(); chunk !== null; chunk = await stream.next()) {
process.stdout.write(chunk.delta ?? "");
}
InferenceChunkStream is single-pass — once you’ve reached the terminating null, the stream is exhausted. Errors raised from inside InferenceChunkStream.next() are typed BlazenError subclasses (MistralRsInferenceError, etc.) so they can be matched alongside the rest of the error hierarchy.
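Because the stream is pulled with next() until it returns null, it can be wrapped in an async generator and consumed with for await. The sketch below relies only on that documented pull protocol and the .delta getter; the structural types, the chunks and collectText helpers, and fakeStream are all hypothetical and exist so the example runs without a model.

```typescript
// Minimal structural types so the sketch runs without the SDK.
// They mirror only the members used here: next() and delta.
interface ChunkLike {
  delta?: string | null;
}
interface PullStream<T> {
  next(): Promise<T | null>;
}

// Adapt the pull-until-null protocol to an async iterator.
async function* chunks<T>(stream: PullStream<T>): AsyncGenerator<T> {
  for (let chunk = await stream.next(); chunk !== null; chunk = await stream.next()) {
    yield chunk;
  }
}

// Concatenate every text delta into one string.
async function collectText(stream: PullStream<ChunkLike>): Promise<string> {
  let text = "";
  for await (const chunk of chunks(stream)) text += chunk.delta ?? "";
  return text;
}

// Hypothetical fake stream for trying the helpers without a model.
function fakeStream(deltas: string[]): PullStream<ChunkLike> {
  let i = 0;
  return { next: async () => (i < deltas.length ? { delta: deltas[i++] } : null) };
}

collectText(fakeStream(["Hel", "lo"])).then((text) => console.log(text)); // Hello
```

The same adapter should work for any object with a compatible next() method, but since the underlying stream is single-pass, the generator can only be iterated once.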
llama.cpp (LlamaCpp prefix)
The llama.cpp surface mirrors the mistral.rs one with a narrower feature set (no reasoning content, no images on the message input itself).
| Class / enum | Purpose |
|---|---|
LlamaCppChatMessageInput | Constructor: new LlamaCppChatMessageInput(role, text). Getters: .role, .text. |
LlamaCppChatRole | Const enum: System, User, Assistant, Tool (capitalised — distinct from ChatRole). |
LlamaCppInferenceResult | Non-streaming result. Getters: .content, .finishReason, .model, .usage. |
LlamaCppInferenceChunk | Streaming chunk. Getters: .delta, .finishReason. |
LlamaCppInferenceChunkStream | Async chunk source. Same await stream.next() pattern. |
LlamaCppInferenceUsage | Getters: .promptTokens, .completionTokens, .totalTokens, .totalTimeSec. |
import {
LlamaCppChatMessageInput, LlamaCppChatRole,
LlamaCppInferenceChunkStream, LlamaCppProvider,
} from "blazen";
const provider = await LlamaCppProvider.create({ modelPath: "/models/llama.gguf" });
const stream: LlamaCppInferenceChunkStream = await provider.inferStream([
new LlamaCppChatMessageInput(LlamaCppChatRole.User, "What is 2+2?"),
]);
for (let chunk = await stream.next(); chunk !== null; chunk = await stream.next()) {
process.stdout.write(chunk.delta ?? "");
}
Like the mistral.rs InferenceChunkStream, LlamaCppInferenceChunkStream is single-pass; mid-stream failures throw a typed LlamaCppInferenceError.
candle
The candle backend exposes a single non-streaming result class.
| Class | Purpose |
|---|---|
CandleInferenceResult | Constructor: new CandleInferenceResult(content, promptTokens, completionTokens, totalTimeSecs). Getters: .content, .promptTokens, .completionTokens, .totalTimeSecs. |
The candle backend has no streaming counterpart to InferenceChunkStream / LlamaCppInferenceChunkStream — pull CandleInferenceResult once per call. If you need token-by-token streaming on candle, swap the provider to mistral.rs or llama.cpp.
Errors raised from local inference
All three families propagate errors as typed BlazenError subclasses. The mapping is documented in the Error Handling section above. As elsewhere, pass caught values through enrichError to re-classify any plain Error that bubbles back through Rust from JS-side code (custom samplers, custom token decoders, etc.).
Telemetry
OpenTelemetry-compatible tracing flows through the standard tracing subscriber. Blazen includes an optional Langfuse exporter that sends span batches to the Langfuse ingestion API. The exporter is gated by the langfuse Cargo feature on the underlying crate, so it’s only available in builds that opted in at compile time.
LangfuseConfig
import { LangfuseConfig } from "blazen";
const config = new LangfuseConfig(
process.env.LANGFUSE_PUBLIC_KEY!,
process.env.LANGFUSE_SECRET_KEY!,
"https://cloud.langfuse.com", // host (optional, defaults to cloud)
100, // batchSize (optional, default 100)
5000, // flushIntervalMs (optional, default 5000)
);
| Property | Type | Description |
|---|---|---|
.publicKey | string | The Langfuse public API key. |
.secretKey | string | The Langfuse secret API key. |
.host | string | null | The configured host URL, or null when defaulted. |
.batchSize | number | Maximum events buffered before an automatic flush. |
.flushIntervalMs | number | Background flush interval in milliseconds. |
initLangfuse(config: LangfuseConfig): void
Install the global tracing subscriber. Spawns a background tokio task that flushes buffered span envelopes to Langfuse on the configured interval. Calling this more than once per process is safe — subsequent calls no-op because the global subscriber is already registered.
import { initLangfuse, LangfuseConfig } from "blazen";
initLangfuse(new LangfuseConfig(
process.env.LANGFUSE_PUBLIC_KEY!,
process.env.LANGFUSE_SECRET_KEY!,
));
Available only when the host build was compiled with the langfuse feature on the underlying telemetry crate. In builds without it, the symbol is still exported but the configured exporter is a no-op.
Wiring LangfuseConfig from environment
Most deployments construct LangfuseConfig directly from environment variables at startup. Tune batchSize and flushIntervalMs to balance ingestion latency against the per-request overhead of HTTP flushes:
import { initLangfuse, LangfuseConfig } from "blazen";
const cfg = new LangfuseConfig(
process.env.LANGFUSE_PUBLIC_KEY!,
process.env.LANGFUSE_SECRET_KEY!,
process.env.LANGFUSE_HOST, // undefined when unset → cloud default
Number(process.env.LANGFUSE_BATCH_SIZE ?? 100),
Number(process.env.LANGFUSE_FLUSH_INTERVAL_MS ?? 5000),
);
initLangfuse(cfg);
A LangfuseConfig instance is purely declarative — it carries no IO state — so it’s safe to construct, inspect, log (with secrets redacted), and pass to initLangfuse independently.
You can keep multiple LangfuseConfig objects around (e.g. one per environment) and choose which one to install at startup; only the first initLangfuse call wins per process.
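One caveat when reading numeric settings from the environment: Number(value ?? fallback) yields NaN for a non-numeric string and 0 for an empty one, because ?? only catches null and undefined. The helper below is a hypothetical sketch of a stricter parse; whether LangfuseConfig itself rejects such values is not stated in this reference.

```typescript
// Hypothetical helper: parse a positive integer from an environment value,
// falling back when the variable is unset, empty, or not a positive integer.
function positiveIntFromEnv(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

console.log(positiveIntFromEnv("250", 100)); // 250
console.log(positiveIntFromEnv("abc", 100)); // 100
console.log(positiveIntFromEnv(undefined, 5000)); // 5000
```

With this in place, the batchSize argument could be written as positiveIntFromEnv(process.env.LANGFUSE_BATCH_SIZE, 100), and flushIntervalMs similarly with a fallback of 5000.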
version()
Returns the Blazen library version string.
import { version } from "blazen";
console.log(version()); // "0.1.0"