Node.js API Reference
Complete API reference for blazen in Node.js
ChatMessage
A class for building typed chat messages. Supports text, multimodal (image) content, and content parts.
Constructor
new ChatMessage({ role?: string, content?: string, parts?: ContentPart[] })
Create a message from an options object. role defaults to "user" if omitted. Supply either content (text) or parts (multimodal), not both.
// Text message with explicit role
new ChatMessage({ role: "user", content: "Hello" })
// Using the Role enum
new ChatMessage({ role: Role.User, content: "Hello" })
// System message
new ChatMessage({ role: "system", content: "You are a helpful assistant." })
// Multimodal message with content parts
new ChatMessage({
role: "user",
parts: [
{ partType: "text", text: "Describe this image:" },
{ partType: "image", image: { source: { sourceType: "url", url: "https://example.com/photo.jpg" } } }
]
})
Static Factory Methods
| Method | Description |
|---|---|
| ChatMessage.system(content: string) | Create a system message |
| ChatMessage.user(content: string) | Create a user message |
| ChatMessage.assistant(content: string) | Create an assistant message |
| ChatMessage.tool(content: string) | Create a tool result message |
| ChatMessage.userImageUrl(text: string, url: string, mediaType?: string) | Create a user message with text and an image URL |
| ChatMessage.userImageBase64(text: string, data: string, mediaType: string) | Create a user message with text and a base64-encoded image |
| ChatMessage.userParts(parts: ContentPart[]) | Create a user message from an explicit list of content parts |
const msg = ChatMessage.user("What is 2 + 2?");
const imgMsg = ChatMessage.userImageUrl(
"What's in this image?",
"https://example.com/photo.jpg",
"image/jpeg"
);
const b64Msg = ChatMessage.userImageBase64(
"Describe this:",
base64Data,
"image/png"
);
Properties
| Property | Type | Description |
|---|---|---|
| .role | string | The message role: "system", "user", "assistant", or "tool" |
| .content | string \| null | The text content of the message, if any. For tool results that returned a plain string, the string lives here (and .toolResult is null). |
| .toolResult | ToolOutput \| null | Structured tool-result payload. null for non-tool messages or when the tool returned a plain string. When non-null, .toolResult.data is the full structured value the caller should consume; the LLM-facing wire form is derived from .toolResult.llmOverride (if set) or from .toolResult.data via the provider’s default conversion. |
Note on toolCallId: The Rust core stores a tool_call_id on tool messages so providers can correlate a tool result with the originating ToolCall.id. As of this writing the Node binding does not expose a public .toolCallId getter on ChatMessage; correlation is handled internally by the agent loop and on the wire by each provider. If you need explicit access, file an issue and we can surface it.
Role
String enum for message roles.
Role.System // "system"
Role.User // "user"
Role.Assistant // "assistant"
Role.Tool // "tool"
Role values can be used interchangeably with plain strings in the ChatMessage constructor.
ContentPart
Types for multimodal message content, used in the parts field of the ChatMessage constructor and in ChatMessage.userParts().
interface ContentPart {
partType: "text" | "image";
text?: string; // Required when partType is "text"
image?: ImageContent; // Required when partType is "image"
}
interface ImageContent {
source: ImageSource;
mediaType?: string; // MIME type, e.g. "image/png"
}
interface ImageSource {
sourceType: "url" | "base64";
url?: string; // Required when sourceType is "url"
data?: string; // Required when sourceType is "base64"
}
// `MediaSource` is a type alias for `ImageSource` re-exported for compute APIs
// that accept any media (image / video frame / audio cover-art) using the same shape.
// All compute requests that take a `MediaSource` accept exactly the same value
// you would pass to `ImageSource` — `MediaSource` exists only as an aliasing affordance.
export type MediaSource = ImageSource;
Note on MediaSource: prefer MediaSource in compute / generation APIs (image upscaling, video frames, audio cover art) and ImageSource in chat-message APIs. The two are interchangeable at the type level: MediaSource is just the structurally identical alias for ImageSource.
// Text part
{ partType: "text", text: "Describe this image:" }
// Image from URL
{
partType: "image",
image: {
source: { sourceType: "url", url: "https://example.com/photo.jpg" },
mediaType: "image/jpeg"
}
}
// Image from base64
{
partType: "image",
image: {
source: { sourceType: "base64", data: "iVBORw0KGgo..." },
mediaType: "image/png"
}
}
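The "Required when" comments on the interfaces above are conventions; the types alone do not enforce them. A minimal sketch of a runtime check, assuming you want to validate parts before building a message. validatePart is a hypothetical helper (not part of blazen), and the interfaces are re-declared locally so the snippet stands alone:

```typescript
// Local copies of the documented shapes so the snippet is self-contained.
interface ImageSource {
  sourceType: "url" | "base64";
  url?: string;
  data?: string;
}
interface ImageContent {
  source: ImageSource;
  mediaType?: string;
}
interface ContentPart {
  partType: "text" | "image";
  text?: string;
  image?: ImageContent;
}

// Hypothetical helper: returns a list of problems, empty when the part is well-formed.
function validatePart(part: ContentPart): string[] {
  const problems: string[] = [];
  if (part.partType === "text") {
    if (typeof part.text !== "string") problems.push("text part is missing `text`");
    return problems;
  }
  // partType is "image" from here on.
  if (!part.image) {
    problems.push("image part is missing `image`");
    return problems;
  }
  const { source } = part.image;
  if (source.sourceType === "url" && !source.url) {
    problems.push("url source is missing `url`");
  }
  if (source.sourceType === "base64" && !source.data) {
    problems.push("base64 source is missing `data`");
  }
  return problems;
}
```

Running such a check before calling ChatMessage.userParts() turns a malformed part into a local error with a readable message.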
CompletionModel
A chat completion model. Created via static factory methods for each provider.
Provider Factory Methods
All providers accept an optional options object containing an apiKey (and other provider-specific fields). If options is omitted — or apiKey is not set within it — the key is read from the provider’s standard environment variable (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, etc.). If model is not set, the provider’s default model is used.
| Method | Signature |
|---|---|
| CompletionModel.openai | (options?: ProviderOptions) |
| CompletionModel.anthropic | (options?: ProviderOptions) |
| CompletionModel.gemini | (options?: ProviderOptions) |
| CompletionModel.azure | (options: AzureOptions) |
| CompletionModel.fal | (options?: FalOptions) |
| CompletionModel.openrouter | (options?: ProviderOptions) |
| CompletionModel.groq | (options?: ProviderOptions) |
| CompletionModel.together | (options?: ProviderOptions) |
| CompletionModel.mistral | (options?: ProviderOptions) |
| CompletionModel.deepseek | (options?: ProviderOptions) |
| CompletionModel.fireworks | (options?: ProviderOptions) |
| CompletionModel.perplexity | (options?: ProviderOptions) |
| CompletionModel.xai | (options?: ProviderOptions) |
| CompletionModel.cohere | (options?: ProviderOptions) |
| CompletionModel.bedrock | (options: BedrockOptions) |
// Read key from OPENAI_API_KEY env var
const model = CompletionModel.openai();
// Pass an explicit key and override the model
const claude = CompletionModel.anthropic({ apiKey: "sk-ant-...", model: "claude-sonnet-4-20250514" });
const gemini = CompletionModel.gemini({ model: "gemini-2.5-flash" });
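The key lookup order described above (explicit apiKey first, then the provider's environment variable) can be sketched as follows. This is an illustration only: resolveApiKey and ProviderOptionsLike are hypothetical names, and throwing when no key is found is an assumption, since the reference does not say how a missing key is reported.

```typescript
// Hypothetical stand-in for the documented options object.
interface ProviderOptionsLike {
  apiKey?: string;
  model?: string;
}

// Mirrors the documented order: options.apiKey wins, then the environment variable.
// Assumption: a missing key is an error (the real binding may behave differently).
function resolveApiKey(
  options: ProviderOptionsLike | undefined,
  envVar: string,
  env: Record<string, string | undefined>,
): string {
  const key = options?.apiKey ?? env[envVar];
  if (!key) {
    throw new Error(`No API key: pass options.apiKey or set ${envVar}`);
  }
  return key;
}
```

In a Node process you would pass process.env as the env argument; it is a parameter here so the lookup can be exercised without touching real environment variables.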
Properties
| Property | Type | Description |
|---|---|---|
| .modelId | string | The model identifier string |
await model.complete(messages: ChatMessage[]): CompletionResponse
Perform a chat completion.
const response = await model.complete([
ChatMessage.system("You are a helpful assistant."),
ChatMessage.user("What is 2 + 2?"),
]);
console.log(response.content); // "4"
await model.completeWithOptions(messages: ChatMessage[], options: CompletionOptions): CompletionResponse
Perform a chat completion with additional options.
const response = await model.completeWithOptions(
[ChatMessage.user("Write a haiku about Rust.")],
{ temperature: 0.7, maxTokens: 100 }
);
await model.stream(messages: ChatMessage[], onChunk: (chunk) => void): void
Stream a chat completion. The callback receives each chunk as it arrives.
await model.stream(
[ChatMessage.user("Tell me a story")],
(chunk) => {
if (chunk.delta) process.stdout.write(chunk.delta);
}
);
Each chunk has the shape:
{
delta?: string; // Text content delta
finishReason?: string; // Set on the final chunk
toolCalls: ToolCall[]; // Tool calls, if any
}
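Because the callback sees one chunk at a time, callers often want to accumulate the pieces into a final result. A minimal sketch of such a collector, assuming that each entry in chunk.toolCalls is a complete call (the reference does not say whether tool calls can arrive as partial deltas). createCollector and the "Like" types are hypothetical and declared locally:

```typescript
// Local copies of the documented shapes.
interface ToolCallLike {
  id: string;
  name: string;
  arguments: object;
}
interface StreamChunkLike {
  delta?: string;
  finishReason?: string;
  toolCalls: ToolCallLike[];
}

// Hypothetical helper: accumulates text deltas, tool calls, and the finish reason.
function createCollector() {
  let text = "";
  let finishReason: string | undefined;
  const toolCalls: ToolCallLike[] = [];
  return {
    // Closure-based, so it is safe to pass `collector.onChunk` directly as a callback.
    onChunk(chunk: StreamChunkLike): void {
      if (chunk.delta) text += chunk.delta;
      if (chunk.finishReason) finishReason = chunk.finishReason;
      toolCalls.push(...chunk.toolCalls);
    },
    result() {
      return { text, finishReason, toolCalls };
    },
  };
}
```

Under those assumptions, usage would look like `await model.stream(messages, collector.onChunk)` followed by `collector.result()`.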
await model.streamWithOptions(messages: ChatMessage[], onChunk: (chunk) => void, options: CompletionOptions): void
Stream a chat completion with additional options.
await model.streamWithOptions(
[ChatMessage.user("Explain quantum computing")],
(chunk) => { if (chunk.delta) process.stdout.write(chunk.delta); },
{ temperature: 0.5, maxTokens: 500 }
);
Middleware Decorators
Each decorator returns a new CompletionModel wrapping the original with additional behaviour.
model.withRetry(config?: RetryConfig): CompletionModel
Automatic retry with exponential backoff on transient failures.
const resilient = model.withRetry({ maxRetries: 5, initialDelayMs: 500, maxDelayMs: 15000 });
| Field | Type | Default | Description |
|---|---|---|---|
| maxRetries | number | 3 | Maximum retry attempts. |
| initialDelayMs | number | 1000 | Delay before first retry (ms). |
| maxDelayMs | number | 30000 | Upper bound on backoff delay (ms). |
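To get a feel for how these three fields interact, the sketch below computes a delay schedule. It assumes the delay doubles after every attempt and that no jitter is applied; the reference only says "exponential backoff", so the exact multiplier and any randomization in the real implementation may differ. backoffSchedule is a hypothetical helper.

```typescript
// Local copy of the documented config shape.
interface RetryConfigLike {
  maxRetries?: number;
  initialDelayMs?: number;
  maxDelayMs?: number;
}

// Assumption: delay = initialDelayMs * 2^attempt, capped at maxDelayMs, no jitter.
function backoffSchedule(config: RetryConfigLike = {}): number[] {
  const maxRetries = config.maxRetries ?? 3;
  const initialDelayMs = config.initialDelayMs ?? 1000;
  const maxDelayMs = config.maxDelayMs ?? 30000;
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(initialDelayMs * 2 ** attempt, maxDelayMs));
  }
  return delays;
}

// With the documented defaults: [1000, 2000, 4000]
// With { maxRetries: 5, initialDelayMs: 500, maxDelayMs: 15000 }: [500, 1000, 2000, 4000, 8000]
```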
model.withCache(config?: CacheConfig): CompletionModel
In-memory response cache for identical non-streaming requests.
const cached = model.withCache({ ttlSeconds: 600, maxEntries: 500 });
| Field | Type | Default | Description |
|---|---|---|---|
| ttlSeconds | number | 300 | Cache entry TTL in seconds. |
| maxEntries | number | 1000 | Maximum entries before eviction. |
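The two settings can be pictured with a small TTL cache. This is a sketch under stated assumptions, not the library's implementation: it evicts the oldest-inserted entry when full (the reference does not name the eviction policy), and TtlCache with its injectable clock is a hypothetical class.

```typescript
// Hypothetical in-memory cache illustrating ttlSeconds and maxEntries.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlSeconds: number;
  private maxEntries: number;
  private now: () => number;

  constructor(ttlSeconds = 300, maxEntries = 1000, now: () => number = () => Date.now()) {
    this.ttlSeconds = ttlSeconds;
    this.maxEntries = maxEntries;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // entry outlived its TTL
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key); // re-inserting moves the key to the newest position
    if (this.entries.size >= this.maxEntries) {
      // A Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlSeconds * 1000 });
  }
}
```

How the real cache derives a key from a request is not documented here; serializing the messages and options (for example with JSON.stringify) would be one plausible choice.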
CompletionModel.withFallback(models: CompletionModel[]): CompletionModel
Static factory method. Tries providers in order; falls back on transient errors.
const model = CompletionModel.withFallback([
CompletionModel.openai(),
CompletionModel.anthropic(),
]);
CompletionOptions
Options object for completeWithOptions() and streamWithOptions().
interface CompletionOptions {
temperature?: number; // Sampling temperature (0.0 - 2.0)
maxTokens?: number; // Maximum tokens to generate
topP?: number; // Nucleus sampling parameter
model?: string; // Override the default model ID
tools?: ToolDefinition[]; // Tool definitions for function calling
}
CompletionResponse
Returned by model.complete() and model.completeWithOptions().
interface CompletionResponse {
content?: string; // The generated text
toolCalls: ToolCall[]; // Tool calls requested by the model
usage?: TokenUsage; // Token usage statistics
model: string; // Model name used for the completion
finishReason?: string; // Why generation stopped ("stop", "tool_calls", etc.)
cost?: number; // Cost in USD, if reported by the provider
timing?: RequestTiming; // Request timing breakdown
images: object[]; // Generated images, if any (provider-specific)
audio: object[]; // Generated audio, if any (provider-specific)
videos: object[]; // Generated videos, if any (provider-specific)
metadata: object; // Raw provider-specific metadata
}
ToolCall
A tool invocation requested by the model.
| Property | Type | Description |
|---|---|---|
| .id | string | Unique identifier for the tool call |
| .name | string | Name of the tool to invoke |
| .arguments | object | Parsed JSON arguments |
ToolOutput
Two-channel return shape for tool handlers. Tool results have two distinct audiences. The caller (your TypeScript code) wants the full structured data; the LLM, on the next turn, may need a different shape — sometimes shorter, sometimes provider-specific. ToolOutput carries both channels.
import type { ToolOutput, LlmPayload } from "blazen";
const out: ToolOutput = {
data: { items: [1, 2, 3] },
};
Properties
| Member | Type | Description |
|---|---|---|
| data | any | The structured value the caller sees programmatically. Object, array, scalar, or string. |
| llmOverride | LlmPayload \| undefined | Optional override for what the LLM sees on the next turn. undefined means each provider applies its default conversion from data. |
Both llmOverride (camelCase) and llm_override (snake_case) are accepted on input from a JS tool handler, so a hand-written object using either casing will deserialize correctly.
When the agent loop appends a tool result to the conversation, the resulting ChatMessage exposes the structured output via .toolResult:
const last = result.messages[result.messages.length - 1];
// last.toolResult?.data is the full structured payload from your handler.
// last.toolResult?.llmOverride is the override (if any) the LLM saw this turn.
For tool handlers that returned a plain string, last.toolResult is null and the string lives on last.content instead.
LlmPayload
A tagged union describing what the LLM sees for a tool result on the next turn. Used as the llmOverride field of ToolOutput.
import type { LlmPayload } from "blazen";
const text: LlmPayload = { kind: "text", text: "Found 3 results." };
const json: LlmPayload = { kind: "json", value: { items: [1, 2, 3] } };
const parts: LlmPayload = {
kind: "parts",
parts: [
{ partType: "text", text: "Here is the table:" },
{ partType: "text", text: "| col |\n| --- |\n| 1 |" },
],
};
const raw: LlmPayload = {
kind: "provider_raw",
provider: "anthropic",
value: [{ type: "text", text: "Custom Anthropic-shaped payload." }],
};
Variants
| kind | Required fields | Behavior |
|---|---|---|
| "text" | text | Plain text. Works on every provider. |
| "json" | value | Structured JSON. Anthropic and Gemini natively consume the structure; OpenAI-family stringifies once at the wire boundary. |
| "parts" | parts (ContentPart[]) | Multimodal content blocks; Anthropic supports natively, OpenAI falls back to text concatenation, Gemini wraps as { parts: [...] }. |
| "provider_raw" | provider, value | Provider-specific escape hatch. Only the named provider sees value; every other provider falls back to converting ToolOutput.data with its default. provider is one of "openai", "openai_compat", "azure", "anthropic", "gemini", "responses", "fal". |
Per-provider behavior
When a tool returns structured data and no llmOverride, each provider sends a sensible default to the LLM:
- OpenAI / OpenAI-compat / Azure / Responses / Fal: the data is JSON-stringified into the content field of the tool message.
- Anthropic: structured data becomes [{ type: "text", text: <stringified-json> }] inside tool_result.content.
- Gemini: structured object data is passed natively as functionResponse.response. Scalars wrap as { result: <scalar> }.
When llmOverride is set, that override always wins for the variants the provider understands; kind: "provider_raw" is the strictest — it’s only honoured when provider matches the active provider, otherwise the provider falls back to converting data with its default.
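The default conversions and the provider_raw fallback described above can be sketched as plain functions. This is an illustrative model of the documented behaviour, not the library's code: defaultToolWire and toolWire are hypothetical names, and treating arrays like scalars on Gemini is an assumption, since the reference only covers objects and scalars.

```typescript
type Provider =
  | "openai" | "openai_compat" | "azure" | "anthropic" | "gemini" | "responses" | "fal";

// Default conversion from ToolOutput.data when no llmOverride applies.
function defaultToolWire(provider: Provider, data: unknown): unknown {
  switch (provider) {
    case "anthropic":
      return [{ type: "text", text: JSON.stringify(data) }];
    case "gemini":
      // Plain objects pass through; anything else is wrapped.
      // Assumption: arrays are wrapped the same way scalars are.
      return typeof data === "object" && data !== null && !Array.isArray(data)
        ? data
        : { result: data };
    default:
      // OpenAI-family providers receive a JSON string.
      return JSON.stringify(data);
  }
}

// provider_raw is honoured only when it names the active provider.
function toolWire(
  active: Provider,
  data: unknown,
  raw?: { provider: Provider; value: unknown },
): unknown {
  if (raw && raw.provider === active) return raw.value;
  return defaultToolWire(active, data);
}
```

The "text", "json", and "parts" override variants are left out of the sketch because their per-provider wire forms are only summarized in the table above.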
ToolDefinition
Describes a tool that the model may invoke.
interface ToolDefinition {
name: string; // Unique tool name
description: string; // Human-readable description
parameters: object; // JSON Schema for the tool's parameters
}
const tools: ToolDefinition[] = [
{
name: "getWeather",
description: "Get the current weather for a city",
parameters: {
type: "object",
properties: { city: { type: "string" } },
required: ["city"]
}
}
];
Content Subsystem
Pluggable storage and handle plumbing for multimodal payloads (images, audio, video, documents, 3D models, CAD files). A ContentStore issues opaque handles; tools accept handle-id strings as arguments and Blazen substitutes the resolved typed content before the handler runs.
import {
ContentStore,
ContentKind, // an enum, so it needs a value import to use ContentKind.Image
imageInput,
audioInput,
videoInput,
fileInput,
threeDInput,
cadInput,
} from "blazen";
import type {
ContentHandle,
JsContentMetadata,
PutOptions,
} from "blazen";
ContentKind
String enum tagging what a handle refers to. Values match the serde-tag form so they round-trip across any Blazen API that takes a kind string.
const enum ContentKind {
Image = "image",
Audio = "audio",
Video = "video",
Document = "document",
ThreeDModel = "three_d_model",
Cad = "cad",
Archive = "archive",
Font = "font",
Code = "code",
Data = "data",
Other = "other",
}
ContentHandle
Opaque reference returned by ContentStore.put() and consumed by every other store method. Treat id as a black box: its format is defined by the store.
interface ContentHandle {
id: string; // Opaque, store-defined identifier
kind: ContentKind; // What kind of content this handle refers to
mimeType?: string; // MIME type if known
byteSize?: number; // Byte size if known (i64 -- napi has no u64)
displayName?: string; // Human-readable display name (e.g. original filename)
}
ContentHandle is a type alias for the underlying JsContentHandle interface.
ContentStore
Pluggable registry for multimodal content. Construct via the static factories; instances are cheap to clone (internally an Arc), so reusing one store across multiple agents and requests is fine.
class ContentStore {
// Factories
static inMemory(): ContentStore;
static localFile(root: string): ContentStore;
static openaiFiles(apiKey: string, baseUrl?: string | null): ContentStore;
static anthropicFiles(apiKey: string, baseUrl?: string | null): ContentStore;
static geminiFiles(apiKey: string, baseUrl?: string | null): ContentStore;
static falStorage(apiKey: string, baseUrl?: string | null): ContentStore;
static custom(options: CustomContentStoreOptions): ContentStore;
// Operations
put(body: Buffer | string, options: PutOptions): Promise<ContentHandle>;
resolve(handle: ContentHandle): Promise<any>; // serialized MediaSource
fetchBytes(handle: ContentHandle): Promise<Buffer>;
metadata(handle: ContentHandle): Promise<JsContentMetadata>;
delete(handle: ContentHandle): Promise<void>;
}
put() accepts either a Buffer (inline bytes uploaded to the store) or a string — interpreted as a URL when it contains "://" (the store records the reference) and as a local filesystem path otherwise (the store reads or copies the file as needed).
resolve() returns a serialized MediaSource — the same JSON shape Blazen’s request builders accept. fetchBytes() is for tools that need to operate on the raw payload (parse a PDF, transcribe audio); most tools reason over the handle and let resolve() produce the wire form. delete() is best-effort — default implementations on most stores are no-ops.
const store = ContentStore.inMemory();
const handle = await store.put(Buffer.from(pngBytes), {
mimeType: "image/png",
kind: ContentKind.Image,
displayName: "diagram.png",
});
const meta = await store.metadata(handle);
const bytes = await store.fetchBytes(handle);
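The rule put() applies to its body argument (bytes, URL, or local path) can be expressed as a small classifier. classifyPutBody is a hypothetical helper shown only to make the rule concrete; it uses Uint8Array in place of Buffer so the snippet has no Node type dependency (a Buffer is a Uint8Array).

```typescript
type PutBodyKind = "bytes" | "url" | "local_path";

// Mirrors the documented rule: non-strings are inline bytes,
// strings containing "://" are URLs, all other strings are filesystem paths.
function classifyPutBody(body: Uint8Array | string): PutBodyKind {
  if (typeof body !== "string") return "bytes";
  return body.includes("://") ? "url" : "local_path";
}
```

One consequence of the rule is that a relative path such as ./images/a.png is never mistaken for a URL, while any string with a scheme separator is.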
Subclassing ContentStore
ContentStore is subclassable from JavaScript / TypeScript. Override the methods your backend needs; napi-rs wraps your subclass in a Rust adapter that dispatches into your JS async functions via ThreadsafeFunction.
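import { ContentStore, ContentKind } from "blazen";
import type { ContentHandle } from "blazen";
class S3ContentStore extends ContentStore {
  private bucket: string; // declared so `this.bucket` type-checks
  constructor(bucket: string) {
    super();
    this.bucket = bucket;
  }
  // Required: persist the body and return a handle for it.
  async put(body: any, hint: any): Promise<ContentHandle> {
    // ...
    return { id: "...", kind: ContentKind.Image };
  }
  // Required: return a serialized MediaSource for the handle.
  async resolve(handle: ContentHandle) {
    return { sourceType: "url", url: "..." };
  }
  // Required: return the raw bytes behind the handle.
  async fetchBytes(handle: ContentHandle) {
    return Buffer.from("...");
  }
  // Optional:
  async fetchStream(handle: ContentHandle) { return Buffer.from("..."); }
  async delete(handle: ContentHandle) { /* no-op */ }
}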
Subclasses MUST override put, resolve, and fetchBytes. The base-class default implementations throw an error, so a missing override fails loudly instead of recursing silently via super().
ContentStore.custom({...})
Callback-based factory. Direct JS mirror of Rust CustomContentStore::builder.
ContentStore.custom(options: {
put: (body: any, hint: any) => Promise<ContentHandle>;
resolve: (handle: ContentHandle) => Promise<any>; // serialized MediaSource
fetchBytes: (handle: ContentHandle) => Promise<Buffer>;
fetchStream?: (handle: ContentHandle) => Promise<Buffer>; // single-chunk for now
delete?: (handle: ContentHandle) => Promise<void>;
name?: string;
}): ContentStore
put, resolve, and fetchBytes are required; fetchStream and delete are optional. The body argument arrives as a JS object in one of these shapes:
- {type: "bytes", data: [...]}
- {type: "url", url}
- {type: "local_path", path}
- {type: "provider_file", provider, id}
- {type: "stream", stream: AsyncIterable<Uint8Array>, sizeHint: number | null}
The hint argument has optional mimeType, kindHint, displayName, and byteSize fields.
resolve returns a serialized MediaSource JS object (e.g. {sourceType: "url", url: "..."}). fetchBytes returns a Buffer. fetchStream may return either Buffer / Uint8Array / number[] / base64 string (legacy, single-chunk) or an AsyncIterable<Uint8Array> for true chunk-by-chunk streaming — a Node Readable qualifies since it implements [Symbol.asyncIterator].
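Inside a custom put callback, the body shapes listed above lend themselves to a discriminated union on the type field. The sketch below is an assumption-laden illustration: PutBody and describeBody are hypothetical names, and typing data as number[] is a guess based on the [...] in the description.

```typescript
// Hypothetical union covering the documented body shapes.
type PutBody =
  | { type: "bytes"; data: number[] } // assumption: bytes arrive as an array of numbers
  | { type: "url"; url: string }
  | { type: "local_path"; path: string }
  | { type: "provider_file"; provider: string; id: string }
  | { type: "stream"; stream: AsyncIterable<Uint8Array>; sizeHint: number | null };

// Switching on `type` lets the compiler narrow each branch to the right fields.
function describeBody(body: PutBody): string {
  switch (body.type) {
    case "bytes":
      return `inline bytes (${body.data.length})`;
    case "url":
      return `url ${body.url}`;
    case "local_path":
      return `path ${body.path}`;
    case "provider_file":
      return `${body.provider} file ${body.id}`;
    case "stream":
      return body.sizeHint === null ? "stream (unknown size)" : `stream (${body.sizeHint} bytes)`;
  }
}
```

A real put callback would branch the same way, uploading bytes, recording URLs, or draining the stream as appropriate for its backend.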
PutOptions
Optional hints attached to a put() call. Every field is optional; the store may auto-detect from the bytes when a hint is missing.
interface PutOptions {
mimeType?: string; // MIME type, if known
kind?: ContentKind; // Caller's preferred classification -- overrides any auto-detection
displayName?: string; // Human-readable display name (filename, caption)
byteSize?: number; // Byte size, if known up-front (i64 since napi has no u64)
}
ContentMetadata
Cheap metadata summary returned by ContentStore.metadata(). No bytes are materialized.
interface ContentMetadata {
kind: ContentKind;
mimeType?: string;
byteSize?: number;
displayName?: string;
}
ContentMetadata is a type alias for the underlying JsContentMetadata interface.
Built-in stores
| Factory | Purpose |
|---|---|
| ContentStore.inMemory() | Ephemeral in-memory store. Default for tests and short-lived sessions. |
| ContentStore.localFile(root) | Filesystem-backed store rooted at root. Directory is created if missing. |
| ContentStore.openaiFiles(apiKey, baseUrl?) | Backed by the OpenAI Files API. |
| ContentStore.anthropicFiles(apiKey, baseUrl?) | Backed by the Anthropic Files API. |
| ContentStore.geminiFiles(apiKey, baseUrl?) | Backed by the Gemini Files API. |
| ContentStore.falStorage(apiKey, baseUrl?) | Backed by fal.ai’s storage API. |
| ContentStore.custom({...}) | User-defined backend via async callbacks (see above). |
Tool-input schema helpers
Each helper builds a JSON Schema fragment declaring a single required handle-id input. The model emits a handle-id string; Blazen swaps it for the resolved typed content before your tool runs.
| Helper | Declares an input expecting a handle of kind |
|---|---|
| imageInput(name, description) | image |
| audioInput(name, description) | audio |
| videoInput(name, description) | video |
| fileInput(name, description) | document |
| threeDInput(name, description) | three_d_model |
| cadInput(name, description) | cad |
Each call returns the same shape (kind tag varies):
imageInput("photo", "The image to describe");
// =>
// {
// type: "object",
// properties: {
// photo: {
// type: "string",
// description: "The image to describe",
// "x-blazen-content-ref": { kind: "image" }
// }
// },
// required: ["photo"]
// }
The x-blazen-content-ref extension is invisible to providers — they see a plain string parameter — but Blazen’s resolver uses it as a marker for handle substitution.
How resolution works
When the model emits a tool call like {"photo": "blazen_xxx"}, Blazen scans the tool’s parameter schema for x-blazen-content-ref markers. For each marked field, it looks up the handle id in the active ContentStore and replaces the bare string with a typed object before invoking your handler:
{
kind: "image",
handleId: "blazen_xxx",
mimeType: "image/png",
byteSize: 24576,
displayName: "diagram.png",
source: { /* resolved MediaSource -- url, base64, or provider-native ref */ }
}
source is the same wire form ContentStore.resolve() returns, so your handler can forward it straight into a downstream provider request, or call store.fetchBytes(handle) if it needs the raw payload. If the handle is unknown to the store the call fails before the handler runs.
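The substitution step can be modelled in a few lines. This is a simplified sketch, not Blazen's resolver: it scans only top-level properties, uses a synchronous Map as a stand-in for the ContentStore, and resolveContentRefs plus the local types are hypothetical.

```typescript
interface ResolvedContent {
  kind: string;
  handleId: string;
  mimeType?: string;
  byteSize?: number;
  displayName?: string;
  source: unknown;
}

interface ParamSchema {
  type: "object";
  properties: Record<
    string,
    { type: string; description?: string; "x-blazen-content-ref"?: { kind: string } }
  >;
  required?: string[];
}

// Replaces handle-id strings in marked fields with the resolved typed object.
function resolveContentRefs(
  schema: ParamSchema,
  args: Record<string, unknown>,
  store: Map<string, ResolvedContent>, // stand-in for a real ContentStore lookup
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...args };
  for (const [name, prop] of Object.entries(schema.properties)) {
    if (!prop["x-blazen-content-ref"]) continue; // unmarked fields pass through untouched
    const id = args[name];
    if (typeof id !== "string") continue; // nothing to substitute
    const resolved = store.get(id);
    // Unknown handles fail before the tool handler would run.
    if (!resolved) throw new Error(`Unknown content handle: ${id}`);
    out[name] = resolved;
  }
  return out;
}
```

The handler then receives the typed object in place of the bare string, while fields without the marker arrive exactly as the model emitted them.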
TokenUsage
Token usage statistics for a completion request.
| Property | Type | Description |
|---|---|---|
| .promptTokens | number | Tokens in the prompt |
| .completionTokens | number | Tokens in the completion |
| .totalTokens | number | Total tokens used |
RequestTiming
Timing metadata for a completion request.
| Property | Type | Description |
|---|---|---|
| .queueMs | number \| undefined | Time spent waiting in queue (ms) |
| .executionMs | number \| undefined | Time spent executing (ms) |
| .totalMs | number \| undefined | Total wall-clock time (ms) |
runAgent
Run an agentic tool execution loop. The agent repeatedly calls the model, executes tool calls via the handler callback, feeds results back, and repeats until the model stops calling tools or maxIterations is reached.
const result = await runAgent(model, messages, tools, toolHandler, options?);
Parameters
| Parameter | Type | Description |
|---|---|---|
| model | CompletionModel | The completion model to use |
| messages | ChatMessage[] | Initial conversation messages |
| tools | ToolDef[] | Tool definitions the agent can invoke |
| toolHandler | (toolName: string, args: object) => Promise<any \| ToolOutput> | Callback that executes tool calls. May return a bare value (auto-wrapped) or an explicit ToolOutput. |
| options | AgentRunOptions? | Optional configuration |
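The control flow of the loop (call the model, run any requested tools, feed results back, stop when no tools are requested or the iteration cap is hit) can be sketched with stand-ins. This is not the library's implementation: runLoop and its types are hypothetical, the model and handler are synchronous only to keep the sketch short (the real ones are async), and how tool results are serialized into the history is an assumption.

```typescript
interface ToolCallLike { id: string; name: string; arguments: object; }
interface TurnLike { content?: string; toolCalls: ToolCallLike[]; }
type Msg = { role: string; content: string };

function runLoop(
  model: (history: Msg[]) => TurnLike, // stand-in for a completion call
  messages: Msg[],
  handler: (name: string, args: object) => unknown,
  maxIterations = 10,
) {
  const history = [...messages];
  let iterations = 0;
  let turn = model(history);
  // Keep going while the model asks for tools and the cap is not reached.
  while (turn.toolCalls.length > 0 && iterations < maxIterations) {
    iterations++;
    history.push({ role: "assistant", content: turn.content ?? "" });
    for (const call of turn.toolCalls) {
      const result = handler(call.name, call.arguments);
      // Assumption: non-string results are JSON-stringified for the history.
      const content = typeof result === "string" ? result : JSON.stringify(result);
      history.push({ role: "tool", content });
    }
    turn = model(history);
  }
  history.push({ role: "assistant", content: turn.content ?? "" });
  return { response: turn.content ?? "", messages: history, iterations };
}
```

With a scripted model that requests one tool call and then answers, this loop reports one iteration and a four-message history (user, assistant tool request, tool result, final answer).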
Example
import { CompletionModel, ChatMessage, runAgent } from "blazen";
const model = CompletionModel.openai();
const result = await runAgent(
model,
[ChatMessage.user("What is the weather in NYC?")],
[{
name: "getWeather",
description: "Get weather for a city",
parameters: {
type: "object",
properties: { city: { type: "string" } },
required: ["city"]
}
}],
async (toolName, args) => {
if (toolName === "getWeather") {
return { temp: 72, condition: "sunny" };
}
throw new Error(`Unknown tool: ${toolName}`);
},
{ maxIterations: 5 }
);
console.log(result.response); // final assistant text (AgentResult.response is a string)
console.log(`Took ${result.iterations} iterations`);
Tool handler return shapes
The handler’s return value is checked for an explicit data key. If present and the value deserializes as a ToolOutput, it’s used directly. Otherwise the bare value is wrapped via ToolOutput { data: <value>, llmOverride: undefined }.
This means an arbitrary user object like { items: [1, 2, 3] } is treated as plain data, not as a ToolOutput. Only objects with a top-level data field are unpacked.
import { runAgent, type ToolDef } from "blazen";
// Simplest: return a value directly, auto-wrapped.
const search: ToolDef = {
name: "search",
description: "Search for items.",
parameters: { type: "object", properties: { q: { type: "string" } } },
};
async function handlerSimple(toolName: string, args: any) {
if (toolName === "search") {
return { items: [1, 2, 3] }; // wrapped as ToolOutput { data: { items: [1,2,3] } }
}
throw new Error(`Unknown tool: ${toolName}`);
}
// With override: structured ToolOutput so the LLM sees a summary,
// but the caller's `messages[messages.length-1].toolResult.data`
// still has the full list.
async function handlerWithOverride(toolName: string, args: any) {
if (toolName === "search") {
return {
data: { items: [1, 2, 3], rawResponse: "..." },
llmOverride: { kind: "text", text: "Found 3 items." },
};
}
throw new Error(`Unknown tool: ${toolName}`);
}
To return a string to the caller (so it lives on ChatMessage.content and toolResult is null), simply return a string from the handler:
async function handlerString(toolName: string, args: any) {
return "ok"; // appears as ChatMessage.content; toolResult is null
}
See ToolOutput and LlmPayload for the full shape and per-provider wire behaviour.
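The return-shape rules in this section can be summarized as a normalization function. It is a sketch of the documented behaviour, with assumptions: normalizeToolReturn is a hypothetical name, the check that an object with a data key actually deserializes as a ToolOutput is skipped, and content is shown as null for structured results.

```typescript
interface ToolOutputLike {
  data: unknown;
  llmOverride?: unknown;
}
type Normalized =
  | { content: string; toolResult: null }
  | { content: null; toolResult: ToolOutputLike };

function normalizeToolReturn(value: unknown): Normalized {
  // Plain strings live on the message content; toolResult stays null.
  if (typeof value === "string") return { content: value, toolResult: null };
  // Objects with a top-level `data` key are treated as an explicit ToolOutput.
  if (typeof value === "object" && value !== null && "data" in value) {
    const obj = value as Record<string, unknown>;
    return {
      content: null,
      // Both camelCase and snake_case spellings of the override are accepted.
      toolResult: { data: obj.data, llmOverride: obj.llmOverride ?? obj.llm_override },
    };
  }
  // Anything else is wrapped as plain data with no override.
  return { content: null, toolResult: { data: value, llmOverride: undefined } };
}
```

Note that { items: [1, 2, 3] } takes the last branch, so it ends up as toolResult.data unchanged rather than being unpacked.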
ToolDef
Describes a tool that the agent may invoke.
interface ToolDef {
name: string; // Unique tool name
description: string; // Human-readable description
parameters: object; // JSON Schema for the tool's parameters
}
AgentRunOptions
Options for configuring an agent run.
interface AgentRunOptions {
maxIterations?: number; // Max tool-calling iterations (default: 10)
systemPrompt?: string; // System prompt prepended to the conversation
temperature?: number; // Sampling temperature (0.0 - 2.0)
maxTokens?: number; // Maximum tokens per completion call
addFinishTool?: boolean; // Add a built-in "finish" tool the model can call to signal completion
}
AgentResult
The result of an agent run, returned by runAgent(). AgentResult is a typed class with getter properties (not a plain object) so it carries identity across the FFI boundary and supports instanceof checks.
Properties
| Property | Type | Description |
|---|---|---|
| .response | string | Final assistant text from the last completion. |
| .messages | ChatMessage[] | Full message history (all tool calls and results). |
| .iterations | number | Number of tool-calling iterations that occurred. |
| .totalCost | number \| null | Aggregated cost in USD across all iterations, or null if pricing is unknown. |
| .toString() | string | Human-readable summary, mirrors the Python __repr__. |
const result = await runAgent(model, [ChatMessage.user("Hi")], tools, toolHandler);
console.log(result.response); // "Hello!"
console.log(result.messages.length); // e.g. 3
console.log(result.iterations); // e.g. 1
console.log(result.totalCost); // e.g. 0.00012 or null
BatchResult
Returned by completeBatch() / completeBatchConfig(). A typed class wrapping per-request outcomes plus aggregates.
Properties
| Property | Type | Description |
|---|---|---|
| .responses | (CompletionResponse \| null)[] | One entry per input request. null for failed requests. |
| .errors | (string \| null)[] | Per-request error message, or null for successful requests. |
| .totalUsage | TokenUsage \| null | Aggregated token usage across all successful responses. |
| .totalCost | number \| null | Aggregated cost in USD across all successful responses. |
| .successCount | number | Number of requests that succeeded. |
| .failureCount | number | Number of requests that failed. |
| .length | number | Total number of requests in the batch (= responses.length = errors.length). |
| .toString() | string | Human-readable batch summary. |
const batch = await completeBatch(model, [
[ChatMessage.user("What's 2+2?")],
[ChatMessage.user("Capital of France?")],
]);
console.log(`${batch.successCount}/${batch.length} succeeded`);
for (let i = 0; i < batch.length; i++) {
if (batch.errors[i]) console.error(`req ${i}:`, batch.errors[i]);
else console.log(`req ${i}:`, batch.responses[i]?.content);
}
Pipeline
A Pipeline is a sequence of named Stages built with PipelineBuilder. Each stage runs as its own workflow; on completion, an optional persist callback fires with a typed snapshot so the caller can durably store progress for later resumption.
new PipelineBuilder(name: string)
import { PipelineBuilder } from "blazen";
const pipeline = new PipelineBuilder("ingest")
.stage(stageA)
.stage(stageB)
.timeoutPerStage(120)
.build();
.onPersist(callback)
Register a TSFN-based persist callback that receives a typed PipelineSnapshot after each stage completes. The callback must return Promise<void> (or be async). A rejected promise aborts the pipeline with a PipelineError.
import { PipelineBuilder, PipelineSnapshot } from "blazen";
const pipeline = new PipelineBuilder("ingest")
.stage(stage)
.onPersist(async (snapshot: PipelineSnapshot) => {
await db.put(`pipeline:${snapshot.runId}`, snapshot.toJsonPretty());
})
.build();
.onPersistJson(callback)
Same as onPersist, but the callback receives the snapshot pre-serialized as a JSON string. Useful for backends that store opaque blobs (IndexedDB, Redis, S3).
const pipeline = new PipelineBuilder("ingest")
.stage(stage)
.onPersistJson(async (json: string) => {
await idb.put("snapshots", json, runId); // runId: a key your application already tracks for this run
})
.build();
The snapshot can later be replayed via pipeline.resume(PipelineSnapshot.fromJson(json)).
If your onPersist (or onPersistJson) callback throws or returns a rejected promise, the rejection is wrapped as a PersistError (a BlazenError subclass) and propagated to the running pipeline, aborting it. Catch the error from await handler.result() and inspect with instanceof PersistError to distinguish persistence failures from stage failures.
import { PersistError } from "blazen";
try {
await handler.result();
} catch (e) {
if (e instanceof PersistError) {
console.error("snapshot persistence failed:", e.message);
}
}
Workflow
new Workflow(name: string)
Create a new workflow instance. Default timeout is 300 seconds (5 minutes).
const wf = new Workflow("my-workflow");
.addStep(name: string, eventTypes: string[], handler: StepHandler)
Register a step that listens for one or more event types.
wf.addStep("process", ["MyEvent"], async (event, ctx) => {
return { type: "blazen::StopEvent", result: { done: true } };
});
.setTimeout(seconds: number)
Set the maximum execution time for the workflow in seconds. Set to 0 or negative to disable.
wf.setTimeout(30);
await wf.run(input: object): WorkflowResult
Run the workflow to completion with the given input.
const result = await wf.run({ prompt: "Hello" });
await wf.runStreaming(input: object, callback: (event) => void): WorkflowResult
Run the workflow with a streaming callback invoked for each event published via ctx.writeEventToStream().
const result = await wf.runStreaming({ prompt: "Hello" }, (event) => {
console.log("stream:", event);
});
await wf.runWithHandler(input: object): WorkflowHandler
Run the workflow and return a handler for pause/resume and streaming control.
const handler = await wf.runWithHandler({ prompt: "Hello" });
await wf.resume(snapshotJson: string): WorkflowHandler
Resume a previously paused workflow from a snapshot JSON string.
const snapshot = fs.readFileSync("snapshot.json", "utf-8");
const handler = await wf.resume(snapshot);
const result = await handler.result();
WorkflowResult
interface WorkflowResult {
type: string; // Event type of the final result (e.g. "blazen::StopEvent")
data: object; // Result data extracted from the StopEvent's result field
}
StepHandler
async (event: object, ctx: Context) => object | object[] | null
A step handler receives an event and a context. It can return:
- A single event object to emit one event.
- An array of event objects to fan-out multiple events.
- null for side-effect-only steps that emit no events.
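The three return shapes can be folded into one list before any routing happens. The sketch below only illustrates that idea; `WorkflowEvent`, `StepOutput`, and `normalizeStepOutput` are hypothetical names, not part of the blazen API.

```typescript
// Hypothetical event shape: a plain object with a string `type` field.
type WorkflowEvent = { type: string; [key: string]: unknown };

// What a step handler may return, per the list above.
type StepOutput = WorkflowEvent | WorkflowEvent[] | null;

// Normalize any of the three return shapes into an array of events.
function normalizeStepOutput(output: StepOutput): WorkflowEvent[] {
  if (output === null) return [];           // side-effect-only step
  if (Array.isArray(output)) return output; // fan-out to several events
  return [output];                          // single event
}
```

A dispatcher written this way can treat every step uniformly: it iterates over the returned array and routes each event by its `type`.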
WorkflowHandler
Returned by Workflow.runWithHandler() and Workflow.resume(). Provides control over a running workflow.
Important: result() consumes the handler internally — you can only call it once. The other control methods (pause, resumeInPlace, abort, respondToInput, snapshot) borrow the handler and can be called multiple times.
await handler.result(): WorkflowResult
Await the final workflow result.
await handler.pause(): void
Signal the running workflow to pause. After pausing, use snapshot() to get a serializable snapshot, or resumeInPlace() to continue execution.
await handler.snapshot(): string
Get a serializable snapshot of the paused workflow as a JSON string. Save this to resume later with Workflow.resume().
await handler.resumeInPlace(): void
Resume a paused workflow in place without creating a new handler.
await handler.streamEvents(callback: (event) => void): void
Subscribe to intermediate events published via ctx.writeEventToStream(). Must be called before result() or pause().
const handler = await wf.runWithHandler({ prompt: "Hello" });
// Subscribe to stream events
await handler.streamEvents((event) => console.log(event));
// Then await the result
const result = await handler.result();
Events
Events are plain objects with a type field.
{ type: "MyEvent", payload: "data" }
Start Event
{ type: "blazen::StartEvent", ...input }
The workflow begins by emitting a StartEvent containing the input data.
Stop Event
{ type: "blazen::StopEvent", result: { ... } }
Returning a StopEvent from a step handler completes the workflow.
Context
Shared workflow context accessible by all steps. All methods are async.
StateValue
All context values conform to the StateValue type:
type StateValue = string | number | boolean | null | Buffer | StateValue[] | { [key: string]: StateValue };
await ctx.set(key: string, value: Exclude<StateValue, Buffer>): void
Store a JSON-serializable value in the workflow context. Accepts strings, numbers, booleans, null, arrays, and nested objects. For binary data, use ctx.setBytes() instead.
The legacy ctx.set / ctx.get shortcuts still work and route values through the same 4-tier dispatch. For new code, prefer the explicit ctx.state / ctx.session namespaces documented below.
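Because ctx.set accepts only the JSON-compatible part of StateValue, it can help to check a value before storing it. The guard below is a minimal sketch under stated assumptions: `JsonStateValue` and `isJsonStateValue` are hypothetical helpers, and rejecting non-finite numbers and class instances is a choice made here because JSON cannot represent them, not documented blazen behavior.

```typescript
// JSON-compatible subset of StateValue (everything except binary data).
type JsonStateValue =
  | string
  | number
  | boolean
  | null
  | JsonStateValue[]
  | { [key: string]: JsonStateValue };

// Returns true when `value` looks safe to pass to ctx.set / ctx.state.set.
function isJsonStateValue(value: unknown): value is JsonStateValue {
  if (value === null) return true;
  const t = typeof value;
  if (t === "string" || t === "boolean") return true;
  if (t === "number") return Number.isFinite(value as number);
  // Node's Buffer is a Uint8Array subclass, so this rejects both.
  if (value instanceof Uint8Array) return false;
  if (Array.isArray(value)) return value.every(isJsonStateValue);
  if (t === "object") {
    // Accept plain objects only; Date, Map, and class instances are rejected.
    const proto = Object.getPrototypeOf(value);
    if (proto !== Object.prototype && proto !== null) return false;
    return Object.values(value as object).every(isJsonStateValue);
  }
  return false; // undefined, functions, symbols, bigint
}
```

Values that fail the guard because they are binary belong in ctx.setBytes() instead.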
await ctx.get(key: string): Promise<StateValue | null>
Retrieve a value from the workflow context. Returns null if not found. Returns data for all StateValue variants — strings, numbers, booleans, arrays, objects, and Buffer (if the key was stored via setBytes). No data is silently dropped.
await ctx.setBytes(key: string, buffer: Buffer): void
Store raw binary data in the workflow context. Use this for explicit binary storage (e.g., MessagePack, protobuf, raw buffers). Binary data persists through pause/resume/checkpoint.
await ctx.getBytes(key: string): Buffer | null
Retrieve raw binary data from the workflow context. Returns null if not found. Note that ctx.get() also returns binary data now, so getBytes is mainly useful when you want to assert that a key holds binary content.
await ctx.runId(): string
Get the unique run UUID for the current workflow execution.
await ctx.sendEvent(event: object): void
Manually route an event into the workflow event bus. The event will be delivered to any step whose eventTypes list includes its type.
await ctx.writeEventToStream(event: object): void
Publish an event to the external broadcast stream. Consumers that subscribed via runStreaming or handler.streamEvents() will receive this event. Unlike sendEvent, this does not route the event through the internal step registry.
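The difference between the two delivery paths can be modeled with a toy bus. This is an illustrative sketch only: `SketchBus` and its members are hypothetical and say nothing about how blazen implements routing internally.

```typescript
type WorkflowEvent = { type: string; [key: string]: unknown };
type Listener = (event: WorkflowEvent) => void;

// Toy model of the two delivery paths described above.
class SketchBus {
  private steps: { eventTypes: string[]; run: Listener }[] = [];
  private subscribers: Listener[] = [];

  addStep(eventTypes: string[], run: Listener): void {
    this.steps.push({ eventTypes, run });
  }

  subscribe(listener: Listener): void {
    this.subscribers.push(listener);
  }

  // Internal routing: only steps listening for this type see the event.
  sendEvent(event: WorkflowEvent): void {
    for (const step of this.steps) {
      if (step.eventTypes.includes(event.type)) step.run(event);
    }
  }

  // External broadcast: steps are bypassed, stream subscribers see the event.
  writeEventToStream(event: WorkflowEvent): void {
    for (const listener of this.subscribers) listener(event);
  }
}
```

In this model a progress update written to the stream never triggers a step, and an event sent to the bus never reaches stream consumers.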
get state(): StateNamespace
Persistable workflow state. Survives pause() / resume(), checkpoints, and durable storage. See StateNamespace below.
get session(): SessionNamespace
In-process-only values, excluded from snapshots. Use this for things that should not survive pause() / resume(). JS object identity is NOT preserved on Node — see the SessionNamespace caveat below.
StateNamespace
Namespace for persistable workflow state. Values stored via state.set / state.setBytes go into the underlying ContextInner.state map and survive snapshots, pause() / resume(), and checkpoint stores.
await state.set(key: string, value: Exclude<StateValue, Buffer>): Promise<void>
Store a JSON-serializable value under the given key.
await state.get(key: string): Promise<StateValue | null>
Retrieve a value previously stored under the given key. Returns null if not found.
await state.setBytes(key: string, data: Buffer): Promise<void>
Store raw binary data under the given key.
await state.getBytes(key: string): Promise<Buffer | null>
Retrieve raw binary data previously stored under the given key. Returns null if not found.
workflow.addStep("step", ["blazen::StartEvent"], async (event, ctx) => {
await ctx.state.set("counter", 5);
const count = await ctx.state.get("counter");
return { type: "blazen::StopEvent", result: { count } };
});
SessionNamespace
Namespace for in-process-only workflow values. Values stored via session.set are kept in the ContextInner.objects side-channel and are excluded from snapshots. Use this for state that should not survive a pause() / resume() round-trip (request IDs, rate-limit counters, ephemeral caches, …).
Important — napi-rs identity caveat: JS object identity is NOT preserved through ctx.session on the Node bindings. Values are routed through serde_json::Value because napi-rs’s Reference<T> is !Send — its Drop must run on the v8 main thread, and tokio worker threads cannot safely cross the napi boundary with live JS object references. await ctx.session.get(key) returns a plain object equal to the one you passed in, not the same object instance. For true JS class identity preservation, use the Python or WASM bindings, or keep the work inside a single Rust step. Full identity through events is tracked as a follow-up architectural refactor.
The session namespace is still functionally distinct from state: session values are excluded from snapshots, state values are not.
await session.set(key: string, value: unknown): Promise<void>
Store a JSON-serializable value under the given key. The value is excluded from snapshots.
await session.get(key: string): Promise<unknown>
Retrieve a value previously stored under the given key. Returns null if the key does not exist.
await session.has(key: string): Promise<boolean>
Check whether a value exists under the given key.
await session.remove(key: string): Promise<void>
Remove the value stored under the given key.
workflow.addStep("step", ["blazen::StartEvent"], async (event, ctx) => {
await ctx.session.set("reqId", "abc123");
if (await ctx.session.has("reqId")) {
const id = await ctx.session.get("reqId");
console.log("request id:", id);
}
return { type: "blazen::StopEvent", result: {} };
});
state and session use independent keyspaces — the same key can exist in both namespaces without colliding:
await ctx.state.set("k", "state-value");
await ctx.session.set("k", "session-value");
// Both are accessible; they don't collide.
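The behavior described in this section (independent keyspaces, session values left out of snapshots, and equal-but-not-identical objects coming back from session.get) can be reproduced with a small stand-in. `SketchContext` is a hypothetical class written for illustration; the JSON round-trip merely imitates values crossing a serialization boundary.

```typescript
// Toy context: two independent maps, and only `state` is snapshotted.
class SketchContext {
  private stateMap = new Map<string, unknown>();
  private sessionMap = new Map<string, unknown>();

  // A JSON round-trip yields an equal value but never the same instance.
  private static roundTrip(value: unknown): unknown {
    return JSON.parse(JSON.stringify(value));
  }

  setState(key: string, value: unknown): void {
    this.stateMap.set(key, SketchContext.roundTrip(value));
  }
  getState(key: string): unknown {
    return this.stateMap.get(key) ?? null;
  }

  setSession(key: string, value: unknown): void {
    this.sessionMap.set(key, SketchContext.roundTrip(value));
  }
  getSession(key: string): unknown {
    return this.sessionMap.get(key) ?? null;
  }

  // Only state entries are serialized; session entries are left out.
  snapshot(): string {
    return JSON.stringify(Object.fromEntries(this.stateMap));
  }
}
```

Storing `"k"` in both namespaces and then calling `snapshot()` on this stand-in produces a JSON object that contains only the state value.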
BlazenState
Base class for typed state objects with per-field context storage. Extend this class to define structured workflow state that is automatically serialized and deserialized field by field.
static meta?: BlazenStateMeta
Optional static metadata that controls how fields are stored and which fields are transient.
interface BlazenStateMeta {
transient?: Set<string> | string[];
}
| Field | Type | Description |
|---|---|---|
transient | Set<string> | string[] | Field names to exclude from persistence. These fields are not saved by saveTo() and will not be present after loadFrom(). |
restore?(): void | Promise<void>
Optional instance method called by loadFrom() after all persisted fields have been restored. Use this to recreate transient fields (caches, connections, derived data) that were excluded from persistence.
await state.saveTo(ctx: Context, key: string): Promise<void>
Persist every non-transient field of the state instance into the workflow context. Each field is stored individually under a namespaced key derived from key.
static loadFrom<T>(ctx: Context, key: string): Promise<T>
Restore a state instance from the workflow context. Reads each persisted field, constructs a new instance, and calls restore() if defined.
const state = await AgentState.loadFrom<AgentState>(ctx, "state");
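Per-field persistence with transient exclusion can be pictured as a filter over the instance's own fields. The sketch below is an assumption-laden illustration: `persistableEntries` is a hypothetical helper, and the `<key>.<field>` naming is invented here because the actual namespaced key format is not specified above.

```typescript
// Collect the entries a saveTo()-style call would write, skipping transient fields.
// The "<key>.<field>" naming scheme is an assumption made for this sketch.
function persistableEntries(
  instance: Record<string, unknown>,
  key: string,
  transient: Set<string> | string[] = [],
): [string, unknown][] {
  const skip = new Set(transient);
  return Object.entries(instance)
    .filter(([field]) => !skip.has(field))
    .map(([field, value]): [string, unknown] => [`${key}.${field}`, value]);
}

// Example: `cache` is transient, so only `messages` would be persisted.
const entries = persistableEntries(
  { messages: ["hi"], cache: new Map() },
  "state",
  ["cache"],
);
```

A loadFrom()-style restore would read the same keys back and then call restore() to rebuild whatever was skipped.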
Compute Request Types
Typed request interfaces for compute operations (image generation, video, speech, music, transcription, 3D models).
ImageRequest
interface ImageRequest {
prompt: string; // Text prompt describing the desired image
negativePrompt?: string; // Things to avoid in the image
width?: number; // Desired image width in pixels
height?: number; // Desired image height in pixels
numImages?: number; // Number of images to generate
model?: string; // Model override (provider-specific)
parameters?: object; // Additional provider-specific parameters
}
UpscaleRequest
interface UpscaleRequest {
imageUrl: string; // URL of the image to upscale
scale: number; // Scale factor (e.g. 2.0, 4.0)
model?: string;
parameters?: object;
}
VideoRequest
interface VideoRequest {
prompt: string; // Text prompt describing the desired video
imageUrl?: string; // Source image URL for image-to-video
durationSeconds?: number; // Desired duration in seconds
negativePrompt?: string; // Things to avoid
width?: number; // Video width in pixels
height?: number; // Video height in pixels
model?: string;
parameters?: object;
}
SpeechRequest
interface SpeechRequest {
text: string; // Text to synthesize into speech
voice?: string; // Voice identifier (provider-specific)
voiceUrl?: string; // Reference voice sample URL for cloning
language?: string; // Language code (e.g. "en", "fr", "ja")
speed?: number; // Speech speed multiplier (1.0 = normal)
model?: string;
parameters?: object;
}
MusicRequest
interface MusicRequest {
prompt: string; // Text prompt describing the desired audio
durationSeconds?: number; // Desired duration in seconds
model?: string;
parameters?: object;
}
TranscriptionRequest
interface TranscriptionRequest {
audioUrl: string; // URL of the audio file to transcribe
language?: string; // Language hint (e.g. "en", "fr")
diarize?: boolean; // Whether to perform speaker diarization
model?: string;
parameters?: object;
}
ThreeDRequest
interface ThreeDRequest {
prompt?: string; // Text prompt describing the desired 3D model
imageUrl?: string; // Source image URL for image-to-3D
format?: string; // Output format (e.g. "glb", "obj", "usdz")
model?: string;
parameters?: object;
}
Compute Result Types
ImageResult
interface ImageResult {
images: GeneratedImage[]; // Generated or upscaled images
timing?: ComputeTiming; // Request timing breakdown
cost?: number; // Cost in USD
metadata: object; // Provider-specific metadata
}
VideoResult
interface VideoResult {
videos: GeneratedVideo[];
timing?: ComputeTiming;
cost?: number;
metadata: object;
}
AudioResult
interface AudioResult {
audio: GeneratedAudio[];
timing?: ComputeTiming;
cost?: number;
metadata: object;
}
TranscriptionResult
interface TranscriptionResult {
text: string; // Full transcribed text
segments: TranscriptionSegment[]; // Time-aligned segments
language?: string; // Detected or specified language code
timing?: ComputeTiming;
cost?: number;
metadata: object;
}
TranscriptionSegment
interface TranscriptionSegment {
text: string; // Transcribed text for this segment
start: number; // Start time in seconds
end: number; // End time in seconds
speaker?: string; // Speaker label (if diarization enabled)
}
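When diarization is enabled, consecutive segments often belong to the same speaker. The following sketch merges them into speaker turns; `toTurns` is a hypothetical helper, and the interface is repeated only so the block is self-contained.

```typescript
interface TranscriptionSegment {
  text: string;
  start: number;
  end: number;
  speaker?: string;
}

// Merge consecutive segments from the same speaker into one turn.
// Without diarization every speaker is undefined, so all segments merge.
function toTurns(segments: TranscriptionSegment[]): TranscriptionSegment[] {
  const turns: TranscriptionSegment[] = [];
  for (const seg of segments) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === seg.speaker) {
      last.text = `${last.text} ${seg.text}`;
      last.end = seg.end;
    } else {
      turns.push({ ...seg }); // copy so the input is not mutated
    }
  }
  return turns;
}
```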
ThreeDResult
interface ThreeDResult {
models: Generated3DModel[];
timing?: ComputeTiming;
cost?: number;
metadata: object;
}
Compute Job Types
Low-level types for generic compute jobs.
ComputeRequest
interface ComputeRequest {
model: string; // Model/endpoint to run (e.g. "fal-ai/flux/dev")
input: object; // Input parameters (model-specific)
webhook?: string; // Webhook URL for async completion notification
}
JobHandle
interface JobHandle {
id: string; // Provider-assigned job identifier
provider: string; // Provider name (e.g. "fal", "replicate", "runpod")
model: string; // Model/endpoint that was invoked
submittedAt: string; // ISO 8601 timestamp
}
JobStatus
String enum for compute job status.
JobStatus.Queued // "queued"
JobStatus.Running // "running"
JobStatus.Completed // "completed"
JobStatus.Failed // "failed"
JobStatus.Cancelled // "cancelled"
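A common use of these values is deciding when to stop polling. The sketch below assumes that "completed", "failed", and "cancelled" are the terminal states; `isTerminal`, `waitForJob`, and the caller-supplied `getStatus` hook are hypothetical and do not correspond to a documented blazen call.

```typescript
// String values taken from the JobStatus listing above.
type JobStatusValue = "queued" | "running" | "completed" | "failed" | "cancelled";

// A job is terminal once it can no longer change state (assumption of this sketch).
function isTerminal(status: JobStatusValue): boolean {
  return status === "completed" || status === "failed" || status === "cancelled";
}

// Poll a caller-supplied status function until the job reaches a terminal state.
async function waitForJob(
  getStatus: () => Promise<JobStatusValue>,
  intervalMs = 1000,
  maxPolls = 60,
): Promise<JobStatusValue> {
  for (let i = 0; i < maxPolls; i++) {
    const status = await getStatus();
    if (isTerminal(status)) return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("job did not finish within the polling budget");
}
```

Where a provider supports it, the webhook field on ComputeRequest avoids polling altogether.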
ComputeResult
interface ComputeResult {
job?: JobHandle; // Job handle that produced this result
output: object; // Output data (model-specific)
timing?: ComputeTiming; // Request timing breakdown
cost?: number; // Cost in USD
metadata: object; // Raw provider-specific metadata
}
ComputeTiming
interface ComputeTiming {
queueMs?: number; // Time spent waiting in queue (ms)
executionMs?: number; // Time spent executing (ms)
totalMs?: number; // Total wall-clock time (ms)
}
Media Output Types
MediaOutput
A single piece of generated media content.
interface MediaOutput {
url?: string; // URL where the media can be downloaded
base64?: string; // Base64-encoded media data
rawContent?: string; // Raw text content (SVG, OBJ, GLTF JSON)
mediaType: string; // MIME type (e.g. "image/png", "video/mp4")
fileSize?: number; // File size in bytes
metadata: object; // Provider-specific metadata
}
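Since url, base64, and rawContent are all optional, consumers usually need a small resolver. This is a minimal sketch; `mediaToSrc` is a hypothetical helper, and the preference order (url, then base64, then rawContent) is a choice made here rather than documented behavior.

```typescript
interface MediaOutput {
  url?: string;
  base64?: string;
  rawContent?: string;
  mediaType: string;
  fileSize?: number;
  metadata: object;
}

// Return a string usable as a link or data URL, or null if no payload is present.
function mediaToSrc(media: MediaOutput): string | null {
  if (media.url) return media.url;
  if (media.base64) return `data:${media.mediaType};base64,${media.base64}`;
  if (media.rawContent !== undefined) {
    // Text payloads (SVG, OBJ, ...) are percent-encoded into a data URL.
    return `data:${media.mediaType},${encodeURIComponent(media.rawContent)}`;
  }
  return null;
}
```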
GeneratedImage
interface GeneratedImage {
media: MediaOutput;
width?: number; // Image width in pixels
height?: number; // Image height in pixels
}
GeneratedVideo
interface GeneratedVideo {
media: MediaOutput;
width?: number; // Video width in pixels
height?: number; // Video height in pixels
durationSeconds?: number; // Duration in seconds
fps?: number; // Frames per second
}
GeneratedAudio
interface GeneratedAudio {
media: MediaOutput;
durationSeconds?: number; // Duration in seconds
sampleRate?: number; // Sample rate in Hz
channels?: number; // Number of audio channels
}
Generated3DModel
interface Generated3DModel {
media: MediaOutput;
vertexCount?: number; // Total vertex count
faceCount?: number; // Total face/triangle count
hasTextures: boolean; // Whether the model includes textures
hasAnimations: boolean; // Whether the model includes animations
}
EmbeddingModel
Generate vector embeddings from text. Created via static factory methods. Keys are read from environment variables (OPENAI_API_KEY, TOGETHER_API_KEY, etc.) when options is omitted, or can be passed explicitly via { apiKey: "..." }.
import { EmbeddingModel } from "blazen";
const model = EmbeddingModel.openai();
const together = EmbeddingModel.together();
const cohere = EmbeddingModel.cohere({ apiKey: "co-..." });
const fireworks = EmbeddingModel.fireworks();
Provider Factory Methods
| Method | Default Model | Default Dimensions |
|---|---|---|
EmbeddingModel.openai(options?) | text-embedding-3-small | 1536 |
EmbeddingModel.together(options?) | togethercomputer/m2-bert-80M-8k-retrieval | 768 |
EmbeddingModel.cohere(options?) | embed-v4.0 | 1024 |
EmbeddingModel.fireworks(options?) | nomic-ai/nomic-embed-text-v1.5 | 768 |
Properties
| Property | Type | Description |
|---|---|---|
.modelId | string | The model identifier. |
.dimensions | number | Output vector dimensionality. |
await model.embed(texts: string[]): EmbeddingResponse
Embed one or more texts, returning one vector per input.
const response = await model.embed(["Hello", "World"]);
console.log(response.embeddings.length); // 2
console.log(response.embeddings[0].length); // 1536
EmbeddingResponse
Returned by EmbeddingModel.embed().
interface EmbeddingResponse {
embeddings: number[][]; // One vector per input text
model: string; // Model that produced the embeddings
usage?: TokenUsage; // Token usage statistics
cost?: number; // Estimated cost in USD
timing?: RequestTiming; // Request timing breakdown
metadata: object; // Provider-specific metadata
}
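The vectors in `embeddings` are typically compared with cosine similarity. The helper below is a generic sketch, not a blazen export; `cosineSimilarity` is a hypothetical name, and returning 0 for a zero-length vector is a convention chosen here.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // convention for zero vectors
  return dot / Math.sqrt(normA * normB);
}
```

With a response from `model.embed(["Hello", "World"])`, the two inputs could be compared as `cosineSimilarity(response.embeddings[0], response.embeddings[1])`.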
Token Estimation
Lightweight token counting functions. Uses a heuristic (~3.5 characters per token) suitable for budget checks without external data files.
estimateTokens(text: string, contextSize?: number): number
Estimate token count for a text string.
import { estimateTokens } from "blazen";
const count = estimateTokens("Hello, world!"); // 4
countMessageTokens(messages: ChatMessage[], contextSize?: number): number
Estimate total tokens for an array of chat messages, including per-message overhead.
import { countMessageTokens, ChatMessage } from "blazen";
const count = countMessageTokens([
ChatMessage.system("You are helpful."),
ChatMessage.user("Hello!"),
]);
contextSize defaults to 128000 if omitted.
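The heuristic itself is simple enough to sketch. The code below is an approximation under stated assumptions, not the library's implementation: rounding up is assumed (it matches the `// 4` result shown for "Hello, world!"), and the per-message overhead of 4 tokens is a hypothetical value, as are all function names.

```typescript
const CHARS_PER_TOKEN = 3.5;    // heuristic ratio stated above
const PER_MESSAGE_OVERHEAD = 4; // hypothetical value, not taken from blazen

// Rough token count for a single string, rounded up (assumption).
function roughTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Rough token count for a list of messages, adding a fixed overhead per message.
function roughMessageTokens(messages: { role: string; content: string }[]): number {
  return messages.reduce(
    (sum, m) => sum + roughTokens(m.content) + PER_MESSAGE_OVERHEAD,
    0,
  );
}

// Budget check: do the messages plus a reserved reply size fit in the context window?
function fitsContext(
  messages: { role: string; content: string }[],
  contextSize = 128000,
  reserveForReply = 1024,
): boolean {
  return roughMessageTokens(messages) + reserveForReply <= contextSize;
}
```

Estimates like these are suited to budget checks only; exact counts depend on the provider's tokenizer.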
Subclassable Providers
CompletionModel, EmbeddingModel, and Transcription can be subclassed to implement custom providers. Override the relevant methods and the framework will dispatch to your implementation.
CompletionModel
import { CompletionModel, ChatMessage } from "blazen";
class MyLLM extends CompletionModel {
constructor() {
super({ modelId: "my-llm" });
}
async complete(messages: ChatMessage[]) {
// Your inference logic here
return { content: "Hello from my custom model" };
}
async stream(messages: ChatMessage[], onChunk: (chunk: any) => void) {
onChunk({ delta: "Hello", finishReason: null, toolCalls: [] });
onChunk({ delta: null, finishReason: "stop", toolCalls: [] });
}
}
const model = new MyLLM();
const response = await model.complete([ChatMessage.user("Hi")]);
EmbeddingModel
import { EmbeddingModel } from "blazen";
class MyEmbedder extends EmbeddingModel {
constructor() {
super({ modelId: "my-embedder", dimensions: 128 });
}
async embed(texts: string[]) {
return {
embeddings: texts.map(() => new Array(128).fill(0.1)),
model: "my-embedder",
};
}
}
Transcription
import { Transcription } from "blazen";
class MyTranscriber extends Transcription {
constructor() {
super({ providerId: "my-stt" });
}
async transcribe(request: any) {
return { text: "transcribed text", segments: [] };
}
}
Per-Capability Provider Classes
Seven provider base classes let you implement a single compute capability without dealing with the full ComputeProvider interface. Subclass and override the relevant methods.
| Class | Methods to Override | Rust Trait |
|---|---|---|
TTSProvider | textToSpeech(request) | AudioGeneration |
MusicProvider | generateMusic(request), generateSfx(request) | AudioGeneration |
ImageProvider | generateImage(request), upscaleImage(request) | ImageGeneration |
VideoProvider | textToVideo(request), imageToVideo(request) | VideoGeneration |
ThreeDProvider | generate3d(request) | ThreeDGeneration |
BackgroundRemovalProvider | removeBackground(request) | BackgroundRemoval |
VoiceProvider | cloneVoice(request), listVoices(), deleteVoice(voice) | VoiceCloning |
Constructor
All provider classes share the same constructor config:
new TTSProvider({
providerId: string,
baseUrl?: string,
pricing?: ModelPricing,
memoryEstimateBytes?: number,
})
| Field | Type | Description |
|---|---|---|
providerId | string | Identifier for the provider instance. |
baseUrl | string? | Optional base URL for the provider API. |
pricing | ModelPricing? | Optional pricing info for cost tracking. |
memoryEstimateBytes | number? | Estimated memory footprint in bytes (host RAM if on CPU, GPU VRAM otherwise). Used by ModelManager to charge the appropriate pool. |
Example
import { TTSProvider } from "blazen";
class ElevenLabsTTS extends TTSProvider {
private apiKey: string;
constructor(apiKey: string) {
super({ providerId: "elevenlabs" });
this.apiKey = apiKey;
}
async textToSpeech(request: any) {
// Call ElevenLabs API with this.apiKey; audioBuffer below stands for the audio bytes it returns
return { audio: audioBuffer, format: "mp3" };
}
}
const tts = new ElevenLabsTTS("sk-...");
const result = await tts.textToSpeech({ text: "Hello world", voice: "alice" });
MemoryBackend
Base class for custom memory storage backends. Subclass to implement persistence backed by Postgres, SQLite, DynamoDB, or any other store.
import { MemoryBackend } from "blazen";
class PostgresBackend extends MemoryBackend {
async put(entry: any): Promise<void> {
// Insert or update entry in Postgres
throw new Error("not implemented");
}
async get(id: string): Promise<any | null> {
// Retrieve entry by id
throw new Error("not implemented");
}
async delete(id: string): Promise<boolean> {
// Delete entry, return true if it existed
throw new Error("not implemented");
}
async list(): Promise<any[]> {
// Return all entries
throw new Error("not implemented");
}
async len(): Promise<number> {
// Return count of entries
throw new Error("not implemented");
}
async searchByBands(bands: any, limit: number): Promise<any[]> {
// Return candidates sharing LSH bands with the query
throw new Error("not implemented");
}
}
Methods to Override
| Method | Signature | Description |
|---|---|---|
put | async put(entry): Promise<void> | Insert or update a stored entry. |
get | async get(id: string): Promise<any | null> | Retrieve a stored entry by id. |
delete | async delete(id: string): Promise<boolean> | Delete an entry by id. Returns true if it existed. |
list | async list(): Promise<any[]> | Return all stored entries. |
len | async len(): Promise<number> | Return the number of stored entries. |
searchByBands | async searchByBands(bands, limit): Promise<any[]> | Return candidate entries sharing at least one LSH band. |
ProgressCallback
Subclassable base for download progress callbacks. Pass an instance to ModelCache.download() (and other download-capable APIs) to receive byte-count progress updates.
onProgress takes bigint byte counts so multi-gigabyte downloads keep full precision. total is null when the server does not send Content-Length.
import { ModelCache, ProgressCallback } from "blazen";
class MyProgress extends ProgressCallback {
onProgress(downloaded: bigint, total?: bigint | null): void {
if (total != null) {
const pct = Number((downloaded * 100n) / total);
console.log(`${pct}%`);
} else {
console.log(`${downloaded} bytes`);
}
}
}
const cache = ModelCache.create();
await cache.download("bert-base-uncased", "config.json", new MyProgress());
The base onProgress always throws — overriding it is mandatory. super() must be called from the subclass constructor.
ProgressCallback instances are accepted anywhere the SDK exposes a download hook — ModelCache.download(), the local-inference Provider.create() paths that pull weights from HuggingFace, and the ProgressCallback-aware variants of dataset loaders. Pass the same ProgressCallback instance to multiple downloads to centralise reporting (e.g. for a TUI progress bar).
ModelManager
Per-pool memory budget-aware model manager with LRU eviction. Tracks registered local models and their estimated memory footprint (host RAM if the model runs on CPU, GPU VRAM otherwise). Models live in distinct pools (CPU RAM vs per-GPU VRAM); when loading a model that would exceed its pool’s budget, the least-recently-used loaded model within that same pool is unloaded first. A CPU embedder never evicts a GPU LLM, and vice versa.
ModelManager is a memory budget bookkeeper, not a performance scheduler. It answers “will this fit?”, not “will this run fast?”. Whether a 70B model loaded on CPU is useful at 1–3 tok/s is a workload-choice question the manager intentionally does not answer.
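The per-pool eviction rule can be illustrated with a small bookkeeping class. This is a simplified sketch, not the ModelManager implementation: `SketchBudget`, its method shapes, and the logical clock used for recency are all hypothetical.

```typescript
interface Entry { pool: string; bytes: bigint; loaded: boolean; lastUsed: number }

// Toy per-pool budget tracker with LRU eviction inside a single pool.
class SketchBudget {
  private models = new Map<string, Entry>();
  private clock = 0; // logical clock used to order recency
  private budgets: Record<string, bigint>;

  constructor(budgets: Record<string, bigint>) {
    this.budgets = budgets;
  }

  register(id: string, pool: string, bytes: bigint): void {
    this.models.set(id, { pool, bytes, loaded: false, lastUsed: 0 });
  }

  usedBytes(pool: string): bigint {
    let used = 0n;
    for (const m of this.models.values()) {
      if (m.loaded && m.pool === pool) used += m.bytes;
    }
    return used;
  }

  // Marks `id` as loaded and returns the ids evicted to make room.
  load(id: string): string[] {
    const model = this.models.get(id);
    if (!model) throw new Error(`unknown model '${id}'`);
    const budget = this.budgets[model.pool];
    if (budget === undefined) throw new Error(`no budget for pool '${model.pool}'`);
    if (model.bytes > budget) throw new Error("model is larger than its pool budget");

    const evicted: string[] = [];
    if (!model.loaded) {
      // Candidates: loaded models in the SAME pool, least recently used first.
      const candidates = [...this.models.entries()]
        .filter(([otherId, m]) => m.loaded && m.pool === model.pool && otherId !== id)
        .sort((a, b) => a[1].lastUsed - b[1].lastUsed);
      for (const [victimId, victim] of candidates) {
        if (this.usedBytes(model.pool) + model.bytes <= budget) break;
        victim.loaded = false;
        evicted.push(victimId);
      }
    }
    model.loaded = true;
    model.lastUsed = ++this.clock;
    return evicted;
  }
}
```

Because candidates are filtered by pool before sorting, a model charged to "cpu" can never push out a model charged to "gpu:0" in this sketch, which is the isolation property described above.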
Constructor
import { ModelManager } from "blazen";
// Common case: separate CPU and GPU budgets, in gigabytes.
const manager = new ModelManager({ cpuRamGb: 100, gpuVramGb: 24 });
// Explicit per-pool budgets in bytes (BigInt). Pool labels: "cpu", "gpu", "gpu:N".
const perPoolManager = new ModelManager({
poolBudgets: {
"cpu": 100_000_000_000n,
"gpu:0": 24_000_000_000n,
},
});
// No-arg / empty-object: BOTH the "cpu" and "gpu:0" pools default to
// u64::MAX (unlimited-budget sentinel). Useful for tests and ad-hoc scripts.
const unlimitedManager = new ModelManager();
| Field | Type | Description |
|---|---|---|
cpuRamGb | number? | Host RAM budget in gigabytes for the "cpu" pool. |
gpuVramGb | number? | VRAM budget in gigabytes for the "gpu:0" pool. Omit if you don’t load GPU models. |
poolBudgets | Record<string, bigint>? | Explicit per-pool byte budgets. Keys are pool labels ("cpu", "gpu", "gpu:N"); values are bigint byte budgets. Use this when you need budgets above 4 GiB on a per-pool basis or multi-GPU setups. |
Methods
| Method | Signature | Description |
|---|---|---|
register | await manager.register(id, model, memoryEstimateBytes?: bigint) | Register a model with its estimated memory footprint (in bytes, as a bigint). Starts unloaded. The pool charged is determined by the model’s device() (defaults to "cpu"). |
registerLocalModel | await manager.registerLocalModel(id, load, unload, isLoaded, memoryEstimateBytes?: bigint, device?: string) | Register a model from raw closures. The optional device string (default "cpu") selects which pool to charge — e.g. "cuda:0" charges "gpu:0". |
load | await manager.load(id) | Load a model, evicting LRU models in the same pool if needed. |
unload | await manager.unload(id) | Unload a model and free its memory. |
isLoaded | await manager.isLoaded(id): boolean | Check if a model is currently loaded. |
ensureLoaded | await manager.ensureLoaded(id) | Alias for load(). |
usedBytes | await manager.usedBytes(pool?: string): bigint | Bytes currently used by loaded models in the given pool. pool defaults to "cpu". Invalid pool labels reject with invalid pool label '<x>': expected 'cpu', 'gpu', or 'gpu:N' where N is a non-negative integer. |
availableBytes | await manager.availableBytes(pool?: string): bigint | Bytes still available within the given pool’s budget. Same default and validation rules as usedBytes. |
pools | manager.pools(): Array<{ pool: string; budgetBytes: bigint }> | Sync. List every configured pool and its byte budget. |
status | await manager.status(): ModelStatus[] | Status of all registered models. |
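The pool-label rule quoted for usedBytes and the device-to-pool mapping mentioned for registerLocalModel can be sketched as two small helpers. Both functions are hypothetical. Only "cpu" mapping to "cpu" and "cuda:0" mapping to "gpu:0" come from the table above; treating every other device string as a GPU pool, with index 0 when none is given, is an assumption.

```typescript
// Accepts "cpu", "gpu", or "gpu:N" where N is a non-negative integer.
const POOL_LABEL = /^(cpu|gpu|gpu:\d+)$/;

// Returns the label unchanged, or throws with the documented message format.
function assertPoolLabel(label: string): string {
  if (!POOL_LABEL.test(label)) {
    throw new Error(
      `invalid pool label '${label}': expected 'cpu', 'gpu', or 'gpu:N' where N is a non-negative integer`,
    );
  }
  return label;
}

// Map a device string to the pool it would charge (partly assumed, see lead-in).
function poolForDevice(device: string): string {
  if (device === "cpu") return "cpu";
  const index = /:(\d+)$/.exec(device);
  return `gpu:${index ? index[1] : "0"}`;
}
```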
ModelStatus
interface ModelStatus {
id: string; // Model identifier
loaded: boolean; // Whether the model is currently loaded
memoryEstimateBytes: bigint; // Estimated memory footprint in bytes
pool: string; // Pool label this model is charged to ("cpu", "gpu:0", ...)
}
Why bigint? The byte-budget surface (poolBudgets values, register’s memoryEstimateBytes, usedBytes(), availableBytes(), ModelStatus.memoryEstimateBytes) used to be number (u32 on the Rust side), which capped budgets at ~4 GiB and silently truncated larger inputs. That was a real footgun for 7B+ local models that need 8 GiB+ of memory. Pass values as BigInt literals (8n * 1_073_741_824n) or via BigInt(8 * 1024 ** 3). The cpuRamGb / gpuVramGb: number constructor path is unchanged for users who prefer plain numbers and gigabyte granularity.
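The arithmetic behind the 4 GiB ceiling is easy to check. The sketch below uses hypothetical helpers (`gibToBytes`, `truncatedToU32`) and assumes wraparound-style truncation to show what a 32-bit field would retain; the exact truncation behavior of the old API is not specified above.

```typescript
const GIB = 1_073_741_824n; // 2^30 bytes

// Convert a whole number of gibibytes to a bigint byte count.
function gibToBytes(gib: number): bigint {
  if (!Number.isInteger(gib) || gib < 0) {
    throw new RangeError("gib must be a non-negative integer");
  }
  return BigInt(gib) * GIB;
}

// What an unsigned 32-bit field would keep of the same value (wraparound assumed).
function truncatedToU32(bytes: bigint): bigint {
  return BigInt.asUintN(32, bytes);
}

const eightGib = gibToBytes(8);           // 8589934592n, i.e. 2^33
const wrapped = truncatedToU32(eightGib); // 0n, because 2^33 is a multiple of 2^32
```

An 8 GiB estimate collapsing to zero under 32-bit wraparound shows why the byte-level surface moved to bigint.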
ModelRegistry
An abstract base class for model catalogs. Subclass it to advertise the models your code knows about. It is used by capability-discovery code and dynamic model menus, and it provides parity with the Rust blazen_llm::traits::ModelRegistry trait.
export declare class ModelRegistry {
listModels(): Promise<ModelInfo[]>;
getModel(modelId: string): Promise<ModelInfo | null>;
}
| Method | Signature | Description |
|---|---|---|
listModels | async listModels(): Promise<ModelInfo[]> | Return every model the registry advertises. |
getModel | async getModel(modelId: string): Promise<ModelInfo | null> | Look up a single model by id, or return null if unknown. |
The base implementations of both methods throw — overriding them is mandatory. super() must be called from the subclass constructor.
import { ModelRegistry } from "blazen";
import type { ModelInfo } from "blazen";
class MyRegistry extends ModelRegistry {
async listModels(): Promise<ModelInfo[]> {
return [
{ id: "gpt-4o", provider: "openai" /* ...other ModelInfo fields */ },
{ id: "claude-sonnet-4", provider: "anthropic" /* ... */ },
];
}
async getModel(modelId: string): Promise<ModelInfo | null> {
const all = await this.listModels();
return all.find((m) => m.id === modelId) ?? null;
}
}
See the ModelInfo reference for the full set of fields each entry must populate (id, provider, capabilities, context window, pricing, etc.).
Mirrors PyModelRegistry (Python) and WasmModelRegistry exposed as ModelRegistry (WASM SDK) — subclassing ModelRegistry in any binding produces the same Rust-side blazen_llm::traits::ModelRegistry implementation.
ModelPricing and Pricing Functions
ModelPricing
Pricing metadata for cost tracking.
interface ModelPricing {
inputPerMillion?: number; // Cost per million input tokens (USD)
outputPerMillion?: number; // Cost per million output tokens (USD)
perImage?: number; // Cost per generated image (USD)
perSecond?: number; // Cost per second of compute (USD)
}
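A cost estimate follows directly from these four rates. The function below is a minimal sketch: `Usage` and `estimateCostUsd` are hypothetical names, and summing all four components is an assumption about how the rates combine rather than documented blazen behavior.

```typescript
interface ModelPricing {
  inputPerMillion?: number;
  outputPerMillion?: number;
  perImage?: number;
  perSecond?: number;
}

// Hypothetical usage record for this sketch.
interface Usage {
  inputTokens?: number;
  outputTokens?: number;
  images?: number;
  seconds?: number;
}

// Sum each priced component; missing rates or quantities count as zero.
function estimateCostUsd(pricing: ModelPricing, usage: Usage): number {
  return (
    ((pricing.inputPerMillion ?? 0) * (usage.inputTokens ?? 0)) / 1_000_000 +
    ((pricing.outputPerMillion ?? 0) * (usage.outputTokens ?? 0)) / 1_000_000 +
    (pricing.perImage ?? 0) * (usage.images ?? 0) +
    (pricing.perSecond ?? 0) * (usage.seconds ?? 0)
  );
}
```

With rates of 1.0 and 2.0 USD per million tokens, 500,000 input tokens and 250,000 output tokens come to 0.5 + 0.5 = 1.0 USD.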
registerPricing()
Register custom pricing for a model. Overrides any existing pricing for the same model ID.
import { registerPricing } from "blazen";
registerPricing("my-model", { inputPerMillion: 1.0, outputPerMillion: 2.0 });
lookupPricing()
Look up pricing for a model by ID. Returns null if the model is unknown.
import { lookupPricing } from "blazen";
const pricing = lookupPricing("gpt-4o");
if (pricing) {
console.log(`Input: $${pricing.inputPerMillion}/M tokens`);
}
LocalModel Methods on CompletionModel
CompletionModel instances backed by local inference (not remote APIs) support explicit load/unload lifecycle management.
| Method | Signature | Description |
|---|---|---|
load | await model.load(): void | Load the model into memory (host RAM if on CPU, GPU VRAM otherwise). Idempotent. |
unload | await model.unload(): void | Free the model’s memory. Idempotent. |
isLoaded | await model.isLoaded(): boolean | Whether the model is currently loaded. |
memoryBytes | await model.memoryBytes(): number | null | Approximate memory footprint in bytes (host RAM if on CPU, GPU VRAM otherwise), or null if unknown. |
device | await model.device(): string | Return the device string this model targets ('cpu', 'cuda:0', 'metal', etc.). Determines which pool the manager charges. Subclasses must override — the base implementation throws. |
// For a local model:
await model.load();
console.log(await model.isLoaded()); // true
console.log(await model.memoryBytes()); // e.g. 4000000000
console.log(await model.device()); // e.g. "cuda:0"
await model.unload();
Error Handling
Errors thrown across the FFI boundary are surfaced as instances of typed BlazenError subclasses. Every error class extends the base BlazenError, which extends the standard JavaScript Error, so existing instanceof Error checks keep working while gaining structural classification.
The BlazenError hierarchy is what makes typed error routing possible — any caught value can be matched against BlazenError (catch-all for anything from the SDK), against the direct subclass (broad category like RateLimitError), or against a leaf class (specific failure like LlamaCppModelLoadError). Use whichever level of specificity your handler needs.
Root and direct subclasses
BlazenError is the root. The following 19 classes extend it directly:
| Class | When it’s thrown |
|---|---|
AuthError | Invalid or expired API key. |
RateLimitError | Provider rate limit reached. |
TimeoutError | Request exceeded its deadline. |
ValidationError | Invalid request parameters or option set. |
ContentPolicyError | Content moderated by the provider. |
ProviderError | Provider-specific error (HTTP status / endpoint detail attached — see below). |
UnsupportedError | Feature not supported by the chosen provider. |
ComputeError | Compute job failure (cancelled, quota exhausted, runtime failure). |
MediaError | Invalid or oversized media content. |
PeerEncodeError | Failed to encode a peer envelope. |
PeerTransportError | Network failure between peers. |
PeerEnvelopeVersionError | Peer protocol version mismatch. |
PeerWorkflowError | Remote peer workflow failed. |
PeerTlsError | TLS handshake or cert validation failed for a peer connection. |
PeerUnknownStepError | Peer requested an unknown workflow step. |
PersistError | Snapshot persistence backend failure. |
PromptError | Prompt registry / template failure. |
MemoryError | Memory store / embedder failure. |
CacheError | Model cache / download failure. |
ProviderError structured fields
ProviderError carries structured context in addition to the message string. All fields are nullable.
| Field | Type | Description |
|---|---|---|
.provider | string | null | Provider name (e.g. "openai", "anthropic"). |
.status | number | null | HTTP status code, when the call reached the provider. |
.endpoint | string | null | The endpoint that returned the error. |
.requestId | string | null | Provider-assigned request id (use this when filing support tickets). |
.detail | string | null | Provider-supplied error detail / body. |
.retryAfterMs | number | null | Suggested back-off when the provider returned a Retry-After hint. |
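Since every field is nullable, logging code has to skip missing values rather than print "null". One way to do that is sketched below. `ProviderErrorFields` and `describeProviderError` are hypothetical names; the interface only mirrors the fields in the table above, so a real ProviderError should satisfy it structurally if the table is accurate.

```typescript
// Hypothetical structural type mirroring the nullable fields in the table above.
interface ProviderErrorFields {
  message: string;
  provider: string | null;
  status: number | null;
  endpoint: string | null;
  requestId: string | null;
  detail: string | null;
  retryAfterMs: number | null;
}

// Build a single log line, omitting any field that is null.
function describeProviderError(e: ProviderErrorFields): string {
  const parts: string[] = [];
  if (e.provider !== null) parts.push(`provider=${e.provider}`);
  if (e.status !== null) parts.push(`status=${e.status}`);
  if (e.endpoint !== null) parts.push(`endpoint=${e.endpoint}`);
  if (e.requestId !== null) parts.push(`requestId=${e.requestId}`);
  if (e.retryAfterMs !== null) parts.push(`retryAfterMs=${e.retryAfterMs}`);
  // Prefer the provider-supplied detail; fall back to the generic message.
  const text = e.detail ?? e.message;
  return parts.length > 0 ? `${text} (${parts.join(", ")})` : text;
}
```

Including requestId in the log line keeps the value at hand when filing a support ticket with the provider.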
Per-backend ProviderError subclasses
Each local-inference and provider-side backend has its own ProviderError subclass with narrower variants. Use instanceof to route to backend-specific handling.
| Class | Backend | Representative narrower subclasses |
|---|---|---|
LlamaCppError | llama.cpp | LlamaCppInvalidOptionsError, LlamaCppModelLoadError, LlamaCppInferenceError, LlamaCppEngineNotAvailableError |
MistralRsError | mistral.rs | MistralRsInvalidOptionsError, MistralRsInitError, MistralRsInferenceError, MistralRsEngineNotAvailableError |
CandleLlmError | candle (LLM) | CandleLlmInvalidOptionsError, CandleLlmModelLoadError, CandleLlmInferenceError, CandleLlmEngineNotAvailableError |
CandleEmbedError | candle (embeddings) | CandleEmbedModelLoadError, CandleEmbedEmbeddingError, CandleEmbedEngineNotAvailableError, CandleEmbedTaskPanickedError |
WhisperError | whisper.cpp | WhisperModelLoadError, WhisperTranscriptionError, WhisperEngineNotAvailableError, WhisperIoError |
PiperError | Piper TTS | PiperModelLoadError, PiperSynthesisError, PiperEngineNotAvailableError |
DiffusionError | diffusion image gen | DiffusionModelLoadError, DiffusionGenerationError |
FastEmbedError | fastembed | EmbedUnknownModelError, EmbedInitError, EmbedEmbedError, EmbedMutexPoisonedError, EmbedTaskPanickedError |
TractError | tract ONNX runtime | (no narrower variants) |
PromptError similarly has narrower variants like PromptMissingVariableError, PromptNotFoundError, PromptVersionNotFoundError, PromptIoError, PromptYamlError, PromptJsonError, PromptValidationError. MemoryError exposes MemoryNoEmbedderError, MemoryEmbeddingError, MemoryNotFoundError, MemorySerializationError, MemoryIoError, MemoryBackendError. CacheError exposes DownloadError, CacheDirError, IoError. There are around 80 narrower subclasses in total — every public Rust error variant gets its own JS class.
enrichError(err: unknown): unknown
Errors that reach JavaScript as plain Error instances have lost their original Rust type. Pass any caught value through enrichError to re-classify it into the proper BlazenError subclass before further inspection. It’s a no-op when the error is already typed.
import { enrichError, BlazenError, ChatMessage } from "blazen";
try {
await model.complete([ChatMessage.user("Hi")]);
} catch (raw) {
const err = enrichError(raw);
if (err instanceof BlazenError) {
// typed handling
}
throw err;
}
Example: routing on the typed hierarchy
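import {
ChatMessage, RateLimitError, AuthError, TimeoutError,
ProviderError, LlamaCppEngineNotAvailableError,
} from "blazen";
// sleep and refreshApiKey are application-level helpers, not blazen exports.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));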
try {
const response = await model.complete([ChatMessage.user("Hello")]);
} catch (e) {
if (e instanceof RateLimitError) {
// RateLimitError and ProviderError are sibling subclasses of BlazenError, so
// retryAfterMs is not available in this branch; use a fixed delay instead.
await sleep(1000);
} else if (e instanceof AuthError) {
refreshApiKey();
} else if (e instanceof TimeoutError) {
// safe to retry once
} else if (e instanceof LlamaCppEngineNotAvailableError) {
console.error("Build was compiled without llama.cpp support");
} else if (e instanceof ProviderError) {
console.error(`[${e.provider} ${e.status ?? "?"}] ${e.detail ?? e.message}`);
} else {
throw e;
}
}
RateLimitError, TimeoutError, transient ProviderErrors with status >= 500, and most PeerTransportErrors are safe to retry. AuthError, ValidationError, ContentPolicyError, and UnsupportedError are not.
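The retry guidance above can be folded into a small predicate plus a retry loop. This is a sketch, not blazen's own retry machinery. The error classes below are local stand-ins with the documented names so the block runs on its own; in real code you would import them from "blazen" instead. `isRetryable`, `withRetry`, and the exponential backoff policy are assumptions made for illustration.

```typescript
// Stand-ins for the documented classes; in real code import these from "blazen".
class BlazenError extends Error {
  constructor(message: string) {
    super(message);
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
  }
}
class RateLimitError extends BlazenError {}
class TimeoutError extends BlazenError {}
class AuthError extends BlazenError {}
class PeerTransportError extends BlazenError {}
class ProviderError extends BlazenError {
  constructor(message: string, public status: number | null = null) {
    super(message);
  }
}

// Mirrors the guidance above: rate limits, timeouts, 5xx provider errors,
// and peer transport failures are retried; everything else is not.
function isRetryable(e: unknown): boolean {
  if (e instanceof RateLimitError || e instanceof TimeoutError) return true;
  if (e instanceof PeerTransportError) return true; // "most" are retryable; refine as needed
  if (e instanceof ProviderError) return e.status !== null && e.status >= 500;
  return false;
}

// Retry `fn` up to `maxAttempts` times with exponential backoff (hypothetical policy).
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      if (attempt >= maxAttempts || !isRetryable(e)) throw e;
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

Non-retryable errors such as AuthError are rethrown on the first attempt, so callers still see the original typed error.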
When to call enrichError
The Rust core always emits typed BlazenError subclasses, but errors that originate in JS callbacks (tool handlers, persist callbacks, custom providers) and bubble back through Rust come out as plain Error instances. If you want uniform instanceof BlazenError matching everywhere, run every caught value through enrichError at the catch site:
import { enrichError, BlazenError, ProviderError } from "blazen";
async function safeCall<T>(fn: () => Promise<T>): Promise<T> {
try {
return await fn();
} catch (raw) {
const err = enrichError(raw);
if (err instanceof ProviderError && err.retryAfterMs) {
await new Promise(r => setTimeout(r, err.retryAfterMs!));
}
throw err;
}
}
enrichError is idempotent — passing it an already-typed BlazenError returns the same value unchanged, so it’s safe to layer.
Local Inference Types
Local backends (MistralRsProvider, LlamaCppProvider, CandleLlmProvider) expose typed input/output classes for direct use without going through the generic CompletionModel surface. Two parallel families exist: un-prefixed names for mistral.rs (the canonical surface) and LlamaCpp-prefixed names for llama.cpp. The candle backend adds a single CandleInferenceResult.
mistral.rs (canonical, un-prefixed)
| Class / enum | Purpose |
|---|---|
ChatMessageInput | Inference-side chat message. Constructor: new ChatMessageInput(role, text, images?); static ChatMessageInput.fromText(role, text). Getters: .role, .text, .images, .hasImages. |
ChatRole | Const enum: System, User, Assistant, Tool. |
InferenceImage | Image attachment for vision-capable models. Static factories: fromBytes(buf), fromPath(path), fromSource(src). |
InferenceImageSource | Tagged-union source. Static factories: bytes(buf), path(p). Getters: .kind ("bytes" or "path"), .data, .filePath. |
InferenceResult | Non-streaming result. Getters: .content, .reasoningContent, .toolCalls, .finishReason, .model, .usage. |
InferenceChunk | Streaming chunk. Getters: .delta, .reasoningDelta, .toolCalls, .finishReason. |
InferenceChunkStream | Async chunk source. Pull with await stream.next(); returns null when exhausted. |
InferenceToolCall | Tool call requested by the model. Constructor new InferenceToolCall(id, name, arguments); getters .id, .name, .arguments (JSON string). |
InferenceUsage | Token usage. Getters: .promptTokens, .completionTokens, .totalTokens, .totalTimeSec. |
import { ChatMessageInput, ChatRole, InferenceChunkStream, MistralRsProvider } from "blazen";
const provider = await MistralRsProvider.create({ modelId: "..." });
const stream: InferenceChunkStream = await provider.inferStream([
// Text-only message; use the three-argument constructor to attach images.
ChatMessageInput.fromText(ChatRole.User, "What is 2+2?"),
]);
for (let chunk = await stream.next(); chunk !== null; chunk = await stream.next()) {
process.stdout.write(chunk.delta ?? "");
}
InferenceChunkStream is single-pass — once you’ve reached the terminating null, the stream is exhausted. Errors raised from inside InferenceChunkStream.next() are typed BlazenError subclasses (MistralRsInferenceError, etc.) so they can be matched alongside the rest of the error hierarchy.
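The pull-until-null pattern can be wrapped once so that call sites use `for await` instead of a hand-written loop. The adapter below is a sketch that assumes only the documented contract (next() resolves to a chunk, or null when exhausted). `ChunkSource`, `iterateChunks`, `collectText`, and `fakeStream` are hypothetical names, and the adapter inherits the single-pass behaviour of the underlying stream.

```typescript
// Hypothetical structural type: anything with the documented next() contract.
interface ChunkSource<T> {
  next(): Promise<T | null>;
}

// Adapt a next()-returning-null source into an async iterable.
async function* iterateChunks<T>(source: ChunkSource<T>): AsyncGenerator<T, void, undefined> {
  for (let chunk = await source.next(); chunk !== null; chunk = await source.next()) {
    yield chunk;
  }
}

// Shape of the only chunk field this sketch reads.
interface TextChunk {
  delta?: string | null;
}

// Concatenate all text deltas; chunks without a delta contribute nothing.
async function collectText(source: ChunkSource<TextChunk>): Promise<string> {
  let text = "";
  for await (const chunk of iterateChunks(source)) {
    text += chunk.delta ?? "";
  }
  return text;
}

// In-memory stand-in so the sketch runs without a model.
function fakeStream(deltas: Array<string | null>): ChunkSource<TextChunk> {
  let i = 0;
  return {
    async next(): Promise<TextChunk | null> {
      return i < deltas.length ? { delta: deltas[i++] } : null;
    },
  };
}
```

Errors thrown by next() propagate out of the `for await` loop unchanged, so typed error handling around the loop keeps working.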
llama.cpp (LlamaCpp prefix)
The llama.cpp surface mirrors the mistral.rs one with a narrower feature set (no reasoning content, no images on the message input itself).
| Class / enum | Purpose |
|---|---|
LlamaCppChatMessageInput | Constructor: new LlamaCppChatMessageInput(role, text). Getters: .role, .text. |
LlamaCppChatRole | Const enum: System, User, Assistant, Tool (capitalised members; a separate type from ChatRole, so the two are not interchangeable). |
LlamaCppInferenceResult | Non-streaming result. Getters: .content, .finishReason, .model, .usage. |
LlamaCppInferenceChunk | Streaming chunk. Getters: .delta, .finishReason. |
LlamaCppInferenceChunkStream | Async chunk source. Same await stream.next() pattern. |
LlamaCppInferenceUsage | Getters: .promptTokens, .completionTokens, .totalTokens, .totalTimeSec. |
import {
LlamaCppChatMessageInput, LlamaCppChatRole,
LlamaCppInferenceChunkStream, LlamaCppProvider,
} from "blazen";
const provider = await LlamaCppProvider.create({ modelPath: "/models/llama.gguf" });
const stream: LlamaCppInferenceChunkStream = await provider.inferStream([
new LlamaCppChatMessageInput(LlamaCppChatRole.User, "What is 2+2?"),
]);
for (let chunk = await stream.next(); chunk !== null; chunk = await stream.next()) {
process.stdout.write(chunk.delta ?? "");
}
Like the mistral.rs InferenceChunkStream, LlamaCppInferenceChunkStream is single-pass; mid-stream failures throw a typed LlamaCppInferenceError.
candle
The candle backend exposes a single non-streaming result class.
| Class | Purpose |
|---|---|
CandleInferenceResult | Constructor: new CandleInferenceResult(content, promptTokens, completionTokens, totalTimeSecs). Getters: .content, .promptTokens, .completionTokens, .totalTimeSecs. |
The candle backend has no streaming counterpart to InferenceChunkStream / LlamaCppInferenceChunkStream — pull CandleInferenceResult once per call. If you need token-by-token streaming on candle, swap the provider to mistral.rs or llama.cpp.
Errors raised from local inference
All three families propagate errors as typed BlazenError subclasses. The mapping is documented in the Error Handling section above. As elsewhere, run caught values through enrichError to re-classify any plain Error that bubbles back through Rust from JS-side code (custom samplers, custom token decoders, etc.).
Telemetry
OpenTelemetry-compatible tracing flows through the standard tracing subscriber. Blazen includes an optional Langfuse exporter that sends span batches to the Langfuse ingestion API. The exporter is gated by the langfuse Cargo feature on the underlying crate, so it is only available in builds that opted in at compile time.
LangfuseConfig
import { LangfuseConfig } from "blazen";
const config = new LangfuseConfig(
process.env.LANGFUSE_PUBLIC_KEY!,
process.env.LANGFUSE_SECRET_KEY!,
"https://cloud.langfuse.com", // host (optional, defaults to cloud)
100, // batchSize (optional, default 100)
5000, // flushIntervalMs (optional, default 5000)
);
| Property | Type | Description |
|---|---|---|
.publicKey | string | The Langfuse public API key. |
.secretKey | string | The Langfuse secret API key. |
.host | string | null | The configured host URL, or null when defaulted. |
.batchSize | number | Maximum events buffered before an automatic flush. |
.flushIntervalMs | number | Background flush interval in milliseconds. |
initLangfuse(config: LangfuseConfig): void
Install the global tracing subscriber. Spawns a background tokio task that flushes buffered span envelopes to Langfuse on the configured interval. Calling this more than once per process is safe — subsequent calls no-op because the global subscriber is already registered.
import { initLangfuse, LangfuseConfig } from "blazen";
initLangfuse(new LangfuseConfig(
process.env.LANGFUSE_PUBLIC_KEY!,
process.env.LANGFUSE_SECRET_KEY!,
));
Available only when the host build was compiled with the langfuse feature on the underlying telemetry crate. In builds without it, the symbol is still exported but the configured exporter is a no-op.
Wiring LangfuseConfig from environment
Most deployments construct LangfuseConfig directly from environment variables at startup. Tune batchSize and flushIntervalMs to balance ingestion latency against the per-request overhead of HTTP flushes:
import { initLangfuse, LangfuseConfig } from "blazen";
const cfg = new LangfuseConfig(
process.env.LANGFUSE_PUBLIC_KEY!,
process.env.LANGFUSE_SECRET_KEY!,
process.env.LANGFUSE_HOST, // undefined when unset → cloud default
Number(process.env.LANGFUSE_BATCH_SIZE ?? 100),
Number(process.env.LANGFUSE_FLUSH_INTERVAL_MS ?? 5000),
);
initLangfuse(cfg);
A LangfuseConfig instance is purely declarative — it carries no IO state — so it’s safe to construct, inspect, log (with secrets redacted), and pass to initLangfuse independently.
You can keep multiple LangfuseConfig objects around (e.g. one per environment) and choose which one to install at startup; only the first initLangfuse call wins per process.
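Reading the numeric settings with a bare Number(...) accepts values such as an empty string (which becomes 0) or non-numeric text (which becomes NaN). A stricter reader is sketched below. `LangfuseSettings`, `positiveInt`, and `readLangfuseSettings` are hypothetical helpers, not blazen exports; the environment variable names and defaults are the ones used in the example above.

```typescript
// Hypothetical plain-object form of the positional LangfuseConfig arguments.
interface LangfuseSettings {
  publicKey: string;
  secretKey: string;
  host: string | undefined;
  batchSize: number;
  flushIntervalMs: number;
}

// Parse a positive integer, falling back when the variable is unset or blank.
function positiveInt(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

// Validate an env-like map (for example process.env) before building a config.
function readLangfuseSettings(env: Record<string, string | undefined>): LangfuseSettings {
  const publicKey = env["LANGFUSE_PUBLIC_KEY"];
  const secretKey = env["LANGFUSE_SECRET_KEY"];
  if (!publicKey || !secretKey) {
    throw new Error("LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are required");
  }
  return {
    publicKey,
    secretKey,
    host: env["LANGFUSE_HOST"] || undefined, // blank is treated as unset
    batchSize: positiveInt(env["LANGFUSE_BATCH_SIZE"], 100, "LANGFUSE_BATCH_SIZE"),
    flushIntervalMs: positiveInt(env["LANGFUSE_FLUSH_INTERVAL_MS"], 5000, "LANGFUSE_FLUSH_INTERVAL_MS"),
  };
}
```

The validated values can then be passed positionally, as in `new LangfuseConfig(s.publicKey, s.secretKey, s.host, s.batchSize, s.flushIntervalMs)`, so a misconfigured deployment fails at startup rather than flushing on an unintended interval.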
version()
Returns the Blazen library version string.
import { version } from "blazen";
console.log(version()); // "0.1.0"