Node.js API Reference

Complete API reference for blazen in Node.js

ChatMessage

A class for building typed chat messages. Supports text, multimodal (image) content, and content parts.

Constructor

new ChatMessage({ role?: string, content?: string, parts?: ContentPart[] })

Create a message from an options object. role defaults to "user" if omitted. Supply either content (text) or parts (multimodal), not both.

// Text message with explicit role
new ChatMessage({ role: "user", content: "Hello" })

// Using the Role enum
new ChatMessage({ role: Role.User, content: "Hello" })

// System message
new ChatMessage({ role: "system", content: "You are a helpful assistant." })

// Multimodal message with content parts
new ChatMessage({
  role: "user",
  parts: [
    { partType: "text", text: "Describe this image:" },
    { partType: "image", image: { source: { sourceType: "url", url: "https://example.com/photo.jpg" } } }
  ]
})

Static Factory Methods

MethodDescription
ChatMessage.system(content: string)Create a system message
ChatMessage.user(content: string)Create a user message
ChatMessage.assistant(content: string)Create an assistant message
ChatMessage.tool(content: string)Create a tool result message
ChatMessage.userImageUrl(text: string, url: string, mediaType?: string)Create a user message with text and an image URL
ChatMessage.userImageBase64(text: string, data: string, mediaType: string)Create a user message with text and a base64-encoded image
ChatMessage.userParts(parts: ContentPart[])Create a user message from an explicit list of content parts
const msg = ChatMessage.user("What is 2 + 2?");

const imgMsg = ChatMessage.userImageUrl(
  "What's in this image?",
  "https://example.com/photo.jpg",
  "image/jpeg"
);

const b64Msg = ChatMessage.userImageBase64(
  "Describe this:",
  base64Data,
  "image/png"
);

Properties

PropertyTypeDescription
.rolestringThe message role: "system", "user", "assistant", or "tool"
.contentstring | nullThe text content of the message, if any. For tool results that returned a plain string, the string lives here (and .toolResult is null).
.toolResultToolOutput | nullStructured tool-result payload. null for non-tool messages or when the tool returned a plain string. When non-null, .toolResult.data is the full structured value the caller should consume; the LLM-facing wire form is derived from .toolResult.llmOverride (if set) or from .toolResult.data via the provider’s default conversion.

Note on toolCallId: The Rust core stores a tool_call_id on tool messages so providers can correlate a tool result with the originating ToolCall.id. As of this writing the Node binding does not expose a public .toolCallId getter on ChatMessage — correlation is handled internally by the agent loop and on the wire by each provider. If you need explicit access, file an issue and we can surface it.


Role

String enum for message roles.

Role.System    // "system"
Role.User      // "user"
Role.Assistant // "assistant"
Role.Tool      // "tool"

Can be used interchangeably with plain strings in the ChatMessage constructor.


ContentPart

Types for multimodal message content, used in the parts field of the ChatMessage constructor and in ChatMessage.userParts().

interface ContentPart {
  partType: "text" | "image";
  text?: string;          // Required when partType is "text"
  image?: ImageContent;   // Required when partType is "image"
}

interface ImageContent {
  source: ImageSource;
  mediaType?: string;     // MIME type, e.g. "image/png"
}

interface ImageSource {
  sourceType: "url" | "base64";
  url?: string;           // Required when sourceType is "url"
  data?: string;          // Required when sourceType is "base64"
}

// `MediaSource` is a type alias for `ImageSource` re-exported for compute APIs
// that accept any media (image / video frame / audio cover-art) using the same shape.
// All compute requests that take a `MediaSource` accept exactly the same value
// you would pass to `ImageSource` — `MediaSource` exists only as an aliasing affordance.
export type MediaSource = ImageSource;

Note on MediaSource: prefer MediaSource in compute / generation APIs (image upscaling, video frames, audio cover art) and ImageSource in chat-message APIs. The two are interchangeable at the type level — MediaSource is just the structurally identical alias for ImageSource.

// Text part
{ partType: "text", text: "Describe this image:" }

// Image from URL
{
  partType: "image",
  image: {
    source: { sourceType: "url", url: "https://example.com/photo.jpg" },
    mediaType: "image/jpeg"
  }
}

// Image from base64
{
  partType: "image",
  image: {
    source: { sourceType: "base64", data: "iVBORw0KGgo..." },
    mediaType: "image/png"
  }
}

CompletionModel

A chat completion model. Created via static factory methods for each provider.

Provider Factory Methods

All providers accept an optional options object containing an apiKey (and other provider-specific fields). If options is omitted — or apiKey is not set within it — the key is read from the provider’s standard environment variable (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, etc.). If model is not set, the provider’s default model is used.

MethodSignature
CompletionModel.openai(options?: ProviderOptions)
CompletionModel.anthropic(options?: ProviderOptions)
CompletionModel.gemini(options?: ProviderOptions)
CompletionModel.azure(options: AzureOptions)
CompletionModel.fal(options?: FalOptions)
CompletionModel.openrouter(options?: ProviderOptions)
CompletionModel.groq(options?: ProviderOptions)
CompletionModel.together(options?: ProviderOptions)
CompletionModel.mistral(options?: ProviderOptions)
CompletionModel.deepseek(options?: ProviderOptions)
CompletionModel.fireworks(options?: ProviderOptions)
CompletionModel.perplexity(options?: ProviderOptions)
CompletionModel.xai(options?: ProviderOptions)
CompletionModel.cohere(options?: ProviderOptions)
CompletionModel.bedrock(options: BedrockOptions)
// Read key from OPENAI_API_KEY env var
const model = CompletionModel.openai();

// Pass an explicit key and override the model
const claude = CompletionModel.anthropic({ apiKey: "sk-ant-...", model: "claude-sonnet-4-20250514" });
const gemini = CompletionModel.gemini({ model: "gemini-2.5-flash" });

Properties

PropertyTypeDescription
.modelIdstringThe model identifier string

await model.complete(messages: ChatMessage[]): CompletionResponse

Perform a chat completion.

const response = await model.complete([
  ChatMessage.system("You are a helpful assistant."),
  ChatMessage.user("What is 2 + 2?"),
]);
console.log(response.content); // "4"

await model.completeWithOptions(messages: ChatMessage[], options: CompletionOptions): CompletionResponse

Perform a chat completion with additional options.

const response = await model.completeWithOptions(
  [ChatMessage.user("Write a haiku about Rust.")],
  { temperature: 0.7, maxTokens: 100 }
);

await model.stream(messages: ChatMessage[], onChunk: (chunk) => void): void

Stream a chat completion. The callback receives each chunk as it arrives.

await model.stream(
  [ChatMessage.user("Tell me a story")],
  (chunk) => {
    if (chunk.delta) process.stdout.write(chunk.delta);
  }
);

Each chunk has the shape:

{
  delta?: string;              // Text content delta
  finishReason?: string;       // Set on the final chunk
  toolCalls: ToolCall[];       // Tool calls, if any
}

await model.streamWithOptions(messages: ChatMessage[], onChunk: (chunk) => void, options: CompletionOptions): void

Stream a chat completion with additional options.

await model.streamWithOptions(
  [ChatMessage.user("Explain quantum computing")],
  (chunk) => { if (chunk.delta) process.stdout.write(chunk.delta); },
  { temperature: 0.5, maxTokens: 500 }
);

Middleware Decorators

Each decorator returns a new CompletionModel wrapping the original with additional behaviour.

model.withRetry(config?: RetryConfig): CompletionModel

Automatic retry with exponential backoff on transient failures.

const resilient = model.withRetry({ maxRetries: 5, initialDelayMs: 500, maxDelayMs: 15000 });
FieldTypeDefaultDescription
maxRetriesnumber3Maximum retry attempts.
initialDelayMsnumber1000Delay before first retry (ms).
maxDelayMsnumber30000Upper bound on backoff delay (ms).

model.withCache(config?: CacheConfig): CompletionModel

In-memory response cache for identical non-streaming requests.

const cached = model.withCache({ ttlSeconds: 600, maxEntries: 500 });
FieldTypeDefaultDescription
ttlSecondsnumber300Cache entry TTL in seconds.
maxEntriesnumber1000Maximum entries before eviction.

CompletionModel.withFallback(models: CompletionModel[]): CompletionModel

Static factory method. Tries providers in order; falls back on transient errors.

const model = CompletionModel.withFallback([
  CompletionModel.openai(),
  CompletionModel.anthropic(),
]);

CompletionOptions

Options object for completeWithOptions() and streamWithOptions().

interface CompletionOptions {
  temperature?: number;        // Sampling temperature (0.0 - 2.0)
  maxTokens?: number;          // Maximum tokens to generate
  topP?: number;               // Nucleus sampling parameter
  model?: string;              // Override the default model ID
  tools?: ToolDefinition[];    // Tool definitions for function calling
}

CompletionResponse

Returned by model.complete() and model.completeWithOptions().

interface CompletionResponse {
  content?: string;                // The generated text
  toolCalls: ToolCall[];           // Tool calls requested by the model
  usage?: TokenUsage;              // Token usage statistics
  model: string;                   // Model name used for the completion
  finishReason?: string;           // Why generation stopped ("stop", "tool_calls", etc.)
  cost?: number;                   // Cost in USD, if reported by the provider
  timing?: RequestTiming;          // Request timing breakdown
  images: object[];                // Generated images, if any (provider-specific)
  audio: object[];                 // Generated audio, if any (provider-specific)
  videos: object[];                // Generated videos, if any (provider-specific)
  metadata: object;                // Raw provider-specific metadata
}

ToolCall

A tool invocation requested by the model.

PropertyTypeDescription
.idstringUnique identifier for the tool call
.namestringName of the tool to invoke
.argumentsobjectParsed JSON arguments

ToolOutput

Two-channel return shape for tool handlers. Tool results have two distinct audiences. The caller (your TypeScript code) wants the full structured data; the LLM, on the next turn, may need a different shape — sometimes shorter, sometimes provider-specific. ToolOutput carries both channels.

import type { ToolOutput, LlmPayload } from "blazen";

const out: ToolOutput = {
  data: { items: [1, 2, 3] },
};

Properties

MemberTypeDescription
dataanyThe structured value the caller sees programmatically. Dict, array, scalar, or string.
llmOverrideLlmPayload | undefinedOptional override for what the LLM sees on the next turn. undefined means each provider applies its default conversion from data.

Both llmOverride (camelCase) and llm_override (snake_case) are accepted on input from a JS tool handler, so a hand-written object using either casing will deserialize correctly.

When the agent loop appends a tool result to the conversation, the resulting ChatMessage exposes the structured output via .toolResult:

const last = result.messages[result.messages.length - 1];
// last.toolResult?.data is the full structured payload from your handler.
// last.toolResult?.llmOverride is the override (if any) the LLM saw this turn.

For tool handlers that returned a plain string, last.toolResult is null and the string lives on last.content instead.


LlmPayload

A tagged union describing what the LLM sees for a tool result on the next turn. Used as the llmOverride field of ToolOutput.

import type { LlmPayload } from "blazen";

const text: LlmPayload = { kind: "text", text: "Found 3 results." };

const json: LlmPayload = { kind: "json", value: { items: [1, 2, 3] } };

const parts: LlmPayload = {
  kind: "parts",
  parts: [
    { partType: "text", text: "Here is the table:" },
    { partType: "text", text: "| col |\n| --- |\n| 1 |" },
  ],
};

const raw: LlmPayload = {
  kind: "provider_raw",
  provider: "anthropic",
  value: [{ type: "text", text: "Custom Anthropic-shaped payload." }],
};

Variants

kindRequired fieldsBehavior
"text"textPlain text. Works on every provider.
"json"valueStructured JSON. Anthropic and Gemini natively consume the structure; OpenAI-family stringifies once at the wire boundary.
"parts"parts (ContentPart[])Multimodal content blocks; Anthropic supports natively, OpenAI falls back to text concatenation, Gemini wraps as { parts: [...] }.
"provider_raw"provider, valueProvider-specific escape hatch. Only the named provider sees value; every other provider falls back to converting ToolOutput.data with its default. provider is one of "openai", "openai_compat", "azure", "anthropic", "gemini", "responses", "fal".

Per-provider behavior

When a tool returns structured data and no llmOverride, each provider sends a sensible default to the LLM:

  • OpenAI / OpenAI-compat / Azure / Responses / Fal: the data is JSON-stringified into the content field of the tool message.
  • Anthropic: structured data becomes [{ type: "text", text: <stringified-json> }] inside tool_result.content.
  • Gemini: structured object data is passed natively as functionResponse.response. Scalars wrap as { result: <scalar> }.

When llmOverride is set, that override always wins for the variants the provider understands; kind: "provider_raw" is the strictest — it’s only honoured when provider matches the active provider, otherwise the provider falls back to converting data with its default.


ToolDefinition

Describes a tool that the model may invoke.

interface ToolDefinition {
  name: string;            // Unique tool name
  description: string;     // Human-readable description
  parameters: object;      // JSON Schema for the tool's parameters
}
const tools: ToolDefinition[] = [
  {
    name: "getWeather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"]
    }
  }
];

Content Subsystem

Pluggable storage and handle plumbing for multimodal payloads (images, audio, video, documents, 3D models, CAD files). A ContentStore issues opaque handles; tools accept handle-id strings as arguments and Blazen substitutes the resolved typed content before the handler runs.

import {
  ContentStore,
  imageInput,
  audioInput,
  videoInput,
  fileInput,
  threeDInput,
  cadInput,
} from "blazen";
import type {
  ContentHandle,
  ContentKind,
  JsContentMetadata,
  PutOptions,
} from "blazen";

ContentKind

String enum tagging what a handle refers to. Values match the serde-tag form so they round-trip across any Blazen API that takes a kind string.

const enum ContentKind {
  Image = "image",
  Audio = "audio",
  Video = "video",
  Document = "document",
  ThreeDModel = "three_d_model",
  Cad = "cad",
  Archive = "archive",
  Font = "font",
  Code = "code",
  Data = "data",
  Other = "other",
}

ContentHandle

Opaque reference returned by ContentStore.put() and consumed by every other store method. Treat id as a black box — store-defined.

interface ContentHandle {
  id: string;             // Opaque, store-defined identifier
  kind: ContentKind;      // What kind of content this handle refers to
  mimeType?: string;      // MIME type if known
  byteSize?: number;      // Byte size if known (i64 -- napi has no u64)
  displayName?: string;   // Human-readable display name (e.g. original filename)
}

ContentHandle is a type alias for the underlying JsContentHandle interface.

ContentStore

Pluggable registry for multimodal content. Construct via the static factories; instances are cheap to clone (internally an Arc), so reusing one store across multiple agents and requests is fine.

class ContentStore {
  // Factories
  static inMemory(): ContentStore;
  static localFile(root: string): ContentStore;
  static openaiFiles(apiKey: string, baseUrl?: string | null): ContentStore;
  static anthropicFiles(apiKey: string, baseUrl?: string | null): ContentStore;
  static geminiFiles(apiKey: string, baseUrl?: string | null): ContentStore;
  static falStorage(apiKey: string, baseUrl?: string | null): ContentStore;
  static custom(options: CustomContentStoreOptions): ContentStore;

  // Operations
  put(body: Buffer | string, options: PutOptions): Promise<ContentHandle>;
  resolve(handle: ContentHandle): Promise<any>;          // serialized MediaSource
  fetchBytes(handle: ContentHandle): Promise<Buffer>;
  metadata(handle: ContentHandle): Promise<JsContentMetadata>;
  delete(handle: ContentHandle): Promise<void>;
}

put() accepts either a Buffer (inline bytes uploaded to the store) or a string — interpreted as a URL when it contains "://" (the store records the reference) and as a local filesystem path otherwise (the store reads or copies the file as needed).

resolve() returns a serialized MediaSource — the same JSON shape Blazen’s request builders accept. fetchBytes() is for tools that need to operate on the raw payload (parse a PDF, transcribe audio); most tools reason over the handle and let resolve() produce the wire form. delete() is best-effort — default implementations on most stores are no-ops.

const store = ContentStore.inMemory();

const handle = await store.put(Buffer.from(pngBytes), {
  mimeType: "image/png",
  kind: ContentKind.Image,
  displayName: "diagram.png",
});

const meta = await store.metadata(handle);
const bytes = await store.fetchBytes(handle);

Subclassing ContentStore

ContentStore is subclassable from JavaScript / TypeScript. Override the methods your backend needs; napi-rs wraps your subclass in a Rust adapter that dispatches into your JS async functions via ThreadsafeFunction.

import { ContentStore } from "blazen";
import type { ContentHandle, ContentKind } from "blazen";

class S3ContentStore extends ContentStore {
  constructor(bucket: string) {
    super();
    this.bucket = bucket;
  }

  async put(body, hint) {
    // ...
    return { id: "...", kind: "image" };
  }

  async resolve(handle) {
    return { sourceType: "url", url: "..." };
  }

  async fetchBytes(handle) {
    return Buffer.from("...");
  }

  // Optional:
  async fetchStream(handle) { return Buffer.from("..."); }
  async delete(handle) { /* no-op */ }
}

Subclasses MUST override put, resolve, fetchBytes. The base-class default impls throw an error so any missing override is a clear failure rather than silent recursion via super().

ContentStore.custom({...})

Callback-based factory. Direct JS mirror of Rust CustomContentStore::builder.

ContentStore.custom(options: {
  put: (body: any, hint: any) => Promise<ContentHandle>;
  resolve: (handle: ContentHandle) => Promise<any>;        // serialized MediaSource
  fetchBytes: (handle: ContentHandle) => Promise<Buffer>;
  fetchStream?: (handle: ContentHandle) => Promise<Buffer>; // single-chunk for now
  delete?: (handle: ContentHandle) => Promise<void>;
  name?: string;
}): ContentStore

put, resolve, fetchBytes are required. fetchStream and delete are optional. The body argument arrives as a JS object shaped like {type: "bytes", data: [...]} / {type: "url", url} / {type: "local_path", path} / {type: "provider_file", provider, id} / {type: "stream", stream: AsyncIterable<Uint8Array>, sizeHint: number | null}. The hint has optional mimeType / kindHint / displayName / byteSize.

resolve returns a serialized MediaSource JS object (e.g. {sourceType: "url", url: "..."}). fetchBytes returns a Buffer. fetchStream may return either Buffer / Uint8Array / number[] / base64 string (legacy, single-chunk) or an AsyncIterable<Uint8Array> for true chunk-by-chunk streaming — a Node Readable qualifies since it implements [Symbol.asyncIterator].

PutOptions

Optional hints attached to a put() call. Every field is optional; the store may auto-detect from the bytes when a hint is missing.

interface PutOptions {
  mimeType?: string;      // MIME type, if known
  kind?: ContentKind;     // Caller's preferred classification -- overrides any auto-detection
  displayName?: string;   // Human-readable display name (filename, caption)
  byteSize?: number;      // Byte size, if known up-front (i64 since napi has no u64)
}

ContentMetadata

Cheap metadata summary returned by ContentStore.metadata(). No bytes are materialized.

interface ContentMetadata {
  kind: ContentKind;
  mimeType?: string;
  byteSize?: number;
  displayName?: string;
}

ContentMetadata is a type alias for the underlying JsContentMetadata interface.

Built-in stores

FactoryPurpose
ContentStore.inMemory()Ephemeral in-memory store. Default for tests and short-lived sessions.
ContentStore.localFile(root)Filesystem-backed store rooted at root. Directory is created if missing.
ContentStore.openaiFiles(apiKey, baseUrl?)Backed by the OpenAI Files API.
ContentStore.anthropicFiles(apiKey, baseUrl?)Backed by the Anthropic Files API.
ContentStore.geminiFiles(apiKey, baseUrl?)Backed by the Gemini Files API.
ContentStore.falStorage(apiKey, baseUrl?)Backed by fal.ai’s storage API.
ContentStore.custom({...})User-defined backend via async callbacks (see above).

Tool-input schema helpers

Each helper builds a JSON Schema fragment declaring a single required handle-id input. The model emits a handle-id string; Blazen swaps it for the resolved typed content before your tool runs.

HelperDeclares an input expecting a handle of kind
imageInput(name, description)image
audioInput(name, description)audio
videoInput(name, description)video
fileInput(name, description)document
threeDInput(name, description)three_d_model
cadInput(name, description)cad

Each call returns the same shape (kind tag varies):

imageInput("photo", "The image to describe");
// =>
// {
//   type: "object",
//   properties: {
//     photo: {
//       type: "string",
//       description: "The image to describe",
//       "x-blazen-content-ref": { kind: "image" }
//     }
//   },
//   required: ["photo"]
// }

The x-blazen-content-ref extension is invisible to providers — they see a plain string parameter — but Blazen’s resolver uses it as a marker for handle substitution.

How resolution works

When the model emits a tool call like {"photo": "blazen_xxx"}, Blazen scans the tool’s parameter schema for x-blazen-content-ref markers. For each marked field, it looks up the handle id in the active ContentStore and replaces the bare string with a typed object before invoking your handler:

{
  kind: "image",
  handleId: "blazen_xxx",
  mimeType: "image/png",
  byteSize: 24576,
  displayName: "diagram.png",
  source: { /* resolved MediaSource -- url, base64, or provider-native ref */ }
}

source is the same wire form ContentStore.resolve() returns, so your handler can forward it straight into a downstream provider request, or call store.fetchBytes(handle) if it needs the raw payload. If the handle is unknown to the store the call fails before the handler runs.


TokenUsage

Token usage statistics for a completion request.

PropertyTypeDescription
.promptTokensnumberTokens in the prompt
.completionTokensnumberTokens in the completion
.totalTokensnumberTotal tokens used

RequestTiming

Timing metadata for a completion request.

PropertyTypeDescription
.queueMsnumber | undefinedTime spent waiting in queue (ms)
.executionMsnumber | undefinedTime spent executing (ms)
.totalMsnumber | undefinedTotal wall-clock time (ms)

runAgent

Run an agentic tool execution loop. The agent repeatedly calls the model, executes tool calls via the handler callback, feeds results back, and repeats until the model stops calling tools or maxIterations is reached.

const result = await runAgent(model, messages, tools, toolHandler, options?);

Parameters

ParameterTypeDescription
modelCompletionModelThe completion model to use
messagesChatMessage[]Initial conversation messages
toolsToolDef[]Tool definitions the agent can invoke
toolHandler(toolName: string, args: object) => Promise<any | ToolOutput>Callback that executes tool calls. May return a bare value (auto-wrapped) or an explicit ToolOutput.
optionsAgentRunOptions?Optional configuration

Example

import { CompletionModel, ChatMessage, runAgent } from "blazen";

const model = CompletionModel.openai();

const result = await runAgent(
  model,
  [ChatMessage.user("What is the weather in NYC?")],
  [{
    name: "getWeather",
    description: "Get weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"]
    }
  }],
  async (toolName, args) => {
    if (toolName === "getWeather") {
      return { temp: 72, condition: "sunny" };
    }
    throw new Error(`Unknown tool: ${toolName}`);
  },
  { maxIterations: 5 }
);

console.log(result.response.content);
console.log(`Took ${result.iterations} iterations`);

Tool handler return shapes

The handler’s return value is checked for an explicit data key. If present and the value deserializes as a ToolOutput, it’s used directly. Otherwise the bare value is wrapped via ToolOutput { data: <value>, llmOverride: undefined }.

This means an arbitrary user dict like { items: [1, 2, 3] } is treated as plain data, not as a ToolOutput. Only objects with a top-level data field are unpacked.

import { runAgent, type ToolDef } from "blazen";

// Simplest: return a value directly, auto-wrapped.
const search: ToolDef = {
  name: "search",
  description: "Search for items.",
  parameters: { type: "object", properties: { q: { type: "string" } } },
};

async function handlerSimple(toolName: string, args: any) {
  if (toolName === "search") {
    return { items: [1, 2, 3] }; // wrapped as ToolOutput { data: { items: [1,2,3] } }
  }
  throw new Error(`Unknown tool: ${toolName}`);
}

// With override: structured ToolOutput so the LLM sees a summary,
// but the caller's `messages[messages.length-1].toolResult.data`
// still has the full list.
async function handlerWithOverride(toolName: string, args: any) {
  if (toolName === "search") {
    return {
      data: { items: [1, 2, 3], rawResponse: "..." },
      llmOverride: { kind: "text", text: "Found 3 items." },
    };
  }
  throw new Error(`Unknown tool: ${toolName}`);
}

To return a string to the caller (so it lives on ChatMessage.content and toolResult is null), simply return a string from the handler:

async function handlerString(toolName: string, args: any) {
  return "ok"; // appears as ChatMessage.content; toolResult is null
}

See ToolOutput and LlmPayload for the full shape and per-provider wire behaviour.


ToolDef

Describes a tool that the agent may invoke.

interface ToolDef {
  name: string;            // Unique tool name
  description: string;     // Human-readable description
  parameters: object;      // JSON Schema for the tool's parameters
}

AgentRunOptions

Options for configuring an agent run.

interface AgentRunOptions {
  maxIterations?: number;    // Max tool-calling iterations (default: 10)
  systemPrompt?: string;     // System prompt prepended to the conversation
  temperature?: number;      // Sampling temperature (0.0 - 2.0)
  maxTokens?: number;        // Maximum tokens per completion call
  addFinishTool?: boolean;   // Add a built-in "finish" tool the model can call to signal completion
}

AgentResult

The result of an agent run, returned by runAgent(). AgentResult is a typed class with getter properties (not a plain object) so it carries identity across the FFI boundary and supports instanceof checks.

Properties

PropertyTypeDescription
.responsestringFinal assistant text from the last completion.
.messagesChatMessage[]Full message history (all tool calls and results).
.iterationsnumberNumber of tool-calling iterations that occurred.
.totalCostnumber | nullAggregated cost in USD across all iterations, or null if pricing is unknown.
.toString()stringHuman-readable summary, mirrors the Python __repr__.
const result = await runAgent(model, [ChatMessage.user("Hi")], tools);
console.log(result.response);            // "Hello!"
console.log(result.messages.length);     // e.g. 3
console.log(result.iterations);          // e.g. 1
console.log(result.totalCost);           // e.g. 0.00012 or null

BatchResult

Returned by completeBatch() / completeBatchConfig(). A typed class wrapping per-request outcomes plus aggregates.

Properties

PropertyTypeDescription
.responses(CompletionResponse | null)[]One entry per input request. null for failed requests.
.errors(string | null)[]Per-request error message, or null for successful requests.
.totalUsageTokenUsage | nullAggregated token usage across all successful responses.
.totalCostnumber | nullAggregated cost in USD across all successful responses.
.successCountnumberNumber of requests that succeeded.
.failureCountnumberNumber of requests that failed.
.lengthnumberTotal number of requests in the batch (= responses.length = errors.length).
.toString()stringHuman-readable batch summary.
const batch = await completeBatch(model, [
  [ChatMessage.user("What's 2+2?")],
  [ChatMessage.user("Capital of France?")],
]);
console.log(`${batch.successCount}/${batch.length} succeeded`);
for (let i = 0; i < batch.length; i++) {
  if (batch.errors[i]) console.error(`req ${i}:`, batch.errors[i]);
  else console.log(`req ${i}:`, batch.responses[i]?.content);
}

Pipeline

A Pipeline is a sequence of named Stages built with PipelineBuilder. Each stage runs as its own workflow; on completion, an optional persist callback fires with a typed snapshot so the caller can durably store progress for later resumption.

new PipelineBuilder(name: string)

import { PipelineBuilder } from "blazen";

const pipeline = new PipelineBuilder("ingest")
  .stage(stageA)
  .stage(stageB)
  .timeoutPerStage(120)
  .build();

.onPersist(callback)

Register a TSFN-based persist callback that receives a typed PipelineSnapshot after each stage completes. The callback must return Promise<void> (or be async). A rejected promise aborts the pipeline with a PipelineError.

import { PipelineBuilder, PipelineSnapshot } from "blazen";

const pipeline = new PipelineBuilder("ingest")
  .stage(stage)
  .onPersist(async (snapshot: PipelineSnapshot) => {
    await db.put(`pipeline:${snapshot.runId}`, snapshot.toJsonPretty());
  })
  .build();

.onPersistJson(callback)

Same as onPersist, but the callback receives the snapshot pre-serialized as a JSON string. Useful for backends that store opaque blobs (IndexedDB, Redis, S3).

const pipeline = new PipelineBuilder("ingest")
  .stage(stage)
  .onPersistJson(async (json: string) => {
    await idb.put("snapshots", json, runId);
  })
  .build();

The snapshot can later be replayed via pipeline.resume(PipelineSnapshot.fromJson(json)).

If your onPersist (or onPersistJson) callback throws or returns a rejected promise, the rejection is wrapped as a PersistError (a BlazenError subclass) and propagated to the running pipeline, aborting it. Catch the error from await handler.result() and inspect with instanceof PersistError to distinguish persistence failures from stage failures.

import { PersistError } from "blazen";

try {
  await handler.result();
} catch (e) {
  if (e instanceof PersistError) {
    console.error("snapshot persistence failed:", e.message);
  }
}

Workflow

new Workflow(name: string)

Create a new workflow instance. Default timeout is 300 seconds (5 minutes).

const wf = new Workflow("my-workflow");

.addStep(name: string, eventTypes: string[], handler: StepHandler)

Register a step that listens for one or more event types.

wf.addStep("process", ["MyEvent"], async (event, ctx) => {
  return { type: "blazen::StopEvent", result: { done: true } };
});

.setTimeout(seconds: number)

Set the maximum execution time for the workflow in seconds. Set to 0 or negative to disable.

wf.setTimeout(30);

await wf.run(input: object): WorkflowResult

Run the workflow to completion with the given input.

const result = await wf.run({ prompt: "Hello" });

await wf.runStreaming(input: object, callback: (event) => void): WorkflowResult

Run the workflow with a streaming callback invoked for each event published via ctx.writeEventToStream().

const result = await wf.runStreaming({ prompt: "Hello" }, (event) => {
  console.log("stream:", event);
});

await wf.runWithHandler(input: object): WorkflowHandler

Run the workflow and return a handler for pause/resume and streaming control.

const handler = await wf.runWithHandler({ prompt: "Hello" });

await wf.resume(snapshotJson: string): WorkflowHandler

Resume a previously paused workflow from a snapshot JSON string.

const snapshot = fs.readFileSync("snapshot.json", "utf-8");
const handler = await wf.resume(snapshot);
const result = await handler.result();

WorkflowResult

interface WorkflowResult {
  type: string;     // Event type of the final result (e.g. "blazen::StopEvent")
  data: object;     // Result data extracted from the StopEvent's result field
}

StepHandler

async (event: object, ctx: Context) => object | object[] | null

A step handler receives an event and a context. It can return:

  • A single event object to emit one event.
  • An array of event objects to fan-out multiple events.
  • null for side-effect-only steps that emit no events.

WorkflowHandler

Returned by Workflow.runWithHandler() and Workflow.resume(). Provides control over a running workflow.

Important: result() consumes the handler internally — you can only call it once. The other control methods (pause, resumeInPlace, abort, respondToInput, snapshot) borrow the handler and can be called multiple times.

await handler.result(): WorkflowResult

Await the final workflow result.

await handler.pause(): void

Signal the running workflow to pause. After pausing, use snapshot() to get a serializable snapshot, or resumeInPlace() to continue execution.

await handler.snapshot(): string

Get a serializable snapshot of the paused workflow as a JSON string. Save this to resume later with Workflow.resume().

await handler.resumeInPlace(): void

Resume a paused workflow in place without creating a new handler.

await handler.streamEvents(callback: (event) => void): void

Subscribe to intermediate events published via ctx.writeEventToStream(). Must be called before result() or pause().

const handler = await wf.runWithHandler({ prompt: "Hello" });

// Subscribe to stream events
await handler.streamEvents((event) => console.log(event));

// Then await the result
const result = await handler.result();

Events

Events are plain objects with a type field.

{ type: "MyEvent", payload: "data" }

Start Event

{ type: "blazen::StartEvent", ...input }

The workflow begins by emitting a StartEvent containing the input data.

Stop Event

{ type: "blazen::StopEvent", result: { ... } }

Returning a StopEvent from a step handler completes the workflow.


Context

Shared workflow context accessible by all steps. All methods are async.

StateValue

All context values conform to the StateValue type:

type StateValue = string | number | boolean | null | Buffer | StateValue[] | { [key: string]: StateValue };

await ctx.set(key: string, value: Exclude<StateValue, Buffer>): void

Store a JSON-serializable value in the workflow context. Accepts strings, numbers, booleans, null, arrays, and nested objects. For binary data, use ctx.setBytes() instead.

The legacy ctx.set / ctx.get shortcuts still work and route values through the same 4-tier dispatch. For new code, prefer the explicit ctx.state / ctx.session namespaces documented below.

await ctx.get(key: string): Promise<StateValue | null>

Retrieve a value from the workflow context. Returns null if not found. Returns data for all StateValue variants — strings, numbers, booleans, arrays, objects, and Buffer (if the key was stored via setBytes). No data is silently dropped.

await ctx.setBytes(key: string, buffer: Buffer): void

Store raw binary data in the workflow context. Use this for explicit binary storage (e.g., MessagePack, protobuf, raw buffers). Binary data persists through pause/resume/checkpoint.

await ctx.getBytes(key: string): Buffer | null

Retrieve raw binary data from the workflow context. Returns null if not found. Note that ctx.get() also returns binary data now, so getBytes is mainly useful when you want to assert that a key holds binary content.

await ctx.runId(): string

Get the unique run UUID for the current workflow execution.

await ctx.sendEvent(event: object): void

Manually route an event into the workflow event bus. The event will be delivered to any step whose eventTypes list includes its type.

await ctx.writeEventToStream(event: object): void

Publish an event to the external broadcast stream. Consumers that subscribed via runStreaming or handler.streamEvents() will receive this event. Unlike sendEvent, this does not route the event through the internal step registry.

get state(): StateNamespace

Persistable workflow state. Survives pause() / resume(), checkpoints, and durable storage. See StateNamespace below.

get session(): SessionNamespace

In-process-only values, excluded from snapshots. Use this for things that should not survive pause() / resume(). JS object identity is NOT preserved on Node — see the SessionNamespace caveat below.


StateNamespace

Namespace for persistable workflow state. Values stored via state.set / state.setBytes go into the underlying ContextInner.state map and survive snapshots, pause() / resume(), and checkpoint stores.

await state.set(key: string, value: Exclude<StateValue, Buffer>): Promise<void>

Store a JSON-serializable value under the given key.

await state.get(key: string): Promise<StateValue | null>

Retrieve a value previously stored under the given key. Returns null if not found.

await state.setBytes(key: string, data: Buffer): Promise<void>

Store raw binary data under the given key.

await state.getBytes(key: string): Promise<Buffer | null>

Retrieve raw binary data previously stored under the given key. Returns null if not found.

workflow.addStep("step", ["blazen::StartEvent"], async (event, ctx) => {
  await ctx.state.set("counter", 5);
  const count = await ctx.state.get("counter");
  return { type: "blazen::StopEvent", result: { count } };
});

SessionNamespace

Namespace for in-process-only workflow values. Values stored via session.set are kept in the ContextInner.objects side-channel and are excluded from snapshots. Use this for state that should not survive a pause() / resume() round-trip (request IDs, rate-limit counters, ephemeral caches, …).

Important — napi-rs identity caveat: JS object identity is NOT preserved through ctx.session on the Node bindings. Values are routed through serde_json::Value because napi-rs’s Reference<T> is !Send — its Drop must run on the v8 main thread, and tokio worker threads cannot safely cross the napi boundary with live JS object references. await ctx.session.get(key) returns a plain object equal to the one you passed in, not the same object instance. For true JS class identity preservation, use the Python or WASM bindings, or keep the work inside a single Rust step. Full identity through events is tracked as a follow-up architectural refactor.

The session namespace is still functionally distinct from state: session values are excluded from snapshots, state values are not.

await session.set(key: string, value: unknown): Promise<void>

Store a JSON-serializable value under the given key. The value is excluded from snapshots.

await session.get(key: string): Promise<unknown>

Retrieve a value previously stored under the given key. Returns null if the key does not exist.

await session.has(key: string): Promise<boolean>

Check whether a value exists under the given key.

await session.remove(key: string): Promise<void>

Remove the value stored under the given key.

workflow.addStep("step", ["blazen::StartEvent"], async (event, ctx) => {
  await ctx.session.set("reqId", "abc123");
  if (await ctx.session.has("reqId")) {
    const id = await ctx.session.get("reqId");
    console.log("request id:", id);
  }
  return { type: "blazen::StopEvent", result: {} };
});

state and session use independent keyspaces — the same key can exist in both namespaces without colliding:

await ctx.state.set("k", "state-value");
await ctx.session.set("k", "session-value");
// Both are accessible; they don't collide.

BlazenState

Base class for typed state objects with per-field context storage. Extend this class to define structured workflow state that is automatically serialized and deserialized field by field.

static meta?: BlazenStateMeta

Optional static metadata that controls how fields are stored and which fields are transient.

interface BlazenStateMeta {
  transient?: Set<string> | string[];
}
FieldTypeDescription
transientSet<string> | string[]Field names to exclude from persistence. These fields are not saved by saveTo() and will not be present after loadFrom().

restore?(): void | Promise<void>

Optional instance method called by loadFrom() after all persisted fields have been restored. Use this to recreate transient fields (caches, connections, derived data) that were excluded from persistence.

await state.saveTo(ctx: Context, key: string): Promise<void>

Persist every non-transient field of the state instance into the workflow context. Each field is stored individually under a namespaced key derived from key.

static loadFrom<T>(ctx: Context, key: string): Promise<T>

Restore a state instance from the workflow context. Reads each persisted field, constructs a new instance, and calls restore() if defined.

const state = await AgentState.loadFrom<AgentState>(ctx, "state");

Compute Request Types

Typed request interfaces for compute operations (image generation, video, speech, music, transcription, 3D models).

ImageRequest

interface ImageRequest {
  prompt: string;                  // Text prompt describing the desired image
  negativePrompt?: string;         // Things to avoid in the image
  width?: number;                  // Desired image width in pixels
  height?: number;                 // Desired image height in pixels
  numImages?: number;              // Number of images to generate
  model?: string;                  // Model override (provider-specific)
  parameters?: object;             // Additional provider-specific parameters
}

UpscaleRequest

interface UpscaleRequest {
  imageUrl: string;                // URL of the image to upscale
  scale: number;                   // Scale factor (e.g. 2.0, 4.0)
  model?: string;
  parameters?: object;
}

VideoRequest

interface VideoRequest {
  prompt: string;                  // Text prompt describing the desired video
  imageUrl?: string;               // Source image URL for image-to-video
  durationSeconds?: number;        // Desired duration in seconds
  negativePrompt?: string;         // Things to avoid
  width?: number;                  // Video width in pixels
  height?: number;                 // Video height in pixels
  model?: string;
  parameters?: object;
}

SpeechRequest

interface SpeechRequest {
  text: string;                    // Text to synthesize into speech
  voice?: string;                  // Voice identifier (provider-specific)
  voiceUrl?: string;               // Reference voice sample URL for cloning
  language?: string;               // Language code (e.g. "en", "fr", "ja")
  speed?: number;                  // Speech speed multiplier (1.0 = normal)
  model?: string;
  parameters?: object;
}

MusicRequest

interface MusicRequest {
  prompt: string;                  // Text prompt describing the desired audio
  durationSeconds?: number;        // Desired duration in seconds
  model?: string;
  parameters?: object;
}

TranscriptionRequest

interface TranscriptionRequest {
  audioUrl: string;                // URL of the audio file to transcribe
  language?: string;               // Language hint (e.g. "en", "fr")
  diarize?: boolean;               // Whether to perform speaker diarization
  model?: string;
  parameters?: object;
}

ThreeDRequest

interface ThreeDRequest {
  prompt?: string;                 // Text prompt describing the desired 3D model
  imageUrl?: string;               // Source image URL for image-to-3D
  format?: string;                 // Output format (e.g. "glb", "obj", "usdz")
  model?: string;
  parameters?: object;
}

Compute Result Types

ImageResult

interface ImageResult {
  images: GeneratedImage[];        // Generated or upscaled images
  timing?: ComputeTiming;          // Request timing breakdown
  cost?: number;                   // Cost in USD
  metadata: object;                // Provider-specific metadata
}

VideoResult

interface VideoResult {
  videos: GeneratedVideo[];
  timing?: ComputeTiming;
  cost?: number;
  metadata: object;
}

AudioResult

interface AudioResult {
  audio: GeneratedAudio[];
  timing?: ComputeTiming;
  cost?: number;
  metadata: object;
}

TranscriptionResult

interface TranscriptionResult {
  text: string;                          // Full transcribed text
  segments: TranscriptionSegment[];      // Time-aligned segments
  language?: string;                     // Detected or specified language code
  timing?: ComputeTiming;
  cost?: number;
  metadata: object;
}

TranscriptionSegment

interface TranscriptionSegment {
  text: string;            // Transcribed text for this segment
  start: number;           // Start time in seconds
  end: number;             // End time in seconds
  speaker?: string;        // Speaker label (if diarization enabled)
}

ThreeDResult

interface ThreeDResult {
  models: Generated3DModel[];
  timing?: ComputeTiming;
  cost?: number;
  metadata: object;
}

Compute Job Types

Low-level types for generic compute jobs.

ComputeRequest

interface ComputeRequest {
  model: string;             // Model/endpoint to run (e.g. "fal-ai/flux/dev")
  input: object;             // Input parameters (model-specific)
  webhook?: string;          // Webhook URL for async completion notification
}

JobHandle

interface JobHandle {
  id: string;                // Provider-assigned job identifier
  provider: string;          // Provider name (e.g. "fal", "replicate", "runpod")
  model: string;             // Model/endpoint that was invoked
  submittedAt: string;       // ISO 8601 timestamp
}

JobStatus

String enum for compute job status.

JobStatus.Queued      // "queued"
JobStatus.Running     // "running"
JobStatus.Completed   // "completed"
JobStatus.Failed      // "failed"
JobStatus.Cancelled   // "cancelled"

ComputeResult

interface ComputeResult {
  job?: JobHandle;           // Job handle that produced this result
  output: object;            // Output data (model-specific)
  timing?: ComputeTiming;    // Request timing breakdown
  cost?: number;             // Cost in USD
  metadata: object;          // Raw provider-specific metadata
}

ComputeTiming

interface ComputeTiming {
  queueMs?: number;          // Time spent waiting in queue (ms)
  executionMs?: number;      // Time spent executing (ms)
  totalMs?: number;          // Total wall-clock time (ms)
}

Media Output Types

MediaOutput

A single piece of generated media content.

interface MediaOutput {
  url?: string;              // URL where the media can be downloaded
  base64?: string;           // Base64-encoded media data
  rawContent?: string;       // Raw text content (SVG, OBJ, GLTF JSON)
  mediaType: string;         // MIME type (e.g. "image/png", "video/mp4")
  fileSize?: number;         // File size in bytes
  metadata: object;          // Provider-specific metadata
}

GeneratedImage

interface GeneratedImage {
  media: MediaOutput;
  width?: number;            // Image width in pixels
  height?: number;           // Image height in pixels
}

GeneratedVideo

interface GeneratedVideo {
  media: MediaOutput;
  width?: number;            // Video width in pixels
  height?: number;           // Video height in pixels
  durationSeconds?: number;  // Duration in seconds
  fps?: number;              // Frames per second
}

GeneratedAudio

interface GeneratedAudio {
  media: MediaOutput;
  durationSeconds?: number;  // Duration in seconds
  sampleRate?: number;       // Sample rate in Hz
  channels?: number;         // Number of audio channels
}

Generated3DModel

interface Generated3DModel {
  media: MediaOutput;
  vertexCount?: number;      // Total vertex count
  faceCount?: number;        // Total face/triangle count
  hasTextures: boolean;      // Whether the model includes textures
  hasAnimations: boolean;    // Whether the model includes animations
}

EmbeddingModel

Generate vector embeddings from text. Created via static factory methods. Keys are read from environment variables (OPENAI_API_KEY, TOGETHER_API_KEY, etc.) when options is omitted, or can be passed explicitly via { apiKey: "..." }.

import { EmbeddingModel } from "blazen";

const model = EmbeddingModel.openai();
const together = EmbeddingModel.together();
const cohere = EmbeddingModel.cohere({ apiKey: "co-..." });
const fireworks = EmbeddingModel.fireworks();

Provider Factory Methods

MethodDefault ModelDefault Dimensions
EmbeddingModel.openai(options?)text-embedding-3-small1536
EmbeddingModel.together(options?)togethercomputer/m2-bert-80M-8k-retrieval768
EmbeddingModel.cohere(options?)embed-v4.01024
EmbeddingModel.fireworks(options?)nomic-ai/nomic-embed-text-v1.5768

Properties

PropertyTypeDescription
.modelIdstringThe model identifier.
.dimensionsnumberOutput vector dimensionality.

await model.embed(texts: string[]): EmbeddingResponse

Embed one or more texts, returning one vector per input.

const response = await model.embed(["Hello", "World"]);
console.log(response.embeddings.length);    // 2
console.log(response.embeddings[0].length); // 1536

EmbeddingResponse

Returned by EmbeddingModel.embed().

interface EmbeddingResponse {
  embeddings: number[][];         // One vector per input text
  model: string;                  // Model that produced the embeddings
  usage?: TokenUsage;             // Token usage statistics
  cost?: number;                  // Estimated cost in USD
  timing?: RequestTiming;         // Request timing breakdown
  metadata: object;               // Provider-specific metadata
}

Token Estimation

Lightweight token counting functions. Uses a heuristic (~3.5 characters per token) suitable for budget checks without external data files.

estimateTokens(text: string, contextSize?: number): number

Estimate token count for a text string.

import { estimateTokens } from "blazen";

const count = estimateTokens("Hello, world!");  // 4

countMessageTokens(messages: ChatMessage[], contextSize?: number): number

Estimate total tokens for an array of chat messages, including per-message overhead.

import { countMessageTokens, ChatMessage } from "blazen";

const count = countMessageTokens([
  ChatMessage.system("You are helpful."),
  ChatMessage.user("Hello!"),
]);

contextSize defaults to 128000 if omitted.


Subclassable Providers

CompletionModel, EmbeddingModel, and Transcription can be subclassed to implement custom providers. Override the relevant methods and the framework will dispatch to your implementation.

CompletionModel

import { CompletionModel, ChatMessage } from "blazen";

class MyLLM extends CompletionModel {
  constructor() {
    super({ modelId: "my-llm" });
  }

  async complete(messages: ChatMessage[]) {
    // Your inference logic here
    return { content: "Hello from my custom model" };
  }

  async stream(messages: ChatMessage[], onChunk: (chunk: any) => void) {
    onChunk({ delta: "Hello", finishReason: null, toolCalls: [] });
    onChunk({ delta: null, finishReason: "stop", toolCalls: [] });
  }
}

const model = new MyLLM();
const response = await model.complete([ChatMessage.user("Hi")]);

EmbeddingModel

import { EmbeddingModel } from "blazen";

class MyEmbedder extends EmbeddingModel {
  constructor() {
    super({ modelId: "my-embedder", dimensions: 128 });
  }

  async embed(texts: string[]) {
    return {
      embeddings: texts.map(() => new Array(128).fill(0.1)),
      model: "my-embedder",
    };
  }
}

Transcription

import { Transcription } from "blazen";

class MyTranscriber extends Transcription {
  constructor() {
    super({ providerId: "my-stt" });
  }

  async transcribe(request: any) {
    return { text: "transcribed text", segments: [] };
  }
}

Per-Capability Provider Classes

Seven provider base classes let you implement a single compute capability without dealing with the full ComputeProvider interface. Subclass and override the relevant methods.

ClassMethods to OverrideRust Trait
TTSProvidertextToSpeech(request)AudioGeneration
MusicProvidergenerateMusic(request), generateSfx(request)AudioGeneration
ImageProvidergenerateImage(request), upscaleImage(request)ImageGeneration
VideoProvidertextToVideo(request), imageToVideo(request)VideoGeneration
ThreeDProvidergenerate3d(request)ThreeDGeneration
BackgroundRemovalProviderremoveBackground(request)BackgroundRemoval
VoiceProvidercloneVoice(request), listVoices(), deleteVoice(voice)VoiceCloning

Constructor

All provider classes share the same constructor config:

new TTSProvider({
  providerId: string,
  baseUrl?: string,
  pricing?: ModelPricing,
  memoryEstimateBytes?: number,
})
FieldTypeDescription
providerIdstringIdentifier for the provider instance.
baseUrlstring?Optional base URL for the provider API.
pricingModelPricing?Optional pricing info for cost tracking.
memoryEstimateBytesnumber?Estimated memory footprint in bytes (host RAM if on CPU, GPU VRAM otherwise). Used by ModelManager to charge the appropriate pool.

Example

import { TTSProvider } from "blazen";

class ElevenLabsTTS extends TTSProvider {
  private apiKey: string;

  constructor(apiKey: string) {
    super({ providerId: "elevenlabs" });
    this.apiKey = apiKey;
  }

  async textToSpeech(request: any) {
    // Call ElevenLabs API with this.apiKey
    return { audio: audioBuffer, format: "mp3" };
  }
}

const tts = new ElevenLabsTTS("sk-...");
const result = await tts.textToSpeech({ text: "Hello world", voice: "alice" });

MemoryBackend

Base class for custom memory storage backends. Subclass to implement persistence backed by Postgres, SQLite, DynamoDB, or any other store.

import { MemoryBackend } from "blazen";

class PostgresBackend extends MemoryBackend {
  async put(entry: any): Promise<void> {
    // Insert or update entry in Postgres
  }

  async get(id: string): Promise<any | null> {
    // Retrieve entry by id
  }

  async delete(id: string): Promise<boolean> {
    // Delete entry, return true if it existed
  }

  async list(): Promise<any[]> {
    // Return all entries
  }

  async len(): Promise<number> {
    // Return count of entries
  }

  async searchByBands(bands: any, limit: number): Promise<any[]> {
    // Return candidates sharing LSH bands with the query
  }
}

Methods to Override

MethodSignatureDescription
putasync put(entry): Promise<void>Insert or update a stored entry.
getasync get(id: string): Promise<any | null>Retrieve a stored entry by id.
deleteasync delete(id: string): Promise<boolean>Delete an entry by id. Returns true if it existed.
listasync list(): Promise<any[]>Return all stored entries.
lenasync len(): Promise<number>Return the number of stored entries.
searchByBandsasync searchByBands(bands, limit): Promise<any[]>Return candidate entries sharing at least one LSH band.

ProgressCallback

Subclassable base for download progress callbacks. Pass an instance to ModelCache.download() (and other download-capable APIs) to receive byte-count progress updates.

onProgress takes bigint byte counts so multi-gigabyte downloads keep full precision. total is null when the server does not send Content-Length.

import { ModelCache, ProgressCallback } from "blazen";

class MyProgress extends ProgressCallback {
  onProgress(downloaded: bigint, total?: bigint | null): void {
    if (total != null) {
      const pct = Number((downloaded * 100n) / total);
      console.log(`${pct}%`);
    } else {
      console.log(`${downloaded} bytes`);
    }
  }
}

const cache = ModelCache.create();
await cache.download("bert-base-uncased", "config.json", new MyProgress());

The base onProgress always throws — overriding it is mandatory. super() must be called from the subclass constructor.

ProgressCallback instances are accepted anywhere the SDK exposes a download hook — ModelCache.download(), the local-inference Provider.create() paths that pull weights from HuggingFace, and the ProgressCallback-aware variants of dataset loaders. Pass the same ProgressCallback instance to multiple downloads to centralise reporting (e.g. for a TUI progress bar).


ModelManager

Per-pool memory budget-aware model manager with LRU eviction. Tracks registered local models and their estimated memory footprint (host RAM if the model runs on CPU, GPU VRAM otherwise). Models live in distinct pools (CPU RAM vs per-GPU VRAM); when loading a model that would exceed its pool’s budget, the least-recently-used loaded model within that same pool is unloaded first. A CPU embedder never evicts a GPU LLM, and vice versa.

ModelManager is a memory budget bookkeeper, not a performance scheduler. It answers “will this fit?” — not “will this run fast?”. Whether a 70B model loaded on CPU is useful at 1–3 tok/s is a workload-choice question the manager intentionally does not answer.

Constructor

import { ModelManager } from "blazen";

// Common case: separate CPU and GPU budgets, in gigabytes.
const manager = new ModelManager({ cpuRamGb: 100, gpuVramGb: 24 });

// Explicit per-pool budgets in bytes (BigInt). Pool labels: "cpu", "gpu", "gpu:N".
const manager = new ModelManager({
  poolBudgets: {
    "cpu": 100_000_000_000n,
    "gpu:0": 24_000_000_000n,
  },
});

// No-arg / empty-object: BOTH the "cpu" and "gpu:0" pools default to
// u64::MAX (unlimited-budget sentinel). Useful for tests and ad-hoc scripts.
const manager = new ModelManager();
FieldTypeDescription
cpuRamGbnumber?Host RAM budget in gigabytes for the "cpu" pool.
gpuVramGbnumber?VRAM budget in gigabytes for the "gpu:0" pool. Omit if you don’t load GPU models.
poolBudgetsRecord<string, bigint>?Explicit per-pool byte budgets. Keys are pool labels ("cpu", "gpu", "gpu:N"); values are bigint byte budgets. Use this when you need budgets above 4 GiB on a per-pool basis or multi-GPU setups.

Methods

MethodSignatureDescription
registerawait manager.register(id, model, memoryEstimateBytes?: bigint)Register a model with its estimated memory footprint (in bytes, as a bigint). Starts unloaded. The pool charged is determined by the model’s device() (defaults to "cpu").
registerLocalModelawait manager.registerLocalModel(id, load, unload, isLoaded, memoryEstimateBytes?: bigint, device?: string)Register a model from raw closures. The optional device string (default "cpu") selects which pool to charge — e.g. "cuda:0" charges "gpu:0".
loadawait manager.load(id)Load a model, evicting LRU models in the same pool if needed.
unloadawait manager.unload(id)Unload a model and free its memory.
isLoadedawait manager.isLoaded(id): booleanCheck if a model is currently loaded.
ensureLoadedawait manager.ensureLoaded(id)Alias for load().
usedBytesawait manager.usedBytes(pool?: string): bigintBytes currently used by loaded models in the given pool. pool defaults to "cpu". Invalid pool labels reject with invalid pool label '<x>': expected 'cpu', 'gpu', or 'gpu:N' where N is a non-negative integer.
availableBytesawait manager.availableBytes(pool?: string): bigintBytes still available within the given pool’s budget. Same default and validation rules as usedBytes.
poolsmanager.pools(): Array<{ pool: string; budgetBytes: bigint }>Sync. List every configured pool and its byte budget.
statusawait manager.status(): ModelStatus[]Status of all registered models.

ModelStatus

interface ModelStatus {
  id: string;                    // Model identifier
  loaded: boolean;               // Whether the model is currently loaded
  memoryEstimateBytes: bigint;   // Estimated memory footprint in bytes
  pool: string;                  // Pool label this model is charged to ("cpu", "gpu:0", ...)
}

Why bigint? The byte-budget surface (poolBudgets values, register’s memoryEstimateBytes, usedBytes(), availableBytes(), ModelStatus.memoryEstimateBytes) used to be number (u32 on the Rust side), which capped budgets at ~4 GiB and silently truncated larger inputs — a real footgun for 7B+ local models that need 8 GiB+ of memory. Pass values as BigInt literals (8n * 1_073_741_824n) or via BigInt(8 * 1024 ** 3). The cpuRamGb / gpuVramGb: number constructor path is unchanged for users who prefer plain numbers and gigabyte granularity.


ModelRegistry

An ABC for model catalogs. Subclass it to advertise the models your code knows about — used by capability-discovery code, dynamic model menus, and parity with the Rust blazen_llm::traits::ModelRegistry trait.

export declare class ModelRegistry {
  listModels(): Promise<ModelInfo[]>;
  getModel(modelId: string): Promise<ModelInfo | null>;
}
MethodSignatureDescription
listModelsasync listModels(): Promise<ModelInfo[]>Return every model the registry advertises.
getModelasync getModel(modelId: string): Promise<ModelInfo | null>Look up a single model by id, or return null if unknown.

The base implementations of both methods throw — overriding them is mandatory. super() must be called from the subclass constructor.

import { ModelRegistry } from "blazen";
import type { ModelInfo } from "blazen";

class MyRegistry extends ModelRegistry {
  async listModels(): Promise<ModelInfo[]> {
    return [
      { id: "gpt-4o", provider: "openai" /* ...other ModelInfo fields */ },
      { id: "claude-sonnet-4", provider: "anthropic" /* ... */ },
    ];
  }

  async getModel(modelId: string): Promise<ModelInfo | null> {
    const all = await this.listModels();
    return all.find((m) => m.id === modelId) ?? null;
  }
}

See the ModelInfo reference for the full set of fields each entry must populate (id, provider, capabilities, context window, pricing, etc.).

Mirrors PyModelRegistry (Python) and WasmModelRegistry exposed as ModelRegistry (WASM SDK) — subclassing ModelRegistry in any binding produces the same Rust-side blazen_llm::traits::ModelRegistry implementation.


ModelPricing and Pricing Functions

ModelPricing

Pricing metadata for cost tracking.

interface ModelPricing {
  inputPerMillion?: number;     // Cost per million input tokens (USD)
  outputPerMillion?: number;    // Cost per million output tokens (USD)
  perImage?: number;            // Cost per generated image (USD)
  perSecond?: number;           // Cost per second of compute (USD)
}

registerPricing()

Register custom pricing for a model. Overrides any existing pricing for the same model ID.

import { registerPricing } from "blazen";

registerPricing("my-model", { inputPerMillion: 1.0, outputPerMillion: 2.0 });

lookupPricing()

Look up pricing for a model by ID. Returns null if the model is unknown.

import { lookupPricing } from "blazen";

const pricing = lookupPricing("gpt-4o");
if (pricing) {
  console.log(`Input: $${pricing.inputPerMillion}/M tokens`);
}

LocalModel Methods on CompletionModel

CompletionModel instances backed by local inference (not remote APIs) support explicit load/unload lifecycle management.

MethodSignatureDescription
loadawait model.load(): voidLoad the model into memory (host RAM if on CPU, GPU VRAM otherwise). Idempotent.
unloadawait model.unload(): voidFree the model’s memory. Idempotent.
isLoadedawait model.isLoaded(): booleanWhether the model is currently loaded.
memoryBytesawait model.memoryBytes(): number | nullApproximate memory footprint in bytes (host RAM if on CPU, GPU VRAM otherwise), or null if unknown.
deviceawait model.device(): stringReturn the device string this model targets ('cpu', 'cuda:0', 'metal', etc.). Determines which pool the manager charges. Subclasses must override — the base implementation throws.
// For a local model:
await model.load();
console.log(await model.isLoaded());    // true
console.log(await model.memoryBytes()); // e.g. 4000000000
console.log(await model.device());      // e.g. "cuda:0"
await model.unload();

Error Handling

Errors thrown across the FFI boundary are surfaced as instances of typed BlazenError subclasses. Every error class extends the base BlazenError, which extends the standard JavaScript Error, so existing instanceof Error checks keep working while gaining structural classification.

The BlazenError hierarchy is what makes typed error routing possible — any caught value can be matched against BlazenError (catch-all for anything from the SDK), against the direct subclass (broad category like RateLimitError), or against a leaf class (specific failure like LlamaCppModelLoadError). Use whichever level of specificity your handler needs.

Root and direct subclasses

BlazenError is the root. The following 18 classes extend it directly:

ClassWhen it’s thrown
AuthErrorInvalid or expired API key.
RateLimitErrorProvider rate limit reached.
TimeoutErrorRequest exceeded its deadline.
ValidationErrorInvalid request parameters or option set.
ContentPolicyErrorContent moderated by the provider.
ProviderErrorProvider-specific error (HTTP status / endpoint detail attached — see below).
UnsupportedErrorFeature not supported by the chosen provider.
ComputeErrorCompute job failure (cancelled, quota exhausted, runtime failure).
MediaErrorInvalid or oversized media content.
PeerEncodeErrorFailed to encode a peer envelope.
PeerTransportErrorNetwork failure between peers.
PeerEnvelopeVersionErrorPeer protocol version mismatch.
PeerWorkflowErrorRemote peer workflow failed.
PeerTlsErrorTLS handshake or cert validation failed for a peer connection.
PeerUnknownStepErrorPeer requested an unknown workflow step.
PersistErrorSnapshot persistence backend failure.
PromptErrorPrompt registry / template failure.
MemoryErrorMemory store / embedder failure.
CacheErrorModel cache / download failure.

ProviderError structured fields

ProviderError carries structured context in addition to the message string. All fields are nullable.

FieldTypeDescription
.providerstring | nullProvider name (e.g. "openai", "anthropic").
.statusnumber | nullHTTP status code, when the call reached the provider.
.endpointstring | nullThe endpoint that returned the error.
.requestIdstring | nullProvider-assigned request id (use this when filing support tickets).
.detailstring | nullProvider-supplied error detail / body.
.retryAfterMsnumber | nullSuggested back-off when the provider returned a Retry-After hint.

Per-backend ProviderError subclasses

Each local-inference and provider-side backend has its own ProviderError subclass with narrower variants. Use instanceof to route to backend-specific handling.

ClassBackendRepresentative narrower subclasses
LlamaCppErrorllama.cppLlamaCppInvalidOptionsError, LlamaCppModelLoadError, LlamaCppInferenceError, LlamaCppEngineNotAvailableError
MistralRsErrormistral.rsMistralRsInvalidOptionsError, MistralRsInitError, MistralRsInferenceError, MistralRsEngineNotAvailableError
CandleLlmErrorcandle (LLM)CandleLlmInvalidOptionsError, CandleLlmModelLoadError, CandleLlmInferenceError, CandleLlmEngineNotAvailableError
CandleEmbedErrorcandle (embeddings)CandleEmbedModelLoadError, CandleEmbedEmbeddingError, CandleEmbedEngineNotAvailableError, CandleEmbedTaskPanickedError
WhisperErrorwhisper.cppWhisperModelLoadError, WhisperTranscriptionError, WhisperEngineNotAvailableError, WhisperIoError
PiperErrorPiper TTSPiperModelLoadError, PiperSynthesisError, PiperEngineNotAvailableError
DiffusionErrordiffusion image genDiffusionModelLoadError, DiffusionGenerationError
FastEmbedErrorfastembedEmbedUnknownModelError, EmbedInitError, EmbedEmbedError, EmbedMutexPoisonedError, EmbedTaskPanickedError
TractErrortract ONNX runtime(no narrower variants)

PromptError similarly has narrower variants like PromptMissingVariableError, PromptNotFoundError, PromptVersionNotFoundError, PromptIoError, PromptYamlError, PromptJsonError, PromptValidationError. MemoryError exposes MemoryNoEmbedderError, MemoryEmbeddingError, MemoryNotFoundError, MemorySerializationError, MemoryIoError, MemoryBackendError. CacheError exposes DownloadError, CacheDirError, IoError. There are around 80 narrower subclasses in total — every public Rust error variant gets its own JS class.

enrichError(err: unknown): unknown

Plain Error instances thrown across the FFI boundary lose their original Rust type. Pass any caught value through enrichError to re-classify it into the proper BlazenError subclass before further inspection. It’s a no-op when the error is already typed.

import { enrichError, BlazenError } from "blazen";

try {
  await model.complete([ChatMessage.user("Hi")]);
} catch (raw) {
  const err = enrichError(raw);
  if (err instanceof BlazenError) {
    // typed handling
  }
  throw err;
}

Example: routing on the typed hierarchy

import {
  RateLimitError, AuthError, TimeoutError,
  ProviderError, LlamaCppEngineNotAvailableError,
} from "blazen";

try {
  const response = await model.complete([ChatMessage.user("Hello")]);
} catch (e) {
  if (e instanceof RateLimitError) {
    await sleep(e instanceof ProviderError && e.retryAfterMs ? e.retryAfterMs : 1000);
  } else if (e instanceof AuthError) {
    refreshApiKey();
  } else if (e instanceof TimeoutError) {
    // safe to retry once
  } else if (e instanceof LlamaCppEngineNotAvailableError) {
    console.error("Build was compiled without llama.cpp support");
  } else if (e instanceof ProviderError) {
    console.error(`[${e.provider} ${e.status ?? "?"}] ${e.detail ?? e.message}`);
  } else {
    throw e;
  }
}

RateLimitError, TimeoutError, transient ProviderErrors with status >= 500, and most PeerTransportErrors are safe to retry. AuthError, ValidationError, ContentPolicyError, and UnsupportedError are not.

When to call enrichError

The Rust core always emits typed BlazenError subclasses, but errors that originate in JS callbacks (tool handlers, persist callbacks, custom providers) and bubble back through Rust come out as plain Error instances. If you want uniform instanceof BlazenError matching everywhere, run every caught value through enrichError at the catch site:

import { enrichError, BlazenError, ProviderError } from "blazen";

async function safeCall<T>(fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (raw) {
    const err = enrichError(raw);
    if (err instanceof ProviderError && err.retryAfterMs) {
      await new Promise(r => setTimeout(r, err.retryAfterMs!));
    }
    throw err;
  }
}

enrichError is idempotent — passing it an already-typed BlazenError returns the same value unchanged, so it’s safe to layer.


Local Inference Types

Local backends (MistralRsProvider, LlamaCppProvider, CandleLlmProvider) expose typed input/output classes for direct use without going through the generic CompletionModel surface. Two parallel families exist — un-prefixed * for mistral.rs (the canonical surface) and LlamaCpp* for llama.cpp — plus a single CandleInferenceResult for the candle backend.

mistral.rs (canonical, un-prefixed)

Class / enumPurpose
ChatMessageInputInference-side chat message. Constructor: new ChatMessageInput(role, text, images?); static ChatMessageInput.fromText(role, text). Getters: .role, .text, .images, .hasImages.
ChatRoleConst enum: System, User, Assistant, Tool.
InferenceImageImage attachment for vision-capable models. Static factories: fromBytes(buf), fromPath(path), fromSource(src).
InferenceImageSourceTagged-union source. Static factories: bytes(buf), path(p). Getters: .kind ("bytes" or "path"), .data, .filePath.
InferenceResultNon-streaming result. Getters: .content, .reasoningContent, .toolCalls, .finishReason, .model, .usage.
InferenceChunkStreaming chunk. Getters: .delta, .reasoningDelta, .toolCalls, .finishReason.
InferenceChunkStreamAsync chunk source. Pull with await stream.next(); returns null when exhausted.
InferenceToolCallTool call requested by the model. Constructor new InferenceToolCall(id, name, arguments); getters .id, .name, .arguments (JSON string).
InferenceUsageToken usage. Getters: .promptTokens, .completionTokens, .totalTokens, .totalTimeSec.
import { ChatMessageInput, ChatRole, InferenceImage, InferenceChunkStream, MistralRsProvider } from "blazen";

const provider = await MistralRsProvider.create({ modelId: "..." });
const stream: InferenceChunkStream = await provider.inferStream([
  ChatMessageInput.fromText(ChatRole.User, "Describe this image"),
]);
for (let chunk = await stream.next(); chunk !== null; chunk = await stream.next()) {
  process.stdout.write(chunk.delta ?? "");
}

InferenceChunkStream is single-pass — once you’ve reached the terminating null, the stream is exhausted. Errors raised from inside InferenceChunkStream.next() are typed BlazenError subclasses (MistralRsInferenceError, etc.) so they can be matched alongside the rest of the error hierarchy.

llama.cpp (LlamaCpp prefix)

The llama.cpp surface mirrors the mistral.rs one with a narrower feature set (no reasoning content, no images on the message input itself).

Class / enumPurpose
LlamaCppChatMessageInputConstructor: new LlamaCppChatMessageInput(role, text). Getters: .role, .text.
LlamaCppChatRoleConst enum: System, User, Assistant, Tool (capitalised — distinct from ChatRole).
LlamaCppInferenceResultNon-streaming result. Getters: .content, .finishReason, .model, .usage.
LlamaCppInferenceChunkStreaming chunk. Getters: .delta, .finishReason.
LlamaCppInferenceChunkStreamAsync chunk source. Same await stream.next() pattern.
LlamaCppInferenceUsageGetters: .promptTokens, .completionTokens, .totalTokens, .totalTimeSec.
import {
  LlamaCppChatMessageInput, LlamaCppChatRole,
  LlamaCppInferenceChunkStream, LlamaCppProvider,
} from "blazen";

const provider = await LlamaCppProvider.create({ modelPath: "/models/llama.gguf" });
const stream: LlamaCppInferenceChunkStream = await provider.inferStream([
  new LlamaCppChatMessageInput(LlamaCppChatRole.User, "What is 2+2?"),
]);
for (let chunk = await stream.next(); chunk !== null; chunk = await stream.next()) {
  process.stdout.write(chunk.delta ?? "");
}

Like the mistral.rs InferenceChunkStream, LlamaCppInferenceChunkStream is single-pass; mid-stream failures throw a typed LlamaCppInferenceError.

candle

The candle backend exposes a single non-streaming result class.

ClassPurpose
CandleInferenceResultConstructor: new CandleInferenceResult(content, promptTokens, completionTokens, totalTimeSecs). Getters: .content, .promptTokens, .completionTokens, .totalTimeSecs.

The candle backend has no streaming counterpart to InferenceChunkStream / LlamaCppInferenceChunkStream — pull CandleInferenceResult once per call. If you need token-by-token streaming on candle, swap the provider to mistral.rs or llama.cpp.

Errors raised from local inference

All three families propagate errors as typed BlazenError subclasses. The mapping is documented in the Error Handling section above. As elsewhere, run callbacks through enrichError to re-classify any plain Error that bubbles back through Rust from JS-side code (custom samplers, custom token decoders, etc.).


Telemetry

OpenTelemetry-compatible tracing flows through the standard tracing subscriber. Blazen ships an optional Langfuse exporter that ships span batches to the Langfuse ingestion API. This is gated by the langfuse Cargo feature on the underlying crate, so it’s only available in builds that opted in at compile time.

LangfuseConfig

import { LangfuseConfig } from "blazen";

const config = new LangfuseConfig(
  process.env.LANGFUSE_PUBLIC_KEY!,
  process.env.LANGFUSE_SECRET_KEY!,
  "https://cloud.langfuse.com",  // host (optional, defaults to cloud)
  100,                             // batchSize (optional, default 100)
  5000,                            // flushIntervalMs (optional, default 5000)
);
PropertyTypeDescription
.publicKeystringThe Langfuse public API key.
.secretKeystringThe Langfuse secret API key.
.hoststring | nullThe configured host URL, or null when defaulted.
.batchSizenumberMaximum events buffered before an automatic flush.
.flushIntervalMsnumberBackground flush interval in milliseconds.

initLangfuse(config: LangfuseConfig): void

Install the global tracing subscriber. Spawns a background tokio task that flushes buffered span envelopes to Langfuse on the configured interval. Calling this more than once per process is safe — subsequent calls no-op because the global subscriber is already registered.

import { initLangfuse, LangfuseConfig } from "blazen";

initLangfuse(new LangfuseConfig(
  process.env.LANGFUSE_PUBLIC_KEY!,
  process.env.LANGFUSE_SECRET_KEY!,
));

Available only when the host build was compiled with the langfuse feature on the underlying telemetry crate. In builds without it, the symbol is still exported but the configured exporter is a no-op.

Wiring LangfuseConfig from environment

Most deployments construct LangfuseConfig directly from environment variables at startup. Tune batchSize and flushIntervalMs to balance ingestion latency against the per-request overhead of HTTP flushes:

import { initLangfuse, LangfuseConfig } from "blazen";

const cfg = new LangfuseConfig(
  process.env.LANGFUSE_PUBLIC_KEY!,
  process.env.LANGFUSE_SECRET_KEY!,
  process.env.LANGFUSE_HOST,                                  // null → cloud default
  Number(process.env.LANGFUSE_BATCH_SIZE ?? 100),
  Number(process.env.LANGFUSE_FLUSH_INTERVAL_MS ?? 5000),
);
initLangfuse(cfg);

A LangfuseConfig instance is purely declarative — it carries no IO state — so it’s safe to construct, inspect, log (with secrets redacted), and pass to initLangfuse independently.

You can keep multiple LangfuseConfig objects around (e.g. one per environment) and choose which one to install at startup; only the first initLangfuse call wins per process.


version()

Returns the Blazen library version string.

import { version } from "blazen";

console.log(version()); // "0.1.0"