Node.js API Reference

Complete API reference for blazen in Node.js

ChatMessage

A class for building typed chat messages. Supports text, multimodal (image) content, and content parts.

Constructor

new ChatMessage({ role?: string, content?: string, parts?: ContentPart[] })

Create a message from an options object. role defaults to "user" if omitted. Supply either content (text) or parts (multimodal), not both.

// Text message with explicit role
new ChatMessage({ role: "user", content: "Hello" })

// Using the Role enum
new ChatMessage({ role: Role.User, content: "Hello" })

// System message
new ChatMessage({ role: "system", content: "You are a helpful assistant." })

// Multimodal message with content parts
new ChatMessage({
  role: "user",
  parts: [
    { partType: "text", text: "Describe this image:" },
    { partType: "image", image: { source: { sourceType: "url", url: "https://example.com/photo.jpg" } } }
  ]
})

Static Factory Methods

Method	Description
`ChatMessage.system(content: string)`	Create a system message
`ChatMessage.user(content: string)`	Create a user message
`ChatMessage.assistant(content: string)`	Create an assistant message
`ChatMessage.tool(content: string)`	Create a tool result message
`ChatMessage.userImageUrl(text: string, url: string, mediaType?: string)`	Create a user message with text and an image URL
`ChatMessage.userImageBase64(text: string, data: string, mediaType: string)`	Create a user message with text and a base64-encoded image
`ChatMessage.userParts(parts: ContentPart[])`	Create a user message from an explicit list of content parts

const msg = ChatMessage.user("What is 2 + 2?");

const imgMsg = ChatMessage.userImageUrl(
  "What's in this image?",
  "https://example.com/photo.jpg",
  "image/jpeg"
);

const b64Msg = ChatMessage.userImageBase64(
  "Describe this:",
  base64Data,
  "image/png"
);

Properties

Property	Type	Description
`.role`	`string`	The message role: `"system"`, `"user"`, `"assistant"`, or `"tool"`
`.content`	`string \| null`	The text content of the message, if any. For tool results that returned a plain string, the string lives here (and `.toolResult` is `null`).
`.toolResult`	`ToolOutput \| null`	Structured tool-result payload. `null` for non-tool messages or when the tool returned a plain string. When non-null, `.toolResult.data` is the full structured value the caller should consume; the LLM-facing wire form is derived from `.toolResult.llmOverride` (if set) or from `.toolResult.data` via the provider’s default conversion.

Note on toolCallId: The Rust core stores a tool_call_id on tool messages so providers can correlate a tool result with the originating ToolCall.id. As of this writing the Node binding does not expose a public .toolCallId getter on ChatMessage — correlation is handled internally by the agent loop and on the wire by each provider. If you need explicit access, file an issue and we can surface it.

Role

String enum for message roles.

Role.System    // "system"
Role.User      // "user"
Role.Assistant // "assistant"
Role.Tool      // "tool"

Can be used interchangeably with plain strings in the ChatMessage constructor.

ContentPart

Types for multimodal message content, used in the parts field of the ChatMessage constructor and in ChatMessage.userParts().

interface ContentPart {
  partType: "text" | "image";
  text?: string;          // Required when partType is "text"
  image?: ImageContent;   // Required when partType is "image"
}

interface ImageContent {
  source: ImageSource;
  mediaType?: string;     // MIME type, e.g. "image/png"
}

interface ImageSource {
  sourceType: "url" | "base64";
  url?: string;           // Required when sourceType is "url"
  data?: string;          // Required when sourceType is "base64"
}

// `MediaSource` is a type alias for `ImageSource` re-exported for compute APIs
// that accept any media (image / video frame / audio cover-art) using the same shape.
// All compute requests that take a `MediaSource` accept exactly the same value
// you would pass to `ImageSource` — `MediaSource` exists only as an aliasing affordance.
export type MediaSource = ImageSource;

Note on MediaSource: prefer MediaSource in compute / generation APIs (image upscaling, video frames, audio cover art) and ImageSource in chat-message APIs. The two are interchangeable at the type level — MediaSource is just the structurally identical alias for ImageSource.

// Text part
{ partType: "text", text: "Describe this image:" }

// Image from URL
{
  partType: "image",
  image: {
    source: { sourceType: "url", url: "https://example.com/photo.jpg" },
    mediaType: "image/jpeg"
  }
}

// Image from base64
{
  partType: "image",
  image: {
    source: { sourceType: "base64", data: "iVBORw0KGgo..." },
    mediaType: "image/png"
  }
}

CompletionModel

A chat completion model. Created via static factory methods for each provider.

Provider Factory Methods

All providers accept an optional options object containing an apiKey (and other provider-specific fields). If options is omitted — or apiKey is not set within it — the key is read from the provider’s standard environment variable (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, etc.). If model is not set, the provider’s default model is used.

Method	Signature
`CompletionModel.openai`	`(options?: ProviderOptions)`
`CompletionModel.anthropic`	`(options?: ProviderOptions)`
`CompletionModel.gemini`	`(options?: ProviderOptions)`
`CompletionModel.azure`	`(options: AzureOptions)`
`CompletionModel.fal`	`(options?: FalOptions)`
`CompletionModel.openrouter`	`(options?: ProviderOptions)`
`CompletionModel.groq`	`(options?: ProviderOptions)`
`CompletionModel.together`	`(options?: ProviderOptions)`
`CompletionModel.mistral`	`(options?: ProviderOptions)`
`CompletionModel.deepseek`	`(options?: ProviderOptions)`
`CompletionModel.fireworks`	`(options?: ProviderOptions)`
`CompletionModel.perplexity`	`(options?: ProviderOptions)`
`CompletionModel.xai`	`(options?: ProviderOptions)`
`CompletionModel.cohere`	`(options?: ProviderOptions)`
`CompletionModel.bedrock`	`(options: BedrockOptions)`

// Read key from OPENAI_API_KEY env var
const model = CompletionModel.openai();

// Pass an explicit key and override the model
const claude = CompletionModel.anthropic({ apiKey: "sk-ant-...", model: "claude-sonnet-4-20250514" });
const gemini = CompletionModel.gemini({ model: "gemini-2.5-flash" });

Properties

Property	Type	Description
`.modelId`	`string`	The model identifier string

`await model.complete(messages: ChatMessage[]): CompletionResponse`

Perform a chat completion.

const response = await model.complete([
  ChatMessage.system("You are a helpful assistant."),
  ChatMessage.user("What is 2 + 2?"),
]);
console.log(response.content); // "4"

`await model.completeWithOptions(messages: ChatMessage[], options: CompletionOptions): CompletionResponse`

Perform a chat completion with additional options.

const response = await model.completeWithOptions(
  [ChatMessage.user("Write a haiku about Rust.")],
  { temperature: 0.7, maxTokens: 100 }
);

`await model.stream(messages: ChatMessage[], onChunk: (chunk) => void): void`

Stream a chat completion. The callback receives each chunk as it arrives.

await model.stream(
  [ChatMessage.user("Tell me a story")],
  (chunk) => {
    if (chunk.delta) process.stdout.write(chunk.delta);
  }
);

Each chunk has the shape:

{
  delta?: string;              // Text content delta
  finishReason?: string;       // Set on the final chunk
  toolCalls: ToolCall[];       // Tool calls, if any
}

`await model.streamWithOptions(messages: ChatMessage[], onChunk: (chunk) => void, options: CompletionOptions): void`

Stream a chat completion with additional options.

await model.streamWithOptions(
  [ChatMessage.user("Explain quantum computing")],
  (chunk) => { if (chunk.delta) process.stdout.write(chunk.delta); },
  { temperature: 0.5, maxTokens: 500 }
);

Middleware Decorators

Each decorator returns a new CompletionModel wrapping the original with additional behaviour.

`model.withRetry(config?: RetryConfig): CompletionModel`

Automatic retry with exponential backoff on transient failures.

const resilient = model.withRetry({ maxRetries: 5, initialDelayMs: 500, maxDelayMs: 15000 });

Field	Type	Default	Description
`maxRetries`	`number`	`3`	Maximum retry attempts.
`initialDelayMs`	`number`	`1000`	Delay before first retry (ms).
`maxDelayMs`	`number`	`30000`	Upper bound on backoff delay (ms).

`model.withCache(config?: CacheConfig): CompletionModel`

In-memory response cache for identical non-streaming requests.

const cached = model.withCache({ ttlSeconds: 600, maxEntries: 500 });

Field	Type	Default	Description
`ttlSeconds`	`number`	`300`	Cache entry TTL in seconds.
`maxEntries`	`number`	`1000`	Maximum entries before eviction.

`CompletionModel.withFallback(models: CompletionModel[]): CompletionModel`

Static factory method. Tries providers in order; falls back on transient errors.

const model = CompletionModel.withFallback([
  CompletionModel.openai(),
  CompletionModel.anthropic(),
]);

CompletionOptions

Options object for completeWithOptions() and streamWithOptions().

interface CompletionOptions {
  temperature?: number;        // Sampling temperature (0.0 - 2.0)
  maxTokens?: number;          // Maximum tokens to generate
  topP?: number;               // Nucleus sampling parameter
  model?: string;              // Override the default model ID
  tools?: ToolDefinition[];    // Tool definitions for function calling
}

CompletionResponse

Returned by model.complete() and model.completeWithOptions().

interface CompletionResponse {
  content?: string;                // The generated text
  toolCalls: ToolCall[];           // Tool calls requested by the model
  usage?: TokenUsage;              // Token usage statistics
  model: string;                   // Model name used for the completion
  finishReason?: string;           // Why generation stopped ("stop", "tool_calls", etc.)
  cost?: number;                   // Cost in USD, if reported by the provider
  timing?: RequestTiming;          // Request timing breakdown
  images: object[];                // Generated images, if any (provider-specific)
  audio: object[];                 // Generated audio, if any (provider-specific)
  videos: object[];                // Generated videos, if any (provider-specific)
  metadata: object;                // Raw provider-specific metadata
}

ToolCall

A tool invocation requested by the model.

Property	Type	Description
`.id`	`string`	Unique identifier for the tool call
`.name`	`string`	Name of the tool to invoke
`.arguments`	`object`	Parsed JSON arguments

ToolOutput

Two-channel return shape for tool handlers. Tool results have two distinct audiences. The caller (your TypeScript code) wants the full structured data; the LLM, on the next turn, may need a different shape — sometimes shorter, sometimes provider-specific. ToolOutput carries both channels.

import type { ToolOutput, LlmPayload } from "blazen";

const out: ToolOutput = {
  data: { items: [1, 2, 3] },
};

Properties

Member	Type	Description
`data`	`any`	The structured value the caller sees programmatically. Dict, array, scalar, or string.
`llmOverride`	`LlmPayload \| undefined`	Optional override for what the LLM sees on the next turn. `undefined` means each provider applies its default conversion from `data`.

Both llmOverride (camelCase) and llm_override (snake_case) are accepted on input from a JS tool handler, so a hand-written object using either casing will deserialize correctly.

When the agent loop appends a tool result to the conversation, the resulting ChatMessage exposes the structured output via .toolResult:

const last = result.messages[result.messages.length - 1];
// last.toolResult?.data is the full structured payload from your handler.
// last.toolResult?.llmOverride is the override (if any) the LLM saw this turn.

For tool handlers that returned a plain string, last.toolResult is null and the string lives on last.content instead.

LlmPayload

A tagged union describing what the LLM sees for a tool result on the next turn. Used as the llmOverride field of ToolOutput.

import type { LlmPayload } from "blazen";

const text: LlmPayload = { kind: "text", text: "Found 3 results." };

const json: LlmPayload = { kind: "json", value: { items: [1, 2, 3] } };

const parts: LlmPayload = {
  kind: "parts",
  parts: [
    { partType: "text", text: "Here is the table:" },
    { partType: "text", text: "| col |\n| --- |\n| 1 |" },
  ],
};

const raw: LlmPayload = {
  kind: "provider_raw",
  provider: "anthropic",
  value: [{ type: "text", text: "Custom Anthropic-shaped payload." }],
};

Variants

`kind`	Required fields	Behavior
`"text"`	`text`	Plain text. Works on every provider.
`"json"`	`value`	Structured JSON. Anthropic and Gemini natively consume the structure; OpenAI-family stringifies once at the wire boundary.
`"parts"`	`parts` (`ContentPart[]`)	Multimodal content blocks; Anthropic supports natively, OpenAI falls back to text concatenation, Gemini wraps as `{ parts: [...] }`.
`"provider_raw"`	`provider`, `value`	Provider-specific escape hatch. Only the named provider sees `value`; every other provider falls back to converting `ToolOutput.data` with its default. `provider` is one of `"openai"`, `"openai_compat"`, `"azure"`, `"anthropic"`, `"gemini"`, `"responses"`, `"fal"`.

Per-provider behavior

When a tool returns structured data and no llmOverride, each provider sends a sensible default to the LLM:

OpenAI / OpenAI-compat / Azure / Responses / Fal: the data is JSON-stringified into the content field of the tool message.
Anthropic: structured data becomes [{ type: "text", text: <stringified-json> }] inside tool_result.content.
Gemini: structured object data is passed natively as functionResponse.response. Scalars wrap as { result: <scalar> }.

When llmOverride is set, that override always wins for the variants the provider understands; kind: "provider_raw" is the strictest — it’s only honoured when provider matches the active provider, otherwise the provider falls back to converting data with its default.

ToolDefinition

Describes a tool that the model may invoke.

interface ToolDefinition {
  name: string;            // Unique tool name
  description: string;     // Human-readable description
  parameters: object;      // JSON Schema for the tool's parameters
}

const tools: ToolDefinition[] = [
  {
    name: "getWeather",
    description: "Get the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"]
    }
  }
];

Content Subsystem

Pluggable storage and handle plumbing for multimodal payloads (images, audio, video, documents, 3D models, CAD files). A ContentStore issues opaque handles; tools accept handle-id strings as arguments and Blazen substitutes the resolved typed content before the handler runs.

import {
  ContentStore,
  imageInput,
  audioInput,
  videoInput,
  fileInput,
  threeDInput,
  cadInput,
} from "blazen";
import type {
  ContentHandle,
  ContentKind,
  JsContentMetadata,
  PutOptions,
} from "blazen";

ContentKind

String enum tagging what a handle refers to. Values match the serde-tag form so they round-trip across any Blazen API that takes a kind string.

const enum ContentKind {
  Image = "image",
  Audio = "audio",
  Video = "video",
  Document = "document",
  ThreeDModel = "three_d_model",
  Cad = "cad",
  Archive = "archive",
  Font = "font",
  Code = "code",
  Data = "data",
  Other = "other",
}

ContentHandle

Opaque reference returned by ContentStore.put() and consumed by every other store method. Treat id as a black box — store-defined.

interface ContentHandle {
  id: string;             // Opaque, store-defined identifier
  kind: ContentKind;      // What kind of content this handle refers to
  mimeType?: string;      // MIME type if known
  byteSize?: number;      // Byte size if known (i64 -- napi has no u64)
  displayName?: string;   // Human-readable display name (e.g. original filename)
}

ContentHandle is a type alias for the underlying JsContentHandle interface.

ContentStore

Pluggable registry for multimodal content. Construct via the static factories; instances are cheap to clone (internally an Arc), so reusing one store across multiple agents and requests is fine.

class ContentStore {
  // Factories
  static inMemory(): ContentStore;
  static localFile(root: string): ContentStore;
  static openaiFiles(apiKey: string, baseUrl?: string | null): ContentStore;
  static anthropicFiles(apiKey: string, baseUrl?: string | null): ContentStore;
  static geminiFiles(apiKey: string, baseUrl?: string | null): ContentStore;
  static falStorage(apiKey: string, baseUrl?: string | null): ContentStore;
  static custom(options: CustomContentStoreOptions): ContentStore;

  // Operations
  put(body: Buffer | string, options: PutOptions): Promise<ContentHandle>;
  resolve(handle: ContentHandle): Promise<any>;          // serialized MediaSource
  fetchBytes(handle: ContentHandle): Promise<Buffer>;
  metadata(handle: ContentHandle): Promise<JsContentMetadata>;
  delete(handle: ContentHandle): Promise<void>;
}

put() accepts either a Buffer (inline bytes uploaded to the store) or a string — interpreted as a URL when it contains "://" (the store records the reference) and as a local filesystem path otherwise (the store reads or copies the file as needed).

resolve() returns a serialized MediaSource — the same JSON shape Blazen’s request builders accept. fetchBytes() is for tools that need to operate on the raw payload (parse a PDF, transcribe audio); most tools reason over the handle and let resolve() produce the wire form. delete() is best-effort — default implementations on most stores are no-ops.

const store = ContentStore.inMemory();

const handle = await store.put(Buffer.from(pngBytes), {
  mimeType: "image/png",
  kind: ContentKind.Image,
  displayName: "diagram.png",
});

const meta = await store.metadata(handle);
const bytes = await store.fetchBytes(handle);

Subclassing `ContentStore`

ContentStore is subclassable from JavaScript / TypeScript. Override the methods your backend needs; napi-rs wraps your subclass in a Rust adapter that dispatches into your JS async functions via ThreadsafeFunction.

import { ContentStore } from "blazen";
import type { ContentHandle, ContentKind } from "blazen";

class S3ContentStore extends ContentStore {
  constructor(bucket: string) {
    super();
    this.bucket = bucket;
  }

  async put(body, hint) {
    // ...
    return { id: "...", kind: "image" };
  }

  async resolve(handle) {
    return { sourceType: "url", url: "..." };
  }

  async fetchBytes(handle) {
    return Buffer.from("...");
  }

  // Optional:
  async fetchStream(handle) { return Buffer.from("..."); }
  async delete(handle) { /* no-op */ }
}

Subclasses MUST override put, resolve, fetchBytes. The base-class default impls throw an error so any missing override is a clear failure rather than silent recursion via super().

`ContentStore.custom({...})`

Callback-based factory. Direct JS mirror of Rust CustomContentStore::builder.

ContentStore.custom(options: {
  put: (body: any, hint: any) => Promise<ContentHandle>;
  resolve: (handle: ContentHandle) => Promise<any>;        // serialized MediaSource
  fetchBytes: (handle: ContentHandle) => Promise<Buffer>;
  fetchStream?: (handle: ContentHandle) => Promise<Buffer>; // single-chunk for now
  delete?: (handle: ContentHandle) => Promise<void>;
  name?: string;
}): ContentStore

put, resolve, fetchBytes are required. fetchStream and delete are optional. The body argument arrives as a JS object shaped like {type: "bytes", data: [...]} / {type: "url", url} / {type: "local_path", path} / {type: "provider_file", provider, id} / {type: "stream", stream: AsyncIterable<Uint8Array>, sizeHint: number | null}. The hint has optional mimeType / kindHint / displayName / byteSize.

resolve returns a serialized MediaSource JS object (e.g. {sourceType: "url", url: "..."}). fetchBytes returns a Buffer. fetchStream may return either Buffer / Uint8Array / number[] / base64 string (legacy, single-chunk) or an AsyncIterable<Uint8Array> for true chunk-by-chunk streaming — a Node Readable qualifies since it implements [Symbol.asyncIterator].

PutOptions

Optional hints attached to a put() call. Every field is optional; the store may auto-detect from the bytes when a hint is missing.

interface PutOptions {
  mimeType?: string;      // MIME type, if known
  kind?: ContentKind;     // Caller's preferred classification -- overrides any auto-detection
  displayName?: string;   // Human-readable display name (filename, caption)
  byteSize?: number;      // Byte size, if known up-front (i64 since napi has no u64)
}

ContentMetadata

Cheap metadata summary returned by ContentStore.metadata(). No bytes are materialized.

interface ContentMetadata {
  kind: ContentKind;
  mimeType?: string;
  byteSize?: number;
  displayName?: string;
}

ContentMetadata is a type alias for the underlying JsContentMetadata interface.

Built-in stores

Factory	Purpose
`ContentStore.inMemory()`	Ephemeral in-memory store. Default for tests and short-lived sessions.
`ContentStore.localFile(root)`	Filesystem-backed store rooted at `root`. Directory is created if missing.
`ContentStore.openaiFiles(apiKey, baseUrl?)`	Backed by the OpenAI Files API.
`ContentStore.anthropicFiles(apiKey, baseUrl?)`	Backed by the Anthropic Files API.
`ContentStore.geminiFiles(apiKey, baseUrl?)`	Backed by the Gemini Files API.
`ContentStore.falStorage(apiKey, baseUrl?)`	Backed by fal.ai’s storage API.
`ContentStore.custom({...})`	User-defined backend via async callbacks (see above).

Tool-input schema helpers

Each helper builds a JSON Schema fragment declaring a single required handle-id input. The model emits a handle-id string; Blazen swaps it for the resolved typed content before your tool runs.

Helper	Declares an input expecting a handle of kind
`imageInput(name, description)`	`image`
`audioInput(name, description)`	`audio`
`videoInput(name, description)`	`video`
`fileInput(name, description)`	`document`
`threeDInput(name, description)`	`three_d_model`
`cadInput(name, description)`	`cad`

Each call returns the same shape (kind tag varies):

imageInput("photo", "The image to describe");
// =>
// {
//   type: "object",
//   properties: {
//     photo: {
//       type: "string",
//       description: "The image to describe",
//       "x-blazen-content-ref": { kind: "image" }
//     }
//   },
//   required: ["photo"]
// }

The x-blazen-content-ref extension is invisible to providers — they see a plain string parameter — but Blazen’s resolver uses it as a marker for handle substitution.

How resolution works

When the model emits a tool call like {"photo": "blazen_xxx"}, Blazen scans the tool’s parameter schema for x-blazen-content-ref markers. For each marked field, it looks up the handle id in the active ContentStore and replaces the bare string with a typed object before invoking your handler:

{
  kind: "image",
  handleId: "blazen_xxx",
  mimeType: "image/png",
  byteSize: 24576,
  displayName: "diagram.png",
  source: { /* resolved MediaSource -- url, base64, or provider-native ref */ }
}

source is the same wire form ContentStore.resolve() returns, so your handler can forward it straight into a downstream provider request, or call store.fetchBytes(handle) if it needs the raw payload. If the handle is unknown to the store the call fails before the handler runs.

TokenUsage

Token usage statistics for a completion request.

Property	Type	Description
`.promptTokens`	`number`	Tokens in the prompt
`.completionTokens`	`number`	Tokens in the completion
`.totalTokens`	`number`	Total tokens used

RequestTiming

Timing metadata for a completion request.

Property	Type	Description
`.queueMs`	`number \| undefined`	Time spent waiting in queue (ms)
`.executionMs`	`number \| undefined`	Time spent executing (ms)
`.totalMs`	`number \| undefined`	Total wall-clock time (ms)

runAgent

Run an agentic tool execution loop. The agent repeatedly calls the model, executes tool calls via the handler callback, feeds results back, and repeats until the model stops calling tools or maxIterations is reached.

const result = await runAgent(model, messages, tools, toolHandler, options?);

Parameters

Parameter	Type	Description
`model`	`CompletionModel`	The completion model to use
`messages`	`ChatMessage[]`	Initial conversation messages
`tools`	`ToolDef[]`	Tool definitions the agent can invoke
`toolHandler`	`(toolName: string, args: object) => Promise<any \| ToolOutput>`	Callback that executes tool calls. May return a bare value (auto-wrapped) or an explicit `ToolOutput`.
`options`	`AgentRunOptions?`	Optional configuration

Example

import { CompletionModel, ChatMessage, runAgent } from "blazen";

const model = CompletionModel.openai();

const result = await runAgent(
  model,
  [ChatMessage.user("What is the weather in NYC?")],
  [{
    name: "getWeather",
    description: "Get weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"]
    }
  }],
  async (toolName, args) => {
    if (toolName === "getWeather") {
      return { temp: 72, condition: "sunny" };
    }
    throw new Error(`Unknown tool: ${toolName}`);
  },
  { maxIterations: 5 }
);

console.log(result.response.content);
console.log(`Took ${result.iterations} iterations`);

Tool handler return shapes

The handler’s return value is checked for an explicit data key. If present and the value deserializes as a ToolOutput, it’s used directly. Otherwise the bare value is wrapped via ToolOutput { data: <value>, llmOverride: undefined }.

This means an arbitrary user dict like { items: [1, 2, 3] } is treated as plain data, not as a ToolOutput. Only objects with a top-level data field are unpacked.

import { runAgent, type ToolDef } from "blazen";

// Simplest: return a value directly, auto-wrapped.
const search: ToolDef = {
  name: "search",
  description: "Search for items.",
  parameters: { type: "object", properties: { q: { type: "string" } } },
};

async function handlerSimple(toolName: string, args: any) {
  if (toolName === "search") {
    return { items: [1, 2, 3] }; // wrapped as ToolOutput { data: { items: [1,2,3] } }
  }
  throw new Error(`Unknown tool: ${toolName}`);
}

// With override: structured ToolOutput so the LLM sees a summary,
// but the caller's `messages[messages.length-1].toolResult.data`
// still has the full list.
async function handlerWithOverride(toolName: string, args: any) {
  if (toolName === "search") {
    return {
      data: { items: [1, 2, 3], rawResponse: "..." },
      llmOverride: { kind: "text", text: "Found 3 items." },
    };
  }
  throw new Error(`Unknown tool: ${toolName}`);
}

To return a string to the caller (so it lives on ChatMessage.content and toolResult is null), simply return a string from the handler:

async function handlerString(toolName: string, args: any) {
  return "ok"; // appears as ChatMessage.content; toolResult is null
}

See ToolOutput and LlmPayload for the full shape and per-provider wire behaviour.

ToolDef

Describes a tool that the agent may invoke.

interface ToolDef {
  name: string;            // Unique tool name
  description: string;     // Human-readable description
  parameters: object;      // JSON Schema for the tool's parameters
}

AgentRunOptions

Options for configuring an agent run.

interface AgentRunOptions {
  maxIterations?: number;    // Max tool-calling iterations (default: 10)
  systemPrompt?: string;     // System prompt prepended to the conversation
  temperature?: number;      // Sampling temperature (0.0 - 2.0)
  maxTokens?: number;        // Maximum tokens per completion call
  addFinishTool?: boolean;   // Add a built-in "finish" tool the model can call to signal completion
}

AgentResult

The result of an agent run, returned by runAgent(). AgentResult is a typed class with getter properties (not a plain object) so it carries identity across the FFI boundary and supports instanceof checks.

Properties

Property	Type	Description
`.response`	`string`	Final assistant text from the last completion.
`.messages`	`ChatMessage[]`	Full message history (all tool calls and results).
`.iterations`	`number`	Number of tool-calling iterations that occurred.
`.totalCost`	`number \| null`	Aggregated cost in USD across all iterations, or `null` if pricing is unknown.
`.toString()`	`string`	Human-readable summary, mirrors the Python `__repr__`.

const result = await runAgent(model, [ChatMessage.user("Hi")], tools);
console.log(result.response);            // "Hello!"
console.log(result.messages.length);     // e.g. 3
console.log(result.iterations);          // e.g. 1
console.log(result.totalCost);           // e.g. 0.00012 or null

BatchResult

Returned by completeBatch() / completeBatchConfig(). A typed class wrapping per-request outcomes plus aggregates.

Properties

Property	Type	Description
`.responses`	`(CompletionResponse \| null)[]`	One entry per input request. `null` for failed requests.
`.errors`	`(string \| null)[]`	Per-request error message, or `null` for successful requests.
`.totalUsage`	`TokenUsage \| null`	Aggregated token usage across all successful responses.
`.totalCost`	`number \| null`	Aggregated cost in USD across all successful responses.
`.successCount`	`number`	Number of requests that succeeded.
`.failureCount`	`number`	Number of requests that failed.
`.length`	`number`	Total number of requests in the batch (= responses.length = errors.length).
`.toString()`	`string`	Human-readable batch summary.

const batch = await completeBatch(model, [
  [ChatMessage.user("What's 2+2?")],
  [ChatMessage.user("Capital of France?")],
]);
console.log(`${batch.successCount}/${batch.length} succeeded`);
for (let i = 0; i < batch.length; i++) {
  if (batch.errors[i]) console.error(`req ${i}:`, batch.errors[i]);
  else console.log(`req ${i}:`, batch.responses[i]?.content);
}

Pipeline

A Pipeline is a sequence of named Stages built with PipelineBuilder. Each stage runs as its own workflow; on completion, an optional persist callback fires with a typed snapshot so the caller can durably store progress for later resumption.

`new PipelineBuilder(name: string)`

import { PipelineBuilder } from "blazen";

const pipeline = new PipelineBuilder("ingest")
  .stage(stageA)
  .stage(stageB)
  .timeoutPerStage(120)
  .build();

`.onPersist(callback)`

Register a TSFN-based persist callback that receives a typed PipelineSnapshot after each stage completes. The callback must return Promise<void> (or be async). A rejected promise aborts the pipeline with a PipelineError.

import { PipelineBuilder, PipelineSnapshot } from "blazen";

const pipeline = new PipelineBuilder("ingest")
  .stage(stage)
  .onPersist(async (snapshot: PipelineSnapshot) => {
    await db.put(`pipeline:${snapshot.runId}`, snapshot.toJsonPretty());
  })
  .build();

`.onPersistJson(callback)`

Same as onPersist, but the callback receives the snapshot pre-serialized as a JSON string. Useful for backends that store opaque blobs (IndexedDB, Redis, S3).

const pipeline = new PipelineBuilder("ingest")
  .stage(stage)
  .onPersistJson(async (json: string) => {
    await idb.put("snapshots", json, runId);
  })
  .build();

The snapshot can later be replayed via pipeline.resume(PipelineSnapshot.fromJson(json)).

If your onPersist (or onPersistJson) callback throws or returns a rejected promise, the rejection is wrapped as a PersistError (a BlazenError subclass) and propagated to the running pipeline, aborting it. Catch the error from await handler.result() and inspect with instanceof PersistError to distinguish persistence failures from stage failures.

import { PersistError } from "blazen";

try {
  await handler.result();
} catch (e) {
  if (e instanceof PersistError) {
    console.error("snapshot persistence failed:", e.message);
  }
}

Workflow

`new Workflow(name: string)`

Create a new workflow instance. Default timeout is 300 seconds (5 minutes).

const wf = new Workflow("my-workflow");

`.addStep(name: string, eventTypes: string[], handler: StepHandler)`

wf.addStep("process", ["MyEvent"], async (event, ctx) => {
  return { type: "blazen::StopEvent", result: { done: true } };
});

`.setTimeout(seconds: number)`

Set the maximum execution time for the workflow in seconds. Set to 0 or negative to disable.

wf.setTimeout(30);

`await wf.run(input: object): WorkflowResult`

Run the workflow to completion with the given input.

const result = await wf.run({ prompt: "Hello" });

`await wf.runStreaming(input: object, callback: (event) => void): WorkflowResult`

Run the workflow with a streaming callback invoked for each event published via ctx.writeEventToStream().

const result = await wf.runStreaming({ prompt: "Hello" }, (event) => {
  console.log("stream:", event);
});

`await wf.runWithHandler(input: object): WorkflowHandler`

Run the workflow and return a handler for pause/resume and streaming control.

const handler = await wf.runWithHandler({ prompt: "Hello" });

`await wf.resume(snapshotJson: string): WorkflowHandler`

Resume a previously paused workflow from a snapshot JSON string.

const snapshot = fs.readFileSync("snapshot.json", "utf-8");
const handler = await wf.resume(snapshot);
const result = await handler.result();

WorkflowResult

interface WorkflowResult {
  type: string;     // Event type of the final result (e.g. "blazen::StopEvent")
  data: object;     // Result data extracted from the StopEvent's result field
}

StepHandler

async (event: object, ctx: Context) => object | object[] | null

A step handler receives an event and a context. It can return:

A single event object to emit one event.
An array of event objects to fan-out multiple events.
null for side-effect-only steps that emit no events.

WorkflowHandler

Returned by Workflow.runWithHandler() and Workflow.resume(). Provides control over a running workflow.

Important: result() consumes the handler internally — you can only call it once. The other control methods (pause, resumeInPlace, abort, respondToInput, snapshot) borrow the handler and can be called multiple times.

`await handler.result(): WorkflowResult`

Await the final workflow result.

`await handler.pause(): void`

Signal the running workflow to pause. After pausing, use snapshot() to get a serializable snapshot, or resumeInPlace() to continue execution.

`await handler.snapshot(): string`

Get a serializable snapshot of the paused workflow as a JSON string. Save this to resume later with Workflow.resume().

`await handler.resumeInPlace(): void`

Resume a paused workflow in place without creating a new handler.

`await handler.streamEvents(callback: (event) => void): void`

Subscribe to intermediate events published via ctx.writeEventToStream(). Must be called before result() or pause().

const handler = await wf.runWithHandler({ prompt: "Hello" });

// Subscribe to stream events
await handler.streamEvents((event) => console.log(event));

// Then await the result
const result = await handler.result();

Events

Events are plain objects with a type field.

{ type: "MyEvent", payload: "data" }

Start Event

{ type: "blazen::StartEvent", ...input }

The workflow begins by emitting a StartEvent containing the input data.

Stop Event

{ type: "blazen::StopEvent", result: { ... } }

Returning a StopEvent from a step handler completes the workflow.

Context

Shared workflow context accessible by all steps. All methods are async.

StateValue

All context values conform to the StateValue type:

type StateValue = string | number | boolean | null | Buffer | StateValue[] | { [key: string]: StateValue };

`await ctx.set(key: string, value: Exclude<StateValue, Buffer>): void`

Store a JSON-serializable value in the workflow context. Accepts strings, numbers, booleans, null, arrays, and nested objects. For binary data, use ctx.setBytes() instead.

The legacy ctx.set / ctx.get shortcuts still work and route values through the same 4-tier dispatch. For new code, prefer the explicit ctx.state / ctx.session namespaces documented below.

`await ctx.get(key: string): Promise<StateValue | null>`

Retrieve a value from the workflow context. Returns null if not found. Returns data for all StateValue variants — strings, numbers, booleans, arrays, objects, and Buffer (if the key was stored via setBytes). No data is silently dropped.

`await ctx.setBytes(key: string, buffer: Buffer): void`

Store raw binary data in the workflow context. Use this for explicit binary storage (e.g., MessagePack, protobuf, raw buffers). Binary data persists through pause/resume/checkpoint.

`await ctx.getBytes(key: string): Buffer | null`

Retrieve raw binary data from the workflow context. Returns null if not found. Note that ctx.get() also returns binary data now, so getBytes is mainly useful when you want to assert that a key holds binary content.

`await ctx.runId(): string`

Get the unique run UUID for the current workflow execution.

`await ctx.sendEvent(event: object): void`

Manually route an event into the workflow event bus. The event will be delivered to any step whose eventTypes list includes its type.

`await ctx.writeEventToStream(event: object): void`

Publish an event to the external broadcast stream. Consumers that subscribed via runStreaming or handler.streamEvents() will receive this event. Unlike sendEvent, this does not route the event through the internal step registry.

`get state(): StateNamespace`

Persistable workflow state. Survives pause() / resume(), checkpoints, and durable storage. See StateNamespace below.

`get session(): SessionNamespace`

In-process-only values, excluded from snapshots. Use this for things that should not survive pause() / resume(). JS object identity is NOT preserved on Node — see the SessionNamespace caveat below.

StateNamespace

Namespace for persistable workflow state. Values stored via state.set / state.setBytes go into the underlying ContextInner.state map and survive snapshots, pause() / resume(), and checkpoint stores.

`await state.set(key: string, value: Exclude<StateValue, Buffer>): Promise<void>`

Store a JSON-serializable value under the given key.

`await state.get(key: string): Promise<StateValue | null>`

Retrieve a value previously stored under the given key. Returns null if not found.

`await state.setBytes(key: string, data: Buffer): Promise<void>`

Store raw binary data under the given key.

`await state.getBytes(key: string): Promise<Buffer | null>`

Retrieve raw binary data previously stored under the given key. Returns null if not found.

workflow.addStep("step", ["blazen::StartEvent"], async (event, ctx) => {
  await ctx.state.set("counter", 5);
  const count = await ctx.state.get("counter");
  return { type: "blazen::StopEvent", result: { count } };
});

SessionNamespace

Namespace for in-process-only workflow values. Values stored via session.set are kept in the ContextInner.objects side-channel and are excluded from snapshots. Use this for state that should not survive a pause() / resume() round-trip (request IDs, rate-limit counters, ephemeral caches, …).

Important — napi-rs identity caveat: JS object identity is NOT preserved through ctx.session on the Node bindings. Values are routed through serde_json::Value because napi-rs’s Reference<T> is !Send — its Drop must run on the v8 main thread, and tokio worker threads cannot safely cross the napi boundary with live JS object references. await ctx.session.get(key) returns a plain object equal to the one you passed in, not the same object instance. For true JS class identity preservation, use the Python or WASM bindings, or keep the work inside a single Rust step. Full identity through events is tracked as a follow-up architectural refactor.

The session namespace is still functionally distinct from state: session values are excluded from snapshots, state values are not.

`await session.set(key: string, value: unknown): Promise<void>`

Store a JSON-serializable value under the given key. The value is excluded from snapshots.

`await session.get(key: string): Promise<unknown>`

Retrieve a value previously stored under the given key. Returns null if the key does not exist.

`await session.has(key: string): Promise<boolean>`

Check whether a value exists under the given key.

`await session.remove(key: string): Promise<void>`

Remove the value stored under the given key.

workflow.addStep("step", ["blazen::StartEvent"], async (event, ctx) => {
  await ctx.session.set("reqId", "abc123");
  if (await ctx.session.has("reqId")) {
    const id = await ctx.session.get("reqId");
    console.log("request id:", id);
  }
  return { type: "blazen::StopEvent", result: {} };
});

state and session use independent keyspaces — the same key can exist in both namespaces without colliding:

await ctx.state.set("k", "state-value");
await ctx.session.set("k", "session-value");
// Both are accessible; they don't collide.

BlazenState

Base class for typed state objects with per-field context storage. Extend this class to define structured workflow state that is automatically serialized and deserialized field by field.

`static meta?: BlazenStateMeta`

Optional static metadata that controls how fields are stored and which fields are transient.

interface BlazenStateMeta {
  transient?: Set<string> | string[];
}

Field	Type	Description
`transient`	`Set<string> \| string[]`	Field names to exclude from persistence. These fields are not saved by `saveTo()` and will not be present after `loadFrom()`.

`restore?(): void | Promise<void>`

Optional instance method called by loadFrom() after all persisted fields have been restored. Use this to recreate transient fields (caches, connections, derived data) that were excluded from persistence.

`await state.saveTo(ctx: Context, key: string): Promise<void>`

Persist every non-transient field of the state instance into the workflow context. Each field is stored individually under a namespaced key derived from key.

`static loadFrom<T>(ctx: Context, key: string): Promise<T>`

Restore a state instance from the workflow context. Reads each persisted field, constructs a new instance, and calls restore() if defined.

const state = await AgentState.loadFrom<AgentState>(ctx, "state");

Compute Request Types

Typed request interfaces for compute operations (image generation, video, speech, music, transcription, 3D models).

ImageRequest

interface ImageRequest {
  prompt: string;                  // Text prompt describing the desired image
  negativePrompt?: string;         // Things to avoid in the image
  width?: number;                  // Desired image width in pixels
  height?: number;                 // Desired image height in pixels
  numImages?: number;              // Number of images to generate
  model?: string;                  // Model override (provider-specific)
  parameters?: object;             // Additional provider-specific parameters
}

UpscaleRequest

interface UpscaleRequest {
  imageUrl: string;                // URL of the image to upscale
  scale: number;                   // Scale factor (e.g. 2.0, 4.0)
  model?: string;
  parameters?: object;
}

VideoRequest

interface VideoRequest {
  prompt: string;                  // Text prompt describing the desired video
  imageUrl?: string;               // Source image URL for image-to-video
  durationSeconds?: number;        // Desired duration in seconds
  negativePrompt?: string;         // Things to avoid
  width?: number;                  // Video width in pixels
  height?: number;                 // Video height in pixels
  model?: string;
  parameters?: object;
}

SpeechRequest

interface SpeechRequest {
  text: string;                    // Text to synthesize into speech
  voice?: string;                  // Voice identifier (provider-specific)
  voiceUrl?: string;               // Reference voice sample URL for cloning
  language?: string;               // Language code (e.g. "en", "fr", "ja")
  speed?: number;                  // Speech speed multiplier (1.0 = normal)
  model?: string;
  parameters?: object;
}

MusicRequest

interface MusicRequest {
  prompt: string;                  // Text prompt describing the desired audio
  durationSeconds?: number;        // Desired duration in seconds
  model?: string;
  parameters?: object;
}

TranscriptionRequest

interface TranscriptionRequest {
  audioUrl: string;                // URL of the audio file to transcribe
  language?: string;               // Language hint (e.g. "en", "fr")
  diarize?: boolean;               // Whether to perform speaker diarization
  model?: string;
  parameters?: object;
}

ThreeDRequest

interface ThreeDRequest {
  prompt?: string;                 // Text prompt describing the desired 3D model
  imageUrl?: string;               // Source image URL for image-to-3D
  format?: string;                 // Output format (e.g. "glb", "obj", "usdz")
  model?: string;
  parameters?: object;
}

Compute Result Types

ImageResult

interface ImageResult {
  images: GeneratedImage[];        // Generated or upscaled images
  timing?: ComputeTiming;          // Request timing breakdown
  cost?: number;                   // Cost in USD
  metadata: object;                // Provider-specific metadata
}

VideoResult

interface VideoResult {
  videos: GeneratedVideo[];
  timing?: ComputeTiming;
  cost?: number;
  metadata: object;
}

AudioResult

interface AudioResult {
  audio: GeneratedAudio[];
  timing?: ComputeTiming;
  cost?: number;
  metadata: object;
}

TranscriptionResult

interface TranscriptionResult {
  text: string;                          // Full transcribed text
  segments: TranscriptionSegment[];      // Time-aligned segments
  language?: string;                     // Detected or specified language code
  timing?: ComputeTiming;
  cost?: number;
  metadata: object;
}

TranscriptionSegment

interface TranscriptionSegment {
  text: string;            // Transcribed text for this segment
  start: number;           // Start time in seconds
  end: number;             // End time in seconds
  speaker?: string;        // Speaker label (if diarization enabled)
}

ThreeDResult

interface ThreeDResult {
  models: Generated3DModel[];
  timing?: ComputeTiming;
  cost?: number;
  metadata: object;
}

Compute Job Types

Low-level types for generic compute jobs.

ComputeRequest

interface ComputeRequest {
  model: string;             // Model/endpoint to run (e.g. "fal-ai/flux/dev")
  input: object;             // Input parameters (model-specific)
  webhook?: string;          // Webhook URL for async completion notification
}

JobHandle

interface JobHandle {
  id: string;                // Provider-assigned job identifier
  provider: string;          // Provider name (e.g. "fal", "replicate", "runpod")
  model: string;             // Model/endpoint that was invoked
  submittedAt: string;       // ISO 8601 timestamp
}

JobStatus

String enum for compute job status.

JobStatus.Queued      // "queued"
JobStatus.Running     // "running"
JobStatus.Completed   // "completed"
JobStatus.Failed      // "failed"
JobStatus.Cancelled   // "cancelled"

ComputeResult

interface ComputeResult {
  job?: JobHandle;           // Job handle that produced this result
  output: object;            // Output data (model-specific)
  timing?: ComputeTiming;    // Request timing breakdown
  cost?: number;             // Cost in USD
  metadata: object;          // Raw provider-specific metadata
}

ComputeTiming

interface ComputeTiming {
  queueMs?: number;          // Time spent waiting in queue (ms)
  executionMs?: number;      // Time spent executing (ms)
  totalMs?: number;          // Total wall-clock time (ms)
}

Media Output Types

MediaOutput

A single piece of generated media content.

interface MediaOutput {
  url?: string;              // URL where the media can be downloaded
  base64?: string;           // Base64-encoded media data
  rawContent?: string;       // Raw text content (SVG, OBJ, GLTF JSON)
  mediaType: string;         // MIME type (e.g. "image/png", "video/mp4")
  fileSize?: number;         // File size in bytes
  metadata: object;          // Provider-specific metadata
}

GeneratedImage

interface GeneratedImage {
  media: MediaOutput;
  width?: number;            // Image width in pixels
  height?: number;           // Image height in pixels
}

GeneratedVideo

interface GeneratedVideo {
  media: MediaOutput;
  width?: number;            // Video width in pixels
  height?: number;           // Video height in pixels
  durationSeconds?: number;  // Duration in seconds
  fps?: number;              // Frames per second
}

GeneratedAudio

interface GeneratedAudio {
  media: MediaOutput;
  durationSeconds?: number;  // Duration in seconds
  sampleRate?: number;       // Sample rate in Hz
  channels?: number;         // Number of audio channels
}

Generated3DModel

interface Generated3DModel {
  media: MediaOutput;
  vertexCount?: number;      // Total vertex count
  faceCount?: number;        // Total face/triangle count
  hasTextures: boolean;      // Whether the model includes textures
  hasAnimations: boolean;    // Whether the model includes animations
}

EmbeddingModel

Generate vector embeddings from text. Created via static factory methods. Keys are read from environment variables (OPENAI_API_KEY, TOGETHER_API_KEY, etc.) when options is omitted, or can be passed explicitly via { apiKey: "..." }.

import { EmbeddingModel } from "blazen";

const model = EmbeddingModel.openai();
const together = EmbeddingModel.together();
const cohere = EmbeddingModel.cohere({ apiKey: "co-..." });
const fireworks = EmbeddingModel.fireworks();

Provider Factory Methods

Method	Default Model	Default Dimensions
`EmbeddingModel.openai(options?)`	`text-embedding-3-small`	1536
`EmbeddingModel.together(options?)`	`togethercomputer/m2-bert-80M-8k-retrieval`	768
`EmbeddingModel.cohere(options?)`	`embed-v4.0`	1024
`EmbeddingModel.fireworks(options?)`	`nomic-ai/nomic-embed-text-v1.5`	768

Properties

Property	Type	Description
`.modelId`	`string`	The model identifier.
`.dimensions`	`number`	Output vector dimensionality.

`await model.embed(texts: string[]): EmbeddingResponse`

Embed one or more texts, returning one vector per input.

const response = await model.embed(["Hello", "World"]);
console.log(response.embeddings.length);    // 2
console.log(response.embeddings[0].length); // 1536

EmbeddingResponse

Returned by EmbeddingModel.embed().

interface EmbeddingResponse {
  embeddings: number[][];         // One vector per input text
  model: string;                  // Model that produced the embeddings
  usage?: TokenUsage;             // Token usage statistics
  cost?: number;                  // Estimated cost in USD
  timing?: RequestTiming;         // Request timing breakdown
  metadata: object;               // Provider-specific metadata
}

Token Estimation

Lightweight token counting functions. Uses a heuristic (~3.5 characters per token) suitable for budget checks without external data files.

`estimateTokens(text: string, contextSize?: number): number`

Estimate token count for a text string.

import { estimateTokens } from "blazen";

const count = estimateTokens("Hello, world!");  // 4

`countMessageTokens(messages: ChatMessage[], contextSize?: number): number`

Estimate total tokens for an array of chat messages, including per-message overhead.

import { countMessageTokens, ChatMessage } from "blazen";

const count = countMessageTokens([
  ChatMessage.system("You are helpful."),
  ChatMessage.user("Hello!"),
]);

contextSize defaults to 128000 if omitted.

Subclassable Providers

CompletionModel, EmbeddingModel, and Transcription can be subclassed to implement custom providers. Override the relevant methods and the framework will dispatch to your implementation.

CompletionModel

import { CompletionModel, ChatMessage } from "blazen";

class MyLLM extends CompletionModel {
  constructor() {
    super({ modelId: "my-llm" });
  }

  async complete(messages: ChatMessage[]) {
    // Your inference logic here
    return { content: "Hello from my custom model" };
  }

  async stream(messages: ChatMessage[], onChunk: (chunk: any) => void) {
    onChunk({ delta: "Hello", finishReason: null, toolCalls: [] });
    onChunk({ delta: null, finishReason: "stop", toolCalls: [] });
  }
}

const model = new MyLLM();
const response = await model.complete([ChatMessage.user("Hi")]);

EmbeddingModel

import { EmbeddingModel } from "blazen";

class MyEmbedder extends EmbeddingModel {
  constructor() {
    super({ modelId: "my-embedder", dimensions: 128 });
  }

  async embed(texts: string[]) {
    return {
      embeddings: texts.map(() => new Array(128).fill(0.1)),
      model: "my-embedder",
    };
  }
}

Transcription

import { Transcription } from "blazen";

class MyTranscriber extends Transcription {
  constructor() {
    super({ providerId: "my-stt" });
  }

  async transcribe(request: any) {
    return { text: "transcribed text", segments: [] };
  }
}

Per-Capability Provider Classes

Seven provider base classes let you implement a single compute capability without dealing with the full ComputeProvider interface. Subclass and override the relevant methods.

Class	Methods to Override	Rust Trait
`TTSProvider`	`textToSpeech(request)`	`AudioGeneration`
`MusicProvider`	`generateMusic(request)`, `generateSfx(request)`	`AudioGeneration`
`ImageProvider`	`generateImage(request)`, `upscaleImage(request)`	`ImageGeneration`
`VideoProvider`	`textToVideo(request)`, `imageToVideo(request)`	`VideoGeneration`
`ThreeDProvider`	`generate3d(request)`	`ThreeDGeneration`
`BackgroundRemovalProvider`	`removeBackground(request)`	`BackgroundRemoval`
`VoiceProvider`	`cloneVoice(request)`, `listVoices()`, `deleteVoice(voice)`	`VoiceCloning`

Constructor

All provider classes share the same constructor config:

new TTSProvider({
  providerId: string,
  baseUrl?: string,
  pricing?: ModelPricing,
  memoryEstimateBytes?: number,
})

Field	Type	Description
`providerId`	`string`	Identifier for the provider instance.
`baseUrl`	`string?`	Optional base URL for the provider API.
`pricing`	`ModelPricing?`	Optional pricing info for cost tracking.
`memoryEstimateBytes`	`number?`	Estimated memory footprint in bytes (host RAM if on CPU, GPU VRAM otherwise). Used by `ModelManager` to charge the appropriate pool.

Example

import { TTSProvider } from "blazen";

class ElevenLabsTTS extends TTSProvider {
  private apiKey: string;

  constructor(apiKey: string) {
    super({ providerId: "elevenlabs" });
    this.apiKey = apiKey;
  }

  async textToSpeech(request: any) {
    // Call ElevenLabs API with this.apiKey
    return { audio: audioBuffer, format: "mp3" };
  }
}

const tts = new ElevenLabsTTS("sk-...");
const result = await tts.textToSpeech({ text: "Hello world", voice: "alice" });

MemoryBackend

Base class for custom memory storage backends. Subclass to implement persistence backed by Postgres, SQLite, DynamoDB, or any other store.

import { MemoryBackend } from "blazen";

class PostgresBackend extends MemoryBackend {
  async put(entry: any): Promise<void> {
    // Insert or update entry in Postgres
  }

  async get(id: string): Promise<any | null> {
    // Retrieve entry by id
  }

  async delete(id: string): Promise<boolean> {
    // Delete entry, return true if it existed
  }

  async list(): Promise<any[]> {
    // Return all entries
  }

  async len(): Promise<number> {
    // Return count of entries
  }

  async searchByBands(bands: any, limit: number): Promise<any[]> {
    // Return candidates sharing LSH bands with the query
  }
}

Methods to Override

Method	Signature	Description
`put`	`async put(entry): Promise<void>`	Insert or update a stored entry.
`get`	`async get(id: string): Promise<any \| null>`	Retrieve a stored entry by id.
`delete`	`async delete(id: string): Promise<boolean>`	Delete an entry by id. Returns `true` if it existed.
`list`	`async list(): Promise<any[]>`	Return all stored entries.
`len`	`async len(): Promise<number>`	Return the number of stored entries.
`searchByBands`	`async searchByBands(bands, limit): Promise<any[]>`	Return candidate entries sharing at least one LSH band.

ProgressCallback

Subclassable base for download progress callbacks. Pass an instance to ModelCache.download() (and other download-capable APIs) to receive byte-count progress updates.

onProgress takes bigint byte counts so multi-gigabyte downloads keep full precision. total is null when the server does not send Content-Length.

import { ModelCache, ProgressCallback } from "blazen";

class MyProgress extends ProgressCallback {
  onProgress(downloaded: bigint, total?: bigint | null): void {
    if (total != null) {
      const pct = Number((downloaded * 100n) / total);
      console.log(`${pct}%`);
    } else {
      console.log(`${downloaded} bytes`);
    }
  }
}

const cache = ModelCache.create();
await cache.download("bert-base-uncased", "config.json", new MyProgress());

The base onProgress always throws — overriding it is mandatory. super() must be called from the subclass constructor.

ProgressCallback instances are accepted anywhere the SDK exposes a download hook — ModelCache.download(), the local-inference Provider.create() paths that pull weights from HuggingFace, and the ProgressCallback-aware variants of dataset loaders. Pass the same ProgressCallback instance to multiple downloads to centralise reporting (e.g. for a TUI progress bar).

ModelManager

Per-pool memory budget-aware model manager with LRU eviction. Tracks registered local models and their estimated memory footprint (host RAM if the model runs on CPU, GPU VRAM otherwise). Models live in distinct pools (CPU RAM vs per-GPU VRAM); when loading a model that would exceed its pool’s budget, the least-recently-used loaded model within that same pool is unloaded first. A CPU embedder never evicts a GPU LLM, and vice versa.

ModelManager is a memory budget bookkeeper, not a performance scheduler. It answers “will this fit?” — not “will this run fast?”. Whether a 70B model loaded on CPU is useful at 1–3 tok/s is a workload-choice question the manager intentionally does not answer.

Constructor

import { ModelManager } from "blazen";

// Common case: separate CPU and GPU budgets, in gigabytes.
const manager = new ModelManager({ cpuRamGb: 100, gpuVramGb: 24 });

// Explicit per-pool budgets in bytes (BigInt). Pool labels: "cpu", "gpu", "gpu:N".
const manager = new ModelManager({
  poolBudgets: {
    "cpu": 100_000_000_000n,
    "gpu:0": 24_000_000_000n,
  },
});

// No-arg / empty-object: BOTH the "cpu" and "gpu:0" pools default to
// u64::MAX (unlimited-budget sentinel). Useful for tests and ad-hoc scripts.
const manager = new ModelManager();

Field	Type	Description
`cpuRamGb`	`number?`	Host RAM budget in gigabytes for the `"cpu"` pool.
`gpuVramGb`	`number?`	VRAM budget in gigabytes for the `"gpu:0"` pool. Omit if you don’t load GPU models.
`poolBudgets`	`Record<string, bigint>?`	Explicit per-pool byte budgets. Keys are pool labels (`"cpu"`, `"gpu"`, `"gpu:N"`); values are `bigint` byte budgets. Use this when you need budgets above 4 GiB on a per-pool basis or multi-GPU setups.

Methods

Method	Signature	Description
`register`	`await manager.register(id, model, memoryEstimateBytes?: bigint)`	Register a model with its estimated memory footprint (in bytes, as a `bigint`). Starts unloaded. The pool charged is determined by the model’s `device()` (defaults to `"cpu"`).
`registerLocalModel`	`await manager.registerLocalModel(id, load, unload, isLoaded, memoryEstimateBytes?: bigint, device?: string)`	Register a model from raw closures. The optional `device` string (default `"cpu"`) selects which pool to charge — e.g. `"cuda:0"` charges `"gpu:0"`.
`load`	`await manager.load(id)`	Load a model, evicting LRU models in the same pool if needed.
`unload`	`await manager.unload(id)`	Unload a model and free its memory.
`isLoaded`	`await manager.isLoaded(id): boolean`	Check if a model is currently loaded.
`ensureLoaded`	`await manager.ensureLoaded(id)`	Alias for `load()`.
`usedBytes`	`await manager.usedBytes(pool?: string): bigint`	Bytes currently used by loaded models in the given pool. `pool` defaults to `"cpu"`. Invalid pool labels reject with `invalid pool label '<x>': expected 'cpu', 'gpu', or 'gpu:N' where N is a non-negative integer`.
`availableBytes`	`await manager.availableBytes(pool?: string): bigint`	Bytes still available within the given pool’s budget. Same default and validation rules as `usedBytes`.
`pools`	`manager.pools(): Array<{ pool: string; budgetBytes: bigint }>`	Sync. List every configured pool and its byte budget.
`status`	`await manager.status(): ModelStatus[]`	Status of all registered models.

ModelStatus

interface ModelStatus {
  id: string;                    // Model identifier
  loaded: boolean;               // Whether the model is currently loaded
  memoryEstimateBytes: bigint;   // Estimated memory footprint in bytes
  pool: string;                  // Pool label this model is charged to ("cpu", "gpu:0", ...)
}

Why bigint? The byte-budget surface (poolBudgets values, register’s memoryEstimateBytes, usedBytes(), availableBytes(), ModelStatus.memoryEstimateBytes) used to be number (u32 on the Rust side), which capped budgets at ~4 GiB and silently truncated larger inputs — a real footgun for 7B+ local models that need 8 GiB+ of memory. Pass values as BigInt literals (8n * 1_073_741_824n) or via BigInt(8 * 1024 ** 3). The cpuRamGb / gpuVramGb: number constructor path is unchanged for users who prefer plain numbers and gigabyte granularity.

ModelRegistry

An ABC for model catalogs. Subclass it to advertise the models your code knows about — used by capability-discovery code, dynamic model menus, and parity with the Rust blazen_llm::traits::ModelRegistry trait.

export declare class ModelRegistry {
  listModels(): Promise<ModelInfo[]>;
  getModel(modelId: string): Promise<ModelInfo | null>;
}

Method	Signature	Description
`listModels`	`async listModels(): Promise<ModelInfo[]>`	Return every model the registry advertises.
`getModel`	`async getModel(modelId: string): Promise<ModelInfo \| null>`	Look up a single model by id, or return `null` if unknown.

The base implementations of both methods throw — overriding them is mandatory. super() must be called from the subclass constructor.

import { ModelRegistry } from "blazen";
import type { ModelInfo } from "blazen";

class MyRegistry extends ModelRegistry {
  async listModels(): Promise<ModelInfo[]> {
    return [
      { id: "gpt-4o", provider: "openai" /* ...other ModelInfo fields */ },
      { id: "claude-sonnet-4", provider: "anthropic" /* ... */ },
    ];
  }

  async getModel(modelId: string): Promise<ModelInfo | null> {
    const all = await this.listModels();
    return all.find((m) => m.id === modelId) ?? null;
  }
}

See the ModelInfo reference for the full set of fields each entry must populate (id, provider, capabilities, context window, pricing, etc.).

Mirrors PyModelRegistry (Python) and WasmModelRegistry exposed as ModelRegistry (WASM SDK) — subclassing ModelRegistry in any binding produces the same Rust-side blazen_llm::traits::ModelRegistry implementation.

ModelPricing and Pricing Functions

ModelPricing

Pricing metadata for cost tracking.

interface ModelPricing {
  inputPerMillion?: number;     // Cost per million input tokens (USD)
  outputPerMillion?: number;    // Cost per million output tokens (USD)
  perImage?: number;            // Cost per generated image (USD)
  perSecond?: number;           // Cost per second of compute (USD)
}

registerPricing()

import { registerPricing } from "blazen";

registerPricing("my-model", { inputPerMillion: 1.0, outputPerMillion: 2.0 });

lookupPricing()

Look up pricing for a model by ID. Returns null if the model is unknown.

import { lookupPricing } from "blazen";

const pricing = lookupPricing("gpt-4o");
if (pricing) {
  console.log(`Input: $${pricing.inputPerMillion}/M tokens`);
}

LocalModel Methods on CompletionModel

CompletionModel instances backed by local inference (not remote APIs) support explicit load/unload lifecycle management.

Method	Signature	Description
`load`	`await model.load(): void`	Load the model into memory (host RAM if on CPU, GPU VRAM otherwise). Idempotent.
`unload`	`await model.unload(): void`	Free the model’s memory. Idempotent.
`isLoaded`	`await model.isLoaded(): boolean`	Whether the model is currently loaded.
`memoryBytes`	`await model.memoryBytes(): number \| null`	Approximate memory footprint in bytes (host RAM if on CPU, GPU VRAM otherwise), or `null` if unknown.
`device`	`await model.device(): string`	Return the device string this model targets (`'cpu'`, `'cuda:0'`, `'metal'`, etc.). Determines which pool the manager charges. Subclasses must override — the base implementation throws.

// For a local model:
await model.load();
console.log(await model.isLoaded());    // true
console.log(await model.memoryBytes()); // e.g. 4000000000
console.log(await model.device());      // e.g. "cuda:0"
await model.unload();

Error Handling

Errors thrown across the FFI boundary are surfaced as instances of typed BlazenError subclasses. Every error class extends the base BlazenError, which extends the standard JavaScript Error, so existing instanceof Error checks keep working while gaining structural classification.

The BlazenError hierarchy is what makes typed error routing possible — any caught value can be matched against BlazenError (catch-all for anything from the SDK), against the direct subclass (broad category like RateLimitError), or against a leaf class (specific failure like LlamaCppModelLoadError). Use whichever level of specificity your handler needs.

Root and direct subclasses

BlazenError is the root. The following 18 classes extend it directly:

Class	When it’s thrown
`AuthError`	Invalid or expired API key.
`RateLimitError`	Provider rate limit reached.
`TimeoutError`	Request exceeded its deadline.
`ValidationError`	Invalid request parameters or option set.
`ContentPolicyError`	Content moderated by the provider.
`ProviderError`	Provider-specific error (HTTP status / endpoint detail attached — see below).
`UnsupportedError`	Feature not supported by the chosen provider.
`ComputeError`	Compute job failure (cancelled, quota exhausted, runtime failure).
`MediaError`	Invalid or oversized media content.
`PeerEncodeError`	Failed to encode a peer envelope.
`PeerTransportError`	Network failure between peers.
`PeerEnvelopeVersionError`	Peer protocol version mismatch.
`PeerWorkflowError`	Remote peer workflow failed.
`PeerTlsError`	TLS handshake or cert validation failed for a peer connection.
`PeerUnknownStepError`	Peer requested an unknown workflow step.
`PersistError`	Snapshot persistence backend failure.
`PromptError`	Prompt registry / template failure.
`MemoryError`	Memory store / embedder failure.
`CacheError`	Model cache / download failure.

`ProviderError` structured fields

ProviderError carries structured context in addition to the message string. All fields are nullable.

Field	Type	Description
`.provider`	`string \| null`	Provider name (e.g. `"openai"`, `"anthropic"`).
`.status`	`number \| null`	HTTP status code, when the call reached the provider.
`.endpoint`	`string \| null`	The endpoint that returned the error.
`.requestId`	`string \| null`	Provider-assigned request id (use this when filing support tickets).
`.detail`	`string \| null`	Provider-supplied error detail / body.
`.retryAfterMs`	`number \| null`	Suggested back-off when the provider returned a `Retry-After` hint.

Per-backend `ProviderError` subclasses

Each local-inference and provider-side backend has its own ProviderError subclass with narrower variants. Use instanceof to route to backend-specific handling.

Class	Backend	Representative narrower subclasses
`LlamaCppError`	llama.cpp	`LlamaCppInvalidOptionsError`, `LlamaCppModelLoadError`, `LlamaCppInferenceError`, `LlamaCppEngineNotAvailableError`
`MistralRsError`	mistral.rs	`MistralRsInvalidOptionsError`, `MistralRsInitError`, `MistralRsInferenceError`, `MistralRsEngineNotAvailableError`
`CandleLlmError`	candle (LLM)	`CandleLlmInvalidOptionsError`, `CandleLlmModelLoadError`, `CandleLlmInferenceError`, `CandleLlmEngineNotAvailableError`
`CandleEmbedError`	candle (embeddings)	`CandleEmbedModelLoadError`, `CandleEmbedEmbeddingError`, `CandleEmbedEngineNotAvailableError`, `CandleEmbedTaskPanickedError`
`WhisperError`	whisper.cpp	`WhisperModelLoadError`, `WhisperTranscriptionError`, `WhisperEngineNotAvailableError`, `WhisperIoError`
`PiperError`	Piper TTS	`PiperModelLoadError`, `PiperSynthesisError`, `PiperEngineNotAvailableError`
`DiffusionError`	diffusion image gen	`DiffusionModelLoadError`, `DiffusionGenerationError`
`FastEmbedError`	fastembed	`EmbedUnknownModelError`, `EmbedInitError`, `EmbedEmbedError`, `EmbedMutexPoisonedError`, `EmbedTaskPanickedError`
`TractError`	tract ONNX runtime	(no narrower variants)

PromptError similarly has narrower variants like PromptMissingVariableError, PromptNotFoundError, PromptVersionNotFoundError, PromptIoError, PromptYamlError, PromptJsonError, PromptValidationError. MemoryError exposes MemoryNoEmbedderError, MemoryEmbeddingError, MemoryNotFoundError, MemorySerializationError, MemoryIoError, MemoryBackendError. CacheError exposes DownloadError, CacheDirError, IoError. There are around 80 narrower subclasses in total — every public Rust error variant gets its own JS class.

`enrichError(err: unknown): unknown`

Plain Error instances thrown across the FFI boundary lose their original Rust type. Pass any caught value through enrichError to re-classify it into the proper BlazenError subclass before further inspection. It’s a no-op when the error is already typed.

import { enrichError, BlazenError } from "blazen";

try {
  await model.complete([ChatMessage.user("Hi")]);
} catch (raw) {
  const err = enrichError(raw);
  if (err instanceof BlazenError) {
    // typed handling
  }
  throw err;
}

Example: routing on the typed hierarchy

import {
  RateLimitError, AuthError, TimeoutError,
  ProviderError, LlamaCppEngineNotAvailableError,
} from "blazen";

try {
  const response = await model.complete([ChatMessage.user("Hello")]);
} catch (e) {
  if (e instanceof RateLimitError) {
    await sleep(e instanceof ProviderError && e.retryAfterMs ? e.retryAfterMs : 1000);
  } else if (e instanceof AuthError) {
    refreshApiKey();
  } else if (e instanceof TimeoutError) {
    // safe to retry once
  } else if (e instanceof LlamaCppEngineNotAvailableError) {
    console.error("Build was compiled without llama.cpp support");
  } else if (e instanceof ProviderError) {
    console.error(`[${e.provider} ${e.status ?? "?"}] ${e.detail ?? e.message}`);
  } else {
    throw e;
  }
}

RateLimitError, TimeoutError, transient ProviderErrors with status >= 500, and most PeerTransportErrors are safe to retry. AuthError, ValidationError, ContentPolicyError, and UnsupportedError are not.

When to call `enrichError`

The Rust core always emits typed BlazenError subclasses, but errors that originate in JS callbacks (tool handlers, persist callbacks, custom providers) and bubble back through Rust come out as plain Error instances. If you want uniform instanceof BlazenError matching everywhere, run every caught value through enrichError at the catch site:

import { enrichError, BlazenError, ProviderError } from "blazen";

async function safeCall<T>(fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (raw) {
    const err = enrichError(raw);
    if (err instanceof ProviderError && err.retryAfterMs) {
      await new Promise(r => setTimeout(r, err.retryAfterMs!));
    }
    throw err;
  }
}

enrichError is idempotent — passing it an already-typed BlazenError returns the same value unchanged, so it’s safe to layer.

Local Inference Types

Local backends (MistralRsProvider, LlamaCppProvider, CandleLlmProvider) expose typed input/output classes for direct use without going through the generic CompletionModel surface. Two parallel families exist — un-prefixed * for mistral.rs (the canonical surface) and LlamaCpp* for llama.cpp — plus a single CandleInferenceResult for the candle backend.

mistral.rs (canonical, un-prefixed)

Class / enum	Purpose
`ChatMessageInput`	Inference-side chat message. Constructor: `new ChatMessageInput(role, text, images?)`; static `ChatMessageInput.fromText(role, text)`. Getters: `.role`, `.text`, `.images`, `.hasImages`.
`ChatRole`	Const enum: `System`, `User`, `Assistant`, `Tool`.
`InferenceImage`	Image attachment for vision-capable models. Static factories: `fromBytes(buf)`, `fromPath(path)`, `fromSource(src)`.
`InferenceImageSource`	Tagged-union source. Static factories: `bytes(buf)`, `path(p)`. Getters: `.kind` (`"bytes"` or `"path"`), `.data`, `.filePath`.
`InferenceResult`	Non-streaming result. Getters: `.content`, `.reasoningContent`, `.toolCalls`, `.finishReason`, `.model`, `.usage`.
`InferenceChunk`	Streaming chunk. Getters: `.delta`, `.reasoningDelta`, `.toolCalls`, `.finishReason`.
`InferenceChunkStream`	Async chunk source. Pull with `await stream.next()`; returns `null` when exhausted.
`InferenceToolCall`	Tool call requested by the model. Constructor `new InferenceToolCall(id, name, arguments)`; getters `.id`, `.name`, `.arguments` (JSON string).
`InferenceUsage`	Token usage. Getters: `.promptTokens`, `.completionTokens`, `.totalTokens`, `.totalTimeSec`.

import { ChatMessageInput, ChatRole, InferenceImage, InferenceChunkStream, MistralRsProvider } from "blazen";

const provider = await MistralRsProvider.create({ modelId: "..." });
const stream: InferenceChunkStream = await provider.inferStream([
  ChatMessageInput.fromText(ChatRole.User, "Describe this image"),
]);
for (let chunk = await stream.next(); chunk !== null; chunk = await stream.next()) {
  process.stdout.write(chunk.delta ?? "");
}

InferenceChunkStream is single-pass — once you’ve reached the terminating null, the stream is exhausted. Errors raised from inside InferenceChunkStream.next() are typed BlazenError subclasses (MistralRsInferenceError, etc.) so they can be matched alongside the rest of the error hierarchy.

llama.cpp (`LlamaCpp` prefix)

The llama.cpp surface mirrors the mistral.rs one with a narrower feature set (no reasoning content, no images on the message input itself).

Class / enum	Purpose
`LlamaCppChatMessageInput`	Constructor: `new LlamaCppChatMessageInput(role, text)`. Getters: `.role`, `.text`.
`LlamaCppChatRole`	Const enum: `System`, `User`, `Assistant`, `Tool` (capitalised — distinct from `ChatRole`).
`LlamaCppInferenceResult`	Non-streaming result. Getters: `.content`, `.finishReason`, `.model`, `.usage`.
`LlamaCppInferenceChunk`	Streaming chunk. Getters: `.delta`, `.finishReason`.
`LlamaCppInferenceChunkStream`	Async chunk source. Same `await stream.next()` pattern.
`LlamaCppInferenceUsage`	Getters: `.promptTokens`, `.completionTokens`, `.totalTokens`, `.totalTimeSec`.

import {
  LlamaCppChatMessageInput, LlamaCppChatRole,
  LlamaCppInferenceChunkStream, LlamaCppProvider,
} from "blazen";

const provider = await LlamaCppProvider.create({ modelPath: "/models/llama.gguf" });
const stream: LlamaCppInferenceChunkStream = await provider.inferStream([
  new LlamaCppChatMessageInput(LlamaCppChatRole.User, "What is 2+2?"),
]);
for (let chunk = await stream.next(); chunk !== null; chunk = await stream.next()) {
  process.stdout.write(chunk.delta ?? "");
}

Like the mistral.rs InferenceChunkStream, LlamaCppInferenceChunkStream is single-pass; mid-stream failures throw a typed LlamaCppInferenceError.

candle

The candle backend exposes a single non-streaming result class.

Class	Purpose
`CandleInferenceResult`	Constructor: `new CandleInferenceResult(content, promptTokens, completionTokens, totalTimeSecs)`. Getters: `.content`, `.promptTokens`, `.completionTokens`, `.totalTimeSecs`.

The candle backend has no streaming counterpart to InferenceChunkStream / LlamaCppInferenceChunkStream — pull CandleInferenceResult once per call. If you need token-by-token streaming on candle, swap the provider to mistral.rs or llama.cpp.

Errors raised from local inference

All three families propagate errors as typed BlazenError subclasses. The mapping is documented in the Error Handling section above. As elsewhere, run callbacks through enrichError to re-classify any plain Error that bubbles back through Rust from JS-side code (custom samplers, custom token decoders, etc.).

Telemetry

OpenTelemetry-compatible tracing flows through the standard tracing subscriber. Blazen ships an optional Langfuse exporter that ships span batches to the Langfuse ingestion API. This is gated by the langfuse Cargo feature on the underlying crate, so it’s only available in builds that opted in at compile time.

`LangfuseConfig`

import { LangfuseConfig } from "blazen";

const config = new LangfuseConfig(
  process.env.LANGFUSE_PUBLIC_KEY!,
  process.env.LANGFUSE_SECRET_KEY!,
  "https://cloud.langfuse.com",  // host (optional, defaults to cloud)
  100,                             // batchSize (optional, default 100)
  5000,                            // flushIntervalMs (optional, default 5000)
);

Property	Type	Description
`.publicKey`	`string`	The Langfuse public API key.
`.secretKey`	`string`	The Langfuse secret API key.
`.host`	`string \| null`	The configured host URL, or `null` when defaulted.
`.batchSize`	`number`	Maximum events buffered before an automatic flush.
`.flushIntervalMs`	`number`	Background flush interval in milliseconds.

`initLangfuse(config: LangfuseConfig): void`

Install the global tracing subscriber. Spawns a background tokio task that flushes buffered span envelopes to Langfuse on the configured interval. Calling this more than once per process is safe — subsequent calls no-op because the global subscriber is already registered.

import { initLangfuse, LangfuseConfig } from "blazen";

initLangfuse(new LangfuseConfig(
  process.env.LANGFUSE_PUBLIC_KEY!,
  process.env.LANGFUSE_SECRET_KEY!,
));

Available only when the host build was compiled with the langfuse feature on the underlying telemetry crate. In builds without it, the symbol is still exported but the configured exporter is a no-op.

Wiring `LangfuseConfig` from environment

Most deployments construct LangfuseConfig directly from environment variables at startup. Tune batchSize and flushIntervalMs to balance ingestion latency against the per-request overhead of HTTP flushes:

import { initLangfuse, LangfuseConfig } from "blazen";

const cfg = new LangfuseConfig(
  process.env.LANGFUSE_PUBLIC_KEY!,
  process.env.LANGFUSE_SECRET_KEY!,
  process.env.LANGFUSE_HOST,                                  // null → cloud default
  Number(process.env.LANGFUSE_BATCH_SIZE ?? 100),
  Number(process.env.LANGFUSE_FLUSH_INTERVAL_MS ?? 5000),
);
initLangfuse(cfg);

A LangfuseConfig instance is purely declarative — it carries no IO state — so it’s safe to construct, inspect, log (with secrets redacted), and pass to initLangfuse independently.

You can keep multiple LangfuseConfig objects around (e.g. one per environment) and choose which one to install at startup; only the first initLangfuse call wins per process.

version()

Returns the Blazen library version string.

import { version } from "blazen";

console.log(version()); // "0.1.0"

Node.js API Reference

ChatMessage

Constructor

Static Factory Methods

Properties

Role

ContentPart

CompletionModel

Provider Factory Methods

Properties

await model.complete(messages: ChatMessage[]): CompletionResponse

await model.completeWithOptions(messages: ChatMessage[], options: CompletionOptions): CompletionResponse

await model.stream(messages: ChatMessage[], onChunk: (chunk) => void): void

await model.streamWithOptions(messages: ChatMessage[], onChunk: (chunk) => void, options: CompletionOptions): void

Middleware Decorators

model.withRetry(config?: RetryConfig): CompletionModel

model.withCache(config?: CacheConfig): CompletionModel

CompletionModel.withFallback(models: CompletionModel[]): CompletionModel

CompletionOptions

CompletionResponse

ToolCall

ToolOutput

Properties

LlmPayload

Variants

Per-provider behavior

ToolDefinition

Content Subsystem

ContentKind

ContentHandle

ContentStore

Subclassing ContentStore

ContentStore.custom({...})

PutOptions

ContentMetadata

Built-in stores

Tool-input schema helpers

How resolution works

TokenUsage

RequestTiming

runAgent

Parameters

Example

Tool handler return shapes

ToolDef

AgentRunOptions

AgentResult

Properties

BatchResult

Properties

Pipeline

new PipelineBuilder(name: string)

.onPersist(callback)

.onPersistJson(callback)

Workflow

new Workflow(name: string)

.addStep(name: string, eventTypes: string[], handler: StepHandler)

.setTimeout(seconds: number)

await wf.run(input: object): WorkflowResult

await wf.runStreaming(input: object, callback: (event) => void): WorkflowResult

await wf.runWithHandler(input: object): WorkflowHandler

await wf.resume(snapshotJson: string): WorkflowHandler

WorkflowResult

StepHandler

WorkflowHandler

await handler.result(): WorkflowResult

await handler.pause(): void

await handler.snapshot(): string

await handler.resumeInPlace(): void

await handler.streamEvents(callback: (event) => void): void

Events

Start Event

Stop Event

Context

StateValue

await ctx.set(key: string, value: Exclude<StateValue, Buffer>): void

await ctx.get(key: string): Promise<StateValue | null>

await ctx.setBytes(key: string, buffer: Buffer): void

await ctx.getBytes(key: string): Buffer | null

await ctx.runId(): string

`await model.complete(messages: ChatMessage[]): CompletionResponse`

`await model.completeWithOptions(messages: ChatMessage[], options: CompletionOptions): CompletionResponse`

`await model.stream(messages: ChatMessage[], onChunk: (chunk) => void): void`

`await model.streamWithOptions(messages: ChatMessage[], onChunk: (chunk) => void, options: CompletionOptions): void`

`model.withRetry(config?: RetryConfig): CompletionModel`

`model.withCache(config?: CacheConfig): CompletionModel`

`CompletionModel.withFallback(models: CompletionModel[]): CompletionModel`

Subclassing `ContentStore`

`ContentStore.custom({...})`

`new PipelineBuilder(name: string)`

`.onPersist(callback)`

`.onPersistJson(callback)`

`new Workflow(name: string)`

`.addStep(name: string, eventTypes: string[], handler: StepHandler)`

`.setTimeout(seconds: number)`

`await wf.run(input: object): WorkflowResult`

`await wf.runStreaming(input: object, callback: (event) => void): WorkflowResult`

`await wf.runWithHandler(input: object): WorkflowHandler`

`await wf.resume(snapshotJson: string): WorkflowHandler`

`await handler.result(): WorkflowResult`

`await handler.pause(): void`

`await handler.snapshot(): string`

`await handler.resumeInPlace(): void`

`await handler.streamEvents(callback: (event) => void): void`

`await ctx.set(key: string, value: Exclude<StateValue, Buffer>): void`

`await ctx.get(key: string): Promise<StateValue | null>`

`await ctx.setBytes(key: string, buffer: Buffer): void`

`await ctx.getBytes(key: string): Buffer | null`

`await ctx.runId(): string`

`await ctx.sendEvent(event: object): void`

`await ctx.writeEventToStream(event: object): void`

`get state(): StateNamespace`

`get session(): SessionNamespace`

`await state.set(key: string, value: Exclude<StateValue, Buffer>): Promise<void>`

`await state.get(key: string): Promise<StateValue | null>`

`await state.setBytes(key: string, data: Buffer): Promise<void>`

`await state.getBytes(key: string): Promise<Buffer | null>`

`await session.set(key: string, value: unknown): Promise<void>`

`await session.get(key: string): Promise<unknown>`

`await session.has(key: string): Promise<boolean>`

`await session.remove(key: string): Promise<void>`

`static meta?: BlazenStateMeta`

`restore?(): void | Promise<void>`

`await state.saveTo(ctx: Context, key: string): Promise<void>`

`static loadFrom<T>(ctx: Context, key: string): Promise<T>`

`await model.embed(texts: string[]): EmbeddingResponse`

`estimateTokens(text: string, contextSize?: number): number`

`countMessageTokens(messages: ChatMessage[], contextSize?: number): number`