WASM Examples

Example applications using the Blazen WebAssembly SDK

Complete examples that demonstrate real-world usage of the Blazen WASM SDK.


Browser Chat App

A minimal chat interface that runs Blazen entirely in the browser. Tokens stream into the DOM as they arrive.

<!DOCTYPE html>
<html>
<body>
  <div id="chat"></div>
  <input id="input" placeholder="Type a message..." />
  <button id="send">Send</button>

  <script type="module">
    import init, { OpenRouterProvider, ChatMessage } from '@blazen/sdk';

    await init();

    const chat = document.getElementById('chat');
    const input = document.getElementById('input');
    const send = document.getElementById('send');
    const messages = [];

    // Construct the provider directly. With no apiKey, it reads OPENROUTER_API_KEY.
    // In production, proxy through your backend -- never expose keys client-side.
    const model = new OpenRouterProvider({ apiKey: 'sk-or-...' });

    send.addEventListener('click', async () => {
      const text = input.value.trim();
      if (!text) return;
      input.value = '';

      messages.push(ChatMessage.user(text));
      const userDiv = document.createElement('div');
      userDiv.textContent = `You: ${text}`;
      chat.appendChild(userDiv);

      const assistantDiv = document.createElement('div');
      assistantDiv.textContent = 'Assistant: ';
      chat.appendChild(assistantDiv);

      // Accumulate the reply separately so the history entry does not depend on the DOM text.
      let reply = '';
      await model.stream(messages, (chunk) => {
        if (chunk.delta) {
          reply += chunk.delta;
          assistantDiv.textContent += chunk.delta;
        }
      });

      messages.push(ChatMessage.assistant(reply));
    });
  </script>
</body>
</html>

Always-on chat bot

Bot keeps a persistent, event-driven agent running for the lifetime of a conversation. You send() messages into it and pull replies back with nextResponse(). Because wasm-bindgen has no stable async-iterator bridge, the reply surface is a pull model: loop on nextResponse() until it resolves with undefined (which happens once the bot shuts down).

import init, { Bot, OpenAiProvider } from '@blazen/sdk';

await init();

// Construct the provider directly; with no apiKey it reads OPENAI_API_KEY.
const model = new OpenAiProvider({ apiKey: 'sk-...' }).toModel();

// Start the always-on bot. Shut it down after 60s of inactivity.
const bot = await Bot.create(model, undefined, {
  systemPrompt: 'You are a concise, friendly assistant.',
  idleTimeoutMs: 60_000, // milliseconds — 60s idle then auto-shutdown
});

// Drain replies in the background using the pull loop. Each subscription starts
// "from now", so start draining before the first send to avoid missing a reply.
const drain = (async () => {
  let r;
  while ((r = await bot.nextResponse()) !== undefined) {
    console.log('bot:', r);
  }
})();

// Drive the conversation. send() is synchronous and non-blocking — the turn runs
// on the bot's event loop and its reply arrives on nextResponse().
bot.send('Hi! What can you help me with?');
bot.send('Explain WebAssembly in one sentence.');

// Later, when the conversation is over:
bot.shutdown();

// Wait for the drain loop to observe the closed stream (nextResponse -> undefined).
await drain;

For the OpenAiProvider import, pull it in alongside Bot:

import init, { Bot, OpenAiProvider } from '@blazen/sdk';

In a browser, swap the console.log for DOM updates; outside the browser (Node, Deno, edge) the same pull loop works unchanged.
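
The pull loop can also be wrapped in an async generator so that callers can use for await. The sketch below is plain TypeScript and does not touch the SDK: PullSource, responses, fakeBot, and collect are hypothetical names, and the only assumption carried over from the text above is that nextResponse() resolves with undefined once the stream has closed.

```typescript
// Hypothetical minimal shape: anything with a pull-style nextResponse().
interface PullSource<T> {
  nextResponse(): Promise<T | undefined>;
}

// Adapt the pull model to an async iterator; iteration ends on undefined.
async function* responses<T>(source: PullSource<T>): AsyncGenerator<T, void, void> {
  let r: T | undefined;
  while ((r = await source.nextResponse()) !== undefined) {
    yield r;
  }
}

// Stand-in source for demonstration only (not part of the SDK).
function fakeBot(replies: string[]): PullSource<string> {
  const queue = [...replies];
  return { nextResponse: async () => queue.shift() };
}

// Drain every reply into an array using for await.
async function collect(source: PullSource<string>): Promise<string[]> {
  const out: string[] = [];
  for await (const reply of responses(source)) {
    out.push(reply);
  }
  return out;
}
```

If the real Bot object matches the PullSource shape, it could presumably be passed to responses() directly, but that is untested here.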


Node.js Serverless Function

A serverless API endpoint that uses the WASM SDK with tool calling. Deploy to any platform that supports Node.js (Vercel, AWS Lambda, etc.).

import init, { OpenAiProvider, ChatMessage, runAgent } from '@blazen/sdk';

let initialized = false;

const tools = [
  {
    name: 'lookupOrder',
    description: 'Look up an order by ID',
    parameters: {
      type: 'object',
      properties: { orderId: { type: 'string' } },
      required: ['orderId'],
    },
  },
  {
    name: 'cancelOrder',
    description: 'Cancel an order by ID',
    parameters: {
      type: 'object',
      properties: {
        orderId: { type: 'string' },
        reason: { type: 'string' },
      },
      required: ['orderId'],
    },
  },
];

async function toolHandler(toolName: string, args: Record<string, unknown>) {
  switch (toolName) {
    case 'lookupOrder':
      // Replace with your database call
      return { orderId: args.orderId, status: 'shipped', eta: '2026-03-21' };
    case 'cancelOrder':
      return { orderId: args.orderId, cancelled: true };
    default:
      throw new Error(`Unknown tool: ${toolName}`);
  }
}

export default async function handler(req: Request): Promise<Response> {
  if (!initialized) {
    await init();
    initialized = true;
  }

  const { message } = await req.json();
  // Construct the provider directly, then adapt it for runAgent via toModel().
  const model = new OpenAiProvider({ apiKey: process.env.OPENAI_API_KEY }).toModel();

  const result = await runAgent(
    model,
    [
      ChatMessage.system('You are a customer support agent. Use tools to look up and manage orders.'),
      ChatMessage.user(message),
    ],
    tools,
    toolHandler,
    { maxIterations: 5 }
  );

  return new Response(JSON.stringify({
    reply: result.response.content,
    iterations: result.iterations,
  }), {
    headers: { 'Content-Type': 'application/json' },
  });
}
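
Tool arguments arrive from the model as untyped JSON, so a handler such as toolHandler above may want to check them against the declared required list before touching a database. The following is a minimal sketch of that check; ToolSchema and missingRequired are hypothetical helpers, not SDK exports, and the sketch assumes every required argument is a string, as in lookupOrder and cancelOrder.

```typescript
// Trimmed-down tool definition mirroring the shape used in the example above.
interface ToolSchema {
  name: string;
  parameters: { required?: string[] };
}

// Return the names of required arguments that are absent, empty, or not strings.
function missingRequired(tool: ToolSchema, args: Record<string, unknown>): string[] {
  return (tool.parameters.required ?? []).filter(
    (key) => typeof args[key] !== "string" || args[key] === "",
  );
}

const lookupOrder: ToolSchema = {
  name: "lookupOrder",
  parameters: { required: ["orderId"] },
};

console.log(missingRequired(lookupOrder, {}));                  // reports orderId
console.log(missingRequired(lookupOrder, { orderId: "A-17" })); // nothing missing
```

A handler could throw, or return an error payload for the model to read, whenever the returned list is non-empty.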

Tauri Desktop App

Use the WASM SDK inside a Tauri v2 app to run AI features locally without a server.

// src/lib/ai.ts
import init, {
  AnthropicProvider,
  OpenAiProvider,
  ChatMessage,
  Workflow,
} from '@blazen/sdk';

let ready = false;

export async function ensureInit() {
  if (!ready) {
    await init();
    ready = true;
  }
}

export async function summarize(text: string): Promise<string> {
  await ensureInit();

  const wf = new Workflow('summarizer');

  wf.addStep('summarize', ['blazen::StartEvent'], async (event, ctx) => {
    // Construct the provider directly. With no apiKey, it reads ANTHROPIC_API_KEY.
    const model = new AnthropicProvider({ apiKey: 'sk-ant-...' });
    const response = await model.complete([
      ChatMessage.system('Summarize the following text concisely.'),
      ChatMessage.user(event.text),
    ]);
    return {
      type: 'blazen::StopEvent',
      result: { summary: response.content },
    };
  });

  const result = await wf.run({ text });
  return result.data.summary;
}

export async function chat(
  messages: Array<{ role: string; content: string }>,
  onChunk: (text: string) => void
): Promise<void> {
  await ensureInit();

  // Construct the provider directly. With no apiKey, it reads OPENAI_API_KEY.
  const model = new OpenAiProvider({ apiKey: 'sk-...' });
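  // Map each plain role onto the matching ChatMessage constructor.
  const chatMessages = messages.map((m) => {
    if (m.role === 'system') return ChatMessage.system(m.content);
    return m.role === 'user' ? ChatMessage.user(m.content) : ChatMessage.assistant(m.content);
  });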

  await model.stream(chatMessages, (chunk) => {
    if (chunk.delta) onChunk(chunk.delta);
  });
}

// src/App.svelte (or your framework of choice)
import { chat } from './lib/ai';

let output = '';

async function handleSend() {
  output = '';
  await chat(
    [{ role: 'user', content: 'Explain Tauri in one paragraph.' }],
    (chunk) => { output += chunk; }
  );
}

The WASM binary runs inside the webview’s JavaScript context. No Tauri command bridge is needed for AI calls — only for filesystem or OS-level operations.


Custom Model via fromJsHandler

WASM classes cannot be subclassed the way Python or Node classes can — wasm-bindgen forbids it. Instead, the SDK exposes factory methods that accept JS handler functions. Model.fromJsHandler is the WASM equivalent of subclassing.

import init, {
  ChatMessage,
  Model,
  runAgent,
} from "@blazen/sdk";

await init();

// Build a custom model by passing a complete handler (and optionally a stream handler).
const model = Model.fromJsHandler(
  "echo-llm",
  async (request) => {
    // request has the same shape as ModelRequest.
    const last = [...request.messages].reverse().find((m: any) => m.role === "user");
    return {
      content: `echo: ${last?.content ?? ""}`,
      toolCalls: [],
      citations: [],
      artifacts: [],
      images: [],
      audio: [],
      videos: [],
      model: "echo-llm",
      metadata: {},
    };
  },
  // Optional stream handler -- fires the onChunk callback with StreamChunk-shaped objects.
  async (request, onChunk) => {
    const last = [...request.messages].reverse().find((m: any) => m.role === "user");
    for (const word of `echo: ${last?.content ?? ""}`.split(" ")) {
      onChunk({ delta: word + " " });
    }
  },
  // Config object -- everything optional. Pricing auto-registers into the global registry.
  {
    contextLength: 4096,
    maxOutputTokens: 2048,
    pricing: { inputPerMillion: 0.0, outputPerMillion: 0.0 },
  },
);

const result = await runAgent(
  model,
  [ChatMessage.user("hello world")],
  [],      // tools -- each item has { name, description, parameters, handler }
  {},      // options: { toolConcurrency?, maxIterations?, systemPrompt?, ... }
);
console.log(result.content); // -> "echo: hello world"

Custom TTSProvider via Handler

Per-capability providers (TTSProvider, ImageProvider, VideoProvider, MusicProvider, ThreeDProvider, BackgroundRemovalProvider, VoiceProvider) follow the same handler pattern — pass your async function to the constructor.

import init, { TTSProvider } from "@blazen/sdk";

await init();

// TTSProvider takes a providerId and a single async handler.
const tts = new TTSProvider("elevenlabs", async (request) => {
  // request: { text, voice, voiceUrl?, language?, speed?, model?, parameters? }
  // Replace with a real HTTP call to your TTS backend.
  const audio = new Uint8Array([0, 1, 2]);
  return {
    audioData: audio,
    format: "wav",
    voice: request.voice,
    text: request.text,
  };
});

const result = await tts.textToSpeech({
  text: "Hello from Blazen!",
  voice: "alice",
});
console.log(result.format, result.audioData.length, "bytes");

For multi-method providers (e.g. MusicProvider), the constructor accepts an object of named handlers:

import init, { MusicProvider } from "@blazen/sdk";

await init();

const music = new MusicProvider("local-musicgen", {
  generateMusic: async (request) => {
    return { audioData: new Uint8Array(), format: "wav" };
  },
  generateSfx: async (request) => {
    return { audioData: new Uint8Array(), format: "wav" };
  },
});

ModelManager (WASM)

The WASM ModelManager is the unified registry — register local models and remote providers by name, then dispatch with complete(id, messages) (also stream(id, messages, onChunk) and get(id)). It tracks per-pool memory budgets and evicts the least-recently-used local model within the same pool when its budget would be exceeded. Remote providers own no local weights, so they dispatch straight through and never count against a budget. Because WASM classes cannot be subclassed, local models take an explicit lifecycle object with load() and unload() async methods (plus optional memoryBytes() and device() callbacks for pool routing).

import init, { Model, ModelManager, OpenAiProvider, ChatMessage } from "@blazen/sdk";

await init();

// 8 GB CPU pool budget (conservative for a laptop). Pass a second
// argument like `new ModelManager(8, 4)` to add a GPU pool too.
const manager = new ModelManager(8);

// Remote provider: dispatch-only, no footprint (pass 0, omit lifecycle).
// Standalone providers expose toModel() for the registry.
manager.register("gpt", new OpenAiProvider({ apiKey: "sk-..." }).toModel(), 0);

// Construct models backed by @mlc-ai/web-llm (lazy-loaded at complete-time).
const llama = Model.webLlm("Llama-3.1-8B-Instruct-q4f32_1-MLC");
const qwen = Model.webLlm("Qwen2.5-7B-Instruct-q4f32_1-MLC");

// Each model registers with a lifecycle object. In a real app, this
// calls into the WebLLM engine to load/unload GPU resources.
manager.register("llama-8b", llama, 4_500_000_000, {
  load: async () => { console.log("loading llama..."); },
  unload: async () => { console.log("unloading llama..."); },
  isLoaded: () => false,
  memoryBytes: async () => 4_500_000_000,
  device: () => "cpu",   // optional -- defaults to "cpu"
});
manager.register("qwen-7b", qwen, 4_200_000_000, {
  load: async () => { console.log("loading qwen..."); },
  unload: async () => { console.log("unloading qwen..."); },
  isLoaded: () => false,
  memoryBytes: async () => 4_200_000_000,
  device: () => "cpu",
});

// Dispatch the remote provider by name — same call shape as the local models.
const remote = await manager.complete("gpt", [ChatMessage.user("Hello from the cloud!")]);
console.log(remote.content);

await manager.load("llama-8b");
await manager.load("qwen-7b"); // evicts llama-8b (4.5 + 4.2 > 8 GB CPU pool budget)

for (const s of manager.status()) {
  console.log(`${s.id}: loaded=${s.loaded}, pool=${s.pool}, memory=${s.memoryEstimateBytes}`);
}
console.log(`used=${await manager.usedBytes()}, available=${await manager.availableBytes()}`);
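
The eviction behaviour described above can be pictured with a small stand-alone model. This is only a sketch of least-recently-used eviction under a byte budget, not the SDK's implementation: LruPool and its methods are hypothetical, and the real ModelManager also handles multiple pools and lifecycle callbacks.

```typescript
// Illustrative only: a byte-budgeted pool that evicts least-recently-used entries.
class LruPool {
  private readonly budgetBytes: number;
  // Map preserves insertion order, so the first key is the least recently used.
  private loaded = new Map<string, number>();

  constructor(budgetBytes: number) {
    this.budgetBytes = budgetBytes;
  }

  // Load (or touch) a model; returns the ids evicted to make room.
  load(id: string, bytes: number): string[] {
    if (bytes > this.budgetBytes) {
      throw new Error(`${id} needs ${bytes} bytes, budget is ${this.budgetBytes}`);
    }
    this.loaded.delete(id); // re-inserting moves the id to the most-recent end
    const evicted: string[] = [];
    while (this.usedBytes() + bytes > this.budgetBytes) {
      const oldest = this.loaded.keys().next().value as string;
      this.loaded.delete(oldest);
      evicted.push(oldest);
    }
    this.loaded.set(id, bytes);
    return evicted;
  }

  usedBytes(): number {
    let total = 0;
    this.loaded.forEach((bytes) => { total += bytes; });
    return total;
  }
}

const pool = new LruPool(8_000_000_000);
const first = pool.load("llama-8b", 4_500_000_000);  // fits, nothing to evict
const second = pool.load("qwen-7b", 4_200_000_000);  // 4.5 GB + 4.2 GB exceeds 8 GB
console.log(first, second, pool.usedBytes());
```

With the same sizes as the example above, the second load evicts llama-8b and leaves only qwen-7b resident.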

ModelRegistry (WASM)

ModelRegistry wraps a JS object exposing listModels() and getModel(modelId) so browser code can plug a custom model catalog (a fetched manifest, an in-browser registry, a control-plane endpoint) into Blazen’s model-info lookup surface. It has the same shape as the Python ModelRegistry ABC and the Node ModelRegistry class, so workflow code reads identically across runtimes.

import init, { ModelRegistry } from "@blazen/sdk";
import type { ModelInfo } from "@blazen/sdk";

await init();

// Back the registry with whatever source you like -- a fetched manifest,
// an offline IndexedDB cache, or a control-plane endpoint.
const registry = new ModelRegistry({
  async listModels(): Promise<ModelInfo[]> {
    const res = await fetch("/api/models");
    if (!res.ok) throw new Error(`registry fetch failed: ${res.status}`);
    return res.json();
  },
  async getModel(modelId: string): Promise<ModelInfo | null> {
    // Encode the ID: model IDs may contain "/" or other reserved characters.
    const res = await fetch(`/api/models/${encodeURIComponent(modelId)}`);
    if (res.status === 404) return null;
    if (!res.ok) throw new Error(`registry fetch failed: ${res.status}`);
    return res.json();
  },
});

const all = await registry.listModels();
console.log(`available models: ${all.length}`);

const gpt = await registry.getModel("gpt-4o");
if (gpt) {
  console.log(gpt.id, gpt.provider, gpt.capabilities);
}

Both methods may return synchronous values too — the SDK awaits whatever the callback returned, so a purely in-memory registry can skip the async keyword entirely.
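
As a sketch of that synchronous variant, the callbacks below return plain values with no async keyword. CatalogEntry is a hypothetical, trimmed-down record used only for illustration (the SDK's ModelInfo has its own fields), and lookup stands in for any caller that always awaits the callback result.

```typescript
// Hypothetical record for this sketch; not the SDK's ModelInfo type.
interface CatalogEntry {
  id: string;
  provider: string;
}

const catalog: CatalogEntry[] = [
  { id: "gpt-4o", provider: "openai" },
  { id: "echo-llm", provider: "local" },
];

// Plain synchronous callbacks: no async keyword, no Promise.
const inMemoryCallbacks = {
  listModels: (): CatalogEntry[] => catalog,
  getModel: (modelId: string): CatalogEntry | null =>
    catalog.find((m) => m.id === modelId) ?? null,
};

// `await` on a non-Promise value resolves to that value, which is why a caller
// that always awaits can accept sync and async callbacks alike.
async function lookup(modelId: string): Promise<CatalogEntry | null> {
  return await inMemoryCallbacks.getModel(modelId);
}
```

An object shaped like inMemoryCallbacks is the kind of value the text above says can be handed to the ModelRegistry constructor.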


Pricing Registration and Cost Tracking (WASM)

registerPricing attaches USD-per-million-token rates to any model ID. Completions produced with that model ID then carry a computed cost field.

import init, {
  ChatMessage,
  Model,
  computeCost,
  lookupPricing,
  registerPricing,
} from "@blazen/sdk";

await init();

// Register pricing once, globally.
registerPricing("my-finetuned-model", 1.0, 2.0); // $1/M input, $2/M output

// Lookup
const p = lookupPricing("my-finetuned-model");
if (p) {
  console.log(`input: $${p.inputPerMillion}/M, output: $${p.outputPerMillion}/M`);
}

// Compute a cost directly from token counts.
const cost = computeCost("my-finetuned-model", 1500, 800);
console.log(`estimated cost: $${cost?.toFixed(6)}`);

// Or route through any model that emits the same modelId -- for example, a
// custom handler that tags its responses with "my-finetuned-model".
const model = Model.fromJsHandler(
  "my-finetuned-model",
  async (_request) => ({
    content: "…",
    toolCalls: [],
    citations: [],
    artifacts: [],
    images: [],
    audio: [],
    videos: [],
    model: "my-finetuned-model",
    usage: { promptTokens: 1500, completionTokens: 800, totalTokens: 2300 },
    metadata: {},
  }),
  undefined,
  {},
);

const response = await model.complete([ChatMessage.user("hi")]);
console.log(`cost: $${response.cost?.toFixed(6)}`); // populated from the global registry
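
To make the per-million-token arithmetic concrete, here is a stand-alone sketch. It assumes the cost is simply tokens divided by one million, multiplied by the rate, and summed over input and output; the SDK's computeCost may round or handle missing pricing differently. Rates and estimateCost are hypothetical names.

```typescript
// Assumed formula: tokens / 1,000,000 * rate, summed over input and output.
interface Rates {
  inputPerMillion: number;
  outputPerMillion: number;
}

function estimateCost(rates: Rates, promptTokens: number, completionTokens: number): number {
  return (
    (promptTokens / 1_000_000) * rates.inputPerMillion +
    (completionTokens / 1_000_000) * rates.outputPerMillion
  );
}

// Same numbers as the example above: $1/M input, $2/M output, 1500 + 800 tokens.
// 1500/1e6 * 1 = 0.0015 and 800/1e6 * 2 = 0.0016, so the total is about 0.0031.
const estimate = estimateCost({ inputPerMillion: 1.0, outputPerMillion: 2.0 }, 1500, 800);
console.log(estimate.toFixed(6)); // "0.003100"
```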

In-Browser RAG with TractEmbedModel + InMemoryBackend

TractEmbedModel runs ONNX-format sentence-transformers entirely in the browser via tract — no remote embedding API required. Pair it with a typed InMemoryBackend and the high-level Memory facade for a fully client-side semantic search store.

import init, { TractEmbedModel, Memory, InMemoryBackend } from "@blazen/sdk";

await init();

// Both URLs must be CORS-enabled. HuggingFace's `resolve/main/...` paths
// serve the right headers out of the box.
const embedder = await TractEmbedModel.create(
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx",
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json",
);

// `fromBackend` keeps reads/writes inside WASM linear memory --
// no JS round-trips per call (unlike `fromJsBackend`).
const memory = Memory.fromBackend(embedder, new InMemoryBackend());

await memory.addMany([
  { id: "doc1", text: "Blazen is a Rust workflow engine." },
  { id: "doc2", text: "WebAssembly runs in browsers, Node.js, and edge runtimes." },
  { id: "doc3", text: "Tract is a tiny ONNX inference engine written in Rust." },
]);

const results = await memory.search("What is Blazen?", 3, null);
results.forEach((r) => console.log(r.id, r.score, r.text));

For cross-tab durability, swap InMemoryBackend for a JS-side IndexedDB backend via Memory.fromJsBackend(embedder, backend) — the backend object just needs to implement put, get, delete, list, len, and searchByBands.
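
For readers new to embedding search, the ranking step can be illustrated in isolation. The sketch below assumes cosine similarity as the score, which is a common choice but is not confirmed here as what Memory.search uses; cosine, Doc, and topK are hypothetical helpers, and the three-dimensional vectors are made up for the example.

```typescript
// Cosine similarity between two equal-length vectors; 0 if either has zero norm.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface Doc {
  id: string;
  vector: number[];
}

// Score every document against the query and keep the k best matches.
function topK(query: number[], docs: Doc[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors for illustration; real sentence embeddings have hundreds of dimensions.
const docs: Doc[] = [
  { id: "doc1", vector: [1, 0, 0] },
  { id: "doc2", vector: [0, 1, 0] },
  { id: "doc3", vector: [0.6, 0.8, 0] },
];
const ranked = topK([1, 0, 0], docs, 2);
console.log(ranked.map((r) => r.id)); // doc1 first, then doc3
```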


Pipeline Snapshot Persistence to IndexedDB

PipelineBuilder.onPersistJson hands you a JSON-shaped snapshot every time the pipeline reaches a checkpoint. Persist it to IndexedDB (or any other store) so a refresh or tab close does not lose progress.

import init, { PipelineBuilder, Stage, Context } from "@blazen/sdk";

await init();

// `idb` here is an `IDBPDatabase` from the `idb` npm package; substitute your
// favourite IndexedDB wrapper. The callback fires after each successful stage
// commit, so it doubles as a place to update progress UI.
const pipeline = new PipelineBuilder("ingest-pipeline")
  .addStage(
    new Stage("normalize", async (input: any, _ctx: Context) => ({
      text: String(input.text ?? "").trim().toLowerCase(),
    })),
  )
  .addStage(
    new Stage("tokenize", async (input: any, _ctx: Context) => ({
      tokens: input.text.split(/\s+/).filter(Boolean),
    })),
  )
  .onPersistJson(async (snapshot: unknown) => {
    const tx = idb.transaction("checkpoints", "readwrite");
    await tx.objectStore("checkpoints").put(snapshot, "current");
    await tx.done;
  })
  .build();

const result = await pipeline.run({ text: "  Hello WASM World  " });
console.log("tokens:", result.tokens);

On the next page load, read checkpoints/current back out and feed it to PipelineBuilder.fromSnapshot(...) to resume mid-flight instead of restarting from stage zero.
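
The checkpoint-and-resume idea can be sketched without the SDK. Everything below is hypothetical scaffolding: Checkpoint is an invented shape (the real snapshot format is whatever onPersistJson hands you), a Map stands in for IndexedDB, and runWithCheckpoints only illustrates skipping stages that already completed.

```typescript
// Hypothetical checkpoint shape for this sketch; the SDK's snapshot format differs.
interface Checkpoint {
  nextStage: number;
  value: unknown;
}

type StageFn = (input: any) => any;

// Run stages in order, saving a checkpoint after each successful stage.
// If `resumeFrom` is given, skip the stages that already completed.
function runWithCheckpoints(
  stages: StageFn[],
  input: unknown,
  save: (cp: Checkpoint) => void,
  resumeFrom?: Checkpoint,
): unknown {
  let value = resumeFrom ? resumeFrom.value : input;
  for (let i = resumeFrom ? resumeFrom.nextStage : 0; i < stages.length; i++) {
    value = stages[i](value);
    save({ nextStage: i + 1, value });
  }
  return value;
}

const store = new Map<string, Checkpoint>(); // stands in for IndexedDB
const save = (cp: Checkpoint): void => { store.set("current", cp); };

let crashOnce = true; // simulate a tab closing during the second stage
const stages: StageFn[] = [
  (text: string) => text.trim().toLowerCase(),
  (text: string) => {
    if (crashOnce) { crashOnce = false; throw new Error("tab closed"); }
    return text.split(/\s+/).filter(Boolean);
  },
];

try {
  runWithCheckpoints(stages, "  Hello WASM World  ", save);
} catch {
  // The first stage's output is already checkpointed at this point.
}
const resumed = runWithCheckpoints(stages, null, save, store.get("current"));
console.log(resumed); // tokens produced by the resumed run
```

The resumed run starts at the second stage with the normalized text from the stored checkpoint, so the first stage is not repeated.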


Human-in-the-Loop with runWithHandler + streamEvents

Workflow.runWithHandler returns a live WorkflowHandler instead of awaiting the terminal event. Pair it with streamEvents to react to every event the engine publishes — including InputRequestEvent, which is the WASM SDK’s hook for human-in-the-loop prompts. Send the answer back through respondToInput(requestId, response) and the parked event loop unparks immediately.

import init, { Workflow } from "@blazen/sdk";

await init();

const workflow = new Workflow("topic-researcher");

workflow.addStep("clarify", ["blazen::StartEvent"], async (event, _ctx) => {
  // Ask the human to confirm or refine the research topic before we burn
  // tokens. The engine auto-parks on this event until `respondToInput` lands.
  return {
    type: "blazen::InputRequestEvent",
    request_id: crypto.randomUUID(),
    prompt: `Confirm the topic to research: "${event.topic}"`,
    metadata: null,
  };
});

workflow.addStep("research", ["blazen::InputResponseEvent"], async (event, _ctx) => {
  // `event.response` is whatever the JS side passed to `respondToInput`.
  return {
    type: "blazen::StopEvent",
    result: { confirmed_topic: event.response },
  };
});

const handler = await workflow.runWithHandler({ topic: "tract embeddings" });

// `streamEvents` resolves when the workflow ends; events emitted before this
// call are NOT replayed, so subscribe before you await the terminal result.
const streaming = handler.streamEvents((event: { event_type: string; data: any }) => {
  if (event.event_type === "blazen::InputRequestEvent") {
    const answer = window.prompt(event.data.prompt) ?? "";
    handler.respondToInput(event.data.request_id, answer);
  } else {
    console.log("event:", event.event_type, event.data);
  }
});

const [, finalResult] = await Promise.all([streaming, handler.awaitResult()]);
console.log("done:", finalResult);

Outside the browser (Node, Deno, edge runtimes), swap window.prompt for whatever input source you have (a WebSocket message, a CLI readline, an HTTP form post) and call respondToInput from there. The handler is a single-owner JS object, not a thread-safe one: call respondToInput on the same JS event loop that owns the handler, and the workflow unparks correctly.
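
One way to picture the park and unpark behaviour is a map of pending Promises keyed by request ID. The sketch below is not the engine's implementation; InputBroker and its methods are hypothetical, and it only shows why a response delivered on the same event loop resumes the waiting code.

```typescript
// Illustrative only: pending requests are Promise resolvers keyed by request id.
class InputBroker {
  private pending = new Map<string, (answer: string) => void>();

  // "Park": returns a Promise that stays pending until respond() is called.
  request(requestId: string): Promise<string> {
    return new Promise<string>((resolve) => {
      this.pending.set(requestId, resolve);
    });
  }

  // "Unpark": resolve the matching Promise; returns false for unknown ids.
  respond(requestId: string, answer: string): boolean {
    const resolve = this.pending.get(requestId);
    if (!resolve) return false;
    this.pending.delete(requestId);
    resolve(answer);
    return true;
  }
}

const broker = new InputBroker();
const parked = broker.request("req-1");
const delivered = broker.respond("req-1", "tract embeddings"); // matches a parked request
const unknown = broker.respond("req-2", "ignored");            // no such request
parked.then((answer) => console.log("unparked with:", answer));
```

A response for an unknown or already-answered ID is simply rejected here; how the SDK treats that case is not covered by this sketch.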