WASM Examples

Example applications using the Blazen WebAssembly SDK

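Three complete example applications (browser, Node.js serverless, Tauri) that demonstrate real-world usage of the Blazen WASM SDK, followed by focused recipes for custom providers, model management, pricing, in-browser RAG, pipeline persistence, and human-in-the-loop workflows.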


Browser Chat App

A minimal chat interface that runs Blazen entirely in the browser. Tokens stream into the DOM as they arrive.

<!DOCTYPE html>
<html>
<body>
  <div id="chat"></div>
  <input id="input" placeholder="Type a message..." />
  <button id="send">Send</button>

  <script type="module">
    import init, { CompletionModel, ChatMessage } from '@blazen/sdk';

    await init();

    const chat = document.getElementById('chat');
    const input = document.getElementById('input');
    const send = document.getElementById('send');
    const messages = [];

    // WASM reads OPENROUTER_API_KEY from the runtime env (or `process.env` in Node).
    // In production, proxy through your backend -- never expose keys client-side.
    const model = CompletionModel.openrouter();

    send.addEventListener('click', async () => {
      const text = input.value.trim();
      if (!text) return;
      input.value = '';

      messages.push(ChatMessage.user(text));
      const userDiv = document.createElement('div');
      userDiv.textContent = `You: ${text}`;
      chat.appendChild(userDiv);

      const assistantDiv = document.createElement('div');
      assistantDiv.textContent = 'Assistant: ';
      chat.appendChild(assistantDiv);

      await model.stream(messages, (chunk) => {
        if (chunk.delta) {
          assistantDiv.textContent += chunk.delta;
        }
      });

      messages.push(ChatMessage.assistant(assistantDiv.textContent.replace('Assistant: ', '')));
    });
  </script>
</body>
</html>

Node.js Serverless Function

A serverless API endpoint that uses the WASM SDK with tool calling. Deploy to any platform that supports Node.js (Vercel, AWS Lambda, etc.).

import init, { CompletionModel, ChatMessage, runAgent } from '@blazen/sdk';

let initialized = false;

const tools = [
  {
    name: 'lookupOrder',
    description: 'Look up an order by ID',
    parameters: {
      type: 'object',
      properties: { orderId: { type: 'string' } },
      required: ['orderId'],
    },
  },
  {
    name: 'cancelOrder',
    description: 'Cancel an order by ID',
    parameters: {
      type: 'object',
      properties: {
        orderId: { type: 'string' },
        reason: { type: 'string' },
      },
      required: ['orderId'],
    },
  },
];

async function toolHandler(toolName: string, args: Record<string, unknown>) {
  switch (toolName) {
    case 'lookupOrder':
      // Replace with your database call
      return { orderId: args.orderId, status: 'shipped', eta: '2026-03-21' };
    case 'cancelOrder':
      return { orderId: args.orderId, cancelled: true };
    default:
      throw new Error(`Unknown tool: ${toolName}`);
  }
}

export default async function handler(req: Request): Promise<Response> {
  if (!initialized) {
    await init();
    initialized = true;
  }

  const { message } = await req.json();
  // Reads OPENAI_API_KEY from process.env.
  const model = CompletionModel.openai();

  const result = await runAgent(
    model,
    [
      ChatMessage.system('You are a customer support agent. Use tools to look up and manage orders.'),
      ChatMessage.user(message),
    ],
    tools,
    toolHandler,
    { maxIterations: 5 }
  );

  return new Response(JSON.stringify({
    reply: result.response.content,
    iterations: result.iterations,
  }), {
    headers: { 'Content-Type': 'application/json' },
  });
}
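
The tool handler above trusts whatever arguments the model sends. A minimal sketch of a pre-dispatch check against each tool's `parameters` schema is shown here. `ToolSpec` and `checkToolArgs` are hypothetical names, not part of `@blazen/sdk`, and the type check only covers the primitive types `string`, `number`, and `boolean`.

```typescript
// Hypothetical helper types; they mirror the tool objects in the example above.
interface ToolSpec {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string }>;
    required?: string[];
  };
}

// Returns a list of problems; an empty list means the arguments look usable.
function checkToolArgs(tool: ToolSpec, args: Record<string, unknown>): string[] {
  const problems: string[] = [];
  // First pass: every required key must be present and non-null.
  for (const key of tool.parameters.required ?? []) {
    if (args[key] === undefined || args[key] === null) {
      problems.push(`missing required argument: ${key}`);
    }
  }
  // Second pass: reject unknown keys and mismatched primitive types.
  for (const [key, value] of Object.entries(args)) {
    const expected = tool.parameters.properties[key]?.type;
    if (expected === undefined) {
      problems.push(`unexpected argument: ${key}`);
    } else if (value !== null && value !== undefined && typeof value !== expected) {
      problems.push(`argument ${key} should be ${expected}, got ${typeof value}`);
    }
  }
  return problems;
}

const cancelOrder: ToolSpec = {
  name: "cancelOrder",
  description: "Cancel an order by ID",
  parameters: {
    type: "object",
    properties: { orderId: { type: "string" }, reason: { type: "string" } },
    required: ["orderId"],
  },
};

console.log(checkToolArgs(cancelOrder, { orderId: "A-17", reason: "duplicate" }));
console.log(checkToolArgs(cancelOrder, { reason: 42 }));
```

Calling a check like this at the top of `toolHandler` and throwing on a non-empty result keeps malformed calls away from the database layer.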

Tauri Desktop App

Use the WASM SDK inside a Tauri v2 app to run AI features locally without a server.

// src/lib/ai.ts
import init, { CompletionModel, ChatMessage, Workflow } from '@blazen/sdk';

let ready = false;

export async function ensureInit() {
  if (!ready) {
    await init();
    ready = true;
  }
}

export async function summarize(text: string): Promise<string> {
  await ensureInit();

  const wf = new Workflow('summarizer');

  wf.addStep('summarize', ['blazen::StartEvent'], async (event, ctx) => {
    // The WASM SDK reads ANTHROPIC_API_KEY from the runtime environment.
    const model = CompletionModel.anthropic();
    const response = await model.complete([
      ChatMessage.system('Summarize the following text concisely.'),
      ChatMessage.user(event.text),
    ]);
    return {
      type: 'blazen::StopEvent',
      result: { summary: response.content },
    };
  });

  const result = await wf.run({ text });
  return result.data.summary;
}

export async function chat(
  messages: Array<{ role: string; content: string }>,
  onChunk: (text: string) => void
): Promise<void> {
  await ensureInit();

  // Reads OPENAI_API_KEY from the environment.
  const model = CompletionModel.openai();
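  const chatMessages = messages.map((m) => {
    // Map each plain role string onto the matching ChatMessage factory,
    // so system prompts are not silently converted to assistant turns.
    if (m.role === 'system') return ChatMessage.system(m.content);
    if (m.role === 'user') return ChatMessage.user(m.content);
    return ChatMessage.assistant(m.content);
  });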

  await model.stream(chatMessages, (chunk) => {
    if (chunk.delta) onChunk(chunk.delta);
  });
}

// src/App.svelte (or your framework of choice)
import { chat } from './lib/ai';

let output = '';

async function handleSend() {
  output = '';
  await chat(
    [{ role: 'user', content: 'Explain Tauri in one paragraph.' }],
    (chunk) => { output += chunk; }
  );
}

The WASM binary runs inside the webview’s JavaScript context. No Tauri command bridge is needed for AI calls — only for filesystem or OS-level operations.
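
The `ensureInit` guard above sets its flag only after `init()` resolves, so two calls that arrive before the first one finishes may both invoke `init()`. Whether that matters depends on how the SDK's `init` handles repeated calls, which this page does not state. A minimal sketch of a guard that shares one in-flight promise is shown here; `fakeInit` is a stand-in for the real `init` and `ensureInitOnce` is a hypothetical name.

```typescript
// Stand-in for the SDK's `init`; it counts calls so the effect is visible.
let initCalls = 0;
async function fakeInit(): Promise<void> {
  initCalls += 1;
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
}

let initPromise: Promise<void> | null = null;

// Every caller shares one promise, so init starts at most once even when
// several callers arrive before it resolves.
function ensureInitOnce(): Promise<void> {
  if (initPromise === null) {
    initPromise = fakeInit().catch((err) => {
      initPromise = null; // allow a retry after a failed init
      throw err;
    });
  }
  return initPromise;
}

Promise.all([ensureInitOnce(), ensureInitOnce(), ensureInitOnce()]).then(() => {
  console.log(`init ran ${initCalls} time(s)`); // prints "init ran 1 time(s)"
});
```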


Custom CompletionModel via fromJsHandler

WASM classes cannot be subclassed the way Python or Node classes can — wasm-bindgen forbids it. Instead, the SDK exposes factory methods that accept JS handler functions. CompletionModel.fromJsHandler is the WASM equivalent of subclassing.

import init, {
  ChatMessage,
  CompletionModel,
  runAgent,
} from "@blazen/sdk";

await init();

// Build a custom model by passing a complete handler (and optionally a stream handler).
const model = CompletionModel.fromJsHandler(
  "echo-llm",
  async (request) => {
    // request has the same shape as CompletionRequest.
    const last = [...request.messages].reverse().find((m: any) => m.role === "user");
    return {
      content: `echo: ${last?.content ?? ""}`,
      toolCalls: [],
      citations: [],
      artifacts: [],
      images: [],
      audio: [],
      videos: [],
      model: "echo-llm",
      metadata: {},
    };
  },
  // Optional stream handler -- fires the onChunk callback with StreamChunk-shaped objects.
  async (request, onChunk) => {
    const last = [...request.messages].reverse().find((m: any) => m.role === "user");
    for (const word of `echo: ${last?.content ?? ""}`.split(" ")) {
      onChunk({ delta: word + " " });
    }
  },
  // Config object -- everything optional. Pricing auto-registers into the global registry.
  {
    contextLength: 4096,
    maxOutputTokens: 2048,
    pricing: { inputPerMillion: 0.0, outputPerMillion: 0.0 },
  },
);

const result = await runAgent(
  model,
  [ChatMessage.user("hello world")],
  [],      // tools -- each item has { name, description, parameters, handler }
  {},      // options: { toolConcurrency?, maxIterations?, systemPrompt?, ... }
);
console.log(result.content); // -> "echo: hello world"
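
The handler above spells out every response field by hand. A small helper can fill in the empty defaults so each handler only states what it actually produces. This is a sketch: `HandlerResponse`, `textResponse`, and `lastUserText` are hypothetical names, and the field list is copied from the object literal above rather than from the SDK's type definitions.

```typescript
// Hypothetical shape, mirroring the fields used in the handler above.
interface HandlerResponse {
  content: string;
  toolCalls: unknown[];
  citations: unknown[];
  artifacts: unknown[];
  images: unknown[];
  audio: unknown[];
  videos: unknown[];
  model: string;
  metadata: Record<string, unknown>;
}

// Builds a text-only response; `extra` overrides any default field.
function textResponse(
  model: string,
  content: string,
  extra: Partial<HandlerResponse> = {},
): HandlerResponse {
  return {
    content,
    toolCalls: [],
    citations: [],
    artifacts: [],
    images: [],
    audio: [],
    videos: [],
    model,
    metadata: {},
    ...extra,
  };
}

interface SimpleMessage {
  role: string;
  content: string;
}

// Returns the text of the most recent user message, or "" if there is none.
function lastUserText(messages: SimpleMessage[]): string {
  const last = [...messages].reverse().find((m) => m.role === "user");
  return last?.content ?? "";
}

const reply = textResponse(
  "echo-llm",
  `echo: ${lastUserText([{ role: "user", content: "hello world" }])}`,
);
console.log(reply.content); // prints "echo: hello world"
```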

Custom TTSProvider via Handler

Per-capability providers (TTSProvider, ImageProvider, VideoProvider, MusicProvider, ThreeDProvider, BackgroundRemovalProvider, VoiceProvider) follow the same handler pattern — pass your async function to the constructor.

import init, { TTSProvider } from "@blazen/sdk";

await init();

// TTSProvider takes a providerId and a single async handler.
const tts = new TTSProvider("elevenlabs", async (request) => {
  // request: { text, voice, voiceUrl?, language?, speed?, model?, parameters? }
  // Replace with a real HTTP call to your TTS backend.
  const audio = new Uint8Array([0, 1, 2]);
  return {
    audioData: audio,
    format: "wav",
    voice: request.voice,
    text: request.text,
  };
});

const result = await tts.textToSpeech({
  text: "Hello from Blazen!",
  voice: "alice",
});
console.log(result.format, result.audioData.length, "bytes");

For multi-method providers (e.g. MusicProvider), the constructor accepts an object of named handlers:

import init, { MusicProvider } from "@blazen/sdk";

await init();

const music = new MusicProvider("local-musicgen", {
  generateMusic: async (request) => {
    return { audioData: new Uint8Array(), format: "wav" };
  },
  generateSfx: async (request) => {
    return { audioData: new Uint8Array(), format: "wav" };
  },
});

ModelManager (WASM)

The WASM ModelManager tracks per-pool memory budgets and evicts the least-recently-used model within the same pool when its budget would be exceeded. Because WASM classes cannot be subclassed, the manager takes an explicit lifecycle object with load() and unload() async methods and an isLoaded() check (plus optional memoryBytes() and device() callbacks for pool routing).

import init, { CompletionModel, ModelManager } from "@blazen/sdk";

await init();

// 8 GB CPU pool budget (conservative for a laptop). Pass a second
// argument like `new ModelManager(8, 4)` to add a GPU pool too.
const manager = new ModelManager(8);

// Construct models backed by @mlc-ai/web-llm (lazy-loaded at complete-time).
const llama = CompletionModel.webLlm("Llama-3.1-8B-Instruct-q4f32_1-MLC");
const qwen = CompletionModel.webLlm("Qwen2.5-7B-Instruct-q4f32_1-MLC");

// Each model registers with a lifecycle object. In a real app, this
// calls into the WebLLM engine to load/unload GPU resources.
manager.register("llama-8b", llama, 4_500_000_000, {
  load: async () => { console.log("loading llama..."); },
  unload: async () => { console.log("unloading llama..."); },
  isLoaded: () => false,
  memoryBytes: async () => 4_500_000_000,
  device: () => "cpu",   // optional -- defaults to "cpu"
});
manager.register("qwen-7b", qwen, 4_200_000_000, {
  load: async () => { console.log("loading qwen..."); },
  unload: async () => { console.log("unloading qwen..."); },
  isLoaded: () => false,
  memoryBytes: async () => 4_200_000_000,
  device: () => "cpu",
});

await manager.load("llama-8b");
await manager.load("qwen-7b"); // evicts llama-8b (4.5 + 4.2 > 8 GB CPU pool budget)

for (const s of manager.status()) {
  console.log(`${s.id}: loaded=${s.loaded}, pool=${s.pool}, memory=${s.memoryEstimateBytes}`);
}
console.log(`used=${await manager.usedBytes()}, available=${await manager.availableBytes()}`);
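
The eviction rule described above can be sketched independently of the SDK. The class below is a hypothetical illustration of least-recently-used eviction under a byte budget for a single pool, not the ModelManager implementation; it treats 8 GB as 8,000,000,000 bytes, which is an assumption.

```typescript
class LruBudget {
  private readonly budgetBytes: number;
  // Map preserves insertion order, so the first key is the least recently used.
  private readonly loaded = new Map<string, number>();

  constructor(budgetBytes: number) {
    this.budgetBytes = budgetBytes;
  }

  // Marks `id` as loaded and returns the ids evicted to make room for it.
  load(id: string, bytes: number): string[] {
    if (bytes > this.budgetBytes) {
      throw new Error(`${id} needs ${bytes} bytes, budget is ${this.budgetBytes}`);
    }
    this.loaded.delete(id); // re-inserting moves it to the most-recent end
    const evicted: string[] = [];
    while (this.usedBytes() + bytes > this.budgetBytes) {
      const oldest = this.loaded.keys().next().value;
      if (oldest === undefined) break;
      this.loaded.delete(oldest);
      evicted.push(oldest);
    }
    this.loaded.set(id, bytes);
    return evicted;
  }

  usedBytes(): number {
    let total = 0;
    for (const bytes of this.loaded.values()) total += bytes;
    return total;
  }
}

const pool = new LruBudget(8_000_000_000);
console.log(pool.load("llama-8b", 4_500_000_000)); // prints []
console.log(pool.load("qwen-7b", 4_200_000_000)); // prints [ 'llama-8b' ]
console.log(pool.usedBytes()); // prints 4200000000
```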

ModelRegistry (WASM)

ModelRegistry wraps a JS object exposing listModels() and getModel(modelId), so browser code can plug a custom model catalog (a fetched manifest, an in-browser registry, a control-plane endpoint) into Blazen’s model-info lookup surface. It has the same shape as the Python ModelRegistry ABC and the Node ModelRegistry class, so workflow code reads identically across runtimes.

import init, { ModelRegistry } from "@blazen/sdk";
import type { ModelInfo } from "@blazen/sdk";

await init();

// Back the registry with whatever source you like -- a fetched manifest,
// an offline IndexedDB cache, or a control-plane endpoint.
const registry = new ModelRegistry({
  async listModels(): Promise<ModelInfo[]> {
    const res = await fetch("/api/models");
    if (!res.ok) throw new Error(`registry fetch failed: ${res.status}`);
    return res.json();
  },
  async getModel(modelId: string): Promise<ModelInfo | null> {
    // Encode the ID: model IDs can contain characters such as "/".
    const res = await fetch(`/api/models/${encodeURIComponent(modelId)}`);
    if (res.status === 404) return null;
    if (!res.ok) throw new Error(`registry fetch failed: ${res.status}`);
    return res.json();
  },
});

const all = await registry.listModels();
console.log(`available models: ${all.length}`);

const gpt = await registry.getModel("gpt-4o");
if (gpt) {
  console.log(gpt.id, gpt.provider, gpt.capabilities);
}

Both methods may return synchronous values too — the SDK awaits whatever the callback returned, so a purely in-memory registry can skip the async keyword entirely.
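
A sketch of such a purely in-memory source is shown here. `SketchModelInfo` is a hypothetical stand-in with only the three fields read in the example above; the SDK's real `ModelInfo` type may carry more fields or different types. In an app, the `inMemorySource` object would be passed to `new ModelRegistry(...)` in place of the fetch-backed one.

```typescript
// Hypothetical minimal shape, limited to the fields used in the example above.
interface SketchModelInfo {
  id: string;
  provider: string;
  capabilities: string[];
}

const catalog = new Map<string, SketchModelInfo>([
  ["gpt-4o", { id: "gpt-4o", provider: "openai", capabilities: ["chat", "tools"] }],
  ["echo-llm", { id: "echo-llm", provider: "local", capabilities: ["chat"] }],
]);

// Plain synchronous methods; no promises involved.
const inMemorySource = {
  listModels(): SketchModelInfo[] {
    return [...catalog.values()];
  },
  getModel(modelId: string): SketchModelInfo | null {
    // Return null (not undefined) for unknown IDs, matching the 404 branch above.
    return catalog.get(modelId) ?? null;
  },
};

console.log(inMemorySource.listModels().length); // prints 2
console.log(inMemorySource.getModel("missing")); // prints null
```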


Pricing Registration and Cost Tracking (WASM)

registerPricing attaches USD-per-million-token rates to any model ID. Completions produced with that model ID then carry a computed cost field.

import init, {
  ChatMessage,
  CompletionModel,
  computeCost,
  lookupPricing,
  registerPricing,
} from "@blazen/sdk";

await init();

// Register pricing once, globally.
registerPricing("my-finetuned-model", 1.0, 2.0); // $1/M input, $2/M output

// Lookup
const p = lookupPricing("my-finetuned-model");
if (p) {
  console.log(`input: $${p.inputPerMillion}/M, output: $${p.outputPerMillion}/M`);
}

// Compute a cost directly from token counts.
const cost = computeCost("my-finetuned-model", 1500, 800);
console.log(`estimated cost: $${cost?.toFixed(6)}`);

// Or route through any model that emits the same modelId -- for example, a
// custom handler that tags its responses with "my-finetuned-model".
const model = CompletionModel.fromJsHandler(
  "my-finetuned-model",
  async (_request) => ({
    content: "…",
    toolCalls: [],
    citations: [],
    artifacts: [],
    images: [],
    audio: [],
    videos: [],
    model: "my-finetuned-model",
    usage: { promptTokens: 1500, completionTokens: 800, totalTokens: 2300 },
    metadata: {},
  }),
  undefined,
  {},
);

const response = await model.complete([ChatMessage.user("hi")]);
console.log(`cost: $${response.cost?.toFixed(6)}`); // populated from the global registry
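
The arithmetic behind a per-million-token rate can be worked by hand. The sketch below assumes the cost is simply tokens divided by one million, multiplied by the rate, summed over input and output; the SDK's `computeCost` may apply additional rules not described here. `Rates` and `estimateCostUsd` are hypothetical names.

```typescript
interface Rates {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Assumed formula: (promptTokens * inputRate + completionTokens * outputRate) / 1e6.
function estimateCostUsd(rates: Rates, promptTokens: number, completionTokens: number): number {
  return (
    (promptTokens * rates.inputPerMillion + completionTokens * rates.outputPerMillion) /
    1_000_000
  );
}

// 1500 input tokens at $1/M plus 800 output tokens at $2/M:
// 0.0015 + 0.0016 = 0.0031 USD.
const usd = estimateCostUsd({ inputPerMillion: 1.0, outputPerMillion: 2.0 }, 1500, 800);
console.log(usd.toFixed(6)); // prints "0.003100"
```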

In-Browser RAG with TractEmbedModel + InMemoryBackend

TractEmbedModel runs ONNX-format sentence-transformers entirely in the browser via tract — no remote embedding API required. Pair it with a typed InMemoryBackend and the high-level Memory facade for a fully client-side semantic search store.

import init, { TractEmbedModel, Memory, InMemoryBackend } from "@blazen/sdk";

await init();

// Both URLs must be CORS-enabled. HuggingFace's `resolve/main/...` paths
// serve the right headers out of the box.
const embedder = await TractEmbedModel.create(
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx",
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json",
);

// `fromBackend` keeps reads/writes inside WASM linear memory --
// no JS round-trips per call (unlike `fromJsBackend`).
const memory = Memory.fromBackend(embedder, new InMemoryBackend());

await memory.addMany([
  { id: "doc1", text: "Blazen is a Rust workflow engine." },
  { id: "doc2", text: "WebAssembly runs in browsers, Node.js, and edge runtimes." },
  { id: "doc3", text: "Tract is a tiny ONNX inference engine written in Rust." },
]);

const results = await memory.search("What is Blazen?", 3, null);
results.forEach((r) => console.log(r.id, r.score, r.text));

For cross-tab durability, swap InMemoryBackend for a JS-side IndexedDB backend via Memory.fromJsBackend(embedder, backend) — the backend object just needs to implement put, get, delete, list, len, and searchByBands.
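
For intuition about what a semantic search store does with embeddings, here is a conceptual sketch of ranking documents by cosine similarity. It is not the SDK's algorithm (the `searchByBands` method name suggests a different indexing scheme), and the three-dimensional vectors are toy values rather than real model output. `cosine`, `topK`, and `Doc` are hypothetical names.

```typescript
// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

interface Doc {
  id: string;
  vector: number[];
}

// Scores every document against the query and keeps the k best matches.
function topK(query: number[], docs: Doc[], k: number): Array<{ id: string; score: number }> {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vector) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k);
}

const docs: Doc[] = [
  { id: "doc1", vector: [1, 0, 0] },
  { id: "doc2", vector: [0, 1, 0] },
  { id: "doc3", vector: [0.8, 0.6, 0] },
];

// doc1 points the same way as the query, doc3 is close, doc2 is orthogonal.
console.log(topK([1, 0, 0], docs, 2).map((r) => r.id)); // prints [ 'doc1', 'doc3' ]
```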


Pipeline Snapshot Persistence to IndexedDB

PipelineBuilder.onPersistJson hands you a JSON-shaped snapshot every time the pipeline reaches a checkpoint. Persist it to IndexedDB (or any other store) so a refresh or tab close does not lose progress.

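import init, { PipelineBuilder, Stage, Context } from "@blazen/sdk";
import { openDB } from "idb";

await init();

// `idb` is an `IDBPDatabase` from the `idb` npm package; substitute your
// favourite IndexedDB wrapper. The database name "blazen" is a placeholder.
// The persist callback fires after each successful stage commit, so it
// doubles as a place to update progress UI.
const idb = await openDB("blazen", 1, {
  upgrade(db) {
    // Out-of-line keys, so snapshots can be stored under the key "current".
    db.createObjectStore("checkpoints");
  },
});
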
const pipeline = new PipelineBuilder("ingest-pipeline")
  .addStage(
    new Stage("normalize", async (input: any, _ctx: Context) => ({
      text: String(input.text ?? "").trim().toLowerCase(),
    })),
  )
  .addStage(
    new Stage("tokenize", async (input: any, _ctx: Context) => ({
      tokens: input.text.split(/\s+/).filter(Boolean),
    })),
  )
  .onPersistJson(async (snapshot: unknown) => {
    const tx = idb.transaction("checkpoints", "readwrite");
    await tx.objectStore("checkpoints").put(snapshot, "current");
    await tx.done;
  })
  .build();

const result = await pipeline.run({ text: "  Hello WASM World  " });
console.log("tokens:", result.tokens);

On the next page load, read checkpoints/current back out and feed it to PipelineBuilder.fromSnapshot(...) to resume mid-flight instead of restarting from stage zero.


Human-in-the-Loop with runWithHandler + streamEvents

Workflow.runWithHandler returns a live WorkflowHandler instead of awaiting the terminal event. Pair it with streamEvents to react to every event the engine publishes — including InputRequestEvent, which is the WASM SDK’s hook for human-in-the-loop prompts. Send the answer back through respondToInput(requestId, response) and the parked event loop unparks immediately.

import init, { Workflow } from "@blazen/sdk";

await init();

const workflow = new Workflow("topic-researcher");

workflow.addStep("clarify", ["blazen::StartEvent"], async (event, _ctx) => {
  // Ask the human to confirm or refine the research topic before we burn
  // tokens. The engine auto-parks on this event until `respondToInput` lands.
  return {
    type: "blazen::InputRequestEvent",
    request_id: crypto.randomUUID(),
    prompt: `Confirm the topic to research: "${event.topic}"`,
    metadata: null,
  };
});

workflow.addStep("research", ["blazen::InputResponseEvent"], async (event, _ctx) => {
  // `event.response` is whatever the JS side passed to `respondToInput`.
  return {
    type: "blazen::StopEvent",
    result: { confirmed_topic: event.response },
  };
});

const handler = await workflow.runWithHandler({ topic: "tract embeddings" });

// `streamEvents` resolves when the workflow ends; events emitted before this
// call are NOT replayed, so subscribe before you await the terminal result.
const streaming = handler.streamEvents((event: { event_type: string; data: any }) => {
  if (event.event_type === "blazen::InputRequestEvent") {
    const answer = window.prompt(event.data.prompt) ?? "";
    handler.respondToInput(event.data.request_id, answer);
  } else {
    console.log("event:", event.event_type, event.data);
  }
});

const [, finalResult] = await Promise.all([streaming, handler.awaitResult()]);
console.log("done:", finalResult);

Outside the browser (Node, Deno, edge runtimes), swap window.prompt for whatever input source you have (a WebSocket message, a CLI readline, an HTTP form post) and call respondToInput from there. The handler is a single-owner JS object rather than a thread-safe one, so call respondToInput on the same JS event loop that owns the handler; the workflow then unparks correctly.
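
When the answer arrives asynchronously from another source, something has to match it back to the request that asked for it. One possible shape is a small map from request IDs to pending promises, sketched here. `PendingInputs` is a hypothetical helper, not part of `@blazen/sdk`; in a real app the `.then` callback would call `handler.respondToInput(requestId, value)`.

```typescript
// Hypothetical helper: pairs request IDs with answers that arrive later.
class PendingInputs {
  private readonly waiting = new Map<string, (answer: string) => void>();

  // Call when an InputRequestEvent shows up; resolves once `answer` is called.
  wait(requestId: string): Promise<string> {
    return new Promise<string>((resolve) => {
      this.waiting.set(requestId, resolve);
    });
  }

  // Call from the transport (WebSocket message, HTTP post, readline).
  // Returns false for unknown or already-answered request IDs.
  answer(requestId: string, value: string): boolean {
    const resolve = this.waiting.get(requestId);
    if (resolve === undefined) return false;
    this.waiting.delete(requestId);
    resolve(value);
    return true;
  }
}

const pending = new PendingInputs();

// Workflow side: park until the transport delivers an answer.
pending.wait("req-1").then((value) => {
  console.log("answer:", value); // real app: handler.respondToInput("req-1", value)
});

// Transport side: an answer arrives for req-1.
console.log(pending.answer("req-1", "tract embeddings")); // prints true
console.log(pending.answer("req-1", "duplicate")); // prints false
```

Rejecting duplicate or unknown IDs in `answer` keeps a stale message from unparking the wrong request.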