WASM Examples
Example applications using Blazen WebAssembly SDK
The examples below demonstrate real-world usage of the Blazen WASM SDK, from a browser chat UI and serverless tool calling to in-browser RAG and human-in-the-loop workflows.
Browser Chat App
A minimal chat interface that runs Blazen entirely in the browser. Tokens stream into the DOM as they arrive.
<!DOCTYPE html>
<html>
<body>
  <div id="chat"></div>
  <input id="input" placeholder="Type a message..." />
  <button id="send">Send</button>
  <script type="module">
    // Bare specifiers such as '@blazen/sdk' resolve only through a bundler or an import map.
    import init, { OpenRouterProvider, ChatMessage } from '@blazen/sdk';
    await init();
    const chat = document.getElementById('chat');
    const input = document.getElementById('input');
    const send = document.getElementById('send');
    const messages = [];
    // Construct the provider directly. With no apiKey, it reads OPENROUTER_API_KEY.
    // In production, proxy through your backend -- never expose keys client-side.
    const model = new OpenRouterProvider({ apiKey: 'sk-or-...' });
    send.addEventListener('click', async () => {
      const text = input.value.trim();
      if (!text) return;
      input.value = '';
      send.disabled = true; // prevent overlapping streams on the shared history
      messages.push(ChatMessage.user(text));
      const userDiv = document.createElement('div');
      userDiv.textContent = `You: ${text}`;
      chat.appendChild(userDiv);
      const assistantDiv = document.createElement('div');
      assistantDiv.textContent = 'Assistant: ';
      chat.appendChild(assistantDiv);
      // Accumulate the reply separately so the history never contains the UI label.
      let reply = '';
      try {
        await model.stream(messages, (chunk) => {
          if (chunk.delta) {
            reply += chunk.delta;
            assistantDiv.textContent += chunk.delta;
          }
        });
        messages.push(ChatMessage.assistant(reply));
      } finally {
        send.disabled = false;
      }
    });
  </script>
</body>
</html>
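The comment in the chat app recommends proxying requests through a backend so the OpenRouter key never reaches the browser. The server half of that could look like the sketch below, assuming OpenRouter's OpenAI-compatible `https://openrouter.ai/api/v1/chat/completions` endpoint with Bearer authentication. `proxyChat`, `ChatTurn`, and the injected `fetchImpl` are hypothetical names, not part of the Blazen SDK; injecting the fetch function keeps the sketch testable without a network.

```typescript
// Hypothetical server-side proxy: the browser posts chat turns here, and only
// the server ever sees the OpenRouter API key.
type ChatTurn = { role: "system" | "user" | "assistant"; content: string };
type UpstreamInit = { method: string; headers: Record<string, string>; body: string };
type UpstreamResponse = { status: number; text(): Promise<string> };
type FetchLike = (url: string, init: UpstreamInit) => Promise<UpstreamResponse>;

const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

async function proxyChat(
  messages: ChatTurn[],
  model: string,
  apiKey: string,
  fetchImpl: FetchLike,
): Promise<{ status: number; body: string }> {
  // Reject obviously malformed input before spending an upstream request.
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new Error("messages must be a non-empty array");
  }
  const res = await fetchImpl(OPENROUTER_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`, // attached server-side only
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  });
  // Relay status and body unchanged; the key never appears in the response.
  return { status: res.status, body: await res.text() };
}
```

In a deployment, pass the runtime's global `fetch` as `fetchImpl` and read `apiKey` from an environment variable. How the browser-side provider is pointed at such a proxy depends on constructor options this page does not document.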
Always-on chat bot
Bot keeps a persistent, event-driven agent running for the lifetime of a
conversation. You send() messages into it and pull replies back with
nextResponse(). Because wasm-bindgen has no stable async-iterator bridge, the
reply surface is a pull model: loop on nextResponse() until it resolves with
undefined (which happens once the bot shuts down).
import init, { Bot, OpenAiProvider } from '@blazen/sdk';
await init();
// Construct the provider directly; with no apiKey it reads OPENAI_API_KEY.
const model = new OpenAiProvider({ apiKey: 'sk-...' }).toModel();
// Start the always-on bot. Shut it down after 60s of inactivity.
const bot = await Bot.create(model, undefined, {
  systemPrompt: 'You are a concise, friendly assistant.',
  idleTimeoutMs: 60_000, // milliseconds: auto-shutdown after 60s idle
});
// Drain replies in the background using the pull loop. Each subscription starts
// "from now", so start draining before the first send to avoid missing a reply.
const drain = (async () => {
  let r;
  while ((r = await bot.nextResponse()) !== undefined) {
    console.log('bot:', r);
  }
})();
// Drive the conversation. send() is synchronous and non-blocking: the turn runs
// on the bot's event loop and its reply arrives on nextResponse().
bot.send('Hi! What can you help me with?');
bot.send('Explain WebAssembly in one sentence.');
// Later, when the conversation is over:
bot.shutdown();
// Wait for the drain loop to observe the closed stream (nextResponse -> undefined).
await drain;
In a browser, swap the console.log for DOM updates; outside the browser (Node,
Deno, edge) the same pull loop works unchanged.
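The pull contract described above (each `nextResponse()` resolves with the next reply, and with `undefined` once the bot has shut down) can be modelled without the SDK. Below is a minimal sketch of a queue with that shape; `ReplyQueue`, `push`, `next`, `close`, and `drainAll` are hypothetical names rather than Blazen internals. Unlike a Bot subscription, which starts "from now", this queue buffers replies that arrive before anyone is waiting.

```typescript
// Hypothetical pull-model reply queue: producers push(), one consumer awaits next().
class ReplyQueue<T> {
  private buffered: T[] = [];
  private waiters: Array<(value: T | undefined) => void> = [];
  private closed = false;

  push(value: T): void {
    if (this.closed) throw new Error("queue is closed");
    const waiter = this.waiters.shift();
    // Hand the value straight to a parked consumer, otherwise buffer it.
    if (waiter) waiter(value);
    else this.buffered.push(value);
  }

  next(): Promise<T | undefined> {
    if (this.buffered.length > 0) return Promise.resolve(this.buffered.shift());
    // Once closed and drained, consumers get undefined, mirroring nextResponse().
    if (this.closed) return Promise.resolve(undefined);
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  close(): void {
    this.closed = true;
    // Wake every parked consumer so its loop can exit.
    for (const waiter of this.waiters.splice(0)) waiter(undefined);
  }
}

// Same drain pattern as the Bot example: pull until undefined.
async function drainAll<T>(queue: ReplyQueue<T>): Promise<T[]> {
  const seen: T[] = [];
  for (;;) {
    const r = await queue.next();
    if (r === undefined) return seen;
    seen.push(r);
  }
}
```

Starting `drainAll` before the first `push` matches the advice in the Bot example: the consumer is already parked when the first reply lands.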
Node.js Serverless Function
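A serverless API endpoint that uses the WASM SDK with tool calling, targeting Node.js serverless platforms (Vercel, AWS Lambda, etc.). The handler is written against the Web-standard Request/Response API; where a platform's native handler shape differs (AWS Lambda's event/context signature, for example), wrap it in a thin adapter.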
import init, { OpenAiProvider, ChatMessage, runAgent } from '@blazen/sdk';
let initialized = false;
const tools = [
  {
    name: 'lookupOrder',
    description: 'Look up an order by ID',
    parameters: {
      type: 'object',
      properties: { orderId: { type: 'string' } },
      required: ['orderId'],
    },
  },
  {
    name: 'cancelOrder',
    description: 'Cancel an order by ID',
    parameters: {
      type: 'object',
      properties: {
        orderId: { type: 'string' },
        reason: { type: 'string' },
      },
      required: ['orderId'],
    },
  },
];
async function toolHandler(toolName: string, args: Record<string, unknown>) {
  switch (toolName) {
    case 'lookupOrder':
      // Replace with your database call
      return { orderId: args.orderId, status: 'shipped', eta: '2026-03-21' };
    case 'cancelOrder':
      return { orderId: args.orderId, cancelled: true };
    default:
      throw new Error(`Unknown tool: ${toolName}`);
  }
}
export default async function handler(req: Request): Promise<Response> {
  if (!initialized) {
    await init();
    initialized = true;
  }
  const { message } = await req.json();
  // Construct the provider directly, then adapt it for runAgent via toModel().
  const model = new OpenAiProvider({ apiKey: process.env.OPENAI_API_KEY }).toModel();
  const result = await runAgent(
    model,
    [
      ChatMessage.system('You are a customer support agent. Use tools to look up and manage orders.'),
      ChatMessage.user(message),
    ],
    tools,
    toolHandler,
    { maxIterations: 5 }
  );
  return new Response(JSON.stringify({
    reply: result.response.content,
    iterations: result.iterations,
  }), {
    headers: { 'Content-Type': 'application/json' },
  });
}
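`maxIterations` caps how many model-then-tools rounds the agent may take. The loop below is a simplified, self-contained sketch of that general pattern driven by a scripted fake model; `agentLoop`, `FakeModel`, and the `Turn` message shapes are hypothetical and do not describe how `runAgent` is implemented.

```typescript
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Turn =
  | { role: "user" | "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };
type ModelReply = { content: string; toolCalls: ToolCall[] };
type FakeModel = (history: Turn[]) => Promise<ModelReply>;
type ToolHandler = (name: string, args: Record<string, unknown>) => Promise<unknown>;

async function agentLoop(model: FakeModel, history: Turn[], handle: ToolHandler, maxIterations: number) {
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const reply = await model(history);
    history.push({ role: "assistant", content: reply.content, toolCalls: reply.toolCalls });
    // No tool calls means the model produced its final answer.
    if (reply.toolCalls.length === 0) return { content: reply.content, iterations: iteration };
    // Otherwise run each requested tool and feed its JSON result back.
    for (const call of reply.toolCalls) {
      const result = await handle(call.name, call.args);
      history.push({ role: "tool", toolCallId: call.id, content: JSON.stringify(result) });
    }
  }
  throw new Error(`no final answer within ${maxIterations} iterations`);
}
```

This sketch throws when the cap is reached; how `runAgent` itself behaves at `maxIterations` is not specified on this page.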
Tauri Desktop App
Use the WASM SDK inside a Tauri v2 app to run AI features from the webview without a backend server of your own; the model calls themselves still go to the hosted provider APIs.
// src/lib/ai.ts
import init, {
  AnthropicProvider,
  OpenAiProvider,
  ChatMessage,
  Workflow,
} from '@blazen/sdk';
// Cache the in-flight init() promise rather than a boolean, so calls that
// overlap during startup share a single initialization.
let initPromise: Promise<unknown> | undefined;
export function ensureInit(): Promise<unknown> {
  initPromise ??= init();
  return initPromise;
}
export async function summarize(text: string): Promise<string> {
  await ensureInit();
  const wf = new Workflow('summarizer');
  wf.addStep('summarize', ['blazen::StartEvent'], async (event, ctx) => {
    // Construct the provider directly. With no apiKey, it reads ANTHROPIC_API_KEY.
    const model = new AnthropicProvider({ apiKey: 'sk-ant-...' });
    const response = await model.complete([
      ChatMessage.system('Summarize the following text concisely.'),
      ChatMessage.user(event.text),
    ]);
    return {
      type: 'blazen::StopEvent',
      result: { summary: response.content },
    };
  });
  const result = await wf.run({ text });
  return result.data.summary;
}
export async function chat(
  messages: Array<{ role: 'system' | 'user' | 'assistant'; content: string }>,
  onChunk: (text: string) => void
): Promise<void> {
  await ensureInit();
  // Construct the provider directly. With no apiKey, it reads OPENAI_API_KEY.
  const model = new OpenAiProvider({ apiKey: 'sk-...' });
  // Map every role explicitly; a user/assistant ternary would turn system
  // prompts into assistant turns.
  const chatMessages = messages.map((m) => {
    switch (m.role) {
      case 'system': return ChatMessage.system(m.content);
      case 'assistant': return ChatMessage.assistant(m.content);
      default: return ChatMessage.user(m.content);
    }
  });
  await model.stream(chatMessages, (chunk) => {
    if (chunk.delta) onChunk(chunk.delta);
  });
}
// src/App.svelte (or your framework of choice)
import { chat } from './lib/ai';
let output = '';
async function handleSend() {
  output = '';
  await chat(
    [{ role: 'user', content: 'Explain Tauri in one paragraph.' }],
    (chunk) => { output += chunk; }
  );
}
The WASM binary runs inside the webview’s JavaScript context. No Tauri command bridge is needed for AI calls — only for filesystem or OS-level operations.
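A boolean guard such as the serverless handler's `initialized` flag only flips after `init()` resolves, so calls that overlap during startup can each start their own initialization. Caching the in-flight promise makes every caller share one attempt. The sketch below demonstrates the difference with a hypothetical `fakeInit` that counts invocations in place of the SDK's `init()`.

```typescript
// Hypothetical stand-in for the SDK's async init(): counts how often it runs.
let initCalls = 0;
async function fakeInit(): Promise<void> {
  initCalls++;
  await Promise.resolve(); // simulate async work such as fetching/compiling the wasm
}

// Boolean guard: the flag flips only after init resolves, so overlapping
// callers all see `false` and all call init.
let ready = false;
async function ensureInitFlag(): Promise<void> {
  if (!ready) {
    await fakeInit();
    ready = true;
  }
}

// Promise guard: the first caller stores the in-flight promise; later callers
// await the same one. (Reset it on failure if you want retries.)
let initPromise: Promise<void> | undefined;
function ensureInitOnce(): Promise<void> {
  initPromise ??= fakeInit();
  return initPromise;
}
```

Two concurrent `ensureInitFlag()` calls run `fakeInit` twice, while two concurrent `ensureInitOnce()` calls run it once.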
Custom Model via fromJsHandler
Unlike the Python and Node SDKs, the WASM SDK's classes cannot be subclassed, because wasm-bindgen does not allow JS code to extend exported classes. Instead, the SDK exposes factory methods that accept JS handler functions; Model.fromJsHandler is the WASM equivalent of subclassing.
import init, {
  ChatMessage,
  Model,
  runAgent,
} from "@blazen/sdk";
await init();
// Build a custom model by passing a complete handler (and optionally a stream handler).
const model = Model.fromJsHandler(
  "echo-llm",
  async (request) => {
    // request has the same shape as ModelRequest.
    const last = [...request.messages].reverse().find((m: any) => m.role === "user");
    return {
      content: `echo: ${last?.content ?? ""}`,
      toolCalls: [],
      citations: [],
      artifacts: [],
      images: [],
      audio: [],
      videos: [],
      model: "echo-llm",
      metadata: {},
    };
  },
  // Optional stream handler -- fires the onChunk callback with StreamChunk-shaped objects.
  async (request, onChunk) => {
    const last = [...request.messages].reverse().find((m: any) => m.role === "user");
    for (const word of `echo: ${last?.content ?? ""}`.split(" ")) {
      onChunk({ delta: word + " " });
    }
  },
  // Config object -- everything optional. Pricing auto-registers into the global registry.
  {
    contextLength: 4096,
    maxOutputTokens: 2048,
    pricing: { inputPerMillion: 0.0, outputPerMillion: 0.0 },
  },
);
const result = await runAgent(
  model,
  [ChatMessage.user("hello world")],
  [], // tools -- each item has { name, description, parameters, handler }
  {}, // options: { toolConcurrency?, maxIterations?, systemPrompt?, ... }
);
console.log(result.content); // -> "echo: hello world"
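The echo stream handler appends a space after every word, so its concatenated deltas end with a trailing space that the complete handler does not produce. If callers compare or store streamed text, it helps to emit deltas that join back to exactly the completed string. A self-contained sketch; `wordDeltas` and `streamEcho` are hypothetical helpers, and the `{ delta }` chunk shape is taken from the example.

```typescript
type DeltaChunk = { delta: string };

// Split text into word-sized deltas whose concatenation equals the input:
// the separator is attached to the front of every word after the first.
function wordDeltas(text: string): string[] {
  return text.split(" ").map((word, i) => (i === 0 ? word : " " + word));
}

// Hypothetical stream-handler body mirroring the echo example's behaviour.
async function streamEcho(lastUserText: string, onChunk: (chunk: DeltaChunk) => void): Promise<void> {
  for (const delta of wordDeltas(`echo: ${lastUserText}`)) {
    onChunk({ delta });
  }
}
```

Joining the deltas from `streamEcho("hello world", ...)` yields `"echo: hello world"`, the same string the complete handler returns.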
Custom TTSProvider via Handler
Per-capability providers (TTSProvider, ImageProvider, VideoProvider, MusicProvider, ThreeDProvider, BackgroundRemovalProvider, VoiceProvider) follow the same handler pattern — pass your async function to the constructor.
import init, { TTSProvider } from "@blazen/sdk";
await init();
// TTSProvider takes a providerId and a single async handler.
const tts = new TTSProvider("elevenlabs", async (request) => {
  // request: { text, voice, voiceUrl?, language?, speed?, model?, parameters? }
  // Replace with a real HTTP call to your TTS backend.
  const audio = new Uint8Array([0, 1, 2]);
  return {
    audioData: audio,
    format: "wav",
    voice: request.voice,
    text: request.text,
  };
});
const result = await tts.textToSpeech({
  text: "Hello from Blazen!",
  voice: "alice",
});
console.log(result.format, result.audioData.length, "bytes");
For multi-method providers (e.g. MusicProvider), the constructor accepts an object of named handlers:
import init, { MusicProvider } from "@blazen/sdk";
await init();
const music = new MusicProvider("local-musicgen", {
  generateMusic: async (request) => {
    return { audioData: new Uint8Array(), format: "wav" };
  },
  generateSfx: async (request) => {
    return { audioData: new Uint8Array(), format: "wav" };
  },
});
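The TTS example leaves the backend call as a placeholder. One way to fill it in is sketched below, assuming a hypothetical HTTP endpoint that accepts `{ text, voice }` as JSON and answers with raw WAV bytes. `makeTtsHandler` and `TtsFetch` are illustrative names; the request and result fields are the ones the example uses, and injecting `fetchImpl` keeps the handler testable offline. The returned function is meant to be passed as the async handler argument of `TTSProvider`.

```typescript
type TtsRequest = { text: string; voice: string };
type TtsResult = { audioData: Uint8Array; format: string; voice: string; text: string };
type TtsFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Build an async handler that forwards the request to a (hypothetical) TTS backend.
function makeTtsHandler(endpoint: string, fetchImpl: TtsFetch) {
  return async (request: TtsRequest): Promise<TtsResult> => {
    const res = await fetchImpl(endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: request.text, voice: request.voice }),
    });
    if (!res.ok) throw new Error(`TTS backend returned ${res.status}`);
    // Wrap the response body in the Uint8Array the result shape expects.
    const audioData = new Uint8Array(await res.arrayBuffer());
    return { audioData, format: "wav", voice: request.voice, text: request.text };
  };
}
```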
ModelManager (WASM)
The WASM ModelManager is the unified registry — register local models and remote providers by name, then dispatch with complete(id, messages) (also stream(id, messages, onChunk) and get(id)). It tracks per-pool memory budgets and evicts the least-recently-used local model within the same pool when its budget would be exceeded. Remote providers own no local weights, so they dispatch straight through and never count against a budget. Because WASM classes cannot be subclassed, local models take an explicit lifecycle object with load() and unload() async methods (plus optional memoryBytes() and device() callbacks for pool routing).
import init, { Model, ModelManager, OpenAiProvider, ChatMessage } from "@blazen/sdk";
await init();
// 8 GB CPU pool budget (conservative for a laptop). Pass a second
// argument like `new ModelManager(8, 4)` to add a GPU pool too.
const manager = new ModelManager(8);
// Remote provider: dispatch-only, no footprint (pass 0, omit lifecycle).
// Standalone providers expose toModel() for the registry.
manager.register("gpt", new OpenAiProvider({ apiKey: "sk-..." }).toModel(), 0);
// Construct models backed by @mlc-ai/web-llm (lazy-loaded at complete-time).
const llama = Model.webLlm("Llama-3.1-8B-Instruct-q4f32_1-MLC");
const qwen = Model.webLlm("Qwen2.5-7B-Instruct-q4f32_1-MLC");
// Each model registers with a lifecycle object. In a real app, this
// calls into the WebLLM engine to load/unload GPU resources.
manager.register("llama-8b", llama, 4_500_000_000, {
  load: async () => { console.log("loading llama..."); },
  unload: async () => { console.log("unloading llama..."); },
  isLoaded: () => false,
  memoryBytes: async () => 4_500_000_000,
  device: () => "cpu", // optional -- defaults to "cpu"
});
manager.register("qwen-7b", qwen, 4_200_000_000, {
  load: async () => { console.log("loading qwen..."); },
  unload: async () => { console.log("unloading qwen..."); },
  isLoaded: () => false,
  memoryBytes: async () => 4_200_000_000,
  device: () => "cpu",
});
// Dispatch the remote provider by name — same call shape as the local models.
const remote = await manager.complete("gpt", [ChatMessage.user("Hello from the cloud!")]);
console.log(remote.content);
await manager.load("llama-8b");
await manager.load("qwen-7b"); // evicts llama-8b (4.5 + 4.2 > 8 GB CPU pool budget)
for (const s of manager.status()) {
  console.log(`${s.id}: loaded=${s.loaded}, pool=${s.pool}, memory=${s.memoryEstimateBytes}`);
}
console.log(`used=${await manager.usedBytes()}, available=${await manager.availableBytes()}`);
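The eviction rule described above (evict the least-recently-used local model in the same pool once a load would exceed that pool's budget) reduces to simple arithmetic. Below is a self-contained sketch of that policy for a single pool; `LruPool`, its methods, and the use of decimal gigabytes (1 GB = 10^9 bytes) are assumptions for illustration, not ModelManager's internals.

```typescript
// Hypothetical single-pool LRU budget tracker.
class LruPool {
  // Map preserves insertion order; re-inserting on use keeps the LRU entry first.
  private loaded = new Map<string, number>(); // id -> bytes
  private readonly budgetBytes: number;

  constructor(budgetBytes: number) {
    this.budgetBytes = budgetBytes;
  }

  usedBytes(): number {
    let total = 0;
    for (const bytes of this.loaded.values()) total += bytes;
    return total;
  }

  // Mark a loaded model as most recently used.
  touch(id: string): void {
    const bytes = this.loaded.get(id);
    if (bytes === undefined) return;
    this.loaded.delete(id);
    this.loaded.set(id, bytes);
  }

  // Load a model, evicting least-recently-used models until it fits.
  // Returns the ids that were evicted.
  load(id: string, bytes: number): string[] {
    if (bytes > this.budgetBytes) throw new Error(`${id} exceeds the pool budget`);
    if (this.loaded.has(id)) {
      this.touch(id);
      return [];
    }
    const evicted: string[] = [];
    while (this.usedBytes() + bytes > this.budgetBytes) {
      const lru = this.loaded.keys().next().value as string; // oldest entry
      this.loaded.delete(lru);
      evicted.push(lru);
    }
    this.loaded.set(id, bytes);
    return evicted;
  }
}
```

With an 8 GB budget, loading 4.5 GB and then 4.2 GB evicts the first model because 8.7 GB exceeds the budget, matching the `llama-8b`/`qwen-7b` walkthrough.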
ModelRegistry (WASM)
Wraps a JS object exposing listModels() and getModel(modelId) so browser code can plug a custom model catalog (a fetched manifest, an in-browser registry, a control-plane endpoint) into Blazen’s model-info lookup surface. Same shape as the Python ModelRegistry ABC and the Node ModelRegistry class, so workflow code reads identically across runtimes.
import init, { ModelRegistry } from "@blazen/sdk";
import type { ModelInfo } from "@blazen/sdk";
await init();
// Back the registry with whatever source you like -- a fetched manifest,
// an offline IndexedDB cache, or a control-plane endpoint.
const registry = new ModelRegistry({
  async listModels(): Promise<ModelInfo[]> {
    const res = await fetch("/api/models");
    if (!res.ok) throw new Error(`registry fetch failed: ${res.status}`);
    return res.json();
  },
  async getModel(modelId: string): Promise<ModelInfo | null> {
    // Encode the id: model ids can contain "/" or other reserved characters.
    const res = await fetch(`/api/models/${encodeURIComponent(modelId)}`);
    if (res.status === 404) return null;
    if (!res.ok) throw new Error(`registry fetch failed: ${res.status}`);
    return res.json();
  },
});
const all = await registry.listModels();
console.log(`available models: ${all.length}`);
const gpt = await registry.getModel("gpt-4o");
if (gpt) {
  console.log(gpt.id, gpt.provider, gpt.capabilities);
}
Both methods may return synchronous values too — the SDK awaits whatever the callback returned, so a purely in-memory registry can skip the async keyword entirely.
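For instance, a purely in-memory catalog can answer both methods synchronously from a `Map`. The sketch below builds such a source object; `CatalogEntry` is a hypothetical subset of `ModelInfo` containing only the `id`, `provider`, and `capabilities` fields the example reads, so check the real type before passing the object to `new ModelRegistry(...)`.

```typescript
// Hypothetical subset of ModelInfo: only the fields the registry example reads.
type CatalogEntry = { id: string; provider: string; capabilities: unknown };

// Build a registry source whose methods return plain values, not promises.
function inMemoryCatalog(entries: CatalogEntry[]) {
  const byId = new Map(entries.map((e) => [e.id, e] as const));
  return {
    listModels(): CatalogEntry[] {
      return Array.from(byId.values());
    },
    getModel(modelId: string): CatalogEntry | null {
      // Mirror the fetch-backed example: unknown ids yield null, not an error.
      return byId.get(modelId) ?? null;
    },
  };
}
```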
Pricing Registration and Cost Tracking (WASM)
registerPricing attaches USD-per-million-token rates to any model ID. Completions produced with that model ID then carry a computed cost field.
import init, {
  ChatMessage,
  Model,
  computeCost,
  lookupPricing,
  registerPricing,
} from "@blazen/sdk";
await init();
// Register pricing once, globally.
registerPricing("my-finetuned-model", 1.0, 2.0); // $1/M input, $2/M output
// Lookup
const p = lookupPricing("my-finetuned-model");
if (p) {
  console.log(`input: $${p.inputPerMillion}/M, output: $${p.outputPerMillion}/M`);
}
// Compute a cost directly from token counts.
const cost = computeCost("my-finetuned-model", 1500, 800);
console.log(`estimated cost: $${cost?.toFixed(6)}`);
// Or route through any model that emits the same modelId -- for example, a
// custom handler that tags its responses with "my-finetuned-model".
const model = Model.fromJsHandler(
  "my-finetuned-model",
  async (_request) => ({
    content: "…",
    toolCalls: [],
    citations: [],
    artifacts: [],
    images: [],
    audio: [],
    videos: [],
    model: "my-finetuned-model",
    usage: { promptTokens: 1500, completionTokens: 800, totalTokens: 2300 },
    metadata: {},
  }),
  undefined,
  {},
);
const response = await model.complete([ChatMessage.user("hi")]);
console.log(`cost: $${response.cost?.toFixed(6)}`); // populated from the global registry
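With rates expressed per million tokens, the cost is (input tokens × input rate + output tokens × output rate) / 1,000,000. For the registered rates of $1/M input and $2/M output, 1,500 input and 800 output tokens come to $0.0015 + $0.0016 = $0.0031. The helper below reproduces that arithmetic as a self-contained sketch; `estimateCost` is a hypothetical name, it assumes `computeCost`'s two counts are input then output tokens (as the matching `usage` object suggests), and it does not model any rounding or special token classes the SDK may apply.

```typescript
type Rates = { inputPerMillion: number; outputPerMillion: number };

// USD cost from token counts and per-million-token rates.
function estimateCost(rates: Rates, promptTokens: number, completionTokens: number): number {
  return (
    (promptTokens * rates.inputPerMillion) / 1_000_000 +
    (completionTokens * rates.outputPerMillion) / 1_000_000
  );
}
```

`estimateCost({ inputPerMillion: 1.0, outputPerMillion: 2.0 }, 1500, 800).toFixed(6)` gives `"0.003100"`.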
In-Browser RAG with TractEmbedModel + InMemoryBackend
TractEmbedModel runs ONNX-format sentence-transformers entirely in the browser via tract — no remote embedding API required. Pair it with a typed InMemoryBackend and the high-level Memory facade for a fully client-side semantic search store.
import init, { TractEmbedModel, Memory, InMemoryBackend } from "@blazen/sdk";
await init();
// Both URLs must be CORS-enabled. HuggingFace's `resolve/main/...` paths
// serve the right headers out of the box.
const embedder = await TractEmbedModel.create(
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx",
  "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json",
);
// `fromBackend` keeps reads/writes inside WASM linear memory --
// no JS round-trips per call (unlike `fromJsBackend`).
const memory = Memory.fromBackend(embedder, new InMemoryBackend());
await memory.addMany([
  { id: "doc1", text: "Blazen is a Rust workflow engine." },
  { id: "doc2", text: "WebAssembly runs in browsers, Node.js, and edge runtimes." },
  { id: "doc3", text: "Tract is a tiny ONNX inference engine written in Rust." },
]);
const results = await memory.search("What is Blazen?", 3, null);
results.forEach((r) => console.log(r.id, r.score, r.text));
For cross-tab durability, swap InMemoryBackend for a JS-side IndexedDB backend via Memory.fromJsBackend(embedder, backend) — the backend object just needs to implement put, get, delete, list, len, and searchByBands.
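Conceptually, `search` embeds the query and ranks stored entries by vector similarity. A brute-force version of that ranking is sketched below using cosine similarity over toy 3-dimensional vectors; `cosine`, `topK`, and `Stored` are hypothetical helpers, real all-MiniLM-L6-v2 embeddings have 384 dimensions, and this is not a description of how `InMemoryBackend` indexes vectors.

```typescript
type Stored = { id: string; vector: number[] };

// Cosine similarity: dot product divided by the product of the vector norms.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored vector against the query and keep the k best.
function topK(query: number[], store: Stored[], k: number): Array<{ id: string; score: number }> {
  return store
    .map((s) => ({ id: s.id, score: cosine(query, s.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Brute force costs one dot product per stored entry per query, which is fine for a few thousand documents in a tab; larger stores are where an index pays off.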
Pipeline Snapshot Persistence to IndexedDB
PipelineBuilder.onPersistJson hands you a JSON-shaped snapshot every time the pipeline reaches a checkpoint. Persist it to IndexedDB (or any other store) so a refresh or tab close does not lose progress.
import init, { PipelineBuilder, Stage, Context } from "@blazen/sdk";
import { openDB } from "idb";
await init();
// `idb` is an `IDBPDatabase` from the `idb` npm package; substitute your
// favourite IndexedDB wrapper. The database name is arbitrary. The
// "checkpoints" store has no keyPath, so put() below passes the key explicitly.
const idb = await openDB("blazen-checkpoints", 1, {
  upgrade(db) {
    db.createObjectStore("checkpoints");
  },
});
// The onPersistJson callback fires after each successful stage commit, so it
// doubles as a place to update progress UI.
const pipeline = new PipelineBuilder("ingest-pipeline")
  .addStage(
    new Stage("normalize", async (input: any, _ctx: Context) => ({
      text: String(input.text ?? "").trim().toLowerCase(),
    })),
  )
  .addStage(
    new Stage("tokenize", async (input: any, _ctx: Context) => ({
      tokens: input.text.split(/\s+/).filter(Boolean),
    })),
  )
  .onPersistJson(async (snapshot: unknown) => {
    const tx = idb.transaction("checkpoints", "readwrite");
    await tx.objectStore("checkpoints").put(snapshot, "current");
    await tx.done;
  })
  .build();
const result = await pipeline.run({ text: " Hello WASM World " });
console.log("tokens:", result.tokens);
On the next page load, read checkpoints/current back out and feed it to PipelineBuilder.fromSnapshot(...) to resume mid-flight instead of restarting from stage zero.
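The resume idea can be sketched independently of the SDK: persist the index of the next stage together with the data produced so far, and on restart skip the stages already done. The snapshot shape `{ nextStage, data }`, `runStages`, and `StageFn` are hypothetical and unrelated to Blazen's actual snapshot format; the two stages reuse the normalize/tokenize logic from the pipeline example, and an in-memory array stands in for IndexedDB in the usage check.

```typescript
type Snapshot = { nextStage: number; data: unknown };
type StageFn = (input: any) => Promise<unknown>;

// Run stages in order, starting wherever the snapshot says, and persist a
// snapshot after each completed stage.
async function runStages(
  stages: StageFn[],
  start: Snapshot,
  persist: (s: Snapshot) => Promise<void>,
): Promise<unknown> {
  let data = start.data;
  for (let i = start.nextStage; i < stages.length; i++) {
    data = await stages[i](data);
    await persist({ nextStage: i + 1, data });
  }
  return data;
}

const stages: StageFn[] = [
  async (input) => ({ text: String(input.text ?? "").trim().toLowerCase() }),
  async (input) => ({ tokens: input.text.split(/\s+/).filter(Boolean) }),
];
```

Resuming from the snapshot written after stage 0 runs only the tokenize stage and produces the same tokens as a full run.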
Human-in-the-Loop with runWithHandler + streamEvents
Workflow.runWithHandler returns a live WorkflowHandler instead of awaiting the terminal event. Pair it with streamEvents to react to every event the engine publishes — including InputRequestEvent, which is the WASM SDK’s hook for human-in-the-loop prompts. Send the answer back through respondToInput(requestId, response) and the parked event loop unparks immediately.
import init, { Workflow } from "@blazen/sdk";
await init();
const workflow = new Workflow("topic-researcher");
workflow.addStep("clarify", ["blazen::StartEvent"], async (event, _ctx) => {
  // Ask the human to confirm or refine the research topic before we burn
  // tokens. The engine auto-parks on this event until `respondToInput` lands.
  return {
    type: "blazen::InputRequestEvent",
    request_id: crypto.randomUUID(),
    prompt: `Confirm the topic to research: "${event.topic}"`,
    metadata: null,
  };
});
workflow.addStep("research", ["blazen::InputResponseEvent"], async (event, _ctx) => {
  // `event.response` is whatever the JS side passed to `respondToInput`.
  return {
    type: "blazen::StopEvent",
    result: { confirmed_topic: event.response },
  };
});
const handler = await workflow.runWithHandler({ topic: "tract embeddings" });
// `streamEvents` resolves when the workflow ends; events emitted before this
// call are NOT replayed, so subscribe before you await the terminal result.
const streaming = handler.streamEvents((event: { event_type: string; data: any }) => {
  if (event.event_type === "blazen::InputRequestEvent") {
    const answer = window.prompt(event.data.prompt) ?? "";
    handler.respondToInput(event.data.request_id, answer);
  } else {
    console.log("event:", event.event_type, event.data);
  }
});
const [, finalResult] = await Promise.all([streaming, handler.awaitResult()]);
console.log("done:", finalResult);
Outside the browser (Node, Deno, edge runtimes), swap window.prompt for whatever input source you have, such as a WebSocket message, a CLI readline, or an HTTP form post, and call respondToInput from there. The handler is a single-owner JS object rather than a thread-safe one: as long as respondToInput is called on the same JS event loop that owns the handler, the workflow unparks correctly.
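The park/unpark handshake comes down to correlating each request id with a pending promise. The self-contained sketch below shows that bookkeeping from the waiting side; `PendingInputs`, `ask`, and `answer` are hypothetical names, not `WorkflowHandler` methods, but they show why a response must carry the exact `request_id` it answers and why a second answer for the same id has nothing left to resolve.

```typescript
// Hypothetical correlation table: one pending promise per outstanding request id.
class PendingInputs {
  private waiting = new Map<string, (response: string) => void>();

  // Park: register the id and return a promise that resolves on answer().
  ask(requestId: string): Promise<string> {
    if (this.waiting.has(requestId)) throw new Error(`duplicate request id ${requestId}`);
    return new Promise((resolve) => this.waiting.set(requestId, resolve));
  }

  // Unpark: resolve the matching promise exactly once.
  answer(requestId: string, response: string): void {
    const resolve = this.waiting.get(requestId);
    if (!resolve) throw new Error(`no pending request ${requestId}`);
    this.waiting.delete(requestId);
    resolve(response);
  }

  get outstanding(): number {
    return this.waiting.size;
  }
}
```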