Middleware & Composition
Compose retry, caching, fallback, and custom middleware in the WASM SDK
The WASM SDK supports the same middleware patterns as the Node.js SDK. Each decorator method returns a new CompletionModel, keeping the original unchanged.
Retry
Wrap a model so that transient failures are retried automatically. In the WASM SDK, withRetry() takes one optional argument, maxRetries (a number, default 3).
import init, { CompletionModel } from "@blazen/sdk";
await init();
// The WASM SDK reads OPENAI_API_KEY from the runtime environment;
// factory methods do not accept arguments.
const model = CompletionModel.openai().withRetry(5);
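To make the retry behaviour concrete, here is a minimal sketch of the general technique, written without the SDK. It is not how withRetry() is implemented. The names retrying, TransientError, and makeFlaky are hypothetical, and the sketch assumes maxRetries counts retries after the first attempt, which the text above does not confirm.

```javascript
// Hypothetical error type used only in this sketch; not part of @blazen/sdk.
class TransientError extends Error {}

// Wrap an async function so that transient failures are retried.
// Assumption: maxRetries counts retries AFTER the first attempt.
function retrying(fn, maxRetries = 3) {
  return async (...args) => {
    let lastError;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return await fn(...args);
      } catch (err) {
        // Anything that is not transient is rethrown without retrying.
        if (!(err instanceof TransientError)) throw err;
        lastError = err;
      }
    }
    // Every attempt failed with a transient error.
    throw lastError;
  };
}

// A fake upstream call that fails `failures` times, then succeeds.
function makeFlaky(failures) {
  let calls = 0;
  const fn = async (prompt) => {
    calls += 1;
    if (calls <= failures) throw new TransientError("rate limited");
    return `echo: ${prompt}`;
  };
  fn.callCount = () => calls;
  return fn;
}
```

The sketch retries immediately. Production retry layers usually wait between attempts (for example with exponential backoff); whether the SDK does so is not stated here.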
Cache
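Cache responses to identical non-streaming requests in memory. withCache() takes two optional arguments: ttlSeconds (default 300) and maxEntries (default 1000).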
// ttlSeconds (default 300), maxEntries (default 1000)
const model = CompletionModel.openai().withCache(600, 500);
Streaming requests always bypass the cache.
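The two parameters can be pictured with a small standalone TTL cache. This is an illustrative sketch, not the SDK's implementation: the function name cached, the injectable now clock, the JSON-based cache key, and the evict-oldest policy are all assumptions made for the example.

```javascript
// Illustrative in-memory cache around an async function; not SDK code.
// `now` is injectable so expiry can be exercised without real waiting.
function cached(fn, ttlSeconds = 300, maxEntries = 1000, now = () => Date.now()) {
  const entries = new Map(); // key -> { value, expiresAt }
  return async (request) => {
    // Assumption: two requests are "identical" when they serialize to the same JSON.
    const key = JSON.stringify(request);
    const hit = entries.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;

    const value = await fn(request);
    // Delete first so a refreshed key moves to the end of the Map's insertion order.
    entries.delete(key);
    entries.set(key, { value, expiresAt: now() + ttlSeconds * 1000 });
    // Assumption: when the cap is exceeded, drop the oldest inserted entry.
    if (entries.size > maxEntries) {
      entries.delete(entries.keys().next().value);
    }
    return value;
  };
}
```

A Map iterates in insertion order, which is what makes the single-line eviction step work.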
Fallback
Route requests through multiple providers in order.
const primary = CompletionModel.openai(); // reads OPENAI_API_KEY
const backup = CompletionModel.groq(); // reads GROQ_API_KEY
const model = CompletionModel.withFallback([primary, backup]);
When a provider fails with a transient error (rate limit, timeout, or server error), the next provider in the list is tried. Non-retryable errors short-circuit immediately: they are returned without trying the remaining providers.
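The rule described above can be sketched in a few lines without the SDK. This is a hedged illustration of the pattern only: fallback and TransientError are hypothetical names, and how the SDK classifies an error as transient is not shown in this section.

```javascript
// Hypothetical error type used only in this sketch; not part of @blazen/sdk.
class TransientError extends Error {}

// Try each provider in order; move on only when the failure is transient.
function fallback(providers) {
  return async (request) => {
    let lastError = new Error("no providers configured");
    for (const provider of providers) {
      try {
        return await provider(request);
      } catch (err) {
        // Non-retryable errors short-circuit: later providers are never called.
        if (!(err instanceof TransientError)) throw err;
        lastError = err;
      }
    }
    // Every provider failed with a transient error; surface the last one.
    throw lastError;
  };
}
```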
Composing Middleware
Chain decorators to layer multiple behaviours:
const model = CompletionModel.openai()
.withCache(300, 1000)
.withRetry(3);
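Chaining raises the question of which layer sees a request first. This section does not say, so the sketch below only illustrates one common convention: each decorator wraps the model it is called on, which makes the last decorator in the chain the outermost layer. SketchModel, decorate, and logging are hypothetical names and are not part of the SDK. The sketch also shows the documented property that decorating returns a new model and leaves the original unchanged.

```javascript
// Hypothetical stand-in for a model; not the SDK's CompletionModel.
class SketchModel {
  constructor(handler, layers = []) {
    this.handler = handler;
    this.layers = layers;
  }
  // Return a NEW model whose handler wraps this one; `this` is not modified.
  decorate(name, wrap) {
    return new SketchModel(wrap(this.handler), [...this.layers, name]);
  }
  complete(request) {
    return this.handler(request);
  }
}

// Records the order in which layers are entered and exited.
const trace = [];
const logging = (name) => (next) => async (request) => {
  trace.push(`enter ${name}`);
  const result = await next(request);
  trace.push(`exit ${name}`);
  return result;
};

const base = new SketchModel(async (request) => `echo: ${request}`);
// Mirrors the shape of .withCache(...).withRetry(...) above.
const layered = base
  .decorate("cache", logging("cache"))
  .decorate("retry", logging("retry"));
```

Under this assumed convention, a request to layered passes through the "retry" layer before the "cache" layer, so a cache hit would be returned without any retry logic running beneath it.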
For maximum resilience, combine all three:
// The WASM SDK exposes OpenAI-compatible providers only.
// Use OpenRouter for Claude/Gemini/etc. via a single key.
const primary = CompletionModel.openai().withCache().withRetry();
const backup = CompletionModel.openrouter().withRetry();
const model = CompletionModel.withFallback([primary, backup]);
Using Decorated Models
Decorated models behave exactly like plain models: pass them to complete(), stream(), or runAgent().
import { ChatMessage } from "@blazen/sdk";
const response = await model.complete([
ChatMessage.user("Explain quantum computing in one sentence."),
]);
console.log(response.content);