Middleware & Composition

Compose retry, caching, fallback, and custom middleware in the WASM SDK

The WASM SDK supports the same middleware patterns as the Node.js SDK. Each decorator method returns a new CompletionModel, keeping the original unchanged.

Retry

Wrap a model with automatic retry on transient failures. In the WASM SDK, withRetry() takes an optional maxRetries argument (default 3).

import init, { CompletionModel } from "@blazen/sdk";

await init();

const model = CompletionModel.openai("sk-...").withRetry(5);
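To make the behaviour concrete, here is a minimal sketch of what a retry decorator does internally. This is not the SDK's actual implementation; the `Complete` function type and exponential backoff schedule are illustrative assumptions.

```typescript
// Illustrative function type standing in for a model's completion call.
type Complete = (prompt: string) => Promise<string>;

// Hypothetical sketch of a retry decorator: re-invoke the wrapped
// function on failure, backing off exponentially between attempts.
function withRetry(inner: Complete, maxRetries = 3): Complete {
  return async (prompt) => {
    let lastError: unknown;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return await inner(prompt);
      } catch (err) {
        lastError = err;
        // Back off 100ms, 200ms, 400ms, ... before the next attempt.
        if (attempt < maxRetries) {
          await new Promise((r) => setTimeout(r, 100 * 2 ** attempt));
        }
      }
    }
    throw lastError;
  };
}
```

The decorator returns a new function and leaves the wrapped one untouched, mirroring how the SDK's decorator methods return a new CompletionModel.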

Cache

Cache identical non-streaming requests in memory.

// ttlSeconds (default 300), maxEntries (default 1000)
const model = CompletionModel.openai("sk-...").withCache(600, 500);

Streaming requests always bypass the cache.
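A TTL-bounded in-memory cache of this shape can be sketched as follows. The eviction policy and function signature here are assumptions for illustration, not the SDK's real internals.

```typescript
// Illustrative function type standing in for a model's completion call.
type Complete = (prompt: string) => Promise<string>;

// Hypothetical sketch of a cache decorator: memoize results by prompt,
// expire entries after ttlSeconds, and cap the cache at maxEntries.
function withCache(inner: Complete, ttlSeconds = 300, maxEntries = 1000): Complete {
  const cache = new Map<string, { value: string; expires: number }>();
  return async (prompt) => {
    const hit = cache.get(prompt);
    if (hit && hit.expires > Date.now()) return hit.value;
    const value = await inner(prompt);
    // Evict the oldest entry once full (Map iterates in insertion order).
    if (cache.size >= maxEntries) {
      cache.delete(cache.keys().next().value!);
    }
    cache.set(prompt, { value, expires: Date.now() + ttlSeconds * 1000 });
    return value;
  };
}
```

A streaming call would simply skip this wrapper, which is why streamed requests are never cached.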

Fallback

Route requests through multiple providers in order.

const primary = CompletionModel.openai("sk-...");
const backup = CompletionModel.groq("gsk-...");

const model = CompletionModel.withFallback([primary, backup]);

When the first provider fails with a transient error (rate limit, timeout, server error), the next provider is tried. Non-retryable errors short-circuit immediately.
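The routing logic above can be sketched like this. The `TransientError` class and the way errors are classified are assumptions for illustration; the SDK's own error taxonomy may differ.

```typescript
// Illustrative function type standing in for a provider's completion call.
type Complete = (prompt: string) => Promise<string>;

// Hypothetical marker for retryable failures (rate limit, timeout, 5xx).
class TransientError extends Error {}

// Hypothetical sketch of fallback routing: try each provider in order,
// moving on only when the failure is transient.
function withFallback(providers: Complete[]): Complete {
  return async (prompt) => {
    let lastError: unknown;
    for (const provider of providers) {
      try {
        return await provider(prompt);
      } catch (err) {
        // Non-retryable errors short-circuit immediately.
        if (!(err instanceof TransientError)) throw err;
        lastError = err;
      }
    }
    throw lastError;
  };
}
```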

Composing Middleware

Chain decorators to layer multiple behaviours. Each call wraps the model returned by the previous one, so the last decorator applied is the outermost layer:

const model = CompletionModel.openai("sk-...")
  .withCache(300, 1000)
  .withRetry(3);

For maximum resilience, combine all three:

const primary = CompletionModel.openai("sk-...").withCache().withRetry();
const backup = CompletionModel.anthropic("sk-ant-...").withRetry();

const model = CompletionModel.withFallback([primary, backup]);

Using Decorated Models

Decorated models work identically to plain models — pass them to complete(), stream(), or runAgent():

import { ChatMessage } from "@blazen/sdk";

const response = await model.complete([
  ChatMessage.user("Explain quantum computing in one sentence."),
]);
console.log(response.content);