Embeddings

Generate vector embeddings with Blazen in Node.js

Blazen provides a unified EmbeddingModel interface for generating vector embeddings across multiple providers. The API mirrors CompletionModel: create a model with a static factory method, then call embed().

Create an Embedding Model

import { EmbeddingModel } from "blazen";

// OpenAI (default: text-embedding-3-small, 1536 dimensions)
// Reads OPENAI_API_KEY from the environment by default.
const model = EmbeddingModel.openai();

// Or pass an API key explicitly.
// (Distinct variable names are used below so the snippet runs as one file;
// in practice you would pick a single provider.)
const openaiModel = EmbeddingModel.openai({ apiKey: "sk-..." });

// Together AI
const togetherModel = EmbeddingModel.together({ apiKey: "tok-..." });

// Cohere
const cohereModel = EmbeddingModel.cohere({ apiKey: "co-..." });

// Fireworks AI
const fireworksModel = EmbeddingModel.fireworks({ apiKey: "fw-..." });

Generate Embeddings

Pass an array of strings to embed(). It returns an EmbeddingResponse with one vector per input text.

const response = await model.embed(["Hello, world!", "Goodbye, world!"]);

console.log(response.embeddings.length);       // 2
console.log(response.embeddings[0].length);    // 1536 (dimensionality)
console.log(response.model);                   // "text-embedding-3-small"
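
For large inputs it can help to split the texts into smaller batches and call embed() once per batch. The following is a minimal sketch, assuming only that the model exposes the async embed(texts) method shown above. The helper names chunk and embedInBatches and the default batch size of 100 are illustrative and not part of Blazen.

```javascript
// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed texts batch by batch and collect the vectors in input order.
// `model` is any object with an async embed(texts) method that resolves
// to an object carrying an `embeddings` array (hypothetical helper).
async function embedInBatches(model, texts, batchSize = 100) {
  const vectors = [];
  for (const batch of chunk(texts, batchSize)) {
    const response = await model.embed(batch);
    vectors.push(...response.embeddings);
  }
  return vectors;
}
```

Because the batches are processed sequentially, the returned vectors line up with the input texts by index.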

EmbeddingResponse

The response object has the following fields:

Property      Type                        Description
.embeddings   number[][]                  One vector per input text.
.model        string                      Model that produced the embeddings.
.usage        TokenUsage | undefined      Token usage statistics.
.cost         number | undefined          Estimated cost in USD.
.timing       RequestTiming | undefined   Request timing breakdown.
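
Since usage, cost, and timing may be undefined, code that reads them should check first. The sketch below works on any object shaped like the fields listed above; describeResponse is a hypothetical helper, not a Blazen function, and it deliberately avoids the inner fields of TokenUsage and RequestTiming, which are not described here.

```javascript
// Build a one-line summary from an EmbeddingResponse-shaped object.
// Only fields listed in the table above are read; `cost` is optional.
function describeResponse(response) {
  const count = response.embeddings.length;
  const dims = count > 0 ? response.embeddings[0].length : 0;
  const cost =
    response.cost == null
      ? "cost unavailable"
      : `$${response.cost.toFixed(6)}`;
  return `${count} vectors x ${dims} dims from ${response.model} (${cost})`;
}
```

With a real response this would be called as describeResponse(await model.embed(texts)).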

Model Properties

console.log(model.modelId);     // "text-embedding-3-small"
console.log(model.dimensions);  // 1536

Local Embeddings

Blazen can generate embeddings entirely on your machine using its built-in embed backend. It needs no API key, makes no network calls after the initial model download, and is free to run. The backend uses ONNX Runtime on glibc targets, macOS, and Windows, and the pure-Rust tract engine on musl; the facade picks the right implementation for your target automatically.

Setup

Local embeddings are available when Blazen is built with the embed feature. The default package installed by npm install blazen includes it.

Usage

import { EmbeddingModel } from "blazen";

// Use the default model (BAAI/bge-small-en-v1.5, 384 dimensions)
const model = EmbeddingModel.embed();

// Or specify a model and other options explicitly.
const customModel = EmbeddingModel.embed({
  modelName: "BGESmallENV15",
  cacheDir: "/tmp/models",
  maxBatchSize: 256,
  showDownloadProgress: true,
});

const response = await model.embed(["hello", "world"]);
console.log(response.embeddings.length);       // 2
console.log(response.embeddings[0].length);    // 384

EmbedOptions

Field                  Type                  Default           Description
modelName              string | undefined    "BGESmallENV15"   Embed model variant name.
cacheDir               string | undefined    backend default   Directory where downloaded models are cached.
maxBatchSize           number | undefined    256               Maximum batch size for embedding.
showDownloadProgress   boolean | undefined   false             Print a progress bar during model download.

Drop-in with Memory

A local embedding model is a regular EmbeddingModel, so it plugs into Memory with no changes:

import { EmbeddingModel, Memory, InMemoryBackend } from "blazen";

const model = EmbeddingModel.embed();
const memory = new Memory(model, new InMemoryBackend());

await memory.add("doc1", "Paris is the capital of France");
const results = await memory.search("capital of France", 5);

Model Download

The first call to embed() (or memory.add()) downloads the ONNX model weights. For BGESmallENV15 the download is roughly 33 MB. After the first run the model is cached locally and no further network access is required.

Use Cases

Embeddings are the building block for semantic search, RAG pipelines, clustering, and classification. A typical pattern inside a workflow step:

import { EmbeddingModel } from "blazen";

const embedModel = EmbeddingModel.openai({ apiKey: "sk-..." });

// `wf` is an existing workflow object created elsewhere in your application.
wf.addStep("embed_documents", ["DocumentsReady"], async (event, ctx) => {
  const response = await embedModel.embed(event.documents);
  await ctx.set("vectors", response.embeddings);
  return { type: "SearchEvent" };
});
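
For the semantic search use case, the usual way to compare embedding vectors is cosine similarity. The sketch below is a generic illustration rather than Blazen's own search implementation. The function names are hypothetical, and the toy 3-dimensional vectors stand in for the number[][] returned in response.embeddings.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document vector against the query, highest score first.
function rankBySimilarity(queryVector, docVectors) {
  return docVectors
    .map((vector, index) => ({ index, score: cosineSimilarity(queryVector, vector) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors standing in for response.embeddings.
const docs = [[1, 0, 0], [0, 1, 0], [0.9, 0.1, 0]];
const ranked = rankBySimilarity([1, 0, 0], docs);
console.log(ranked.map((r) => r.index)); // [ 0, 2, 1 ]
```

In a real pipeline the query vector and the document vectors should come from the same embedding model, since vectors from different models are not comparable.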