Embeddings

Generate vector embeddings with Blazen in Python

Blazen provides a unified EmbeddingModel interface for generating vector embeddings across multiple providers. The API mirrors CompletionModel: create a model with a static constructor, then call embed().

Create an Embedding Model

from blazen import EmbeddingModel, ProviderOptions

# OpenAI (default: text-embedding-3-small, 1536 dimensions)
# Reads OPENAI_API_KEY from the environment by default.
model = EmbeddingModel.openai()

# Or pass an API key explicitly via ProviderOptions.
model = EmbeddingModel.openai(options=ProviderOptions(api_key="sk-..."))

# OpenAI with a specific model and dimensionality
model = EmbeddingModel.openai(
    options=ProviderOptions(api_key="sk-..."),
    model="text-embedding-3-large",
    dimensions=3072,
)

# Together AI
model = EmbeddingModel.together(options=ProviderOptions(api_key="tok-..."))

# Cohere
model = EmbeddingModel.cohere(options=ProviderOptions(api_key="co-..."))

# Fireworks AI
model = EmbeddingModel.fireworks(options=ProviderOptions(api_key="fw-..."))

Generate Embeddings

Pass a list of strings to embed(). It returns an EmbeddingResponse with one vector per input text.

import asyncio

async def main():
    response = await model.embed(["Hello, world!", "Goodbye, world!"])
    print(len(response.embeddings))       # 2
    print(len(response.embeddings[0]))    # 1536 (dimensionality)
    print(response.model)                 # "text-embedding-3-small"

asyncio.run(main())
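
A common next step is to compare the returned vectors. The sketch below is not part of Blazen: cosine_similarity is a hypothetical helper written in plain Python, and it assumes only that each entry of response.embeddings is a list of floats, as described above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration; not a Blazen API.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# With a Blazen response this would be used as:
#   score = cosine_similarity(response.embeddings[0], response.embeddings[1])
```

Scores close to 1.0 indicate texts the model considers similar; scores near 0.0 indicate unrelated texts.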

EmbeddingResponse

The response object exposes the following properties:

Property      Type                  Description
.embeddings   list[list[float]]     One vector per input text.
.model        str                   Model that produced the embeddings.
.usage        TokenUsage | None     Token usage statistics.
.cost         float | None          Estimated cost in USD.
.timing       RequestTiming | None  Request timing breakdown.
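
Because .usage, .cost, and .timing may be None, code that reads them should guard each access. The following is a minimal sketch under that assumption. summarize_response is a hypothetical helper, and the SimpleNamespace object is only a stand-in for a real EmbeddingResponse. The inner fields of TokenUsage and RequestTiming are not listed on this page, so the sketch records only their repr().

```python
from types import SimpleNamespace


def summarize_response(response) -> dict:
    """Collect the documented response fields, tolerating optional ones being None."""
    summary = {
        "model": response.model,
        "vectors": len(response.embeddings),
        "dimensions": len(response.embeddings[0]) if response.embeddings else 0,
    }
    # .usage, .cost and .timing are documented as optional, so guard each one.
    if response.cost is not None:
        summary["cost_usd"] = response.cost
    if response.usage is not None:
        summary["usage"] = repr(response.usage)
    if response.timing is not None:
        summary["timing"] = repr(response.timing)
    return summary


# Stand-in object for illustration only; a real response comes from
# `await model.embed(...)`.
fake = SimpleNamespace(
    embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    model="text-embedding-3-small",
    usage=None,
    cost=None,
    timing=None,
)
print(summarize_response(fake))
```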

Model Properties

print(model.model_id)    # "text-embedding-3-small"
print(model.dimensions)  # 1536

Local Embeddings

Blazen can generate embeddings entirely on your machine using its built-in embed backend. It needs no API key, makes no network calls after the initial model download, and has no usage fees. The backend runs on ONNX Runtime on glibc Linux, macOS, and Windows targets, and on the pure-Rust tract engine on musl targets; the facade picks the right implementation automatically for your target.

Setup

Local embeddings are available when Blazen is built with the embed feature. The wheels installed by pip install blazen include it by default.

Usage

from blazen import EmbeddingModel, EmbedOptions

# Use the default model (BAAI/bge-small-en-v1.5, 384 dimensions)
model = EmbeddingModel.local()

# Or specify a model and other options explicitly
model = EmbeddingModel.local(
    options=EmbedOptions(
        model_name="BGESmallENV15",
        cache_dir="/tmp/models",
        max_batch_size=256,
        show_download_progress=True,
    )
)

import asyncio

async def main():
    response = await model.embed(["hello", "world"])
    print(len(response.embeddings))       # 2
    print(len(response.embeddings[0]))    # 384

asyncio.run(main())
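
Since the local and hosted models share one interface, an application can choose between them at startup. The sketch below illustrates one possible policy and is not a Blazen feature: choose_backend and make_embedding_model are hypothetical names, and the only Blazen calls used are the EmbeddingModel.openai() and EmbeddingModel.local() constructors shown on this page.

```python
import os


def choose_backend(env=None) -> str:
    """Return "openai" when an API key is configured, otherwise "local".

    Hypothetical policy for illustration; `env` can be any mapping for testing.
    """
    env = os.environ if env is None else env
    return "openai" if env.get("OPENAI_API_KEY") else "local"


def make_embedding_model():
    """Build an EmbeddingModel using only the constructors shown above."""
    # Imported lazily so choose_backend() can be used without Blazen installed.
    from blazen import EmbeddingModel

    if choose_backend() == "openai":
        return EmbeddingModel.openai()  # reads OPENAI_API_KEY from the environment
    return EmbeddingModel.local()
```

The two defaults produce vectors of different sizes (1536 versus 384 dimensions), so vectors from one backend should not be mixed with vectors from the other in the same index.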

EmbedOptions

Field                   Type         Default          Description
model_name              str | None   "BGESmallENV15"  Embed model variant name.
cache_dir               str | None   backend default  Directory where downloaded models are cached.
max_batch_size          int | None   256              Maximum batch size for embedding.
show_download_progress  bool | None  False            Print a progress bar during model download.

Drop-in with Memory

A local embedding model is a regular EmbeddingModel, so it plugs into Memory with no changes:

import asyncio

from blazen import EmbeddingModel, Memory, InMemoryBackend

model = EmbeddingModel.local()
memory = Memory(model, InMemoryBackend())

async def main():
    await memory.add("doc1", "Paris is the capital of France")
    results = await memory.search("capital of France", limit=5)
    print(results)

asyncio.run(main())

Model Download

The first call to embed() (or Memory.add()) downloads the ONNX model weights. For BGESmallENV15 the download is roughly 33 MB. After the first run the model is cached locally and no further network access is required.
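
Because the download happens on the first embed() call, a service may prefer to trigger it at startup rather than on the first user request. The sketch below shows one way to do that. warm_up is a hypothetical helper, and _FakeModel and _FakeResponse are stand-ins that make the example runnable without Blazen; in real code you would pass the result of EmbeddingModel.local().

```python
import asyncio


async def warm_up(model, probe: str = "warm-up") -> int:
    """Embed one short string so any model download happens now.

    Returns the length of the resulting vector.
    """
    response = await model.embed([probe])
    return len(response.embeddings[0])


class _FakeResponse:
    def __init__(self, embeddings):
        self.embeddings = embeddings


class _FakeModel:
    """Stand-in used only to make this sketch runnable without Blazen."""

    async def embed(self, texts):
        return _FakeResponse([[0.0] * 384 for _ in texts])


dims = asyncio.run(warm_up(_FakeModel()))
print(dims)
```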

Use Cases

Embeddings are the building block for semantic search, RAG pipelines, clustering, and classification. A typical pattern inside a workflow step:

from blazen import step, Context, Event, EmbeddingModel, ProviderOptions

embed_model = EmbeddingModel.openai(options=ProviderOptions(api_key="sk-..."))

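@step
async def embed_documents(ctx: Context, ev: Event):
    # The incoming event is expected to carry a list of strings in `documents`.
    texts = ev.documents
    response = await embed_model.embed(texts)
    # Store the vectors in the workflow context under the "vectors" key.
    ctx.set("vectors", response.embeddings)
    return Event("SearchEvent")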
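
Once vectors are stored, semantic search reduces to ranking them against a query vector. The sketch below is a plain-Python illustration of that ranking and is not a Blazen API: top_k is a hypothetical helper, and the 2-D vectors are toy data standing in for real embeddings.

```python
import math


def top_k(
    query: list[float], vectors: list[list[float]], k: int = 3
) -> list[tuple[int, float]]:
    """Rank stored vectors by cosine similarity to the query.

    Returns (index, score) pairs, best match first.
    """

    def normalize(v: list[float]) -> list[float]:
        length = math.sqrt(sum(x * x for x in v))
        return [x / length for x in v] if length else list(v)

    q = normalize(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(v))))
        for i, v in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D vectors for illustration; real ones would be the stored embeddings.
stored = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], stored, k=2))
```

A linear scan like this is adequate for small collections; larger ones usually call for a dedicated vector index or the Memory abstraction described above.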