Rust API Reference

Complete API reference for blazen-llm in Rust

Feature Flags

blazen-llm providers:

FeatureDescription
openaiEnables OpenAiProvider and OpenAiCompatProvider (covers OpenRouter, Groq, Together, Mistral, DeepSeek, Fireworks, Perplexity, xAI, Cohere, Bedrock)
anthropicEnables AnthropicProvider
geminiEnables GeminiProvider
falEnables FalProvider (compute: image, video, audio, 3D)
azureEnables AzureOpenAiProvider
all-providersEnables all provider implementations

blazen-llm local-inference backends (each gated behind its own feature, all bundled in the all-local umbrella):

FeatureRe-exports from blazen_llm::*
mistralrsMistralRsProvider, ChatMessageInput, ChatRole, InferenceChunk, InferenceChunkStream, InferenceImage, InferenceImageSource, InferenceResult, InferenceToolCall, InferenceUsage, MistralRsError, MistralRsOptions
llamacppLlamaCppProvider, LlamaCppChatMessageInput, LlamaCppChatRole, LlamaCppInferenceChunk, LlamaCppInferenceChunkStream, LlamaCppInferenceResult, LlamaCppInferenceUsage, LlamaCppError, LlamaCppOptions
candle-llmCandleLlmProvider, CandleLlmCompletionModel, CandleInferenceResult, CandleLlmError, CandleLlmOptions
candle-embedCandleEmbedModel, CandleEmbedOptions, CandleEmbedError
embedEmbedModel, EmbedOptions, EmbedResponse, EmbedError
whispercppWhisperCppProvider, WhisperModel, WhisperOptions, WhisperError
piperPiperProvider, PiperOptions, PiperError
diffusionDiffusionProvider, DiffusionOptions, DiffusionScheduler, DiffusionError

blazen-telemetry exporters:

FeatureDescription
spans (default)Enables TracingCompletionModel and per-span instrumentation hooks
historyEnables WorkflowHistory, HistoryEvent, HistoryEventKind, PauseReason
otlpOTLP gRPC exporter via tonic (init_otlp + OtlpConfig). Native targets only
otlp-httpOTLP HTTP/protobuf exporter via a custom HttpClient (init_otlp_http + OtlpConfig). Works on native and wasm32
prometheusEnables init_prometheus + MetricsLayer
langfuseEnables LangfuseConfig, LangfuseLayer, init_langfuse
allEnables spans, history, otlp, otlp-http, prometheus, langfuse

Core LLM Traits

CompletionModel

The central trait every LLM provider must implement. Supports both one-shot and streaming completions.

#[async_trait]
pub trait CompletionModel: Send + Sync {
    fn model_id(&self) -> &str;

    async fn complete(
        &self,
        request: CompletionRequest,
    ) -> Result<CompletionResponse, BlazenError>;

    async fn stream(
        &self,
        request: CompletionRequest,
    ) -> Result<
        Pin<Box<dyn Stream<Item = Result<StreamChunk, BlazenError>> + Send>>,
        BlazenError,
    >;
}

Usage:

use blazen_llm::{CompletionModel, CompletionRequest, ChatMessage};
use blazen_llm::providers::openai::OpenAiProvider;

let model = OpenAiProvider::new("sk-...");
let request = CompletionRequest::new(vec![
    ChatMessage::user("What is 2 + 2?"),
]);
let response = model.complete(request).await?;
println!("{}", response.content.unwrap_or_default());

Streaming:

use futures_util::StreamExt;

let request = CompletionRequest::new(vec![
    ChatMessage::user("Tell me a story"),
]);
let mut stream = model.stream(request).await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(delta) = &chunk.delta {
        print!("{delta}");
    }
}

StructuredOutput

Extract typed data from a model using JSON Schema constraints. This trait has a blanket implementation for every CompletionModel — providers do not need to implement it.

#[async_trait]
pub trait StructuredOutput: CompletionModel {
    async fn extract<T: JsonSchema + DeserializeOwned + Send>(
        &self,
        messages: Vec<ChatMessage>,
    ) -> Result<StructuredResponse<T>, BlazenError>;
}

// Blanket impl: every CompletionModel automatically gets this.
impl<M: CompletionModel> StructuredOutput for M {}

T must implement schemars::JsonSchema and serde::de::DeserializeOwned. The schema is derived at call time via schemars::schema_for! and injected into the request’s response_format.

Usage:

use schemars::JsonSchema;
use serde::Deserialize;
use blazen_llm::StructuredOutput;

#[derive(JsonSchema, Deserialize)]
struct Sentiment {
    label: String,
    score: f64,
}

let result = model.extract::<Sentiment>(vec![
    ChatMessage::user("Analyze sentiment: 'I love Rust'"),
]).await?;
println!("{}: {}", result.data.label, result.data.score);

EmbeddingModel

Produces vector embeddings for text inputs.

#[async_trait]
pub trait EmbeddingModel: Send + Sync {
    fn model_id(&self) -> &str;
    fn dimensions(&self) -> usize;
    async fn embed(&self, texts: &[String]) -> Result<EmbeddingResponse, BlazenError>;
}

Usage:

let texts = vec!["Hello world".into(), "Goodbye world".into()];
let response = embedding_model.embed(&texts).await?;
for (i, vector) in response.embeddings.iter().enumerate() {
    println!("text {i}: {} dimensions", vector.len());
}

Tool

A callable tool that can be invoked by an LLM during a conversation.

#[async_trait]
pub trait Tool: Send + Sync {
    fn definition(&self) -> ToolDefinition;
    async fn execute(
        &self,
        arguments: serde_json::Value,
    ) -> Result<ToolOutput<serde_json::Value>, BlazenError>;
}

:::note[Migration from earlier versions] The Tool::execute return type changed: it used to return a bare Result<…Value…, BlazenError> and now returns Result<ToolOutput<Value>, BlazenError>. Existing tools that returned a Value continue to compile by changing Ok(value) to Ok(value.into()) — the From<Value> impl wraps it into a ToolOutput with no override. The new ChatMessage::tool_result constructor is also a breaking change for callers that passed a &str; convert via serde_json::Value::String(s.into()) or use serde_json::json!(s). :::

Usage:

use blazen_llm::{Tool, ToolDefinition, ToolOutput, BlazenError};
use async_trait::async_trait;

struct WeatherTool;

#[async_trait]
impl Tool for WeatherTool {
    fn definition(&self) -> ToolDefinition {
        ToolDefinition {
            name: "get_weather".into(),
            description: "Get the weather for a city.".into(),
            parameters: serde_json::json!({
                "type": "object",
                "properties": { "city": { "type": "string" } },
                "required": ["city"],
            }),
        }
    }

    async fn execute(
        &self,
        args: serde_json::Value,
    ) -> Result<ToolOutput<serde_json::Value>, BlazenError> {
        let _city = args["city"].as_str().unwrap_or_default();
        // Common case: return a structured value, no override.
        Ok(serde_json::json!({ "temperature_f": 72, "conditions": "clear" }).into())
    }
}

Sending a summary to the LLM while keeping the full payload visible to callers:

use blazen_llm::{Tool, ToolDefinition, ToolOutput, LlmPayload, BlazenError};
use async_trait::async_trait;

# struct SearchTool;
# #[async_trait]
# impl Tool for SearchTool {
#     fn definition(&self) -> ToolDefinition { unimplemented!() }
async fn execute(
    &self,
    args: serde_json::Value,
) -> Result<ToolOutput<serde_json::Value>, BlazenError> {
    Ok(ToolOutput::with_override(
        serde_json::json!({ "items": [1, 2, 3], "raw": "..." }),
        LlmPayload::Text { text: "Found 3 items.".into() },
    ))
}
# }

The full data payload is preserved in ChatMessage.tool_result so application code can inspect the unredacted result, while only the llm_override is sent to the model on the next turn.


TypedTool

A generic wrapper that turns a typed handler Fn(Args) -> Future<Output = Result<ToolOutput<Output>>> into an implementation of Tool. Handles serde_json::from_value of the input and serde_json::to_value of the output for you, and auto-derives the JSON Schema in ToolDefinition::parameters via schemars::schema_for!.

pub struct TypedTool<Args, Output, F>
where
    Args: DeserializeOwned + JsonSchema + Send + 'static,
    Output: Serialize + Send + 'static,
    F: Fn(Args) -> BoxFut<Output> + Send + Sync + 'static,
{ /* ... */ }

impl<Args, Output, F> TypedTool<Args, Output, F> {
    pub fn new(
        name: impl Into<String>,
        description: impl Into<String>,
        handler: F,
    ) -> Self;
}

Usage:

use blazen_llm::{TypedTool, ToolOutput};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

#[derive(Deserialize, JsonSchema)]
struct AddArgs { a: i64, b: i64 }

#[derive(Serialize)]
struct AddOutput { sum: i64 }

let add_tool = TypedTool::new(
    "add",
    "Add two numbers.",
    |args: AddArgs| {
        Box::pin(async move {
            Ok(ToolOutput::new(AddOutput { sum: args.a + args.b }))
        })
    },
);

typed_tool_simple

Convenience constructor for the no-override common case. The handler returns Result<Output> directly; the wrapper applies ToolOutput::new for you.

pub fn typed_tool_simple<Args, Output, Fut, F>(
    name: impl Into<String>,
    description: impl Into<String>,
    handler: F,
) -> impl Tool
where
    Args: DeserializeOwned + JsonSchema + Send + 'static,
    Output: Serialize + Send + 'static,
    Fut: Future<Output = Result<Output, BlazenError>> + Send + 'static,
    F: Fn(Args) -> Fut + Send + Sync + 'static;

Usage:

use blazen_llm::{typed_tool_simple, BlazenError};

let add_tool = typed_tool_simple(
    "add",
    "Add two numbers.",
    |args: AddArgs| async move {
        Ok::<_, BlazenError>(AddOutput { sum: args.a + args.b })
    },
);

Why it exists: TypedTool does the serde_json::from_value of the input and serde_json::to_value of the output for you, exactly once per call. The JSON Schema in ToolDefinition::parameters is auto-derived from Args via schemars::schema_for!, so you do not have to hand-write the schema or repeat field names.


ToolOutput

The return type of Tool::execute. Carries the structured data that callers see, plus an optional llm_override controlling what the LLM receives on the next turn.

pub struct ToolOutput<T = Value> {
    pub data: T,
    pub llm_override: Option<LlmPayload>,
}
FieldTypeDescription
dataTStructured tool output. Visible to application code via ChatMessage.tool_result.
llm_overrideOption<LlmPayload>If Some, replaces the default representation of data when serialised into the next prompt. If None, each provider applies a sensible default (see LlmPayload).

Constructors:

ConstructorSignatureDescription
ToolOutput::newfn(data: T) -> SelfWrap structured data with no override. The LLM sees the provider default.
ToolOutput::with_overridefn(data: T, override_payload: LlmPayload) -> SelfWrap structured data and pin exactly what the LLM receives next turn.
From<Value>impl From<Value> for ToolOutput<Value>value.into() produces ToolOutput { data: value, llm_override: None }. Lets Tool::execute keep returning bare Values with Ok(value.into()).

Usage:

use blazen_llm::{ToolOutput, LlmPayload};
use serde_json::json;

let plain = ToolOutput::new(json!({ "items": [1, 2, 3] }));

let with_summary = ToolOutput::with_override(
    json!({ "items": [1, 2, 3], "raw": "..." }),
    LlmPayload::Text { text: "Found 3 items.".into() },
);

Why it exists: many tools return large structured payloads that the application wants in full (logs, UI, downstream steps), but feeding all of it back into the next LLM call is wasteful or noisy. ToolOutput decouples the two channels so you can return rich data to the caller while pinning a compact summary for the model.


LlmPayload

The wire-format-agnostic shape of a tool result as it appears to the LLM. Used as ToolOutput::llm_override and as the second component of ChatMessage::tool_result_view.

#[serde(tag = "kind", rename_all = "snake_case")]
pub enum LlmPayload {
    Text { text: String },
    Json { value: serde_json::Value },
    Parts { parts: Vec<ContentPart> },
    ProviderRaw { provider: ProviderId, value: serde_json::Value },
}
VariantDescription
Text { text }Plain text. Sent as-is to providers that accept string tool results, or wrapped in [{type: "text", text}] for providers that require parts.
Json { value }Structured JSON. Each provider serialises this in its native shape (see per-provider behaviour below).
Parts { parts }A Vec<ContentPart> for multimodal tool results (text + images + files). Used together with ChatMessage::tool_result_parts.
ProviderRaw { provider, value }An exact, provider-specific payload. Bypasses Blazen’s translation layer and is forwarded verbatim only when the active provider matches provider; other providers fall back to the default representation of data.

Usage:

use blazen_llm::{LlmPayload, ProviderId};
use serde_json::json;

LlmPayload::Text { text: "Found 3 results.".into() };
LlmPayload::Json { value: json!({ "items": [1, 2, 3] }) };
LlmPayload::ProviderRaw {
    provider: ProviderId::Anthropic,
    value: json!([{"type": "text", "text": "..."}]),
};

Per-provider behaviour for the default (no llm_override) case:

When a tool returns structured data and no llm_override, each provider sends a sensible default to the LLM:

  • OpenAI / OpenAI-compat / Azure / Responses / Fal: the data is JSON-stringified into the content field of the tool message.
  • Anthropic: structured data becomes [{type: "text", text: <stringified-json>}] inside tool_result.content.
  • Gemini: structured object data is passed natively as functionResponse.response. Scalars are wrapped as {result: <scalar>}.

ProviderId

Tags an LlmPayload::ProviderRaw variant with the provider whose wire format the value follows. The runtime uses this to decide whether to forward the raw payload or fall back to the default representation.

pub enum ProviderId {
    OpenAi,
    OpenAiCompat,
    Azure,
    Anthropic,
    Gemini,
    Responses,
    Fal,
}
VariantProvider
OpenAiOpenAiProvider
OpenAiCompatOpenAiCompatProvider (OpenRouter, Groq, Together, etc.)
AzureAzureOpenAiProvider
AnthropicAnthropicProvider
GeminiGeminiProvider
ResponsesOpenAI Responses API provider
FalFalProvider

ModelRegistry

Allows providers to advertise their available models.

#[async_trait]
pub trait ModelRegistry: Send + Sync {
    async fn list_models(&self) -> Result<Vec<ModelInfo>, BlazenError>;
    async fn get_model(&self, model_id: &str) -> Result<Option<ModelInfo>, BlazenError>;
}

ModelInfo

FieldTypeDescription
idStringModel identifier used in API requests (e.g. "gpt-4o")
nameOption<String>Human-readable display name
providerStringProvider that serves this model
context_lengthOption<u64>Maximum context window in tokens
pricingOption<ModelPricing>Pricing information
capabilitiesModelCapabilitiesWhat this model can do

ModelPricing

FieldTypeDescription
input_per_millionOption<f64>Cost per million input tokens (USD)
output_per_millionOption<f64>Cost per million output tokens (USD)
per_imageOption<f64>Cost per image (image generation models)
per_secondOption<f64>Cost per second of compute

ModelCapabilities

FieldTypeDescription
chatboolSupports chat completions
streamingboolSupports streaming responses
tool_useboolSupports tool/function calling
structured_outputboolSupports JSON schema constraints
visionboolSupports image inputs
image_generationboolSupports image generation
embeddingsboolSupports text embeddings
video_generationboolVideo generation support
text_to_speechboolText-to-speech synthesis
speech_to_textboolSpeech-to-text transcription
audio_generationboolAudio generation (music, SFX)
three_d_generationbool3D model generation

Types

ChatMessage

A single message in a chat conversation.

FieldTypeDescription
roleRoleWho produced this message
contentMessageContentThe message payload. For tool-result messages with a plain string return, the string lives here as MessageContent::Text(s).
tool_callsVec<ToolCall>Tool invocations requested by the assistant on this message (empty for non-assistant roles).
tool_resultOption<ToolOutput<serde_json::Value>>Structured tool-result payload for tool-role messages. Some only when the tool returned non-string data or supplied an llm_override. Plain-string results live in content instead.
nameOption<String>Tool name (set for tool-role messages).
tool_call_idOption<String>Provider-assigned id from the originating ToolCall.

Constructors:

// Text messages
ChatMessage::system("You are a helpful assistant")
ChatMessage::user("Hello!")
ChatMessage::assistant("Hi there!")
ChatMessage::tool("{ \"result\": 42 }")

// Multimodal messages
ChatMessage::user_image_url("Describe this", "https://img.com/a.png", Some("image/png"))
ChatMessage::user_image_base64("What is this?", "iVBORw0K...", "image/jpeg")
ChatMessage::user_parts(vec![
    ContentPart::Text { text: "Look at this:".into() },
    ContentPart::Image(ImageContent {
        source: ImageSource::Url { url: "https://...".into() },
        media_type: Some("image/png".into()),
    }),
    ContentPart::File(FileContent {
        source: ImageSource::Url { url: "https://...".into() },
        media_type: "application/pdf".into(),
        filename: Some("doc.pdf".into()),
    }),
])

ChatMessage::tool_result

Build a tool-role message that closes a prior ToolCall. Routes plain strings into content and structured payloads (or anything with an llm_override) onto the new tool_result sibling field.

pub fn tool_result(
    call_id: impl Into<String>,
    name: impl Into<String>,
    output: impl Into<ToolOutput<serde_json::Value>>,
) -> Self
ArgumentDescription
call_idThe id from the originating ToolCall. Stored on tool_call_id.
nameThe tool name. Stored on name.
outputA ToolOutput<Value> (or anything that converts via From<Value>, e.g. serde_json::Value directly).

Routing rules:

  • If output.data == Value::String(s) and output.llm_override.is_none(), the string is moved into content as MessageContent::Text(s) and tool_result stays None.
  • Otherwise, output is stored verbatim on tool_result and content is set to MessageContent::Text(String::new()).

Usage:

use blazen_llm::{ChatMessage, ToolOutput, LlmPayload};
use serde_json::json;

// Plain string -- lives in content as a regular text message.
ChatMessage::tool_result("call_1", "search", json!("hello"));

// Structured -- lives in the tool_result sibling field.
ChatMessage::tool_result("call_1", "search", json!({"items": [1, 2, 3]}));

// With override -- full data preserved on the message,
// summary sent to the LLM next turn.
ChatMessage::tool_result(
    "call_1",
    "search",
    ToolOutput::with_override(
        json!({"items": [1, 2, 3], "raw": "..."}),
        LlmPayload::Text { text: "Found 3 items.".into() },
    ),
);

:::caution[Breaking change] Earlier versions accepted content: impl Into<String> and stored the string verbatim. The new signature accepts output: impl Into<ToolOutput<Value>>. Callers that passed a &str should switch to serde_json::Value::String(s.into()) or serde_json::json!(s). :::

ChatMessage::tool_result_parts

Build a tool-role message whose result carries multimodal content (text + images + files). The parts ride on tool_result as LlmPayload::Parts { parts } so providers that support multimodal tool results (Anthropic, Gemini) can forward them natively.

pub fn tool_result_parts(
    call_id: impl Into<String>,
    name: impl Into<String>,
    parts: Vec<ContentPart>,
) -> Self

Usage:

use blazen_llm::{ChatMessage, ContentPart, ImageContent, ImageSource};

ChatMessage::tool_result_parts(
    "call_1",
    "render_chart",
    vec![
        ContentPart::Text { text: "Rendered the requested chart.".into() },
        ContentPart::Image(ImageContent {
            source: ImageSource::Url { url: "https://example.com/chart.png".into() },
            media_type: Some("image/png".into()),
        }),
    ],
);

ChatMessage::tool_result_view

Accessor returning both channels of a tool-result message in a single call. Used internally by provider implementations that need to choose between the structured data and an explicit llm_override when serialising to the wire format.

pub fn tool_result_view(&self) -> Option<(serde_json::Value, Option<&LlmPayload>)>

Returns None for non-tool-role messages. For tool-role messages it returns Some((data, override)) where data is the raw serde_json::Value payload (drawn from tool_result.data when present, otherwise reconstructed from the plain-string content) and override is the optional &LlmPayload from tool_result.llm_override.


Role

pub enum Role {
    System,
    User,
    Assistant,
    Tool,
}

MessageContent

pub enum MessageContent {
    Text(String),
    Image(ImageContent),
    Parts(Vec<ContentPart>),
}
MethodSignatureDescription
as_text()&self -> Option<&str>Return the text if this is a Text variant
as_parts()&self -> Vec<ContentPart>Convert any variant into a Vec<ContentPart>
text_content()&self -> Option<String>Extract and concatenate all text content

MessageContent implements From<&str> and From<String>.


ContentPart

pub enum ContentPart {
    Text { text: String },
    Image(ImageContent),
    File(FileContent),
}

ImageContent

FieldTypeDescription
sourceImageSourceURL or base64 data
media_typeOption<String>MIME type (e.g. "image/png")

ImageSource

The source of an image, file, or any other media payload. Marked #[non_exhaustive] so new variants can be added without breaking callers — always pattern-match with a wildcard arm. MediaSource is a type alias for ImageSource and is the preferred name when the value is not specifically an image.

#[non_exhaustive]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ImageSource {
    Url { url: String },
    Base64 { data: String },
    File { path: PathBuf },
    ProviderFile { provider: ProviderId, id: String },
    Handle { handle: ContentHandle },
}

pub type MediaSource = ImageSource;
VariantDescription
Url { url }Public URL the provider can fetch directly.
Base64 { data }Inline base64-encoded bytes. Pair with media_type on the enclosing ImageContent / FileContent.
File { path }Local filesystem path. Use ImageSource::file(path) to construct. Resolved at request time by a ContentStore or by the provider adapter.
ProviderFile { provider, id }Reference to a file already uploaded to a provider’s Files API (e.g. an OpenAI file-xxx id). Forwarded verbatim only when the active provider matches provider.
Handle { handle }Reference to a ContentHandle registered with a ContentStore. Resolved into a concrete Url / Base64 / ProviderFile by CompletionRequest::resolve_handles_with before the request hits a provider.

FileContent

FieldTypeDescription
sourceImageSourceURL or base64 data
media_typeStringMIME type (e.g. "application/pdf")
filenameOption<String>Optional filename for display

CompletionRequest

A provider-agnostic request for a chat completion.

FieldTypeDescription
messagesVec<ChatMessage>The conversation history
toolsVec<ToolDefinition>Tools available for the model to invoke
temperatureOption<f32>Sampling temperature (0.0 = deterministic, 2.0 = very random)
max_tokensOption<u32>Maximum number of tokens to generate
top_pOption<f32>Nucleus sampling parameter
response_formatOption<serde_json::Value>JSON Schema for structured output
modelOption<String>Override the provider’s default model
modalitiesOption<Vec<String>>Output modalities (e.g. ["text"], ["image", "text"])
image_configOption<serde_json::Value>Image generation configuration (model-specific)
audio_configOption<serde_json::Value>Audio output configuration (voice, format, etc.)

Builder pattern:

let request = CompletionRequest::new(vec![ChatMessage::user("Hello")])
    .with_tools(tool_defs)
    .with_temperature(0.7)
    .with_max_tokens(1024)
    .with_top_p(0.9)
    .with_response_format(schema_json)
    .with_model("gpt-4o")
    .with_modalities(vec!["text".into(), "image".into()])
    .with_image_config(serde_json::json!({ "size": "1024x1024" }))
    .with_audio_config(serde_json::json!({ "voice": "alloy" }));

CompletionResponse

The result of a non-streaming chat completion.

FieldTypeDescription
contentOption<String>Text content of the assistant’s reply
tool_callsVec<ToolCall>Tool invocations requested by the model
usageOption<TokenUsage>Token usage statistics
modelStringThe model that produced this response
finish_reasonOption<String>Why the model stopped (e.g. "stop", "tool_use")
costOption<f64>Estimated cost in USD
timingOption<RequestTiming>Request timing breakdown
imagesVec<GeneratedImage>Generated images (multimodal models)
audioVec<GeneratedAudio>Generated audio (TTS / multimodal)
videosVec<GeneratedVideo>Generated videos
metadataserde_json::ValueProvider-specific metadata

StructuredResponse<T>

Response from structured output extraction, preserving metadata.

FieldTypeDescription
dataTThe extracted structured data
usageOption<TokenUsage>Token usage statistics
modelStringThe model that produced this response
costOption<f64>Estimated cost in USD
timingOption<RequestTiming>Request timing
metadataserde_json::ValueProvider-specific metadata

EmbeddingResponse

Response from an embedding operation.

FieldTypeDescription
embeddingsVec<Vec<f32>>The embedding vectors (one per input text)
modelStringThe model used
usageOption<TokenUsage>Token usage statistics
costOption<f64>Estimated cost in USD
timingOption<RequestTiming>Request timing
metadataserde_json::ValueProvider-specific metadata

RequestTiming

Timing metadata for a request.

FieldTypeDescription
queue_msOption<u64>Time spent waiting in queue (ms)
execution_msOption<u64>Time spent executing the request (ms)
total_msOption<u64>Total wall-clock time from submit to response (ms)

TokenUsage

Token usage statistics for a completion request.

FieldTypeDescription
prompt_tokensu32Tokens in the prompt / input
completion_tokensu32Tokens in the completion / output
total_tokensu32Total tokens consumed (prompt + completion)

ToolDefinition

Describes a tool that the model may invoke.

FieldTypeDescription
nameStringUnique name of the tool
descriptionStringHuman-readable description
parametersserde_json::ValueJSON Schema describing the tool’s input parameters

ToolCall

A tool invocation requested by the model.

FieldTypeDescription
idStringProvider-assigned identifier for this invocation
nameStringName of the tool to invoke
argumentsserde_json::ValueArguments to pass, as JSON

StreamChunk

A single chunk from a streaming completion response.

FieldTypeDescription
deltaOption<String>Incremental text content
tool_callsVec<ToolCall>Tool invocations completed in this chunk
finish_reasonOption<String>Present in the final chunk to indicate why generation stopped

Content Subsystem

A provider-agnostic layer for handing media (images, audio, video, documents, 3D, CAD, archives, fonts, code, generic data) to and from models. The core idea: instead of inlining bytes or URLs in every message, callers register payloads with a ContentStore, receive a small ContentHandle, and reference the handle from messages, tool inputs, and tool outputs. Just before a request is sent, the runtime resolves every handle into the concrete representation the active provider expects — a URL, a base64 blob, or a ProviderFile reference for providers with native Files APIs.

Everything in this section is re-exported from blazen_llm::content.

ContentKind

Coarse classification for a piece of content. Marked #[non_exhaustive]. Serde uses rename_all = "snake_case", so ThreeDModel round-trips as "three_d_model". Implements Display.

#[non_exhaustive]
pub enum ContentKind {
    Image,
    Audio,
    Video,
    Document,
    ThreeDModel,
    Cad,
    Archive,
    Font,
    Code,
    Data,
    Other,
}
MethodSignatureDescription
from_mimefn(&str) -> SelfBest-effort classification from a MIME string.
from_extensionfn(&str) -> SelfBest-effort classification from a filename extension (no leading dot required).
as_strfn(self) -> &'static strSnake-case string form (matches the serde representation).
use blazen_llm::content::ContentKind;

assert_eq!(ContentKind::from_mime("image/png"), ContentKind::Image);
assert_eq!(ContentKind::from_extension("glb"), ContentKind::ThreeDModel);
assert_eq!(ContentKind::ThreeDModel.as_str(), "three_d_model");

ContentHandle

An opaque pointer to a payload owned by a ContentStore. Cheap to clone, safe to embed in messages and tool arguments, and resolvable on demand.

pub struct ContentHandle {
    pub id: String,
    pub kind: ContentKind,
    pub mime_type: Option<String>,
    pub byte_size: Option<u64>,
    pub display_name: Option<String>,
}
MethodSignatureDescription
newfn(id: impl Into<String>, kind: ContentKind) -> SelfConstruct a handle with no metadata.
with_mime_typefn(self, mime: impl Into<String>) -> SelfBuilder: attach a MIME type.
with_byte_sizefn(self, bytes: u64) -> SelfBuilder: attach a byte size.
with_display_namefn(self, name: impl Into<String>) -> SelfBuilder: attach a human-readable name.
use blazen_llm::content::{ContentHandle, ContentKind};

let handle = ContentHandle::new("blob_abc123", ContentKind::Image)
    .with_mime_type("image/png")
    .with_byte_size(48_213)
    .with_display_name("chart.png");

ContentStore

The async trait every store implements. Stores own the bytes (or the URL, or the provider-side file id) and translate handles into something the provider can consume on demand.

#[async_trait]
pub trait ContentStore: Send + Sync + std::fmt::Debug {
    async fn put(&self, body: ContentBody, hint: ContentHint)
        -> Result<ContentHandle, BlazenError>;
    async fn resolve(&self, handle: &ContentHandle)
        -> Result<MediaSource, BlazenError>;
    async fn fetch_bytes(&self, handle: &ContentHandle)
        -> Result<Vec<u8>, BlazenError>;
    async fn metadata(&self, handle: &ContentHandle)
        -> Result<ContentMetadata, BlazenError> { /* default */ }
    async fn fetch_stream(&self, handle: &ContentHandle)
        -> Result<ByteStream, BlazenError> { /* default */ }
    async fn delete(&self, _handle: &ContentHandle)
        -> Result<(), BlazenError> { Ok(()) }
}
MethodDescription
putIngest a ContentBody under a ContentHint and return a fresh handle.
resolveProduce the concrete MediaSource the provider will see (typically Url, Base64, or ProviderFile). Called by CompletionRequest::resolve_handles_with.
fetch_bytesMaterialise the underlying bytes. Used when a tool needs to read the payload directly.
metadataReturn ContentMetadata. The default impl synthesises this from the handle’s own fields; stores with richer indices should override.
fetch_streamStream raw bytes back as a ByteStream. The default impl buffers fetch_bytes into a single chunk via futures::stream::once, so existing impls keep working unchanged. Stores backed by HTTP / disk / object storage should override for true incremental streaming. Built-in overrides today: LocalFileContentStore (uses tokio_util::io::ReaderStream), OpenAiFilesStore, AnthropicFilesStore, FalStorageStore (all via the HttpClient trait’s send_streaming method). InMemoryContentStore and GeminiFilesStore use the buffered default impl.
deleteBest-effort deletion. Default is a no-op so read-only stores can leave it unimplemented.
use std::sync::Arc;
use blazen_llm::content::{ContentBody, ContentHint, ContentKind, InMemoryContentStore, ContentStore};

let store = Arc::new(InMemoryContentStore::new());

let handle = store.put(
    ContentBody::Bytes { data: std::fs::read("chart.png")? },
    ContentHint::default()
        .with_mime_type("image/png")
        .with_kind(ContentKind::Image)
        .with_display_name("chart.png"),
).await?;

let source = store.resolve(&handle).await?;

ContentBody

The five ways a caller can hand bytes (or a pointer to bytes) to a ContentStore via put. Variants are struct-form (named fields), and serde uses an internally-tagged representation (tag = "type", rename_all = "snake_case").

#[derive(Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ContentBody {
    Bytes { data: Vec<u8> },
    Url { url: String },
    LocalPath { path: PathBuf },
    ProviderFile { provider: ProviderId, id: String },
    /// Streaming byte source. The store consumes the stream during `put`
    /// and is free to spool to disk, forward as a chunked upload, or
    /// drain into bytes.
    #[serde(skip)]
    Stream {
        stream: ByteStream,
        size_hint: Option<u64>,
    },
}
VariantDescription
Bytes { data }In-memory payload. The store decides whether to keep it in RAM, spill to disk, or upload to a provider.
Url { url }Remote URL the store may fetch lazily or pass through verbatim on resolve.
LocalPath { path }Local filesystem path. Native-only stores (e.g. LocalFileContentStore) can index it without copying.
ProviderFile { provider, id }A pre-existing file id on a provider’s Files API. Lets you wrap an externally uploaded asset in a handle without re-uploading.
Stream { stream, size_hint }Streaming byte source built on ByteStream. Stores with a true streaming upload path (filesystem, S3, HTTP multipart) consume the stream incrementally; memory-bound stores buffer. size_hint carries the total length when known up front (e.g. from a Content-Length header) so stores can pre-allocate or pick between simple and resumable upload paths.

ContentBody::Stream is not Clone (a ByteStream is single-use) and not Serialize — the variant is annotated #[serde(skip)] because ByteStream implements neither Serialize nor Deserialize. The manual Clone impl panics with unreachable! if you clone a Stream variant; consume streaming bodies by value. Bindings that route ContentBody through serde_json must check for Stream first and handle it on a separate path.

use blazen_llm::content::ContentBody;

let from_memory = ContentBody::Bytes { data: b"hello".to_vec() };
let from_disk   = ContentBody::LocalPath { path: "./report.pdf".into() };
let from_url    = ContentBody::Url { url: "https://example.com/a.png".into() };

ByteStream

A pinned, boxed, fallible stream of byte chunks. Used by ContentBody::Stream for streaming uploads and by ContentStore::fetch_stream for streaming downloads.

pub type ByteStream = std::pin::Pin<
    Box<dyn futures_core::Stream<Item = Result<bytes::Bytes, BlazenError>> + Send>
>;

Stores backed by HTTP, S3, or the filesystem should produce / consume these incrementally; memory-bound stores may buffer.


ContentHint

Optional metadata callers pass alongside a ContentBody so the store can pick a sensible MIME type, classification, and display name without re-sniffing. Implements Default.

pub struct ContentHint {
    pub mime_type: Option<String>,
    pub kind_hint: Option<ContentKind>,
    pub display_name: Option<String>,
    pub byte_size: Option<u64>,
}
MethodSignatureDescription
with_mime_typefn(self, mime: impl Into<String>) -> SelfPin the MIME type.
with_kindfn(self, kind: ContentKind) -> SelfPin the ContentKind.
with_display_namefn(self, name: impl Into<String>) -> SelfPin a human-readable name.
with_byte_sizefn(self, bytes: u64) -> SelfPin the byte size when known up front (e.g. from a Content-Length header).
use blazen_llm::content::{ContentHint, ContentKind};

let hint = ContentHint::default()
    .with_mime_type("audio/wav")
    .with_kind(ContentKind::Audio)
    .with_display_name("note.wav");

ContentMetadata

The non-id fields of a ContentHandle, returned by ContentStore::metadata. Useful for cheap introspection without materialising bytes.

pub struct ContentMetadata {
    pub kind: ContentKind,
    pub mime_type: Option<String>,
    pub byte_size: Option<u64>,
    pub display_name: Option<String>,
}
use blazen_llm::content::ContentStore;

let meta = store.metadata(&handle).await?;
println!("{} ({} bytes)", meta.kind, meta.byte_size.unwrap_or(0));

DynContentStore

Convenience alias for the shared, thread-safe form most code passes around.

pub type DynContentStore = std::sync::Arc<dyn ContentStore>;
use std::sync::Arc;
use blazen_llm::content::{DynContentStore, InMemoryContentStore};

let store: DynContentStore = Arc::new(InMemoryContentStore::new());

Built-in stores

StoreConstructorNotes
InMemoryContentStoreInMemoryContentStore::new() (also Default)Bytes / URL / provider-file refs held in a RwLock-guarded map. Great for tests and short-lived sessions.
LocalFileContentStoreLocalFileContentStore::new(root: impl Into<PathBuf>) -> Result<Self, BlazenError>Native-only (not(target_arch = "wasm32")). Persists payloads under root; assigns each entry a UUID-derived filename and tracks the index in memory.
OpenAiFilesStoreOpenAiFilesStore::new(api_key), .with_base_url(url), .with_purpose(p)Uploads via OpenAI’s /v1/files API. purpose defaults to "user_data" (override for assistants / fine-tuning / batch).
AnthropicFilesStoreAnthropicFilesStore::new(api_key), .with_base_url(url), .with_beta_header(h)Uploads via Anthropic’s Files API. beta_header is forwarded as anthropic-beta.
GeminiFilesStoreGeminiFilesStore::new(api_key), .with_base_url(url)Uploads via Google’s Files API and resolves handles to gs:///file-uri refs.
FalStorageStoreFalStorageStore::new(api_key), .with_base_url(url)Uploads to Fal’s storage endpoint.
CustomContentStoreCustomContentStore::builder(name) -> CustomContentStoreBuilderBuild a store from closures: .put(...), .resolve(...), .fetch_bytes(...), .fetch_stream(...) (optional), .delete(...) (optional), .build(). The .fetch_stream callback is optional — when omitted, the trait’s default impl buffers fetch_bytes into one chunk via stream::once. Lets callers integrate their own blob backend without writing a new trait impl.
use std::sync::Arc;
use blazen_llm::content::{
    AnthropicFilesStore, CustomContentStore, InMemoryContentStore,
    LocalFileContentStore, OpenAiFilesStore,
};

let in_mem = Arc::new(InMemoryContentStore::new());
let on_disk = Arc::new(LocalFileContentStore::new("/var/cache/blazen")?);
let openai = Arc::new(OpenAiFilesStore::new(std::env::var("OPENAI_API_KEY")?)
    .with_purpose("user_data"));
let anthropic = Arc::new(AnthropicFilesStore::new(std::env::var("ANTHROPIC_API_KEY")?)
    .with_beta_header("files-api-2025-04-14"));

let custom = Arc::new(
    CustomContentStore::builder("s3-store")
        .put(|body, hint| async move { /* upload, return ContentHandle */ todo!() })
        .resolve(|handle| async move { /* return MediaSource */ todo!() })
        .fetch_bytes(|handle| async move { /* return Vec<u8> */ todo!() })
        .fetch_stream(|handle| Box::pin(async move {
            // OPTIONAL: stream bytes back chunk-by-chunk for large content.
            // When omitted, the default impl buffers fetch_bytes into one chunk.
            use bytes::Bytes;
            use futures_util::stream;
            let chunks: Vec<Result<Bytes, _>> = vec![Ok(Bytes::from_static(b"hello"))];
            Ok(Box::pin(stream::iter(chunks)) as blazen_llm::content::ByteStream)
        }))
        .delete(|handle| async move { /* delete blob */ Ok(()) })
        .build()?,
);

Tool-input helpers

Helpers that produce JSON Schema fragments for tool parameters that should accept a ContentHandle. Each fragment is tagged with x-blazen-content-ref so resolve_tool_arguments knows where to substitute resolved MediaSource values before the tool runs.

pub fn image_input(name: &str, description: &str) -> serde_json::Value;
pub fn audio_input(name: &str, description: &str) -> serde_json::Value;
pub fn video_input(name: &str, description: &str) -> serde_json::Value;
pub fn file_input(name: &str, description: &str) -> serde_json::Value;
pub fn three_d_input(name: &str, description: &str) -> serde_json::Value;
pub fn cad_input(name: &str, description: &str) -> serde_json::Value;

pub fn content_ref_property(
    kind: ContentKind,
    description: &str,
) -> serde_json::Value;

pub fn content_ref_required_object(
    name: &str,
    kind: ContentKind,
    description: &str,
    extra_properties: serde_json::Map<String, serde_json::Value>,
) -> serde_json::Value;

pub async fn resolve_tool_arguments(
    arguments: &mut serde_json::Value,
    schema: &serde_json::Value,
    store: &dyn ContentStore,
) -> Result<usize, BlazenError>;

pub struct KindMismatch {
    pub property: String,
    pub expected: ContentKind,
    pub actual: ContentKind,
}
HelperDescription
image_input / audio_input / video_input / file_input / three_d_input / cad_inputTop-level convenience: returns a single-property required object schema for a typed content reference.
content_ref_propertySchema for one property accepting a ContentHandle of the given ContentKind. Use when assembling a custom multi-property schema.
content_ref_required_objectBuild a required object schema mixing one content ref with extra_properties (other primitives, enums, etc.).
resolve_tool_argumentsWalk arguments against schema, replace every x-blazen-content-ref site with the MediaSource returned by store.resolve. Returns the number of substitutions made.
KindMismatchError variant returned when a handle’s ContentKind does not match the schema’s declared expected kind.
use blazen_llm::content::tool_input::{image_input, resolve_tool_arguments};
use serde_json::{json, Map};

let schema = json!({
    "type": "object",
    "properties": {
        "image": image_input("image", "The image to caption"),
        "max_words": { "type": "integer" },
    },
    "required": ["image", "max_words"],
});

let mut args = json!({
    "image": { "id": "blob_abc123", "kind": "image" },
    "max_words": 20,
});

let n = resolve_tool_arguments(&mut args, &schema, store.as_ref()).await?;
// `args["image"]` is now a concrete MediaSource (Url / Base64 / ProviderFile).
println!("rewrote {n} content refs");

Visibility helpers

Helpers for the runtime’s “what handles is the model actually allowed to see right now?” pass.

pub fn collect_visible_handles(messages: &[ChatMessage]) -> Vec<ContentHandle>;

pub fn build_handle_directory_system_note(
    handles: &[ContentHandle],
) -> Option<String>;

pub async fn prepare_request_with_store(
    request: &mut CompletionRequest,
    store: &dyn ContentStore,
) -> Result<usize, BlazenError>;
HelperDescription
collect_visible_handlesWalks messages and returns every distinct ContentHandle referenced from user/assistant/tool content, deduped first-seen.
build_handle_directory_system_noteReturns a system-note string listing the visible handles (id, kind, MIME, name) so the model can name them when calling tools. Returns None when handles is empty — callers should not append an empty note.
prepare_request_with_storeOne-call convenience: runs CompletionRequest::resolve_handles_with, then builds and prepends the system note. Returns the number of resolved handles. This is what the agent loop calls before dispatching a request.
use blazen_llm::content::visibility::{
    collect_visible_handles, build_handle_directory_system_note, prepare_request_with_store,
};

let visible = collect_visible_handles(&request.messages);
if let Some(note) = build_handle_directory_system_note(&visible) {
    println!("would inject system note:\n{note}");
}

let n = prepare_request_with_store(&mut request, store.as_ref()).await?;
println!("resolved {n} handles before dispatch");

Magic-number detection

Lightweight content sniffing backed by infer. Gated by the default-on content-detect Cargo feature — disabling the feature drops the infer dependency entirely (the functions remain but return (ContentKind::Other, None)).

pub fn detect_from_bytes(bytes: &[u8]) -> (ContentKind, Option<String>);

#[cfg(not(target_arch = "wasm32"))]
pub fn detect_from_path(path: &std::path::Path) -> (ContentKind, Option<String>);

pub fn detect(
    bytes: Option<&[u8]>,
    mime_hint: Option<&str>,
    filename: Option<&str>,
) -> (ContentKind, Option<String>);
FunctionDescription
detect_from_bytesSniff the leading bytes and return the inferred ContentKind plus the matching MIME string if any.
detect_from_pathNative-only. Reads the head of the file, then falls back to the extension when bytes are inconclusive.
detectCombined entry point: prefers byte sniffing when bytes is Some, then mime_hint, then filename extension. Returns (ContentKind::Other, None) when nothing matches.
use blazen_llm::content::{detect, detect_from_bytes};

let (kind, mime) = detect_from_bytes(&[0x89, b'P', b'N', b'G', 0x0d, 0x0a, 0x1a, 0x0a]);
assert_eq!(mime.as_deref(), Some("image/png"));

let (kind2, mime2) = detect(None, Some("application/pdf"), Some("report.pdf"));

CompletionRequest::resolve_handles_with

The lower-level half of prepare_request_with_store. Walks every message in the request, finds every ImageSource::Handle, and replaces it with the concrete MediaSource returned by store.resolve. Does not inject a system note — prefer prepare_request_with_store from the agent loop unless you specifically want to skip the note.

impl CompletionRequest {
    pub async fn resolve_handles_with(
        &mut self,
        store: &dyn ContentStore,
    ) -> Result<usize, BlazenError>;
}

Returns the number of Handle variants that were rewritten. Errors propagate from store.resolve.

let resolved = request.resolve_handles_with(store.as_ref()).await?;
println!("rewrote {resolved} handles in place");

Agent System

The agent system implements the standard LLM + tool calling loop: send messages with tool definitions, execute any tool calls the model makes, feed results back, and repeat until the model stops or max_iterations is reached.

run_agent()

Run the agent loop without event callbacks.

pub async fn run_agent(
    model: &dyn CompletionModel,
    messages: Vec<ChatMessage>,
    config: AgentConfig,
) -> Result<AgentResult, BlazenError>

run_agent_with_callback()

Run the agent loop, emitting AgentEvents to the supplied callback.

pub async fn run_agent_with_callback(
    model: &dyn CompletionModel,
    messages: Vec<ChatMessage>,
    config: AgentConfig,
    on_event: impl Fn(AgentEvent) + Send + Sync,
) -> Result<AgentResult, BlazenError>

The loop works as follows:

  1. Build a CompletionRequest with the full message history and all tool definitions.
  2. Call the model.
  3. If the model responds with no tool calls, return immediately.
  4. If the model invoked the built-in “finish” tool (when enabled), extract the answer and return.
  5. Otherwise, execute each tool call, append results to messages, go back to step 1.
  6. If max_iterations is reached, make one final call without tools to force a text answer.

Usage:

use std::sync::Arc;
use blazen_llm::{run_agent, AgentConfig, ChatMessage};

let config = AgentConfig::new(vec![Arc::new(WeatherTool)])
    .with_system_prompt("You are a helpful assistant with weather tools.")
    .with_max_iterations(5)
    .with_finish_tool()
    .with_temperature(0.7)
    .with_max_tokens(2048);

let result = run_agent(
    &model,
    vec![ChatMessage::user("What's the weather in Paris?")],
    config,
).await?;

println!("Answer: {}", result.response.content.unwrap_or_default());
println!("Iterations: {}", result.iterations);
println!("Total cost: ${:.4}", result.total_cost.unwrap_or(0.0));

With callback:

use blazen_llm::{run_agent_with_callback, AgentEvent};

let result = run_agent_with_callback(
    &model,
    vec![ChatMessage::user("What's the weather?")],
    config,
    |event| match &event {
        AgentEvent::ToolCalled { iteration, tool_call } => {
            println!("[iter {iteration}] calling tool: {}", tool_call.name);
        }
        AgentEvent::ToolResult { tool_name, result, .. } => {
            println!("  {tool_name} -> {result}");
        }
        AgentEvent::IterationComplete { iteration, had_tool_calls } => {
            println!("[iter {iteration}] done (tools: {had_tool_calls})");
        }
    },
).await?;

AgentConfig

Configuration for the agentic tool execution loop.

FieldTypeDefaultDescription
max_iterationsu3210Maximum tool call rounds before forcing a stop
toolsVec<Arc<dyn Tool>>requiredTools available to the agent
add_finish_toolboolfalseAdd an implicit “finish” tool the model can call to exit early
system_promptOption<String>NoneSystem prompt prepended to messages
temperatureOption<f32>NoneSampling temperature
max_tokensOption<u32>NoneMaximum tokens per completion call

Builder pattern:

AgentConfig::new(tools)
    .with_max_iterations(5)
    .with_system_prompt("You are helpful.")
    .with_finish_tool()
    .with_temperature(0.7)
    .with_max_tokens(2048)

AgentResult

Result of an agent run.

FieldTypeDescription
responseCompletionResponseThe final completion response
messagesVec<ChatMessage>Full message history including all tool calls and results
iterationsu32Number of tool call rounds that occurred
total_usageOption<TokenUsage>Aggregated token usage across all rounds
total_costOption<f64>Aggregated cost across all rounds
timingOption<RequestTiming>Total wall-clock time for the entire agent run

AgentEvent

Events emitted during agent execution (passed to the callback in run_agent_with_callback).

pub enum AgentEvent {
    ToolCalled {
        iteration: u32,
        tool_call: ToolCall,
    },
    ToolResult {
        iteration: u32,
        tool_name: String,
        result: serde_json::Value,
    },
    IterationComplete {
        iteration: u32,
        had_tool_calls: bool,
    },
}

Context

The Context object is a shared key-value store available in every workflow step. It provides three storage tiers and methods for event routing, streaming, and state management.

State Storage

Typed JSON: set() / get()

Store and retrieve any Serialize / DeserializeOwned type. Values are held internally as StateValue::Json.

// Store a typed value (anything implementing Serialize)
ctx.set("user_id", serde_json::json!("user_123"));
ctx.set("doc_count", serde_json::json!(5));

// Retrieve with type inference
let user_id: String = serde_json::from_value(ctx.get("user_id").unwrap()).unwrap();
let doc_count: i64 = serde_json::from_value(ctx.get("doc_count").unwrap()).unwrap();

Binary: set_bytes() / get_bytes()

Store raw Vec<u8> data. Values are held as StateValue::Bytes. No serialization requirement — useful for model weights, protobuf, bincode, or any binary format.

ctx.set_bytes("weights", vec![0x01, 0x02, 0x03]);
let bytes: Vec<u8> = ctx.get_bytes("weights").unwrap();

Raw StateValue: set_value() / get_value()

Work with the StateValue enum directly for full control over the storage variant, including the Native variant used by language bindings.

use blazen::context::StateValue;

ctx.set_value("config", StateValue::Json(serde_json::json!({"retries": 3})));
ctx.set_value("blob", StateValue::Bytes(vec![0xDE, 0xAD].into()));
ctx.set_value("py_obj", StateValue::Native(pickle_bytes.into()));

match ctx.get_value("config") {
    Some(StateValue::Json(v)) => { /* structured data */ }
    Some(StateValue::Bytes(b)) => { /* raw bytes */ }
    Some(StateValue::Native(b)) => { /* platform-serialized opaque bytes */ }
    None => { /* key not found */ }
}

StateValue

pub enum StateValue {
    Json(serde_json::Value),
    Bytes(BytesWrapper),
    Native(BytesWrapper),
}
VariantDescription
Json(serde_json::Value)Structured, serializable data. Used by ctx.set() / ctx.get().
Bytes(BytesWrapper)Raw binary data. Used by ctx.set_bytes() / ctx.get_bytes().
Native(BytesWrapper)Platform-serialized opaque objects (e.g., Python pickle bytes). Preserved across language boundaries without deserialization.

Run Identity

ctx.run_id() -> &str

Returns the unique identifier for the current workflow run.

Event Routing

ctx.send_event(event: impl Event)

Programmatically route an event into the workflow. Use this when a step needs to emit multiple events or decide at runtime which path to take. When using send_event, the step returns () instead of an event type.

ctx.write_event_to_stream(event: impl Event)

Publish an event to the workflow’s external event stream, observable by callers via stream_events(). Useful for progress reporting and live updates.

Session References

async fn session_refs_arc(&self) -> Arc<SessionRefRegistry>
async fn clear_session_refs(&self) -> usize
async fn session_pause_policy(&self) -> SessionPausePolicy
MethodDescription
session_refs_arc()Get a clone of the session-ref registry handle for use by language bindings. Bindings install it as a task-local for the duration of a step so platform-native objects (Py<PyAny>, napi::Ref<JsObject>, etc.) carried via event payloads can be resolved by UUID.
clear_session_refs()Drain the session-ref registry. Called on workflow termination by the language bindings to release platform-specific live refs back to their respective garbage collectors. Returns the number of entries removed.
session_pause_policy()Get the configured SessionPausePolicy. The policy is set by WorkflowBuilder::session_pause_policy; there is no public setter on Context.

See the dedicated Session Reference Registry section for background, key types, and the pause-time policy matrix.

State Snapshot and Restore

ctx.collect_events() -> Vec<Box<dyn Event>>
ctx.snapshot_state() -> ContextSnapshot
ctx.restore_state(snapshot: ContextSnapshot)
MethodDescription
collect_events()Drain all pending events from the context.
snapshot_state()Capture the entire context state as a serializable snapshot (for checkpointing / pause-resume).
restore_state(snapshot)Restore context from a previously captured snapshot.

:::caution[Session refs are not snapshotted] Context::snapshot_state intentionally excludes both the opaque objects map and the session_refs registry. Live in-process references cannot survive a cross-process snapshot round-trip. If your workflow may pause() and your bindings use session refs, configure WorkflowBuilder::session_pause_policy to control what happens (default: PickleOrError). :::


BlazenState

BlazenState is a binding-layer concept for Python, Node.js, and WASM. In those languages, extending a BlazenState base class gives you automatic per-field persistence in the workflow context. Rust has no equivalent base class.

In Rust, per-field storage is achieved manually by calling set_value() / get_value() with the StateValue enum (see the StateValue section above). Each field is stored under an explicit key, giving you full control over serialization format and storage variant.

The Native(BytesWrapper) variant exists specifically to support bindings: it lets platform-serialized objects (e.g., Python pickle bytes, Node.js v8.serialize output) round-trip through Rust steps without deserialization. Binding authors use StateValue::Native to store opaque platform objects, and Rust code can forward those values without interpreting their contents.


Session Reference Registry

blazen_core::session_ref provides a per-Context registry of live in-process references — values that cannot or should not be JSON-serialized, such as DB connections, file handles, large in-memory tensors, lambdas, locks, or platform-native objects like Py<PyAny> and napi::Ref<JsObject>.

Each Context owns its own SessionRefRegistry with a lifetime tied to the workflow run. Event payloads carry only a JSON marker containing the key; the actual object lives in the registry until workflow completion. Bindings detect the marker and resolve it through the active registry to preserve object identity across step boundaries without serialisation.

The JSON marker format is:

{"__blazen_session_ref__": "<uuid>"}

The tag string is exposed as a constant:

pub const SESSION_REF_TAG: &str = "__blazen_session_ref__";

A defensive cap protects against runaway loops exhausting memory:

pub const MAX_SESSION_REFS_PER_RUN: usize = 10_000;

insert_arc / insert return SessionRefError::CapacityExceeded once the registry reaches this cap.

RegistryKey

Strongly-typed wrapper around Uuid used as the registry key.

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(transparent)]
pub struct RegistryKey(pub Uuid);
MethodSignatureDescription
new()fn() -> SelfMint a fresh random key.
parse(s)fn(&str) -> Result<Self, uuid::Error>Parse a key from a UUID string.
DisplayFormats as the underlying UUID.

SessionRefRegistry

Per-context registry of live session references. Internally Arc<dyn Any + Send + Sync> keyed by RegistryKey and guarded by a tokio::sync::RwLock.

pub struct SessionRefRegistry { /* ... */ }
MethodSignatureDescription
new()fn() -> SelfCreate an empty registry.
insert_arc()async fn(&self, Arc<dyn Any + Send + Sync>) -> Result<RegistryKey, SessionRefError>Insert a type-erased Arc directly. Returns the freshly minted key or CapacityExceeded.
insert::<T>()async fn(&self, T) -> Result<RegistryKey, SessionRefError>Insert any Any + Send + Sync + 'static value, wrapping it in an Arc for you.
get_any()async fn(&self, RegistryKey) -> Option<Arc<dyn Any + Send + Sync>>Look up the type-erased entry. Bindings call this and downcast.
get::<T>()async fn(&self, RegistryKey) -> Option<Arc<T>>Look up and downcast to a concrete Arc<T>.
remove()async fn(&self, RegistryKey) -> Option<Arc<dyn Any + Send + Sync>>Remove a single entry, returning the removed value if present.
drain()async fn(&self) -> usizeDrain all entries, returning the number removed.
len()async fn(&self) -> usizeNumber of currently live entries.
is_empty()async fn(&self) -> boolWhether the registry has any live entries.
keys()async fn(&self) -> Vec<RegistryKey>Iterate every key currently in the registry. Used by the snapshot walker to apply SessionPausePolicy uniformly.

SessionPausePolicy

Controls what happens to live session references when a workflow is paused or snapshotted.

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum SessionPausePolicy {
    #[default]
    PickleOrError,
    WarnDrop,
    HardError,
}
VariantBehaviour
PickleOrError (default)Attempt to pickle each live ref into the snapshot. On any failure, raise WorkflowError::SessionRefsNotSerializable and abort the snapshot. Recommended.
WarnDropDrop live refs from the snapshot, emit tracing::warn! per drop, and store a diagnostic report in snapshot metadata. On resume, accessing a dropped field raises a clear runtime error from the binding.
HardErrorRefuse to pause if any live refs are in flight. Raises WorkflowError::SessionRefsNotSerializable immediately.

SessionRefError

Error type for registry operations.

#[derive(Debug, thiserror::Error)]
pub enum SessionRefError {
    #[error("session ref registry capacity exceeded ({cap} entries) — too many live references in this workflow run")]
    CapacityExceeded { cap: usize },
}
VariantDescription
CapacityExceeded { cap: usize }Returned when SessionRefRegistry::insert_arc is called while the registry already holds MAX_SESSION_REFS_PER_RUN entries.

Snapshot exclusion

Session-ref entries are deliberately excluded from Context::snapshot_state(), mirroring the existing objects exclusion. Live in-process references cannot meaningfully round-trip through a serialized snapshot, so the snapshot walker applies SessionPausePolicy at pause time instead. See State Snapshot and Restore for the callout.


Workflow

WorkflowBuilder

The builder exposes a fluent API for configuring a workflow before build(). The full set of builder methods is documented in the guides; the entry below covers the session-ref configuration knob added alongside the session reference registry.

WorkflowBuilder::session_pause_policy

pub fn session_pause_policy(mut self, policy: SessionPausePolicy) -> Self

Configures the policy applied to live session references when the workflow is paused or snapshotted. Defaults to PickleOrError. See SessionPausePolicy for the full variant matrix.

Usage:

use blazen_core::{WorkflowBuilder, session_ref::SessionPausePolicy};

let workflow = WorkflowBuilder::new("my-workflow")
    .step(my_step)
    .session_pause_policy(SessionPausePolicy::WarnDrop)
    .build()?;

WorkflowHandler

The handle returned after starting a workflow. Provides await/stream/pause modes for consuming workflow results.

WorkflowHandler::session_refs

#[must_use]
pub fn session_refs(&self) -> Arc<SessionRefRegistry>

Returns a clone of the session-ref registry handle. Bindings call this after result() to resolve any __blazen_session_ref__ markers carried by the final event, ensuring identity-preserving access to live Python / JS objects passed via event payloads.

The returned Arc keeps the registry alive past the event loop’s exit so the final result event can still resolve live-ref markers even after the original Context has been dropped.

WorkflowHandler::result

pub async fn result(&self) -> Result<WorkflowResult>

Await the final workflow result. The returned WorkflowResult has an .event field containing the final event (typically a StopEvent). Use result.event.downcast_ref::<StopEvent>() to access typed data, or result.event.to_json() for a JSON representation.

WorkflowHandler::pause

pub fn pause(&self) -> Result<()>

Signal the workflow to pause. This is a synchronous, non-consuming call — it does not return a snapshot. After pausing, call snapshot() to obtain the serialized state.

WorkflowHandler::snapshot

pub async fn snapshot(&self) -> Result<String>

Capture the workflow’s current state as a JSON string. Typically called after pause(). The snapshot can be persisted and later passed to Workflow::resume().

WorkflowHandler::resume_in_place

pub fn resume_in_place(&self)

Resume a paused workflow in-place, continuing execution from where it left off.

WorkflowHandler::respond_to_input

pub fn respond_to_input(&self, request_id: String, response: serde_json::Value)

Supply a response to a pending InputRequestEvent. The request_id must match the ID from the original request. The workflow will route the response to the appropriate step and continue.

WorkflowHandler::abort

pub async fn abort(&self) -> Result<()>

Abort the running workflow. Any pending steps are cancelled and the workflow terminates with an error.

WorkflowError variants

blazen_core::WorkflowError is the unified workflow error type. The session-ref subsystem introduces one new variant; other variants (e.g. Paused, InputRequired, Other) are documented in the workflow guides.

SessionRefsNotSerializable

#[error("session refs cannot be serialized for snapshot: {keys:?}")]
SessionRefsNotSerializable {
    /// String-formatted UUIDs of the live session refs that could not
    /// be persisted.
    keys: Vec<String>,
}

One or more live session references could not be serialized for a snapshot. The keys vector contains the string-formatted UUIDs of the offending entries. Produced by the default PickleOrError pause policy when a live ref is not picklable, and by HardError whenever any live refs are in flight at pause time.


Compute Platform

The compute module provides a unified trait system for async, job-based media generation providers (fal.ai, Replicate, RunPod, etc.) that model a submit-poll-retrieve workflow for GPU workloads.

ComputeProvider

The base trait for compute providers.

#[async_trait]
pub trait ComputeProvider: Send + Sync {
    fn provider_id(&self) -> &str;

    async fn submit(&self, request: ComputeRequest) -> Result<JobHandle, BlazenError>;

    async fn status(&self, job: &JobHandle) -> Result<JobStatus, BlazenError>;

    async fn result(&self, job: JobHandle) -> Result<ComputeResult, BlazenError>;

    async fn cancel(&self, job: &JobHandle) -> Result<(), BlazenError>;

    // Default: submit then wait for result
    async fn run(&self, request: ComputeRequest) -> Result<ComputeResult, BlazenError> {
        let job = self.submit(request).await?;
        self.result(job).await
    }
}

ImageGeneration

Image generation and upscaling. Requires ComputeProvider as a supertrait.

#[async_trait]
pub trait ImageGeneration: ComputeProvider {
    async fn generate_image(&self, request: ImageRequest) -> Result<ImageResult, BlazenError>;
    async fn upscale_image(&self, request: UpscaleRequest) -> Result<ImageResult, BlazenError>;
}

Usage:

use blazen_llm::compute::{ImageGeneration, ImageRequest};

let result = provider.generate_image(
    ImageRequest::new("a cat in space")
        .with_size(1024, 1024)
        .with_count(2)
        .with_negative_prompt("blurry")
        .with_model("flux-dev"),
).await?;

for image in &result.images {
    println!("url: {:?}, {}x{}", image.media.url, image.width.unwrap_or(0), image.height.unwrap_or(0));
}

VideoGeneration

Video synthesis from text or images. Requires ComputeProvider as a supertrait.

#[async_trait]
pub trait VideoGeneration: ComputeProvider {
    async fn text_to_video(&self, request: VideoRequest) -> Result<VideoResult, BlazenError>;
    async fn image_to_video(&self, request: VideoRequest) -> Result<VideoResult, BlazenError>;
}

Usage:

use blazen_llm::compute::{VideoGeneration, VideoRequest};

// Text-to-video
let result = provider.text_to_video(
    VideoRequest::new("a sunset timelapse")
        .with_duration(5.0)
        .with_size(1920, 1080)
        .with_model("kling"),
).await?;

// Image-to-video
let result = provider.image_to_video(
    VideoRequest::for_image("https://example.com/img.png", "animate this scene")
        .with_duration(3.0),
).await?;

AudioGeneration

Audio synthesis including TTS, music, and sound effects. Requires ComputeProvider as a supertrait.

#[async_trait]
pub trait AudioGeneration: ComputeProvider {
    async fn text_to_speech(&self, request: SpeechRequest) -> Result<AudioResult, BlazenError>;

    // Default: returns BlazenError::Unsupported
    async fn generate_music(&self, request: MusicRequest) -> Result<AudioResult, BlazenError>;

    // Default: returns BlazenError::Unsupported
    async fn generate_sfx(&self, request: MusicRequest) -> Result<AudioResult, BlazenError>;
}

generate_music() and generate_sfx() have default implementations that return BlazenError::Unsupported. Providers override only the methods they support.

Usage:

use blazen_llm::compute::{AudioGeneration, SpeechRequest, MusicRequest};

let speech = provider.text_to_speech(
    SpeechRequest::new("Hello world")
        .with_voice("alloy")
        .with_language("en")
        .with_speed(1.0)
        .with_voice_url("https://example.com/voice.wav") // voice cloning
        .with_model("tts-1"),
).await?;

let music = provider.generate_music(
    MusicRequest::new("upbeat jazz")
        .with_duration(30.0)
        .with_model("musicgen"),
).await?;

Transcription

Audio transcription (speech-to-text). Requires ComputeProvider as a supertrait.

#[async_trait]
pub trait Transcription: ComputeProvider {
    async fn transcribe(
        &self,
        request: TranscriptionRequest,
    ) -> Result<TranscriptionResult, BlazenError>;
}

Usage:

use blazen_llm::compute::{Transcription, TranscriptionRequest};

let result = provider.transcribe(
    TranscriptionRequest::new("https://example.com/audio.mp3")
        .with_language("en")
        .with_diarize(true)
        .with_model("whisper-v3"),
).await?;

println!("Full text: {}", result.text);
for segment in &result.segments {
    println!("[{:.1}s - {:.1}s] {}: {}",
        segment.start, segment.end,
        segment.speaker.as_deref().unwrap_or("?"),
        segment.text,
    );
}

ThreeDGeneration

3D model generation from text or images. Requires ComputeProvider as a supertrait.

#[async_trait]
pub trait ThreeDGeneration: ComputeProvider {
    async fn generate_3d(&self, request: ThreeDRequest) -> Result<ThreeDResult, BlazenError>;
}

Usage:

use blazen_llm::compute::{ThreeDGeneration, ThreeDRequest};

// Text-to-3D
let result = provider.generate_3d(
    ThreeDRequest::new("a 3D cat")
        .with_format("glb")
        .with_model("triposr"),
).await?;

// Image-to-3D
let result = provider.generate_3d(
    ThreeDRequest::from_image("https://example.com/cat.png")
        .with_format("obj"),
).await?;

for model_3d in &result.models {
    println!("vertices: {:?}, faces: {:?}, textures: {}, animations: {}",
        model_3d.vertex_count, model_3d.face_count,
        model_3d.has_textures, model_3d.has_animations,
    );
}

Compute Request Types

ImageRequest

FieldTypeDescription
promptStringText prompt describing the desired image
negative_promptOption<String>Things to avoid in the image
widthOption<u32>Desired width in pixels
heightOption<u32>Desired height in pixels
num_imagesOption<u32>Number of images to generate
modelOption<String>Model override
parametersserde_json::ValueAdditional provider-specific parameters

Builder: ImageRequest::new(prompt).with_size(w, h).with_count(n).with_negative_prompt(p).with_model(m)

UpscaleRequest

FieldTypeDescription
image_urlStringURL of the image to upscale
scalef32Scale factor (e.g. 2.0, 4.0)
modelOption<String>Model override
parametersserde_json::ValueAdditional provider-specific parameters

Builder: UpscaleRequest::new(url, scale).with_model(m)

VideoRequest

FieldTypeDescription
promptStringText prompt
image_urlOption<String>Source image for image-to-video
duration_secondsOption<f32>Desired duration in seconds
negative_promptOption<String>Things to avoid
widthOption<u32>Desired width in pixels
heightOption<u32>Desired height in pixels
modelOption<String>Model override
parametersserde_json::ValueAdditional provider-specific parameters

Builder: VideoRequest::new(prompt) or VideoRequest::for_image(url, prompt), then .with_duration(s).with_size(w, h).with_model(m)

SpeechRequest

FieldTypeDescription
textStringText to synthesize
voiceOption<String>Voice identifier (provider-specific)
voice_urlOption<String>Reference voice URL for voice cloning
languageOption<String>Language code (e.g. "en", "fr")
speedOption<f32>Speed multiplier (1.0 = normal)
modelOption<String>Model override
parametersserde_json::ValueAdditional provider-specific parameters

Builder: SpeechRequest::new(text).with_voice(v).with_voice_url(url).with_language(l).with_speed(s).with_model(m)

MusicRequest

FieldTypeDescription
promptStringText prompt
duration_secondsOption<f32>Desired duration in seconds
modelOption<String>Model override
parametersserde_json::ValueAdditional provider-specific parameters

Builder: MusicRequest::new(prompt).with_duration(s).with_model(m)

TranscriptionRequest

FieldTypeDescription
audio_urlStringURL of the audio file
languageOption<String>Language hint
diarizeboolEnable speaker diarization (default: false)
modelOption<String>Model override
parametersserde_json::ValueAdditional provider-specific parameters

Builder: TranscriptionRequest::new(url).with_language(l).with_diarize(true).with_model(m)

ThreeDRequest

FieldTypeDescription
promptStringText prompt
image_urlOption<String>Source image for image-to-3D
formatOption<String>Output format (e.g. "glb", "obj", "usdz")
modelOption<String>Model override
parametersserde_json::ValueAdditional provider-specific parameters

Builder: ThreeDRequest::new(prompt) or ThreeDRequest::from_image(url), then .with_format(f).with_model(m)


Compute Result Types

ImageResult

FieldTypeDescription
imagesVec<GeneratedImage>The generated/upscaled images
timingRequestTimingRequest timing breakdown
costOption<f64>Cost in USD
metadataserde_json::ValueProvider-specific metadata

VideoResult

FieldTypeDescription
videosVec<GeneratedVideo>The generated videos
timingRequestTimingRequest timing breakdown
costOption<f64>Cost in USD
metadataserde_json::ValueProvider-specific metadata

AudioResult

FieldTypeDescription
audioVec<GeneratedAudio>The generated audio clips
timingRequestTimingRequest timing breakdown
costOption<f64>Cost in USD
metadataserde_json::ValueProvider-specific metadata

ThreeDResult

FieldTypeDescription
modelsVec<Generated3DModel>The generated 3D models
timingRequestTimingRequest timing breakdown
costOption<f64>Cost in USD
metadataserde_json::ValueProvider-specific metadata

TranscriptionResult

FieldTypeDescription
textStringFull transcribed text
segmentsVec<TranscriptionSegment>Time-aligned segments
languageOption<String>Detected/specified language code
timingRequestTimingRequest timing breakdown
costOption<f64>Cost in USD
metadataserde_json::ValueProvider-specific metadata

TranscriptionSegment

FieldTypeDescription
textStringTranscribed text for this segment
startf64Start time in seconds
endf64End time in seconds
speakerOption<String>Speaker label (if diarization was enabled)

Compute Job Types

ComputeRequest

FieldTypeDescription
modelStringModel/endpoint to run (e.g. "fal-ai/flux/dev")
inputserde_json::ValueInput parameters as JSON (model-specific)
webhookOption<String>Webhook URL for async completion notification

ComputeResult

FieldTypeDescription
jobOption<JobHandle>The job handle that produced this result
outputserde_json::ValueOutput data (model-specific JSON)
timingRequestTimingRequest timing breakdown
costOption<f64>Cost in USD
metadataserde_json::ValueProvider-specific metadata

JobHandle

FieldTypeDescription
idStringProvider-assigned job identifier
providerStringProvider name (e.g. "fal")
modelStringModel/endpoint that was invoked
submitted_atDateTime<Utc>When the job was submitted

JobStatus

pub enum JobStatus {
    Queued,
    Running,
    Completed,
    Failed { error: String },
    Cancelled,
}

Media

MediaType

Exhaustive enumeration of media formats with detection support. Covers images, video, audio, 3D models, documents, and a catch-all Other variant.

Variants:

CategoryVariants
ImagePng, Jpeg, WebP, Gif, Svg, Bmp, Tiff, Avif, Ico
VideoMp4, WebM, Mov, Avi, Mkv
AudioMp3, Wav, Ogg, Flac, Aac, M4a, WebmAudio
3DGlb, Gltf, Obj, Fbx, Usdz, Stl, Ply
DocumentPdf
Catch-allOther { mime: String }

Methods:

MethodSignatureDescription
mime()&self -> &strReturn the MIME type string
extension()&self -> &strReturn the canonical file extension (no dot)
magic_bytes()&self -> Option<&'static [u8]>Return the magic byte signature, if any
detect(bytes)fn(&[u8]) -> Option<Self>Detect media type from file header bytes
from_mime(mime)fn(&str) -> SelfParse a MIME string (unknown = Other)
from_extension(ext)fn(&str) -> SelfParse a file extension (unknown = Other)
is_image()&self -> boolIs this an image format?
is_video()&self -> boolIs this a video format?
is_audio()&self -> boolIs this an audio format?
is_3d()&self -> boolIs this a 3D model format?
is_vector()&self -> boolIs this a text-based format (SVG, GLTF, OBJ)?

MediaType implements Display (outputs the MIME string).

Example:

use blazen_llm::MediaType;

let mt = MediaType::from_extension("png");
assert_eq!(mt.mime(), "image/png");
assert!(mt.is_image());

// Detect from raw bytes
let bytes = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];
assert_eq!(MediaType::detect(&bytes), Some(MediaType::Png));

MediaOutput

A single piece of generated media content. At least one of url, base64, or raw_content will be populated.

FieldTypeDescription
urlOption<String>URL where the media can be downloaded
base64Option<String>Base64-encoded media data
raw_contentOption<String>Raw text content (SVG, OBJ, GLTF JSON)
media_typeMediaTypeFormat of the media
file_sizeOption<u64>File size in bytes
metadataserde_json::ValueProvider-specific metadata

Constructors:

let output = MediaOutput::from_url("https://example.com/img.png", MediaType::Png);
let output = MediaOutput::from_base64("iVBORw0KGgo=", MediaType::Png);

GeneratedImage

FieldTypeDescription
mediaMediaOutputThe image media output
widthOption<u32>Width in pixels
heightOption<u32>Height in pixels

GeneratedVideo

FieldTypeDescription
mediaMediaOutputThe video media output
widthOption<u32>Width in pixels
heightOption<u32>Height in pixels
duration_secondsOption<f32>Duration in seconds
fpsOption<f32>Frames per second

GeneratedAudio

FieldTypeDescription
mediaMediaOutputThe audio media output
duration_secondsOption<f32>Duration in seconds
sample_rateOption<u32>Sample rate in Hz
channelsOption<u8>Number of audio channels

Generated3DModel

FieldTypeDescription
mediaMediaOutputThe 3D model media output
vertex_countOption<u64>Total vertex count
face_countOption<u64>Total face/triangle count
has_texturesboolWhether the model includes textures
has_animationsboolWhether the model includes animations

LocalModel Trait

The LocalModel trait provides explicit load/unload lifecycle management for models running in-process (llama.cpp, whisper.cpp, etc.). Remote API providers do not implement this trait.

#[async_trait]
pub trait LocalModel: Send + Sync {
    async fn load(&self) -> Result<(), BlazenError>;
    async fn unload(&self) -> Result<(), BlazenError>;
    async fn is_loaded(&self) -> bool;
    fn device(&self) -> blazen_llm::Device { Device::Cpu }
    async fn memory_bytes(&self) -> Option<u64>;
}
MethodDescription
load()Load the model into memory/VRAM. Idempotent.
unload()Free the model’s memory/VRAM. Idempotent.
is_loaded()Whether the model is currently loaded.
device()Which device the model targets. Determines which pool the ModelManager charges this model against. Defaults to Device::Cpu.
memory_bytes()Approximate memory footprint in bytes (host RAM if device() is Device::Cpu, GPU VRAM otherwise), or None if unknown.

A type can implement both CompletionModel and LocalModel:

struct MyLocalLLM { /* ... */ }

#[async_trait::async_trait]
impl CompletionModel for MyLocalLLM {
    fn model_id(&self) -> &str { "my-local-llm" }
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse, BlazenError> {
        self.load().await?; // auto-load on first call
        // inference logic
        todo!()
    }
    async fn stream(&self, request: CompletionRequest) -> Result<Pin<Box<dyn Stream<Item = Result<StreamChunk, BlazenError>> + Send>>, BlazenError> {
        todo!()
    }
}

#[async_trait::async_trait]
impl LocalModel for MyLocalLLM {
    async fn load(&self) -> Result<(), BlazenError> { /* load weights */ Ok(()) }
    async fn unload(&self) -> Result<(), BlazenError> { /* free VRAM */ Ok(()) }
    async fn is_loaded(&self) -> bool { true }
    fn device(&self) -> Device { Device::Cuda(0) }
    async fn memory_bytes(&self) -> Option<u64> { Some(4_000_000_000) }
}

ModelManager

:::caution[Capacity, not performance] ModelManager is a memory budget bookkeeper, not a performance scheduler. It answers “will this fit?” — not “will this run fast?”. Whether a 70B model loaded on CPU is useful at 1–3 tok/s is a workload-choice question the manager intentionally does not answer. :::

Per-pool memory budget-aware model manager with LRU eviction. Tracks registered LocalModel instances and their estimated memory footprint, indexed by the Pool each model targets (Pool::Cpu or Pool::Gpu(index)). When loading a model would exceed that pool’s budget, the least-recently-used loaded model in the same pool is unloaded first. Models in different pools never evict each other.

use std::collections::HashMap;
use std::sync::Arc;
use blazen_llm::Pool;
use blazen_manager::ModelManager;

// Common case: CPU RAM + GPU(0) VRAM budgets in GB.
let manager = ModelManager::with_budgets_gb(64.0, 24.0);

// Explicit per-pool budgets in bytes.
let manager = ModelManager::new(HashMap::from([
    (Pool::Cpu, 64 * 1024 * 1024 * 1024),
    (Pool::Gpu(0), 24 * 1024 * 1024 * 1024),
]));

Methods

MethodSignatureDescription
newfn(pool_budgets: HashMap<Pool, u64>) -> SelfCreate a manager with explicit per-pool byte budgets.
with_budgets_gbfn(cpu_ram_gb: f64, gpu_vram_gb: f64) -> SelfConvenience constructor for Pool::Cpu + Pool::Gpu(0) budgets in GB.
registerasync fn(&self, id: &str, model: Arc<dyn LocalModel>, memory_estimate_bytes: u64)Register a model. Starts unloaded. The model’s device() selects the pool.
loadasync fn(&self, id: &str) -> Result<()>Load a model, evicting LRU models within the same pool if needed.
unloadasync fn(&self, id: &str) -> Result<()>Unload a model and free its memory.
is_loadedasync fn(&self, id: &str) -> boolCheck if a model is currently loaded.
ensure_loadedasync fn(&self, id: &str) -> Result<()>Alias for load().
used_bytesasync fn(&self, pool: Pool) -> u64Bytes currently used by loaded models in the given pool.
available_bytesasync fn(&self, pool: Pool) -> u64Bytes available within the given pool’s budget.
poolsfn(&self) -> Vec<(Pool, u64)>All configured pools and their budgets.
statusasync fn(&self) -> Vec<ModelStatus>Status of all registered models.

ModelStatus

pub struct ModelStatus {
    pub id: String,
    pub loaded: bool,
    pub memory_estimate_bytes: u64,
    pub pool: Pool,
}
FieldTypeDescription
idStringModel identifier.
loadedboolWhether the model is currently loaded.
memory_estimate_bytesu64Estimated memory footprint in bytes (interpreted against pool).
poolPoolWhich pool this model is charged against (Pool::Cpu or Pool::Gpu(N)).

Usage:

use std::sync::Arc;
use blazen_llm::Pool;
use blazen_manager::ModelManager;

let manager = ModelManager::with_budgets_gb(64.0, 24.0);
manager.register("llama", Arc::new(my_llama_model), 8 * 1_073_741_824).await;
manager.register("whisper", Arc::new(my_whisper_model), 2 * 1_073_741_824).await;

manager.load("llama").await?;
assert!(manager.is_loaded("llama").await);
println!("GPU used:      {} bytes", manager.used_bytes(Pool::Gpu(0)).await);
println!("GPU available: {} bytes", manager.available_bytes(Pool::Gpu(0)).await);

// Loading whisper may evict llama if the GPU pool is tight (only if both target Gpu(0))
manager.load("whisper").await?;
manager.unload("llama").await?;

Pricing

The pricing module provides a global thread-safe registry of per-model pricing data, pre-seeded with defaults for well-known models.

PricingEntry

FieldTypeDescription
input_per_millionf64Cost per million input tokens (USD).
output_per_millionf64Cost per million output tokens (USD).

register_pricing()

Register or override pricing for a model. Model IDs are normalized before storage.

use blazen_llm::{register_pricing, PricingEntry};

register_pricing("my-model", PricingEntry {
    input_per_million: 1.0,
    output_per_million: 2.0,
});

lookup_pricing()

Look up pricing for a model by ID. Returns None if the model is unknown.

use blazen_llm::lookup_pricing;

if let Some(entry) = lookup_pricing("gpt-4o") {
    println!("Input: ${}/M tokens", entry.input_per_million);
}

compute_cost()

Compute the cost of a request given a model ID and token usage.

use blazen_llm::{compute_cost, TokenUsage};

let usage = TokenUsage { prompt_tokens: 1000, completion_tokens: 500, total_tokens: 1500 };
if let Some(cost) = compute_cost("gpt-4o", &usage) {
    println!("Cost: ${:.4}", cost);
}

MemoryBackend Trait

Low-level storage backend used by Memory. Backends are responsible for persistence and band-based candidate retrieval. They do not perform embedding or ELID encoding.

#[async_trait]
pub trait MemoryBackend: Send + Sync {
    async fn put(&self, entry: StoredEntry) -> Result<()>;
    async fn get(&self, id: &str) -> Result<Option<StoredEntry>>;
    async fn delete(&self, id: &str) -> Result<bool>;
    async fn list(&self) -> Result<Vec<StoredEntry>>;
    async fn len(&self) -> Result<usize>;
    async fn is_empty(&self) -> Result<bool>; // default: self.len() == 0
    async fn search_by_bands(
        &self,
        bands: &[u64],
        limit: usize,
    ) -> Result<Vec<StoredEntry>>;
}

Implementing a Custom Backend

use blazen_memory::store::{MemoryBackend, StoredEntry};
use anyhow::Result;

struct PostgresBackend {
    pool: sqlx::PgPool,
}

#[async_trait::async_trait]
impl MemoryBackend for PostgresBackend {
    async fn put(&self, entry: StoredEntry) -> Result<()> {
        // INSERT or UPDATE in Postgres
        todo!()
    }

    async fn get(&self, id: &str) -> Result<Option<StoredEntry>> {
        // SELECT by id
        todo!()
    }

    async fn delete(&self, id: &str) -> Result<bool> {
        // DELETE by id, return true if row existed
        todo!()
    }

    async fn list(&self) -> Result<Vec<StoredEntry>> {
        // SELECT all
        todo!()
    }

    async fn len(&self) -> Result<usize> {
        // SELECT COUNT(*)
        todo!()
    }

    async fn search_by_bands(
        &self,
        bands: &[u64],
        limit: usize,
    ) -> Result<Vec<StoredEntry>> {
        // Query entries sharing at least one band
        todo!()
    }
}

Built-in Backends

BackendDescription
InMemoryBackendIn-process HashMap storage. Fast, no persistence.
JsonlBackendAppend-only JSONL file storage.
ValkeyBackendRedis/Valkey-backed storage.

Error Handling

BlazenError

The unified error type for all Blazen LLM and compute operations.

VariantFieldsDescription
Authmessage: StringAuthentication failed
RateLimitretry_after_ms: Option<u64>Rate limited by the provider
Timeoutelapsed_ms: u64Request timed out
Providerprovider: String, message: String, status_code: Option<u16>Provider-specific error
Validationfield: Option<String>, message: StringInvalid input
ContentPolicymessage: StringContent policy violation
Unsupportedmessage: StringRequested capability is not supported
SerializationStringJSON serialization/deserialization error
Requestmessage: String, source: Option<Box<dyn Error>>Network or request-level failure
CompletionCompletionErrorKindLLM completion-specific error
ComputeComputeErrorKindCompute job-specific error
MediaMediaErrorKindMedia-specific error
Toolname: Option<String>, message: StringTool execution error

CompletionErrorKind

VariantDescription
NoContentModel returned no content
ModelNotFound(String)Model not found
InvalidResponse(String)Invalid response from the model
Stream(String)Streaming error

ComputeErrorKind

VariantFieldsDescription
JobFailedmessage: String, error_type: Option<String>, retryable: boolCompute job failed
CancelledJob was cancelled
QuotaExceededmessage: StringProvider quota exceeded

MediaErrorKind

VariantFieldsDescription
Invalidmedia_type: Option<String>, message: StringInvalid media
TooLargesize_bytes: u64, max_bytes: u64Media exceeds size limit

is_retryable()

impl BlazenError {
    pub fn is_retryable(&self) -> bool;
}

Returns true for RateLimit, Timeout, Request, provider errors with status >= 500, and ComputeErrorKind::JobFailed where retryable is true.

Convenience Constructors

BlazenError::auth("invalid api key")
BlazenError::timeout(5000)
BlazenError::timeout_from_duration(elapsed)
BlazenError::request("connection reset")
BlazenError::unsupported("music generation not available")
BlazenError::provider("openai", "internal server error")
BlazenError::validation("prompt must not be empty")
BlazenError::tool_error("unknown tool: foo")
BlazenError::no_content()
BlazenError::model_not_found("gpt-5")
BlazenError::invalid_response("missing content field")
BlazenError::stream_error("unexpected EOF")
BlazenError::job_failed("GPU out of memory")
BlazenError::cancelled()

BlazenError also implements From<serde_json::Error> for automatic conversion.


Custom Providers

Implementing CompletionModel

use blazen_llm::{
    CompletionModel, CompletionRequest, CompletionResponse, StreamChunk, BlazenError,
};
use std::pin::Pin;
use futures_util::Stream;

struct MyProvider {
    api_key: String,
}

#[async_trait::async_trait]
impl CompletionModel for MyProvider {
    fn model_id(&self) -> &str {
        "my-custom-model"
    }

    async fn complete(
        &self,
        request: CompletionRequest,
    ) -> Result<CompletionResponse, BlazenError> {
        // Your HTTP/gRPC/local inference logic here
        todo!()
    }

    async fn stream(
        &self,
        request: CompletionRequest,
    ) -> Result<
        Pin<Box<dyn Stream<Item = Result<StreamChunk, BlazenError>> + Send>>,
        BlazenError,
    > {
        // Your streaming implementation here
        todo!()
    }
}

Once implemented, MyProvider automatically gets StructuredOutput via the blanket impl, so model.extract::<T>(messages) works out of the box.

Implementing ComputeProvider + ImageGeneration

use blazen_llm::compute::*;
use blazen_llm::BlazenError;

struct MyImageProvider {
    api_key: String,
}

#[async_trait::async_trait]
impl ComputeProvider for MyImageProvider {
    fn provider_id(&self) -> &str { "my-image-provider" }

    async fn submit(&self, request: ComputeRequest) -> Result<JobHandle, BlazenError> {
        todo!()
    }

    async fn status(&self, job: &JobHandle) -> Result<JobStatus, BlazenError> {
        todo!()
    }

    async fn result(&self, job: JobHandle) -> Result<ComputeResult, BlazenError> {
        todo!()
    }

    async fn cancel(&self, job: &JobHandle) -> Result<(), BlazenError> {
        todo!()
    }
}

#[async_trait::async_trait]
impl ImageGeneration for MyImageProvider {
    async fn generate_image(
        &self,
        request: ImageRequest,
    ) -> Result<ImageResult, BlazenError> {
        // Convert ImageRequest to your provider's format and call the API
        todo!()
    }

    async fn upscale_image(
        &self,
        request: UpscaleRequest,
    ) -> Result<ImageResult, BlazenError> {
        todo!()
    }
}

Built-in Providers

ProviderFeatureTraits Implemented
OpenAiProvideropenaiCompletionModel, StructuredOutput
OpenAiCompatProvideropenaiCompletionModel, StructuredOutput, ModelRegistry
AnthropicProvideranthropicCompletionModel, StructuredOutput
GeminiProvidergeminiCompletionModel, StructuredOutput, ModelRegistry
AzureOpenAiProviderazureCompletionModel, StructuredOutput
FalProviderfalCompletionModel, StructuredOutput, ComputeProvider, ImageGeneration, VideoGeneration, AudioGeneration, Transcription

OpenAiCompatProvider Presets

OpenAiCompatProvider works with any OpenAI-compatible endpoint. Named constructors are provided for popular services:

use blazen_llm::providers::openai_compat::OpenAiCompatProvider;

let groq = OpenAiCompatProvider::groq("gsk-...");
let openrouter = OpenAiCompatProvider::openrouter("sk-or-...");
let together = OpenAiCompatProvider::together("...");
let mistral = OpenAiCompatProvider::mistral("...");
let deepseek = OpenAiCompatProvider::deepseek("...");
let fireworks = OpenAiCompatProvider::fireworks("...");
let perplexity = OpenAiCompatProvider::perplexity("...");
let xai = OpenAiCompatProvider::xai("...");
let cohere = OpenAiCompatProvider::cohere("...");
let bedrock = OpenAiCompatProvider::bedrock("...", "us-east-1");

Telemetry Exporters

Re-exported from blazen_telemetry at the crate root and gated by the corresponding Cargo features (see Feature Flags). All exporters return a tracing_subscriber::Layer (or install one globally) that is composed into a tracing_subscriber::registry().

LangfuseConfig

Builder-style configuration for the Langfuse exporter. Shipping langfuse pulls in reqwest for the native ingestion client; on wasm32 the dispatcher is a no-op (events are dropped) because Langfuse export is a native-target feature.

MethodSignatureDescription
newfn new(public_key: impl Into<String>, secret_key: impl Into<String>) -> SelfConstruct with the required Langfuse public + secret keys. Defaults host to https://cloud.langfuse.com, batch size to 100, flush interval to 5000 ms
with_hostfn with_host(self, host: impl Into<String>) -> SelfOverride the Langfuse host URL (e.g. https://eu.langfuse.com)
with_batch_sizefn with_batch_size(self, batch_size: usize) -> SelfMaximum number of envelopes buffered before an automatic flush
with_flush_interval_msfn with_flush_interval_ms(self, ms: u64) -> SelfBackground flush cadence in milliseconds

LangfuseConfig derives Debug, Clone, Serialize, Deserialize, and Default so it can be loaded from configuration files. Public/secret-key fields are required; host, batch_size, and flush_interval_ms are populated with defaults during deserialization.

LangfuseLayer

A tracing_subscriber::Layer<S> (where S: Subscriber + for<'a> LookupSpan<'a>) that maps Blazen spans to Langfuse ingestion events:

Blazen span nameLangfuse conceptIngestion event
workflow.run, pipeline.runTracetrace-create
workflow.step, pipeline.stage, pipeline.stage.sequential, pipeline.stage.parallelSpanspan-create
llm.complete, llm.streamGenerationgeneration-create

Construct via init_langfuse. The layer owns an unbounded mpsc sender into a background dispatcher that batches and POSTs to {host}/api/public/ingestion with HTTP basic auth (public_key:secret_key).

init_langfuse

pub fn init_langfuse(config: LangfuseConfig) -> Result<LangfuseLayer, TelemetryError>;

Builds the HTTP client and spawns the background dispatcher on the current Tokio runtime, then returns a LangfuseLayer ready to compose into a subscriber registry. Returns TelemetryError::Langfuse if no Tokio runtime is available (native targets) or if the underlying reqwest::Client cannot be built.

use blazen_telemetry::{LangfuseConfig, init_langfuse};
use tracing_subscriber::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = LangfuseConfig::new("pk-lf-...", "sk-lf-...")
        .with_host("https://cloud.langfuse.com")
        .with_batch_size(50)
        .with_flush_interval_ms(2_500);

    let layer = init_langfuse(config)?;
    tracing_subscriber::registry().with(layer).init();
    Ok(())
}

OtlpConfig

Shared configuration for both OTLP transports.

FieldTypeDescription
endpointStringOTLP endpoint URL. For otlp (gRPC): http://localhost:4317. For otlp-http: http://localhost:4318/v1/traces
service_nameStringReported as the service.name resource attribute on every span

Derives Debug, Clone, Serialize, Deserialize.

init_otlp (gRPC, otlp feature, native only)

pub fn init_otlp(config: OtlpConfig) -> Result<(), Box<dyn std::error::Error>>;

Builds a tonic-backed SpanExporter, installs it into a global SdkTracerProvider, and registers a tracing_subscriber::registry() with EnvFilter, the OTel layer, and a fmt layer in one call. Native targets only — opentelemetry-otlp/grpc-tonic does not compile to wasm32-unknown-unknown.

init_otlp_http (HTTP/protobuf, otlp-http feature)

pub fn init_otlp_http(config: OtlpConfig) -> Result<(), Box<dyn std::error::Error>>;

Same shape as init_otlp but speaks OTLP/HTTP with the binary protobuf encoding. Works on both native and wasm32:

  • Native: registers an internal ReqwestHttpClient (a thin reqwest::Client wrapper) via with_http_client.
  • wasm32: registers WasmFetchHttpClient, a web_sys::fetch-backed opentelemetry_http::HttpClient impl that ships in the blazen_telemetry::exporters::wasm_otlp_client module.

The reason for the indirection: opentelemetry-otlp’s built-in reqwest-client / reqwest-blocking-client features compile a wasm32 reqwest::Client whose send future is !Send, which violates the HttpClient: Send + Sync bound and breaks the build. Pinning our own HttpClient impls per target keeps the trait satisfied without enabling those upstream features.

use blazen_telemetry::{OtlpConfig, init_otlp_http};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    init_otlp_http(OtlpConfig {
        endpoint: "http://localhost:4318/v1/traces".to_string(),
        service_name: "my-blazen-app".to_string(),
    })?;
    // ... run workflow; spans flush via OTLP/HTTP ...
    Ok(())
}

TelemetryError

Returned by init_langfuse. Variants include Langfuse(String) (HTTP-client construction or runtime-lookup failure). Re-exported from blazen_telemetry::TelemetryError.


WASM Embeddings (blazen-embed-tract)

The blazen-embed-tract crate ships two ONNX-runtime-backed embedding providers built on tract:

TypeTargetSource of weights
TractEmbedModelnativehf-hub (Hugging Face download cache, filesystem-backed)
WasmTractEmbedModelwasm32 onlyweb_sys::fetch (URLs supplied by the caller)

The wasm variant exists because hf-hub requires a filesystem and Tokio, neither of which is available in wasm32-unknown-unknown. Both providers share the same TractOptions and pooling logic so callers generic over EmbeddingModel work unchanged.

WasmTractEmbedModel

#[cfg(target_arch = "wasm32")]
use blazen_embed_tract::wasm_provider::{WasmTractEmbedModel, WasmTractError, WasmTractResponse};
use blazen_embed_tract::options::TractOptions;
MethodSignatureDescription
createasync fn create(model_url: &str, tokenizer_url: &str, options: TractOptions) -> Result<Self, WasmTractError>Fetch ONNX weights and a HuggingFace tokenizer.json from the supplied URLs and build a runnable model. options.cache_dir and options.show_download_progress are ignored on wasm32
embedasync fn embed(&self, texts: &[String]) -> Result<WasmTractResponse, WasmTractError>Returns one L2-normalized vector per input. Inference runs synchronously on the wasm main thread
model_idfn model_id(&self) -> &strHugging Face model id resolved from the TractOptions::model_name registry entry
dimensionsfn dimensions(&self) -> usizeOutput embedding dimensionality

WasmTractResponse exposes embeddings: Vec<Vec<f32>> and model: String. WasmTractError variants: UnknownModel(String), Fetch { url, message }, Init(String), Embed(String).

#[cfg(target_arch = "wasm32")]
{
    use blazen_embed_tract::options::TractOptions;
    use blazen_embed_tract::wasm_provider::WasmTractEmbedModel;

    let opts = TractOptions {
        model_name: Some("bge-small-en-v1.5".to_string()),
        ..Default::default()
    };
    let model = WasmTractEmbedModel::create(
        "https://huggingface.co/Xenova/bge-small-en-v1.5/resolve/main/onnx/model.onnx",
        "https://huggingface.co/Xenova/bge-small-en-v1.5/resolve/main/tokenizer.json",
        opts,
    ).await?;
    let out = model.embed(&["hello world".into()]).await?;
}

The fetch loop resolves either window.fetch (browser) or globalThis.fetch (Workers, Deno, Node) so the same provider runs in every wasm host.

Send + Sync are implemented vacuously (unsafe impl) because wasm32-unknown-unknown is single-threaded; this lets WasmTractEmbedModel sit behind Arc<dyn EmbeddingModel> in target-generic code.


Local Inference Backend Re-exports

When you enable a local-inference feature on blazen-llm, the per-backend crate’s public types are re-exported at the blazen_llm crate root so callers do not need to depend on the backend crate directly. Each group of types is gated by its respective feature.

mistralrs feature

Re-exported un-prefixed (the mistralrs backend was the first local provider added and owns the canonical names):

#[cfg(feature = "mistralrs")]
use blazen_llm::{
    MistralRsProvider, MistralRsOptions, MistralRsError,
    ChatMessageInput, ChatRole,
    InferenceChunk, InferenceChunkStream,
    InferenceImage, InferenceImageSource,
    InferenceResult, InferenceToolCall, InferenceUsage,
};

llamacpp feature

Re-exported with a LlamaCpp prefix to avoid colliding with the mistralrs names when both features are enabled simultaneously:

#[cfg(feature = "llamacpp")]
use blazen_llm::{
    LlamaCppProvider, LlamaCppOptions, LlamaCppError,
    LlamaCppChatMessageInput, LlamaCppChatRole,
    LlamaCppInferenceChunk, LlamaCppInferenceChunkStream,
    LlamaCppInferenceResult, LlamaCppInferenceUsage,
};

candle-llm feature

Candle exposes its result type as CandleInferenceResult (already prefixed upstream). The CandleLlmCompletionModel adapter wraps the raw provider in the CompletionModel trait:

#[cfg(feature = "candle-llm")]
use blazen_llm::{
    CandleLlmProvider, CandleLlmCompletionModel,
    CandleLlmOptions, CandleLlmError,
    CandleInferenceResult,
};

Other local backends

FeatureRe-exports
candle-embedCandleEmbedModel, CandleEmbedOptions, CandleEmbedError
embedEmbedModel, EmbedOptions, EmbedResponse, EmbedError (from blazen-embed)
whispercppWhisperCppProvider, WhisperModel, WhisperOptions, WhisperError
piperPiperProvider, PiperOptions, PiperError
diffusionDiffusionProvider, DiffusionOptions, DiffusionScheduler, DiffusionError

All five additional backends follow the same convention — enable the feature on blazen-llm, then import directly from blazen_llm::*.