Rust API Reference

Complete API reference for blazen-llm in Rust

Feature Flags

blazen-llm providers:

Feature	Description
`openai`	Enables `OpenAiProvider` and `OpenAiCompatProvider` (covers OpenRouter, Groq, Together, Mistral, DeepSeek, Fireworks, Perplexity, xAI, Cohere, Bedrock)
`anthropic`	Enables `AnthropicProvider`
`gemini`	Enables `GeminiProvider`
`fal`	Enables `FalProvider` (compute: image, video, audio, 3D)
`azure`	Enables `AzureOpenAiProvider`
`all-providers`	Enables all provider implementations

blazen-llm local-inference backends (each gated behind its own feature, all bundled in the all-local umbrella):

Feature	Re-exports from `blazen_llm::*`
`mistralrs`	`MistralRsProvider`, `ChatMessageInput`, `ChatRole`, `InferenceChunk`, `InferenceChunkStream`, `InferenceImage`, `InferenceImageSource`, `InferenceResult`, `InferenceToolCall`, `InferenceUsage`, `MistralRsError`, `MistralRsOptions`
`llamacpp`	`LlamaCppProvider`, `LlamaCppChatMessageInput`, `LlamaCppChatRole`, `LlamaCppInferenceChunk`, `LlamaCppInferenceChunkStream`, `LlamaCppInferenceResult`, `LlamaCppInferenceUsage`, `LlamaCppError`, `LlamaCppOptions`
`candle-llm`	`CandleLlmProvider`, `CandleLlmCompletionModel`, `CandleInferenceResult`, `CandleLlmError`, `CandleLlmOptions`
`candle-embed`	`CandleEmbedModel`, `CandleEmbedOptions`, `CandleEmbedError`
`embed`	`EmbedModel`, `EmbedOptions`, `EmbedResponse`, `EmbedError`
`whispercpp`	`WhisperCppProvider`, `WhisperModel`, `WhisperOptions`, `WhisperError`
`piper`	`PiperProvider`, `PiperOptions`, `PiperError`
`diffusion`	`DiffusionProvider`, `DiffusionOptions`, `DiffusionScheduler`, `DiffusionError`

blazen-telemetry exporters:

Feature	Description
`spans` (default)	Enables `TracingCompletionModel` and per-span instrumentation hooks
`history`	Enables `WorkflowHistory`, `HistoryEvent`, `HistoryEventKind`, `PauseReason`
`otlp`	OTLP gRPC exporter via `tonic` (`init_otlp` + `OtlpConfig`). Native targets only
`otlp-http`	OTLP HTTP/protobuf exporter via a custom `HttpClient` (`init_otlp_http` + `OtlpConfig`). Works on native and wasm32
`prometheus`	Enables `init_prometheus` + `MetricsLayer`
`langfuse`	Enables `LangfuseConfig`, `LangfuseLayer`, `init_langfuse`
`all`	Enables `spans`, `history`, `otlp`, `otlp-http`, `prometheus`, `langfuse`

Core LLM Traits

`CompletionModel`

The central trait every LLM provider must implement. Supports both one-shot and streaming completions.

#[async_trait]
pub trait CompletionModel: Send + Sync {
    fn model_id(&self) -> &str;

    async fn complete(
        &self,
        request: CompletionRequest,
    ) -> Result<CompletionResponse, BlazenError>;

    async fn stream(
        &self,
        request: CompletionRequest,
    ) -> Result<
        Pin<Box<dyn Stream<Item = Result<StreamChunk, BlazenError>> + Send>>,
        BlazenError,
    >;
}

Usage:

use blazen_llm::{CompletionModel, CompletionRequest, ChatMessage};
use blazen_llm::providers::openai::OpenAiProvider;

let model = OpenAiProvider::new("sk-...");
let request = CompletionRequest::new(vec![
    ChatMessage::user("What is 2 + 2?"),
]);
let response = model.complete(request).await?;
println!("{}", response.content.unwrap_or_default());

Streaming:

use futures_util::StreamExt;

let request = CompletionRequest::new(vec![
    ChatMessage::user("Tell me a story"),
]);
let mut stream = model.stream(request).await?;
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    if let Some(delta) = &chunk.delta {
        print!("{delta}");
    }
}

`StructuredOutput`

Extract typed data from a model using JSON Schema constraints. This trait has a blanket implementation for every CompletionModel — providers do not need to implement it.

#[async_trait]
pub trait StructuredOutput: CompletionModel {
    async fn extract<T: JsonSchema + DeserializeOwned + Send>(
        &self,
        messages: Vec<ChatMessage>,
    ) -> Result<StructuredResponse<T>, BlazenError>;
}

// Blanket impl: every CompletionModel automatically gets this.
impl<M: CompletionModel> StructuredOutput for M {}

T must implement schemars::JsonSchema and serde::de::DeserializeOwned. The schema is derived at call time via schemars::schema_for! and injected into the request’s response_format.

Usage:

use schemars::JsonSchema;
use serde::Deserialize;
use blazen_llm::StructuredOutput;

#[derive(JsonSchema, Deserialize)]
struct Sentiment {
    label: String,
    score: f64,
}

let result = model.extract::<Sentiment>(vec![
    ChatMessage::user("Analyze sentiment: 'I love Rust'"),
]).await?;
println!("{}: {}", result.data.label, result.data.score);

`EmbeddingModel`

Produces vector embeddings for text inputs.

#[async_trait]
pub trait EmbeddingModel: Send + Sync {
    fn model_id(&self) -> &str;
    fn dimensions(&self) -> usize;
    async fn embed(&self, texts: &[String]) -> Result<EmbeddingResponse, BlazenError>;
}

Usage:

let texts = vec!["Hello world".into(), "Goodbye world".into()];
let response = embedding_model.embed(&texts).await?;
for (i, vector) in response.embeddings.iter().enumerate() {
    println!("text {i}: {} dimensions", vector.len());
}

`Tool`

A callable tool that can be invoked by an LLM during a conversation.

#[async_trait]
pub trait Tool: Send + Sync {
    fn definition(&self) -> ToolDefinition;
    async fn execute(
        &self,
        arguments: serde_json::Value,
    ) -> Result<ToolOutput<serde_json::Value>, BlazenError>;
}

:::note[Migration from earlier versions] The Tool::execute return type changed: it used to return a bare Result<…Value…, BlazenError> and now returns Result<ToolOutput<Value>, BlazenError>. Existing tools that returned a Value continue to compile by changing Ok(value) to Ok(value.into()) — the From<Value> impl wraps it into a ToolOutput with no override. The new ChatMessage::tool_result constructor is also a breaking change for callers that passed a &str; convert via serde_json::Value::String(s.into()) or use serde_json::json!(s). :::

Usage:

use blazen_llm::{Tool, ToolDefinition, ToolOutput, BlazenError};
use async_trait::async_trait;

struct WeatherTool;

#[async_trait]
impl Tool for WeatherTool {
    fn definition(&self) -> ToolDefinition {
        ToolDefinition {
            name: "get_weather".into(),
            description: "Get the weather for a city.".into(),
            parameters: serde_json::json!({
                "type": "object",
                "properties": { "city": { "type": "string" } },
                "required": ["city"],
            }),
        }
    }

    async fn execute(
        &self,
        args: serde_json::Value,
    ) -> Result<ToolOutput<serde_json::Value>, BlazenError> {
        let _city = args["city"].as_str().unwrap_or_default();
        // Common case: return a structured value, no override.
        Ok(serde_json::json!({ "temperature_f": 72, "conditions": "clear" }).into())
    }
}

Sending a summary to the LLM while keeping the full payload visible to callers:

use blazen_llm::{Tool, ToolDefinition, ToolOutput, LlmPayload, BlazenError};
use async_trait::async_trait;

# struct SearchTool;
# #[async_trait]
# impl Tool for SearchTool {
#     fn definition(&self) -> ToolDefinition { unimplemented!() }
async fn execute(
    &self,
    args: serde_json::Value,
) -> Result<ToolOutput<serde_json::Value>, BlazenError> {
    Ok(ToolOutput::with_override(
        serde_json::json!({ "items": [1, 2, 3], "raw": "..." }),
        LlmPayload::Text { text: "Found 3 items.".into() },
    ))
}
# }

The full data payload is preserved in ChatMessage.tool_result so application code can inspect the unredacted result, while only the llm_override is sent to the model on the next turn.

`TypedTool`

A generic wrapper that turns a typed handler Fn(Args) -> Future<Output = Result<ToolOutput<Output>>> into an implementation of Tool. Handles serde_json::from_value of the input and serde_json::to_value of the output for you, and auto-derives the JSON Schema in ToolDefinition::parameters via schemars::schema_for!.

pub struct TypedTool<Args, Output, F>
where
    Args: DeserializeOwned + JsonSchema + Send + 'static,
    Output: Serialize + Send + 'static,
    F: Fn(Args) -> BoxFut<Output> + Send + Sync + 'static,
{ /* ... */ }

impl<Args, Output, F> TypedTool<Args, Output, F> {
    pub fn new(
        name: impl Into<String>,
        description: impl Into<String>,
        handler: F,
    ) -> Self;
}

Usage:

use blazen_llm::{TypedTool, ToolOutput};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

#[derive(Deserialize, JsonSchema)]
struct AddArgs { a: i64, b: i64 }

#[derive(Serialize)]
struct AddOutput { sum: i64 }

let add_tool = TypedTool::new(
    "add",
    "Add two numbers.",
    |args: AddArgs| {
        Box::pin(async move {
            Ok(ToolOutput::new(AddOutput { sum: args.a + args.b }))
        })
    },
);

`typed_tool_simple`

Convenience constructor for the no-override common case. The handler returns Result<Output> directly; the wrapper applies ToolOutput::new for you.

pub fn typed_tool_simple<Args, Output, Fut, F>(
    name: impl Into<String>,
    description: impl Into<String>,
    handler: F,
) -> impl Tool
where
    Args: DeserializeOwned + JsonSchema + Send + 'static,
    Output: Serialize + Send + 'static,
    Fut: Future<Output = Result<Output, BlazenError>> + Send + 'static,
    F: Fn(Args) -> Fut + Send + Sync + 'static;

Usage:

use blazen_llm::{typed_tool_simple, BlazenError};

let add_tool = typed_tool_simple(
    "add",
    "Add two numbers.",
    |args: AddArgs| async move {
        Ok::<_, BlazenError>(AddOutput { sum: args.a + args.b })
    },
);

Why it exists: TypedTool does the serde_json::from_value of the input and serde_json::to_value of the output for you, exactly once per call. The JSON Schema in ToolDefinition::parameters is auto-derived from Args via schemars::schema_for!, so you do not have to hand-write the schema or repeat field names.

`ToolOutput`

The return type of Tool::execute. Carries the structured data that callers see, plus an optional llm_override controlling what the LLM receives on the next turn.

pub struct ToolOutput<T = Value> {
    pub data: T,
    pub llm_override: Option<LlmPayload>,
}

Field	Type	Description
`data`	`T`	Structured tool output. Visible to application code via `ChatMessage.tool_result`.
`llm_override`	`Option<LlmPayload>`	If `Some`, replaces the default representation of `data` when serialised into the next prompt. If `None`, each provider applies a sensible default (see `LlmPayload`).

Constructors:

Constructor	Signature	Description
`ToolOutput::new`	`fn(data: T) -> Self`	Wrap structured data with no override. The LLM sees the provider default.
`ToolOutput::with_override`	`fn(data: T, override_payload: LlmPayload) -> Self`	Wrap structured data and pin exactly what the LLM receives next turn.
`From<Value>`	`impl From<Value> for ToolOutput<Value>`	`value.into()` produces `ToolOutput { data: value, llm_override: None }`. Lets `Tool::execute` keep returning bare `Value`s with `Ok(value.into())`.

Usage:

use blazen_llm::{ToolOutput, LlmPayload};
use serde_json::json;

let plain = ToolOutput::new(json!({ "items": [1, 2, 3] }));

let with_summary = ToolOutput::with_override(
    json!({ "items": [1, 2, 3], "raw": "..." }),
    LlmPayload::Text { text: "Found 3 items.".into() },
);

Why it exists: many tools return large structured payloads that the application wants in full (logs, UI, downstream steps), but feeding all of it back into the next LLM call is wasteful or noisy. ToolOutput decouples the two channels so you can return rich data to the caller while pinning a compact summary for the model.

`LlmPayload`

The wire-format-agnostic shape of a tool result as it appears to the LLM. Used as ToolOutput::llm_override and as the second component of ChatMessage::tool_result_view.

#[serde(tag = "kind", rename_all = "snake_case")]
pub enum LlmPayload {
    Text { text: String },
    Json { value: serde_json::Value },
    Parts { parts: Vec<ContentPart> },
    ProviderRaw { provider: ProviderId, value: serde_json::Value },
}

Variant	Description
`Text { text }`	Plain text. Sent as-is to providers that accept string tool results, or wrapped in `[{type: "text", text}]` for providers that require parts.
`Json { value }`	Structured JSON. Each provider serialises this in its native shape (see per-provider behaviour below).
`Parts { parts }`	A `Vec<ContentPart>` for multimodal tool results (text + images + files). Used together with `ChatMessage::tool_result_parts`.
`ProviderRaw { provider, value }`	An exact, provider-specific payload. Bypasses Blazen’s translation layer and is forwarded verbatim only when the active provider matches `provider`; other providers fall back to the default representation of `data`.

Usage:

use blazen_llm::{LlmPayload, ProviderId};
use serde_json::json;

LlmPayload::Text { text: "Found 3 results.".into() };
LlmPayload::Json { value: json!({ "items": [1, 2, 3] }) };
LlmPayload::ProviderRaw {
    provider: ProviderId::Anthropic,
    value: json!([{"type": "text", "text": "..."}]),
};

Per-provider behaviour for the default (no llm_override) case:

When a tool returns structured data and no llm_override, each provider sends a sensible default to the LLM:

OpenAI / OpenAI-compat / Azure / Responses / Fal: the data is JSON-stringified into the content field of the tool message.
Anthropic: structured data becomes [{type: "text", text: <stringified-json>}] inside tool_result.content.
Gemini: structured object data is passed natively as functionResponse.response. Scalars are wrapped as {result: <scalar>}.

`ProviderId`

Tags an LlmPayload::ProviderRaw variant with the provider whose wire format the value follows. The runtime uses this to decide whether to forward the raw payload or fall back to the default representation.

pub enum ProviderId {
    OpenAi,
    OpenAiCompat,
    Azure,
    Anthropic,
    Gemini,
    Responses,
    Fal,
}

Variant	Provider
`OpenAi`	`OpenAiProvider`
`OpenAiCompat`	`OpenAiCompatProvider` (OpenRouter, Groq, Together, etc.)
`Azure`	`AzureOpenAiProvider`
`Anthropic`	`AnthropicProvider`
`Gemini`	`GeminiProvider`
`Responses`	OpenAI Responses API provider
`Fal`	`FalProvider`

`ModelRegistry`

Allows providers to advertise their available models.

#[async_trait]
pub trait ModelRegistry: Send + Sync {
    async fn list_models(&self) -> Result<Vec<ModelInfo>, BlazenError>;
    async fn get_model(&self, model_id: &str) -> Result<Option<ModelInfo>, BlazenError>;
}

`ModelInfo`

Field	Type	Description
`id`	`String`	Model identifier used in API requests (e.g. `"gpt-4o"`)
`name`	`Option<String>`	Human-readable display name
`provider`	`String`	Provider that serves this model
`context_length`	`Option<u64>`	Maximum context window in tokens
`pricing`	`Option<ModelPricing>`	Pricing information
`capabilities`	`ModelCapabilities`	What this model can do

`ModelPricing`

Field	Type	Description
`input_per_million`	`Option<f64>`	Cost per million input tokens (USD)
`output_per_million`	`Option<f64>`	Cost per million output tokens (USD)
`per_image`	`Option<f64>`	Cost per image (image generation models)
`per_second`	`Option<f64>`	Cost per second of compute

`ModelCapabilities`

Field	Type	Description
`chat`	`bool`	Supports chat completions
`streaming`	`bool`	Supports streaming responses
`tool_use`	`bool`	Supports tool/function calling
`structured_output`	`bool`	Supports JSON schema constraints
`vision`	`bool`	Supports image inputs
`image_generation`	`bool`	Supports image generation
`embeddings`	`bool`	Supports text embeddings
`video_generation`	`bool`	Video generation support
`text_to_speech`	`bool`	Text-to-speech synthesis
`speech_to_text`	`bool`	Speech-to-text transcription
`audio_generation`	`bool`	Audio generation (music, SFX)
`three_d_generation`	`bool`	3D model generation

Types

`ChatMessage`

A single message in a chat conversation.

Field	Type	Description
`role`	`Role`	Who produced this message
`content`	`MessageContent`	The message payload. For tool-result messages with a plain string return, the string lives here as `MessageContent::Text(s)`.
`tool_calls`	`Vec<ToolCall>`	Tool invocations requested by the assistant on this message (empty for non-assistant roles).
`tool_result`	`Option<ToolOutput<serde_json::Value>>`	Structured tool-result payload for tool-role messages. `Some` only when the tool returned non-string `data` or supplied an `llm_override`. Plain-string results live in `content` instead.
`name`	`Option<String>`	Tool name (set for tool-role messages).
`tool_call_id`	`Option<String>`	Provider-assigned `id` from the originating `ToolCall`.

Constructors:

// Text messages
ChatMessage::system("You are a helpful assistant")
ChatMessage::user("Hello!")
ChatMessage::assistant("Hi there!")
ChatMessage::tool("{ \"result\": 42 }")

// Multimodal messages
ChatMessage::user_image_url("Describe this", "https://img.com/a.png", Some("image/png"))
ChatMessage::user_image_base64("What is this?", "iVBORw0K...", "image/jpeg")
ChatMessage::user_parts(vec![
    ContentPart::Text { text: "Look at this:".into() },
    ContentPart::Image(ImageContent {
        source: ImageSource::Url { url: "https://...".into() },
        media_type: Some("image/png".into()),
    }),
    ContentPart::File(FileContent {
        source: ImageSource::Url { url: "https://...".into() },
        media_type: "application/pdf".into(),
        filename: Some("doc.pdf".into()),
    }),
])

`ChatMessage::tool_result`

Build a tool-role message that closes a prior ToolCall. Routes plain strings into content and structured payloads (or anything with an llm_override) onto the new tool_result sibling field.

pub fn tool_result(
    call_id: impl Into<String>,
    name: impl Into<String>,
    output: impl Into<ToolOutput<serde_json::Value>>,
) -> Self

Argument	Description
`call_id`	The `id` from the originating `ToolCall`. Stored on `tool_call_id`.
`name`	The tool name. Stored on `name`.
`output`	A `ToolOutput<Value>` (or anything that converts via `From<Value>`, e.g. `serde_json::Value` directly).

Routing rules:

If output.data == Value::String(s) and output.llm_override.is_none(), the string is moved into content as MessageContent::Text(s) and tool_result stays None.
Otherwise, output is stored verbatim on tool_result and content is set to MessageContent::Text(String::new()).

Usage:

use blazen_llm::{ChatMessage, ToolOutput, LlmPayload};
use serde_json::json;

// Plain string -- lives in content as a regular text message.
ChatMessage::tool_result("call_1", "search", json!("hello"));

// Structured -- lives in the tool_result sibling field.
ChatMessage::tool_result("call_1", "search", json!({"items": [1, 2, 3]}));

// With override -- full data preserved on the message,
// summary sent to the LLM next turn.
ChatMessage::tool_result(
    "call_1",
    "search",
    ToolOutput::with_override(
        json!({"items": [1, 2, 3], "raw": "..."}),
        LlmPayload::Text { text: "Found 3 items.".into() },
    ),
);

:::caution[Breaking change] Earlier versions accepted content: impl Into<String> and stored the string verbatim. The new signature accepts output: impl Into<ToolOutput<Value>>. Callers that passed a &str should switch to serde_json::Value::String(s.into()) or serde_json::json!(s). :::

`ChatMessage::tool_result_parts`

Build a tool-role message whose result carries multimodal content (text + images + files). The parts ride on tool_result as LlmPayload::Parts { parts } so providers that support multimodal tool results (Anthropic, Gemini) can forward them natively.

pub fn tool_result_parts(
    call_id: impl Into<String>,
    name: impl Into<String>,
    parts: Vec<ContentPart>,
) -> Self

Usage:

use blazen_llm::{ChatMessage, ContentPart, ImageContent, ImageSource};

ChatMessage::tool_result_parts(
    "call_1",
    "render_chart",
    vec![
        ContentPart::Text { text: "Rendered the requested chart.".into() },
        ContentPart::Image(ImageContent {
            source: ImageSource::Url { url: "https://example.com/chart.png".into() },
            media_type: Some("image/png".into()),
        }),
    ],
);

`ChatMessage::tool_result_view`

Accessor returning both channels of a tool-result message in a single call. Used internally by provider implementations that need to choose between the structured data and an explicit llm_override when serialising to the wire format.

pub fn tool_result_view(&self) -> Option<(serde_json::Value, Option<&LlmPayload>)>

Returns None for non-tool-role messages. For tool-role messages it returns Some((data, override)) where data is the raw serde_json::Value payload (drawn from tool_result.data when present, otherwise reconstructed from the plain-string content) and override is the optional &LlmPayload from tool_result.llm_override.

`Role`

pub enum Role {
    System,
    User,
    Assistant,
    Tool,
}

`MessageContent`

pub enum MessageContent {
    Text(String),
    Image(ImageContent),
    Parts(Vec<ContentPart>),
}

Method	Signature	Description
`as_text()`	`&self -> Option<&str>`	Return the text if this is a `Text` variant
`as_parts()`	`&self -> Vec<ContentPart>`	Convert any variant into a `Vec<ContentPart>`
`text_content()`	`&self -> Option<String>`	Extract and concatenate all text content

MessageContent implements From<&str> and From<String>.

`ContentPart`

pub enum ContentPart {
    Text { text: String },
    Image(ImageContent),
    File(FileContent),
}

`ImageContent`

Field	Type	Description
`source`	`ImageSource`	URL or base64 data
`media_type`	`Option<String>`	MIME type (e.g. `"image/png"`)

`ImageSource`

The source of an image, file, or any other media payload. Marked #[non_exhaustive] so new variants can be added without breaking callers — always pattern-match with a wildcard arm. MediaSource is a type alias for ImageSource and is the preferred name when the value is not specifically an image.

#[non_exhaustive]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ImageSource {
    Url { url: String },
    Base64 { data: String },
    File { path: PathBuf },
    ProviderFile { provider: ProviderId, id: String },
    Handle { handle: ContentHandle },
}

pub type MediaSource = ImageSource;

Variant	Description
`Url { url }`	Public URL the provider can fetch directly.
`Base64 { data }`	Inline base64-encoded bytes. Pair with `media_type` on the enclosing `ImageContent` / `FileContent`.
`File { path }`	Local filesystem path. Use `ImageSource::file(path)` to construct. Resolved at request time by a `ContentStore` or by the provider adapter.
`ProviderFile { provider, id }`	Reference to a file already uploaded to a provider’s Files API (e.g. an OpenAI `file-xxx` id). Forwarded verbatim only when the active provider matches `provider`.
`Handle { handle }`	Reference to a `ContentHandle` registered with a `ContentStore`. Resolved into a concrete `Url` / `Base64` / `ProviderFile` by `CompletionRequest::resolve_handles_with` before the request hits a provider.

`FileContent`

Field	Type	Description
`source`	`ImageSource`	URL or base64 data
`media_type`	`String`	MIME type (e.g. `"application/pdf"`)
`filename`	`Option<String>`	Optional filename for display

`CompletionRequest`

A provider-agnostic request for a chat completion.

Field	Type	Description
`messages`	`Vec<ChatMessage>`	The conversation history
`tools`	`Vec<ToolDefinition>`	Tools available for the model to invoke
`temperature`	`Option<f32>`	Sampling temperature (0.0 = deterministic, 2.0 = very random)
`max_tokens`	`Option<u32>`	Maximum number of tokens to generate
`top_p`	`Option<f32>`	Nucleus sampling parameter
`response_format`	`Option<serde_json::Value>`	JSON Schema for structured output
`model`	`Option<String>`	Override the provider’s default model
`modalities`	`Option<Vec<String>>`	Output modalities (e.g. `["text"]`, `["image", "text"]`)
`image_config`	`Option<serde_json::Value>`	Image generation configuration (model-specific)
`audio_config`	`Option<serde_json::Value>`	Audio output configuration (voice, format, etc.)

Builder pattern:

let request = CompletionRequest::new(vec![ChatMessage::user("Hello")])
    .with_tools(tool_defs)
    .with_temperature(0.7)
    .with_max_tokens(1024)
    .with_top_p(0.9)
    .with_response_format(schema_json)
    .with_model("gpt-4o")
    .with_modalities(vec!["text".into(), "image".into()])
    .with_image_config(serde_json::json!({ "size": "1024x1024" }))
    .with_audio_config(serde_json::json!({ "voice": "alloy" }));

`CompletionResponse`

The result of a non-streaming chat completion.

Field	Type	Description
`content`	`Option<String>`	Text content of the assistant’s reply
`tool_calls`	`Vec<ToolCall>`	Tool invocations requested by the model
`usage`	`Option<TokenUsage>`	Token usage statistics
`model`	`String`	The model that produced this response
`finish_reason`	`Option<String>`	Why the model stopped (e.g. `"stop"`, `"tool_use"`)
`cost`	`Option<f64>`	Estimated cost in USD
`timing`	`Option<RequestTiming>`	Request timing breakdown
`images`	`Vec<GeneratedImage>`	Generated images (multimodal models)
`audio`	`Vec<GeneratedAudio>`	Generated audio (TTS / multimodal)
`videos`	`Vec<GeneratedVideo>`	Generated videos
`metadata`	`serde_json::Value`	Provider-specific metadata

`StructuredResponse<T>`

Response from structured output extraction, preserving metadata.

Field	Type	Description
`data`	`T`	The extracted structured data
`usage`	`Option<TokenUsage>`	Token usage statistics
`model`	`String`	The model that produced this response
`cost`	`Option<f64>`	Estimated cost in USD
`timing`	`Option<RequestTiming>`	Request timing
`metadata`	`serde_json::Value`	Provider-specific metadata

`EmbeddingResponse`

Response from an embedding operation.

Field	Type	Description
`embeddings`	`Vec<Vec<f32>>`	The embedding vectors (one per input text)
`model`	`String`	The model used
`usage`	`Option<TokenUsage>`	Token usage statistics
`cost`	`Option<f64>`	Estimated cost in USD
`timing`	`Option<RequestTiming>`	Request timing
`metadata`	`serde_json::Value`	Provider-specific metadata

`RequestTiming`

Timing metadata for a request.

Field	Type	Description
`queue_ms`	`Option<u64>`	Time spent waiting in queue (ms)
`execution_ms`	`Option<u64>`	Time spent executing the request (ms)
`total_ms`	`Option<u64>`	Total wall-clock time from submit to response (ms)

`TokenUsage`

Token usage statistics for a completion request.

Field	Type	Description
`prompt_tokens`	`u32`	Tokens in the prompt / input
`completion_tokens`	`u32`	Tokens in the completion / output
`total_tokens`	`u32`	Total tokens consumed (prompt + completion)

`ToolDefinition`

Describes a tool that the model may invoke.

Field	Type	Description
`name`	`String`	Unique name of the tool
`description`	`String`	Human-readable description
`parameters`	`serde_json::Value`	JSON Schema describing the tool’s input parameters

`ToolCall`

A tool invocation requested by the model.

Field	Type	Description
`id`	`String`	Provider-assigned identifier for this invocation
`name`	`String`	Name of the tool to invoke
`arguments`	`serde_json::Value`	Arguments to pass, as JSON

`StreamChunk`

A single chunk from a streaming completion response.

Field	Type	Description
`delta`	`Option<String>`	Incremental text content
`tool_calls`	`Vec<ToolCall>`	Tool invocations completed in this chunk
`finish_reason`	`Option<String>`	Present in the final chunk to indicate why generation stopped

Content Subsystem

A provider-agnostic layer for handing media (images, audio, video, documents, 3D, CAD, archives, fonts, code, generic data) to and from models. The core idea: instead of inlining bytes or URLs in every message, callers register payloads with a ContentStore, receive a small ContentHandle, and reference the handle from messages, tool inputs, and tool outputs. Just before a request is sent, the runtime resolves every handle into the concrete representation the active provider expects — a URL, a base64 blob, or a ProviderFile reference for providers with native Files APIs.

Everything in this section is re-exported from blazen_llm::content.

`ContentKind`

Coarse classification for a piece of content. Marked #[non_exhaustive]. Serde uses rename_all = "snake_case", so ThreeDModel round-trips as "three_d_model". Implements Display.

#[non_exhaustive]
pub enum ContentKind {
    Image,
    Audio,
    Video,
    Document,
    ThreeDModel,
    Cad,
    Archive,
    Font,
    Code,
    Data,
    Other,
}

Method	Signature	Description
`from_mime`	`fn(&str) -> Self`	Best-effort classification from a MIME string.
`from_extension`	`fn(&str) -> Self`	Best-effort classification from a filename extension (no leading dot required).
`as_str`	`fn(self) -> &'static str`	Snake-case string form (matches the serde representation).

use blazen_llm::content::ContentKind;

assert_eq!(ContentKind::from_mime("image/png"), ContentKind::Image);
assert_eq!(ContentKind::from_extension("glb"), ContentKind::ThreeDModel);
assert_eq!(ContentKind::ThreeDModel.as_str(), "three_d_model");

`ContentHandle`

An opaque pointer to a payload owned by a ContentStore. Cheap to clone, safe to embed in messages and tool arguments, and resolvable on demand.

pub struct ContentHandle {
    pub id: String,
    pub kind: ContentKind,
    pub mime_type: Option<String>,
    pub byte_size: Option<u64>,
    pub display_name: Option<String>,
}

Method	Signature	Description
`new`	`fn(id: impl Into<String>, kind: ContentKind) -> Self`	Construct a handle with no metadata.
`with_mime_type`	`fn(self, mime: impl Into<String>) -> Self`	Builder: attach a MIME type.
`with_byte_size`	`fn(self, bytes: u64) -> Self`	Builder: attach a byte size.
`with_display_name`	`fn(self, name: impl Into<String>) -> Self`	Builder: attach a human-readable name.

use blazen_llm::content::{ContentHandle, ContentKind};

let handle = ContentHandle::new("blob_abc123", ContentKind::Image)
    .with_mime_type("image/png")
    .with_byte_size(48_213)
    .with_display_name("chart.png");

`ContentStore`

The async trait every store implements. Stores own the bytes (or the URL, or the provider-side file id) and translate handles into something the provider can consume on demand.

#[async_trait]
pub trait ContentStore: Send + Sync + std::fmt::Debug {
    async fn put(&self, body: ContentBody, hint: ContentHint)
        -> Result<ContentHandle, BlazenError>;
    async fn resolve(&self, handle: &ContentHandle)
        -> Result<MediaSource, BlazenError>;
    async fn fetch_bytes(&self, handle: &ContentHandle)
        -> Result<Vec<u8>, BlazenError>;
    async fn metadata(&self, handle: &ContentHandle)
        -> Result<ContentMetadata, BlazenError> { /* default */ }
    async fn fetch_stream(&self, handle: &ContentHandle)
        -> Result<ByteStream, BlazenError> { /* default */ }
    async fn delete(&self, _handle: &ContentHandle)
        -> Result<(), BlazenError> { Ok(()) }
}

Method	Description
`put`	Ingest a `ContentBody` under a `ContentHint` and return a fresh handle.
`resolve`	Produce the concrete `MediaSource` the provider will see (typically `Url`, `Base64`, or `ProviderFile`). Called by `CompletionRequest::resolve_handles_with`.
`fetch_bytes`	Materialise the underlying bytes. Used when a tool needs to read the payload directly.
`metadata`	Return `ContentMetadata`. The default impl synthesises this from the handle’s own fields; stores with richer indices should override.
`fetch_stream`	Stream raw bytes back as a `ByteStream`. The default impl buffers `fetch_bytes` into a single chunk via `futures::stream::once`, so existing impls keep working unchanged. Stores backed by HTTP / disk / object storage should override for true incremental streaming. Built-in overrides today: `LocalFileContentStore` (uses `tokio_util::io::ReaderStream`), `OpenAiFilesStore`, `AnthropicFilesStore`, `FalStorageStore` (all via the `HttpClient` trait’s `send_streaming` method). `InMemoryContentStore` and `GeminiFilesStore` use the buffered default impl.
`delete`	Best-effort deletion. Default is a no-op so read-only stores can leave it unimplemented.

use std::sync::Arc;
use blazen_llm::content::{ContentBody, ContentHint, ContentKind, InMemoryContentStore, ContentStore};

let store = Arc::new(InMemoryContentStore::new());

let handle = store.put(
    ContentBody::Bytes { data: std::fs::read("chart.png")? },
    ContentHint::default()
        .with_mime_type("image/png")
        .with_kind(ContentKind::Image)
        .with_display_name("chart.png"),
).await?;

let source = store.resolve(&handle).await?;

`ContentBody`

The five ways a caller can hand bytes (or a pointer to bytes) to a ContentStore via put. Variants are struct-form (named fields), and serde uses an internally-tagged representation (tag = "type", rename_all = "snake_case").

#[derive(Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ContentBody {
    Bytes { data: Vec<u8> },
    Url { url: String },
    LocalPath { path: PathBuf },
    ProviderFile { provider: ProviderId, id: String },
    /// Streaming byte source. The store consumes the stream during `put`
    /// and is free to spool to disk, forward as a chunked upload, or
    /// drain into bytes.
    #[serde(skip)]
    Stream {
        stream: ByteStream,
        size_hint: Option<u64>,
    },
}

Variant	Description
`Bytes { data }`	In-memory payload. The store decides whether to keep it in RAM, spill to disk, or upload to a provider.
`Url { url }`	Remote URL the store may fetch lazily or pass through verbatim on resolve.
`LocalPath { path }`	Local filesystem path. Native-only stores (e.g. `LocalFileContentStore`) can index it without copying.
`ProviderFile { provider, id }`	A pre-existing file id on a provider’s Files API. Lets you wrap an externally uploaded asset in a handle without re-uploading.
`Stream { stream, size_hint }`	Streaming byte source built on `ByteStream`. Stores with a true streaming upload path (filesystem, S3, HTTP multipart) consume the stream incrementally; memory-bound stores buffer. `size_hint` carries the total length when known up front (e.g. from a `Content-Length` header) so stores can pre-allocate or pick between simple and resumable upload paths.

ContentBody::Stream is not Clone (a ByteStream is single-use) and not Serialize — the variant is annotated #[serde(skip)] because ByteStream implements neither Serialize nor Deserialize. The manual Clone impl panics with unreachable! if you clone a Stream variant; consume streaming bodies by value. Bindings that route ContentBody through serde_json must check for Stream first and handle it on a separate path.

use blazen_llm::content::ContentBody;

let from_memory = ContentBody::Bytes { data: b"hello".to_vec() };
let from_disk   = ContentBody::LocalPath { path: "./report.pdf".into() };
let from_url    = ContentBody::Url { url: "https://example.com/a.png".into() };

`ByteStream`

A pinned, boxed, fallible stream of byte chunks. Used by ContentBody::Stream for streaming uploads and by ContentStore::fetch_stream for streaming downloads.

pub type ByteStream = std::pin::Pin<
    Box<dyn futures_core::Stream<Item = Result<bytes::Bytes, BlazenError>> + Send>
>;

Stores backed by HTTP, S3, or the filesystem should produce / consume these incrementally; memory-bound stores may buffer.

`ContentHint`

Optional metadata callers pass alongside a ContentBody so the store can pick a sensible MIME type, classification, and display name without re-sniffing. Implements Default.

pub struct ContentHint {
    pub mime_type: Option<String>,
    pub kind_hint: Option<ContentKind>,
    pub display_name: Option<String>,
    pub byte_size: Option<u64>,
}

Method	Signature	Description
`with_mime_type`	`fn(self, mime: impl Into<String>) -> Self`	Pin the MIME type.
`with_kind`	`fn(self, kind: ContentKind) -> Self`	Pin the `ContentKind`.
`with_display_name`	`fn(self, name: impl Into<String>) -> Self`	Pin a human-readable name.
`with_byte_size`	`fn(self, bytes: u64) -> Self`	Pin the byte size when known up front (e.g. from a `Content-Length` header).

use blazen_llm::content::{ContentHint, ContentKind};

let hint = ContentHint::default()
    .with_mime_type("audio/wav")
    .with_kind(ContentKind::Audio)
    .with_display_name("note.wav");

`ContentMetadata`

The non-id fields of a ContentHandle, returned by ContentStore::metadata. Useful for cheap introspection without materialising bytes.

pub struct ContentMetadata {
    pub kind: ContentKind,
    pub mime_type: Option<String>,
    pub byte_size: Option<u64>,
    pub display_name: Option<String>,
}

use blazen_llm::content::ContentStore;

let meta = store.metadata(&handle).await?;
println!("{} ({} bytes)", meta.kind, meta.byte_size.unwrap_or(0));

`DynContentStore`

Convenience alias for the shared, thread-safe form most code passes around.

pub type DynContentStore = std::sync::Arc<dyn ContentStore>;

use std::sync::Arc;
use blazen_llm::content::{DynContentStore, InMemoryContentStore};

let store: DynContentStore = Arc::new(InMemoryContentStore::new());

Built-in stores

Store	Constructor	Notes
`InMemoryContentStore`	`InMemoryContentStore::new()` (also `Default`)	Bytes / URL / provider-file refs held in a `RwLock`-guarded map. Great for tests and short-lived sessions.
`LocalFileContentStore`	`LocalFileContentStore::new(root: impl Into<PathBuf>) -> Result<Self, BlazenError>`	Native-only (`not(target_arch = "wasm32")`). Persists payloads under `root`; assigns each entry a UUID-derived filename and tracks the index in memory.
`OpenAiFilesStore`	`OpenAiFilesStore::new(api_key)`, `.with_base_url(url)`, `.with_purpose(p)`	Uploads via OpenAI’s `/v1/files` API. `purpose` defaults to `"user_data"` (override for assistants / fine-tuning / batch).
`AnthropicFilesStore`	`AnthropicFilesStore::new(api_key)`, `.with_base_url(url)`, `.with_beta_header(h)`	Uploads via Anthropic’s Files API. `beta_header` is forwarded as `anthropic-beta`.
`GeminiFilesStore`	`GeminiFilesStore::new(api_key)`, `.with_base_url(url)`	Uploads via Google’s Files API and resolves handles to `gs://`/file-uri refs.
`FalStorageStore`	`FalStorageStore::new(api_key)`, `.with_base_url(url)`	Uploads to Fal’s storage endpoint.
`CustomContentStore`	`CustomContentStore::builder(name)` -> `CustomContentStoreBuilder`	Build a store from closures: `.put(...)`, `.resolve(...)`, `.fetch_bytes(...)`, `.fetch_stream(...)` (optional), `.delete(...)` (optional), `.build()`. The `.fetch_stream` callback is optional — when omitted, the trait’s default impl buffers `fetch_bytes` into one chunk via `stream::once`. Lets callers integrate their own blob backend without writing a new trait impl.

use std::sync::Arc;
use blazen_llm::content::{
    AnthropicFilesStore, CustomContentStore, InMemoryContentStore,
    LocalFileContentStore, OpenAiFilesStore,
};

let in_mem = Arc::new(InMemoryContentStore::new());
let on_disk = Arc::new(LocalFileContentStore::new("/var/cache/blazen")?);
let openai = Arc::new(OpenAiFilesStore::new(std::env::var("OPENAI_API_KEY")?)
    .with_purpose("user_data"));
let anthropic = Arc::new(AnthropicFilesStore::new(std::env::var("ANTHROPIC_API_KEY")?)
    .with_beta_header("files-api-2025-04-14"));

let custom = Arc::new(
    CustomContentStore::builder("s3-store")
        .put(|body, hint| async move { /* upload, return ContentHandle */ todo!() })
        .resolve(|handle| async move { /* return MediaSource */ todo!() })
        .fetch_bytes(|handle| async move { /* return Vec<u8> */ todo!() })
        .fetch_stream(|handle| Box::pin(async move {
            // OPTIONAL: stream bytes back chunk-by-chunk for large content.
            // When omitted, the default impl buffers fetch_bytes into one chunk.
            use bytes::Bytes;
            use futures_util::stream;
            let chunks: Vec<Result<Bytes, _>> = vec![Ok(Bytes::from_static(b"hello"))];
            Ok(Box::pin(stream::iter(chunks)) as blazen_llm::content::ByteStream)
        }))
        .delete(|handle| async move { /* delete blob */ Ok(()) })
        .build()?,
);

Tool-input helpers

Helpers that produce JSON Schema fragments for tool parameters that should accept a ContentHandle. Each fragment is tagged with x-blazen-content-ref so resolve_tool_arguments knows where to substitute resolved MediaSource values before the tool runs.

pub fn image_input(name: &str, description: &str) -> serde_json::Value;
pub fn audio_input(name: &str, description: &str) -> serde_json::Value;
pub fn video_input(name: &str, description: &str) -> serde_json::Value;
pub fn file_input(name: &str, description: &str) -> serde_json::Value;
pub fn three_d_input(name: &str, description: &str) -> serde_json::Value;
pub fn cad_input(name: &str, description: &str) -> serde_json::Value;

pub fn content_ref_property(
    kind: ContentKind,
    description: &str,
) -> serde_json::Value;

pub fn content_ref_required_object(
    name: &str,
    kind: ContentKind,
    description: &str,
    extra_properties: serde_json::Map<String, serde_json::Value>,
) -> serde_json::Value;

pub async fn resolve_tool_arguments(
    arguments: &mut serde_json::Value,
    schema: &serde_json::Value,
    store: &dyn ContentStore,
) -> Result<usize, BlazenError>;

pub struct KindMismatch {
    pub property: String,
    pub expected: ContentKind,
    pub actual: ContentKind,
}

Helper	Description
`image_input` / `audio_input` / `video_input` / `file_input` / `three_d_input` / `cad_input`	Top-level convenience: returns a single-property required object schema for a typed content reference.
`content_ref_property`	Schema for one property accepting a `ContentHandle` of the given `ContentKind`. Use when assembling a custom multi-property schema.
`content_ref_required_object`	Build a required object schema mixing one content ref with `extra_properties` (other primitives, enums, etc.).
`resolve_tool_arguments`	Walk `arguments` against `schema`, replace every `x-blazen-content-ref` site with the `MediaSource` returned by `store.resolve`. Returns the number of substitutions made.
`KindMismatch`	Error variant returned when a handle’s `ContentKind` does not match the schema’s declared `expected` kind.

use blazen_llm::content::tool_input::{image_input, resolve_tool_arguments};
use serde_json::{json, Map};

let schema = json!({
    "type": "object",
    "properties": {
        "image": image_input("image", "The image to caption"),
        "max_words": { "type": "integer" },
    },
    "required": ["image", "max_words"],
});

let mut args = json!({
    "image": { "id": "blob_abc123", "kind": "image" },
    "max_words": 20,
});

let n = resolve_tool_arguments(&mut args, &schema, store.as_ref()).await?;
// `args["image"]` is now a concrete MediaSource (Url / Base64 / ProviderFile).
println!("rewrote {n} content refs");

Visibility helpers

Helpers for the runtime’s “what handles is the model actually allowed to see right now?” pass.

pub fn collect_visible_handles(messages: &[ChatMessage]) -> Vec<ContentHandle>;

pub fn build_handle_directory_system_note(
    handles: &[ContentHandle],
) -> Option<String>;

pub async fn prepare_request_with_store(
    request: &mut CompletionRequest,
    store: &dyn ContentStore,
) -> Result<usize, BlazenError>;

Helper	Description
`collect_visible_handles`	Walks `messages` and returns every distinct `ContentHandle` referenced from user/assistant/tool content, deduped first-seen.
`build_handle_directory_system_note`	Returns a system-note string listing the visible handles (id, kind, MIME, name) so the model can name them when calling tools. Returns `None` when `handles` is empty — callers should not append an empty note.
`prepare_request_with_store`	One-call convenience: runs `CompletionRequest::resolve_handles_with`, then builds and prepends the system note. Returns the number of resolved handles. This is what the agent loop calls before dispatching a request.

use blazen_llm::content::visibility::{
    collect_visible_handles, build_handle_directory_system_note, prepare_request_with_store,
};

let visible = collect_visible_handles(&request.messages);
if let Some(note) = build_handle_directory_system_note(&visible) {
    println!("would inject system note:\n{note}");
}

let n = prepare_request_with_store(&mut request, store.as_ref()).await?;
println!("resolved {n} handles before dispatch");

Magic-number detection

Lightweight content sniffing backed by infer. Gated by the default-on content-detect Cargo feature — disabling the feature drops the infer dependency entirely (the functions remain but return (ContentKind::Other, None)).

pub fn detect_from_bytes(bytes: &[u8]) -> (ContentKind, Option<String>);

#[cfg(not(target_arch = "wasm32"))]
pub fn detect_from_path(path: &std::path::Path) -> (ContentKind, Option<String>);

pub fn detect(
    bytes: Option<&[u8]>,
    mime_hint: Option<&str>,
    filename: Option<&str>,
) -> (ContentKind, Option<String>);

Function	Description
`detect_from_bytes`	Sniff the leading bytes and return the inferred `ContentKind` plus the matching MIME string if any.
`detect_from_path`	Native-only. Reads the head of the file, then falls back to the extension when bytes are inconclusive.
`detect`	Combined entry point: prefers byte sniffing when `bytes` is `Some`, then `mime_hint`, then `filename` extension. Returns `(ContentKind::Other, None)` when nothing matches.

use blazen_llm::content::{detect, detect_from_bytes};

let (kind, mime) = detect_from_bytes(&[0x89, b'P', b'N', b'G', 0x0d, 0x0a, 0x1a, 0x0a]);
assert_eq!(mime.as_deref(), Some("image/png"));

let (kind2, mime2) = detect(None, Some("application/pdf"), Some("report.pdf"));

`CompletionRequest::resolve_handles_with`

The lower-level half of prepare_request_with_store. Walks every message in the request, finds every ImageSource::Handle, and replaces it with the concrete MediaSource returned by store.resolve. Does not inject a system note — prefer prepare_request_with_store from the agent loop unless you specifically want to skip the note.

impl CompletionRequest {
    pub async fn resolve_handles_with(
        &mut self,
        store: &dyn ContentStore,
    ) -> Result<usize, BlazenError>;
}

Returns the number of Handle variants that were rewritten. Errors propagate from store.resolve.

let resolved = request.resolve_handles_with(store.as_ref()).await?;
println!("rewrote {resolved} handles in place");

Agent System

The agent system implements the standard LLM + tool calling loop: send messages with tool definitions, execute any tool calls the model makes, feed results back, and repeat until the model stops or max_iterations is reached.

`run_agent()`

Run the agent loop without event callbacks.

pub async fn run_agent(
    model: &dyn CompletionModel,
    messages: Vec<ChatMessage>,
    config: AgentConfig,
) -> Result<AgentResult, BlazenError>

`run_agent_with_callback()`

Run the agent loop, emitting AgentEvents to the supplied callback.

pub async fn run_agent_with_callback(
    model: &dyn CompletionModel,
    messages: Vec<ChatMessage>,
    config: AgentConfig,
    on_event: impl Fn(AgentEvent) + Send + Sync,
) -> Result<AgentResult, BlazenError>

The loop works as follows:

Build a CompletionRequest with the full message history and all tool definitions.
Call the model.
If the model responds with no tool calls, return immediately.
If the model invoked the built-in “finish” tool (when enabled), extract the answer and return.
Otherwise, execute each tool call, append results to messages, go back to step 1.
If max_iterations is reached, make one final call without tools to force a text answer.

Usage:

use std::sync::Arc;
use blazen_llm::{run_agent, AgentConfig, ChatMessage};

let config = AgentConfig::new(vec![Arc::new(WeatherTool)])
    .with_system_prompt("You are a helpful assistant with weather tools.")
    .with_max_iterations(5)
    .with_finish_tool()
    .with_temperature(0.7)
    .with_max_tokens(2048);

let result = run_agent(
    &model,
    vec![ChatMessage::user("What's the weather in Paris?")],
    config,
).await?;

println!("Answer: {}", result.response.content.unwrap_or_default());
println!("Iterations: {}", result.iterations);
println!("Total cost: ${:.4}", result.total_cost.unwrap_or(0.0));

With callback:

use blazen_llm::{run_agent_with_callback, AgentEvent};

let result = run_agent_with_callback(
    &model,
    vec![ChatMessage::user("What's the weather?")],
    config,
    |event| match &event {
        AgentEvent::ToolCalled { iteration, tool_call } => {
            println!("[iter {iteration}] calling tool: {}", tool_call.name);
        }
        AgentEvent::ToolResult { tool_name, result, .. } => {
            println!("  {tool_name} -> {result}");
        }
        AgentEvent::IterationComplete { iteration, had_tool_calls } => {
            println!("[iter {iteration}] done (tools: {had_tool_calls})");
        }
    },
).await?;

`AgentConfig`

Configuration for the agentic tool execution loop.

Field	Type	Default	Description
`max_iterations`	`u32`	`10`	Maximum tool call rounds before forcing a stop
`tools`	`Vec<Arc<dyn Tool>>`	required	Tools available to the agent
`add_finish_tool`	`bool`	`false`	Add an implicit “finish” tool the model can call to exit early
`system_prompt`	`Option<String>`	`None`	System prompt prepended to messages
`temperature`	`Option<f32>`	`None`	Sampling temperature
`max_tokens`	`Option<u32>`	`None`	Maximum tokens per completion call

Builder pattern:

AgentConfig::new(tools)
    .with_max_iterations(5)
    .with_system_prompt("You are helpful.")
    .with_finish_tool()
    .with_temperature(0.7)
    .with_max_tokens(2048)

`AgentResult`

Result of an agent run.

Field	Type	Description
`response`	`CompletionResponse`	The final completion response
`messages`	`Vec<ChatMessage>`	Full message history including all tool calls and results
`iterations`	`u32`	Number of tool call rounds that occurred
`total_usage`	`Option<TokenUsage>`	Aggregated token usage across all rounds
`total_cost`	`Option<f64>`	Aggregated cost across all rounds
`timing`	`Option<RequestTiming>`	Total wall-clock time for the entire agent run

`AgentEvent`

Events emitted during agent execution (passed to the callback in run_agent_with_callback).

pub enum AgentEvent {
    ToolCalled {
        iteration: u32,
        tool_call: ToolCall,
    },
    ToolResult {
        iteration: u32,
        tool_name: String,
        result: serde_json::Value,
    },
    IterationComplete {
        iteration: u32,
        had_tool_calls: bool,
    },
}

Context

The Context object is a shared key-value store available in every workflow step. It provides three storage tiers and methods for event routing, streaming, and state management.

State Storage

Typed JSON: `set()` / `get()`

Store and retrieve any Serialize / DeserializeOwned type. Values are held internally as StateValue::Json.

// Store a typed value (anything implementing Serialize)
ctx.set("user_id", serde_json::json!("user_123"));
ctx.set("doc_count", serde_json::json!(5));

// Retrieve with type inference
let user_id: String = serde_json::from_value(ctx.get("user_id").unwrap()).unwrap();
let doc_count: i64 = serde_json::from_value(ctx.get("doc_count").unwrap()).unwrap();

Binary: `set_bytes()` / `get_bytes()`

Store raw Vec<u8> data. Values are held as StateValue::Bytes. No serialization requirement — useful for model weights, protobuf, bincode, or any binary format.

ctx.set_bytes("weights", vec![0x01, 0x02, 0x03]);
let bytes: Vec<u8> = ctx.get_bytes("weights").unwrap();

Raw StateValue: `set_value()` / `get_value()`

Work with the StateValue enum directly for full control over the storage variant, including the Native variant used by language bindings.

use blazen::context::StateValue;

ctx.set_value("config", StateValue::Json(serde_json::json!({"retries": 3})));
ctx.set_value("blob", StateValue::Bytes(vec![0xDE, 0xAD].into()));
ctx.set_value("py_obj", StateValue::Native(pickle_bytes.into()));

match ctx.get_value("config") {
    Some(StateValue::Json(v)) => { /* structured data */ }
    Some(StateValue::Bytes(b)) => { /* raw bytes */ }
    Some(StateValue::Native(b)) => { /* platform-serialized opaque bytes */ }
    None => { /* key not found */ }
}

`StateValue`

pub enum StateValue {
    Json(serde_json::Value),
    Bytes(BytesWrapper),
    Native(BytesWrapper),
}

Variant	Description
`Json(serde_json::Value)`	Structured, serializable data. Used by `ctx.set()` / `ctx.get()`.
`Bytes(BytesWrapper)`	Raw binary data. Used by `ctx.set_bytes()` / `ctx.get_bytes()`.
`Native(BytesWrapper)`	Platform-serialized opaque objects (e.g., Python pickle bytes). Preserved across language boundaries without deserialization.

Run Identity

ctx.run_id() -> &str

Returns the unique identifier for the current workflow run.

Event Routing

ctx.send_event(event: impl Event)

Programmatically route an event into the workflow. Use this when a step needs to emit multiple events or decide at runtime which path to take. When using send_event, the step returns () instead of an event type.

ctx.write_event_to_stream(event: impl Event)

Publish an event to the workflow’s external event stream, observable by callers via stream_events(). Useful for progress reporting and live updates.

Session References

async fn session_refs_arc(&self) -> Arc<SessionRefRegistry>
async fn clear_session_refs(&self) -> usize
async fn session_pause_policy(&self) -> SessionPausePolicy

Method	Description
`session_refs_arc()`	Get a clone of the session-ref registry handle for use by language bindings. Bindings install it as a task-local for the duration of a step so platform-native objects (`Py<PyAny>`, `napi::Ref<JsObject>`, etc.) carried via event payloads can be resolved by UUID.
`clear_session_refs()`	Drain the session-ref registry. Called on workflow termination by the language bindings to release platform-specific live refs back to their respective garbage collectors. Returns the number of entries removed.
`session_pause_policy()`	Get the configured `SessionPausePolicy`. The policy is set by `WorkflowBuilder::session_pause_policy`; there is no public setter on `Context`.

See the dedicated Session Reference Registry section for background, key types, and the pause-time policy matrix.

State Snapshot and Restore

ctx.collect_events() -> Vec<Box<dyn Event>>
ctx.snapshot_state() -> ContextSnapshot
ctx.restore_state(snapshot: ContextSnapshot)

Method	Description
`collect_events()`	Drain all pending events from the context.
`snapshot_state()`	Capture the entire context state as a serializable snapshot (for checkpointing / pause-resume).
`restore_state(snapshot)`	Restore context from a previously captured snapshot.

:::caution[Session refs are not snapshotted] Context::snapshot_state intentionally excludes both the opaque objects map and the session_refs registry. Live in-process references cannot survive a cross-process snapshot round-trip. If your workflow may pause() and your bindings use session refs, configure WorkflowBuilder::session_pause_policy to control what happens (default: PickleOrError). :::

BlazenState

BlazenState is a binding-layer concept for Python, Node.js, and WASM. In those languages, extending a BlazenState base class gives you automatic per-field persistence in the workflow context. Rust has no equivalent base class.

In Rust, per-field storage is achieved manually by calling set_value() / get_value() with the StateValue enum (see the StateValue section above). Each field is stored under an explicit key, giving you full control over serialization format and storage variant.

The Native(BytesWrapper) variant exists specifically to support bindings: it lets platform-serialized objects (e.g., Python pickle bytes, Node.js v8.serialize output) round-trip through Rust steps without deserialization. Binding authors use StateValue::Native to store opaque platform objects, and Rust code can forward those values without interpreting their contents.

Session Reference Registry

blazen_core::session_ref provides a per-Context registry of live in-process references — values that cannot or should not be JSON-serialized, such as DB connections, file handles, large in-memory tensors, lambdas, locks, or platform-native objects like Py<PyAny> and napi::Ref<JsObject>.

Each Context owns its own SessionRefRegistry with a lifetime tied to the workflow run. Event payloads carry only a JSON marker containing the key; the actual object lives in the registry until workflow completion. Bindings detect the marker and resolve it through the active registry to preserve object identity across step boundaries without serialisation.

The JSON marker format is:

{"__blazen_session_ref__": "<uuid>"}

The tag string is exposed as a constant:

pub const SESSION_REF_TAG: &str = "__blazen_session_ref__";

A defensive cap protects against runaway loops exhausting memory:

pub const MAX_SESSION_REFS_PER_RUN: usize = 10_000;

insert_arc / insert return SessionRefError::CapacityExceeded once the registry reaches this cap.

`RegistryKey`

Strongly-typed wrapper around Uuid used as the registry key.

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(transparent)]
pub struct RegistryKey(pub Uuid);

Method	Signature	Description
`new()`	`fn() -> Self`	Mint a fresh random key.
`parse(s)`	`fn(&str) -> Result<Self, uuid::Error>`	Parse a key from a UUID string.
`Display`	—	Formats as the underlying UUID.

`SessionRefRegistry`

Per-context registry of live session references. Internally Arc<dyn Any + Send + Sync> keyed by RegistryKey and guarded by a tokio::sync::RwLock.

pub struct SessionRefRegistry { /* ... */ }

Method	Signature	Description
`new()`	`fn() -> Self`	Create an empty registry.
`insert_arc()`	`async fn(&self, Arc<dyn Any + Send + Sync>) -> Result<RegistryKey, SessionRefError>`	Insert a type-erased `Arc` directly. Returns the freshly minted key or `CapacityExceeded`.
`insert::<T>()`	`async fn(&self, T) -> Result<RegistryKey, SessionRefError>`	Insert any `Any + Send + Sync + 'static` value, wrapping it in an `Arc` for you.
`get_any()`	`async fn(&self, RegistryKey) -> Option<Arc<dyn Any + Send + Sync>>`	Look up the type-erased entry. Bindings call this and downcast.
`get::<T>()`	`async fn(&self, RegistryKey) -> Option<Arc<T>>`	Look up and downcast to a concrete `Arc<T>`.
`remove()`	`async fn(&self, RegistryKey) -> Option<Arc<dyn Any + Send + Sync>>`	Remove a single entry, returning the removed value if present.
`drain()`	`async fn(&self) -> usize`	Drain all entries, returning the number removed.
`len()`	`async fn(&self) -> usize`	Number of currently live entries.
`is_empty()`	`async fn(&self) -> bool`	Whether the registry has any live entries.
`keys()`	`async fn(&self) -> Vec<RegistryKey>`	Iterate every key currently in the registry. Used by the snapshot walker to apply `SessionPausePolicy` uniformly.

`SessionPausePolicy`

Controls what happens to live session references when a workflow is paused or snapshotted.

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum SessionPausePolicy {
    #[default]
    PickleOrError,
    WarnDrop,
    HardError,
}

Variant	Behaviour
`PickleOrError` (default)	Attempt to pickle each live ref into the snapshot. On any failure, raise `WorkflowError::SessionRefsNotSerializable` and abort the snapshot. Recommended.
`WarnDrop`	Drop live refs from the snapshot, emit `tracing::warn!` per drop, and store a diagnostic report in snapshot metadata. On resume, accessing a dropped field raises a clear runtime error from the binding.
`HardError`	Refuse to pause if any live refs are in flight. Raises `WorkflowError::SessionRefsNotSerializable` immediately.

`SessionRefError`

Error type for registry operations.

#[derive(Debug, thiserror::Error)]
pub enum SessionRefError {
    #[error("session ref registry capacity exceeded ({cap} entries) — too many live references in this workflow run")]
    CapacityExceeded { cap: usize },
}

Variant	Description
`CapacityExceeded { cap: usize }`	Returned when `SessionRefRegistry::insert_arc` is called while the registry already holds `MAX_SESSION_REFS_PER_RUN` entries.

Snapshot exclusion

Session-ref entries are deliberately excluded from Context::snapshot_state(), mirroring the existing objects exclusion. Live in-process references cannot meaningfully round-trip through a serialized snapshot, so the snapshot walker applies SessionPausePolicy at pause time instead. See State Snapshot and Restore for the callout.

Workflow

`WorkflowBuilder`

The builder exposes a fluent API for configuring a workflow before build(). The full set of builder methods is documented in the guides; the entry below covers the session-ref configuration knob added alongside the session reference registry.

`WorkflowBuilder::session_pause_policy`

pub fn session_pause_policy(mut self, policy: SessionPausePolicy) -> Self

Configures the policy applied to live session references when the workflow is paused or snapshotted. Defaults to PickleOrError. See SessionPausePolicy for the full variant matrix.

Usage:

use blazen_core::{WorkflowBuilder, session_ref::SessionPausePolicy};

let workflow = WorkflowBuilder::new("my-workflow")
    .step(my_step)
    .session_pause_policy(SessionPausePolicy::WarnDrop)
    .build()?;

`WorkflowHandler`

The handle returned after starting a workflow. Provides await/stream/pause modes for consuming workflow results.

`WorkflowHandler::session_refs`

#[must_use]
pub fn session_refs(&self) -> Arc<SessionRefRegistry>

Returns a clone of the session-ref registry handle. Bindings call this after result() to resolve any __blazen_session_ref__ markers carried by the final event, ensuring identity-preserving access to live Python / JS objects passed via event payloads.

The returned Arc keeps the registry alive past the event loop’s exit so the final result event can still resolve live-ref markers even after the original Context has been dropped.

`WorkflowHandler::result`

pub async fn result(&self) -> Result<WorkflowResult>

Await the final workflow result. The returned WorkflowResult has an .event field containing the final event (typically a StopEvent). Use result.event.downcast_ref::<StopEvent>() to access typed data, or result.event.to_json() for a JSON representation.

`WorkflowHandler::pause`

pub fn pause(&self) -> Result<()>

Signal the workflow to pause. This is a synchronous, non-consuming call — it does not return a snapshot. After pausing, call snapshot() to obtain the serialized state.

`WorkflowHandler::snapshot`

pub async fn snapshot(&self) -> Result<String>

Capture the workflow’s current state as a JSON string. Typically called after pause(). The snapshot can be persisted and later passed to Workflow::resume().

`WorkflowHandler::resume_in_place`

pub fn resume_in_place(&self)

Resume a paused workflow in-place, continuing execution from where it left off.

`WorkflowHandler::respond_to_input`

pub fn respond_to_input(&self, request_id: String, response: serde_json::Value)

Supply a response to a pending InputRequestEvent. The request_id must match the ID from the original request. The workflow will route the response to the appropriate step and continue.

`WorkflowHandler::abort`

pub async fn abort(&self) -> Result<()>

Abort the running workflow. Any pending steps are cancelled and the workflow terminates with an error.

`WorkflowError` variants

blazen_core::WorkflowError is the unified workflow error type. The session-ref subsystem introduces one new variant; other variants (e.g. Paused, InputRequired, Other) are documented in the workflow guides.

`SessionRefsNotSerializable`

#[error("session refs cannot be serialized for snapshot: {keys:?}")]
SessionRefsNotSerializable {
    /// String-formatted UUIDs of the live session refs that could not
    /// be persisted.
    keys: Vec<String>,
}

One or more live session references could not be serialized for a snapshot. The keys vector contains the string-formatted UUIDs of the offending entries. Produced by the default PickleOrError pause policy when a live ref is not picklable, and by HardError whenever any live refs are in flight at pause time.

Compute Platform

The compute module provides a unified trait system for async, job-based media generation providers (fal.ai, Replicate, RunPod, etc.) that model a submit-poll-retrieve workflow for GPU workloads.

`ComputeProvider`

The base trait for compute providers.

#[async_trait]
pub trait ComputeProvider: Send + Sync {
    fn provider_id(&self) -> &str;

    async fn submit(&self, request: ComputeRequest) -> Result<JobHandle, BlazenError>;

    async fn status(&self, job: &JobHandle) -> Result<JobStatus, BlazenError>;

    async fn result(&self, job: JobHandle) -> Result<ComputeResult, BlazenError>;

    async fn cancel(&self, job: &JobHandle) -> Result<(), BlazenError>;

    // Default: submit then wait for result
    async fn run(&self, request: ComputeRequest) -> Result<ComputeResult, BlazenError> {
        let job = self.submit(request).await?;
        self.result(job).await
    }
}

`ImageGeneration`

Image generation and upscaling. Requires ComputeProvider as a supertrait.

#[async_trait]
pub trait ImageGeneration: ComputeProvider {
    async fn generate_image(&self, request: ImageRequest) -> Result<ImageResult, BlazenError>;
    async fn upscale_image(&self, request: UpscaleRequest) -> Result<ImageResult, BlazenError>;
}

Usage:

use blazen_llm::compute::{ImageGeneration, ImageRequest};

let result = provider.generate_image(
    ImageRequest::new("a cat in space")
        .with_size(1024, 1024)
        .with_count(2)
        .with_negative_prompt("blurry")
        .with_model("flux-dev"),
).await?;

for image in &result.images {
    println!("url: {:?}, {}x{}", image.media.url, image.width.unwrap_or(0), image.height.unwrap_or(0));
}

`VideoGeneration`

Video synthesis from text or images. Requires ComputeProvider as a supertrait.

#[async_trait]
pub trait VideoGeneration: ComputeProvider {
    async fn text_to_video(&self, request: VideoRequest) -> Result<VideoResult, BlazenError>;
    async fn image_to_video(&self, request: VideoRequest) -> Result<VideoResult, BlazenError>;
}

Usage:

use blazen_llm::compute::{VideoGeneration, VideoRequest};

// Text-to-video
let result = provider.text_to_video(
    VideoRequest::new("a sunset timelapse")
        .with_duration(5.0)
        .with_size(1920, 1080)
        .with_model("kling"),
).await?;

// Image-to-video
let result = provider.image_to_video(
    VideoRequest::for_image("https://example.com/img.png", "animate this scene")
        .with_duration(3.0),
).await?;

`AudioGeneration`

Audio synthesis including TTS, music, and sound effects. Requires ComputeProvider as a supertrait.

#[async_trait]
pub trait AudioGeneration: ComputeProvider {
    async fn text_to_speech(&self, request: SpeechRequest) -> Result<AudioResult, BlazenError>;

    // Default: returns BlazenError::Unsupported
    async fn generate_music(&self, request: MusicRequest) -> Result<AudioResult, BlazenError>;

    // Default: returns BlazenError::Unsupported
    async fn generate_sfx(&self, request: MusicRequest) -> Result<AudioResult, BlazenError>;
}

generate_music() and generate_sfx() have default implementations that return BlazenError::Unsupported. Providers override only the methods they support.

Usage:

use blazen_llm::compute::{AudioGeneration, SpeechRequest, MusicRequest};

let speech = provider.text_to_speech(
    SpeechRequest::new("Hello world")
        .with_voice("alloy")
        .with_language("en")
        .with_speed(1.0)
        .with_voice_url("https://example.com/voice.wav") // voice cloning
        .with_model("tts-1"),
).await?;

let music = provider.generate_music(
    MusicRequest::new("upbeat jazz")
        .with_duration(30.0)
        .with_model("musicgen"),
).await?;

`Transcription`

Audio transcription (speech-to-text). Requires ComputeProvider as a supertrait.

#[async_trait]
pub trait Transcription: ComputeProvider {
    async fn transcribe(
        &self,
        request: TranscriptionRequest,
    ) -> Result<TranscriptionResult, BlazenError>;
}

Usage:

use blazen_llm::compute::{Transcription, TranscriptionRequest};

let result = provider.transcribe(
    TranscriptionRequest::new("https://example.com/audio.mp3")
        .with_language("en")
        .with_diarize(true)
        .with_model("whisper-v3"),
).await?;

println!("Full text: {}", result.text);
for segment in &result.segments {
    println!("[{:.1}s - {:.1}s] {}: {}",
        segment.start, segment.end,
        segment.speaker.as_deref().unwrap_or("?"),
        segment.text,
    );
}

`ThreeDGeneration`

3D model generation from text or images. Requires ComputeProvider as a supertrait.

#[async_trait]
pub trait ThreeDGeneration: ComputeProvider {
    async fn generate_3d(&self, request: ThreeDRequest) -> Result<ThreeDResult, BlazenError>;
}

Usage:

use blazen_llm::compute::{ThreeDGeneration, ThreeDRequest};

// Text-to-3D
let result = provider.generate_3d(
    ThreeDRequest::new("a 3D cat")
        .with_format("glb")
        .with_model("triposr"),
).await?;

// Image-to-3D
let result = provider.generate_3d(
    ThreeDRequest::from_image("https://example.com/cat.png")
        .with_format("obj"),
).await?;

for model_3d in &result.models {
    println!("vertices: {:?}, faces: {:?}, textures: {}, animations: {}",
        model_3d.vertex_count, model_3d.face_count,
        model_3d.has_textures, model_3d.has_animations,
    );
}

Compute Request Types

`ImageRequest`

Field	Type	Description
`prompt`	`String`	Text prompt describing the desired image
`negative_prompt`	`Option<String>`	Things to avoid in the image
`width`	`Option<u32>`	Desired width in pixels
`height`	`Option<u32>`	Desired height in pixels
`num_images`	`Option<u32>`	Number of images to generate
`model`	`Option<String>`	Model override
`parameters`	`serde_json::Value`	Additional provider-specific parameters

Builder: ImageRequest::new(prompt).with_size(w, h).with_count(n).with_negative_prompt(p).with_model(m)

`UpscaleRequest`

Field	Type	Description
`image_url`	`String`	URL of the image to upscale
`scale`	`f32`	Scale factor (e.g. 2.0, 4.0)
`model`	`Option<String>`	Model override
`parameters`	`serde_json::Value`	Additional provider-specific parameters

Builder: UpscaleRequest::new(url, scale).with_model(m)

`VideoRequest`

Field	Type	Description
`prompt`	`String`	Text prompt
`image_url`	`Option<String>`	Source image for image-to-video
`duration_seconds`	`Option<f32>`	Desired duration in seconds
`negative_prompt`	`Option<String>`	Things to avoid
`width`	`Option<u32>`	Desired width in pixels
`height`	`Option<u32>`	Desired height in pixels
`model`	`Option<String>`	Model override
`parameters`	`serde_json::Value`	Additional provider-specific parameters

Builder: VideoRequest::new(prompt) or VideoRequest::for_image(url, prompt), then .with_duration(s).with_size(w, h).with_model(m)

`SpeechRequest`

Field	Type	Description
`text`	`String`	Text to synthesize
`voice`	`Option<String>`	Voice identifier (provider-specific)
`voice_url`	`Option<String>`	Reference voice URL for voice cloning
`language`	`Option<String>`	Language code (e.g. `"en"`, `"fr"`)
`speed`	`Option<f32>`	Speed multiplier (1.0 = normal)
`model`	`Option<String>`	Model override
`parameters`	`serde_json::Value`	Additional provider-specific parameters

Builder: SpeechRequest::new(text).with_voice(v).with_voice_url(url).with_language(l).with_speed(s).with_model(m)

`MusicRequest`

Field	Type	Description
`prompt`	`String`	Text prompt
`duration_seconds`	`Option<f32>`	Desired duration in seconds
`model`	`Option<String>`	Model override
`parameters`	`serde_json::Value`	Additional provider-specific parameters

Builder: MusicRequest::new(prompt).with_duration(s).with_model(m)

`TranscriptionRequest`

Field	Type	Description
`audio_url`	`String`	URL of the audio file
`language`	`Option<String>`	Language hint
`diarize`	`bool`	Enable speaker diarization (default: `false`)
`model`	`Option<String>`	Model override
`parameters`	`serde_json::Value`	Additional provider-specific parameters

Builder: TranscriptionRequest::new(url).with_language(l).with_diarize(true).with_model(m)

`ThreeDRequest`

Field	Type	Description
`prompt`	`String`	Text prompt
`image_url`	`Option<String>`	Source image for image-to-3D
`format`	`Option<String>`	Output format (e.g. `"glb"`, `"obj"`, `"usdz"`)
`model`	`Option<String>`	Model override
`parameters`	`serde_json::Value`	Additional provider-specific parameters

Builder: ThreeDRequest::new(prompt) or ThreeDRequest::from_image(url), then .with_format(f).with_model(m)

Compute Result Types

`ImageResult`

Field	Type	Description
`images`	`Vec<GeneratedImage>`	The generated/upscaled images
`timing`	`RequestTiming`	Request timing breakdown
`cost`	`Option<f64>`	Cost in USD
`metadata`	`serde_json::Value`	Provider-specific metadata

`VideoResult`

Field	Type	Description
`videos`	`Vec<GeneratedVideo>`	The generated videos
`timing`	`RequestTiming`	Request timing breakdown
`cost`	`Option<f64>`	Cost in USD
`metadata`	`serde_json::Value`	Provider-specific metadata

`AudioResult`

Field	Type	Description
`audio`	`Vec<GeneratedAudio>`	The generated audio clips
`timing`	`RequestTiming`	Request timing breakdown
`cost`	`Option<f64>`	Cost in USD
`metadata`	`serde_json::Value`	Provider-specific metadata

`ThreeDResult`

Field	Type	Description
`models`	`Vec<Generated3DModel>`	The generated 3D models
`timing`	`RequestTiming`	Request timing breakdown
`cost`	`Option<f64>`	Cost in USD
`metadata`	`serde_json::Value`	Provider-specific metadata

`TranscriptionResult`

Field	Type	Description
`text`	`String`	Full transcribed text
`segments`	`Vec<TranscriptionSegment>`	Time-aligned segments
`language`	`Option<String>`	Detected/specified language code
`timing`	`RequestTiming`	Request timing breakdown
`cost`	`Option<f64>`	Cost in USD
`metadata`	`serde_json::Value`	Provider-specific metadata

`TranscriptionSegment`

Field	Type	Description
`text`	`String`	Transcribed text for this segment
`start`	`f64`	Start time in seconds
`end`	`f64`	End time in seconds
`speaker`	`Option<String>`	Speaker label (if diarization was enabled)

Compute Job Types

`ComputeRequest`

Field	Type	Description
`model`	`String`	Model/endpoint to run (e.g. `"fal-ai/flux/dev"`)
`input`	`serde_json::Value`	Input parameters as JSON (model-specific)
`webhook`	`Option<String>`	Webhook URL for async completion notification

`ComputeResult`

Field	Type	Description
`job`	`Option<JobHandle>`	The job handle that produced this result
`output`	`serde_json::Value`	Output data (model-specific JSON)
`timing`	`RequestTiming`	Request timing breakdown
`cost`	`Option<f64>`	Cost in USD
`metadata`	`serde_json::Value`	Provider-specific metadata

`JobHandle`

Field	Type	Description
`id`	`String`	Provider-assigned job identifier
`provider`	`String`	Provider name (e.g. `"fal"`)
`model`	`String`	Model/endpoint that was invoked
`submitted_at`	`DateTime<Utc>`	When the job was submitted

`JobStatus`

pub enum JobStatus {
    Queued,
    Running,
    Completed,
    Failed { error: String },
    Cancelled,
}

Media

`MediaType`

Exhaustive enumeration of media formats with detection support. Covers images, video, audio, 3D models, documents, and a catch-all Other variant.

Variants:

Category	Variants
Image	`Png`, `Jpeg`, `WebP`, `Gif`, `Svg`, `Bmp`, `Tiff`, `Avif`, `Ico`
Video	`Mp4`, `WebM`, `Mov`, `Avi`, `Mkv`
Audio	`Mp3`, `Wav`, `Ogg`, `Flac`, `Aac`, `M4a`, `WebmAudio`
3D	`Glb`, `Gltf`, `Obj`, `Fbx`, `Usdz`, `Stl`, `Ply`
Document	`Pdf`
Catch-all	`Other { mime: String }`

Methods:

Method	Signature	Description
`mime()`	`&self -> &str`	Return the MIME type string
`extension()`	`&self -> &str`	Return the canonical file extension (no dot)
`magic_bytes()`	`&self -> Option<&'static [u8]>`	Return the magic byte signature, if any
`detect(bytes)`	`fn(&[u8]) -> Option<Self>`	Detect media type from file header bytes
`from_mime(mime)`	`fn(&str) -> Self`	Parse a MIME string (unknown = `Other`)
`from_extension(ext)`	`fn(&str) -> Self`	Parse a file extension (unknown = `Other`)
`is_image()`	`&self -> bool`	Is this an image format?
`is_video()`	`&self -> bool`	Is this a video format?
`is_audio()`	`&self -> bool`	Is this an audio format?
`is_3d()`	`&self -> bool`	Is this a 3D model format?
`is_vector()`	`&self -> bool`	Is this a text-based format (SVG, GLTF, OBJ)?

MediaType implements Display (outputs the MIME string).

Example:

use blazen_llm::MediaType;

let mt = MediaType::from_extension("png");
assert_eq!(mt.mime(), "image/png");
assert!(mt.is_image());

// Detect from raw bytes
let bytes = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];
assert_eq!(MediaType::detect(&bytes), Some(MediaType::Png));

`MediaOutput`

A single piece of generated media content. At least one of url, base64, or raw_content will be populated.

Field	Type	Description
`url`	`Option<String>`	URL where the media can be downloaded
`base64`	`Option<String>`	Base64-encoded media data
`raw_content`	`Option<String>`	Raw text content (SVG, OBJ, GLTF JSON)
`media_type`	`MediaType`	Format of the media
`file_size`	`Option<u64>`	File size in bytes
`metadata`	`serde_json::Value`	Provider-specific metadata

Constructors:

let output = MediaOutput::from_url("https://example.com/img.png", MediaType::Png);
let output = MediaOutput::from_base64("iVBORw0KGgo=", MediaType::Png);

`GeneratedImage`

Field	Type	Description
`media`	`MediaOutput`	The image media output
`width`	`Option<u32>`	Width in pixels
`height`	`Option<u32>`	Height in pixels

`GeneratedVideo`

Field	Type	Description
`media`	`MediaOutput`	The video media output
`width`	`Option<u32>`	Width in pixels
`height`	`Option<u32>`	Height in pixels
`duration_seconds`	`Option<f32>`	Duration in seconds
`fps`	`Option<f32>`	Frames per second

`GeneratedAudio`

Field	Type	Description
`media`	`MediaOutput`	The audio media output
`duration_seconds`	`Option<f32>`	Duration in seconds
`sample_rate`	`Option<u32>`	Sample rate in Hz
`channels`	`Option<u8>`	Number of audio channels

`Generated3DModel`

Field	Type	Description
`media`	`MediaOutput`	The 3D model media output
`vertex_count`	`Option<u64>`	Total vertex count
`face_count`	`Option<u64>`	Total face/triangle count
`has_textures`	`bool`	Whether the model includes textures
`has_animations`	`bool`	Whether the model includes animations

LocalModel Trait

The LocalModel trait provides explicit load/unload lifecycle management for models running in-process (llama.cpp, whisper.cpp, etc.). Remote API providers do not implement this trait.

#[async_trait]
pub trait LocalModel: Send + Sync {
    async fn load(&self) -> Result<(), BlazenError>;
    async fn unload(&self) -> Result<(), BlazenError>;
    async fn is_loaded(&self) -> bool;
    fn device(&self) -> blazen_llm::Device { Device::Cpu }
    async fn memory_bytes(&self) -> Option<u64>;
}

Method	Description
`load()`	Load the model into memory/VRAM. Idempotent.
`unload()`	Free the model’s memory/VRAM. Idempotent.
`is_loaded()`	Whether the model is currently loaded.
`device()`	Which device the model targets. Determines which pool the `ModelManager` charges this model against. Defaults to `Device::Cpu`.
`memory_bytes()`	Approximate memory footprint in bytes (host RAM if `device()` is `Device::Cpu`, GPU VRAM otherwise), or `None` if unknown.

A type can implement both CompletionModel and LocalModel:

struct MyLocalLLM { /* ... */ }

#[async_trait::async_trait]
impl CompletionModel for MyLocalLLM {
    fn model_id(&self) -> &str { "my-local-llm" }
    async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse, BlazenError> {
        self.load().await?; // auto-load on first call
        // inference logic
        todo!()
    }
    async fn stream(&self, request: CompletionRequest) -> Result<Pin<Box<dyn Stream<Item = Result<StreamChunk, BlazenError>> + Send>>, BlazenError> {
        todo!()
    }
}

#[async_trait::async_trait]
impl LocalModel for MyLocalLLM {
    async fn load(&self) -> Result<(), BlazenError> { /* load weights */ Ok(()) }
    async fn unload(&self) -> Result<(), BlazenError> { /* free VRAM */ Ok(()) }
    async fn is_loaded(&self) -> bool { true }
    fn device(&self) -> Device { Device::Cuda(0) }
    async fn memory_bytes(&self) -> Option<u64> { Some(4_000_000_000) }
}

ModelManager

:::caution[Capacity, not performance] ModelManager is a memory budget bookkeeper, not a performance scheduler. It answers “will this fit?” — not “will this run fast?”. Whether a 70B model loaded on CPU is useful at 1–3 tok/s is a workload-choice question the manager intentionally does not answer. :::

Per-pool memory budget-aware model manager with LRU eviction. Tracks registered LocalModel instances and their estimated memory footprint, indexed by the Pool each model targets (Pool::Cpu or Pool::Gpu(index)). When loading a model would exceed that pool’s budget, the least-recently-used loaded model in the same pool is unloaded first. Models in different pools never evict each other.

use std::collections::HashMap;
use std::sync::Arc;
use blazen_llm::Pool;
use blazen_manager::ModelManager;

// Common case: CPU RAM + GPU(0) VRAM budgets in GB.
let manager = ModelManager::with_budgets_gb(64.0, 24.0);

// Explicit per-pool budgets in bytes.
let manager = ModelManager::new(HashMap::from([
    (Pool::Cpu, 64 * 1024 * 1024 * 1024),
    (Pool::Gpu(0), 24 * 1024 * 1024 * 1024),
]));

Methods

Method	Signature	Description
`new`	`fn(pool_budgets: HashMap<Pool, u64>) -> Self`	Create a manager with explicit per-pool byte budgets.
`with_budgets_gb`	`fn(cpu_ram_gb: f64, gpu_vram_gb: f64) -> Self`	Convenience constructor for `Pool::Cpu` + `Pool::Gpu(0)` budgets in GB.
`register`	`async fn(&self, id: &str, model: Arc<dyn LocalModel>, memory_estimate_bytes: u64)`	Register a model. Starts unloaded. The model’s `device()` selects the pool.
`load`	`async fn(&self, id: &str) -> Result<()>`	Load a model, evicting LRU models within the same pool if needed.
`unload`	`async fn(&self, id: &str) -> Result<()>`	Unload a model and free its memory.
`is_loaded`	`async fn(&self, id: &str) -> bool`	Check if a model is currently loaded.
`ensure_loaded`	`async fn(&self, id: &str) -> Result<()>`	Alias for `load()`.
`used_bytes`	`async fn(&self, pool: Pool) -> u64`	Bytes currently used by loaded models in the given pool.
`available_bytes`	`async fn(&self, pool: Pool) -> u64`	Bytes available within the given pool’s budget.
`pools`	`fn(&self) -> Vec<(Pool, u64)>`	All configured pools and their budgets.
`status`	`async fn(&self) -> Vec<ModelStatus>`	Status of all registered models.

ModelStatus

pub struct ModelStatus {
    pub id: String,
    pub loaded: bool,
    pub memory_estimate_bytes: u64,
    pub pool: Pool,
}

Field	Type	Description
`id`	`String`	Model identifier.
`loaded`	`bool`	Whether the model is currently loaded.
`memory_estimate_bytes`	`u64`	Estimated memory footprint in bytes (interpreted against `pool`).
`pool`	`Pool`	Which pool this model is charged against (`Pool::Cpu` or `Pool::Gpu(N)`).

Usage:

use std::sync::Arc;
use blazen_llm::Pool;
use blazen_manager::ModelManager;

let manager = ModelManager::with_budgets_gb(64.0, 24.0);
manager.register("llama", Arc::new(my_llama_model), 8 * 1_073_741_824).await;
manager.register("whisper", Arc::new(my_whisper_model), 2 * 1_073_741_824).await;

manager.load("llama").await?;
assert!(manager.is_loaded("llama").await);
println!("GPU used:      {} bytes", manager.used_bytes(Pool::Gpu(0)).await);
println!("GPU available: {} bytes", manager.available_bytes(Pool::Gpu(0)).await);

// Loading whisper may evict llama if the GPU pool is tight (only if both target Gpu(0))
manager.load("whisper").await?;
manager.unload("llama").await?;

Pricing

The pricing module provides a global thread-safe registry of per-model pricing data, pre-seeded with defaults for well-known models.

`PricingEntry`

Field	Type	Description
`input_per_million`	`f64`	Cost per million input tokens (USD).
`output_per_million`	`f64`	Cost per million output tokens (USD).

`register_pricing()`

use blazen_llm::{register_pricing, PricingEntry};

register_pricing("my-model", PricingEntry {
    input_per_million: 1.0,
    output_per_million: 2.0,
});

`lookup_pricing()`

Look up pricing for a model by ID. Returns None if the model is unknown.

use blazen_llm::lookup_pricing;

if let Some(entry) = lookup_pricing("gpt-4o") {
    println!("Input: ${}/M tokens", entry.input_per_million);
}

`compute_cost()`

Compute the cost of a request given a model ID and token usage.

use blazen_llm::{compute_cost, TokenUsage};

let usage = TokenUsage { prompt_tokens: 1000, completion_tokens: 500, total_tokens: 1500 };
if let Some(cost) = compute_cost("gpt-4o", &usage) {
    println!("Cost: ${:.4}", cost);
}

MemoryBackend Trait

Low-level storage backend used by Memory. Backends are responsible for persistence and band-based candidate retrieval. They do not perform embedding or ELID encoding.

#[async_trait]
pub trait MemoryBackend: Send + Sync {
    async fn put(&self, entry: StoredEntry) -> Result<()>;
    async fn get(&self, id: &str) -> Result<Option<StoredEntry>>;
    async fn delete(&self, id: &str) -> Result<bool>;
    async fn list(&self) -> Result<Vec<StoredEntry>>;
    async fn len(&self) -> Result<usize>;
    async fn is_empty(&self) -> Result<bool>; // default: self.len() == 0
    async fn search_by_bands(
        &self,
        bands: &[u64],
        limit: usize,
    ) -> Result<Vec<StoredEntry>>;
}

Implementing a Custom Backend

use blazen_memory::store::{MemoryBackend, StoredEntry};
use anyhow::Result;

struct PostgresBackend {
    pool: sqlx::PgPool,
}

#[async_trait::async_trait]
impl MemoryBackend for PostgresBackend {
    async fn put(&self, entry: StoredEntry) -> Result<()> {
        // INSERT or UPDATE in Postgres
        todo!()
    }

    async fn get(&self, id: &str) -> Result<Option<StoredEntry>> {
        // SELECT by id
        todo!()
    }

    async fn delete(&self, id: &str) -> Result<bool> {
        // DELETE by id, return true if row existed
        todo!()
    }

    async fn list(&self) -> Result<Vec<StoredEntry>> {
        // SELECT all
        todo!()
    }

    async fn len(&self) -> Result<usize> {
        // SELECT COUNT(*)
        todo!()
    }

    async fn search_by_bands(
        &self,
        bands: &[u64],
        limit: usize,
    ) -> Result<Vec<StoredEntry>> {
        // Query entries sharing at least one band
        todo!()
    }
}

Built-in Backends

Backend	Description
`InMemoryBackend`	In-process `HashMap` storage. Fast, no persistence.
`JsonlBackend`	Append-only JSONL file storage.
`ValkeyBackend`	Redis/Valkey-backed storage.

Error Handling

`BlazenError`

The unified error type for all Blazen LLM and compute operations.

Variant	Fields	Description
`Auth`	`message: String`	Authentication failed
`RateLimit`	`retry_after_ms: Option<u64>`	Rate limited by the provider
`Timeout`	`elapsed_ms: u64`	Request timed out
`Provider`	`provider: String, message: String, status_code: Option<u16>`	Provider-specific error
`Validation`	`field: Option<String>, message: String`	Invalid input
`ContentPolicy`	`message: String`	Content policy violation
`Unsupported`	`message: String`	Requested capability is not supported
`Serialization`	`String`	JSON serialization/deserialization error
`Request`	`message: String, source: Option<Box<dyn Error>>`	Network or request-level failure
`Completion`	`CompletionErrorKind`	LLM completion-specific error
`Compute`	`ComputeErrorKind`	Compute job-specific error
`Media`	`MediaErrorKind`	Media-specific error
`Tool`	`name: Option<String>, message: String`	Tool execution error

`CompletionErrorKind`

Variant	Description
`NoContent`	Model returned no content
`ModelNotFound(String)`	Model not found
`InvalidResponse(String)`	Invalid response from the model
`Stream(String)`	Streaming error

`ComputeErrorKind`

Variant	Fields	Description
`JobFailed`	`message: String, error_type: Option<String>, retryable: bool`	Compute job failed
`Cancelled`	—	Job was cancelled
`QuotaExceeded`	`message: String`	Provider quota exceeded

`MediaErrorKind`

Variant	Fields	Description
`Invalid`	`media_type: Option<String>, message: String`	Invalid media
`TooLarge`	`size_bytes: u64, max_bytes: u64`	Media exceeds size limit

`is_retryable()`

impl BlazenError {
    pub fn is_retryable(&self) -> bool;
}

Returns true for RateLimit, Timeout, Request, provider errors with status >= 500, and ComputeErrorKind::JobFailed where retryable is true.

Convenience Constructors

BlazenError::auth("invalid api key")
BlazenError::timeout(5000)
BlazenError::timeout_from_duration(elapsed)
BlazenError::request("connection reset")
BlazenError::unsupported("music generation not available")
BlazenError::provider("openai", "internal server error")
BlazenError::validation("prompt must not be empty")
BlazenError::tool_error("unknown tool: foo")
BlazenError::no_content()
BlazenError::model_not_found("gpt-5")
BlazenError::invalid_response("missing content field")
BlazenError::stream_error("unexpected EOF")
BlazenError::job_failed("GPU out of memory")
BlazenError::cancelled()

BlazenError also implements From<serde_json::Error> for automatic conversion.

Custom Providers

Implementing `CompletionModel`

use blazen_llm::{
    CompletionModel, CompletionRequest, CompletionResponse, StreamChunk, BlazenError,
};
use std::pin::Pin;
use futures_util::Stream;

struct MyProvider {
    api_key: String,
}

#[async_trait::async_trait]
impl CompletionModel for MyProvider {
    fn model_id(&self) -> &str {
        "my-custom-model"
    }

    async fn complete(
        &self,
        request: CompletionRequest,
    ) -> Result<CompletionResponse, BlazenError> {
        // Your HTTP/gRPC/local inference logic here
        todo!()
    }

    async fn stream(
        &self,
        request: CompletionRequest,
    ) -> Result<
        Pin<Box<dyn Stream<Item = Result<StreamChunk, BlazenError>> + Send>>,
        BlazenError,
    > {
        // Your streaming implementation here
        todo!()
    }
}

Once implemented, MyProvider automatically gets StructuredOutput via the blanket impl, so model.extract::<T>(messages) works out of the box.

Implementing `ComputeProvider` + `ImageGeneration`

use blazen_llm::compute::*;
use blazen_llm::BlazenError;

struct MyImageProvider {
    api_key: String,
}

#[async_trait::async_trait]
impl ComputeProvider for MyImageProvider {
    fn provider_id(&self) -> &str { "my-image-provider" }

    async fn submit(&self, request: ComputeRequest) -> Result<JobHandle, BlazenError> {
        todo!()
    }

    async fn status(&self, job: &JobHandle) -> Result<JobStatus, BlazenError> {
        todo!()
    }

    async fn result(&self, job: JobHandle) -> Result<ComputeResult, BlazenError> {
        todo!()
    }

    async fn cancel(&self, job: &JobHandle) -> Result<(), BlazenError> {
        todo!()
    }
}

#[async_trait::async_trait]
impl ImageGeneration for MyImageProvider {
    async fn generate_image(
        &self,
        request: ImageRequest,
    ) -> Result<ImageResult, BlazenError> {
        // Convert ImageRequest to your provider's format and call the API
        todo!()
    }

    async fn upscale_image(
        &self,
        request: UpscaleRequest,
    ) -> Result<ImageResult, BlazenError> {
        todo!()
    }
}

Built-in Providers

Provider	Feature	Traits Implemented
`OpenAiProvider`	`openai`	`CompletionModel`, `StructuredOutput`
`OpenAiCompatProvider`	`openai`	`CompletionModel`, `StructuredOutput`, `ModelRegistry`
`AnthropicProvider`	`anthropic`	`CompletionModel`, `StructuredOutput`
`GeminiProvider`	`gemini`	`CompletionModel`, `StructuredOutput`, `ModelRegistry`
`AzureOpenAiProvider`	`azure`	`CompletionModel`, `StructuredOutput`
`FalProvider`	`fal`	`CompletionModel`, `StructuredOutput`, `ComputeProvider`, `ImageGeneration`, `VideoGeneration`, `AudioGeneration`, `Transcription`

`OpenAiCompatProvider` Presets

OpenAiCompatProvider works with any OpenAI-compatible endpoint. Named constructors are provided for popular services:

use blazen_llm::providers::openai_compat::OpenAiCompatProvider;

let groq = OpenAiCompatProvider::groq("gsk-...");
let openrouter = OpenAiCompatProvider::openrouter("sk-or-...");
let together = OpenAiCompatProvider::together("...");
let mistral = OpenAiCompatProvider::mistral("...");
let deepseek = OpenAiCompatProvider::deepseek("...");
let fireworks = OpenAiCompatProvider::fireworks("...");
let perplexity = OpenAiCompatProvider::perplexity("...");
let xai = OpenAiCompatProvider::xai("...");
let cohere = OpenAiCompatProvider::cohere("...");
let bedrock = OpenAiCompatProvider::bedrock("...", "us-east-1");

Telemetry Exporters

Re-exported from blazen_telemetry at the crate root and gated by the corresponding Cargo features (see Feature Flags). All exporters return a tracing_subscriber::Layer (or install one globally) that is composed into a tracing_subscriber::registry().

`LangfuseConfig`

Builder-style configuration for the Langfuse exporter. Shipping langfuse pulls in reqwest for the native ingestion client; on wasm32 the dispatcher is a no-op (events are dropped) because Langfuse export is a native-target feature.

Method	Signature	Description
`new`	`fn new(public_key: impl Into<String>, secret_key: impl Into<String>) -> Self`	Construct with the required Langfuse public + secret keys. Defaults host to `https://cloud.langfuse.com`, batch size to `100`, flush interval to `5000` ms
`with_host`	`fn with_host(self, host: impl Into<String>) -> Self`	Override the Langfuse host URL (e.g. `https://eu.langfuse.com`)
`with_batch_size`	`fn with_batch_size(self, batch_size: usize) -> Self`	Maximum number of envelopes buffered before an automatic flush
`with_flush_interval_ms`	`fn with_flush_interval_ms(self, ms: u64) -> Self`	Background flush cadence in milliseconds

LangfuseConfig derives Debug, Clone, Serialize, Deserialize, and Default so it can be loaded from configuration files. Public/secret-key fields are required; host, batch_size, and flush_interval_ms are populated with defaults during deserialization.

`LangfuseLayer`

A tracing_subscriber::Layer<S> (where S: Subscriber + for<'a> LookupSpan<'a>) that maps Blazen spans to Langfuse ingestion events:

Blazen span name	Langfuse concept	Ingestion event
`workflow.run`, `pipeline.run`	Trace	`trace-create`
`workflow.step`, `pipeline.stage`, `pipeline.stage.sequential`, `pipeline.stage.parallel`	Span	`span-create`
`llm.complete`, `llm.stream`	Generation	`generation-create`

Construct via init_langfuse. The layer owns an unbounded mpsc sender into a background dispatcher that batches and POSTs to {host}/api/public/ingestion with HTTP basic auth (public_key:secret_key).

`init_langfuse`

pub fn init_langfuse(config: LangfuseConfig) -> Result<LangfuseLayer, TelemetryError>;

Builds the HTTP client and spawns the background dispatcher on the current Tokio runtime, then returns a LangfuseLayer ready to compose into a subscriber registry. Returns TelemetryError::Langfuse if no Tokio runtime is available (native targets) or if the underlying reqwest::Client cannot be built.

use blazen_telemetry::{LangfuseConfig, init_langfuse};
use tracing_subscriber::prelude::*;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = LangfuseConfig::new("pk-lf-...", "sk-lf-...")
        .with_host("https://cloud.langfuse.com")
        .with_batch_size(50)
        .with_flush_interval_ms(2_500);

    let layer = init_langfuse(config)?;
    tracing_subscriber::registry().with(layer).init();
    Ok(())
}

`OtlpConfig`

Shared configuration for both OTLP transports.

Field	Type	Description
`endpoint`	`String`	OTLP endpoint URL. For `otlp` (gRPC): `http://localhost:4317`. For `otlp-http`: `http://localhost:4318/v1/traces`
`service_name`	`String`	Reported as the `service.name` resource attribute on every span

Derives Debug, Clone, Serialize, Deserialize.

`init_otlp` (gRPC, `otlp` feature, native only)

pub fn init_otlp(config: OtlpConfig) -> Result<(), Box<dyn std::error::Error>>;

Builds a tonic-backed SpanExporter, installs it into a global SdkTracerProvider, and registers a tracing_subscriber::registry() with EnvFilter, the OTel layer, and a fmt layer in one call. Native targets only — opentelemetry-otlp/grpc-tonic does not compile to wasm32-unknown-unknown.

`init_otlp_http` (HTTP/protobuf, `otlp-http` feature)

pub fn init_otlp_http(config: OtlpConfig) -> Result<(), Box<dyn std::error::Error>>;

Same shape as init_otlp but speaks OTLP/HTTP with the binary protobuf encoding. Works on both native and wasm32:

Native: registers an internal ReqwestHttpClient (a thin reqwest::Client wrapper) via with_http_client.
wasm32: registers WasmFetchHttpClient, a web_sys::fetch-backed opentelemetry_http::HttpClient impl that ships in the blazen_telemetry::exporters::wasm_otlp_client module.

The reason for the indirection: opentelemetry-otlp’s built-in reqwest-client / reqwest-blocking-client features compile a wasm32 reqwest::Client whose send future is !Send, which violates the HttpClient: Send + Sync bound and breaks the build. Pinning our own HttpClient impls per target keeps the trait satisfied without enabling those upstream features.

use blazen_telemetry::{OtlpConfig, init_otlp_http};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    init_otlp_http(OtlpConfig {
        endpoint: "http://localhost:4318/v1/traces".to_string(),
        service_name: "my-blazen-app".to_string(),
    })?;
    // ... run workflow; spans flush via OTLP/HTTP ...
    Ok(())
}

`TelemetryError`

Returned by init_langfuse. Variants include Langfuse(String) (HTTP-client construction or runtime-lookup failure). Re-exported from blazen_telemetry::TelemetryError.

WASM Embeddings (`blazen-embed-tract`)

The blazen-embed-tract crate ships two ONNX-runtime-backed embedding providers built on tract:

Type	Target	Source of weights
`TractEmbedModel`	native	`hf-hub` (Hugging Face download cache, filesystem-backed)
`WasmTractEmbedModel`	`wasm32` only	`web_sys::fetch` (URLs supplied by the caller)

The wasm variant exists because hf-hub requires a filesystem and Tokio, neither of which is available in wasm32-unknown-unknown. Both providers share the same TractOptions and pooling logic so callers generic over EmbeddingModel work unchanged.

`WasmTractEmbedModel`

#[cfg(target_arch = "wasm32")]
use blazen_embed_tract::wasm_provider::{WasmTractEmbedModel, WasmTractError, WasmTractResponse};
use blazen_embed_tract::options::TractOptions;

Method	Signature	Description
`create`	`async fn create(model_url: &str, tokenizer_url: &str, options: TractOptions) -> Result<Self, WasmTractError>`	Fetch ONNX weights and a HuggingFace `tokenizer.json` from the supplied URLs and build a runnable model. `options.cache_dir` and `options.show_download_progress` are ignored on wasm32
`embed`	`async fn embed(&self, texts: &[String]) -> Result<WasmTractResponse, WasmTractError>`	Returns one L2-normalized vector per input. Inference runs synchronously on the wasm main thread
`model_id`	`fn model_id(&self) -> &str`	Hugging Face model id resolved from the `TractOptions::model_name` registry entry
`dimensions`	`fn dimensions(&self) -> usize`	Output embedding dimensionality

WasmTractResponse exposes embeddings: Vec<Vec<f32>> and model: String. WasmTractError variants: UnknownModel(String), Fetch { url, message }, Init(String), Embed(String).

#[cfg(target_arch = "wasm32")]
{
    use blazen_embed_tract::options::TractOptions;
    use blazen_embed_tract::wasm_provider::WasmTractEmbedModel;

    let opts = TractOptions {
        model_name: Some("bge-small-en-v1.5".to_string()),
        ..Default::default()
    };
    let model = WasmTractEmbedModel::create(
        "https://huggingface.co/Xenova/bge-small-en-v1.5/resolve/main/onnx/model.onnx",
        "https://huggingface.co/Xenova/bge-small-en-v1.5/resolve/main/tokenizer.json",
        opts,
    ).await?;
    let out = model.embed(&["hello world".into()]).await?;
}

The fetch loop resolves either window.fetch (browser) or globalThis.fetch (Workers, Deno, Node) so the same provider runs in every wasm host.

Send + Sync are implemented vacuously (unsafe impl) because wasm32-unknown-unknown is single-threaded; this lets WasmTractEmbedModel sit behind Arc<dyn EmbeddingModel> in target-generic code.

Local Inference Backend Re-exports

When you enable a local-inference feature on blazen-llm, the per-backend crate’s public types are re-exported at the blazen_llm crate root so callers do not need to depend on the backend crate directly. Each group of types is gated by its respective feature.

`mistralrs` feature

Re-exported un-prefixed (the mistralrs backend was the first local provider added and owns the canonical names):

#[cfg(feature = "mistralrs")]
use blazen_llm::{
    MistralRsProvider, MistralRsOptions, MistralRsError,
    ChatMessageInput, ChatRole,
    InferenceChunk, InferenceChunkStream,
    InferenceImage, InferenceImageSource,
    InferenceResult, InferenceToolCall, InferenceUsage,
};

`llamacpp` feature

Re-exported with a LlamaCpp prefix to avoid colliding with the mistralrs names when both features are enabled simultaneously:

#[cfg(feature = "llamacpp")]
use blazen_llm::{
    LlamaCppProvider, LlamaCppOptions, LlamaCppError,
    LlamaCppChatMessageInput, LlamaCppChatRole,
    LlamaCppInferenceChunk, LlamaCppInferenceChunkStream,
    LlamaCppInferenceResult, LlamaCppInferenceUsage,
};

`candle-llm` feature

Candle exposes its result type as CandleInferenceResult (already prefixed upstream). The CandleLlmCompletionModel adapter wraps the raw provider in the CompletionModel trait:

#[cfg(feature = "candle-llm")]
use blazen_llm::{
    CandleLlmProvider, CandleLlmCompletionModel,
    CandleLlmOptions, CandleLlmError,
    CandleInferenceResult,
};

Other local backends

Feature	Re-exports
`candle-embed`	`CandleEmbedModel`, `CandleEmbedOptions`, `CandleEmbedError`
`embed`	`EmbedModel`, `EmbedOptions`, `EmbedResponse`, `EmbedError` (from `blazen-embed`)
`whispercpp`	`WhisperCppProvider`, `WhisperModel`, `WhisperOptions`, `WhisperError`
`piper`	`PiperProvider`, `PiperOptions`, `PiperError`
`diffusion`	`DiffusionProvider`, `DiffusionOptions`, `DiffusionScheduler`, `DiffusionError`

All five additional backends follow the same convention — enable the feature on blazen-llm, then import directly from blazen_llm::*.

Rust API Reference

Feature Flags

Core LLM Traits

CompletionModel

StructuredOutput

EmbeddingModel

Tool

TypedTool

typed_tool_simple

ToolOutput

LlmPayload

ProviderId

ModelRegistry

ModelInfo

ModelPricing

ModelCapabilities

Types

ChatMessage

ChatMessage::tool_result

ChatMessage::tool_result_parts

ChatMessage::tool_result_view

Role

MessageContent

ContentPart

ImageContent

ImageSource

FileContent

CompletionRequest

CompletionResponse

StructuredResponse<T>

EmbeddingResponse

RequestTiming

TokenUsage

ToolDefinition

ToolCall

StreamChunk

Content Subsystem

ContentKind

ContentHandle

ContentStore

ContentBody

ByteStream

ContentHint

ContentMetadata

DynContentStore

Built-in stores

Tool-input helpers

Visibility helpers

Magic-number detection

CompletionRequest::resolve_handles_with

Agent System

run_agent()

run_agent_with_callback()

AgentConfig

AgentResult

AgentEvent

Context

State Storage

Typed JSON: set() / get()

Binary: set_bytes() / get_bytes()

Raw StateValue: set_value() / get_value()

StateValue

Run Identity

Event Routing

Session References

State Snapshot and Restore

BlazenState

Session Reference Registry

RegistryKey

SessionRefRegistry

SessionPausePolicy

SessionRefError

Snapshot exclusion

Workflow

WorkflowBuilder

WorkflowBuilder::session_pause_policy

WorkflowHandler

WorkflowHandler::session_refs

WorkflowHandler::result

WorkflowHandler::pause

`CompletionModel`

`StructuredOutput`

`EmbeddingModel`

`Tool`

`TypedTool`

`typed_tool_simple`

`ToolOutput`

`LlmPayload`

`ProviderId`

`ModelRegistry`

`ModelInfo`

`ModelPricing`

`ModelCapabilities`

`ChatMessage`

`ChatMessage::tool_result`

`ChatMessage::tool_result_parts`

`ChatMessage::tool_result_view`

`Role`

`MessageContent`

`ContentPart`

`ImageContent`

`ImageSource`

`FileContent`

`CompletionRequest`

`CompletionResponse`

`StructuredResponse<T>`

`EmbeddingResponse`

`RequestTiming`

`TokenUsage`

`ToolDefinition`

`ToolCall`

`StreamChunk`

`ContentKind`

`ContentHandle`

`ContentStore`

`ContentBody`

`ByteStream`

`ContentHint`

`ContentMetadata`

`DynContentStore`

`CompletionRequest::resolve_handles_with`

`run_agent()`

`run_agent_with_callback()`

`AgentConfig`

`AgentResult`

`AgentEvent`

Typed JSON: `set()` / `get()`

Binary: `set_bytes()` / `get_bytes()`

Raw StateValue: `set_value()` / `get_value()`

`StateValue`

`RegistryKey`

`SessionRefRegistry`

`SessionPausePolicy`

`SessionRefError`

`WorkflowBuilder`

`WorkflowBuilder::session_pause_policy`

`WorkflowHandler`

`WorkflowHandler::session_refs`

`WorkflowHandler::result`

`WorkflowHandler::pause`

`WorkflowHandler::snapshot`

`WorkflowHandler::resume_in_place`

`WorkflowHandler::respond_to_input`

`WorkflowHandler::abort`

`WorkflowError` variants

`SessionRefsNotSerializable`

`ComputeProvider`

`ImageGeneration`

`VideoGeneration`

`AudioGeneration`

`Transcription`

`ThreeDGeneration`

`ImageRequest`

`UpscaleRequest`

`VideoRequest`

`SpeechRequest`

`MusicRequest`

`TranscriptionRequest`

`ThreeDRequest`

`ImageResult`