Rust API Reference
Complete API reference for blazen-llm in Rust
Feature Flags
blazen-llm providers:
| Feature | Description |
|---|---|
openai | Enables OpenAiProvider and OpenAiCompatProvider (covers OpenRouter, Groq, Together, Mistral, DeepSeek, Fireworks, Perplexity, xAI, Cohere, Bedrock) |
anthropic | Enables AnthropicProvider |
gemini | Enables GeminiProvider |
fal | Enables FalProvider (compute: image, video, audio, 3D) |
azure | Enables AzureOpenAiProvider |
all-providers | Enables all provider implementations |
blazen-llm local-inference backends (each gated behind its own feature, all bundled in the all-local umbrella):
| Feature | Re-exports from blazen_llm::* |
|---|---|
mistralrs | MistralRsProvider, ChatMessageInput, ChatRole, InferenceChunk, InferenceChunkStream, InferenceImage, InferenceImageSource, InferenceResult, InferenceToolCall, InferenceUsage, MistralRsError, MistralRsOptions |
llamacpp | LlamaCppProvider, LlamaCppChatMessageInput, LlamaCppChatRole, LlamaCppInferenceChunk, LlamaCppInferenceChunkStream, LlamaCppInferenceResult, LlamaCppInferenceUsage, LlamaCppError, LlamaCppOptions |
candle-llm | CandleLlmProvider, CandleLlmCompletionModel, CandleInferenceResult, CandleLlmError, CandleLlmOptions |
candle-embed | CandleEmbedModel, CandleEmbedOptions, CandleEmbedError |
embed | EmbedModel, EmbedOptions, EmbedResponse, EmbedError |
whispercpp | WhisperCppProvider, WhisperModel, WhisperOptions, WhisperError |
piper | PiperProvider, PiperOptions, PiperError |
diffusion | DiffusionProvider, DiffusionOptions, DiffusionScheduler, DiffusionError |
blazen-telemetry exporters:
| Feature | Description |
|---|---|
spans (default) | Enables TracingCompletionModel and per-span instrumentation hooks |
history | Enables WorkflowHistory, HistoryEvent, HistoryEventKind, PauseReason |
otlp | OTLP gRPC exporter via tonic (init_otlp + OtlpConfig). Native targets only |
otlp-http | OTLP HTTP/protobuf exporter via a custom HttpClient (init_otlp_http + OtlpConfig). Works on native and wasm32 |
prometheus | Enables init_prometheus + MetricsLayer |
langfuse | Enables LangfuseConfig, LangfuseLayer, init_langfuse |
all | Enables spans, history, otlp, otlp-http, prometheus, langfuse |
Core LLM Traits
CompletionModel
The central trait every LLM provider must implement. Supports both one-shot and streaming completions.
#[async_trait]
pub trait CompletionModel: Send + Sync {
fn model_id(&self) -> &str;
async fn complete(
&self,
request: CompletionRequest,
) -> Result<CompletionResponse, BlazenError>;
async fn stream(
&self,
request: CompletionRequest,
) -> Result<
Pin<Box<dyn Stream<Item = Result<StreamChunk, BlazenError>> + Send>>,
BlazenError,
>;
}
Usage:
use blazen_llm::{CompletionModel, CompletionRequest, ChatMessage};
use blazen_llm::providers::openai::OpenAiProvider;
let model = OpenAiProvider::new("sk-...");
let request = CompletionRequest::new(vec![
ChatMessage::user("What is 2 + 2?"),
]);
let response = model.complete(request).await?;
println!("{}", response.content.unwrap_or_default());
Streaming:
use futures_util::StreamExt;
let request = CompletionRequest::new(vec![
ChatMessage::user("Tell me a story"),
]);
let mut stream = model.stream(request).await?;
while let Some(chunk) = stream.next().await {
let chunk = chunk?;
if let Some(delta) = &chunk.delta {
print!("{delta}");
}
}
StructuredOutput
Extract typed data from a model using JSON Schema constraints. This trait has a blanket implementation for every CompletionModel — providers do not need to implement it.
#[async_trait]
pub trait StructuredOutput: CompletionModel {
async fn extract<T: JsonSchema + DeserializeOwned + Send>(
&self,
messages: Vec<ChatMessage>,
) -> Result<StructuredResponse<T>, BlazenError>;
}
// Blanket impl: every CompletionModel automatically gets this.
impl<M: CompletionModel> StructuredOutput for M {}
T must implement schemars::JsonSchema and serde::de::DeserializeOwned. The schema is derived at call time via schemars::schema_for! and injected into the request’s response_format.
Usage:
use schemars::JsonSchema;
use serde::Deserialize;
use blazen_llm::StructuredOutput;
#[derive(JsonSchema, Deserialize)]
struct Sentiment {
label: String,
score: f64,
}
let result = model.extract::<Sentiment>(vec![
ChatMessage::user("Analyze sentiment: 'I love Rust'"),
]).await?;
println!("{}: {}", result.data.label, result.data.score);
EmbeddingModel
Produces vector embeddings for text inputs.
#[async_trait]
pub trait EmbeddingModel: Send + Sync {
fn model_id(&self) -> &str;
fn dimensions(&self) -> usize;
async fn embed(&self, texts: &[String]) -> Result<EmbeddingResponse, BlazenError>;
}
Usage:
let texts = vec!["Hello world".into(), "Goodbye world".into()];
let response = embedding_model.embed(&texts).await?;
for (i, vector) in response.embeddings.iter().enumerate() {
println!("text {i}: {} dimensions", vector.len());
}
Tool
A callable tool that can be invoked by an LLM during a conversation.
#[async_trait]
pub trait Tool: Send + Sync {
fn definition(&self) -> ToolDefinition;
async fn execute(
&self,
arguments: serde_json::Value,
) -> Result<ToolOutput<serde_json::Value>, BlazenError>;
}
:::note[Migration from earlier versions]
The Tool::execute return type changed: it used to return a bare Result<…Value…, BlazenError> and now returns Result<ToolOutput<Value>, BlazenError>. Existing tools that returned a Value compile again once Ok(value) is changed to Ok(value.into()): the From<Value> impl wraps the value in a ToolOutput with no override. The ChatMessage::tool_result constructor also changed in a breaking way for callers that passed a &str; convert via serde_json::Value::String(s.into()) or use serde_json::json!(s).
:::
Usage:
use blazen_llm::{Tool, ToolDefinition, ToolOutput, BlazenError};
use async_trait::async_trait;
struct WeatherTool;
#[async_trait]
impl Tool for WeatherTool {
fn definition(&self) -> ToolDefinition {
ToolDefinition {
name: "get_weather".into(),
description: "Get the weather for a city.".into(),
parameters: serde_json::json!({
"type": "object",
"properties": { "city": { "type": "string" } },
"required": ["city"],
}),
}
}
async fn execute(
&self,
args: serde_json::Value,
) -> Result<ToolOutput<serde_json::Value>, BlazenError> {
let _city = args["city"].as_str().unwrap_or_default();
// Common case: return a structured value, no override.
Ok(serde_json::json!({ "temperature_f": 72, "conditions": "clear" }).into())
}
}
Sending a summary to the LLM while keeping the full payload visible to callers:
use blazen_llm::{Tool, ToolDefinition, ToolOutput, LlmPayload, BlazenError};
use async_trait::async_trait;
struct SearchTool;
#[async_trait]
impl Tool for SearchTool {
// Definition elided for brevity.
fn definition(&self) -> ToolDefinition { unimplemented!() }
async fn execute(
&self,
args: serde_json::Value,
) -> Result<ToolOutput<serde_json::Value>, BlazenError> {
Ok(ToolOutput::with_override(
serde_json::json!({ "items": [1, 2, 3], "raw": "..." }),
LlmPayload::Text { text: "Found 3 items.".into() },
))
}
}
The full data payload is preserved in ChatMessage.tool_result so application code can inspect the unredacted result, while only the llm_override is sent to the model on the next turn.
TypedTool
A generic wrapper that turns a typed handler Fn(Args) -> Future<Output = Result<ToolOutput<Output>>> into an implementation of Tool. Handles serde_json::from_value of the input and serde_json::to_value of the output for you, and auto-derives the JSON Schema in ToolDefinition::parameters via schemars::schema_for!.
pub struct TypedTool<Args, Output, F>
where
Args: DeserializeOwned + JsonSchema + Send + 'static,
Output: Serialize + Send + 'static,
F: Fn(Args) -> BoxFut<Output> + Send + Sync + 'static,
{ /* ... */ }
impl<Args, Output, F> TypedTool<Args, Output, F> {
pub fn new(
name: impl Into<String>,
description: impl Into<String>,
handler: F,
) -> Self;
}
Usage:
use blazen_llm::{TypedTool, ToolOutput};
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
#[derive(Deserialize, JsonSchema)]
struct AddArgs { a: i64, b: i64 }
#[derive(Serialize)]
struct AddOutput { sum: i64 }
let add_tool = TypedTool::new(
"add",
"Add two numbers.",
|args: AddArgs| {
Box::pin(async move {
Ok(ToolOutput::new(AddOutput { sum: args.a + args.b }))
})
},
);
typed_tool_simple
Convenience constructor for the no-override common case. The handler returns Result<Output> directly; the wrapper applies ToolOutput::new for you.
pub fn typed_tool_simple<Args, Output, Fut, F>(
name: impl Into<String>,
description: impl Into<String>,
handler: F,
) -> impl Tool
where
Args: DeserializeOwned + JsonSchema + Send + 'static,
Output: Serialize + Send + 'static,
Fut: Future<Output = Result<Output, BlazenError>> + Send + 'static,
F: Fn(Args) -> Fut + Send + Sync + 'static;
Usage:
use blazen_llm::{typed_tool_simple, BlazenError};
let add_tool = typed_tool_simple(
"add",
"Add two numbers.",
|args: AddArgs| async move {
Ok::<_, BlazenError>(AddOutput { sum: args.a + args.b })
},
);
Why it exists: TypedTool does the serde_json::from_value of the input and serde_json::to_value of the output for you, exactly once per call. The JSON Schema in ToolDefinition::parameters is auto-derived from Args via schemars::schema_for!, so you do not have to hand-write the schema or repeat field names.
ToolOutput
The return type of Tool::execute. Carries the structured data that callers see, plus an optional llm_override controlling what the LLM receives on the next turn.
pub struct ToolOutput<T = Value> {
pub data: T,
pub llm_override: Option<LlmPayload>,
}
| Field | Type | Description |
|---|---|---|
data | T | Structured tool output. Visible to application code via ChatMessage.tool_result. |
llm_override | Option<LlmPayload> | If Some, replaces the default representation of data when serialised into the next prompt. If None, each provider applies a sensible default (see LlmPayload). |
Constructors:
| Constructor | Signature | Description |
|---|---|---|
ToolOutput::new | fn(data: T) -> Self | Wrap structured data with no override. The LLM sees the provider default. |
ToolOutput::with_override | fn(data: T, override_payload: LlmPayload) -> Self | Wrap structured data and pin exactly what the LLM receives next turn. |
From<Value> | impl From<Value> for ToolOutput<Value> | value.into() produces ToolOutput { data: value, llm_override: None }. Lets Tool::execute keep returning bare Values with Ok(value.into()). |
Usage:
use blazen_llm::{ToolOutput, LlmPayload};
use serde_json::json;
let plain = ToolOutput::new(json!({ "items": [1, 2, 3] }));
let with_summary = ToolOutput::with_override(
json!({ "items": [1, 2, 3], "raw": "..." }),
LlmPayload::Text { text: "Found 3 items.".into() },
);
Why it exists: many tools return large structured payloads that the application wants in full (logs, UI, downstream steps), but feeding all of it back into the next LLM call is wasteful or noisy. ToolOutput decouples the two channels so you can return rich data to the caller while pinning a compact summary for the model.
LlmPayload
The wire-format-agnostic shape of a tool result as it appears to the LLM. Used as ToolOutput::llm_override and as the second component of ChatMessage::tool_result_view.
#[serde(tag = "kind", rename_all = "snake_case")]
pub enum LlmPayload {
Text { text: String },
Json { value: serde_json::Value },
Parts { parts: Vec<ContentPart> },
ProviderRaw { provider: ProviderId, value: serde_json::Value },
}
| Variant | Description |
|---|---|
Text { text } | Plain text. Sent as-is to providers that accept string tool results, or wrapped in [{type: "text", text}] for providers that require parts. |
Json { value } | Structured JSON. Each provider serialises this in its native shape (see per-provider behaviour below). |
Parts { parts } | A Vec<ContentPart> for multimodal tool results (text + images + files). Used together with ChatMessage::tool_result_parts. |
ProviderRaw { provider, value } | An exact, provider-specific payload. Bypasses Blazen’s translation layer and is forwarded verbatim only when the active provider matches provider; other providers fall back to the default representation of data. |
Usage:
use blazen_llm::{LlmPayload, ProviderId};
use serde_json::json;
LlmPayload::Text { text: "Found 3 results.".into() };
LlmPayload::Json { value: json!({ "items": [1, 2, 3] }) };
LlmPayload::ProviderRaw {
provider: ProviderId::Anthropic,
value: json!([{"type": "text", "text": "..."}]),
};
Per-provider behaviour for the default (no llm_override) case:
When a tool returns structured data and no llm_override, each provider sends a sensible default to the LLM:
- OpenAI / OpenAI-compat / Azure / Responses / Fal: the data is JSON-stringified into the content field of the tool message.
- Anthropic: structured data becomes [{type: "text", text: <stringified-json>}] inside tool_result.content.
- Gemini: structured object data is passed natively as functionResponse.response. Scalars are wrapped as {result: <scalar>}.
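The three defaults above can be sketched with the standard library alone. This is only an illustration of the described shapes, not Blazen's serialisation code: Family, ToolData, json_string, and default_payload are hypothetical names, and the tool data is assumed to arrive as already-serialised JSON text.

```rust
/// Hypothetical grouping of the providers listed above by default behaviour.
#[derive(Clone, Copy)]
pub enum Family {
    OpenAiLike, // OpenAI, OpenAI-compat, Azure, Responses, Fal
    Anthropic,
    Gemini,
}

/// Tool data as already-serialised JSON text (an assumption of this sketch).
pub enum ToolData<'a> {
    Object(&'a str),
    Scalar(&'a str),
}

/// Encode `raw` as a JSON string literal (handles quotes, backslashes, newlines only).
fn json_string(raw: &str) -> String {
    let mut out = String::from("\"");
    for c in raw.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out.push('"');
    out
}

/// Return the JSON fragment each family would place on the wire by default.
pub fn default_payload(family: Family, data: &ToolData) -> String {
    let raw = match data {
        ToolData::Object(s) | ToolData::Scalar(s) => *s,
    };
    match (family, data) {
        // The whole value becomes one JSON string in the `content` field.
        (Family::OpenAiLike, _) => json_string(raw),
        // The stringified JSON is wrapped in a single text part.
        (Family::Anthropic, _) => {
            format!("[{{\"type\":\"text\",\"text\":{}}}]", json_string(raw))
        }
        // Objects pass through natively; scalars are wrapped as {result: ...}.
        (Family::Gemini, ToolData::Object(obj)) => obj.to_string(),
        (Family::Gemini, ToolData::Scalar(s)) => format!("{{\"result\":{}}}", s),
    }
}
```

Calling default_payload(Family::Gemini, &ToolData::Scalar("42")) yields the wrapped object, while the same scalar under Family::OpenAiLike yields a quoted string.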
ProviderId
Tags an LlmPayload::ProviderRaw variant with the provider whose wire format the value follows. The runtime uses this to decide whether to forward the raw payload or fall back to the default representation.
pub enum ProviderId {
OpenAi,
OpenAiCompat,
Azure,
Anthropic,
Gemini,
Responses,
Fal,
}
| Variant | Provider |
|---|---|
OpenAi | OpenAiProvider |
OpenAiCompat | OpenAiCompatProvider (OpenRouter, Groq, Together, etc.) |
Azure | AzureOpenAiProvider |
Anthropic | AnthropicProvider |
Gemini | GeminiProvider |
Responses | OpenAI Responses API provider |
Fal | FalProvider |
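The forward-or-fall-back decision described for ProviderRaw can be sketched as a single match. This is a minimal stand-in, not the runtime's code: Payload is a reduced, hypothetical version of LlmPayload with string values, and payload_for is an illustrative name.

```rust
/// Mirror of the ProviderId variants listed above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum ProviderId {
    OpenAi,
    OpenAiCompat,
    Azure,
    Anthropic,
    Gemini,
    Responses,
    Fal,
}

/// Reduced stand-in for LlmPayload (values held as plain strings).
pub enum Payload {
    Text(String),
    ProviderRaw { provider: ProviderId, value: String },
}

/// Pick what the active provider should send for a tool result.
/// `default_repr` stands for the provider's default rendering of the data.
pub fn payload_for(active: ProviderId, over: Option<&Payload>, default_repr: &str) -> String {
    match over {
        // A text override applies to every provider.
        Some(Payload::Text(text)) => text.clone(),
        // A raw payload is forwarded verbatim only to the provider it targets.
        Some(Payload::ProviderRaw { provider, value }) if *provider == active => value.clone(),
        // No override, or a raw payload for a different provider: use the default.
        _ => default_repr.to_string(),
    }
}
```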
ModelRegistry
Allows providers to advertise their available models.
#[async_trait]
pub trait ModelRegistry: Send + Sync {
async fn list_models(&self) -> Result<Vec<ModelInfo>, BlazenError>;
async fn get_model(&self, model_id: &str) -> Result<Option<ModelInfo>, BlazenError>;
}
ModelInfo
| Field | Type | Description |
|---|---|---|
id | String | Model identifier used in API requests (e.g. "gpt-4o") |
name | Option<String> | Human-readable display name |
provider | String | Provider that serves this model |
context_length | Option<u64> | Maximum context window in tokens |
pricing | Option<ModelPricing> | Pricing information |
capabilities | ModelCapabilities | What this model can do |
ModelPricing
| Field | Type | Description |
|---|---|---|
input_per_million | Option<f64> | Cost per million input tokens (USD) |
output_per_million | Option<f64> | Cost per million output tokens (USD) |
per_image | Option<f64> | Cost per image (image generation models) |
per_second | Option<f64> | Cost per second of compute |
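As a worked example of the per-million fields, a token-based cost estimate might be computed as follows. Whether Blazen derives the cost field of its responses this way is not stated here, so treat this as an assumption; Pricing, Usage, and estimate_cost are hypothetical stand-ins for the relevant fields of ModelPricing and TokenUsage.

```rust
/// Stand-in for the token-based fields of ModelPricing.
pub struct Pricing {
    pub input_per_million: Option<f64>,
    pub output_per_million: Option<f64>,
}

/// Stand-in for the prompt/completion counts of TokenUsage.
pub struct Usage {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
}

/// Estimated cost in USD, or None when either price is unknown.
pub fn estimate_cost(pricing: &Pricing, usage: &Usage) -> Option<f64> {
    let input = pricing.input_per_million?;
    let output = pricing.output_per_million?;
    let total = f64::from(usage.prompt_tokens) * input
        + f64::from(usage.completion_tokens) * output;
    // Prices are quoted per million tokens.
    Some(total / 1_000_000.0)
}
```

With prices of 2.5 and 10.0 USD per million tokens, 1,000 prompt tokens and 500 completion tokens come to (1000 × 2.5 + 500 × 10.0) / 1,000,000 = 0.0075 USD.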
ModelCapabilities
| Field | Type | Description |
|---|---|---|
chat | bool | Supports chat completions |
streaming | bool | Supports streaming responses |
tool_use | bool | Supports tool/function calling |
structured_output | bool | Supports JSON schema constraints |
vision | bool | Supports image inputs |
image_generation | bool | Supports image generation |
embeddings | bool | Supports text embeddings |
video_generation | bool | Video generation support |
text_to_speech | bool | Text-to-speech synthesis |
speech_to_text | bool | Speech-to-text transcription |
audio_generation | bool | Audio generation (music, SFX) |
three_d_generation | bool | 3D model generation |
Types
ChatMessage
A single message in a chat conversation.
| Field | Type | Description |
|---|---|---|
role | Role | Who produced this message |
content | MessageContent | The message payload. For tool-result messages with a plain string return, the string lives here as MessageContent::Text(s). |
tool_calls | Vec<ToolCall> | Tool invocations requested by the assistant on this message (empty for non-assistant roles). |
tool_result | Option<ToolOutput<serde_json::Value>> | Structured tool-result payload for tool-role messages. Some only when the tool returned non-string data or supplied an llm_override. Plain-string results live in content instead. |
name | Option<String> | Tool name (set for tool-role messages). |
tool_call_id | Option<String> | Provider-assigned id from the originating ToolCall. |
Constructors:
// Text messages
ChatMessage::system("You are a helpful assistant")
ChatMessage::user("Hello!")
ChatMessage::assistant("Hi there!")
ChatMessage::tool("{ \"result\": 42 }")
// Multimodal messages
ChatMessage::user_image_url("Describe this", "https://img.com/a.png", Some("image/png"))
ChatMessage::user_image_base64("What is this?", "iVBORw0K...", "image/jpeg")
ChatMessage::user_parts(vec![
ContentPart::Text { text: "Look at this:".into() },
ContentPart::Image(ImageContent {
source: ImageSource::Url { url: "https://...".into() },
media_type: Some("image/png".into()),
}),
ContentPart::File(FileContent {
source: ImageSource::Url { url: "https://...".into() },
media_type: "application/pdf".into(),
filename: Some("doc.pdf".into()),
}),
])
ChatMessage::tool_result
Build a tool-role message that closes a prior ToolCall. Routes plain strings into content and structured payloads (or anything with an llm_override) onto the new tool_result sibling field.
pub fn tool_result(
call_id: impl Into<String>,
name: impl Into<String>,
output: impl Into<ToolOutput<serde_json::Value>>,
) -> Self
| Argument | Description |
|---|---|
call_id | The id from the originating ToolCall. Stored on tool_call_id. |
name | The tool name. Stored on name. |
output | A ToolOutput<Value> (or anything that converts via From<Value>, e.g. serde_json::Value directly). |
Routing rules:
- If output.data == Value::String(s) and output.llm_override.is_none(), the string is moved into content as MessageContent::Text(s) and tool_result stays None.
- Otherwise, output is stored verbatim on tool_result and content is set to MessageContent::Text(String::new()).
Usage:
use blazen_llm::{ChatMessage, ToolOutput, LlmPayload};
use serde_json::json;
// Plain string -- lives in content as a regular text message.
ChatMessage::tool_result("call_1", "search", json!("hello"));
// Structured -- lives in the tool_result sibling field.
ChatMessage::tool_result("call_1", "search", json!({"items": [1, 2, 3]}));
// With override -- full data preserved on the message,
// summary sent to the LLM next turn.
ChatMessage::tool_result(
"call_1",
"search",
ToolOutput::with_override(
json!({"items": [1, 2, 3], "raw": "..."}),
LlmPayload::Text { text: "Found 3 items.".into() },
),
);
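The routing rules for ChatMessage::tool_result can be modelled with a small standalone function. The types below are simplified stand-ins (Data replaces serde_json::Value, and the override is a plain string), and route is a hypothetical name, so this shows the decision only, not the real constructor.

```rust
/// Stand-in for serde_json::Value: either a JSON string or anything else.
#[derive(Debug, Clone, PartialEq)]
pub enum Data {
    Str(String),
    Other(String),
}

/// Stand-in for ToolOutput<Value>.
#[derive(Debug, Clone, PartialEq)]
pub struct Output {
    pub data: Data,
    pub llm_override: Option<String>,
}

/// The two fields of a tool-role message that the routing rules touch.
pub struct Routed {
    pub content: String,
    pub tool_result: Option<Output>,
}

pub fn route(output: Output) -> Routed {
    match output {
        // Rule 1: a bare string with no override moves into `content`.
        Output { data: Data::Str(s), llm_override: None } => Routed {
            content: s,
            tool_result: None,
        },
        // Rule 2: everything else is kept verbatim, with empty `content`.
        other => Routed {
            content: String::new(),
            tool_result: Some(other),
        },
    }
}
```

Note that a string paired with an override still takes the second branch, because the override has to be kept next to the data.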
:::caution[Breaking change]
Earlier versions accepted content: impl Into<String> and stored the string verbatim. The new signature accepts output: impl Into<ToolOutput<Value>>. Callers that passed a &str should switch to serde_json::Value::String(s.into()) or serde_json::json!(s).
:::
ChatMessage::tool_result_parts
Build a tool-role message whose result carries multimodal content (text + images + files). The parts ride on tool_result as LlmPayload::Parts { parts } so providers that support multimodal tool results (Anthropic, Gemini) can forward them natively.
pub fn tool_result_parts(
call_id: impl Into<String>,
name: impl Into<String>,
parts: Vec<ContentPart>,
) -> Self
Usage:
use blazen_llm::{ChatMessage, ContentPart, ImageContent, ImageSource};
ChatMessage::tool_result_parts(
"call_1",
"render_chart",
vec![
ContentPart::Text { text: "Rendered the requested chart.".into() },
ContentPart::Image(ImageContent {
source: ImageSource::Url { url: "https://example.com/chart.png".into() },
media_type: Some("image/png".into()),
}),
],
);
ChatMessage::tool_result_view
Accessor returning both channels of a tool-result message in a single call. Used internally by provider implementations that need to choose between the structured data and an explicit llm_override when serialising to the wire format.
pub fn tool_result_view(&self) -> Option<(serde_json::Value, Option<&LlmPayload>)>
Returns None for non-tool-role messages. For tool-role messages it returns Some((data, override)) where data is the raw serde_json::Value payload (drawn from tool_result.data when present, otherwise reconstructed from the plain-string content) and override is the optional &LlmPayload from tool_result.llm_override.
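A rough model of that accessor, using simplified stand-in types, may make the two data sources easier to see. Value, Role, ToolOutput, and Message here are local, reduced versions of the real types (the override is a plain String), so this is a sketch of the described behaviour rather than the actual method.

```rust
/// Stand-in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    String(String),
    Raw(String),
}

#[derive(PartialEq)]
pub enum Role {
    User,
    Tool,
}

pub struct ToolOutput {
    pub data: Value,
    pub llm_override: Option<String>,
}

pub struct Message {
    pub role: Role,
    pub content: String,
    pub tool_result: Option<ToolOutput>,
}

impl Message {
    /// Returns (data, override) for tool-role messages, None otherwise.
    pub fn tool_result_view(&self) -> Option<(Value, Option<&String>)> {
        if self.role != Role::Tool {
            return None;
        }
        Some(match &self.tool_result {
            // Structured result: take the stored data and any override.
            Some(out) => (out.data.clone(), out.llm_override.as_ref()),
            // Plain-string result: rebuild the value from `content`.
            None => (Value::String(self.content.clone()), None),
        })
    }
}
```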
Role
pub enum Role {
System,
User,
Assistant,
Tool,
}
MessageContent
pub enum MessageContent {
Text(String),
Image(ImageContent),
Parts(Vec<ContentPart>),
}
| Method | Signature | Description |
|---|---|---|
as_text() | &self -> Option<&str> | Return the text if this is a Text variant |
as_parts() | &self -> Vec<ContentPart> | Convert any variant into a Vec<ContentPart> |
text_content() | &self -> Option<String> | Extract and concatenate all text content |
MessageContent implements From<&str> and From<String>.
ContentPart
pub enum ContentPart {
Text { text: String },
Image(ImageContent),
File(FileContent),
}
ImageContent
| Field | Type | Description |
|---|---|---|
source | ImageSource | Where the image comes from (any ImageSource variant: URL, base64 data, file path, provider file, or handle) |
media_type | Option<String> | MIME type (e.g. "image/png") |
ImageSource
The source of an image, file, or any other media payload. Marked #[non_exhaustive] so new variants can be added without breaking callers — always pattern-match with a wildcard arm. MediaSource is a type alias for ImageSource and is the preferred name when the value is not specifically an image.
#[non_exhaustive]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ImageSource {
Url { url: String },
Base64 { data: String },
File { path: PathBuf },
ProviderFile { provider: ProviderId, id: String },
Handle { handle: ContentHandle },
}
pub type MediaSource = ImageSource;
| Variant | Description |
|---|---|
Url { url } | Public URL the provider can fetch directly. |
Base64 { data } | Inline base64-encoded bytes. Pair with media_type on the enclosing ImageContent / FileContent. |
File { path } | Local filesystem path. Use ImageSource::file(path) to construct. Resolved at request time by a ContentStore or by the provider adapter. |
ProviderFile { provider, id } | Reference to a file already uploaded to a provider’s Files API (e.g. an OpenAI file-xxx id). Forwarded verbatim only when the active provider matches provider. |
Handle { handle } | Reference to a ContentHandle registered with a ContentStore. Resolved into a concrete Url / Base64 / ProviderFile by CompletionRequest::resolve_handles_with before the request hits a provider. |
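Because the enum is #[non_exhaustive], downstream matches need a wildcard arm. The sketch below shows the pattern on a local stand-in enum (Source, with String in place of ProviderId and ContentHandle); the delivery function and its return strings are purely illustrative.

```rust
use std::path::PathBuf;

/// Local stand-in for ImageSource with simplified field types.
#[non_exhaustive]
pub enum Source {
    Url { url: String },
    Base64 { data: String },
    File { path: PathBuf },
    ProviderFile { provider: String, id: String },
    Handle { id: String },
}

/// Describe how a source would reach the provider.
pub fn delivery(source: &Source) -> &'static str {
    match source {
        Source::Url { .. } => "fetched by the provider",
        Source::Base64 { .. } => "sent inline",
        Source::ProviderFile { .. } => "referenced by file id",
        // Covers File, Handle, and any variant added in a later release.
        _ => "needs resolution first",
    }
}
```

Handling File and Handle through the wildcard arm is a choice made for this sketch; real code may prefer explicit arms for known variants and reserve the wildcard for future ones.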
FileContent
| Field | Type | Description |
|---|---|---|
source | ImageSource | Where the file comes from (any ImageSource variant: URL, base64 data, file path, provider file, or handle) |
media_type | String | MIME type (e.g. "application/pdf") |
filename | Option<String> | Optional filename for display |
CompletionRequest
A provider-agnostic request for a chat completion.
| Field | Type | Description |
|---|---|---|
messages | Vec<ChatMessage> | The conversation history |
tools | Vec<ToolDefinition> | Tools available for the model to invoke |
temperature | Option<f32> | Sampling temperature (0.0 = deterministic, 2.0 = very random) |
max_tokens | Option<u32> | Maximum number of tokens to generate |
top_p | Option<f32> | Nucleus sampling parameter |
response_format | Option<serde_json::Value> | JSON Schema for structured output |
model | Option<String> | Override the provider’s default model |
modalities | Option<Vec<String>> | Output modalities (e.g. ["text"], ["image", "text"]) |
image_config | Option<serde_json::Value> | Image generation configuration (model-specific) |
audio_config | Option<serde_json::Value> | Audio output configuration (voice, format, etc.) |
Builder pattern:
let request = CompletionRequest::new(vec![ChatMessage::user("Hello")])
.with_tools(tool_defs)
.with_temperature(0.7)
.with_max_tokens(1024)
.with_top_p(0.9)
.with_response_format(schema_json)
.with_model("gpt-4o")
.with_modalities(vec!["text".into(), "image".into()])
.with_image_config(serde_json::json!({ "size": "1024x1024" }))
.with_audio_config(serde_json::json!({ "voice": "alloy" }));
CompletionResponse
The result of a non-streaming chat completion.
| Field | Type | Description |
|---|---|---|
content | Option<String> | Text content of the assistant’s reply |
tool_calls | Vec<ToolCall> | Tool invocations requested by the model |
usage | Option<TokenUsage> | Token usage statistics |
model | String | The model that produced this response |
finish_reason | Option<String> | Why the model stopped (e.g. "stop", "tool_use") |
cost | Option<f64> | Estimated cost in USD |
timing | Option<RequestTiming> | Request timing breakdown |
images | Vec<GeneratedImage> | Generated images (multimodal models) |
audio | Vec<GeneratedAudio> | Generated audio (TTS / multimodal) |
videos | Vec<GeneratedVideo> | Generated videos |
metadata | serde_json::Value | Provider-specific metadata |
StructuredResponse<T>
Response from structured output extraction, preserving metadata.
| Field | Type | Description |
|---|---|---|
data | T | The extracted structured data |
usage | Option<TokenUsage> | Token usage statistics |
model | String | The model that produced this response |
cost | Option<f64> | Estimated cost in USD |
timing | Option<RequestTiming> | Request timing |
metadata | serde_json::Value | Provider-specific metadata |
EmbeddingResponse
Response from an embedding operation.
| Field | Type | Description |
|---|---|---|
embeddings | Vec<Vec<f32>> | The embedding vectors (one per input text) |
model | String | The model used |
usage | Option<TokenUsage> | Token usage statistics |
cost | Option<f64> | Estimated cost in USD |
timing | Option<RequestTiming> | Request timing |
metadata | serde_json::Value | Provider-specific metadata |
RequestTiming
Timing metadata for a request.
| Field | Type | Description |
|---|---|---|
queue_ms | Option<u64> | Time spent waiting in queue (ms) |
execution_ms | Option<u64> | Time spent executing the request (ms) |
total_ms | Option<u64> | Total wall-clock time from submit to response (ms) |
TokenUsage
Token usage statistics for a completion request.
| Field | Type | Description |
|---|---|---|
prompt_tokens | u32 | Tokens in the prompt / input |
completion_tokens | u32 | Tokens in the completion / output |
total_tokens | u32 | Total tokens consumed (prompt + completion) |
ToolDefinition
Describes a tool that the model may invoke.
| Field | Type | Description |
|---|---|---|
name | String | Unique name of the tool |
description | String | Human-readable description |
parameters | serde_json::Value | JSON Schema describing the tool’s input parameters |
ToolCall
A tool invocation requested by the model.
| Field | Type | Description |
|---|---|---|
id | String | Provider-assigned identifier for this invocation |
name | String | Name of the tool to invoke |
arguments | serde_json::Value | Arguments to pass, as JSON |
StreamChunk
A single chunk from a streaming completion response.
| Field | Type | Description |
|---|---|---|
delta | Option<String> | Incremental text content |
tool_calls | Vec<ToolCall> | Tool invocations completed in this chunk |
finish_reason | Option<String> | Present in the final chunk to indicate why generation stopped |
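Consumers usually fold the chunks into one string and remember the final finish_reason. The sketch below does this over a synchronous iterator so it needs no async runtime; with the real API the same loop would run over the Stream returned by CompletionModel::stream. Chunk and collect_text are hypothetical names, and String stands in for BlazenError.

```rust
/// Stand-in for the text-related fields of StreamChunk.
#[derive(Debug, Default, Clone)]
pub struct Chunk {
    pub delta: Option<String>,
    pub finish_reason: Option<String>,
}

/// Concatenate all deltas and keep the last finish_reason seen.
/// Stops at the first error, mirroring `let chunk = chunk?;` in a stream loop.
pub fn collect_text<I>(chunks: I) -> Result<(String, Option<String>), String>
where
    I: IntoIterator<Item = Result<Chunk, String>>,
{
    let mut text = String::new();
    let mut finish = None;
    for chunk in chunks {
        let chunk = chunk?;
        if let Some(delta) = chunk.delta {
            text.push_str(&delta);
        }
        if chunk.finish_reason.is_some() {
            finish = chunk.finish_reason;
        }
    }
    Ok((text, finish))
}
```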
Content Subsystem
A provider-agnostic layer for handing media (images, audio, video, documents, 3D, CAD, archives, fonts, code, generic data) to and from models. The core idea: instead of inlining bytes or URLs in every message, callers register payloads with a ContentStore, receive a small ContentHandle, and reference the handle from messages, tool inputs, and tool outputs. Just before a request is sent, the runtime resolves every handle into the concrete representation the active provider expects — a URL, a base64 blob, or a ProviderFile reference for providers with native Files APIs.
Everything in this section is re-exported from blazen_llm::content.
ContentKind
Coarse classification for a piece of content. Marked #[non_exhaustive]. Serde uses rename_all = "snake_case", so ThreeDModel round-trips as "three_d_model". Implements Display.
#[non_exhaustive]
pub enum ContentKind {
Image,
Audio,
Video,
Document,
ThreeDModel,
Cad,
Archive,
Font,
Code,
Data,
Other,
}
| Method | Signature | Description |
|---|---|---|
from_mime | fn(&str) -> Self | Best-effort classification from a MIME string. |
from_extension | fn(&str) -> Self | Best-effort classification from a filename extension (no leading dot required). |
as_str | fn(self) -> &'static str | Snake-case string form (matches the serde representation). |
use blazen_llm::content::ContentKind;
assert_eq!(ContentKind::from_mime("image/png"), ContentKind::Image);
assert_eq!(ContentKind::from_extension("glb"), ContentKind::ThreeDModel);
assert_eq!(ContentKind::ThreeDModel.as_str(), "three_d_model");
ContentHandle
An opaque pointer to a payload owned by a ContentStore. Cheap to clone, safe to embed in messages and tool arguments, and resolvable on demand.
pub struct ContentHandle {
pub id: String,
pub kind: ContentKind,
pub mime_type: Option<String>,
pub byte_size: Option<u64>,
pub display_name: Option<String>,
}
| Method | Signature | Description |
|---|---|---|
new | fn(id: impl Into<String>, kind: ContentKind) -> Self | Construct a handle with no metadata. |
with_mime_type | fn(self, mime: impl Into<String>) -> Self | Builder: attach a MIME type. |
with_byte_size | fn(self, bytes: u64) -> Self | Builder: attach a byte size. |
with_display_name | fn(self, name: impl Into<String>) -> Self | Builder: attach a human-readable name. |
use blazen_llm::content::{ContentHandle, ContentKind};
let handle = ContentHandle::new("blob_abc123", ContentKind::Image)
.with_mime_type("image/png")
.with_byte_size(48_213)
.with_display_name("chart.png");
ContentStore
The async trait every store implements. Stores own the bytes (or the URL, or the provider-side file id) and translate handles into something the provider can consume on demand.
#[async_trait]
pub trait ContentStore: Send + Sync + std::fmt::Debug {
async fn put(&self, body: ContentBody, hint: ContentHint)
-> Result<ContentHandle, BlazenError>;
async fn resolve(&self, handle: &ContentHandle)
-> Result<MediaSource, BlazenError>;
async fn fetch_bytes(&self, handle: &ContentHandle)
-> Result<Vec<u8>, BlazenError>;
async fn metadata(&self, handle: &ContentHandle)
-> Result<ContentMetadata, BlazenError> { /* default */ }
async fn fetch_stream(&self, handle: &ContentHandle)
-> Result<ByteStream, BlazenError> { /* default */ }
async fn delete(&self, _handle: &ContentHandle)
-> Result<(), BlazenError> { Ok(()) }
}
| Method | Description |
|---|---|
put | Ingest a ContentBody under a ContentHint and return a fresh handle. |
resolve | Produce the concrete MediaSource the provider will see (typically Url, Base64, or ProviderFile). Called by CompletionRequest::resolve_handles_with. |
fetch_bytes | Materialise the underlying bytes. Used when a tool needs to read the payload directly. |
metadata | Return ContentMetadata. The default impl synthesises this from the handle’s own fields; stores with richer indices should override. |
fetch_stream | Stream raw bytes back as a ByteStream. The default impl buffers fetch_bytes into a single chunk via futures::stream::once, so existing impls keep working unchanged. Stores backed by HTTP / disk / object storage should override for true incremental streaming. Built-in overrides today: LocalFileContentStore (uses tokio_util::io::ReaderStream), OpenAiFilesStore, AnthropicFilesStore, FalStorageStore (all via the HttpClient trait’s send_streaming method). InMemoryContentStore and GeminiFilesStore use the buffered default impl. |
delete | Best-effort deletion. Default is a no-op so read-only stores can leave it unimplemented. |
use std::sync::Arc;
use blazen_llm::content::{ContentBody, ContentHint, ContentKind, InMemoryContentStore, ContentStore};
let store = Arc::new(InMemoryContentStore::new());
let handle = store.put(
ContentBody::Bytes { data: std::fs::read("chart.png")? },
ContentHint::default()
.with_mime_type("image/png")
.with_kind(ContentKind::Image)
.with_display_name("chart.png"),
).await?;
let source = store.resolve(&handle).await?;
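The relationship between fetch_bytes and the buffered default of fetch_stream follows a common default-method pattern. The sketch below is a synchronous analogue using iterators in place of ByteStream, so it is not the real trait: Store, ChunkIter, MemStore, and ChunkedStore are hypothetical, and errors are plain strings.

```rust
use std::collections::HashMap;

/// Synchronous stand-in for ByteStream.
pub type ChunkIter = Box<dyn Iterator<Item = Result<Vec<u8>, String>>>;

pub trait Store {
    fn fetch_bytes(&self, id: &str) -> Result<Vec<u8>, String>;

    /// Default: buffer everything and yield it as a single chunk.
    fn fetch_chunks(&self, id: &str) -> ChunkIter {
        Box::new(std::iter::once(self.fetch_bytes(id)))
    }
}

/// In-memory store that relies on the buffered default.
pub struct MemStore(pub HashMap<String, Vec<u8>>);

impl Store for MemStore {
    fn fetch_bytes(&self, id: &str) -> Result<Vec<u8>, String> {
        self.0
            .get(id)
            .cloned()
            .ok_or_else(|| format!("unknown handle: {id}"))
    }
}

/// Store that overrides the default to yield fixed-size chunks.
/// Assumes `chunk` is greater than zero.
pub struct ChunkedStore {
    pub data: Vec<u8>,
    pub chunk: usize,
}

impl Store for ChunkedStore {
    fn fetch_bytes(&self, _id: &str) -> Result<Vec<u8>, String> {
        Ok(self.data.clone())
    }

    fn fetch_chunks(&self, _id: &str) -> ChunkIter {
        let chunks: Vec<Result<Vec<u8>, String>> = self
            .data
            .chunks(self.chunk)
            .map(|c| Ok(c.to_vec()))
            .collect();
        Box::new(chunks.into_iter())
    }
}
```

The point of the design is that implementers only have to write fetch_bytes; overriding the streaming method is an optimisation for stores that can deliver data incrementally.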
ContentBody
The five ways a caller can hand bytes (or a pointer to bytes) to a ContentStore via put. Variants are struct-form (named fields), and serde uses an internally-tagged representation (tag = "type", rename_all = "snake_case").
#[derive(Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ContentBody {
Bytes { data: Vec<u8> },
Url { url: String },
LocalPath { path: PathBuf },
ProviderFile { provider: ProviderId, id: String },
/// Streaming byte source. The store consumes the stream during `put`
/// and is free to spool to disk, forward as a chunked upload, or
/// drain into bytes.
#[serde(skip)]
Stream {
stream: ByteStream,
size_hint: Option<u64>,
},
}
| Variant | Description |
|---|---|
Bytes { data } | In-memory payload. The store decides whether to keep it in RAM, spill to disk, or upload to a provider. |
Url { url } | Remote URL the store may fetch lazily or pass through verbatim on resolve. |
LocalPath { path } | Local filesystem path. Native-only stores (e.g. LocalFileContentStore) can index it without copying. |
ProviderFile { provider, id } | A pre-existing file id on a provider’s Files API. Lets you wrap an externally uploaded asset in a handle without re-uploading. |
Stream { stream, size_hint } | Streaming byte source built on ByteStream. Stores with a true streaming upload path (filesystem, S3, HTTP multipart) consume the stream incrementally; memory-bound stores buffer. size_hint carries the total length when known up front (e.g. from a Content-Length header) so stores can pre-allocate or pick between simple and resumable upload paths. |
ContentBody::Stream cannot be cloned (a ByteStream is single-use) and cannot be serialized: the variant is annotated #[serde(skip)] because ByteStream implements neither Serialize nor Deserialize. ContentBody’s manual Clone impl panics with unreachable! when called on a Stream variant, so consume streaming bodies by value. Bindings that route ContentBody through serde_json must check for Stream first and handle it on a separate path.
use blazen_llm::content::ContentBody;
let from_memory = ContentBody::Bytes { data: b"hello".to_vec() };
let from_disk = ContentBody::LocalPath { path: "./report.pdf".into() };
let from_url = ContentBody::Url { url: "https://example.com/a.png".into() };
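The "check for Stream first" rule can be sketched with a std-only model. Everything here is an assumption made for illustration: `Body`, `Routed`, and `route` are hypothetical, an iterator stands in for ByteStream, and a `format!` call stands in for the serde_json path.

```rust
/// Hypothetical, simplified mirror of ContentBody.
enum Body {
    Bytes { data: Vec<u8> },
    Url { url: String },
    Stream {
        stream: Box<dyn Iterator<Item = Vec<u8>>>,
        size_hint: Option<u64>,
    },
}

#[derive(Debug, PartialEq)]
enum Routed {
    /// Went through the (stand-in) serialization path.
    Serialized(String),
    /// The stream was consumed by value on a separate path.
    Drained(Vec<u8>),
}

fn route(body: Body) -> Routed {
    match body {
        // Handle the stream variant first: it is single-use, so it is taken
        // by value and never cloned or serialized.
        Body::Stream { stream, size_hint } => {
            // Treat the hint as advisory and cap the pre-allocation.
            let cap = size_hint.unwrap_or(0).min(1 << 20) as usize;
            let mut buf = Vec::with_capacity(cap);
            for chunk in stream {
                buf.extend_from_slice(&chunk);
            }
            Routed::Drained(buf)
        }
        Body::Bytes { data } => Routed::Serialized(format!("bytes:{}", data.len())),
        Body::Url { url } => Routed::Serialized(format!("url:{url}")),
    }
}
```

Taking `Body` by value in `route` makes the single-use constraint visible in the signature: once a stream has been routed, the caller no longer holds it.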
ByteStream
A pinned, boxed, fallible stream of byte chunks. Used by ContentBody::Stream for streaming uploads and by ContentStore::fetch_stream for streaming downloads.
pub type ByteStream = std::pin::Pin<
Box<dyn futures_core::Stream<Item = Result<bytes::Bytes, BlazenError>> + Send>
>;
Stores backed by HTTP, S3, or the filesystem should produce / consume these incrementally; memory-bound stores may buffer.
ContentHint
Optional metadata callers pass alongside a ContentBody so the store can pick a sensible MIME type, classification, and display name without re-sniffing. Implements Default.
pub struct ContentHint {
pub mime_type: Option<String>,
pub kind_hint: Option<ContentKind>,
pub display_name: Option<String>,
pub byte_size: Option<u64>,
}
| Method | Signature | Description |
|---|---|---|
with_mime_type | fn(self, mime: impl Into<String>) -> Self | Pin the MIME type. |
with_kind | fn(self, kind: ContentKind) -> Self | Pin the ContentKind. |
with_display_name | fn(self, name: impl Into<String>) -> Self | Pin a human-readable name. |
with_byte_size | fn(self, bytes: u64) -> Self | Pin the byte size when known up front (e.g. from a Content-Length header). |
use blazen_llm::content::{ContentHint, ContentKind};
let hint = ContentHint::default()
.with_mime_type("audio/wav")
.with_kind(ContentKind::Audio)
.with_display_name("note.wav");
ContentMetadata
The non-id fields of a ContentHandle, returned by ContentStore::metadata. Useful for cheap introspection without materialising bytes.
pub struct ContentMetadata {
pub kind: ContentKind,
pub mime_type: Option<String>,
pub byte_size: Option<u64>,
pub display_name: Option<String>,
}
use blazen_llm::content::ContentStore;
let meta = store.metadata(&handle).await?;
println!("{} ({} bytes)", meta.kind, meta.byte_size.unwrap_or(0));
DynContentStore
Convenience alias for the shared, thread-safe form most code passes around.
pub type DynContentStore = std::sync::Arc<dyn ContentStore>;
use std::sync::Arc;
use blazen_llm::content::{DynContentStore, InMemoryContentStore};
let store: DynContentStore = Arc::new(InMemoryContentStore::new());
Built-in stores
| Store | Constructor | Notes |
|---|---|---|
InMemoryContentStore | InMemoryContentStore::new() (also Default) | Bytes / URL / provider-file refs held in a RwLock-guarded map. Great for tests and short-lived sessions. |
LocalFileContentStore | LocalFileContentStore::new(root: impl Into<PathBuf>) -> Result<Self, BlazenError> | Native-only (not(target_arch = "wasm32")). Persists payloads under root; assigns each entry a UUID-derived filename and tracks the index in memory. |
OpenAiFilesStore | OpenAiFilesStore::new(api_key), .with_base_url(url), .with_purpose(p) | Uploads via OpenAI’s /v1/files API. purpose defaults to "user_data" (override for assistants / fine-tuning / batch). |
AnthropicFilesStore | AnthropicFilesStore::new(api_key), .with_base_url(url), .with_beta_header(h) | Uploads via Anthropic’s Files API. beta_header is forwarded as anthropic-beta. |
GeminiFilesStore | GeminiFilesStore::new(api_key), .with_base_url(url) | Uploads via Google’s Files API and resolves handles to gs:///file-uri refs. |
FalStorageStore | FalStorageStore::new(api_key), .with_base_url(url) | Uploads to Fal’s storage endpoint. |
CustomContentStore | CustomContentStore::builder(name) -> CustomContentStoreBuilder | Build a store from closures: .put(...), .resolve(...), .fetch_bytes(...), .fetch_stream(...) (optional), .delete(...) (optional), .build(). When .fetch_stream is omitted, the trait’s default impl buffers fetch_bytes into one chunk via stream::once. Lets callers integrate their own blob backend without writing a new trait impl. |
use std::sync::Arc;
use blazen_llm::content::{
AnthropicFilesStore, CustomContentStore, InMemoryContentStore,
LocalFileContentStore, OpenAiFilesStore,
};
let in_mem = Arc::new(InMemoryContentStore::new());
let on_disk = Arc::new(LocalFileContentStore::new("/var/cache/blazen")?);
let openai = Arc::new(OpenAiFilesStore::new(std::env::var("OPENAI_API_KEY")?)
.with_purpose("user_data"));
let anthropic = Arc::new(AnthropicFilesStore::new(std::env::var("ANTHROPIC_API_KEY")?)
.with_beta_header("files-api-2025-04-14"));
let custom = Arc::new(
CustomContentStore::builder("s3-store")
.put(|body, hint| async move { /* upload, return ContentHandle */ todo!() })
.resolve(|handle| async move { /* return MediaSource */ todo!() })
.fetch_bytes(|handle| async move { /* return Vec<u8> */ todo!() })
.fetch_stream(|handle| Box::pin(async move {
// OPTIONAL: stream bytes back chunk-by-chunk for large content.
// When omitted, the default impl buffers fetch_bytes into one chunk.
use bytes::Bytes;
use futures_util::stream;
let chunks: Vec<Result<Bytes, _>> = vec![Ok(Bytes::from_static(b"hello"))];
Ok(Box::pin(stream::iter(chunks)) as blazen_llm::content::ByteStream)
}))
.delete(|handle| async move { /* delete blob */ Ok(()) })
.build()?,
);
Tool-input helpers
Helpers that produce JSON Schema fragments for tool parameters that should accept a ContentHandle. Each fragment is tagged with x-blazen-content-ref so resolve_tool_arguments knows where to substitute resolved MediaSource values before the tool runs.
pub fn image_input(name: &str, description: &str) -> serde_json::Value;
pub fn audio_input(name: &str, description: &str) -> serde_json::Value;
pub fn video_input(name: &str, description: &str) -> serde_json::Value;
pub fn file_input(name: &str, description: &str) -> serde_json::Value;
pub fn three_d_input(name: &str, description: &str) -> serde_json::Value;
pub fn cad_input(name: &str, description: &str) -> serde_json::Value;
pub fn content_ref_property(
kind: ContentKind,
description: &str,
) -> serde_json::Value;
pub fn content_ref_required_object(
name: &str,
kind: ContentKind,
description: &str,
extra_properties: serde_json::Map<String, serde_json::Value>,
) -> serde_json::Value;
pub async fn resolve_tool_arguments(
arguments: &mut serde_json::Value,
schema: &serde_json::Value,
store: &dyn ContentStore,
) -> Result<usize, BlazenError>;
pub struct KindMismatch {
pub property: String,
pub expected: ContentKind,
pub actual: ContentKind,
}
| Helper | Description |
|---|---|
image_input / audio_input / video_input / file_input / three_d_input / cad_input | Top-level convenience: returns a single-property required object schema for a typed content reference. |
content_ref_property | Schema for one property accepting a ContentHandle of the given ContentKind. Use when assembling a custom multi-property schema. |
content_ref_required_object | Build a required object schema mixing one content ref with extra_properties (other primitives, enums, etc.). |
resolve_tool_arguments | Walk arguments against schema, replace every x-blazen-content-ref site with the MediaSource returned by store.resolve. Returns the number of substitutions made. |
KindMismatch | Error variant returned when a handle’s ContentKind does not match the schema’s declared expected kind. |
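use blazen_llm::content::ContentKind;
use blazen_llm::content::tool_input::{content_ref_property, resolve_tool_arguments};
use serde_json::json;
let schema = json!({
"type": "object",
"properties": {
// content_ref_property yields the schema for a single property; image_input
// returns a whole object schema, so it is not nested under "properties" here.
"image": content_ref_property(ContentKind::Image, "The image to caption"),
"max_words": { "type": "integer" },
},
"required": ["image", "max_words"],
});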
let mut args = json!({
"image": { "id": "blob_abc123", "kind": "image" },
"max_words": 20,
});
let n = resolve_tool_arguments(&mut args, &schema, store.as_ref()).await?;
// `args["image"]` is now a concrete MediaSource (Url / Base64 / ProviderFile).
println!("rewrote {n} content refs");
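The substitution pass performed by resolve_tool_arguments can be approximated with a std-only model. This is a sketch, not the library's algorithm: `Value`, `RefSchema`, `ResolveError`, and `resolve_refs` are hypothetical, the schema is reduced to a map from property name to expected kind, only top-level properties are walked, and a URL string stands in for MediaSource.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    Object(BTreeMap<String, Value>),
}

/// Simplified schema: property name -> expected kind, listing only the
/// properties that would carry the `x-blazen-content-ref` tag.
type RefSchema = BTreeMap<String, String>;

#[derive(Debug, PartialEq)]
enum ResolveError {
    KindMismatch { property: String, expected: String, actual: String },
    UnknownHandle { id: String },
}

/// Replace each tagged handle object with the resolved source and return
/// how many substitutions were made.
fn resolve_refs(
    args: &mut BTreeMap<String, Value>,
    schema: &RefSchema,
    lookup: &dyn Fn(&str) -> Option<String>,
) -> Result<usize, ResolveError> {
    let mut replaced = 0;
    for (property, expected) in schema {
        // Only object-valued arguments can be handle references.
        let Some(Value::Object(handle)) = args.get(property) else {
            continue;
        };
        let (Some(Value::Str(id)), Some(Value::Str(kind))) =
            (handle.get("id"), handle.get("kind"))
        else {
            continue;
        };
        if kind != expected {
            return Err(ResolveError::KindMismatch {
                property: property.clone(),
                expected: expected.clone(),
                actual: kind.clone(),
            });
        }
        let resolved = lookup(id).ok_or_else(|| ResolveError::UnknownHandle { id: id.clone() })?;
        args.insert(property.clone(), Value::Str(resolved));
        replaced += 1;
    }
    Ok(replaced)
}
```

Untagged arguments such as `max_words` are left untouched, and a handle whose kind disagrees with the schema aborts the pass before any further substitution.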
Visibility helpers
Helpers for the runtime’s “what handles is the model actually allowed to see right now?” pass.
pub fn collect_visible_handles(messages: &[ChatMessage]) -> Vec<ContentHandle>;
pub fn build_handle_directory_system_note(
handles: &[ContentHandle],
) -> Option<String>;
pub async fn prepare_request_with_store(
request: &mut CompletionRequest,
store: &dyn ContentStore,
) -> Result<usize, BlazenError>;
| Helper | Description |
|---|---|
collect_visible_handles | Walks messages and returns every distinct ContentHandle referenced from user/assistant/tool content, deduped first-seen. |
build_handle_directory_system_note | Returns a system-note string listing the visible handles (id, kind, MIME, name) so the model can name them when calling tools. Returns None when handles is empty — callers should not append an empty note. |
prepare_request_with_store | One-call convenience: runs CompletionRequest::resolve_handles_with, then builds and prepends the system note. Returns the number of resolved handles. This is what the agent loop calls before dispatching a request. |
use blazen_llm::content::visibility::{
collect_visible_handles, build_handle_directory_system_note, prepare_request_with_store,
};
let visible = collect_visible_handles(&request.messages);
if let Some(note) = build_handle_directory_system_note(&visible) {
println!("would inject system note:\n{note}");
}
let n = prepare_request_with_store(&mut request, store.as_ref()).await?;
println!("resolved {n} handles before dispatch");
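The two behaviours called out in the table (first-seen deduplication and returning None for an empty handle list) can be sketched with std only. `Handle`, `dedupe_first_seen`, and `directory_note` are hypothetical names, and the note wording is invented for illustration; the real note format is not specified here.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified handle with the fields the note lists.
#[derive(Debug, Clone, PartialEq)]
struct Handle {
    id: String,
    kind: String,
    mime_type: Option<String>,
    display_name: Option<String>,
}

/// Keep the first occurrence of each id, preserving encounter order.
fn dedupe_first_seen(handles: &[Handle]) -> Vec<Handle> {
    let mut seen = HashSet::new();
    handles
        .iter()
        // `insert` returns false for ids that were already recorded.
        .filter(|h| seen.insert(h.id.clone()))
        .cloned()
        .collect()
}

/// Build a directory note, or None when there is nothing to list.
fn directory_note(handles: &[Handle]) -> Option<String> {
    if handles.is_empty() {
        return None;
    }
    let mut note = String::from("Available content handles:\n");
    for h in handles {
        note.push_str(&format!(
            "- {} (kind: {}, mime: {}, name: {})\n",
            h.id,
            h.kind,
            h.mime_type.as_deref().unwrap_or("unknown"),
            h.display_name.as_deref().unwrap_or("unnamed"),
        ));
    }
    Some(note)
}
```

Returning `Option<String>` rather than an empty string lets the caller skip the system message entirely with `if let Some(note)`, as the usage above does.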
Magic-number detection
Lightweight content sniffing backed by infer. Gated by the default-on content-detect Cargo feature — disabling the feature drops the infer dependency entirely (the functions remain but return (ContentKind::Other, None)).
pub fn detect_from_bytes(bytes: &[u8]) -> (ContentKind, Option<String>);
#[cfg(not(target_arch = "wasm32"))]
pub fn detect_from_path(path: &std::path::Path) -> (ContentKind, Option<String>);
pub fn detect(
bytes: Option<&[u8]>,
mime_hint: Option<&str>,
filename: Option<&str>,
) -> (ContentKind, Option<String>);
| Function | Description |
|---|---|
detect_from_bytes | Sniff the leading bytes and return the inferred ContentKind plus the matching MIME string if any. |
detect_from_path | Native-only. Reads the head of the file, then falls back to the extension when bytes are inconclusive. |
detect | Combined entry point: prefers byte sniffing when bytes is Some, then mime_hint, then filename extension. Returns (ContentKind::Other, None) when nothing matches. |
use blazen_llm::content::{detect, detect_from_bytes};
let (kind, mime) = detect_from_bytes(&[0x89, b'P', b'N', b'G', 0x0d, 0x0a, 0x1a, 0x0a]);
assert_eq!(mime.as_deref(), Some("image/png"));
let (kind2, mime2) = detect(None, Some("application/pdf"), Some("report.pdf"));
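The precedence that detect applies (bytes, then MIME hint, then filename extension) can be sketched without the infer crate. The magic-number table below covers only three formats and is an assumption for illustration; `Kind`, `sniff`, `from_mime`, `from_filename`, and `detect_sketch` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Image,
    Audio,
    Document,
    Other,
}

/// Tiny magic-number table (PNG, PDF, WAV only).
fn sniff(bytes: &[u8]) -> Option<(Kind, String)> {
    let hit = if bytes.starts_with(&[0x89, b'P', b'N', b'G']) {
        (Kind::Image, "image/png")
    } else if bytes.starts_with(b"%PDF") {
        (Kind::Document, "application/pdf")
    } else if bytes.starts_with(b"RIFF") && bytes.get(8..12) == Some(&b"WAVE"[..]) {
        (Kind::Audio, "audio/wav")
    } else {
        return None;
    };
    Some((hit.0, hit.1.to_string()))
}

fn from_mime(mime: &str) -> Option<(Kind, String)> {
    let kind = if mime.starts_with("image/") {
        Kind::Image
    } else if mime.starts_with("audio/") {
        Kind::Audio
    } else if mime == "application/pdf" {
        Kind::Document
    } else {
        return None;
    };
    Some((kind, mime.to_string()))
}

fn from_filename(name: &str) -> Option<(Kind, String)> {
    let ext = name.rsplit_once('.')?.1.to_ascii_lowercase();
    let hit = match ext.as_str() {
        "png" => (Kind::Image, "image/png"),
        "wav" => (Kind::Audio, "audio/wav"),
        "pdf" => (Kind::Document, "application/pdf"),
        _ => return None,
    };
    Some((hit.0, hit.1.to_string()))
}

/// Try each source in priority order; fall through when one is inconclusive.
fn detect_sketch(
    bytes: Option<&[u8]>,
    mime_hint: Option<&str>,
    filename: Option<&str>,
) -> (Kind, Option<String>) {
    bytes
        .and_then(sniff)
        .or_else(|| mime_hint.and_then(from_mime))
        .or_else(|| filename.and_then(from_filename))
        .map_or((Kind::Other, None), |(kind, mime)| (kind, Some(mime)))
}
```

In this sketch a conclusive byte match wins even when the MIME hint disagrees, and inconclusive bytes fall through to the hints rather than short-circuiting to `Other`.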
CompletionRequest::resolve_handles_with
The lower-level half of prepare_request_with_store. Walks every message in the request, finds every ImageSource::Handle, and replaces it with the concrete MediaSource returned by store.resolve. Does not inject a system note — prefer prepare_request_with_store from the agent loop unless you specifically want to skip the note.
impl CompletionRequest {
pub async fn resolve_handles_with(
&mut self,
store: &dyn ContentStore,
) -> Result<usize, BlazenError>;
}
Returns the number of Handle variants that were rewritten. Errors propagate from store.resolve.
let resolved = request.resolve_handles_with(store.as_ref()).await?;
println!("rewrote {resolved} handles in place");
Agent System
The agent system implements the standard LLM + tool calling loop: send messages with tool definitions, execute any tool calls the model makes, feed results back, and repeat until the model stops or max_iterations is reached.
run_agent()
Run the agent loop without event callbacks.
pub async fn run_agent(
model: &dyn CompletionModel,
messages: Vec<ChatMessage>,
config: AgentConfig,
) -> Result<AgentResult, BlazenError>
run_agent_with_callback()
Run the agent loop, emitting AgentEvents to the supplied callback.
pub async fn run_agent_with_callback(
model: &dyn CompletionModel,
messages: Vec<ChatMessage>,
config: AgentConfig,
on_event: impl Fn(AgentEvent) + Send + Sync,
) -> Result<AgentResult, BlazenError>
The loop works as follows:
1. Build a CompletionRequest with the full message history and all tool definitions.
2. Call the model.
3. If the model responds with no tool calls, return immediately.
4. If the model invoked the built-in “finish” tool (when enabled), extract the answer and return.
5. Otherwise, execute each tool call, append results to messages, and go back to step 1.
6. If max_iterations is reached, make one final call without tools to force a text answer.
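The control flow of the loop can be sketched synchronously with a scripted mock model. This is an illustration under assumptions, not the run_agent source: `Model`, `Reply`, `ToolCall`, `Outcome`, `run_loop`, and `Scripted` are hypothetical, messages are plain strings, and the finish tool is assumed to carry its answer in a single `argument` field.

```rust
const FINISH_TOOL: &str = "finish";

#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
    argument: String,
}

#[derive(Debug, Clone, PartialEq)]
enum Reply {
    Text(String),
    ToolCalls(Vec<ToolCall>),
}

trait Model {
    fn complete(&mut self, messages: &[String], tools_enabled: bool) -> Reply;
}

#[derive(Debug, PartialEq)]
struct Outcome {
    answer: String,
    iterations: u32,
}

fn run_loop(
    model: &mut dyn Model,
    mut messages: Vec<String>,
    max_iterations: u32,
    run_tool: &dyn Fn(&ToolCall) -> String,
) -> Outcome {
    let mut iterations = 0;
    while iterations < max_iterations {
        // Call the model with tools enabled; plain text ends the run.
        let calls = match model.complete(&messages, true) {
            Reply::Text(answer) => return Outcome { answer, iterations },
            Reply::ToolCalls(calls) => calls,
        };
        // The finish tool ends the run before any other tool executes.
        if let Some(finish) = calls.iter().find(|c| c.name == FINISH_TOOL) {
            return Outcome { answer: finish.argument.clone(), iterations };
        }
        // Execute every tool call and feed the results back.
        for call in &calls {
            messages.push(format!("tool {} -> {}", call.name, run_tool(call)));
        }
        iterations += 1;
    }
    // Budget exhausted: one final call without tools to force text.
    match model.complete(&messages, false) {
        Reply::Text(answer) => Outcome { answer, iterations },
        Reply::ToolCalls(_) => Outcome { answer: String::new(), iterations },
    }
}

/// Mock model that replays a fixed script of replies.
struct Scripted {
    replies: Vec<Reply>,
}

impl Model for Scripted {
    fn complete(&mut self, _messages: &[String], tools_enabled: bool) -> Reply {
        if !tools_enabled {
            return Reply::Text("forced answer".to_string());
        }
        if self.replies.is_empty() {
            Reply::Text("done".to_string())
        } else {
            self.replies.remove(0)
        }
    }
}
```

`iterations` counts completed tool rounds only, matching the AgentResult field description, so a model that answers on its first call reports zero.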
Usage:
use std::sync::Arc;
use blazen_llm::{run_agent, AgentConfig, ChatMessage};
let config = AgentConfig::new(vec![Arc::new(WeatherTool)])
.with_system_prompt("You are a helpful assistant with weather tools.")
.with_max_iterations(5)
.with_finish_tool()
.with_temperature(0.7)
.with_max_tokens(2048);
let result = run_agent(
&model,
vec![ChatMessage::user("What's the weather in Paris?")],
config,
).await?;
println!("Answer: {}", result.response.content.unwrap_or_default());
println!("Iterations: {}", result.iterations);
println!("Total cost: ${:.4}", result.total_cost.unwrap_or(0.0));
With callback:
use blazen_llm::{run_agent_with_callback, AgentEvent};
let result = run_agent_with_callback(
&model,
vec![ChatMessage::user("What's the weather?")],
config,
|event| match &event {
AgentEvent::ToolCalled { iteration, tool_call } => {
println!("[iter {iteration}] calling tool: {}", tool_call.name);
}
AgentEvent::ToolResult { tool_name, result, .. } => {
println!(" {tool_name} -> {result}");
}
AgentEvent::IterationComplete { iteration, had_tool_calls } => {
println!("[iter {iteration}] done (tools: {had_tool_calls})");
}
},
).await?;
AgentConfig
Configuration for the agentic tool execution loop.
| Field | Type | Default | Description |
|---|---|---|---|
max_iterations | u32 | 10 | Maximum tool call rounds before forcing a stop |
tools | Vec<Arc<dyn Tool>> | required | Tools available to the agent |
add_finish_tool | bool | false | Add an implicit “finish” tool the model can call to exit early |
system_prompt | Option<String> | None | System prompt prepended to messages |
temperature | Option<f32> | None | Sampling temperature |
max_tokens | Option<u32> | None | Maximum tokens per completion call |
Builder pattern:
AgentConfig::new(tools)
.with_max_iterations(5)
.with_system_prompt("You are helpful.")
.with_finish_tool()
.with_temperature(0.7)
.with_max_tokens(2048)
AgentResult
Result of an agent run.
| Field | Type | Description |
|---|---|---|
response | CompletionResponse | The final completion response |
messages | Vec<ChatMessage> | Full message history including all tool calls and results |
iterations | u32 | Number of tool call rounds that occurred |
total_usage | Option<TokenUsage> | Aggregated token usage across all rounds |
total_cost | Option<f64> | Aggregated cost across all rounds |
timing | Option<RequestTiming> | Total wall-clock time for the entire agent run |
AgentEvent
Events emitted during agent execution (passed to the callback in run_agent_with_callback).
pub enum AgentEvent {
ToolCalled {
iteration: u32,
tool_call: ToolCall,
},
ToolResult {
iteration: u32,
tool_name: String,
result: serde_json::Value,
},
IterationComplete {
iteration: u32,
had_tool_calls: bool,
},
}
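Because the callback is bounded by `Fn` rather than `FnMut`, accumulating state inside it needs interior mutability. The sketch below shows one way to do that with a `Mutex`; `Event` is a simplified local mirror of AgentEvent (strings replace ToolCall and serde_json::Value), and `emit_round` is a hypothetical stand-in for the agent loop.

```rust
use std::sync::Mutex;

/// Simplified local mirror of the event enum.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    ToolCalled { iteration: u32, tool_name: String },
    ToolResult { iteration: u32, tool_name: String, result: String },
    IterationComplete { iteration: u32, had_tool_calls: bool },
}

/// Stand-in for the agent loop: emits one round of events.
fn emit_round(iteration: u32, on_event: impl Fn(Event) + Send + Sync) {
    on_event(Event::ToolCalled { iteration, tool_name: "weather".to_string() });
    on_event(Event::ToolResult {
        iteration,
        tool_name: "weather".to_string(),
        result: "sunny".to_string(),
    });
    on_event(Event::IterationComplete { iteration, had_tool_calls: true });
}

#[derive(Debug, Default, PartialEq)]
struct Summary {
    tool_calls: u32,
    iterations: u32,
}

/// Count events from inside an `Fn` callback by locking a Mutex.
fn summarize_round(iteration: u32) -> Summary {
    let summary = Mutex::new(Summary::default());
    emit_round(iteration, |event| {
        let mut s = summary.lock().unwrap();
        match event {
            Event::ToolCalled { .. } => s.tool_calls += 1,
            Event::ToolResult { .. } => {}
            Event::IterationComplete { .. } => s.iterations += 1,
        }
    });
    // The callback has been dropped, so the Mutex can be unwrapped.
    summary.into_inner().unwrap()
}
```

A shared reference to a `Mutex` is `Send + Sync`, so the closure satisfies the callback bound without needing `Arc` as long as it does not outlive the call.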
Context
The Context object is a shared key-value store available in every workflow step. It provides three storage tiers and methods for event routing, streaming, and state management.
State Storage
Typed JSON: set() / get()
Store and retrieve any Serialize / DeserializeOwned type. Values are held internally as StateValue::Json.
// Store a typed value (anything implementing Serialize)
ctx.set("user_id", serde_json::json!("user_123"));
ctx.set("doc_count", serde_json::json!(5));
// Retrieve the stored JSON value and deserialize it into a concrete type
let user_id: String = serde_json::from_value(ctx.get("user_id").unwrap()).unwrap();
let doc_count: i64 = serde_json::from_value(ctx.get("doc_count").unwrap()).unwrap();
Binary: set_bytes() / get_bytes()
Store raw Vec<u8> data. Values are held as StateValue::Bytes. No serialization requirement — useful for model weights, protobuf, bincode, or any binary format.
ctx.set_bytes("weights", vec![0x01, 0x02, 0x03]);
let bytes: Vec<u8> = ctx.get_bytes("weights").unwrap();
Raw StateValue: set_value() / get_value()
Work with the StateValue enum directly for full control over the storage variant, including the Native variant used by language bindings.
use blazen::context::StateValue;
ctx.set_value("config", StateValue::Json(serde_json::json!({"retries": 3})));
ctx.set_value("blob", StateValue::Bytes(vec![0xDE, 0xAD].into()));
// `pickle_bytes` is a Vec<u8> produced by a language binding (not shown here)
ctx.set_value("py_obj", StateValue::Native(pickle_bytes.into()));
match ctx.get_value("config") {
Some(StateValue::Json(v)) => { /* structured data */ }
Some(StateValue::Bytes(b)) => { /* raw bytes */ }
Some(StateValue::Native(b)) => { /* platform-serialized opaque bytes */ }
None => { /* key not found */ }
}
StateValue
pub enum StateValue {
Json(serde_json::Value),
Bytes(BytesWrapper),
Native(BytesWrapper),
}
| Variant | Description |
|---|---|
Json(serde_json::Value) | Structured, serializable data. Used by ctx.set() / ctx.get(). |
Bytes(BytesWrapper) | Raw binary data. Used by ctx.set_bytes() / ctx.get_bytes(). |
Native(BytesWrapper) | Platform-serialized opaque objects (e.g., Python pickle bytes). Preserved across language boundaries without deserialization. |
Run Identity
ctx.run_id() -> &str
Returns the unique identifier for the current workflow run.
Event Routing
ctx.send_event(event: impl Event)
Programmatically route an event into the workflow. Use this when a step needs to emit multiple events or decide at runtime which path to take. When using send_event, the step returns () instead of an event type.
ctx.write_event_to_stream(event: impl Event)
Publish an event to the workflow’s external event stream, observable by callers via stream_events(). Useful for progress reporting and live updates.
Session References
async fn session_refs_arc(&self) -> Arc<SessionRefRegistry>
async fn clear_session_refs(&self) -> usize
async fn session_pause_policy(&self) -> SessionPausePolicy
| Method | Description |
|---|---|
session_refs_arc() | Get a clone of the session-ref registry handle for use by language bindings. Bindings install it as a task-local for the duration of a step so platform-native objects (Py<PyAny>, napi::Ref<JsObject>, etc.) carried via event payloads can be resolved by UUID. |
clear_session_refs() | Drain the session-ref registry. Called on workflow termination by the language bindings to release platform-specific live refs back to their respective garbage collectors. Returns the number of entries removed. |
session_pause_policy() | Get the configured SessionPausePolicy. The policy is set by WorkflowBuilder::session_pause_policy; there is no public setter on Context. |
See the dedicated Session Reference Registry section for background, key types, and the pause-time policy matrix.
State Snapshot and Restore
ctx.collect_events() -> Vec<Box<dyn Event>>
ctx.snapshot_state() -> ContextSnapshot
ctx.restore_state(snapshot: ContextSnapshot)
| Method | Description |
|---|---|
collect_events() | Drain all pending events from the context. |
snapshot_state() | Capture the entire context state as a serializable snapshot (for checkpointing / pause-resume). |
restore_state(snapshot) | Restore context from a previously captured snapshot. |
:::caution[Session refs are not snapshotted]
Context::snapshot_state intentionally excludes both the opaque objects map and the session_refs registry. Live in-process references cannot survive a cross-process snapshot round-trip. If your workflow may pause() and your bindings use session refs, configure WorkflowBuilder::session_pause_policy to control what happens (default: PickleOrError).
:::
BlazenState
BlazenState is a binding-layer concept for Python, Node.js, and WASM. In those languages, extending a BlazenState base class gives you automatic per-field persistence in the workflow context. Rust has no equivalent base class.
In Rust, per-field storage is achieved manually by calling set_value() / get_value() with the StateValue enum (see the StateValue section above). Each field is stored under an explicit key, giving you full control over serialization format and storage variant.
The Native(BytesWrapper) variant exists specifically to support bindings: it lets platform-serialized objects (e.g., Python pickle bytes, Node.js v8.serialize output) round-trip through Rust steps without deserialization. Binding authors use StateValue::Native to store opaque platform objects, and Rust code can forward those values without interpreting their contents.
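Manual per-field persistence might look like the following std-only sketch. The names and shapes are assumptions: `Ctx` is a hypothetical stand-in for Context backed by a plain HashMap (the real type is shared and does not take `&mut self`), `Json` wraps a String instead of serde_json::Value, and `Progress` with its key names is invented for illustration.

```rust
use std::collections::HashMap;

/// Simplified mirror of StateValue (String replaces serde_json::Value).
#[derive(Debug, Clone, PartialEq)]
enum StateValue {
    Json(String),
    Bytes(Vec<u8>),
    #[allow(dead_code)]
    Native(Vec<u8>),
}

/// Hypothetical stand-in for the workflow context.
#[derive(Default)]
struct Ctx {
    state: HashMap<String, StateValue>,
}

impl Ctx {
    fn set_value(&mut self, key: &str, value: StateValue) {
        self.state.insert(key.to_string(), value);
    }

    fn get_value(&self, key: &str) -> Option<StateValue> {
        self.state.get(key).cloned()
    }
}

/// Example state struct persisted one field per key.
#[derive(Debug, PartialEq)]
struct Progress {
    processed: u64,
    checksum: Vec<u8>,
}

impl Progress {
    fn save(&self, ctx: &mut Ctx) {
        // Each field picks the storage variant that suits its format.
        ctx.set_value("progress.processed", StateValue::Json(self.processed.to_string()));
        ctx.set_value("progress.checksum", StateValue::Bytes(self.checksum.clone()));
    }

    fn load(ctx: &Ctx) -> Option<Self> {
        let processed = match ctx.get_value("progress.processed")? {
            StateValue::Json(s) => s.parse().ok()?,
            _ => return None,
        };
        let checksum = match ctx.get_value("progress.checksum")? {
            StateValue::Bytes(b) => b,
            _ => return None,
        };
        Some(Progress { processed, checksum })
    }
}
```

Prefixing keys with the struct name keeps fields from different state structs from colliding in the shared key space.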
Session Reference Registry
blazen_core::session_ref provides a per-Context registry of live in-process references — values that cannot or should not be JSON-serialized, such as DB connections, file handles, large in-memory tensors, lambdas, locks, or platform-native objects like Py<PyAny> and napi::Ref<JsObject>.
Each Context owns its own SessionRefRegistry with a lifetime tied to the workflow run. Event payloads carry only a JSON marker containing the key; the actual object lives in the registry until workflow completion. Bindings detect the marker and resolve it through the active registry to preserve object identity across step boundaries without serialisation.
The JSON marker format is:
{"__blazen_session_ref__": "<uuid>"}
The tag string is exposed as a constant:
pub const SESSION_REF_TAG: &str = "__blazen_session_ref__";
A defensive cap protects against runaway loops exhausting memory:
pub const MAX_SESSION_REFS_PER_RUN: usize = 10_000;
insert_arc / insert return SessionRefError::CapacityExceeded once the registry reaches this cap.
RegistryKey
Strongly-typed wrapper around Uuid used as the registry key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(transparent)]
pub struct RegistryKey(pub Uuid);
| Method | Signature | Description |
|---|---|---|
new() | fn() -> Self | Mint a fresh random key. |
parse(s) | fn(&str) -> Result<Self, uuid::Error> | Parse a key from a UUID string. |
Display | — | Formats as the underlying UUID. |
SessionRefRegistry
Per-context registry of live session references. Internally a map of Arc<dyn Any + Send + Sync> values keyed by RegistryKey and guarded by a tokio::sync::RwLock.
pub struct SessionRefRegistry { /* ... */ }
| Method | Signature | Description |
|---|---|---|
new() | fn() -> Self | Create an empty registry. |
insert_arc() | async fn(&self, Arc<dyn Any + Send + Sync>) -> Result<RegistryKey, SessionRefError> | Insert a type-erased Arc directly. Returns the freshly minted key or CapacityExceeded. |
insert::<T>() | async fn(&self, T) -> Result<RegistryKey, SessionRefError> | Insert any Any + Send + Sync + 'static value, wrapping it in an Arc for you. |
get_any() | async fn(&self, RegistryKey) -> Option<Arc<dyn Any + Send + Sync>> | Look up the type-erased entry. Bindings call this and downcast. |
get::<T>() | async fn(&self, RegistryKey) -> Option<Arc<T>> | Look up and downcast to a concrete Arc<T>. |
remove() | async fn(&self, RegistryKey) -> Option<Arc<dyn Any + Send + Sync>> | Remove a single entry, returning the removed value if present. |
drain() | async fn(&self) -> usize | Drain all entries, returning the number removed. |
len() | async fn(&self) -> usize | Number of currently live entries. |
is_empty() | async fn(&self) -> bool | Whether the registry has no live entries. |
keys() | async fn(&self) -> Vec<RegistryKey> | Return every key currently in the registry. Used by the snapshot walker to apply SessionPausePolicy uniformly. |
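The insert / downcast / capacity behaviour in the table can be modelled with std only. This is a sketch under assumptions: `Registry` is hypothetical, uses `std::sync::RwLock` instead of the tokio lock, is synchronous, and uses a `u64` counter as the key where the real registry uses a UUID-backed RegistryKey.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

const TAG: &str = "__blazen_session_ref__";

#[derive(Debug, PartialEq)]
enum RegistryError {
    CapacityExceeded { cap: usize },
}

struct Inner {
    next_key: u64,
    entries: HashMap<u64, Arc<dyn Any + Send + Sync>>,
}

/// Hypothetical, synchronous model of a session-ref registry.
struct Registry {
    cap: usize,
    inner: RwLock<Inner>,
}

impl Registry {
    fn new(cap: usize) -> Self {
        let inner = Inner { next_key: 0, entries: HashMap::new() };
        Registry { cap, inner: RwLock::new(inner) }
    }

    /// Wrap the value in an Arc, type-erase it, and mint a key.
    fn insert<T: Any + Send + Sync>(&self, value: T) -> Result<u64, RegistryError> {
        let mut inner = self.inner.write().unwrap();
        if inner.entries.len() >= self.cap {
            return Err(RegistryError::CapacityExceeded { cap: self.cap });
        }
        let key = inner.next_key;
        inner.next_key += 1;
        inner.entries.insert(key, Arc::new(value));
        Ok(key)
    }

    /// Look up and downcast; None for a missing key or a type mismatch.
    fn get<T: Any + Send + Sync>(&self, key: u64) -> Option<Arc<T>> {
        let entry = self.inner.read().unwrap().entries.get(&key).cloned()?;
        entry.downcast::<T>().ok()
    }

    /// Remove every entry and report how many were dropped.
    fn drain(&self) -> usize {
        let mut inner = self.inner.write().unwrap();
        let removed = inner.entries.len();
        inner.entries.clear();
        removed
    }
}

/// JSON marker carried in event payloads in place of the live object.
fn marker(key: u64) -> String {
    format!("{{\"{}\": \"{}\"}}", TAG, key)
}
```

Storing `Arc<dyn Any + Send + Sync>` is what lets one map hold unrelated types; the typed `get` recovers the concrete type through `Arc::downcast` and yields `None` on a mismatch instead of panicking.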
SessionPausePolicy
Controls what happens to live session references when a workflow is paused or snapshotted.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum SessionPausePolicy {
#[default]
PickleOrError,
WarnDrop,
HardError,
}
| Variant | Behaviour |
|---|---|
PickleOrError (default) | Attempt to pickle each live ref into the snapshot. On any failure, raise WorkflowError::SessionRefsNotSerializable and abort the snapshot. Recommended. |
WarnDrop | Drop live refs from the snapshot, emit tracing::warn! per drop, and store a diagnostic report in snapshot metadata. On resume, accessing a dropped field raises a clear runtime error from the binding. |
HardError | Refuse to pause if any live refs are in flight. Raises WorkflowError::SessionRefsNotSerializable immediately. |
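The variant matrix above can be restated as a small decision function. This is a sketch of the documented behaviour, not the snapshot walker itself: `Policy` is a local mirror of the enum, while `LiveRef`, `PauseOutcome`, and `apply_policy` are hypothetical, and a boolean `picklable` flag stands in for the binding's actual pickling attempt.

```rust
/// Local mirror of the pause policy enum.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
enum Policy {
    #[default]
    PickleOrError,
    WarnDrop,
    HardError,
}

/// Hypothetical live reference with a precomputed pickling result.
struct LiveRef {
    key: String,
    picklable: bool,
}

#[derive(Debug, PartialEq)]
enum PauseOutcome {
    /// Snapshot proceeds; lists which refs were pickled or dropped.
    Snapshot { pickled: Vec<String>, dropped: Vec<String> },
    /// Snapshot aborts; lists the offending keys.
    NotSerializable { keys: Vec<String> },
}

fn keys_where(refs: &[LiveRef], pred: impl Fn(&LiveRef) -> bool) -> Vec<String> {
    refs.iter().filter(|r| pred(*r)).map(|r| r.key.clone()).collect()
}

fn apply_policy(policy: Policy, refs: &[LiveRef]) -> PauseOutcome {
    let all = keys_where(refs, |_| true);
    match policy {
        // Pickle everything, or abort naming the refs that failed.
        Policy::PickleOrError => {
            let failed = keys_where(refs, |r| !r.picklable);
            if failed.is_empty() {
                PauseOutcome::Snapshot { pickled: all, dropped: Vec::new() }
            } else {
                PauseOutcome::NotSerializable { keys: failed }
            }
        }
        // Drop every live ref; a real implementation would also warn.
        Policy::WarnDrop => PauseOutcome::Snapshot { pickled: Vec::new(), dropped: all },
        // Any live ref at all blocks the pause.
        Policy::HardError if all.is_empty() => {
            PauseOutcome::Snapshot { pickled: Vec::new(), dropped: Vec::new() }
        }
        Policy::HardError => PauseOutcome::NotSerializable { keys: all },
    }
}
```

The `NotSerializable` outcome corresponds to WorkflowError::SessionRefsNotSerializable, whose `keys` field carries the same list of offending identifiers.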
SessionRefError
Error type for registry operations.
#[derive(Debug, thiserror::Error)]
pub enum SessionRefError {
#[error("session ref registry capacity exceeded ({cap} entries) — too many live references in this workflow run")]
CapacityExceeded { cap: usize },
}
| Variant | Description |
|---|---|
CapacityExceeded { cap: usize } | Returned when SessionRefRegistry::insert_arc or insert is called while the registry already holds MAX_SESSION_REFS_PER_RUN entries. |
Snapshot exclusion
Session-ref entries are deliberately excluded from Context::snapshot_state(), mirroring the existing objects exclusion. Live in-process references cannot meaningfully round-trip through a serialized snapshot, so the snapshot walker applies SessionPausePolicy at pause time instead. See State Snapshot and Restore for the callout.
Workflow
WorkflowBuilder
The builder exposes a fluent API for configuring a workflow before build(). The full set of builder methods is documented in the guides; the entry below covers the session-ref configuration knob added alongside the session reference registry.
WorkflowBuilder::session_pause_policy
pub fn session_pause_policy(mut self, policy: SessionPausePolicy) -> Self
Configures the policy applied to live session references when the workflow is paused or snapshotted. Defaults to PickleOrError. See SessionPausePolicy for the full variant matrix.
Usage:
use blazen_core::{WorkflowBuilder, session_ref::SessionPausePolicy};
let workflow = WorkflowBuilder::new("my-workflow")
.step(my_step)
.session_pause_policy(SessionPausePolicy::WarnDrop)
.build()?;
WorkflowHandler
The handle returned after starting a workflow. Provides await/stream/pause modes for consuming workflow results.
WorkflowHandler::session_refs
#[must_use]
pub fn session_refs(&self) -> Arc<SessionRefRegistry>
Returns a clone of the session-ref registry handle. Bindings call this after result() to resolve any __blazen_session_ref__ markers carried by the final event, ensuring identity-preserving access to live Python / JS objects passed via event payloads.
The returned Arc keeps the registry alive past the event loop’s exit so the final result event can still resolve live-ref markers even after the original Context has been dropped.
WorkflowHandler::result
pub async fn result(&self) -> Result<WorkflowResult>
Await the final workflow result. The returned WorkflowResult has an .event field containing the final event (typically a StopEvent). Use result.event.downcast_ref::<StopEvent>() to access typed data, or result.event.to_json() for a JSON representation.
WorkflowHandler::pause
pub fn pause(&self) -> Result<()>
Signal the workflow to pause. This is a synchronous, non-consuming call — it does not return a snapshot. After pausing, call snapshot() to obtain the serialized state.
WorkflowHandler::snapshot
pub async fn snapshot(&self) -> Result<String>
Capture the workflow’s current state as a JSON string. Typically called after pause(). The snapshot can be persisted and later passed to Workflow::resume().
WorkflowHandler::resume_in_place
pub fn resume_in_place(&self)
Resume a paused workflow in-place, continuing execution from where it left off.
WorkflowHandler::respond_to_input
pub fn respond_to_input(&self, request_id: String, response: serde_json::Value)
Supply a response to a pending InputRequestEvent. The request_id must match the ID from the original request. The workflow will route the response to the appropriate step and continue.
WorkflowHandler::abort
pub async fn abort(&self) -> Result<()>
Abort the running workflow. Any pending steps are cancelled and the workflow terminates with an error.
WorkflowError variants
blazen_core::WorkflowError is the unified workflow error type. The session-ref subsystem introduces one new variant; other variants (e.g. Paused, InputRequired, Other) are documented in the workflow guides.
SessionRefsNotSerializable
#[error("session refs cannot be serialized for snapshot: {keys:?}")]
SessionRefsNotSerializable {
/// String-formatted UUIDs of the live session refs that could not
/// be persisted.
keys: Vec<String>,
}
One or more live session references could not be serialized for a snapshot. The keys vector contains the string-formatted UUIDs of the offending entries. Produced by the default PickleOrError pause policy when a live ref is not picklable, and by HardError whenever any live refs are in flight at pause time.
Compute Platform
The compute module provides a unified trait system for async, job-based media generation providers (fal.ai, Replicate, RunPod, etc.) that model a submit-poll-retrieve workflow for GPU workloads.
ComputeProvider
The base trait for compute providers.
#[async_trait]
pub trait ComputeProvider: Send + Sync {
fn provider_id(&self) -> &str;
async fn submit(&self, request: ComputeRequest) -> Result<JobHandle, BlazenError>;
async fn status(&self, job: &JobHandle) -> Result<JobStatus, BlazenError>;
async fn result(&self, job: JobHandle) -> Result<ComputeResult, BlazenError>;
async fn cancel(&self, job: &JobHandle) -> Result<(), BlazenError>;
// Default: submit then wait for result
async fn run(&self, request: ComputeRequest) -> Result<ComputeResult, BlazenError> {
let job = self.submit(request).await?;
self.result(job).await
}
}
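The submit-poll-retrieve shape behind the default `run` can be sketched synchronously. Everything below is hypothetical and simplified: `Provider`, `Status`, `wait`, and `MockProvider` are local names, job handles are plain `u64` ids, errors are strings, and a poll budget replaces real sleeping or back-off.

```rust
use std::cell::Cell;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Queued,
    Running,
    Completed,
}

trait Provider {
    fn submit(&self, prompt: &str) -> u64;
    fn status(&self, job: u64) -> Status;
    fn fetch(&self, job: u64) -> String;

    /// Poll until the job completes or the budget runs out.
    fn wait(&self, job: u64, max_polls: u32) -> Result<String, String> {
        for _ in 0..max_polls {
            if self.status(job) == Status::Completed {
                return Ok(self.fetch(job));
            }
            // A real client would sleep or back off between polls.
        }
        Err(format!("job {job} not finished after {max_polls} polls"))
    }

    /// Default convenience: submit, then wait for the result.
    fn run(&self, prompt: &str, max_polls: u32) -> Result<String, String> {
        let job = self.submit(prompt);
        self.wait(job, max_polls)
    }
}

/// Mock that reports Queued, then Running, then Completed.
struct MockProvider {
    polls: Cell<u32>,
}

impl Provider for MockProvider {
    fn submit(&self, _prompt: &str) -> u64 {
        self.polls.set(0);
        1
    }

    fn status(&self, _job: u64) -> Status {
        let n = self.polls.get() + 1;
        self.polls.set(n);
        match n {
            1 => Status::Queued,
            2 => Status::Running,
            _ => Status::Completed,
        }
    }

    fn fetch(&self, job: u64) -> String {
        format!("result-for-job-{job}")
    }
}
```

Keeping `run` as a provided method means a provider only has to implement the three primitive operations, while callers that need cancellation or progress reporting can still drive `submit` and `status` themselves.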
ImageGeneration
Image generation and upscaling. Requires ComputeProvider as a supertrait.
#[async_trait]
pub trait ImageGeneration: ComputeProvider {
async fn generate_image(&self, request: ImageRequest) -> Result<ImageResult, BlazenError>;
async fn upscale_image(&self, request: UpscaleRequest) -> Result<ImageResult, BlazenError>;
}
Usage:
use blazen_llm::compute::{ImageGeneration, ImageRequest};
let result = provider.generate_image(
ImageRequest::new("a cat in space")
.with_size(1024, 1024)
.with_count(2)
.with_negative_prompt("blurry")
.with_model("flux-dev"),
).await?;
for image in &result.images {
println!("url: {:?}, {}x{}", image.media.url, image.width.unwrap_or(0), image.height.unwrap_or(0));
}
VideoGeneration
Video synthesis from text or images. Requires ComputeProvider as a supertrait.
#[async_trait]
pub trait VideoGeneration: ComputeProvider {
async fn text_to_video(&self, request: VideoRequest) -> Result<VideoResult, BlazenError>;
async fn image_to_video(&self, request: VideoRequest) -> Result<VideoResult, BlazenError>;
}
Usage:
use blazen_llm::compute::{VideoGeneration, VideoRequest};
// Text-to-video
let result = provider.text_to_video(
VideoRequest::new("a sunset timelapse")
.with_duration(5.0)
.with_size(1920, 1080)
.with_model("kling"),
).await?;
// Image-to-video
let result = provider.image_to_video(
VideoRequest::for_image("https://example.com/img.png", "animate this scene")
.with_duration(3.0),
).await?;
AudioGeneration
Audio synthesis including TTS, music, and sound effects. Requires ComputeProvider as a supertrait.
#[async_trait]
pub trait AudioGeneration: ComputeProvider {
async fn text_to_speech(&self, request: SpeechRequest) -> Result<AudioResult, BlazenError>;
// Default: returns BlazenError::Unsupported
async fn generate_music(&self, request: MusicRequest) -> Result<AudioResult, BlazenError>;
// Default: returns BlazenError::Unsupported
async fn generate_sfx(&self, request: MusicRequest) -> Result<AudioResult, BlazenError>;
}
generate_music() and generate_sfx() have default implementations that return BlazenError::Unsupported. Providers override only the methods they support.
Usage:
use blazen_llm::compute::{AudioGeneration, SpeechRequest, MusicRequest};
let speech = provider.text_to_speech(
SpeechRequest::new("Hello world")
.with_voice("alloy")
.with_language("en")
.with_speed(1.0)
.with_voice_url("https://example.com/voice.wav") // voice cloning
.with_model("tts-1"),
).await?;
let music = provider.generate_music(
MusicRequest::new("upbeat jazz")
.with_duration(30.0)
.with_model("musicgen"),
).await?;
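The default-method behaviour described above can be illustrated without the real trait. The following is a synchronous sketch with hypothetical names (AudioSketch, SketchError, TtsOnly); it only shows how a provider inherits an Unsupported result for a method it does not override, and it is not the blazen_llm definition.

```rust
// Hypothetical stand-in for BlazenError, reduced to the one variant used here.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Unsupported(String),
}

// Synchronous stand-in for AudioGeneration with one required method
// and one method that has a default body.
pub trait AudioSketch {
    fn text_to_speech(&self, text: &str) -> Result<String, SketchError>;

    // Providers that cannot generate music simply inherit this body.
    fn generate_music(&self, _prompt: &str) -> Result<String, SketchError> {
        Err(SketchError::Unsupported(
            "music generation not available".to_string(),
        ))
    }
}

// A provider that supports speech only and overrides nothing else.
pub struct TtsOnly;

impl AudioSketch for TtsOnly {
    fn text_to_speech(&self, text: &str) -> Result<String, SketchError> {
        Ok(format!("speech:{text}"))
    }
}
```

Callers can therefore treat Unsupported as a capability probe instead of checking provider types up front.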
Transcription
Audio transcription (speech-to-text). Requires ComputeProvider as a supertrait.
#[async_trait]
pub trait Transcription: ComputeProvider {
async fn transcribe(
&self,
request: TranscriptionRequest,
) -> Result<TranscriptionResult, BlazenError>;
}
Usage:
use blazen_llm::compute::{Transcription, TranscriptionRequest};
let result = provider.transcribe(
TranscriptionRequest::new("https://example.com/audio.mp3")
.with_language("en")
.with_diarize(true)
.with_model("whisper-v3"),
).await?;
println!("Full text: {}", result.text);
for segment in &result.segments {
println!("[{:.1}s - {:.1}s] {}: {}",
segment.start, segment.end,
segment.speaker.as_deref().unwrap_or("?"),
segment.text,
);
}
ThreeDGeneration
3D model generation from text or images. Requires ComputeProvider as a supertrait.
#[async_trait]
pub trait ThreeDGeneration: ComputeProvider {
async fn generate_3d(&self, request: ThreeDRequest) -> Result<ThreeDResult, BlazenError>;
}
Usage:
use blazen_llm::compute::{ThreeDGeneration, ThreeDRequest};
// Text-to-3D
let result = provider.generate_3d(
ThreeDRequest::new("a 3D cat")
.with_format("glb")
.with_model("triposr"),
).await?;
// Image-to-3D
let result = provider.generate_3d(
ThreeDRequest::from_image("https://example.com/cat.png")
.with_format("obj"),
).await?;
for model_3d in &result.models {
println!("vertices: {:?}, faces: {:?}, textures: {}, animations: {}",
model_3d.vertex_count, model_3d.face_count,
model_3d.has_textures, model_3d.has_animations,
);
}
Compute Request Types
ImageRequest
| Field | Type | Description |
|---|---|---|
prompt | String | Text prompt describing the desired image |
negative_prompt | Option<String> | Things to avoid in the image |
width | Option<u32> | Desired width in pixels |
height | Option<u32> | Desired height in pixels |
num_images | Option<u32> | Number of images to generate |
model | Option<String> | Model override |
parameters | serde_json::Value | Additional provider-specific parameters |
Builder: ImageRequest::new(prompt).with_size(w, h).with_count(n).with_negative_prompt(p).with_model(m)
UpscaleRequest
| Field | Type | Description |
|---|---|---|
image_url | String | URL of the image to upscale |
scale | f32 | Scale factor (e.g. 2.0, 4.0) |
model | Option<String> | Model override |
parameters | serde_json::Value | Additional provider-specific parameters |
Builder: UpscaleRequest::new(url, scale).with_model(m)
VideoRequest
| Field | Type | Description |
|---|---|---|
prompt | String | Text prompt |
image_url | Option<String> | Source image for image-to-video |
duration_seconds | Option<f32> | Desired duration in seconds |
negative_prompt | Option<String> | Things to avoid |
width | Option<u32> | Desired width in pixels |
height | Option<u32> | Desired height in pixels |
model | Option<String> | Model override |
parameters | serde_json::Value | Additional provider-specific parameters |
Builder: VideoRequest::new(prompt) or VideoRequest::for_image(url, prompt), then .with_duration(s).with_size(w, h).with_model(m)
SpeechRequest
| Field | Type | Description |
|---|---|---|
text | String | Text to synthesize |
voice | Option<String> | Voice identifier (provider-specific) |
voice_url | Option<String> | Reference voice URL for voice cloning |
language | Option<String> | Language code (e.g. "en", "fr") |
speed | Option<f32> | Speed multiplier (1.0 = normal) |
model | Option<String> | Model override |
parameters | serde_json::Value | Additional provider-specific parameters |
Builder: SpeechRequest::new(text).with_voice(v).with_voice_url(url).with_language(l).with_speed(s).with_model(m)
MusicRequest
| Field | Type | Description |
|---|---|---|
prompt | String | Text prompt |
duration_seconds | Option<f32> | Desired duration in seconds |
model | Option<String> | Model override |
parameters | serde_json::Value | Additional provider-specific parameters |
Builder: MusicRequest::new(prompt).with_duration(s).with_model(m)
TranscriptionRequest
| Field | Type | Description |
|---|---|---|
audio_url | String | URL of the audio file |
language | Option<String> | Language hint |
diarize | bool | Enable speaker diarization (default: false) |
model | Option<String> | Model override |
parameters | serde_json::Value | Additional provider-specific parameters |
Builder: TranscriptionRequest::new(url).with_language(l).with_diarize(true).with_model(m)
ThreeDRequest
| Field | Type | Description |
|---|---|---|
prompt | String | Text prompt |
image_url | Option<String> | Source image for image-to-3D |
format | Option<String> | Output format (e.g. "glb", "obj", "usdz") |
model | Option<String> | Model override |
parameters | serde_json::Value | Additional provider-specific parameters |
Builder: ThreeDRequest::new(prompt) or ThreeDRequest::from_image(url), then .with_format(f).with_model(m)
Compute Result Types
ImageResult
| Field | Type | Description |
|---|---|---|
images | Vec<GeneratedImage> | The generated/upscaled images |
timing | RequestTiming | Request timing breakdown |
cost | Option<f64> | Cost in USD |
metadata | serde_json::Value | Provider-specific metadata |
VideoResult
| Field | Type | Description |
|---|---|---|
videos | Vec<GeneratedVideo> | The generated videos |
timing | RequestTiming | Request timing breakdown |
cost | Option<f64> | Cost in USD |
metadata | serde_json::Value | Provider-specific metadata |
AudioResult
| Field | Type | Description |
|---|---|---|
audio | Vec<GeneratedAudio> | The generated audio clips |
timing | RequestTiming | Request timing breakdown |
cost | Option<f64> | Cost in USD |
metadata | serde_json::Value | Provider-specific metadata |
ThreeDResult
| Field | Type | Description |
|---|---|---|
models | Vec<Generated3DModel> | The generated 3D models |
timing | RequestTiming | Request timing breakdown |
cost | Option<f64> | Cost in USD |
metadata | serde_json::Value | Provider-specific metadata |
TranscriptionResult
| Field | Type | Description |
|---|---|---|
text | String | Full transcribed text |
segments | Vec<TranscriptionSegment> | Time-aligned segments |
language | Option<String> | Detected/specified language code |
timing | RequestTiming | Request timing breakdown |
cost | Option<f64> | Cost in USD |
metadata | serde_json::Value | Provider-specific metadata |
TranscriptionSegment
| Field | Type | Description |
|---|---|---|
text | String | Transcribed text for this segment |
start | f64 | Start time in seconds |
end | f64 | End time in seconds |
speaker | Option<String> | Speaker label (if diarization was enabled) |
Compute Job Types
ComputeRequest
| Field | Type | Description |
|---|---|---|
model | String | Model/endpoint to run (e.g. "fal-ai/flux/dev") |
input | serde_json::Value | Input parameters as JSON (model-specific) |
webhook | Option<String> | Webhook URL for async completion notification |
ComputeResult
| Field | Type | Description |
|---|---|---|
job | Option<JobHandle> | The job handle that produced this result |
output | serde_json::Value | Output data (model-specific JSON) |
timing | RequestTiming | Request timing breakdown |
cost | Option<f64> | Cost in USD |
metadata | serde_json::Value | Provider-specific metadata |
JobHandle
| Field | Type | Description |
|---|---|---|
id | String | Provider-assigned job identifier |
provider | String | Provider name (e.g. "fal") |
model | String | Model/endpoint that was invoked |
submitted_at | DateTime<Utc> | When the job was submitted |
JobStatus
pub enum JobStatus {
Queued,
Running,
Completed,
Failed { error: String },
Cancelled,
}
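The submit-poll-retrieve workflow hinges on telling terminal states apart from in-progress ones. Below is a minimal synchronous sketch that re-declares JobStatus locally; is_terminal and poll_until_done are hypothetical helpers, and the closure stands in for an async call to ComputeProvider::status.

```rust
// Local copy of the JobStatus enum documented above.
#[derive(Debug, Clone, PartialEq)]
pub enum JobStatus {
    Queued,
    Running,
    Completed,
    Failed { error: String },
    Cancelled,
}

impl JobStatus {
    /// Hypothetical helper: a terminal status never changes again.
    pub fn is_terminal(&self) -> bool {
        matches!(
            self,
            JobStatus::Completed | JobStatus::Failed { .. } | JobStatus::Cancelled
        )
    }
}

/// Calls `check` until it reports a terminal status or `max_polls` is
/// reached. Returns the last observed status and the number of polls made.
pub fn poll_until_done(
    mut check: impl FnMut() -> JobStatus,
    max_polls: usize,
) -> (JobStatus, usize) {
    let mut last = JobStatus::Queued;
    for attempt in 1..=max_polls {
        last = check();
        if last.is_terminal() {
            return (last, attempt);
        }
        // A real caller would sleep between polls, ideally with backoff.
    }
    (last, max_polls)
}
```

Bounding the number of polls keeps a stuck job from blocking the caller indefinitely; the caller can then decide whether to cancel.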
Media
MediaType
Exhaustive enumeration of media formats with detection support. Covers images, video, audio, 3D models, documents, and a catch-all Other variant.
Variants:
| Category | Variants |
|---|---|
| Image | Png, Jpeg, WebP, Gif, Svg, Bmp, Tiff, Avif, Ico |
| Video | Mp4, WebM, Mov, Avi, Mkv |
| Audio | Mp3, Wav, Ogg, Flac, Aac, M4a, WebmAudio |
| 3D | Glb, Gltf, Obj, Fbx, Usdz, Stl, Ply |
| Document | Pdf |
| Catch-all | Other { mime: String } |
Methods:
| Method | Signature | Description |
|---|---|---|
mime() | &self -> &str | Return the MIME type string |
extension() | &self -> &str | Return the canonical file extension (no dot) |
magic_bytes() | &self -> Option<&'static [u8]> | Return the magic byte signature, if any |
detect(bytes) | fn(&[u8]) -> Option<Self> | Detect media type from file header bytes |
from_mime(mime) | fn(&str) -> Self | Parse a MIME string (unknown = Other) |
from_extension(ext) | fn(&str) -> Self | Parse a file extension (unknown = Other) |
is_image() | &self -> bool | Is this an image format? |
is_video() | &self -> bool | Is this a video format? |
is_audio() | &self -> bool | Is this an audio format? |
is_3d() | &self -> bool | Is this a 3D model format? |
is_vector() | &self -> bool | Is this a text-based format (SVG, GLTF, OBJ)? |
MediaType implements Display (outputs the MIME string).
Example:
use blazen_llm::MediaType;
let mt = MediaType::from_extension("png");
assert_eq!(mt.mime(), "image/png");
assert!(mt.is_image());
// Detect from raw bytes
let bytes = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A];
assert_eq!(MediaType::detect(&bytes), Some(MediaType::Png));
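Header-based detection of this kind usually comes down to prefix comparisons against known signatures. The sketch below is not the MediaType::detect implementation; sniff_mime is a hypothetical function that checks four widely documented signatures (PNG, JPEG, GIF, PDF) and returns a MIME string.

```rust
/// Guesses a MIME type from the leading bytes of a file, or returns None
/// when no signature matches. Only four formats are covered in this sketch.
pub fn sniff_mime(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]) {
        Some("image/png")
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some("image/jpeg")
    } else if bytes.starts_with(b"GIF8") {
        // Covers both the GIF87a and GIF89a headers.
        Some("image/gif")
    } else if bytes.starts_with(b"%PDF-") {
        Some("application/pdf")
    } else {
        None
    }
}
```

A truncated header (for example only the first two PNG bytes) yields None, which is why detection returns an Option rather than a fallback variant.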
MediaOutput
A single piece of generated media content. At least one of url, base64, or raw_content will be populated.
| Field | Type | Description |
|---|---|---|
url | Option<String> | URL where the media can be downloaded |
base64 | Option<String> | Base64-encoded media data |
raw_content | Option<String> | Raw text content (SVG, OBJ, GLTF JSON) |
media_type | MediaType | Format of the media |
file_size | Option<u64> | File size in bytes |
metadata | serde_json::Value | Provider-specific metadata |
Constructors:
let output = MediaOutput::from_url("https://example.com/img.png", MediaType::Png);
let output = MediaOutput::from_base64("iVBORw0KGgo=", MediaType::Png);
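Because any of url, base64, or raw_content may be the populated field, consumers typically resolve one source before reading the media. A minimal sketch follows, assuming a reduced stand-in struct; MediaOutputSketch, MediaSource, and the preference order are illustrative choices, not part of the documented API.

```rust
// Reduced stand-in for MediaOutput with only the three content fields.
pub struct MediaOutputSketch {
    pub url: Option<String>,
    pub base64: Option<String>,
    pub raw_content: Option<String>,
}

// Hypothetical view over whichever field is populated.
#[derive(Debug, PartialEq)]
pub enum MediaSource<'a> {
    Url(&'a str),
    Base64(&'a str),
    Raw(&'a str),
}

impl MediaOutputSketch {
    /// Picks one populated field. The order (raw, then base64, then url) is
    /// an arbitrary choice here: inline data is preferred over a download.
    pub fn source(&self) -> Option<MediaSource<'_>> {
        if let Some(raw) = self.raw_content.as_deref() {
            Some(MediaSource::Raw(raw))
        } else if let Some(b64) = self.base64.as_deref() {
            Some(MediaSource::Base64(b64))
        } else {
            self.url.as_deref().map(MediaSource::Url)
        }
    }
}
```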
GeneratedImage
| Field | Type | Description |
|---|---|---|
media | MediaOutput | The image media output |
width | Option<u32> | Width in pixels |
height | Option<u32> | Height in pixels |
GeneratedVideo
| Field | Type | Description |
|---|---|---|
media | MediaOutput | The video media output |
width | Option<u32> | Width in pixels |
height | Option<u32> | Height in pixels |
duration_seconds | Option<f32> | Duration in seconds |
fps | Option<f32> | Frames per second |
GeneratedAudio
| Field | Type | Description |
|---|---|---|
media | MediaOutput | The audio media output |
duration_seconds | Option<f32> | Duration in seconds |
sample_rate | Option<u32> | Sample rate in Hz |
channels | Option<u8> | Number of audio channels |
Generated3DModel
| Field | Type | Description |
|---|---|---|
media | MediaOutput | The 3D model media output |
vertex_count | Option<u64> | Total vertex count |
face_count | Option<u64> | Total face/triangle count |
has_textures | bool | Whether the model includes textures |
has_animations | bool | Whether the model includes animations |
LocalModel Trait
The LocalModel trait provides explicit load/unload lifecycle management for models running in-process (llama.cpp, whisper.cpp, etc.). Remote API providers do not implement this trait.
#[async_trait]
pub trait LocalModel: Send + Sync {
async fn load(&self) -> Result<(), BlazenError>;
async fn unload(&self) -> Result<(), BlazenError>;
async fn is_loaded(&self) -> bool;
fn device(&self) -> blazen_llm::Device { Device::Cpu }
async fn memory_bytes(&self) -> Option<u64>;
}
| Method | Description |
|---|---|
load() | Load the model into memory/VRAM. Idempotent. |
unload() | Free the model’s memory/VRAM. Idempotent. |
is_loaded() | Whether the model is currently loaded. |
device() | Which device the model targets. Determines which pool the ModelManager charges this model against. Defaults to Device::Cpu. |
memory_bytes() | Approximate memory footprint in bytes (host RAM if device() is Device::Cpu, GPU VRAM otherwise), or None if unknown. |
A type can implement both CompletionModel and LocalModel:
struct MyLocalLLM { /* ... */ }
#[async_trait::async_trait]
impl CompletionModel for MyLocalLLM {
fn model_id(&self) -> &str { "my-local-llm" }
async fn complete(&self, request: CompletionRequest) -> Result<CompletionResponse, BlazenError> {
self.load().await?; // auto-load on first call
// inference logic
todo!()
}
async fn stream(&self, request: CompletionRequest) -> Result<Pin<Box<dyn Stream<Item = Result<StreamChunk, BlazenError>> + Send>>, BlazenError> {
todo!()
}
}
#[async_trait::async_trait]
impl LocalModel for MyLocalLLM {
async fn load(&self) -> Result<(), BlazenError> { /* load weights */ Ok(()) }
async fn unload(&self) -> Result<(), BlazenError> { /* free VRAM */ Ok(()) }
async fn is_loaded(&self) -> bool { true }
fn device(&self) -> Device { Device::Cuda(0) }
async fn memory_bytes(&self) -> Option<u64> { Some(4_000_000_000) }
}
ModelManager
:::caution[Capacity, not performance]
ModelManager is a memory budget bookkeeper, not a performance scheduler. It answers “will this fit?” — not “will this run fast?”. Whether a 70B model loaded on CPU is useful at 1–3 tok/s is a workload-choice question the manager intentionally does not answer.
:::
Per-pool memory budget-aware model manager with LRU eviction. Tracks registered LocalModel instances and their estimated memory footprint, indexed by the Pool each model targets (Pool::Cpu or Pool::Gpu(index)). When loading a model would exceed that pool’s budget, the least-recently-used loaded model in the same pool is unloaded first. Models in different pools never evict each other.
use std::collections::HashMap;
use std::sync::Arc;
use blazen_llm::Pool;
use blazen_manager::ModelManager;
// Common case: CPU RAM + GPU(0) VRAM budgets in GB.
let manager = ModelManager::with_budgets_gb(64.0, 24.0);
// Explicit per-pool budgets in bytes.
let manager = ModelManager::new(HashMap::from([
(Pool::Cpu, 64 * 1024 * 1024 * 1024),
(Pool::Gpu(0), 24 * 1024 * 1024 * 1024),
]));
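The per-pool eviction rule can be sketched as plain bookkeeping. The code below is a synchronous illustration under stated assumptions, not the blazen_manager implementation: BudgetSketch, Entry, the local Pool stand-in (including the usize GPU index), and the logical clock used for recency are all hypothetical.

```rust
use std::collections::HashMap;

// Local stand-in for blazen_llm::Pool.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Pool {
    Cpu,
    Gpu(usize),
}

struct Entry {
    pool: Pool,
    bytes: u64,
    loaded: bool,
    last_used: u64, // logical clock tick of the most recent load
}

pub struct BudgetSketch {
    budgets: HashMap<Pool, u64>,
    models: HashMap<String, Entry>,
    clock: u64,
}

impl BudgetSketch {
    pub fn new(budgets: HashMap<Pool, u64>) -> Self {
        Self { budgets, models: HashMap::new(), clock: 0 }
    }

    pub fn register(&mut self, id: &str, pool: Pool, bytes: u64) {
        let entry = Entry { pool, bytes, loaded: false, last_used: 0 };
        self.models.insert(id.to_string(), entry);
    }

    pub fn is_loaded(&self, id: &str) -> bool {
        self.models.get(id).map_or(false, |e| e.loaded)
    }

    pub fn used_bytes(&self, pool: Pool) -> u64 {
        self.models.values().filter(|e| e.loaded && e.pool == pool).map(|e| e.bytes).sum()
    }

    /// Marks `id` as loaded, evicting least-recently-used models in the same
    /// pool until it fits. Returns the evicted ids in eviction order.
    pub fn load(&mut self, id: &str) -> Result<Vec<String>, String> {
        let (pool, bytes) = match self.models.get(id) {
            Some(e) => (e.pool, e.bytes),
            None => return Err(format!("unknown model: {id}")),
        };
        let budget = self.budgets.get(&pool).copied().unwrap_or(0);
        if bytes > budget {
            return Err(format!("{id} does not fit in {pool:?} even when empty"));
        }
        let mut evicted = Vec::new();
        if !self.is_loaded(id) {
            while self.used_bytes(pool) + bytes > budget {
                // Oldest loaded model in this pool; other pools are never touched.
                let victim = self.models.iter()
                    .filter(|(_, e)| e.loaded && e.pool == pool)
                    .min_by_key(|(_, e)| e.last_used)
                    .map(|(k, _)| k.clone())
                    .expect("an over-budget pool holds at least one loaded model");
                self.models.get_mut(&victim).unwrap().loaded = false;
                evicted.push(victim);
            }
        }
        self.clock += 1;
        let entry = self.models.get_mut(id).unwrap();
        entry.loaded = true;
        entry.last_used = self.clock;
        Ok(evicted)
    }
}
```

In this sketch recency is refreshed only on load; a real manager would more likely refresh it on every use of the model.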
Methods
| Method | Signature | Description |
|---|---|---|
new | fn(pool_budgets: HashMap<Pool, u64>) -> Self | Create a manager with explicit per-pool byte budgets. |
with_budgets_gb | fn(cpu_ram_gb: f64, gpu_vram_gb: f64) -> Self | Convenience constructor for Pool::Cpu + Pool::Gpu(0) budgets in GB. |
register | async fn(&self, id: &str, model: Arc<dyn LocalModel>, memory_estimate_bytes: u64) | Register a model. Starts unloaded. The model’s device() selects the pool. |
load | async fn(&self, id: &str) -> Result<()> | Load a model, evicting LRU models within the same pool if needed. |
unload | async fn(&self, id: &str) -> Result<()> | Unload a model and free its memory. |
is_loaded | async fn(&self, id: &str) -> bool | Check if a model is currently loaded. |
ensure_loaded | async fn(&self, id: &str) -> Result<()> | Alias for load(). |
used_bytes | async fn(&self, pool: Pool) -> u64 | Bytes currently used by loaded models in the given pool. |
available_bytes | async fn(&self, pool: Pool) -> u64 | Bytes available within the given pool’s budget. |
pools | fn(&self) -> Vec<(Pool, u64)> | All configured pools and their budgets. |
status | async fn(&self) -> Vec<ModelStatus> | Status of all registered models. |
ModelStatus
pub struct ModelStatus {
pub id: String,
pub loaded: bool,
pub memory_estimate_bytes: u64,
pub pool: Pool,
}
| Field | Type | Description |
|---|---|---|
id | String | Model identifier. |
loaded | bool | Whether the model is currently loaded. |
memory_estimate_bytes | u64 | Estimated memory footprint in bytes (interpreted against pool). |
pool | Pool | Which pool this model is charged against (Pool::Cpu or Pool::Gpu(N)). |
Usage:
use std::sync::Arc;
use blazen_llm::Pool;
use blazen_manager::ModelManager;
let manager = ModelManager::with_budgets_gb(64.0, 24.0);
manager.register("llama", Arc::new(my_llama_model), 8 * 1_073_741_824).await;
manager.register("whisper", Arc::new(my_whisper_model), 2 * 1_073_741_824).await;
manager.load("llama").await?;
assert!(manager.is_loaded("llama").await);
println!("GPU used: {} bytes", manager.used_bytes(Pool::Gpu(0)).await);
println!("GPU available: {} bytes", manager.available_bytes(Pool::Gpu(0)).await);
// Loading whisper may evict llama if the GPU pool is tight (only if both target Gpu(0))
manager.load("whisper").await?;
manager.unload("llama").await?;
Pricing
The pricing module provides a global thread-safe registry of per-model pricing data, pre-seeded with defaults for well-known models.
PricingEntry
| Field | Type | Description |
|---|---|---|
input_per_million | f64 | Cost per million input tokens (USD). |
output_per_million | f64 | Cost per million output tokens (USD). |
register_pricing()
Register or override pricing for a model. Model IDs are normalized before storage.
use blazen_llm::{register_pricing, PricingEntry};
register_pricing("my-model", PricingEntry {
input_per_million: 1.0,
output_per_million: 2.0,
});
lookup_pricing()
Look up pricing for a model by ID. Returns None if the model is unknown.
use blazen_llm::lookup_pricing;
if let Some(entry) = lookup_pricing("gpt-4o") {
println!("Input: ${}/M tokens", entry.input_per_million);
}
compute_cost()
Compute the cost of a request given a model ID and token usage.
use blazen_llm::{compute_cost, TokenUsage};
let usage = TokenUsage { prompt_tokens: 1000, completion_tokens: 500, total_tokens: 1500 };
if let Some(cost) = compute_cost("gpt-4o", &usage) {
println!("Cost: ${:.4}", cost);
}
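Given per-million rates, the natural cost formula is a weighted sum of prompt and completion tokens. The sketch below assumes that formula and uses local stand-in structs; the u64 token field types and the cost_usd helper are assumptions, and the rates in the usage note are made-up numbers rather than real pricing.

```rust
// Local stand-ins for the documented PricingEntry and TokenUsage types.
pub struct PricingEntry {
    pub input_per_million: f64,
    pub output_per_million: f64,
}

pub struct TokenUsage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
}

/// Hypothetical helper: cost in USD as
/// prompt_tokens / 1e6 * input rate + completion_tokens / 1e6 * output rate.
pub fn cost_usd(pricing: &PricingEntry, usage: &TokenUsage) -> f64 {
    let input = usage.prompt_tokens as f64 / 1_000_000.0 * pricing.input_per_million;
    let output = usage.completion_tokens as f64 / 1_000_000.0 * pricing.output_per_million;
    input + output
}
```

For example, 1000 prompt tokens at an illustrative 2.50 USD per million plus 500 completion tokens at 10.00 USD per million comes to 0.0025 + 0.0050 = 0.0075 USD.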
MemoryBackend Trait
Low-level storage backend used by Memory. Backends are responsible for persistence and band-based candidate retrieval. They do not perform embedding or ELID encoding.
#[async_trait]
pub trait MemoryBackend: Send + Sync {
async fn put(&self, entry: StoredEntry) -> Result<()>;
async fn get(&self, id: &str) -> Result<Option<StoredEntry>>;
async fn delete(&self, id: &str) -> Result<bool>;
async fn list(&self) -> Result<Vec<StoredEntry>>;
async fn len(&self) -> Result<usize>;
async fn is_empty(&self) -> Result<bool>; // default: self.len() == 0
async fn search_by_bands(
&self,
bands: &[u64],
limit: usize,
) -> Result<Vec<StoredEntry>>;
}
Implementing a Custom Backend
use blazen_memory::store::{MemoryBackend, StoredEntry};
use anyhow::Result;
struct PostgresBackend {
pool: sqlx::PgPool,
}
#[async_trait::async_trait]
impl MemoryBackend for PostgresBackend {
async fn put(&self, entry: StoredEntry) -> Result<()> {
// INSERT or UPDATE in Postgres
todo!()
}
async fn get(&self, id: &str) -> Result<Option<StoredEntry>> {
// SELECT by id
todo!()
}
async fn delete(&self, id: &str) -> Result<bool> {
// DELETE by id, return true if row existed
todo!()
}
async fn list(&self) -> Result<Vec<StoredEntry>> {
// SELECT all
todo!()
}
async fn len(&self) -> Result<usize> {
// SELECT COUNT(*)
todo!()
}
async fn search_by_bands(
&self,
bands: &[u64],
limit: usize,
) -> Result<Vec<StoredEntry>> {
// Query entries sharing at least one band
todo!()
}
}
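For search_by_bands, the comment "entries sharing at least one band" describes a set-overlap test. A minimal in-memory sketch follows; EntrySketch is a hypothetical reduction of StoredEntry (the real field names are not shown in this reference), and the function is synchronous for brevity.

```rust
use std::collections::HashSet;

// Hypothetical reduction of StoredEntry: an id plus its band hashes.
#[derive(Debug, Clone, PartialEq)]
pub struct EntrySketch {
    pub id: String,
    pub bands: Vec<u64>,
}

/// Returns up to `limit` entries that share at least one band with the query,
/// in storage order.
pub fn search_by_bands(entries: &[EntrySketch], bands: &[u64], limit: usize) -> Vec<EntrySketch> {
    // Hash the query bands once so each membership test is O(1).
    let wanted: HashSet<u64> = bands.iter().copied().collect();
    entries
        .iter()
        .filter(|e| e.bands.iter().any(|b| wanted.contains(b)))
        .take(limit)
        .cloned()
        .collect()
}
```

The result is a candidate set only: ranking by similarity is left to the caller, consistent with backends not performing embedding or ELID encoding.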
Built-in Backends
| Backend | Description |
|---|---|
InMemoryBackend | In-process HashMap storage. Fast, no persistence. |
JsonlBackend | Append-only JSONL file storage. |
ValkeyBackend | Redis/Valkey-backed storage. |
Error Handling
BlazenError
The unified error type for all Blazen LLM and compute operations.
| Variant | Fields | Description |
|---|---|---|
Auth | message: String | Authentication failed |
RateLimit | retry_after_ms: Option<u64> | Rate limited by the provider |
Timeout | elapsed_ms: u64 | Request timed out |
Provider | provider: String, message: String, status_code: Option<u16> | Provider-specific error |
Validation | field: Option<String>, message: String | Invalid input |
ContentPolicy | message: String | Content policy violation |
Unsupported | message: String | Requested capability is not supported |
Serialization | String | JSON serialization/deserialization error |
Request | message: String, source: Option<Box<dyn Error>> | Network or request-level failure |
Completion | CompletionErrorKind | LLM completion-specific error |
Compute | ComputeErrorKind | Compute job-specific error |
Media | MediaErrorKind | Media-specific error |
Tool | name: Option<String>, message: String | Tool execution error |
CompletionErrorKind
| Variant | Description |
|---|---|
NoContent | Model returned no content |
ModelNotFound(String) | Model not found |
InvalidResponse(String) | Invalid response from the model |
Stream(String) | Streaming error |
ComputeErrorKind
| Variant | Fields | Description |
|---|---|---|
JobFailed | message: String, error_type: Option<String>, retryable: bool | Compute job failed |
Cancelled | — | Job was cancelled |
QuotaExceeded | message: String | Provider quota exceeded |
MediaErrorKind
| Variant | Fields | Description |
|---|---|---|
Invalid | media_type: Option<String>, message: String | Invalid media |
TooLarge | size_bytes: u64, max_bytes: u64 | Media exceeds size limit |
is_retryable()
impl BlazenError {
pub fn is_retryable(&self) -> bool;
}
Returns true for RateLimit, Timeout, and Request, for Provider errors with a status code of 500 or higher, and for ComputeErrorKind::JobFailed when retryable is true.
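The classification above pairs naturally with a retry delay. The sketch below uses a reduced stand-in enum (ErrSketch) rather than BlazenError. Treating a Provider error without a status code as non-retryable, the retry_delay_ms helper, and the 500 ms base and 30 s cap are all assumptions for illustration.

```rust
// Reduced stand-in for BlazenError covering the variants named above.
#[derive(Debug)]
pub enum ErrSketch {
    Auth,
    RateLimit { retry_after_ms: Option<u64> },
    Timeout,
    Request,
    Provider { status_code: Option<u16> },
    JobFailed { retryable: bool },
}

impl ErrSketch {
    pub fn is_retryable(&self) -> bool {
        match self {
            ErrSketch::RateLimit { .. } | ErrSketch::Timeout | ErrSketch::Request => true,
            // Assumption: a missing status code counts as non-retryable.
            ErrSketch::Provider { status_code } => status_code.map_or(false, |c| c >= 500),
            ErrSketch::JobFailed { retryable } => *retryable,
            ErrSketch::Auth => false,
        }
    }
}

/// Hypothetical helper: delay before the next attempt, or None if the error
/// should not be retried. A provider retry-after hint wins; otherwise the
/// delay doubles per attempt from 500 ms and is capped at 30 s.
pub fn retry_delay_ms(err: &ErrSketch, attempt: u32) -> Option<u64> {
    if !err.is_retryable() {
        return None;
    }
    let backoff = 500u64.saturating_mul(1u64 << attempt.min(6));
    match err {
        ErrSketch::RateLimit { retry_after_ms: Some(ms) } => Some(*ms),
        _ => Some(backoff.min(30_000)),
    }
}
```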
Convenience Constructors
BlazenError::auth("invalid api key")
BlazenError::timeout(5000)
BlazenError::timeout_from_duration(elapsed)
BlazenError::request("connection reset")
BlazenError::unsupported("music generation not available")
BlazenError::provider("openai", "internal server error")
BlazenError::validation("prompt must not be empty")
BlazenError::tool_error("unknown tool: foo")
BlazenError::no_content()
BlazenError::model_not_found("gpt-5")
BlazenError::invalid_response("missing content field")
BlazenError::stream_error("unexpected EOF")
BlazenError::job_failed("GPU out of memory")
BlazenError::cancelled()
BlazenError also implements From<serde_json::Error> for automatic conversion.
Custom Providers
Implementing CompletionModel
use blazen_llm::{
CompletionModel, CompletionRequest, CompletionResponse, StreamChunk, BlazenError,
};
use std::pin::Pin;
use futures_util::Stream;
struct MyProvider {
api_key: String,
}
#[async_trait::async_trait]
impl CompletionModel for MyProvider {
fn model_id(&self) -> &str {
"my-custom-model"
}
async fn complete(
&self,
request: CompletionRequest,
) -> Result<CompletionResponse, BlazenError> {
// Your HTTP/gRPC/local inference logic here
todo!()
}
async fn stream(
&self,
request: CompletionRequest,
) -> Result<
Pin<Box<dyn Stream<Item = Result<StreamChunk, BlazenError>> + Send>>,
BlazenError,
> {
// Your streaming implementation here
todo!()
}
}
Once implemented, MyProvider automatically gets StructuredOutput via the blanket impl, so model.extract::<T>(messages) works out of the box.
Implementing ComputeProvider + ImageGeneration
use blazen_llm::compute::*;
use blazen_llm::BlazenError;
struct MyImageProvider {
api_key: String,
}
#[async_trait::async_trait]
impl ComputeProvider for MyImageProvider {
fn provider_id(&self) -> &str { "my-image-provider" }
async fn submit(&self, request: ComputeRequest) -> Result<JobHandle, BlazenError> {
todo!()
}
async fn status(&self, job: &JobHandle) -> Result<JobStatus, BlazenError> {
todo!()
}
async fn result(&self, job: JobHandle) -> Result<ComputeResult, BlazenError> {
todo!()
}
async fn cancel(&self, job: &JobHandle) -> Result<(), BlazenError> {
todo!()
}
}
#[async_trait::async_trait]
impl ImageGeneration for MyImageProvider {
async fn generate_image(
&self,
request: ImageRequest,
) -> Result<ImageResult, BlazenError> {
// Convert ImageRequest to your provider's format and call the API
todo!()
}
async fn upscale_image(
&self,
request: UpscaleRequest,
) -> Result<ImageResult, BlazenError> {
todo!()
}
}
Built-in Providers
| Provider | Feature | Traits Implemented |
|---|---|---|
OpenAiProvider | openai | CompletionModel, StructuredOutput |
OpenAiCompatProvider | openai | CompletionModel, StructuredOutput, ModelRegistry |
AnthropicProvider | anthropic | CompletionModel, StructuredOutput |
GeminiProvider | gemini | CompletionModel, StructuredOutput, ModelRegistry |
AzureOpenAiProvider | azure | CompletionModel, StructuredOutput |
FalProvider | fal | CompletionModel, StructuredOutput, ComputeProvider, ImageGeneration, VideoGeneration, AudioGeneration, Transcription |
OpenAiCompatProvider Presets
OpenAiCompatProvider works with any OpenAI-compatible endpoint. Named constructors are provided for popular services:
use blazen_llm::providers::openai_compat::OpenAiCompatProvider;
let groq = OpenAiCompatProvider::groq("gsk-...");
let openrouter = OpenAiCompatProvider::openrouter("sk-or-...");
let together = OpenAiCompatProvider::together("...");
let mistral = OpenAiCompatProvider::mistral("...");
let deepseek = OpenAiCompatProvider::deepseek("...");
let fireworks = OpenAiCompatProvider::fireworks("...");
let perplexity = OpenAiCompatProvider::perplexity("...");
let xai = OpenAiCompatProvider::xai("...");
let cohere = OpenAiCompatProvider::cohere("...");
let bedrock = OpenAiCompatProvider::bedrock("...", "us-east-1");
Telemetry Exporters
Re-exported from blazen_telemetry at the crate root and gated by the corresponding Cargo features (see Feature Flags). All exporters return a tracing_subscriber::Layer (or install one globally) that is composed into a tracing_subscriber::registry().
LangfuseConfig
Builder-style configuration for the Langfuse exporter. Enabling the langfuse feature pulls in reqwest for the native ingestion client; on wasm32 the dispatcher is a no-op (events are dropped) because Langfuse export is a native-target feature.
| Method | Signature | Description |
|---|---|---|
new | fn new(public_key: impl Into<String>, secret_key: impl Into<String>) -> Self | Construct with the required Langfuse public + secret keys. Defaults host to https://cloud.langfuse.com, batch size to 100, flush interval to 5000 ms |
with_host | fn with_host(self, host: impl Into<String>) -> Self | Override the Langfuse host URL (e.g. https://eu.langfuse.com) |
with_batch_size | fn with_batch_size(self, batch_size: usize) -> Self | Maximum number of envelopes buffered before an automatic flush |
with_flush_interval_ms | fn with_flush_interval_ms(self, ms: u64) -> Self | Background flush cadence in milliseconds |
LangfuseConfig derives Debug, Clone, Serialize, Deserialize, and Default so it can be loaded from configuration files. Public/secret-key fields are required; host, batch_size, and flush_interval_ms are populated with defaults during deserialization.
LangfuseLayer
A tracing_subscriber::Layer<S> (where S: Subscriber + for<'a> LookupSpan<'a>) that maps Blazen spans to Langfuse ingestion events:
| Blazen span name | Langfuse concept | Ingestion event |
|---|---|---|
workflow.run, pipeline.run | Trace | trace-create |
workflow.step, pipeline.stage, pipeline.stage.sequential, pipeline.stage.parallel | Span | span-create |
llm.complete, llm.stream | Generation | generation-create |
Construct via init_langfuse. The layer owns an unbounded mpsc sender into a background dispatcher that batches and POSTs to {host}/api/public/ingestion with HTTP basic auth (public_key:secret_key).
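The span-to-event mapping in the table above can be read as a single match on the span name. The function below is a hypothetical restatement of that table, not the layer's internal code; returning None for any other span name is an assumption, since the reference does not say how unlisted spans are handled.

```rust
/// Maps a Blazen span name to the Langfuse ingestion event type listed in
/// the table above. Unlisted names yield None (an assumption of this sketch).
pub fn langfuse_event_type(span_name: &str) -> Option<&'static str> {
    match span_name {
        "workflow.run" | "pipeline.run" => Some("trace-create"),
        "workflow.step"
        | "pipeline.stage"
        | "pipeline.stage.sequential"
        | "pipeline.stage.parallel" => Some("span-create"),
        "llm.complete" | "llm.stream" => Some("generation-create"),
        _ => None,
    }
}
```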
init_langfuse
pub fn init_langfuse(config: LangfuseConfig) -> Result<LangfuseLayer, TelemetryError>;
Builds the HTTP client and spawns the background dispatcher on the current Tokio runtime, then returns a LangfuseLayer ready to compose into a subscriber registry. Returns TelemetryError::Langfuse if no Tokio runtime is available (native targets) or if the underlying reqwest::Client cannot be built.
use blazen_telemetry::{LangfuseConfig, init_langfuse};
use tracing_subscriber::prelude::*;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let config = LangfuseConfig::new("pk-lf-...", "sk-lf-...")
.with_host("https://cloud.langfuse.com")
.with_batch_size(50)
.with_flush_interval_ms(2_500);
let layer = init_langfuse(config)?;
tracing_subscriber::registry().with(layer).init();
Ok(())
}
OtlpConfig
Shared configuration for both OTLP transports.
| Field | Type | Description |
|---|---|---|
endpoint | String | OTLP endpoint URL. For otlp (gRPC): http://localhost:4317. For otlp-http: http://localhost:4318/v1/traces |
service_name | String | Reported as the service.name resource attribute on every span |
Derives Debug, Clone, Serialize, Deserialize.
init_otlp (gRPC, otlp feature, native only)
pub fn init_otlp(config: OtlpConfig) -> Result<(), Box<dyn std::error::Error>>;
Builds a tonic-backed SpanExporter, installs it into a global SdkTracerProvider, and registers a tracing_subscriber::registry() with EnvFilter, the OTel layer, and a fmt layer in one call. Native targets only — opentelemetry-otlp/grpc-tonic does not compile to wasm32-unknown-unknown.
init_otlp_http (HTTP/protobuf, otlp-http feature)
pub fn init_otlp_http(config: OtlpConfig) -> Result<(), Box<dyn std::error::Error>>;
Same shape as init_otlp but speaks OTLP/HTTP with the binary protobuf encoding. Works on both native and wasm32:
- Native: registers an internal ReqwestHttpClient (a thin reqwest::Client wrapper) via with_http_client.
- wasm32: registers WasmFetchHttpClient, a web_sys::fetch-backed opentelemetry_http::HttpClient impl that ships in the blazen_telemetry::exporters::wasm_otlp_client module.
The reason for the indirection: opentelemetry-otlp’s built-in reqwest-client / reqwest-blocking-client features compile a wasm32 reqwest::Client whose send future is !Send, which violates the HttpClient: Send + Sync bound and breaks the build. Pinning our own HttpClient impls per target keeps the trait satisfied without enabling those upstream features.
use blazen_telemetry::{OtlpConfig, init_otlp_http};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
init_otlp_http(OtlpConfig {
endpoint: "http://localhost:4318/v1/traces".to_string(),
service_name: "my-blazen-app".to_string(),
})?;
// ... run workflow; spans flush via OTLP/HTTP ...
Ok(())
}
TelemetryError
Returned by init_langfuse. Variants include Langfuse(String) (HTTP-client construction or runtime-lookup failure). Re-exported from blazen_telemetry::TelemetryError.
WASM Embeddings (blazen-embed-tract)
The blazen-embed-tract crate ships two ONNX embedding providers built on the tract inference engine:
| Type | Target | Source of weights |
|---|---|---|
| TractEmbedModel | native | hf-hub (Hugging Face download cache, filesystem-backed) |
| WasmTractEmbedModel | wasm32 only | web_sys::fetch (URLs supplied by the caller) |
The wasm variant exists because hf-hub requires a filesystem and Tokio, neither of which is available in wasm32-unknown-unknown. Both providers share the same TractOptions and pooling logic so callers generic over EmbeddingModel work unchanged.
WasmTractEmbedModel
#[cfg(target_arch = "wasm32")]
use blazen_embed_tract::wasm_provider::{WasmTractEmbedModel, WasmTractError, WasmTractResponse};
use blazen_embed_tract::options::TractOptions;
| Method | Signature | Description |
|---|---|---|
| create | async fn create(model_url: &str, tokenizer_url: &str, options: TractOptions) -> Result<Self, WasmTractError> | Fetch ONNX weights and a HuggingFace tokenizer.json from the supplied URLs and build a runnable model. options.cache_dir and options.show_download_progress are ignored on wasm32 |
| embed | async fn embed(&self, texts: &[String]) -> Result<WasmTractResponse, WasmTractError> | Returns one L2-normalized vector per input. Inference runs synchronously on the wasm main thread |
| model_id | fn model_id(&self) -> &str | Hugging Face model id resolved from the TractOptions::model_name registry entry |
| dimensions | fn dimensions(&self) -> usize | Output embedding dimensionality |
WasmTractResponse exposes embeddings: Vec<Vec<f32>> and model: String. WasmTractError variants: UnknownModel(String), Fetch { url, message }, Init(String), Embed(String).
#[cfg(target_arch = "wasm32")]
async fn embed_hello() -> Result<Vec<Vec<f32>>, blazen_embed_tract::wasm_provider::WasmTractError> {
    use blazen_embed_tract::options::TractOptions;
    use blazen_embed_tract::wasm_provider::WasmTractEmbedModel;
    // Select the registry entry; all other options keep their defaults.
    let opts = TractOptions {
        model_name: Some("bge-small-en-v1.5".to_string()),
        ..Default::default()
    };
    // Fetch the ONNX weights and tokenizer.json, then build the model.
    let model = WasmTractEmbedModel::create(
        "https://huggingface.co/Xenova/bge-small-en-v1.5/resolve/main/onnx/model.onnx",
        "https://huggingface.co/Xenova/bge-small-en-v1.5/resolve/main/tokenizer.json",
        opts,
    ).await?;
    let out = model.embed(&["hello world".into()]).await?;
    Ok(out.embeddings)
}
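Because embed returns L2-normalized vectors, the cosine similarity of two embeddings reduces to a plain dot product. The following is a minimal, dependency-free sketch of that arithmetic; the helper names `l2_normalize` and `dot` and the two-dimensional inputs are illustrative assumptions, not part of blazen-embed-tract.

```rust
/// Scale `v` to unit L2 norm; a zero vector is returned unchanged.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Dot product; equals cosine similarity when both inputs have unit L2 norm.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimensionality");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    // Hypothetical 2-dimensional vectors standing in for rows of
    // `WasmTractResponse::embeddings` (real embeddings are much wider).
    let a = l2_normalize(&[3.0, 4.0]);
    let b = l2_normalize(&[4.0, 3.0]);
    println!("cosine similarity = {:.2}", dot(&a, &b));
}
```

With vectors that are already normalized, as the embed row above states, only `dot` is needed; `l2_normalize` is included here so the sketch runs on raw inputs.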
The fetch loop resolves either window.fetch (browser) or globalThis.fetch (Workers, Deno, Node) so the same provider runs in every wasm host.
Send + Sync are implemented vacuously (unsafe impl) because wasm32-unknown-unknown is single-threaded; this lets WasmTractEmbedModel sit behind Arc<dyn EmbeddingModel> in target-generic code.
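The vacuous `unsafe impl` pattern can be illustrated in isolation. This is a sketch under stated assumptions: `Embedder`, `SingleThreadedModel`, and `make_model` are hypothetical names, and the `Rc<RefCell<..>>` field merely stands in for whatever non-thread-safe state the real type holds. The impls are sound only while the value never leaves a single thread, which is the guarantee a single-threaded target provides.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical stand-in for a trait that requires `Send + Sync`.
trait Embedder: Send + Sync {
    fn dimensions(&self) -> usize;
}

/// Holds an `Rc`, so the compiler does not derive `Send` or `Sync` for it.
struct SingleThreadedModel {
    weights: Rc<RefCell<Vec<f32>>>,
}

// SAFETY (assumption of this sketch): the value is only ever used from one
// thread. On a multi-threaded target these impls would be unsound if the
// value were actually shared or moved across threads.
unsafe impl Send for SingleThreadedModel {}
unsafe impl Sync for SingleThreadedModel {}

impl Embedder for SingleThreadedModel {
    fn dimensions(&self) -> usize {
        self.weights.borrow().len()
    }
}

/// Without the two `unsafe impl` lines, the `impl Embedder` above would not
/// compile, and neither would this coercion to `Arc<dyn Embedder>`.
fn make_model(dimensions: usize) -> Arc<dyn Embedder> {
    Arc::new(SingleThreadedModel {
        weights: Rc::new(RefCell::new(vec![0.0; dimensions])),
    })
}

fn main() {
    let model = make_model(384);
    println!("dimensions = {}", model.dimensions());
}
```

In a real crate the two `unsafe impl` lines would typically sit behind `#[cfg(target_arch = "wasm32")]` so the promise is never made on a target that has threads.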
Local Inference Backend Re-exports
When you enable a local-inference feature on blazen-llm, the per-backend crate’s public types are re-exported at the blazen_llm crate root so callers do not need to depend on the backend crate directly. Each group of types is gated by its respective feature.
mistralrs feature
Re-exported un-prefixed (the mistralrs backend was the first local provider added and owns the canonical names):
#[cfg(feature = "mistralrs")]
use blazen_llm::{
    MistralRsProvider, MistralRsOptions, MistralRsError,
    ChatMessageInput, ChatRole,
    InferenceChunk, InferenceChunkStream,
    InferenceImage, InferenceImageSource,
    InferenceResult, InferenceToolCall, InferenceUsage,
};
llamacpp feature
Re-exported with a LlamaCpp prefix to avoid colliding with the mistralrs names when both features are enabled simultaneously:
#[cfg(feature = "llamacpp")]
use blazen_llm::{
    LlamaCppProvider, LlamaCppOptions, LlamaCppError,
    LlamaCppChatMessageInput, LlamaCppChatRole,
    LlamaCppInferenceChunk, LlamaCppInferenceChunkStream,
    LlamaCppInferenceResult, LlamaCppInferenceUsage,
};
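The prefixing convention relies on Rust's `pub use ... as ...` renaming. The sketch below shows the mechanism with two hypothetical modules, `mistralrs_backend` and `llamacpp_backend`, that each define a type called `ChatRole`; it is not the actual layout of blazen-llm, and the feature gates are reduced to a comment so the block runs without Cargo features.

```rust
mod mistralrs_backend {
    /// Hypothetical stand-in for the first backend's role type.
    #[derive(Debug, PartialEq)]
    pub enum ChatRole {
        User,
        Assistant,
    }
}

mod llamacpp_backend {
    /// Hypothetical stand-in for the second backend's role type (same name).
    #[derive(Debug, PartialEq)]
    pub enum ChatRole {
        User,
        Assistant,
    }
}

// In a real crate each re-export would sit behind `#[cfg(feature = "...")]`.
// Renaming the second one lets both live at the crate root without a clash.
pub use llamacpp_backend::ChatRole as LlamaCppChatRole;
pub use mistralrs_backend::ChatRole;

/// The two names refer to distinct types even though the variants look alike.
fn describe(a: &ChatRole, b: &LlamaCppChatRole) -> String {
    format!("{:?} / {:?}", a, b)
}

fn main() {
    println!("{}", describe(&ChatRole::User, &LlamaCppChatRole::Assistant));
    println!("{}", describe(&ChatRole::Assistant, &LlamaCppChatRole::User));
}
```

Re-exporting both modules' `ChatRole` under the same name would be a compile error, which is why one side has to be renamed when both features can be enabled together.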
candle-llm feature
Candle exposes its result type as CandleInferenceResult (already prefixed upstream). The CandleLlmCompletionModel adapter wraps the raw provider in the CompletionModel trait:
#[cfg(feature = "candle-llm")]
use blazen_llm::{
    CandleLlmProvider, CandleLlmCompletionModel,
    CandleLlmOptions, CandleLlmError,
    CandleInferenceResult,
};
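The adapter relationship between a raw provider and a shared trait can be sketched generically. All names here (`Completion`, `RawProvider`, `RawResult`, `Adapter`, `run`) are hypothetical and heavily simplified; the real CompletionModel trait and CandleLlmCompletionModel signatures are not shown in this section, so this illustrates the wrapping pattern only.

```rust
/// Hypothetical, simplified stand-in for a shared completion trait.
trait Completion {
    fn complete(&self, prompt: &str) -> String;
}

/// Hypothetical backend-specific result type.
struct RawResult {
    text: String,
}

/// Hypothetical raw provider exposing its own backend-specific API.
struct RawProvider;

impl RawProvider {
    fn infer(&self, prompt: &str) -> RawResult {
        // A real backend would run a model here; this sketch just echoes.
        RawResult {
            text: format!("echo: {prompt}"),
        }
    }
}

/// Adapter that owns the raw provider and exposes it through the shared trait.
struct Adapter {
    inner: RawProvider,
}

impl Completion for Adapter {
    fn complete(&self, prompt: &str) -> String {
        // Translate the backend-specific result into the trait's return type.
        self.inner.infer(prompt).text
    }
}

/// Code written against the trait never sees the backend-specific types.
fn run(model: &dyn Completion, prompt: &str) -> String {
    model.complete(prompt)
}

fn main() {
    let model = Adapter { inner: RawProvider };
    println!("{}", run(&model, "hello"));
}
```

Keeping the adapter separate from the raw provider leaves the backend's native API available to callers who need it, while trait-generic code depends only on the wrapper.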
Other local backends
| Feature | Re-exports |
|---|---|
| candle-embed | CandleEmbedModel, CandleEmbedOptions, CandleEmbedError |
| embed | EmbedModel, EmbedOptions, EmbedResponse, EmbedError (from blazen-embed) |
| whispercpp | WhisperCppProvider, WhisperModel, WhisperOptions, WhisperError |
| piper | PiperProvider, PiperOptions, PiperError |
| diffusion | DiffusionProvider, DiffusionOptions, DiffusionScheduler, DiffusionError |
All five of these backends follow the same convention: enable the feature on blazen-llm, then import the types directly from blazen_llm::*.