Getting Started
Development Environment
Run Rust natively on your host machine. Run backing services (PostgreSQL, Valkey, Restate, RustFS, MailCrab) in Docker containers. This separation keeps your edit-compile-run cycle fast while giving you disposable, reproducible infrastructure.
Install rustup, which manages your Rust compiler, standard library, and development tools.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
The default installation profile includes rustc, cargo, clippy, and rustfmt. Add rust-analyzer (the language server) and rust-src (standard library source, needed for full rust-analyzer functionality) separately:
rustup component add rust-analyzer rust-src
Verify the installation:
rustc --version
cargo --version
Keep everything current with rustup update. Rust releases a new stable version every six weeks.
rustc compiles Rust source code. You rarely invoke it directly; cargo handles it.
cargo builds, tests, runs, and manages dependencies. It is the entry point for nearly every Rust workflow.
clippy is the official linter. Run cargo clippy to catch common mistakes and non-idiomatic patterns.
rustfmt formats code to a consistent style. Run cargo fmt to format, cargo fmt -- --check to verify without modifying files.
rust-analyzer provides IDE features (completions, diagnostics, go-to-definition, refactoring) via the Language Server Protocol. Any editor or AI coding agent with LSP support can use it.
Backing Services with Docker Compose
The application depends on five external services during development. Run them in containers so they are disposable and require no host-level installation.
| Service | Image | Ports | Purpose |
|---|---|---|---|
| PostgreSQL | postgres:18-alpine | 5432 | Primary database |
| Valkey | valkey/valkey:9-alpine | 6379 | Pub/sub and caching |
| Restate | docker.restate.dev/restatedev/restate:latest | 8080, 9070, 9071 | Durable execution engine |
| RustFS | rustfs/rustfs:latest | 9000, 9001 | S3-compatible object storage |
| MailCrab | marlonb/mailcrab:latest | 1025 (SMTP), 1080 (Web UI) | Email capture for testing |
Create compose.yaml at the project root:
services:
  postgres:
    image: postgres:18-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app_dev
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      timeout: 5s
      retries: 5

  valkey:
    image: valkey/valkey:9-alpine
    ports:
      - "6379:6379"
    volumes:
      - valkeydata:/data

  restate:
    image: docker.restate.dev/restatedev/restate:latest
    ports:
      - "8080:8080"
      - "9070:9070"
      - "9071:9071"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - restatedata:/target

  rustfs:
    image: rustfs/rustfs:latest
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      RUSTFS_ROOT_USER: minioadmin
      RUSTFS_ROOT_PASSWORD: minioadmin
    volumes:
      - rustfsdata:/data

  mailcrab:
    image: marlonb/mailcrab:latest
    ports:
      - "1080:1080"
      - "1025:1025"

volumes:
  pgdata:
  valkeydata:
  restatedata:
  rustfsdata:
Start all services:
docker compose up -d
Stop containers (data persists in named volumes):
docker compose down
Stop and destroy everything, including data:
docker compose down -v
Service notes
Valkey is the BSD-licensed fork of Redis, maintained by the Linux Foundation. It is fully API-compatible with Redis, so any Redis client library works without changes. The guide uses Valkey because its licence is unambiguous.
Restate is a durable execution engine for reliable background work, workflows, and agentic AI. The extra_hosts entry allows Restate (running inside Docker) to reach your application (running on the host) via host.docker.internal. Use this hostname instead of localhost when registering service deployments with the Restate admin API on port 9070.
RustFS is an S3-compatible object storage server written in Rust, licensed under Apache 2.0. It replaces MinIO, which entered maintenance mode in December 2025. RustFS is still in alpha but functional for local development. Its web console is available at http://localhost:9001.
MailCrab captures all email sent to it. Configure your application’s SMTP to point at localhost:1025, then view captured messages at http://localhost:1080. No email leaves your machine.
Docker runtime
Any Docker-compatible runtime works: Docker Desktop, OrbStack (macOS), Colima (macOS/Linux), or Podman. The docker compose commands behave identically across all of them.
cargo xtask
cargo xtask is a convention for writing project automation as a Rust binary inside your workspace. Instead of shell scripts or Makefiles, your build tasks are Rust code: checked by the compiler, cross-platform, and requiring no external tooling beyond cargo.
The pattern works by defining a cargo alias that runs a dedicated crate.
Setup
Create the alias in .cargo/config.toml:
[alias]
xtask = "run --package xtask --"
Add an xtask crate to your workspace. In the root Cargo.toml:
[workspace]
resolver = "3"
members = ["app", "xtask"]
default-members = ["app"]
default-members prevents cargo build and cargo test from compiling the xtask crate unless explicitly requested.
Create xtask/Cargo.toml:
[package]
name = "xtask"
version = "0.1.0"
edition = "2024"
publish = false
[dependencies]
clap = { version = "4", features = ["derive"] }
xshell = "0.2"
anyhow = "1"
xshell provides shell-like command execution without invoking an actual shell. Variable interpolation is safe by construction, preventing injection.
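The same property holds for the standard library: std::process::Command passes each argument directly to the child process as a single argv entry, with no shell in between. A minimal stdlib sketch demonstrating why injection cannot occur (the tricky string is illustrative):

```rust
use std::process::Command;

fn main() {
    // An argument containing shell metacharacters is passed verbatim as one
    // argv entry; no shell ever parses it, so it cannot become a command.
    let tricky = "hello; rm -rf /";
    let out = Command::new("echo")
        .arg(tricky)
        .output()
        .expect("failed to run echo");
    let stdout = String::from_utf8_lossy(&out.stdout);
    assert_eq!(stdout.trim(), tricky);
    println!("{}", stdout.trim());
}
```

xshell's cmd! macro gives you the same guarantee with a more ergonomic, shell-like syntax.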
Create xtask/src/main.rs:
use std::process::ExitCode;

use anyhow::Result;
use clap::{Parser, Subcommand};
use xshell::{cmd, Shell};

#[derive(Parser)]
#[command(name = "xtask")]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    Dev,
    Migrate,
    Ci,
}

fn main() -> ExitCode {
    let cli = Cli::parse();
    let result = match cli.command {
        Command::Dev => dev(),
        Command::Migrate => migrate(),
        Command::Ci => ci(),
    };
    match result {
        Ok(()) => ExitCode::SUCCESS,
        Err(e) => {
            eprintln!("error: {e:?}");
            ExitCode::FAILURE
        }
    }
}

fn dev() -> Result<()> {
    let sh = Shell::new()?;
    cmd!(sh, "docker compose up -d").run()?;
    cmd!(sh, "bacon run").run()?;
    Ok(())
}

fn migrate() -> Result<()> {
    let sh = Shell::new()?;
    cmd!(sh, "cargo sqlx migrate run").run()?;
    Ok(())
}

fn ci() -> Result<()> {
    let sh = Shell::new()?;
    cmd!(sh, "cargo fmt --all -- --check").run()?;
    cmd!(sh, "cargo clippy --all-targets -- -D warnings").run()?;
    cmd!(sh, "cargo nextest run").run()?;
    Ok(())
}
Run tasks with:
cargo xtask dev
cargo xtask migrate
cargo xtask ci
Add subcommands as your project grows. Common additions: seed (populate development data), reset (drop and recreate the database), build-css (run lightningcss processing).
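A reset task, for example, is one enum variant and one function. The sketch below builds the command with std::process::Command so it stands alone; in the real xtask you would use xshell and clap as above. The container name, psql user, and SQL statements are assumptions, not from the source:

```rust
use std::process::Command;

// Hypothetical `reset` task: drop and recreate the dev database by running
// psql inside the postgres container started by docker compose.
fn reset_command(sql: &str) -> Command {
    let mut cmd = Command::new("docker");
    cmd.args([
        "compose", "exec", "postgres",
        "psql", "-U", "app", "-d", "postgres", "-c", sql,
    ]);
    cmd
}

fn main() {
    let cmd = reset_command("DROP DATABASE IF EXISTS app_dev");
    // Inspect the constructed invocation without executing it.
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    assert_eq!(args[0], "compose");
    assert!(args.iter().any(|a| a.contains("DROP DATABASE")));
    println!("docker {}", args.join(" "));
}
```

A real task would call `.status()?` on the two commands in sequence and check each exit code before continuing.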
Editor Configuration
Any editor with Language Server Protocol support works for Rust development. Install the rust-analyzer extension or plugin for your editor of choice.
The following rust-analyzer settings matter for this stack. Apply them through your editor’s LSP configuration.
{
  "rust-analyzer.check.command": "clippy",
  "rust-analyzer.procMacro.enable": true,
  "rust-analyzer.cargo.buildScripts.enable": true,
  "rust-analyzer.check.allTargets": true
}
check.command: "clippy" runs clippy instead of cargo check on save, giving you lint feedback inline. Slightly slower on large workspaces, but the additional warnings are worth it.
procMacro.enable: true is critical for this stack. Maud’s html! macro, serde’s derive macros, and SQLx’s query! macro are all procedural macros. Without this setting, rust-analyzer cannot expand them, resulting in false errors and missing completions inside macro invocations.
cargo.buildScripts.enable: true ensures build scripts run during analysis. SQLx’s compile-time query checking depends on this.
check.allTargets: true includes tests, examples, and benchmarks in diagnostic checking.
Fast Iteration
bacon
bacon watches your source files and runs cargo commands on every change. It replaces the older cargo-watch, which is no longer actively developed (its maintainer recommends bacon).
Install it:
cargo install --locked bacon
Run it:
bacon
bacon clippy
bacon test
bacon run
bacon provides a TUI with sorted, filtered diagnostics. Press t to switch to tests, c to switch to clippy, r to run the application. The full set of keyboard shortcuts is shown in the interface.
For project-specific jobs, create a bacon.toml at the workspace root:
[jobs.run]
command = ["cargo", "run"]
watch = ["src"]
[jobs.test-integration]
command = ["cargo", "nextest", "run", "--test", "integration"]
watch = ["src", "tests"]
Linking
On x86_64 Linux with Rust 1.90 or later, the compiler uses lld (the LLVM linker) by default. This is significantly faster than the traditional system linker and requires no configuration.
On macOS, Apple’s default linker is adequate. No special setup is needed.
Incremental compilation
Cargo enables incremental compilation by default for debug builds. After the initial compile, changing a single file typically triggers a rebuild of only the affected crate and its dependents.
Two practices keep incremental rebuilds fast:
- Split your workspace into focused crates. A change in one crate does not recompile unrelated crates. The Project Structure section covers this in detail.
- Keep macro-heavy code in leaf crates. Procedural macro expansion is one of the slower compilation phases. Isolating it limits the rebuild radius.
cargo-nextest
cargo-nextest is a test runner that executes tests in parallel across separate processes. It is noticeably faster than cargo test on projects with more than a handful of tests, and its output is easier to read.
cargo install --locked cargo-nextest
cargo nextest run
Doctests are not supported by nextest. Run them separately with cargo test --doc.
Project Structure
A Cargo workspace groups multiple crates under a single Cargo.lock and shared target/ directory. Each crate has its own Cargo.toml and its own dependency list, which means the compiler enforces boundaries between crates: if crates/domain/Cargo.toml does not list sqlx, no code in that crate can import it. This is not a convention. It is a compilation error.
Splitting a project into workspace crates gives you faster incremental builds (changing one crate does not recompile unrelated ones), enforced dependency boundaries, and a clear map of what depends on what.
Workspace layout
Use a virtual manifest, a root Cargo.toml that contains [workspace] but no [package]. All application crates live under crates/:
my-app/
  Cargo.toml         # virtual manifest (workspace root)
  Cargo.lock
  .cargo/
    config.toml      # cargo aliases (xtask)
  crates/
    server/          # binary: composition root
    web/             # library: Axum handlers, routing, middleware
    db/              # library: SQLx queries and database access
    domain/          # library: shared types, business logic
    config/          # library: environment variable parsing
    jobs/            # library: Restate durable execution handlers
    xtask/           # binary: build automation (dev, migrate, ci)
  migrations/        # SQLx migration files
  compose.yaml       # backing services (Postgres, Valkey, etc.)
  .env               # local environment variables
The flat crates/* layout is the simplest approach. Cargo’s crate namespace is flat, so hierarchical folder structures (like crates/libs/ and crates/services/) add visual complexity that does not map to anything Cargo understands. Put everything under crates/ and use the crate names to communicate purpose.
What each crate does
server is the binary crate and the composition root. It depends on every other crate and wires them together at startup: builds the database pool, constructs the Axum router, starts the HTTP listener. This is the only crate that sees the full dependency graph.
web contains Axum handlers, route definitions, middleware configuration, and Maud templates. It depends on domain for shared types and on db for data access. All HTTP-facing code lives here.
db owns all database access. SQLx queries, connection pool management, and result-to-type mappings belong in this crate. It depends on domain for the types that queries return.
domain holds types and logic shared across the application: entity structs, error enums, validation rules, and any business logic that is not tied to a specific framework. This crate should have minimal dependencies. It does not depend on Axum, SQLx, or any infrastructure crate.
config parses environment variables into typed configuration structs at startup. It depends on serde and dotenvy, not on framework crates.
jobs contains Restate service and workflow handlers for durable background work. It depends on domain and db, but not on web. Jobs are triggered by HTTP handlers but execute independently.
xtask is the build automation crate. The Development Environment section covers its setup in detail.
Root Cargo.toml
The workspace root defines shared settings, dependency versions, and lint configuration for all members.
[workspace]
members = ["crates/*"]
resolver = "3"
[workspace.package]
edition = "2024"
version = "0.1.0"
rust-version = "1.85"
[workspace.dependencies]
tokio = { version = "1", features = ["rt-multi-thread", "macros", "signal"] }
axum = "0.8"
tower = "0.5"
tower-http = { version = "0.6", features = ["trace", "compression-gzip"] }
tower-sessions = "0.14"
maud = { version = "0.26", features = ["axum"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
sqlx = { version = "0.8", default-features = false, features = [
    "runtime-tokio", "postgres", "macros", "migrate",
] }
thiserror = "2"
anyhow = "1"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
dotenvy = "0.15"
reqwest = { version = "0.12", default-features = false, features = [
    "rustls-tls", "json",
] }
app-server = { path = "crates/server" }
app-web = { path = "crates/web" }
app-db = { path = "crates/db" }
app-domain = { path = "crates/domain" }
app-config = { path = "crates/config" }
app-jobs = { path = "crates/jobs" }
[workspace.lints.rust]
unsafe_code = "forbid"
rust_2018_idioms = { level = "warn", priority = -1 }
unreachable_pub = "warn"
[workspace.lints.clippy]
enum_glob_use = "warn"
implicit_clone = "warn"
dbg_macro = "warn"
Workspace dependencies
The [workspace.dependencies] table defines dependency versions once. Member crates reference them with workspace = true:
[package]
name = "app-web"
edition.workspace = true
version.workspace = true
[lints]
workspace = true
[dependencies]
axum.workspace = true
maud.workspace = true
tower.workspace = true
tower-http.workspace = true
tower-sessions.workspace = true
serde.workspace = true
tracing.workspace = true
app-domain.workspace = true
app-db.workspace = true
Members can add features on top of the workspace definition. Features are additive: you can add but not remove them.
[dependencies]
sqlx = { workspace = true, features = ["uuid", "time"] }
Workspace lints
The [workspace.lints] table shares lint configuration across all crates. Each member opts in with [lints] workspace = true. The example above forbids unsafe code project-wide and enables several useful Clippy lints.
[workspace.package] avoids repeating edition, version, and rust-version in every crate. Members inherit with edition.workspace = true, and so on. Only unpublished, internal crates should share a version this way. If you publish crates to crates.io, give them independent version numbers.
The dependency graph
The crate dependency graph for this layout looks like this:
server ──→ web ──→ domain
  │         │
  │         └────→ db ──→ domain
  │
  ├──────→ db
  ├──────→ config
  ├──────→ jobs ──→ domain
  │          │
  │          └────→ db
  └──────→ domain
domain sits at the bottom with no framework dependencies. db depends on domain and sqlx. web depends on domain, db, and axum. server depends on everything and wires it all together.
This graph is enforced by Cargo.toml files. If someone adds an axum import to the domain crate, the compiler rejects it. No linting rules or code review discipline required.
The domain crate
Keep domain lean. It holds types that multiple crates need: entity structs, ID types, error enums, validation logic. It depends on serde for serialisation and thiserror for error types. It does not depend on Axum, SQLx, Maud, or Tokio.
[package]
name = "app-domain"
edition.workspace = true
version.workspace = true
[lints]
workspace = true
[dependencies]
serde.workspace = true
thiserror.workspace = true
A typical domain crate:
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Contact {
    pub id: i64,
    pub name: String,
    pub email: String,
}

#[derive(Debug, Deserialize)]
pub struct CreateContact {
    pub name: String,
    pub email: String,
}

#[derive(Debug, thiserror::Error)]
pub enum ContactError {
    #[error("contact not found")]
    NotFound,
    #[error("email already exists")]
    DuplicateEmail,
}
Other crates import these types. The db crate maps SQL rows to Contact. The web crate uses CreateContact to deserialise form submissions. Neither crate defines these types itself, so there is a single source of truth.
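The domain crate is also the natural home for validation rules, since it is framework-free and callable from web handlers and jobs alike. A hypothetical sketch (the validate method and its error variants are illustrative, not from the source; the serde derive is omitted so the example runs standalone):

```rust
pub struct CreateContact {
    pub name: String,
    pub email: String,
}

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    EmptyName,
    InvalidEmail,
}

impl CreateContact {
    // Pure logic, no Axum or SQLx: any crate that depends on domain can call it.
    pub fn validate(&self) -> Result<(), ValidationError> {
        if self.name.trim().is_empty() {
            return Err(ValidationError::EmptyName);
        }
        // Deliberately minimal; real email validation is more involved.
        if !self.email.contains('@') {
            return Err(ValidationError::InvalidEmail);
        }
        Ok(())
    }
}

fn main() {
    let ok = CreateContact { name: "Alice".into(), email: "alice@example.com".into() };
    assert!(ok.validate().is_ok());

    let bad = CreateContact { name: "".into(), email: "nope".into() };
    assert_eq!(bad.validate(), Err(ValidationError::EmptyName));
    println!("validation works");
}
```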
The server crate
The binary crate has one job: connect everything and start listening.
[package]
name = "app-server"
edition.workspace = true
version.workspace = true
[lints]
workspace = true
[dependencies]
tokio.workspace = true
axum.workspace = true
anyhow.workspace = true
tracing.workspace = true
tracing-subscriber.workspace = true
app-web.workspace = true
app-db.workspace = true
app-config.workspace = true
use anyhow::Result;
use tracing_subscriber::EnvFilter;

#[tokio::main]
async fn main() -> Result<()> {
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();

    let config = app_config::load()?;
    let db = app_db::connect(&config.database_url).await?;
    let app = app_web::router(db);

    let listener = tokio::net::TcpListener::bind(&config.listen_addr).await?;
    tracing::info!("listening on {}", config.listen_addr);
    axum::serve(listener, app).await?;
    Ok(())
}
This is deliberately thin. Route definitions, middleware, and handler logic live in the web crate. The server crate only constructs dependencies and passes them in.
default-members
Set default-members in the workspace root to control which crates cargo build and cargo run operate on by default:
[workspace]
members = ["crates/*"]
default-members = ["crates/server"]
resolver = "3"
With this setting, cargo run starts the server without needing -p app-server. The xtask crate and other library crates only compile when explicitly requested or pulled in as dependencies.
When to split into more crates
Start with fewer crates than you think you need. A single lib crate alongside server and xtask is a reasonable starting point. Split when you have a concrete reason:
- Compile times. A change in one module triggers recompilation of unrelated code. Splitting into separate crates isolates the rebuild radius.
- Dependency sprawl. A module pulls in heavy dependencies that most of the codebase does not need. Moving it to its own crate keeps those dependencies contained.
- Independent deployment. A Restate worker or CLI tool needs to share domain types with the web server but should not pull in Axum.
- Team boundaries. Different people or teams own different parts of the system and want clear interfaces between them.
Do not split pre-emptively. Each new crate adds a Cargo.toml to maintain and a boundary to design. Split when the cost of staying in one crate (slow builds, tangled dependencies) exceeds the cost of the boundary.
Feature unification
Cargo unifies features of shared dependencies across all workspace members. If the web crate enables sqlx/postgres and the jobs crate enables sqlx/uuid, both features are active everywhere. Features are additive, so this usually works fine. It becomes a problem only if two crates need genuinely incompatible configurations of the same dependency, which is rare in practice.
Resolver 3 (the default with edition = "2024") already avoids unifying features across different target platforms, which eliminates the most common source of unexpected feature activation.
Architecture
Why Hypermedia-Driven Architecture
This section makes the case for hypermedia-driven architecture (HDA) as the default approach to building web applications. The arguments here are opinionated but grounded in the original definition of REST, the economics of framework migration, and the structural properties of HTML as a transfer format.
The technical implementation follows in later sections. This one answers the prior question: why build this way at all?
Roy Fielding’s 2000 doctoral dissertation, Architectural Styles and the Design of Network-based Software Architectures, defined REST as “an architectural style for distributed hypermedia systems.” The word hypermedia is not incidental. It is the subject of the entire architecture.
Chapter 5 of the dissertation specifies four interface constraints for REST. The fourth is HATEOAS: Hypermedia As The Engine of Application State. Server responses carry both data and navigational controls. The client does not hardcode knowledge of available actions. It discovers them through hypermedia links and forms embedded in the response. HTML is the canonical format that satisfies this constraint: an HTML page contains both content and the controls (links, forms, buttons) that drive state transitions.
JSON carries no native hypermedia controls. A JSON response like {"name": "Alice", "email": "alice@example.com"} contains data but no affordances. The client must know in advance what URLs to call, what HTTP methods to use, and what payloads to send. This requires out-of-band documentation and tight client-server coupling, which is precisely what REST’s uniform interface constraint was designed to prevent.
By 2008, the drift had become bad enough that Fielding wrote a blog post titled “REST APIs must be hypertext-driven”:
I am getting frustrated by the number of people calling any HTTP-based interface a REST API. […] If the engine of application state (and hence the API) is not being driven by hypertext, then it cannot be RESTful and cannot be a REST API. Period.
The industry ignored him. The Richardson Maturity Model, popularised by Martin Fowler, formalised REST into “levels.” Most developers stopped at Level 2 (HTTP verbs and resource URLs) and never implemented Level 3 (hypermedia controls). When JSON replaced XML as the dominant transfer format, the “REST” label stuck even though the defining constraint had been dropped. What the industry calls a “RESTful API” is, by Fielding’s definition, RPC with nice URLs.
This matters because the original REST architecture was designed to solve real problems: evolvability, loose coupling, and independent deployment of client and server. Those problems did not go away when the industry adopted JSON APIs. The solutions were simply abandoned.
The HDA architecture defined
A hypermedia-driven application (HDA) returns HTML from the server, not JSON. The term comes from Carson Gross, creator of htmx, and is defined in detail in the book Hypermedia Systems and on the htmx website.
The architecture has two constraints:
- Hypermedia communication. The server responds to HTTP requests with HTML. The client renders it. There is no JSON serialisation layer, no client-side data model, and no mapping between API responses and UI state. The HTML is the interface.
- Declarative interactivity. HTML-embedded attributes (such as htmx’s hx-get, hx-post, hx-swap) drive dynamic behaviour. The developer declares what should happen in the markup rather than writing imperative JavaScript to manage requests, state, and DOM updates.
The key mechanism is partial page replacement. When the user interacts with an element, the browser sends an HTTP request and receives an HTML fragment. That fragment replaces a targeted region of the DOM. The server controls what the user sees next, because the server produces the HTML. The client is a rendering engine, not an application runtime.
This eliminates an entire layer of software. In a typical SPA, the server serialises data to JSON, the client deserialises it, maps it into a state store, derives a virtual DOM from that state, and diffs it against the real DOM. In HDA, the server renders HTML and the browser displays it. The serialisation, deserialisation, state management, and virtual DOM diffing layers do not exist because they are not needed.
An HDA is not a traditional multi-page application with full page reloads on every click. The partial replacement model provides the same responsiveness that SPAs deliver, but the interactivity logic lives on the server rather than in client-side JavaScript.
The coupling advantage
Each endpoint in an HDA produces self-contained HTML. A handler for GET /contacts/42/edit returns an edit form. That form contains the data, the input fields, the validation rules (via HTML5 attributes), and the submit action (via the form’s action attribute or htmx attributes). Everything the client needs is in the response. There is no shared state to coordinate with.
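To make "self-contained" concrete, here is a sketch of what such a handler might return. It renders the HTML by hand with format! so the example runs standalone; in this stack the same markup would come from Maud's html! macro. The hx-put/hx-target attributes and the route shape are illustrative (a real implementation must also escape the interpolated values):

```rust
// Hypothetical response body for GET /contacts/42/edit: the data, the input
// fields, and the submit action all travel together in one HTML fragment.
fn edit_form(id: i64, name: &str, email: &str) -> String {
    format!(
        concat!(
            "<form hx-put=\"/contacts/{id}\" hx-target=\"closest tr\">",
            "<input type=\"text\" name=\"name\" value=\"{name}\" required>",
            "<input type=\"email\" name=\"email\" value=\"{email}\" required>",
            "<button type=\"submit\">Save</button>",
            "</form>"
        ),
        id = id,
        name = name,
        email = email,
    )
}

fn main() {
    let html = edit_form(42, "Alice", "alice@example.com");
    assert!(html.contains("hx-put=\"/contacts/42\""));
    println!("{html}");
}
```

The client needs no prior knowledge of the contact schema or the update URL: both arrive in the response itself.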
SPA architectures centralise client-side state. React applications commonly use a global state store (Redux, Zustand, Jotai, or React Context) to hold data that multiple components need. This creates a coupling pattern: when you change the shape of data in the store, every component that reads or writes that data must be updated.
Redux’s single-store design has been criticised for exhibiting the God Object anti-pattern, where a single entity becomes tightly coupled to much of the codebase. Changes intended to benefit one feature create ripple effects in unrelated features. The React-Redux community documented this problem: hooks encourage tight coupling between Redux state shape and component internals, reducing testability and violating the single responsibility principle.
The single-spa project (a framework for combining multiple SPAs) explicitly warns against sharing Redux stores across micro-frontends: “if you find yourself needing constant sharing of UI state, your microfrontends are likely more coupled than they should be.” This is an acknowledgement from within the SPA ecosystem that centralised client state creates coupling problems.
In HDA, the coupling boundary is the HTTP response. Each response is stateless and self-contained. The server can change the HTML structure of one endpoint without affecting any other endpoint, because there is no shared client-side state that binds them together. Two developers can modify two different pages concurrently with zero coordination. This property is structural, not a matter of discipline. It falls out automatically from the architecture.
The framework migration tax
JavaScript framework churn imposes a recurring cost on every project built with a client-side framework.
AngularJS to Angular 2+. React class components to hooks to server components. Vue 2 to Vue 3. Each major transition changes fundamental patterns: how components are defined, how state is managed, how side effects are handled. Code written against the old patterns must be rewritten, not just updated.
A peer-reviewed study by Ferreira, Borges, and Valente (On the (Un-)Adoption of JavaScript Front-end Frameworks, published in Software: Practice and Experience, 2021) examined 12 open-source projects that performed framework migrations. The findings:
- The time spent performing the migration was greater than or equal to the time spent using the old framework in all 12 projects.
- In 5 of the 12 projects, the time spent migrating exceeded the time spent using both the old and new frameworks combined.
- Migration durations ranged from 7 days to 966 days.
AngularJS reached end-of-life on 31 December 2021. Three years later, BuiltWith reports over one million live websites still running AngularJS. WebTechSurvey puts the figure above 500,000. The exact count varies by measurement method, but the order of magnitude is clear: hundreds of thousands of applications remain on a deprecated, unpatched framework because migrating to Angular 2+ requires a near-complete rewrite of the client-side codebase.
This is not a one-time problem. React’s transition from class components to hooks changed every component pattern in the ecosystem. The ongoing shift toward React Server Components is changing the execution model itself, blurring the boundary between server and client in ways that require rethinking application architecture. Each transition resets knowledge, breaks libraries, and forces rewrites.
The migration tax is a structural property of the SPA model: when interactivity logic lives in client-side JavaScript tied to a specific framework’s component model, that logic must be rewritten whenever the framework’s model changes. HDA does not eliminate the need to stay current with server-side tools, but server-side framework transitions (switching from one Rust web framework to another, for example) affect route definitions and middleware, not the fundamental rendering model. The HTML your server produces is the same regardless of which framework generates it.
The backward-compatibility guarantee
No HTML element has ever been removed from the specification in a way that breaks rendering.
The WHATWG HTML Standard, which governs HTML as a living specification, lists obsolete elements including <marquee>, <center>, <font>, <frame>, and <acronym>. Authors are told not to use them. But the specification still mandates that browsers render them. <marquee> has a complete interface specification (HTMLMarqueeElement) with defined behaviour. <acronym> must be treated as equivalent to <abbr> for rendering purposes. These elements work in every modern browser because the spec requires it.
This is not accidental. It is policy. The W3C HTML Design Principles document establishes a priority of constituencies: “In case of conflict, consider users over authors over implementors over specifiers over theoretical purity.” Backward compatibility flows directly from this principle: breaking existing content harms users, so the specification does not break existing content.
The WHATWG’s founding position reinforces this:
Technologies need to be backwards compatible, that specifications and implementations need to match even if this means changing the specification rather than the implementations.
An application built on HTML, CSS, and HTTP in 2026 can reasonably expect its platform foundation to remain stable for decades. The same HTML that rendered in Netscape Navigator still renders in Chrome today. No JavaScript framework has provided, or can provide, a comparable guarantee. React is 12 years old and has undergone three major paradigm shifts. The <form> element is 31 years old and works exactly as it did in 1995, with additional capabilities layered on top.
This is the core durability argument for HDA. Your investment in HTML templates, HTTP handlers, and declarative interactivity attributes is protected by the strongest backward-compatibility commitment in software: the web platform’s refusal to break existing content.
No separate API layer
In HDA, the HTML response is the API. There is no JSON layer to design, version, document, or maintain.
A traditional SPA architecture requires two applications: a client-side app that renders UI, and a server-side API that produces JSON. These are developed, tested, deployed, and versioned as separate artefacts with a contract between them. When the contract changes, both sides must change in coordination.
HDA collapses this into one application. An Axum handler receives a request, queries the database, renders HTML with Maud, and returns it. The browser displays the HTML. There is one codebase, one deployment, one thing to reason about.
This has practical consequences:
- No API versioning. The server controls the HTML. If the data model changes, the server updates the template. There is no external consumer relying on a JSON schema.
- No serialisation code. No serde annotations on response types, no JSON schema validation on the client, no mapping between API responses and component props.
- No CORS configuration. The browser requests HTML from the same origin that served the page. Cross-origin issues do not arise.
- Faster feature delivery. Adding a field to a page means adding it to the query and the template. In an SPA, it means updating the API response, the TypeScript types, the state store, and the component that renders it.
The reduction in moving parts is not incremental. It is categorical. An entire class of bugs (schema mismatches, stale client caches, API versioning conflicts) cannot occur because the architecture does not have the layers where those bugs live.
When you do need a separate API
HDA does not mean you never write JSON endpoints. It means JSON is not the default, and HTML handles the majority of your application’s interface.
There are legitimate cases where a JSON API is the right tool:
- Third-party integrations. External services that call your application (payment webhooks, OAuth callbacks, partner integrations) communicate in JSON. These are not UI interactions; they are machine-to-machine interfaces.
- Mobile applications. If you ship a native mobile app alongside your web application, the mobile client needs a data API. HDA applies to the web interface; the mobile interface has different constraints.
- Public APIs. If your product offers an API as a feature (for customers to build integrations), that API will be JSON and needs the usual API design treatment: versioning, documentation, authentication, rate limiting.
- Islands of rich interactivity. Some UI components genuinely need client-side state: a drag-and-drop kanban board, a collaborative text editor, a real-time data visualisation. These components can fetch JSON from dedicated endpoints while the rest of the application uses HDA. This is the islands pattern, covered in When to Use HDA.
The principle is straightforward: use HTML for the interface, JSON for integrations. Most web applications are overwhelmingly interface. The JSON endpoints, when needed, are a small surface area alongside the HDA core, not a parallel architecture that doubles the codebase.
The Web Platform Has Caught Up
Between 2022 and 2026, the web platform crossed a capability threshold. Native CSS and HTML features now provide the functionality that historically justified adopting a CSS preprocessor, a utility framework, a CSS-in-JS library, or a JavaScript UI component system. No single feature is transformative. The cumulative effect is that the problems requiring these tools in 2020 can be solved with the platform itself in 2026.
This section catalogues what changed and why it matters for the architectural choice described in Why Hypermedia-Driven Architecture. The HDA model depends on the platform being capable enough that server-rendered HTML, plain CSS, and minimal JavaScript can deliver a production-quality experience. That dependency is now met.
The Interop Project
Cross-browser inconsistency was a primary driver of framework and preprocessor adoption. Developers reached for jQuery, Sass, Autoprefixer, and eventually React because writing to the platform directly meant writing to four different platforms with different bugs. The Interop Project has largely eliminated this rationale.
Interop is a joint initiative of Apple, Google, Igalia, Microsoft, and Mozilla, running annually since 2021 (initially as “Compat 2021”). Each year, the participants agree on a set of web platform features, write shared test suites via the Web Platform Tests project, and publicly track each browser engine’s pass rate. The Interop dashboard reports a single “interop score”: the percentage of tests that pass in all browsers simultaneously.
The scores tell the story:
| Year | Starting interop score | End-of-year (stable) | End-of-year (experimental) |
| Compat 2021 | 64-69% | >90% | – |
| Interop 2022 | ~49% | 83% | ~97% |
| Interop 2023 | ~48% | 75% | 89% |
| Interop 2024 | 46% | 95% | 99% |
| Interop 2025 | 29% | 97% | 99% |
The low starting scores each year reflect the selection of new focus areas, not regression. Each iteration targets harder, more recent features. That Interop 2025 started at 29% and finished at 97% in stable releases means the browser vendors are converging on new features within a single calendar year.
WebKit’s review of Interop 2025 described the result directly: “Every browser engine invested heavily, and the lines converge at the top. That convergence is what makes the Interop project so valuable, the shared progress that means you can write code once and trust that it works everywhere.”
Interop 2026 launched in February 2026 with 20 focus areas including cross-document view transitions, scroll-driven animation timelines, and continued anchor positioning alignment. The initiative is now in its fifth consecutive year with no signs of winding down.
The practical consequence: if you write CSS and HTML to the current specifications, it works in Chrome, Firefox, Safari, and Edge. The “works in my browser but not yours” problem that drove an entire generation of tooling adoption is, for the features that matter most, solved.
CSS features that replace frameworks
Five CSS features, all shipping between 2022 and 2026, collectively address the problems that justified Sass, Less, PostCSS, Tailwind, CSS-in-JS, and JavaScript positioning libraries.
Cascade Layers (@layer)
Cascade Layers provide explicit control over cascade priority, independent of selector specificity or source order. All major browsers shipped support within five weeks of each other in early 2022. @layer reached Baseline Widely Available in September 2024.
@layer reset, base, components, utilities;

@layer reset {
  * { margin: 0; box-sizing: border-box; }
}

@layer components {
  .card { padding: 1rem; border: 1px solid #ddd; }
}

@layer utilities {
  .hidden { display: none; }
}
Styles in later-declared layers always win over earlier layers, regardless of specificity. This replaces the specificity arms race that led to !important abuse, strict BEM naming conventions, and CSS-in-JS libraries whose primary value proposition was specificity isolation. Styles outside any @layer have the highest priority, which allows third-party CSS to be layered below application styles without modification.
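The unlayered-wins rule is worth seeing concretely. A two-rule sketch (selectors and colour values are illustrative):

```css
@layer components {
  /* Highest specificity on the page, but it lives in a layer... */
  button.btn.btn-primary { color: rebeccapurple; }
}

/* ...so this unlayered, low-specificity rule wins
   for <button class="btn btn-primary">. */
button { color: inherit; }
```

This is why third-party CSS can be imported into a layer and overridden by plain application styles, with no !important and no specificity counting.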
CSS Nesting
CSS Nesting reached Baseline Newly Available in December 2023, when Chrome 120 and Safari 17.2 shipped the relaxed syntax (Firefox 117 had shipped in August 2023).
.card {
  padding: 1rem;

  h2 {
    font-size: 1.25rem;
  }

  &:hover {
    box-shadow: 0 2px 8px rgb(0 0 0 / 0.1);
  }

  @media (width >= 768px) {
    padding: 2rem;
  }
}
This is the feature that eliminated the most common reason for using Sass or Less. The relaxed nesting syntax (no & required before element selectors) matches what preprocessor users expect. Media queries and other at-rules can nest directly inside selectors.
Container Queries
Container Queries reached Baseline Widely Available in August 2025. Firefox 110 was the last browser to ship, completing Baseline in February 2023.
.card-container {
  container-type: inline-size;
}

@container (inline-size > 400px) {
  .card {
    display: grid;
    grid-template-columns: 200px 1fr;
  }
}
Media queries respond to the viewport. Container queries respond to the size of the containing element. This makes components genuinely reusable: a card component that switches from stacked to horizontal layout based on its container width, not the window width. Previously, achieving this required JavaScript ResizeObserver workarounds or abandoning the idea entirely.
Size container queries are the Baseline part. Style container queries (@container style(...)) remain Chromium-only as of early 2026.
The :has() selector
:has() reached Baseline Newly Available in December 2023, when Firefox 121 shipped (Safari had led in March 2022, Chrome followed in August 2022).
.card:has(img) {
  grid-template-rows: 200px 1fr;
}

.form-group:has(:invalid) {
  border-color: var(--color-error);
}

section:has(> :only-child) {
  padding: 0;
}
:has() is the long-requested “parent selector,” though it is more general than that name implies. It selects an element based on its descendants, siblings, or any relational condition expressible as a selector. Before :has(), selecting a parent based on its children required JavaScript DOM traversal. Entire categories of conditional styling that needed classList.toggle() or framework-level reactivity can now be expressed in CSS alone.
@scope
@scope reached Baseline Newly Available in December 2025, when Firefox 146 shipped (Chrome 118 had led in October 2023, Safari 17.4 followed in March 2024).
@scope (.card) to (.card-footer) {
  p { margin-bottom: 0.5rem; }
  a { color: var(--card-link-color); }
}
@scope provides proximity-based style scoping with both an upper bound (the scope root) and an optional lower bound (the scope limit), creating a “donut scope” that prevents styles from leaking into nested sub-components. This addresses the problem that CSS Modules, BEM, and Shadow DOM each solved partially: keeping component styles from colliding. Unlike Shadow DOM, @scope does not create hard encapsulation boundaries, so styles remain inspectable and overridable when needed.
The cumulative effect
No single feature here replaces a framework. The replacement is structural.
In 2020, a developer building a component library needed: a preprocessor for nesting and variables (Sass), a naming convention or tooling for specificity management (BEM or CSS Modules), JavaScript for responsive component behaviour (ResizeObserver hacks), JavaScript for parent-based conditional styling (no :has()), and either strict discipline or a CSS-in-JS library to prevent style collisions.
In 2026, native CSS handles all of this. Nesting and custom properties replace the preprocessor. @layer replaces specificity management tooling. Container queries replace JavaScript resize detection. :has() replaces JavaScript conditional styling. @scope replaces CSS-in-JS scoping. The developer writes CSS, and it works across browsers.
HTML features that replace JavaScript UI primitives
The historical justification for React’s component model arose partly because HTML lacked native primitives for modals, tooltips, menus, and rich selects. Three of those gaps are now closed at Baseline. Two more are closing.
The <dialog> element
<dialog> reached Baseline Widely Available in approximately September 2024. Firefox 98 and Safari 15.4 completed cross-browser support in March 2022.
<dialog id="confirm-dialog">
  <h2>Delete this item?</h2>
  <p>This action cannot be undone.</p>
  <form method="dialog">
    <button value="cancel">Cancel</button>
    <button value="confirm">Delete</button>
  </form>
</dialog>
A modal <dialog> (opened via showModal()) provides focus trapping, top-layer rendering, backdrop styling via ::backdrop, the Escape key to close, and <form method="dialog"> for declarative close actions. These are the behaviours that every custom modal library (Bootstrap Modal, React Modal, a11y-dialog) reimplements in JavaScript. The native element provides them with correct accessibility semantics, including the dialog ARIA role and proper focus restoration on close, out of the box.
The Popover API
The Popover API reached Baseline Newly Available in January 2025 (Safari 18.3 resolved a light-dismiss bug on iOS that had delayed the designation).
<button popovertarget="menu">Options</button>

<div id="menu" popover>
  <a href="/settings">Settings</a>
  <a href="/profile">Profile</a>
  <a href="/logout">Log out</a>
</div>
The popover attribute gives any element top-layer rendering, light dismiss (click outside or press Escape to close), and automatic accessibility wiring. popover="auto" provides light dismiss; popover="manual" requires explicit close. This replaces Tippy.js, Bootstrap Popovers, and the custom JavaScript that every dropdown menu previously required.
The popover="hint" variant (for hover-triggered tooltips) is an Interop 2026 focus area and not yet Baseline.
Invoker Commands
Invoker Commands (command and commandfor attributes) reached Baseline Newly Available in early 2026, with Safari 26.2 completing cross-browser support after Chrome 135 (April 2025) and Firefox 144.
<button commandfor="my-dialog" command="show-modal">Open</button>

<dialog id="my-dialog">
  <p>Dialog content</p>
  <button commandfor="my-dialog" command="close">Close</button>
</dialog>
Invoker Commands connect a button to a target element declaratively: commandfor names the target, command specifies the action. Built-in commands include show-modal, close, and request-close for dialogs, and toggle-popover, show-popover, hide-popover for popovers. No JavaScript required for these interactions.
Combined with <dialog> and the Popover API, Invoker Commands eliminate the last bit of JavaScript glue that modals and popovers previously required. A dialog can be opened, populated, and closed entirely through HTML attributes and server-rendered content, which is exactly what HDA needs.
Gaps still closing
Two features listed in the outline remain Chromium-only as of February 2026:
Customizable <select> (appearance: base-select). Chrome 134+ and Edge 134+ ship full CSS styling of <select> elements, including custom option rendering via exposed pseudo-elements (::picker(select), selectedoption). Firefox and Safari are implementing but have not shipped to stable. This feature replaces React Select, Select2, and the entire category of custom dropdown libraries that exist because native <select> has been unstyled. The opt-in (appearance: base-select) means browsers without support simply show the default <select>, making it safe to adopt as progressive enhancement.
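A sketch of the opt-in, using the pseudo-element names the Chromium implementation exposes (the decorative property values are illustrative):

```css
/* Opt the control and its dropdown picker into base styling.
   Browsers without support ignore this and render the default select. */
select,
::picker(select) {
  appearance: base-select;
}

/* Once opted in, the picker is stylable like any other box. */
::picker(select) {
  border: 1px solid #ddd;
  border-radius: 0.5rem;
  box-shadow: 0 4px 12px rgb(0 0 0 / 0.15);
}
```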
Speculation Rules API. Chrome 121+ supports declarative prefetch and prerender rules via <script type="speculationrules">. WordPress and Shopify have deployed it at scale. Firefox’s standards position is positive for prefetch but neutral on prerender; Safari has published no position. Non-supporting browsers ignore the <script> block entirely, so it can be deployed today without harm. For HDA applications, speculation rules offer the multi-page navigation speed that SPA prefetching provides, without any client-side routing framework.
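A minimal ruleset sketch (the URL patterns are illustrative; browsers without support skip the script block entirely):

```html
<script type="speculationrules">
{
  "prefetch": [
    { "where": { "href_matches": "/*" }, "eagerness": "moderate" }
  ],
  "prerender": [
    { "where": { "href_matches": "/contacts/*" }, "eagerness": "moderate" }
  ]
}
</script>
```

With "eagerness": "moderate", the browser acts when the user signals intent (for example, hovering a link), which recovers most of the perceived speed of SPA prefetching with a few lines of declarative JSON.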
Both features work as progressive enhancement: they improve the experience in supporting browsers without breaking others.
Progressive enhancement as the architectural default
The features above share a property: they degrade gracefully. A <dialog> without JavaScript still renders its content. A popover without support becomes a static element. A <select> without appearance: base-select falls back to the native control. This is not accidental. The web platform is designed around progressive enhancement.
Native HTML elements carry built-in ARIA semantics, focus management, and keyboard handling. A <dialog> opened with showModal() traps focus, responds to Escape, announces itself to screen readers, and restores focus to the triggering element on close. A <button> with commandfor and command attributes communicates its relationship to the target element through the accessibility tree. These behaviours are defined by the specification and implemented by the browser.
SPA component libraries must reimplement all of this. A React modal component needs explicit focus-trap logic, an Escape key handler, ARIA attributes, a portal to render in the correct DOM position, and focus restoration on unmount. Libraries like Radix UI and Headless UI exist specifically because implementing accessible interactive components in React is difficult. The native elements provide the same behaviours correctly by default.
In HDA, progressive enhancement is the structural default. The baseline is server-rendered HTML with standard links and forms. htmx attributes enhance but are not required; a form with hx-post and hx-swap still submits normally via the browser’s native form handling if htmx fails to load. In SPA frameworks, progressive enhancement is opt-in and, under deadline pressure, frequently abandoned.
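A sketch of what that baseline looks like in markup (the paths and target IDs are illustrative):

```html
<!-- With htmx loaded: POST via fetch, swap the response into #contact-list.
     Without htmx: the browser submits the form normally via action/method. -->
<form action="/contacts" method="post"
      hx-post="/contacts"
      hx-target="#contact-list"
      hx-swap="beforeend">
  <input type="text" name="name" required>
  <button type="submit">Add contact</button>
</form>
```

The fallback costs nothing extra to write because it is the same form; the hx-* attributes enhance the element the browser already understands.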
No-build JavaScript
ES Modules (<script type="module">) have been supported in all major browsers since 2018 and are Baseline Widely Available. Import Maps reached Baseline Widely Available in approximately September 2025, with Safari 16.4 completing cross-browser support in March 2023.
Together, they enable npm-style bare specifier imports in the browser without npm, Node.js, or a bundler:
<script type="importmap">
  {
    "imports": {
      "htmx": "/static/js/htmx.min.js",
      "alpinejs": "/static/js/alpine.min.js"
    }
  }
</script>

<script type="module">
  import 'htmx';
</script>
Import maps resolve bare specifiers (import 'htmx') to URLs, the same job that webpack, Rollup, and esbuild perform during a build step. With import maps, the browser does this resolution at runtime. No bundler needed.
The trade-offs are real. There is no tree-shaking: unused code in imported modules ships to the client. No TypeScript compilation: types are stripped only if a build step runs. No code splitting: the browser loads entire modules rather than optimised chunks. For applications with large client-side dependency graphs, these costs matter.
For HDA applications, they do not. The client-side dependency count is typically small: htmx (14 KB gzipped), perhaps a date formatting library, perhaps a small charting library for a dashboard page. The total client-side JavaScript in an HDA application is measured in tens of kilobytes, not megabytes. HTTP/2 and HTTP/3 multiplexing further reduce the cost of serving a handful of small modules individually.
Some practitioners retain a build step for minification, but this is an optional optimisation, not an architectural requirement. The htmx project itself argues explicitly against build steps, distributing as a single file that can be included with a <script> tag. The no-build approach is not a compromise for HDA. It is the natural fit.
The supply chain security argument
The architectural choice to avoid npm is not only a simplicity argument. It is a security argument, grounded in the structural properties of the npm dependency graph and the empirical record of supply chain attacks against it.
The dependency graph problem
Zimmermann, Staicu, Tenny, and Pradel (Small World with High Risks: A Study of Security Threats in the npm Ecosystem, USENIX Security 2019) analysed npm’s dependency graph as of April 2018 and found small-world network properties: just 20 maintainer accounts could reach more than half of the entire ecosystem through transitive dependencies. Installing an average npm package implicitly trusts approximately 80 other packages and 39 maintainers. 391 highly influential maintainers each affected more than 10,000 packages.
A comparative study by Decan, Mens, and Grosjean (An Empirical Comparison of Dependency Network Evolution in Seven Software Packaging Ecosystems, Empirical Software Engineering, 2019) found npm had the highest transitive dependency counts among seven ecosystems. A more recent study by Biernat et al. (How Deep Does Your Dependency Tree Go?, December 2025) across ten ecosystems found that Maven now shows the highest mean amplification ratio (24.70x transitive-to-direct), with npm at 4.32x. npm is not the worst offender across all ecosystems, but it remains structurally exposed: 12% of npm projects exceed a 10x amplification ratio, and the absolute number of affected projects is enormous given npm’s scale.
The empirical record
The structural risk is not theoretical. Supply chain attacks against npm are recurring and escalating in sophistication.
event-stream (November 2018). A new maintainer, given publish access through social engineering, added a dependency on flatmap-stream containing encrypted malicious code targeting the Copay Bitcoin wallet. The package had approximately 2 million weekly downloads. The malicious code was live for over two months before a computer science student noticed it.
polyfill.io (June 2024). The polyfill.io CDN domain was acquired by a new owner in February 2024. Four months later, the CDN began serving modified JavaScript that redirected mobile users to scam sites. Over 380,000 websites were embedding scripts from the compromised domain. Andrew Betts, the original creator, had warned users when the sale occurred. Most did not act.
chalk/debug (September 2025). A phishing attack compromised the npm credentials of a maintainer of chalk, debug, and 16 other packages. The malicious versions contained code to hijack cryptocurrency transactions in browsers. The 18 affected packages accounted for over 2.6 billion combined weekly downloads. The malicious versions were live for approximately two hours.
These incidents share a structural cause: the npm ecosystem’s deep transitive dependency graphs mean that compromising a single package or maintainer account can reach thousands or millions of downstream projects. The risk scales with the number of dependencies.
The HDA alternative
An HDA application with vendored htmx eliminates this entire attack surface. htmx is 14 KB minified and gzipped, has zero dependencies, and is distributed as a single JavaScript file. There is no npm install step, no node_modules directory, no transitive dependency graph, and no exposure to registry-level supply chain attacks.
This is not an incremental improvement. A typical React application created with Vite installs approximately 270 packages, and projects using Create React App (now deprecated) routinely exceeded 1,500. Each package is a node in the dependency graph that the Zimmermann findings describe. Reducing that graph from hundreds of nodes to zero is a categorical change in supply chain risk profile.
The comparison is worth stating plainly. One architecture requires you to trust hundreds of packages, maintained by strangers, with update cadences you do not control, delivered through a registry that is a recurring target of supply chain attacks. The other architecture requires you to trust one 14 KB file that you can vendor, audit, and pin.
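Auditing and pinning that one file is mechanical. A sketch of the pinning half (paths are illustrative, and the placeholder content stands in for a real htmx release you have downloaded and reviewed):

```shell
# One-time: record a checksum for the vendored file.
mkdir -p static/js
printf '/* vendored htmx (placeholder) */' > static/js/htmx.min.js
sha256sum static/js/htmx.min.js > htmx.min.js.sha256

# Every build or CI run: fail loudly if the vendored file has changed.
sha256sum --check htmx.min.js.sha256
# → static/js/htmx.min.js: OK
```

Any modification to the vendored file, whether an intentional upgrade or a tampered artefact, fails the check until the checksum is deliberately regenerated.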
What this means for HDA
The web platform’s capability expansion between 2022 and 2026 is the material condition that makes hypermedia-driven architecture practical for production applications. The HDA model depends on three platform properties:
- CSS is sufficient for production UI. Nesting, container queries, cascade layers, :has(), and @scope collectively provide the capabilities that previously required a preprocessor, a utility framework, or CSS-in-JS.
- HTML provides interactive primitives. <dialog>, the Popover API, and Invoker Commands cover modals, tooltips, dropdowns, and declarative element interaction without JavaScript component libraries.
- The browser is a capable module system. ES Modules and Import Maps enable dependency management without a build tool, and the small dependency footprint of HDA applications makes the trade-offs (no tree-shaking, no code splitting) irrelevant.
The Interop Project ensures these features work consistently across browsers. The backward-compatibility guarantee described in the previous section ensures they will continue to work. And the elimination of the npm dependency graph provides a supply chain security posture that no framework-dependent architecture can match.
The web platform was not always adequate for building rich applications without frameworks. It is now.
SPA vs HDA: A Side-by-Side Comparison
The previous sections argued for hypermedia-driven architecture on structural grounds: coupling, migration cost, backward compatibility. This section puts code next to code. What does the same feature actually look like when built both ways, and what do published migrations tell us about the difference at scale?
What real migrations show
The strongest published data comes from Contexte, a SaaS product for media professionals built with React. In 2022, developer David Guillot presented the results of porting the application from React to Django templates with htmx:
| Metric | React | Django + htmx | Change |
| Total lines of code | 21,500 | 7,200 | −67% |
| JavaScript dependencies | 255 | 9 | −96% |
| Web build time | 40s | 5s | −88% |
| First load time-to-interactive | 2–6s | 1–2s | −50–60% |
| Memory usage | ~75 MB | ~45 MB | −46% |
The port took roughly two months to rewrite a codebase that had taken two years to build. The team eliminated the hard split between frontend and backend developers. User experience did not degrade.
Contexte is a media-oriented application, exactly the kind of content-driven, read-heavy workload that hypermedia was designed for. The htmx project acknowledges this: “These sorts of numbers would not be expected for every web application.” A separate Next.js to htmx port showed a 17% reduction in written application code and over 50% reduction in total shipped code when accounting for dependency weight.
The pattern across these migrations is consistent. The JSON serialisation layer disappears. Client-side state management disappears. The build toolchain disappears. The dependency graph collapses. What remains is server-side code that got somewhat larger (Contexte’s Python grew from 500 to 1,200 lines) and a total codebase that got dramatically smaller.
The same feature, two architectures
Consider a searchable contact list with inline editing and deletion. The specification is identical for both implementations:
- Display contacts from a database
- Live search with debounce (300ms)
- Click a row to get an editable form
- Delete with confirmation
- All changes persist to the server
This is a bread-and-butter CRUD feature. Most web applications are made of features like this one.
SPA: React + Vite + REST API
The SPA approach requires two applications. A React client handles rendering and state. A server exposes JSON endpoints. They communicate through a serialisation boundary.
Search with debounce needs a custom hook or a library:
import { useState, useEffect } from 'react';

function useDebounce(value, delay) {
  const [debounced, setDebounced] = useState(value);
  useEffect(() => {
    const timer = setTimeout(() => setDebounced(value), delay);
    return () => clearTimeout(timer);
  }, [value, delay]);
  return debounced;
}
function ContactList() {
  const [query, setQuery] = useState('');
  const [contacts, setContacts] = useState([]);
  const [editingId, setEditingId] = useState(null);
  const debouncedQuery = useDebounce(query, 300);

  useEffect(() => {
    fetch(`/api/contacts?q=${encodeURIComponent(debouncedQuery)}`)
      .then(res => res.json())
      .then(setContacts);
  }, [debouncedQuery]);

  // ...render the search input and, for each contact, either a display
  // row or an edit row depending on editingId (omitted for brevity)
}
The component manages three pieces of state: the search query, the contact list, and which row is being edited. Each state change triggers a re-render. The search query flows through a debounce hook, which triggers a fetch, which deserialises JSON, which updates state, which triggers another re-render. The edit mode is a client-side toggle: clicking a row sets editingId, and the component conditionally renders either a display row or a form row based on that state.
Inline editing requires the client to manage form state, submit JSON to the API, handle the response, and update the local contact list to reflect the change:
async function handleSave(contact) {
  const res = await fetch(`/api/contacts/${contact.id}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(contact),
  });
  const updated = await res.json();
  setContacts(prev =>
    prev.map(c => (c.id === updated.id ? updated : c))
  );
  setEditingId(null);
}
The server side mirrors this with JSON endpoints:
app.get('/api/contacts', async (req, res) => {
  const contacts = await db.query(
    'SELECT * FROM contacts WHERE name ILIKE $1',
    [`%${req.query.q}%`]
  );
  res.json(contacts);
});

app.put('/api/contacts/:id', async (req, res) => {
  const { name, email } = req.body;
  const updated = await db.query(
    'UPDATE contacts SET name = $1, email = $2 WHERE id = $3 RETURNING *',
    [name, email, req.params.id]
  );
  res.json(updated[0]);
});
Every interaction crosses the serialisation boundary twice: the server serialises to JSON, the client deserialises, processes the data, and re-renders. The CORS configuration, the Content-Type headers, the JSON.stringify and res.json() calls are all infrastructure that exists solely because the client and server are separate applications communicating through a data format that carries no hypermedia controls.
The project also needs a build toolchain. A fresh React + Vite project installs Node.js, npm, Vite (which bundles esbuild for development and Rollup for production), a JSX transformer (Babel or SWC), and ESLint. The node_modules directory contains hundreds of transitive packages. Each is a separate project with its own release cycle.
HDA: Rust/Axum/Maud + htmx
The HDA approach is one application. The server handles everything: routing, data access, rendering, and interactivity declarations.
Search with debounce is a single HTML attribute:
fn search_input(query: &str) -> Markup {
    html! {
        input type="text" name="q" value=(query)
            hx-get="/contacts"
            hx-trigger="input changed delay:300ms"
            hx-target="#contact-list"
            placeholder="Search contacts...";
    }
}
No hook. No state. No effect. The hx-trigger attribute declares the debounce behaviour inline. When the user types, htmx waits 300ms after the last keystroke, sends a GET request, and swaps the response into #contact-list. The server returns an HTML fragment containing the filtered rows.
The search handler queries the database and renders HTML directly:
async fn list_contacts(
    State(pool): State<PgPool>,
    Query(params): Query<SearchParams>,
) -> Markup {
    let contacts = sqlx::query_as!(
        Contact,
        "SELECT * FROM contacts WHERE name ILIKE '%' || $1 || '%'",
        params.q.unwrap_or_default()
    )
    .fetch_all(&pool)
    .await
    .unwrap(); // in production, map the error to a 500 response instead

    html! {
        tbody#contact-list {
            @for contact in &contacts {
                (contact_row(contact))
            }
        }
    }
}
There is no JSON serialisation. The handler returns Markup, which Axum sends as an HTML response. The database result flows directly into the template. The query is checked at compile time by SQLx.
Inline editing is a template swap, not a state toggle. Clicking the edit button asks the server for an edit form:
fn contact_row(contact: &Contact) -> Markup {
    html! {
        tr {
            td { (contact.name) }
            td { (contact.email) }
            td {
                button hx-get={"/contacts/" (contact.id) "/edit"}
                    hx-target="closest tr"
                    hx-swap="outerHTML" { "Edit" }
            }
        }
    }
}

fn contact_edit_row(contact: &Contact) -> Markup {
    html! {
        tr {
            td {
                input type="text" name="name" value=(contact.name);
            }
            td {
                input type="text" name="email" value=(contact.email);
            }
            td {
                button hx-put={"/contacts/" (contact.id)}
                    hx-target="closest tr"
                    hx-swap="outerHTML"
                    hx-include="closest tr" { "Save" }
            }
        }
    }
}
The edit handler returns contact_edit_row, which replaces the display row. The save handler updates the database and returns contact_row, which replaces the edit form. No client-side state tracks which row is being edited. The server controls the UI by returning the appropriate HTML fragment.
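The save handler itself is not shown above. A sketch of what it might look like, under the same assumptions as the earlier handlers (the ContactForm extractor type, the id type, and the error handling are illustrative; this relies on the application's axum/sqlx/maud context and is not a standalone program):

```rust
// Sketch only: assumes the Contact type, a hypothetical ContactForm
// { name, email } struct, and the imports used elsewhere in the app.
async fn update_contact(
    State(pool): State<PgPool>,
    Path(id): Path<i64>,
    Form(form): Form<ContactForm>,
) -> Markup {
    let contact = sqlx::query_as!(
        Contact,
        "UPDATE contacts SET name = $1, email = $2 WHERE id = $3 RETURNING *",
        form.name,
        form.email,
        id
    )
    .fetch_one(&pool)
    .await
    .unwrap(); // illustrative; map to an error response in practice

    // Returning the display row is what swaps the edit form back out,
    // via hx-target="closest tr" / hx-swap="outerHTML" on the Save button.
    contact_row(&contact)
}
```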
The entire client-side dependency is htmx: a single 14 KB file (minified and gzipped) with zero dependencies. No build step. No node_modules. No package manager. Vendor the file and serve it from your Rust application.
Key observations
The comparison reveals differences that are structural, not incremental.
The JSON serialisation layer is eliminated entirely. In the SPA, every interaction crosses a serialisation boundary: JSON.stringify in the client's request, res.json() in the Express handler, then res.json() and setContacts() back in the component. In HDA, the handler returns HTML. The serialisation layer does not exist because the architecture does not need it.
Client-side state management disappears. The React component manages query, contacts, and editingId as state. Changes to any of these trigger re-renders. The htmx version has no client-side state at all. The server is the single source of truth, and every user action asks the server what to show next.
The dependency asymmetry is categorical. One side installs hundreds of packages through a package manager, maintained by hundreds of independent maintainers, each a potential supply chain risk. The other vendors a single file. The React runtime alone (~55 KB gzipped for React 19 + ReactDOM) is roughly four times the size of htmx (~14 KB gzipped), and that comparison ignores the entire build toolchain and its transitive dependencies.
The build toolchain is a complexity tax. The SPA needs Node.js, npm, Vite, esbuild, Rollup, and a JSX transformer to convert source files into something a browser can execute. The HDA serves HTML from a compiled Rust binary. The browser needs no build artefact because the server already produced what the browser understands natively: HTML.
What the SPA provides that HDA does not
The comparison above is favourable to HDA because this is a CRUD feature, and CRUD features are what HDA handles best. The SPA architecture has genuine strengths that should not be dismissed as irrelevant.
Component-level encapsulation with typed props. React components accept typed props and manage their own state in a well-defined scope. This composability model is genuinely powerful for building complex UIs. A component can be tested in isolation, rendered in a storybook, and reused across pages with different data. Maud functions provide similar composition, but the pattern is less formalised and has no equivalent to React’s developer tooling for component inspection.
React DevTools and the debugging experience. React DevTools lets you inspect the component tree, view props and state, trace re-renders, and profile performance. The htmx debugging experience is the browser’s network tab and the DOM inspector. For complex UIs, React’s tooling gives developers significantly better visibility into what the application is doing and why.
Client-side rendering avoids some server round-trips. When edit mode is a client-side state toggle, the UI updates instantly. No network request is needed to show a form. In HDA, clicking “Edit” sends a request to the server and waits for the response. On a fast connection, this difference is imperceptible. On a slow connection or for highly interactive interfaces, it matters.
The component library ecosystem is unmatched. Libraries like shadcn/ui and Radix provide production-quality, accessible UI primitives: dialogs, dropdowns, date pickers, data tables, command palettes. These components handle keyboard navigation, screen reader announcements, focus trapping, and edge cases that take significant effort to implement correctly. The HDA ecosystem has no equivalent at comparable maturity. If your application needs a complex, accessible data table with column sorting, filtering, pagination, and row selection, a React component library gives you that out of the box.
TypeScript provides end-to-end type checking. TypeScript catches errors across the entire client-side codebase: props, state, API response shapes, event handlers. In the SPA model, a type error in a component is caught before the code runs. Rust provides this same safety on the server side (and Maud catches malformed HTML at compile time), but the client-side interactivity in HDA is untyped HTML attributes. A typo in hx-target is a runtime error, not a compile-time error.
Hiring and ecosystem momentum. React dominates job postings and developer mindshare. Finding developers who know React is straightforward. Finding developers who know Rust, Axum, Maud, and htmx is harder. This is not a technical argument, but it is a practical one that affects team building and hiring timelines.
For most CRUD and content-driven features, these trade-offs favour HDA. The component ecosystem advantage matters most when building interfaces that require complex, accessible widgets. The typing advantage is real but narrower than it appears, because the majority of interactivity in an HDA is handled by a small set of well-tested htmx attributes rather than arbitrary JavaScript. The hiring argument is genuine and may be the strongest practical objection for many teams.
Rust-specific advantages
The contact list comparison used generic server code for the SPA side. The HDA side is Rust, and Rust brings specific advantages beyond the architectural ones.
Maud checks HTML at compile time. Most server-side template engines (Jinja2, ERB, Handlebars) parse templates at runtime. A typo in a variable name, a missing closing tag, or a type mismatch surfaces as a runtime error, sometimes only when that specific template path is hit in production. Maud’s html! macro is evaluated during compilation. If the template contains a syntax error or references a variable that does not exist, the code does not compile. This is a meaningful safety guarantee that most server-side frameworks cannot offer.
SQLx checks queries at compile time. The sqlx::query_as! macro verifies SQL against a live database during compilation. If a column name is wrong, a type does not match, or a table does not exist, the compiler catches it. Combined with Maud’s compile-time HTML checking, the Rust HDA stack catches errors at two boundaries (database-to-code and code-to-HTML) where most stacks only discover problems at runtime.
The combination delivers type safety comparable to TypeScript + React, but without the client-side dependency graph. TypeScript checks component props and state. Rust + SQLx + Maud checks database queries, handler types, and HTML output. Both approaches catch a broad category of errors before the code runs. The difference is that the Rust approach achieves this with a single compiled binary, while the TypeScript approach requires a build toolchain, a runtime, and hundreds of dependencies to deliver the same guarantee.
When to Use HDA (and When Not To)
HDA is not a universal prescription. It is an architecture that fits a specific, large class of web applications extremely well and fits others poorly. This section draws the boundary and describes how to handle the cases that fall on either side of it.
Where HDA excels
HDA is the natural architecture for any application where the primary interaction is reading, writing, and navigating server-managed data. This covers:
Content-heavy sites. Media publications, documentation platforms, blogs, knowledge bases, wikis. The content lives on the server. The user reads it. The server renders HTML. There is nothing to manage on the client. These applications gain nothing from a client-side framework and pay a real cost in complexity if they adopt one.
CRUD applications. Admin panels, CRM systems, ERP interfaces, internal tools, project management dashboards. The interaction pattern is: list records, view a record, edit fields, save. Every step is a request-response cycle that maps directly onto HTTP. htmx’s partial page replacement handles the dynamic parts (inline editing, live search, filtered lists) without requiring client-side state.
Form-heavy workflows. Onboarding sequences, multi-step applications, surveys, checkout flows, approval processes. Forms are native HTML. Validation can happen both in the browser (HTML5 attributes) and on the server. The Post/Redirect/Get pattern handles submission cleanly. Adding htmx provides progressive enhancement: inline validation, step transitions without full page reloads, conditional form sections that load from the server based on prior answers.
E-commerce. Catalogue browsing, product search, filtering, cart management, checkout. These are read-heavy with occasional writes. The product page is server-rendered content. The cart is server-managed state. Search is a server query. The few interactive elements (add to cart, quantity adjustment) are simple HTTP requests that return HTML fragments. Shopify, the largest e-commerce platform, serves server-rendered pages.
Dashboards with periodic data updates. Reporting interfaces, analytics dashboards, monitoring views. If the data refreshes on a cadence measured in seconds or minutes (not milliseconds), server-sent events or periodic htmx polling deliver updates without client-side state management. A dashboard that refreshes every 30 seconds does not need React.
The common thread: the server owns the data, the user interacts through standard HTTP patterns (links, forms, requests), and the UI is a representation of server state rather than an independent application with its own state model.
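The Post/Redirect/Get pattern described above maps onto a single handler. A minimal Axum sketch, where the route, form fields, and persistence step are all illustrative:

```rust
use axum::{response::Redirect, Form};
use serde::Deserialize;

// Hypothetical signup form; field names are illustrative.
#[derive(Deserialize)]
struct SignupForm {
    email: String,
}

// Post/Redirect/Get: the POST handler performs the write, then redirects.
async fn create_signup(Form(form): Form<SignupForm>) -> Redirect {
    // ... validate and persist form.email here ...
    let _ = form.email;
    // Redirect::to issues a 303 See Other, so refreshing the next page
    // re-issues a harmless GET instead of resubmitting the POST.
    Redirect::to("/signups/complete")
}
```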
Where SPAs are genuinely superior
Some applications have interaction models that fundamentally require client-side state. For these, HDA is the wrong tool.
Real-time collaborative editing. Google Docs, Figma, and Linear all maintain local copies of document state on the client. Edits apply optimistically, synchronise with the server via WebSockets, and reconcile conflicts using operational transformation or CRDTs. Figma’s multiplayer system gives each document its own server process and maintains persistent WebSocket connections for every collaborating client. This is architecturally incompatible with request-response HTML. The client must own state because it must apply edits instantly and resolve conflicts locally before the server confirms them.
Offline-first applications. Applications that must function without a network connection need a complete client-side data model, a sync engine, and a conflict resolution strategy. Service workers and IndexedDB provide the storage. CRDTs or similar structures handle the merge logic. The server is not available to render HTML when the user is on an aeroplane, so the client must be a self-sufficient application.
Continuous manipulation interfaces. Drawing tools (Figma, Excalidraw), music production software, video editors, spreadsheets with real-time formula recalculation. These require sub-16ms frame rendering for smooth interaction. A server round-trip is physically incompatible with the latency budget. Many of these applications bypass the DOM entirely, rendering to <canvas> or WebGL because even DOM manipulation is too slow for their needs. Google Docs moved to canvas-based rendering to sidestep DOM performance constraints. Quadratic, an open-source spreadsheet, chose WebGL over HTML because the DOM cannot handle millions of cells.
Extreme latency sensitivity. In-browser IDEs need sub-50ms keystroke-to-render times. Trading dashboards require sub-second updates with client-side filtering across large datasets. Audio applications measure latency in single-digit milliseconds. Any architecture that routes through the server for UI updates cannot meet these constraints.
The common thread across all four: the client must own state because the interaction model is physically incompatible with server round-trips. This is not a preference or a trade-off. It is a hard constraint imposed by latency, connectivity, or rendering performance.
Steelmanning client-side frameworks
The SPA vs HDA comparison covered the technical strengths of client-side frameworks in detail: component encapsulation, developer tooling, TypeScript type checking, and the component library ecosystem. Those arguments are real and worth reading.
Beyond the technical merits, there are organisational strengths that matter for team decisions:
Hiring and ecosystem momentum. React appears in roughly 45% of developer survey responses. Job postings that require React are abundant. Job postings that require htmx are nearly nonexistent. Adopting HDA means training developers rather than hiring specialists. The htmx learning curve is shallow (it is a small library over standard HTML), but the absence of a recognised hiring category creates friction for teams accustomed to recruiting by framework name.
Established patterns for complex UIs. The React ecosystem has converged on well-documented patterns for routing, data fetching, state management, and component composition. A developer joining a React project finds familiar structure. The HDA ecosystem has fewer established conventions, and the patterns vary more between projects. This is improving (htmx’s own documentation is thorough, and this guide exists for the Rust stack), but it is not yet at parity.
The component library gap. This is worth repeating because it is the most concrete practical difference. Libraries like shadcn/ui and Radix provide accessible, production-quality date pickers, command palettes, data tables, comboboxes, and dropdown menus with keyboard navigation, focus trapping, and screen reader support built in. The HDA ecosystem has nothing at comparable maturity. Building an accessible combobox from scratch is significant work. If your application needs several such components, the React ecosystem delivers them faster today.
These are genuine advantages, not strawmen. For many teams, the hiring argument alone outweighs the architectural benefits of HDA. The right response is not to dismiss these concerns but to weigh them honestly against the structural costs documented in the preceding sections.
The islands pattern
Most applications are not purely one thing. A CRUD application might need a rich text editor on one page. A dashboard might need a real-time chart alongside otherwise static report content. A form workflow might need an interactive date range picker.
The answer is not to adopt an SPA framework for the entire application because one page needs a complex widget. The answer is islands: HDA as the default architecture, with isolated client-side components for the specific interactions that require them.
The concept is straightforward. The server renders the page as HTML. Most of the page is standard hypermedia, driven by htmx. One region of the page mounts a standalone JavaScript component, a chart library, a rich text editor, a custom date picker, whatever the specific interaction demands. That component owns its own state and manages its own rendering within its DOM region. The rest of the page is unaware of it.
Events are the integration mechanism. The island communicates with the surrounding hypermedia through DOM events. When the rich text editor saves, it dispatches a custom event. An htmx attribute on a nearby element listens for that event and triggers a server request. When the server needs to update the island, it can return an HTML fragment containing updated data- attributes or a <script> tag that the island picks up. The boundary between hypermedia and non-hypermedia is clean: HTML and HTTP on one side, JavaScript and local state on the other, with events bridging the gap.
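A Maud sketch of that boundary. The endpoint, event name, and script path are hypothetical; the htmx attributes (`hx-get`, `hx-trigger` with a `from:` modifier) are standard:

```rust
use maud::{html, Markup};

// Sketch of a page hosting one island alongside hypermedia.
fn document_page() -> Markup {
    html! {
        // Hypermedia side: refetch the status fragment whenever the island
        // dispatches its custom "editor-saved" event anywhere on the body.
        div #save-status
            hx-get="/documents/42/status"
            hx-trigger="editor-saved from:body" { "Draft" }
        // Island side: a client-side editor mounts here, seeded by
        // server-rendered data- attributes, and owns its own state.
        div #editor data-document-id="42" {}
        script src="/assets/editor.js" {}
    }
}
```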
This is not a compromise. It is the architecturally correct approach: matching the interaction style to the interaction requirements. Using htmx for a contact list is correct. Using a JavaScript charting library for a real-time visualisation is also correct. Using React for both, or htmx for both, optimises for consistency at the expense of fitness.
The practical implication is that an HDA project should have a clear policy for when an island is warranted. A reasonable threshold: if a component requires persistent client-side state that cannot be modelled as a series of server requests, it is an island. If it can be expressed as “user acts, server responds with HTML,” it is hypermedia. Most features in most applications fall into the second category.
The web’s actual composition
SPA frameworks dominate developer discourse, conference talks, blog posts, job listings, and tutorial ecosystems. This creates a perception that SPAs are the standard way to build for the web.
The data tells a different story. According to W3Techs, React is used on roughly 6% of all websites. Angular and Vue each account for 1-2%. Even granting that some sites use client-side frameworks not captured by these measurements, and that some React sites are server-rendered via Next.js, the total share of websites running as true single-page applications is well under 10%.
The remaining 90%+ is WordPress (43% of all websites alone), other CMS platforms (Shopify, Squarespace, Wix, Drupal, Joomla), static sites, and traditional server-rendered applications. The web is overwhelmingly server-rendered, content-oriented, and CRUD-driven.
This matters because the architecture you choose should match the architecture your application actually needs, not the architecture that dominates Hacker News. If you are building a collaborative design tool, use a client-side framework. If you are building a content site, an admin panel, a form workflow, a dashboard, or an e-commerce platform, you are building the kind of application that constitutes the vast majority of the web. HDA is the architecture designed for that majority.
Core Stack
Web Server with Axum
Axum is the HTTP framework for this stack. Built on Tower and Hyper, it provides type-safe request handling through extractors and uses the same Tower middleware that the rest of the Rust async ecosystem uses.
This section covers routing, handlers, extractors, shared state, middleware, static assets, and graceful shutdown. A complete runnable server is assembled at the end.
A minimal server
Add Axum and Tokio to your Cargo.toml:
[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
A server that responds to GET /:
use axum::{Router, routing::get};
#[tokio::main]
async fn main() {
let app = Router::new()
.route("/", get(|| async { "hello" }));
let listener = tokio::net::TcpListener::bind("0.0.0.0:3000")
.await
.unwrap();
axum::serve(listener, app).await.unwrap();
}
axum::serve binds the router to a TCP listener. There is no separate Server type.
Handlers
A handler is an async function that receives zero or more extractors and returns something that implements IntoResponse:
use axum::response::Html;
async fn index() -> Html<&'static str> {
Html("<h1>Home</h1>")
}
Axum provides IntoResponse implementations for common types: String, &str, StatusCode, Html<T>, Json<T>, and tuples that combine a status code with a body.
use axum::{http::StatusCode, response::IntoResponse};
async fn not_found() -> impl IntoResponse {
(StatusCode::NOT_FOUND, Html("<h1>404</h1>"))
}
In a hypermedia-driven application, most handlers return Html. The JSON response types exist but are rarely the primary format.
Debugging handler signatures
Enable the macros feature and annotate handlers with #[debug_handler] during development. It produces clearer compiler errors when an extractor or return type is wrong:
axum = { version = "0.8", features = ["macros"] }
use axum::debug_handler;
#[debug_handler]
async fn index() -> Html<&'static str> {
Html("<h1>Home</h1>")
}
Remove #[debug_handler] before release builds. The macro exists only to improve compiler diagnostics; it slows compilation and provides no runtime benefit.

Extractors
Extractors pull data out of the incoming request. Axum calls FromRequestParts (for headers, path parameters, query strings) or FromRequest (for the body) on each handler argument. A body-consuming extractor must be the last argument.
Common extractors:
| Extractor | Source | Example |
| Path<T> | URL path parameters | Path(id): Path<u64> |
| Query<T> | Query string | Query(params): Query<SearchParams> |
| Form<T> | URL-encoded body | Form(data): Form<LoginForm> |
| State<T> | Shared application state | State(state): State<AppState> |
| HeaderMap | Request headers | headers: HeaderMap |
use axum::extract::{Path, Query, State};
use axum::response::Html;
use serde::Deserialize;
#[derive(Deserialize)]
struct SearchParams {
q: Option<String>,
page: Option<u32>,
}
async fn search(
State(state): State<AppState>,
Query(params): Query<SearchParams>,
) -> Html<String> {
Html(format!("<p>Searching for {:?}</p>", params.q))
}
Path parameters use curly-brace syntax in route definitions. This changed in Axum 0.8; the older colon syntax (:id) no longer works:
app.route("/users/{id}", get(show_user));
async fn show_user(Path(id): Path<u64>) -> impl IntoResponse {
Html(format!("<h1>User {id}</h1>"))
}
Application state
Shared state is how handlers access the database pool, configuration, and other application-wide resources. Define a struct, derive Clone, and pass it to the router with with_state:
use sqlx::PgPool;
#[derive(Clone)]
struct AppState {
db: PgPool,
config: AppConfig,
}
#[derive(Clone)]
struct AppConfig {
app_name: String,
base_url: String,
}
Wire the state into the router:
let state = AppState {
db: PgPool::connect(&database_url).await.unwrap(),
config: AppConfig {
app_name: "My App".into(),
base_url: "http://localhost:3000".into(),
},
};
let app = Router::new()
.route("/", get(index))
.with_state(state);
Handlers extract it with State<AppState>:
async fn index(State(state): State<AppState>) -> Html<String> {
Html(format!("<h1>{}</h1>", state.config.app_name))
}
Router<S> means the router is missing state of type S. Calling .with_state(state) produces Router<()>, meaning all state has been provided. Only Router<()> can be passed to axum::serve.
PgPool is internally reference-counted, so cloning AppState is cheap. For fields that need interior mutability (counters, caches), wrap them in Arc<RwLock<T>>.
Route organisation with nest
Router::nest mounts a sub-router under a path prefix. Use this to organise routes by feature or domain area:
fn user_routes() -> Router<AppState> {
Router::new()
.route("/", get(list_users).post(create_user))
.route("/{id}", get(show_user))
.route("/{id}/edit", get(edit_user_form).post(update_user))
}
fn admin_routes() -> Router<AppState> {
Router::new()
.route("/", get(admin_dashboard))
.route("/users", get(admin_users))
}
let app = Router::new()
.route("/", get(index))
.nest("/users", user_routes())
.nest("/admin", admin_routes())
.with_state(state);
Requests to /users/42 reach show_user with the path /42. The prefix is stripped before the nested router sees the request. If a handler needs the full original URI, extract OriginalUri from axum::extract.
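A sketch of recovering the full path with OriginalUri (the handler and markup are illustrative):

```rust
use axum::extract::OriginalUri;
use axum::response::Html;

// Inside a router nested at "/users", the stripped path for /users/42
// is "/42", but OriginalUri still carries the URI the client sent.
async fn show_user_path(OriginalUri(uri): OriginalUri) -> Html<String> {
    // For a request to /users/42, uri.path() is "/users/42".
    Html(format!("<p>Requested path: {}</p>", uri.path()))
}
```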
In a workspace with multiple crates, define route functions in each crate and assemble them in the binary crate:
use users::user_routes;
use admin::admin_routes;
let app = Router::new()
.nest("/users", user_routes())
.nest("/admin", admin_routes())
.with_state(state);
All nested routers must share the same state type. If a sub-router has its own state, call .with_state() on it before nesting:
let inner = Router::new()
.route("/bar", get(inner_handler))
.with_state(InnerState {});
let app = Router::new()
.nest("/foo", inner)
.with_state(OuterState {});
Middleware
Axum uses Tower layers for middleware. The tower-http crate provides HTTP-specific layers that cover most common needs.
[dependencies]
tower = "0.5"
tower-http = { version = "0.6", features = ["trace", "compression-gzip"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
Request tracing
TraceLayer logs every request and response, integrating with the tracing crate:
use tower_http::trace::TraceLayer;
use tracing_subscriber::EnvFilter;
tracing_subscriber::fmt()
.with_env_filter(EnvFilter::from_default_env())
.init();
let app = Router::new()
.route("/", get(index))
.layer(TraceLayer::new_for_http());
Control log levels with the RUST_LOG environment variable: RUST_LOG=info for production, RUST_LOG=tower_http=trace during development.
Response compression
CompressionLayer compresses response bodies. Enable additional algorithms by adding features like compression-br or compression-zstd:
use tower_http::compression::CompressionLayer;
let app = Router::new()
.route("/", get(index))
.layer(CompressionLayer::new())
.layer(TraceLayer::new_for_http());
Combining layers
Apply multiple layers with ServiceBuilder. Layers are listed top-to-bottom, and the first layer listed is the outermost (runs first on the request, last on the response):
use tower::ServiceBuilder;
let app = Router::new()
.route("/", get(index))
.layer(
ServiceBuilder::new()
.layer(TraceLayer::new_for_http())
.layer(CompressionLayer::new())
)
.with_state(state);
Here, tracing wraps compression: requests are logged before responses are compressed.
Sessions and CSRF
Session management (tower-sessions) and CSRF protection follow the same .layer() pattern. They are covered in the Authentication section.
Custom middleware
For application-specific middleware, use axum::middleware::from_fn. Write a plain async function that receives the request and a Next handle:
use axum::{
middleware::{self, Next},
extract::Request,
response::Response,
http::StatusCode,
};
async fn require_auth(
State(state): State<AppState>,
request: Request,
next: Next,
) -> Result<Response, StatusCode> {
Ok(next.run(request).await)
}
let app = Router::new()
.route("/dashboard", get(dashboard))
.route_layer(middleware::from_fn_with_state(
state.clone(),
require_auth,
))
.with_state(state);
.route_layer() applies middleware only to matched routes. Unmatched requests fall through to the fallback without hitting this middleware. .layer() applies to all requests, including fallbacks.
Serving static assets
An HDA application typically serves a small set of CSS and JavaScript files. The rust-embed crate embeds an entire directory into the binary at compile time, producing a single self-contained executable.
[dependencies]
rust-embed = "8"
mime_guess = "2"
Define an embedded asset struct pointing at your assets directory:
use rust_embed::RustEmbed;
#[derive(RustEmbed)]
#[folder = "assets/"]
struct Assets;
Write a handler that serves embedded files:
use axum::{
extract::Path,
http::{header, StatusCode},
response::IntoResponse,
};
async fn static_handler(Path(path): Path<String>) -> impl IntoResponse {
match Assets::get(&path) {
Some(file) => {
let mime = mime_guess::from_path(&path).first_or_octet_stream();
(
[(header::CONTENT_TYPE, mime.as_ref())],
file.data.to_vec(),
)
.into_response()
}
None => StatusCode::NOT_FOUND.into_response(),
}
}
Mount it on the router:
let app = Router::new()
.route("/", get(index))
.route("/assets/{*path}", get(static_handler));
In debug builds, rust-embed reads files from disk, so changes to CSS and JavaScript appear without recompilation. In release builds, everything is baked into the binary.
If your project grows to include many large assets (images, fonts), consider tower-http’s ServeDir to serve from the filesystem instead, or move large files to object storage.
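A sketch of the ServeDir alternative (requires the tower-http "fs" feature; the directory name is illustrative):

```rust
use axum::Router;
use tower_http::services::ServeDir;

// Serve assets from the filesystem instead of embedding them.
fn asset_routes() -> Router {
    Router::new().nest_service("/assets", ServeDir::new("assets"))
}
```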
Graceful shutdown
axum::serve accepts a shutdown signal via .with_graceful_shutdown(). When the signal fires, the server stops accepting new connections and waits for in-flight requests to complete.
use tokio::signal;
async fn shutdown_signal() {
let ctrl_c = async {
signal::ctrl_c()
.await
.expect("failed to install Ctrl+C handler");
};
#[cfg(unix)]
let terminate = async {
signal::unix::signal(signal::unix::SignalKind::terminate())
.expect("failed to install SIGTERM handler")
.recv()
.await;
};
#[cfg(not(unix))]
let terminate = std::future::pending::<()>();
tokio::select! {
_ = ctrl_c => {},
_ = terminate => {},
}
}
Pass the signal to the server:
axum::serve(listener, app)
.with_graceful_shutdown(shutdown_signal())
.await
.unwrap();
This handles both Ctrl+C (SIGINT) and SIGTERM, which is what Docker and most process managers send when stopping a container. In production, consider adding a TimeoutLayer from tower-http so that slow in-flight requests cannot block shutdown indefinitely.
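A sketch of that timeout (requires the tower-http "timeout" feature; the 10-second budget is illustrative):

```rust
use std::time::Duration;
use axum::{routing::get, Router};
use tower_http::timeout::TimeoutLayer;

// Cap every request at 10 seconds so a slow in-flight request
// cannot hold graceful shutdown open indefinitely.
fn app() -> Router {
    Router::new()
        .route("/", get(|| async { "hello" }))
        .layer(TimeoutLayer::new(Duration::from_secs(10)))
}
```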
Putting it together
A complete main.rs combining routing, state, middleware, static assets, and graceful shutdown:
use axum::{
extract::{Path, State},
http::{header, StatusCode},
response::{Html, IntoResponse},
routing::get,
Router,
};
use rust_embed::RustEmbed;
use sqlx::PgPool;
use tokio::signal;
use tower::ServiceBuilder;
use tower_http::{compression::CompressionLayer, trace::TraceLayer};
use tracing_subscriber::EnvFilter;
#[derive(Clone)]
struct AppState {
db: PgPool,
config: AppConfig,
}
#[derive(Clone)]
struct AppConfig {
app_name: String,
}
#[derive(RustEmbed)]
#[folder = "assets/"]
struct Assets;
async fn static_handler(Path(path): Path<String>) -> impl IntoResponse {
match Assets::get(&path) {
Some(file) => {
let mime = mime_guess::from_path(&path).first_or_octet_stream();
([(header::CONTENT_TYPE, mime.as_ref())], file.data.to_vec())
.into_response()
}
None => StatusCode::NOT_FOUND.into_response(),
}
}
async fn index(State(state): State<AppState>) -> Html<String> {
Html(format!(
r#"<html>
<head><link rel="stylesheet" href="/assets/style.css"></head>
<body><h1>{}</h1></body>
</html>"#,
state.config.app_name
))
}
fn user_routes() -> Router<AppState> {
Router::new()
.route("/", get(list_users))
.route("/{id}", get(show_user))
}
async fn list_users() -> Html<&'static str> {
Html("<h1>Users</h1>")
}
async fn show_user(Path(id): Path<u64>) -> Html<String> {
Html(format!("<h1>User {id}</h1>"))
}
async fn shutdown_signal() {
let ctrl_c = async {
signal::ctrl_c()
.await
.expect("failed to install Ctrl+C handler");
};
#[cfg(unix)]
let terminate = async {
signal::unix::signal(signal::unix::SignalKind::terminate())
.expect("failed to install SIGTERM handler")
.recv()
.await;
};
#[cfg(not(unix))]
let terminate = std::future::pending::<()>();
tokio::select! {
_ = ctrl_c => {},
_ = terminate => {},
}
}
#[tokio::main]
async fn main() {
tracing_subscriber::fmt()
.with_env_filter(EnvFilter::from_default_env())
.init();
let database_url =
std::env::var("DATABASE_URL").expect("DATABASE_URL must be set");
let state = AppState {
db: PgPool::connect(&database_url).await.unwrap(),
config: AppConfig {
app_name: "My App".into(),
},
};
let app = Router::new()
.route("/", get(index))
.nest("/users", user_routes())
.route("/assets/{*path}", get(static_handler))
.layer(
ServiceBuilder::new()
.layer(TraceLayer::new_for_http())
.layer(CompressionLayer::new()),
)
.with_state(state);
let listener = tokio::net::TcpListener::bind("0.0.0.0:3000")
.await
.unwrap();
tracing::info!("listening on {}", listener.local_addr().unwrap());
axum::serve(listener, app)
.with_graceful_shutdown(shutdown_signal())
.await
.unwrap();
}
The corresponding dependencies:
[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
tower = "0.5"
tower-http = { version = "0.6", features = ["trace", "compression-gzip"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
sqlx = { version = "0.8", features = ["runtime-tokio", "postgres"] }
rust-embed = "8"
mime_guess = "2"
serde = { version = "1", features = ["derive"] }
HTML Templating with Maud
Maud is a compile-time HTML templating library for Rust. Its html! macro checks your markup at compile time and expands it to efficient string-building code, so there is no runtime template parsing, no template files to deploy, and no possibility of a missing closing tag appearing in production.
The Web Server with Axum section used Html<String> with format! for responses. That works for trivial cases, but it gives you no structure, no escaping, and no compile-time checking. Maud replaces it entirely. Handlers return Markup instead of Html<String>, and the compiler catches template errors before the server starts.
Setup
Add Maud to your Cargo.toml with the axum feature:
[dependencies]
maud = { version = "0.27", features = ["axum"] }
The axum feature implements IntoResponse for Maud’s Markup type, so handlers can return markup directly. It targets axum-core 0.5, which corresponds to Axum 0.8.
The html! macro
The html! macro is the core of Maud. It takes a custom syntax that resembles HTML but follows Rust conventions, and returns a Markup value:
use maud::{html, Markup};
let greeting = "world";
let page: Markup = html! {
h1 { "Hello, " (greeting) "!" }
};
Elements
Elements with content use curly braces. Void elements (those that cannot have children) use a semicolon:
html! {
h1 { "Page title" }
p {
strong { "Bold text" }
" followed by normal text."
}
br;
input type="text" name="query";
}
Non-void elements that need no content still use braces:
html! {
script src="/static/app.js" {}
div.placeholder {}
}
Attributes
Attributes appear after the element name, before the braces or semicolon:
html! {
input type="email" name="user_email" required placeholder="you@example.com";
a href="/about" { "About" }
article data-id="12345" { "Content" }
}
Classes and IDs have a shorthand syntax, chained directly onto the element:
html! {
input #search-input .form-control type="text";
div.card.shadow-sm { "Card content" }
}
A class or ID without an element name produces a div:
html! {
#main { "Main content" }
.sidebar { "Sidebar content" }
}
Quote class names that contain characters Maud’s parser would choke on:
html! {
div."col-sm-6" { "Column" }
}
Dynamic values with splices
Parentheses insert a Rust expression into the output. Maud automatically escapes HTML special characters:
let username = "Alice <script>alert('xss')</script>";
html! {
p { "Hello, " (username) "!" }
}
Any type implementing std::fmt::Display can be spliced: strings, numbers, and your own types alike.
For dynamic attribute values, use parentheses for a single expression or braces for concatenation:
let user_id = 42;
let base = "/users";
html! {
span data-id=(user_id) { "User" }
a href={ (base) "/" (user_id) } { "Profile" }
}
Boolean attributes and toggles
Square brackets conditionally toggle boolean attributes and classes:
let is_active = true;
let is_disabled = false;
html! {
button disabled[is_disabled] { "Submit" }
a.nav-link.active[is_active] href="/" { "Home" }
}
Optional attributes
Attributes that take an Option value render only when Some:
let tooltip: Option<&str> = Some("More info");
let label: Option<&str> = None;
html! {
span title=[tooltip] { "Hover me" }
span aria-label=[label] { "No aria-label rendered" }
}
Control flow
Prefix control structures with @:
let user: Option<&str> = Some("Alice");
let items = vec!["Bread", "Milk", "Eggs"];
html! {
@if let Some(name) = user {
p { "Welcome, " (name) }
} @else {
p { a href="/login" { "Log in" } }
}
ul {
@for item in &items {
li { (item) }
}
}
@for (i, item) in items.iter().enumerate() {
@let label = format!("{}. {}", i + 1, item);
p { (label) }
}
@match items.len() {
0 => { p { "No items." } },
1 => { p { "One item." } },
n => { p { (n) " items." } },
}
}
DOCTYPE
Maud provides a DOCTYPE constant:
use maud::DOCTYPE;
html! {
(DOCTYPE)
html lang="en" {
head { title { "My App" } }
body { h1 { "Hello" } }
}
}
Raw HTML with PreEscaped
Maud escapes all spliced content by default. When you have trusted HTML that should not be escaped, wrap it in PreEscaped:
use maud::PreEscaped;
let svg = r#"<svg viewBox="0 0 100 100"><circle cx="50" cy="50" r="40"/></svg>"#;
html! {
div.icon { (PreEscaped(svg)) }
}
Use this for inline SVGs, pre-rendered markdown, or other HTML you control. Never pass user input to PreEscaped.
Components as functions
Maud has no built-in component system. Components are Rust functions that return Markup. This is simpler and more flexible than a template inheritance system, because you have the full language for composition, branching, and parameterisation.
A basic component:
use maud::{html, Markup};
fn nav_link(href: &str, text: &str, active: bool) -> Markup {
html! {
a.nav-link.active[active] href=(href) { (text) }
}
}
Use it by calling the function inside a splice:
html! {
nav {
(nav_link("/", "Home", true))
(nav_link("/about", "About", false))
(nav_link("/contact", "Contact", false))
}
}
Passing content blocks
The simplest approach is accepting Markup directly:
fn card(title: &str, body: Markup) -> Markup {
html! {
div.card {
div.card-header { h3 { (title) } }
div.card-body { (body) }
}
}
}
let output = card("Settings", html! {
p { "Adjust your preferences below." }
form method="post" {
}
});
A more flexible approach is accepting anything that implements Render. This lets callers pass Markup, strings, numbers, or any custom type with a Render implementation, without forcing them to wrap everything in html!:
use maud::Render;
fn card(title: &str, body: impl Render) -> Markup {
html! {
div.card {
div.card-header { h3 { (title) } }
div.card-body { (body) }
}
}
}
card("Note", html! { p { "Rich content." } });
card("Note", "Plain text content");
card("Note", my_renderable_struct);
Prefer impl Render over Markup for component parameters. It is a small change that makes components more composable.
Components as structs with Render
When a component has several fields, or when you want it to compose via splice syntax rather than a function call, make it a struct that implements Render:
use maud::{html, Markup, Render};
enum AlertLevel {
Info,
Warning,
Error,
}
struct Alert<'a, B: Render> {
level: AlertLevel,
title: &'a str,
body: B,
dismissible: bool,
}
impl<B: Render> Render for Alert<'_, B> {
fn render(&self) -> Markup {
let class = match self.level {
AlertLevel::Info => "alert-info",
AlertLevel::Warning => "alert-warning",
AlertLevel::Error => "alert-error",
};
html! {
div.alert.(class) role="alert" {
strong { (self.title) }
div { (self.body) }
@if self.dismissible {
button.close type="button" { "×" }
}
}
}
}
}
Splice it directly, no wrapper function needed:
html! {
(Alert {
level: AlertLevel::Warning,
title: "Disk space low",
body: html! { p { "Less than 10% remaining." } },
dismissible: true,
})
}
Another example, a breadcrumb navigation:
struct Breadcrumb {
segments: Vec<(String, String)>,
}
impl Render for Breadcrumb {
fn render(&self) -> Markup {
html! {
nav aria-label="breadcrumb" {
ol.breadcrumb {
@for (i, (label, href)) in self.segments.iter().enumerate() {
@let is_last = i == self.segments.len() - 1;
li.breadcrumb-item.active[is_last] {
@if is_last {
(label)
} @else {
a href=(href) { (label) }
}
}
}
}
}
}
}
}
Reach for Render when a component has enough fields that a function signature would get unwieldy, when it will be stored in collections and rendered in loops, or when other crates need to provide renderable types. For simple one- or two-parameter components, plain functions are shorter and sufficient.
Page layouts
A layout is a function that wraps content in a full HTML document. Since layouts in an HDA application typically need request context (the current user, flash messages, navigation state), build the layout as an Axum extractor.
First, a minimal layout function to show the shape:
use maud::{html, Markup, DOCTYPE};
fn base_layout(title: &str, content: Markup) -> Markup {
html! {
(DOCTYPE)
html lang="en" {
head {
meta charset="utf-8";
meta name="viewport" content="width=device-width, initial-scale=1";
title { (title) }
link rel="stylesheet" href="/assets/style.css";
script src="/assets/htmx.min.js" defer {}
}
body {
main { (content) }
}
}
}
}
In practice, layouts need data from the request: the authenticated user for navigation, flash messages from the session, the current path for active link highlighting. Extract all of this in a layout struct that implements FromRequestParts:
use axum::extract::FromRequestParts;
use axum::http::request::Parts;
use maud::{html, Markup, DOCTYPE};
struct PageLayout {
user: Option<User>,
current_path: String,
}
impl<S: Send + Sync> FromRequestParts<S> for PageLayout {
type Rejection = std::convert::Infallible;
async fn from_request_parts(
parts: &mut Parts,
_state: &S,
) -> Result<Self, Self::Rejection> {
let user = parts.extensions.get::<User>().cloned();
let current_path = parts.uri.path().to_string();
Ok(PageLayout { user, current_path })
}
}
impl PageLayout {
fn render(self, title: &str, content: Markup) -> Markup {
html! {
(DOCTYPE)
html lang="en" {
head {
meta charset="utf-8";
meta name="viewport" content="width=device-width, initial-scale=1";
title { (title) }
link rel="stylesheet" href="/assets/style.css";
script src="/assets/htmx.min.js" defer {}
}
body {
nav {
a.active[self.current_path == "/"] href="/" { "Home" }
a.active[self.current_path.starts_with("/users")]
href="/users" { "Users" }
div.nav-right {
@if let Some(user) = &self.user {
span { (user.name) }
a href="/logout" { "Log out" }
} @else {
a href="/login" { "Log in" }
}
}
}
main { (content) }
footer {
p { "© 2026" }
}
}
}
}
}
}
Handlers extract the layout alongside other parameters:
async fn user_list(
layout: PageLayout,
State(state): State<AppState>,
) -> Markup {
let users = fetch_users(&state.db).await;
layout.render("Users", html! {
h1 { "Users" }
ul {
@for user in &users {
li { (user.name) }
}
}
})
}
The handler focuses on its content. The layout handles the document shell, navigation, and any request-scoped data. Add fields to PageLayout as the application grows (flash messages, CSRF tokens, feature flags) without changing handler signatures.
Full pages vs HTML fragments
In an HDA application, the same handler often needs to return a full HTML page for normal browser requests and a bare HTML fragment for htmx requests. A normal navigation loads the entire page. An htmx-boosted link or hx-get request only needs the content that will be swapped into the page.
The axum-htmx crate provides typed extractors for htmx request headers:
[dependencies]
axum-htmx = "0.6"
Use HxBoosted to detect boosted navigation (where htmx intercepts a normal link click and swaps just the body), or HxRequest to detect any htmx-initiated request:
use axum_htmx::HxBoosted;
async fn user_list(
HxBoosted(boosted): HxBoosted,
layout: PageLayout,
State(state): State<AppState>,
) -> Markup {
let users = fetch_users(&state.db).await;
let content = html! {
h1 { "Users" }
ul {
@for user in &users {
li { (user.name) }
}
}
};
if boosted {
content
} else {
layout.render("Users", content)
}
}
For targeted fragment swaps (where htmx replaces a specific element on the page), handlers return only the fragment:
use axum_htmx::HxRequest;
async fn user_search(
HxRequest(is_htmx): HxRequest,
Query(params): Query<SearchParams>,
layout: PageLayout,
State(state): State<AppState>,
) -> Markup {
let users = search_users(&state.db, &params.q).await;
let results = html! {
ul #search-results {
@for user in &users {
li { (user.name) }
}
}
};
if is_htmx {
results
} else {
layout.render("Search", html! {
h1 { "Search users" }
input type="search" name="q" value=(params.q)
hx-get="/users/search"
hx-target="#search-results"
hx-trigger="input changed delay:300ms";
(results)
})
}
}
This pattern means every URL works as a full page when accessed directly (bookmarks, shared links, first page load) and as a fragment when accessed via htmx. No separate endpoint needed.
htmx attributes in Maud
htmx attributes use the hx- prefix, which works naturally in Maud:
html! {
button hx-get="/api/data" hx-target="#results" hx-swap="innerHTML" {
"Load data"
}
form hx-post="/contacts" hx-target="#contact-list" hx-swap="beforeend" {
input type="text" name="name" required;
button type="submit" { "Add contact" }
}
tr hx-get={ "/users/" (user.id) "/edit" } hx-trigger="click"
hx-target="this" hx-swap="outerHTML" {
td { (user.name) }
td { (user.email) }
}
button hx-delete={ "/users/" (user.id) }
hx-confirm="Delete this user?"
hx-target="closest tr"
hx-swap="outerHTML swap:500ms" {
"Delete"
}
}
The Interactivity with htmx section covers htmx patterns in full.
Gotchas
Semicolons on void elements. Forgetting the semicolon on input, br, meta, link, or img causes a compile error. If the compiler complains about unexpected tokens after an element name, check for a missing semicolon.
input type="text" { }   // error: void element cannot take a body
input type="text";      // correct
The @ prefix is mandatory for control flow. All if, for, let, and match inside html! must start with @. Without it, Maud tries to parse the keyword as an element name.
Brace vs parenthesis in attributes. Parentheses splice a single expression. Braces concatenate multiple parts. Using parentheses when you need concatenation silently drops everything after the first expression:
let id = 42;
a href=("/users/") (id) { "Profile" }    // wrong: (id) is not part of href
a href={ "/users/" (id) } { "Profile" }  // correct: concatenated into one value
Compile-time cost. Maud macros expand at compile time, which is good for runtime performance but can slow incremental builds on large templates. Breaking templates into smaller functions across modules helps, because Rust only recompiles the modules that changed.
Interactivity with htmx
htmx gives HTML the ability to issue HTTP requests and swap content into the page, without writing JavaScript. Add attributes to your markup, and htmx handles the rest: it sends an AJAX request, receives an HTML fragment from the server, and replaces a targeted element in the DOM. The server stays in control of rendering. There is no client-side state, no JSON serialisation layer, and no build step.
htmx is 14 KB minified and gzipped, has zero runtime dependencies, and works with any server that returns HTML. In this stack, Axum handlers return Maud Markup and htmx swaps it into place.
Including htmx
Vendor htmx into your project rather than loading it from a CDN. Download the minified file and place it in your assets directory:
assets/
htmx.min.js
The Web Server with Axum section covers serving static assets with rust-embed. Include htmx in your layout’s <head>:
head {
script src="/assets/htmx.min.js" defer {}
}
The defer attribute delays script execution until HTML parsing completes, so loading htmx never blocks rendering.
Vendoring eliminates CDN availability concerns and keeps the dependency auditable. htmx has zero transitive runtime dependencies, so you are vendoring exactly one file.
How htmx works
htmx extends HTML with attributes that describe HTTP interactions declaratively. The core mechanism is:
- An event occurs on an element (click, submit, keyup, or any DOM event).
- htmx sends an HTTP request (GET, POST, PUT, PATCH, DELETE) to a URL.
- The server returns an HTML fragment.
- htmx swaps that fragment into a target element in the DOM.
Every step is controlled by HTML attributes. No JavaScript is written by the application developer.
button hx-get="/clicked" hx-target="#result" hx-swap="innerHTML" {
"Click me"
}
div #result {}
When the button is clicked, htmx issues GET /clicked, takes the response body, and replaces the inner HTML of #result with it. The handler for /clicked returns a Maud fragment:
async fn clicked() -> Markup {
html! { p { "You clicked the button." } }
}
That is the entire pattern. Everything else in htmx is refinement of these four concepts: what triggers the request, what request to send, where to put the response, and how to swap it in.
Core attributes
Request attributes
Five attributes correspond to the five HTTP methods:
| Attribute | Method | Typical use |
| hx-get | GET | Fetch and display data |
| hx-post | POST | Submit data, create resources |
| hx-put | PUT | Full resource replacement |
| hx-patch | PATCH | Partial update |
| hx-delete | DELETE | Remove a resource |
Each takes a URL as its value:
button hx-get="/users" hx-target="#user-list" { "Load users" }
form hx-post="/users" hx-target="#user-list" hx-swap="beforeend" {
input type="text" name="name" required;
button type="submit" { "Add user" }
}
button hx-delete={ "/users/" (user.id) }
hx-confirm="Delete this user?"
hx-target="closest tr"
hx-swap="outerHTML" {
"Delete"
}
htmx sends form data automatically for elements within a form. For elements outside a form, use hx-include to specify which inputs to include in the request.
hx-target
hx-target specifies which element receives the swapped content. It takes a CSS selector, or one of these special values:
- this: the element that triggered the request
- closest <selector>: the nearest ancestor matching the selector
- find <selector>: the first descendant matching the selector
- next <selector>: the next sibling matching the selector
- previous <selector>: the previous sibling matching the selector
If hx-target is omitted, htmx swaps content into the element that issued the request.
button hx-get="/stats" hx-target="#dashboard-stats" { "Refresh" }
button hx-delete={ "/items/" (item.id) }
hx-target="closest li"
hx-swap="outerHTML" {
"Remove"
}
hx-swap
hx-swap controls how the response is inserted relative to the target. The default is innerHTML.
| Value | Behaviour |
| innerHTML | Replace the target’s children (default) |
| outerHTML | Replace the entire target element |
| beforebegin | Insert before the target |
| afterbegin | Insert as the target’s first child |
| beforeend | Insert as the target’s last child |
| afterend | Insert after the target |
| delete | Delete the target element, ignore the response |
| none | Don’t swap anything (out-of-band swaps still process) |
beforeend is particularly useful for appending to lists:
form hx-post="/todos" hx-target="#todo-list" hx-swap="beforeend" {
input type="text" name="title" placeholder="New todo" required;
button type="submit" { "Add" }
}
ul #todo-list {
@for todo in &todos {
li { (todo.title) }
}
}
outerHTML replaces the target itself, which is the right choice for inline editing where the display row swaps with an edit form and back:
tr hx-get={ "/contacts/" (contact.id) "/edit" }
hx-trigger="click"
hx-target="this"
hx-swap="outerHTML" {
td { (contact.name) }
td { (contact.email) }
}
Swap modifiers
Append modifiers to the swap value, separated by spaces:
- swap:<timing>: delay before performing the swap (e.g., swap:100ms)
- settle:<timing>: delay between swap and settle phase, useful for CSS transitions (e.g., settle:200ms)
- scroll:<target>:<direction>: scroll the target or window after swap (scroll:top, scroll:bottom)
- show:<target>:<direction>: scroll to show the swapped element (show:top, show:bottom)
- transition:true: use the View Transitions API for the swap animation
button hx-delete={ "/users/" (user.id) }
hx-target="closest tr"
hx-swap="outerHTML swap:500ms" {
"Delete"
}
div hx-get="/page/2" hx-swap="innerHTML show:top" {
"Load more"
}
hx-confirm
hx-confirm shows a browser confirmation dialog before the request is sent. The request only proceeds if the user confirms:
button hx-delete={ "/projects/" (project.id) }
hx-confirm="Are you sure? This cannot be undone."
hx-target="closest .project-card"
hx-swap="outerHTML" {
"Delete project"
}
hx-select
hx-select extracts a subset of the response using a CSS selector before swapping. This is useful when a handler returns a full page but you only need a fragment:
a hx-get="/search?q=rust" hx-target="#results" hx-select="#results" {
"Search for Rust"
}
hx-include
hx-include tells htmx to include values from additional elements in the request. Accepts a CSS selector:
input #search type="text" name="q";
button hx-get="/search" hx-include="#search" hx-target="#results" {
"Search"
}
hx-vals and hx-headers
hx-vals adds extra parameters to the request body as JSON:
button hx-post="/track"
hx-vals=r#"{"event": "button_click", "source": "header"}"# {
"Track"
}
hx-headers adds custom HTTP headers:
button hx-get="/api/data"
hx-headers=r#"{"X-Custom-Header": "value"}"# {
"Fetch"
}
hx-push-url
hx-push-url pushes the request URL into the browser’s history stack, so the back button works. Set it to true to push the request URL, or provide a specific URL:
a hx-get="/users" hx-target="#content" hx-push-url="true" {
"Users"
}
button hx-get="/users?page=2" hx-target="#user-list"
hx-push-url="/users/page/2" {
"Next page"
}
Triggering requests
hx-trigger
hx-trigger specifies which event initiates the request. Without it, htmx uses the natural event for each element type:
| Element | Default trigger |
| input, textarea, select | change |
| form | submit |
| Everything else | click |
Override the default by specifying any DOM event:
input type="search" name="q"
hx-get="/search"
hx-target="#results"
hx-trigger="keyup";
Trigger modifiers
Modifiers refine when and how triggers fire. Append them after the event name, separated by spaces:
changed: only fire if the element’s value has actually changed since the last request:
input type="search" name="q"
hx-get="/search"
hx-target="#results"
hx-trigger="keyup changed";
delay:<time>: wait before firing. Each new event resets the timer. This is debouncing: the request fires only after the user stops typing:
input type="search" name="q"
hx-get="/search"
hx-target="#results"
hx-trigger="keyup changed delay:300ms";
throttle:<time>: fire at most once per interval. Unlike delay, the first event fires immediately. Subsequent events within the window are dropped:
div hx-post="/position"
hx-trigger="mousemove throttle:200ms" {
"Track mouse"
}
from:<selector>: listen for the event on a different element. Useful for keyboard shortcuts:
div hx-get="/search"
hx-target="#results"
hx-trigger="keyup[key=='Enter'] from:body" {
}
consume: prevent the event from propagating to parent elements.
queue:<strategy>: control what happens when a new event fires while a request is in flight:
- queue:first: queue the first event, drop the rest
- queue:last: queue only the most recent event (default)
- queue:all: queue every event, process them one at a time
- queue:none: drop all events while a request is active
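For example, a search box that debounces input and keeps only the latest pending event might combine modifiers like this (a sketch reusing the endpoint and names from the earlier examples):

```
input type="search" name="q"
hx-get="/search"
hx-target="#results"
hx-trigger="keyup changed delay:300ms queue:last";
```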
Multiple triggers
Separate multiple triggers with commas:
div hx-get="/notifications"
hx-trigger="load, click"
hx-target="this" {
"Loading..."
}
Polling
Use the every syntax to poll an endpoint at a fixed interval:
div hx-get="/status"
hx-trigger="every 5s"
hx-target="this" {
"Checking status..."
}
Event filters
Square brackets filter events by a JavaScript expression:
input type="text" name="q"
hx-get="/search"
hx-target="#results"
hx-trigger="keyup[key=='Enter']";
Boosted links and navigation
hx-boost converts standard links and forms into AJAX requests. Apply it to a parent element and all descendant <a> and <form> elements are automatically boosted:
body hx-boost="true" {
nav {
a href="/users" { "Users" }
a href="/settings" { "Settings" }
}
main #content {
(content)
}
}
When a user clicks a boosted link, htmx:
- Issues a GET request to the link’s href
- Swaps the response into <body> using innerHTML
- Pushes the URL into browser history
The page does not fully reload. The browser keeps the existing <head> (scripts, stylesheets) and only swaps the body content, making navigation feel instant.
Boosted forms work the same way: the form is submitted via AJAX and the response replaces the body.
Progressive enhancement is built in. If JavaScript is disabled or fails to load, boosted links and forms still work as standard HTML. The server returns the same full HTML page either way. htmx intercepts the navigation when it can; the browser handles it normally when it cannot.
Detecting boosted requests on the server
Boosted requests include the HX-Boosted: true header. Use this to return just the body content instead of a full HTML document, avoiding redundant <head> parsing:
use axum_htmx::HxBoosted;
async fn users_page(
HxBoosted(boosted): HxBoosted,
layout: PageLayout,
State(state): State<AppState>,
) -> Markup {
let users = fetch_users(&state.db).await;
let content = html! {
h1 { "Users" }
ul {
@for user in &users {
li { (user.name) }
}
}
};
if boosted {
content
} else {
layout.render("Users", content)
}
}
Every URL works as both a full page (direct navigation, bookmarks) and a fragment (boosted navigation). One handler, one template, no separate endpoint.
Loading indicators
htmx adds the htmx-request CSS class to elements while a request is in flight. Use this to show loading spinners, disable buttons, or fade content.
Default behaviour
By default, htmx adds htmx-request to the element that issued the request:
button hx-get="/slow-endpoint" hx-target="#result" {
"Load data"
}
Style the loading state with CSS:
button.htmx-request {
opacity: 0.5;
pointer-events: none;
}
hx-indicator
hx-indicator specifies a different element to receive the htmx-request class. This is useful for showing a spinner that is separate from the trigger element:
button hx-get="/data" hx-target="#results" hx-indicator="#spinner" {
"Load"
}
img #spinner.htmx-indicator src="/assets/spinner.svg" alt="Loading";
htmx includes default CSS for the htmx-indicator class that hides the element until the request is active:
.htmx-indicator {
opacity: 0;
}
.htmx-request .htmx-indicator,
.htmx-request.htmx-indicator {
opacity: 1;
transition: opacity 200ms ease-in;
}
Override these styles to match your application’s design. The visibility approach avoids layout shifts:
.htmx-indicator {
display: none;
}
.htmx-request .htmx-indicator,
.htmx-request.htmx-indicator {
display: inline-block;
}
Inline loading text
A common pattern replaces the button text while loading:
button hx-post="/save" hx-target="#form-container" hx-swap="outerHTML" {
span.ready { "Save" }
span.htmx-indicator { "Saving..." }
}
button .htmx-indicator { display: none; }
button.htmx-request .ready { display: none; }
button.htmx-request .htmx-indicator { display: inline; }
Working with Axum
The axum-htmx crate provides typed extractors for htmx request headers and typed responders for htmx response headers:
[dependencies]
axum-htmx = "0.6"
All htmx request headers have a corresponding extractor. Extractors are infallible, so they always succeed and never reject a request:
| Header | Extractor | Value |
| HX-Request | HxRequest | bool |
| HX-Boosted | HxBoosted | bool |
| HX-Target | HxTarget | Option<String> |
| HX-Trigger | HxTrigger | Option<String> |
| HX-Trigger-Name | HxTriggerName | Option<String> |
| HX-Current-URL | HxCurrentUrl | Option<Uri> |
| HX-Prompt | HxPrompt | Option<String> |
HxRequest detects any htmx-initiated request. HxBoosted specifically detects boosted navigation. Use whichever matches the handler’s needs:
use axum_htmx::HxRequest;
async fn search(
HxRequest(is_htmx): HxRequest,
Query(params): Query<SearchParams>,
layout: PageLayout,
State(state): State<AppState>,
) -> Markup {
let users = search_users(&state.db, &params.q).await;
let results = html! {
ul #search-results {
@for user in &users {
li { (user.name) " – " (user.email) }
}
}
};
if is_htmx {
results
} else {
layout.render("Search", html! {
h1 { "Search users" }
input type="search" name="q" value=(params.q)
hx-get="/users/search"
hx-target="#search-results"
hx-trigger="input changed delay:300ms";
(results)
})
}
}
This pattern ensures every URL works as both a full page (direct navigation, bookmarks, search engine indexing) and as a fragment (htmx requests). The handler renders the same data either way; the only difference is whether it wraps the content in the layout.
Returning fragments from handlers
Handlers that serve only htmx requests return a bare Maud Markup:
async fn delete_user(
Path(id): Path<i64>,
State(state): State<AppState>,
) -> Markup {
sqlx::query!("DELETE FROM users WHERE id = $1", id)
.execute(&state.db)
.await
.unwrap();
html! {}
}
For delete operations, the handler returns an empty fragment. Combined with hx-swap="outerHTML" on the trigger element, this removes the target element from the DOM.
Response headers
htmx checks specific response headers to control client-side behaviour. The axum-htmx crate provides typed responders for each header. Return them as part of a tuple with your response body.
HX-Redirect
Forces a full-page redirect (not an htmx swap). Use this for operations that should leave the current page entirely, like a successful login:
use axum_htmx::HxRedirect;
async fn login(Form(data): Form<LoginForm>) -> impl IntoResponse {
(HxRedirect("/dashboard".to_string()), html! {})
}
htmx intercepts the response and performs window.location = url. The browser does a full navigation. This is different from a standard HTTP 302 redirect, which the browser handles transparently before htmx sees the response.
HX-Push-Url and HX-Replace-Url
HX-Push-Url pushes a URL into the browser’s history stack (creates a new history entry). HX-Replace-Url replaces the current history entry. Both let the server control the displayed URL after a swap:
use axum_htmx::HxPushUrl;
async fn filter_users(
Query(params): Query<FilterParams>,
State(state): State<AppState>,
) -> impl IntoResponse {
let users = fetch_filtered_users(&state.db, &params).await;
let url = format!("/users?role={}", params.role);
(
HxPushUrl(url),
html! {
@for user in &users {
tr {
td { (user.name) }
td { (user.role) }
}
}
},
)
}
HX-Retarget and HX-Reswap
HX-Retarget overrides the element’s hx-target from the server side. HX-Reswap overrides hx-swap. Together, they let the server change where and how content is placed based on the response:
use axum_htmx::{HxRetarget, HxReswap, SwapOption};
use axum::http::StatusCode;
async fn create_user(Form(data): Form<NewUser>) -> impl IntoResponse {
match validate_and_save(&data).await {
Ok(user) => {
(StatusCode::OK, html! {
tr {
td { (user.name) }
td { (user.email) }
}
}).into_response()
}
Err(errors) => {
(
StatusCode::UNPROCESSABLE_ENTITY,
HxRetarget("#form-errors".to_string()),
HxReswap(SwapOption::InnerHtml),
html! {
ul.errors {
@for error in &errors {
li { (error) }
}
}
},
).into_response()
}
}
}
This is a powerful pattern: the form’s hx-target and hx-swap describe the success case. When validation fails, the server redirects the swap to a different element without any client-side logic.
HX-Trigger (response)
HX-Trigger fires custom events on the client after the response is processed. Other elements on the page can listen for these events using hx-trigger="from:body":
use axum_htmx::HxResponseTrigger;
async fn create_todo(
Form(data): Form<NewTodo>,
State(state): State<AppState>,
) -> impl IntoResponse {
let todo = save_todo(&state.db, &data).await.unwrap();
(
HxResponseTrigger::normal(vec!["todo-added".to_string()]),
html! {
li { (todo.title) }
},
)
}
An element elsewhere on the page can react to this event:
span hx-get="/todos/count"
hx-trigger="todo-added from:body"
hx-target="this" {
(count)
}
HxResponseTrigger supports three timing modes:
- HxResponseTrigger::normal(): fires immediately (sets HX-Trigger)
- HxResponseTrigger::after_swap(): fires after the swap completes (sets HX-Trigger-After-Swap)
- HxResponseTrigger::after_settle(): fires after the settle phase (sets HX-Trigger-After-Settle)
HX-Refresh
Forces a full page refresh:
use axum_htmx::HxRefresh;
async fn clear_cache() -> impl IntoResponse {
(HxRefresh(true), html! {})
}
Complete responder reference
| Header | Responder | Value |
| HX-Location | HxLocation | String |
| HX-Push-Url | HxPushUrl | String |
| HX-Redirect | HxRedirect | String |
| HX-Refresh | HxRefresh | bool |
| HX-Replace-Url | HxReplaceUrl | String |
| HX-Reswap | HxReswap | SwapOption |
| HX-Retarget | HxRetarget | String |
| HX-Reselect | HxReselect | String |
| HX-Trigger | HxResponseTrigger | Vec<String> or Vec<HxEvent> |
All responders implement IntoResponseParts, so they compose naturally with Maud’s Markup in tuples.
Form submission with validation errors
A basic pattern: submit a form with hx-post, display validation errors inline if the submission fails. The Form Handling and Validation section covers this topic in full.
The form:
fn new_contact_form(errors: &[String]) -> Markup {
html! {
form #contact-form hx-post="/contacts" hx-target="this" hx-swap="outerHTML" {
@if !errors.is_empty() {
ul.errors {
@for error in errors {
li { (error) }
}
}
}
label {
"Name"
input type="text" name="name" required;
}
label {
"Email"
input type="email" name="email" required;
}
button type="submit" {
span.ready { "Save" }
span.htmx-indicator { "Saving..." }
}
}
}
}
The handler:
async fn create_contact(
State(state): State<AppState>,
Form(data): Form<NewContact>,
) -> Markup {
let mut errors = Vec::new();
if data.name.trim().is_empty() {
errors.push("Name is required.".to_string());
}
if !data.email.contains('@') {
errors.push("A valid email is required.".to_string());
}
if !errors.is_empty() {
return new_contact_form(&errors);
}
let contact = save_contact(&state.db, &data).await.unwrap();
html! {
tr {
td { (contact.name) }
td { (contact.email) }
}
}
}
On validation failure, the handler returns the form again with error messages. The form’s hx-swap="outerHTML" replaces itself with the re-rendered version, preserving the user’s input. On success, the handler returns a table row, which replaces the form.
SSE extension
htmx includes an SSE (Server-Sent Events) extension for receiving real-time server-pushed updates. The Server-Sent Events and Real-Time Updates section covers SSE in depth. Here is the basic setup.
Include the SSE extension after htmx. Vendor the file alongside htmx.min.js:
assets/
htmx.min.js
ext/
sse.js
Include it in the layout:
head {
script src="/assets/htmx.min.js" defer {}
script src="/assets/ext/sse.js" defer {}
}
Connect to an SSE endpoint and swap content when events arrive:
div hx-ext="sse" sse-connect="/events" {
div sse-swap="notification" {
"Waiting for notifications..."
}
div sse-swap="status" {
"Status: unknown"
}
}
The Axum handler returns an SSE stream. When the server sends an event named notification, htmx takes the event’s data (an HTML fragment) and swaps it into the element with sse-swap="notification". The browser handles reconnection automatically if the connection drops.
Gotchas
htmx processes 2xx responses only. By default, htmx swaps content only for 200-level status codes. Non-2xx responses are ignored (no swap happens). To swap error content into the page, either return a 200 with error markup, or configure htmx’s responseHandling to process specific error codes. The HX-Retarget and HX-Reswap headers offer a clean alternative: return the error markup with a 422 status and redirect the swap to an error container.
3xx redirects bypass htmx headers. When a server returns a 302 or 301, the browser follows the redirect transparently. htmx never sees the response headers. Use HX-Redirect (with a 200 status) instead of HTTP 302 when you need htmx to process the redirect.
hx-boost and request encoding. Boosted forms send requests via AJAX rather than full-page navigation. A form with enctype="multipart/form-data" for file uploads is handled correctly, but be aware that boosted GET forms append parameters to the URL rather than the body, matching standard HTML form behaviour.
History and the back button. When using hx-push-url or hx-boost, htmx caches page snapshots for the back button. If your pages include dynamic state (e.g., a logged-in user’s name in the nav), the cached snapshot may show stale data. htmx fires an htmx:historyRestore event when restoring from cache, which you can use to refresh stale sections.
Attribute inheritance. htmx attributes inherit down the DOM tree. An hx-target on a <div> applies to all htmx-enabled elements inside it. This is useful for setting defaults (e.g., hx-target="#content" on a container), but can cause surprises if a nested element inherits a target you did not intend. Use hx-target="unset" to break inheritance.
CSS Without Frameworks
CSS frameworks and preprocessors exist to solve problems that the web platform now handles natively. CSS nesting, container queries, :has(), @layer, and custom properties eliminate the need for Sass, Less, or utility-class frameworks. This section covers writing plain CSS for an HDA application, processing it with the lightningcss crate, and co-locating styles alongside Maud components using the inventory crate.
The result is a single processed stylesheet, built at startup from a base CSS file and component-scoped fragments, minified and vendor-prefixed, served from memory with cache-busting.
Plain CSS in 2026
Native CSS now provides the features that historically required preprocessors:
- Nesting replaces Sass/Less nesting syntax. Write .card { .title { ... } } directly.
- Custom properties (--color-primary: #1a1a2e;) replace preprocessor variables, with the advantage of being runtime-configurable and inheritable through the DOM.
- @layer controls cascade priority without specificity hacks.
- Container queries let components respond to their container’s size rather than the viewport.
- :has() selects elements based on their children, replacing many patterns that previously required JavaScript.
The Web Platform Has Caught Up section covers these features in detail. This section focuses on the tooling pipeline: how to write, process, and serve CSS in a Rust HDA application.
CSS organisation with RSCSS
RSCSS (Reasonable System for CSS Stylesheet Structure) provides a lightweight naming convention that works well with component-based architectures. It imposes just enough structure to keep styles maintainable without the ceremony of BEM or the magic of CSS Modules.
The core rules:
- Components are named with at least two words, separated by dashes: .search-form, .article-card, .user-profile.
- Elements within a component use a single word: .title, .body, .avatar. Multi-word elements are concatenated: .firstname, .submitbutton. Use the child selector (>) to prevent styles bleeding into nested components.
- Variants modify a component or element. RSCSS normally prefixes variants with a dash (.search-form.-compact), but dashes at the start of a class name are awkward in Maud templates. Use a double underscore prefix instead: .search-form.__compact. The double underscore distinguishes variants from helpers at a glance.
- Helpers are global utility classes prefixed with a single underscore: ._hidden, ._center. Keep these minimal.
In practice:
.article-card {
border: 1px solid var(--border);
border-radius: 0.5rem;
padding: 1rem;
> .title {
font-size: 1.25rem;
font-weight: 600;
}
> .meta {
color: var(--text-muted);
font-size: 0.875rem;
}
&.__featured {
border-color: var(--accent);
}
}
And the corresponding Maud component:
fn article_card(article: &Article) -> Markup {
html! {
div.article-card.__featured[article.featured] {
h2.title { (article.title) }
p.meta { "By " (article.author) }
}
}
}
The two-word component rule means component classes never collide with single-word element classes. The double-underscore variant prefix is visually distinct from both element classes and helper utilities, and works cleanly in Maud’s class syntax.
lightningcss
lightningcss is a CSS parser, transformer, and minifier written in Rust by the Parcel team. It processes over 2.7 million lines of CSS per second on a single thread. Use it to minify, vendor-prefix, and downlevel modern CSS syntax for older browsers.
Add it to Cargo.toml:
[dependencies]
lightningcss = { version = "1.0.0-alpha.70", default-features = false }
Disable default features to avoid pulling in Node.js binding dependencies. Enable bundler if you need @import resolution, or visitor if you need custom AST transforms.
A function to process a CSS string:
use lightningcss::stylesheet::{StyleSheet, ParserOptions, MinifyOptions};
use lightningcss::printer::PrinterOptions;
use lightningcss::targets::{Targets, Browsers};
pub fn process_css(raw: &str) -> Result<String, String> {
let targets = Targets::from(Browsers {
chrome: Some(95 << 16),
firefox: Some(90 << 16),
safari: Some(15 << 16),
..Browsers::default()
});
let mut stylesheet = StyleSheet::parse(raw, ParserOptions {
filename: "styles.css".to_string(),
..ParserOptions::default()
})
.map_err(|e| format!("CSS parse error: {e}"))?;
stylesheet
.minify(MinifyOptions {
targets,
..MinifyOptions::default()
})
.map_err(|e| format!("CSS minify error: {e}"))?;
let result = stylesheet
.to_css(PrinterOptions {
minify: true,
targets,
..PrinterOptions::default()
})
.map_err(|e| format!("CSS print error: {e}"))?;
Ok(result.code)
}
Browser targets are encoded as major << 16 | minor << 8 | patch. Chrome 95 is 95 << 16.
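The packed encoding can be spelled out with a small helper. This is illustrative only; it is not part of the lightningcss API.

```rust
// Pack a browser version the way lightningcss expects:
// major << 16 | minor << 8 | patch.
fn browser_version(major: u32, minor: u32, patch: u32) -> u32 {
    (major << 16) | (minor << 8) | patch
}

fn main() {
    assert_eq!(browser_version(95, 0, 0), 95 << 16);              // Chrome 95
    assert_eq!(browser_version(15, 4, 0), (15 << 16) | (4 << 8)); // Safari 15.4
}
```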
Pass targets to both MinifyOptions and PrinterOptions. The minify step transforms modern syntax (nesting, oklch() colours, logical properties) into forms the target browsers understand. The printer step serialises the result, applying minification when minify: true.
What lightningcss handles automatically:
- Flattens CSS nesting for older browsers
- Adds vendor prefixes (-webkit-, -moz-) where targets require them
- Converts modern colour functions (oklch(), lab(), color-mix()) to rgb()/rgba() fallbacks
- Transpiles logical properties (margin-inline-start) to physical equivalents
- Converts media query range syntax (@media (width >= 768px)) to min-width form
- Merges longhand properties into shorthands
- Removes redundant vendor prefixes the targets don’t need
Locality of behaviour with inventory
The inventory crate provides a distributed registration pattern: declare values in any module, collect them all in one place at startup. This enables locality of behaviour for CSS, where each component’s styles live in the same file as its markup.
[dependencies]
inventory = "0.3"
Define a CSS fragment type
Create a type to hold a CSS fragment and register it with inventory::collect!:
pub struct CssFragment(pub &'static str);
inventory::collect!(CssFragment);
Co-locate CSS with components
In each component file, declare the CSS alongside the markup using inventory::submit!:
use maud::{html, Markup};
use crate::styles::CssFragment;
inventory::submit! {
CssFragment(r#"
.article-card {
border: 1px solid var(--border);
border-radius: 0.5rem;
padding: 1rem;
> .title {
font-size: 1.25rem;
font-weight: 600;
}
> .meta {
color: var(--text-muted);
font-size: 0.875rem;
}
&.__featured {
border-color: var(--accent);
}
}
"#)
}
pub fn article_card(article: &Article) -> Markup {
html! {
div.article-card.__featured[article.featured] {
h2.title { (article.title) }
p.meta { "By " (article.author) }
}
}
}
Adding a new component with styles requires no changes to any other file. The CSS lives next to the markup that uses it.
Another component
use maud::{html, Markup};
use crate::styles::CssFragment;
inventory::submit! {
CssFragment(r#"
.nav-bar {
display: flex;
align-items: center;
gap: 1rem;
padding: 0.75rem 1.5rem;
background: var(--nav-bg);
> .link {
color: var(--nav-link);
text-decoration: none;
}
> .link.__active {
font-weight: 600;
color: var(--nav-link-active);
}
}
"#)
}
pub fn nav_bar(current_path: &str) -> Markup {
html! {
nav.nav-bar {
a.link.__active[current_path == "/"] href="/" { "Home" }
a.link.__active[current_path.starts_with("/users")] href="/users" { "Users" }
}
}
}
The processing pipeline
At startup, collect all CSS fragments, concatenate them with a base stylesheet, process through lightningcss, and cache the result in memory. A content hash in the filename enables indefinite browser caching.
Base stylesheet
A base.css file contains resets, custom properties, and global styles that don’t belong to any component:
*,
*::before,
*::after {
box-sizing: border-box;
}
:root {
--text: #1a1a2e;
--text-muted: #6b7280;
--bg: #ffffff;
--border: #e5e7eb;
--accent: #2563eb;
--nav-bg: #f9fafb;
--nav-link: #374151;
--nav-link-active: #1a1a2e;
}
body {
font-family: system-ui, -apple-system, sans-serif;
color: var(--text);
background: var(--bg);
margin: 0;
line-height: 1.6;
}
Build and serve the stylesheet
use lightningcss::stylesheet::{StyleSheet, ParserOptions, MinifyOptions};
use lightningcss::printer::PrinterOptions;
use lightningcss::targets::{Targets, Browsers};
use std::sync::LazyLock;
pub struct CssFragment(pub &'static str);
inventory::collect!(CssFragment);
static BASE_CSS: &str = include_str!("../assets/base.css");
pub struct ProcessedCss {
pub body: String,
pub filename: String,
pub route: String,
}
static STYLESHEET: LazyLock<ProcessedCss> = LazyLock::new(build_stylesheet);
pub fn stylesheet() -> &'static ProcessedCss {
&STYLESHEET
}
fn build_stylesheet() -> ProcessedCss {
let mut raw = String::from(BASE_CSS);
for fragment in inventory::iter::<CssFragment> {
raw.push('\n');
raw.push_str(fragment.0);
}
let targets = Targets::from(Browsers {
chrome: Some(95 << 16),
firefox: Some(90 << 16),
safari: Some(15 << 16),
..Browsers::default()
});
let mut sheet = StyleSheet::parse(&raw, ParserOptions {
filename: "styles.css".to_string(),
..ParserOptions::default()
})
.expect("CSS parse error");
sheet
.minify(MinifyOptions {
targets,
..MinifyOptions::default()
})
.expect("CSS minify error");
let result = sheet
.to_css(PrinterOptions {
minify: true,
targets,
..PrinterOptions::default()
})
.expect("CSS print error");
let hash = {
use std::hash::{Hash, Hasher};
let mut hasher = std::collections::hash_map::DefaultHasher::new();
result.code.hash(&mut hasher);
format!("{:x}", hasher.finish())
};
let filename = format!("style.{hash}.css");
let route = format!("/assets/{filename}");
ProcessedCss {
body: result.code,
filename,
route,
}
}
The LazyLock ensures the CSS is built once on first access and cached for the lifetime of the process. include_str! embeds base.css into the binary at compile time, so the binary is self-contained.
Wire it into Axum
Expose the stylesheet as a route and make the filename available to the layout:
use axum::{
http::header,
response::IntoResponse,
routing::get,
Router,
};
mod styles;
mod components;
async fn css_handler() -> impl IntoResponse {
let css = styles::stylesheet();
(
[
(header::CONTENT_TYPE, "text/css"),
(header::CACHE_CONTROL, "public, max-age=31536000, immutable"),
],
css.body.clone(),
)
}
fn app() -> Router {
let css = styles::stylesheet();
Router::new()
.route(&css.route, get(css_handler))
}
The Cache-Control header tells browsers to cache the file for a year. Because the filename contains a content hash, deploying new CSS produces a new filename, and browsers fetch the new version automatically. Old cached versions expire naturally.
Reference the stylesheet in the layout
The layout component needs the hashed filename to build the <link> tag:
use maud::{html, Markup, DOCTYPE};
use crate::styles;
fn base_layout(title: &str, content: Markup) -> Markup {
let css = styles::stylesheet();
html! {
(DOCTYPE)
html lang="en" {
head {
meta charset="utf-8";
meta name="viewport" content="width=device-width, initial-scale=1";
title { (title) }
link rel="stylesheet" href=(css.route);
script src="/assets/htmx.min.js" defer {}
}
body {
(content)
}
}
}
}
Every page automatically references the current stylesheet version. When any component’s CSS changes, the hash changes, the filename changes, and browsers fetch the new file on the next page load.
How inventory works
inventory uses platform-specific linker constructor sections (the same mechanism as __attribute__((constructor)) in C). Each inventory::submit! call creates a static value and a constructor function that registers it in an atomic linked list. The OS loader runs all constructors before main() starts, so by the time your application code runs, every fragment is already registered and inventory::iter yields them all.
Three things to keep in mind:
- No ordering guarantees. Fragments are yielded in whatever order the linker placed them. If CSS cascade order matters between components, switch to a struct with a weight field and sort after collecting. In practice, well-scoped component styles rarely depend on source order.
- Same-crate usage is safe. The known linker dead-code-elimination issue (where submitted items in an unreferenced crate get stripped) does not apply when collect! and submit! are in the same crate. For a workspace with multiple crates, ensure each crate that submits fragments is referenced by at least one symbol in the binary crate.
- submit! is module-level only. It cannot appear inside a function body. It is a static declaration, not a runtime statement.
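If cascade order does matter, the weight-field variant might look like the sketch below. `WeightedCss` is an illustrative name; in the real setup you would `inventory::collect!` this type instead of `CssFragment`, and only the sort-and-concatenate step is shown here.

```rust
// A fragment type with an explicit weight; lower weights are emitted first,
// so base/reset styles can claim weight 0.
pub struct WeightedCss {
    pub weight: i32,
    pub css: &'static str,
}

// Sort collected fragments by weight before concatenation.
fn concat_sorted(mut fragments: Vec<WeightedCss>) -> String {
    fragments.sort_by_key(|f| f.weight);
    fragments
        .iter()
        .map(|f| f.css)
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let css = concat_sorted(vec![
        WeightedCss { weight: 10, css: ".article-card { padding: 1rem; }" },
        WeightedCss { weight: 0, css: ":root { --border: #e5e7eb; }" },
    ]);
    assert!(css.starts_with(":root"));
}
```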
Putting it together
The full flow:
- base.css contains resets, custom properties, and global styles. It is embedded with include_str!.
- Each component file uses inventory::submit! to register its CSS alongside its Maud markup.
- At startup, build_stylesheet() concatenates the base CSS with all registered fragments, processes the result through lightningcss, and hashes the output.
- The hashed filename is available to the layout via styles::stylesheet().route.
- A single Axum route serves the processed CSS from memory with long-lived cache headers.
No build step. No CSS preprocessor. No file watchers. The Rust compiler and lightningcss handle everything at compile time and startup.
Data
Database with PostgreSQL and SQLx
SQLx is an async database library for Rust that checks your SQL queries against a real PostgreSQL database at compile time. If a query references a column that does not exist, uses the wrong type, or has a syntax error, the compiler catches it before the application runs. This is the primary reason to choose SQLx over other database libraries.
SQLx is not an ORM. There is no query builder, no model macros, and no schema-to-struct code generation. Write SQL directly, and SQLx verifies it.
Setup
Add SQLx to your Cargo.toml:
[dependencies]
sqlx = { version = "0.8", features = [
"runtime-tokio",
"tls-rustls-ring-webpki",
"postgres",
"macros",
"migrate",
] }
Feature breakdown:
- runtime-tokio selects the Tokio async runtime.
- tls-rustls-ring-webpki enables TLS via rustls with WebPKI certificate roots. For local development without TLS, the feature still needs to be present; the connection negotiates plaintext if the server allows it.
- postgres enables the PostgreSQL driver.
- macros enables query!, query_as!, and the other compile-time checked query macros.
- migrate enables the migration runner and the migrate! macro.
Add type integration features as needed:
sqlx = { version = "0.8", features = [
"runtime-tokio",
"tls-rustls-ring-webpki",
"postgres",
"macros",
"migrate",
"uuid",
"time",
"json",
] }
These enable uuid::Uuid, time crate date/time types, and serde_json::Value / Json<T> for JSONB columns, respectively.
Install the CLI
The sqlx-cli tool manages databases and migrations:
cargo install sqlx-cli --no-default-features --features rustls,postgres
This installs only PostgreSQL support, which keeps the build faster than the full default install.
Connecting to PostgreSQL
SQLx reads the database connection string from the DATABASE_URL environment variable. Set it in a .env file at the project root:
DATABASE_URL=postgres://myapp:password@localhost:5432/myapp_dev
The format is postgres://user:password@host:port/database. SQLx’s macros use dotenvy to read .env automatically at compile time.
PostgreSQL itself should be running as a Docker container managed by Docker Compose. See the Development Environment section for the container setup.
Connection pooling
Create a connection pool at application startup and share it through Axum’s application state. PgPool is internally reference-counted, so cloning it is cheap.
use sqlx::postgres::PgPoolOptions;
use sqlx::PgPool;
let pool = PgPoolOptions::new()
.max_connections(5)
.connect(&std::env::var("DATABASE_URL").expect("DATABASE_URL must be set"))
.await
.expect("failed to connect to database");
Pass the pool into your Axum AppState:
#[derive(Clone)]
struct AppState {
db: PgPool,
}
let app = Router::new()
.route("/", get(index))
.with_state(AppState { db: pool });
Handlers extract it with State:
async fn list_users(State(state): State<AppState>) -> impl IntoResponse {
let users = sqlx::query_as!(User, "SELECT id, name, email FROM users")
.fetch_all(&state.db)
.await
.unwrap();
}
The default pool configuration is reasonable for most applications:
| Option | Default | Purpose |
| max_connections | 10 | Maximum connections in the pool |
| min_connections | 0 | Minimum idle connections maintained |
| acquire_timeout | 30s | How long to wait for a connection |
| idle_timeout | 10 min | Close idle connections after this duration |
| max_lifetime | 30 min | Close connections older than this |
Override them on PgPoolOptions if needed. For most web applications, setting max_connections to match your expected concurrency and leaving the rest at defaults works well.
For lazy connection establishment (useful in tests or CLIs where the database might not be needed):
let pool = PgPoolOptions::new()
.max_connections(5)
.connect_lazy(&database_url)?;
This returns immediately. Connections are established on first use.
Compile-time checked queries
The query! macro is the core of SQLx. At compile time, it connects to the database specified by DATABASE_URL, sends the query to PostgreSQL for parsing and type-checking, and generates Rust code that matches the result columns.
query!
query! returns an anonymous record type with fields matching the query’s output columns:
let row = sqlx::query!("SELECT id, name, email FROM users WHERE id = $1", user_id)
.fetch_one(&pool)
.await?;
Bind parameters use PostgreSQL’s $1, $2, … syntax. The macro checks that the number and types of bind arguments match what the query expects.
query_as!
query_as! maps results directly into a named struct:
struct User {
id: i32,
name: String,
email: String,
}
let user = sqlx::query_as!(User, "SELECT id, name, email FROM users WHERE id = $1", user_id)
.fetch_one(&pool)
.await?;
The macro generates a struct literal, matching column names to field names. It does not use the FromRow trait. The struct does not need any derive macros.
Fetch methods
Choose the fetch method based on how many rows you expect:
| Method | Returns | Use when |
| .execute(&pool) | PgQueryResult | INSERT, UPDATE, DELETE with no RETURNING |
| .fetch_one(&pool) | T | Exactly one row expected (errors if zero or multiple) |
| .fetch_optional(&pool) | Option<T> | Zero or one row |
| .fetch_all(&pool) | Vec<T> | Collect all rows into a Vec |
| .fetch(&pool) | impl Stream<Item = Result<T>> | Stream rows without buffering |
fetch_one returns an error if the query produces zero rows or more than one. Use fetch_optional when the row might not exist.
Nullable columns
The macro infers nullability from the database schema. A column with a NOT NULL constraint maps to T; a nullable column maps to Option<T>.
Override nullability in the column alias when the macro gets it wrong (common with expressions, COALESCE, or complex joins):
sqlx::query!(r#"SELECT count(*) as "count!" FROM users"#)
sqlx::query!(r#"SELECT name as "name?" FROM users"#)
sqlx::query!(r#"SELECT id as "id!: uuid::Uuid" FROM users"#)
The override syntax uses the column alias in double quotes:
- "col!" forces non-null
- "col?" forces nullable
- "col: Type" overrides the Rust type
- "col!: Type" forces non-null with a type override
RETURNING clauses
PostgreSQL’s RETURNING clause turns INSERT, UPDATE, and DELETE into queries that produce rows. Use fetch_one with query_as! to get the created or modified record back:
let user = sqlx::query_as!(
User,
"INSERT INTO users (name, email) VALUES ($1, $2) RETURNING id, name, email",
name,
email
)
.fetch_one(&pool)
.await?;
This avoids a separate SELECT after every insert.
Offline mode for CI
Compile-time query checking requires a running PostgreSQL database. In CI environments where a database is not available during compilation, SQLx provides offline mode.
1. With the database running locally, generate the query cache:
cargo sqlx prepare --workspace
This creates a .sqlx/ directory containing metadata for every compile-time checked query in the project.
2. Commit .sqlx/ to version control.
3. When DATABASE_URL is absent at compile time and .sqlx/ exists, the macros use the cached metadata instead of connecting to a database.
4. In CI, verify the cache is up to date:
cargo sqlx prepare --workspace --check
This fails if any query has changed without regenerating the cache, catching stale metadata before it causes runtime surprises.
To include queries from tests and other non-default targets:
cargo sqlx prepare --workspace -- --all-targets --all-features
Set SQLX_OFFLINE=true to force offline mode even when DATABASE_URL is present. This is useful for verifying that the offline cache works correctly.
Writing and organising queries
Keep queries inline, next to the code that uses them. SQLx’s macros are designed for this: the query text and its bind parameters live together in the handler or module function, so the reader sees the full picture without jumping between files.
pub async fn find_user_by_email(pool: &PgPool, email: &str) -> Result<Option<User>, sqlx::Error> {
sqlx::query_as!(
User,
"SELECT id, name, email, created_at FROM users WHERE email = $1",
email
)
.fetch_optional(pool)
.await
}
pub async fn create_user(pool: &PgPool, name: &str, email: &str) -> Result<User, sqlx::Error> {
sqlx::query_as!(
User,
"INSERT INTO users (name, email) VALUES ($1, $2) RETURNING id, name, email, created_at",
name,
email
)
.fetch_one(pool)
.await
}
For queries that are genuinely long (complex joins, CTEs), query_file_as! reads SQL from a separate file:
SELECT u.id, u.name, u.email, count(p.id) as "post_count!"
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name, u.email
ORDER BY u.name
let users = sqlx::query_file_as!(UserWithPosts, "queries/users_with_posts.sql")
.fetch_all(&pool)
.await?;
File paths are relative to the crate’s Cargo.toml directory. The file is still checked at compile time against the database.
Mapping query results to Rust types
With macros (preferred)
query_as! maps columns to struct fields by name. The struct needs no special derives:
struct User {
id: i32,
name: String,
email: String,
bio: Option<String>,
created_at: time::OffsetDateTime,
}
let users = sqlx::query_as!(User, "SELECT id, name, email, bio, created_at FROM users")
.fetch_all(&pool)
.await?;
The macro matches column names to field names at compile time. If the types do not match (e.g., a NOT NULL TEXT column mapped to i32), compilation fails.
With FromRow (runtime)
For cases where compile-time checking is not available (dynamic queries, generic code), use sqlx::FromRow:
#[derive(Debug, sqlx::FromRow)]
struct User {
id: i32,
name: String,
email: String,
bio: Option<String>,
}
let users: Vec<User> = sqlx::query_as::<_, User>("SELECT id, name, email, bio FROM users")
.fetch_all(&pool)
.await?;
Note the distinction: query_as! (with !) is a macro that checks at compile time and does not use FromRow. query_as::<_, T>() (without !) is a runtime function that requires T: FromRow.
FromRow supports field-level attributes for column renaming, defaults, and type conversion:
#[derive(sqlx::FromRow)]
struct User {
id: i32,
#[sqlx(rename = "user_name")]
name: String,
#[sqlx(default)]
role: String,
}
PostgreSQL type mappings
SQLx maps PostgreSQL types to Rust types. The common mappings, using the feature flags from the setup above:
| PostgreSQL | Rust | Feature |
| BOOL | bool | |
| INT2 / SMALLINT | i16 | |
| INT4 / INT | i32 | |
| INT8 / BIGINT | i64 | |
| FLOAT4 / REAL | f32 | |
| FLOAT8 / DOUBLE PRECISION | f64 | |
| TEXT, VARCHAR | String | |
| BYTEA | Vec<u8> | |
| UUID | uuid::Uuid | uuid |
| TIMESTAMPTZ | time::OffsetDateTime | time |
| TIMESTAMP | time::PrimitiveDateTime | time |
| DATE | time::Date | time |
| TIME | time::Time | time |
| JSON, JSONB | serde_json::Value or Json<T> | json |
| INT4[], TEXT[], etc. | Vec<T> | |
UUID
UUID primary keys are common in web applications. Enable the uuid feature and use uuid::Uuid directly:
use uuid::Uuid;
struct User {
id: Uuid,
name: String,
email: String,
}
let user = sqlx::query_as!(
User,
"INSERT INTO users (id, name, email) VALUES ($1, $2, $3) RETURNING id, name, email",
Uuid::new_v4(),
name,
email
)
.fetch_one(&pool)
.await?;
Add uuid to your direct dependencies too, since you will construct values from it:
uuid = { version = "1", features = ["v4"] }
Timestamps with the time crate
Enable the time feature for date and time support. TIMESTAMPTZ columns map to time::OffsetDateTime, which carries a UTC offset:
use time::OffsetDateTime;
struct AuditEntry {
id: i32,
action: String,
created_at: OffsetDateTime,
}
let entry = sqlx::query_as!(
AuditEntry,
"INSERT INTO audit_log (action) VALUES ($1) RETURNING id, action, created_at",
action
)
.fetch_one(&pool)
.await?;
PostgreSQL stores TIMESTAMPTZ in UTC internally. The OffsetDateTime you receive will always have a UTC offset.
For the time crate, add it as a direct dependency:
time = "0.3"
JSONB
JSONB is useful for semi-structured data that does not warrant its own columns. Enable the json feature and use serde_json::Value for unstructured JSON or sqlx::types::Json<T> for typed deserialization:
use sqlx::types::Json;
#[derive(serde::Serialize, serde::Deserialize)]
struct Preferences {
theme: String,
notifications: bool,
}
sqlx::query!(
"UPDATE users SET preferences = $1 WHERE id = $2",
Json(&prefs) as _,
user_id
)
.execute(&pool)
.await?;
let row = sqlx::query!(
r#"SELECT preferences as "preferences!: Json<Preferences>" FROM users WHERE id = $1"#,
user_id
)
.fetch_one(&pool)
.await?;
let prefs: Preferences = row.preferences.0;
The as _ cast on the insert side is required to help the macro infer the correct PostgreSQL type. On the read side, the type override in the column alias tells the macro to deserialise into Json<Preferences>.
Custom enum types
Map PostgreSQL enum types to Rust enums with sqlx::Type:
#[derive(Debug, sqlx::Type)]
#[sqlx(type_name = "user_role", rename_all = "lowercase")]
enum UserRole {
Admin,
Member,
Guest,
}
This corresponds to a PostgreSQL type created with:
CREATE TYPE user_role AS ENUM ('admin', 'member', 'guest');
Use the enum directly in queries:
sqlx::query!(
"INSERT INTO users (name, role) VALUES ($1, $2)",
name,
role as UserRole
)
.execute(&pool)
.await?;
The as UserRole cast tells the macro which Rust type to use for encoding.
Transactions
A transaction groups multiple queries into an atomic unit. Either all succeed and the changes are committed, or any failure rolls everything back.
Start a transaction with pool.begin():
let mut tx = pool.begin().await?;
let user = sqlx::query_as!(
User,
"INSERT INTO users (name, email) VALUES ($1, $2) RETURNING id, name, email",
name,
email
)
.fetch_one(&mut *tx)
.await?;
sqlx::query!(
"INSERT INTO audit_log (user_id, action) VALUES ($1, $2)",
user.id,
"account_created"
)
.execute(&mut *tx)
.await?;
tx.commit().await?;
Pass the transaction to queries with &mut *tx. This dereferences the Transaction to the underlying connection and reborrows it.
If commit() is never called, the transaction rolls back when it is dropped. This makes the ? operator transaction-safe: if any query fails and the function returns early, the transaction is dropped and automatically rolled back.
async fn transfer(
pool: &PgPool,
from_id: i32,
to_id: i32,
amount: i64,
) -> Result<(), sqlx::Error> {
let mut tx = pool.begin().await?;
sqlx::query!(
"UPDATE accounts SET balance = balance - $1 WHERE id = $2",
amount,
from_id
)
.execute(&mut *tx)
.await?;
sqlx::query!(
"UPDATE accounts SET balance = balance + $1 WHERE id = $2",
amount,
to_id
)
.execute(&mut *tx)
.await?;
tx.commit().await?;
Ok(())
}
For explicit rollback (useful when a business rule fails after the queries succeed):
if balance_too_low {
tx.rollback().await?;
// AppError::InsufficientBalance stands in for your application's error type
return Err(AppError::InsufficientBalance);
}
Gotchas
DATABASE_URL must be set at compile time. The query! macros connect to PostgreSQL during compilation. If the variable is missing and no .sqlx/ cache exists, compilation fails. Keep a .env file in your project root for local development.
&mut *tx syntax. Passing a transaction to a query requires &mut *tx, not &mut tx or &tx. The Transaction type implements DerefMut to the underlying connection; the dereference-reborrow is needed for the borrow checker.
Column name matching in query_as!. The column names in the SELECT must match the struct field names exactly. Use AS to rename columns if the database naming convention differs:
sqlx::query_as!(
User,
"SELECT id, user_name AS name FROM users"
)
Nullable inference in expressions. The macro sometimes cannot determine nullability for computed expressions (count(*), COALESCE, subqueries). Use the "col!" override to tell it the result is non-null:
sqlx::query!(r#"SELECT count(*) as "total!" FROM users"#)
Pool exhaustion. If all connections are in use and acquire_timeout is reached, the next query fails. This usually means the pool is too small for the application’s concurrency, or a handler is holding a connection too long (a common cause is doing non-database work while a transaction is open). Keep transactions short.
Database Migrations
Migrations track every change to your database schema as versioned SQL files. SQLx includes a migration system that runs these files in order, records which have been applied, and validates that applied migrations have not been modified. The same sqlx-cli tool installed in the database section manages the full lifecycle.
Creating migrations
Generate a new migration with sqlx migrate add. Use the -r flag to create reversible migrations, which produce a .up.sql and .down.sql pair:
sqlx migrate add -r create_users
This creates two files in the migrations/ directory at the project root:
migrations/
20260226140000_create_users.up.sql
20260226140000_create_users.down.sql
The timestamp prefix is generated in UTC and determines execution order. Timestamp versioning is the default and prevents conflicts when multiple developers create migrations concurrently.
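Because the prefix is a fixed-width UTC timestamp, sorting the filenames reproduces the order in which SQLx applies them. A quick illustration with hypothetical filenames:

```rust
fn main() {
    let mut migrations = vec![
        "20260301093000_add_posts.up.sql",
        "20260226140000_create_users.up.sql",
        "20260227081500_add_sessions.up.sql",
    ];
    // Fixed-width timestamp prefixes make lexicographic order
    // identical to chronological order.
    migrations.sort();
    assert_eq!(migrations[0], "20260226140000_create_users.up.sql");
    assert_eq!(migrations[2], "20260301093000_add_posts.up.sql");
}
```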
Once the first migration uses -r, subsequent calls to sqlx migrate add will produce reversible pairs automatically. The CLI infers the mode from existing files.
Writing the SQL
The .up.sql file contains the forward schema change:
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT NOT NULL UNIQUE,
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
The .down.sql file reverses it:
DROP TABLE users;
Keep each migration focused on a single change. A migration that creates a table should not also modify a different table. This makes reverting predictable and keeps the history readable.
Running migrations
At application startup
The migrate! macro embeds migration files directly into the compiled binary. Call .run() on the pool at startup to apply any pending migrations before the application begins serving requests:
use sqlx::PgPool;
#[tokio::main]
async fn main() {
let pool = PgPool::connect(&std::env::var("DATABASE_URL").expect("DATABASE_URL must be set"))
.await
.expect("failed to connect to database");
sqlx::migrate!()
.run(&pool)
.await
.expect("failed to run migrations");
}
migrate!() reads from the migrations/ directory relative to Cargo.toml. The migration SQL is baked into the binary at compile time, so the deployed binary is self-contained; it does not need the migration files on disk.
This is the simplest deployment model. One binary, one process, and the schema is always in sync with the code.
With the CLI
For larger deployments where migrations should run as a separate step before the application starts, use the CLI directly:
sqlx migrate run
This reads DATABASE_URL from the environment or a .env file. The CLI approach gives you explicit control over when schema changes happen, which matters when you have multiple application instances starting simultaneously, need to run migrations from a CI pipeline before deployment, or want human review of what will be applied before it runs.
The two approaches are not mutually exclusive. migrate run is idempotent: it skips any migration already recorded in the database. You can run migrations from the CLI in your deployment pipeline and keep sqlx::migrate!().run(&pool) in your application code as a safety net.
Recompilation caveat
The migrate! macro runs at compile time, but Cargo does not automatically detect changes to non-Rust files. Adding a new .sql migration without modifying any .rs file will not trigger recompilation. The application will silently use the old set of migrations.
Fix this by generating a build.rs that watches the migrations directory:
sqlx migrate build-script
This creates a build.rs at the project root:
fn main() {
println!("cargo:rerun-if-changed=migrations");
}
Commit this file. With it in place, any change to the migrations/ directory triggers a rebuild.
Reverting migrations
Revert the most recently applied migration:
sqlx migrate revert
This runs the .down.sql file for the last applied migration. Run it multiple times to step back further, or target a specific version:
sqlx migrate revert --target-version 20260226140000
sqlx migrate revert --target-version 0
Reverting is primarily a development tool. In production, writing a new forward migration to undo a change is usually safer than reverting, because other parts of the system may already depend on the schema change.
Checking migration status
Inspect which migrations have been applied and whether any are out of sync:
sqlx migrate info
This prints each migration’s version, description, applied status, and whether its checksum matches the file on disk. Use this to diagnose problems before making changes, especially in shared environments.
How SQLx tracks migrations
SQLx creates a _sqlx_migrations table automatically on first run. It records each applied migration’s version, description, checksum (SHA-256 of the SQL content), execution time, and success status.
Two behaviours follow from this:
Checksum validation. Every time migrations run, SQLx compares the stored checksum for each already-applied migration against the current file on disk. If a file has been edited after it was applied, SQLx raises an error. This catches accidental edits to applied migrations. If you need to correct a mistake, write a new migration rather than editing the old one.
Dirty state detection. If a migration fails partway through, its row may be recorded with success = false. SQLx refuses to run further migrations until the dirty state is resolved. In development, the simplest fix is to drop and recreate the database. In production, investigate the failure, fix it manually, and update the row.
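A sketch of the manual recovery in SQL, assuming the _sqlx_migrations columns described above:

```sql
-- Inspect the history; a failed migration is recorded with success = false.
SELECT version, description, success
FROM _sqlx_migrations
ORDER BY version;

-- After manually undoing the partial changes, clear the failed row so the
-- migration can run again.
DELETE FROM _sqlx_migrations WHERE success = false;
```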
Managing migrations across environments
Development
The typical workflow during development is to create the database and bring it up to date:
sqlx database create
sqlx migrate run
To reset to a clean slate, drop the database, recreate it, and reapply every migration:
sqlx database drop
sqlx database create
sqlx migrate run
CI
In CI, create a disposable database, apply migrations, and verify the offline query cache is up to date:
sqlx database create
sqlx migrate run
cargo sqlx prepare --workspace --check
The --check flag fails the build if any query! macro’s cached metadata in .sqlx/ is stale. This enforces that developers run cargo sqlx prepare after schema changes.
Production
For applications using the embedded migrate!() macro, no separate migration step is needed. The binary applies its own migrations on startup.
For CLI-based deployments, run sqlx migrate run as part of the deployment process, before starting the application. In Docker, this is typically an entrypoint script or an init container. The --dry-run flag shows what would be applied without executing, useful for pre-deployment review:
sqlx migrate run --dry-run
Concurrency safety
SQLx acquires a PostgreSQL advisory lock before running migrations. If multiple instances start simultaneously, only one will apply migrations while the others wait. This prevents race conditions during rolling deployments.
Gotchas
Never edit an applied migration. The checksum validation will reject it. Write a new corrective migration instead.
Don’t mix simple and reversible migrations. SQLx infers the migration type from existing files. Stick with one style (reversible, using -r) throughout the project.
Commit build.rs and .sqlx/. The build.rs file (from sqlx migrate build-script) ensures new migrations trigger recompilation. The .sqlx/ directory (from cargo sqlx prepare) enables compilation without a live database. Both belong in version control.
DATABASE_URL takes precedence over .sqlx/. In CI, if DATABASE_URL is set during compilation, the query! macros will try to connect to it rather than using the offline cache. Set SQLX_OFFLINE=true explicitly when you want to force offline mode.
Search
PostgreSQL ships a full-text search engine. For most content-heavy and CRUD applications, it is the right starting point: no extra service to run, no index to keep in sync, and search results are transactionally consistent with your writes. Start here, and graduate to a dedicated search engine only when you hit a specific limitation that PostgreSQL cannot address.
This section covers PostgreSQL full-text search and trigram matching, the SQLx patterns for using them from Rust, building a search UI with HTMX and Maud, and when and how to move to Meilisearch.
PostgreSQL full-text search
PostgreSQL full-text search works by converting text into tsvector (a sorted list of normalised lexemes) and matching it against a tsquery (a search predicate). The engine handles stemming, stop-word removal, and ranking.
Schema setup
Add a tsvector column to your table using GENERATED ALWAYS AS ... STORED. PostgreSQL maintains it automatically on every insert and update.
CREATE TABLE articles (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
body TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED
);
setweight assigns a weight label (A, B, C, or D) to lexemes. Title matches weighted A will rank higher than body matches weighted B when you use the ranking functions.
Create a GIN index on the column:
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
Without this index, every search query scans the full table and recomputes tsvectors. With it, PostgreSQL uses an inverted index to look up only rows containing the matching lexemes.
Building search queries
websearch_to_tsquery is the best choice for user-facing search. It accepts Google-like syntax (quoted phrases, - for exclusion, OR), and it never raises a syntax error on malformed input.
SELECT websearch_to_tsquery('english', 'rust web framework');
The @@ operator matches a tsvector against a tsquery:
SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'rust web framework')
ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'rust web framework')) DESC
LIMIT 20;
Other tsquery constructors exist for specific needs:
| Function | Behaviour |
| websearch_to_tsquery | Google-like syntax, never errors. Best for user input. |
| plainto_tsquery | Inserts & (AND) between all words. No special syntax. |
| phraseto_tsquery | Inserts <-> (adjacent) between words. For exact phrase matching. |
| to_tsquery | Requires explicit operators (&, |, !, <->). For programmatic query building. |
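To see the difference concretely, compare what each constructor produces for similar input (output shapes shown for the default english configuration; exact formatting may vary slightly by PostgreSQL version):

```sql
SELECT websearch_to_tsquery('english', '"web framework" -php');
-- 'web' <-> 'framework' & !'php'

SELECT plainto_tsquery('english', 'rust web');
-- 'rust' & 'web'

SELECT phraseto_tsquery('english', 'rust web');
-- 'rust' <-> 'web'

SELECT to_tsquery('english', 'rust & (web | http)');
-- 'rust' & ( 'web' | 'http' )
```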
Ranking results
ts_rank_cd uses cover density ranking, which rewards documents where matching terms appear close together. It generally produces better results than ts_rank for multi-term queries.
SELECT id, title,
ts_rank_cd(search_vector, query) AS rank
FROM articles, websearch_to_tsquery('english', 'rust web') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
The weights array controls how much each label contributes to the rank. The default is {0.1, 0.2, 0.4, 1.0} for D, C, B, A respectively. Override it when you need different weighting:
ts_rank_cd('{0.1, 0.2, 0.4, 1.0}', search_vector, query)
Highlighting search results
ts_headline generates a text snippet with matching terms wrapped in markers:
SELECT id, title,
ts_headline('english', body, query,
'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15, MaxFragments=2')
AS snippet
FROM articles, websearch_to_tsquery('english', 'rust web') AS query
WHERE search_vector @@ query
ORDER BY ts_rank_cd(search_vector, query) DESC
LIMIT 20;
ts_headline is expensive. It re-parses the original text for every row. Always apply it only to rows that have already been filtered and limited.
Search queries with SQLx
SQLx does not have native Rust types for tsvector or tsquery. This is not a problem in practice: keep the FTS logic in SQL, bind the search term as a String, and return only types SQLx understands.
ts_rank and ts_rank_cd return float4 (maps to f32). ts_headline returns text (maps to String). The @@ operator returns bool. All work directly with SQLx’s compile-time checked macros.
struct SearchResult {
id: i64,
title: String,
snippet: String,
rank: f32,
}
pub async fn search_articles(
pool: &PgPool,
query: &str,
limit: i64,
) -> Result<Vec<SearchResult>, sqlx::Error> {
sqlx::query_as!(
SearchResult,
r#"
SELECT
id,
title,
ts_headline('english', body, websearch_to_tsquery('english', $1),
'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15')
as "snippet!",
ts_rank_cd(search_vector, websearch_to_tsquery('english', $1))
as "rank!"
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', $1)
ORDER BY "rank!" DESC
LIMIT $2
"#,
query,
limit
)
.fetch_all(pool)
.await
}
The "snippet!" and "rank!" column aliases force SQLx to treat these as non-nullable. Without the ! suffix, the macro infers Option<String> and Option<f32> for computed columns, even though these functions never return NULL for non-null inputs.
Do not use SELECT * on tables with tsvector columns. The query! and query_as! macros will fail at compile time because SQLx has no Rust type for tsvector. Always list your columns explicitly, omitting the tsvector column or casting it with ::text if you genuinely need its contents.
pg_trgm for fuzzy matching
PostgreSQL full-text search is lexeme-exact after normalisation. If a user types “postgre” instead of “postgresql”, FTS will not match. The pg_trgm extension fills this gap with trigram-based similarity matching, providing typo tolerance that FTS lacks.
Enable the extension
Add a migration:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
pg_trgm is a contrib extension shipped with PostgreSQL but not enabled by default. The compile-time query! macros connect to your development database, so the extension must be installed there too.
Similarity search
A trigram is a sequence of three consecutive characters. Two strings are similar if they share many trigrams. The similarity function returns a score between 0.0 and 1.0:
SELECT similarity('postgresql', 'postgre');
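The scoring model is easy to reproduce. A minimal sketch in Rust of how pg_trgm derives that score (simplified: lowercase the input, pad each string with two leading and one trailing space, then take the Jaccard overlap of the distinct trigram sets):

```rust
use std::collections::HashSet;

// Simplified trigram extraction: lowercase, pad with two leading spaces
// and one trailing space, then collect the distinct 3-character windows.
fn trigrams(s: &str) -> HashSet<String> {
    let padded: Vec<char> = format!("  {} ", s.to_lowercase()).chars().collect();
    padded.windows(3).map(|w| w.iter().collect()).collect()
}

// Jaccard overlap of the two trigram sets: shared / total distinct.
fn similarity(a: &str, b: &str) -> f64 {
    let (ta, tb) = (trigrams(a), trigrams(b));
    let shared = ta.intersection(&tb).count() as f64;
    let total = ta.union(&tb).count() as f64;
    if total == 0.0 { 0.0 } else { shared / total }
}
```

Under this model, similarity("postgresql", "postgre") comes out around 0.58, comfortably above the default 0.3 threshold.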
The % operator returns true when similarity exceeds a threshold (default 0.3, configurable with SET pg_trgm.similarity_threshold):
SELECT title, similarity(title, 'postgre') AS sml
FROM articles
WHERE title % 'postgre'
ORDER BY sml DESC
LIMIT 10;
word_similarity compares a search term against substrings of a longer text. It is better suited when searching for a word within a title or sentence:
SELECT title, word_similarity('serch', title) AS sml
FROM articles
WHERE 'serch' <% title
ORDER BY sml DESC
LIMIT 10;
Indexing for trigram queries
Create a GIN index with the gin_trgm_ops operator class:
CREATE INDEX idx_articles_title_trgm ON articles USING GIN (title gin_trgm_ops);
This index supports the % operator, LIKE, ILIKE, and regex patterns. Without it, every trigram query requires a sequential scan.
If you need KNN (K-Nearest Neighbour) ordering with the <-> distance operator, use a GiST index instead:
CREATE INDEX idx_articles_title_trgm_gist ON articles USING GiST (title gist_trgm_ops);
GiST supports ORDER BY title <-> 'search term' directly, which GIN does not.
Trigram queries with SQLx
similarity and word_similarity take text inputs and return real (maps to f32). No casting workarounds needed.
struct FuzzyResult {
id: i64,
title: String,
similarity: f32,
}
pub async fn fuzzy_search(
pool: &PgPool,
query: &str,
limit: i64,
) -> Result<Vec<FuzzyResult>, sqlx::Error> {
sqlx::query_as!(
FuzzyResult,
r#"
SELECT
id,
title,
similarity(title, $1) as "similarity!"
FROM articles
WHERE title % $1
ORDER BY "similarity!" DESC
LIMIT $2
"#,
query,
limit
)
.fetch_all(pool)
.await
}
Combining FTS and trigram search
A practical search function tries full-text search first for precise, ranked results, then falls back to trigram matching for typo tolerance:
pub async fn search(
pool: &PgPool,
query: &str,
limit: i64,
) -> Result<Vec<SearchResult>, sqlx::Error> {
let results = search_articles(pool, query, limit).await?;
if results.is_empty() {
return sqlx::query_as!(
SearchResult,
r#"
SELECT
id,
title,
'' as "snippet!",
similarity(title, $1) as "rank!"
FROM articles
WHERE title % $1
ORDER BY "rank!" DESC
LIMIT $2
"#,
query,
limit
)
.fetch_all(pool)
.await;
}
Ok(results)
}
You can also combine both in a single query with a weighted score, but the fallback pattern is simpler to reason about and avoids the cost of trigram comparison on every row when FTS already produces good results.
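If you do want the single-query variant, a sketch looks like this; the 0.7/0.3 weights are illustrative, not tuned values:

```sql
SELECT id, title,
       0.7 * ts_rank_cd(search_vector, websearch_to_tsquery('english', $1))
     + 0.3 * similarity(title, $1) AS score
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', $1)
   OR title % $1
ORDER BY score DESC
LIMIT 20;
```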
Search UI with HTMX
A search interface needs a text input that sends queries as the user types, a target element where results appear, and debouncing to avoid flooding the server with requests on every keystroke. HTMX handles all of this declaratively.
fn search_input(query: &str) -> Markup {
html! {
input type="search" name="q" value=(query)
placeholder="Search articles..."
hx-get="/search"
hx-trigger="input changed delay:300ms, keyup[key=='Enter'], search"
hx-target="#search-results"
hx-sync="this:replace"
hx-replace-url="true"
hx-indicator="#search-spinner";
span #search-spinner .htmx-indicator { "Searching..." }
}
}
The trigger configuration:
input changed delay:300ms debounces: fires 300ms after the user stops typing, and only if the value actually changed.
keyup[key=='Enter'] fires immediately on Enter.
search fires when the user clicks the browser’s native clear button on <input type="search">.
hx-sync="this:replace" cancels any in-flight request and replaces it with the new one. Without this, a slow response for “ab” could arrive after a fast response for “abc” and overwrite the correct results with stale ones.
hx-replace-url="true" updates the browser URL bar to /search?q=... without creating a history entry for every keystroke. The user can copy, bookmark, or share the URL.
The results fragment
fn search_results(results: &[SearchResult]) -> Markup {
html! {
@if results.is_empty() {
p .no-results { "No articles found." }
} @else {
@for result in results {
article .search-result {
h3 {
a href={ "/articles/" (result.id) } { (result.title) }
}
p .snippet { (PreEscaped(&result.snippet)) }
}
}
}
}
}
Use PreEscaped for the snippet because ts_headline returns HTML with <mark> tags. The snippet content comes from your own database, not from user input, so this is safe.
The Axum handler
The handler serves both full page loads (direct navigation to /search?q=rust) and HTMX fragment requests (triggered by typing in the input). Detect the difference with the HX-Request header.
use axum::extract::{Query, State};
use axum::http::HeaderMap;
use axum::response::Html;
use maud::{html, Markup, PreEscaped};
#[derive(serde::Deserialize)]
pub struct SearchParams {
#[serde(default)]
q: String,
}
pub async fn search_handler(
headers: HeaderMap,
State(state): State<AppState>,
Query(params): Query<SearchParams>,
) -> Markup {
let results = if params.q.is_empty() {
vec![]
} else {
search(&state.db, ¶ms.q, 20)
.await
.unwrap_or_default()
};
let fragment = search_results(&results);
if headers.get("HX-Request").is_some() {
fragment
} else {
search_page(¶ms.q, fragment)
}
}
fn search_page(query: &str, results: Markup) -> Markup {
html! {
h1 { "Search" }
(search_input(query))
div #search-results {
(results)
}
}
}
When HTMX sends a request, the handler returns only the results fragment. When the user navigates directly to /search?q=rust, it returns the full page with the search input pre-populated and results already rendered. This makes search URLs bookmarkable and shareable.
Route setup
use axum::{routing::get, Router};
let app = Router::new()
.route("/search", get(search_handler))
.with_state(state);
When PostgreSQL search is not enough
PostgreSQL FTS handles most search requirements for content-heavy and CRUD applications. Recognise these limits so you know when to reach for a dedicated engine:
- No built-in typo tolerance. pg_trgm helps, but it works on string similarity, not search-query-level fuzzy matching. A dedicated engine like Meilisearch handles typos automatically across all indexed fields.
- No faceted search. Counting results by category, tag, or date range alongside search results requires separate GROUP BY queries. Dedicated engines provide facets as a first-class feature.
- Limited relevance tuning. ts_rank and ts_rank_cd are basic. There is no equivalent to Elasticsearch’s function scoring, decay functions, or field-level boosting beyond four weight levels (A/B/C/D).
- Performance at scale. PostgreSQL FTS works well into the millions of rows for straightforward queries. Beyond that, GIN indexes become large and slow to update, and ts_headline is CPU-intensive.
- No instant prefix matching. FTS matches complete lexemes. Searching for “rus” will not match “rust”. Dedicated engines handle prefix matching out of the box.
- No semantic matching. FTS matches words, not meaning. “How to fix a flat tire” will not find documents about “tire puncture repair”. For meaning-based retrieval, see Semantic Search.
If your application hits one or more of these limits and search is a primary user-facing feature, add Meilisearch or pgvector depending on what you need.
Meilisearch
Meilisearch is a search engine built in Rust with built-in typo tolerance, instant search, and faceted filtering. It runs as a separate service, providing a RESTful API that your application talks to via the Rust SDK.
Running Meilisearch in development
Add it to your Docker Compose file:
services:
meilisearch:
image: getmeili/meilisearch:v1.12
ports:
- "7700:7700"
environment:
MEILI_ENV: development
MEILI_MASTER_KEY: devMasterKey123
volumes:
- meili_data:/meili_data
volumes:
meili_data:
In development mode, Meilisearch exposes a web-based search preview UI at http://localhost:7700.
Rust SDK
Add the dependency:
[dependencies]
meilisearch-sdk = "0.28"
Index documents and search:
use meilisearch_sdk::client::Client;
#[derive(serde::Serialize, serde::Deserialize, Debug)]
struct Article {
id: i64,
title: String,
body: String,
}
let client = Client::new("http://localhost:7700", Some("devMasterKey123"))?;
let articles: Vec<Article> = fetch_all_articles(&pool).await?;
client.index("articles")
.add_documents(&articles, Some("id"))
.await?;
let results = client.index("articles")
.search()
.with_query("rrust web framwork")
.with_limit(20)
.execute::<Article>()
.await?;
Keeping the index in sync
PostgreSQL remains the source of truth. Meilisearch is a derived, read-optimised search layer. The simplest sync strategy is application-level dual write with a periodic full resync as a safety net.
Dual write: when your application inserts or updates an article in PostgreSQL, also push the document to Meilisearch:
pub async fn create_article(
pool: &PgPool,
meili: &Client,
title: &str,
body: &str,
) -> Result<Article, AppError> {
let article = sqlx::query_as!(
Article,
"INSERT INTO articles (title, body) VALUES ($1, $2) RETURNING id, title, body",
title, body
)
.fetch_one(pool)
.await?;
meili.index("articles")
.add_documents(&[&article], Some("id"))
.await?;
Ok(article)
}
Periodic resync: a background task queries PostgreSQL for rows modified since the last sync (using an updated_at column) and pushes them to Meilisearch. Run this every 30-60 seconds. It catches any drift caused by failed dual writes.
If the Meilisearch write fails, the search index is temporarily stale but the database is correct. Design your application to tolerate this eventual consistency.
When to use Meilisearch
Add Meilisearch when search is a primary user-facing feature and you need:
- Automatic typo tolerance across all indexed fields
- Faceted search and filtering
- Instant prefix matching (results as the user types each character)
- Relevance ranking that works well out of the box without manual tuning
Accept the operational cost: a separate service to run, a sync strategy to maintain, and eventual consistency between your database and search index.
tantivy
tantivy is an embedded full-text search library for Rust. Think of it as Lucene for Rust: you link it into your application directly, with no separate process or HTTP API. It provides BM25 scoring, configurable tokenisers with stemming support for 17 languages, phrase queries, and faceted search.
tantivy is a good fit when you need more powerful search than PostgreSQL FTS but want to avoid adding infrastructure. The index lives in your application process, so there is no sync problem and no network hop. The trade-off is that you manage the index lifecycle yourself, and the index writer holds an exclusive lock, which limits it to a single-process deployment (or requires designating one process as the indexer).
tantivy does not provide built-in typo tolerance. If you need automatic fuzzy matching, Meilisearch is a better choice.
Gotchas
websearch_to_tsquery never errors; to_tsquery does. Use websearch_to_tsquery or plainto_tsquery for user-facing search. to_tsquery requires valid operator syntax and will return a SQL error on malformed input like unbalanced parentheses.
Generated column expressions must be immutable. to_tsvector('english', title) with a string literal regconfig is immutable. If the language configuration comes from another column, you need a trigger instead of a generated column.
The pg_trgm extension must be installed in your development database. The query! macros connect at compile time. If the extension is missing, any query using similarity(), %, or related operators will fail to compile.
ts_headline on large result sets is slow. Always filter and limit rows before applying ts_headline. Never call it on the full table.
sqlx prepare needs extensions too. If you use cargo sqlx prepare for offline compilation in CI, the database used for preparation must have pg_trgm installed and the schema fully migrated.
Semantic Search
PostgreSQL full-text search matches words. Semantic search matches meaning. A user searching for “how to fix a flat tire” finds documents about “tire puncture repair” even though no words overlap. This is possible because text is converted into high-dimensional vectors (embeddings) that encode meaning, and similar meanings produce similar vectors.
pgvector adds vector similarity search to PostgreSQL. It introduces a native vector column type with distance operators and index support. If you already run PostgreSQL, adding semantic search requires an extension, not a new service. Your embeddings live alongside your relational data, with full ACID guarantees and SQL for filtering.
This section covers pgvector setup, generating embeddings with a local model, storing and querying vectors from Rust with SQLx, and combining vector similarity with full-text search for hybrid retrieval. For building complete RAG pipelines that feed retrieved context to an LLM, see Retrieval-Augmented Generation in the AI and LLM Integration section.
pgvector setup
Enable the extension
Add a migration:
CREATE EXTENSION IF NOT EXISTS vector;
The vector type is provided by the pgvector extension. It is not bundled with the standard postgres Docker image; use the pgvector/pgvector image or install the extension into your own image. Cloud-managed PostgreSQL services (AWS RDS, Supabase, Neon) include it.
Schema
Add a vector column sized to match your embedding model’s output dimensions. The example below uses 768 dimensions, which matches nomic-embed-text.
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(768),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Unlike the tsvector column used for full-text search, embeddings cannot be a GENERATED ALWAYS column. Generating an embedding requires calling an external model, which PostgreSQL cannot do in a column expression. Your application generates the embedding and writes it alongside the content.
Indexing
Create an HNSW (Hierarchical Navigable Small World) index for approximate nearest neighbour search:
CREATE INDEX idx_documents_embedding ON documents
USING hnsw (embedding vector_cosine_ops);
HNSW is the recommended index type. It provides logarithmic search time and handles data updates without degrading recall. The alternative, IVFFlat, builds faster and uses less space, but its recall degrades as data changes because cluster centroids are not recalculated.
Without an index, pgvector performs exact nearest neighbour search via sequential scan. This is fine for small datasets (under ~100K vectors) but does not scale.
The vector_cosine_ops operator class matches cosine distance (<=>), which is the right choice for text embeddings. Other operator classes exist for L2 distance (vector_l2_ops), inner product (vector_ip_ops), and others.
Tuning index parameters
HNSW accepts two build-time parameters:
CREATE INDEX idx_documents_embedding ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
m controls the maximum number of connections per node (default 16). Higher values improve recall but increase index size and build time.
ef_construction controls the search breadth during index building (default 64). Higher values produce a better quality index at the cost of slower builds.
At query time, hnsw.ef_search controls how many nodes the search visits (default 40). Increase it when you need higher recall:
SET hnsw.ef_search = 100;
The defaults work well for most workloads. Benchmark against your actual data before changing them.
Generating embeddings
An embedding model converts text into a fixed-size vector. You need one to populate the embedding column and to convert search queries into vectors at query time.
Local embeddings with Ollama
Ollama runs embedding models locally. It serves an HTTP API compatible with the OpenAI embeddings endpoint, so any client that speaks that protocol works.
Pull an embedding model:
ollama pull nomic-embed-text
nomic-embed-text produces 768-dimension vectors, supports 8,192 token context, and runs on commodity hardware. It scores competitively with commercial APIs on retrieval benchmarks.
Generate an embedding via the API:
curl http://localhost:11434/api/embed -d '{
"model": "nomic-embed-text",
"input": "How to handle errors in Rust web applications"
}'
The response includes an embeddings array containing one vector per input string.
Calling Ollama from Rust
Ollama’s /api/embed endpoint accepts JSON and returns JSON. Use reqwest directly:
use reqwest::Client;
use serde::{Deserialize, Serialize};
#[derive(Serialize)]
struct EmbedRequest {
model: String,
input: Vec<String>,
}
#[derive(Deserialize)]
struct EmbedResponse {
embeddings: Vec<Vec<f32>>,
}
pub async fn generate_embeddings(
client: &Client,
ollama_url: &str,
texts: &[&str],
) -> Result<Vec<Vec<f32>>, reqwest::Error> {
let response: EmbedResponse = client
.post(format!("{}/api/embed", ollama_url))
.json(&EmbedRequest {
model: "nomic-embed-text".to_string(),
input: texts.iter().map(|s| s.to_string()).collect(),
})
.send()
.await?
.json()
.await?;
Ok(response.embeddings)
}
Batch multiple texts in a single request. Ollama processes them together, which is faster than one request per text.
OpenAI as an alternative
If you prefer a hosted API, OpenAI’s text-embedding-3-small produces 1,536-dimension vectors at $0.02 per million tokens. Change the vector(768) column to vector(1536), swap the model name, and point the request at https://api.openai.com/v1/embeddings with a bearer token. The query patterns in this section work the same regardless of how the embedding was generated.
Storing and querying vectors with SQLx
The pgvector crate
The pgvector crate provides a Vector type that implements SQLx’s Encode and Decode traits.
[dependencies]
pgvector = { version = "0.4", features = ["sqlx"] }
Inserting documents with embeddings
use pgvector::Vector;
use sqlx::PgPool;
pub async fn insert_document(
pool: &PgPool,
title: &str,
content: &str,
embedding: Vec<f32>,
) -> Result<i64, sqlx::Error> {
let embedding = Vector::from(embedding);
sqlx::query_scalar!(
r#"
INSERT INTO documents (title, content, embedding)
VALUES ($1, $2, $3)
RETURNING id
"#,
title,
content,
embedding as _
)
.fetch_one(pool)
.await
}
The as _ cast tells SQLx to use the pgvector crate’s Encode implementation rather than trying to infer a type mapping for the vector column.
Similarity search
The <=> operator computes cosine distance. Lower distance means higher similarity. Order by distance ascending to get the most similar results first.
struct SimilarDocument {
id: i64,
title: String,
content: String,
similarity: f64,
}
pub async fn semantic_search(
pool: &PgPool,
query_embedding: Vec<f32>,
limit: i64,
) -> Result<Vec<SimilarDocument>, sqlx::Error> {
let embedding = Vector::from(query_embedding);
sqlx::query_as!(
SimilarDocument,
r#"
SELECT
id,
title,
content,
1 - (embedding <=> $1) as "similarity!"
FROM documents
ORDER BY embedding <=> $1
LIMIT $2
"#,
embedding as _,
limit
)
.fetch_all(pool)
.await
}
1 - cosine_distance converts the distance into a similarity score between 0.0 and 1.0, where 1.0 is identical.
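What the operator computes is easy to state in plain Rust. A minimal sketch of cosine distance as pgvector defines it (1 minus the normalised dot product), with the similarity conversion used in the query above:

```rust
// Cosine distance as defined by pgvector's <=> operator:
// 1 - (a · b) / (|a| * |b|).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

// The score returned by the query is then 1 - distance.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    1.0 - cosine_distance(a, b)
}
```

Identical vectors give distance 0.0 (similarity 1.0); orthogonal vectors give distance 1.0.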
Filtered similarity search
Combine vector similarity with standard SQL filtering. pgvector’s HNSW index supports iterative scans (v0.8.0+), so filtered queries return the expected number of results even when the filter is selective. The example below assumes documents has a category column:
pub async fn search_by_category(
pool: &PgPool,
query_embedding: Vec<f32>,
category: &str,
limit: i64,
) -> Result<Vec<SimilarDocument>, sqlx::Error> {
let embedding = Vector::from(query_embedding);
sqlx::query_as!(
SimilarDocument,
r#"
SELECT
id,
title,
content,
1 - (embedding <=> $1) as "similarity!"
FROM documents
WHERE category = $2
ORDER BY embedding <=> $1
LIMIT $3
"#,
embedding as _,
category,
limit
)
.fetch_all(pool)
.await
}
Hybrid search
Vector similarity alone achieves roughly 62% retrieval precision. Combining it with full-text search using Reciprocal Rank Fusion (RRF) pushes this to roughly 84%. RRF merges two ranked result lists by converting ranks into scores and summing them, so a document that ranks well in both lists scores highest.
Schema for hybrid search
A table that supports both search strategies needs a tsvector column for FTS and a vector column for semantic search:
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(768),
search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(content, '')), 'B')
) STORED,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_documents_embedding ON documents
USING hnsw (embedding vector_cosine_ops);
CREATE INDEX idx_documents_search ON documents
USING gin (search_vector);
The hybrid search query
Run both search strategies, rank each result set independently, then merge with RRF:
WITH semantic AS (
SELECT id, title, content,
row_number() OVER (ORDER BY embedding <=> $1) AS rank
FROM documents
ORDER BY embedding <=> $1
LIMIT $3
),
fulltext AS (
SELECT id, title, content,
row_number() OVER (
ORDER BY ts_rank_cd(search_vector,
websearch_to_tsquery('english', $2)) DESC
) AS rank
FROM documents
WHERE search_vector @@ websearch_to_tsquery('english', $2)
LIMIT $3
),
combined AS (
SELECT id, title, content, rank, 'semantic' AS source FROM semantic
UNION ALL
SELECT id, title, content, rank, 'fulltext' AS source FROM fulltext
)
SELECT id, title, content,
sum(1.0 / (50 + rank)) AS score
FROM combined
GROUP BY id, title, content
ORDER BY score DESC
LIMIT $3;
The constant 50 in the RRF formula (1.0 / (50 + rank)) is a smoothing parameter: it prevents top-ranked results from dominating excessively. The original RRF paper used k = 60; values in that neighbourhood, including the 50 here, behave similarly.
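To see how the fusion behaves outside SQL, here is a minimal in-memory sketch of the same RRF merge. The document IDs and ranks are made up for illustration; the function is not part of the application code above.

```rust
use std::collections::HashMap;

/// Merge two ranked lists of document IDs with Reciprocal Rank Fusion.
/// `k` is the smoothing constant (50 in the SQL above).
fn rrf_merge(semantic: &[u64], fulltext: &[u64], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in [semantic, fulltext] {
        for (i, &id) in list.iter().enumerate() {
            // Ranks are 1-based, matching row_number() in the SQL.
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut ranked: Vec<(u64, f64)> = scores.into_iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}

fn main() {
    // Document 7 tops both lists, so it wins outright. Document 3 appears
    // in both lists (ranks 2 and 3), so it beats document 5, which ranks
    // 2 in one list but is absent from the other.
    let ranked = rrf_merge(&[7, 3, 1], &[7, 5, 3], 50.0);
    assert_eq!(ranked[0].0, 7);
    assert_eq!(ranked[1].0, 3);
}
```

This is the core property of RRF: presence in both result lists compounds, so documents that both search strategies agree on rise to the top.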
Hybrid search in Rust
struct HybridResult {
id: i64,
title: String,
content: String,
score: f64,
}
pub async fn hybrid_search(
pool: &PgPool,
query_embedding: Vec<f32>,
query_text: &str,
limit: i64,
) -> Result<Vec<HybridResult>, sqlx::Error> {
let embedding = Vector::from(query_embedding);
sqlx::query_as!(
HybridResult,
r#"
WITH semantic AS (
SELECT id, title, content,
row_number() OVER (ORDER BY embedding <=> $1) AS rank
FROM documents
ORDER BY embedding <=> $1
LIMIT $3
),
fulltext AS (
SELECT id, title, content,
row_number() OVER (
ORDER BY ts_rank_cd(search_vector,
websearch_to_tsquery('english', $2)) DESC
) AS rank
FROM documents
WHERE search_vector @@ websearch_to_tsquery('english', $2)
LIMIT $3
),
combined AS (
SELECT id, title, content, rank FROM semantic
UNION ALL
SELECT id, title, content, rank FROM fulltext
)
SELECT
id as "id!",
title as "title!",
content as "content!",
sum(1.0 / (50 + rank))::float8 as "score!"
FROM combined
GROUP BY id, title, content
ORDER BY score DESC
LIMIT $3
"#,
embedding as _,
query_text,
limit
)
.fetch_all(pool)
.await
}
The caller generates an embedding from the query text, then passes both the embedding and the raw text. The embedding drives the semantic branch; the raw text drives the FTS branch.
pub async fn search(
pool: &PgPool,
http_client: &reqwest::Client,
ollama_url: &str,
query: &str,
limit: i64,
) -> Result<Vec<HybridResult>, sqlx::Error> {
let embeddings = generate_embeddings(http_client, ollama_url, &[query])
.await
.map_err(|e| sqlx::Error::Protocol(e.to_string()))?;
let embedding = embeddings
.into_iter()
.next()
.ok_or_else(|| sqlx::Error::Protocol("embedding service returned no vectors".into()))?;
hybrid_search(pool, embedding, query, limit).await
}
When to use semantic search
Add pgvector when your application needs to match by meaning rather than keywords:
- Knowledge base search. Users describe problems in their own words; documents use different terminology.
- Recommendation. “Show me articles similar to this one” is a single vector distance query.
- RAG retrieval. An LLM needs relevant context from your data to generate grounded answers. See Retrieval-Augmented Generation in the AI and LLM Integration section.
- Classification and clustering. Group documents by semantic similarity without manual tagging.
Stick with full-text search when exact keyword matching, boolean queries, or phrase search are what users expect. The two approaches complement each other, as the hybrid search pattern above demonstrates.
pgvector vs dedicated vector databases
pgvector handles up to a few million vectors comfortably. Beyond that, index builds become slow and memory-intensive. Dedicated vector databases (Qdrant, Weaviate, Pinecone) are built for horizontal scaling to billions of vectors.
For most content-heavy and CRUD web applications, pgvector is the right choice. Your embeddings share a database with the data they describe, transactions keep them consistent, and there is no sync pipeline to maintain. The same reasoning that makes PostgreSQL FTS the right starting point for keyword search applies here: start with what you have, and graduate to a dedicated service only when you hit a specific limitation.
Gotchas
The vector type has an index dimension limit. HNSW and IVFFlat indexes support at most 2,000 dimensions for vector and 4,000 for halfvec (the column types can store more, but unindexed columns cannot be searched efficiently). Most embedding models produce 768 or 1,536 dimensions, which fit comfortably. OpenAI’s text-embedding-3-large at 3,072 dimensions exceeds the index limit — reduce it to 1,536 via the API’s dimensions parameter.
Embeddings are not free to generate. Every document insert or update requires an embedding model call. For bulk imports, batch the embedding requests. For Ollama, send multiple texts in a single /api/embed request.
HNSW index builds can spike memory. Building an HNSW index on a large table may consume significant memory. For tables with millions of rows, build the index during a maintenance window and monitor resource usage.
IVFFlat recall degrades silently. If you use IVFFlat instead of HNSW, recall drops as your data changes because cluster centroids are not recalculated. Rebuild the index periodically or use HNSW.
SELECT * fails with vector columns in query_as!. Just as with tsvector columns, SQLx’s compile-time macros need explicit column lists. List your columns explicitly, omitting or casting the embedding column unless you need its contents.
The extension must be installed in your development database. SQLx’s compile-time query! macros connect to the database during compilation. The vector extension must be enabled there. The same applies to cargo sqlx prepare for offline compilation in CI.
Auth & Security
Authentication
Session-based authentication fits naturally into a hypermedia-driven architecture. The server manages all auth state. The browser sends a cookie. No client-side token management, no JWT parsing in JavaScript, no OAuth dance in the browser. The server decides who the user is, renders the appropriate HTML, and sends it.
This section builds authentication with tower-sessions for session management, argon2 for password hashing, and tower-csrf for cross-site request forgery protection. PostgreSQL stores both user records and session data via tower-sessions-sqlx-store.
Dependencies
[dependencies]
tower-sessions = "0.14"
tower-sessions-sqlx-store = { version = "0.15", features = ["postgres"] }
tower-csrf = "0.1"
argon2 = "0.5"
sqlx = { version = "0.8", features = ["runtime-tokio", "postgres", "time", "uuid"] }
time = "0.3"
uuid = { version = "1", features = ["v4", "serde"] }
tower-sessions provides the session middleware layer. tower-sessions-sqlx-store backs it with PostgreSQL so sessions survive server restarts. argon2 handles password hashing with Argon2id, OWASP’s primary recommendation. tower-csrf protects state-changing requests from cross-site forgery.
Note the version pairing: tower-sessions 0.14 and tower-sessions-sqlx-store 0.15 are compatible through their shared dependency on tower-sessions-core 0.14. Check both crates for newer matching releases.
Password hashing
Argon2id is memory-hard and CPU-hard, which makes brute-force attacks expensive even with GPUs. The argon2 crate provides a pure-Rust implementation.
Passwords are stored as PHC-format strings. The algorithm, version, and parameters are embedded alongside the hash, making the value self-describing:
$argon2id$v=19$m=65536,t=2,p=1$<salt>$<hash>
This means you can change hashing parameters over time without breaking verification of existing hashes. During verification, the argon2 crate reads parameters from the stored hash, not from the Argon2 instance.
use argon2::{
password_hash::{
rand_core::OsRng, PasswordHash, PasswordHasher, PasswordVerifier, SaltString,
},
Algorithm, Argon2, Params, Version,
};
fn build_hasher() -> Argon2<'static> {
let params = Params::new(
64 * 1024,
2,
1,
None,
)
.expect("valid argon2 params");
Argon2::new(Algorithm::Argon2id, Version::V0x13, params)
}
fn hash_password(password: &str) -> Result<String, argon2::password_hash::Error> {
let salt = SaltString::generate(&mut OsRng);
let hash = build_hasher().hash_password(password.as_bytes(), &salt)?;
Ok(hash.to_string())
}
fn verify_password(
password: &str,
stored_hash: &str,
) -> Result<(), argon2::password_hash::Error> {
let parsed = PasswordHash::new(stored_hash)?;
Argon2::default().verify_password(password.as_bytes(), &parsed)
}
SaltString::generate(&mut OsRng) produces a cryptographically random salt using the OS random number generator. The build_hasher function configures Argon2id with 64 MiB of memory, which is a reasonable starting point. Argon2::default() uses 19 MiB (the OWASP floor), but the recommendation is 64 MiB or higher if your server can handle it. Tune the memory parameter upward until hashing takes roughly 200ms on your production hardware.
The verify_password function uses Argon2::default() because it reads parameters from the stored hash, not from the instance. This means old hashes created with different parameters continue to verify correctly.
Peppering
A pepper is a secret key stored only in the application server, never in the database. If the database leaks but the application server is not compromised, the pepper makes the stolen hashes unverifiable. Argon2 has a built-in secret parameter for this:
fn build_hasher_with_pepper(pepper: &[u8]) -> Argon2<'_> {
let params = Params::new(64 * 1024, 2, 1, None).expect("valid argon2 params");
Argon2::new_with_secret(pepper, Algorithm::Argon2id, Version::V0x13, params)
.expect("valid argon2 secret")
}
Generate the pepper once (32 random bytes from a CSPRNG), store it as an environment variable or in a secrets manager, and load it at application startup. If the pepper is lost, all password hashes become unverifiable and every user must reset their password. Treat it with the same care as a database encryption key.
If you want a simpler API, the password-auth crate wraps argon2 with two functions (generate_hash, verify_password) and provides is_hash_obsolete() for detecting when stored hashes should be re-hashed with newer parameters. The lower-level API shown here gives more control when you need it.
Async context
Argon2 hashing is CPU-intensive. A single hash takes 50-200ms depending on hardware. Running it directly in an async handler blocks the tokio worker thread and starves other requests. Always offload to the blocking thread pool:
use tokio::task;
async fn hash_password_async(password: String) -> Result<String, anyhow::Error> {
task::spawn_blocking(move || hash_password(&password))
.await?
.map_err(Into::into)
}
async fn verify_password_async(
password: String,
stored_hash: String,
) -> Result<(), anyhow::Error> {
task::spawn_blocking(move || verify_password(&password, &stored_hash))
.await?
.map_err(Into::into)
}
The closure takes owned String values because spawn_blocking requires 'static. This moves the work to tokio’s dedicated blocking thread pool (separate from the async worker threads), keeping the async runtime responsive.
Password validation
Enforce constraints before hashing:
- Minimum length: 10 characters. Shorter passwords are too easy to brute-force.
- Maximum length: 128 characters. Without a maximum, an attacker can submit multi-megabyte passwords to exhaust server resources through expensive hashing.
- Unicode normalisation: Apply NFKC normalisation before hashing. Different systems represent the same characters differently, which causes cross-platform login failures. The unicode-normalization crate handles this.
For password quality checking, the zxcvbn crate (a Rust port of Dropbox’s password strength estimator) catches common and weak passwords without maintaining a separate banned-password list.
Session layer
Set up a PostgreSQL-backed session store, run its migration to create the session table, and start a background task to clean up expired sessions.
use axum::Router;
use sqlx::PgPool;
use tower_sessions::{Expiry, SessionManagerLayer};
use tower_sessions_sqlx_store::PostgresStore;
use time::Duration;
async fn session_layer(pool: PgPool) -> SessionManagerLayer<PostgresStore> {
let store = PostgresStore::new(pool);
store.migrate().await.expect("session table migration failed");
tokio::task::spawn(
store
.clone()
.continuously_delete_expired(tokio::time::Duration::from_secs(60)),
);
SessionManagerLayer::new(store)
.with_secure(true)
.with_expiry(Expiry::OnInactivity(Duration::hours(24)))
}
PostgresStore::migrate() creates a tower_sessions schema with a session table (columns: id TEXT, data BYTEA, expiry_date TIMESTAMPTZ). The continuously_delete_expired task runs in the background, removing sessions that have passed their expiry date.
Cookie configuration
SessionManagerLayer configures the session cookie through builder methods:
with_secure(true) sets the Secure flag so the cookie is only sent over HTTPS. Always enable this in production.
with_http_only(true) is the default. The cookie is inaccessible to JavaScript, protecting against XSS-based session theft.
with_same_site(SameSite::Lax) is the default. Cookies are sent on top-level navigations but not on cross-site subrequests. Combined with CSRF protection, this is sufficient for most applications. Use SameSite::Strict for high-security applications, with the trade-off that users clicking links to your site from email will appear logged out on first load.
Expiry options
tower-sessions supports three expiry strategies:
Expiry::OnInactivity(Duration) resets the expiration on each request. A sliding window. Good for most applications.
Expiry::AtDateTime(OffsetDateTime) sets a fixed expiration. The session expires at that time regardless of activity.
Expiry::OnSessionEnd creates a browser session cookie with no Max-Age. The cookie is deleted when the browser closes.
The default when no expiry is set is two weeks. For applications handling sensitive data, consider shorter windows (1-24 hours) and requiring re-authentication for high-risk actions.
Layer ordering
Apply the session layer as the outermost middleware so sessions are available to all inner layers and handlers:
let app = Router::new()
.route("/register", get(show_register).post(handle_register))
.route("/login", get(show_login).post(handle_login))
.route("/logout", post(handle_logout))
.layer(csrf_layer)
.layer(session_layer(pool).await);
In Axum, the last .layer() call is the outermost layer and processes requests first. Here, the session layer processes first (loads the session from the cookie), then the CSRF layer checks the request origin, then the handler runs.
User table
Create a migration for the users table:
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT UNIQUE NOT NULL,
email_confirmed_at TIMESTAMPTZ,
password_hash TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
The corresponding Rust struct:
use sqlx::types::time::OffsetDateTime;
use uuid::Uuid;
#[derive(Debug, Clone, sqlx::FromRow)]
pub struct User {
pub id: Uuid,
pub email: String,
pub email_confirmed_at: Option<OffsetDateTime>,
pub password_hash: String,
pub created_at: OffsetDateTime,
pub updated_at: OffsetDateTime,
}
Registration
The registration handler validates input, hashes the password, and creates the user. It does not reveal whether an email is already taken, to prevent account enumeration.
use axum::{extract::State, response::IntoResponse, Form};
use maud::{html, Markup};
#[derive(serde::Deserialize)]
struct RegisterForm {
email: String,
password: String,
password_confirmation: String,
}
async fn show_register() -> Markup {
html! {
h1 { "Create an account" }
form method="post" action="/register" {
label for="email" { "Email" }
input type="email" name="email" id="email" required;
label for="password" { "Password" }
input type="password" name="password" id="password"
required minlength="10" maxlength="128"
autocomplete="new-password";
label for="password_confirmation" { "Confirm password" }
input type="password" name="password_confirmation"
id="password_confirmation" required
autocomplete="new-password";
button type="submit" { "Register" }
}
}
}
async fn handle_register(
State(state): State<AppState>,
Form(form): Form<RegisterForm>,
) -> impl IntoResponse {
if form.password != form.password_confirmation {
return show_error("Passwords do not match").into_response();
}
if form.password.len() < 10 || form.password.len() > 128 {
return show_error("Password must be 10 to 128 characters").into_response();
}
let password_hash = match hash_password_async(form.password).await {
Ok(hash) => hash,
Err(_) => return show_error("Registration failed").into_response(),
};
// The insert result is intentionally ignored: the response below is
// identical whether the email was new or already registered.
let _ = sqlx::query(
"INSERT INTO users (email, password_hash) \
VALUES ($1, $2) ON CONFLICT (email) DO NOTHING",
)
.bind(&form.email)
.bind(&password_hash)
.execute(&state.db)
.await;
html! {
h1 { "Check your email" }
p { "If this email can be used for an account, you will receive further instructions." }
}
.into_response()
}
The ON CONFLICT (email) DO NOTHING query combined with a uniform response prevents attackers from probing which emails have accounts. The autocomplete="new-password" attribute tells password managers this is a registration form.
Login
The login handler verifies the password against the stored hash, creates a session, and cycles the session ID to prevent fixation attacks.
use axum::response::Redirect;
use tower_sessions::Session;
#[derive(serde::Deserialize)]
struct LoginForm {
email: String,
password: String,
}
async fn handle_login(
session: Session,
State(state): State<AppState>,
Form(form): Form<LoginForm>,
) -> impl IntoResponse {
let user: Option<User> = sqlx::query_as("SELECT * FROM users WHERE email = $1")
.bind(&form.email)
.fetch_optional(&state.db)
.await
.unwrap_or(None);
let Some(user) = user else {
let _ = hash_password_async("dummy-password".to_string()).await;
return show_login_error("Invalid email or password").into_response();
};
if verify_password_async(form.password, user.password_hash.clone())
.await
.is_err()
{
return show_login_error("Invalid email or password").into_response();
}
session.cycle_id().await.expect("failed to cycle session ID");
session
.insert("user_id", user.id)
.await
.expect("failed to insert session data");
Redirect::to("/").into_response()
}
Three security details matter here:
Timing attack prevention. When no user is found, a dummy hash_password_async call runs so the response time is similar regardless of whether the email exists. Without this, an attacker can distinguish “email not found” from “wrong password” by measuring response latency.
Session fixation prevention. session.cycle_id() generates a new session ID while preserving session data. Without this, an attacker who planted a known session ID (via a crafted link or subdomain cookie injection) could hijack the authenticated session.
Post-login redirect validation. If you add a ?next= parameter so users return to the page they were visiting before login, validate the target strictly. Allow only relative paths. Reject absolute URLs, URLs with different schemes or hosts, and URLs with embedded credentials. Without validation, an attacker can craft https://yoursite.com/login?next=https://evil.com, and the user sees a legitimate login page that redirects to a phishing site after authentication.
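A strict next-parameter check can operate on the raw string before redirecting. This is a sketch under a deliberately conservative policy (it also rejects paths containing a colon, which over-blocks some legitimate query strings but rules out every scheme-based trick):

```rust
/// Accept only same-site relative paths as post-login redirect targets.
/// Rejects absolute URLs ("https://evil.com"), scheme-relative URLs
/// ("//evil.com"), backslash variants, and anything with a scheme separator.
pub fn is_safe_redirect(next: &str) -> bool {
    next.starts_with('/')
        && !next.starts_with("//")
        && !next.starts_with("/\\")
        && !next.contains(':')
}

fn main() {
    assert!(is_safe_redirect("/dashboard"));
    assert!(!is_safe_redirect("https://evil.com"));
    assert!(!is_safe_redirect("//evil.com"));
}
```

If validation fails, fall back to redirecting to "/" rather than rejecting the login.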
The error message is the same for both “user not found” and “wrong password”. Never reveal which one failed.
Rate limiting
Without rate limiting, the login endpoint is vulnerable to brute-force and credential stuffing attacks. Apply limits at two levels:
- Per-account: Lock the account after a threshold of failed attempts (for example, 10). Unlock after a cooldown period (15 minutes) or via email. This stops targeted attacks against a single user.
- Per-IP: Apply a sliding window limit (for example, 20 attempts per minute per IP). Return HTTP 429 with a Retry-After header. This slows distributed scanning.
Per-account limiting is the primary defence. Per-IP limiting alone is insufficient because botnets rotate IP addresses.
For Axum, tower_governor provides a Tower-compatible rate limiting layer based on the governor crate. Apply it to your auth routes:
use tower_governor::{GovernorConfig, GovernorLayer};
// Recent tower_governor versions take the config behind an Arc so the
// layer can be cloned per connection.
let governor_config = std::sync::Arc::new(GovernorConfig::default());
let governor_layer = GovernorLayer {
config: governor_config,
};
let auth_routes = Router::new()
.route("/login", get(show_login).post(handle_login))
.route("/register", get(show_register).post(handle_register))
.layer(governor_layer);
This handles per-IP limiting. For per-account lockout, track failed attempts in a database column or a Redis counter keyed by email, and check it before verifying the password.
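One way to hold the per-account lockout state is two extra columns on the users table. A sketch, with illustrative column names and the thresholds from above (10 attempts, 15-minute cooldown):

```sql
ALTER TABLE users
    ADD COLUMN failed_login_attempts INT NOT NULL DEFAULT 0,
    ADD COLUMN locked_until TIMESTAMPTZ;

-- On a failed attempt: increment, and lock for 15 minutes at the threshold.
UPDATE users
SET failed_login_attempts = failed_login_attempts + 1,
    locked_until = CASE
        WHEN failed_login_attempts + 1 >= 10
        THEN now() + interval '15 minutes'
        ELSE locked_until
    END
WHERE email = $1;

-- On a successful login: reset the counter.
UPDATE users
SET failed_login_attempts = 0, locked_until = NULL
WHERE id = $1;
```

Before verifying the password, check that locked_until IS NULL OR locked_until < now(), and return the same "Invalid email or password" message for locked accounts so the lockout itself does not leak account existence.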
Logout
Destroy the session and redirect. Protect logout with a POST request, not GET, so cross-site <img> tags or link prefetching cannot force a logout.
async fn handle_logout(session: Session) -> impl IntoResponse {
session.flush().await.expect("failed to flush session");
Redirect::to("/login")
}
session.flush() clears all session data, deletes the record from the database, and nullifies the session cookie.
Requiring authentication
Build an Axum extractor that loads the authenticated user from the session. Use this wherever a handler needs the current user.
use axum::{
extract::FromRequestParts,
http::{request::Parts, StatusCode},
};
pub struct AuthUser(pub User);
impl<S: Send + Sync> FromRequestParts<S> for AuthUser {
type Rejection = StatusCode;
async fn from_request_parts(
parts: &mut Parts,
state: &S,
) -> Result<Self, Self::Rejection> {
let session = Session::from_request_parts(parts, state)
.await
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
let user_id: Uuid = session
.get("user_id")
.await
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
.ok_or(StatusCode::UNAUTHORIZED)?;
let pool = parts
.extensions
.get::<PgPool>()
.ok_or(StatusCode::INTERNAL_SERVER_ERROR)?;
let user: User = sqlx::query_as("SELECT * FROM users WHERE id = $1")
.bind(user_id)
.fetch_optional(pool)
.await
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
.ok_or(StatusCode::UNAUTHORIZED)?;
Ok(AuthUser(user))
}
}
Handlers that need authentication add AuthUser as a parameter. If no valid session exists, the request returns 401 before the handler body runs:
async fn dashboard(AuthUser(user): AuthUser) -> Markup {
html! {
h1 { "Welcome, " (user.email) }
}
}
For the extractor to access the database pool, add it to request extensions via middleware, or make the extractor generic over your AppState. The approach depends on how you structure shared state; see Web Server with Axum.
CSRF protection
Cross-site request forgery tricks a logged-in user’s browser into making unintended requests to your application. Traditional defences embed hidden tokens in forms. A simpler approach validates the request origin using headers the browser sends automatically.
tower-csrf implements this origin-based approach, inspired by Filippo Valsorda’s analysis of CSRF and the defence built into Go 1.25’s net/http. Instead of managing tokens, it checks the Sec-Fetch-Site and Origin headers. Modern browsers (all major browsers since 2023) send Sec-Fetch-Site: same-origin for same-site requests. Cross-origin requests are blocked. Safe methods (GET, HEAD, OPTIONS) are allowed unconditionally.
use axum::{
error_handling::HandleErrorLayer,
http::StatusCode,
response::IntoResponse,
};
use tower::ServiceBuilder;
use tower_csrf::{CrossOriginProtectionLayer, ProtectionError};
let csrf_layer = ServiceBuilder::new()
.layer(HandleErrorLayer::new(
|error: Box<dyn std::error::Error + Send + Sync>| async move {
if error.downcast_ref::<ProtectionError>().is_some() {
(StatusCode::FORBIDDEN, "Cross-origin request blocked").into_response()
} else {
StatusCode::INTERNAL_SERVER_ERROR.into_response()
}
},
))
.layer(CrossOriginProtectionLayer::default());
No hidden form fields. No hx-headers configuration for htmx. Same-origin requests pass automatically because the browser attests to the origin. This is a clean fit for HDA applications where every form submission and htmx request originates from the same domain.
If you need to accept cross-origin requests from specific origins (SSO callbacks, webhooks), add them explicitly:
let csrf = CrossOriginProtectionLayer::default()
.add_trusted_origin("https://sso.example.com")
.expect("valid origin URL");
For the full argument behind origin-based CSRF validation and why token-based CSRF is unnecessary in modern browsers, read Filippo Valsorda’s analysis.
If you need to support browsers that do not send Sec-Fetch-Site headers (pre-2023), or you prefer a traditional token-based approach, axum_csrf provides a double-submit cookie pattern compatible with Axum 0.8.
Email confirmation
Confirm email addresses before activating accounts. Without confirmation, anyone can register with someone else’s email, and your application sends unwanted messages to non-users.
The flow uses a split token pattern: a 16-byte identifier for database lookup and a 16-byte verifier for constant-time comparison. Store the SHA-256 hash of the verifier, never the verifier itself. If the database leaks, attackers cannot reconstruct valid confirmation links.
Flow
- On registration, generate an identifier (16 random bytes) and a verifier (16 random bytes) using a CSPRNG (OsRng).
- Store in a confirmations table: identifier (indexed), SHA-256(verifier), user ID, expiration (24-48 hours), and action type (email_confirmation).
- Base64url-encode the concatenated identifier + verifier into a link: https://example.com/confirm?token=<encoded>.
- Send the link via email. See Email for sending with Lettre and testing with MailCrab.
- When the user clicks the link, require an active session (the user must be logged in). Split the token back into identifier and verifier. Look up by identifier. Check expiration. Constant-time compare SHA-256(received verifier) with the stored hash using the subtle crate.
- On success, set email_confirmed_at on the user record and delete the confirmation record.
Requiring an active session at step 5 prevents an attacker who intercepts the confirmation email (compromised mailbox, network interception) from confirming the account without knowing the password. The user must both possess the token and be authenticated.
Preventing enumeration
Never reveal whether an email is already registered. On registration:
- Always display: “Check your email to complete registration.”
- If the email is new, send a confirmation link.
- If the email already exists, send a different message: “Someone attempted to register with your email. If this was you, you can log in or reset your password.”
Schedule the email step asynchronously so the response time is identical in both cases. A timing difference between “new account” and “existing account” is enough for an attacker to enumerate emails.
Confirmations table
A single table handles email confirmations, password resets, and email changes:
CREATE TABLE confirmations (
identifier BYTEA PRIMARY KEY,
verifier_hash BYTEA NOT NULL,
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
action_type TEXT NOT NULL,
details JSONB,
expires_at TIMESTAMPTZ NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_confirmations_user_id ON confirmations(user_id);
The action_type column distinguishes confirmation purposes. The details column holds action-specific data as JSON (for example, the new email address during an email change).
Password reset
Password resets follow the same split token pattern as email confirmation. The key differences are a shorter expiration and the requirement to invalidate all existing sessions after a successful reset.
Flow
- User submits their email on the reset form.
- Display: “If this email belongs to an account, you will receive reset instructions.” Never reveal whether the email has an account.
- Look up the user. If found, generate a split token, store it with a 30-minute expiration and action type password_reset, and email the link. If not found, do nothing. Schedule the work asynchronously for consistent response timing.
- When the user clicks the link, verify the token (same split-and-compare as email confirmation). Show a new password form with the token as a hidden field.
- On form submission, re-verify the token, hash the new password, update the user record, delete all reset tokens for this user, invalidate all sessions for this user, create a new session, and log them in.
Security considerations
- 30-minute expiration. Reset tokens are high-value targets. Keep the window short.
- Invalidate all sessions after a successful reset. If the password was compromised, existing sessions may belong to an attacker.
- Allow multiple outstanding tokens. Don’t delete old tokens when a new reset is requested. The user may request a reset, not receive the email, and request again.
- Delete tokens on login. If the user remembers their password and logs in normally, delete their outstanding reset tokens.
- Require the new password on the same form as the token. Don’t split this into two steps. Re-verify the token on submission to prevent replay.
When to delegate authentication
Session-based auth works well for a single application with its own user base and straightforward login requirements. Delegate to an external Identity Provider when the requirements outgrow what the application should manage directly:
- Multiple applications need SSO. Users log in once and access several services. A shared identity layer is easier to maintain than per-application auth.
- Enterprise customers expect SAML or OIDC. B2B SaaS products typically need to integrate with customers’ corporate identity systems.
- Compliance frameworks require it. SOC 2, HIPAA, and PCI-DSS audits favour dedicated identity infrastructure with built-in audit logging, brute-force protection, and pre-certified MFA controls. An external IdP gives auditors a clear separation of concerns.
- Existing infrastructure. Your organisation already runs Active Directory, LDAP, or a corporate IdP that users expect to log in with.
Self-hosted identity providers
Keycloak is a full-featured open-source IdP (CNCF incubation project) supporting OAuth2, OIDC, SAML, and LDAP federation. It handles SSO, MFA, identity brokering, and user management. The trade-off is operational weight: it is a Java application with significant resource requirements.
Authentik is a lighter alternative with a more modern developer experience, supporting OAuth2, OIDC, SAML, LDAP, and SCIM.
Both align with this guide’s preference for self-hosted infrastructure.
Integration with OAuth2 Proxy
OAuth2 Proxy sits between users and your Axum application as a reverse proxy. It handles the OAuth2/OIDC flow with your IdP and forwards authenticated requests with identity headers:
use axum::http::HeaderMap;
async fn handler(headers: HeaderMap) -> Markup {
let email = headers
.get("X-Forwarded-Email")
.and_then(|v| v.to_str().ok())
.unwrap_or("anonymous");
html! { p { "Logged in as " (email) } }
}
The application reads identity from trusted headers (X-Forwarded-User, X-Forwarded-Email) without implementing OAuth2 flows directly. The proxy strips any client-supplied identity headers before injecting authenticated values, preventing spoofing.
Your application must only be reachable through the proxy, never directly from the internet. Enforce this at the network level: firewall rules, container networking, or Tailscale ACLs. If a client can bypass the proxy, it can set X-Forwarded-User to any value.
Choosing an auth strategy
| Situation | Approach |
| Single app, simple login | Session auth (tower-sessions + argon2) |
| Single app, social login (GitHub, Google) | oauth2 / openidconnect crate in Axum |
| Multiple apps needing SSO | External IdP (Keycloak/Authentik) + OAuth2 Proxy |
| B2B SaaS, enterprise customers | External IdP or managed service (Auth0, WorkOS) |
| SOC 2 / HIPAA / PCI-DSS compliance | External IdP strongly recommended |
| Existing Active Directory / LDAP | Keycloak |
Start with session-based auth. Move to an external IdP when you hit one of the triggers above. The migration is additive: OAuth2 Proxy sits in front of your existing application, and the AuthUser extractor reads from proxy headers instead of session data.
Implementation resources
For AI coding agents implementing authentication: the secure-auth skill provides detailed security reference material covering cryptographic fundamentals, password hashing parameters, the split token pattern, session management, MFA (TOTP, WebAuthn, recovery codes), and security review checklists. Use it as context when building the patterns described in this section.
For access to the secure-auth skill and detailed implementation guidance, contact the author.
Gotchas
spawn_blocking for all password operations. Forgetting to offload argon2 to the blocking thread pool is the most common mistake. Under load, a single blocked tokio worker thread cascades into request timeouts across the application.
Session ID cycling must happen before inserting user data. Call session.cycle_id() before session.insert("user_id", ...). If you insert first and the cycle fails, the old (potentially attacker-controlled) session ID now has authenticated data.
tower-sessions version compatibility. The tower-sessions and tower-sessions-sqlx-store crates track tower-sessions-core versions independently. If Cargo reports a version conflict on tower-sessions-core, check that the published sqlx-store version matches your tower-sessions version. Pin both until they align.
Consistent error messages on auth forms. Every registration, login, and reset form must give the same response regardless of whether the email exists. This includes response timing. An async email-sending step after registration or reset prevents timing leaks.
CSRF on logout. Logout must be a POST request protected by CSRF, not a GET link. A GET-based logout allows any cross-site <img> tag to force a logout, which is a nuisance attack that can also be chained with session fixation.
Authorization
Authentication answers “who is this user?” Authorization answers “what can this user do?” The two concerns are separate. A user can be authenticated and still forbidden from accessing a resource.
Authorization is domain-dependent. An internal admin tool, a multi-tenant SaaS product, and a public content platform all need different models. There is no universal authorization framework worth adopting for every project. The Rust ecosystem reflects this: most production Axum applications build authorization with custom extractors rather than reaching for a policy engine. This section follows that approach, building on the AuthUser extractor from Authentication.
Adding roles to the user model
The simplest authorization model adds a role directly to the user record. This covers the majority of applications where users fall into a small number of categories with distinct access levels.
Add a migration:
CREATE TYPE user_role AS ENUM ('user', 'editor', 'admin');
ALTER TABLE users ADD COLUMN role user_role NOT NULL DEFAULT 'user';
A PostgreSQL enum constrains the value at the database level. New roles require a migration (ALTER TYPE user_role ADD VALUE 'moderator'), which is appropriate when roles change infrequently. If your roles change often or vary per deployment, use a TEXT column with application-level validation instead.
The corresponding Rust types:
#[derive(Debug, Clone, Copy, PartialEq, Eq, sqlx::Type)]
#[sqlx(type_name = "user_role", rename_all = "lowercase")]
pub enum Role {
User,
Editor,
Admin,
}
#[derive(Debug, Clone, sqlx::FromRow)]
pub struct User {
pub id: Uuid,
pub email: String,
pub email_confirmed_at: Option<OffsetDateTime>,
pub password_hash: String,
pub role: Role,
pub created_at: OffsetDateTime,
pub updated_at: OffsetDateTime,
}
sqlx::Type with type_name = "user_role" maps the Rust enum to the PostgreSQL enum. The rename_all = "lowercase" attribute matches the lowercase variants in the SQL definition.
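If you instead choose the TEXT column with application-level validation, the mapping belongs in Rust rather than in a database enum. A minimal sketch using FromStr (names and the error type are illustrative, not from a library):

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Editor,
    Admin,
}

impl FromStr for Role {
    type Err = String;

    // Reject unknown values instead of silently defaulting, so a bad
    // row surfaces as an error rather than a silent privilege change.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "user" => Ok(Role::User),
            "editor" => Ok(Role::Editor),
            "admin" => Ok(Role::Admin),
            other => Err(format!("unknown role: {other}")),
        }
    }
}
```

The row-loading code then calls `row.role.parse::<Role>()` and treats a parse failure as a data-integrity error.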
Axum extractors are the natural place for authorization. They run before the handler body, they can reject requests early, and they make permission requirements visible in the handler’s function signature.
A generic extractor that requires a minimum role:
use axum::{
extract::FromRequestParts,
http::{request::Parts, StatusCode},
response::{IntoResponse, Response},
};
pub struct RequireRole<const ROLE: u8>(pub User);
impl<S: Send + Sync, const ROLE: u8> FromRequestParts<S> for RequireRole<ROLE> {
type Rejection = Response;
async fn from_request_parts(
parts: &mut Parts,
state: &S,
) -> Result<Self, Self::Rejection> {
let AuthUser(user) = AuthUser::from_request_parts(parts, state)
.await
.map_err(|e| e.into_response())?;
if !user.role.has_at_least(ROLE) {
return Err(StatusCode::FORBIDDEN.into_response());
}
Ok(RequireRole(user))
}
}
The const generic approach is clean, but Rust does not yet support enum values as const generics. Use integer constants as a workaround:
impl Role {
const fn level(self) -> u8 {
match self {
Role::User => 0,
Role::Editor => 1,
Role::Admin => 2,
}
}
fn has_at_least(&self, required: u8) -> bool {
self.level() >= required
}
}
pub const EDITOR: u8 = 1;
pub const ADMIN: u8 = 2;
Handlers declare their required role in the signature:
async fn admin_dashboard(RequireRole::<ADMIN>(user): RequireRole<ADMIN>) -> Markup {
html! {
h1 { "Admin dashboard" }
p { "Logged in as " (user.email) }
}
}
async fn edit_article(RequireRole::<EDITOR>(user): RequireRole<EDITOR>) -> Markup {
html! { h1 { "Edit article" } }
}
A simpler alternative avoids const generics entirely. Define separate extractor types for each role:
pub struct RequireAdmin(pub User);
impl<S: Send + Sync> FromRequestParts<S> for RequireAdmin {
type Rejection = Response;
async fn from_request_parts(
parts: &mut Parts,
state: &S,
) -> Result<Self, Self::Rejection> {
let AuthUser(user) = AuthUser::from_request_parts(parts, state)
.await
.map_err(|e| e.into_response())?;
if user.role != Role::Admin {
return Err(StatusCode::FORBIDDEN.into_response());
}
Ok(RequireAdmin(user))
}
}
This is more verbose when you have many roles, but each extractor is self-contained and easy to understand. For most applications with two or three roles, separate types are the better choice.
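A third option, if you want the ranked-role model without either const generics or the u8 levels: declare the enum variants in ascending privilege order and derive PartialOrd/Ord, which makes role comparisons direct. A sketch (can_edit_articles is an illustrative name, not from the text above):

```rust
// Variant order defines privilege order: later variants outrank earlier ones.
// Reordering the variants silently changes every comparison, so document this
// invariant next to the enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    User,
    Editor,
    Admin,
}

// An extractor or handler can then compare roles without numeric levels:
fn can_edit_articles(role: Role) -> bool {
    role >= Role::Editor
}
```

The trade-off is that the privilege ordering lives implicitly in declaration order rather than in an explicit level() function.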
Resource ownership checks
Role-based checks are not enough when access depends on who owns a resource. An editor should edit their own articles but not someone else’s. This is resource-level authorization, and it belongs in the handler, not in an extractor, because the handler is where you load the resource.
async fn update_article(
AuthUser(user): AuthUser,
State(state): State<AppState>,
Path(article_id): Path<Uuid>,
Form(form): Form<ArticleForm>,
) -> Result<impl IntoResponse, StatusCode> {
let article = sqlx::query_as!(
Article,
"SELECT * FROM articles WHERE id = $1",
article_id
)
.fetch_optional(&state.db)
.await
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
.ok_or(StatusCode::NOT_FOUND)?;
if article.author_id != user.id && user.role != Role::Admin {
return Err(StatusCode::FORBIDDEN);
}
Ok(Redirect::to(&format!("/articles/{}", article_id)))
}
The pattern is straightforward: load the resource, check whether the user has access, proceed or reject. Resist the temptation to push this into middleware or an extractor. Resource-level checks depend on the specific resource being accessed, which makes them inherently handler-level logic.
Protecting route groups
For coarse-grained protection (all routes under /admin require an admin), apply the extractor as a route layer:
use axum::middleware;
let admin_routes = Router::new()
.route("/admin/dashboard", get(admin_dashboard))
.route("/admin/users", get(admin_users))
.route("/admin/settings", get(admin_settings).post(update_settings))
.route_layer(middleware::from_extractor::<RequireAdmin>());
let app = Router::new()
.merge(admin_routes)
.route("/", get(home))
.route("/articles", get(list_articles))
.layer(session_layer);
route_layer applies the extractor to all routes in the group. Any request to /admin/* that fails the admin check gets a 403 before the handler runs. The extractor still runs per-request, hitting the database each time, so the session-backed user lookup from AuthUser happens on every admin request.
For unauthenticated route groups mixed with authenticated ones, structure your router so the auth layer only wraps the routes that need it:
let public_routes = Router::new()
.route("/", get(home))
.route("/login", get(show_login).post(handle_login))
.route("/register", get(show_register).post(handle_register));
let protected_routes = Router::new()
.route("/dashboard", get(dashboard))
.route("/settings", get(settings).post(update_settings))
.route_layer(middleware::from_extractor::<AuthUser>());
let app = Router::new()
.merge(public_routes)
.merge(protected_routes)
.layer(session_layer);
Returning meaningful errors
A bare StatusCode::FORBIDDEN is unhelpful to users. In an HDA application, return an HTML fragment that explains what went wrong:
use axum::response::{IntoResponse, Response};
pub enum AuthzError {
Unauthenticated,
Forbidden,
}
impl IntoResponse for AuthzError {
fn into_response(self) -> Response {
match self {
AuthzError::Unauthenticated => {
Redirect::to("/login").into_response()
}
AuthzError::Forbidden => {
(StatusCode::FORBIDDEN, html! {
h1 { "Access denied" }
p { "You do not have permission to access this page." }
a href="/" { "Return to home" }
}).into_response()
}
}
}
}
Use AuthzError::Unauthenticated (not logged in) to redirect to login. Use AuthzError::Forbidden (logged in but insufficient permissions) to show a 403 page. The distinction matters: redirecting an unauthenticated user invites them to log in, while a 403 tells an authenticated user that their current account cannot access the resource.
Multi-tenancy authorization
When your application serves multiple tenants (organisations, teams, workspaces), authorization gains a tenant dimension. A user may be an admin in one organisation and a regular member in another.
The most common model for HDA applications is a shared database with a tenant_id column on tenant-scoped tables:
CREATE TABLE organisations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE memberships (
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
organisation_id UUID NOT NULL REFERENCES organisations(id) ON DELETE CASCADE,
role user_role NOT NULL DEFAULT 'user',
PRIMARY KEY (user_id, organisation_id)
);
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organisation_id UUID NOT NULL REFERENCES organisations(id) ON DELETE CASCADE,
name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
A memberships table maps users to organisations with a role per membership. This replaces the single role column on the user record with a per-tenant role.
Build an extractor that resolves the current tenant from the request (subdomain, path parameter, or header) and verifies the user’s membership:
pub struct TenantUser {
pub user: User,
pub organisation_id: Uuid,
pub role: Role,
}
impl<S: Send + Sync> FromRequestParts<S> for TenantUser {
type Rejection = Response;
async fn from_request_parts(
parts: &mut Parts,
state: &S,
) -> Result<Self, Self::Rejection> {
let AuthUser(user) = AuthUser::from_request_parts(parts, state)
.await
.map_err(|e| e.into_response())?;
let Path(org_id): Path<Uuid> = Path::from_request_parts(parts, state)
.await
.map_err(|_| StatusCode::BAD_REQUEST.into_response())?;
let pool = parts
.extensions
.get::<PgPool>()
.ok_or(StatusCode::INTERNAL_SERVER_ERROR.into_response())?;
let membership = sqlx::query_as!(
Membership,
"SELECT role as \"role: Role\" FROM memberships \
WHERE user_id = $1 AND organisation_id = $2",
user.id,
org_id
)
.fetch_optional(pool)
.await
.map_err(|_| StatusCode::INTERNAL_SERVER_ERROR.into_response())?
.ok_or(StatusCode::FORBIDDEN.into_response())?;
Ok(TenantUser {
user,
organisation_id: org_id,
role: membership.role,
})
}
}
Every query in a tenant-scoped handler then filters by organisation_id:
async fn list_projects(
tenant: TenantUser,
State(state): State<AppState>,
) -> Markup {
let projects = sqlx::query_as!(
Project,
"SELECT * FROM projects WHERE organisation_id = $1 ORDER BY name",
tenant.organisation_id
)
.fetch_all(&state.db)
.await
.expect("query failed");
html! {
h1 { "Projects" }
ul {
@for project in &projects {
li { (project.name) }
}
}
}
}
The WHERE organisation_id = $1 clause is the tenant boundary. Miss it on a single query and you leak data across tenants. This is the fundamental weakness of application-level tenant isolation: it depends on every query being correct.
Of the three database-level isolation models (shared database with a tenant column, schema-per-tenant, database-per-tenant), the shared database with a tenant column is the right starting point for most applications. Schema-per-tenant and database-per-tenant add operational complexity (per-tenant migrations, connection routing) that is only justified when regulatory requirements or enterprise customers demand stronger isolation guarantees.
PostgreSQL Row-Level Security
PostgreSQL Row-Level Security (RLS) enforces tenant isolation at the database level rather than relying on application code to include WHERE tenant_id = $1 on every query. The database rejects or filters rows that violate the policy, regardless of what the application sends.
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON projects
USING (organisation_id = current_setting('app.organisation_id')::uuid);
Set the tenant context at the start of each request using SET LOCAL inside a transaction:
let mut tx = pool.begin().await?;
sqlx::query("SELECT set_config('app.organisation_id', $1, true)")
.bind(org_id.to_string())
.execute(&mut *tx)
.await?;
let projects = sqlx::query_as!(Project, "SELECT * FROM projects")
.fetch_all(&mut *tx)
.await?;
tx.commit().await?;
The third argument to set_config (true) scopes the setting to the current transaction. When the transaction ends, the setting resets.
RLS is powerful but carries significant operational gotchas:
- Connection pool contamination. If you use set_config(..., false) (session-scoped instead of transaction-local), the setting persists on the connection. When that connection returns to the pool and is reused by a different tenant, it carries the previous tenant’s context. Always use true for transaction-local scope.
- Superuser and table owner bypass. PostgreSQL superusers and table owners skip RLS entirely. Your development database (often running as a superuser) silently ignores all policies. Use ALTER TABLE ... FORCE ROW LEVEL SECURITY and test with a restricted role.
- SQLx compile-time checking. SQLx’s query! macros connect to the database during compilation. If that database has RLS enabled and the compile-time connection uses a restricted role, queries fail at build time. Point DATABASE_URL at a superuser role for compilation and use a restricted role at runtime.
- PgBouncer. RLS with session variables does not work with PgBouncer in statement pooling mode. Use transaction pooling or session pooling.
For most applications, application-level WHERE clauses with a well-tested tenant-scoped extractor are simpler and sufficient. Consider RLS when you need defence-in-depth for sensitive data, or when compliance requirements demand database-enforced isolation.
When to reach for a policy engine
Custom extractors handle role-based checks well. They become unwieldy when authorization rules are complex, frequently changing, or need to be auditable separately from application code.
Signs you have outgrown custom extractors:
- Permissions depend on combinations of user attributes, resource attributes, and environmental conditions (time of day, IP range). This is attribute-based access control (ABAC), and encoding it in Rust conditionals becomes error-prone.
- Non-developers (compliance officers, product managers) need to review or modify access policies.
- You need an audit trail of policy changes separate from code deployments.
Two production-quality options exist in the Rust ecosystem:
- Cedar (by Amazon, 3.6M+ downloads) provides a purpose-built policy language for RBAC and ABAC. Policies are human-readable text files evaluated by the Cedar engine. No Axum-specific integration exists; wrap it in a service called from your extractors or handlers.
- Casbin (1M+ downloads) supports ACL, RBAC, and ABAC models through a configuration-driven approach. The axum-casbin crate provides Axum middleware integration.
Both are well-maintained. Cedar has stronger backing and a more expressive policy language. Casbin has broader ecosystem support and a ready-made Axum integration. Evaluate both if you reach the point of needing one. Most HDA applications with a handful of roles and straightforward ownership rules will not.
Gotchas
Authorization checks on every code path. A handler that loads a resource and modifies it needs the authorization check between load and modify, not just at the route level. Route-level checks confirm the user’s role. Handler-level checks confirm access to the specific resource. Both are needed.
Forgetting to scope queries by tenant. In a multi-tenant application, every query that touches tenant data must include the tenant filter. A single unscoped query leaks data across tenants. Code review should treat a missing WHERE organisation_id = $1 with the same severity as a SQL injection.
Caching user roles. If you cache the user’s role (in the session, in memory), a role change by an admin does not take effect until the cache expires or the user logs out. For most applications, querying the database on each request through the extractor is fast enough and avoids stale-role bugs. If you do cache, keep the TTL short.
Confusing 401 and 403. Return 401 (Unauthorized) when the user is not authenticated, prompting a login. Return 403 (Forbidden) when the user is authenticated but lacks permission. Mixing these up confuses both users and API consumers.
Horizontal privilege escalation. A user modifies a URL parameter (changing /articles/123/edit to /articles/456/edit) and accesses another user’s resource. Role checks alone do not prevent this. Every handler that operates on a specific resource must verify that the authenticated user has access to that particular resource.
Web Application Security
Rust eliminates entire classes of memory safety vulnerabilities, but web application security is broader than memory safety. Cross-site scripting, SQL injection, missing security headers, and misconfigured policies are all possible in Rust applications. This section covers the web-specific security concerns that require explicit attention in the Axum/Maud/htmx/SQLx stack.
For CSRF protection, session cookie configuration, and rate limiting on authentication endpoints, see Authentication. For input validation and sanitisation, see Form Handling and Validation.
OWASP Top 10 in this stack
The OWASP Top 10 (2021) is the standard classification of web application security risks. Not all categories require equal attention in every stack. This table maps each category to its relevance in a Rust/Axum/Maud/SQLx application.
| # | Category | Relevance | Notes |
| A01 | Broken Access Control | High | Application logic. Rust provides no automatic protection. Every route needs explicit authorisation checks. |
| A02 | Cryptographic Failures | Medium | Depends on crate choices. Use audited crates (ring, rustls, argon2). Never roll custom cryptography. |
| A03 | Injection | Low | SQLx uses parameterized queries. Maud auto-escapes HTML. Both mitigate the primary injection vectors by default. Risk remains if you bypass either. |
| A04 | Insecure Design | Medium | Architecture-level concern. Applies equally to all stacks. Threat modelling and secure design reviews are the mitigations. |
| A05 | Security Misconfiguration | High | Missing security headers, permissive CORS, debug output in production, default session secrets. Requires explicit configuration. |
| A06 | Vulnerable and Outdated Components | Medium | Rust crates can have vulnerabilities. Run cargo audit in CI. Use cargo deny for policy enforcement. |
| A07 | Identification and Authentication Failures | Medium | Covered in Authentication: argon2 hashing, session management, rate limiting. |
| A08 | Software and Data Integrity Failures | Low | Cargo.lock pins exact dependency versions. Cargo verifies checksums. CI/CD pipeline security is the residual risk. |
| A09 | Security Logging and Monitoring Failures | Medium | Operational concern. Use the tracing crate for structured logging. |
| A10 | Server-Side Request Forgery | Low | Only relevant if your application makes outbound HTTP requests based on user-supplied URLs. Validate URL schemes and destinations if so. |
The categories that need the most attention in this stack are A01 (Broken Access Control) and A05 (Security Misconfiguration). The Rust type system and the crate ecosystem handle A03 (Injection) and A08 (Integrity) well by default, but only if you use them correctly.
XSS prevention with Maud
Maud’s html! macro HTML-entity-escapes all interpolated values by default. The characters <, >, &, ", and ' are converted to their entity equivalents. This prevents the most common XSS vector: injecting <script> tags through user input.
use maud::html;
html! {
p { (user_input) }
}
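To make the transformation concrete, here is a minimal sketch of the entity escaping Maud applies to interpolated values (Maud’s actual implementation is more efficient; this only illustrates the character mapping):

```rust
// Replace the HTML-significant characters with entity references so that
// user input renders as inert text rather than markup.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            other => out.push(other),
        }
    }
    out
}
```

With this mapping, `<script>alert('x')</script>` becomes text the browser displays rather than executes.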
What bypasses escaping
PreEscaped() disables escaping for its argument. It exists for cases where you have trusted HTML that should be rendered as-is (content from a Markdown renderer, for example).
use maud::PreEscaped;
html! {
(PreEscaped(user_provided_html))
}
Treat every use of PreEscaped as a security boundary. If the input is not fully trusted, do not use it. If you must render user-supplied rich text, sanitise it with a dedicated HTML sanitiser (such as ammonia) before passing it to PreEscaped.
Risks that remain with auto-escaping
Maud performs HTML-entity escaping only. It does not do context-aware escaping, which means certain attack vectors survive even with escaping enabled.
javascript: URLs. If user input is used as an href or src value, HTML-entity escaping does not prevent javascript: scheme attacks because no < or > characters need escaping:
html! {
a href=(user_url) { "Click here" }
}
Validate URLs on the server before rendering them. Allow only http://, https://, and same-origin relative paths, and reject protocol-relative URLs (//evil.example resolves to a different origin):
fn is_safe_url(url: &str) -> bool {
    url.starts_with("https://")
        || url.starts_with("http://")
        || (url.starts_with('/') && !url.starts_with("//"))
}
CSS injection in style attributes. Setting a style attribute from user input can enable CSS-based attacks (data exfiltration via background-image URLs, UI redressing) even with HTML escaping. Do not interpolate user input into style attributes. Use CSS classes instead.
<script> and <style> element bodies. The HTML specification does not process entity escapes inside <script> and <style> elements. Maud will escape the content, which either mangles valid JavaScript/CSS or requires PreEscaped to work correctly. Never interpolate user input into <script> or <style> blocks. Pass data to JavaScript via data- attributes on HTML elements, where Maud’s escaping is effective.
Maud’s structured syntax makes it difficult to accidentally construct event handler attributes like onclick from user input, unlike string-based template engines where concatenation errors can create attribute injection. This is a genuine safety advantage of the macro approach.
SQL injection prevention with SQLx
SQLx uses bind parameters for all user-provided values. The database driver sends the query structure and parameter values separately over the wire, making injection impossible at the protocol level.
Compile-time checked queries
The sqlx::query! and sqlx::query_as! macros accept only string literals. You cannot pass a String or the result of format!(). The macro verifies the query against a live database at compile time, checking column names, types, and placeholder counts. SQL injection is structurally impossible in compile-time checked queries because there is no way to interpolate user input into the query string.
let user = sqlx::query_as!(User, "SELECT * FROM users WHERE id = $1", user_id)
.fetch_one(&pool)
.await?;
Runtime queries
The sqlx::query() function accepts a &str query string. Parameters are bound via .bind() calls. The API deliberately accepts &str rather than String to create friction against format!() usage.
let users = sqlx::query("SELECT * FROM users WHERE email = $1")
.bind(&email)
.fetch_all(&pool)
.await?;
How to accidentally bypass parameterization
The primary risk is using format!() to build query strings:
let query = format!("SELECT * FROM users WHERE name = '{}'", user_input);
let result = sqlx::query(&query).fetch_all(&pool).await?;
This is the Rust equivalent of string concatenation in PHP or Python SQL. SQLx’s design discourages it (.query() takes &str, not String), but does not make it impossible.
Dynamic table or column names present a legitimate challenge because SQL bind parameters only work for values, not identifiers. If you need dynamic identifiers (user-selected sort column, for example), validate them against an allowlist:
fn validated_sort_column(input: &str) -> &str {
match input {
"name" | "email" | "created_at" => input,
_ => "created_at",
}
}
let order_by = validated_sort_column(&params.sort);
let query = format!("SELECT * FROM users ORDER BY {order_by}");
let users = sqlx::query(&query).fetch_all(&pool).await?;
The format!() here is safe because order_by can only be one of the validated values. The principle: bind parameters for values, allowlists for identifiers.
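The allowlist can also be expressed as an enum, so the validated value is a type rather than a string and the fallback is explicit. A sketch (SortColumn and its methods are names invented here for illustration):

```rust
// Each variant maps to exactly one known-safe SQL identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SortColumn {
    Name,
    Email,
    CreatedAt,
}

impl SortColumn {
    // Unknown or malicious input falls back to a safe default
    // instead of reaching the query string.
    fn parse(input: &str) -> Self {
        match input {
            "name" => SortColumn::Name,
            "email" => SortColumn::Email,
            _ => SortColumn::CreatedAt,
        }
    }

    // The returned &'static str can only be one of the three literals,
    // so interpolating it into ORDER BY is safe by construction.
    fn as_sql(self) -> &'static str {
        match self {
            SortColumn::Name => "name",
            SortColumn::Email => "email",
            SortColumn::CreatedAt => "created_at",
        }
    }
}
```

Parsing once at the edge of the handler means the rest of the code never touches the raw user string.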
Content Security Policy
A Content Security Policy (CSP) header tells the browser which sources of content are permitted. A well-configured CSP is the strongest defence against XSS after output escaping, because it prevents the browser from executing injected scripts even if they make it into the HTML.
A baseline CSP for this stack
For an HDA application serving HTML from Axum, with htmx loaded from a same-origin file and CSS in external stylesheets:
default-src 'self';
script-src 'self';
style-src 'self';
img-src 'self' data:;
font-src 'self';
connect-src 'self';
form-action 'self';
frame-ancestors 'none';
base-uri 'self';
object-src 'none'
This policy allows scripts, styles, images, fonts, and connections only from the same origin. It blocks framing (frame-ancestors 'none'), restricts form targets (form-action 'self'), and disallows plugins (object-src 'none'). The data: source for images permits inline data URIs (common for small icons and placeholder images).
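If you need to vary directives per environment (a report-to endpoint in staging, for example), assembling the header value from pairs keeps the policy readable in code. A minimal sketch (build_csp and baseline_csp are illustrative helpers, not from a library):

```rust
// Build a CSP header value from (directive, source-list) pairs.
fn build_csp(directives: &[(&str, &str)]) -> String {
    directives
        .iter()
        .map(|(name, value)| format!("{name} {value}"))
        .collect::<Vec<_>>()
        .join("; ")
}

// The baseline policy from the text, as data rather than one long literal.
fn baseline_csp() -> String {
    build_csp(&[
        ("default-src", "'self'"),
        ("script-src", "'self'"),
        ("style-src", "'self'"),
        ("img-src", "'self' data:"),
        ("font-src", "'self'"),
        ("connect-src", "'self'"),
        ("form-action", "'self'"),
        ("frame-ancestors", "'none'"),
        ("base-uri", "'self'"),
        ("object-src", "'none'"),
    ])
}
```

A static string literal is equally valid; the data-driven form only pays off once the policy diverges between environments.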
htmx and CSP complications
htmx creates three specific CSP challenges that you need to plan for.
Inline indicator styles. By default, htmx injects a <style> element into the page for its loading indicator CSS. This violates a style-src 'self' policy. Disable this by setting the includeIndicatorStyles configuration to false and providing the indicator CSS in your own stylesheet:
<meta name="htmx-config" content='{"includeIndicatorStyles": false}'>
The indicator CSS you need to include in your own stylesheet:
.htmx-indicator {
opacity: 0;
transition: opacity 200ms ease-in;
}
.htmx-request .htmx-indicator {
opacity: 1;
}
.htmx-request.htmx-indicator {
opacity: 1;
}
hx-on:* attributes. htmx’s hx-on:* attributes (e.g., hx-on:click, hx-on:htmx:after-swap) are functionally equivalent to inline event handlers. They require 'unsafe-inline' in script-src, which defeats the purpose of CSP for script control. Avoid hx-on:* attributes entirely. Use hx-trigger with server-driven patterns instead, or attach event listeners in external JavaScript files.
Nonce propagation on AJAX responses. htmx automatically copies the nonce attribute from inline scripts it finds in AJAX responses. This is intended as a convenience but undermines the CSP nonce security model: if an attacker can inject a <script> tag into a server response (via stored XSS), htmx will propagate the nonce to it, and the browser will execute it. The defence is to not rely on nonces for htmx-loaded content. Instead, serve all JavaScript from external files (script-src 'self') and do not use inline scripts in htmx responses.
The practical approach
The combination of these constraints points to a clear policy:
- Serve all JavaScript from same-origin files. No inline scripts.
- Serve all CSS from same-origin stylesheets. Disable htmx’s built-in indicator styles.
- Do not use hx-on:* attributes.
- Use script-src 'self' and style-src 'self' without nonces or 'unsafe-inline'.
This approach is simpler and more secure than a nonce-based policy. The trade-off is that you cannot use inline scripts or styles at all, which in an HDA application is rarely a limitation.
Security headers
Beyond CSP, several HTTP response headers improve security. Rather than pulling in a dependency, write a middleware function that sets them all. This keeps the headers visible in your codebase and avoids relying on a third-party crate’s defaults.
use axum::{extract::Request, middleware::Next, response::Response};
use http::{header, HeaderValue};
pub async fn security_headers(req: Request, next: Next) -> Response {
let mut res = next.run(req).await;
let headers = res.headers_mut();
headers.insert(
header::X_CONTENT_TYPE_OPTIONS,
HeaderValue::from_static("nosniff"),
);
headers.insert(
header::X_FRAME_OPTIONS,
HeaderValue::from_static("DENY"),
);
headers.insert(
header::STRICT_TRANSPORT_SECURITY,
HeaderValue::from_static("max-age=63072000; includeSubDomains"),
);
headers.insert(
header::REFERRER_POLICY,
HeaderValue::from_static("strict-origin-when-cross-origin"),
);
headers.insert(
header::CONTENT_SECURITY_POLICY,
HeaderValue::from_static(
"default-src 'self'; script-src 'self'; style-src 'self'; \
img-src 'self' data:; font-src 'self'; connect-src 'self'; \
form-action 'self'; frame-ancestors 'none'; base-uri 'self'; \
object-src 'none'"
),
);
headers.insert(
HeaderName::from_static("permissions-policy"),
HeaderValue::from_static("camera=(), microphone=(), geolocation=()"),
);
headers.insert(
HeaderName::from_static("cross-origin-opener-policy"),
HeaderValue::from_static("same-origin"),
);
res
}
Apply the middleware to your router:
use axum::{middleware, Router};
let app = Router::new()
.route("/", get(index))
.layer(middleware::from_fn(security_headers));
Add use http::HeaderName; for the headers that are not in the http crate’s built-in constants (permissions-policy, cross-origin-opener-policy).
| Header | Value | Purpose |
| X-Content-Type-Options | nosniff | Prevents browsers from MIME-type sniffing responses away from the declared Content-Type. Stops attacks that trick browsers into treating HTML as JavaScript. |
| X-Frame-Options | DENY | Prevents the page from being embedded in <iframe>, <frame>, or <object> elements. Blocks clickjacking. Superseded by CSP frame-ancestors but still needed for older browsers. |
| Strict-Transport-Security | max-age=63072000; includeSubDomains | Forces HTTPS for two years, including all subdomains. Only set this once you are committed to HTTPS (it is difficult to undo). |
| Referrer-Policy | strict-origin-when-cross-origin | Sends the full URL as referrer for same-origin requests, but only the origin (no path) for cross-origin requests. Prevents leaking internal URL paths to external sites. |
| Content-Security-Policy | See above | Controls which sources of content the browser will load. The primary defence against XSS beyond output escaping. |
| Permissions-Policy | camera=(), microphone=(), geolocation=() | Disables browser APIs your application does not use. Prevents third-party scripts (if any) from accessing the camera, microphone, or location. |
| Cross-Origin-Opener-Policy | same-origin | Isolates the browsing context from cross-origin popups. Prevents Spectre-style side-channel attacks from cross-origin windows. |
HSTS caution
Strict-Transport-Security with includeSubDomains cannot be easily reversed once browsers cache it. Before deploying, confirm that every subdomain supports HTTPS. Start with a shorter max-age (e.g., 300 for 5 minutes) during testing and increase it once you are confident.
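The staged rollout is easy to get right with a small helper that builds the header value (the function name is mine, purely illustrative; it is not part of the middleware above):

```rust
// Hypothetical helper: build a Strict-Transport-Security header value.
// Start with a short max-age during testing, then raise it once every
// subdomain is confirmed to serve HTTPS.
fn hsts_value(max_age_secs: u64, include_subdomains: bool) -> String {
    let mut value = format!("max-age={max_age_secs}");
    if include_subdomains {
        value.push_str("; includeSubDomains");
    }
    value
}
```

During testing you would call `hsts_value(300, false)`; the production value from the table above is `hsts_value(63_072_000, true)`.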
Dependency auditing
Rust crates can have known vulnerabilities. Two tools catch them in CI.
cargo audit checks your Cargo.lock against the RustSec Advisory Database, which tracks reported vulnerabilities in Rust crates:
cargo install cargo-audit
cargo audit
cargo deny is broader. It checks advisories (same database as cargo audit), licence compliance, duplicate dependency versions, and can ban specific crates:
cargo install cargo-deny
cargo deny init
cargo deny check
Run both in CI on every pull request. cargo audit is fast and catches known CVEs. cargo deny enforces policy: no GPL dependencies, no duplicate versions of security-critical crates, and bans on specific crates you have decided against.
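A minimal deny.toml sketch (the section names follow cargo-deny's config format; the specific licence allowlist and banned crate here are illustrative choices, not recommendations):

```toml
[licenses]
# only permit these licences anywhere in the dependency tree
allow = ["MIT", "Apache-2.0"]

[bans]
# warn when two versions of the same crate end up in the tree
multiple-versions = "warn"
# example of banning a crate you have decided against
deny = [
    { name = "openssl" },
]
```

cargo deny init generates a fully commented starting config; trim it down to the policies you actually want to enforce.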
Secure coding with AI agents
Empirical research shows that AI coding assistants introduce security vulnerabilities at measurable rates. A 2021 study by NYU researchers found that roughly 40% of GitHub Copilot’s generated code samples contained vulnerabilities across the MITRE CWE Top 25 categories. A Stanford user study found that participants using AI assistance wrote less secure code and were simultaneously more confident in its security.
These findings are relevant to this stack specifically:
unsafe blocks. AI agents may introduce unsafe when a safe alternative exists. Audit every unsafe block in AI-generated code for necessity and correctness.
PreEscaped with untrusted input. An agent solving an HTML rendering problem may reach for PreEscaped without considering the XSS implications.
format!() in SQL. An agent may build dynamic queries with string formatting instead of bind parameters, especially for complex queries with optional filters.
Error messages that leak internals. AI-generated error handlers often pass raw database errors or file paths through to HTTP responses.
Overly permissive defaults. CORS set to *, SameSite::None, Secure: false, missing security headers. AI tends to generate code that works rather than code that is secure.
The Building with AI Coding Agents section covers review practices, a security checklist, and workflow patterns for catching these issues systematically.
Gotchas
PreEscaped is the primary XSS risk in Maud. Search your codebase for every use of PreEscaped and verify that the input is either trusted or sanitised. This is the single most impactful security audit you can do in a Maud application.
format!() is the primary SQL injection risk in SQLx. Search for format! near sqlx::query calls. Prefer query! (compile-time checked, string literal only) over query() (runtime, accepts &str) wherever possible.
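To make this gotcha concrete, here is a plain-string sketch (no database involved, names are illustrative) of why interpolation is dangerous: the attacker's input becomes part of the SQL text itself, whereas with a bind parameter the SQL text is fixed and the value travels separately.

```rust
// BAD: interpolating user input into the SQL string.
// The input can terminate the quoted literal and append clauses.
fn query_by_name_unsafe(name: &str) -> String {
    format!("SELECT * FROM users WHERE name = '{name}'")
}

// GOOD: the query text is a constant; $1 is filled by the driver
// and can never change the query's structure.
const QUERY_SAFE: &str = "SELECT * FROM users WHERE name = $1";
```

With input like `x' OR '1'='1`, the unsafe version produces a query that matches every row; the bind-parameter version treats the whole input as a literal name.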
CSP breaks htmx if you do not plan for it. Deploy CSP early, before you have built features that depend on inline scripts or hx-on:* attributes. Retrofitting a strict CSP onto an existing application is significantly harder than building with one from the start.
HSTS is sticky. Once a browser sees a Strict-Transport-Security header with a long max-age, it will refuse HTTP connections to your domain until the max-age expires. Test with short values first.
Security headers are not set by default. Axum sends no security headers out of the box. The middleware above is not optional; without it, your application is missing basic protections that browsers check for.
Forms & Errors
Form Handling and Validation
HTML forms are the primary input mechanism in a hypermedia-driven application. The browser collects data, the server validates and processes it, and the response is HTML. There is no JSON serialisation layer, no client-side state management for form data, and no separate API to keep in sync.
This section covers extracting form data in Axum handlers, sanitising and validating it, building a custom ValidatedForm extractor that combines all three steps, displaying errors with Maud and htmx, and the Post/Redirect/Get pattern for safe form submissions.
Use the Form<T> extractor from axum-extra (not the one in axum itself). The axum-extra version uses serde_html_form under the hood, which correctly handles multi-value fields: multiple <input> elements with the same name (checkboxes, for example) and <select> elements with the multiple attribute. The standard axum::extract::Form uses serde_urlencoded, which does not support these cases.
[dependencies]
axum-extra = { version = "0.10", features = ["form"] }
Define a struct with Deserialize:
use serde::Deserialize;
#[derive(Deserialize)]
struct CreateContact {
name: String,
email: String,
phone: Option<String>,
}
Extract it in a handler:
use axum_extra::extract::Form;
use axum::response::Redirect;
async fn create_contact(
Form(input): Form<CreateContact>,
) -> Redirect {
// process `input` here, then redirect (Post/Redirect/Get)
Redirect::to("/contacts")
}
For multi-value fields, collect into a Vec:
#[derive(Deserialize)]
struct SurveyResponse {
name: String,
#[serde(rename = "interest")]
interests: Vec<String>,
}
html! {
fieldset {
legend { "Interests" }
label {
input type="checkbox" name="interest" value="rust";
" Rust"
}
label {
input type="checkbox" name="interest" value="web";
" Web development"
}
label {
input type="checkbox" name="interest" value="databases";
" Databases"
}
}
}
With the first two boxes checked, the browser sends interest=rust&interest=web, and serde_html_form collects the repeated values into the interests Vec<String>.
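A toy sketch of what serde_html_form does with repeated keys, hand-rolled with std only (it ignores percent-decoding and typing, which the real crate handles):

```rust
// Collect every value for a given key from a urlencoded body.
// Illustrative only: serde_html_form does the real decoding and
// maps the values onto your struct's Vec fields.
fn values_for<'a>(body: &'a str, key: &str) -> Vec<&'a str> {
    body.split('&')
        .filter_map(|pair| pair.split_once('='))
        .filter(|(k, _)| *k == key)
        .map(|(_, v)| v)
        .collect()
}
```

The standard axum Form fails on exactly this shape of input, which is why the axum-extra version is recommended above.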
Wire the handler to a POST route:
use axum::{routing::{get, post}, Router};
let app = Router::new()
.route("/contacts", get(list_contacts))
.route("/contacts", post(create_contact));
The corresponding HTML form:
use maud::{html, Markup};
fn contact_form() -> Markup {
html! {
form method="post" action="/contacts" {
label for="name" { "Name" }
input #name type="text" name="name" required;
label for="email" { "Email" }
input #email type="email" name="email" required;
label for="phone" { "Phone" }
input #phone type="tel" name="phone";
button type="submit" { "Save" }
}
}
}
The name attributes on the <input> elements must match the struct field names. serde handles the mapping. For fields with names that differ from Rust conventions, use #[serde(rename = "field-name")].
Option<String> fields map to inputs that may be left blank. If the field is absent from the form submission, serde deserialises it as None. Note that browsers submit an empty text input as phone= (an empty string), which deserialises as Some(""), so trim and treat empty strings as missing during sanitisation if you need true absence.
Handling deserialisation failures
If the form body cannot be deserialised into the target struct (missing required fields, wrong types), Axum returns a 422 Unprocessable Entity by default. For a better user experience, accept a Result and handle the rejection:
use axum_extra::extract::FormRejection;
async fn create_contact(
form: Result<Form<CreateContact>, FormRejection>,
) -> impl IntoResponse {
match form {
Ok(Form(input)) => {
// process `input` here, then redirect (PRG)
Redirect::to("/contacts").into_response()
}
Err(_) => {
(StatusCode::UNPROCESSABLE_ENTITY, contact_form()).into_response()
}
}
}
In practice, deserialisation failures are rare when the HTML form matches the struct. Validation errors (invalid email format, value out of range) are the common case.
User input needs cleaning before validation. Leading and trailing whitespace, inconsistent casing, and stray non-alphanumeric characters cause validation failures that are not the user’s fault. The sanitizer crate provides a derive macro that declares sanitisation rules directly on struct fields, the same way validator declares validation rules.
[dependencies]
sanitizer = "1"
Add Sanitize alongside Deserialize:
use sanitizer::prelude::*;
use serde::Deserialize;
#[derive(Deserialize, Sanitize)]
struct CreateContact {
#[sanitize(trim)]
name: String,
#[sanitize(trim, lower_case)]
email: String,
#[sanitize(trim)]
phone: Option<String>,
}
Call .sanitize() to modify the struct in place:
let mut input = CreateContact {
name: " Alice ".into(),
email: " Alice@Example.COM ".into(),
phone: None,
};
input.sanitize();
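The effect on those two fields is equivalent to trimming and lowercasing by hand; a std-only sketch of the two rules used above:

```rust
// What #[sanitize(trim)] and #[sanitize(trim, lower_case)] amount to,
// expressed with plain std string methods (illustrative, not the
// macro's actual expansion).
fn trim_rule(s: &str) -> String {
    s.trim().to_string()
}

fn trim_lower_rule(s: &str) -> String {
    s.trim().to_lowercase()
}
```

After input.sanitize(), name holds "Alice" and email holds "alice@example.com".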
Available sanitisers
| Sanitiser | Effect |
trim | Remove leading and trailing whitespace |
lower_case | Convert to lowercase |
upper_case | Convert to UPPERCASE |
camel_case | Convert to camelCase |
snake_case | Convert to snake_case |
screaming_snake_case | Convert to SCREAMING_SNAKE_CASE |
numeric | Remove all non-numeric characters |
alphanumeric | Remove all non-alphanumeric characters |
e164 | Convert phone number to E.164 international format |
clamp(min, max) | Clamp an integer to a range |
clamp(max) | Truncate a string to a maximum length |
custom(function_name) | Apply a custom sanitisation function |
Custom sanitisation functions
For rules beyond the built-ins, write a function that takes &str and returns String:
use sanitizer::StringSanitizer;
fn collapse_whitespace(input: &str) -> String {
let mut s = StringSanitizer::from(input);
s.trim();
s.get()
.split_whitespace()
.collect::<Vec<_>>()
.join(" ")
}
#[derive(Deserialize, Sanitize)]
struct CreatePost {
#[sanitize(custom(collapse_whitespace))]
title: String,
}
Sanitisation runs before validation. Trim whitespace, normalise casing, and clean up formatting first, then validate the cleaned values. This order matters: " " (a single space) fails a length(min = 1) check only if you trim it first.
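The ordering claim is easy to check with plain std methods standing in for the two steps (function names are mine, for illustration):

```rust
// Stand-in for the sanitise step: trim whitespace.
fn sanitise(s: &str) -> String {
    s.trim().to_string()
}

// Stand-in for a length(min = N) validation check.
fn passes_min_length(s: &str, min: usize) -> bool {
    s.chars().count() >= min
}
```

A lone space passes the length check if validated raw, and correctly fails once it has been trimmed first.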
Server-side validation with validator
The validator crate provides a Validate derive macro that adds declarative validation rules to structs. Validation runs on the server after sanitisation and before any database or business logic.
[dependencies]
validator = { version = "0.20", features = ["derive"] }
Add Validate to the struct:
use sanitizer::prelude::*;
use serde::Deserialize;
use validator::Validate;
#[derive(Deserialize, Sanitize, Validate)]
struct CreateContact {
#[sanitize(trim)]
#[validate(length(min = 1, max = 255, message = "Name is required"))]
name: String,
#[sanitize(trim, lower_case)]
#[validate(email(message = "Enter a valid email address"))]
email: String,
#[sanitize(trim)]
#[validate(length(max = 20, message = "Phone number too long"))]
phone: Option<String>,
}
Built-in validators
| Validator | Usage | Checks |
email | #[validate(email)] | Valid email format per HTML5 spec |
url | #[validate(url)] | Valid URL |
length | #[validate(length(min = 1, max = 100))] | String or Vec length bounds |
range | #[validate(range(min = 0, max = 150))] | Numeric value bounds |
must_match | #[validate(must_match(other = "password_confirm"))] | Two fields have the same value |
contains | #[validate(contains(pattern = "@"))] | String contains a substring |
does_not_contain | #[validate(does_not_contain(pattern = "admin"))] | String does not contain a substring |
regex | #[validate(regex(path = *RE_PHONE))] | Matches a compiled regex |
custom | #[validate(custom(function = "check_slug"))] | Runs a custom function |
Every validator accepts an optional message parameter that provides the error text shown to users. Without it, the crate produces a default message keyed by the validation rule name.
Custom validation functions
For rules that don’t fit the built-in validators, write a function that returns Result<(), ValidationError>:
use validator::ValidationError;
fn validate_no_profanity(value: &str) -> Result<(), ValidationError> {
let blocked = ["spam", "scam"];
if blocked.iter().any(|w| value.to_lowercase().contains(w)) {
return Err(ValidationError::new("profanity")
.with_message("Contains blocked content".into()));
}
Ok(())
}
#[derive(Deserialize, Sanitize, Validate)]
struct CreatePost {
#[sanitize(trim)]
#[validate(length(min = 1, max = 200))]
title: String,
#[sanitize(trim)]
#[validate(custom(function = "validate_no_profanity"))]
body: String,
}
Nested validation
Structs containing other validatable structs use #[validate(nested)]:
#[derive(Deserialize, Sanitize, Validate)]
struct Address {
#[sanitize(trim)]
#[validate(length(min = 1))]
street: String,
#[sanitize(trim)]
#[validate(length(min = 1))]
city: String,
}
#[derive(Deserialize, Sanitize, Validate)]
struct CreateUser {
#[sanitize(trim)]
#[validate(length(min = 1))]
name: String,
#[sanitize]
#[validate(nested)]
address: Address,
}
Every form handler follows the same sequence: deserialise the body, sanitise the fields, validate, then branch on the result. A custom ValidatedForm<T> extractor wraps axum-extra’s Form and performs all three steps, so handlers never repeat the boilerplate.
The extractor uses FormRejection only for deserialisation failures (malformed request bodies). Validation failures are not rejections; they are a normal part of form handling. The extractor returns both the sanitised input and any validation errors, so the handler always has access to the user’s data for re-rendering the form.
use axum::extract::{FromRequest, Request};
use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};
use axum_extra::extract::{Form, FormRejection};
use sanitizer::prelude::*;
use validator::{Validate, ValidationErrors};
pub struct ValidatedForm<T> {
pub input: T,
pub errors: Option<ValidationErrors>,
}
impl<S, T> FromRequest<S> for ValidatedForm<T>
where
S: Send + Sync,
T: serde::de::DeserializeOwned + Sanitize + Validate,
Form<T>: FromRequest<S, Rejection = FormRejection>,
{
type Rejection = FormRejection;
async fn from_request(
req: Request,
state: &S,
) -> Result<Self, Self::Rejection> {
let Form(mut input) = Form::<T>::from_request(req, state).await?;
input.sanitize();
let errors = input.validate().err();
Ok(ValidatedForm { input, errors })
}
}
The handler pattern becomes:
async fn create_contact(
validated: ValidatedForm<CreateContact>,
) -> impl IntoResponse {
if let Some(errors) = &validated.errors {
return (
StatusCode::UNPROCESSABLE_ENTITY,
render_contact_form(&validated.input, errors),
).into_response();
}
save_contact(&validated.input).await;
Redirect::to("/contacts").into_response()
}
The ValidatedForm extractor handles the mechanical work. The handler deals only with the business logic: render errors or save and redirect.
Place the ValidatedForm definition in a shared crate in your workspace (e.g., common or web). Every form handler across the application can use it.
HTML5 client-side validation
Use HTML5 validation attributes as the first line of defence. They provide instant feedback without a server round-trip and reduce unnecessary requests. The server always validates too, because client-side validation is trivially bypassed.
The relevant attributes:
| Attribute | Purpose | Example |
required | Field must not be empty | input required; |
type="email" | Must look like an email | input type="email"; |
type="url" | Must look like a URL | input type="url"; |
minlength / maxlength | Text length bounds | input minlength="1" maxlength="255"; |
min / max | Numeric or date bounds | input type="number" min="0" max="150"; |
pattern | Regex match | input pattern="[A-Za-z]+" title="Letters only"; |
Apply these in your Maud templates alongside the server-side validator rules. Keep the constraints consistent: if the server requires length(min = 1, max = 255), set required minlength="1" maxlength="255" on the input.
fn contact_form_fields(input: Option<&CreateContact>) -> Markup {
let name_val = input.map(|i| i.name.as_str()).unwrap_or("");
let email_val = input.map(|i| i.email.as_str()).unwrap_or("");
html! {
label for="name" { "Name" }
input #name type="text" name="name" value=(name_val)
required minlength="1" maxlength="255";
label for="email" { "Email" }
input #email type="email" name="email" value=(email_val)
required;
}
}
HTML5 validation is not a substitute for server-side validation. It is a UX optimisation that catches obvious mistakes before they hit the network.
Displaying validation errors with Maud
When validation fails, re-render the form with the user’s input preserved and error messages next to the relevant fields. The ValidationErrors struct from validator maps field names to a list of ValidationError values, each with a message field.
A helper to extract the first error message for a given field:
use validator::ValidationErrors;
fn field_error(errors: &ValidationErrors, field: &str) -> Option<String> {
errors
.field_errors()
.get(field)
.and_then(|errs| errs.first())
.and_then(|e| e.message.as_ref())
.map(|msg| msg.to_string())
}
An error message component:
fn field_error_message(errors: Option<&ValidationErrors>, field: &str) -> Markup {
let msg = errors.and_then(|e| field_error(e, field));
html! {
@if let Some(msg) = msg {
span.field-error role="alert" { (msg) }
}
}
}
Wire it into the form:
fn render_contact_form(
input: &CreateContact,
errors: &ValidationErrors,
) -> Markup {
html! {
form method="post" action="/contacts" {
div.form-error role="alert" {
p { "Please fix the errors below." }
}
div.field {
label for="name" { "Name" }
input #name type="text" name="name" value=(input.name)
required minlength="1" maxlength="255";
(field_error_message(Some(errors), "name"))
}
div.field {
label for="email" { "Email" }
input #email type="email" name="email" value=(input.email)
required;
(field_error_message(Some(errors), "email"))
}
button type="submit" { "Save" }
}
}
}
The handler using ValidatedForm:
async fn show_contact_form() -> Markup {
contact_form()
}
async fn create_contact(
validated: ValidatedForm<CreateContact>,
) -> impl IntoResponse {
if let Some(errors) = &validated.errors {
return (
StatusCode::UNPROCESSABLE_ENTITY,
render_contact_form(&validated.input, errors),
).into_response();
}
save_contact(&validated.input).await;
Redirect::to("/contacts").into_response()
}
Inline field validation with htmx
The full-form pattern above works without JavaScript. For a more responsive experience, add inline validation that checks individual fields as the user fills them in, using htmx to swap error messages without a full page reload.
Create a validation endpoint that accepts a single field value and returns just the error markup:
#[derive(Deserialize)]
struct FieldValidation {
name: Option<String>,
email: Option<String>,
}
async fn validate_field(
Form(input): Form<FieldValidation>,
) -> Markup {
let mut contact = CreateContact {
name: input.name.clone().unwrap_or_default(),
email: input.email.clone().unwrap_or_default(),
phone: None,
};
contact.sanitize();
let errors = contact.validate().err();
if input.name.is_some() {
return field_error_message(errors.as_ref(), "name");
}
if input.email.is_some() {
return field_error_message(errors.as_ref(), "email");
}
html! {}
}
Add htmx attributes to the form inputs. Each field posts its value on blur and swaps the error message next to it:
fn contact_form_with_inline_validation(
input: Option<&CreateContact>,
errors: Option<&ValidationErrors>,
) -> Markup {
let name_val = input.map(|i| i.name.as_str()).unwrap_or("");
let email_val = input.map(|i| i.email.as_str()).unwrap_or("");
html! {
form method="post" action="/contacts" {
div.field {
label for="name" { "Name" }
input #name type="text" name="name" value=(name_val)
required minlength="1" maxlength="255"
hx-post="/contacts/validate"
hx-trigger="blur"
hx-target="next .field-error-slot"
hx-swap="innerHTML";
span.field-error-slot {
(field_error_message(errors, "name"))
}
}
div.field {
label for="email" { "Email" }
input #email type="email" name="email" value=(email_val)
required
hx-post="/contacts/validate"
hx-trigger="blur"
hx-target="next .field-error-slot"
hx-swap="innerHTML";
span.field-error-slot {
(field_error_message(errors, "email"))
}
}
button type="submit" { "Save" }
}
}
}
Register the validation endpoint:
let app = Router::new()
.route("/contacts/new", get(show_contact_form))
.route("/contacts", post(create_contact))
.route("/contacts/validate", post(validate_field));
This layered approach gives three levels of validation feedback:
- HTML5 attributes catch basic mistakes instantly in the browser.
- htmx inline validation checks fields against server rules on blur, before submission.
- Full-form server validation on POST is the final authority. It always runs, catching anything the first two layers missed.
The form works without JavaScript (levels 1 and 3). htmx enhances it progressively.
Post/Redirect/Get
The Post/Redirect/Get (PRG) pattern prevents duplicate form submissions when users refresh the page after a POST. Without it, refreshing re-submits the form, potentially creating duplicate records.
The pattern:
- The browser POSTs the form data.
- The server processes it and responds with a 303 See Other redirect.
- The browser follows the redirect with a GET request.
- Refreshing the page repeats only the GET, not the POST.
In Axum:
use axum::response::Redirect;
async fn create_contact(
validated: ValidatedForm<CreateContact>,
) -> impl IntoResponse {
if let Some(errors) = &validated.errors {
return (
StatusCode::UNPROCESSABLE_ENTITY,
render_contact_form(&validated.input, errors),
).into_response();
}
save_contact(&validated.input).await;
Redirect::to("/contacts").into_response()
}
Redirect::to() sends a 303 See Other by default, which is correct for PRG. The browser converts the redirect to a GET regardless of the original method.
For success feedback after the redirect (a “Contact saved” flash message), store the message in the session before redirecting and display it on the next GET. Session management is covered in the Authentication section.
When the form is submitted via htmx (not a full page navigation), PRG is unnecessary. htmx replaces a targeted DOM fragment, and there is no browser history entry for the POST. The server can return an HTML fragment directly. Use the HxRequest extractor from axum-htmx to branch:
use axum_htmx::HxRequest;
async fn create_contact(
HxRequest(is_htmx): HxRequest,
validated: ValidatedForm<CreateContact>,
) -> impl IntoResponse {
if let Some(errors) = &validated.errors {
return (
StatusCode::UNPROCESSABLE_ENTITY,
render_contact_form(&validated.input, errors),
).into_response();
}
save_contact(&validated.input).await;
if is_htmx {
render_contact_list().await.into_response()
} else {
Redirect::to("/contacts").into_response()
}
}
CSRF protection
Every form that performs a state-changing action (POST, PUT, DELETE) needs protection against cross-site request forgery. Without it, a malicious page can submit a hidden form to your application using the victim’s authenticated session. CSRF protection is not specific to authentication forms; it applies to every form in the application, including the contact form examples above.
Apply the CSRF middleware layer to the router so it covers all routes with form handlers. The setup, configuration, and layer ordering are covered in the Authentication section.
File uploads
For forms that include file uploads, the browser sends multipart/form-data instead of URL-encoded data. The axum-typed-multipart crate provides a derive macro that handles multipart parsing with the same type-safe pattern as Form<T>.
[dependencies]
axum-typed-multipart = { version = "0.16", features = ["tempfile_3"] }
tempfile = "3"
The tempfile_3 feature streams uploads to temporary files instead of holding them in memory.
Define the upload struct:
use axum_typed_multipart::{FieldData, TryFromMultipart, TypedMultipart};
use tempfile::NamedTempFile;
#[derive(TryFromMultipart)]
struct CreateDocument {
title: String,
#[form_data(limit = "10MB")]
file: FieldData<NamedTempFile>,
}
FieldData<NamedTempFile> streams the upload to a temporary file on disk. The FieldData wrapper provides metadata: file.metadata.file_name for the original filename and file.metadata.content_type for the MIME type.
The handler:
async fn upload_document(
TypedMultipart(input): TypedMultipart<CreateDocument>,
) -> impl IntoResponse {
let file_name = input.file.metadata.file_name
.unwrap_or_else(|| "unnamed".to_string());
let content_type = input.file.metadata.content_type
.unwrap_or_else(|| "application/octet-stream".parse().unwrap());
// The upload lives at this temporary path until you persist it
// (see the File Storage section); the temp file is deleted on drop.
let temp_path = input.file.contents.path();
Redirect::to("/documents")
}
The form needs enctype="multipart/form-data":
html! {
form method="post" action="/documents" enctype="multipart/form-data" {
label for="title" { "Title" }
input #title type="text" name="title" required;
label for="file" { "File" }
input #file type="file" name="file" required accept=".pdf,.doc,.docx";
button type="submit" { "Upload" }
}
}
For processing and storing uploaded files (S3-compatible storage, permanent paths, serving files back), see the File Storage section.
Alternatives
garde is an alternative validation crate with a different API style. Where validator uses string-based attribute arguments (#[validate(length(min = 1))]), garde uses Rust expressions (#[garde(length(min = 1))]) and supports context-dependent validation through a generic context parameter. Both crates are actively maintained. This guide uses validator because it is more widely adopted and its API is sufficient for typical web form validation.
Gotchas
Field names must match. The name attribute in the HTML form must match the struct field name exactly (or the #[serde(rename)] value). A mismatch causes deserialisation to silently use the default or fail entirely, depending on whether the field is Option.
Sanitise before validating. The ValidatedForm extractor handles this order automatically. If you call .validate() without sanitising first, a value like " " (whitespace) passes a length(min = 1) check even though it contains no meaningful content.
Validation runs after deserialisation. If serde cannot parse the form body at all (e.g., a required field is completely missing), Axum rejects the request before sanitisation or validation ever runs. The ValidatedForm extractor surfaces this as a FormRejection.
validator checks values, not business rules. Format and range checks belong on the struct. Rules that require database access (uniqueness, referential integrity) belong in the handler or service layer, after validation passes.
Optional fields need special handling with validator. #[validate(email)] on an Option<String> only validates the inner value when it is Some. An empty optional field passes validation, which is usually what you want. If a field should be non-empty when present, add #[validate(length(min = 1))].
CSRF protection is not optional. Every POST form needs CSRF middleware, not just login and registration. A contacts form, a settings page, a comment box: if it changes state, it needs protection. See CSRF protection for setup.
multipart/form-data for file uploads. Standard Form<T> only handles URL-encoded bodies. If the form includes a file input, the enctype must be multipart/form-data and the handler must use TypedMultipart<T> instead of Form<T>. The ValidatedForm extractor does not apply to multipart forms.
Error Handling
Rust has no exceptions. Functions that can fail return Result<T, E>, and the caller decides what to do with the error. This is a strength, but it means you need a deliberate strategy for how errors propagate through your web application and how they turn into HTTP responses.
The approach here uses two crates that serve different purposes. thiserror generates typed error enums for domain and library code, where callers need to match on specific failure modes. anyhow provides type-erased error propagation for application code, where you just want to bubble errors up with context.
thiserror for typed errors
thiserror is a derive macro that generates std::error::Error implementations for your error types. Use it when callers need to distinguish between different failure modes.
[dependencies]
thiserror = "2"
Define an error enum for a domain module:
use thiserror::Error;
#[derive(Debug, Error)]
pub enum UserError {
#[error("user not found: {0}")]
NotFound(i64),
#[error("email already registered: {0}")]
DuplicateEmail(String),
#[error("invalid email format: {0}")]
InvalidEmail(String),
}
The #[error("...")] attribute generates the Display implementation. Field values are interpolated using the same syntax as format!.
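What the derive generates for Display is roughly this hand-written impl (a sketch of the idea, not the macro's literal expansion):

```rust
use std::fmt;

#[derive(Debug)]
pub enum UserError {
    NotFound(i64),
    DuplicateEmail(String),
    InvalidEmail(String),
}

// Equivalent of the #[error("...")] attributes: one write! per variant,
// interpolating fields with format! syntax.
impl fmt::Display for UserError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UserError::NotFound(id) => write!(f, "user not found: {id}"),
            UserError::DuplicateEmail(e) => write!(f, "email already registered: {e}"),
            UserError::InvalidEmail(e) => write!(f, "invalid email format: {e}"),
        }
    }
}
```

The derive saves you this boilerplate and also implements std::error::Error.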
Wrapping source errors with #[from]
#[from] generates a From<T> implementation and sets the error’s source() return value, so the ? operator converts the source error automatically:
use thiserror::Error;
#[derive(Debug, Error)]
pub enum RepoError {
#[error("database error")]
Database(#[from] sqlx::Error),
#[error("user not found: {0}")]
NotFound(i64),
#[error("email already registered: {0}")]
DuplicateEmail(String),
}
A repository function can now use ? on SQLx calls and the error converts automatically:
async fn find_user(pool: &PgPool, id: i64) -> Result<User, RepoError> {
let user = sqlx::query_as!(User, "SELECT * FROM users WHERE id = $1", id)
.fetch_one(pool)
.await?;
Ok(user)
}
Transparent forwarding
#[error(transparent)] delegates both Display and source() to the inner error. This is useful for catch-all variants:
#[derive(Debug, Error)]
pub enum AppError {
#[error("not found: {0}")]
NotFound(String),
#[error(transparent)]
Database(#[from] sqlx::Error),
}
When AppError::Database is displayed, it prints the SQLx error’s message directly rather than wrapping it.
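A std-only sketch of that delegation, with a hypothetical Inner type standing in for sqlx::Error (transparent means both Display and source() forward to the wrapped error):

```rust
use std::error::Error;
use std::fmt;

// Hypothetical inner error standing in for sqlx::Error.
#[derive(Debug)]
struct Inner(&'static str);

impl fmt::Display for Inner {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}
impl Error for Inner {}

// Roughly what #[error(transparent)] amounts to for one variant:
// forward Display and source to the wrapped error.
#[derive(Debug)]
struct Transparent(Inner);

impl fmt::Display for Transparent {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt(f)
    }
}
impl Error for Transparent {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}
```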
When to use thiserror
Use thiserror in library crates, domain modules, and any code where the caller needs to match on the error variant to decide what to do. A repository crate that returns RepoError::NotFound lets the handler return a 404. A repository that returns anyhow::Error forces the handler to treat everything as a 500.
anyhow for application code
anyhow provides a single error type, anyhow::Error, that wraps any error implementing std::error::Error. It is designed for application code where you want to propagate errors with added context rather than define typed variants for every possible failure.
[dependencies]
anyhow = "1"
Adding context
The .context() and .with_context() methods attach human-readable messages to errors as they propagate up the call stack:
use anyhow::{Context, Result};
async fn load_config(path: &str) -> Result<Config> {
let content = std::fs::read_to_string(path)
.with_context(|| format!("failed to read config from {path}"))?;
let config: Config = toml::from_str(&content)
.context("failed to parse config")?;
Ok(config)
}
Result here is anyhow::Result<T>, an alias for Result<T, anyhow::Error>. The ? operator converts any error into anyhow::Error and the .context() call wraps it with an additional message. When logged, the full chain is visible: “failed to parse config: expected =, found [ at line 3”.
Use .with_context(|| ...) when the message involves formatting (the closure is only evaluated on error). Use .context("...") for static strings.
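The context chain can be sketched with std alone: each layer wraps the one below and exposes it via source(), which is what anyhow walks when it prints the full chain. The Context struct and chain function here are illustrative, not anyhow's actual internals:

```rust
use std::error::Error;
use std::fmt;

// A minimal context wrapper in the spirit of anyhow's .context().
#[derive(Debug)]
struct Context {
    msg: String,
    source: Box<dyn Error>,
}

impl fmt::Display for Context {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.msg)
    }
}
impl Error for Context {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(self.source.as_ref())
    }
}

// Render the chain the way anyhow roughly does: outermost context
// first, then each source() in turn.
fn chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cur = err.source();
    while let Some(e) = cur {
        out.push_str(": ");
        out.push_str(&e.to_string());
        cur = e.source();
    }
    out
}
```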
bail! and ensure!
For errors that don’t originate from another error type:
use anyhow::{bail, ensure, Result};
fn validate_port(port: u16) -> Result<()> {
if port == 0 {
bail!("port must not be zero");
}
ensure!(port >= 1024, "port {port} is in the privileged range");
Ok(())
}
bail! returns early with an error. ensure! is a conditional bail!.
When to use anyhow
Use anyhow in application-level code where you don’t need callers to match on specific error variants: startup logic, configuration loading, background tasks, and any place where the only reasonable response to an error is to log it and return a 500. The .context() method produces better diagnostic messages than a bare ?, because each layer of the call stack can explain what it was trying to do.
The AppError type
Web handlers need to convert errors into HTTP responses. Axum requires that both the success and error types in a handler’s Result implement IntoResponse. The standard pattern is a single AppError enum that maps each variant to an HTTP status code.
use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};
use thiserror::Error;
#[derive(Debug, Error)]
pub enum AppError {
#[error("not found: {0}")]
NotFound(String),
#[error("conflict: {0}")]
Conflict(String),
#[error("bad request: {0}")]
BadRequest(String),
#[error("unauthorized")]
Unauthorized,
#[error("database error")]
Database(#[from] sqlx::Error),
}
impl IntoResponse for AppError {
fn into_response(self) -> Response {
let status = match &self {
AppError::NotFound(_) => StatusCode::NOT_FOUND,
AppError::Conflict(_) => StatusCode::CONFLICT,
AppError::BadRequest(_) => StatusCode::BAD_REQUEST,
AppError::Unauthorized => StatusCode::UNAUTHORIZED,
AppError::Database(_) => StatusCode::INTERNAL_SERVER_ERROR,
};
(status, self.to_string()).into_response()
}
}
Handlers return Result<impl IntoResponse, AppError>:
async fn get_user(
State(pool): State<PgPool>,
Path(id): Path<i64>,
) -> Result<Html<String>, AppError> {
let user = sqlx::query_as!(User, "SELECT * FROM users WHERE id = $1", id)
.fetch_optional(&pool)
.await?
.ok_or_else(|| AppError::NotFound(format!("user {id}")))?;
Ok(Html(render_user(&user)))
}
The ? operator on the SQLx call converts sqlx::Error into AppError::Database via the #[from] attribute. The .ok_or_else() on the Option produces AppError::NotFound when the query returns no rows.
Converting from domain errors
Domain modules often define their own error types. Add From implementations to convert them into AppError:
impl From<UserError> for AppError {
fn from(err: UserError) -> Self {
match err {
UserError::NotFound(id) => AppError::NotFound(format!("user {id}")),
UserError::DuplicateEmail(email) => {
AppError::Conflict(format!("email {email} already registered"))
}
UserError::InvalidEmail(msg) => AppError::BadRequest(msg),
}
}
}
This mapping is where domain semantics meet HTTP semantics. A DuplicateEmail is a domain concept; a 409 Conflict is an HTTP concept. The From implementation is the bridge.
Mapping database errors
Most SQLx errors should map to a 500. The two cases worth handling explicitly are missing rows and unique constraint violations:
impl From<sqlx::Error> for AppError {
fn from(err: sqlx::Error) -> Self {
match &err {
sqlx::Error::RowNotFound => {
AppError::NotFound("record not found".to_string())
}
sqlx::Error::Database(db_err) if db_err.is_unique_violation() => {
AppError::Conflict("duplicate record".to_string())
}
_ => AppError::Database(err),
}
}
}
Prefer fetch_optional() over fetch_one() when a missing row is a normal case rather than an error. fetch_optional() returns Ok(None) instead of Err(RowNotFound), which keeps the “not found” logic in the handler where you have more context about what was being looked up:
let user = sqlx::query_as!(User, "SELECT * FROM users WHERE id = $1", id)
.fetch_optional(&pool)
.await?
.ok_or_else(|| AppError::NotFound(format!("user {id}")))?;
This produces a better error message (“user 42 not found”) than the generic “record not found” from the From<sqlx::Error> conversion.
Never expose raw database error messages to users. They can leak table names, column names, and constraint details. The From impl above replaces the database message with a generic string. Log the original error for debugging (covered below).
User-facing error pages
The plain text responses above work for development. For a production HDA application, render HTML error pages with Maud.
Define an error page component:
use maud::{html, Markup};
use axum::http::StatusCode;
fn error_page(status: StatusCode, message: &str) -> Markup {
html! {
h1 { (status.as_u16()) " " (status.canonical_reason().unwrap_or("Error")) }
p { (message) }
}
}
Wrap it in your site layout the same way you wrap any other page. The error page should look like part of the application, not a raw browser error.
Update IntoResponse to render HTML:
impl IntoResponse for AppError {
fn into_response(self) -> Response {
let (status, message) = match &self {
AppError::NotFound(msg) => (StatusCode::NOT_FOUND, msg.clone()),
AppError::Conflict(msg) => (StatusCode::CONFLICT, msg.clone()),
AppError::BadRequest(msg) => (StatusCode::BAD_REQUEST, msg.clone()),
AppError::Unauthorized => {
(StatusCode::UNAUTHORIZED, "Unauthorized".to_string())
}
AppError::Database(_) => (
StatusCode::INTERNAL_SERVER_ERROR,
"An internal error occurred".to_string(),
),
};
(status, error_page(status, &message)).into_response()
}
}
For 500 errors, show a generic message. The user does not need to know that the database connection timed out.
Error fragments for htmx requests
When an htmx request fails, you often want to return an error fragment that slots into the existing page rather than a full error page. Check the HX-Request header to branch:
use axum_htmx::HxRequest;
use axum::extract::{Path, State};
use axum::response::{Html, IntoResponse, Redirect};
async fn delete_user(
HxRequest(is_htmx): HxRequest,
State(pool): State<PgPool>,
Path(id): Path<i64>,
) -> Result<impl IntoResponse, AppError> {
delete_user_by_id(&pool, id).await?;
if is_htmx {
Ok(Html("".to_string()).into_response())
} else {
Ok(Redirect::to("/users").into_response())
}
}
For a more systematic approach, move the htmx check into the IntoResponse implementation. This requires access to the request headers, which IntoResponse does not have. One option is to store the HxRequest value in the AppError type or use a middleware that sets a response header. In practice, handling htmx errors at the handler level (as above) is simpler and more explicit.
Logging errors with tracing
Log errors in the IntoResponse implementation so every error is captured in one place. This avoids scattering tracing::error! calls across every handler.
impl IntoResponse for AppError {
fn into_response(self) -> Response {
let (status, message) = match &self {
AppError::NotFound(msg) => {
tracing::warn!(error = %self, "not found");
(StatusCode::NOT_FOUND, msg.clone())
}
AppError::Conflict(msg) => {
tracing::warn!(error = %self, "conflict");
(StatusCode::CONFLICT, msg.clone())
}
AppError::BadRequest(msg) => {
tracing::warn!(error = %self, "bad request");
(StatusCode::BAD_REQUEST, msg.clone())
}
AppError::Unauthorized => {
tracing::warn!("unauthorized request");
(StatusCode::UNAUTHORIZED, "Unauthorized".to_string())
}
AppError::Database(err) => {
tracing::error!(error = ?err, "database error");
(
StatusCode::INTERNAL_SERVER_ERROR,
"An internal error occurred".to_string(),
)
}
};
(status, error_page(status, &message)).into_response()
}
}
Expected errors (not found, bad request) log at warn level with Display formatting (%). Unexpected errors (database failures) log at error level with Debug formatting (?) to capture the full error chain. This distinction keeps log noise manageable while ensuring genuine problems are visible.
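The % and ? sigils map directly to Rust's Display and Debug traits. A std-only illustration (DatabaseError is a hypothetical type, not part of the AppError enum above):

```rust
use std::fmt;

// Illustrative type: Display carries the user-safe summary,
// while the derived Debug exposes internal detail for logs.
#[derive(Debug)]
struct DatabaseError {
    detail: String,
}

impl fmt::Display for DatabaseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "database error")
    }
}

// tracing::warn!(error = %err, ...)  logs the Display form: database error
// tracing::error!(error = ?err, ...) logs the Debug form:
//   DatabaseError { detail: "connection timed out" }
```

Keeping the Display form generic and the Debug form detailed gives you safe user-facing messages and rich logs from the same type.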
Supplementary logging with #[instrument]
For handlers where you need more context about what failed, add #[instrument(err)] to log the error along with the function’s arguments:
use tracing::instrument;
#[instrument(skip(pool), err)]
async fn get_user(
State(pool): State<PgPool>,
Path(id): Path<i64>,
) -> Result<Html<String>, AppError> {
let user = find_user(&pool, id).await?;
Ok(Html(render_user(&user)))
}
skip(pool) prevents the database pool from being included in the log output (it produces enormous Debug output). err tells #[instrument] to log the error value when the function returns Err.
This is supplementary to the centralized logging in IntoResponse. Use it when you need to know which specific handler call failed and with what arguments. For most handlers, the centralized approach is sufficient.
The anyhow catch-all alternative
The AppError enum above requires a variant (or a From implementation) for every error source. As applications grow, this can mean a lot of conversion code. An alternative is to add an anyhow-based catch-all variant:
#[derive(Debug, Error)]
pub enum AppError {
#[error("not found: {0}")]
NotFound(String),
#[error("conflict: {0}")]
Conflict(String),
#[error("bad request: {0}")]
BadRequest(String),
#[error("unauthorized")]
Unauthorized,
#[error(transparent)]
Unexpected(#[from] anyhow::Error),
}
The Unexpected variant accepts any error that implements std::error::Error (plus Send + Sync + 'static) via anyhow's blanket From implementation. Any error you haven't explicitly handled falls through to this variant and becomes a 500.
This pattern is documented in the thiserror README and taught in Zero to Production in Rust. It trades explicit control for ergonomics: you no longer need to declare a variant or write a From impl for every error source, but unexpected errors all become opaque 500s with no chance for finer-grained status codes.
The trade-off is worth considering as your application grows. For a small application with a handful of error sources, explicit variants are manageable and give you full control over HTTP status codes. For a larger application with many fallible operations where most errors are genuinely unexpected, the catch-all reduces boilerplate.
Placement in the workspace
In a Cargo workspace, place the AppError type and its IntoResponse implementation in a shared crate (e.g., web or common). Domain error types (like UserError, OrderError) live in their respective domain crates with From implementations in the shared crate that bridge domain errors to AppError.
workspace/
├── crates/
│ ├── common/ # AppError, IntoResponse, shared types
│ ├── users/ # UserError, user domain logic
│ ├── orders/ # OrderError, order domain logic
│ └── web/ # Axum handlers, routes
Domain crates depend on thiserror. The shared crate depends on thiserror and axum. Domain crates do not depend on axum, keeping web concerns out of business logic.
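A sketch of how the dependency split looks in each crate's manifest (crate names follow the layout above; versions mirror those used elsewhere in this guide):

```toml
# crates/users/Cargo.toml — domain crate, no web dependencies
[dependencies]
thiserror = "2"

# crates/common/Cargo.toml — shared crate, bridges errors to HTTP
[dependencies]
thiserror = "2"
axum = "0.8"
```

If a domain crate ever needs axum, that is usually a sign that web concerns have leaked into business logic.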
Gotchas
Don’t expose internal details to users. Database errors, file paths, and stack traces belong in logs, not in HTTP responses. The IntoResponse implementation should return generic messages for 500 errors and log the real error separately.
sqlx::Error is non-exhaustive. Always include a catch-all arm when matching on it. New variants can be added in minor releases.
fetch_optional vs fetch_one. Use fetch_optional when a missing row is expected (looking up a user by ID). Use fetch_one when a missing row is a bug (fetching a record you just inserted). fetch_one returns Err(RowNotFound) on miss; fetch_optional returns Ok(None).
Overlapping From implementations are a hazard. If AppError has both From&lt;sqlx::Error&gt; and From&lt;RepoError&gt;, and RepoError also wraps sqlx::Error, a bare ? on a SQLx call in a handler will use From&lt;sqlx::Error&gt; directly, bypassing the domain error's semantics. Be explicit about which conversion you want.
thiserror 2.0 requires a direct dependency. Code using derive(Error) must declare thiserror as a direct dependency in its Cargo.toml. Relying on a transitive dependency no longer works as of thiserror 2.0.
Integrations
Server-Sent Events and Real-Time Updates
Server-Sent Events (SSE) push data from server to browser over a single HTTP connection. The browser opens a persistent connection, the server writes events to it as they occur, and the browser receives them in real time. No polling, no WebSocket handshake, no bidirectional protocol negotiation. For the class of problems that dominate hypermedia applications (notifications, progress bars, live feeds, status updates), SSE is the right tool.
This section covers implementing SSE endpoints in Axum, using Valkey pub/sub as the event distribution layer, consuming events in the browser with the htmx SSE extension, and common patterns for real-time features.
SSE fundamentals
SSE uses a simple text-based protocol over HTTP. The server responds with Content-Type: text/event-stream and writes events as plain text:
event: notification
data: <div class="alert">New message from Alice</div>
event: status
data: <span class="status">Processing complete</span>
Each event has an optional event: name and a data: payload, separated by blank lines. The browser’s EventSource API connects to the endpoint, receives events, and dispatches them by name. If the connection drops, the browser reconnects automatically.
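The wire format is simple enough to write by hand. A sketch (Axum's Event builder, covered below, does this for you, including the per-line data: splitting):

```rust
/// Format one SSE event per the wire protocol shown above.
fn format_sse_event(name: &str, data: &str) -> String {
    let mut out = format!("event: {name}\n");
    // A multi-line payload becomes one `data:` line per payload line.
    for line in data.lines() {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n'); // a blank line terminates the event
    out
}
```

The browser reassembles multiple data: lines into a single payload joined by newlines, so HTML fragments survive the round trip intact.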
SSE vs WebSockets
SSE is unidirectional: server to client only. WebSockets are bidirectional. Pick based on the data flow:
| Use SSE when | Use WebSockets when |
| Server pushes updates to the browser | Client and server both send messages |
| Notifications, progress bars, live feeds | Chat, collaborative editing, gaming |
| HTML fragments for htmx to swap | Binary data or high-frequency bidirectional messaging |
| You want HTTP semantics (caching, auth, proxies) | You need a persistent bidirectional channel |
SSE works over standard HTTP, which means it passes through proxies, load balancers, and CDNs without special configuration. WebSockets require upgrade support at every layer. For HDA applications where the server renders HTML and pushes fragments to the browser, SSE is the natural fit.
SSE endpoints in Axum
Axum provides first-class SSE support through axum::response::sse. The key types are Sse (the response wrapper), Event (a single event), and KeepAlive (heartbeat configuration).
A minimal SSE endpoint
use axum::response::sse::{Event, KeepAlive, Sse};
use futures_util::stream::Stream;
use std::convert::Infallible;
use std::time::Duration;
use tokio_stream::StreamExt; // provides both .map() and .throttle()

async fn events() -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    let stream = futures_util::stream::repeat_with(|| {
        Event::default()
            .event("heartbeat")
            .data("alive")
    })
    .map(Ok)
    .throttle(Duration::from_secs(5));
    Sse::new(stream).keep_alive(KeepAlive::default())
}
The handler returns Sse<impl Stream<Item = Result<Event, E>>> where E: Into<BoxError>. Axum sets Content-Type: text/event-stream and Cache-Control: no-cache automatically.
Building events
Event uses a builder pattern:
Event::default()
.event("notification")
.data("<div class=\"alert\">New message</div>")
Event::default()
.event("update")
.id("42")
.data("<span>Updated value</span>")
Event::default()
.retry(std::time::Duration::from_secs(5))
.data("connected")
Each setter (event, data, id, retry) can only be called once per Event. Calling it twice panics. The data method handles newlines in the payload correctly, splitting them across multiple data: lines per the SSE specification.
Keep-alive
Proxies and load balancers close idle HTTP connections. KeepAlive sends periodic comment lines (:keepalive\n\n) to keep the connection open:
Sse::new(stream).keep_alive(
KeepAlive::new()
.interval(std::time::Duration::from_secs(15))
.text("keepalive")
)
The default interval is 15 seconds. Always call .keep_alive() in production. Without it, connections through Nginx, Cloudflare, or other proxies will be silently dropped after their idle timeout.
Valkey pub/sub as the event bus
A single SSE endpoint connected to a single data source works for toy examples. In practice, events originate from many places (a background job finishes, another user edits a record, a workflow reaches a new stage) and each SSE client only cares about a subset of them. Valkey pub/sub provides the distribution layer, with per-resource channels as the organising principle.
Per-resource channels
Structure Valkey channels around the resources that generate events:
order:123 – status changes for order 123
project:456 – activity on project 456
user:789:notifications – notifications for user 789
task:abc:progress – progress updates for background task abc
Any part of your application publishes to the relevant channel. Each SSE connection subscribes only to the channels it needs. This maps naturally to how HDA pages work: a page showing order 123 opens an SSE connection that subscribes to order:123. A dashboard subscribes to several channels at once.
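Centralising the channel-name format in small helpers keeps publishers and subscribers in sync; a typo in a hand-written channel string fails silently (the subscriber just never receives anything). A sketch (function names are illustrative):

```rust
/// Channel for status changes on a single order.
fn order_channel(order_id: i64) -> String {
    format!("order:{order_id}")
}

/// Channel for a single user's notification feed.
fn user_notifications_channel(user_id: i64) -> String {
    format!("user:{user_id}:notifications")
}

/// Channel for progress updates from one background task.
fn task_progress_channel(task_id: &str) -> String {
    format!("task:{task_id}:progress")
}
```

Both the publish call sites and the SSE handlers would build channel names through these helpers rather than inline format! strings.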
This design also plays to Valkey's strengths. PUBLISH is O(N) where N is the number of subscribers on that specific channel. Many channels with a handful of subscribers each is fast; a single channel with thousands of subscribers makes every publish slow. The Valkey documentation and Redis's creator are explicit on this point: prefer many fine-grained channels over a few broad ones.
The architecture
[Handler A] ──publish──▶ Valkey channel: order:123 ──subscribe──▶ [SSE Client 1]
[Handler B] ──publish──▶ Valkey channel: project:456 ──subscribe──▶ [SSE Client 2]
[Restate] ──publish──▶ Valkey channel: task:abc ──subscribe──▶ [SSE Client 1]
Each SSE connection opens its own Valkey pub/sub subscriber and subscribes to the specific channels the authenticated user is authorised to see. When the SSE client disconnects, the Valkey connection drops and the subscriptions are cleaned up automatically.
Dependencies
[dependencies]
axum = "0.8"
redis = { version = "1.0", features = ["tokio-comp"] }
tokio = { version = "1", features = ["full"] }
tokio-stream = "0.1"
futures-util = "0.3"
async-stream = "0.3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
The redis crate works with Valkey without modification. Valkey is API-compatible with Redis, so any Redis client library works as-is.
Application state
#[derive(Clone)]
pub struct AppState {
pub db: sqlx::PgPool,
pub valkey: redis::Client,
}
Store a redis::Client rather than a connection. The client is a lightweight handle that creates new connections on demand. Each SSE handler will create its own pub/sub connection from this client.
Publishing events
Any handler or background process publishes events to a resource-specific channel:
use redis::AsyncCommands;
pub async fn publish_event(
valkey: &redis::Client,
channel: &str,
event_type: &str,
html: &str,
) -> Result<(), redis::RedisError> {
let mut conn = valkey.get_multiplexed_async_connection().await?;
let payload = serde_json::json!({
"event_type": event_type,
"data": html
});
conn.publish(channel, payload.to_string()).await?;
Ok(())
}
publish_event(
&state.valkey,
"order:123",
"status",
"<span class=\"badge\">Shipped</span>",
).await?;
The publisher uses a regular multiplexed connection. Multiplexed connections are shared across callers, so you do not need to manage a connection pool for publishing. The subscriber connection is separate because Valkey requires a dedicated connection for subscriptions (it cannot execute other commands while subscribed).
The SSE handler
Each SSE handler authenticates the user, checks authorisation for the requested resource, then opens a dedicated Valkey subscriber for that resource’s channel:
use async_stream::try_stream;
use axum::extract::{Path, State};
use axum::response::sse::{Event, KeepAlive, Sse};
use futures_util::stream::Stream;
use futures_util::StreamExt;
use std::convert::Infallible;
pub async fn order_events(
State(state): State<AppState>,
Path(order_id): Path<i64>,
user: AuthenticatedUser,
) -> Result<Sse<impl Stream<Item = Result<Event, Infallible>>>, AppError> {
let has_access = check_order_access(&state.db, user.id, order_id).await?;
if !has_access {
return Err(AppError::Forbidden);
}
let channel = format!("order:{order_id}");
let client = state.valkey.clone();
let stream = try_stream! {
let mut pubsub = client.get_async_pubsub().await.unwrap();
pubsub.subscribe(&channel).await.unwrap();
let mut messages = pubsub.into_on_message();
while let Some(msg) = messages.next().await {
let payload: String = msg.get_payload().unwrap();
if let Ok(event) = serde_json::from_str::<serde_json::Value>(&payload) {
let event_type = event["event_type"].as_str().unwrap_or("update");
let data = event["data"].as_str().unwrap_or("");
yield Event::default()
.event(event_type)
.data(data);
}
}
};
Ok(Sse::new(stream).keep_alive(KeepAlive::default()))
}
The authorisation check happens before the stream is created. If the user does not have access, the handler returns an error and no Valkey connection is opened. Once the SSE client disconnects (browser navigates away, tab closes, element removed from DOM), the stream is dropped, which drops the Valkey connection and automatically unsubscribes.
Subscribing to multiple channels
A page that needs events from several resources subscribes to all of them on a single connection:
pub async fn dashboard_events(
State(state): State<AppState>,
user: AuthenticatedUser,
) -> Result<Sse<impl Stream<Item = Result<Event, Infallible>>>, AppError> {
let channels = get_user_subscriptions(&state.db, user.id).await?;
let client = state.valkey.clone();
let stream = try_stream! {
let mut pubsub = client.get_async_pubsub().await.unwrap();
for channel in &channels {
pubsub.subscribe(channel).await.unwrap();
}
let mut messages = pubsub.into_on_message();
while let Some(msg) = messages.next().await {
let channel_name = msg.get_channel_name().to_string();
let payload: String = msg.get_payload().unwrap();
if let Ok(event) = serde_json::from_str::<serde_json::Value>(&payload) {
let event_type = event["event_type"].as_str().unwrap_or("update");
let data = event["data"].as_str().unwrap_or("");
yield Event::default()
.event(event_type)
.data(data);
}
}
};
Ok(Sse::new(stream).keep_alive(KeepAlive::default()))
}
The get_user_subscriptions function queries your database for the resources the user has access to and returns channel names like ["project:12", "project:45", "user:789:notifications"]. A single Valkey pub/sub connection can subscribe to any number of channels.
Wiring it together
use axum::{routing::get, Router};
#[tokio::main]
async fn main() {
let valkey = redis::Client::open("redis://127.0.0.1:6379").unwrap();
let state = AppState {
db: pool,
valkey,
};
let app = Router::new()
.route("/events/orders/{id}", get(order_events))
.route("/events/dashboard", get(dashboard_events))
.with_state(state);
let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
axum::serve(listener, app).await.unwrap();
}
No global background tasks needed. Each SSE connection manages its own Valkey subscription lifecycle.
Security
SSE connections carry the same security requirements as any other authenticated endpoint, with additional considerations for long-lived connections.
Authentication
SSE uses a regular HTTP GET request. The browser’s EventSource API automatically sends cookies, so cookie-based session authentication works without any extra configuration. This is the recommended approach for HDA applications.
EventSource does not support custom HTTP headers. If your application uses Authorization: Bearer tokens, you cannot pass them through EventSource. Workarounds exist (tokens in query parameters, fetch-based SSE libraries), but they introduce their own risks. Stick with session cookies.
Authorisation at subscription time
Verify access before opening the Valkey subscription. The handler should:
- Authenticate the user from the session cookie.
- Extract the resource identifier from the request (path parameter, query parameter).
- Check that the user has permission to view the resource.
- Only then open the Valkey subscriber and begin streaming.
The order_events handler above demonstrates this pattern. The authorisation check is a standard database query, the same check you would run on a regular page load for that resource.
Per-resource channels provide security by architecture. Each SSE connection only receives messages from channels it explicitly subscribed to, and subscription is gated by a server-side authorisation check. There is no filtering step that could be bypassed or implemented incorrectly.
Do not use a single broadcast channel where all events flow to all connections and rely on server-side filtering. Every message passes through every connection’s filter logic, and any bug in that filter leaks data to unauthorised users.
Long-lived connection re-authorisation
SSE connections can persist for hours. Permissions change: a user’s role is downgraded, a project is archived, a session expires. The SSE connection established before the change continues streaming events unless you actively close it.
Strategies for handling this:
- Periodic session validation. Check the user’s session in the stream loop every N minutes. If the session is expired or revoked, close the stream.
- Revocation events. When a permission change occurs, publish a control event (e.g., to channel user:{id}:control) that the SSE handler listens for and uses to close the connection.
- Short-lived connections. Set sse-close on a timer event and have the browser reconnect periodically. Each reconnection runs the full authorisation check.
The simplest approach for most applications is periodic session validation. Add it to the stream loop:
let stream = try_stream! {
    let mut pubsub = client.get_async_pubsub().await.unwrap();
    pubsub.subscribe(&channel).await.unwrap();
    let mut messages = pubsub.into_on_message();
    let mut last_auth_check = std::time::Instant::now();
    loop {
        // Re-check authorisation every five minutes.
        if last_auth_check.elapsed() > std::time::Duration::from_secs(300) {
            let still_valid = check_order_access(&db, user_id, order_id).await;
            if !still_valid.unwrap_or(false) {
                break;
            }
            last_auth_check = std::time::Instant::now();
        }
        match tokio::time::timeout(
            std::time::Duration::from_secs(30),
            messages.next(),
        )
        .await
        {
            Ok(Some(msg)) => {
                // Forward the message as an SSE event, as in the basic handler.
                let payload: String = msg.get_payload().unwrap();
                if let Ok(event) = serde_json::from_str::<serde_json::Value>(&payload) {
                    yield Event::default()
                        .event(event["event_type"].as_str().unwrap_or("update"))
                        .data(event["data"].as_str().unwrap_or(""));
                }
            }
            Ok(None) => break,  // subscription closed
            Err(_) => continue, // timeout: loop around so the auth check can run
        }
    }
};
The tokio::time::timeout ensures the loop does not block indefinitely waiting for a message, giving the authorisation check a chance to run even when the channel is quiet.
CSRF
An attacker’s page can create an EventSource pointing at your SSE endpoint. The victim’s browser sends cookies automatically, so the attacker’s page receives the victim’s event stream.
Mitigate this with SameSite=Lax or SameSite=Strict on session cookies. SameSite=Lax is the browser default and prevents cookies from being sent on cross-origin sub-resource requests, which includes EventSource connections initiated from a different origin. If your session cookies already use SameSite=Lax (they should), this attack is blocked.
As a defence-in-depth measure, validate the Origin header on SSE endpoints and reject requests from unexpected origins.
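The Origin check reduces to a plain function; a sketch (the allowed list would come from configuration, and wiring the check into an Axum middleware or extractor is omitted here):

```rust
/// Defence-in-depth: reject cross-origin EventSource connections.
fn origin_allowed(origin: Option<&str>, allowed: &[&str]) -> bool {
    match origin {
        // Browsers send Origin on cross-origin requests; treating its
        // absence as same-origin is a policy choice — tighten it if
        // your threat model requires.
        None => true,
        Some(o) => allowed.contains(&o),
    }
}
```

Because SameSite cookies already block the attack, a rejected Origin here usually indicates misconfiguration rather than an active attacker.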
Connection exhaustion
Each SSE connection consumes a TCP connection, a file descriptor, and a Tokio task. An attacker could open thousands of connections to exhaust server resources.
Rate-limit SSE connections per user and per IP at your reverse proxy layer. Caddy and Nginx both support connection limits. Also set a maximum connection duration server-side (close and let the browser reconnect after a reasonable period, e.g., 30 minutes).
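A sketch for Nginx (the directives are standard Nginx modules; the upstream name, zone name, and limits are illustrative and should be tuned to your deployment):

```nginx
# Limit concurrent SSE connections per client IP.
limit_conn_zone $binary_remote_addr zone=sse_per_ip:10m;

location /events/ {
    limit_conn sse_per_ip 5;     # at most 5 concurrent SSE connections per IP
    proxy_pass http://app;       # illustrative upstream name
    proxy_buffering off;         # buffering would delay event delivery
    proxy_read_timeout 30m;      # align with the server-side max duration
}
```

Caddy offers equivalent controls; the important parts are the per-IP connection cap, disabled response buffering, and a read timeout that matches your server-side connection limit.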
Consuming events with htmx
The htmx SSE extension connects to an SSE endpoint and swaps event data into the DOM. The Interactivity with htmx section covers the basic setup. Here is the full pattern.
Connecting and swapping
Place hx-ext="sse" and sse-connect on a container element. Child elements with sse-swap receive the data from matching event names:
div hx-ext="sse" sse-connect="/events/orders/123" {
div sse-swap="status" hx-swap="innerHTML" {
"Loading status..."
}
div sse-swap="activity" hx-swap="innerHTML" {
"No recent activity."
}
}
When the server sends event: status\ndata: <span>Shipped</span>\n\n, htmx takes the data payload and swaps it into the element with sse-swap="status". The event name in the SSE stream must exactly match the sse-swap attribute value (case-sensitive).
Using SSE events as triggers
Instead of swapping SSE data directly, use an event as a trigger for a standard htmx request. This is useful when the SSE event signals “something changed” but the actual content comes from a separate endpoint:
div hx-ext="sse" sse-connect="/events/orders/123" {
div hx-get="/orders/123/details"
hx-trigger="sse:updated"
hx-swap="innerHTML" {
"Loading order details..."
}
}
This pattern keeps the SSE payload minimal (just a signal) and lets the triggered request fetch exactly the content it needs.
Closing the connection
The sse-close attribute closes the EventSource when a specific event arrives:
div hx-ext="sse" sse-connect="/events/tasks/abc/progress" sse-close="complete" {
div sse-swap="progress" {
"Starting..."
}
}
When the server sends event: complete\ndata: done\n\n, the browser closes the SSE connection. Without sse-close, the connection stays open until the element is removed from the DOM or the page navigates away.
Reconnection behaviour
The htmx SSE extension reconnects automatically with exponential backoff. The default configuration starts at 500ms and backs off to a maximum of 60 seconds, with 30% jitter to avoid thundering herd reconnections. It attempts up to 50 reconnections before giving up.
The browser’s native EventSource also reconnects on its own, but the htmx extension’s backoff algorithm is more configurable and better suited to production use. Each reconnection is a new HTTP request, so it runs through authentication and authorisation again.
Patterns
Progress updates
A long-running operation (file upload, report generation, data import) publishes progress events to a task-specific channel. The browser shows a progress bar that updates in real time.
Server-side, publish progress from wherever the work happens:
pub async fn publish_progress(
valkey: &redis::Client,
task_id: &str,
percent: u32,
message: &str,
) -> Result<(), anyhow::Error> {
let html = format!(
r#"<div class="progress-bar" style="width: {percent}%">{percent}%</div>
<p>{message}</p>"#
);
publish_event(valkey, &format!("task:{task_id}"), "progress", &html).await?;
Ok(())
}
When the task finishes, publish a completion event:
publish_event(valkey, &format!("task:{task_id}"), "complete", "<p>Done.</p>").await?;
Client-side, connect to the task’s SSE endpoint:
fn progress_tracker(task_id: &str) -> Markup {
html! {
div hx-ext="sse"
sse-connect=(format!("/events/tasks/{task_id}/progress"))
sse-close="complete" {
div sse-swap="progress" {
div .progress-bar style="width: 0%" { "0%" }
p { "Starting..." }
}
}
}
}
The sse-close="complete" closes the connection when the task finishes. No lingering connections.
Notifications
A notification feed that updates across all open tabs for a user. The SSE connection subscribes to the user’s notification channel:
div #notifications hx-ext="sse" sse-connect="/events/notifications" {
div sse-swap="notification" hx-swap="afterbegin" {
}
}
The hx-swap="afterbegin" prepends each new notification at the top of the container rather than replacing the entire contents. Each notification event delivers a self-contained HTML fragment.
The handler for /events/notifications subscribes to user:{user_id}:notifications, determined from the authenticated session.
Live data feeds
A dashboard element that refreshes when underlying data changes. Rather than streaming the data itself, use SSE as a signal to re-fetch:
div hx-ext="sse" sse-connect="/events/dashboard" {
div hx-get="/dashboard/metrics"
hx-trigger="sse:metrics-updated"
hx-swap="innerHTML" {
(render_metrics(&current_metrics))
}
}
This pattern is simpler than pushing full HTML through the SSE stream, and it works well when the triggered endpoint already exists for initial page load.
Connection management
One Valkey connection per SSE client
Each SSE connection opens its own Valkey pub/sub connection. This is the simplest correct architecture. Valkey handles thousands of concurrent connections without issue. For most applications (hundreds of concurrent SSE clients), no further optimisation is needed.
If you reach tens of thousands of concurrent SSE connections on a single server process and Valkey connection count becomes a concern, introduce a subscription manager that deduplicates: when multiple SSE clients on the same server need the same channel, the manager subscribes once and fans out in-process via a tokio::broadcast channel. This is an optimisation, not a starting point.
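A std-only model of that deduplication idea (the real version would hold tokio::sync::broadcast senders and issue the Valkey SUBSCRIBE on first registration; all names here are illustrative):

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};

/// One upstream subscription per channel name, fanned out to every
/// local SSE client. Modelled with std mpsc for illustration.
struct SubscriptionManager {
    fanout: HashMap<String, Vec<Sender<String>>>,
}

impl SubscriptionManager {
    fn new() -> Self {
        Self { fanout: HashMap::new() }
    }

    /// Register a local client for `channel`. Returns true for the first
    /// registration — the point where the real implementation would
    /// SUBSCRIBE in Valkey.
    fn register(&mut self, channel: &str, tx: Sender<String>) -> bool {
        let entry = self.fanout.entry(channel.to_string()).or_default();
        let first = entry.is_empty();
        entry.push(tx);
        first
    }

    /// Deliver one upstream message to all local clients on that channel.
    fn dispatch(&self, channel: &str, msg: &str) {
        if let Some(txs) = self.fanout.get(channel) {
            for tx in txs {
                let _ = tx.send(msg.to_string());
            }
        }
    }
}
```

The manager collapses N local subscribers into one upstream subscription per channel, which is exactly the Valkey connection count reduction the paragraph above describes.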
Browser connection limits
Browsers limit the number of concurrent HTTP connections per domain. For HTTP/1.1, this limit is typically 6 connections. Each SSE connection consumes one of those slots. If a user opens multiple tabs, they can exhaust the limit quickly.
HTTP/2 multiplexes streams over a single TCP connection, so this limit does not apply. Most modern deployments use HTTP/2. If your reverse proxy (Caddy, Nginx) terminates TLS and serves over HTTP/2, this is a non-issue.
For pages that need events from many resources, prefer a single SSE connection that subscribes to multiple Valkey channels (as shown in the dashboard example) over multiple SSE connections from the same page.
Scaling beyond a single server
Per-resource channels work naturally across multiple application servers. When a handler on server A publishes to order:123, every server with a subscriber on that channel receives the message and delivers it to its local SSE clients. No coordination between servers is required because Valkey handles the distribution.
Integration with Restate
Restate workflows can publish progress events to Valkey as they execute. The SSE infrastructure picks them up and delivers them to the browser. A Restate workflow handler publishes status updates at each stage:
publish_event(
&valkey_client,
&format!("task:{workflow_id}"),
"progress",
"<p>Step 2 of 4: Processing data...</p>",
).await?;
The browser connects to /events/tasks/{workflow_id}/progress with sse-close="complete" and sees each step’s status in real time. The Background Jobs and Durable Execution with Restate section covers Restate in detail.
When SSE is not enough
SSE is unidirectional. If your application requires the client to send messages back to the server over the same connection (chat with typing indicators, collaborative cursors, multiplayer game state), you need WebSockets.
Axum supports WebSocket upgrades directly. The tokio-tungstenite crate provides the underlying WebSocket implementation, and Axum’s axum::extract::ws module wraps it with an ergonomic API. For most HDA applications, SSE covers the real-time requirements. Reach for WebSockets only when you have a genuinely bidirectional communication need.
Gotchas
Valkey pub/sub is fire-and-forget. Messages are not persisted. If a subscriber is disconnected when a message is published, that message is lost. If you need guaranteed delivery, use Valkey Streams instead of pub/sub, or design your application so that a missed SSE event triggers a full refresh on reconnection.
The subscriber connection is dedicated. A Valkey connection in subscribe mode cannot execute other commands (GET, SET, PUBLISH). Each SSE client uses its own dedicated subscriber connection. Publishing happens on a separate multiplexed connection.
Avoid pattern subscriptions at scale. PSUBSCRIBE (e.g., user:42:*) is convenient but has a global performance cost. Every PUBLISH to any channel pays O(M) where M is the total number of active pattern subscriptions across all clients. With hundreds of concurrent SSE connections each using pattern subscriptions, every publish slows down. Prefer explicit SUBSCRIBE to the specific channels each client needs.
Proxy timeouts can close idle connections. Nginx defaults to a 60-second proxy read timeout. Caddy and Cloudflare have their own defaults. KeepAlive sends heartbeat comments to prevent this, but verify your proxy configuration allows long-lived connections. For Nginx, set proxy_read_timeout to a high value (3600s or more) on SSE endpoints.
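For Nginx, a hypothetical location block for an SSE endpoint might look like this (the path and upstream name are placeholders):

```nginx
location /events/ {
    proxy_pass http://app_backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_buffering off;         # deliver events immediately, do not buffer
    proxy_read_timeout 3600s;    # allow long-lived SSE connections
}
```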
SSE connections count against server resources. Each connected client holds an open TCP connection, a Valkey connection, and a Tokio task. For hundreds of concurrent connections this is negligible. At tens of thousands, monitor memory and file descriptor usage on both your application server and Valkey.
Event names cannot contain newlines. The Event::event() method panics if the name contains \n or \r. Event names should be simple identifiers like status, progress, or updated.
HTTP Client and External APIs
Most web applications need to talk to something beyond their own database: a payment processor, a weather service, an email API, a third-party webhook. In Rust, reqwest is the standard HTTP client for these outgoing requests. It provides an async API built on hyper and tokio, with built-in JSON support, TLS, connection pooling, and configurable timeouts.
This section covers building and configuring a reqwest client, designing typed Rust interfaces for external APIs, handling errors and retries, and testing external integrations with mock servers.
Dependencies
[dependencies]
reqwest = { version = "0.13", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }
The json feature on reqwest enables the .json() request builder method and the .json::<T>() response deserialiser. Both rely on serde, which you already have in the stack for form handling and database mapping.
Building a client
reqwest::Client manages a connection pool internally. Create one at application startup and share it through Axum state. Do not create a new client per request, as that discards the connection pool and TLS session cache.
use reqwest::Client;
use std::time::Duration;
let client = Client::builder()
.timeout(Duration::from_secs(30))
.connect_timeout(Duration::from_secs(5))
.pool_max_idle_per_host(10)
.build()
.expect("failed to build HTTP client");
timeout sets the total time allowed for a request (connect + send + receive). connect_timeout sets the limit for TCP connection establishment alone. Both prevent your application from hanging indefinitely when an external service is unresponsive.
Add the client to your application state:
#[derive(Clone)]
pub struct AppState {
pub db: sqlx::PgPool,
pub http: reqwest::Client,
}
Making requests
GET with JSON response
use serde::Deserialize;
#[derive(Debug, Deserialize)]
struct WeatherResponse {
temperature: f64,
conditions: String,
wind_speed: f64,
}
async fn get_weather(
client: &reqwest::Client,
city: &str,
) -> Result<WeatherResponse, reqwest::Error> {
let response = client
.get("https://api.weather.example.com/v1/current")
.query(&[("city", city)])
.send()
.await?
.error_for_status()?
.json::<WeatherResponse>()
.await?;
Ok(response)
}
.query() appends URL query parameters. .error_for_status() converts 4xx and 5xx responses into errors before attempting to read the body. Without it, a 404 response would try to deserialise the error body as WeatherResponse and fail with a confusing deserialisation error instead of a clear status code error.
POST with JSON body
use serde::{Deserialize, Serialize};
#[derive(Debug, Serialize)]
struct CreateOrderRequest {
product_id: String,
quantity: u32,
customer_email: String,
}
#[derive(Debug, Deserialize)]
struct CreateOrderResponse {
order_id: String,
status: String,
total_cents: u64,
}
async fn create_order(
client: &reqwest::Client,
order: &CreateOrderRequest,
) -> Result<CreateOrderResponse, reqwest::Error> {
client
.post("https://api.orders.example.com/v1/orders")
.json(order)
.send()
.await?
.error_for_status()?
.json::<CreateOrderResponse>()
.await
}
.json(order) serialises the struct to JSON and sets Content-Type: application/json automatically.
Other response types
Not every API returns JSON. Use .text() for plain text, .bytes() for binary data:
let body = client.get(url).send().await?.text().await?;
let image = client.get(url).send().await?.bytes().await?;
Designing typed API clients
Wrap external API interactions in a dedicated struct. This gives you a single place for base URL configuration, authentication, and error mapping.
use reqwest::Client;
use serde::{Deserialize, Serialize};
pub struct WeatherClient {
client: Client,
base_url: String,
api_key: String,
}
#[derive(Debug, Deserialize)]
pub struct CurrentWeather {
pub temperature: f64,
pub conditions: String,
pub humidity: u32,
}
#[derive(Debug, Deserialize)]
pub struct Forecast {
pub days: Vec<ForecastDay>,
}
#[derive(Debug, Deserialize)]
pub struct ForecastDay {
pub date: String,
pub high: f64,
pub low: f64,
pub conditions: String,
}
impl WeatherClient {
pub fn new(client: Client, base_url: String, api_key: String) -> Self {
Self {
client,
base_url,
api_key,
}
}
pub async fn current(&self, city: &str) -> Result<CurrentWeather, WeatherError> {
let response = self
.client
.get(format!("{}/v1/current", self.base_url))
.query(&[("city", city)])
.bearer_auth(&self.api_key)
.send()
.await
.map_err(WeatherError::Request)?;
if !response.status().is_success() {
return Err(WeatherError::status(response).await);
}
response.json().await.map_err(WeatherError::Request)
}
pub async fn forecast(
&self,
city: &str,
days: u32,
) -> Result<Forecast, WeatherError> {
let response = self
.client
.get(format!("{}/v1/forecast", self.base_url))
.query(&[("city", city), ("days", &days.to_string())])
.bearer_auth(&self.api_key)
.send()
.await
.map_err(WeatherError::Request)?;
if !response.status().is_success() {
return Err(WeatherError::status(response).await);
}
response.json().await.map_err(WeatherError::Request)
}
}
The base_url is configurable so tests can point it at a mock server and production can point it at the real API. The Client is injected rather than created internally, so the application controls timeouts and connection pooling centrally.
Error type for external APIs
Define a dedicated error type that distinguishes between network failures, unexpected status codes, and API-specific error responses:
use thiserror::Error;
#[derive(Debug, Error)]
pub enum WeatherError {
#[error("request failed: {0}")]
Request(#[from] reqwest::Error),
#[error("API error {status}: {message}")]
Api {
status: u16,
message: String,
},
}
impl WeatherError {
async fn status(response: reqwest::Response) -> Self {
let status = response.status().as_u16();
let message = response
.text()
.await
.unwrap_or_else(|_| "unknown error".to_string());
Self::Api { status, message }
}
}
This lets callers match on the error to decide what to do. A 404 from the weather API might mean the city was not found (return a user-facing message). A 500 might mean the service is down (retry or degrade gracefully). A network error means the service was unreachable.
Wiring the client into application state
use std::time::Duration;
let http_client = reqwest::Client::builder()
.timeout(Duration::from_secs(10))
.connect_timeout(Duration::from_secs(5))
.build()
.expect("failed to build HTTP client");
let weather = WeatherClient::new(
http_client.clone(),
std::env::var("WEATHER_API_URL").expect("WEATHER_API_URL required"),
std::env::var("WEATHER_API_KEY").expect("WEATHER_API_KEY required"),
);
let state = AppState {
db: pool,
http: http_client,
weather,
};
This assumes AppState has gained a weather: WeatherClient field alongside http. Handlers access the client through state:
async fn weather_page(
State(state): State<AppState>,
Path(city): Path<String>,
) -> Result<impl IntoResponse, AppError> {
let weather = state.weather.current(&city).await?;
Ok(Html(render_weather(&weather)))
}
JSON serialisation patterns
Renaming fields
External APIs rarely use Rust’s snake_case convention. Use #[serde(rename)] or #[serde(rename_all)] to map between naming styles:
#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct PaymentResponse {
payment_id: String,
total_amount: u64,
currency_code: String,
}
For APIs that use a mix of conventions or have individual oddities:
#[derive(Debug, Deserialize)]
struct ApiResponse {
#[serde(rename = "ID")]
id: String,
#[serde(rename = "created_at")]
created: String,
}
Optional and default fields
External APIs evolve. Fields get added, deprecated, or become nullable. Use Option<T> for fields that might be absent, and #[serde(default)] for fields with sensible defaults:
#[derive(Debug, Deserialize)]
struct UserProfile {
pub name: String,
pub email: String,
#[serde(default)]
pub verified: bool,
pub avatar_url: Option<String>,
}
Option<String> handles both a missing field and an explicit null value. #[serde(default)] fills in false if the field is absent.
Handling API envelope patterns
Many APIs wrap their data in a common envelope:
{
"status": "ok",
"data": { "temperature": 22.5, "conditions": "sunny" },
"metadata": { "request_id": "abc123" }
}
Define a generic wrapper:
#[derive(Debug, Deserialize)]
struct ApiEnvelope<T> {
status: String,
data: T,
}
Then deserialise into the envelope and extract the inner value:
let envelope = response
.json::<ApiEnvelope<CurrentWeather>>()
.await
.map_err(WeatherError::Request)?;
Ok(envelope.data)
Flattening nested structures
#[serde(flatten)] merges fields from a nested struct into the parent, useful when an API returns metadata alongside entity fields:
#[derive(Debug, Deserialize)]
struct ApiMetadata {
request_id: String,
timestamp: String,
}
#[derive(Debug, Deserialize)]
struct OrderWithMetadata {
#[serde(flatten)]
order: Order,
#[serde(flatten)]
metadata: ApiMetadata,
}
Authentication with external services
Bearer tokens
The most common pattern. A static API key or an OAuth2 access token passed in the Authorization header:
client
.get(url)
.bearer_auth(&api_key)
.send()
.await?
reqwest’s .bearer_auth() sets the Authorization: Bearer <token> header.
Some APIs use a custom header instead of the standard Authorization header:
client
.get(url)
.header("X-API-Key", &api_key)
.send()
.await?
API keys in query parameters
Less secure (keys appear in server logs and URLs) but some APIs require it:
client
.get(url)
.query(&[("api_key", &api_key)])
.send()
.await?
If every request to an API needs the same authentication header, set it as a default on the client:
use reqwest::header::{HeaderMap, HeaderValue};
let mut headers = HeaderMap::new();
headers.insert(
"X-API-Key",
HeaderValue::from_str(&api_key).expect("invalid API key"),
);
let client = Client::builder()
.default_headers(headers)
.build()?;
Default headers are sent on every request made by this client. This avoids repeating authentication on each call and ensures you never accidentally forget it.
OAuth2 token refresh
For APIs that issue short-lived access tokens, store the token and its expiry in your API client and refresh when needed:
use std::sync::Arc;
use tokio::sync::RwLock;
struct TokenState {
access_token: String,
expires_at: std::time::Instant,
}
pub struct OAuthClient {
client: Client,
base_url: String,
client_id: String,
client_secret: String,
token: Arc<RwLock<Option<TokenState>>>,
}
impl OAuthClient {
async fn get_token(&self) -> Result<String, reqwest::Error> {
{
let token = self.token.read().await;
if let Some(state) = token.as_ref() {
if state.expires_at > std::time::Instant::now() {
return Ok(state.access_token.clone());
}
}
}
let response: TokenResponse = self
.client
.post(format!("{}/oauth/token", self.base_url))
.form(&[
("grant_type", "client_credentials"),
("client_id", &self.client_id),
("client_secret", &self.client_secret),
])
            .send()
            .await?
            .error_for_status()?
            .json()
            .await?;
let token_state = TokenState {
access_token: response.access_token.clone(),
expires_at: std::time::Instant::now()
+ std::time::Duration::from_secs(response.expires_in.saturating_sub(60)),
};
*self.token.write().await = Some(token_state);
Ok(response.access_token)
}
pub async fn request_with_auth(
&self,
url: &str,
) -> Result<reqwest::Response, reqwest::Error> {
let token = self.get_token().await?;
self.client.get(url).bearer_auth(&token).send().await
}
}
#[derive(Deserialize)]
struct TokenResponse {
access_token: String,
expires_in: u64,
}
The saturating_sub(60) refreshes the token 60 seconds before it actually expires, avoiding requests that race against expiry.
Error handling for external calls
External HTTP calls fail in ways that database calls and local operations do not. The network is unreliable, third-party services go down, and response formats change without warning.
Timeouts
Always set timeouts. A missing timeout means a single unresponsive external service can exhaust your server’s thread pool:
let client = Client::builder()
.timeout(Duration::from_secs(10))
.connect_timeout(Duration::from_secs(5))
.build()?;
For requests where you know the response should be fast, override per-request:
client.get(url)
.timeout(Duration::from_secs(3))
.send()
.await?
Retries with reqwest-middleware
reqwest-middleware wraps reqwest::Client with a middleware chain. reqwest-retry adds automatic retries with exponential backoff for transient failures.
[dependencies]
reqwest-middleware = "0.4"
reqwest-retry = "0.7"
use reqwest_middleware::ClientBuilder;
use reqwest_retry::{
RetryTransientMiddleware,
policies::ExponentialBackoff,
};
use std::time::Duration;
let retry_policy = ExponentialBackoff::builder()
.retry_bounds(
Duration::from_millis(500),
Duration::from_secs(10),
)
.build_with_max_retries(3);
let client = ClientBuilder::new(
reqwest::Client::builder()
.timeout(Duration::from_secs(10))
.connect_timeout(Duration::from_secs(5))
.build()
.expect("failed to build HTTP client"),
)
.with(RetryTransientMiddleware::new_with_policy(retry_policy))
.build();
The retry middleware automatically retries on:
- Connection timeouts and resets
- HTTP 500, 502, 503, 504 (server errors)
- HTTP 408 (request timeout)
- HTTP 429 (too many requests)
It does not retry 4xx client errors (except 408 and 429), because those indicate a problem with the request itself.
The middleware client (ClientWithMiddleware) has the same API as reqwest::Client for making requests. If your API client struct wraps the client, swap reqwest::Client for reqwest_middleware::ClientWithMiddleware:
use reqwest_middleware::ClientWithMiddleware;
pub struct WeatherClient {
client: ClientWithMiddleware,
base_url: String,
api_key: String,
}
Circuit breaking
A circuit breaker prevents your application from hammering a service that is already failing. After a threshold of consecutive failures, it “opens” the circuit and fails requests immediately for a cooldown period, then allows a probe request through to check if the service has recovered.
There is no dominant circuit breaker crate in the Rust ecosystem. For most applications, the combination of timeouts + retries with backoff is sufficient. The backoff itself acts as a partial circuit breaker: each retry waits longer, reducing pressure on the failing service.
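If you do want an explicit in-process breaker, the mechanism is small enough to sketch. This is a hypothetical, deliberately simplified version: single-threaded, and its only half-open behaviour is letting probe requests through once the cooldown has elapsed.

```rust
use std::time::{Duration, Instant};

// Hypothetical minimal circuit breaker: opens after `threshold` consecutive
// failures, fails fast during `cooldown`, then allows probes through.
struct CircuitBreaker {
    threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { threshold, cooldown, consecutive_failures: 0, opened_at: None }
    }

    // Should this call be attempted at all?
    fn allow(&self) -> bool {
        match self.opened_at {
            None => true,
            // Once the cooldown has elapsed, let a probe request through.
            Some(opened) => opened.elapsed() >= self.cooldown,
        }
    }

    // A success (e.g. a probe that got through) closes the circuit.
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }

    fn record_failure(&mut self) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.threshold {
            self.opened_at = Some(Instant::now());
        }
    }
}
```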
For operations where reliability matters beyond what retries provide (payment processing, order fulfilment, webhook delivery), use Restate for durable execution. Restate persists the call, retries across process restarts, and provides exactly-once semantics. This is a fundamentally stronger guarantee than in-process retries, which are lost if the application crashes.
Mapping external errors to AppError
Bridge your API client errors to the application’s error type:
impl From<WeatherError> for AppError {
fn from(err: WeatherError) -> Self {
match err {
WeatherError::Api { status, message } if status == 404 => {
AppError::NotFound(format!("weather data: {message}"))
}
WeatherError::Api { status, message } => {
tracing::error!(status, message, "external API error");
AppError::BadGateway("external service returned an error".to_string())
}
WeatherError::Request(err) => {
tracing::error!(error = ?err, "external request failed");
AppError::BadGateway("external service unavailable".to_string())
}
}
}
}
502 Bad Gateway is the appropriate HTTP status when your server is acting as a gateway to an upstream service and that service fails. Add a BadGateway variant to your AppError if you don’t have one.
Testing external API integrations
External HTTP calls are one of the most important things to test and one of the easiest to get wrong. wiremock starts a real HTTP server in your test process and lets you define expected requests and canned responses.
[dev-dependencies]
wiremock = "0.6"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
A complete test
This test exercises the WeatherClient against a mock server:
use wiremock::{MockServer, Mock, ResponseTemplate};
use wiremock::matchers::{method, path, query_param, header};
#[tokio::test]
async fn test_current_weather() {
let mock_server = MockServer::start().await;
Mock::given(method("GET"))
.and(path("/v1/current"))
.and(query_param("city", "london"))
.and(header("authorization", "Bearer test-key"))
.respond_with(
ResponseTemplate::new(200).set_body_json(serde_json::json!({
"temperature": 18.5,
"conditions": "cloudy",
"humidity": 72
})),
)
.expect(1)
.mount(&mock_server)
.await;
let client = WeatherClient::new(
reqwest::Client::new(),
mock_server.uri(),
"test-key".to_string(),
);
let weather = client.current("london").await.unwrap();
assert_eq!(weather.temperature, 18.5);
assert_eq!(weather.conditions, "cloudy");
assert_eq!(weather.humidity, 72);
}
The mock server binds to a random available port on localhost. mock_server.uri() returns the base URL (e.g., http://127.0.0.1:54321). Because WeatherClient accepts a configurable base_url, no production code needs to change.
.expect(1) asserts that the mock was called exactly once. If the test ends without the expected call count, wiremock panics with a clear message showing which mocks were not satisfied.
Testing error responses
#[tokio::test]
async fn test_city_not_found() {
let mock_server = MockServer::start().await;
Mock::given(method("GET"))
.and(path("/v1/current"))
.respond_with(
ResponseTemplate::new(404)
.set_body_string("city not found"),
)
.mount(&mock_server)
.await;
let client = WeatherClient::new(
reqwest::Client::new(),
mock_server.uri(),
"test-key".to_string(),
);
let err = client.current("atlantis").await.unwrap_err();
match err {
WeatherError::Api { status, .. } => assert_eq!(status, 404),
_ => panic!("expected Api error, got {err:?}"),
}
}
Testing timeout behaviour
#[tokio::test]
async fn test_timeout() {
let mock_server = MockServer::start().await;
Mock::given(method("GET"))
.and(path("/v1/current"))
.respond_with(
ResponseTemplate::new(200)
.set_body_json(serde_json::json!({"temperature": 20.0}))
.set_delay(std::time::Duration::from_secs(10)),
)
.mount(&mock_server)
.await;
let client = WeatherClient::new(
reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(1))
.build()
.unwrap(),
mock_server.uri(),
"test-key".to_string(),
);
let err = client.current("london").await.unwrap_err();
assert!(matches!(err, WeatherError::Request(_)));
}
wiremock’s .set_delay() on the response template simulates a slow upstream service. The test verifies that the client’s timeout configuration works as expected.
Gotchas
Reuse the Client. Each reqwest::Client holds a connection pool. Creating a new client per request means no connection reuse, no TLS session caching, and a DNS lookup on every call. Build one client at startup and share it.
Always set timeouts. reqwest has no default timeout. Without one, a request to an unresponsive host blocks the Tokio task indefinitely. Set both timeout (total) and connect_timeout (connection establishment) on the client builder.
Call .error_for_status() or check status manually. A 404 or 500 response from an external API is not a reqwest error by default. The request succeeded at the HTTP level. If you skip the status check, you’ll get a confusing deserialisation error when serde tries to parse an error body as your response type.
Watch for #[serde(deny_unknown_fields)]. It’s tempting to add this to be strict about API responses, but external APIs add new fields all the time. Unknown fields should be silently ignored (serde’s default) to avoid breaking your application when a third-party adds a field to their response.
Token and credential storage. API keys, client secrets, and OAuth2 credentials belong in environment variables or a secrets manager, not in code. The Configuration and Secrets section covers this in detail.
Log external failures, but not credentials. When logging failed external requests for debugging, ensure you are not writing API keys, bearer tokens, or request bodies containing sensitive data to your logs. Log the URL, status code, and error message. Skip headers and bodies unless you have confirmed they contain no secrets.
Background Jobs and Durable Execution with Restate
Web applications run work outside the request/response cycle constantly: processing a payment after checkout, sending confirmation emails, generating reports, importing data. When any of these steps fails partway through, you need the operation to resume from where it left off, not start over or silently disappear.
Restate is a durable execution engine. It records every step of a handler in a persistent journal. If a step fails or the process crashes, Restate replays from the journal, skipping completed steps and retrying from the point of failure. You get automatic retries, exactly-once side effects, and workflows that survive process restarts, without writing retry logic or state machines yourself.
This section covers integrating Restate with an Axum application as separate workspace binaries, defining durable workflows, triggering them from HTTP handlers, and reporting progress back to the browser via Valkey pub/sub and SSE.
When to use Restate
The design principle for this stack is durable by default: any work that must not be silently lost goes through Restate.
Use Restate for:
- Multi-step operations involving external services (payment + inventory + email)
- Any side effect that must happen exactly once (sending a notification, charging a card, calling a third-party API)
- Work that outlives an HTTP request timeout (report generation, data imports, file processing)
- Operations that need automatic retries with persistence across process restarts
- Coordinating work across multiple services with transactional guarantees
The bar for skipping Restate is high. A tokio::spawn that fires off a quick in-memory computation with no external effects is fine. But the moment the spawned task calls an external API, sends an email, or does anything the user expects to complete reliably, route it through Restate. The cost is one HTTP hop to the Restate server. What you get back is durability, observability, and automatic retry logic without writing any of it yourself.
How Restate works
Restate runs as a separate server process that sits between your application and your service handlers:
Axum app ──HTTP──▶ Restate Server (port 8080) ──HTTP──▶ Worker (port 9080)
                           │
                           ▼
                    Journal + State
                 (durable log + RocksDB)
Your Axum application sends requests to the Restate server’s ingress on port 8080. Restate forwards them to your worker process, which runs the actual handler logic using the Restate SDK on port 9080. Every operation the handler performs is recorded in Restate’s journal. If the handler crashes, Restate replays the journal against a new handler invocation: completed steps return their stored results without re-executing, and execution resumes from the failed step.
The Restate server is a single binary with no external dependencies. It stores its journal and state in an embedded RocksDB instance backed by a durable replicated log. The admin API on port 9070 handles service registration and provides a built-in UI for inspecting invocations.
Service types
The Restate Rust SDK provides three types of handlers, each defined as a trait with a proc macro.
Services (#[restate_sdk::service]) are stateless handlers. Multiple invocations run concurrently. Use these for independent operations like sending emails or calling external APIs.
Virtual objects (#[restate_sdk::object]) are stateful entities identified by a string key. Each object has isolated key/value state stored durably by Restate. Only one exclusive handler runs at a time per key, which guarantees state consistency without locks. Handlers marked #[shared] run concurrently with read-only state access.
Workflows (#[restate_sdk::workflow]) are a specialised form of virtual object. The run handler executes exactly once per workflow ID. Additional #[shared] handlers can query the workflow’s state or signal it through durable promises. Workflow state is retained for 24 hours after completion by default.
| Type | State | Concurrency per key | Use case |
| Service | None | Concurrent | Stateless operations: send email, call API, transform data |
| Virtual Object | Per-key K/V | One exclusive handler at a time | Mutable state: counter, order tracker, rate limiter |
| Workflow | Per-workflow K/V | run exclusive; #[shared] concurrent | Multi-step processes: order fulfilment, onboarding, data pipeline |
Durable execution primitives
Journaled side effects
ctx.run() executes a closure and persists the result in Restate’s journal. On replay, the stored result is returned without re-executing the closure. Wrap every non-deterministic operation (HTTP calls, database writes, random number generation) in ctx.run().
let payment_id: String = ctx
.run(|| charge_payment(order.clone()))
.name("charge_payment")
.await?;
The .name() call labels the operation in the Restate UI for observability. It is optional but worth adding.
If the closure fails, Restate retries it with exponential backoff. The default retry policy retries indefinitely. Override it for operations that should fail fast:
use restate_sdk::prelude::*;
use std::time::Duration;
let result = ctx
.run(|| call_flaky_service())
.retry_policy(
RunRetryPolicy::default()
.initial_delay(Duration::from_millis(100))
.exponentiation_factor(2.0)
.max_attempts(5),
)
.name("flaky_service")
.await?;
Two constraints on ctx.run() closures: you cannot use the Restate context (ctx) inside the closure (no state access, no nested run calls, no service calls), and the run call must be immediately awaited before making other context calls.
Terminal errors
Return a TerminalError from a ctx.run() closure to signal a permanent failure that should not be retried. A declined credit card or an invalid request are terminal; a network timeout is not.
use restate_sdk::prelude::*;
async fn charge_payment(order: Order) -> Result<String, HandlerError> {
    // payment_client stands in for your payment API client.
    let resp = payment_client.charge(&order).await;
match resp {
Ok(charge) => Ok(charge.id),
Err(e) if e.is_retryable() => Err(e.into()),
Err(e) => Err(TerminalError::new(format!("Payment permanently failed: {e}")).into()),
}
}
Durable state
Virtual objects and workflows have access to key/value state that is persisted by Restate. State changes are journaled alongside execution and survive crashes.
ctx.set("status", "processing".to_string());
let status: Option<String> = ctx.get("status").await?;
ctx.clear("status");
Durable timers
ctx.sleep() suspends the handler for a duration. Restate persists the timer. If the process crashes during the sleep, Restate resumes the handler on another invocation when the timer fires.
ctx.sleep(Duration::from_secs(60)).await?;
Workspace layout
The Axum web application and the Restate worker run as separate processes. Shared domain types live in a common crate. This follows the project’s workspace-with-multiple-crates pattern.
your-project/
├── Cargo.toml (workspace root)
├── crates/
│ ├── web/ (Axum web application)
│ │ └── Cargo.toml
│ ├── worker/ (Restate service worker)
│ │ └── Cargo.toml
│ └── shared/ (shared domain types)
│ └── Cargo.toml
Worker dependencies
[dependencies]
restate-sdk = "0.9"
tokio = { version = "1", features = ["full"] }
tracing = "0.1"
tracing-subscriber = "0.3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
reqwest = { version = "0.13", features = ["json"] }
redis = { version = "1.0", features = ["tokio-comp"] }
anyhow = "1"
shared = { path = "../shared" }
reqwest is for registering the worker with the Restate server on startup and for side effects that call external APIs. The redis crate is for publishing progress events to Valkey; the SSE infrastructure in the web application picks them up.
Shared types
Define domain types in the shared crate so both the web application and the worker use identical structs:
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Order {
pub id: String,
pub customer_email: String,
pub items: Vec<OrderItem>,
pub total_cents: u64,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OrderItem {
pub product_id: String,
pub quantity: u32,
pub price_cents: u64,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FulfilmentResult {
pub order_id: String,
pub payment_id: String,
pub status: String,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Progress {
pub percent: u32,
pub message: String,
}
Defining the workflow
An order fulfilment workflow that processes payment, reserves inventory, sends a confirmation email, and reports progress at each stage. The workflow uses ctx.run() for each side effect, ctx.set() to track progress in durable state, and publishes to Valkey for real-time SSE updates.
use restate_sdk::prelude::*;
use shared::{FulfilmentResult, Order, Progress};
#[restate_sdk::workflow]
pub trait OrderFulfilment {
async fn run(order: Json<Order>) -> Result<Json<FulfilmentResult>, HandlerError>;
#[shared]
async fn get_progress() -> Result<Json<Progress>, HandlerError>;
}
pub struct OrderFulfilmentImpl {
pub valkey: redis::Client,
}
impl OrderFulfilment for OrderFulfilmentImpl {
async fn run(
&self,
ctx: WorkflowContext<'_>,
Json(order): Json<Order>,
) -> Result<Json<FulfilmentResult>, HandlerError> {
let order_id = ctx.key().to_string();
report_progress(&ctx, &self.valkey, &order_id, 0, "Processing payment...").await;
let payment_id: String = ctx
.run(|| charge_payment(order.clone()))
.name("charge_payment")
.await?;
report_progress(&ctx, &self.valkey, &order_id, 33, "Reserving inventory...").await;
ctx.run(|| reserve_inventory(order.clone()))
.name("reserve_inventory")
.await?;
report_progress(&ctx, &self.valkey, &order_id, 66, "Sending confirmation...").await;
ctx.run(|| send_confirmation(order.clone(), payment_id.clone()))
.name("send_confirmation")
.await?;
report_progress(&ctx, &self.valkey, &order_id, 100, "Order fulfilled.").await;
Ok(Json(FulfilmentResult {
order_id,
payment_id,
status: "fulfilled".to_string(),
}))
}
async fn get_progress(
&self,
ctx: SharedWorkflowContext<'_>,
) -> Result<Json<Progress>, HandlerError> {
let progress = ctx
.get::<Progress>("progress")
.await?
.unwrap_or(Progress {
percent: 0,
message: "Waiting to start...".to_string(),
});
Ok(Json(progress))
}
}
Each ctx.run() call wraps a side effect function that takes ownership of the data it needs. This is the standard pattern: clone the data before the closure so the closure owns its inputs. The side effect functions are regular async functions that call external services:
async fn charge_payment(order: Order) -> Result<String, anyhow::Error> {
let client = reqwest::Client::new();
let resp: serde_json::Value = client
.post("https://payments.example.com/v1/charges")
.json(&serde_json::json!({
"amount": order.total_cents,
"currency": "gbp",
}))
.send()
.await?
.error_for_status()?
.json()
.await?;
    let id = resp["id"]
        .as_str()
        .ok_or_else(|| anyhow::anyhow!("payment response missing id"))?;
    Ok(id.to_string())
}
async fn reserve_inventory(order: Order) -> Result<(), anyhow::Error> {
    // Stub: call your inventory service here.
    let _ = order;
    Ok(())
}
async fn send_confirmation(order: Order, payment_id: String) -> Result<(), anyhow::Error> {
    // Stub: send the confirmation email (in development, MailCrab captures it).
    let _ = (order, payment_id);
    Ok(())
}
Progress reporting
The report_progress function does two things: it updates durable state (queryable via the get_progress handler) and publishes an event to Valkey for real-time SSE delivery. The durable state is the authoritative source; the Valkey publish is best-effort for pushing updates to the browser.
async fn report_progress(
ctx: &WorkflowContext<'_>,
valkey: &redis::Client,
order_id: &str,
percent: u32,
message: &str,
) {
ctx.set(
"progress",
Progress {
percent,
message: message.to_string(),
},
);
let valkey = valkey.clone();
let channel = format!("order:{order_id}");
let event_type = if percent >= 100 { "complete" } else { "progress" };
let html = format!(
r#"<div class="progress-bar" style="width: {percent}%">{percent}%</div>
<p>{message}</p>"#
);
let payload = serde_json::json!({
"event_type": event_type,
"data": html,
})
.to_string();
let _ = ctx
.run(|| async move {
let mut conn = valkey.get_multiplexed_async_connection().await?;
redis::cmd("PUBLISH")
.arg(&channel)
.arg(&payload)
.query_async::<()>(&mut conn)
.await?;
Ok::<(), redis::RedisError>(())
})
.retry_policy(RunRetryPolicy::default().max_attempts(2))
.await;
}
The retry policy limits Valkey publish retries to 2 attempts. Progress events are informational; if Valkey is temporarily unreachable, the workflow should continue. The let _ = discards the result so a failed publish never fails the workflow.
The Valkey event payload matches the format established in the Server-Sent Events section: a JSON object with event_type and data fields. The SSE handler parses this format and delivers it to the browser. When percent >= 100, the event type switches to "complete", which triggers sse-close="complete" in the browser and closes the SSE connection.
Running the worker
The worker binary starts the Restate SDK HTTP server and registers itself with the Restate server on startup.
use restate_sdk::prelude::*;
use std::time::Duration;
mod fulfilment;
use fulfilment::OrderFulfilmentImpl;
#[tokio::main]
async fn main() {
tracing_subscriber::fmt::init();
let valkey_url =
std::env::var("VALKEY_URL").unwrap_or_else(|_| "redis://127.0.0.1:6379".to_string());
let valkey = redis::Client::open(valkey_url).expect("invalid VALKEY_URL");
let worker_addr = std::env::var("WORKER_ADDR")
.unwrap_or_else(|_| "0.0.0.0:9080".to_string());
let restate_admin_url = std::env::var("RESTATE_ADMIN_URL")
.unwrap_or_else(|_| "http://127.0.0.1:9070".to_string());
let worker_url = std::env::var("WORKER_URL")
.unwrap_or_else(|_| "http://127.0.0.1:9080".to_string());
let endpoint = Endpoint::builder()
.bind(OrderFulfilmentImpl { valkey }.serve())
.build();
let addr = worker_addr.parse().expect("invalid WORKER_ADDR");
tokio::spawn(async move {
HttpServer::new(endpoint).listen_and_serve(addr).await;
});
// Give the SDK server a moment to start listening before registering.
tokio::time::sleep(Duration::from_millis(500)).await;
register_deployment(&restate_admin_url, &worker_url).await;
tracing::info!("Worker running on {worker_addr}");
tokio::signal::ctrl_c().await.unwrap();
}
async fn register_deployment(admin_url: &str, worker_url: &str) {
let client = reqwest::Client::new();
match client
.post(format!("{admin_url}/deployments"))
.json(&serde_json::json!({
"uri": worker_url,
"force": true,
}))
.send()
.await
{
Ok(resp) if resp.status().is_success() => {
tracing::info!("Registered deployment at {worker_url}");
}
Ok(resp) => {
let status = resp.status();
let body = resp.text().await.unwrap_or_default();
tracing::warn!("Restate registration returned {status}: {body}");
}
Err(e) => {
tracing::warn!("Failed to register with Restate: {e}");
}
}
}
The "force": true field in the registration body tells Restate to update the deployment if it already exists. This handles the common development case where you restart the worker after code changes.
Registration calls the Restate admin API on port 9070. Restate performs service discovery automatically: it queries your worker’s endpoint, finds all bound services and their handlers, and registers them. After registration, the services are callable through the Restate ingress on port 8080.
Triggering workflows from Axum handlers
The Axum web application triggers Restate workflows by sending HTTP requests to the Restate ingress. This is a plain reqwest call; no Restate SDK is needed in the web application.
The Restate team is developing a standalone typed client that will be generated from the same trait declarations that define the service. This will replace the raw HTTP calls shown below with compile-time checked method calls. Until then, the ingress HTTP API is the integration point.
URL patterns
| Service type | URL pattern |
| Service | POST /ServiceName/handlerName |
| Virtual Object | POST /ObjectName/{key}/handlerName |
| Workflow (run) | POST /WorkflowName/{workflowId}/run |
| Workflow (query/signal) | POST /WorkflowName/{workflowId}/handlerName |
Append /send to any URL for fire-and-forget invocation. Restate accepts the request immediately and returns an invocation ID. The handler runs in the background.
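The URL patterns above are easy to get subtly wrong in string formatting, so it can help to centralise them. A minimal sketch of a helper for the workflow case; the function name and signature are illustrative, not part of any SDK:

```rust
/// Build a Restate ingress URL for invoking a workflow handler.
/// Pass `send = true` to append the fire-and-forget suffix.
fn workflow_url(ingress: &str, workflow: &str, id: &str, handler: &str, send: bool) -> String {
    let mut url = format!("{ingress}/{workflow}/{id}/{handler}");
    if send {
        url.push_str("/send");
    }
    url
}

fn main() {
    // Fire-and-forget start of the order fulfilment workflow.
    println!(
        "{}",
        workflow_url("http://127.0.0.1:8080", "OrderFulfilment", "order-123", "run", true)
    ); // http://127.0.0.1:8080/OrderFulfilment/order-123/run/send
}
```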
Starting a workflow
An Axum handler that creates an order, triggers the fulfilment workflow, and returns a confirmation page with live progress:
use axum::{extract::State, response::Html, Form};
async fn create_order(
State(state): State<AppState>,
Form(input): Form<OrderInput>,
) -> Result<Html<String>, AppError> {
let order = insert_order(&state.db, &input).await?;
let resp = state
.http
.post(format!(
"{}/OrderFulfilment/{}/run/send",
state.restate_ingress_url, order.id,
))
.json(&order)
.send()
.await
.map_err(|e| {
tracing::error!(error = ?e, "failed to trigger fulfilment workflow");
AppError::BadGateway("could not start order processing".into())
})?;
if !resp.status().is_success() {
tracing::error!(status = %resp.status(), "Restate rejected workflow");
return Err(AppError::BadGateway("could not start order processing".into()));
}
Ok(Html(render_order_progress(&order)))
}
The /send suffix is critical. Without it, the HTTP call blocks until the entire workflow completes, which could take seconds or minutes. With /send, Restate returns immediately and the workflow runs in the background.
The confirmation page with SSE progress
The rendered page connects to the SSE endpoint established in the Server-Sent Events section. The SSE handler subscribes to the Valkey channel order:{id}, which the workflow publishes progress events to.
use maud::{html, Markup};
fn render_order_progress(order: &Order) -> String {
html! {
h2 { "Order " (order.id) " placed" }
div hx-ext="sse"
sse-connect=(format!("/events/orders/{}/progress", order.id))
sse-close="complete" {
div sse-swap="progress" hx-swap="innerHTML" {
div .progress-bar style="width: 0%" { "0%" }
p { "Starting fulfilment..." }
}
}
}
.into_string()
}
The sse-close="complete" attribute closes the SSE connection when the workflow publishes its final event with event_type: "complete". The full SSE wiring (the /events/orders/{id}/progress endpoint, Valkey subscriber, event delivery) is covered in the Server-Sent Events section.
Synchronous invocation
For cases where you need the workflow result before responding (rare, but sometimes necessary for short-running workflows):
let result: FulfilmentResult = state
.http
.post(format!(
"{}/OrderFulfilment/{}/run",
state.restate_ingress_url, order.id,
))
.json(&order)
.send()
.await?
.error_for_status()?
.json()
.await?;
Without /send, the call blocks until run completes and returns the workflow’s result directly.
Idempotency
Workflows are inherently idempotent: the run handler executes exactly once per workflow ID. Calling /OrderFulfilment/order-123/run twice with the same ID attaches the second call to the existing execution rather than starting a new one. Use the order ID (or another natural identifier) as the workflow ID to get this deduplication for free.
For services (which are not keyed), add an Idempotency-Key header to prevent duplicate processing:
state
.http
.post(format!("{}/EmailSender/send_confirmation", state.restate_ingress_url))
.header("idempotency-key", &order.id)
.json(&email_details)
.send()
.await?;
Restate caches the response for 24 hours, returning the cached result for duplicate calls.
Application state
Add the Restate ingress URL to your Axum application state:
#[derive(Clone)]
pub struct AppState {
pub db: sqlx::PgPool,
pub http: reqwest::Client,
pub valkey: redis::Client,
pub restate_ingress_url: String,
}
Read the URL from an environment variable at startup:
let restate_ingress_url = std::env::var("RESTATE_INGRESS_URL")
.unwrap_or_else(|_| "http://127.0.0.1:8080".to_string());
Running Restate in development
Restate and Valkey run as Docker containers. The web application and worker run on the host, consistent with the project’s approach of Docker for backing services only.
Add Restate to your Docker Compose file alongside PostgreSQL and Valkey:
services:
postgres:
image: postgres:18-alpine
ports:
- "5432:5432"
environment:
POSTGRES_DB: app
POSTGRES_USER: app
POSTGRES_PASSWORD: app
valkey:
image: valkey/valkey:9-alpine
ports:
- "6379:6379"
restate:
image: docker.restate.dev/restatedev/restate:1.6
ports:
- "8080:8080"
- "9070:9070"
- "9071:9071"
Development workflow
- Start the backing services:
docker compose up -d
- Run the worker (registers itself with Restate on startup):
cargo run -p worker
- Run the web application:
cargo run -p web
- Open the Restate UI at http://localhost:9070 to inspect registered services, active invocations, and their journal entries.
When the worker registers with Restate, the Restate server (running in Docker) needs to reach the worker (running on the host). On macOS and Windows, Docker Desktop maps host.docker.internal to the host. Set WORKER_URL=http://host.docker.internal:9080 when the worker registers. On Linux, use --network=host on the Restate container or set up bridge networking.
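A sketch of the Linux alternative using Compose's host-gateway mapping, so the same WORKER_URL works on all platforms; this extends the restate service from the Compose file above:

```yaml
services:
  restate:
    image: docker.restate.dev/restatedev/restate:1.6
    ports:
      - "8080:8080"
      - "9070:9070"
      - "9071:9071"
    extra_hosts:
      # On Linux, make host.docker.internal resolve to the host gateway,
      # matching Docker Desktop's behaviour on macOS and Windows.
      - "host.docker.internal:host-gateway"
```

With this mapping in place, WORKER_URL=http://host.docker.internal:9080 works without --network=host.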
Re-registration on code changes
The worker registers with "force": true, so restarting the worker after code changes updates the deployment automatically. If you add or remove handlers, the updated service definitions are picked up on the next registration.
Deploying Restate in production
Restate is a single binary with no external dependencies. It stores its own state in an embedded data directory. Deploy it alongside your application, not as a managed service.
Single-server deployment
On a VPS running Docker Compose, add Restate as another service:
services:
restate:
image: docker.restate.dev/restatedev/restate:1.6
restart: unless-stopped
volumes:
- restate_data:/target/restate-data
ports:
- "8080:8080"
volumes:
restate_data:
Persist the Restate data directory (/target/restate-data inside the container) to a volume. This preserves the journal and state across container restarts.
The admin API (port 9070) should not be exposed to the public internet. Access it through your private network (Tailscale, VPN, or SSH tunnel) for service registration and the UI.
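If you do need the admin API and UI from outside the container (e.g. over an SSH tunnel), one option is to publish the port bound to loopback only, so it is never reachable from the public internet. A sketch extending the production service definition above:

```yaml
services:
  restate:
    image: docker.restate.dev/restatedev/restate:1.6
    restart: unless-stopped
    volumes:
      - restate_data:/target/restate-data
    ports:
      - "8080:8080"              # public ingress
      - "127.0.0.1:9070:9070"    # admin API + UI: localhost / SSH tunnel only
```

Then `ssh -L 9070:localhost:9070 your-server` makes the UI available at http://localhost:9070 on your machine.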
The worker in production
Build the worker as a separate Docker image from the same workspace:
FROM rust:1.85 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release -p worker
FROM debian:bookworm-slim
# The worker makes HTTPS calls (payment APIs, external services), so install CA certificates.
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/target/release/worker /usr/local/bin/
CMD ["worker"]
The worker registers with Restate on startup, so deploying a new version is: build the image, restart the container, and the new handlers are registered automatically.
Scaling
Restate handles hundreds of concurrent workflow invocations on a single server. Each invocation consumes minimal resources when suspended (waiting on a timer, sleeping, or paused between steps).
If you outgrow a single Restate server, Restate supports clustered deployment with partitioned state and replicated logs. This is a significant operational step. For most applications covered by this guide (content sites, CRUD apps, internal tools), a single Restate instance is sufficient.
Gotchas
Everything non-deterministic must be in ctx.run(). HTTP calls, database queries, reading the current time, generating random numbers. If it produces different results on different executions, wrap it. Forgetting this causes journal replay to diverge, which Restate detects and flags as an error.
Side effect functions must own their data. The closure passed to ctx.run() must be Send + 'static. Clone values before passing them into the closure rather than borrowing from the handler’s scope.
The worker is not your web server. The Restate SDK’s HttpServer serves the Restate protocol, not HTTP for browsers. Keep the Axum web application and the Restate worker as separate binaries. They share types through the workspace, not a runtime.
Registration must happen after code changes. When you add, remove, or rename handlers, the worker must re-register with Restate so it knows the updated service definitions. The auto-registration pattern shown above handles this. If you skip registration after a change, Restate routes requests to the old handler definitions and invocations fail.
Workflow IDs are unique. Calling run on a workflow ID that has already completed or is currently running attaches to the existing execution. If you need to re-run a workflow for the same entity, use a new ID (e.g., order-123-retry-1 or include a timestamp).
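A tiny helper for minting fresh workflow IDs for re-runs; the naming scheme is just a convention, not anything Restate requires:

```rust
/// Derive a new workflow ID for re-running fulfilment of the same order.
/// Restate treats each distinct ID as a fresh execution.
fn retry_workflow_id(order_id: &str, attempt: u32) -> String {
    format!("{order_id}-retry-{attempt}")
}

fn main() {
    println!("{}", retry_workflow_id("order-123", 1)); // order-123-retry-1
}
```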
Valkey progress events are fire-and-forget. If the browser is not connected when a progress event is published, that event is lost. The get_progress shared handler provides a durable fallback: the browser can poll it on reconnection or use it as the initial state before the SSE connection is established.
Test with restate-sdk-testcontainers. The restate-sdk-testcontainers crate spins up a Restate server in Docker for integration tests, similar to how testcontainers works for PostgreSQL. Use it to test workflows end-to-end without a persistent Restate instance.
Pin the Restate server version. Use a specific image tag (e.g., restate:1.6) rather than latest. The Restate SDK and server have version compatibility ranges. The Rust SDK v0.9 supports Restate Server 1.3 through 1.6.
AI and LLM Integration
LLM features in a web application (content generation, summarisation, classification, conversational interfaces) are fundamentally HTTP calls to an inference API. The challenge is not the call itself but everything around it: provider abstraction, structured tool use, streaming partial responses to the browser, and surviving failures in calls that are expensive, slow, and rate-limited.
Rig is a Rust library for building LLM-powered applications. It provides a unified interface across providers (Anthropic, OpenAI, Ollama, Gemini, and others), typed tool definitions, streaming support, and an agent abstraction that handles multi-step tool-calling loops. This section covers integrating Rig with Axum handlers, defining tools, streaming responses to the browser via SSE, and making AI workflows durable with Restate.
Dependencies
[dependencies]
rig-core = "0.31"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
futures-util = "0.3"
Rig includes all providers by default. No feature flags are needed to enable Anthropic, OpenAI, or Ollama support.
Provider setup
Anthropic
Anthropic’s Claude models are the primary provider for the examples in this section. Create a client from the ANTHROPIC_API_KEY environment variable:
use rig::providers::anthropic;
let client = anthropic::Client::from_env();
let agent = client.agent("claude-sonnet-4-20250514")
.preamble("You are a helpful assistant.")
.build();
Client::from_env() reads ANTHROPIC_API_KEY from the environment. The model string matches Anthropic’s model ID format. Add the API key to your .env file for local development:
ANTHROPIC_API_KEY=sk-ant-...
OpenAI
OpenAI is a drop-in alternative. The agent code is identical apart from the client and model name:
use rig::providers::openai;
let client = openai::Client::from_env();
let agent = client.agent("gpt-4o")
.preamble("You are a helpful assistant.")
.build();
This is the core value of Rig’s provider abstraction: your application code uses the Prompt, Chat, and StreamingPrompt traits. Swapping providers means changing two lines, not rewriting your handlers.
Ollama for local inference
Ollama runs open-weight models locally. It fits the self-hosted ethos of this stack and is useful for development without burning API credits, for privacy-sensitive workloads, and for running smaller models where latency to a cloud API is unnecessary overhead.
Add Ollama to your Docker Compose alongside other backing services:
services:
ollama:
image: ollama/ollama:latest
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
volumes:
ollama_data:
Pull a model after the container starts:
docker exec -it ollama ollama pull llama3.2
Create a Rig client pointing at the local instance:
use rig::providers::ollama;
let client = ollama::Client::from_env();
let agent = client.agent("llama3.2")
.preamble("You are a helpful assistant.")
.build();
OLLAMA_API_BASE_URL defaults to http://localhost:11434. No API key is required.
Basic completions in Axum handlers
The simplest integration: an Axum handler that sends a prompt to the LLM and returns the response as HTML.
use axum::{extract::State, response::Html, Form};
use rig::completion::Prompt;
use serde::Deserialize;
#[derive(Deserialize)]
struct SummariseInput {
text: String,
}
async fn summarise(
State(state): State<AppState>,
Form(input): Form<SummariseInput>,
) -> Result<Html<String>, AppError> {
let prompt = format!(
"Summarise the following text in 2-3 sentences:\n\n{}",
input.text
);
let summary = state.agent.prompt(&prompt).await.map_err(|e| {
tracing::error!(error = ?e, "LLM completion failed");
AppError::BadGateway("AI service unavailable".into())
})?;
Ok(Html(format!("<div class=\"summary\">{summary}</div>")))
}
The Prompt trait’s .prompt() method sends a one-shot request and returns the full response as a String. The agent is stored in application state, shared across requests:
use rig::providers::anthropic;
#[derive(Clone)]
pub struct AppState {
pub db: sqlx::PgPool,
pub http: reqwest::Client,
pub agent: rig::agent::Agent<rig::providers::anthropic::completion::CompletionModel>,
}
Building the agent at startup:
let anthropic = anthropic::Client::from_env();
let agent = anthropic
.agent("claude-sonnet-4-20250514")
.preamble("You are an assistant that summarises text concisely.")
.temperature(0.3)
.build();
let state = AppState {
db: pool,
http: reqwest::Client::new(),
agent,
};
Lower temperature values (0.0 to 0.3) produce more deterministic output, which is appropriate for summarisation, classification, and extraction. Higher values (0.7 to 1.0) produce more creative output for generation tasks.
Chat with history
For multi-turn conversations, the Chat trait accepts a message and a history vector:
use rig::completion::{Chat, Message};
async fn chat(
State(state): State<AppState>,
Form(input): Form<ChatInput>,
) -> Result<Html<String>, AppError> {
let history: Vec<Message> = load_chat_history(&state.db, input.session_id).await?;
let response = state.agent.chat(&input.message, history).await.map_err(|e| {
tracing::error!(error = ?e, "chat completion failed");
AppError::BadGateway("AI service unavailable".into())
})?;
save_chat_messages(&state.db, input.session_id, &input.message, &response).await?;
Ok(Html(format!("<div class=\"message assistant\">{response}</div>")))
}
Store chat history in PostgreSQL rather than in-memory. Sessions expire, servers restart, and users expect conversations to persist.
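A sketch of the row-to-message mapping a load_chat_history helper might perform, using a stand-in enum because the exact shape of Rig's Message type varies between versions; the chat_messages table and column names are assumptions:

```rust
// Stand-in for the chat message type; rig's `Message` plays this role.
#[derive(Debug, PartialEq)]
enum ChatMessage {
    User(String),
    Assistant(String),
}

// Map a (role, content) row from a hypothetical chat_messages table.
// Unknown roles are skipped rather than failing the whole history load.
fn row_to_message(role: &str, content: &str) -> Option<ChatMessage> {
    match role {
        "user" => Some(ChatMessage::User(content.to_string())),
        "assistant" => Some(ChatMessage::Assistant(content.to_string())),
        _ => None,
    }
}

fn main() {
    let rows = [("user", "Hi"), ("assistant", "Hello!"), ("system", "skip")];
    let history: Vec<ChatMessage> = rows
        .iter()
        .filter_map(|(role, content)| row_to_message(role, content))
        .collect();
    println!("{} messages loaded", history.len()); // 2 messages loaded
}
```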
Tool use
LLMs generate text. Tools let them take actions: query a database, call an API, perform calculations, look up current information. The model decides which tool to call and with what arguments, your code executes the tool, and the result feeds back into the model's next response.
Rig defines tools through the Tool trait. Each tool is a Rust struct with typed arguments, typed output, and a JSON schema that tells the model what the tool does and what parameters it accepts.
A tool that searches for products in a database:
use rig::tool::Tool;
use rig::completion::ToolDefinition;
use serde::{Deserialize, Serialize};
use serde_json::json;
#[derive(Debug, Deserialize)]
struct ProductSearchArgs {
query: String,
max_results: Option<u32>,
}
#[derive(Debug, thiserror::Error)]
#[error("product search failed: {0}")]
struct ProductSearchError(String);
// No serde derives here: sqlx::PgPool does not implement Serialize/Deserialize.
struct ProductSearch {
db: sqlx::PgPool,
}
impl Tool for ProductSearch {
const NAME: &'static str = "search_products";
type Error = ProductSearchError;
type Args = ProductSearchArgs;
type Output = String;
async fn definition(&self, _prompt: String) -> ToolDefinition {
ToolDefinition {
name: "search_products".to_string(),
description: "Search the product catalogue by name or description. Returns matching products with prices.".to_string(),
parameters: json!({
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query for product name or description"
},
"max_results": {
"type": "integer",
"description": "Maximum number of results to return (default 5)"
}
},
"required": ["query"]
}),
}
}
async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
let max = args.max_results.unwrap_or(5) as i64;
let products = sqlx::query_as!(
Product,
r#"
SELECT id, name, description, price_cents
FROM products
WHERE to_tsvector('english', name || ' ' || description) @@ plainto_tsquery('english', $1)
ORDER BY ts_rank(to_tsvector('english', name || ' ' || description), plainto_tsquery('english', $1)) DESC
LIMIT $2
"#,
args.query,
max,
)
.fetch_all(&self.db)
.await
.map_err(|e| ProductSearchError(e.to_string()))?;
let formatted = products
.iter()
.map(|p| format!("- {} ({}): {}", p.name, format_price(p.price_cents), p.description))
.collect::<Vec<_>>()
.join("\n");
if formatted.is_empty() {
Ok("No products found matching the search query.".to_string())
} else {
Ok(formatted)
}
}
}
Key points about the Tool trait:
NAME: a static string identifier the model uses to invoke the tool.
Args: a deserializable struct. Rig parses the model’s JSON arguments into this type automatically.
Output: a serialisable type returned to the model. Strings work well because the model consumes the result as text.
definition(): returns a JSON Schema that describes the tool’s purpose and parameters. The model uses this to decide when and how to call the tool.
call(): the actual implementation. This is regular Rust code, so it can query databases, call APIs, read files, or do anything else.
Build an agent with tools attached:
let agent = anthropic
    .agent("claude-sonnet-4-20250514")
    .preamble(
        "You are a shopping assistant. Use the search_products tool to find products \
         that match what the customer is looking for. Provide helpful recommendations \
         based on the search results."
    )
    .tool(ProductSearch { db: pool.clone() })
    .max_tokens(1024)
    .build();
When the user prompts this agent, the model can decide to call search_products with appropriate arguments. Rig handles the loop automatically: it sends the prompt, receives a tool call, executes the tool, sends the result back to the model, and returns the final text response. A single .prompt() call can involve multiple round trips between your code and the model.
let response = agent.prompt("I need a waterproof jacket for hiking").await?;
Agents can use multiple tools. Chain a .tool() call for each one on the builder:
let agent = anthropic
    .agent("claude-sonnet-4-20250514")
    .preamble("You are a customer service agent. You can search products, look up orders, and check inventory.")
    .tool(ProductSearch { db: pool.clone() })
    .tool(OrderLookup { db: pool.clone() })
    .tool(InventoryCheck { http: http_client.clone() })
    .build();
The model chooses which tools to call based on the user’s query and the tool descriptions. Good tool descriptions are critical: the model relies on the description field in ToolDefinition to understand when each tool is appropriate.
Streaming LLM responses via SSE
LLM responses arrive token by token. Streaming them to the browser as they generate gives the user immediate feedback instead of a blank screen followed by a wall of text. Rig’s StreamingPrompt trait produces a stream of chunks that you can convert into Axum SSE events.
use axum::response::sse::{Event, KeepAlive, Sse};
use futures_util::{Stream, StreamExt};
use rig::streaming::StreamingPrompt;
use std::convert::Infallible;
async fn stream_response(
State(state): State<AppState>,
Form(input): Form<PromptInput>,
) -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
let prompt = input.prompt.clone();
let agent = state.agent.clone();
let stream = async_stream::stream! {
match agent.stream_prompt(&prompt).await {
    Ok(mut stream) => {
        while let Some(chunk) = stream.next().await {
match chunk {
Ok(rig::streaming::StreamedAssistantContent::Text(text)) => {
let html = format!("<span>{}</span>", text.text);
yield Ok(Event::default().event("chunk").data(html));
}
Ok(_) => {}
Err(e) => {
tracing::error!(error = ?e, "stream error");
yield Ok(
Event::default()
.event("error")
.data("<span class=\"error\">Generation failed</span>"),
);
break;
}
}
}
yield Ok(Event::default().event("done").data("<span class=\"done\"></span>"));
}
Err(e) => {
tracing::error!(error = ?e, "failed to start stream");
yield Ok(
Event::default()
.event("error")
.data("<span class=\"error\">AI service unavailable</span>"),
);
}
}
};
Sse::new(stream).keep_alive(KeepAlive::default())
}
The handler returns Sse<impl Stream>, which Axum sends as Content-Type: text/event-stream. Each text chunk from the model becomes an SSE event with an HTML fragment as its data.
On the browser side, htmx’s SSE extension consumes the events and swaps them into the page. The full SSE-to-htmx wiring (event subscription, sse-swap, connection lifecycle) is covered in the Server-Sent Events section. The relevant htmx markup:
<div hx-ext="sse"
sse-connect="/ai/stream"
sse-close="done">
<div id="response" sse-swap="chunk" hx-swap="beforeend">
</div>
</div>
sse-swap="chunk" appends each chunk event’s data to the target div. sse-close="done" closes the SSE connection when the stream completes.
Escaping HTML in streamed output
LLM output may contain characters that break HTML (<, >, &). Maud escapes interpolated text automatically (PreEscaped is how you opt out), but streamed chunks bypass Maud's template rendering entirely, so escape them manually before wrapping in HTML:
fn escape_html(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}
let html = format!("<span>{}</span>", escape_html(&text.text));
If you want the model to produce HTML (e.g., for formatted responses), sanitise the output instead of escaping it. Use a library like ammonia to strip dangerous tags while preserving safe formatting.
Durable AI workflows with Restate
LLM calls are expensive, slow (seconds, not milliseconds), and rate-limited. A crashed process that loses a partially complete AI workflow wastes money and time. Wrapping AI calls in Restate gives you automatic retries, exactly-once execution, and crash recovery for every step.
The pattern: each LLM call goes inside a ctx.run() closure. If the process crashes after the call completes but before the next step starts, Restate replays from the journal and skips the completed call, returning the stored result without re-invoking the model.
A content generation workflow
A workflow that generates a product description, translates it, and stores the results. Each step is independently durable.
use restate_sdk::prelude::*;
use rig::completion::Prompt;
use rig::providers::anthropic;
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ContentRequest {
pub product_id: String,
pub product_name: String,
pub product_details: String,
pub target_languages: Vec<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ContentResult {
pub product_id: String,
pub description: String,
pub translations: Vec<Translation>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Translation {
pub language: String,
pub text: String,
}
#[restate_sdk::workflow]
pub trait ContentGeneration {
async fn run(request: Json<ContentRequest>) -> Result<Json<ContentResult>, HandlerError>;
#[shared]
async fn get_status() -> Result<String, HandlerError>;
}
pub struct ContentGenerationImpl;
impl ContentGeneration for ContentGenerationImpl {
async fn run(
&self,
ctx: WorkflowContext<'_>,
Json(request): Json<ContentRequest>,
) -> Result<Json<ContentResult>, HandlerError> {
ctx.set("status", "Generating description...".to_string());
let description: String = ctx
.run(|| generate_description(request.clone()))
.name("generate_description")
.await?;
let mut translations = Vec::new();
for language in &request.target_languages {
ctx.set("status", format!("Translating to {language}..."));
let translation: String = ctx
.run(|| translate_text(description.clone(), language.clone()))
.name(&format!("translate_{language}"))
.await?;
translations.push(Translation {
language: language.clone(),
text: translation,
});
}
ctx.set("status", "Complete".to_string());
Ok(Json(ContentResult {
product_id: request.product_id,
description,
translations,
}))
}
async fn get_status(
&self,
ctx: SharedWorkflowContext<'_>,
) -> Result<String, HandlerError> {
Ok(ctx
.get::<String>("status")
.await?
.unwrap_or_else(|| "Waiting to start...".to_string()))
}
}
async fn generate_description(request: ContentRequest) -> Result<String, anyhow::Error> {
let client = anthropic::Client::from_env();
let agent = client
.agent("claude-sonnet-4-20250514")
.preamble(
"You are a copywriter. Write a compelling product description \
in 2-3 paragraphs. Be specific and highlight key features."
)
.temperature(0.7)
.build();
let prompt = format!(
"Write a product description for: {}\n\nDetails: {}",
request.product_name, request.product_details
);
Ok(agent.prompt(&prompt).await?)
}
async fn translate_text(text: String, language: String) -> Result<String, anyhow::Error> {
let client = anthropic::Client::from_env();
let agent = client
.agent("claude-sonnet-4-20250514")
.preamble(&format!(
"You are a translator. Translate the following text into {language}. \
Preserve the tone and style of the original."
))
.temperature(0.3)
.build();
Ok(agent.prompt(&text).await?)
}
Each ctx.run() call wraps one LLM invocation. The side effect functions create their own Rig clients because Restate closures must be Send + 'static, which means they cannot borrow the handler’s context. Creating an Anthropic client is cheap (it is just an HTTP client with credentials), so this overhead is negligible compared to the LLM call itself.
If the worker crashes after generating the description but before the translations, Restate restarts the workflow and replays from the journal. The description step returns its stored result without calling the model again, and execution resumes with the first translation.
Triggering the workflow from Axum
Fire-and-forget from an Axum handler, with the workflow running in the background:
async fn generate_content(
State(state): State<AppState>,
Form(input): Form<ContentInput>,
) -> Result<Html<String>, AppError> {
let request = ContentRequest {
product_id: input.product_id.clone(),
product_name: input.product_name,
product_details: input.product_details,
target_languages: vec!["fr".into(), "de".into(), "es".into()],
};
state
.http
.post(format!(
"{}/ContentGeneration/{}/run/send",
state.restate_ingress_url, input.product_id,
))
.json(&request)
.send()
.await
.map_err(|e| {
tracing::error!(error = ?e, "failed to trigger content generation");
AppError::BadGateway("could not start content generation".into())
})?;
Ok(Html(render_generation_progress(&input.product_id)))
}
The /send suffix makes the call fire-and-forget. The Restate workflow runs durably in the background. The rendered page can use SSE to display progress updates, following the same pattern shown in the Background Jobs section.
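While a run is in flight, the shared get_status handler can be called through the same ingress. A quick command-line smoke test might look like the following (the product id prod-42, the request body, and the default ingress port 8080 are assumptions; adjust to your deployment):

```shell
# Start the workflow fire-and-forget (note the /send suffix on the URL).
curl -X POST http://localhost:8080/ContentGeneration/prod-42/run/send \
  -H "Content-Type: application/json" \
  -d '{"product_id":"prod-42","product_name":"Widget","product_details":"Blue, 3 kg","target_languages":["fr"]}'

# Poll the shared handler for progress while the workflow runs.
curl -X POST http://localhost:8080/ContentGeneration/prod-42/get_status
```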
When to use Restate for AI calls
Wrap LLM calls in Restate when:
- The call is part of a multi-step workflow where earlier steps are expensive to repeat
- The result will be stored (database write, file creation) and losing it means re-running the model
- You are chaining multiple model calls where later calls depend on earlier results
- The operation is user-initiated and the user expects it to complete even if the server restarts
Skip Restate for:
- Single low-latency completions served directly in the HTTP response (the basic handler pattern above)
- Streaming responses where the user sees output in real time and can retry if it fails
- Development and experimentation where durability adds friction
Prompt management
Hardcoded prompt strings work for simple cases. As your application grows, prompts need structure.
Preamble as configuration
Store system prompts in configuration rather than code. This lets you adjust model behaviour without redeploying:
#[derive(Clone)]
pub struct AiConfig {
pub model: String,
pub summarise_preamble: String,
pub chat_preamble: String,
pub temperature: f64,
}
impl AiConfig {
pub fn from_env() -> Self {
Self {
model: std::env::var("AI_MODEL")
.unwrap_or_else(|_| "claude-sonnet-4-20250514".to_string()),
summarise_preamble: std::env::var("AI_SUMMARISE_PREAMBLE")
.unwrap_or_else(|_| "You summarise text concisely in 2-3 sentences.".to_string()),
chat_preamble: std::env::var("AI_CHAT_PREAMBLE")
.unwrap_or_else(|_| "You are a helpful assistant.".to_string()),
temperature: std::env::var("AI_TEMPERATURE")
.ok()
.and_then(|v| v.parse().ok())
.unwrap_or(0.3),
}
}
}
Build agents from the configuration at startup:
let ai_config = AiConfig::from_env();
let anthropic = anthropic::Client::from_env();
let summarise_agent = anthropic
.agent(&ai_config.model)
.preamble(&ai_config.summarise_preamble)
.temperature(ai_config.temperature)
.build();
Prompt templates
For prompts that combine fixed instructions with dynamic data, format strings are sufficient:
let prompt = format!(
"Classify the following support ticket into one of these categories: \
billing, technical, account, other.\n\n\
Respond with only the category name.\n\n\
Ticket: {ticket_text}"
);
For more complex templates with conditional sections, build the prompt string with standard Rust string manipulation. There is no need for a dedicated templating engine for prompts. Rig’s .context() method on the agent builder is another option for injecting dynamic context alongside the preamble:
let agent = anthropic
.agent("claude-sonnet-4-20250514")
.preamble("You answer questions about the user's order history.")
.context(&format!("Customer name: {}\nAccount since: {}", name, since))
.build();
Context documents are sent alongside the preamble in every request, giving the model additional information without modifying the system prompt.
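The conditional-section approach mentioned above needs nothing beyond plain Rust. A sketch with hypothetical inputs (`order_history`, `formal`):

```rust
// Build a prompt with optional sections using ordinary string manipulation.
// The section names and inputs here are illustrative, not from a real app.
fn build_prompt(question: &str, order_history: Option<&str>, formal: bool) -> String {
    let mut prompt = String::new();
    if formal {
        prompt.push_str("Respond in a formal tone.\n\n");
    }
    if let Some(history) = order_history {
        prompt.push_str("Order history:\n");
        prompt.push_str(history);
        prompt.push_str("\n\n");
    }
    prompt.push_str("Question: ");
    prompt.push_str(question);
    prompt
}
```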
Retrieval-augmented generation
Retrieval-augmented generation (RAG) grounds an LLM’s answers in your data. Instead of relying on the model’s training data alone, you retrieve relevant documents from your database and include them in the prompt as context. The model answers based on what you provided, reducing hallucination and keeping responses current.
The pattern has three steps: embed the user’s query into a vector, search your database for similar documents, and inject the results into the prompt alongside the question.
The retrieval step
The Semantic Search section covers pgvector setup, embedding generation with Ollama, and similarity queries with SQLx. The functions below come directly from that section:
- generate_embeddings() converts text into vectors via Ollama’s /api/embed endpoint
- semantic_search() finds the most similar documents by cosine distance
If your application needs better retrieval quality, swap semantic_search() for the hybrid_search() function from the same section, which combines vector similarity with full-text search using Reciprocal Rank Fusion.
Building a RAG handler
Retrieve context, format it, and pass it to the agent in a single Axum handler:
use axum::{extract::State, response::Html, Form};
use rig::completion::Prompt;
use serde::Deserialize;
#[derive(Deserialize)]
struct AskInput {
question: String,
}
async fn ask_with_context(
State(state): State<AppState>,
Form(input): Form<AskInput>,
) -> Result<Html<String>, AppError> {
let embeddings = generate_embeddings(
&state.http,
&state.config.ollama_url,
&[&input.question],
)
.await
.map_err(|e| {
tracing::error!(error = ?e, "embedding generation failed");
AppError::BadGateway("embedding service unavailable".into())
})?;
let query_embedding = embeddings
.into_iter()
.next()
.ok_or_else(|| AppError::Internal("no embedding returned".into()))?;
let documents = semantic_search(&state.db, query_embedding, 5).await?;
let context = documents
.iter()
.map(|doc| format!("## {}\n{}", doc.title, doc.content))
.collect::<Vec<_>>()
.join("\n\n");
let prompt = format!(
"Answer the question using only the provided documents. \
If the documents do not contain enough information, say so.\n\n\
{context}\n\n\
Question: {}",
input.question
);
let answer = state.agent.prompt(&prompt).await.map_err(|e| {
tracing::error!(error = ?e, "RAG completion failed");
AppError::BadGateway("AI service unavailable".into())
})?;
Ok(Html(format!(
"<div class=\"answer\">{}</div>",
escape_html(&answer)
)))
}
The agent used here is the same one built at startup and stored in AppState, as shown in the basic completions section. The only difference is that the prompt now includes retrieved documents as context.
Context window management
Retrieved documents consume input tokens. A pgvector query returning five documents of 500 words each adds roughly 3,000 to 4,000 tokens to the prompt. Monitor this budget:
let max_context_chars = 8_000;
let context = if context.len() > max_context_chars {
// Back up to a char boundary so the slice cannot panic on a
// multi-byte UTF-8 character, then trim to the last paragraph break.
let mut end = max_context_chars;
while !context.is_char_boundary(end) {
end -= 1;
}
let truncated = &context[..end];
truncated
.rfind("\n\n")
.map(|pos| &truncated[..pos])
.unwrap_or(truncated)
.to_string()
} else {
context
};
For large document sets, retrieve more candidates than you need (e.g., 10 to 20) and include only those that fit within your token budget. The similarity score from semantic_search() helps here: set a minimum threshold (e.g., 0.7) and discard documents below it.
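That selection step can be sketched as a pure function. `Doc` below is a stand-in for the row type returned by semantic_search(); the threshold and budget values are illustrative:

```rust
// Keep only documents above a similarity threshold, then stop adding
// sections once the character budget would be exceeded.
struct Doc {
    title: String,
    content: String,
    similarity: f64,
}

fn select_context(docs: &[Doc], min_similarity: f64, max_chars: usize) -> String {
    let mut context = String::new();
    for doc in docs.iter().filter(|d| d.similarity >= min_similarity) {
        let section = format!("## {}\n{}", doc.title, doc.content);
        // +2 accounts for the "\n\n" separator between sections.
        if context.len() + section.len() + 2 > max_chars {
            break;
        }
        if !context.is_empty() {
            context.push_str("\n\n");
        }
        context.push_str(&section);
    }
    context
}
```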
Alternative: Rig’s dynamic_context
Rig provides a built-in RAG mechanism through .dynamic_context() on the agent builder. Combined with the rig-postgres companion crate, which implements VectorStoreIndex for pgvector, you can wire retrieval directly into the agent:
let vector_store = PostgresVectorStore::default(embedding_model, pool);
let index = vector_store.index(embedding_model);
let agent = anthropic
.agent("claude-sonnet-4-20250514")
.preamble("Answer questions using the provided context.")
.dynamic_context(5, index)
.build();
With .dynamic_context(5, index), Rig automatically retrieves the top 5 similar documents before every prompt and injects them as context. This is convenient but less flexible: you cannot use hybrid search, you cannot filter results by similarity threshold, and rig-postgres requires its own table schema (id uuid, document jsonb, embedded_text text, embedding vector(N)) that differs from the typed columns established in the Semantic Search section. The manual approach gives you full control over retrieval and context formatting.
Agentic retrieval
Standard RAG retrieves context on every query regardless of whether the query needs it. “What is 2 + 2?” triggers a vector search that returns irrelevant results and wastes tokens. Agentic retrieval inverts this: the LLM decides when to search, what to search for, and whether to search again with a refined query.
This is a direct application of the Tool trait covered in the tool use section. Define a tool that wraps semantic search, attach it to an agent, and let the model decide when retrieval is appropriate.
use rig::tool::Tool;
use rig::completion::ToolDefinition;
use serde::{Deserialize, Serialize};
use serde_json::json;
#[derive(Debug, Deserialize)]
struct SearchArgs {
query: String,
max_results: Option<u32>,
}
#[derive(Debug, thiserror::Error)]
#[error("knowledge base search failed: {0}")]
struct SearchError(String);
struct KnowledgeBaseSearch {
db: sqlx::PgPool,
http: reqwest::Client,
ollama_url: String,
}
impl Tool for KnowledgeBaseSearch {
const NAME: &'static str = "search_knowledge_base";
type Error = SearchError;
type Args = SearchArgs;
type Output = String;
async fn definition(&self, _prompt: String) -> ToolDefinition {
ToolDefinition {
name: "search_knowledge_base".to_string(),
description: "Search the knowledge base for documents relevant to a query. \
Use this when you need factual information to answer a question. \
You can call this tool multiple times with different or refined \
queries if the initial results are insufficient."
.to_string(),
parameters: json!({
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Natural language search query describing what information you need"
},
"max_results": {
"type": "integer",
"description": "Maximum number of documents to return (default 5)"
}
},
"required": ["query"]
}),
}
}
async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
let limit = args.max_results.unwrap_or(5) as i64;
let embeddings =
generate_embeddings(&self.http, &self.ollama_url, &[&args.query])
.await
.map_err(|e| SearchError(e.to_string()))?;
let query_embedding = embeddings
.into_iter()
.next()
.ok_or_else(|| SearchError("no embedding returned".into()))?;
let results = semantic_search(&self.db, query_embedding, limit)
.await
.map_err(|e| SearchError(e.to_string()))?;
if results.is_empty() {
return Ok("No relevant documents found.".to_string());
}
let formatted = results
.iter()
.map(|doc| {
format!(
"## {} (relevance: {:.0}%)\n{}",
doc.title,
doc.similarity * 100.0,
doc.content
)
})
.collect::<Vec<_>>()
.join("\n\n");
Ok(formatted)
}
}
The tool wraps the same generate_embeddings() and semantic_search() functions from the Semantic Search section. The model receives the formatted results as text and reasons about them.
Two details in the tool definition matter for multi-turn retrieval:
- The description explicitly tells the model it can call the tool multiple times with different queries. Without this, models tend to search once and work with whatever comes back.
- Including the relevance percentage in the output helps the model judge whether the results are useful or whether a refined search is warranted.
Building the agent
let search_tool = KnowledgeBaseSearch {
db: pool.clone(),
http: reqwest::Client::new(),
ollama_url: config.ollama_url.clone(),
};
let agent = anthropic
.agent("claude-sonnet-4-20250514")
.preamble(
"You are a knowledge assistant. You have access to a search tool that \
queries the knowledge base. Use it when you need factual information to \
answer a question. If your first search does not return relevant results, \
try rephrasing the query or searching for related terms. When you have \
enough information, answer the question directly. If you cannot find the \
answer after searching, say so."
)
.tool(search_tool)
.max_tokens(1024)
.build();
The preamble instructs the model to search selectively and refine when needed. A single .prompt() call can trigger multiple search rounds: the model calls the tool, reads the results, decides they are too broad, calls the tool again with a more specific query, and synthesises an answer from the combined results. Rig manages this loop automatically, as described in the tool use section.
Wiring into an Axum handler
async fn ask_agent(
State(state): State<AppState>,
Form(input): Form<AskInput>,
) -> Result<Html<String>, AppError> {
let answer = state.rag_agent.prompt(&input.question).await.map_err(|e| {
tracing::error!(error = ?e, "agentic retrieval failed");
AppError::BadGateway("AI service unavailable".into())
})?;
Ok(Html(format!(
"<div class=\"answer\">{}</div>",
escape_html(&answer)
)))
}
The handler is simpler than the manual RAG handler because the agent manages retrieval internally. The trade-off is less control: you cannot inspect or filter the retrieved documents before they reach the model, and each query may trigger zero, one, or several search tool calls depending on the model’s judgement.
RAG vs agentic retrieval
Use standard RAG when:
- Every query needs context from the knowledge base (e.g., a documentation Q&A system)
- You want deterministic retrieval: same query always retrieves the same documents
- You need to control exactly which documents the model sees
Use agentic retrieval when:
- Queries vary widely and not all require retrieval (e.g., a general assistant that sometimes needs to look things up)
- The model benefits from refining its search strategy based on initial results
- You want the agent to combine multiple searches to answer complex questions
Both approaches can be made durable with Restate using the same patterns shown in the durable AI workflows section.
Gotchas
LLM calls are slow. A typical completion takes 1 to 10 seconds. Do not call them synchronously in a request that the user is waiting on unless you are streaming the response. For non-streaming use cases, trigger a Restate workflow and show progress.
Token limits are real. Each model has input and output token limits. If your prompt plus context exceeds the input limit, the API returns an error. Track prompt sizes, especially when injecting user-provided content or database results. Use .max_tokens() on the agent builder to cap output length.
Rate limits vary by provider. Anthropic, OpenAI, and other providers enforce rate limits on tokens per minute and requests per minute. Handle 429 Too Many Requests errors gracefully. Restate’s retry logic helps here: if a rate limit error is retryable, the journaled side effect retries automatically with backoff.
Model output is not safe HTML. Never insert raw LLM output into an HTML page without escaping or sanitising. Models can produce arbitrary text, including strings that look like HTML tags or script injections. Escape by default, sanitise only when you explicitly want formatted output.
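The handlers in this chapter call an escape_html helper. A minimal version might look like this (for sanitising, where some markup should survive, reach for a dedicated library such as ammonia instead):

```rust
// Escape the five characters that can change HTML parsing context.
// This makes model output safe to embed as text; it does not attempt
// to preserve any intentional formatting.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}
```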
Tool definitions need good descriptions. The model decides whether to call a tool based on the description field in ToolDefinition and the parameter descriptions. Vague descriptions lead to the model not calling tools when it should, or calling them with wrong arguments. Write descriptions as if explaining the tool to a colleague who will use it without seeing the implementation.
Rig creates HTTP clients internally. Each Rig provider client manages its own HTTP connection pool. This is separate from the reqwest::Client you use for other external API calls. Do not try to share a single reqwest::Client across both Rig and your own HTTP calls.
Ollama model availability. Ollama models must be pulled before use. If the model is not available locally, the API call fails. Pull models as part of your development setup, not at application startup. For production Ollama deployments, pre-bake models into the container image or volume.
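One way to fold this into development setup, assuming Ollama runs as a Compose service named ollama and using a hypothetical embedding model name:

```shell
# Pull the model once after starting the containers; re-running is a no-op.
docker compose exec ollama ollama pull nomic-embed-text

# Confirm the model is available before starting the application.
docker compose exec ollama ollama list
```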
Provider-specific features. Some features (vision, extended thinking, tool use with streaming) vary by provider and model. Test your specific provider/model combination. Rig’s unified interface covers the common surface, but edge cases may behave differently across providers.
Retrieved context is a prompt injection surface. In RAG, retrieved documents become part of the prompt. If your documents contain adversarial text (e.g., “Ignore previous instructions and…”), the model may follow it. This is a fundamental limitation of injecting external content into prompts. Sanitise stored content if it originates from untrusted sources, and do not treat model output from RAG queries as trusted.
Infrastructure
File Storage
File uploads and downloads are a core feature in most web applications: user avatars, document attachments, image galleries, CSV exports. The S3 API has become the standard interface for object storage, and every major provider implements it. Write your code against S3 once, then swap between a local development server and any production provider by changing environment variables.
This section covers setting up an S3-compatible storage backend, handling file uploads in Axum (both server-side and direct-to-storage), generating presigned URLs, and serving files back to users.
Dependencies
[dependencies]
rust-s3 = "0.37"
axum = { version = "0.8", features = ["multipart"] }
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
uuid = { version = "1", features = ["v4"] }
rust-s3 is a lightweight S3 client that works with any S3-compatible provider. It supports async operations via tokio out of the box, with a clean API centred on the Bucket type. The aws-sdk-s3 crate is the other option, but it pulls in a larger dependency tree and its API is more verbose. rust-s3 covers everything needed here.
The multipart feature on axum enables the Multipart extractor for handling file upload forms.
RustFS for local development
RustFS is an S3-compatible object storage server written in Rust. It serves as the local development replacement for production object storage, the same way PostgreSQL in Docker serves as the local database. RustFS is Apache 2.0 licensed, making it a good alternative to MinIO (AGPL).
Add RustFS to your Docker Compose file alongside your other backing services:
services:
rustfs:
image: rustfs/rustfs:latest
ports:
- "9000:9000"
- "9001:9001"
environment:
RUSTFS_ACCESS_KEY: rustfsadmin
RUSTFS_SECRET_KEY: rustfsadmin
RUSTFS_CONSOLE_ENABLE: "true"
volumes:
- rustfs-data:/data
- rustfs-logs:/logs
command: /data
volumes:
rustfs-data:
rustfs-logs:
Port 9000 exposes the S3 API. Port 9001 exposes a web console for browsing buckets and objects. Default credentials are rustfsadmin / rustfsadmin.
After starting the container, create a bucket for development. You can do this through the web console at http://localhost:9001, or with the MinIO client CLI (which works with any S3-compatible server):
mc alias set rustfs http://localhost:9000 rustfsadmin rustfsadmin
mc mb rustfs/uploads
Configuring the S3 client
Build the Bucket handle from environment variables so the same code works in development (RustFS) and production (Hetzner Object Storage or any other provider).
use s3::bucket::Bucket;
use s3::creds::Credentials;
use s3::Region;
pub fn create_bucket() -> Box<Bucket> {
let region = Region::Custom {
region: env_var("S3_REGION"),
endpoint: env_var("S3_ENDPOINT"),
};
let credentials = Credentials::new(
Some(&env_var("S3_ACCESS_KEY")),
Some(&env_var("S3_SECRET_KEY")),
None,
None,
None,
)
.expect("valid S3 credentials");
let bucket_name = env_var("S3_BUCKET");
Bucket::new(&bucket_name, region, credentials).expect("valid S3 bucket configuration")
}
fn env_var(name: &str) -> String {
std::env::var(name).unwrap_or_else(|_| panic!("{name} must be set"))
}
Region::Custom accepts any endpoint URL, which is how you point the client at RustFS locally or Hetzner in production. The Bucket type is the main handle for all S3 operations: uploads, downloads, listing, deletion, and presigned URL generation.
Add the bucket to your application state:
#[derive(Clone)]
pub struct AppState {
pub db: sqlx::PgPool,
pub bucket: Box<Bucket>,
}
Environment variables
For local development with RustFS:
S3_ENDPOINT=http://localhost:9000
S3_REGION=us-east-1
S3_ACCESS_KEY=rustfsadmin
S3_SECRET_KEY=rustfsadmin
S3_BUCKET=uploads
For production with Hetzner Object Storage:
S3_ENDPOINT=https://fsn1.your-objectstorage.com
S3_REGION=fsn1
S3_ACCESS_KEY=<hetzner-access-key>
S3_SECRET_KEY=<hetzner-secret-key>
S3_BUCKET=prod-uploads
Hetzner Object Storage provides S3-compatible storage in European data centres (Falkenstein, Nuremberg, Helsinki). Generate access keys from the Hetzner Cloud Console. The endpoint must match the region where your bucket was created.
Upload handling in Axum
Server-side upload via multipart
The most straightforward approach: the browser sends the file to your Axum handler via a standard HTML multipart form, and the handler uploads it to S3. No client-side JavaScript required, which fits the HDA model well.
The HTML form:
use maud::{html, Markup};
fn upload_form() -> Markup {
html! {
form method="post" action="/files" enctype="multipart/form-data" {
label {
"Choose file"
input type="file" name="file" required;
}
button type="submit" { "Upload" }
}
}
}
The handler:
use axum::{
extract::{Multipart, State},
response::{IntoResponse, Redirect},
};
use uuid::Uuid;
pub async fn upload_file(
State(state): State<AppState>,
mut multipart: Multipart,
) -> Result<impl IntoResponse, AppError> {
let field = multipart
.next_field()
.await?
.ok_or(AppError::BadRequest("no file provided".into()))?;
let original_name = field
.file_name()
.unwrap_or("unnamed")
.to_string();
let content_type = field
.content_type()
.unwrap_or("application/octet-stream")
.to_string();
let data = field.bytes().await?;
let ext = original_name
.rsplit_once('.')
.map(|(_, ext)| ext)
.unwrap_or("bin");
let key = format!("uploads/{}.{}", Uuid::new_v4(), ext);
state
.bucket
.put_object_with_content_type(&key, &data, &content_type)
.await
.map_err(|e| AppError::Internal(format!("S3 upload failed: {e}")))?;
Ok(Redirect::to("/files"))
}
field.bytes() reads the entire file into memory. This is fine for files up to a few megabytes (avatars, documents). For larger files, use the presigned URL approach described below.
The object key uses a UUID to avoid filename collisions and path traversal issues. Store the mapping between the generated key and the original filename in your database.
Adjusting the body size limit
Axum’s default body size limit is 2 MB. For file uploads, you’ll typically need to raise this on the upload route:
use axum::{extract::DefaultBodyLimit, routing::post, Router};
let app = Router::new()
.route("/files", post(upload_file))
.layer(DefaultBodyLimit::max(25 * 1024 * 1024));
Apply the limit to specific routes rather than globally. A 25 MB limit on your file upload route is reasonable; the same limit on your login form is not.
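One way to scope the limit, sketched with hypothetical stub handlers: build the upload routes as their own Router, apply the layer there, and merge into the main router.

```rust
use axum::{extract::DefaultBodyLimit, routing::post, Router};

// Stub handlers standing in for the real ones.
async fn upload_file() -> &'static str { "uploaded" }
async fn login() -> &'static str { "ok" }

fn app() -> Router {
    // Only the upload routes get the raised limit.
    let uploads = Router::new()
        .route("/files", post(upload_file))
        .layer(DefaultBodyLimit::max(25 * 1024 * 1024));

    // Everything else keeps axum's 2 MB default.
    Router::new()
        .route("/login", post(login))
        .merge(uploads)
}
```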
Direct upload via presigned URL
For larger files, skip the server entirely. The server generates a presigned PUT URL, and the browser uploads directly to S3. This avoids buffering the file through your application server, reducing memory usage and latency.
The flow:
- The browser requests a presigned upload URL from your server.
- The server generates a presigned PUT URL with a short expiry.
- The browser uploads the file directly to S3 using that URL.
- The browser notifies the server that the upload is complete.
The handler that generates the presigned URL:
use axum::{extract::State, Json};
use serde::{Deserialize, Serialize};
use uuid::Uuid;
#[derive(Deserialize)]
pub struct PresignedUploadRequest {
pub filename: String,
pub content_type: String,
}
#[derive(Serialize)]
pub struct PresignedUploadResponse {
pub upload_url: String,
pub object_key: String,
}
pub async fn presigned_upload_url(
State(state): State<AppState>,
Json(req): Json<PresignedUploadRequest>,
) -> Result<Json<PresignedUploadResponse>, AppError> {
let ext = req
.filename
.rsplit_once('.')
.map(|(_, ext)| ext)
.unwrap_or("bin");
let key = format!("uploads/{}.{}", Uuid::new_v4(), ext);
let url = state
.bucket
.presign_put(&key, 3600, None, None)
.await
.map_err(|e| AppError::Internal(format!("presign failed: {e}")))?;
Ok(Json(PresignedUploadResponse {
upload_url: url,
object_key: key,
}))
}
The presigned URL is valid for 3600 seconds (one hour). The client uploads with a PUT request to that URL. No credentials are needed on the client side because the URL itself contains the authentication signature.
On the client, a small amount of JavaScript handles the direct upload:
async function uploadFile(file) {
const res = await fetch("/files/presign", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
filename: file.name,
content_type: file.type,
}),
});
const { upload_url, object_key } = await res.json();
await fetch(upload_url, {
method: "PUT",
headers: { "Content-Type": file.type },
body: file,
});
await fetch("/files/confirm", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ object_key }),
});
}
The confirmation step (step 4) is where the server records the file in the database. Without it, orphaned objects accumulate in S3 from abandoned or failed uploads. A lifecycle rule alone cannot tell confirmed objects from abandoned ones, so upload into a staging prefix, move or copy objects to a permanent prefix when the confirmation arrives, and add a rule that expires staging objects after 24 hours.
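A sketch of the /files/confirm handler. The ConfirmRequest shape is an assumption (the JavaScript above sends only object_key, so the client would need to include the extra metadata), and AuthenticatedUser and AppError are the same hypothetical types used elsewhere in this section:

```rust
use axum::{extract::State, http::StatusCode, Json};
use serde::Deserialize;

#[derive(Deserialize)]
pub struct ConfirmRequest {
    pub object_key: String,
    pub original_name: String,
    pub content_type: String,
    pub size_bytes: i64,
}

pub async fn confirm_upload(
    State(state): State<AppState>,
    user: AuthenticatedUser,
    Json(req): Json<ConfirmRequest>,
) -> Result<StatusCode, AppError> {
    // Record the upload; anything never confirmed is left for cleanup.
    sqlx::query!(
        "INSERT INTO files (object_key, original_name, content_type, size_bytes, owner_id) \
         VALUES ($1, $2, $3, $4, $5)",
        req.object_key,
        req.original_name,
        req.content_type,
        req.size_bytes,
        user.id,
    )
    .execute(&state.db)
    .await?;
    Ok(StatusCode::CREATED)
}
```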
Choosing between the two approaches
| Concern | Server-side multipart | Presigned URL |
| Simplicity | Simpler. Standard HTML form, no JavaScript. | Requires JavaScript for the upload flow. |
| Server load | File passes through your server. Memory proportional to file size. | File goes directly to S3. Server only generates a URL. |
| File size | Practical up to ~25 MB. | Works for files of any size. |
| Progress tracking | Requires SSE or polling for progress on large files. | Browser fetch can report upload progress natively. |
| HDA fit | Works naturally with forms and htmx. | Requires a small JavaScript module. |
Use server-side multipart as the default for typical uploads (documents, images, avatars). Switch to presigned URLs when files are large enough that buffering them through the server becomes a problem.
Serving files to users
Presigned GET URLs
The simplest way to serve a file: generate a presigned GET URL and redirect the browser to it.
pub async fn download_file(
State(state): State<AppState>,
Path(file_id): Path<Uuid>,
) -> Result<impl IntoResponse, AppError> {
let file = sqlx::query_as!(
FileRecord,
"SELECT object_key, original_name FROM files WHERE id = $1",
file_id
)
.fetch_optional(&state.db)
.await?
.ok_or(AppError::NotFound("file not found".into()))?;
let url = state
.bucket
.presign_get(&file.object_key, 3600, None)
.await
.map_err(|e| AppError::Internal(format!("presign failed: {e}")))?;
Ok(Redirect::temporary(&url))
}
The presigned URL expires after an hour. The browser follows the redirect and downloads directly from S3. This keeps file serving load off your application server.
Controlling Content-Disposition
By default, browsers display files inline if they can (images, PDFs). To force a download with the original filename, pass response-content-disposition as a query parameter on the presigned URL:
use std::collections::HashMap;
pub async fn download_file_as_attachment(
State(state): State<AppState>,
Path(file_id): Path<Uuid>,
) -> Result<impl IntoResponse, AppError> {
let file = sqlx::query_as!(
FileRecord,
"SELECT object_key, original_name FROM files WHERE id = $1",
file_id
)
.fetch_optional(&state.db)
.await?
.ok_or(AppError::NotFound("file not found".into()))?;
let mut queries = HashMap::new();
queries.insert(
"response-content-disposition".to_string(),
format!("attachment; filename=\"{}\"", file.original_name),
);
let url = state
.bucket
.presign_get(&file.object_key, 3600, Some(queries))
.await
.map_err(|e| AppError::Internal(format!("presign failed: {e}")))?;
Ok(Redirect::temporary(&url))
}
Use attachment when the user explicitly clicks a download link. Use inline (or omit the header) when displaying an image or PDF in the browser.
Proxy handler for access-controlled files
Presigned URLs are convenient but have a limitation: once generated, anyone with the URL can access the file until it expires. For files that require per-request access control (private documents, paid content), proxy the file through your server instead.
use axum::{
body::Body,
http::{header, StatusCode},
response::Response,
};
pub async fn serve_private_file(
State(state): State<AppState>,
Path(file_id): Path<Uuid>,
user: AuthenticatedUser,
) -> Result<Response, AppError> {
let file = sqlx::query_as!(
FileRecord,
"SELECT object_key, original_name, content_type, owner_id FROM files WHERE id = $1",
file_id
)
.fetch_optional(&state.db)
.await?
.ok_or(AppError::NotFound("file not found".into()))?;
if file.owner_id != user.id {
return Err(AppError::Forbidden("not your file".into()));
}
let response = state
.bucket
.get_object(&file.object_key)
.await
.map_err(|e| AppError::Internal(format!("S3 get failed: {e}")))?;
Ok(Response::builder()
.status(StatusCode::OK)
.header(header::CONTENT_TYPE, &file.content_type)
.header(
header::CONTENT_DISPOSITION,
format!("inline; filename=\"{}\"", file.original_name),
)
.body(Body::from(response.to_vec()))
.unwrap())
}
This approach loads the entire file into memory before sending it to the client. For large files behind access control, consider generating a short-lived presigned URL (60 seconds) after the access check passes, then redirecting. This gives you per-request authorisation without proxying the bytes.
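That alternative might look like the following sketch, reusing the same ownership check. sqlx::query! is used here so the two-column select does not have to match a wider row struct:

```rust
use axum::{
    extract::{Path, State},
    response::{IntoResponse, Redirect},
};
use uuid::Uuid;

pub async fn serve_private_file_redirect(
    State(state): State<AppState>,
    Path(file_id): Path<Uuid>,
    user: AuthenticatedUser,
) -> Result<impl IntoResponse, AppError> {
    let file = sqlx::query!(
        "SELECT object_key, owner_id FROM files WHERE id = $1",
        file_id
    )
    .fetch_optional(&state.db)
    .await?
    .ok_or(AppError::NotFound("file not found".into()))?;
    if file.owner_id != user.id {
        return Err(AppError::Forbidden("not your file".into()));
    }
    // 60 seconds: long enough for the browser to follow the redirect,
    // short enough that a leaked URL is of little use.
    let url = state
        .bucket
        .presign_get(&file.object_key, 60, None)
        .await
        .map_err(|e| AppError::Internal(format!("presign failed: {e}")))?;
    Ok(Redirect::temporary(&url))
}
```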
Image thumbnails
For image-heavy applications (galleries, user avatars), serve resized thumbnails instead of full-size originals. Generate thumbnails at upload time and store them as separate S3 objects.
use image::imageops::FilterType;
use std::io::Cursor;
fn generate_thumbnail(data: &[u8], max_dimension: u32) -> Result<Vec<u8>, image::ImageError> {
let img = image::load_from_memory(data)?;
let thumb = img.resize(max_dimension, max_dimension, FilterType::Lanczos3);
let mut buf = Vec::new();
thumb.write_to(&mut Cursor::new(&mut buf), image::ImageFormat::WebP)?;
Ok(buf)
}
Add the image crate to your dependencies:
[dependencies]
image = { version = "0.25", default-features = false, features = ["webp", "jpeg", "png"] }
Store the thumbnail alongside the original:
let thumb_data = generate_thumbnail(&data, 300)?;
let thumb_key = format!("uploads/thumb_{}.webp", file_id);
state
.bucket
.put_object_with_content_type(&thumb_key, &thumb_data, "image/webp")
.await?;
WebP produces smaller files than JPEG at equivalent quality. Disable default features on the image crate and enable only the formats you need, as the full feature set pulls in decoders you won’t use.
For applications where thumbnail generation is slow or needs to handle many formats, move the processing to a background job via Restate and update the database record when the thumbnail is ready.
Deleting files
When a user deletes a record that has an associated file, delete the S3 object as well:
pub async fn delete_file(
State(state): State<AppState>,
Path(file_id): Path<Uuid>,
) -> Result<impl IntoResponse, AppError> {
let file = sqlx::query_as!(
FileRecord,
"SELECT object_key FROM files WHERE id = $1",
file_id
)
.fetch_optional(&state.db)
.await?
.ok_or(AppError::NotFound("file not found".into()))?;
state
.bucket
.delete_object(&file.object_key)
.await
.map_err(|e| AppError::Internal(format!("S3 delete failed: {e}")))?;
sqlx::query!("DELETE FROM files WHERE id = $1", file_id)
.execute(&state.db)
.await?;
Ok(Redirect::to("/files"))
}
Delete the S3 object before the database record. If the S3 deletion fails, the database record remains and you can retry. If you delete the database record first and the S3 deletion fails, you have an orphaned object with no reference to it.
Database schema for file records
A minimal files table to track uploaded objects:
CREATE TABLE files (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
object_key TEXT NOT NULL,
original_name TEXT NOT NULL,
content_type TEXT NOT NULL,
size_bytes BIGINT NOT NULL,
owner_id UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_files_owner ON files(owner_id);
The object_key is the S3 path. The original_name is what the user uploaded. Keep both: the key for S3 operations, the name for display and download headers.
Production providers
The S3 API is a de facto standard. The Region::Custom configuration in rust-s3 means any compliant provider works without code changes.
Hetzner Object Storage is a good default for European deployments. EUR 4.99/month includes 1 TB of storage and 1 TB of egress. Three EU regions: Falkenstein (fsn1), Nuremberg (nbg1), and Helsinki (hel1). Endpoints follow the pattern https://{region}.your-objectstorage.com. Generate access keys from the Hetzner Cloud Console.
Other S3-compatible providers worth considering:
- Cloudflare R2: No egress fees. Good for globally distributed read-heavy workloads.
- Backblaze B2: Cheap storage at $6/TB/month. Free egress to Cloudflare via the Bandwidth Alliance.
- AWS S3: The original. More expensive, but the widest feature set and the most mature tooling ecosystem.
- DigitalOcean Spaces: Simple pricing, CDN included. $5/month for 250 GB.
Switching providers means changing four environment variables. No code changes required.
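For example, pointing the application at Hetzner's Falkenstein region is just an .env change (the key values are placeholders; the variable names match the Config struct used elsewhere in this chapter):

```
S3_ENDPOINT=https://fsn1.your-objectstorage.com
S3_REGION=fsn1
S3_ACCESS_KEY=<hetzner-access-key>
S3_SECRET_KEY=<hetzner-secret-key>
```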
Gotchas
Set Content-Type on upload. If you upload without setting the content type, S3 defaults to application/octet-stream. Browsers then download the file instead of displaying it inline, even for images and PDFs. Always pass the content type from the upload form to put_object_with_content_type.
Generate unique object keys. Never use the original filename as the S3 key. Users upload files named document.pdf constantly. Use a UUID or similar unique identifier. This also prevents path traversal attacks where a crafted filename like ../../etc/passwd could cause problems.
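A stdlib-only sketch of key generation (the id parameter stands in for the UUID you would generate per upload; the uploads/ prefix is illustrative):

```rust
use std::path::Path;

// Sketch: derive a collision-free S3 key from a server-generated id while
// keeping only a sanitised extension from the user-supplied filename.
fn object_key(id: &str, original_name: &str) -> String {
    // Path::extension looks only at the final path component, so directory
    // traversal sequences in the name cannot leak into the key.
    let ext = Path::new(original_name)
        .extension()
        .and_then(|e| e.to_str())
        .filter(|e| e.chars().all(|c| c.is_ascii_alphanumeric()))
        .map(|e| e.to_ascii_lowercase());
    match ext {
        Some(ext) => format!("uploads/{id}.{ext}"),
        None => format!("uploads/{id}"),
    }
}
```

The original filename still goes into the database for display and download headers; only the key is server-generated.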
Handle the 2 MB default body limit. Axum rejects request bodies larger than 2 MB by default. If your upload handler returns a 413 Payload Too Large error, you forgot to raise the limit with DefaultBodyLimit::max(). Apply the higher limit only to upload routes.
Clean up orphaned objects. Presigned URL uploads can be abandoned partway through. Failed server-side uploads might write to S3 but crash before recording the database entry. Set an S3 lifecycle rule to expire unconfirmed objects after 24 hours, or run a periodic cleanup job that compares S3 contents against database records.
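On AWS-compatible providers, the lifecycle rule is a small JSON document applied with aws s3api put-bucket-lifecycle-configuration or the provider's console. The prefix below is an assumption about how you key unconfirmed uploads, and lifecycle support varies by provider:

```json
{
  "Rules": [
    {
      "ID": "expire-unconfirmed-uploads",
      "Status": "Enabled",
      "Filter": { "Prefix": "pending/" },
      "Expiration": { "Days": 1 }
    }
  ]
}
```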
Delete S3 objects before database records. If the S3 delete fails, the database record survives and you can retry. The reverse order leaves orphaned objects you can’t find.
Watch for CORS with presigned URLs. When using direct browser uploads, the S3 endpoint must return appropriate CORS headers. Configure CORS on the bucket to allow PUT requests from your application’s origin. RustFS and most S3 providers support bucket-level CORS configuration.
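A minimal bucket CORS document for direct browser uploads, in the shape aws s3api put-bucket-cors accepts (the origin is a placeholder; check your provider's documentation for its exact configuration mechanism):

```json
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://app.example.com"],
      "AllowedMethods": ["PUT"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3600
    }
  ]
}
```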
Don’t store files in the database. It’s tempting to store small files as BYTEA columns in PostgreSQL. Resist this. It bloats your database, makes backups slower, and prevents you from using CDN or S3 features like presigned URLs. Object storage exists for a reason.
Email
Transactional email is the backbone of account management: registration confirmations, password resets, order receipts, notification digests. lettre is the standard Rust crate for composing and sending email via SMTP. It provides a message builder, SMTP transport with TLS, connection pooling, and async support via Tokio.
This section covers configuring lettre for SMTP, composing emails with Maud templates, sending asynchronously without blocking request handlers, and testing with MailCrab in development.
Dependencies
[dependencies]
lettre = { version = "0.11", features = ["tokio1", "tokio1-rustls", "aws-lc-rs"] }
tokio = { version = "1", features = ["full"] }
maud = { version = "0.26", features = ["axum"] }
The tokio1 feature enables AsyncSmtpTransport. tokio1-rustls provides TLS via rustls, consistent with the rest of the stack (reqwest uses rustls by default). The aws-lc-rs feature selects the crypto provider that rustls requires.
lettre’s default features include the synchronous SMTP transport, the message builder, hostname detection, and connection pooling. Adding the three features above layers async Tokio support on top.
SMTP configuration
Configure the SMTP transport from environment variables so the same code works in development (MailCrab) and production (your SMTP provider).
use lettre::{
transport::smtp::authentication::Credentials,
AsyncSmtpTransport, Tokio1Executor,
};
pub fn build_mailer() -> AsyncSmtpTransport<Tokio1Executor> {
let host = env_var("SMTP_HOST");
let username = std::env::var("SMTP_USERNAME").ok();
let password = std::env::var("SMTP_PASSWORD").ok();
let mut builder = if env_var_or("SMTP_TLS", "true") == "true" {
AsyncSmtpTransport::<Tokio1Executor>::relay(&host)
.expect("valid SMTP relay hostname")
} else {
AsyncSmtpTransport::<Tokio1Executor>::builder_dangerous(&host)
};
if let (Some(user), Some(pass)) = (username, password) {
builder = builder.credentials(Credentials::new(user, pass));
}
if let Ok(port) = std::env::var("SMTP_PORT") {
builder = builder.port(port.parse().expect("SMTP_PORT must be a number"));
}
builder.build()
}
fn env_var(name: &str) -> String {
std::env::var(name).unwrap_or_else(|_| panic!("{name} must be set"))
}
fn env_var_or(name: &str, default: &str) -> String {
std::env::var(name).unwrap_or_else(|_| default.to_string())
}
builder_dangerous builds a transport with no TLS at all. Use it only for local development with MailCrab, where there is no TLS. In production, relay() establishes an implicit TLS connection (port 465) and validates the server certificate.
Add the mailer to your application state:
use lettre::{AsyncSmtpTransport, Tokio1Executor};
#[derive(Clone)]
pub struct AppState {
pub db: sqlx::PgPool,
pub mailer: AsyncSmtpTransport<Tokio1Executor>,
}
AsyncSmtpTransport implements Clone and manages a connection pool internally, so sharing it through Axum state is safe and efficient.
Environment variables
For local development with MailCrab:
SMTP_HOST=localhost
SMTP_PORT=1025
SMTP_TLS=false
For production with an SMTP provider:
SMTP_HOST=smtp.example.com
SMTP_PORT=465
SMTP_TLS=true
SMTP_USERNAME=apikey
SMTP_PASSWORD=<your-smtp-api-key>
Most transactional email providers (Postmark, Resend, Amazon SES, Mailgun) expose an SMTP interface. Use their SMTP credentials here. No provider-specific SDK is needed.
Sender address
Define a default sender address in configuration rather than hardcoding it in every email:
use lettre::message::Mailbox;
pub fn default_sender() -> Mailbox {
let address = env_var("EMAIL_FROM");
address.parse().expect("EMAIL_FROM must be a valid email address")
}
EMAIL_FROM="MyApp <noreply@example.com>"
Composing messages
lettre’s Message::builder() provides a fluent API for constructing RFC-compliant email messages.
Plain text
use lettre::Message;
use lettre::message::header::ContentType;
let email = Message::builder()
.from(default_sender())
.to("user@example.com".parse().unwrap())
.subject("Your password has been changed")
.header(ContentType::TEXT_PLAIN)
.body("Your password was changed successfully. If you did not make this change, contact support immediately.".to_string())
.expect("valid email message");
HTML with plain text fallback
Every HTML email should include a plain text alternative. Some email clients only render plain text, and spam filters penalise HTML-only messages.
use lettre::message::MultiPart;
let email = Message::builder()
.from(default_sender())
.to("user@example.com".parse().unwrap())
.subject("Confirm your email address")
.multipart(MultiPart::alternative_plain_html(
"Visit this link to confirm: https://example.com/confirm?token=abc123".to_string(),
"<p>Click <a href=\"https://example.com/confirm?token=abc123\">here</a> to confirm your email address.</p>".to_string(),
))
.expect("valid email message");
MultiPart::alternative_plain_html creates a multipart/alternative body with both variants. The email client picks whichever it prefers (typically HTML if available).
Attachments
use lettre::message::{Attachment, MultiPart};
use lettre::message::header::ContentType;
let pdf_bytes = std::fs::read("invoice.pdf").expect("read invoice");
let attachment = Attachment::new("invoice.pdf".to_string())
.body(pdf_bytes, ContentType::parse("application/pdf").unwrap());
let email = Message::builder()
.from(default_sender())
.to("customer@example.com".parse().unwrap())
.subject("Your invoice")
.multipart(
MultiPart::mixed()
.multipart(MultiPart::alternative_plain_html(
"Your invoice is attached.".to_string(),
"<p>Your invoice is attached.</p>".to_string(),
))
.singlepart(attachment),
)
.expect("valid email message");
MultiPart::mixed() combines the message body with attachments. Nest a MultiPart::alternative_plain_html inside it for the text content, then append attachments with .singlepart().
Email templates with Maud
Maud is already in the stack for HTML rendering. Use it for email templates too. This keeps all HTML generation in one system, with compile-time checking and the same component patterns as your page templates.
Email HTML is not web HTML. Email clients ignore external stylesheets, strip <style> tags inconsistently, and have limited CSS support. Inline styles on every element are the only reliable approach.
A base email layout
use maud::{html, Markup, PreEscaped, DOCTYPE};
pub fn email_layout(title: &str, content: Markup) -> Markup {
html! {
(DOCTYPE)
html lang="en" {
head {
meta charset="utf-8";
meta name="viewport" content="width=device-width, initial-scale=1.0";
title { (title) }
}
body style="margin: 0; padding: 0; background-color: #f4f4f5; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;" {
table role="presentation" width="100%" cellpadding="0" cellspacing="0" style="background-color: #f4f4f5;" {
tr {
td align="center" style="padding: 40px 20px;" {
table role="presentation" width="600" cellpadding="0" cellspacing="0" style="background-color: #ffffff; border-radius: 8px; overflow: hidden;" {
tr {
td style="padding: 32px 40px 24px; background-color: #18181b;" {
h1 style="margin: 0; color: #ffffff; font-size: 20px; font-weight: 600;" {
"MyApp"
}
}
}
tr {
td style="padding: 32px 40px;" {
(content)
}
}
tr {
td style="padding: 24px 40px; border-top: 1px solid #e4e4e7;" {
p style="margin: 0; color: #71717a; font-size: 13px; line-height: 1.5;" {
"You received this email because you have an account at MyApp. "
"If you did not expect this, you can ignore it."
}
}
}
}
}
}
}
}
}
}
}
Table-based layout is intentional. Email clients (particularly Outlook) have poor support for modern CSS layout. Tables are the only layout method that renders consistently across Gmail, Outlook, Apple Mail, and others. role="presentation" tells screen readers these tables are structural, not data.
A transactional email template
A password reset email, using the layout above:
pub fn password_reset_email(reset_url: &str, expires_in_hours: u32) -> (String, String) {
let html_body = email_layout("Reset your password", html! {
h2 style="margin: 0 0 16px; color: #18181b; font-size: 22px; font-weight: 600;" {
"Reset your password"
}
p style="margin: 0 0 16px; color: #3f3f46; font-size: 15px; line-height: 1.6;" {
"We received a request to reset the password for your account. "
"Click the button below to choose a new password."
}
table role="presentation" cellpadding="0" cellspacing="0" style="margin: 24px 0;" {
tr {
td style="background-color: #18181b; border-radius: 6px;" {
a href=(reset_url) style="display: inline-block; padding: 12px 32px; color: #ffffff; font-size: 15px; font-weight: 600; text-decoration: none;" {
"Reset password"
}
}
}
}
p style="margin: 0 0 8px; color: #71717a; font-size: 13px; line-height: 1.5;" {
"This link expires in " (expires_in_hours) " hours. If you did not request a password reset, ignore this email."
}
p style="margin: 0; color: #71717a; font-size: 13px; line-height: 1.5; word-break: break-all;" {
"If the button does not work, copy and paste this URL into your browser: "
(reset_url)
}
});
let plain_body = format!(
"Reset your password\n\n\
We received a request to reset the password for your account.\n\n\
Visit this link to choose a new password:\n{reset_url}\n\n\
This link expires in {expires_in_hours} hours. \
If you did not request a password reset, ignore this email."
);
(html_body.into_string(), plain_body)
}
The function returns both HTML and plain text bodies. The caller passes them to MultiPart::alternative_plain_html. Always include the raw URL in the HTML body for clients that strip links or where the button fails to render. The plain text version covers email clients that don’t render HTML.
Sending a templated email
use lettre::{AsyncTransport, Message};
use lettre::message::MultiPart;
pub async fn send_password_reset(
mailer: &AsyncSmtpTransport<Tokio1Executor>,
to: &str,
reset_url: &str,
) -> Result<(), lettre::transport::smtp::Error> {
let (html_body, plain_body) = password_reset_email(reset_url, 24);
let email = Message::builder()
.from(default_sender())
.to(to.parse().unwrap())
.subject("Reset your password")
.multipart(MultiPart::alternative_plain_html(plain_body, html_body))
.expect("valid email message");
mailer.send(email).await?;
Ok(())
}
Async sending patterns
Email delivery takes time. An SMTP handshake, TLS negotiation, and message transfer can take 500ms to several seconds, depending on the provider. Awaiting this inline in a request handler adds that latency directly to the user’s response time.
Fire-and-forget with tokio::spawn
For emails where delivery latency should not block the response (confirmations, notifications, receipts), spawn the send as a background task:
use axum::{extract::State, response::Redirect};
use lettre::AsyncTransport;
pub async fn handle_password_reset_request(
State(state): State<AppState>,
form: axum::extract::Form<PasswordResetForm>,
) -> Result<Redirect, AppError> {
let user = sqlx::query_as!(
User,
"SELECT id, email FROM users WHERE email = $1",
form.email
)
.fetch_optional(&state.db)
.await?;
if let Some(user) = user {
let token = generate_reset_token(&state.db, user.id).await?;
let reset_url = format!("https://example.com/reset?token={token}");
let mailer = state.mailer.clone();
tokio::spawn(async move {
if let Err(e) = send_password_reset(&mailer, &user.email, &reset_url).await {
tracing::error!(error = ?e, email = %user.email, "failed to send password reset email");
}
});
}
Ok(Redirect::to("/reset-sent"))
}
tokio::spawn moves the email send to a separate Tokio task. The handler returns the redirect immediately. If the send fails, it logs the error but does not surface it to the user.
This pattern is appropriate for most transactional email. The trade-off: if the application crashes between spawning the task and the send completing, the email is lost. For the vast majority of transactional emails (confirmations, notifications, receipts), this is acceptable. The user can re-trigger the action.
Durable delivery with Restate
For emails that must be delivered (invoice delivery, compliance notifications, onboarding sequences), a tokio::spawn that disappears on crash is not sufficient. Use Restate to make the send durable. Restate persists the operation, retries on failure, and survives process restarts. This is the same pattern described in the background jobs section: trigger a Restate workflow from the handler and let the workflow handle the email send with guaranteed delivery.
MailCrab for local development
MailCrab is an email testing server that accepts all SMTP traffic and displays it in a web interface. It replaces the need for a real SMTP provider during development. Point your application’s SMTP configuration at MailCrab and every email your application sends appears in the web UI, where you can inspect the HTML rendering, headers, and plain text.
MailCrab runs as a Docker container alongside your other backing services. Add it to your project’s Docker Compose file:
services:
mailcrab:
image: marlonb/mailcrab:latest
ports:
- "1025:1025"
- "1080:1080"
Port 1025 is the SMTP server. Port 1080 is the web UI. Open http://localhost:1080 in your browser to see captured emails in real time.
MailCrab accepts all mail regardless of sender, recipient, or credentials. No accounts or configuration are needed. It stores messages in memory only, so restarting the container clears all captured email. This is a feature, not a limitation, for a development tool.
Gotchas
Inline styles in email HTML. External stylesheets and <style> blocks are stripped or ignored by many email clients. Gmail honours <style> only in some contexts and strips it in others; Outlook uses the Word rendering engine. Put styles directly on elements with style="". This is tedious but necessary.
Always include a plain text body. HTML-only emails are more likely to be flagged as spam. Some corporate email clients render plain text only. MultiPart::alternative_plain_html makes this straightforward.
Don’t send email synchronously in handlers. An SMTP send can take several seconds. If you await it in the handler, the user waits for the email to be sent before seeing a response. Use tokio::spawn for fire-and-forget, or Restate for durable delivery.
Validate email addresses before sending. "user@example.com".parse::<lettre::Address>() validates the address format. Catch invalid addresses at form validation time, not at send time, to give users clear error messages.
Test with MailCrab, not your production provider. Sending test emails through a real provider risks hitting rate limits, landing on blocklists, and sending unintended email to real addresses. MailCrab catches everything locally.
Watch for From-address restrictions. Most SMTP providers restrict which From addresses you can use. You typically need to verify the domain or specific address in the provider’s dashboard before sending from it. Emails with unverified From addresses are silently dropped or rejected.
Connection pooling is automatic. AsyncSmtpTransport pools connections internally. Do not create a new transport per request. Build one at startup and share it through application state, the same pattern as reqwest::Client and sqlx::PgPool.
Configuration and Secrets
Application configuration follows the twelve-factor methodology: store config in environment variables. Database connection strings, SMTP credentials, S3 keys, listen addresses, and feature flags all come from the environment. The same code runs in development and production. Only the source of the variables changes.
This section covers loading .env files in development with dotenvy, parsing environment variables into a typed Config struct at startup, protecting secrets with the secrecy crate, and managing secrets in production with Docker Compose and Terraform.
Dependencies
The app-config crate handles all environment variable parsing. It depends on dotenvy for .env file loading and secrecy for wrapping sensitive values.
[package]
name = "app-config"
edition.workspace = true
version.workspace = true
[lints]
workspace = true
[dependencies]
dotenvy.workspace = true
secrecy = "0.10"
Add secrecy to the workspace dependency table:
[workspace.dependencies]
secrecy = "0.10"
Then use secrecy.workspace = true in the config crate.
Loading .env files with dotenvy
dotenvy loads environment variables from a .env file at the project root. It is a maintained fork of the original dotenv crate, which was flagged as unmaintained in RUSTSEC-2021-0141.
dotenvy::dotenv() reads the .env file and calls std::env::set_var for each entry. Existing environment variables are not overridden, so production values injected by the deployment platform always take precedence over a stale .env file.
Rust 2024 edition safety
In Rust 2024 edition (the edition this project targets), std::env::set_var is unsafe. The underlying setenv libc call is not thread-safe: calling it while other threads read environment variables is a data race. Rust’s internal mutex only protects std::env calls, not libc-level getenv calls from other libraries.
The fix is to call dotenvy::dotenv() before the Tokio runtime starts, while the process is still single-threaded. Separate the synchronous entry point from the async runtime:
use secrecy::ExposeSecret;
use tracing_subscriber::EnvFilter;
fn main() {
dotenvy::dotenv().ok();
run();
}
#[tokio::main]
async fn run() {
tracing_subscriber::fmt()
.with_env_filter(EnvFilter::from_default_env())
.init();
let config = app_config::load();
let db = app_db::connect(config.database_url.expose_secret()).await;
let listener = tokio::net::TcpListener::bind(&config.listen_addr)
.await
.expect("failed to bind listener");
tracing::info!("listening on {}", config.listen_addr);
let app = app_web::router(db);
axum::serve(listener, app).await.expect("server error");
}
dotenv().ok() silently ignores a missing .env file. In production, no .env file exists and all variables come from the deployment platform. In development, the .env file is present and its values are loaded before any threads are spawned.
The Config struct
Parse all environment variables into a single typed struct at startup. If a required variable is missing or malformed, the application panics immediately with a clear error message. Failing fast at startup is better than discovering a missing variable minutes later when a handler first needs it.
use secrecy::{ExposeSecret, SecretString};
pub struct Config {
pub listen_addr: String,
pub database_url: SecretString,
pub smtp_host: String,
pub smtp_port: u16,
pub smtp_tls: bool,
pub smtp_username: Option<String>,
pub smtp_password: Option<SecretString>,
pub email_from: String,
pub s3_endpoint: String,
pub s3_region: String,
pub s3_bucket: String,
pub s3_access_key: SecretString,
pub s3_secret_key: SecretString,
}
pub fn load() -> Config {
Config {
listen_addr: format!(
"{}:{}",
optional("HOST", "127.0.0.1"),
optional("PORT", "3000"),
),
database_url: required_secret("DATABASE_URL"),
smtp_host: required("SMTP_HOST"),
smtp_port: parse("SMTP_PORT", 1025),
smtp_tls: optional("SMTP_TLS", "false") == "true",
smtp_username: std::env::var("SMTP_USERNAME").ok(),
smtp_password: optional_secret("SMTP_PASSWORD"),
email_from: required("EMAIL_FROM"),
s3_endpoint: required("S3_ENDPOINT"),
s3_region: optional("S3_REGION", "us-east-1"),
s3_bucket: required("S3_BUCKET"),
s3_access_key: required_secret("S3_ACCESS_KEY"),
s3_secret_key: required_secret("S3_SECRET_KEY"),
}
}
fn required(name: &str) -> String {
std::env::var(name).unwrap_or_else(|_| panic!("{name} must be set"))
}
fn required_secret(name: &str) -> SecretString {
SecretString::from(required(name))
}
fn optional_secret(name: &str) -> Option<SecretString> {
std::env::var(name).ok().map(SecretString::from)
}
fn optional(name: &str, default: &str) -> String {
std::env::var(name).unwrap_or_else(|_| default.to_string())
}
fn parse<T: std::str::FromStr>(name: &str, default: T) -> T
where
T::Err: std::fmt::Debug,
{
match std::env::var(name) {
Ok(val) => val
.parse()
.unwrap_or_else(|_| panic!("{name} must be a valid {}", std::any::type_name::<T>())),
Err(_) => default,
}
}
Five helper functions cover every case: required panics on missing variables, required_secret wraps the value in SecretString, optional_secret returns Option<SecretString> for credentials that are only present in some environments, optional provides a default, and parse handles non-string types. These helpers include the variable name in the panic message, which std::env::var does not do on its own.
Sharing config through application state
Pass the Config into Axum’s application state alongside the database pool and other shared resources:
use app_config::Config;
use std::sync::Arc;
#[derive(Clone)]
pub struct AppState {
pub config: Arc<Config>,
pub db: sqlx::PgPool,
}
Wrap Config in Arc. AppState must be Clone, and the Config above does not derive it; even if it did, cloning an Arc copies a pointer rather than duplicating every field (including the secret values) on each request. Handlers access configuration through the state extractor:
use axum::extract::State;
async fn some_handler(State(state): State<AppState>) {
let from = &state.config.email_from;
}
Optional configuration with Option<T>
Use Option<T> for configuration that is only present in some environments. SMTP credentials are a common example: MailCrab in development accepts unauthenticated connections, while production SMTP providers require credentials.
pub smtp_username: Option<String>,
pub smtp_password: Option<SecretString>,
Read optional variables with .ok(), which converts Err(VarError::NotPresent) to None:
smtp_username: std::env::var("SMTP_USERNAME").ok(),
smtp_password: std::env::var("SMTP_PASSWORD").ok().map(SecretString::from),
This pattern works for any feature that is conditionally enabled: a Sentry DSN, a Redis URL for caching, external API keys for optional integrations. The handler checks if let Some(ref password) = config.smtp_password and adapts its behaviour.
Protecting secrets with secrecy
The secrecy crate wraps sensitive values in SecretBox<T>, which provides two protections:
- Memory zeroing. When a SecretBox is dropped, its contents are overwritten with zeros before deallocation. This prevents secrets from lingering in freed memory where a crash dump or memory scan could recover them.
- Debug redaction. The Debug implementation prints SecretBox<str>([REDACTED]) instead of the actual value. If a Config struct ends up in a panic message, log line, or error report, secrets are not exposed.
Access the underlying value explicitly with expose_secret():
use secrecy::ExposeSecret;
let pool = sqlx::PgPool::connect(config.database_url.expose_secret()).await?;
The explicit .expose_secret() call makes every point where a secret is used visible and auditable. A grep for expose_secret shows exactly where secrets flow in the codebase.
What to wrap
Wrap values that would cause damage if leaked: database connection strings (which contain passwords), API keys, SMTP passwords, S3 secret keys, session signing keys. Do not wrap non-sensitive values like hostnames, ports, or log levels.
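The Debug-redaction half of this protection is easy to picture with a stdlib-only sketch (secrecy also zeroes memory on drop, which this toy type does not):

```rust
use std::fmt;

// Toy illustration of Debug redaction: the wrapper never prints its contents,
// so accidental {:?} output in logs or panics cannot leak the value.
struct Redacted(String);

impl fmt::Debug for Redacted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

impl Redacted {
    // The only way to read the value, mirroring expose_secret().
    fn expose(&self) -> &str {
        &self.0
    }
}
```

As with expose_secret(), the explicit method call makes every read of the secret greppable.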
.env.example
Commit a .env.example file to version control that documents every variable the application needs. New developers copy it to .env and fill in real values. Any time a new variable is added to the Config struct, add a corresponding entry to .env.example.
HOST=127.0.0.1
PORT=3000
DATABASE_URL=postgres://postgres:password@localhost:5432/myapp_dev
SMTP_HOST=localhost
SMTP_PORT=1025
SMTP_TLS=false
EMAIL_FROM="MyApp <noreply@example.com>"
S3_ENDPOINT=http://localhost:9000
S3_REGION=us-east-1
S3_BUCKET=myapp
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
Add .env to .gitignore to prevent committing real secrets. The .env.example file is safe to commit because it contains only placeholder values and local development defaults.
.env
!.env.example
Secrets in production
In production, environment variables come from the deployment platform, not a .env file. The app_config::load() function does not care where the variables originate: it calls std::env::var(), which reads whatever is in the process environment.
Docker Compose
The simplest production pattern is an env_file directive in Docker Compose that points to a file on the host server:
services:
app:
image: myapp:latest
env_file:
- .env.production
ports:
- "3000:3000"
The .env.production file lives on the server, not in the repository. It is created during provisioning and contains real credentials. Alternatively, set variables directly in the environment block:
services:
app:
image: myapp:latest
environment:
DATABASE_URL: postgres://user:${DB_PASSWORD}@db:5432/myapp
SMTP_HOST: smtp.example.com
SMTP_TLS: "true"
Variables referenced with ${} syntax are interpolated from the host shell environment, which allows Terraform or CI/CD to inject them without writing credentials into the Compose file.
When deploying to a Hetzner VPS with Terraform, provision the .env.production file on the server during infrastructure setup:
resource "null_resource" "deploy_env" {
provisioner "file" {
content = templatefile("env.production.tftpl", {
database_url = var.database_url
smtp_host = var.smtp_host
smtp_password = var.smtp_password
s3_access_key = var.s3_access_key
s3_secret_key = var.s3_secret_key
})
destination = "/opt/myapp/.env.production"
}
}
The template file (env.production.tftpl) contains variable placeholders that Terraform fills in:
DATABASE_URL=${database_url}
SMTP_HOST=${smtp_host}
SMTP_PASSWORD=${smtp_password}
S3_ACCESS_KEY=${s3_access_key}
S3_SECRET_KEY=${s3_secret_key}
Terraform variables themselves come from terraform.tfvars (git-ignored) or TF_VAR_* environment variables set in CI/CD. The chain is: CI/CD secrets store → Terraform variables → provisioned file on server → Docker Compose env_file → process environment → app_config::load().
GitHub Actions
For CI/CD pipelines, store secrets in GitHub Actions and pass them to deployment scripts:
jobs:
deploy:
steps:
- name: Deploy
env:
DATABASE_URL: ${{ secrets.DATABASE_URL }}
SMTP_PASSWORD: ${{ secrets.SMTP_PASSWORD }}
run: ./deploy.sh
GitHub encrypts secrets at rest and masks them in log output. They are available only to workflows running on the repository.
Gotchas
std::env::var does not include the variable name in its error. Err(VarError::NotPresent) does not say which variable was missing. Always use a wrapper like the required() helper above, which includes the name in the panic message.
dotenvy does not override existing variables. If DATABASE_URL is already set in the shell, the .env file value is ignored. This is correct behaviour: production values set by the platform should not be overridden by a .env file. Use dotenvy::dotenv_override() if you need the opposite (rarely useful).
Parse errors are confusing without context. "abc".parse::<u16>().unwrap() panics with a bare ParseIntError. The parse() helper above includes the variable name in the error, making startup failures easy to diagnose.
Secrets in panic messages. If you unwrap a String containing a database password and the operation panics, the password appears in the panic output. Wrapping secrets in SecretString prevents this. The Debug output shows [REDACTED] instead of the value.
Compile-time environment variables for SQLx. SQLx’s query! macro reads DATABASE_URL at compile time to check queries against the database schema. This uses dotenvy internally (SQLx depends on it). The compile-time .env file must therefore point at a running database, unless you commit prepared query metadata with SQLx’s offline mode (cargo sqlx prepare), in which case the macros check against the cached .sqlx files instead.
Observability
Production applications need three categories of telemetry: logs (discrete events), traces (request flows across spans of work), and metrics (numeric measurements over time). The Rust ecosystem handles all three through the tracing crate for instrumentation and OpenTelemetry for export to a self-hosted Grafana stack.
The web server section introduces tracing basics: initialising a subscriber, adding TraceLayer, and controlling log levels with RUST_LOG. The error handling section covers logging errors with tracing::error! and #[instrument]. This section builds on that foundation, covering production subscriber configuration, OpenTelemetry export, metrics collection, and the self-hosted observability stack that receives it all.
Dependencies
tracing and tracing-subscriber are already workspace dependencies from the web server setup. Add the OpenTelemetry crates and the metrics stack:
[workspace.dependencies]
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json", "registry"] }
opentelemetry = "0.31"
opentelemetry_sdk = { version = "0.31", features = ["rt-tokio"] }
opentelemetry-otlp = { version = "0.31", features = ["grpc-tonic"] }
tracing-opentelemetry = "0.32"
opentelemetry-appender-tracing = "0.31"
metrics = "0.24"
metrics-exporter-prometheus = "0.18"
The grpc-tonic feature on opentelemetry-otlp enables gRPC transport via Tonic. As of 0.31 it is not a default feature, so it must be enabled explicitly; the default transport is HTTP/protobuf (the http-proto feature). Either works; gRPC is the more established choice for OTLP.
Add json and registry features to tracing-subscriber if they are not already present. json enables structured JSON log output and registry provides the composable multi-layer subscriber.
Structured logging with tracing
tracing goes beyond traditional logging. Where log::info!("processing user {}", user_id) produces a flat string, tracing attaches structured key-value fields to both events and spans.
use tracing::{info, warn, info_span};
info!(user_id = 42, action = "login", "user authenticated");
let user_id = 42;
info!(user_id, "user authenticated");
info!(?some_struct, "debug format");
info!(%some_value, "display format");
Spans represent a unit of work with a duration. Events occur within spans. When you nest spans, child spans carry their parent’s context, building a tree that traces the full path of a request through your application.
let span = info_span!("process_order", order_id = 1234);
let _guard = span.enter();
info!("validating payment");
info!("updating inventory");
The #[instrument] attribute macro (covered in error handling) is the most common way to create spans. It wraps a function in a span named after the function, recording arguments as fields:
#[tracing::instrument(skip(pool))]
async fn create_order(pool: &PgPool, user_id: i64, items: &[Item]) -> Result<Order, AppError> {
info!("creating order");
}
This structured data is what makes observability work. Log aggregation systems like Loki can filter and group by field values. Trace backends like Tempo use spans to reconstruct request flows. None of this works with unstructured string logs.
Production subscriber configuration
The simple tracing_subscriber::fmt().init() from the web server section works for development. Production needs a multi-layer subscriber that sends telemetry to both stdout and OpenTelemetry.
tracing-subscriber’s architecture is built on composable layers stacked on a Registry:
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};
pub fn init_telemetry() {
let env_filter = EnvFilter::try_from_default_env()
.unwrap_or_else(|_| EnvFilter::new("info,tower_http=debug"));
let fmt_layer = tracing_subscriber::fmt::layer()
.json()
.with_target(true)
.with_thread_ids(false)
.with_file(true)
.with_line_number(true);
tracing_subscriber::registry()
.with(env_filter)
.with(fmt_layer)
.init();
}
Each layer handles one concern. The EnvFilter controls what gets processed (respects RUST_LOG). The fmt::layer() handles output formatting. Calling .json() switches from human-readable to structured JSON output, which log aggregation tools parse far more reliably than plain text.
When OpenTelemetry is configured (next section), two additional layers join the stack: one that exports spans as traces, and one that exports events as log records.
OpenTelemetry export
OpenTelemetry (OTel) is a vendor-neutral standard for telemetry data. The Rust application exports traces and logs via the OTLP protocol to an OpenTelemetry Collector, which forwards them to the storage backends (Tempo for traces, Loki for logs).
Setting up the trace exporter
tracing-opentelemetry provides a layer that converts tracing spans into OpenTelemetry spans and exports them via OTLP:
use opentelemetry::trace::TracerProvider;
use opentelemetry_otlp::SpanExporter;
use opentelemetry_sdk::{
trace::SdkTracerProvider,
Resource,
};
use tracing_opentelemetry::OpenTelemetryLayer;
fn init_tracer_provider() -> SdkTracerProvider {
let exporter = SpanExporter::builder()
.with_tonic()
.build()
.expect("failed to create OTLP span exporter");
SdkTracerProvider::builder()
.with_batch_exporter(exporter)
.with_resource(
Resource::builder()
.with_service_name("my-app")
.build(),
)
.build()
}
with_tonic() sends spans over gRPC to the collector. The exporter reads OTEL_EXPORTER_OTLP_ENDPOINT for the collector address (defaults to http://localhost:4317). with_batch_exporter batches spans before sending, reducing network overhead.
The Resource identifies this application in the observability stack. service.name is the minimum; it appears as the service label in Grafana.
Setting up the logs exporter
opentelemetry-appender-tracing bridges tracing events to OpenTelemetry log records. This sends your application’s log output through the same OTLP pipeline as traces, and automatically attaches the active trace ID and span ID to each log record. That attachment is what enables clicking from a log line in Loki directly to the corresponding trace in Tempo.
use opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge;
use opentelemetry_otlp::LogExporter;
use opentelemetry_sdk::logs::SdkLoggerProvider;
fn init_logger_provider() -> SdkLoggerProvider {
let exporter = LogExporter::builder()
.with_tonic()
.build()
.expect("failed to create OTLP log exporter");
SdkLoggerProvider::builder()
.with_batch_exporter(exporter)
.build()
}
Combining everything
Wire up the full subscriber with all four layers:
use opentelemetry::trace::TracerProvider;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};
pub struct TelemetryGuard {
tracer_provider: SdkTracerProvider,
logger_provider: SdkLoggerProvider,
}
impl Drop for TelemetryGuard {
fn drop(&mut self) {
if let Err(e) = self.tracer_provider.shutdown() {
eprintln!("failed to shutdown tracer provider: {e}");
}
if let Err(e) = self.logger_provider.shutdown() {
eprintln!("failed to shutdown logger provider: {e}");
}
}
}
pub fn init_telemetry() -> TelemetryGuard {
let tracer_provider = init_tracer_provider();
let logger_provider = init_logger_provider();
let env_filter = EnvFilter::try_from_default_env()
.unwrap_or_else(|_| EnvFilter::new("info,tower_http=debug"));
let fmt_layer = tracing_subscriber::fmt::layer()
.json()
.with_target(true);
let tracer = tracer_provider.tracer("my-app");
let otel_trace_layer = tracing_opentelemetry::layer().with_tracer(tracer);
let otel_logs_layer = OpenTelemetryTracingBridge::new(&logger_provider);
tracing_subscriber::registry()
.with(env_filter)
.with(fmt_layer)
.with(otel_trace_layer)
.with(otel_logs_layer)
.init();
TelemetryGuard {
tracer_provider,
logger_provider,
}
}
The TelemetryGuard ensures providers flush pending telemetry on shutdown. Hold it in main:
#[tokio::main]
async fn main() {
let _telemetry = init_telemetry();
}
When _telemetry drops (at the end of main or on graceful shutdown), the providers flush any buffered spans and log records to the collector. Without this, the last few seconds of telemetry are lost on shutdown.
Development vs production
In development, you may not want to run the full observability stack. Make OpenTelemetry export conditional:
pub fn init_telemetry() -> Option<TelemetryGuard> {
let env_filter = EnvFilter::try_from_default_env()
.unwrap_or_else(|_| EnvFilter::new("info,tower_http=debug"));
let otel_enabled = std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT").is_ok();
if otel_enabled {
let tracer_provider = init_tracer_provider();
let logger_provider = init_logger_provider();
let tracer = tracer_provider.tracer("my-app");
let otel_trace_layer = tracing_opentelemetry::layer().with_tracer(tracer);
let otel_logs_layer = OpenTelemetryTracingBridge::new(&logger_provider);
tracing_subscriber::registry()
.with(env_filter)
.with(tracing_subscriber::fmt::layer().json().with_target(true))
.with(otel_trace_layer)
.with(otel_logs_layer)
.init();
Some(TelemetryGuard { tracer_provider, logger_provider })
} else {
tracing_subscriber::registry()
.with(env_filter)
.with(tracing_subscriber::fmt::layer().with_target(true))
.init();
None
}
}
When OTEL_EXPORTER_OTLP_ENDPOINT is not set, the subscriber falls back to human-readable stdout logging with no OTel export. Set the variable to enable export:
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
The observability stack
The receiving end is a self-hosted Grafana stack: Loki for logs, Tempo for traces, Prometheus for metrics, and Grafana for dashboards. An OpenTelemetry Collector sits between the application and these backends, receiving OTLP data and routing it to the correct destination.
┌──────────┐    OTLP     ┌───────────────┐
│ Rust App │────────────▶│     OTel      │───▶ Loki (logs)
│          │  gRPC/HTTP  │   Collector   │───▶ Tempo (traces)
└──────────┘             └───────────────┘───▶ Prometheus (metrics)
                                                      │
                                                      ▼
                                            Grafana (dashboards)
Local development
For development, Grafana provides an all-in-one Docker image that bundles the OTel Collector, Grafana, Loki, Tempo, and Prometheus in a single container. Add it to your Docker Compose file alongside the other backing services:
services:
lgtm:
image: grafana/otel-lgtm:latest
ports:
- "3000:3000"
- "4317:4317"
- "4318:4318"
Start the container and set the OTLP endpoint in your .env:
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
Open http://localhost:3000 to access Grafana. All data sources (Loki, Tempo, Prometheus) are pre-configured. No additional setup is needed.
Production
In production, run each component as a separate container: Grafana, Loki, Tempo, Prometheus, and the OpenTelemetry Collector. The Grafana documentation covers configuration for each component. The OTel Collector documentation covers the YAML pipeline configuration for routing OTLP data to each backend.
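A Docker Compose sketch of that layout might look like the following (image tags, volumes, networks, and the per-service configuration files are omitted and would need filling in for a real deployment):

```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
  loki:
    image: grafana/loki:latest
  tempo:
    image: grafana/tempo:latest
  prometheus:
    image: prom/prometheus:latest
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--web.enable-remote-write-receiver"
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
```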
The key configuration is the collector’s pipeline, which routes each signal type to its destination:
receivers:
otlp:
protocols:
grpc:
endpoint: "0.0.0.0:4317"
http:
endpoint: "0.0.0.0:4318"
exporters:
otlphttp/loki:
endpoint: http://loki:3100/otlp
otlp/tempo:
endpoint: tempo:4317
tls:
insecure: true
prometheusremotewrite:
endpoint: http://prometheus:9090/api/v1/write
service:
pipelines:
logs:
receivers: [otlp]
exporters: [otlphttp/loki]
traces:
receivers: [otlp]
exporters: [otlp/tempo]
metrics:
receivers: [otlp]
exporters: [prometheusremotewrite]
Loki 3.x accepts OTLP logs natively at its /otlp endpoint. Tempo accepts OTLP traces over gRPC on port 4317. Prometheus 3.x accepts remote writes with the --web.enable-remote-write-receiver flag. Use the otel/opentelemetry-collector-contrib Docker image, which includes all the required exporters.
Metrics with Prometheus
The metrics crate provides a facade for recording application metrics. metrics-exporter-prometheus exposes those metrics in Prometheus text format at a /metrics endpoint that Prometheus scrapes.
Setup
use metrics_exporter_prometheus::{PrometheusBuilder, PrometheusHandle};
pub fn init_metrics() -> PrometheusHandle {
PrometheusBuilder::new()
.install_recorder()
.expect("failed to install Prometheus recorder")
}
install_recorder() registers the Prometheus exporter as the global metrics recorder and returns a handle. Use the handle to render the metrics output in an Axum handler:
use axum::{routing::get, Router, Extension};
async fn metrics_handler(
Extension(handle): Extension<PrometheusHandle>,
) -> String {
handle.render()
}
let metrics_handle = init_metrics();
let app = Router::new()
.route("/", get(index))
.route("/metrics", get(metrics_handler))
.layer(Extension(metrics_handle));
Prometheus scrapes http://your-app:3000/metrics on a configured interval (typically 15 seconds) and stores the time series data.
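On the Prometheus side, a minimal scrape job in prometheus.yml might look like this (the job name and target host are illustrative):

```yaml
scrape_configs:
  - job_name: my-app
    scrape_interval: 15s
    metrics_path: /metrics   # the default; shown for clarity
    static_configs:
      - targets: ["your-app:3000"]
```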
Recording metrics
The metrics crate provides three metric types:
use metrics::{counter, gauge, histogram};
counter!("http_requests_total", "method" => "GET", "path" => "/users").increment(1);
gauge!("active_connections").set(42.0);
histogram!("http_request_duration_seconds").record(0.035);
Labels (the "method" => "GET" pairs) create separate time series for each label combination. Use labels to break down metrics by dimensions you need to filter or group by in dashboards.
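Conceptually, each distinct combination of label values keys its own time series, so cardinality grows multiplicatively. A std-only sketch of the arithmetic (the function is illustrative, not part of any metrics API):

```rust
use std::collections::HashMap;

// Sketch: each unique set of label values for a metric becomes its
// own time series in Prometheus. Counting distinct combinations
// shows how series multiply across label dimensions.
fn series_count(methods: &[&str], paths: &[&str]) -> usize {
    let mut series: HashMap<(String, String), u64> = HashMap::new();
    for &method in methods {
        for &path in paths {
            *series.entry((method.to_string(), path.to_string())).or_insert(0) += 1;
        }
    }
    series.len()
}
```

Two methods and three route templates produce six series; put a raw user ID in the labels and the count becomes unbounded.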
Request metrics middleware
Record HTTP request count and duration for every request with Tower middleware:
use axum::{extract::{MatchedPath, Request}, middleware, response::Response};
use std::time::Instant;
async fn track_metrics(
    matched_path: Option<MatchedPath>,
    request: Request,
    next: middleware::Next,
) -> Response {
let method = request.method().to_string();
let path = matched_path
.map(|p| p.as_str().to_string())
.unwrap_or_else(|| "unknown".to_string());
let start = Instant::now();
let response = next.run(request).await;
let duration = start.elapsed().as_secs_f64();
let status = response.status().as_u16().to_string();
counter!("http_requests_total", "method" => method.clone(), "path" => path.clone(), "status" => status).increment(1);
histogram!("http_request_duration_seconds", "method" => method, "path" => path).record(duration);
response
}
let app = Router::new()
.route("/", get(index))
.route_layer(middleware::from_fn(track_metrics))
.route("/metrics", get(metrics_handler));
Use MatchedPath rather than the raw URI for the path label. Raw URIs with path parameters (e.g., /users/42, /users/73) create unbounded label cardinality, which bloats Prometheus storage. MatchedPath returns the route template (/users/{id}), keeping cardinality bounded.
The /metrics endpoint is outside the route_layer scope so it does not record metrics about metrics scraping.
What to measure
Start with RED metrics for HTTP services:
- Rate: http_requests_total (counter, by method/path/status)
- Errors: filter http_requests_total where status is 5xx
- Duration: http_request_duration_seconds (histogram, by method/path)
Add application-specific metrics as you identify monitoring needs:
- db_query_duration_seconds (histogram) for slow query detection
- background_jobs_total (counter, by job type and outcome)
- active_sessions (gauge) for capacity planning
- email_send_total (counter, by outcome: success/failure)
Resist adding metrics for everything up front. Start with RED, observe your application in production, and add metrics when you find yourself asking a question that the existing telemetry cannot answer.
Correlating requests across services
The value of the observability stack multiplies when signals are connected. A log line links to its trace. A metric spike links to the traces that caused it.
Trace IDs in logs
The opentelemetry-appender-tracing bridge automatically attaches the active trace ID and span ID to every log record exported via OTLP. When these logs land in Loki, Grafana can extract the trace ID and create a clickable link to the corresponding trace in Tempo.
Configure this in Grafana’s Loki data source settings by adding a derived field that matches the trace ID and links to Tempo. In the grafana/otel-lgtm development image, this correlation is pre-configured.
Propagation across services
If your application calls other services (via reqwest or similar), propagate the trace context so spans across services form a single trace. Inject the W3C traceparent header into outgoing requests:
use opentelemetry::global;
use opentelemetry::propagation::Injector;
use reqwest::header::HeaderMap;
struct HeaderInjector<'a>(&'a mut HeaderMap);
impl<'a> Injector for HeaderInjector<'a> {
fn set(&mut self, key: &str, value: String) {
if let Ok(header_name) = key.parse() {
if let Ok(header_value) = value.parse() {
self.0.insert(header_name, header_value);
}
}
}
}
pub async fn call_other_service(client: &reqwest::Client, url: &str) -> reqwest::Result<String> {
let mut headers = HeaderMap::new();
global::get_text_map_propagator(|propagator| {
propagator.inject(&mut HeaderInjector(&mut headers));
});
client
.get(url)
.headers(headers)
.send()
.await?
.text()
.await
}
Register the W3C propagator at startup, before initialising the subscriber:
use opentelemetry::global;
use opentelemetry_sdk::propagation::TraceContextPropagator;
global::set_text_map_propagator(TraceContextPropagator::new());
For a single-service application (which most projects in this guide will be), propagation is not needed. Add it when you split into multiple services and want end-to-end traces.
Environment variables
The OpenTelemetry SDK reads standard environment variables. Set these in production:
# Collector endpoint
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
# Service identification
OTEL_SERVICE_NAME=my-app
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production
# Log level filtering
RUST_LOG=info,tower_http=debug
# Prometheus scrape (configure in prometheus.yml, not as an env var)
OTEL_SERVICE_NAME overrides the service.name set in code. OTEL_RESOURCE_ATTRIBUTES adds arbitrary key-value pairs to the OTel resource (useful for environment, version, or region labels).
Gotchas
Shutdown order matters. The TelemetryGuard must outlive the Axum server. If the guard drops before in-flight requests complete, those requests’ spans and logs are lost. Structure main so the guard is declared before the server starts and drops after shutdown completes.
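This works because Rust drops local variables in reverse declaration order, so a guard declared first outlives everything declared after it. A std-only sketch demonstrating the ordering (the Guard type and DROP_ORDER recorder are illustrative):

```rust
use std::cell::RefCell;

thread_local! {
    // Records the order in which guards are dropped.
    static DROP_ORDER: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

struct Guard(&'static str);

impl Drop for Guard {
    fn drop(&mut self) {
        DROP_ORDER.with(|o| o.borrow_mut().push(self.0));
    }
}

fn run() {
    let _telemetry = Guard("telemetry"); // declared first, dropped last
    let _server = Guard("server");       // declared second, dropped first
}
```

After run() returns, the recorded drop order is server first, then telemetry: the telemetry guard is still alive while the server winds down, which is exactly the property main needs.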
EnvFilter is applied once. The filter determines which spans and events reach any layer. If you filter at info level, the OTel layers will not receive debug spans either. For production, info is typically appropriate. Avoid trace or debug in production unless you are actively debugging, as the volume of OTel data grows rapidly.
gRPC vs HTTP for OTLP. The grpc-tonic feature adds Tonic as a dependency, which pulls in prost, hyper, and h2. If binary size or compile time is a concern, use the http-proto feature instead, which sends OTLP over HTTP/protobuf using reqwest. Both are functionally equivalent.
Prometheus label cardinality. Every unique combination of label values creates a separate time series in Prometheus. High-cardinality labels (user IDs, request IDs, raw URLs) cause storage bloat and query slowness. Keep labels to bounded sets: HTTP methods, route templates, status code classes, job types.
The metrics crate is a facade. Like log for logging, metrics defines the recording API but not the backend. If you forget to call PrometheusBuilder::new().install_recorder(), metric calls silently do nothing. Initialise the recorder early in startup.
OpenTelemetry crate versions move together. The opentelemetry, opentelemetry_sdk, and opentelemetry-otlp crates share a version number and must match. tracing-opentelemetry is one minor version ahead (0.32.x works with opentelemetry 0.31.x). Check compatibility when upgrading.
Operations
Testing
Rust’s type system catches many errors at compile time, but it does not verify business logic, database queries, or HTML output. Tests fill that gap. This section covers unit tests for domain logic and Maud components, integration tests against real databases and services using testcontainers, and testing Axum handlers with the axum_test crate.
Unit tests for domain logic
Standard #[test] functions in the same file as the code under test. Rust’s built-in test framework needs no external dependencies for pure logic.
pub fn slugify(input: &str) -> String {
input
.to_lowercase()
.chars()
.map(|c| if c.is_alphanumeric() { c } else { '-' })
.collect::<String>()
.trim_matches('-')
.to_string()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn slugify_converts_spaces_and_lowercases() {
assert_eq!(slugify("Hello World"), "hello-world");
}
#[test]
fn slugify_strips_special_characters() {
assert_eq!(slugify("Rust & Axum!"), "rust---axum");
}
}
Place unit tests in a #[cfg(test)] mod tests block at the bottom of the module file. They compile only during cargo test and have access to private items in the parent module.
For async code, use #[tokio::test]:
#[cfg(test)]
mod tests {
use super::*;
#[tokio::test]
async fn sends_welcome_email() {
let result = send_welcome("alice@example.com").await;
assert!(result.is_ok());
}
}
Testing Maud components
Maud components are Rust functions that return Markup. Test them by rendering to a string and asserting on the HTML output.
Simple assertions with into_string()
For small components with predictable output, compare the rendered string directly:
use maud::{html, Markup};
pub fn status_badge(active: bool) -> Markup {
html! {
@if active {
span.badge.bg-success { "Active" }
} @else {
span.badge.bg-secondary { "Inactive" }
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn badge_renders_active() {
let html = status_badge(true).into_string();
assert_eq!(html, r#"<span class="badge bg-success">Active</span>"#);
}
#[test]
fn badge_renders_inactive() {
let html = status_badge(false).into_string();
assert_eq!(html, r#"<span class="badge bg-secondary">Inactive</span>"#);
}
}
into_string() consumes the Markup and returns the inner String. Maud produces deterministic output with no extra whitespace, so exact string comparison works reliably for small components.
Structured assertions with scraper
For larger components or pages where exact string matching is fragile, use the scraper crate to parse the HTML and query it with CSS selectors.
[dev-dependencies]
scraper = "0.23"
Define a few test helpers that the rest of the test suite can reuse:
use scraper::{Html, Selector};
pub fn parse(markup: maud::Markup) -> Html {
Html::parse_fragment(&markup.into_string())
}
pub fn select_one<'a>(doc: &'a Html, css: &str) -> scraper::ElementRef<'a> {
let sel = Selector::parse(css).unwrap();
doc.select(&sel)
.next()
.unwrap_or_else(|| panic!("no element matching '{css}'"))
}
pub fn select_count(doc: &Html, css: &str) -> usize {
let sel = Selector::parse(css).unwrap();
doc.select(&sel).count()
}
pub fn text_of(doc: &Html, css: &str) -> String {
select_one(doc, css).text().collect()
}
Use the helpers to make structural assertions:
use crate::support::html::{parse, select_one, select_count, text_of};
#[test]
fn user_card_renders_name_and_link() {
let doc = parse(user_card(42, "Alice"));
assert_eq!(text_of(&doc, ".user-card h2"), "Alice");
let link = select_one(&doc, "a.profile-link");
assert_eq!(link.value().attr("href"), Some("/users/42"));
}
#[test]
fn user_list_renders_all_users() {
let users = vec![
User { id: 1, name: "Alice".into() },
User { id: 2, name: "Bob".into() },
];
let doc = parse(user_list(&users));
assert_eq!(select_count(&doc, ".user-card"), 2);
}
This approach survives attribute reordering, whitespace changes, and additions to unrelated parts of the template. It breaks only when the structural contract changes, which is what you want.
Verifying HTML escaping
Maud auto-escapes interpolated content. Verify this in tests whenever a component renders user-provided input:
#[test]
fn escapes_user_input() {
let doc = parse(user_card(1, "<script>alert('xss')</script>"));
let html = select_one(&doc, ".user-card h2").inner_html();
assert!(!html.contains("<script>"));
assert!(html.contains("&lt;script&gt;"));
}
Integration tests with testcontainers
testcontainers starts real Docker containers for PostgreSQL, Redis, Restate, or any other service your application depends on. Each test gets isolated infrastructure that is torn down automatically when the test finishes.
[dev-dependencies]
testcontainers = "0.27"
testcontainers-modules = { version = "0.15", features = ["postgres", "redis"] }
The testcontainers-modules crate re-exports the core testcontainers crate, so you can import from either.
PostgreSQL
use testcontainers_modules::postgres::Postgres;
use testcontainers_modules::testcontainers::runners::AsyncRunner;
use sqlx::PgPool;
async fn start_postgres() -> (testcontainers::ContainerAsync<Postgres>, PgPool) {
let container = Postgres::default()
.with_db_name("test_db")
.with_user("test")
.with_password("test")
.start()
.await
.expect("failed to start postgres container");
let host = container.get_host().await.unwrap();
let port = container.get_host_port_ipv4(5432).await.unwrap();
let url = format!("postgres://test:test@{host}:{port}/test_db");
let pool = PgPool::connect(&url).await.expect("failed to connect to test database");
sqlx::migrate!().run(&pool).await.expect("failed to run migrations");
(container, pool)
}
The container variable must stay in scope for the duration of the test. When it is dropped, testcontainers stops and removes the Docker container.
sqlx::migrate!() reads migrations from the migrations/ directory (relative to the crate’s Cargo.toml) and applies them to the fresh database. Every test starts with a clean schema.
Redis
use testcontainers_modules::redis::{Redis, REDIS_PORT};
use testcontainers_modules::testcontainers::runners::AsyncRunner;
async fn start_redis() -> (testcontainers::ContainerAsync<Redis>, String) {
let container = Redis::default()
.start()
.await
.expect("failed to start redis container");
let host = container.get_host().await.unwrap();
let port = container.get_host_port_ipv4(REDIS_PORT).await.unwrap();
let url = format!("redis://{host}:{port}");
(container, url)
}
Restate
Restate provides its own testcontainers integration through the restate-sdk-testcontainers crate:
[dev-dependencies]
restate-sdk = "0.9"
restate-sdk-testcontainers = "0.9"
use restate_sdk_testcontainers::TestEnvironment;
#[tokio::test]
async fn test_workflow() {
let env = TestEnvironment::new()
.start(my_service_endpoint)
.await;
let ingress_url = env.ingress_url();
}
TestEnvironment starts a Restate container, binds your service endpoint to a random port, health-checks it, and registers the endpoint with Restate’s admin API. The ingress_url() gives you the URL for sending requests through Restate.
For services without a dedicated testcontainers module, use GenericImage:
use testcontainers::{GenericImage, runners::AsyncRunner};
use testcontainers::core::{IntoContainerPort, WaitFor};
let container = GenericImage::new("my-service", "1.0")
.with_exposed_port(8080.tcp())
.with_wait_for(WaitFor::message_on_stdout("ready"))
.start()
.await
.unwrap();
Testing Axum handlers
The axum_test crate provides an ergonomic test client for Axum applications. It handles request construction, response parsing, and cookie persistence.
[dev-dependencies]
axum-test = "0.22"
Basic setup
Create a function that builds the application Router with its state. This same function serves both production and tests:
#[derive(Clone)]
pub struct AppState {
pub db: PgPool,
}
pub fn app(state: AppState) -> Router {
Router::new()
.route("/users", get(list_users).post(create_user))
.route("/users/{id}", get(get_user))
.with_state(state)
}
In tests, build the state with a testcontainers-backed pool and pass it to TestServer:
use axum_test::TestServer;
#[tokio::test]
async fn test_list_users() {
let (_pg, pool) = start_postgres().await;
let state = AppState { db: pool };
let server = TestServer::new(app(state)).unwrap();
let response = server.get("/users").await;
response.assert_status_ok();
response.assert_header("content-type", "text/html; charset=utf-8");
}
The _pg binding keeps the PostgreSQL container alive for the duration of the test. Dropping it stops the container.
Testing HTML responses
Combine axum_test with the scraper helpers to assert on HTML structure:
#[tokio::test]
async fn test_user_list_renders_users() {
let (_pg, pool) = start_postgres().await;
seed_users(&pool).await;
let server = TestServer::new(app(AppState { db: pool })).unwrap();
let response = server.get("/users").await;
response.assert_status_ok();
let doc = Html::parse_document(&response.text());
assert_eq!(select_count(&doc, ".user-card"), 3);
assert_eq!(text_of(&doc, ".user-card:first-child h2"), "Alice");
}
Submit forms with .form() and verify the response:
#[tokio::test]
async fn test_create_user() {
let (_pg, pool) = start_postgres().await;
let server = TestServer::new(app(AppState { db: pool.clone() })).unwrap();
let response = server
.post("/users")
.form(&[("name", "Charlie"), ("email", "charlie@example.com")])
.await;
response.assert_status_ok();
let user = sqlx::query!("SELECT name FROM users WHERE email = 'charlie@example.com'")
.fetch_one(&pool)
.await
.unwrap();
assert_eq!(user.name, "Charlie");
}
Testing htmx requests
htmx adds an HX-Request: true header to every request. Add it in tests to exercise the fragment-returning code path:
#[tokio::test]
async fn test_htmx_search_returns_fragment() {
let (_pg, pool) = start_postgres().await;
seed_users(&pool).await;
let server = TestServer::new(app(AppState { db: pool })).unwrap();
let response = server
.get("/users/search")
.add_query_param("q", "alice")
.add_header("HX-Request", "true")
.await;
response.assert_status_ok();
let html = response.text();
assert!(!html.contains("<!DOCTYPE"));
assert!(html.contains("Alice"));
}
Cookie and session handling
Enable cookie persistence on the test server for testing authenticated flows:
#[tokio::test]
async fn test_authenticated_page() {
let (_pg, pool) = start_postgres().await;
seed_users(&pool).await;
let server = TestServer::builder()
.save_cookies()
.build(app(AppState { db: pool }))
.unwrap();
server
.post("/login")
.form(&[("email", "alice@example.com"), ("password", "correct-password")])
.await
.assert_status_ok();
let response = server.get("/dashboard").await;
response.assert_status_ok();
response.assert_text_contains("Welcome, Alice");
}
save_cookies() tells TestServer to persist cookies across requests, simulating a browser session.
Test fixtures and data setup
Seed test data with plain async functions that run after migrations:
async fn seed_users(pool: &PgPool) {
sqlx::query!(
"INSERT INTO users (name, email) VALUES ($1, $2), ($3, $4), ($5, $6)",
"Alice", "alice@example.com",
"Bob", "bob@example.com",
"Charlie", "charlie@example.com"
)
.execute(pool)
.await
.unwrap();
}
Call seed_users(&pool).await at the start of any test that needs data. Each test gets its own database container with its own schema, so fixtures do not conflict between tests.
For larger fixture sets, organise seed functions by domain:
pub async fn seed_users(pool: &PgPool) { }
pub async fn seed_projects(pool: &PgPool) { }
pub async fn seed_users_with_projects(pool: &PgPool) {
seed_users(pool).await;
seed_projects(pool).await;
}
Shared test setup
Extract container startup and state construction into a helper to avoid repeating the boilerplate in every test:
pub mod fixtures;
pub mod html;
use testcontainers::ContainerAsync;
use testcontainers_modules::postgres::Postgres;
pub struct TestContext {
pub server: axum_test::TestServer,
pub pool: PgPool,
_pg: ContainerAsync<Postgres>,
}
impl TestContext {
pub async fn new() -> Self {
let (pg, pool) = start_postgres().await;
let state = AppState { db: pool.clone() };
let server = axum_test::TestServer::new(app(state)).unwrap();
Self { server, pool, _pg: pg }
}
}
Tests become concise:
#[tokio::test]
async fn test_user_creation() {
let ctx = TestContext::new().await;
seed_users(&ctx.pool).await;
let response = ctx.server.get("/users").await;
response.assert_status_ok();
}
Running tests in CI
Container requirements
Testcontainers requires Docker. In GitHub Actions, Docker is available on the default ubuntu-latest runner. No special setup is needed.
name: Test
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- name: Run tests
run: cargo test --workspace
rust-cache caches compiled dependencies between runs. Without it, every CI run rebuilds the dependency tree from scratch.
SQLx offline mode
SQLx compile-time query checking requires a live database during compilation. In CI, compilation happens before Docker containers start, so the build would fail without offline mode.
Generate the query cache locally (with your database running):
cargo sqlx prepare --workspace
Commit the resulting .sqlx/ directory to version control. When DATABASE_URL is absent at compile time, SQLx uses the cached metadata instead of connecting to a database.
In CI, verify the cache is current:
- name: Check SQLx cache
run: cargo sqlx prepare --workspace --check
This fails if any query has changed without regenerating the cache.
Parallel test execution
cargo test runs tests in parallel by default; the number of concurrent test threads defaults to the number of logical CPUs. Testcontainers starts a separate Docker container per test, so parallelism works without shared-state conflicts. Each test has its own database.
If container startup becomes a bottleneck (each PostgreSQL container takes 2-3 seconds), reduce parallelism:
cargo test -- --test-threads=4
Or use the reusable-containers feature of testcontainers to share containers across tests in the same binary. This trades isolation for speed.
End-to-end browser testing
For testing the fully rendered application in a real browser, the most mature option is Playwright running via Node.js. Playwright does not care what language your server is written in. Start the Rust application, point Playwright at http://localhost:PORT, and write tests against the rendered HTML.
Rust-native alternatives exist. fantoccini is an async WebDriver client that works with Chrome, Firefox, and Safari via their respective drivers. thirtyfour provides similar WebDriver-based testing with a richer query and waiting API. Both are mature and actively maintained.
For most teams, Playwright provides the best E2E testing experience: auto-waiting, tracing, video recording, and multi-browser support. The trade-off is a Node.js dependency in your test toolchain. If that dependency is unacceptable, fantoccini or thirtyfour are solid pure-Rust options.
E2E tests are a complement to the unit and integration tests described above, not a replacement. Run them separately from cargo test, typically as a dedicated CI step after the application is built and running.
Gotchas
Container startup time. Each PostgreSQL container takes 2-3 seconds to start. For a test suite with dozens of integration tests, this adds up. Consider grouping related assertions into fewer, larger tests rather than many small ones, or using the reusable-containers feature to share containers.
Docker must be running. Testcontainers communicates with the Docker daemon. Tests fail immediately if Docker is not available. In CI, this is handled by the default runner. Locally, ensure Docker Desktop or the Docker daemon is running before running cargo test.
Port mapping. Testcontainers maps container ports to random host ports. Always use container.get_host_port_ipv4(5432) to get the mapped port. Never hardcode port numbers.
Container variable lifetime. The container handle (ContainerAsync<Postgres>) must remain in scope for the test’s duration. If it is dropped, the container stops. A common mistake is discarding the handle:
// Wrong: `container` is dropped at the end of the block, which stops
// the database while `pool` is still in use.
let pool = {
    let (container, pool) = start_postgres().await;
    pool
};

// Correct: bind the handle so it lives for the whole test.
let (_container, pool) = start_postgres().await;
SQLx compile-time checking vs test databases. The query! macros check against DATABASE_URL at compile time. This is your development database, not the testcontainers database (which does not exist yet at compile time). The offline cache (.sqlx/) bridges this gap in CI. Locally, keep your development PostgreSQL running for compilation and let testcontainers handle test databases at runtime.
Continuous Integration and Delivery
Every commit should pass formatting checks, linting, compile-time query validation, and tests before it reaches the main branch. This section covers GitHub Actions workflows for Rust projects, building Docker images with cached dependency layers, and pushing images to a container registry.
GitHub Actions workflow structure
The workflows below use GitHub Actions. Forgejo Actions uses a compatible YAML format, so these workflows transfer to a self-hosted Forgejo instance with minor adjustments (runner labels, service hostnames). See the Forgejo section below for specifics. Keeping workflow files in .github/workflows/ works in both systems.
Split CI into parallel jobs for fast feedback. Formatting and linting fail quickly and cheaply. Tests take longer and need service containers. Running them in separate jobs means a formatting mistake fails in seconds, not after a full build.
name: CI
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
CARGO_TERM_COLOR: always
jobs:
fmt:
name: Formatting
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
with:
components: rustfmt
- run: cargo fmt --all -- --check
clippy:
name: Clippy
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
with:
components: clippy
- uses: Swatinem/rust-cache@v2
- run: cargo clippy --all-targets --all-features -- -D warnings
test:
name: Test
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- name: Run tests
run: cargo test --workspace
sqlx-check:
name: SQLx Cache
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- run: cargo install sqlx-cli --no-default-features --features postgres
- run: cargo sqlx prepare --workspace --check
Four jobs, each with a single responsibility:
- fmt checks formatting with rustfmt. No caching needed since it does not compile code.
- clippy runs the Rust linter with warnings treated as errors (-D warnings). Caching avoids recompiling dependencies on every run.
- test runs the full test suite. The testing section covers test setup in detail, including testcontainers for PostgreSQL and Redis, axum_test for handler testing, and SQLx offline mode for compile-time query checking without a live database.
- sqlx-check verifies the .sqlx/ query cache is current. This catches stale query metadata before it reaches production.
dtolnay/rust-toolchain installs the Rust toolchain. Specify the channel via the @ref syntax: @stable, @nightly, or a pinned version like @1.84.0. The components input adds tools like clippy and rustfmt.
Swatinem/rust-cache@v2 caches the ~/.cargo registry and ./target directory between runs. It builds cache keys from Cargo.toml, Cargo.lock, and the Rust compiler version, so a toolchain update or dependency change invalidates the cache automatically. Place it after the toolchain setup but before any build steps.
Without rust-cache, every CI run rebuilds the full dependency tree from scratch. For a typical Axum application with 100+ transitive dependencies, that adds several minutes to every pipeline.
The fmt job skips caching intentionally. cargo fmt --check only parses the source; it does not compile anything.
Service containers for integration tests
If your test suite needs PostgreSQL, Redis, or other services and you are not using testcontainers (which manages containers itself), GitHub Actions can start service containers:
test:
name: Test
runs-on: ubuntu-latest
services:
postgres:
image: postgres:17
env:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: test_db
ports:
- 5432:5432
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
redis:
image: redis:7
ports:
- 6379:6379
options: >-
--health-cmd "redis-cli ping"
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- name: Run tests
run: cargo test --workspace
env:
DATABASE_URL: postgres://postgres:postgres@localhost:5432/test_db
REDIS_URL: redis://localhost:6379
The services block starts Docker containers before the job’s steps run. Health checks ensure the service is ready before tests begin. When the job runs directly on the runner (not inside a container), services are accessible at localhost on the mapped port.
The testcontainers approach described in the testing section manages containers per test and does not need the services block. Both approaches work in CI. Testcontainers gives per-test isolation; service containers give a single shared instance.
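One way to keep a single code path across both approaches is to resolve connection strings from the environment with local fallbacks. A minimal sketch (the function names are ours; the fallback values mirror the env block in the workflow above):

```rust
// Resolve connection strings from the environment, falling back to the
// values the CI service containers above expose. The function names and
// fallback defaults are illustrative, not part of any library API.
fn database_url() -> String {
    std::env::var("DATABASE_URL")
        .unwrap_or_else(|_| "postgres://postgres:postgres@localhost:5432/test_db".to_string())
}

fn redis_url() -> String {
    std::env::var("REDIS_URL")
        .unwrap_or_else(|_| "redis://localhost:6379".to_string())
}

fn main() {
    println!("database: {}", database_url());
    println!("redis:    {}", redis_url());
}
```

Tests then read one configuration path regardless of whether the database comes from a service container, testcontainers, or a local Docker Compose stack.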
Building Docker images
Rust compiles to a single static binary. The build image needs the full Rust toolchain; the runtime image only needs the binary. A multi-stage Dockerfile separates these concerns.
Dockerfile with cargo-chef
cargo-chef caches dependency compilation across Docker builds. Without it, changing a single line of application code triggers a full rebuild of every dependency. With it, dependencies only rebuild when Cargo.toml or Cargo.lock change. The speedup is typically 3-5x.
The pattern uses three stages:
FROM lukemathwalker/cargo-chef:latest-rust-1 AS chef
WORKDIR /app
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release --bin myapp
FROM debian:bookworm-slim AS runtime
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
RUN groupadd -g 1001 appgroup && \
useradd -u 1001 -g appgroup -m appuser
COPY --from=builder --chown=appuser:appgroup /app/target/release/myapp /usr/local/bin/myapp
USER appuser
ENTRYPOINT ["/usr/local/bin/myapp"]
How it works:
- The planner stage scans all Cargo.toml files and Cargo.lock and produces recipe.json, a manifest of your dependency tree with workspace structure.
- The builder stage first runs cargo chef cook with the recipe. This downloads and compiles all dependencies. Docker caches this layer. As long as recipe.json has not changed (meaning your dependencies are the same), this step is a cache hit on subsequent builds. Then it copies the full source and compiles the application. Only this final compilation step runs on each code change.
- The runtime stage copies the compiled binary into a minimal Debian image. ca-certificates is included for TLS connections. The application runs as a non-root user.
The lukemathwalker/cargo-chef base image bundles cargo-chef with the official Rust image. The latest-rust-1 tag tracks the latest Rust 1.x release.
Runtime image choices
debian:bookworm-slim (~80 MB) provides a good balance of size and debuggability. It includes a shell, basic tools, and glibc. For production troubleshooting, being able to docker exec into a container and run basic commands is worth the extra megabytes.
If image size or attack surface is a hard requirement, consider gcr.io/distroless/cc-debian12 (~20 MB, no shell) or scratch (empty, requires a fully statically linked binary built with musl). For most applications, bookworm-slim is the practical choice.
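If you do choose scratch, the binary must be statically linked against musl. A sketch of what that build could look like (the musl target triple is standard; the image tags and the binary name myapp are assumptions matching the examples above):

```dockerfile
# Build against musl so the binary has no dynamic glibc dependency.
FROM rust:1-alpine AS builder
RUN apk add --no-cache musl-dev
WORKDIR /app
COPY . .
RUN cargo build --release --target x86_64-unknown-linux-musl

# scratch is empty: no shell, no libc, and no CA certificates,
# so outbound TLS needs certificates added separately.
FROM scratch
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/myapp /myapp
ENTRYPOINT ["/myapp"]
```

This sketch omits the cargo-chef stages shown earlier; in practice you would combine both patterns.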
Pushing to a container registry
After the CI jobs pass, build the Docker image and push it to a registry. The image job depends on the test jobs so that broken code never produces an image.
GitHub Container Registry
image:
name: Build and Push Image
runs-on: ubuntu-latest
needs: [fmt, clippy, test, sqlx-check]
if: github.ref == 'refs/heads/main'
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/metadata-action@v5
id: meta
with:
images: ghcr.io/${{ github.repository }}
tags: |
type=sha
type=raw,value=latest,enable={{is_default_branch}}
- uses: docker/build-push-action@v6
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
This job runs only on pushes to main (not on pull requests). The GITHUB_TOKEN is automatically available and has sufficient permissions for GHCR when the packages: write permission is set.
The metadata-action generates two tags: a short commit SHA (sha-abc1234) for traceability, and latest for the default branch. The commit SHA tag lets you trace any running container back to the exact commit that produced it.
cache-from: type=gha and cache-to: type=gha,mode=max use the GitHub Actions cache backend for Docker layer caching. Combined with cargo-chef inside the Dockerfile, this means dependency layers are cached both within Docker and across CI runs.
Self-hosted container registry
If you run a self-hosted Forgejo instance or another registry, the workflow is the same pattern with different login credentials:
- uses: docker/login-action@v3
with:
registry: ${{ secrets.REGISTRY_URL }}
username: ${{ secrets.REGISTRY_USERNAME }}
password: ${{ secrets.REGISTRY_PASSWORD }}
Replace the images value in the metadata-action to match your registry URL. The rest of the workflow is identical.
Conventional commits
Conventional Commits is a format for commit messages that makes the history machine-readable. Each message starts with a type, an optional scope, and a description:
feat(auth): add password reset flow
fix(search): handle empty query parameter
docs: update deployment instructions
refactor(db): extract connection pool setup
chore(ci): update rust-toolchain to 1.84
The common types are feat (new feature), fix (bug fix), docs, refactor, test, chore, and ci. A ! after the type or scope (e.g. feat!: or feat(api)!:) signals a breaking change.
The value of this convention is consistent, scannable history. When every commit follows the same format, it becomes trivial to see what changed, filter commits by type, or generate changelogs automatically.
Several tools automate workflows around conventional commits. git-cliff generates changelogs from commit history using configurable templates. cocogitto provides commit validation, version bumping, and changelog generation in a single Rust-native binary. Both are actively maintained and integrate with CI pipelines. Choose based on what you actually need, whether that is changelog generation, commit validation, automated version bumps, or all three.
Enforcing the format in CI is optional but straightforward. A validation step on pull requests catches non-conforming commits before they reach the main branch.
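What such a validation step checks can be sketched in a few lines. The accepted types and the ! breaking-change marker below come from this section; the function name and exact parsing rules are ours, not the full Conventional Commits specification:

```rust
// Minimal conventional-commit header check. Accepts "type: desc",
// "type(scope): desc", and an optional `!` breaking-change marker
// before the colon. A sketch, not a complete spec implementation.
fn is_conventional(message: &str) -> bool {
    const TYPES: [&str; 7] = ["feat", "fix", "docs", "refactor", "test", "chore", "ci"];

    // Split "type(scope)!: description" at the first ": ".
    let Some((header, description)) = message.split_once(": ") else {
        return false;
    };
    if description.is_empty() {
        return false;
    }

    // Drop an optional trailing `!` (breaking-change marker).
    let header = header.strip_suffix('!').unwrap_or(header);

    // Drop an optional "(scope)" suffix, keeping only the type.
    let commit_type = match header.split_once('(') {
        Some((ty, scope)) if scope.ends_with(')') && scope.len() > 1 => ty,
        Some(_) => return false,
        None => header,
    };

    TYPES.contains(&commit_type)
}

fn main() {
    assert!(is_conventional("feat(auth): add password reset flow"));
    assert!(is_conventional("feat!: remove legacy endpoint"));
    assert!(!is_conventional("updated some stuff"));
}
```

In CI this could run as a small script over the pull request's commit messages, or be replaced by an existing tool such as cocogitto.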
Self-hosted CI with Forgejo
Forgejo is a self-hosted Git forge with its own CI system, Forgejo Actions. Workflow files use YAML syntax similar to GitHub Actions and can be placed in .forgejo/workflows/ or .github/workflows/. Many GitHub Actions concepts carry over: triggers, job matrices, service containers, and step syntax.
Forgejo Actions is still marked as experimental. The main differences from GitHub Actions:
- The default runner image is minimal Debian with Node.js, not the full Ubuntu image GitHub provides. You may need to install additional build tools.
- Service container networking uses the service label as the hostname (e.g. postgres rather than localhost) since both the job and the service run on the same Docker network.
- Some GitHub Actions marketplace actions need modification to work with Forgejo.
- The Forgejo Runner must be installed separately from the Forgejo instance, ideally on a different machine.
For a project that currently uses GitHub Actions and may migrate later, keep workflows in .github/workflows/. Forgejo falls back to that directory when .forgejo/workflows/ does not exist, which means the same workflow files work in both systems with minor adjustments for runner labels and service hostnames.
Gotchas
GitHub Actions cache limit. GitHub enforces a 10 GB total cache limit per repository. Rust builds produce large target directories. If caches are evicted frequently, check that Swatinem/rust-cache is configured with save-if: ${{ github.ref == 'refs/heads/main' }} to avoid filling the cache with every PR branch.
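Applied to the workflow above, that setting is a one-line addition to the cache step:

```yaml
- uses: Swatinem/rust-cache@v2
  with:
    # Restore the cache on every run, but only save new entries from main.
    save-if: ${{ github.ref == 'refs/heads/main' }}
```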
SQLx offline mode is required in CI. SQLx’s compile-time query checking connects to a live database. During CI compilation, no database is running yet. Run cargo sqlx prepare --workspace locally and commit the .sqlx/ directory. The sqlx-check job in the workflow above catches stale metadata.
cargo-chef and workspace layout. cargo chef prepare scans all Cargo.toml files in the workspace. If your workspace structure changes (adding or removing crates), the recipe changes and the dependency cache invalidates. This is correct behaviour. What can catch you off guard: the planner stage copies the entire source tree. If you have large non-Rust files (assets, data), consider adding a .dockerignore file to exclude them from the build context.
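A starting point for that .dockerignore (the first two entries matter most; anything beyond them depends on your repository):

```
# Keep build output and VCS history out of the Docker build context.
# Without this, COPY . . ships the often multi-gigabyte target/ directory.
target/
.git/
```

Take care not to exclude .sqlx/, which the offline SQLx build needs at compile time.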
Docker BuildKit is required for cache mounts. The docker/setup-buildx-action enables BuildKit automatically in GitHub Actions. If building locally, ensure BuildKit is enabled (DOCKER_BUILDKIT=1 or Docker Desktop with BuildKit as default).
Container registry authentication in CI. The GITHUB_TOKEN has sufficient permissions for GHCR if the workflow sets permissions: packages: write. For self-hosted registries, store credentials as repository secrets, never in the workflow file.
Deployment
A single VPS running Docker Compose handles more traffic than most applications will ever see. A compiled Rust binary serving HTML fragments through Axum is fast enough that a 4 GB server comfortably sustains thousands of concurrent users. Start here. Add servers when monitoring shows you need them, not before.
This section covers the full deployment path: cross-compiling the application, packaging it as a Docker image, provisioning infrastructure with Terraform, orchestrating services with Docker Compose, and handling database backups. It starts with a single-server setup and describes how to grow into a multi-server architecture when the time comes.
The architecture progression
Three stages, from simplest to most complex:
Stage 1: Single VPS. One server runs everything: your application, PostgreSQL, Redis, Caddy, and the observability stack. This is the right starting point for most projects. It deploys in minutes and has no distributed-systems complexity.
Stage 2: Separate services. Split into two or three servers: one for your application, one for shared services (PostgreSQL, Redis, Grafana stack), and optionally one for your software forge (Forgejo). Block storage volumes hold persistent data. Tailscale connects the servers privately. This stage suits production workloads that need independent scaling or isolation between the database and the application.
Stage 3: Kubernetes. When Docker Compose on a small number of servers is genuinely constraining you, not before. The scaling strategy section covers the signals.
The Terraform configurations in this section provision stage 2 (separate stacks for volumes, services, and application). Collapsing them onto a single server for stage 1 is straightforward: put everything in one Compose file and skip the networking.
Hetzner Cloud
Hetzner Cloud offers VPS instances at a fraction of the price of AWS or DigitalOcean for equivalent specs. Check Hetzner’s pricing page for current rates.
Server lines
| Line | CPU | Use case |
| CX | Shared Intel/AMD | General workloads |
| CAX | Shared ARM (Ampere Altra) | Best price-performance ratio |
| CPX | Shared AMD EPYC | CPU-heavier workloads |
| CCX | Dedicated AMD EPYC | Databases, CPU-intensive workloads |
For a Rust web application, a CX23 or CX33 (4 vCPU / 8 GB) is a strong starting point. The CAX (ARM) line offers better price-performance, but requires ARM64 Docker images, which adds cross-compilation complexity. Stick with x86 (CX line) unless you have a reason to use ARM.
Regions
Hetzner operates data centres in Nuremberg (NBG1), Falkenstein (FSN1), Helsinki (HEL1), Ashburn (ASH), Hillsboro (HIL), and Singapore (SIN). EU regions include 20-60 TB of traffic. US and Singapore regions cost more and include less traffic. Choose the region closest to your users.
Block storage
Hetzner Volumes provide block storage that attaches to a server. Volumes persist independently of the server lifecycle, which is the entire point: you can destroy and recreate a server without losing data.
- 10 GB minimum, 10 TB maximum
- A volume can only attach to one server at a time
- Volume and server must be in the same location
- Data is stored with triple replication
Use volumes for PostgreSQL data directories, Redis persistence, and any other state that must survive a server rebuild.
Building the application
Cross-compile on your development machine (or in CI), then package the binary into a minimal Docker image. This avoids slow in-Docker Rust compilation entirely.
Cross-compilation with cargo-zigbuild
cargo-zigbuild replaces the system linker with Zig’s cross-compilation toolchain. It produces Linux binaries from macOS (or any host) without Docker, a Linux VM, or a separate cross-compilation toolchain.
Install cargo-zigbuild and Zig:
cargo install --locked cargo-zigbuild
brew install zig
Add the target and build:
rustup target add x86_64-unknown-linux-gnu
cargo zigbuild --release --target x86_64-unknown-linux-gnu
The binary lands in target/x86_64-unknown-linux-gnu/release/. It is dynamically linked against glibc, which is fine because the runtime Docker image includes glibc.
To pin a specific minimum glibc version (Debian 12 ships glibc 2.36):
cargo zigbuild --release --target x86_64-unknown-linux-gnu.2.36
The glibc version suffix ensures the binary runs on any system with that glibc version or newer. This prevents surprises where a binary compiled against a newer glibc fails on an older host.
The Dockerfile
The Docker image does not compile anything. It copies the pre-built binary into a minimal base image.
FROM gcr.io/distroless/cc-debian12:nonroot
COPY target/x86_64-unknown-linux-gnu/release/myapp /usr/local/bin/myapp
ENTRYPOINT ["/usr/local/bin/myapp"]
distroless/cc-debian12 includes glibc, libgcc, CA certificates, and timezone data. Nothing else: no shell, no package manager, no utilities. The :nonroot tag runs as a non-root user (UID 65532) instead of root.
The resulting image is roughly 30 MB. Builds take seconds because there is no compilation step.
Build and tag the image:
cargo zigbuild --release --target x86_64-unknown-linux-gnu
docker build -t git.example.com/myorg/myapp:v1.0.0 .
Pushing to the registry
This section assumes a running Forgejo instance with its built-in container registry. The registry is enabled by default.
Log in and push:
docker login git.example.com
docker push git.example.com/myorg/myapp:v1.0.0
Authenticate with a Forgejo personal access token that has package:read and package:write scopes. Create one under Settings > Applications in Forgejo.
Terraform provisions the servers, volumes, firewall rules, and networking. The hcloud provider (v1.60+) is officially maintained by Hetzner and covers the full API.
Stack separation
Split infrastructure into three Terraform stacks (separate state files):
- Volumes stack: block storage for persistent data. Deploy first, destroy last (or never).
- Services stack: VPS for PostgreSQL, Redis, observability. References volumes by data source.
- Application stack: VPS for Caddy and the application. Can be destroyed and recreated without affecting data.
This separation protects persistent data. Rebuilding the application server does not touch the database volume. Rebuilding the services server does not touch the volumes themselves.
Shared variables
Each stack needs access to the Hetzner API token and common settings. Create a terraform.tfvars file (git-ignored) in each stack directory:
hcloud_token = "your-hetzner-api-token"
location = "fsn1"
ssh_key_name = "deploy-key"
Stack 1: Volumes
terraform {
required_providers {
hcloud = {
source = "hetznercloud/hcloud"
version = "~> 1.60"
}
}
}
variable "hcloud_token" { sensitive = true }
variable "location" { default = "fsn1" }
provider "hcloud" {
token = var.hcloud_token
}
resource "hcloud_volume" "postgres" {
name = "postgres-data"
size = 50
location = var.location
format = "ext4"
delete_protection = true
labels = {
service = "postgres"
}
}
resource "hcloud_volume" "redis" {
name = "redis-data"
size = 10
location = var.location
format = "ext4"
delete_protection = true
labels = {
service = "redis"
}
}
resource "hcloud_volume" "grafana" {
name = "grafana-data"
size = 20
location = var.location
format = "ext4"
delete_protection = true
labels = {
service = "grafana"
}
}
output "postgres_volume_id" { value = hcloud_volume.postgres.id }
output "redis_volume_id" { value = hcloud_volume.redis.id }
output "grafana_volume_id" { value = hcloud_volume.grafana.id }
delete_protection = true prevents accidental terraform destroy from deleting the volumes. To remove a protected volume, you must first set delete_protection = false and apply, then destroy. This is intentional friction.
Stack 2: Services VPS
terraform {
required_providers {
hcloud = {
source = "hetznercloud/hcloud"
version = "~> 1.60"
}
}
}
variable "hcloud_token" { sensitive = true }
variable "location" { default = "fsn1" }
variable "ssh_key_name" {}
variable "postgres_volume_id" {}
variable "redis_volume_id" {}
variable "grafana_volume_id" {}
provider "hcloud" {
token = var.hcloud_token
}
data "hcloud_ssh_key" "deploy" {
name = var.ssh_key_name
}
resource "hcloud_firewall" "services" {
name = "services"
rule {
direction = "in"
protocol = "tcp"
port = "22"
source_ips = ["0.0.0.0/0", "::/0"]
}
rule {
direction = "in"
protocol = "udp"
port = "41641"
source_ips = ["0.0.0.0/0", "::/0"]
}
}
resource "hcloud_server" "services" {
name = "services-1"
server_type = "cx33"
image = "ubuntu-24.04"
location = var.location
ssh_keys = [data.hcloud_ssh_key.deploy.id]
firewall_ids = [hcloud_firewall.services.id]
user_data = file("${path.module}/cloud-init.yaml")
}
resource "hcloud_volume_attachment" "postgres" {
volume_id = var.postgres_volume_id
server_id = hcloud_server.services.id
automount = true
}
resource "hcloud_volume_attachment" "redis" {
volume_id = var.redis_volume_id
server_id = hcloud_server.services.id
automount = true
}
resource "hcloud_volume_attachment" "grafana" {
volume_id = var.grafana_volume_id
server_id = hcloud_server.services.id
automount = true
}
output "services_ip" { value = hcloud_server.services.ipv4_address }
Pass the volume IDs from stack 1 via terraform.tfvars:
hcloud_token = "your-token"
location = "fsn1"
ssh_key_name = "deploy-key"
postgres_volume_id = 12345678
redis_volume_id = 12345679
grafana_volume_id = 12345680
Stack 3: Application VPS
terraform {
required_providers {
hcloud = {
source = "hetznercloud/hcloud"
version = "~> 1.60"
}
}
}
variable "hcloud_token" { sensitive = true }
variable "location" { default = "fsn1" }
variable "ssh_key_name" {}
variable "domain" {}
provider "hcloud" {
token = var.hcloud_token
}
data "hcloud_ssh_key" "deploy" {
name = var.ssh_key_name
}
resource "hcloud_firewall" "app" {
name = "app"
rule {
direction = "in"
protocol = "tcp"
port = "80"
source_ips = ["0.0.0.0/0", "::/0"]
}
rule {
direction = "in"
protocol = "tcp"
port = "443"
source_ips = ["0.0.0.0/0", "::/0"]
}
rule {
direction = "in"
protocol = "tcp"
port = "22"
source_ips = ["0.0.0.0/0", "::/0"]
}
rule {
direction = "in"
protocol = "udp"
port = "41641"
source_ips = ["0.0.0.0/0", "::/0"]
}
}
resource "hcloud_server" "app" {
name = "app-1"
server_type = "cx23"
image = "ubuntu-24.04"
location = var.location
ssh_keys = [data.hcloud_ssh_key.deploy.id]
firewall_ids = [hcloud_firewall.app.id]
user_data = file("${path.module}/cloud-init.yaml")
}
output "app_ip" { value = hcloud_server.app.ipv4_address }
Cloud-init for server setup
Both VPS stacks use a cloud-init.yaml that installs Docker and Tailscale on first boot:
package_update: true
package_upgrade: true
packages:
- curl
- ca-certificates
runcmd:
- curl -fsSL https://get.docker.com | sh
- systemctl enable docker
- systemctl start docker
- curl -fsSL https://tailscale.com/install.sh | sh
- mkdir -p /opt/app
After Terraform provisions the server, SSH in and run tailscale up with an auth key to join the tailnet. Subsequent deployments happen over Tailscale.
Tailscale for secure networking
Tailscale builds a WireGuard-based mesh VPN between your servers. Each server gets a stable IP on the 100.x.y.z range. All traffic between servers is encrypted end-to-end. The free tier supports up to 3 users and 100 devices.
Why Tailscale
Without Tailscale, the services VPS must expose PostgreSQL (port 5432), Redis (port 6379), and the Grafana stack to the public internet, even if firewalled to specific IPs. With Tailscale, these services bind only to the Tailscale interface. They are invisible to the public internet entirely.
Tailscale also simplifies SSH access. Once your servers are on the tailnet, you can close port 22 in the Hetzner firewall and SSH over the Tailscale IP instead.
Setting up servers
On each server, install Tailscale (already done via cloud-init) and authenticate:
tailscale up --authkey tskey-auth-xxxxx --advertise-tags=tag:server
Tagged auth keys disable key expiry, so the server stays connected indefinitely. Generate auth keys in the Tailscale admin console.
Verify connectivity:
ping services-1
Tailscale’s MagicDNS assigns each machine a hostname on your tailnet. Use these hostnames in your application’s DATABASE_URL and Redis connection strings instead of public IPs.
Docker sidecar pattern
For services running inside Docker containers, use a Tailscale sidecar container. Other containers share its network stack via network_mode: service:&lt;name&gt;:
services:
tailscale:
image: tailscale/tailscale:latest
hostname: app-1
environment:
TS_AUTHKEY: ${TS_AUTHKEY}
TS_EXTRA_ARGS: --advertise-tags=tag:server
TS_STATE_DIR: /var/lib/tailscale
TS_USERSPACE: "false"
volumes:
- ts-state:/var/lib/tailscale
devices:
- /dev/net/tun:/dev/net/tun
cap_add:
- NET_ADMIN
restart: unless-stopped
app:
image: git.example.com/myorg/myapp:latest
network_mode: service:tailscale
depends_on:
- tailscale
network_mode: service:tailscale makes the app container reachable at the Tailscale IP. The Tailscale state volume (ts-state) preserves the node identity across container restarts.
The sidecar approach is most useful when the application itself needs to be reachable over Tailscale. For the simpler case where the application only connects to services over Tailscale (and the host already has Tailscale installed), the host-level Tailscale installation is sufficient and the sidecar is not needed.
Production Docker Compose
Services VPS
The services VPS runs PostgreSQL, Redis, and the observability stack. Each service stores data on a Hetzner Volume mounted to the host.
services:
postgres:
image: postgres:17-alpine
restart: unless-stopped
ports:
- "100.x.y.z:5432:5432"
volumes:
- /mnt/postgres-data/pgdata:/var/lib/postgresql/data
environment:
POSTGRES_USER: myapp
POSTGRES_PASSWORD: ${DB_PASSWORD}
POSTGRES_DB: myapp
healthcheck:
test: ["CMD-SHELL", "pg_isready -U myapp"]
interval: 10s
timeout: 5s
retries: 5
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
redis:
image: redis:7-alpine
restart: unless-stopped
ports:
- "100.x.y.z:6379:6379"
volumes:
- /mnt/redis-data:/data
command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
healthcheck:
test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
interval: 10s
timeout: 5s
retries: 5
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
Replace 100.x.y.z with the services server’s Tailscale IP. Binding to the Tailscale IP means PostgreSQL and Redis accept connections only from the tailnet, not from the public internet.
Volume mount paths (/mnt/postgres-data/, /mnt/redis-data/) correspond to where Hetzner automounts the attached volumes. Check the mount points with lsblk or df -h after Terraform provisions the server.
The observability stack (Grafana, Loki, Tempo, Prometheus, OTel Collector) runs on the same VPS. See the observability section for Docker Compose configuration of those services.
Application VPS
The application VPS runs Caddy and the application.
services:
  app:
    image: git.example.com/myorg/myapp:${TAG:-latest}
    restart: unless-stopped
    expose:
      - "3000"
    env_file:
      - .env.production
    healthcheck:
      test: ["CMD", "/usr/local/bin/myapp", "healthcheck"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  caddy:
    image: caddy:2-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data
      - caddy-config:/config
    depends_on:
      app:
        condition: service_healthy

volumes:
  caddy-data:
  caddy-config:
The application reads DATABASE_URL, REDIS_URL, and other configuration from .env.production. These connection strings use Tailscale hostnames:
DATABASE_URL=postgres://myapp:password@services-1:5432/myapp
REDIS_URL=redis://:password@services-1:6379
OTEL_EXPORTER_OTLP_ENDPOINT=http://services-1:4317
See the configuration section for the full .env.production setup and how Terraform provisions it.
Health check implementation
The distroless runtime image has no shell, so curl and wget are not available. Implement a health check subcommand in the application binary:
fn main() {
    dotenvy::dotenv().ok();
    if std::env::args().nth(1).as_deref() == Some("healthcheck") {
        match healthcheck() {
            Ok(()) => std::process::exit(0),
            Err(e) => {
                eprintln!("healthcheck failed: {e}");
                std::process::exit(1);
            }
        }
    }
    run();
}

fn healthcheck() -> Result<(), Box<dyn std::error::Error>> {
    let port = std::env::var("PORT").unwrap_or_else(|_| "3000".to_string());
    let url = format!("http://127.0.0.1:{port}/health");
    let resp = ureq::get(&url).call()?;
    if resp.status() == 200 { Ok(()) } else { Err("non-200 status".into()) }
}
Add ureq (a minimal synchronous HTTP client) to the server crate’s dependencies. The health check subcommand calls the application’s /health endpoint and exits with code 0 or 1. Docker uses the exit code to determine container health.
Caddy as reverse proxy
Caddy handles TLS termination and reverse proxying. Its defining feature is automatic HTTPS: point a domain at your server, and Caddy obtains and renews a Let’s Encrypt certificate with zero configuration.
Caddyfile
app.example.com {
    reverse_proxy app:3000
}
That is the entire configuration. Caddy obtains a TLS certificate for app.example.com, terminates HTTPS, and proxies requests to the application container on port 3000.
For multiple services behind different subdomains:
app.example.com {
    reverse_proxy app:3000
}

grafana.example.com {
    reverse_proxy services-1:3000
}
Caddy resolves services-1 via Tailscale’s MagicDNS when the Caddy container shares the host’s Tailscale network (or uses the Tailscale sidecar pattern).
Requirements
Caddy’s automatic HTTPS needs two things:
- A DNS A record pointing your domain to the server’s public IP.
- Ports 80 and 443 open and routed to Caddy (for the ACME challenge and HTTPS traffic).
Both are handled by the Terraform firewall configuration and your DNS provider. If you manage DNS through Hetzner, the hcloud provider supports DNS zone and record resources (hcloud_zone, hcloud_zone_record).
Why Caddy over Nginx
Nginx requires certbot, cron jobs, and manual renewal configuration for Let’s Encrypt certificates. Caddy does this automatically. For a reverse proxy in front of a Rust application, Caddy’s simpler configuration and automatic certificate management eliminate an entire category of operational work. Nginx’s performance advantage is irrelevant at the traffic levels where you are running one or two VPS instances.
Database deployment and backups
PostgreSQL runs in a Docker container on the services VPS, with its data directory mounted on a Hetzner Volume. The volume provides persistence and triple replication at the storage layer. Backups provide recovery from logical errors (accidental deletes, bad migrations) that replication cannot protect against.
pg_dump with cron
A daily pg_dump via cron is the simplest backup strategy and covers the majority of use cases.
Create the backup script on the services VPS:
#!/usr/bin/env bash
set -euo pipefail

BACKUP_DIR="/opt/app/backups"
RETENTION_DAYS=14
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
DUMP_FILE="${BACKUP_DIR}/myapp-${TIMESTAMP}.dump"

mkdir -p "${BACKUP_DIR}"

docker exec postgres pg_dump \
  -U myapp \
  -Fc \
  myapp > "${DUMP_FILE}"

find "${BACKUP_DIR}" -name "myapp-*.dump" -mtime +${RETENTION_DAYS} -delete
echo "Backup complete: ${DUMP_FILE} ($(du -h "${DUMP_FILE}" | cut -f1))"
-Fc produces a custom-format dump: compressed, supports selective restore of individual tables, and works with pg_restore. It is smaller and more flexible than plain SQL dumps.
Scheduling with cron
echo "0 2 * * * root /opt/app/scripts/backup-db.sh >> /var/log/db-backup.log 2>&1" \
| sudo tee /etc/cron.d/db-backup
Off-site copies
A backup on the same server as the database is not a real backup. Copy dumps to an S3-compatible storage provider. Add this to the backup script after the dump:
S3_BUCKET="s3://myapp-backups"
S3_ENDPOINT="https://fsn1.your-objectstorage.com"
aws s3 cp "${DUMP_FILE}" "${S3_BUCKET}/db/${TIMESTAMP}.dump" \
--endpoint-url "${S3_ENDPOINT}"
aws s3 ls "${S3_BUCKET}/db/" --endpoint-url "${S3_ENDPOINT}" \
| awk '{print $4}' \
| head -n -30 \
| xargs -I{} aws s3 rm "${S3_BUCKET}/db/{}" --endpoint-url "${S3_ENDPOINT}"
Install the AWS CLI on the services VPS (apt install awscli) and configure it with your S3 credentials. The AWS CLI works with any S3-compatible provider.
Restoring from backup
docker exec -i postgres pg_restore \
-U myapp \
-d myapp \
--clean \
--if-exists \
< /opt/app/backups/myapp-20260227-020000.dump
--clean --if-exists drops existing objects before restoring, which handles the common case of restoring to an existing database. Test the restore process periodically. An untested backup is not a backup.
When pg_dump is not enough
pg_dump takes a full snapshot every time. For databases larger than a few GB, consider:
- pgBackRest: supports incremental and differential backups, parallel restore, backup verification, and direct archiving to S3. It is the standard tool for serious PostgreSQL backup infrastructure.
- WAL archiving with point-in-time recovery: continuous archiving of write-ahead logs enables recovery to any point in time, not just the last backup. This requires more configuration but provides the strongest recovery guarantees.
For most applications in this guide’s scope, daily pg_dump with off-site copies is sufficient.
Deploying updates
The simplest deployment workflow: pull the new image and recreate the container.
docker compose pull app
docker compose up -d app
docker compose up -d app recreates only the app service if its image has changed. There is a brief interruption (typically 1-3 seconds) while the old container stops and the new one starts and passes its health check. For most applications, this is acceptable.
Wrap this in a deployment script:
#!/usr/bin/env bash
set -euo pipefail

TAG="${1:?Usage: deploy.sh <tag>}"
export TAG

cd /opt/app
docker compose pull app
docker compose up -d app
echo "Deployed ${TAG}"

echo "Waiting for health check..."
docker compose exec app /usr/local/bin/myapp healthcheck
echo "Healthy."
Trigger the script from CI or run it manually over SSH (via Tailscale):
ssh app-1 "cd /opt/app && ./deploy.sh v1.2.3"
Zero-downtime with Caddy
If a few seconds of downtime during deployment is not acceptable, use a blue-green approach with Caddy’s admin API. Run two instances of the application (blue and green). Deploy to the inactive one, verify it is healthy, then switch Caddy’s upstream.
Caddy exposes an admin API on localhost:2019 that can update routing without a restart. Switch the upstream from blue to green atomically:
curl -X PATCH http://localhost:2019/config/apps/http/servers/srv0/routes/0/handle/0/upstreams \
-H "Content-Type: application/json" \
-d '[{"dial": "green:3000"}]'
This is more complex to set up and maintain. Start with the simple recreate approach and add blue-green when you have a genuine need for zero-downtime deployments.
Scaling strategy
Docker Compose on a single VPS scales further than most people expect. A Rust application serving HTML is fast. PostgreSQL on a dedicated volume with proper indexes handles millions of rows. You will likely hit organisational complexity before you hit server capacity.
When to add servers
- Database contention: the application and database compete for CPU or memory on the same server. Move the database to the services VPS (stage 2).
- Backup impact: pg_dump on a busy database affects application latency. A separate services VPS isolates backup I/O.
- Independent scaling: the application needs more CPU but the database does not (or vice versa). Separate servers let you size each independently.
- Isolation requirements: security policy requires the database to be on a server with no public internet access.
When to consider Kubernetes
Stay on Docker Compose until you genuinely exhaust what a small number of VPS instances can provide. The signals that Kubernetes might be worth the operational cost:
- Multiple application instances across servers: you need horizontal scaling beyond what a single server provides, and a load balancer in front of multiple app servers.
- Auto-scaling: traffic is bursty enough that you need to add and remove capacity automatically.
- Many services with complex dependencies: once you pass 10-20 containers with interdependent deployment ordering, Docker Compose files become fragile.
- Multiple teams deploying independently: Kubernetes namespaces and RBAC provide isolation that Docker Compose does not.
If you reach this point, k3s is a lightweight Kubernetes distribution that runs on a single node or a small cluster. It installs in under a minute, uses roughly 500 MB of RAM, and provides full Kubernetes API compatibility. k3s on two or three Hetzner CX33 instances is a reasonable stepping stone before full managed Kubernetes.
Do not adopt Kubernetes because it seems like the professional choice. The operational complexity is real. For most Rust web applications, Docker Compose on one to three servers is the right answer for years.
Gotchas
Mount the Hetzner Volume before starting containers. If a volume is not mounted when Docker Compose starts, containers write to the server’s local disk. When the volume is later mounted, the data on local disk is hidden. Verify mount points with df -h before the first docker compose up.
Pin Docker image tags in production. Use myapp:v1.2.3, not myapp:latest. With latest, docker compose pull fetches whatever was most recently pushed, which makes deployments unpredictable and rollbacks impossible.
Caddy’s data volume is important. The caddy-data volume stores TLS certificates and ACME account keys. Losing this volume means Caddy must re-issue all certificates, which can hit Let’s Encrypt rate limits (50 certificates per registered domain per week). Keep the Caddy data volume on persistent storage.
Tailscale auth keys expire. Pre-authentication keys have a maximum lifetime of 90 days. Once a server joins the tailnet, the key is no longer needed, so this only matters for new server provisioning. Use tagged auth keys (--advertise-tags=tag:server) to disable node key expiry on the device itself.
Docker logging fills disks. Without max-size and max-file on the json-file logging driver, container logs grow without bound. Set these on every service in production Compose files.
Test your backups. Periodically restore a backup to a temporary database and verify the data. A backup that cannot be restored is not a backup.
Distroless has no shell. You cannot docker exec -it app bash into a distroless container. For debugging, run a temporary container with a full image (docker run -it --network container:app debian:bookworm-slim bash) that shares the application container’s network namespace. Or add a debug sidecar to the Compose file temporarily.
Web Application Performance
Rust already eliminates the largest performance tax in most web stacks: garbage collection pauses, interpreter overhead, and runtime type-checking. An Axum application serving HTML fragments through Maud starts fast and stays fast under load. The HDA architecture adds a structural advantage: no client-side framework bundle to download and parse, no JSON serialisation layer between server and browser, and smaller payloads because HTML fragments replace full JSON responses plus client-side rendering.
That said, performance work in any stack follows the same rule: measure first, then optimise what the measurements show. Adding caching, compression layers, or index hints without evidence of a real problem creates complexity that must be maintained, debugged, and reasoned about. Every technique in this section is worth knowing. None of them should be applied preemptively.
HTTP caching is the highest-leverage performance tool available. A response that never reaches your server costs nothing to serve.
Cache-Control for dynamic responses
Set Cache-Control headers in Axum handlers using tuple responses:
use axum::{
    http::header,
    response::IntoResponse,
};
use maud::{html, Markup};

async fn product_page() -> impl IntoResponse {
    let markup: Markup = html! { /* page content */ };
    (
        [(header::CACHE_CONTROL, "public, max-age=300")],
        markup,
    )
}
For pages that must revalidate on every request but can still benefit from conditional caching:
(
    [(header::CACHE_CONTROL, "no-cache")],
    markup,
)
no-cache does not mean “don’t cache.” It means “cache, but revalidate with the server before using.” Combined with an ETag, the server can respond with 304 Not Modified and skip sending the body entirely.
An ETag is a fingerprint of the response content. When the browser sends the ETag back in an If-None-Match header, the server can return a 304 Not Modified if the content has not changed, saving bandwidth and rendering time.
use axum::{
    body::Body,
    extract::Request,
    http::{header, Response, StatusCode},
};
use std::hash::{DefaultHasher, Hash, Hasher};

fn compute_etag(content: &str) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("\"{:x}\"", hasher.finish())
}

async fn cacheable_page(req: Request) -> Response<Body> {
    let html = render_page();
    let etag = compute_etag(&html);

    if let Some(if_none_match) = req.headers().get(header::IF_NONE_MATCH) {
        if if_none_match.to_str().ok() == Some(etag.as_str()) {
            return Response::builder()
                .status(StatusCode::NOT_MODIFIED)
                .header(header::ETAG, &etag)
                .header(header::CACHE_CONTROL, "public, max-age=60")
                .body(Body::empty())
                .unwrap();
        }
    }

    Response::builder()
        .status(StatusCode::OK)
        .header(header::ETAG, &etag)
        .header(header::CACHE_CONTROL, "public, max-age=60")
        .header(header::CONTENT_TYPE, "text/html; charset=utf-8")
        .body(Body::from(html))
        .unwrap()
}
For pages where generating the full HTML is itself expensive, ETags are less useful because you must still render the content to compute the hash. In those cases, consider using a version number, last-modified timestamp from the database, or a cache layer (covered below).
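As a sketch of the timestamp alternative, the ETag can be derived from the row's id and its last-modified time, so answering If-None-Match never requires rendering the body. The `etag_from_metadata` helper and the Unix-seconds parameter are illustrative, not part of the application above; note that `DefaultHasher` is only guaranteed stable within a single binary, so for multiple app instances behind one load balancer use a hash with a stable specification.

```rust
use std::hash::{DefaultHasher, Hash, Hasher};

/// Build an ETag from row identity + modification time, so the page
/// body never has to be rendered just to answer If-None-Match.
/// `updated_at_unix` would come from the database row (assumption).
fn etag_from_metadata(id: i64, updated_at_unix: i64) -> String {
    let mut hasher = DefaultHasher::new();
    id.hash(&mut hasher);
    updated_at_unix.hash(&mut hasher);
    format!("\"{:x}\"", hasher.finish())
}

fn main() {
    let a = etag_from_metadata(42, 1_700_000_000);
    let b = etag_from_metadata(42, 1_700_000_000);
    let c = etag_from_metadata(42, 1_700_000_500);
    assert_eq!(a, b); // same row, same version -> same ETag
    assert_ne!(a, c); // row changed -> ETag changes
    println!("{a}");
}
```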
Apply Cache-Control to groups of routes using SetResponseHeaderLayer. This sets the header only if the handler did not already set one, so individual handlers can override the default.
use axum::{routing::get, Router};
use tower_http::set_header::SetResponseHeaderLayer;
use http::{header, HeaderValue};
let app = Router::new()
    .route("/products", get(list_products))
    .route("/products/{id}", get(product_page))
    .layer(SetResponseHeaderLayer::if_not_present(
        header::CACHE_CONTROL,
        HeaderValue::from_static("public, max-age=60"),
    ));
[dependencies]
tower-http = { version = "0.6", features = ["set-header"] }
Caching guidelines by content type
| Content type | Cache-Control | Rationale |
| Static assets (CSS, JS, images) | public, max-age=31536000, immutable | Content-hashed filenames mean the URL changes when the file changes. Cache forever. |
| Public pages (product listing, homepage) | public, max-age=60 to max-age=300 | Short TTL allows quick updates. Adjust based on how frequently content changes. |
| Personalised pages (dashboard, profile) | private, no-cache | Must not be cached by shared proxies. Revalidate on every request. |
| HTMX fragments | no-store or no-cache | Fragments usually reflect current state. no-store prevents any caching. no-cache allows ETag-based revalidation. |
| API responses (JSON, if you have them) | private, no-cache or short max-age | Depends on the data. Default to conservative. |
Static asset caching with content-hashed filenames is covered in the CSS section. The immutable directive tells browsers not to revalidate even when the user reloads the page, eliminating conditional requests entirely for fingerprinted assets.
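Fingerprinting itself can be a one-liner at build time: hash the file contents into the name, so the URL changes exactly when the file does. A std-only sketch (the `hashed_name` helper and the `.css` naming scheme are illustrative, not the guide's actual asset pipeline):

```rust
use std::hash::{DefaultHasher, Hash, Hasher};

/// Derive a content-hashed filename like "app.3af5c2e971b0d844.css".
/// When the contents change, the name changes, so each URL's content
/// is immutable and can carry `max-age=31536000, immutable`.
fn hashed_name(stem: &str, ext: &str, contents: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    format!("{stem}.{:016x}.{ext}", hasher.finish())
}

fn main() {
    let v1 = hashed_name("app", "css", b"body { margin: 0 }");
    let v2 = hashed_name("app", "css", b"body { margin: 1px }");
    assert_ne!(v1, v2); // edited file -> new URL -> fresh fetch, as intended
    println!("{v1}");
}
```

In a real pipeline the hash run happens once at build time and the template layer emits the fingerprinted URL.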
Response compression
Compressing responses reduces bandwidth and improves page load times, particularly on slower connections. tower-http provides a compression middleware that negotiates the algorithm from the client’s Accept-Encoding header.
[dependencies]
tower-http = { version = "0.6", features = ["compression-gzip", "compression-br"] }
use axum::{routing::get, Router};
use tower_http::compression::CompressionLayer;
let app = Router::new()
    .route("/", get(index))
    .layer(CompressionLayer::new());
CompressionLayer::new() enables all compiled-in algorithms and negotiates automatically. By default it skips images, gRPC responses, Server-Sent Events, and responses smaller than 32 bytes.
Choosing algorithms
Enable gzip and Brotli for broad browser support. Zstandard (zstd) compresses faster than both but lacks Safari support as of early 2026.
let compression = CompressionLayer::new()
    .gzip(true)
    .br(true)
    .zstd(false);
Tuning the compression predicate
The default predicate is conservative. Raise the minimum response size to avoid compressing tiny responses where the overhead exceeds the savings:
use tower_http::compression::{
    predicate::{NotForContentType, Predicate, SizeAbove},
    CompressionLayer,
};

let predicate = SizeAbove::new(256)
    .and(NotForContentType::IMAGES)
    .and(NotForContentType::SSE)
    .and(NotForContentType::GRPC);

let compression = CompressionLayer::new().compress_when(predicate);
Compression level
For dynamic HTML responses, CompressionLevel::Default balances compression ratio against CPU cost. Avoid CompressionLevel::Best for dynamic content; the marginal size reduction does not justify the CPU cost per request.
use tower_http::compression::CompressionLevel;

let compression = CompressionLayer::new()
    .quality(CompressionLevel::Default);
For static assets, pre-compress at build time rather than compressing on every request. tower-http’s ServeDir supports pre-compressed files:
use tower_http::services::ServeDir;

let static_files = ServeDir::new("static")
    .precompressed_br()
    .precompressed_gzip();
Place pre-compressed files alongside the originals (app.css.br, app.css.gz). ServeDir selects the correct variant based on the request’s Accept-Encoding.
PostgreSQL with proper indexes handles far more traffic than most developers expect. Before reaching for caching or read replicas, check that queries are efficient.
EXPLAIN ANALYZE
EXPLAIN ANALYZE executes a query and reports the actual execution plan, timing, and row counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.id, u.name, count(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name;
Read the plan from the innermost (most indented) nodes outward. Look for:
- Seq Scan on large tables. A sequential scan on a table with thousands of rows, where the query selects a small fraction, signals a missing index.
- Estimated vs actual row divergence. When the planner estimates 10 rows but actual is 50,000, it picks a bad join strategy. Run ANALYZE tablename to update statistics.
- Nested Loop with high loops count. Multiply actual time by loops to get the real elapsed time. A node showing actual time=0.05ms loops=10000 is really consuming 500ms.
- Sort on disk. If a Sort node reports Sort Method: external merge Disk, increase work_mem or reduce the data being sorted.
BUFFERS is particularly useful: shared hit shows pages read from the PostgreSQL buffer cache, shared read shows pages fetched from disk. High read values on frequently-executed queries mean the working set exceeds shared_buffers.
Indexing strategies
B-tree (the default) covers equality, range, and sorting queries. Place equality columns before range columns in multicolumn indexes:
CREATE INDEX idx_orders_tenant_created ON orders (tenant_id, created_at);
Partial indexes cover only rows matching a condition. Effective for queue-like patterns:
CREATE INDEX idx_jobs_pending ON jobs (priority, created_at)
WHERE status = 'pending';
Covering indexes (INCLUDE) store extra columns in the index to enable index-only scans:
CREATE INDEX idx_users_email_covering ON users (email) INCLUDE (name);
GIN indexes handle JSONB, arrays, and full-text search vectors. They are larger and slower to update than B-tree but fast for containment lookups.
Do not create indexes speculatively. Every index slows down writes. Add indexes when EXPLAIN ANALYZE shows a problem. Use CREATE INDEX CONCURRENTLY on production tables to avoid locking writes during creation.
Monitor index usage
Find unused indexes that are slowing down writes for no benefit:
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
The N+1 problem
One query to fetch a list, then one query per item to fetch related data. The most common performance problem in web applications.
// N+1: one query for the list, then one query per row
let orders = sqlx::query_as!(Order, "SELECT * FROM orders LIMIT 50")
    .fetch_all(&pool)
    .await?;

for order in &orders {
    let user = sqlx::query_as!(User,
        "SELECT name FROM users WHERE id = $1", order.user_id)
        .fetch_one(&pool)
        .await?; // 50 additional round trips
}
Fix with a JOIN:
let results = sqlx::query_as!(
    OrderWithUser,
    r#"SELECT o.id as order_id, o.total, u.name as user_name
       FROM orders o
       JOIN users u ON u.id = o.user_id
       LIMIT 50"#
)
.fetch_all(&pool)
.await?;
Or batch-load with ANY when JOINs are not practical:
let user_ids: Vec<i32> = orders.iter().map(|o| o.user_id).collect();
let users = sqlx::query_as!(User,
    "SELECT id, name FROM users WHERE id = ANY($1)", &user_ids)
    .fetch_all(&pool)
    .await?;
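Batch-loading returns one flat Vec, so the rows still need to be reattached to their parents. Do it in memory with a HashMap keyed by id. A std-only sketch with plain structs standing in for the sqlx rows:

```rust
use std::collections::HashMap;

struct Order { id: i32, user_id: i32 }
struct User  { id: i32, name: String }

/// Pair each order with its user's name in O(orders + users),
/// instead of one query (or one linear scan) per order.
fn join_in_memory(orders: &[Order], users: Vec<User>) -> Vec<(i32, String)> {
    let by_id: HashMap<i32, String> =
        users.into_iter().map(|u| (u.id, u.name)).collect();
    orders
        .iter()
        .filter_map(|o| by_id.get(&o.user_id).map(|n| (o.id, n.clone())))
        .collect()
}

fn main() {
    let orders = vec![Order { id: 1, user_id: 7 }, Order { id: 2, user_id: 7 }];
    let users = vec![User { id: 7, name: "Ada".into() }];
    let joined = join_in_memory(&orders, users);
    assert_eq!(joined, vec![(1, "Ada".to_string()), (2, "Ada".to_string())]);
}
```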
Connection pool tuning
SQLx’s default pool settings are conservative. Tune them for web workloads:
use sqlx::postgres::PgPoolOptions;
use std::time::Duration;

let pool = PgPoolOptions::new()
    .max_connections(20)
    .min_connections(2)
    .acquire_timeout(Duration::from_secs(5))
    .idle_timeout(Duration::from_secs(300))
    .max_lifetime(Duration::from_secs(1800))
    .connect(&database_url)
    .await?;
Key adjustments:
- max_connections: set based on PostgreSQL's max_connections divided by the number of application instances. With PostgreSQL's default of 100 and four app instances, 20 per instance leaves headroom for admin connections and migrations.
- min_connections: set to 2-5 to avoid cold-start latency. The default (0) means the first requests after an idle period wait for TCP handshake, TLS negotiation, and authentication.
- acquire_timeout: the default (30 seconds) is far too long for a web request. Set to 3-5 seconds. Fail fast with a 503 rather than making the user wait.
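The max_connections rule is simple arithmetic; a small helper makes the relationship explicit. The reserved-connection count is an assumption — size it for your own admin and migration sessions:

```rust
/// Per-instance pool size: split what PostgreSQL allows across app
/// instances, after reserving headroom for admin sessions and migrations.
fn pool_size(pg_max_connections: u32, app_instances: u32, reserved: u32) -> u32 {
    let available = pg_max_connections.saturating_sub(reserved);
    (available / app_instances.max(1)).max(1) // never size a pool to zero
}

fn main() {
    // PostgreSQL default max_connections = 100, four app instances,
    // 20 connections held back -> 20 per instance, matching the text.
    assert_eq!(pool_size(100, 4, 20), 20);
    println!("{}", pool_size(100, 4, 20));
}
```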
Keeping transactions short
Do not hold database connections open during external HTTP calls or other non-database I/O:
let mut tx = pool.begin().await?;

sqlx::query!("UPDATE orders SET status = 'processing' WHERE id = $1", id)
    .execute(&mut *tx)
    .await?;

// BAD: the transaction (and its pooled connection) stays checked out
// for the entire duration of this external HTTP call
let resp = reqwest::get("https://payment-api.example.com/charge").await?;
let new_status = resp.status().as_str().to_owned();

sqlx::query!("UPDATE orders SET status = $1 WHERE id = $2", new_status, id)
    .execute(&mut *tx)
    .await?;

tx.commit().await?;
Under load, this drains the pool. Restructure to minimise the transaction scope, or use Restate for operations that need durable execution across external calls.
pg_stat_statements
Enable pg_stat_statements to identify the queries consuming the most cumulative database time. This is the production equivalent of EXPLAIN ANALYZE for individual queries.
In Docker Compose for development:
services:
  postgres:
    image: postgres:17
    command: >
      postgres
      -c shared_preload_libraries=pg_stat_statements
      -c pg_stat_statements.track=all
      -c track_io_timing=on
Then create the extension:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Find the queries consuming the most database time:
SELECT
substring(query, 1, 200) AS query_preview,
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
A query with a 2ms mean but 5 million daily calls contributes more total load than a 500ms query called 100 times. Optimise by total time, not mean time.
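That ordering is just calls × mean. A tiny illustration with made-up numbers echoing the example in the text (the query labels are invented):

```rust
/// Rank queries by total time (calls * mean), the same ordering that
/// `ORDER BY total_exec_time DESC` produces in pg_stat_statements.
fn rank_by_total(mut stats: Vec<(&'static str, u64, f64)>) -> Vec<&'static str> {
    stats.sort_by(|a, b| {
        let ta = a.1 as f64 * a.2; // total ms for a
        let tb = b.1 as f64 * b.2; // total ms for b
        tb.partial_cmp(&ta).unwrap()
    });
    stats.into_iter().map(|(name, _, _)| name).collect()
}

fn main() {
    // (query, daily calls, mean ms): the cheap-but-hot query wins.
    let ranked = rank_by_total(vec![
        ("hot 2ms lookup", 5_000_000, 2.0), // 10,000,000 ms total
        ("slow 500ms report", 100, 500.0),  //     50,000 ms total
    ]);
    assert_eq!(ranked[0], "hot 2ms lookup");
}
```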
Redis as a caching layer
Redis adds a shared, network-accessible cache that works across multiple application instances. It also adds an infrastructure dependency, a consistency problem (stale data), and invalidation complexity. Only introduce it when you have measured a real bottleneck that cannot be solved with database optimisation or HTTP caching.
If you are running a single application instance and want in-process caching, consider moka first. It provides an async-aware concurrent cache with TTL support, avoids the network hop, and requires no additional infrastructure.
Setup
The application’s Redis pub/sub setup already includes a ConnectionManager. Reuse it for caching:
use redis::aio::ConnectionManager;

#[derive(Clone)]
struct AppState {
    db: sqlx::PgPool,
    redis: ConnectionManager,
}
ConnectionManager wraps a MultiplexedConnection with automatic reconnection. It is cheap to clone and safe to share across handlers.
Cache-aside pattern
The application checks the cache first. On a miss, it queries the database, stores the result, and returns it.
use redis::AsyncCommands;
use serde::{de::DeserializeOwned, Serialize};
use std::future::Future;

async fn cache_aside<T, F, Fut>(
    redis: &mut ConnectionManager,
    key: &str,
    ttl_seconds: u64,
    fetch_fn: F,
) -> anyhow::Result<T>
where
    T: Serialize + DeserializeOwned,
    F: FnOnce() -> Fut,
    Fut: Future<Output = anyhow::Result<T>>,
{
    let cached: Option<String> = redis.get(key).await?;
    if let Some(json) = cached {
        return Ok(serde_json::from_str(&json)?);
    }

    let value = fetch_fn().await?;
    let json = serde_json::to_string(&value)?;
    let _: () = redis.set_ex(key, &json, ttl_seconds).await?;
    Ok(value)
}
Use it in a handler:
async fn get_product(
    State(state): State<AppState>,
    Path(id): Path<i64>,
) -> Result<impl IntoResponse, AppError> {
    let mut redis = state.redis.clone();
    let product = cache_aside(
        &mut redis,
        &format!("product:{id}"),
        300,
        || async {
            sqlx::query_as!(Product, "SELECT * FROM products WHERE id = $1", id)
                .fetch_one(&state.db)
                .await
                .map_err(Into::into)
        },
    )
    .await?;

    Ok(render_product(&product))
}
Cache invalidation
Invalidation is the hard part. Two practical strategies:
TTL-based expiration. Set a time-to-live and accept that data may be stale for up to that duration. Simple, self-healing, no coordination needed. Choose the TTL based on how stale the data can acceptably be.
Explicit invalidation on write. Delete the cache entry when the underlying data changes:
async fn update_product(
    State(state): State<AppState>,
    Path(id): Path<i64>,
    Form(input): Form<ProductInput>,
) -> Result<impl IntoResponse, AppError> {
    sqlx::query!("UPDATE products SET name = $1, price = $2 WHERE id = $3",
        input.name, input.price, id)
        .execute(&state.db)
        .await?;

    let mut redis = state.redis.clone();
    let _: () = redis.del(format!("product:{id}")).await?;

    Ok(Redirect::to(&format!("/products/{id}")))
}
The practical approach is both: set a TTL as a safety net, and explicitly invalidate on known write paths. The TTL catches any invalidation you missed.
The problem compounds with list queries. When you update a product, which cached list queries include it? Unless you can answer that precisely, you end up invalidating aggressively (clearing all product-related caches on any product write) or accepting staleness. Neither is free.
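One practical middle ground is key versioning: prefix every list key with a namespace version and bump the version on any write, which orphans all old list entries at once and lets their TTLs garbage-collect them. A std-only sketch with a HashMap standing in for Redis (VersionedCache and the products: prefix are illustrative; with Redis you would keep the version in its own key and bump it with INCR):

```rust
use std::collections::HashMap;

/// Versioned cache keys: bumping the version "invalidates" every list
/// key at once without enumerating them; stale entries simply expire.
struct VersionedCache {
    version: u64,
    store: HashMap<String, String>, // stand-in for Redis with TTLs
}

impl VersionedCache {
    fn list_key(&self, name: &str) -> String {
        format!("products:v{}:{}", self.version, name)
    }
    fn put(&mut self, name: &str, value: &str) {
        let key = self.list_key(name);
        self.store.insert(key, value.to_string());
    }
    fn get(&self, name: &str) -> Option<&String> {
        self.store.get(&self.list_key(name))
    }
    /// Called from any product write path.
    fn invalidate_all_lists(&mut self) {
        self.version += 1;
    }
}

fn main() {
    let mut cache = VersionedCache { version: 1, store: HashMap::new() };
    cache.put("page:1", "<ul>...</ul>");
    assert!(cache.get("page:1").is_some());
    cache.invalidate_all_lists();
    assert!(cache.get("page:1").is_none()); // old version now unreachable
}
```

The cost is a cache miss on every list after any write, which is usually acceptable for short-TTL list pages.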
Graceful degradation
Never let a Redis failure break a request. Fall back to the database:
match redis.get::<_, Option<String>>(&cache_key).await {
    Ok(Some(json)) => {
        if let Ok(product) = serde_json::from_str::<Product>(&json) {
            return Ok(product);
        }
    }
    Ok(None) => {}
    Err(e) => {
        tracing::warn!("Redis error, falling back to database: {e}");
    }
}

let product = sqlx::query_as!(Product, "SELECT * FROM products WHERE id = $1", id)
    .fetch_one(&state.db)
    .await?;
Redis anti-patterns
- Never set keys without a TTL. Unbounded memory growth leads to eviction storms or out-of-memory errors. Always use set_ex.
- Never use the KEYS command. It blocks the single-threaded Redis server while scanning the entire keyspace. Use SCAN for iteration.
- Use pipelining for multiple operations. Serial single-operation calls waste round trips:

let mut pipe = redis::pipe();
pipe.get("key1").get("key2").get("key3");
let (v1, v2, v3): (Option<String>, Option<String>, Option<String>) =
    pipe.query_async(&mut redis).await?;
Profiling Rust applications
When the techniques above are not enough, or when you need to identify where time is actually being spent, reach for a profiler. The table below covers the tools that work well with async Rust and Axum.
| Need | Tool | Platform | Notes |
| CPU hotspots | cargo-flamegraph | Linux, macOS | Generates interactive SVG flamegraphs. Requires debug = true in release profile. Uses perf on Linux, xctrace on macOS. |
| CPU hotspots (interactive) | samply | Linux, macOS | Opens results in Firefox Profiler’s web UI. Better macOS experience than flamegraph. |
| Heap allocation profiling | dhat | All | Requires a #[global_allocator] swap and feature flag. View results in the DHAT online viewer. |
| Async runtime debugging | tokio-console | All | Terminal UI showing task states, wakeup counts, and poll durations. Requires tokio_unstable cfg flag. Development only. |
| Microbenchmarks | criterion | All | Statistics-driven benchmarking with regression detection. Supports async with the async_tokio feature. |
| Per-request latency | tower-http TraceLayer | All | Already covered in the observability section. Instrument handlers with #[instrument] for function-level timing. |
| Memory growth analysis | heaptrack | Linux | No code changes needed. Uses LD_PRELOAD to intercept allocations. |
The general workflow: start with TraceLayer and #[instrument] spans in the observability section to identify which requests are slow. Use pg_stat_statements and EXPLAIN ANALYZE if the slowness is in the database. Reach for flamegraph or samply when the bottleneck is in application code. Use criterion to benchmark specific functions before and after optimisation.
Practices
Rust Best Practices for Web Development
Rust-specific patterns that come up repeatedly when building web applications with this stack. This section focuses on the Rust angle: ownership patterns in request handlers, async pitfalls, linting configuration, and dependency decisions. Topics that have dedicated sections elsewhere (Tower middleware, database performance, project structure) are summarised here with links to the full treatment.
Ownership and borrowing in web contexts
Shared application state
Axum handlers receive shared application state through the State extractor. Since multiple handlers run concurrently, the state must be wrapped in Arc:
use std::sync::Arc;
use axum::extract::State;

struct AppState {
    db: sqlx::PgPool,
    config: AppConfig,
}

async fn list_contacts(
    State(state): State<Arc<AppState>>,
) -> Result<impl IntoResponse, AppError> {
    let contacts = sqlx::query_as!(Contact, "SELECT * FROM contacts")
        .fetch_all(&state.db)
        .await?;
    Ok(render_contacts(&contacts))
}
Register the state once when building the router:
let state = Arc::new(AppState { db: pool, config });
let app = Router::new()
.route("/contacts", get(list_contacts))
.with_state(state);
Use State for application-wide data (database pools, configuration, service handles). Use Extension only for request-scoped data injected by middleware, such as the authenticated user. State is type-safe at compile time; a missing .with_state() call produces a compiler error. Extension is not: a missing .layer(Extension(...)) compiles but panics at runtime.
Axum extractors (Path, Query, Form, Json) deserialise into owned types. Handlers receive owned String, Vec<T>, and struct fields. This matches the request lifecycle: each request is independent, so its data is naturally owned by the handler that processes it.
The practical consequence is that you rarely fight the borrow checker inside handlers. Borrowing becomes relevant for intermediate operations within the handler body, not for the handler’s inputs and outputs.
Extractors that consume the request body (Form, Json, Bytes, String, Multipart) implement FromRequest. Only one body-consuming extractor can appear per handler, and it must be the last parameter. Non-body extractors (Path, Query, State, HeaderMap) implement FromRequestParts and can appear in any order.
async fn update_contact(
    State(state): State<Arc<AppState>>,
    Path(id): Path<i64>,
    Form(data): Form<ContactForm>, // body-consuming extractor: must be last
) -> impl IntoResponse {
    // ...
}
When you need mutability
Mutable shared state is uncommon in web applications. Database pools handle their own synchronisation internally. Configuration is read-only after startup. If you genuinely need mutable shared state, choose based on whether the lock crosses an .await:
- No .await while holding the lock: use std::sync::Mutex (or parking_lot::Mutex). Cheaper and simpler.
- Must hold across .await: use tokio::sync::Mutex. This is rare and usually a sign the design can be restructured.
// Acquire the guard, use it, and release it before any .await point.
let counter = state.counter.lock().unwrap();
let value = *counter + 1;
drop(counter); // guard released; the task can now safely .await
Holding a std::sync::Mutex guard across .await produces a !Send future that the compiler will reject. This is the compiler protecting you from a deadlock.
If you find yourself reaching for Arc<Mutex<T>> frequently, consider whether a channel-based design or a database-backed approach is more appropriate.
Clone discipline
Developers coming from garbage-collected languages tend to .clone() defensively. In web handlers, most data is already owned, so cloning is less necessary than it first appears.
Clone when you need a second owner. Do not clone to satisfy the borrow checker when restructuring the code would eliminate the need. Arc::clone is cheap (an atomic increment). Cloning a Vec<String> with thousands of elements is not.
For shared read-only data, wrap in Arc rather than cloning. For values that are sometimes borrowed and sometimes owned, use Cow<'a, str>.
Async Rust pitfalls
Axum runs on the Tokio multi-threaded runtime. The runtime spawns one worker thread per CPU core and uses cooperative scheduling: tasks must yield at .await points. Understanding this model prevents the most common async mistakes.
Blocking the runtime
The single most damaging mistake in async Rust web applications. If you block a runtime thread with synchronous work, every other task scheduled on that thread stalls. With 4 worker threads and each request blocking for 100ms, throughput caps at 40 requests per second regardless of how many tasks are queued.
Blocking operations include:
- Synchronous file I/O (std::fs)
- CPU-intensive computation (image processing, password hashing, compression)
- std::thread::sleep
- Any third-party library call that does not return a future
Wrap blocking work with tokio::task::spawn_blocking:
use argon2::{
    password_hash::{rand_core::OsRng, PasswordHasher, SaltString},
    Argon2,
};
use tokio::task;

async fn hash_password(password: String) -> Result<String, anyhow::Error> {
    task::spawn_blocking(move || {
        let salt = SaltString::generate(&mut OsRng);
        let hash = Argon2::default()
            .hash_password(password.as_bytes(), &salt)
            .map_err(|e| anyhow::anyhow!(e))?
            .to_string();
        Ok(hash)
    })
    .await?
}
spawn_blocking moves the work to a dedicated thread pool that does not interfere with the async runtime.
Holding guards across .await
Any RAII guard (mutex lock, database transaction handle, file handle) held across an .await point blocks the resource for the entire time the task is suspended. Other tasks waiting for that resource stall.
// Bad: the lock is held for the entire database round trip.
let mut data = state.cache.lock().unwrap();
let result = fetch_from_db(&state.db).await;
data.insert(key, result);

// Better: check under the lock, drop the guard, then .await.
let needs_fetch = {
    let data = state.cache.lock().unwrap();
    !data.contains_key(&key)
};
if needs_fetch {
    let result = fetch_from_db(&state.db).await;
    let mut data = state.cache.lock().unwrap();
    data.insert(key, result);
}
Scope guards tightly. Drop them before .await.
Send bounds
Axum handlers must return Send futures because the multi-threaded runtime moves tasks between threads. The most common way to produce a !Send future is holding a !Send type (like an Rc or a std::sync::MutexGuard) across an .await. The compiler error messages for this are notoriously unhelpful, but the fix is almost always: restructure so the !Send value is dropped before the .await.
Task starvation
A long-running CPU-bound loop inside an async task starves all other tasks on that thread. Unlike Go’s goroutines, Rust async tasks do not pre-empt. They must voluntarily yield.
For loops that do significant work per iteration, either move the entire operation to spawn_blocking or insert periodic yields:
for (i, item) in large_collection.into_iter().enumerate() {
    process(item);
    if i % 64 == 0 {
        tokio::task::yield_now().await; // yield periodically, not every iteration
    }
}
Cancellation safety
When tokio::select! resolves one branch, all other futures are dropped immediately. If a dropped future was partway through writing data or accumulating state, that work is lost. Use cancel-safe primitives (tokio channels, tokio::time::interval) and keep critical state outside futures that participate in select!.
async fn in traits
Native async fn in traits landed in Rust 1.75 (December 2023). Use it where possible. The limitation: native async trait methods are not dyn-compatible. If you need dyn Trait with async methods, the async-trait crate is still required.
Tower middleware
Tower middleware is covered in detail in the Web Server with Axum section. The key patterns for day-to-day use:
Execution order
Requests flow through layers outside-in; responses return inside-out. With Router::layer(), middleware executes bottom-to-top (the last .layer() call runs first on requests). With tower::ServiceBuilder, middleware executes top-to-bottom. This inversion is a common source of confusion.
Three approaches to custom middleware
- axum::middleware::from_fn: Write an async function. No Tower boilerplate. Use this for application-specific concerns (auth checks, request logging, header injection).
- Implement Layer + Service: Full Tower machinery. Needed when you must wrap the response body or share middleware across non-Axum services.
- tower-http crates: For standard concerns (tracing, compression, CORS, timeouts, request IDs), always use tower-http. Do not reimplement these.
Clone requirement
Services must be Clone because Axum clones them for concurrent request handling. Wrap middleware state in Arc to make cloning cheap.
Dependency management
Choosing when to pull in a crate and when to implement yourself is a recurring decision. The general principle: use crates for complex, security-sensitive, or system-level concerns; implement trivial operations yourself.
Use a crate when
- Security is involved: Cryptography, TLS, password hashing, authentication. Getting these wrong has real consequences, and the ecosystem crates (argon2, rustls, jsonwebtoken) are well-audited.
- The problem is complex and well-solved: Serialisation (serde), async runtime (tokio), HTTP (hyper/axum), database access (sqlx). These represent thousands of hours of work and extensive testing.
- Unsafe code is required: Crates that isolate unsafe behind a safe API (database drivers, system interfaces) are doing work you should not duplicate.
Implement yourself when
- The operation is trivial: A few lines of string manipulation or data transformation do not justify a dependency. If you need one function from a crate that brings in 30 transitive dependencies, write the function.
- It is glue code between your types: Conversion traits, domain-specific validation, serialisation adapters between your own types belong in your codebase.
- The crate is heavier than your need: Check cargo tree to see what a crate pulls in. Prefer lighter alternatives when they exist (futures-lite over futures, ureq over reqwest if you do not need async).
Evaluating crates
Check maintenance status (last release, open issues), transitive dependency count (cargo tree -d), and reverse dependencies on crates.io. Use --no-default-features and enable only what you need. Run cargo-deny in CI to check licenses, known advisories, and duplicate versions. For high-security applications, cargo-vet provides formal dependency auditing.
The rust-deps skill in Claude Code provides crate-specific guidance when you need it.
Performance
Rust web application performance is covered in depth in Web Application Performance. The Rust-specific patterns that matter most:
Where Rust helps automatically
- No garbage collection pauses: Consistent p99 latency under sustained load. Memory is freed deterministically at scope boundaries.
- Low per-request overhead: No interpreter, no JIT warmup. The first request is as fast as the millionth.
- Fast serialisation: Serde significantly outperforms JSON libraries in most other languages.
Where Rust does not help
The database is almost always the bottleneck in CRUD applications. Rust’s speed does not compensate for missing indexes, N+1 query patterns, or an undersized connection pool. Run EXPLAIN ANALYZE before optimising Rust code.
- Excessive cloning in hot paths. Use Arc for shared read-only data, Cow<'a, str> for borrow-or-own scenarios, and Vec::with_capacity() when the size is known.
- Blocking the async runtime (see above). A single blocking call degrades throughput for all concurrent requests.
- Connection pool exhaustion. SQLx pools have a limited number of connections. Under load, handlers queue waiting. Size the pool appropriately (a starting point: 2x CPU cores) and monitor pool wait times.
Tools for diagnosing these problems:
- tokio-console: Real-time async task visualisation. Invaluable for diagnosing task starvation and blocked tasks.
- cargo flamegraph: CPU profiling with flame graph output.
- dhat: Heap allocation profiling.
Measure before optimising. The most common performance problems are architectural (blocking the runtime, slow queries, pool sizing) rather than language-level (clone costs, allocation patterns).
Code organisation
Project structure is covered in Project Structure. The conventions relevant to daily coding:
Module organisation within a crate
src/
├── handlers/ # Axum handler functions, one file per resource
├── models/ # Domain types, DTOs
├── db/ # Database access functions
├── errors.rs # Crate-specific error types
├── lib.rs # Public API re-exports
└── main.rs # Binary entry point (if applicable)
Route organisation
Each feature module exposes a routes() function. Compose them with Router::nest():
pub fn routes() -> Router<Arc<AppState>> {
Router::new()
.route("/", get(list_contacts).post(create_contact))
.route("/{id}", get(show_contact).put(update_contact))
.route("/{id}/delete", post(delete_contact))
}
let app = Router::new()
.nest("/contacts", contacts::routes())
.nest("/invoices", invoices::routes())
.with_state(state);
When to split into a new crate
Split when you have a clear boundary: a different domain, a different dependency set, or when compilation time for the crate becomes painful. A single crate with well-organised modules is preferable to many tiny crates with unclear boundaries.
Workspace-level lint configuration
Configure lints once in the workspace root Cargo.toml. Each member crate opts in with [lints] workspace = true, giving a single source of truth for the entire project.
[workspace.lints.clippy]
pedantic = { level = "warn", priority = -1 }
unwrap_used = "deny"
expect_used = "deny"
panic = "deny"
dbg_macro = "deny"
todo = "deny"
print_stdout = "deny"
print_stderr = "deny"
await_holding_lock = "deny"
large_futures = "warn"
allow_attributes = "deny"
[workspace.lints.rust]
unsafe_code = "deny"
# In each member crate's Cargo.toml:
[lints]
workspace = true
Why deny unwrap, expect, and panic
These three cause the process to abort on failure. In a request handler, that takes down the entire server. Denying them forces proper error handling with ? and explicit error types. This is especially valuable when AI coding agents generate code, as they frequently reach for .unwrap() as the path of least resistance.
The strict deny applies everywhere, including tests and startup. In tests, replace .unwrap() with .expect("test: reason") or return Result from the test function. In main(), use .expect("fatal: reason") with a #[expect(clippy::expect_used)] annotation (see below).
Prefer #[expect] over #[allow]
When you genuinely need to suppress a lint, use #[expect(clippy::lint_name)] instead of #[allow(clippy::lint_name)]. The difference: #[expect] triggers a warning if the lint it suppresses is no longer produced. This means suppression annotations do not silently outlive their usefulness.
#[expect(clippy::expect_used, reason = "fatal if database URL is missing")]
fn main() {
let db_url = std::env::var("DATABASE_URL")
.expect("DATABASE_URL must be set");
}
// Avoid: #[allow] stays silent even after the .expect() call is removed.
#[allow(clippy::expect_used)]
fn main() { }
The allow_attributes = "deny" lint in the workspace configuration enforces this. Any #[allow(...)] produces a compiler error, requiring #[expect(...)] instead. This is particularly effective when working with AI coding agents, which tend to silence errors with #[allow] rather than fixing the underlying issue.
Commonly suppressed pedantic lints
The pedantic group is worth enabling but includes lints that produce noise in web application code. Suppress these per crate as needed:
[lints.clippy]
module_name_repetitions = "allow"
must_use_candidate = "allow"
missing_errors_doc = "allow"
missing_panics_doc = "allow"
clippy.toml
Create a clippy.toml at the workspace root for threshold-based configuration:
cognitive-complexity-threshold = 15
type-complexity-threshold = 200
too-many-lines-threshold = 100
rustfmt
Use default rustfmt settings. The value of consistent formatting across the Rust ecosystem outweighs individual preferences. A minimal rustfmt.toml if needed:
edition = "2021"
max_width = 100
use_field_init_shorthand = true
CI integration
Run both in CI to enforce standards:
cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings
The -D warnings flag promotes all warnings to errors, so lint violations fail the build.
Gotchas
- Axum’s #[debug_handler] produces clear compiler errors when handler signatures are wrong. Use it during development, remove it before release.
- Turbofish syntax (::<Type>) is needed more often than you might expect with SQLx queries. When query_as cannot infer the output type, be explicit: query_as::<_, Contact>(...).
- Feature flag accumulation: Cargo features are additive and cannot be disabled by dependents. If crate A enables tokio/full and crate B only needs tokio/rt, the workspace gets tokio/full. Audit features with cargo tree -f '{p} {f}'.
- Compilation times: Rust compilation is slow. Use cargo-watch for incremental rebuilds, the mold linker on Linux (or the default linker on macOS, which is already fast), and split your workspace into crates so only changed code recompiles.
- Error messages from trait bounds: When Axum rejects a handler, the error often refers to missing trait implementations deep in Tower’s type system. Enable #[debug_handler] first. If the error persists, check that all extractors are in the right order and that the return type implements IntoResponse.
- impl IntoResponse hides the type: Returning impl IntoResponse from handlers is ergonomic but means you cannot name the return type elsewhere. If you need to store handlers in a collection or return them from a function, use axum::response::Response as the concrete type.
Building with AI Coding Agents
AI coding agents work better with Rust than most developers expect. The compiler provides deterministic, actionable feedback at every step: propose code, compile, read the error, revise. In dynamically typed languages, an agent can produce code that runs but is subtly wrong. In Rust, the borrow checker, type system, and lifetime rules catch entire categories of errors before the code ever executes. This feedback loop turns the agent’s stochastic generation into a convergence process. The strictness that makes Rust harder to learn makes it easier for agents to get right.
That advantage has limits. Agents hallucinate crate names and API surfaces, produce code with security vulnerabilities at high rates, and make architectural decisions that compile but are not what you want. This section covers how to structure your project for effective agent collaboration, what agents handle well, where human judgment is essential, and how to review what they produce.
Structuring your project for AI agents
AI coding agents work from context. The more precise the context, the better the output. Without project-specific instructions, an agent falls back on its training data, which contains code from every version of every crate, every architectural style, and every level of quality. Give it a narrow lane to work in.
The CLAUDE.md file
Place a CLAUDE.md file in your project root. Claude Code reads this file automatically at the start of every session. Other tools have their own conventions (.cursor/rules/*.mdc for Cursor, .github/copilot-instructions.md for GitHub Copilot), but the content principles are the same.
Keep it lean. Target under 200 lines. The file is injected into the agent’s context window alongside its system prompt and your conversation, so every line competes for attention. A 500-line instruction file dilutes the instructions that matter.
Structure the file around three concerns:
What this project is. The tech stack, architectural style, and key crates with versions.
## Stack
- Axum 0.8 (web framework)
- Maud 0.26 (HTML templating)
- SQLx 0.8 (database, compile-time checked queries, Postgres)
- htmx 2.0 (interactivity)
- tower-sessions 0.14 (session management)
How to build and verify. The commands an agent needs to check its own work.
## Commands
- `cargo check` — fast type checking
- `cargo clippy --all-targets -- -D warnings` — lint with warnings as errors
- `cargo test --workspace` — full test suite
- `cargo sqlx prepare --workspace` — regenerate SQLx query cache
What conventions to follow. Error handling patterns, module organisation, naming conventions, anything the agent would get wrong without guidance.
## Conventions
- Error handling: thiserror for library crates, anyhow for application crates
- All handlers return Result<impl IntoResponse, AppError>
- HTML fragments for htmx requests, full pages for normal requests
- British English in user-facing strings
Hierarchical instruction files
For larger projects, place additional CLAUDE.md files in subdirectories. A file in crates/web/CLAUDE.md provides context specific to the web crate without cluttering the root file. Claude Code merges these automatically when working in that directory.
This mirrors how a team onboards a new developer: general project context first, then module-specific conventions as they start working in a particular area.
What to leave out
Do not duplicate what tools already enforce. Formatting rules belong in rustfmt.toml. Lint configuration belongs in clippy.toml or Cargo.toml lint sections. The instruction file covers what the agent cannot infer from tooling configuration.
Avoid task-specific instructions. “When writing a new handler, always add a test” is a good instruction. “Add a handler for /users/{id}/edit that returns an edit form” is a task, not a convention. Tasks belong in your conversation with the agent, not in the instruction file.
AGENTS.md is an emerging cross-tool standard backed by the Linux Foundation’s Agentic AI Foundation, with support from Claude Code, Cursor, Copilot, Codex, and others. If your team uses multiple AI tools, an AGENTS.md file provides a single source of project context that all tools read. The content guidance is identical to what is described above for CLAUDE.md.
| Tool | Instruction file | Notes |
| Claude Code | CLAUDE.md | Hierarchical, nested in subdirectories |
| Cursor | .cursor/rules/*.mdc | YAML frontmatter with activation modes |
| GitHub Copilot | .github/copilot-instructions.md | Supported since late 2024 |
| Cross-tool | AGENTS.md | Linux Foundation backed, 60k+ projects |
Using this guide as agent context
This guide is designed to work as context for AI coding agents. Each section is self-contained, uses explicit file paths and crate names, and avoids implied knowledge that requires reading other sections first.
When starting a task, give the agent the relevant section from this guide alongside your project’s instruction file. If you are implementing authentication, provide the authentication section. If you are setting up deployment, provide the deployment section. The agent gets current, opinionated guidance instead of drawing on its training data, which may reference deprecated APIs or different architectural patterns.
For Claude Code specifically, this works through the instruction file hierarchy. Reference sections by linking to them or by including the key patterns inline in a subdirectory CLAUDE.md:
## Auth patterns
- Session-based auth with tower-sessions and sqlx-store
- See the project guide's authentication section for implementation details
- Password hashing: argon2 crate, never store plaintext
- CSRF: tower-csrf middleware on all state-changing endpoints
The goal is not to paste entire sections into context. It is to give the agent enough anchoring information that it produces code consistent with your chosen patterns rather than inventing its own.
Writing effective prompts for Rust web development
Prompting an AI agent for Rust web development is more constrained than prompting for Python or JavaScript. The type system defines a narrow space of valid programs, and the more of that space you specify upfront, the better the output.
Specify crate versions
The single most impactful habit. Agents draw on training data spanning multiple years of crate releases. Axum 0.7 and 0.8 have different APIs. SQLx 0.7 and 0.8 changed their macro syntax. Stating the version explicitly prevents the agent from generating code for the wrong API surface.
Put versions in your instruction file so you do not repeat them in every prompt. When they appear in context, the agent uses them consistently.
Provide type signatures
Rust’s type system constrains solutions. Providing an explicit function signature gives the agent a precise target:
“Write a handler with this signature: async fn create_user(State(pool): State<PgPool>, Form(payload): Form<CreateUserForm>) -> Result<impl IntoResponse, AppError>. It should insert the user into the users table and redirect to /users/{id}.”
This is more effective than “write a handler that creates a user” because the agent does not need to guess the extractor types, error handling approach, or return type.
Show one example, ask for variations
Instead of describing a pattern from scratch, show the agent one working handler, test, or component and ask for similar ones. The agent matches the style, error handling, and conventions of the example rather than inventing its own. This produces more consistent codebases than generating each piece independently.
Specify fragment versus full-page responses
Agents default to generating full HTML pages. For htmx-driven applications, most handlers return HTML fragments. Be explicit: “This handler responds to an htmx request and returns an HTML fragment. Do not include <html>, <head>, or <body> tags.”
Provide the database schema
Include the relevant CREATE TABLE statements when asking for database-related code. This prevents the agent from hallucinating column names, types, or relationships. For sqlx::query_as! macros, the agent cannot run compile-time verification itself, so the schema serves as the source of truth.
Use iterative refinement
Ask the agent to review its own output before you accept it. “Review the code you just wrote for non-idiomatic Rust patterns, unnecessary allocations, and missing error cases. Fix any issues you find.” The OpenSSF’s security-focused guide for AI code assistant instructions specifically recommends this recursive self-review pattern over telling the agent it is an expert.
Patterns AI agents handle well
Agents perform best on tasks where the type system constrains the solution space and the pattern is well-represented in training data.
CRUD handlers. Standard create, read, update, delete operations with Axum extractors and SQLx queries. The combination of typed extractors, parameterised queries, and structured return types leaves little room for the agent to go wrong.
Trait implementations. Generating impl blocks for Display, From, Serialize, Deserialize, IntoResponse, and similar traits. The compiler defines the expected shape precisely.
Test scaffolding. Given a function signature and expected behaviour, agents produce solid test structures. Rust’s #[cfg(test)] module pattern and assert! macros are well-represented in training data. Review the assertions for correctness, since a test that always passes proves nothing.
Boilerplate and repetitive code. Migration files, configuration parsing, middleware setup, route registration. These follow established patterns with little variation.
Explaining compiler errors. When you hit a confusing borrow checker or lifetime error, asking the agent to explain it is often faster than searching for the error code. Current models understand Rust’s ownership semantics well enough to give accurate explanations.
Multi-file consistency. Agents that operate across files (Claude Code, Cursor, Windsurf) maintain synchronisation between handler definitions, route registration, and type declarations. This is one area where agents save significant manual coordination effort.
Areas where human judgment is needed
Agents generate code that compiles. Compiling is necessary but not sufficient. These areas require active human judgment, not just verification that the build passes.
Architectural decisions
Agents optimise for completing the immediate task. They do not consider how a piece of code fits into the broader system. An agent will happily put everything in main.rs if you do not specify a module structure. It will create a new database connection pool per request if you do not show it the shared state pattern. Architectural decisions, where to draw module boundaries, how to structure the workspace, when to extract a crate, remain human responsibilities.
Crate selection
Agents hallucinate crate names. They recommend crates that do not exist, suggest deprecated crates, or use the wrong crate for the job. In Shuttle’s 2025 testing of seven AI tools on the same Rust project, hallucinated crate versions were the most consistently reported problem across all tools. Always verify that a suggested crate exists on crates.io and check its maintenance status before adding it to Cargo.toml.
Ownership and lifetime design
Agents can fix individual borrow checker errors, but they sometimes fix them by adding unnecessary .clone() calls or wrapping things in Arc<Mutex<>> when a simpler restructuring would work. The resulting code compiles but carries hidden performance costs and obscures the ownership model. When an agent adds a clone to satisfy the compiler, consider whether the data flow should be restructured instead.
Performance characteristics
Agents produce code that is functionally correct but not necessarily performant. Hidden allocations (.collect::<Vec<_>>() when streaming would work), blocking calls in async contexts, and locks held across await points all compile and pass tests but degrade under load. In hot paths, review the generated code for unnecessary allocations and synchronisation overhead.
Error handling granularity
Agents tend toward two extremes: either they use .unwrap() everywhere or they create overly granular error types for every possible failure. Neither is appropriate. Error handling requires judgment about which failures the caller can handle, which should propagate, and which need logging.
Sensitive business logic
Authorisation rules, pricing calculations, data retention policies, anything where a subtle bug has business consequences beyond a 500 error. These require understanding the domain, not just the type system. Use agents to generate the scaffolding, then write the core logic yourself or review it with particular care.
Review practices for AI-generated Rust code
Rust’s toolchain provides a review pipeline that catches more issues automatically than any other mainstream language. Use it.
The automated pipeline
Run these checks on every piece of AI-generated code, in order:
1. cargo check — Fast type checking without full compilation. Catches type errors, borrow checker violations, and missing trait implementations. If this fails, the code has fundamental problems. Send the error back to the agent.
2. cargo clippy --all-targets --all-features -- -D warnings — Clippy provides over 600 lints. It catches non-idiomatic patterns, common performance mistakes, and correctness issues. Treat warnings as errors. If clippy flags something, fix it before proceeding.
3. cargo test --workspace — Run the full test suite. If the agent wrote tests, verify that they actually test meaningful behaviour. A test that asserts true == true passes but proves nothing.
What to look for in human review
After the automated pipeline passes, review the code for issues that tools cannot catch:
Unnecessary clones and allocations. Agents satisfy the borrow checker by adding .clone() where restructuring the data flow would be better. Look for clones of large types, clones inside loops, and String allocations where &str would suffice.
Over-engineering. Agents sometimes introduce unnecessary traits, generic parameters, or abstraction layers for code that does one thing. Three lines of straightforward code is better than a generic trait with one implementor.
Hidden unwrap() calls. Search generated code for .unwrap() and .expect(). In handler code, these cause panics that crash the request (or worse, the server). They should be replaced with proper error propagation using ? and typed errors.
Stale or hallucinated dependencies. Check Cargo.toml changes. Verify that any new crate the agent added actually exists, is maintained, and is the right tool for the job. Check the version number against crates.io.
SQL query correctness. SQLx’s compile-time checking validates syntax and types against the database schema, but it does not verify business logic. A query that returns the wrong rows or updates the wrong records compiles fine. Read the SQL.
Security review
AI-generated code contains security vulnerabilities at rates that warrant systematic review. Veracode’s 2025 study across 100+ LLMs found security flaws in 45% of generated code. The Stanford study found that developers using AI assistants wrote less secure code while believing it was more secure.
Rust’s memory safety eliminates one category of vulnerabilities (buffer overflows, use-after-free, null pointer dereferences), but does nothing for application-level security: SQL injection, XSS, hardcoded secrets, improper access control, information leakage through error messages.
See the web application security section for a thorough treatment of security practices and a review checklist specific to AI-generated code. The short version: never trust that generated code handles user input safely, always verify that secrets come from environment variables, and check that error responses do not leak internal details.
The research-plan-implement workflow
The techniques above address individual prompts and reviews. The broader question is how to organise an entire feature’s worth of AI-assisted work. A pattern that works well: split the work into three phases, each with its own fresh context.
Research. Explore the codebase, identify existing patterns, map the relevant types and modules. Compress the findings into a focused summary. This phase is about understanding what exists before deciding what to change. The agent is good at this: reading files, tracing call chains, summarising structure. The output is a short document, not code.
Plan. Using the research summary as input, produce a detailed execution plan: which files to create or modify, in what order, with what interfaces. Include test criteria and references to specific code locations. Human review at this stage is high leverage. Catching an architectural mistake in a plan costs one line of editing. Catching it after implementation costs a rework cycle.
Implement. Feed the plan and only the necessary source files into a fresh context. Work in chunks, testing between steps. The plan constrains the agent’s decisions, reducing the chance of it inventing its own architecture or drifting from the intended design.
Each phase starts with a clean context window. This matters because of context rot: agent performance degrades as the context fills with stale conversation history, abandoned approaches, and accumulated noise. Research suggests reasoning quality peaks around 40% context window utilisation. Long, sprawling sessions where research, planning, and implementation all happen in one thread produce worse results than short, focused sessions with clear inputs.
The key insight is that human leverage is highest at the research and planning stages, not at the code level. A wrong assumption in research multiplies into dozens of wrong lines of code. A plan that specifies the wrong module boundary produces a coherent but misguided implementation. Catch errors early, where they are cheap to fix.
Gotchas
Stale crate APIs are the most common problem. Agents confidently generate code against older versions of Axum, SQLx, tokio, and other rapidly evolving crates. Specifying versions in your instruction file mitigates this but does not eliminate it. Always verify that generated code uses the current API surface.
Agents break working code on subsequent edits. A common failure mode: the agent writes correct code for a feature, then on a later edit to the same file, modifies or deletes the earlier code. Review diffs carefully, not just the new code. Use version control to catch regressions.
Tests generated by agents need review. Agents produce tests that compile and pass but sometimes test the wrong thing or test trivial properties. A test for a create-user handler that never checks whether the user was actually persisted to the database is worse than useless, since it provides false confidence.
Agents fight the borrow checker with brute force. When an agent encounters a lifetime or borrowing error, it sometimes adds Arc<Mutex<>> wrapping, unnecessary clones, or 'static lifetime bounds rather than restructuring the code. The result compiles but is not idiomatic and may have performance implications. If an agent’s fix involves wrapping something in Arc<Mutex<>> that was not originally behind one, ask why the ownership model needs shared mutable state.
Context window limits affect large projects. Rust projects with deep module trees and many crates can exceed what an agent can hold in context. When working on a large workspace, guide the agent to the specific crates and files relevant to the task rather than expecting it to understand the entire project.
Further reading
These posts explore the practices from this section in more depth:
- AI Engineer vs. Sloperator — The distinction between producing quality code with AI tools and generating slop. Covers context rot, the research-plan-implement workflow, and how to configure projects for agent collaboration.
- Context Engineering Is the Job — Context engineering as the core discipline of working with LLMs. How to gather, curate, and manage the information that goes into each generation step.
- Thinking in Plans, Not Code — Progressive refinement from requirements through detailed plans before implementation. Why the planning phase, not the coding phase, is where quality is determined.
- Code Review in AI-Augmented Development — How code review changes when AI generates the code. Right-sizing work units, reviewing plans before code, and triaging review effort toward high-risk areas.