Hypermedia-Driven Applications with Rust

A practical guide to building full-stack web applications in Rust using hypermedia-driven architecture
Generated on February 27, 2026

Table of Contents

Getting Started
  1. Development Environment
  2. Project Structure
Architecture
  1. Why Hypermedia-Driven Architecture
  2. The Web Platform Has Caught Up
  3. SPA vs HDA: A Side-by-Side Comparison
  4. When to Use HDA (and When Not To)
Core Stack
  1. Web Server with Axum
  2. HTML Templating with Maud
  3. Interactivity with HTMX
  4. CSS Without Frameworks
Data
  1. Database with PostgreSQL and SQLx
  2. Database Migrations
  3. Search
  4. Semantic Search
Auth & Security
  1. Authentication
  2. Authorization
  3. Web Application Security
Forms & Errors
  1. Form Handling and Validation
  2. Error Handling
Integrations
  1. Server-Sent Events and Real-Time Updates
  2. HTTP Client and External APIs
  3. Background Jobs and Durable Execution with Restate
  4. AI and LLM Integration
Infrastructure
  1. File Storage
  2. Email
  3. Configuration and Secrets
  4. Observability
Operations
  1. Testing
  2. Continuous Integration and Delivery
  3. Deployment
  4. Web Application Performance
Practices
  1. Rust Best Practices for Web Development
  2. Building with AI Coding Agents

Getting Started

Development Environment

Run Rust natively on your host machine. Run backing services (PostgreSQL, Valkey, Restate, RustFS, MailCrab) in Docker containers. This separation keeps your edit-compile-run cycle fast while giving you disposable, reproducible infrastructure.

Rust Toolchain

Install rustup, which manages your Rust compiler, standard library, and development tools.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

The default installation profile includes rustc, cargo, clippy, and rustfmt. Add rust-analyzer (the language server) and rust-src (standard library source, needed for full rust-analyzer functionality) separately:

rustup component add rust-analyzer rust-src

Verify the installation:

rustc --version
cargo --version

Keep everything current with rustup update. Rust releases a new stable version every six weeks.

What each tool does

  • rustc compiles Rust source code. You rarely invoke it directly; cargo handles it.
  • cargo builds, tests, runs, and manages dependencies. It is the entry point for nearly every Rust workflow.
  • clippy is the official linter. Run cargo clippy to catch common mistakes and non-idiomatic patterns.
  • rustfmt formats code to a consistent style. Run cargo fmt to format, cargo fmt -- --check to verify without modifying files.
  • rust-analyzer provides IDE features (completions, diagnostics, go-to-definition, refactoring) via the Language Server Protocol. Any editor or AI coding agent with LSP support can use it.

Backing Services with Docker Compose

The application depends on five external services during development. Run them in containers so they are disposable and require no host-level installation.

ServiceImagePortsPurpose
PostgreSQLpostgres:18-alpine5432Primary database
Valkeyvalkey/valkey:9-alpine6379Pub/sub and caching
Restatedocker.restate.dev/restatedev/restate:latest8080, 9070, 9071Durable execution engine
RustFSrustfs/rustfs:latest9000, 9001S3-compatible object storage
MailCrabmarlonb/mailcrab:latest1025 (SMTP), 1080 (Web UI)Email capture for testing

Create compose.yaml at the project root:

services:
  postgres:
    image: postgres:18-alpine
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: app_dev
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app"]
      interval: 5s
      timeout: 5s
      retries: 5

  valkey:
    image: valkey/valkey:9-alpine
    ports:
      - "6379:6379"
    volumes:
      - valkeydata:/data

  restate:
    image: docker.restate.dev/restatedev/restate:latest
    ports:
      - "8080:8080"
      - "9070:9070"
      - "9071:9071"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - restatedata:/target

  rustfs:
    image: rustfs/rustfs:latest
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      RUSTFS_ROOT_USER: minioadmin
      RUSTFS_ROOT_PASSWORD: minioadmin
    volumes:
      - rustfsdata:/data

  mailcrab:
    image: marlonb/mailcrab:latest
    ports:
      - "1080:1080"
      - "1025:1025"

volumes:
  pgdata:
  valkeydata:
  restatedata:
  rustfsdata:

Start all services:

docker compose up -d

Stop containers (data persists in named volumes):

docker compose down

Stop and destroy everything, including data:

docker compose down -v

Service notes

Valkey is the BSD-licensed fork of Redis, maintained by the Linux Foundation. It is fully API-compatible with Redis, so any Redis client library works without changes. The guide uses Valkey because its licence is unambiguous.

Restate is a durable execution engine for reliable background work, workflows, and agentic AI. The extra_hosts entry allows Restate (running inside Docker) to reach your application (running on the host) via host.docker.internal. Use this hostname instead of localhost when registering service deployments with the Restate admin API on port 9070.

RustFS is an S3-compatible object storage server written in Rust, licensed under Apache 2.0. It replaces MinIO, which entered maintenance mode in December 2025. RustFS is still in alpha but functional for local development. Its web console is available at http://localhost:9001.

MailCrab captures all email sent to it. Configure your application’s SMTP to point at localhost:1025, then view captured messages at http://localhost:1080. No email leaves your machine.

Docker runtime

Any Docker-compatible runtime works: Docker Desktop, OrbStack (macOS), Colima (macOS/Linux), or Podman. The docker compose commands behave identically across all of them.

cargo xtask

cargo xtask is a convention for writing project automation as a Rust binary inside your workspace. Instead of shell scripts or Makefiles, your build tasks are Rust code: checked by the compiler, cross-platform, and requiring no external tooling beyond cargo.

The pattern works by defining a cargo alias that runs a dedicated crate.

Setup

Create the alias in .cargo/config.toml:

[alias]
xtask = "run --package xtask --"

Add an xtask crate to your workspace. In the root Cargo.toml:

[workspace]
resolver = "2"
members = ["app", "xtask"]
default-members = ["app"]

default-members prevents cargo build and cargo test from compiling the xtask crate unless explicitly requested.

Create xtask/Cargo.toml:

[package]
name = "xtask"
version = "0.1.0"
edition = "2024"
publish = false

[dependencies]
clap = { version = "4", features = ["derive"] }
xshell = "0.2"
anyhow = "1"

xshell provides shell-like command execution without invoking an actual shell. Variable interpolation is safe by construction, preventing injection.

Create xtask/src/main.rs:

use std::process::ExitCode;

use anyhow::Result;
use clap::{Parser, Subcommand};
use xshell::{cmd, Shell};

#[derive(Parser)]
#[command(name = "xtask")]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// Start backing services and the dev server
    Dev,
    /// Run database migrations
    Migrate,
    /// Run all CI checks locally
    Ci,
}

fn main() -> ExitCode {
    let cli = Cli::parse();
    let result = match cli.command {
        Command::Dev => dev(),
        Command::Migrate => migrate(),
        Command::Ci => ci(),
    };
    match result {
        Ok(()) => ExitCode::SUCCESS,
        Err(e) => {
            eprintln!("error: {e:?}");
            ExitCode::FAILURE
        }
    }
}

fn dev() -> Result<()> {
    let sh = Shell::new()?;
    cmd!(sh, "docker compose up -d").run()?;
    cmd!(sh, "bacon run").run()?;
    Ok(())
}

fn migrate() -> Result<()> {
    let sh = Shell::new()?;
    cmd!(sh, "cargo sqlx migrate run").run()?;
    Ok(())
}

fn ci() -> Result<()> {
    let sh = Shell::new()?;
    cmd!(sh, "cargo fmt --all -- --check").run()?;
    cmd!(sh, "cargo clippy --all-targets -- -D warnings").run()?;
    cmd!(sh, "cargo nextest run").run()?;
    Ok(())
}

Run tasks with:

cargo xtask dev       # start services + dev server
cargo xtask migrate   # run database migrations
cargo xtask ci        # fmt check, clippy, tests

Add subcommands as your project grows. Common additions: seed (populate development data), reset (drop and recreate the database), build-css (run lightningcss processing).

Editor Configuration

Any editor with Language Server Protocol support works for Rust development. Install the rust-analyzer extension or plugin for your editor of choice.

The following rust-analyzer settings matter for this stack. Apply them through your editor’s LSP configuration.

{
  "rust-analyzer.check.command": "clippy",
  "rust-analyzer.procMacro.enable": true,
  "rust-analyzer.cargo.buildScripts.enable": true,
  "rust-analyzer.check.allTargets": true
}

check.command: "clippy" runs clippy instead of cargo check on save, giving you lint feedback inline. Slightly slower on large workspaces, but the additional warnings are worth it.

procMacro.enable: true is critical for this stack. Maud’s html! macro, serde’s derive macros, and SQLx’s query! macro are all procedural macros. Without this setting, rust-analyzer cannot expand them, resulting in false errors and missing completions inside macro invocations.

cargo.buildScripts.enable: true ensures build scripts run during analysis. SQLx’s compile-time query checking depends on this.

check.allTargets: true includes tests, examples, and benchmarks in diagnostic checking.

Fast Iteration

bacon

bacon watches your source files and runs cargo commands on every change. It replaces the older cargo-watch, which is no longer actively developed (its maintainer recommends bacon).

Install it:

cargo install --locked bacon

Run it:

bacon           # defaults to cargo check
bacon clippy    # run clippy on every change
bacon test      # run tests on every change
bacon run       # build and run on every change

bacon provides a TUI with sorted, filtered diagnostics. Press t to switch to tests, c to switch to clippy, r to run the application. The full set of keyboard shortcuts is shown in the interface.

For project-specific jobs, create a bacon.toml at the workspace root:

[jobs.run]
command = ["cargo", "run"]
watch = ["src"]

[jobs.test-integration]
command = ["cargo", "nextest", "run", "--test", "integration"]
watch = ["src", "tests"]

Linking

On Linux with Rust 1.90+, the compiler uses lld (the LLVM linker) by default. This is significantly faster than the traditional system linker and requires no configuration.

On macOS, Apple’s default linker is adequate. No special setup is needed.

Incremental compilation

Cargo enables incremental compilation by default for debug builds. After the initial compile, changing a single file typically triggers a rebuild of only the affected crate and its dependents.

Two practices keep incremental rebuilds fast:

  • Split your workspace into focused crates. A change in one crate does not recompile unrelated crates. The Project Structure section covers this in detail.
  • Keep macro-heavy code in leaf crates. Procedural macro expansion is one of the slower compilation phases. Isolating it limits the rebuild radius.

cargo-nextest

cargo-nextest is a test runner that executes tests in parallel across separate processes. It is noticeably faster than cargo test on projects with more than a handful of tests, and its output is easier to read.

cargo install --locked cargo-nextest
cargo nextest run

Doctests are not supported by nextest. Run them separately with cargo test --doc.

Project Structure

A Cargo workspace groups multiple crates under a single Cargo.lock and shared target/ directory. Each crate has its own Cargo.toml and its own dependency list, which means the compiler enforces boundaries between crates: if crates/domain/Cargo.toml does not list sqlx, no code in that crate can import it. This is not a convention. It is a compilation error.

Splitting a project into workspace crates gives you faster incremental builds (changing one crate does not recompile unrelated ones), enforced dependency boundaries, and a clear map of what depends on what.

Workspace layout

Use a virtual manifest, a root Cargo.toml that contains [workspace] but no [package]. All application crates live under crates/:

my-app/
  Cargo.toml              # virtual manifest (workspace root)
  Cargo.lock
  .cargo/
    config.toml            # cargo aliases (xtask)
  crates/
    server/                # binary: composition root
    web/                   # library: Axum handlers, routing, middleware
    db/                    # library: SQLx queries and database access
    domain/                # library: shared types, business logic
    config/                # library: environment variable parsing
    jobs/                  # library: Restate durable execution handlers
    xtask/                 # binary: build automation (dev, migrate, ci)
  migrations/              # SQLx migration files
  compose.yaml             # backing services (Postgres, Valkey, etc.)
  .env                     # local environment variables

The flat crates/* layout is the simplest approach. Cargo’s crate namespace is flat, so hierarchical folder structures (like crates/libs/ and crates/services/) add visual complexity that does not map to anything Cargo understands. Put everything under crates/ and use the crate names to communicate purpose.

What each crate does

server is the binary crate and the composition root. It depends on every other crate and wires them together at startup: builds the database pool, constructs the Axum router, starts the HTTP listener. This is the only crate that sees the full dependency graph.

web contains Axum handlers, route definitions, middleware configuration, and Maud templates. It depends on domain for shared types and on db for data access. All HTTP-facing code lives here.

db owns all database access. SQLx queries, connection pool management, and result-to-type mappings belong in this crate. It depends on domain for the types that queries return.

domain holds types and logic shared across the application: entity structs, error enums, validation rules, and any business logic that is not tied to a specific framework. This crate should have minimal dependencies. It does not depend on Axum, SQLx, or any infrastructure crate.

config parses environment variables into typed configuration structs at startup. It depends on serde and dotenvy, not on framework crates.

jobs contains Restate service and workflow handlers for durable background work. It depends on domain and db, but not on web. Jobs are triggered by HTTP handlers but execute independently.

xtask is the build automation crate. The Development Environment section covers its setup in detail.

Root Cargo.toml

The workspace root defines shared settings, dependency versions, and lint configuration for all members.

[workspace]
members = ["crates/*"]
resolver = "3"

[workspace.package]
edition = "2024"
version = "0.1.0"
rust-version = "1.85"

[workspace.dependencies]
# Async runtime
tokio = { version = "1", features = ["rt-multi-thread", "macros", "signal"] }

# Web framework
axum = "0.8"
tower = "0.5"
tower-http = { version = "0.6", features = ["trace", "compression-gzip"] }
tower-sessions = "0.14"

# HTML templating
maud = { version = "0.26", features = ["axum"] }

# Serialisation
serde = { version = "1", features = ["derive"] }
serde_json = "1"

# Database
sqlx = { version = "0.8", default-features = false, features = [
  "runtime-tokio", "postgres", "macros", "migrate",
] }

# Error handling
thiserror = "2"
anyhow = "1"

# Observability
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

# Config
dotenvy = "0.15"

# HTTP client
reqwest = { version = "0.12", default-features = false, features = [
  "rustls-tls", "json",
] }

# Internal crates
app-server = { path = "crates/server" }
app-web = { path = "crates/web" }
app-db = { path = "crates/db" }
app-domain = { path = "crates/domain" }
app-config = { path = "crates/config" }
app-jobs = { path = "crates/jobs" }

[workspace.lints.rust]
unsafe_code = "forbid"
rust_2018_idioms = { level = "warn", priority = -1 }
unreachable_pub = "warn"

[workspace.lints.clippy]
enum_glob_use = "warn"
implicit_clone = "warn"
dbg_macro = "warn"

Workspace dependencies

The [workspace.dependencies] table defines dependency versions once. Member crates reference them with workspace = true:

# crates/web/Cargo.toml
[package]
name = "app-web"
edition.workspace = true
version.workspace = true

[lints]
workspace = true

[dependencies]
axum.workspace = true
maud.workspace = true
tower.workspace = true
tower-http.workspace = true
tower-sessions.workspace = true
serde.workspace = true
tracing.workspace = true
app-domain.workspace = true
app-db.workspace = true

Members can add features on top of the workspace definition. Features are additive: you can add but not remove them.

# crates/db/Cargo.toml
[dependencies]
sqlx = { workspace = true, features = ["uuid", "time"] }

Workspace lints

The [workspace.lints] table shares lint configuration across all crates. Each member opts in with [lints] workspace = true. The example above forbids unsafe code project-wide and enables several useful Clippy lints.

Workspace package metadata

[workspace.package] avoids repeating edition, version, and rust-version in every crate. Members inherit with edition.workspace = true, and so on. Only unpublished, internal crates should share a version this way. If you publish crates to crates.io, give them independent version numbers.

The dependency graph

The crate dependency graph for this layout looks like this:

server ──→ web ──→ domain
  │         │
  │         └────→ db ──→ domain
  │
  ├──────→ db
  ├──────→ config
  ├──────→ jobs ──→ domain
  │         │
  │         └────→ db
  └──────→ domain

domain sits at the bottom with no framework dependencies. db depends on domain and sqlx. web depends on domain, db, and axum. server depends on everything and wires it all together.

This graph is enforced by Cargo.toml files. If someone adds an axum import to the domain crate, the compiler rejects it. No linting rules or code review discipline required.

The domain crate

Keep domain lean. It holds types that multiple crates need: entity structs, ID types, error enums, validation logic. It depends on serde for serialisation and thiserror for error types. It does not depend on Axum, SQLx, Maud, or Tokio.

# crates/domain/Cargo.toml
[package]
name = "app-domain"
edition.workspace = true
version.workspace = true

[lints]
workspace = true

[dependencies]
serde.workspace = true
thiserror.workspace = true

A typical domain crate:

use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Contact {
    pub id: i64,
    pub name: String,
    pub email: String,
}

#[derive(Debug, Deserialize)]
pub struct CreateContact {
    pub name: String,
    pub email: String,
}

#[derive(Debug, thiserror::Error)]
pub enum ContactError {
    #[error("contact not found")]
    NotFound,
    #[error("email already exists")]
    DuplicateEmail,
}

Other crates import these types. The db crate maps SQL rows to Contact. The web crate uses CreateContact to deserialise form submissions. Neither crate defines these types itself, so there is a single source of truth.

The server crate

The binary crate has one job: connect everything and start listening.

# crates/server/Cargo.toml
[package]
name = "app-server"
edition.workspace = true
version.workspace = true

[lints]
workspace = true

[dependencies]
tokio.workspace = true
axum.workspace = true
tracing.workspace = true
tracing-subscriber.workspace = true
app-web.workspace = true
app-db.workspace = true
app-config.workspace = true
use anyhow::Result;
use tracing_subscriber::EnvFilter;

#[tokio::main]
async fn main() -> Result<()> {
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();

    let config = app_config::load()?;
    let db = app_db::connect(&config.database_url).await?;
    let app = app_web::router(db);

    let listener = tokio::net::TcpListener::bind(&config.listen_addr).await?;
    tracing::info!("listening on {}", config.listen_addr);
    axum::serve(listener, app).await?;

    Ok(())
}

This is deliberately thin. Route definitions, middleware, and handler logic live in the web crate. The server crate only constructs dependencies and passes them in.

default-members

Set default-members in the workspace root to control which crates cargo build and cargo run operate on by default:

[workspace]
members = ["crates/*"]
default-members = ["crates/server"]
resolver = "3"

With this setting, cargo run starts the server without needing -p app-server. The xtask crate and other library crates only compile when explicitly requested or pulled in as dependencies.

When to split into more crates

Start with fewer crates than you think you need. A single lib crate alongside server and xtask is a reasonable starting point. Split when you have a concrete reason:

  • Compile times. A change in one module triggers recompilation of unrelated code. Splitting into separate crates isolates the rebuild radius.
  • Dependency sprawl. A module pulls in heavy dependencies that most of the codebase does not need. Moving it to its own crate keeps those dependencies contained.
  • Independent deployment. A Restate worker or CLI tool needs to share domain types with the web server but should not pull in Axum.
  • Team boundaries. Different people or teams own different parts of the system and want clear interfaces between them.

Do not split pre-emptively. Each new crate adds a Cargo.toml to maintain and a boundary to design. Split when the cost of staying in one crate (slow builds, tangled dependencies) exceeds the cost of the boundary.

Feature unification

Cargo unifies features of shared dependencies across all workspace members. If the web crate enables sqlx/postgres and the jobs crate enables sqlx/uuid, both features are active everywhere. Features are additive, so this usually works fine. It becomes a problem only if two crates need genuinely incompatible configurations of the same dependency, which is rare in practice.

Resolver 3 (the default with edition = "2024") already avoids unifying features across different target platforms, which eliminates the most common source of unexpected feature activation.

Architecture

Why Hypermedia-Driven Architecture

This section makes the case for hypermedia-driven architecture (HDA) as the default approach to building web applications. The arguments here are opinionated but grounded in the original definition of REST, the economics of framework migration, and the structural properties of HTML as a transfer format.

The technical implementation follows in later sections. This one answers the prior question: why build this way at all?

REST was always about hypermedia

Roy Fielding’s 2000 doctoral dissertation, Architectural Styles and the Design of Network-based Software Architectures, defined REST as “an architectural style for distributed hypermedia systems.” The word hypermedia is not incidental. It is the subject of the entire architecture.

Chapter 5 of the dissertation specifies four interface constraints for REST. The fourth is HATEOAS: Hypermedia As The Engine of Application State. Server responses carry both data and navigational controls. The client does not hardcode knowledge of available actions. It discovers them through hypermedia links and forms embedded in the response. HTML is the canonical format that satisfies this constraint: an HTML page contains both content and the controls (links, forms, buttons) that drive state transitions.

JSON carries no native hypermedia controls. A JSON response like {"name": "Alice", "email": "alice@example.com"} contains data but no affordances. The client must know in advance what URLs to call, what HTTP methods to use, and what payloads to send. This requires out-of-band documentation and tight client-server coupling, which is precisely what REST’s uniform interface constraint was designed to prevent.

By 2008, the drift had become bad enough that Fielding wrote a blog post titled “REST APIs must be hypertext-driven”:

I am getting frustrated by the number of people calling any HTTP-based interface a REST API. […] If the engine of application state (and hence the API) is not being driven by hypertext, then it cannot be RESTful and cannot be a REST API. Period.

The industry ignored him. The Richardson Maturity Model, popularised by Martin Fowler, formalised REST into “levels.” Most developers stopped at Level 2 (HTTP verbs and resource URLs) and never implemented Level 3 (hypermedia controls). When JSON replaced XML as the dominant transfer format, the “REST” label stuck even though the defining constraint had been dropped. What the industry calls a “RESTful API” is, by Fielding’s definition, RPC with nice URLs.

This matters because the original REST architecture was designed to solve real problems: evolvability, loose coupling, and independent deployment of client and server. Those problems did not go away when the industry adopted JSON APIs. The solutions were simply abandoned.

The HDA architecture defined

A hypermedia-driven application (HDA) returns HTML from the server, not JSON. The term comes from Carson Gross, creator of htmx, and is defined in detail in the book Hypermedia Systems and on the htmx website.

The architecture has two constraints:

  1. Hypermedia communication. The server responds to HTTP requests with HTML. The client renders it. There is no JSON serialisation layer, no client-side data model, and no mapping between API responses and UI state. The HTML is the interface.

  2. Declarative interactivity. HTML-embedded attributes (such as htmx’s hx-get, hx-post, hx-swap) drive dynamic behaviour. The developer declares what should happen in the markup rather than writing imperative JavaScript to manage requests, state, and DOM updates.

The key mechanism is partial page replacement. When the user interacts with an element, the browser sends an HTTP request and receives an HTML fragment. That fragment replaces a targeted region of the DOM. The server controls what the user sees next, because the server produces the HTML. The client is a rendering engine, not an application runtime.

This eliminates an entire layer of software. In a typical SPA, the server serialises data to JSON, the client deserialises it, maps it into a state store, derives a virtual DOM from that state, and diffs it against the real DOM. In HDA, the server renders HTML and the browser displays it. The serialisation, deserialisation, state management, and virtual DOM diffing layers do not exist because they are not needed.

An HDA is not a traditional multi-page application with full page reloads on every click. The partial replacement model provides the same responsiveness that SPAs deliver, but the interactivity logic lives on the server rather than in client-side JavaScript.

The coupling advantage

Each endpoint in an HDA produces self-contained HTML. A handler for GET /contacts/42/edit returns an edit form. That form contains the data, the input fields, the validation rules (via HTML5 attributes), and the submit action (via the form’s action attribute or htmx attributes). Everything the client needs is in the response. There is no shared state to coordinate with.

SPA architectures centralise client-side state. React applications commonly use a global state store (Redux, Zustand, Jotai, or React Context) to hold data that multiple components need. This creates a coupling pattern: when you change the shape of data in the store, every component that reads or writes that data must be updated.

Redux’s single-store design has been criticised for exhibiting the God Object anti-pattern, where a single entity becomes tightly coupled to much of the codebase. Changes intended to benefit one feature create ripple effects in unrelated features. The React-Redux community documented this problem: hooks encourage tight coupling between Redux state shape and component internals, reducing testability and violating the single responsibility principle.

The single-spa project (a framework for combining multiple SPAs) explicitly warns against sharing Redux stores across micro-frontends: “if you find yourself needing constant sharing of UI state, your microfrontends are likely more coupled than they should be.” This is an acknowledgement from within the SPA ecosystem that centralised client state creates coupling problems.

In HDA, the coupling boundary is the HTTP response. Each response is stateless and self-contained. The server can change the HTML structure of one endpoint without affecting any other endpoint, because there is no shared client-side state that binds them together. Two developers can modify two different pages concurrently with zero coordination. This property is structural, not a matter of discipline. It falls out automatically from the architecture.

The framework migration tax

JavaScript framework churn imposes a recurring cost on every project built with a client-side framework.

AngularJS to Angular 2+. React class components to hooks to server components. Vue 2 to Vue 3. Each major transition changes fundamental patterns: how components are defined, how state is managed, how side effects are handled. Code written against the old patterns must be rewritten, not just updated.

A peer-reviewed study by Ferreira, Borges, and Valente (On the (Un-)Adoption of JavaScript Front-end Frameworks, published in Software: Practice and Experience, 2021) examined 12 open-source projects that performed framework migrations. The findings:

  • The time spent performing the migration was greater than or equal to the time spent using the old framework in all 12 projects.
  • In 5 of the 12 projects, the time spent migrating exceeded the time spent using both the old and new frameworks combined.
  • Migration durations ranged from 7 days to 966 days.

AngularJS reached end-of-life on 31 December 2021. Three years later, BuiltWith reports over one million live websites still running AngularJS. WebTechSurvey puts the figure above 500,000. The exact count varies by measurement method, but the order of magnitude is clear: hundreds of thousands of applications remain on a deprecated, unpatched framework because migrating to Angular 2+ requires a near-complete rewrite of the client-side codebase.

This is not a one-time problem. React’s transition from class components to hooks changed every component pattern in the ecosystem. The ongoing shift toward React Server Components is changing the execution model itself, blurring the boundary between server and client in ways that require rethinking application architecture. Each transition resets knowledge, breaks libraries, and forces rewrites.

The migration tax is a structural property of the SPA model: when interactivity logic lives in client-side JavaScript tied to a specific framework’s component model, that logic must be rewritten whenever the framework’s model changes. HDA does not eliminate the need to stay current with server-side tools, but server-side framework transitions (switching from one Rust web framework to another, for example) affect route definitions and middleware, not the fundamental rendering model. The HTML your server produces is the same regardless of which framework generates it.

The backward-compatibility guarantee

No HTML element has ever been removed from the specification in a way that breaks rendering.

The WHATWG HTML Standard, which governs HTML as a living specification, lists obsolete elements including <marquee>, <center>, <font>, <frame>, and <acronym>. Authors are told not to use them. But the specification still mandates that browsers render them. <marquee> has a complete interface specification (HTMLMarqueeElement) with defined behaviour. <acronym> must be treated as equivalent to <abbr> for rendering purposes. These elements work in every modern browser because the spec requires it.

This is not accidental. It is policy. The W3C HTML Design Principles document establishes a priority of constituencies: “In case of conflict, consider users over authors over implementors over specifiers over theoretical purity.” Backward compatibility flows directly from this principle: breaking existing content harms users, so the specification does not break existing content.

The WHATWG’s founding position reinforces this:

Technologies need to be backwards compatible, that specifications and implementations need to match even if this means changing the specification rather than the implementations.

An application built on HTML, CSS, and HTTP in 2026 can reasonably expect its platform foundation to remain stable for decades. The same HTML that rendered in Netscape Navigator still renders in Chrome today. No JavaScript framework has provided, or can provide, a comparable guarantee. React is 12 years old and has undergone three major paradigm shifts. The <form> element is 31 years old and works exactly as it did in 1995, with additional capabilities layered on top.

This is the core durability argument for HDA. Your investment in HTML templates, HTTP handlers, and declarative interactivity attributes is protected by the strongest backward-compatibility commitment in software: the web platform’s refusal to break existing content.

No separate API layer

In HDA, the HTML response is the API. There is no JSON layer to design, version, document, or maintain.

A traditional SPA architecture requires two applications: a client-side app that renders UI, and a server-side API that produces JSON. These are developed, tested, deployed, and versioned as separate artefacts with a contract between them. When the contract changes, both sides must change in coordination.

HDA collapses this into one application. An Axum handler receives a request, queries the database, renders HTML with Maud, and returns it. The browser displays the HTML. There is one codebase, one deployment, one thing to reason about.

This has practical consequences:

  • No API versioning. The server controls the HTML. If the data model changes, the server updates the template. There is no external consumer relying on a JSON schema.
  • No serialisation code. No serde annotations on response types, no JSON schema validation on the client, no mapping between API responses and component props.
  • No CORS configuration. The browser requests HTML from the same origin that served the page. Cross-origin issues do not arise.
  • Faster feature delivery. Adding a field to a page means adding it to the query and the template. In an SPA, it means updating the API response, the TypeScript types, the state store, and the component that renders it.

The reduction in moving parts is not incremental. It is categorical. An entire class of bugs (schema mismatches, stale client caches, API versioning conflicts) cannot occur because the architecture does not have the layers where those bugs live.

When you do need a separate API

HDA does not mean you never write JSON endpoints. It means JSON is not the default, and HTML handles the majority of your application’s interface.

There are legitimate cases where a JSON API is the right tool:

  • Third-party integrations. External services that call your application (payment webhooks, OAuth callbacks, partner integrations) communicate in JSON. These are not UI interactions; they are machine-to-machine interfaces.
  • Mobile applications. If you ship a native mobile app alongside your web application, the mobile client needs a data API. HDA applies to the web interface; the mobile interface has different constraints.
  • Public APIs. If your product offers an API as a feature (for customers to build integrations), that API will be JSON and needs the usual API design treatment: versioning, documentation, authentication, rate limiting.
  • Islands of rich interactivity. Some UI components genuinely need client-side state: a drag-and-drop kanban board, a collaborative text editor, a real-time data visualisation. These components can fetch JSON from dedicated endpoints while the rest of the application uses HDA. This is the islands pattern, covered in When to Use HDA.

The principle is straightforward: use HTML for the interface, JSON for integrations. Most web applications are overwhelmingly interface. The JSON endpoints, when needed, are a small surface area alongside the HDA core, not a parallel architecture that doubles the codebase.

The Web Platform Has Caught Up

Between 2022 and 2026, the web platform crossed a capability threshold. Native CSS and HTML features now provide the functionality that historically justified adopting a CSS preprocessor, a utility framework, a CSS-in-JS library, or a JavaScript UI component system. No single feature is transformative. The cumulative effect is that the problems requiring these tools in 2020 can be solved with the platform itself in 2026.

This section catalogues what changed and why it matters for the architectural choice described in Why Hypermedia-Driven Architecture. The HDA model depends on the platform being capable enough that server-rendered HTML, plain CSS, and minimal JavaScript can deliver a production-quality experience. That dependency is now met.

The Interop Project

Cross-browser inconsistency was a primary driver of framework and preprocessor adoption. Developers reached for jQuery, Sass, Autoprefixer, and eventually React because writing to the platform directly meant writing to four different platforms with different bugs. The Interop Project has largely eliminated this rationale.

Interop is a joint initiative of Apple, Google, Igalia, Microsoft, and Mozilla, running annually since 2021 (initially as “Compat 2021”). Each year, the participants agree on a set of web platform features, write shared test suites via the Web Platform Tests project, and publicly track each browser engine’s pass rate. The Interop dashboard reports a single “interop score”: the percentage of tests that pass in all browsers simultaneously.

The scores tell the story:

YearStarting interop scoreEnd-of-year (stable)End-of-year (experimental)
Compat 202164-69%>90%–
Interop 2022~49%83%~97%
Interop 2023~48%75%89%
Interop 202446%95%99%
Interop 202529%97%99%

The low starting scores each year reflect the selection of new focus areas, not regression. Each iteration targets harder, more recent features. That Interop 2025 started at 29% and finished at 97% in stable releases means the browser vendors are converging on new features within a single calendar year.

WebKit’s review of Interop 2025 described the result directly: “Every browser engine invested heavily, and the lines converge at the top. That convergence is what makes the Interop project so valuable, the shared progress that means you can write code once and trust that it works everywhere.”

Interop 2026 launched in February 2026 with 20 focus areas including cross-document view transitions, scroll-driven animation timelines, and continued anchor positioning alignment. The initiative is now in its fifth consecutive year with no signs of winding down.

The practical consequence: if you write CSS and HTML to the current specifications, it works in Chrome, Firefox, Safari, and Edge. The “works in my browser but not yours” problem that drove an entire generation of tooling adoption is, for the features that matter most, solved.

CSS features that replace frameworks

Eight CSS features, all shipping between 2022 and 2026, collectively address the problems that justified Sass, Less, PostCSS, Tailwind, CSS-in-JS, and JavaScript positioning libraries.

Cascade Layers (@layer)

Cascade Layers provide explicit control over cascade priority, independent of selector specificity or source order. All major browsers shipped support within five weeks of each other in early 2022. @layer reached Baseline Widely Available in September 2024.

@layer reset, base, components, utilities;

@layer reset {
  * { margin: 0; box-sizing: border-box; }
}

@layer components {
  .card { padding: 1rem; border: 1px solid #ddd; }
}

@layer utilities {
  .hidden { display: none; }
}

Styles in later-declared layers always win over earlier layers, regardless of specificity. This replaces the specificity arms race that led to !important abuse, strict BEM naming conventions, and CSS-in-JS libraries whose primary value proposition was specificity isolation. Styles outside any @layer have the highest priority, which allows third-party CSS to be layered below application styles without modification.

CSS Nesting

CSS Nesting reached Baseline Newly Available in December 2023, when Chrome 120 and Safari 17.2 shipped the relaxed syntax (Firefox 117 had shipped in August 2023).

.card {
  padding: 1rem;

  h2 {
    font-size: 1.25rem;
  }

  &:hover {
    box-shadow: 0 2px 8px rgb(0 0 0 / 0.1);
  }

  @media (width >= 768px) {
    padding: 2rem;
  }
}

This is the feature that eliminated the most common reason for using Sass or Less. The relaxed nesting syntax (no & required before element selectors) matches what preprocessor users expect. Media queries and other at-rules can nest directly inside selectors.

Container Queries

Container Queries reached Baseline Widely Available in August 2025. Firefox 110 was the last browser to ship, completing Baseline in February 2023.

.card-container {
  container-type: inline-size;
}

@container (inline-size > 400px) {
  .card {
    display: grid;
    grid-template-columns: 200px 1fr;
  }
}

Media queries respond to the viewport. Container queries respond to the size of the containing element. This makes components genuinely reusable: a card component that switches from stacked to horizontal layout based on its container width, not the window width. Previously, achieving this required JavaScript ResizeObserver workarounds or abandoning the idea entirely.

Size container queries are the Baseline part. Style container queries (@container style(...)) remain Chromium-only as of early 2026.

The :has() selector

:has() reached Baseline Newly Available in December 2023, when Firefox 121 shipped (Safari had led in March 2022, Chrome followed in August 2022).

/* Style a card differently when it contains an image */
.card:has(img) {
  grid-template-rows: 200px 1fr;
}

/* Style a form group when its input is invalid */
.form-group:has(:invalid) {
  border-color: var(--color-error);
}

/* Style a section when it has no content */
section:has(> :only-child) {
  padding: 0;
}

:has() is the long-requested “parent selector,” though it is more general than that name implies. It selects an element based on its descendants, siblings, or any relational condition expressible as a selector. Before :has(), selecting a parent based on its children required JavaScript DOM traversal. Entire categories of conditional styling that needed classList.toggle() or framework-level reactivity can now be expressed in CSS alone.

@scope

@scope reached Baseline Newly Available in December 2025, when Firefox 146 shipped (Chrome 118 had led in October 2023, Safari 17.4 followed in March 2024).

@scope (.card) to (.card-footer) {
  p { margin-bottom: 0.5rem; }
  a { color: var(--card-link-color); }
}

@scope provides proximity-based style scoping with both an upper bound (the scope root) and an optional lower bound (the scope limit), creating a “donut scope” that prevents styles from leaking into nested sub-components. This addresses the problem that CSS Modules, BEM, and Shadow DOM each solved partially: keeping component styles from colliding. Unlike Shadow DOM, @scope does not create hard encapsulation boundaries, so styles remain inspectable and overridable when needed.

The cumulative effect

No single feature here replaces a framework. The replacement is structural.

In 2020, a developer building a component library needed: a preprocessor for nesting and variables (Sass), a naming convention or tooling for specificity management (BEM or CSS Modules), JavaScript for responsive component behaviour (ResizeObserver hacks), JavaScript for parent-based conditional styling (no :has()), and either strict discipline or a CSS-in-JS library to prevent style collisions.

In 2026, native CSS handles all of this. Nesting and custom properties replace the preprocessor. @layer replaces specificity management tooling. Container queries replace JavaScript resize detection. :has() replaces JavaScript conditional styling. @scope replaces CSS-in-JS scoping. The developer writes CSS, and it works across browsers.

HTML features that replace JavaScript UI primitives

The historical justification for React’s component model arose partly because HTML lacked native primitives for modals, tooltips, menus, and rich selects. Three of those gaps are now closed at Baseline. Two more are closing.

The <dialog> element

<dialog> reached Baseline Widely Available in approximately September 2024. Firefox 98 and Safari 15.4 completed cross-browser support in March 2022.

<dialog id="confirm-dialog">
  <h2>Delete this item?</h2>
  <p>This action cannot be undone.</p>
  <form method="dialog">
    <button value="cancel">Cancel</button>
    <button value="confirm">Delete</button>
  </form>
</dialog>

A modal <dialog> (opened via showModal()) provides focus trapping, top-layer rendering, backdrop styling via ::backdrop, the Escape key to close, and <form method="dialog"> for declarative close actions. These are the behaviours that every custom modal library (Bootstrap Modal, React Modal, a11y-dialog) reimplements in JavaScript. The native element provides them with correct accessibility semantics, including the dialog ARIA role and proper focus restoration on close, out of the box.

The Popover API

The Popover API reached Baseline Newly Available in January 2025 (Safari 18.3 resolved a light-dismiss bug on iOS that had delayed the designation).

<button popovertarget="menu">Options</button>
<div id="menu" popover>
  <a href="/settings">Settings</a>
  <a href="/profile">Profile</a>
  <a href="/logout">Log out</a>
</div>

The popover attribute gives any element top-layer rendering, light dismiss (click outside or press Escape to close), and automatic accessibility wiring. popover="auto" provides light dismiss; popover="manual" requires explicit close. This replaces Tippy.js, Bootstrap Popovers, and the custom JavaScript that every dropdown menu previously required.

The popover="hint" variant (for hover-triggered tooltips) is an Interop 2026 focus area and not yet Baseline.

Invoker Commands

Invoker Commands (command and commandfor attributes) reached Baseline Newly Available in early 2026, with Safari 26.2 completing cross-browser support after Chrome 135 (April 2025) and Firefox 144.

<button commandfor="my-dialog" command="show-modal">Open</button>
<dialog id="my-dialog">
  <p>Dialog content</p>
  <button commandfor="my-dialog" command="close">Close</button>
</dialog>

Invoker Commands connect a button to a target element declaratively: commandfor names the target, command specifies the action. Built-in commands include show-modal, close, and request-close for dialogs, and toggle-popover, show-popover, hide-popover for popovers. No JavaScript required for these interactions.

Combined with <dialog> and the Popover API, Invoker Commands eliminate the last bit of JavaScript glue that modals and popovers previously required. A dialog can be opened, populated, and closed entirely through HTML attributes and server-rendered content, which is exactly what HDA needs.

Gaps still closing

Two features listed in the outline remain Chromium-only as of February 2026:

Customizable <select> (appearance: base-select). Chrome 134+ and Edge 134+ ship full CSS styling of <select> elements, including custom option rendering via exposed pseudo-elements (::picker(select), selectedoption). Firefox and Safari are implementing but have not shipped to stable. This feature replaces React Select, Select2, and the entire category of custom dropdown libraries that exist because native <select> has been unstyled. The opt-in (appearance: base-select) means browsers without support simply show the default <select>, making it safe to adopt as progressive enhancement.

Speculation Rules API. Chrome 121+ supports declarative prefetch and prerender rules via <script type="speculationrules">. WordPress and Shopify have deployed it at scale. Firefox’s standards position is positive for prefetch but neutral on prerender; Safari has published no position. Non-supporting browsers ignore the <script> block entirely, so it can be deployed today without harm. For HDA applications, speculation rules offer the multi-page navigation speed that SPA prefetching provides, without any client-side routing framework.

Both features work as progressive enhancement: they improve the experience in supporting browsers without breaking others.

Progressive enhancement as the architectural default

The features above share a property: they degrade gracefully. A <dialog> without JavaScript still renders its content. A popover without support becomes a static element. A <select> without appearance: base-select falls back to the native control. This is not accidental. The web platform is designed around progressive enhancement.

Native HTML elements carry built-in ARIA semantics, focus management, and keyboard handling. A <dialog> opened with showModal() traps focus, responds to Escape, announces itself to screen readers, and restores focus to the triggering element on close. A <button> with commandfor and command attributes communicates its relationship to the target element through the accessibility tree. These behaviours are defined by the specification and implemented by the browser.

SPA component libraries must reimplement all of this. A React modal component needs explicit focus-trap logic, an Escape key handler, ARIA attributes, a portal to render in the correct DOM position, and focus restoration on unmount. Libraries like Radix UI and Headless UI exist specifically because implementing accessible interactive components in React is difficult. The native elements provide the same behaviours correctly by default.

In HDA, progressive enhancement is the structural default. The baseline is server-rendered HTML with standard links and forms. htmx attributes enhance but are not required; a form with hx-post and hx-swap still submits normally via the browser’s native form handling if htmx fails to load. In SPA frameworks, progressive enhancement is opt-in and, under deadline pressure, frequently abandoned.

No-build JavaScript

ES Modules (<script type="module">) have been supported in all major browsers since 2018 and are Baseline Widely Available. Import Maps reached Baseline Widely Available in approximately September 2025, with Safari 16.4 completing cross-browser support in March 2023.

Together, they enable npm-style bare specifier imports in the browser without npm, Node.js, or a bundler:

<script type="importmap">
{
  "imports": {
    "htmx": "/static/js/htmx.min.js",
    "alpinejs": "/static/js/alpine.min.js"
  }
}
</script>
<script type="module">
  import 'htmx';
</script>

Import maps resolve bare specifiers (import 'htmx') to URLs, the same job that webpack, Rollup, and esbuild perform during a build step. With import maps, the browser does this resolution at runtime. No bundler needed.

The trade-offs are real. There is no tree-shaking: unused code in imported modules ships to the client. No TypeScript compilation: types are stripped only if a build step runs. No code splitting: the browser loads entire modules rather than optimised chunks. For applications with large client-side dependency graphs, these costs matter.

For HDA applications, they do not. The client-side dependency count is typically small: htmx (14 KB gzipped), perhaps a date formatting library, perhaps a small charting library for a dashboard page. The total client-side JavaScript in an HDA application is measured in tens of kilobytes, not megabytes. HTTP/2 and HTTP/3 multiplexing further reduce the cost of serving a handful of small modules individually.

Some practitioners retain a build step for minification, but this is an optional optimisation, not an architectural requirement. The htmx project itself argues explicitly against build steps, distributing as a single file that can be included with a <script> tag. The no-build approach is not a compromise for HDA. It is the natural fit.

The supply chain security argument

The architectural choice to avoid npm is not only a simplicity argument. It is a security argument, grounded in the structural properties of the npm dependency graph and the empirical record of supply chain attacks against it.

The dependency graph problem

Zimmermann, Staicu, Tenny, and Pradel (Small World with High Risks: A Study of Security Threats in the npm Ecosystem, USENIX Security 2019) analysed npm’s dependency graph as of April 2018 and found small-world network properties: just 20 maintainer accounts could reach more than half of the entire ecosystem through transitive dependencies. Installing an average npm package implicitly trusts approximately 80 other packages and 39 maintainers. 391 highly influential maintainers each affected more than 10,000 packages.

A comparative study by Decan, Mens, and Grosjean (An Empirical Comparison of Dependency Network Evolution in Seven Software Packaging Ecosystems, Empirical Software Engineering, 2019) found npm had the highest transitive dependency counts among seven ecosystems. A more recent study by Biernat et al. (How Deep Does Your Dependency Tree Go?, December 2025) across ten ecosystems found that Maven now shows the highest mean amplification ratio (24.70x transitive-to-direct), with npm at 4.32x. npm is not the worst offender across all ecosystems, but it remains structurally exposed: 12% of npm projects exceed a 10x amplification ratio, and the absolute number of affected projects is enormous given npm’s scale.

The empirical record

The structural risk is not theoretical. Supply chain attacks against npm are recurring and escalating in sophistication.

event-stream (November 2018). A new maintainer, given publish access through social engineering, added a dependency on flatmap-stream containing encrypted malicious code targeting the Copay Bitcoin wallet. The package had approximately 2 million weekly downloads. The malicious code was live for over two months before a computer science student noticed it.

polyfill.io (June 2024). The polyfill.io CDN domain was acquired by a new owner in February 2024. Four months later, the CDN began serving modified JavaScript that redirected mobile users to scam sites. Over 380,000 websites were embedding scripts from the compromised domain. Andrew Betts, the original creator, had warned users when the sale occurred. Most did not act.

chalk/debug (September 2025). A phishing attack compromised the npm credentials of a maintainer of chalk, debug, and 16 other packages. The malicious versions contained code to hijack cryptocurrency transactions in browsers. The 18 affected packages accounted for over 2.6 billion combined weekly downloads. The malicious versions were live for approximately two hours.

These incidents share a structural cause: the npm ecosystem’s deep transitive dependency graphs mean that compromising a single package or maintainer account can reach thousands or millions of downstream projects. The risk scales with the number of dependencies.

The HDA alternative

An HDA application with vendored htmx eliminates this entire attack surface. htmx is 14 KB minified and gzipped, has zero dependencies, and is distributed as a single JavaScript file. There is no npm install step, no node_modules directory, no transitive dependency graph, and no exposure to registry-level supply chain attacks.

This is not an incremental improvement. A typical React application created with Vite installs approximately 270 packages, and projects using Create React App (now deprecated) routinely exceeded 1,500. Each package is a node in the dependency graph that the Zimmermann findings describe. Reducing that graph from hundreds of nodes to zero is a categorical change in supply chain risk profile.

The comparison is worth stating plainly. One architecture requires you to trust hundreds of packages, maintained by strangers, with update cadences you do not control, delivered through a registry that is a recurring target of supply chain attacks. The other architecture requires you to trust one 14 KB file that you can vendor, audit, and pin.

What this means for HDA

The web platform’s capability expansion between 2022 and 2026 is the material condition that makes hypermedia-driven architecture practical for production applications. The HDA model depends on three platform properties:

  1. CSS is sufficient for production UI. Nesting, container queries, cascade layers, :has(), and @scope collectively provide the capabilities that previously required a preprocessor, a utility framework, or CSS-in-JS.

  2. HTML provides interactive primitives. <dialog>, the Popover API, and Invoker Commands cover modals, tooltips, dropdowns, and declarative element interaction without JavaScript component libraries.

  3. The browser is a capable module system. ES Modules and Import Maps enable dependency management without a build tool, and the small dependency footprint of HDA applications makes the trade-offs (no tree-shaking, no code splitting) irrelevant.

The Interop Project ensures these features work consistently across browsers. The backward-compatibility guarantee described in the previous section ensures they will continue to work. And the elimination of the npm dependency graph provides a supply chain security posture that no framework-dependent architecture can match.

The web platform was not always adequate for building rich applications without frameworks. It is now.

SPA vs HDA: A Side-by-Side Comparison

The previous sections argued for hypermedia-driven architecture on structural grounds: coupling, migration cost, backward compatibility. This section puts code next to code. What does the same feature actually look like when built both ways, and what do published migrations tell us about the difference at scale?

What real migrations show

The strongest published data comes from Contexte, a SaaS product for media professionals built with React. In 2022, developer David Guillot presented the results of porting the application from React to Django templates with htmx:

MetricReactDjango + htmxChange
Total lines of code21,5007,200−67%
JavaScript dependencies2559−96%
Web build time40s5s−88%
First load time-to-interactive2–6s1–2s−50–60%
Memory usage~75 MB~45 MB−46%

The port took roughly two months to rewrite a codebase that had taken two years to build. The team eliminated the hard split between frontend and backend developers. User experience did not degrade.

Contexte is a media-oriented application, exactly the kind of content-driven, read-heavy workload that hypermedia was designed for. The htmx project acknowledges this: “These sorts of numbers would not be expected for every web application.” A separate Next.js to htmx port showed a 17% reduction in written application code and over 50% reduction in total shipped code when accounting for dependency weight.

The pattern across these migrations is consistent. The JSON serialisation layer disappears. Client-side state management disappears. The build toolchain disappears. The dependency graph collapses. What remains is server-side code that got somewhat larger (Contexte’s Python grew from 500 to 1,200 lines) and a total codebase that got dramatically smaller.

The same feature, two architectures

Consider a searchable contact list with inline editing and deletion. The specification is identical for both implementations:

  • Display contacts from a database
  • Live search with debounce (300ms)
  • Click a row to get an editable form
  • Delete with confirmation
  • All changes persist to the server

This is a bread-and-butter CRUD feature. Most web applications are made of features like this one.

SPA: React + Vite + REST API

The SPA approach requires two applications. A React client handles rendering and state. A server exposes JSON endpoints. They communicate through a serialisation boundary.

Search with debounce needs a custom hook or a library:

function useDebounce(value, delay) {
  const [debounced, setDebounced] = useState(value);
  useEffect(() => {
    const timer = setTimeout(() => setDebounced(value), delay);
    return () => clearTimeout(timer);
  }, [value, delay]);
  return debounced;
}

function ContactList() {
  const [query, setQuery] = useState('');
  const [contacts, setContacts] = useState([]);
  const [editingId, setEditingId] = useState(null);
  const debouncedQuery = useDebounce(query, 300);

  useEffect(() => {
    fetch(`/api/contacts?q=${debouncedQuery}`)
      .then(res => res.json())
      .then(setContacts);
  }, [debouncedQuery]);

  // ... render logic, edit mode toggling, delete handlers
}

The component manages three pieces of state: the search query, the contact list, and which row is being edited. Each state change triggers a re-render. The search query flows through a debounce hook, which triggers a fetch, which deserialises JSON, which updates state, which triggers another re-render. The edit mode is a client-side toggle: clicking a row sets editingId, and the component conditionally renders either a display row or a form row based on that state.

Inline editing requires the client to manage form state, submit JSON to the API, handle the response, and update the local contact list to reflect the change:

async function handleSave(contact) {
  const res = await fetch(`/api/contacts/${contact.id}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(contact),
  });
  const updated = await res.json();
  setContacts(prev =>
    prev.map(c => c.id === updated.id ? updated : c)
  );
  setEditingId(null);
}

The server side mirrors this with JSON endpoints:

app.get('/api/contacts', async (req, res) => {
  const contacts = await db.query(
    'SELECT * FROM contacts WHERE name ILIKE $1',
    [`%${req.query.q}%`]
  );
  res.json(contacts);
});

app.put('/api/contacts/:id', async (req, res) => {
  const { name, email } = req.body;
  const updated = await db.query(
    'UPDATE contacts SET name = $1, email = $2 WHERE id = $3 RETURNING *',
    [name, email, req.params.id]
  );
  res.json(updated[0]);
});

Every interaction crosses the serialisation boundary twice: the server serialises to JSON, the client deserialises, processes the data, and re-renders. The CORS configuration, the Content-Type headers, the JSON.stringify and res.json() calls are all infrastructure that exists solely because the client and server are separate applications communicating through a data format that carries no hypermedia controls.

The project also needs a build toolchain. A fresh React + Vite project installs Node.js, npm, Vite (which bundles esbuild for development and Rollup for production), a JSX transformer (Babel or SWC), and ESLint. The node_modules directory contains hundreds of transitive packages. Each is a separate project with its own release cycle.

HDA: Rust/Axum/Maud + htmx

The HDA approach is one application. The server handles everything: routing, data access, rendering, and interactivity declarations.

Search with debounce is a single HTML attribute:

fn search_input(query: &str) -> Markup {
    html! {
        input type="text" name="q" value=(query)
            hx-get="/contacts"
            hx-trigger="input changed delay:300ms"
            hx-target="#contact-list"
            placeholder="Search contacts...";
    }
}

No hook. No state. No effect. The hx-trigger attribute declares the debounce behaviour inline. When the user types, htmx waits 300ms after the last keystroke, sends a GET request, and swaps the response into #contact-list. The server returns an HTML fragment containing the filtered rows.

The search handler queries the database and renders HTML directly:

async fn list_contacts(
    State(pool): State<PgPool>,
    Query(params): Query<SearchParams>,
) -> Markup {
    let contacts = sqlx::query_as!(
        Contact,
        "SELECT * FROM contacts WHERE name ILIKE '%' || $1 || '%'",
        params.q.unwrap_or_default()
    )
    .fetch_all(&pool)
    .await
    .unwrap();

    html! {
        tbody#contact-list {
            @for contact in &contacts {
                (contact_row(contact))
            }
        }
    }
}

There is no JSON serialisation. The handler returns Markup, which Axum sends as an HTML response. The database result flows directly into the template. The query is checked at compile time by SQLx.

Inline editing is a template swap, not a state toggle. Clicking the edit button asks the server for an edit form:

fn contact_row(contact: &Contact) -> Markup {
    html! {
        tr {
            td { (contact.name) }
            td { (contact.email) }
            td {
                button hx-get={"/contacts/" (contact.id) "/edit"}
                    hx-target="closest tr"
                    hx-swap="outerHTML" { "Edit" }
            }
        }
    }
}

fn contact_edit_row(contact: &Contact) -> Markup {
    html! {
        tr {
            td {
                input type="text" name="name" value=(contact.name);
            }
            td {
                input type="text" name="email" value=(contact.email);
            }
            td {
                button hx-put={"/contacts/" (contact.id)}
                    hx-target="closest tr"
                    hx-swap="outerHTML"
                    hx-include="closest tr" { "Save" }
            }
        }
    }
}

The edit handler returns contact_edit_row, which replaces the display row. The save handler updates the database and returns contact_row, which replaces the edit form. No client-side state tracks which row is being edited. The server controls the UI by returning the appropriate HTML fragment.

The entire client-side dependency is htmx: a single 14 KB file (minified and gzipped) with zero dependencies. No build step. No node_modules. No package manager. Vendor the file and serve it from your Rust application.

Key observations

The comparison reveals differences that are structural, not incremental.

The JSON serialisation layer is eliminated entirely. In the SPA, every interaction crosses a serialisation boundary: JSON.stringify on the client, res.json() on the server, res.json() then setContacts() on the way back. In HDA, the handler returns HTML. The serialisation layer does not exist because the architecture does not need it.

Client-side state management disappears. The React component manages query, contacts, and editingId as state. Changes to any of these trigger re-renders. The htmx version has no client-side state at all. The server is the single source of truth, and every user action asks the server what to show next.

The dependency asymmetry is categorical. One side installs hundreds of packages through a package manager, maintained by hundreds of independent maintainers, each a potential supply chain risk. The other vendors a single file. The React runtime alone (~55 KB gzipped for React 19 + ReactDOM) is roughly four times the size of htmx (~14 KB gzipped), and that comparison ignores the entire build toolchain and its transitive dependencies.

The build toolchain is a complexity tax. The SPA needs Node.js, npm, Vite, esbuild, Rollup, and a JSX transformer to convert source files into something a browser can execute. The HDA serves HTML from a compiled Rust binary. The browser needs no build artefact because the server already produced what the browser understands natively: HTML.

What the SPA provides that HDA does not

The comparison above is favourable to HDA because this is a CRUD feature, and CRUD features are what HDA handles best. The SPA architecture has genuine strengths that should not be dismissed as irrelevant.

Component-level encapsulation with typed props. React components accept typed props and manage their own state in a well-defined scope. This composability model is genuinely powerful for building complex UIs. A component can be tested in isolation, rendered in a storybook, and reused across pages with different data. Maud functions provide similar composition, but the pattern is less formalised and has no equivalent to React’s developer tooling for component inspection.

React DevTools and the debugging experience. React DevTools lets you inspect the component tree, view props and state, trace re-renders, and profile performance. The htmx debugging experience is the browser’s network tab and the DOM inspector. For complex UIs, React’s tooling gives developers significantly better visibility into what the application is doing and why.

Client-side rendering avoids some server round-trips. When edit mode is a client-side state toggle, the UI updates instantly. No network request is needed to show a form. In HDA, clicking “Edit” sends a request to the server and waits for the response. On a fast connection, this difference is imperceptible. On a slow connection or for highly interactive interfaces, it matters.

The component library ecosystem is unmatched. Libraries like shadcn/ui and Radix provide production-quality, accessible UI primitives: dialogs, dropdowns, date pickers, data tables, command palettes. These components handle keyboard navigation, screen reader announcements, focus trapping, and edge cases that take significant effort to implement correctly. The HDA ecosystem has no equivalent at comparable maturity. If your application needs a complex, accessible data table with column sorting, filtering, pagination, and row selection, a React component library gives you that out of the box.

TypeScript provides end-to-end type checking. TypeScript catches errors across the entire client-side codebase: props, state, API response shapes, event handlers. In the SPA model, a type error in a component is caught before the code runs. Rust provides this same safety on the server side (and Maud catches malformed HTML at compile time), but the client-side interactivity in HDA is untyped HTML attributes. A typo in hx-target is a runtime error, not a compile-time error.

Hiring and ecosystem momentum. React dominates job postings and developer mindshare. Finding developers who know React is straightforward. Finding developers who know Rust, Axum, Maud, and htmx is harder. This is not a technical argument, but it is a practical one that affects team building and hiring timelines.

For most CRUD and content-driven features, these trade-offs favour HDA. The component ecosystem advantage matters most when building interfaces that require complex, accessible widgets. The typing advantage is real but narrower than it appears, because the majority of interactivity in an HDA is handled by a small set of well-tested htmx attributes rather than arbitrary JavaScript. The hiring argument is genuine and may be the strongest practical objection for many teams.

Rust-specific advantages

The contact list comparison used generic server code for the SPA side. The HDA side is Rust, and Rust brings specific advantages beyond the architectural ones.

Maud checks HTML at compile time. Most server-side template engines (Jinja2, ERB, Handlebars) parse templates at runtime. A typo in a variable name, a missing closing tag, or a type mismatch surfaces as a runtime error, sometimes only when that specific template path is hit in production. Maud’s html! macro is evaluated during compilation. If the template contains a syntax error or references a variable that does not exist, the code does not compile. This is a meaningful safety guarantee that most server-side frameworks cannot offer.

SQLx checks queries at compile time. The sqlx::query_as! macro verifies SQL against a live database during compilation. If a column name is wrong, a type does not match, or a table does not exist, the compiler catches it. Combined with Maud’s compile-time HTML checking, the Rust HDA stack catches errors at two boundaries (database-to-code and code-to-HTML) where most stacks only discover problems at runtime.

The combination delivers type safety comparable to TypeScript + React, but without the client-side dependency graph. TypeScript checks component props and state. Rust + SQLx + Maud checks database queries, handler types, and HTML output. Both approaches catch a broad category of errors before the code runs. The difference is that the Rust approach achieves this with a single compiled binary, while the TypeScript approach requires a build toolchain, a runtime, and hundreds of dependencies to deliver the same guarantee.

When to Use HDA (and When Not To)

HDA is not a universal prescription. It is an architecture that fits a specific, large class of web applications extremely well and fits others poorly. This section draws the boundary and describes how to handle the cases that fall on either side of it.

Where HDA excels

HDA is the natural architecture for any application where the primary interaction is reading, writing, and navigating server-managed data. This covers:

Content-heavy sites. Media publications, documentation platforms, blogs, knowledge bases, wikis. The content lives on the server. The user reads it. The server renders HTML. There is nothing to manage on the client. These applications gain nothing from a client-side framework and pay a real cost in complexity if they adopt one.

CRUD applications. Admin panels, CRM systems, ERP interfaces, internal tools, project management dashboards. The interaction pattern is: list records, view a record, edit fields, save. Every step is a request-response cycle that maps directly onto HTTP. htmx’s partial page replacement handles the dynamic parts (inline editing, live search, filtered lists) without requiring client-side state.

Form-heavy workflows. Onboarding sequences, multi-step applications, surveys, checkout flows, approval processes. Forms are native HTML. Validation can happen both in the browser (HTML5 attributes) and on the server. The Post/Redirect/Get pattern handles submission cleanly. Adding htmx provides progressive enhancement: inline validation, step transitions without full page reloads, conditional form sections that load from the server based on prior answers.

E-commerce. Catalogue browsing, product search, filtering, cart management, checkout. These are read-heavy with occasional writes. The product page is server-rendered content. The cart is server-managed state. Search is a server query. The few interactive elements (add to cart, quantity adjustment) are simple HTTP requests that return HTML fragments. Shopify, the largest e-commerce platform, serves server-rendered pages.

Dashboards with periodic data updates. Reporting interfaces, analytics dashboards, monitoring views. If the data refreshes on a cadence measured in seconds or minutes (not milliseconds), server-sent events or periodic htmx polling deliver updates without client-side state management. A dashboard that refreshes every 30 seconds does not need React.

The common thread: the server owns the data, the user interacts through standard HTTP patterns (links, forms, requests), and the UI is a representation of server state rather than an independent application with its own state model.

Where SPAs are genuinely superior

Some applications have interaction models that fundamentally require client-side state. For these, HDA is the wrong tool.

Real-time collaborative editing. Google Docs, Figma, and Linear all maintain local copies of document state on the client. Edits apply optimistically, synchronise with the server via WebSockets, and reconcile conflicts using operational transformation or CRDTs. Figma’s multiplayer system gives each document its own server process and maintains persistent WebSocket connections for every collaborating client. This is architecturally incompatible with request-response HTML. The client must own state because it must apply edits instantly and resolve conflicts locally before the server confirms them.

Offline-first applications. Applications that must function without a network connection need a complete client-side data model, a sync engine, and a conflict resolution strategy. Service workers and IndexedDB provide the storage. CRDTs or similar structures handle the merge logic. The server is not available to render HTML when the user is on an aeroplane, so the client must be a self-sufficient application.

Continuous manipulation interfaces. Drawing tools (Figma, Excalidraw), music production software, video editors, spreadsheets with real-time formula recalculation. These require sub-16ms frame rendering for smooth interaction. A server round-trip is physically incompatible with the latency budget. Many of these applications bypass the DOM entirely, rendering to <canvas> or WebGL because even DOM manipulation is too slow for their needs. Google Docs moved to canvas-based rendering to sidestep DOM performance constraints. Quadratic, an open-source spreadsheet, chose WebGL over HTML because the DOM cannot handle millions of cells.

Extreme latency sensitivity. In-browser IDEs need sub-50ms keystroke-to-render times. Trading dashboards require sub-second updates with client-side filtering across large datasets. Audio applications measure latency in single-digit milliseconds. Any architecture that routes through the server for UI updates cannot meet these constraints.

The common thread across all four: the client must own state because the interaction model is physically incompatible with server round-trips. This is not a preference or a trade-off. It is a hard constraint imposed by latency, connectivity, or rendering performance.

Steelmanning client-side frameworks

The SPA vs HDA comparison covered the technical strengths of client-side frameworks in detail: component encapsulation, developer tooling, TypeScript type checking, and the component library ecosystem. Those arguments are real and worth reading.

Beyond the technical merits, there are organisational strengths that matter for team decisions:

Hiring and ecosystem momentum. React appears in roughly 45% of developer survey responses. Job postings that require React are abundant. Job postings that require htmx are nearly nonexistent. Adopting HDA means training developers rather than hiring specialists. The htmx learning curve is shallow (it is a small library over standard HTML), but the absence of a recognised hiring category creates friction for teams accustomed to recruiting by framework name.

Established patterns for complex UIs. The React ecosystem has converged on well-documented patterns for routing, data fetching, state management, and component composition. A developer joining a React project finds familiar structure. The HDA ecosystem has fewer established conventions, and the patterns vary more between projects. This is improving (htmx’s own documentation is thorough, and this guide exists for the Rust stack), but it is not yet at parity.

The component library gap. This is worth repeating because it is the most concrete practical difference. Libraries like shadcn/ui and Radix provide accessible, production-quality date pickers, command palettes, data tables, comboboxes, and dropdown menus with keyboard navigation, focus trapping, and screen reader support built in. The HDA ecosystem has nothing at comparable maturity. Building an accessible combobox from scratch is significant work. If your application needs several such components, the React ecosystem delivers them faster today.

These are genuine advantages, not strawmen. For many teams, the hiring argument alone outweighs the architectural benefits of HDA. The right response is not to dismiss these concerns but to weigh them honestly against the structural costs documented in the preceding sections.

The islands pattern

Most applications are not purely one thing. A CRUD application might need a rich text editor on one page. A dashboard might need a real-time chart alongside otherwise static report content. A form workflow might need an interactive date range picker.

The answer is not to adopt an SPA framework for the entire application because one page needs a complex widget. The answer is islands: HDA as the default architecture, with isolated client-side components for the specific interactions that require them.

The concept is straightforward. The server renders the page as HTML. Most of the page is standard hypermedia, driven by htmx. One region of the page mounts a standalone JavaScript component, a chart library, a rich text editor, a custom date picker, whatever the specific interaction demands. That component owns its own state and manages its own rendering within its DOM region. The rest of the page is unaware of it.

Events are the integration mechanism. The island communicates with the surrounding hypermedia through DOM events. When the rich text editor saves, it dispatches a custom event. An htmx attribute on a nearby element listens for that event and triggers a server request. When the server needs to update the island, it can return an HTML fragment containing updated data- attributes or a <script> tag that the island picks up. The boundary between hypermedia and non-hypermedia is clean: HTML and HTTP on one side, JavaScript and local state on the other, with events bridging the gap.

This is not a compromise. It is the architecturally correct approach: matching the interaction style to the interaction requirements. Using htmx for a contact list is correct. Using a JavaScript charting library for a real-time visualisation is also correct. Using React for both, or htmx for both, optimises for consistency at the expense of fitness.

The practical implication is that an HDA project should have a clear policy for when an island is warranted. A reasonable threshold: if a component requires persistent client-side state that cannot be modelled as a series of server requests, it is an island. If it can be expressed as “user acts, server responds with HTML,” it is hypermedia. Most features in most applications fall into the second category.

The web’s actual composition

SPA frameworks dominate developer discourse, conference talks, blog posts, job listings, and tutorial ecosystems. This creates a perception that SPAs are the standard way to build for the web.

The data tells a different story. According to W3Techs, React is used on roughly 6% of all websites. Angular and Vue each account for 1-2%. Even granting that some sites use client-side frameworks not captured by these measurements, and that some React sites are server-rendered via Next.js, the total share of websites running as true single-page applications is well under 10%.

The remaining 90%+ is WordPress (43% of all websites alone), other CMS platforms (Shopify, Squarespace, Wix, Drupal, Joomla), static sites, and traditional server-rendered applications. The web is overwhelmingly server-rendered, content-oriented, and CRUD-driven.

This matters because the architecture you choose should match the architecture your application actually needs, not the architecture that dominates Hacker News. If you are building a collaborative design tool, use a client-side framework. If you are building a content site, an admin panel, a form workflow, a dashboard, or an e-commerce platform, you are building the kind of application that constitutes the vast majority of the web. HDA is the architecture designed for that majority.

Core Stack

Web Server with Axum

Axum is the HTTP framework for this stack. Built on Tower and Hyper, it provides type-safe request handling through extractors and uses the same Tower middleware that the rest of the Rust async ecosystem uses.

This section covers routing, handlers, extractors, shared state, middleware, static assets, and graceful shutdown. A complete runnable server is assembled at the end.

A minimal server

Add Axum and Tokio to your Cargo.toml:

[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }

A server that responds to GET /:

use axum::{Router, routing::get};

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/", get(|| async { "hello" }));

    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000")
        .await
        .unwrap();

    axum::serve(listener, app).await.unwrap();
}

axum::serve binds the router to a TCP listener. There is no separate Server type.

Handlers

A handler is an async function that receives zero or more extractors and returns something that implements IntoResponse:

use axum::response::Html;

async fn index() -> Html<&'static str> {
    Html("<h1>Home</h1>")
}

Axum provides IntoResponse implementations for common types: String, &str, StatusCode, Html<T>, Json<T>, and tuples that combine a status code with a body.

use axum::{http::StatusCode, response::IntoResponse};

async fn not_found() -> impl IntoResponse {
    (StatusCode::NOT_FOUND, Html("<h1>404</h1>"))
}

In a hypermedia-driven application, most handlers return Html. The JSON response types exist but are rarely the primary format.

Debugging handler signatures

Enable the macros feature and annotate handlers with #[debug_handler] during development. It produces clearer compiler errors when an extractor or return type is wrong:

axum = { version = "0.8", features = ["macros"] }
use axum::debug_handler;

#[debug_handler]
async fn index() -> Html<&'static str> {
    Html("<h1>Home</h1>")
}

Remove #[debug_handler] before release. It adds overhead that is only useful during compilation.

Extractors

Extractors pull data out of the incoming request. Axum calls FromRequestParts (for headers, path parameters, query strings) or FromRequest (for the body) on each handler argument. A body-consuming extractor must be the last argument.

Common extractors:

ExtractorSourceExample
Path<T>URL path parametersPath(id): Path<u64>
Query<T>Query stringQuery(params): Query<SearchParams>
Form<T>URL-encoded bodyForm(data): Form<LoginForm>
State<T>Shared application stateState(state): State<AppState>
HeaderMapRequest headersheaders: HeaderMap
use axum::extract::{Path, Query, State};
use axum::response::Html;
use serde::Deserialize;

#[derive(Deserialize)]
struct SearchParams {
    q: Option<String>,
    page: Option<u32>,
}

async fn search(
    State(state): State<AppState>,
    Query(params): Query<SearchParams>,
) -> Html<String> {
    // use state.db and params.q to query and render results
    Html(format!("<p>Searching for {:?}</p>", params.q))
}

Path parameters use curly-brace syntax in route definitions. This changed in Axum 0.8; the older colon syntax (:id) no longer works:

// /{id} not /:id
app.route("/users/{id}", get(show_user));

async fn show_user(Path(id): Path<u64>) -> impl IntoResponse {
    Html(format!("<h1>User {id}</h1>"))
}

Application state

Shared state is how handlers access the database pool, configuration, and other application-wide resources. Define a struct, derive Clone, and pass it to the router with with_state:

use sqlx::PgPool;

#[derive(Clone)]
struct AppState {
    db: PgPool,
    config: AppConfig,
}

#[derive(Clone)]
struct AppConfig {
    app_name: String,
    base_url: String,
}

Wire the state into the router:

let state = AppState {
    db: PgPool::connect(&database_url).await.unwrap(),
    config: AppConfig {
        app_name: "My App".into(),
        base_url: "http://localhost:3000".into(),
    },
};

let app = Router::new()
    .route("/", get(index))
    .with_state(state);

Handlers extract it with State<AppState>:

async fn index(State(state): State<AppState>) -> Html<String> {
    Html(format!("<h1>{}</h1>", state.config.app_name))
}

Router<S> means the router is missing state of type S. Calling .with_state(state) produces Router<()>, meaning all state has been provided. Only Router<()> can be passed to axum::serve.

PgPool is internally reference-counted, so cloning AppState is cheap. For fields that need interior mutability (counters, caches), wrap them in Arc<RwLock<T>>.

Route organisation with nest

Router::nest mounts a sub-router under a path prefix. Use this to organise routes by feature or domain area:

fn user_routes() -> Router<AppState> {
    Router::new()
        .route("/", get(list_users).post(create_user))
        .route("/{id}", get(show_user))
        .route("/{id}/edit", get(edit_user_form).post(update_user))
}

fn admin_routes() -> Router<AppState> {
    Router::new()
        .route("/", get(admin_dashboard))
        .route("/users", get(admin_users))
}

let app = Router::new()
    .route("/", get(index))
    .nest("/users", user_routes())
    .nest("/admin", admin_routes())
    .with_state(state);

Requests to /users/42 reach show_user with the path /42. The prefix is stripped before the nested router sees the request. If a handler needs the full original URI, extract OriginalUri from axum::extract.

In a workspace with multiple crates, define route functions in each crate and assemble them in the binary crate:

// in src/main.rs
use users::user_routes;
use admin::admin_routes;

let app = Router::new()
    .nest("/users", user_routes())
    .nest("/admin", admin_routes())
    .with_state(state);

All nested routers must share the same state type. If a sub-router has its own state, call .with_state() on it before nesting:

let inner = Router::new()
    .route("/bar", get(inner_handler))
    .with_state(InnerState {});  // becomes Router<()>

let app = Router::new()
    .nest("/foo", inner)  // Router<()> nests into any parent
    .with_state(OuterState {});

Middleware

Axum uses Tower layers for middleware. The tower-http crate provides HTTP-specific layers that cover most common needs.

[dependencies]
tower = "0.5"
tower-http = { version = "0.6", features = ["trace", "compression-gzip"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }

Request tracing

TraceLayer logs every request and response, integrating with the tracing crate:

use tower_http::trace::TraceLayer;
use tracing_subscriber::EnvFilter;

tracing_subscriber::fmt()
    .with_env_filter(EnvFilter::from_default_env())
    .init();

let app = Router::new()
    .route("/", get(index))
    .layer(TraceLayer::new_for_http());

Control log levels with the RUST_LOG environment variable: RUST_LOG=info for production, RUST_LOG=tower_http=trace during development.

Response compression

CompressionLayer compresses response bodies. Enable additional algorithms by adding features like compression-br or compression-zstd:

use tower_http::compression::CompressionLayer;

let app = Router::new()
    .route("/", get(index))
    .layer(CompressionLayer::new())
    .layer(TraceLayer::new_for_http());

Combining layers

Apply multiple layers with ServiceBuilder. Layers are listed top-to-bottom, and the first layer listed is the outermost (runs first on the request, last on the response):

use tower::ServiceBuilder;

let app = Router::new()
    .route("/", get(index))
    .layer(
        ServiceBuilder::new()
            .layer(TraceLayer::new_for_http())
            .layer(CompressionLayer::new())
    )
    .with_state(state);

Here, tracing wraps compression: requests are logged before responses are compressed.

Sessions and CSRF

Session management (tower-sessions) and CSRF protection follow the same .layer() pattern. They are covered in the Authentication section.

Custom middleware

For application-specific middleware, use axum::middleware::from_fn. Write a plain async function that receives the request and a Next handle:

use axum::{
    middleware::{self, Next},
    extract::Request,
    response::Response,
    http::StatusCode,
};

async fn require_auth(
    State(state): State<AppState>,
    request: Request,
    next: Next,
) -> Result<Response, StatusCode> {
    // check auth, return Err(StatusCode::UNAUTHORIZED) if invalid
    Ok(next.run(request).await)
}

let app = Router::new()
    .route("/dashboard", get(dashboard))
    .route_layer(middleware::from_fn_with_state(
        state.clone(),
        require_auth,
    ))
    .with_state(state);

.route_layer() applies middleware only to matched routes. Unmatched requests fall through to the fallback without hitting this middleware. .layer() applies to all requests, including fallbacks.

Serving static assets

An HDA application typically serves a small set of CSS and JavaScript files. The rust-embed crate embeds an entire directory into the binary at compile time, producing a single self-contained executable.

[dependencies]
rust-embed = "8"
mime_guess = "2"

Define an embedded asset struct pointing at your assets directory:

use rust_embed::RustEmbed;

#[derive(RustEmbed)]
#[folder = "assets/"]
struct Assets;

Write a handler that serves embedded files:

use axum::{
    extract::Path,
    http::{header, StatusCode},
    response::IntoResponse,
};

async fn static_handler(Path(path): Path<String>) -> impl IntoResponse {
    match Assets::get(&path) {
        Some(file) => {
            let mime = mime_guess::from_path(&path).first_or_octet_stream();
            (
                [(header::CONTENT_TYPE, mime.as_ref())],
                file.data.to_vec(),
            )
                .into_response()
        }
        None => StatusCode::NOT_FOUND.into_response(),
    }
}

Mount it on the router:

let app = Router::new()
    .route("/", get(index))
    .route("/assets/{*path}", get(static_handler));

In debug builds, rust-embed reads files from disk, so changes to CSS and JavaScript appear without recompilation. In release builds, everything is baked into the binary.

If your project grows to include many large assets (images, fonts), consider tower-http’s ServeDir to serve from the filesystem instead, or move large files to object storage.

Graceful shutdown

axum::serve accepts a shutdown signal via .with_graceful_shutdown(). When the signal fires, the server stops accepting new connections and waits for in-flight requests to complete.

use tokio::signal;

async fn shutdown_signal() {
    let ctrl_c = async {
        signal::ctrl_c()
            .await
            .expect("failed to install Ctrl+C handler");
    };

    #[cfg(unix)]
    let terminate = async {
        signal::unix::signal(signal::unix::SignalKind::terminate())
            .expect("failed to install SIGTERM handler")
            .recv()
            .await;
    };

    #[cfg(not(unix))]
    let terminate = std::future::pending::<()>();

    tokio::select! {
        _ = ctrl_c => {},
        _ = terminate => {},
    }
}

Pass the signal to the server:

axum::serve(listener, app)
    .with_graceful_shutdown(shutdown_signal())
    .await
    .unwrap();

This handles both Ctrl+C (SIGINT) and SIGTERM, which is what Docker and most process managers send when stopping a container. In production, consider adding a TimeoutLayer from tower-http so that slow in-flight requests cannot block shutdown indefinitely.

Putting it together

A complete main.rs combining routing, state, middleware, static assets, and graceful shutdown:

use axum::{
    extract::{Path, State},
    http::{header, StatusCode},
    response::{Html, IntoResponse},
    routing::get,
    Router,
};
use rust_embed::RustEmbed;
use sqlx::PgPool;
use tokio::signal;
use tower::ServiceBuilder;
use tower_http::{compression::CompressionLayer, trace::TraceLayer};
use tracing_subscriber::EnvFilter;

// -- State --

#[derive(Clone)]
struct AppState {
    db: PgPool,
    config: AppConfig,
}

#[derive(Clone)]
struct AppConfig {
    app_name: String,
}

// -- Static assets --

#[derive(RustEmbed)]
#[folder = "assets/"]
struct Assets;

async fn static_handler(Path(path): Path<String>) -> impl IntoResponse {
    match Assets::get(&path) {
        Some(file) => {
            let mime = mime_guess::from_path(&path).first_or_octet_stream();
            ([(header::CONTENT_TYPE, mime.as_ref())], file.data.to_vec())
                .into_response()
        }
        None => StatusCode::NOT_FOUND.into_response(),
    }
}

// -- Handlers --

async fn index(State(state): State<AppState>) -> Html<String> {
    Html(format!(
        r#"<html>
  <head><link rel="stylesheet" href="/assets/style.css"></head>
  <body><h1>{}</h1></body>
</html>"#,
        state.config.app_name
    ))
}

// -- User routes --

fn user_routes() -> Router<AppState> {
    Router::new()
        .route("/", get(list_users))
        .route("/{id}", get(show_user))
}

async fn list_users() -> Html<&'static str> {
    Html("<h1>Users</h1>")
}

async fn show_user(Path(id): Path<u64>) -> Html<String> {
    Html(format!("<h1>User {id}</h1>"))
}

// -- Shutdown --

async fn shutdown_signal() {
    let ctrl_c = async {
        signal::ctrl_c()
            .await
            .expect("failed to install Ctrl+C handler");
    };

    #[cfg(unix)]
    let terminate = async {
        signal::unix::signal(signal::unix::SignalKind::terminate())
            .expect("failed to install SIGTERM handler")
            .recv()
            .await;
    };

    #[cfg(not(unix))]
    let terminate = std::future::pending::<()>();

    tokio::select! {
        _ = ctrl_c => {},
        _ = terminate => {},
    }
}

// -- Main --

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();

    let database_url =
        std::env::var("DATABASE_URL").expect("DATABASE_URL must be set");

    let state = AppState {
        db: PgPool::connect(&database_url).await.unwrap(),
        config: AppConfig {
            app_name: "My App".into(),
        },
    };

    let app = Router::new()
        .route("/", get(index))
        .nest("/users", user_routes())
        .route("/assets/{*path}", get(static_handler))
        .layer(
            ServiceBuilder::new()
                .layer(TraceLayer::new_for_http())
                .layer(CompressionLayer::new()),
        )
        .with_state(state);

    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000")
        .await
        .unwrap();

    tracing::info!("listening on {}", listener.local_addr().unwrap());

    axum::serve(listener, app)
        .with_graceful_shutdown(shutdown_signal())
        .await
        .unwrap();
}

The corresponding dependencies:

[dependencies]
axum = "0.8"
tokio = { version = "1", features = ["full"] }
tower = "0.5"
tower-http = { version = "0.6", features = ["trace", "compression-gzip"] }
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
sqlx = { version = "0.8", features = ["runtime-tokio", "postgres"] }
rust-embed = "8"
mime_guess = "2"
serde = { version = "1", features = ["derive"] }

HTML Templating with Maud

Maud is a compile-time HTML templating library for Rust. Its html! macro checks your markup at compile time and expands it to efficient string-building code, so there is no runtime template parsing, no template files to deploy, and no possibility of a missing closing tag appearing in production.

The Web Server with Axum section used Html<String> with format! for responses. That works for trivial cases, but it gives you no structure, no escaping, and no compile-time checking. Maud replaces it entirely. Handlers return Markup instead of Html<String>, and the compiler catches template errors before the server starts.

Setup

Add Maud to your Cargo.toml with the axum feature:

[dependencies]
maud = { version = "0.27", features = ["axum"] }

The axum feature implements IntoResponse for Maud’s Markup type, so handlers can return markup directly. It targets axum-core 0.5, which corresponds to Axum 0.8.

The html! macro

The html! macro is the core of Maud. It takes a custom syntax that resembles HTML but follows Rust conventions, and returns a Markup value:

use maud::{html, Markup};

let greeting = "world";
let page: Markup = html! {
    h1 { "Hello, " (greeting) "!" }
};

Elements

Elements with content use curly braces. Void elements (those that cannot have children) use a semicolon:

html! {
    h1 { "Page title" }
    p {
        strong { "Bold text" }
        " followed by normal text."
    }
    br;
    input type="text" name="query";
}

Non-void elements that need no content still use braces:

html! {
    script src="/static/app.js" {}
    div.placeholder {}
}

Attributes

Attributes appear after the element name, before the braces or semicolon:

html! {
    input type="email" name="user_email" required placeholder="you@example.com";
    a href="/about" { "About" }
    article data-id="12345" { "Content" }
}

Classes and IDs have a shorthand syntax, chained directly onto the element:

html! {
    input #search-input .form-control type="text";
    div.card.shadow-sm { "Card content" }
}
// Produces:
// <input id="search-input" class="form-control" type="text">
// <div class="card shadow-sm">Card content</div>

A class or ID without an element name produces a div:

html! {
    #main { "Main content" }
    .sidebar { "Sidebar content" }
}
// Produces:
// <div id="main">Main content</div>
// <div class="sidebar">Sidebar content</div>

Quote class names that contain characters Maud’s parser would choke on:

html! {
    div."col-sm-6" { "Column" }
}

Dynamic values with splices

Parentheses insert a Rust expression into the output. Maud automatically escapes HTML special characters:

let username = "Alice <script>alert('xss')</script>";
html! {
    p { "Hello, " (username) "!" }
}
// Output: <p>Hello, Alice &lt;script&gt;alert('xss')&lt;/script&gt;!</p>

Any type implementing std::fmt::Display can be spliced. This includes strings, numbers, and any type with a Display implementation.

For dynamic attribute values, use parentheses for a single expression or braces for concatenation:

let user_id = 42;
let base = "/users";

html! {
    // Single expression
    span data-id=(user_id) { "User" }

    // Concatenation
    a href={ (base) "/" (user_id) } { "Profile" }
}

Boolean attributes and toggles

Square brackets conditionally toggle boolean attributes and classes:

let is_active = true;
let is_disabled = false;

html! {
    button disabled[is_disabled] { "Submit" }
    a.nav-link.active[is_active] href="/" { "Home" }
}
// Produces:
// <button>Submit</button>
// <a class="nav-link active" href="/">Home</a>

Optional attributes

Attributes that take an Option value render only when Some:

let tooltip: Option<&str> = Some("More info");
let label: Option<&str> = None;

html! {
    span title=[tooltip] { "Hover me" }
    span aria-label=[label] { "No aria-label rendered" }
}

Control flow

Prefix control structures with @:

let user: Option<&str> = Some("Alice");
let items = vec!["Bread", "Milk", "Eggs"];

html! {
    // if / else
    @if let Some(name) = user {
        p { "Welcome, " (name) }
    } @else {
        p { a href="/login" { "Log in" } }
    }

    // Loops
    ul {
        @for item in &items {
            li { (item) }
        }
    }

    // Let bindings
    @for (i, item) in items.iter().enumerate() {
        @let label = format!("{}. {}", i + 1, item);
        p { (label) }
    }

    // Match
    @match items.len() {
        0 => { p { "No items." } },
        1 => { p { "One item." } },
        n => { p { (n) " items." } },
    }
}

DOCTYPE

Maud provides a DOCTYPE constant:

use maud::DOCTYPE;

html! {
    (DOCTYPE)
    html lang="en" {
        head { title { "My App" } }
        body { h1 { "Hello" } }
    }
}
// Outputs: <!DOCTYPE html><html lang="en">...

Raw HTML with PreEscaped

Maud escapes all spliced content by default. When you have trusted HTML that should not be escaped, wrap it in PreEscaped:

use maud::PreEscaped;

let svg = r#"<svg viewBox="0 0 100 100"><circle cx="50" cy="50" r="40"/></svg>"#;

html! {
    div.icon { (PreEscaped(svg)) }
}

Use this for inline SVGs, pre-rendered markdown, or other HTML you control. Never pass user input to PreEscaped.

Components as functions

Maud has no built-in component system. Components are Rust functions that return Markup. This is simpler and more flexible than a template inheritance system, because you have the full language for composition, branching, and parameterisation.

A basic component:

use maud::{html, Markup};

fn nav_link(href: &str, text: &str, active: bool) -> Markup {
    html! {
        a.nav-link.active[active] href=(href) { (text) }
    }
}

Use it by calling the function inside a splice:

html! {
    nav {
        (nav_link("/", "Home", true))
        (nav_link("/about", "About", false))
        (nav_link("/contact", "Contact", false))
    }
}

Passing content blocks

The simplest approach is accepting Markup directly:

fn card(title: &str, body: Markup) -> Markup {
    html! {
        div.card {
            div.card-header { h3 { (title) } }
            div.card-body { (body) }
        }
    }
}

// Usage
let output = card("Settings", html! {
    p { "Adjust your preferences below." }
    form method="post" {
        // form fields
    }
});

A more flexible approach is accepting anything that implements Render. This lets callers pass Markup, strings, numbers, or any custom type with a Render implementation, without forcing them to wrap everything in html!:

use maud::Render;

fn card(title: &str, body: impl Render) -> Markup {
    html! {
        div.card {
            div.card-header { h3 { (title) } }
            div.card-body { (body) }
        }
    }
}

// All of these work:
card("Note", html! { p { "Rich content." } });
card("Note", "Plain text content");
card("Note", my_renderable_struct);

Prefer impl Render over Markup for component parameters. It is a small change that makes components more composable.

Components as structs with Render

When a component has several fields, or when you want it to compose via splice syntax rather than a function call, make it a struct that implements Render:

use maud::{html, Markup, Render};

enum AlertLevel {
    Info,
    Warning,
    Error,
}

struct Alert<'a, B: Render> {
    level: AlertLevel,
    title: &'a str,
    body: B,
    dismissible: bool,
}

impl<B: Render> Render for Alert<'_, B> {
    fn render(&self) -> Markup {
        let class = match self.level {
            AlertLevel::Info => "alert-info",
            AlertLevel::Warning => "alert-warning",
            AlertLevel::Error => "alert-error",
        };
        html! {
            div.alert.(class) role="alert" {
                strong { (self.title) }
                div { (self.body) }
                @if self.dismissible {
                    button.close type="button" { "×" }
                }
            }
        }
    }
}

Splice it directly, no wrapper function needed:

html! {
    (Alert {
        level: AlertLevel::Warning,
        title: "Disk space low",
        body: html! { p { "Less than 10% remaining." } },
        dismissible: true,
    })
}

Another example, a breadcrumb navigation:

struct Breadcrumb {
    segments: Vec<(String, String)>, // (label, href)
}

impl Render for Breadcrumb {
    fn render(&self) -> Markup {
        html! {
            nav aria-label="breadcrumb" {
                ol.breadcrumb {
                    @for (i, (label, href)) in self.segments.iter().enumerate() {
                        @let is_last = i == self.segments.len() - 1;
                        li.breadcrumb-item.active[is_last] {
                            @if is_last {
                                (label)
                            } @else {
                                a href=(href) { (label) }
                            }
                        }
                    }
                }
            }
        }
    }
}

Reach for Render when a component has enough fields that a function signature would get unwieldy, when it will be stored in collections and rendered in loops, or when other crates need to provide renderable types. For simple one- or two-parameter components, plain functions are shorter and sufficient.

Page layouts

A layout is a function that wraps content in a full HTML document. Since layouts in an HDA application typically need request context (the current user, flash messages, navigation state), build the layout as an Axum extractor.

First, a minimal layout function to show the shape:

use maud::{html, Markup, DOCTYPE};

fn base_layout(title: &str, content: Markup) -> Markup {
    html! {
        (DOCTYPE)
        html lang="en" {
            head {
                meta charset="utf-8";
                meta name="viewport" content="width=device-width, initial-scale=1";
                title { (title) }
                link rel="stylesheet" href="/assets/style.css";
                script src="/assets/htmx.min.js" defer {}
            }
            body {
                main { (content) }
            }
        }
    }
}

In practice, layouts need data from the request: the authenticated user for navigation, flash messages from the session, the current path for active link highlighting. Extract all of this in a layout struct that implements FromRequestParts:

use axum::extract::FromRequestParts;
use axum::http::request::Parts;
use maud::{html, Markup, DOCTYPE};

struct PageLayout {
    user: Option<User>,
    current_path: String,
}

impl<S: Send + Sync> FromRequestParts<S> for PageLayout {
    type Rejection = std::convert::Infallible;

    async fn from_request_parts(
        parts: &mut Parts,
        _state: &S,
    ) -> Result<Self, Self::Rejection> {
        let user = parts.extensions.get::<User>().cloned();
        let current_path = parts.uri.path().to_string();
        Ok(PageLayout { user, current_path })
    }
}

impl PageLayout {
    fn render(self, title: &str, content: Markup) -> Markup {
        html! {
            (DOCTYPE)
            html lang="en" {
                head {
                    meta charset="utf-8";
                    meta name="viewport" content="width=device-width, initial-scale=1";
                    title { (title) }
                    link rel="stylesheet" href="/assets/style.css";
                    script src="/assets/htmx.min.js" defer {}
                }
                body {
                    nav {
                        a.active[self.current_path == "/"] href="/" { "Home" }
                        a.active[self.current_path.starts_with("/users")]
                            href="/users" { "Users" }

                        div.nav-right {
                            @if let Some(user) = &self.user {
                                span { (user.name) }
                                a href="/logout" { "Log out" }
                            } @else {
                                a href="/login" { "Log in" }
                            }
                        }
                    }
                    main { (content) }
                    footer {
                        p { "© 2026" }
                    }
                }
            }
        }
    }
}

Handlers extract the layout alongside other parameters:

async fn user_list(
    layout: PageLayout,
    State(state): State<AppState>,
) -> Markup {
    let users = fetch_users(&state.db).await;

    layout.render("Users", html! {
        h1 { "Users" }
        ul {
            @for user in &users {
                li { (user.name) }
            }
        }
    })
}

The handler focuses on its content. The layout handles the document shell, navigation, and any request-scoped data. Add fields to PageLayout as the application grows (flash messages, CSRF tokens, feature flags) without changing handler signatures.

Full pages vs HTML fragments

In an HDA application, the same handler often needs to return a full HTML page for normal browser requests and a bare HTML fragment for htmx requests. A normal navigation loads the entire page. An htmx-boosted link or hx-get request only needs the content that will be swapped into the page.

The axum-htmx crate provides typed extractors for htmx request headers:

[dependencies]
axum-htmx = "0.6"

Use HxBoosted to detect boosted navigation (where htmx intercepts a normal link click and swaps just the body), or HxRequest to detect any htmx-initiated request:

use axum_htmx::HxBoosted;

async fn user_list(
    HxBoosted(boosted): HxBoosted,
    layout: PageLayout,
    State(state): State<AppState>,
) -> Markup {
    let users = fetch_users(&state.db).await;

    let content = html! {
        h1 { "Users" }
        ul {
            @for user in &users {
                li { (user.name) }
            }
        }
    };

    if boosted {
        content
    } else {
        layout.render("Users", content)
    }
}

For targeted fragment swaps (where htmx replaces a specific element on the page), handlers return only the fragment:

use axum_htmx::HxRequest;

async fn user_search(
    HxRequest(is_htmx): HxRequest,
    Query(params): Query<SearchParams>,
    layout: PageLayout,
    State(state): State<AppState>,
) -> Markup {
    let users = search_users(&state.db, &params.q).await;

    let results = html! {
        ul #search-results {
            @for user in &users {
                li { (user.name) }
            }
        }
    };

    if is_htmx {
        results
    } else {
        layout.render("Search", html! {
            h1 { "Search users" }
            input type="search" name="q" value=(params.q)
                hx-get="/users/search"
                hx-target="#search-results"
                hx-trigger="input changed delay:300ms";
            (results)
        })
    }
}

This pattern means every URL works as a full page when accessed directly (bookmarks, shared links, first page load) and as a fragment when accessed via htmx. No separate endpoint needed.

htmx attributes in Maud

htmx attributes use the hx- prefix, which works naturally in Maud:

html! {
    // Click to load
    button hx-get="/api/data" hx-target="#results" hx-swap="innerHTML" {
        "Load data"
    }

    // Form submission
    form hx-post="/contacts" hx-target="#contact-list" hx-swap="beforeend" {
        input type="text" name="name" required;
        button type="submit" { "Add contact" }
    }

    // Inline editing
    tr hx-get={ "/users/" (user.id) "/edit" } hx-trigger="click"
       hx-target="this" hx-swap="outerHTML" {
        td { (user.name) }
        td { (user.email) }
    }

    // Delete with confirmation
    button hx-delete={ "/users/" (user.id) }
           hx-confirm="Delete this user?"
           hx-target="closest tr"
           hx-swap="outerHTML swap:500ms" {
        "Delete"
    }
}

The Interactivity with htmx section covers htmx patterns in full.

Gotchas

Semicolons on void elements. Forgetting the semicolon on input, br, meta, link, or img causes a compile error. If the compiler complains about unexpected tokens after an element name, check for a missing semicolon.

// Wrong: Maud expects children
input type="text" { }

// Correct: semicolon terminates void elements
input type="text";

The @ prefix is mandatory for control flow. All if, for, let, and match inside html! must start with @. Without it, Maud tries to parse the keyword as an element name.

Brace vs parenthesis in attributes. Parentheses splice a single expression. Braces concatenate multiple parts. Using parentheses when you need concatenation silently drops everything after the first expression:

let id = 42;

// Wrong: only inserts base, "/users/" and id are treated as element content
a href=("/users/") (id) { "Profile" }

// Correct: braces concatenate
a href={ "/users/" (id) } { "Profile" }

Compile-time cost. Maud macros expand at compile time, which is good for runtime performance but can slow incremental builds on large templates. Breaking templates into smaller functions across modules helps, because Rust only recompiles the modules that changed.

Interactivity with HTMX

htmx gives HTML the ability to issue HTTP requests and swap content into the page, without writing JavaScript. Add attributes to your markup, and htmx handles the rest: it sends an AJAX request, receives an HTML fragment from the server, and replaces a targeted element in the DOM. The server stays in control of rendering. There is no client-side state, no JSON serialisation layer, and no build step.

htmx is 14 KB minified and gzipped, has zero runtime dependencies, and works with any server that returns HTML. In this stack, Axum handlers return Maud Markup and htmx swaps it into place.

Including htmx

Vendor htmx into your project rather than loading it from a CDN. Download the minified file and place it in your assets directory:

assets/
  htmx.min.js

The Web Server with Axum section covers serving static assets with rust-embed. Include htmx in your layout’s <head>:

head {
    // ...
    script src="/assets/htmx.min.js" defer {}
}

The defer attribute ensures htmx loads after HTML parsing completes, avoiding render-blocking.

Vendoring eliminates CDN availability concerns and keeps the dependency auditable. htmx has zero transitive runtime dependencies, so you are vendoring exactly one file.

How htmx works

htmx extends HTML with attributes that describe HTTP interactions declaratively. The core mechanism is:

  1. An event occurs on an element (click, submit, keyup, or any DOM event).
  2. htmx sends an HTTP request (GET, POST, PUT, PATCH, DELETE) to a URL.
  3. The server returns an HTML fragment.
  4. htmx swaps that fragment into a target element in the DOM.

Every step is controlled by HTML attributes. No JavaScript is written by the application developer.

button hx-get="/clicked" hx-target="#result" hx-swap="innerHTML" {
    "Click me"
}
div #result {}

When the button is clicked, htmx issues GET /clicked, takes the response body, and replaces the inner HTML of #result with it. The handler for /clicked returns a Maud fragment:

async fn clicked() -> Markup {
    html! { p { "You clicked the button." } }
}

That is the entire pattern. Everything else in htmx is refinement of these four concepts: what triggers the request, what request to send, where to put the response, and how to swap it in.

Core attributes

Request attributes

Five attributes correspond to the five HTTP methods:

AttributeMethodTypical use
hx-getGETFetch and display data
hx-postPOSTSubmit data, create resources
hx-putPUTFull resource replacement
hx-patchPATCHPartial update
hx-deleteDELETERemove a resource

Each takes a URL as its value:

// Fetch a list
button hx-get="/users" hx-target="#user-list" { "Load users" }

// Create a resource
form hx-post="/users" hx-target="#user-list" hx-swap="beforeend" {
    input type="text" name="name" required;
    button type="submit" { "Add user" }
}

// Delete a resource
button hx-delete={ "/users/" (user.id) }
       hx-confirm="Delete this user?"
       hx-target="closest tr"
       hx-swap="outerHTML" {
    "Delete"
}

htmx sends form data automatically for elements within a form. For elements outside a form, use hx-include to specify which inputs to include in the request.

hx-target

hx-target specifies which element receives the swapped content. It takes a CSS selector, or one of these special values:

  • this: the element that triggered the request
  • closest <selector>: the nearest ancestor matching the selector
  • find <selector>: the first descendant matching the selector
  • next <selector>: the next sibling matching the selector
  • previous <selector>: the previous sibling matching the selector

If hx-target is omitted, htmx swaps content into the element that issued the request.

// Swap into a specific element
button hx-get="/stats" hx-target="#dashboard-stats" { "Refresh" }

// Swap into the closest ancestor
button hx-delete={ "/items/" (item.id) }
       hx-target="closest li"
       hx-swap="outerHTML" {
    "Remove"
}

hx-swap

hx-swap controls how the response is inserted relative to the target. The default is innerHTML.

ValueBehaviour
innerHTMLReplace the target’s children (default)
outerHTMLReplace the entire target element
beforebeginInsert before the target
afterbeginInsert as the target’s first child
beforeendInsert as the target’s last child
afterendInsert after the target
deleteDelete the target element, ignore the response
noneDon’t swap anything (out-of-band swaps still process)

beforeend is particularly useful for appending to lists:

// Handler returns a single <li>, appended to the list
form hx-post="/todos" hx-target="#todo-list" hx-swap="beforeend" {
    input type="text" name="title" placeholder="New todo" required;
    button type="submit" { "Add" }
}
ul #todo-list {
    @for todo in &todos {
        li { (todo.title) }
    }
}

outerHTML replaces the target itself, which is the right choice for inline editing where the display row swaps with an edit form and back:

// Display row, click to edit
tr hx-get={ "/contacts/" (contact.id) "/edit" }
   hx-trigger="click"
   hx-target="this"
   hx-swap="outerHTML" {
    td { (contact.name) }
    td { (contact.email) }
}

Swap modifiers

Append modifiers to the swap value, separated by spaces:

  • swap:<timing>: delay before performing the swap (e.g., swap:100ms)
  • settle:<timing>: delay between swap and settle phase, useful for CSS transitions (e.g., settle:200ms)
  • scroll:<target>:<direction>: scroll the target or window after swap (scroll:top, scroll:bottom)
  • show:<target>:<direction>: scroll to show the swapped element (show:top, show:bottom)
  • transition:true: use the View Transitions API for the swap animation
// Fade-out delete: swap after 500ms to allow a CSS transition
button hx-delete={ "/users/" (user.id) }
       hx-target="closest tr"
       hx-swap="outerHTML swap:500ms" {
    "Delete"
}

// Scroll to top after loading new content
div hx-get="/page/2" hx-swap="innerHTML show:top" {
    "Load more"
}

hx-confirm

hx-confirm shows a browser confirmation dialog before the request is sent. The request only proceeds if the user confirms:

button hx-delete={ "/projects/" (project.id) }
       hx-confirm="Are you sure? This cannot be undone."
       hx-target="closest .project-card"
       hx-swap="outerHTML" {
    "Delete project"
}

hx-select

hx-select extracts a subset of the response using a CSS selector before swapping. This is useful when a handler returns a full page but you only need a fragment:

// Extract just the #results element from the response
a hx-get="/search?q=rust" hx-target="#results" hx-select="#results" {
    "Search for Rust"
}

hx-include

hx-include tells htmx to include values from additional elements in the request. Accepts a CSS selector:

// Include the search input's value in the request
input #search type="text" name="q";
button hx-get="/search" hx-include="#search" hx-target="#results" {
    "Search"
}

hx-vals and hx-headers

hx-vals adds extra parameters to the request body as JSON:

button hx-post="/track"
       hx-vals=r#"{"event": "button_click", "source": "header"}"# {
    "Track"
}

hx-headers adds custom HTTP headers:

button hx-get="/api/data"
       hx-headers=r#"{"X-Custom-Header": "value"}"# {
    "Fetch"
}

hx-push-url

hx-push-url pushes the request URL into the browser’s history stack, so the back button works. Set it to true to push the request URL, or provide a specific URL:

// Push the request URL to history
a hx-get="/users" hx-target="#content" hx-push-url="true" {
    "Users"
}

// Push a different URL
button hx-get="/users?page=2" hx-target="#user-list"
       hx-push-url="/users/page/2" {
    "Next page"
}

Triggering requests

hx-trigger

hx-trigger specifies which event initiates the request. Without it, htmx uses the natural event for each element type:

ElementDefault trigger
input, textarea, selectchange
formsubmit
Everything elseclick

Override the default by specifying any DOM event:

// Trigger on keyup instead of change
input type="search" name="q"
      hx-get="/search"
      hx-target="#results"
      hx-trigger="keyup" {
}

Trigger modifiers

Modifiers refine when and how triggers fire. Append them after the event name, separated by spaces:

changed: only fire if the element’s value has actually changed since the last request:

input type="search" name="q"
      hx-get="/search"
      hx-target="#results"
      hx-trigger="keyup changed";

delay:<time>: wait before firing. Each new event resets the timer. This is debouncing: the request fires only after the user stops typing:

input type="search" name="q"
      hx-get="/search"
      hx-target="#results"
      hx-trigger="keyup changed delay:300ms";

throttle:<time>: fire at most once per interval. Unlike delay, the first event fires immediately. Subsequent events within the window are dropped:

// Update position at most every 200ms
div hx-post="/position"
    hx-trigger="mousemove throttle:200ms" {
    "Track mouse"
}

from:<selector>: listen for the event on a different element. Useful for keyboard shortcuts:

// Trigger search when Enter is pressed anywhere in the document
div hx-get="/search"
    hx-target="#results"
    hx-trigger="keyup[key=='Enter'] from:body" {
    // ...
}

consume: prevent the event from propagating to parent elements.

queue:<strategy>: control what happens when a new event fires while a request is in flight:

  • queue:first: queue the first event, drop the rest
  • queue:last: queue only the most recent event (default)
  • queue:all: queue every event, process them one at a time
  • queue:none: drop all events while a request is active

Multiple triggers

Separate multiple triggers with commas:

// Load on page load AND on click
div hx-get="/notifications"
    hx-trigger="load, click"
    hx-target="this" {
    "Loading..."
}

Polling

Use the every syntax to poll an endpoint at a fixed interval:

div hx-get="/status"
    hx-trigger="every 5s"
    hx-target="this" {
    "Checking status..."
}

Event filters

Square brackets filter events by a JavaScript expression:

// Only trigger on Enter key
input type="text" name="q"
      hx-get="/search"
      hx-target="#results"
      hx-trigger="keyup[key=='Enter']";

Boosted links and navigation

hx-boost converts standard links and forms into AJAX requests. Apply it to a parent element and all descendant <a> and <form> elements are automatically boosted:

body hx-boost="true" {
    nav {
        a href="/users" { "Users" }
        a href="/settings" { "Settings" }
    }
    main #content {
        (content)
    }
}

When a user clicks a boosted link, htmx:

  1. Issues a GET request to the link’s href
  2. Swaps the response into <body> using innerHTML
  3. Pushes the URL into browser history

The page does not fully reload. The browser keeps the existing <head> (scripts, stylesheets) and only swaps the body content, making navigation feel instant.

Boosted forms work the same way: the form is submitted via AJAX and the response replaces the body.

Progressive enhancement is built in. If JavaScript is disabled or fails to load, boosted links and forms still work as standard HTML. The server returns the same full HTML page either way. htmx intercepts the navigation when it can; the browser handles it normally when it cannot.

Detecting boosted requests on the server

Boosted requests include the HX-Boosted: true header. Use this to return just the body content instead of a full HTML document, avoiding redundant <head> parsing:

use axum_htmx::HxBoosted;

async fn users_page(
    HxBoosted(boosted): HxBoosted,
    layout: PageLayout,
    State(state): State<AppState>,
) -> Markup {
    let users = fetch_users(&state.db).await;

    let content = html! {
        h1 { "Users" }
        ul {
            @for user in &users {
                li { (user.name) }
            }
        }
    };

    if boosted {
        content
    } else {
        layout.render("Users", content)
    }
}

Every URL works as both a full page (direct navigation, bookmarks) and a fragment (boosted navigation). One handler, one template, no separate endpoint.

Loading indicators

htmx adds the htmx-request CSS class to elements while a request is in flight. Use this to show loading spinners, disable buttons, or fade content.

Default behaviour

By default, htmx adds htmx-request to the element that issued the request:

button hx-get="/slow-endpoint" hx-target="#result" {
    "Load data"
}

Style the loading state with CSS:

button.htmx-request {
    opacity: 0.5;
    pointer-events: none;
}

hx-indicator

hx-indicator specifies a different element to receive the htmx-request class. This is useful for showing a spinner that is separate from the trigger element:

button hx-get="/data" hx-target="#results" hx-indicator="#spinner" {
    "Load"
}
img #spinner.htmx-indicator src="/assets/spinner.svg" alt="Loading";

htmx includes default CSS for the htmx-indicator class that hides the element until the request is active:

.htmx-indicator {
    opacity: 0;
}
.htmx-request .htmx-indicator,
.htmx-request.htmx-indicator {
    opacity: 1;
    transition: opacity 200ms ease-in;
}

Override these styles to match your application’s design. The visibility approach avoids layout shifts:

.htmx-indicator {
    display: none;
}
.htmx-request .htmx-indicator,
.htmx-request.htmx-indicator {
    display: inline-block;
}

Inline loading text

A common pattern replaces the button text while loading:

button hx-post="/save" hx-target="#form-container" hx-swap="outerHTML" {
    span.ready { "Save" }
    span.htmx-indicator { "Saving..." }
}
button .htmx-indicator { display: none; }
button.htmx-request .ready { display: none; }
button.htmx-request .htmx-indicator { display: inline; }

Working with Axum

The axum-htmx crate provides typed extractors for htmx request headers and typed responders for htmx response headers:

[dependencies]
axum-htmx = "0.6"

Request extractors

All htmx request headers have a corresponding extractor. Extractors are infallible, so they always succeed and never reject a request:

HeaderExtractorValue
HX-RequestHxRequestbool
HX-BoostedHxBoostedbool
HX-TargetHxTargetOption<String>
HX-TriggerHxTriggerOption<String>
HX-Trigger-NameHxTriggerNameOption<String>
HX-Current-URLHxCurrentUrlOption<Uri>
HX-PromptHxPromptOption<String>

HxRequest detects any htmx-initiated request. HxBoosted specifically detects boosted navigation. Use whichever matches the handler’s needs:

use axum_htmx::HxRequest;

async fn search(
    HxRequest(is_htmx): HxRequest,
    Query(params): Query<SearchParams>,
    layout: PageLayout,
    State(state): State<AppState>,
) -> Markup {
    let users = search_users(&state.db, &params.q).await;

    let results = html! {
        ul #search-results {
            @for user in &users {
                li { (user.name) " – " (user.email) }
            }
        }
    };

    if is_htmx {
        results
    } else {
        layout.render("Search", html! {
            h1 { "Search users" }
            input type="search" name="q" value=(params.q)
                hx-get="/users/search"
                hx-target="#search-results"
                hx-trigger="input changed delay:300ms";
            (results)
        })
    }
}

This pattern ensures every URL works as both a full page (direct navigation, bookmarks, search engine indexing) and as a fragment (htmx requests). The handler renders the same data either way; the only difference is whether it wraps the content in the layout.

Returning fragments from handlers

Handlers that serve only htmx requests return a bare Maud Markup:

async fn delete_user(
    Path(id): Path<i64>,
    State(state): State<AppState>,
) -> Markup {
    sqlx::query!("DELETE FROM users WHERE id = $1", id)
        .execute(&state.db)
        .await
        .unwrap();

    // Return empty markup. htmx will swap with hx-swap="outerHTML"
    // to remove the deleted row
    html! {}
}

For delete operations, the handler returns an empty fragment. Combined with hx-swap="outerHTML" on the trigger element, this removes the target element from the DOM.

Server response headers

htmx checks specific response headers to control client-side behaviour. The axum-htmx crate provides typed responders for each header. Return them as part of a tuple with your response body.

HX-Redirect

Forces a full-page redirect (not an htmx swap). Use this for operations that should leave the current page entirely, like a successful login:

use axum_htmx::HxRedirect;

async fn login(Form(data): Form<LoginForm>) -> impl IntoResponse {
    // ... authenticate ...
    (HxRedirect("/dashboard".to_string()), html! {})
}

htmx intercepts the response and performs window.location = url. The browser does a full navigation. This is different from a standard HTTP 302 redirect, which the browser handles transparently before htmx sees the response.

HX-Push-Url and HX-Replace-Url

HX-Push-Url pushes a URL into the browser’s history stack (creates a new history entry). HX-Replace-Url replaces the current history entry. Both let the server control the displayed URL after a swap:

use axum_htmx::HxPushUrl;

async fn filter_users(
    Query(params): Query<FilterParams>,
    State(state): State<AppState>,
) -> impl IntoResponse {
    let users = filter_users(&state.db, &params).await;
    let url = format!("/users?role={}", params.role);

    (
        HxPushUrl(url),
        html! {
            @for user in &users {
                tr {
                    td { (user.name) }
                    td { (user.role) }
                }
            }
        },
    )
}

HX-Retarget and HX-Reswap

HX-Retarget overrides the element’s hx-target from the server side. HX-Reswap overrides hx-swap. Together, they let the server change where and how content is placed based on the response:

use axum_htmx::{HxRetarget, HxReswap, SwapOption};
use axum::http::StatusCode;

async fn create_user(Form(data): Form<NewUser>) -> impl IntoResponse {
    match validate_and_save(&data).await {
        Ok(user) => {
            // Success: append to the user list as intended by the form's hx-target
            (StatusCode::OK, html! {
                tr {
                    td { (user.name) }
                    td { (user.email) }
                }
            }).into_response()
        }
        Err(errors) => {
            // Validation failed: retarget to the error container, swap innerHTML
            (
                StatusCode::UNPROCESSABLE_ENTITY,
                HxRetarget("#form-errors".to_string()),
                HxReswap(SwapOption::InnerHtml),
                html! {
                    ul.errors {
                        @for error in &errors {
                            li { (error) }
                        }
                    }
                },
            ).into_response()
        }
    }
}

This is a powerful pattern: the form’s hx-target and hx-swap describe the success case. When validation fails, the server redirects the swap to a different element without any client-side logic.

HX-Trigger (response)

HX-Trigger fires custom events on the client after the response is processed. Other elements on the page can listen for these events using hx-trigger="from:body":

use axum_htmx::HxResponseTrigger;

async fn create_todo(
    Form(data): Form<NewTodo>,
    State(state): State<AppState>,
) -> impl IntoResponse {
    let todo = save_todo(&state.db, &data).await.unwrap();

    (
        HxResponseTrigger::normal(vec!["todo-added".to_string()]),
        html! {
            li { (todo.title) }
        },
    )
}

An element elsewhere on the page can react to this event:

// This element refreshes when a todo is added
span hx-get="/todos/count"
     hx-trigger="todo-added from:body"
     hx-target="this" {
    (count)
}

HxResponseTrigger supports three timing modes:

  • HxResponseTrigger::normal(): fires immediately (sets HX-Trigger)
  • HxResponseTrigger::after_swap(): fires after the swap completes (sets HX-Trigger-After-Swap)
  • HxResponseTrigger::after_settle(): fires after the settle phase (sets HX-Trigger-After-Settle)

HX-Refresh

Forces a full page refresh:

use axum_htmx::HxRefresh;

async fn clear_cache() -> impl IntoResponse {
    // ... clear cache ...
    (HxRefresh(true), html! {})
}

Complete responder reference

HeaderResponderValue
HX-LocationHxLocationString
HX-Push-UrlHxPushUrlString
HX-RedirectHxRedirectString
HX-RefreshHxRefreshbool
HX-Replace-UrlHxReplaceUrlString
HX-ReswapHxReswapSwapOption
HX-RetargetHxRetargetString
HX-ReselectHxReselectString
HX-TriggerHxResponseTriggerVec<String> or Vec<HxEvent>

All responders implement IntoResponseParts, so they compose naturally with Maud’s Markup in tuples.

Form submission and validation

A basic pattern: submit a form with hx-post, display validation errors inline if the submission fails. The Form Handling and Validation section covers this topic in full.

The form:

fn new_contact_form(errors: &[String]) -> Markup {
    html! {
        form #contact-form hx-post="/contacts" hx-target="this" hx-swap="outerHTML" {
            @if !errors.is_empty() {
                ul.errors {
                    @for error in errors {
                        li { (error) }
                    }
                }
            }
            label {
                "Name"
                input type="text" name="name" required;
            }
            label {
                "Email"
                input type="email" name="email" required;
            }
            button type="submit" {
                span.ready { "Save" }
                span.htmx-indicator { "Saving..." }
            }
        }
    }
}

The handler:

async fn create_contact(
    State(state): State<AppState>,
    Form(data): Form<NewContact>,
) -> Markup {
    let mut errors = Vec::new();
    if data.name.trim().is_empty() {
        errors.push("Name is required.".to_string());
    }
    if !data.email.contains('@') {
        errors.push("A valid email is required.".to_string());
    }

    if !errors.is_empty() {
        return new_contact_form(&errors);
    }

    let contact = save_contact(&state.db, &data).await.unwrap();

    html! {
        tr {
            td { (contact.name) }
            td { (contact.email) }
        }
    }
}

On validation failure, the handler returns the form again with error messages. The form’s hx-swap="outerHTML" replaces itself with the re-rendered version, preserving the user’s input. On success, the handler returns a table row, which replaces the form.

SSE extension

htmx includes an SSE (Server-Sent Events) extension for receiving real-time server-pushed updates. The Server-Sent Events and Real-Time Updates section covers SSE in depth. Here is the basic setup.

Include the SSE extension after htmx. Vendor the file alongside htmx.min.js:

assets/
  htmx.min.js
  ext/
    sse.js

Include it in the layout:

head {
    // ...
    script src="/assets/htmx.min.js" defer {}
    script src="/assets/ext/sse.js" defer {}
}

Connect to an SSE endpoint and swap content when events arrive:

div hx-ext="sse" sse-connect="/events" {
    // This element updates when the server sends a "notification" event
    div sse-swap="notification" {
        "Waiting for notifications..."
    }

    // This element updates on "status" events
    div sse-swap="status" {
        "Status: unknown"
    }
}

The Axum handler returns an SSE stream. When the server sends an event named notification, htmx takes the event’s data (an HTML fragment) and swaps it into the element with sse-swap="notification". The browser handles reconnection automatically if the connection drops.

Gotchas

htmx processes 2xx responses only. By default, htmx swaps content only for 200-level status codes. Non-2xx responses are ignored (no swap happens). To swap error content into the page, either return a 200 with error markup, or configure htmx’s responseHandling to process specific error codes. The HX-Retarget and HX-Reswap headers offer a clean alternative: return the error markup with a 422 status and redirect the swap to an error container.

3xx redirects bypass htmx headers. When a server returns a 302 or 301, the browser follows the redirect transparently. htmx never sees the response headers. Use HX-Redirect (with a 200 status) instead of HTTP 302 when you need htmx to process the redirect.

hx-boost changes form encoding. Boosted forms send requests via AJAX. If a form uses enctype="multipart/form-data" for file uploads, htmx handles this correctly. But be aware that boosted GET forms append parameters to the URL rather than the body, matching standard HTML form behaviour.

History and the back button. When using hx-push-url or hx-boost, htmx caches page snapshots for the back button. If your pages include dynamic state (e.g., a logged-in user’s name in the nav), the cached snapshot may show stale data. htmx fires an htmx:historyRestore event when restoring from cache, which you can use to refresh stale sections.

Attribute inheritance. htmx attributes inherit down the DOM tree. An hx-target on a <div> applies to all htmx-enabled elements inside it. This is useful for setting defaults (e.g., hx-target="#content" on a container), but can cause surprises if a nested element inherits a target you did not intend. Use hx-target="unset" to break inheritance.

CSS Without Frameworks

CSS frameworks and preprocessors exist to solve problems that the web platform now handles natively. CSS nesting, container queries, :has(), @layer, and custom properties eliminate the need for Sass, Less, or utility-class frameworks. This section covers writing plain CSS for an HDA application, processing it with the lightningcss crate, and co-locating styles alongside Maud components using the inventory crate.

The result is a single processed stylesheet, built at startup from a base CSS file and component-scoped fragments, minified and vendor-prefixed, served from memory with cache-busting.

Plain CSS in 2026

Native CSS now provides the features that historically required preprocessors:

  • Nesting replaces Sass/Less nesting syntax. Write .card { .title { ... } } directly.
  • Custom properties (--color-primary: #1a1a2e;) replace preprocessor variables, with the advantage of being runtime-configurable and inheritable through the DOM.
  • @layer controls cascade priority without specificity hacks.
  • Container queries let components respond to their container’s size rather than the viewport.
  • :has() selects elements based on their children, replacing many patterns that previously required JavaScript.

The Web Platform Has Caught Up section covers these features in detail. This section focuses on the tooling pipeline: how to write, process, and serve CSS in a Rust HDA application.

CSS organisation with RSCSS

RSCSS (Reasonable System for CSS Stylesheet Structure) provides a lightweight naming convention that works well with component-based architectures. It imposes just enough structure to keep styles maintainable without the ceremony of BEM or the magic of CSS Modules.

The core rules:

  • Components are named with at least two words, separated by dashes: .search-form, .article-card, .user-profile.
  • Elements within a component use a single word: .title, .body, .avatar. Multi-word elements are concatenated: .firstname, .submitbutton. Use the child selector (>) to prevent styles bleeding into nested components.
  • Variants modify a component or element. RSCSS normally prefixes variants with a dash (.search-form.-compact), but dashes at the start of a class name are awkward in Maud templates. Use a double underscore prefix instead: .search-form.__compact. The double underscore distinguishes variants from helpers at a glance.
  • Helpers are global utility classes prefixed with a single underscore: ._hidden, ._center. Keep these minimal.

In practice:

.article-card {
    border: 1px solid var(--border);
    border-radius: 0.5rem;
    padding: 1rem;

    > .title {
        font-size: 1.25rem;
        font-weight: 600;
    }

    > .meta {
        color: var(--text-muted);
        font-size: 0.875rem;
    }

    &.__featured {
        border-color: var(--accent);
    }
}

And the corresponding Maud component:

fn article_card(article: &Article) -> Markup {
    html! {
        div.article-card.__featured[article.featured] {
            h2.title { (article.title) }
            p.meta { "By " (article.author) }
        }
    }
}

The two-word component rule means component classes never collide with single-word element classes. The double-underscore variant prefix is visually distinct from both element classes and helper utilities, and works cleanly in Maud’s class syntax.

lightningcss

lightningcss is a CSS parser, transformer, and minifier written in Rust by the Parcel team. It processes over 2.7 million lines of CSS per second on a single thread. Use it to minify, vendor-prefix, and downlevel modern CSS syntax for older browsers.

Add it to Cargo.toml:

[dependencies]
lightningcss = { version = "1.0.0-alpha.70", default-features = false }

Disable default features to avoid pulling in Node.js binding dependencies. Enable bundler if you need @import resolution, or visitor if you need custom AST transforms.

A function to process a CSS string:

use lightningcss::stylesheet::{StyleSheet, ParserOptions, MinifyOptions};
use lightningcss::printer::PrinterOptions;
use lightningcss::targets::{Targets, Browsers};

pub fn process_css(raw: &str) -> Result<String, String> {
    let targets = Targets::from(Browsers {
        chrome: Some(95 << 16),
        firefox: Some(90 << 16),
        safari: Some(15 << 16),
        ..Browsers::default()
    });

    let mut stylesheet = StyleSheet::parse(raw, ParserOptions {
        filename: "styles.css".to_string(),
        ..ParserOptions::default()
    })
    .map_err(|e| format!("CSS parse error: {e}"))?;

    stylesheet
        .minify(MinifyOptions {
            targets,
            ..MinifyOptions::default()
        })
        .map_err(|e| format!("CSS minify error: {e}"))?;

    let result = stylesheet
        .to_css(PrinterOptions {
            minify: true,
            targets,
            ..PrinterOptions::default()
        })
        .map_err(|e| format!("CSS print error: {e}"))?;

    Ok(result.code)
}

Browser targets are encoded as major << 16 | minor << 8 | patch. Chrome 95 is 95 << 16.

Pass targets to both MinifyOptions and PrinterOptions. The minify step transforms modern syntax (nesting, oklch() colours, logical properties) into forms the target browsers understand. The printer step serialises the result, applying minification when minify: true.

What lightningcss handles automatically:

  • Flattens CSS nesting for older browsers
  • Adds vendor prefixes (-webkit-, -moz-) where targets require them
  • Converts modern colour functions (oklch(), lab(), color-mix()) to rgb()/rgba() fallbacks
  • Transpiles logical properties (margin-inline-start) to physical equivalents
  • Converts media query range syntax (@media (width >= 768px)) to min-width form
  • Merges longhand properties into shorthands
  • Removes redundant vendor prefixes the targets don’t need

Locality of behaviour with inventory

The inventory crate provides a distributed registration pattern: declare values in any module, collect them all in one place at startup. This enables locality of behaviour for CSS, where each component’s styles live in the same file as its markup.

[dependencies]
inventory = "0.3"

Define a CSS fragment type

Create a type to hold a CSS fragment and register it with inventory::collect!:

// src/styles.rs

pub struct CssFragment(pub &'static str);

inventory::collect!(CssFragment);

Co-locate CSS with components

In each component file, declare the CSS alongside the markup using inventory::submit!:

// src/components/article_card.rs

use maud::{html, Markup};
use crate::styles::CssFragment;

inventory::submit! {
    CssFragment(r#"
        .article-card {
            border: 1px solid var(--border);
            border-radius: 0.5rem;
            padding: 1rem;

            > .title {
                font-size: 1.25rem;
                font-weight: 600;
            }

            > .meta {
                color: var(--text-muted);
                font-size: 0.875rem;
            }

            &.__featured {
                border-color: var(--accent);
            }
        }
    "#)
}

pub fn article_card(article: &Article) -> Markup {
    html! {
        div.article-card.__featured[article.featured] {
            h2.title { (article.title) }
            p.meta { "By " (article.author) }
        }
    }
}

Adding a new component with styles requires no changes to any other file. The CSS lives next to the markup that uses it.

Another component

// src/components/nav_bar.rs

use maud::{html, Markup};
use crate::styles::CssFragment;

inventory::submit! {
    CssFragment(r#"
        .nav-bar {
            display: flex;
            align-items: center;
            gap: 1rem;
            padding: 0.75rem 1.5rem;
            background: var(--nav-bg);

            > .link {
                color: var(--nav-link);
                text-decoration: none;
            }

            > .link.__active {
                font-weight: 600;
                color: var(--nav-link-active);
            }
        }
    "#)
}

pub fn nav_bar(current_path: &str) -> Markup {
    html! {
        nav.nav-bar {
            a.link.__active[current_path == "/"] href="/" { "Home" }
            a.link.__active[current_path.starts_with("/users")] href="/users" { "Users" }
        }
    }
}

The processing pipeline

At startup, collect all CSS fragments, concatenate them with a base stylesheet, process through lightningcss, and cache the result in memory. A content hash in the filename enables indefinite browser caching.

Base stylesheet

A base.css file contains resets, custom properties, and global styles that don’t belong to any component:

/* assets/base.css */

*,
*::before,
*::after {
    box-sizing: border-box;
}

:root {
    --text: #1a1a2e;
    --text-muted: #6b7280;
    --bg: #ffffff;
    --border: #e5e7eb;
    --accent: #2563eb;
    --nav-bg: #f9fafb;
    --nav-link: #374151;
    --nav-link-active: #1a1a2e;
}

body {
    font-family: system-ui, -apple-system, sans-serif;
    color: var(--text);
    background: var(--bg);
    margin: 0;
    line-height: 1.6;
}

Build and serve the stylesheet

// src/styles.rs

use lightningcss::stylesheet::{StyleSheet, ParserOptions, MinifyOptions};
use lightningcss::printer::PrinterOptions;
use lightningcss::targets::{Targets, Browsers};
use std::sync::LazyLock;

pub struct CssFragment(pub &'static str);

inventory::collect!(CssFragment);

static BASE_CSS: &str = include_str!("../assets/base.css");

pub struct ProcessedCss {
    pub body: String,
    pub filename: String,
    pub route: String,
}

static STYLESHEET: LazyLock<ProcessedCss> = LazyLock::new(|| build_stylesheet());

pub fn stylesheet() -> &'static ProcessedCss {
    &STYLESHEET
}

fn build_stylesheet() -> ProcessedCss {
    // Concatenate base CSS and all component fragments
    let mut raw = String::from(BASE_CSS);
    for fragment in inventory::iter::<CssFragment> {
        raw.push('\n');
        raw.push_str(fragment.0);
    }

    // Process with lightningcss
    let targets = Targets::from(Browsers {
        chrome: Some(95 << 16),
        firefox: Some(90 << 16),
        safari: Some(15 << 16),
        ..Browsers::default()
    });

    let mut sheet = StyleSheet::parse(&raw, ParserOptions {
        filename: "styles.css".to_string(),
        ..ParserOptions::default()
    })
    .expect("CSS parse error");

    sheet
        .minify(MinifyOptions {
            targets,
            ..MinifyOptions::default()
        })
        .expect("CSS minify error");

    let result = sheet
        .to_css(PrinterOptions {
            minify: true,
            targets,
            ..PrinterOptions::default()
        })
        .expect("CSS print error");

    // Hash the output for cache-busting
    let hash = {
        use std::hash::{Hash, Hasher};
        let mut hasher = std::collections::hash_map::DefaultHasher::new();
        result.code.hash(&mut hasher);
        format!("{:x}", hasher.finish())
    };

    let filename = format!("style.{hash}.css");
    let route = format!("/assets/{filename}");

    ProcessedCss {
        body: result.code,
        filename,
        route,
    }
}

The LazyLock ensures the CSS is built once on first access and cached for the lifetime of the process. include_str! embeds base.css into the binary at compile time, so the binary is self-contained.

Wire it into Axum

Expose the stylesheet as a route and make the filename available to the layout:

// src/main.rs

use axum::{
    http::header,
    response::IntoResponse,
    routing::get,
    Router,
};

mod styles;
mod components;

async fn css_handler() -> impl IntoResponse {
    let css = styles::stylesheet();
    (
        [
            (header::CONTENT_TYPE, "text/css"),
            (header::CACHE_CONTROL, "public, max-age=31536000, immutable"),
        ],
        css.body.clone(),
    )
}

fn app() -> Router {
    let css = styles::stylesheet();

    Router::new()
        .route(&css.route, get(css_handler))
        // ... other routes
}

The Cache-Control header tells browsers to cache the file for a year. Because the filename contains a content hash, deploying new CSS produces a new filename, and browsers fetch the new version automatically. Old cached versions expire naturally.

Reference the stylesheet in the layout

The layout component needs the hashed filename to build the <link> tag:

use maud::{html, Markup, DOCTYPE};
use crate::styles;

fn base_layout(title: &str, content: Markup) -> Markup {
    let css = styles::stylesheet();

    html! {
        (DOCTYPE)
        html lang="en" {
            head {
                meta charset="utf-8";
                meta name="viewport" content="width=device-width, initial-scale=1";
                title { (title) }
                link rel="stylesheet" href=(css.route);
                script src="/assets/htmx.min.js" defer {}
            }
            body {
                (content)
            }
        }
    }
}

Every page automatically references the current stylesheet version. When any component’s CSS changes, the hash changes, the filename changes, and browsers fetch the new file on the next page load.

How inventory works

inventory uses platform-specific linker constructor sections (the same mechanism as __attribute__((constructor)) in C). Each inventory::submit! call creates a static value and a constructor function that registers it in an atomic linked list. The OS loader runs all constructors before main() starts, so by the time your application code runs, every fragment is already registered and inventory::iter yields them all.

Three things to keep in mind:

  • No ordering guarantees. Fragments are yielded in whatever order the linker placed them. If CSS cascade order matters between components, switch to a struct with a weight field and sort after collecting. In practice, well-scoped component styles rarely depend on source order.
  • Same-crate usage is safe. The known linker dead-code-elimination issue (where submitted items in an unreferenced crate get stripped) does not apply when collect! and submit! are in the same crate. For a workspace with multiple crates, ensure each crate that submits fragments is referenced by at least one symbol in the binary crate.
  • submit! is module-level only. It cannot appear inside a function body. It is a static declaration, not a runtime statement.

Putting it together

The full flow:

  1. base.css contains resets, custom properties, and global styles. It is embedded with include_str!.
  2. Each component file uses inventory::submit! to register its CSS alongside its Maud markup.
  3. At startup, build_stylesheet() concatenates the base CSS with all registered fragments, processes the result through lightningcss, and hashes the output.
  4. The hashed filename is available to the layout via styles::stylesheet().route.
  5. A single Axum route serves the processed CSS from memory with long-lived cache headers.

No build step. No CSS preprocessor. No file watchers. The Rust compiler and lightningcss handle everything at compile time and startup.

Data

Database with PostgreSQL and SQLx

SQLx is an async database library for Rust that checks your SQL queries against a real PostgreSQL database at compile time. If a query references a column that does not exist, uses the wrong type, or has a syntax error, the compiler catches it before the application runs. This is the primary reason to choose SQLx over other database libraries.

SQLx is not an ORM. There is no query builder, no model macros, and no schema-to-struct code generation. Write SQL directly, and SQLx verifies it.

Setup

Add SQLx to your Cargo.toml:

[dependencies]
sqlx = { version = "0.8", features = [
    "runtime-tokio",
    "tls-rustls-ring-webpki",
    "postgres",
    "macros",
    "migrate",
] }

Feature breakdown:

  • runtime-tokio selects the Tokio async runtime.
  • tls-rustls-ring-webpki enables TLS via rustls with WebPKI certificate roots. For local development without TLS, this still needs to be present but the connection will negotiate plaintext if the server allows it.
  • postgres enables the PostgreSQL driver.
  • macros enables query!, query_as!, and the other compile-time checked query macros.
  • migrate enables the migration runner and migrate! macro.

Add type integration features as needed:

sqlx = { version = "0.8", features = [
    "runtime-tokio",
    "tls-rustls-ring-webpki",
    "postgres",
    "macros",
    "migrate",
    "uuid",
    "time",
    "json",
] }

These enable uuid::Uuid, time crate date/time types, and serde_json::Value / Json<T> for JSONB columns, respectively.

Install the CLI

The sqlx-cli tool manages databases and migrations:

cargo install sqlx-cli --no-default-features --features rustls,postgres

This installs only PostgreSQL support, which keeps the build faster than the full default install.

Connecting to PostgreSQL

SQLx reads the database connection string from the DATABASE_URL environment variable. Set it in a .env file at the project root:

DATABASE_URL=postgres://myapp:password@localhost:5432/myapp_dev

The format is postgres://user:password@host:port/database. SQLx’s macros use dotenvy to read .env automatically at compile time.

PostgreSQL itself should be running as a Docker container managed by Docker Compose. See the Development Environment section for the container setup.

Connection pooling

Create a connection pool at application startup and share it through Axum’s application state. PgPool is internally reference-counted, so cloning it is cheap.

use sqlx::postgres::PgPoolOptions;
use sqlx::PgPool;

let pool = PgPoolOptions::new()
    .max_connections(5)
    .connect(&std::env::var("DATABASE_URL").expect("DATABASE_URL must be set"))
    .await
    .expect("failed to connect to database");

Pass the pool into your Axum AppState:

#[derive(Clone)]
struct AppState {
    db: PgPool,
}

let app = Router::new()
    .route("/", get(index))
    .with_state(AppState { db: pool });

Handlers extract it with State:

async fn list_users(State(state): State<AppState>) -> impl IntoResponse {
    let users = sqlx::query_as!(User, "SELECT id, name, email FROM users")
        .fetch_all(&state.db)
        .await
        .unwrap();
    // render users
}

The default pool configuration is reasonable for most applications:

OptionDefaultPurpose
max_connections10Maximum connections in the pool
min_connections0Minimum idle connections maintained
acquire_timeout30sHow long to wait for a connection
idle_timeout10 minClose idle connections after this duration
max_lifetime30 minClose connections older than this

Override them on PgPoolOptions if needed. For most web applications, setting max_connections to match your expected concurrency and leaving the rest at defaults works well.

For lazy connection establishment (useful in tests or CLIs where the database might not be needed):

let pool = PgPoolOptions::new()
    .max_connections(5)
    .connect_lazy(&database_url)?;

This returns immediately. Connections are established on first use.

Compile-time checked queries

The query! macro is the core of SQLx. At compile time, it connects to the database specified by DATABASE_URL, sends the query to PostgreSQL for parsing and type-checking, and generates Rust code that matches the result columns.

query!

query! returns an anonymous record type with fields matching the query’s output columns:

let row = sqlx::query!("SELECT id, name, email FROM users WHERE id = $1", user_id)
    .fetch_one(&pool)
    .await?;

// row.id: i32
// row.name: String
// row.email: String

Bind parameters use PostgreSQL’s $1, $2, … syntax. The macro checks that the number and types of bind arguments match what the query expects.

query_as!

query_as! maps results directly into a named struct:

struct User {
    id: i32,
    name: String,
    email: String,
}

let user = sqlx::query_as!(User, "SELECT id, name, email FROM users WHERE id = $1", user_id)
    .fetch_one(&pool)
    .await?;

The macro generates a struct literal, matching column names to field names. It does not use the FromRow trait. The struct does not need any derive macros.

Fetch methods

Choose the fetch method based on how many rows you expect:

MethodReturnsUse when
.execute(&pool)PgQueryResultINSERT, UPDATE, DELETE with no RETURNING
.fetch_one(&pool)TExactly one row expected (errors if zero or multiple)
.fetch_optional(&pool)Option<T>Zero or one row
.fetch_all(&pool)Vec<T>Collect all rows into a Vec
.fetch(&pool)impl Stream<Item = Result<T>>Stream rows without buffering

fetch_one returns an error if the query produces zero rows or more than one. Use fetch_optional when the row might not exist.

Nullable columns

The macro infers nullability from the database schema. A column with a NOT NULL constraint maps to T; a nullable column maps to Option<T>.

Override nullability in the column alias when the macro gets it wrong (common with expressions, COALESCE, or complex joins):

// Force non-null (panics at runtime if NULL)
sqlx::query!(r#"SELECT count(*) as "count!" FROM users"#)

// Force nullable
sqlx::query!(r#"SELECT name as "name?" FROM users"#)

// Override both nullability and type
sqlx::query!(r#"SELECT id as "id!: uuid::Uuid" FROM users"#)

The override syntax uses the column alias in double quotes:

  • "col!" forces non-null
  • "col?" forces nullable
  • "col: Type" overrides the Rust type
  • "col!: Type" forces non-null with a type override

RETURNING clauses

PostgreSQL’s RETURNING clause turns INSERT, UPDATE, and DELETE into queries that produce rows. Use fetch_one with query_as! to get the created or modified record back:

let user = sqlx::query_as!(
    User,
    "INSERT INTO users (name, email) VALUES ($1, $2) RETURNING id, name, email",
    name,
    email
)
.fetch_one(&pool)
.await?;

This avoids a separate SELECT after every insert.

Offline mode for CI

Compile-time query checking requires a running PostgreSQL database. In CI environments where a database is not available during compilation, SQLx provides offline mode.

  1. With the database running locally, generate the query cache:
cargo sqlx prepare --workspace

This creates a .sqlx/ directory containing metadata for every compile-time checked query in the project.

  1. Commit .sqlx/ to version control.

  2. When DATABASE_URL is absent at compile time and .sqlx/ exists, the macros use the cached metadata instead of connecting to a database.

  3. In CI, verify the cache is up to date:

cargo sqlx prepare --workspace --check

This fails if any query has changed without regenerating the cache, catching stale metadata before it causes runtime surprises.

To include queries from tests and other non-default targets:

cargo sqlx prepare --workspace -- --all-targets --all-features

Set SQLX_OFFLINE=true to force offline mode even when DATABASE_URL is present. This is useful for verifying that the offline cache works correctly.

Writing and organising queries

Keep queries inline, next to the code that uses them. SQLx’s macros are designed for this: the query text and its bind parameters live together in the handler or module function, so the reader sees the full picture without jumping between files.

pub async fn find_user_by_email(pool: &PgPool, email: &str) -> Result<Option<User>, sqlx::Error> {
    sqlx::query_as!(
        User,
        "SELECT id, name, email, created_at FROM users WHERE email = $1",
        email
    )
    .fetch_optional(pool)
    .await
}

pub async fn create_user(pool: &PgPool, name: &str, email: &str) -> Result<User, sqlx::Error> {
    sqlx::query_as!(
        User,
        "INSERT INTO users (name, email) VALUES ($1, $2) RETURNING id, name, email, created_at",
        name,
        email
    )
    .fetch_one(pool)
    .await
}

For queries that are genuinely long (complex joins, CTEs), query_file_as! reads SQL from a separate file:

-- queries/users_with_posts.sql
SELECT u.id, u.name, u.email, count(p.id) as "post_count!"
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name, u.email
ORDER BY u.name
let users = sqlx::query_file_as!(UserWithPosts, "queries/users_with_posts.sql")
    .fetch_all(&pool)
    .await?;

File paths are relative to the crate’s Cargo.toml directory. The file is still checked at compile time against the database.

Mapping query results to Rust types

With macros (preferred)

query_as! maps columns to struct fields by name. The struct needs no special derives:

struct User {
    id: i32,
    name: String,
    email: String,
    bio: Option<String>,       // nullable column
    created_at: time::OffsetDateTime, // TIMESTAMPTZ with the `time` feature
}

let users = sqlx::query_as!(User, "SELECT id, name, email, bio, created_at FROM users")
    .fetch_all(&pool)
    .await?;

The macro matches column names to field names at compile time. If the types do not match (e.g., a NOT NULL TEXT column mapped to i32), compilation fails.

With FromRow (runtime)

For cases where compile-time checking is not available (dynamic queries, generic code), use sqlx::FromRow:

#[derive(Debug, sqlx::FromRow)]
struct User {
    id: i32,
    name: String,
    email: String,
    bio: Option<String>,
}

let users: Vec<User> = sqlx::query_as::<_, User>("SELECT id, name, email, bio FROM users")
    .fetch_all(&pool)
    .await?;

Note the distinction: query_as! (with !) is a macro that checks at compile time and does not use FromRow. query_as::<_, T>() (without !) is a runtime function that requires T: FromRow.

FromRow supports field-level attributes for column renaming, defaults, and type conversion:

#[derive(sqlx::FromRow)]
struct User {
    id: i32,
    #[sqlx(rename = "user_name")]
    name: String,
    #[sqlx(default)]
    role: String,
}

PostgreSQL type mappings

SQLx maps PostgreSQL types to Rust types. The common mappings, using the feature flags from the setup above:

PostgreSQLRustFeature
BOOLbool
INT2 / SMALLINTi16
INT4 / INTi32
INT8 / BIGINTi64
FLOAT4 / REALf32
FLOAT8 / DOUBLE PRECISIONf64
TEXT, VARCHARString
BYTEAVec<u8>
UUIDuuid::Uuiduuid
TIMESTAMPTZtime::OffsetDateTimetime
TIMESTAMPtime::PrimitiveDateTimetime
DATEtime::Datetime
TIMEtime::Timetime
JSON, JSONBserde_json::Value or Json<T>json
INT4[], TEXT[], etc.Vec<T>

UUID

UUID primary keys are common in web applications. Enable the uuid feature and use uuid::Uuid directly:

use uuid::Uuid;

struct User {
    id: Uuid,
    name: String,
    email: String,
}

let user = sqlx::query_as!(
    User,
    "INSERT INTO users (id, name, email) VALUES ($1, $2, $3) RETURNING id, name, email",
    Uuid::new_v4(),
    name,
    email
)
.fetch_one(&pool)
.await?;

Add uuid to your direct dependencies too, since you will construct values from it:

uuid = { version = "1", features = ["v4"] }

Timestamps with the time crate

Enable the time feature for date and time support. TIMESTAMPTZ columns map to time::OffsetDateTime, which carries a UTC offset:

use time::OffsetDateTime;

struct AuditEntry {
    id: i32,
    action: String,
    created_at: OffsetDateTime,
}

let entry = sqlx::query_as!(
    AuditEntry,
    "INSERT INTO audit_log (action) VALUES ($1) RETURNING id, action, created_at",
    action
)
.fetch_one(&pool)
.await?;

PostgreSQL stores TIMESTAMPTZ in UTC internally. The OffsetDateTime you receive will always have a UTC offset.

For the time crate, add it as a direct dependency:

time = "0.3"

JSONB

JSONB is useful for semi-structured data that does not warrant its own columns. Enable the json feature and use serde_json::Value for unstructured JSON or sqlx::types::Json<T> for typed deserialization:

use sqlx::types::Json;

#[derive(serde::Serialize, serde::Deserialize)]
struct Preferences {
    theme: String,
    notifications: bool,
}

// Insert typed JSON
sqlx::query!(
    "UPDATE users SET preferences = $1 WHERE id = $2",
    Json(&prefs) as _,
    user_id
)
.execute(&pool)
.await?;

// Read typed JSON
let row = sqlx::query!(
    r#"SELECT preferences as "preferences!: Json<Preferences>" FROM users WHERE id = $1"#,
    user_id
)
.fetch_one(&pool)
.await?;

let prefs: Preferences = row.preferences.0;

The as _ cast on the insert side is required to help the macro infer the correct PostgreSQL type. On the read side, the type override in the column alias tells the macro to deserialise into Json<Preferences>.

Custom enum types

Map PostgreSQL enum types to Rust enums with sqlx::Type:

#[derive(Debug, sqlx::Type)]
#[sqlx(type_name = "user_role", rename_all = "lowercase")]
enum UserRole {
    Admin,
    Member,
    Guest,
}

This corresponds to a PostgreSQL type created with:

CREATE TYPE user_role AS ENUM ('admin', 'member', 'guest');

Use the enum directly in queries:

sqlx::query!(
    "INSERT INTO users (name, role) VALUES ($1, $2)",
    name,
    role as UserRole
)
.execute(&pool)
.await?;

The as UserRole cast tells the macro which Rust type to use for encoding.

Transactions

A transaction groups multiple queries into an atomic unit. Either all succeed and the changes are committed, or any failure rolls everything back.

Start a transaction with pool.begin():

let mut tx = pool.begin().await?;

let user = sqlx::query_as!(
    User,
    "INSERT INTO users (name, email) VALUES ($1, $2) RETURNING id, name, email",
    name,
    email
)
.execute(&mut *tx)
.await?;

sqlx::query!(
    "INSERT INTO audit_log (user_id, action) VALUES ($1, $2)",
    user.id,
    "account_created"
)
.execute(&mut *tx)
.await?;

tx.commit().await?;

Pass the transaction to queries with &mut *tx. This dereferences the Transaction to the underlying connection and reborrows it.

If commit() is never called, the transaction rolls back when it is dropped. This makes the ? operator transaction-safe: if any query fails and the function returns early, the transaction is dropped and automatically rolled back.

async fn transfer(
    pool: &PgPool,
    from_id: i32,
    to_id: i32,
    amount: i64,
) -> Result<(), sqlx::Error> {
    let mut tx = pool.begin().await?;

    sqlx::query!(
        "UPDATE accounts SET balance = balance - $1 WHERE id = $2",
        amount,
        from_id
    )
    .execute(&mut *tx)
    .await?;  // rolls back on failure

    sqlx::query!(
        "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
        amount,
        to_id
    )
    .execute(&mut *tx)
    .await?;  // rolls back on failure

    tx.commit().await?;
    Ok(())
}

For explicit rollback (useful when a business rule fails after the queries succeed):

if balance_too_low {
    tx.rollback().await?;
    return Err(/* ... */);
}

Gotchas

DATABASE_URL must be set at compile time. The query! macros connect to PostgreSQL during compilation. If the variable is missing and no .sqlx/ cache exists, compilation fails. Keep a .env file in your project root for local development.

*&mut tx syntax. Passing a transaction to a query requires &mut *tx, not &mut tx or &tx. The Transaction type implements DerefMut to the underlying connection; the dereference-reborrow is needed for the borrow checker.

Column name matching in query_as!. The column names in the SELECT must match the struct field names exactly. Use AS to rename columns if the database naming convention differs:

sqlx::query_as!(
    User,
    "SELECT id, user_name AS name FROM users"
)

Nullable inference in expressions. The macro sometimes cannot determine nullability for computed expressions (count(*), COALESCE, subqueries). Use the "col!" override to tell it the result is non-null:

sqlx::query!(r#"SELECT count(*) as "total!" FROM users"#)

Pool exhaustion. If all connections are in use and acquire_timeout is reached, the next query fails. This usually means the pool is too small for the application’s concurrency, or a handler is holding a connection too long (a common cause is doing non-database work while a transaction is open). Keep transactions short.

Database Migrations

Migrations track every change to your database schema as versioned SQL files. SQLx includes a migration system that runs these files in order, records which have been applied, and validates that applied migrations have not been modified. The same sqlx-cli tool installed in the database section manages the full lifecycle.

Creating migrations

Generate a new migration with sqlx migrate add. Use the -r flag to create reversible migrations, which produce a .up.sql and .down.sql pair:

sqlx migrate add -r create_users

This creates two files in the migrations/ directory at the project root:

migrations/
  20260226140000_create_users.up.sql
  20260226140000_create_users.down.sql

The timestamp prefix is generated in UTC and determines execution order. Timestamp versioning is the default and prevents conflicts when multiple developers create migrations concurrently.

Once the first migration uses -r, subsequent calls to sqlx migrate add will produce reversible pairs automatically. The CLI infers the mode from existing files.

Writing the SQL

The .up.sql file contains the forward schema change:

-- migrations/20260226140000_create_users.up.sql
CREATE TABLE users (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email      TEXT NOT NULL UNIQUE,
    name       TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

The .down.sql file reverses it:

-- migrations/20260226140000_create_users.down.sql
DROP TABLE users;

Keep each migration focused on a single change. A migration that creates a table should not also modify a different table. This makes reverting predictable and keeps the history readable.

Running migrations

At application startup

The migrate! macro embeds migration files directly into the compiled binary. Call .run() on the pool at startup to apply any pending migrations before the application begins serving requests:

use sqlx::PgPool;

#[tokio::main]
async fn main() {
    let pool = PgPool::connect(&std::env::var("DATABASE_URL").expect("DATABASE_URL must be set"))
        .await
        .expect("failed to connect to database");

    sqlx::migrate!()
        .run(&pool)
        .await
        .expect("failed to run migrations");

    // build router, start server...
}

migrate!() reads from the migrations/ directory relative to Cargo.toml. The migration SQL is baked into the binary at compile time, so the deployed binary is self-contained, it does not need the migration files on disk.

This is the simplest deployment model. One binary, one process, and the schema is always in sync with the code.

With the CLI

For larger deployments where migrations should run as a separate step before the application starts, use the CLI directly:

sqlx migrate run

This reads DATABASE_URL from the environment or a .env file. The CLI approach gives you explicit control over when schema changes happen, which matters when you have multiple application instances starting simultaneously, need to run migrations from a CI pipeline before deployment, or want human review of what will be applied before it runs.

The two approaches are not mutually exclusive. migrate run is idempotent: it skips any migration already recorded in the database. You can run migrations from the CLI in your deployment pipeline and keep sqlx::migrate!().run(&pool) in your application code as a safety net.

Recompilation caveat

The migrate! macro runs at compile time, but Cargo does not automatically detect changes to non-Rust files. Adding a new .sql migration without modifying any .rs file will not trigger recompilation. The application will silently use the old set of migrations.

Fix this by generating a build.rs that watches the migrations directory:

sqlx migrate build-script

This creates a build.rs at the project root:

// generated by `sqlx migrate build-script`
fn main() {
    println!("cargo:rerun-if-changed=migrations");
}

Commit this file. With it in place, any change to the migrations/ directory triggers a rebuild.

Reverting migrations

Revert the most recently applied migration:

sqlx migrate revert

This runs the .down.sql file for the last applied migration. Run it multiple times to step back further, or target a specific version:

# revert everything after version 20260226140000
sqlx migrate revert --target-version 20260226140000

# revert all migrations
sqlx migrate revert --target-version 0

Reverting is primarily a development tool. In production, writing a new forward migration to undo a change is usually safer than reverting, because other parts of the system may already depend on the schema change.

Checking migration status

Inspect which migrations have been applied and whether any are out of sync:

sqlx migrate info

This prints each migration’s version, description, applied status, and whether its checksum matches the file on disk. Use this to diagnose problems before making changes, especially in shared environments.

How SQLx tracks migrations

SQLx creates a _sqlx_migrations table automatically on first run. It records each applied migration’s version, description, checksum (SHA-256 of the SQL content), execution time, and success status.

Two behaviours follow from this:

Checksum validation. Every time migrations run, SQLx compares the stored checksum for each already-applied migration against the current file on disk. If a file has been edited after it was applied, SQLx raises an error. This catches accidental edits to applied migrations. If you need to correct a mistake, write a new migration rather than editing the old one.

Dirty state detection. If a migration fails partway through, its row may be recorded with success = false. SQLx refuses to run further migrations until the dirty state is resolved. In development, the simplest fix is to drop and recreate the database. In production, investigate the failure, fix it manually, and update the row.

Managing migrations across environments

Development

The typical workflow during development:

# create the database (if it doesn't exist)
sqlx database create

# apply all pending migrations
sqlx migrate run

# full reset when needed
sqlx database drop
sqlx database create
sqlx migrate run

CI

In CI, create a disposable database, apply migrations, and verify the offline query cache is up to date:

sqlx database create
sqlx migrate run
cargo sqlx prepare --workspace --check

The --check flag fails the build if any query! macro’s cached metadata in .sqlx/ is stale. This enforces that developers run cargo sqlx prepare after schema changes.

Production

For applications using the embedded migrate!() macro, no separate migration step is needed. The binary applies its own migrations on startup.

For CLI-based deployments, run sqlx migrate run as part of the deployment process, before starting the application. In Docker, this is typically an entrypoint script or an init container. The --dry-run flag shows what would be applied without executing, useful for pre-deployment review:

sqlx migrate run --dry-run

Concurrency safety

SQLx acquires a PostgreSQL advisory lock before running migrations. If multiple instances start simultaneously, only one will apply migrations while the others wait. This prevents race conditions during rolling deployments.

Gotchas

Never edit an applied migration. The checksum validation will reject it. Write a new corrective migration instead.

Don’t mix simple and reversible migrations. SQLx infers the migration type from existing files. Stick with one style (reversible, using -r) throughout the project.

Commit build.rs and .sqlx/. The build.rs file (from sqlx migrate build-script) ensures new migrations trigger recompilation. The .sqlx/ directory (from cargo sqlx prepare) enables compilation without a live database. Both belong in version control.

DATABASE_URL takes precedence over .sqlx/. In CI, if DATABASE_URL is set during compilation, the query! macros will try to connect to it rather than using the offline cache. Set SQLX_OFFLINE=true explicitly when you want to force offline mode.

Search

PostgreSQL ships a full-text search engine. For most content-heavy and CRUD applications, it is the right starting point: no extra service to run, no index to keep in sync, and search results are transactionally consistent with your writes. Start here, and graduate to a dedicated search engine only when you hit a specific limitation that PostgreSQL cannot address.

This section covers PostgreSQL full-text search and trigram matching, the SQLx patterns for using them from Rust, building a search UI with HTMX and Maud, and when and how to move to Meilisearch.

PostgreSQL full-text search

PostgreSQL full-text search works by converting text into tsvector (a sorted list of normalised lexemes) and matching it against a tsquery (a search predicate). The engine handles stemming, stop-word removal, and ranking.

Schema setup

Add a tsvector column to your table using GENERATED ALWAYS AS ... STORED. PostgreSQL maintains it automatically on every insert and update.

CREATE TABLE articles (
    id          BIGSERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    body        TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    search_vector tsvector
        GENERATED ALWAYS AS (
            setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
            setweight(to_tsvector('english', coalesce(body, '')), 'B')
        ) STORED
);

setweight assigns a weight label (A, B, C, or D) to lexemes. Title matches weighted A will rank higher than body matches weighted B when you use the ranking functions.

Create a GIN index on the column:

CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

Without this index, every search query scans the full table and recomputes tsvectors. With it, PostgreSQL uses an inverted index to look up only rows containing the matching lexemes.

Building search queries

websearch_to_tsquery is the best choice for user-facing search. It accepts Google-like syntax (quoted phrases, - for exclusion, OR), and it never raises a syntax error on malformed input.

-- "rust web" becomes: 'rust' & 'web'
-- "rust framework" -django becomes: 'rust' & 'framework' & !'django'
-- "full text" OR search becomes: 'full' <-> 'text' | 'search'
SELECT websearch_to_tsquery('english', 'rust web framework');

The @@ operator matches a tsvector against a tsquery:

SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', 'rust web framework')
ORDER BY ts_rank_cd(search_vector, websearch_to_tsquery('english', 'rust web framework')) DESC
LIMIT 20;

Other tsquery constructors exist for specific needs:

FunctionBehaviour
websearch_to_tsqueryGoogle-like syntax, never errors. Best for user input.
plainto_tsqueryInserts & (AND) between all words. No special syntax.
phraseto_tsqueryInserts <-> (adjacent) between words. For exact phrase matching.
to_tsqueryRequires explicit operators (&, |, !, <->). For programmatic query building.

Ranking results

ts_rank_cd uses cover density ranking, which rewards documents where matching terms appear close together. It generally produces better results than ts_rank for multi-term queries.

SELECT id, title,
       ts_rank_cd(search_vector, query) AS rank
FROM articles, websearch_to_tsquery('english', 'rust web') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

The weights array controls how much each label contributes to the rank. The default is {0.1, 0.2, 0.4, 1.0} for D, C, B, A respectively. Override it when you need different weighting:

ts_rank_cd('{0.1, 0.2, 0.4, 1.0}', search_vector, query)

Highlighting search results

ts_headline generates a text snippet with matching terms wrapped in markers:

SELECT id, title,
       ts_headline('english', body, query,
                   'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15, MaxFragments=2')
       AS snippet
FROM articles, websearch_to_tsquery('english', 'rust web') AS query
WHERE search_vector @@ query
ORDER BY ts_rank_cd(search_vector, query) DESC
LIMIT 20;

ts_headline is expensive. It re-parses the original text for every row. Always apply it only to rows that have already been filtered and limited.

Search queries with SQLx

SQLx does not have native Rust types for tsvector or tsquery. This is not a problem in practice: keep the FTS logic in SQL, bind the search term as a String, and return only types SQLx understands.

ts_rank and ts_rank_cd return float4 (maps to f32). ts_headline returns text (maps to String). The @@ operator returns bool. All work directly with SQLx’s compile-time checked macros.

struct SearchResult {
    id: i64,
    title: String,
    snippet: String,
    rank: f32,
}

pub async fn search_articles(
    pool: &PgPool,
    query: &str,
    limit: i64,
) -> Result<Vec<SearchResult>, sqlx::Error> {
    sqlx::query_as!(
        SearchResult,
        r#"
        SELECT
            id,
            title,
            ts_headline('english', body, websearch_to_tsquery('english', $1),
                        'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15')
                as "snippet!",
            ts_rank_cd(search_vector, websearch_to_tsquery('english', $1))
                as "rank!"
        FROM articles
        WHERE search_vector @@ websearch_to_tsquery('english', $1)
        ORDER BY rank DESC
        LIMIT $2
        "#,
        query,
        limit
    )
    .fetch_all(pool)
    .await
}

The "snippet!" and "rank!" column aliases force SQLx to treat these as non-nullable. Without the ! suffix, the macro infers Option<String> and Option<f32> for computed columns, even though these functions never return NULL for non-null inputs.

Do not use SELECT * on tables with tsvector columns. The query! and query_as! macros will fail at compile time because SQLx has no Rust type for tsvector. Always list your columns explicitly, omitting the tsvector column or casting it with ::text if you genuinely need its contents.

pg_trgm for fuzzy matching

PostgreSQL full-text search is lexeme-exact after normalisation. If a user types “postgre” instead of “postgresql”, FTS will not match. The pg_trgm extension fills this gap with trigram-based similarity matching, providing typo tolerance that FTS lacks.

Enable the extension

Add a migration:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

pg_trgm is a contrib extension shipped with PostgreSQL but not enabled by default. The compile-time query! macros connect to your development database, so the extension must be installed there too.

Similarity search

A trigram is a sequence of three consecutive characters. Two strings are similar if they share many trigrams. The similarity function returns a score between 0.0 and 1.0:

SELECT similarity('postgresql', 'postgre');
-- Result: ~0.47

The % operator returns true when similarity exceeds a threshold (default 0.3, configurable with SET pg_trgm.similarity_threshold):

SELECT title, similarity(title, 'postgre') AS sml
FROM articles
WHERE title % 'postgre'
ORDER BY sml DESC
LIMIT 10;

word_similarity compares a search term against substrings of a longer text. It is better suited when searching for a word within a title or sentence:

SELECT title, word_similarity('serch', title) AS sml
FROM articles
WHERE 'serch' <% title
ORDER BY sml DESC
LIMIT 10;

Indexing for trigram queries

Create a GIN index with the gin_trgm_ops operator class:

CREATE INDEX idx_articles_title_trgm ON articles USING GIN (title gin_trgm_ops);

This index supports the % operator, LIKE, ILIKE, and regex patterns. Without it, every trigram query requires a sequential scan.

If you need KNN (K-Nearest Neighbour) ordering with the <-> distance operator, use a GiST index instead:

CREATE INDEX idx_articles_title_trgm_gist ON articles USING GiST (title gist_trgm_ops);

GiST supports ORDER BY title <-> 'search term' directly, which GIN does not.

Trigram queries with SQLx

similarity and word_similarity take text inputs and return real (maps to f32). No casting workarounds needed.

struct FuzzyResult {
    id: i64,
    title: String,
    similarity: f32,
}

pub async fn fuzzy_search(
    pool: &PgPool,
    query: &str,
    limit: i64,
) -> Result<Vec<FuzzyResult>, sqlx::Error> {
    sqlx::query_as!(
        FuzzyResult,
        r#"
        SELECT
            id,
            title,
            similarity(title, $1) as "similarity!"
        FROM articles
        WHERE title % $1
        ORDER BY similarity DESC
        LIMIT $2
        "#,
        query,
        limit
    )
    .fetch_all(pool)
    .await
}

Combining FTS and trigram search

A practical search function tries full-text search first for precise, ranked results, then falls back to trigram matching for typo tolerance:

pub async fn search(
    pool: &PgPool,
    query: &str,
    limit: i64,
) -> Result<Vec<SearchResult>, sqlx::Error> {
    let results = search_articles(pool, query, limit).await?;

    if results.is_empty() {
        // Fall back to fuzzy matching on title
        return sqlx::query_as!(
            SearchResult,
            r#"
            SELECT
                id,
                title,
                '' as "snippet!",
                similarity(title, $1) as "rank!"
            FROM articles
            WHERE title % $1
            ORDER BY rank DESC
            LIMIT $2
            "#,
            query,
            limit
        )
        .fetch_all(pool)
        .await;
    }

    Ok(results)
}

You can also combine both in a single query with a weighted score, but the fallback pattern is simpler to reason about and avoids the cost of trigram comparison on every row when FTS already produces good results.

Search UI with HTMX

A search interface needs a text input that sends queries as the user types, a target element where results appear, and debouncing to avoid flooding the server with requests on every keystroke. HTMX handles all of this declaratively.

The search input

fn search_input(query: &str) -> Markup {
    html! {
        input type="search" name="q" value=(query)
            placeholder="Search articles..."
            hx-get="/search"
            hx-trigger="input changed delay:300ms, keyup[key=='Enter'], search"
            hx-target="#search-results"
            hx-sync="this:replace"
            hx-replace-url="true"
            hx-indicator="#search-spinner";
        span #search-spinner .htmx-indicator { "Searching..." }
    }
}

The trigger configuration:

  • input changed delay:300ms debounces: fires 300ms after the user stops typing, and only if the value actually changed.
  • keyup[key=='Enter'] fires immediately on Enter.
  • search fires when the user clicks the browser’s native clear button on <input type="search">.

hx-sync="this:replace" cancels any in-flight request and replaces it with the new one. Without this, a slow response for “ab” could arrive after a fast response for “abc” and overwrite the correct results with stale ones.

hx-replace-url="true" updates the browser URL bar to /search?q=... without creating a history entry for every keystroke. The user can copy, bookmark, or share the URL.

The results fragment

fn search_results(results: &[SearchResult]) -> Markup {
    html! {
        @if results.is_empty() {
            p .no-results { "No articles found." }
        } @else {
            @for result in results {
                article .search-result {
                    h3 {
                        a href={ "/articles/" (result.id) } { (result.title) }
                    }
                    p .snippet { (PreEscaped(&result.snippet)) }
                }
            }
        }
    }
}

Use PreEscaped for the snippet because ts_headline returns HTML with <mark> tags. The snippet content comes from your own database, not from user input, so this is safe.

The Axum handler

The handler serves both full page loads (direct navigation to /search?q=rust) and HTMX fragment requests (triggered by typing in the input). Detect the difference with the HX-Request header.

use axum::extract::{Query, State};
use axum::http::HeaderMap;
use axum::response::Html;
use maud::{html, Markup, PreEscaped};

#[derive(serde::Deserialize)]
pub struct SearchParams {
    #[serde(default)]
    q: String,
}

pub async fn search_handler(
    headers: HeaderMap,
    State(state): State<AppState>,
    Query(params): Query<SearchParams>,
) -> Markup {
    let results = if params.q.is_empty() {
        vec![]
    } else {
        search(&state.db, &params.q, 20)
            .await
            .unwrap_or_default()
    };

    let fragment = search_results(&results);

    if headers.get("HX-Request").is_some() {
        fragment
    } else {
        search_page(&params.q, fragment)
    }
}

fn search_page(query: &str, results: Markup) -> Markup {
    html! {
        h1 { "Search" }
        (search_input(query))
        div #search-results {
            (results)
        }
    }
}

When HTMX sends a request, the handler returns only the results fragment. When the user navigates directly to /search?q=rust, it returns the full page with the search input pre-populated and results already rendered. This makes search URLs bookmarkable and shareable.

Route setup

use axum::{routing::get, Router};

let app = Router::new()
    .route("/search", get(search_handler))
    .with_state(state);

When PostgreSQL search is not enough

PostgreSQL FTS handles most search requirements for content-heavy and CRUD applications. Recognise these limits so you know when to reach for a dedicated engine:

  • No built-in typo tolerance. pg_trgm helps, but it works on string similarity, not search-query-level fuzzy matching. A dedicated engine like Meilisearch handles typos automatically across all indexed fields.
  • No faceted search. Counting results by category, tag, or date range alongside search results requires separate GROUP BY queries. Dedicated engines provide facets as a first-class feature.
  • Limited relevance tuning. ts_rank and ts_rank_cd are basic. There is no equivalent to Elasticsearch’s function scoring, decay functions, or field-level boosting beyond four weight levels (A/B/C/D).
  • Performance at scale. PostgreSQL FTS works well into the millions of rows for straightforward queries. Beyond that, GIN indexes become large and slow to update, and ts_headline is CPU-intensive.
  • No instant prefix matching. FTS matches complete lexemes. Searching for “rus” will not match “rust”. Dedicated engines handle prefix matching out of the box.
  • No semantic matching. FTS matches words, not meaning. “How to fix a flat tire” will not find documents about “tire puncture repair”. For meaning-based retrieval, see Semantic Search.

If your application hits one or more of these limits and search is a primary user-facing feature, add Meilisearch or pgvector depending on what you need.

Meilisearch

Meilisearch is a search engine built in Rust with built-in typo tolerance, instant search, and faceted filtering. It runs as a separate service, providing a RESTful API that your application talks to via the Rust SDK.

Running Meilisearch in development

Add it to your Docker Compose file:

services:
  meilisearch:
    image: getmeili/meilisearch:v1.12
    ports:
      - "7700:7700"
    environment:
      MEILI_ENV: development
      MEILI_MASTER_KEY: devMasterKey123
    volumes:
      - meili_data:/meili_data

volumes:
  meili_data:

In development mode, Meilisearch exposes a web-based search preview UI at http://localhost:7700.

Rust SDK

Add the dependency:

[dependencies]
meilisearch-sdk = "0.28"

Index documents and search:

use meilisearch_sdk::client::Client;

#[derive(serde::Serialize, serde::Deserialize, Debug)]
struct Article {
    id: i64,
    title: String,
    body: String,
}

// Create client
let client = Client::new("http://localhost:7700", Some("devMasterKey123"))?;

// Index documents
let articles: Vec<Article> = fetch_all_articles(&pool).await?;
client.index("articles")
    .add_documents(&articles, Some("id"))
    .await?;

// Search (typo-tolerant by default: "rrust" finds "rust")
let results = client.index("articles")
    .search()
    .with_query("rrust web framwork")
    .with_limit(20)
    .execute::<Article>()
    .await?;

Keeping the index in sync

PostgreSQL remains the source of truth. Meilisearch is a derived, read-optimised search layer. The simplest sync strategy is application-level dual write with a periodic full resync as a safety net.

Dual write: when your application inserts or updates an article in PostgreSQL, also push the document to Meilisearch:

pub async fn create_article(
    pool: &PgPool,
    meili: &Client,
    title: &str,
    body: &str,
) -> Result<Article, AppError> {
    let article = sqlx::query_as!(
        Article,
        "INSERT INTO articles (title, body) VALUES ($1, $2) RETURNING id, title, body",
        title, body
    )
    .fetch_one(pool)
    .await?;

    meili.index("articles")
        .add_documents(&[&article], Some("id"))
        .await?;

    Ok(article)
}

Periodic resync: a background task queries PostgreSQL for rows modified since the last sync (using an updated_at column) and pushes them to Meilisearch. Run this every 30-60 seconds. It catches any drift caused by failed dual writes.

If the Meilisearch write fails, the search index is temporarily stale but the database is correct. Design your application to tolerate this eventual consistency.

When to use Meilisearch

Add Meilisearch when search is a primary user-facing feature and you need:

  • Automatic typo tolerance across all indexed fields
  • Faceted search and filtering
  • Instant prefix matching (results as the user types each character)
  • Relevance ranking that works well out of the box without manual tuning

Accept the operational cost: a separate service to run, a sync strategy to maintain, and eventual consistency between your database and search index.

tantivy

tantivy is an embedded full-text search library for Rust. Think of it as Lucene for Rust: you link it into your application directly, with no separate process or HTTP API. It provides BM25 scoring, configurable tokenisers with stemming support for 17 languages, phrase queries, and faceted search.

tantivy is a good fit when you need more powerful search than PostgreSQL FTS but want to avoid adding infrastructure. The index lives in your application process, so there is no sync problem and no network hop. The trade-off is that you manage the index lifecycle yourself, and the index writer holds an exclusive lock, which limits it to a single-process deployment (or requires designating one process as the indexer).

tantivy does not provide built-in typo tolerance. If you need automatic fuzzy matching, Meilisearch is a better choice.

Gotchas

websearch_to_tsquery never errors, to_tsquery does. Use websearch_to_tsquery or plainto_tsquery for user-facing search. to_tsquery requires valid operator syntax and will return a SQL error on malformed input like unbalanced parentheses.

Generated column expressions must be immutable. to_tsvector('english', title) with a string literal regconfig is immutable. If the language configuration comes from another column, you need a trigger instead of a generated column.

The pg_trgm extension must be installed in your development database. The query! macros connect at compile time. If the extension is missing, any query using similarity(), %, or related operators will fail to compile.

ts_headline on large result sets is slow. Always filter and limit rows before applying ts_headline. Never call it on the full table.

sqlx prepare needs extensions too. If you use cargo sqlx prepare for offline compilation in CI, the database used for preparation must have pg_trgm installed and the schema fully migrated.

Semantic Search

PostgreSQL full-text search matches words. Semantic search matches meaning. A user searching for “how to fix a flat tire” finds documents about “tire puncture repair” even though no words overlap. This is possible because text is converted into high-dimensional vectors (embeddings) that encode meaning, and similar meanings produce similar vectors.

pgvector adds vector similarity search to PostgreSQL. It introduces a native vector column type with distance operators and index support. If you already run PostgreSQL, adding semantic search requires an extension, not a new service. Your embeddings live alongside your relational data, with full ACID guarantees and SQL for filtering.

This section covers pgvector setup, generating embeddings with a local model, storing and querying vectors from Rust with SQLx, and combining vector similarity with full-text search for hybrid retrieval. For building complete RAG pipelines that feed retrieved context to an LLM, see Retrieval-Augmented Generation in the AI and LLM Integration section.

pgvector setup

Enable the extension

Add a migration:

CREATE EXTENSION IF NOT EXISTS vector;

vector is a contrib-style extension included in the standard postgres Docker image from PostgreSQL 17 onward. Cloud-managed PostgreSQL services (AWS RDS, Supabase, Neon) include it too.

Schema

Add a vector column sized to match your embedding model’s output dimensions. The example below uses 768 dimensions, which matches nomic-embed-text.

CREATE TABLE documents (
    id          BIGSERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    content     TEXT NOT NULL,
    embedding   vector(768),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

Unlike the tsvector column used for full-text search, embeddings cannot be a GENERATED ALWAYS column. Generating an embedding requires calling an external model, which PostgreSQL cannot do in a column expression. Your application generates the embedding and writes it alongside the content.

Indexing

Create an HNSW (Hierarchical Navigable Small World) index for approximate nearest neighbour search:

CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_cosine_ops);

HNSW is the recommended index type. It provides logarithmic search time and handles data updates without degrading recall. The alternative, IVFFlat, builds faster and uses less space, but its recall degrades as data changes because cluster centroids are not recalculated.

Without an index, pgvector performs exact nearest neighbour search via sequential scan. This is fine for small datasets (under ~100K vectors) but does not scale.

The vector_cosine_ops operator class matches cosine distance (<=>), which is the right choice for text embeddings. Other operator classes exist for L2 distance (vector_l2_ops), inner product (vector_ip_ops), and others.

Tuning index parameters

HNSW accepts two build-time parameters:

CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
  • m controls the maximum number of connections per node (default 16). Higher values improve recall but increase index size and build time.
  • ef_construction controls the search breadth during index building (default 64). Higher values produce a better quality index at the cost of slower builds.

At query time, hnsw.ef_search controls how many nodes the search visits (default 40). Increase it when you need higher recall:

SET hnsw.ef_search = 100;

The defaults work well for most workloads. Benchmark against your actual data before changing them.

Generating embeddings

An embedding model converts text into a fixed-size vector. You need one to populate the embedding column and to convert search queries into vectors at query time.

Local embeddings with Ollama

Ollama runs embedding models locally. It serves an HTTP API compatible with the OpenAI embeddings endpoint, so any client that speaks that protocol works.

Pull an embedding model:

ollama pull nomic-embed-text

nomic-embed-text produces 768-dimension vectors, supports 8,192 token context, and runs on commodity hardware. It scores competitively with commercial APIs on retrieval benchmarks.

Generate an embedding via the API:

curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "How to handle errors in Rust web applications"
}'

The response includes an embeddings array containing one vector per input string.

Calling Ollama from Rust

Ollama’s /api/embed endpoint accepts JSON and returns JSON. Use reqwest directly:

use reqwest::Client;
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
struct EmbedRequest {
    model: String,
    input: Vec<String>,
}

#[derive(Deserialize)]
struct EmbedResponse {
    embeddings: Vec<Vec<f32>>,
}

pub async fn generate_embeddings(
    client: &Client,
    ollama_url: &str,
    texts: &[&str],
) -> Result<Vec<Vec<f32>>, reqwest::Error> {
    let response: EmbedResponse = client
        .post(format!("{}/api/embed", ollama_url))
        .json(&EmbedRequest {
            model: "nomic-embed-text".to_string(),
            input: texts.iter().map(|s| s.to_string()).collect(),
        })
        .send()
        .await?
        .json()
        .await?;

    Ok(response.embeddings)
}

Batch multiple texts in a single request. Ollama processes them together, which is faster than one request per text.

OpenAI as an alternative

If you prefer a hosted API, OpenAI’s text-embedding-3-small produces 1,536-dimension vectors at $0.02 per million tokens. Change the vector(768) column to vector(1536), swap the model name, and point the request at https://api.openai.com/v1/embeddings with a bearer token. The query patterns in this section work the same regardless of how the embedding was generated.

Storing and querying vectors with SQLx

The pgvector crate

The pgvector crate provides a Vector type that implements SQLx’s Encode and Decode traits.

[dependencies]
pgvector = { version = "0.4", features = ["sqlx"] }

Inserting documents with embeddings

use pgvector::Vector;
use sqlx::PgPool;

pub async fn insert_document(
    pool: &PgPool,
    title: &str,
    content: &str,
    embedding: Vec<f32>,
) -> Result<i64, sqlx::Error> {
    let embedding = Vector::from(embedding);

    sqlx::query_scalar!(
        r#"
        INSERT INTO documents (title, content, embedding)
        VALUES ($1, $2, $3)
        RETURNING id
        "#,
        title,
        content,
        embedding as _
    )
    .fetch_one(pool)
    .await
}

The as _ cast tells SQLx to use the pgvector crate’s Encode implementation rather than trying to infer a type mapping for the vector column.

Similarity search

The <=> operator computes cosine distance. Lower distance means higher similarity. Order by distance ascending to get the most similar results first.

struct SimilarDocument {
    id: i64,
    title: String,
    content: String,
    similarity: f64,
}

pub async fn semantic_search(
    pool: &PgPool,
    query_embedding: Vec<f32>,
    limit: i64,
) -> Result<Vec<SimilarDocument>, sqlx::Error> {
    let embedding = Vector::from(query_embedding);

    sqlx::query_as!(
        SimilarDocument,
        r#"
        SELECT
            id,
            title,
            content,
            1 - (embedding <=> $1) as "similarity!"
        FROM documents
        ORDER BY embedding <=> $1
        LIMIT $2
        "#,
        embedding as _,
        limit
    )
    .fetch_all(pool)
    .await
}

1 - cosine_distance converts the distance into a similarity score between 0.0 and 1.0, where 1.0 is identical.

Filtered similarity search

Combine vector similarity with standard SQL filtering. pgvector’s HNSW index supports iterative scans (v0.8.0+), so filtered queries return the expected number of results even when the filter is selective:

pub async fn search_by_category(
    pool: &PgPool,
    query_embedding: Vec<f32>,
    category: &str,
    limit: i64,
) -> Result<Vec<SimilarDocument>, sqlx::Error> {
    let embedding = Vector::from(query_embedding);

    sqlx::query_as!(
        SimilarDocument,
        r#"
        SELECT
            id,
            title,
            content,
            1 - (embedding <=> $1) as "similarity!"
        FROM documents
        WHERE category = $2
        ORDER BY embedding <=> $1
        LIMIT $3
        "#,
        embedding as _,
        category,
        limit
    )
    .fetch_all(pool)
    .await
}

Hybrid search

Vector similarity alone achieves roughly 62% retrieval precision. Combining it with full-text search using Reciprocal Rank Fusion (RRF) pushes this to roughly 84%. RRF merges two ranked result lists by converting ranks into scores and summing them, so a document that ranks well in both lists scores highest.

Schema for hybrid search

A table that supports both search strategies needs a tsvector column for FTS and a vector column for semantic search:

CREATE TABLE documents (
    id            BIGSERIAL PRIMARY KEY,
    title         TEXT NOT NULL,
    content       TEXT NOT NULL,
    embedding     vector(768),
    search_vector tsvector
        GENERATED ALWAYS AS (
            setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
            setweight(to_tsvector('english', coalesce(content, '')), 'B')
        ) STORED,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_cosine_ops);

CREATE INDEX idx_documents_search ON documents
    USING gin (search_vector);

The hybrid search query

Run both search strategies, rank each result set independently, then merge with RRF:

WITH semantic AS (
    SELECT id, title, content,
           row_number() OVER (ORDER BY embedding <=> $1) AS rank
    FROM documents
    ORDER BY embedding <=> $1
    LIMIT $3
),
fulltext AS (
    SELECT id, title, content,
           row_number() OVER (
               ORDER BY ts_rank_cd(search_vector,
                   websearch_to_tsquery('english', $2)) DESC
           ) AS rank
    FROM documents
    WHERE search_vector @@ websearch_to_tsquery('english', $2)
    LIMIT $3
),
combined AS (
    SELECT id, title, content, rank, 'semantic' AS source FROM semantic
    UNION ALL
    SELECT id, title, content, rank, 'fulltext' AS source FROM fulltext
)
SELECT id, title, content,
       sum(1.0 / (50 + rank)) AS score
FROM combined
GROUP BY id, title, content
ORDER BY score DESC
LIMIT $3;

The constant 50 in the RRF formula (1.0 / (50 + rank)) is a smoothing parameter. It prevents top-ranked results from dominating excessively. 50 is the standard value from the original RRF paper.

Hybrid search in Rust

struct HybridResult {
    id: i64,
    title: String,
    content: String,
    score: f64,
}

pub async fn hybrid_search(
    pool: &PgPool,
    query_embedding: Vec<f32>,
    query_text: &str,
    limit: i64,
) -> Result<Vec<HybridResult>, sqlx::Error> {
    let embedding = Vector::from(query_embedding);

    sqlx::query_as!(
        HybridResult,
        r#"
        WITH semantic AS (
            SELECT id, title, content,
                   row_number() OVER (ORDER BY embedding <=> $1) AS rank
            FROM documents
            ORDER BY embedding <=> $1
            LIMIT $3
        ),
        fulltext AS (
            SELECT id, title, content,
                   row_number() OVER (
                       ORDER BY ts_rank_cd(search_vector,
                           websearch_to_tsquery('english', $2)) DESC
                   ) AS rank
            FROM documents
            WHERE search_vector @@ websearch_to_tsquery('english', $2)
            LIMIT $3
        ),
        combined AS (
            SELECT id, title, content, rank FROM semantic
            UNION ALL
            SELECT id, title, content, rank FROM fulltext
        )
        SELECT
            id as "id!",
            title as "title!",
            content as "content!",
            sum(1.0 / (50 + rank))::float8 as "score!"
        FROM combined
        GROUP BY id, title, content
        ORDER BY score DESC
        LIMIT $3
        "#,
        embedding as _,
        query_text,
        limit
    )
    .fetch_all(pool)
    .await
}

The caller generates an embedding from the query text, then passes both the embedding and the raw text. The embedding drives the semantic branch; the raw text drives the FTS branch.

pub async fn search(
    pool: &PgPool,
    http_client: &reqwest::Client,
    ollama_url: &str,
    query: &str,
    limit: i64,
) -> Result<Vec<HybridResult>, sqlx::Error> {
    let embeddings = generate_embeddings(http_client, ollama_url, &[query])
        .await
        .map_err(|e| sqlx::Error::Protocol(e.to_string()))?;

    hybrid_search(pool, embeddings.into_iter().next().unwrap(), query, limit).await
}

When to use semantic search

Add pgvector when your application needs to match by meaning rather than keywords:

  • Knowledge base search. Users describe problems in their own words; documents use different terminology.
  • Recommendation. “Show me articles similar to this one” is a single vector distance query.
  • RAG retrieval. An LLM needs relevant context from your data to generate grounded answers. See Retrieval-Augmented Generation in the AI and LLM Integration section.
  • Classification and clustering. Group documents by semantic similarity without manual tagging.

Stick with full-text search when exact keyword matching, boolean queries, or phrase search are what users expect. The two approaches complement each other, as the hybrid search pattern above demonstrates.

pgvector vs dedicated vector databases

pgvector handles up to a few million vectors comfortably. Beyond that, index builds become slow and memory-intensive. Dedicated vector databases (Qdrant, Weaviate, Pinecone) are built for horizontal scaling to billions of vectors.

For most content-heavy and CRUD web applications, pgvector is the right choice. Your embeddings share a database with the data they describe, transactions keep them consistent, and there is no sync pipeline to maintain. The same reasoning that makes PostgreSQL FTS the right starting point for keyword search applies here: start with what you have, and graduate to a dedicated service only when you hit a specific limitation.

Gotchas

The vector type has a dimension limit. Maximum 2,000 dimensions for vector, 4,000 for halfvec. Most embedding models produce 768 or 1,536 dimensions, which fit comfortably. OpenAI’s text-embedding-3-large at 3,072 dimensions exceeds the vector limit — reduce it to 1,536 via the API’s dimensions parameter.

Embeddings are not free to generate. Every document insert or update requires an embedding model call. For bulk imports, batch the embedding requests. For Ollama, send multiple texts in a single /api/embed request.

HNSW index builds can spike memory. Building an HNSW index on a large table may consume significant memory. For tables with millions of rows, build the index during a maintenance window and monitor resource usage.

IVFFlat recall degrades silently. If you use IVFFlat instead of HNSW, recall drops as your data changes because cluster centroids are not recalculated. Rebuild the index periodically or use HNSW.

SELECT * fails with vector columns in query_as!. Just as with tsvector columns, SQLx’s compile-time macros need explicit column lists. List your columns explicitly, omitting or casting the embedding column unless you need its contents.

The extension must be installed in your development database. SQLx’s compile-time query! macros connect to the database during compilation. The vector extension must be enabled there. The same applies to cargo sqlx prepare for offline compilation in CI.

Auth & Security

Authentication

Session-based authentication fits naturally into a hypermedia-driven architecture. The server manages all auth state. The browser sends a cookie. No client-side token management, no JWT parsing in JavaScript, no OAuth dance in the browser. The server decides who the user is, renders the appropriate HTML, and sends it.

This section builds authentication with tower-sessions for session management, argon2 for password hashing, and tower-csrf for cross-site request forgery protection. PostgreSQL stores both user records and session data via tower-sessions-sqlx-store.

Dependencies

[dependencies]
tower-sessions = "0.14"
tower-sessions-sqlx-store = { version = "0.15", features = ["postgres"] }
tower-csrf = "0.1"
argon2 = "0.5"
sqlx = { version = "0.8", features = ["runtime-tokio", "postgres", "time", "uuid"] }
time = "0.3"
uuid = { version = "1", features = ["v4", "serde"] }

tower-sessions provides the session middleware layer. tower-sessions-sqlx-store backs it with PostgreSQL so sessions survive server restarts. argon2 handles password hashing using the Argon2id algorithm, the OWASP primary recommendation. tower-csrf protects state-changing requests from cross-site forgery.

Note the version pairing: tower-sessions 0.14 and tower-sessions-sqlx-store 0.15 are compatible through their shared dependency on tower-sessions-core 0.14. Check both crates for newer matching releases.

Password hashing

Argon2id is memory-hard and CPU-hard, which makes brute-force attacks expensive even with GPUs. The argon2 crate provides a pure-Rust implementation.

Passwords are stored as PHC-format strings. The algorithm, version, and parameters are embedded alongside the hash, making the value self-describing:

$argon2id$v=19$m=65536,t=2,p=1$<salt>$<hash>

This means you can change hashing parameters over time without breaking verification of existing hashes. During verification, the argon2 crate reads parameters from the stored hash, not from the Argon2 instance.

use argon2::{
    password_hash::{
        rand_core::OsRng, PasswordHash, PasswordHasher, PasswordVerifier, SaltString,
    },
    Algorithm, Argon2, Params, Version,
};

fn build_hasher() -> Argon2<'static> {
    let params = Params::new(
        64 * 1024,  // 64 MiB memory cost
        2,          // 2 iterations
        1,          // 1 degree of parallelism
        None,       // default output length (32 bytes)
    )
    .expect("valid argon2 params");

    Argon2::new(Algorithm::Argon2id, Version::V0x13, params)
}

fn hash_password(password: &str) -> Result<String, argon2::password_hash::Error> {
    let salt = SaltString::generate(&mut OsRng);
    let hash = build_hasher().hash_password(password.as_bytes(), &salt)?;
    Ok(hash.to_string())
}

fn verify_password(
    password: &str,
    stored_hash: &str,
) -> Result<(), argon2::password_hash::Error> {
    let parsed = PasswordHash::new(stored_hash)?;
    Argon2::default().verify_password(password.as_bytes(), &parsed)
}

SaltString::generate(&mut OsRng) produces a cryptographically random salt using the OS random number generator. The build_hasher function configures Argon2id with 64 MiB of memory, which is a reasonable starting point. Argon2::default() uses 19 MiB (the OWASP floor), but the recommendation is 64 MiB or higher if your server can handle it. Tune the memory parameter upward until hashing takes roughly 200ms on your production hardware.

The verify_password function uses Argon2::default() because it reads parameters from the stored hash, not from the instance. This means old hashes created with different parameters continue to verify correctly.

Peppering

A pepper is a secret key stored only in the application server, never in the database. If the database leaks but the application server is not compromised, the pepper makes the stolen hashes unverifiable. Argon2 has a built-in secret parameter for this:

fn build_hasher_with_pepper(pepper: &[u8]) -> Argon2<'_> {
    let params = Params::new(64 * 1024, 2, 1, None).expect("valid argon2 params");
    Argon2::new_with_secret(pepper, Algorithm::Argon2id, Version::V0x13, params)
        .expect("valid argon2 secret")
}

Generate the pepper once (32 random bytes from a CSPRNG), store it as an environment variable or in a secrets manager, and load it at application startup. If the pepper is lost, all password hashes become unverifiable and every user must reset their password. Treat it with the same care as a database encryption key.

If you want a simpler API, the password-auth crate wraps argon2 with two functions (generate_hash, verify_password) and provides is_hash_obsolete() for detecting when stored hashes should be re-hashed with newer parameters. The lower-level API shown here gives more control when you need it.

Async context

Argon2 hashing is CPU-intensive. A single hash takes 50-200ms depending on hardware. Running it directly in an async handler blocks the tokio worker thread and starves other requests. Always offload to the blocking thread pool:

use tokio::task;

async fn hash_password_async(password: String) -> Result<String, anyhow::Error> {
    task::spawn_blocking(move || hash_password(&password))
        .await?
        .map_err(Into::into)
}

async fn verify_password_async(
    password: String,
    stored_hash: String,
) -> Result<(), anyhow::Error> {
    task::spawn_blocking(move || verify_password(&password, &stored_hash))
        .await?
        .map_err(Into::into)
}

The closure takes owned String values because spawn_blocking requires 'static. This moves the work to tokio’s dedicated blocking thread pool (separate from the async worker threads), keeping the async runtime responsive.

Password validation

Enforce constraints before hashing:

  • Minimum length: 10 characters. Shorter passwords are too easy to brute-force.
  • Maximum length: 128 characters. Without a maximum, an attacker can submit multi-megabyte passwords to exhaust server resources through expensive hashing.
  • Unicode normalisation: Apply NFKC normalisation before hashing. Different systems represent the same characters differently, which causes cross-platform login failures. The unicode-normalization crate handles this.

For password quality checking, the zxcvbn crate (a Rust port of Dropbox’s password strength estimator) catches common and weak passwords without maintaining a separate banned-password list.

Session layer

Set up a PostgreSQL-backed session store, run its migration to create the session table, and start a background task to clean up expired sessions.

use axum::Router;
use sqlx::PgPool;
use tower_sessions::{Expiry, SessionManagerLayer};
use tower_sessions_sqlx_store::PostgresStore;
use time::Duration;

async fn session_layer(pool: PgPool) -> SessionManagerLayer<PostgresStore> {
    let store = PostgresStore::new(pool);
    store.migrate().await.expect("session table migration failed");

    // Clean up expired sessions every 60 seconds
    tokio::task::spawn(
        store
            .clone()
            .continuously_delete_expired(tokio::time::Duration::from_secs(60)),
    );

    SessionManagerLayer::new(store)
        .with_secure(true)
        .with_expiry(Expiry::OnInactivity(Duration::hours(24)))
}

PostgresStore::migrate() creates a tower_sessions schema with a session table (columns: id TEXT, data BYTEA, expiry_date TIMESTAMPTZ). The continuously_delete_expired task runs in the background, removing sessions that have passed their expiry date.

Cookie configuration

SessionManagerLayer configures the session cookie through builder methods:

  • with_secure(true) sets the Secure flag so the cookie is only sent over HTTPS. Always enable this in production.
  • with_http_only(true) is the default. The cookie is inaccessible to JavaScript, protecting against XSS-based session theft.
  • with_same_site(SameSite::Lax) is the default. Cookies are sent on top-level navigations but not on cross-site subrequests. Combined with CSRF protection, this is sufficient for most applications. Use SameSite::Strict for high-security applications, with the trade-off that users clicking links to your site from email will appear logged out on first load.

Expiry options

tower-sessions supports three expiry strategies:

  • Expiry::OnInactivity(Duration) resets the expiration on each request. A sliding window. Good for most applications.
  • Expiry::AtDateTime(OffsetDateTime) sets a fixed expiration. The session expires at that time regardless of activity.
  • Expiry::OnSessionEnd creates a browser session cookie with no Max-Age. The cookie is deleted when the browser closes.

The default when no expiry is set is two weeks. For applications handling sensitive data, consider shorter windows (1-24 hours) and requiring re-authentication for high-risk actions.

Layer ordering

Apply the session layer as the outermost middleware so sessions are available to all inner layers and handlers:

let app = Router::new()
    .route("/register", get(show_register).post(handle_register))
    .route("/login", get(show_login).post(handle_login))
    .route("/logout", post(handle_logout))
    .layer(csrf_layer)
    .layer(session_layer(pool).await);

In Axum, the last .layer() call is the outermost layer and processes requests first. Here, the session layer processes first (loads the session from the cookie), then the CSRF layer checks the request origin, then the handler runs.

User table

Create a migration for the users table:

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email TEXT UNIQUE NOT NULL,
    email_confirmed_at TIMESTAMPTZ,
    password_hash TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

The corresponding Rust struct:

use sqlx::types::time::OffsetDateTime;
use uuid::Uuid;

#[derive(Debug, Clone, sqlx::FromRow)]
pub struct User {
    pub id: Uuid,
    pub email: String,
    pub email_confirmed_at: Option<OffsetDateTime>,
    pub password_hash: String,
    pub created_at: OffsetDateTime,
    pub updated_at: OffsetDateTime,
}

Registration

The registration handler validates input, hashes the password, and creates the user. It does not reveal whether an email is already taken, to prevent account enumeration.

use axum::{extract::State, response::IntoResponse, Form};
use maud::{html, Markup};

#[derive(serde::Deserialize)]
struct RegisterForm {
    email: String,
    password: String,
    password_confirmation: String,
}

async fn show_register() -> Markup {
    html! {
        h1 { "Create an account" }
        form method="post" action="/register" {
            label for="email" { "Email" }
            input type="email" name="email" id="email" required;

            label for="password" { "Password" }
            input type="password" name="password" id="password"
                required minlength="10" maxlength="128"
                autocomplete="new-password";

            label for="password_confirmation" { "Confirm password" }
            input type="password" name="password_confirmation"
                id="password_confirmation" required
                autocomplete="new-password";

            button type="submit" { "Register" }
        }
    }
}

async fn handle_register(
    State(state): State<AppState>,
    Form(form): Form<RegisterForm>,
) -> impl IntoResponse {
    if form.password != form.password_confirmation {
        return show_error("Passwords do not match").into_response();
    }
    if form.password.len() < 10 || form.password.len() > 128 {
        return show_error("Password must be 10 to 128 characters").into_response();
    }

    let password_hash = match hash_password_async(form.password).await {
        Ok(hash) => hash,
        Err(_) => return show_error("Registration failed").into_response(),
    };

    // ON CONFLICT DO NOTHING prevents errors on duplicate email
    // without revealing whether the email already exists
    let result = sqlx::query(
        "INSERT INTO users (email, password_hash) \
         VALUES ($1, $2) ON CONFLICT (email) DO NOTHING",
    )
    .bind(&form.email)
    .bind(&password_hash)
    .execute(&state.db)
    .await;

    // Always show the same message. In the background, send different emails:
    // - New user: send a confirmation link
    // - Existing email: send "someone tried to register with your email"
    // See the Email Confirmation section below for the token flow.
    html! {
        h1 { "Check your email" }
        p { "If this email can be used for an account, you will receive further instructions." }
    }
    .into_response()
}

The ON CONFLICT (email) DO NOTHING query combined with a uniform response prevents attackers from probing which emails have accounts. The autocomplete="new-password" attribute tells password managers this is a registration form.

Login

The login handler verifies the password against the stored hash, creates a session, and cycles the session ID to prevent fixation attacks.

use axum::response::Redirect;
use tower_sessions::Session;

#[derive(serde::Deserialize)]
struct LoginForm {
    email: String,
    password: String,
}

async fn handle_login(
    session: Session,
    State(state): State<AppState>,
    Form(form): Form<LoginForm>,
) -> impl IntoResponse {
    let user: Option<User> = sqlx::query_as("SELECT * FROM users WHERE email = $1")
        .bind(&form.email)
        .fetch_optional(&state.db)
        .await
        .unwrap_or(None);

    let Some(user) = user else {
        // Run a dummy hash to prevent timing-based user enumeration
        let _ = hash_password_async("dummy-password".to_string()).await;
        return show_login_error("Invalid email or password").into_response();
    };

    if verify_password_async(form.password, user.password_hash.clone())
        .await
        .is_err()
    {
        return show_login_error("Invalid email or password").into_response();
    }

    // Prevent session fixation: generate a new session ID, preserving data
    session.cycle_id().await.expect("failed to cycle session ID");

    // Store user identity in the session
    session
        .insert("user_id", user.id)
        .await
        .expect("failed to insert session data");

    // Validate redirect target if using a ?next= parameter.
    // Only allow relative paths. Reject absolute URLs to prevent open redirects.
    Redirect::to("/").into_response()
}

Three security details matter here:

Timing attack prevention. When no user is found, a dummy hash_password_async call runs so the response time is similar regardless of whether the email exists. Without this, an attacker can distinguish “email not found” from “wrong password” by measuring response latency.

Session fixation prevention. session.cycle_id() generates a new session ID while preserving session data. Without this, an attacker who planted a known session ID (via a crafted link or subdomain cookie injection) could hijack the authenticated session.

Post-login redirect validation. If you add a ?next= parameter so users return to the page they were visiting before login, validate the target strictly. Allow only relative paths. Reject absolute URLs, URLs with different schemes or hosts, and URLs with embedded credentials. Without validation, an attacker can craft https://yoursite.com/login?next=https://evil.com, and the user sees a legitimate login page that redirects to a phishing site after authentication.

The error message is the same for both “user not found” and “wrong password”. Never reveal which one failed.

Rate limiting

Without rate limiting, the login endpoint is vulnerable to brute-force and credential stuffing attacks. Apply limits at two levels:

  • Per-account: Lock the account after a threshold of failed attempts (for example, 10). Unlock after a cooldown period (15 minutes) or via email. This stops targeted attacks against a single user.
  • Per-IP: Apply a sliding window limit (for example, 20 attempts per minute per IP). Return HTTP 429 with a Retry-After header. This slows distributed scanning.

Per-account limiting is the primary defence. Per-IP limiting alone is insufficient because botnets rotate IP addresses.

For Axum, tower_governor provides a Tower-compatible rate limiting layer based on the governor crate. Apply it to your auth routes:

use tower_governor::{GovernorConfig, GovernorLayer};

let governor_config = GovernorConfig::default(); // 1 request per 500ms per IP
let governor_layer = GovernorLayer {
    config: governor_config,
};

let auth_routes = Router::new()
    .route("/login", get(show_login).post(handle_login))
    .route("/register", get(show_register).post(handle_register))
    .layer(governor_layer);

This handles per-IP limiting. For per-account lockout, track failed attempts in a database column or a Redis counter keyed by email, and check it before verifying the password.

Logout

Destroy the session and redirect. Protect logout with a POST request, not GET, so cross-site <img> tags or link prefetching cannot force a logout.

async fn handle_logout(session: Session) -> impl IntoResponse {
    session.flush().await.expect("failed to flush session");
    Redirect::to("/login")
}

session.flush() clears all session data, deletes the record from the database, and nullifies the session cookie.

Extracting the current user

Build an Axum extractor that loads the authenticated user from the session. Use this wherever a handler needs the current user.

use axum::{
    extract::FromRequestParts,
    http::{request::Parts, StatusCode},
};

pub struct AuthUser(pub User);

impl<S: Send + Sync> FromRequestParts<S> for AuthUser {
    type Rejection = StatusCode;

    async fn from_request_parts(
        parts: &mut Parts,
        state: &S,
    ) -> Result<Self, Self::Rejection> {
        let session = Session::from_request_parts(parts, state)
            .await
            .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;

        let user_id: Uuid = session
            .get("user_id")
            .await
            .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
            .ok_or(StatusCode::UNAUTHORIZED)?;

        let pool = parts
            .extensions
            .get::<PgPool>()
            .ok_or(StatusCode::INTERNAL_SERVER_ERROR)?;

        let user: User = sqlx::query_as("SELECT * FROM users WHERE id = $1")
            .bind(user_id)
            .fetch_optional(pool)
            .await
            .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
            .ok_or(StatusCode::UNAUTHORIZED)?;

        Ok(AuthUser(user))
    }
}

Handlers that need authentication add AuthUser as a parameter. If no valid session exists, the request returns 401 before the handler body runs:

async fn dashboard(AuthUser(user): AuthUser) -> Markup {
    html! {
        h1 { "Welcome, " (user.email) }
    }
}

For the extractor to access the database pool, add it to request extensions via middleware, or make the extractor generic over your AppState. The approach depends on how you structure shared state; see Web Server with Axum.

CSRF protection

Cross-site request forgery tricks a logged-in user’s browser into making unintended requests to your application. Traditional defences embed hidden tokens in forms. A simpler approach validates the request origin using headers the browser sends automatically.

tower-csrf implements this origin-based approach, inspired by Filippo Valsorda’s analysis of CSRF and the defence built into Go 1.25’s net/http. Instead of managing tokens, it checks the Sec-Fetch-Site and Origin headers. Modern browsers (all major browsers since 2023) send Sec-Fetch-Site: same-origin for same-site requests. Cross-origin requests are blocked. Safe methods (GET, HEAD, OPTIONS) are allowed unconditionally.

use axum::{
    error_handling::HandleErrorLayer,
    http::StatusCode,
    response::IntoResponse,
};
use tower::ServiceBuilder;
use tower_csrf::{CrossOriginProtectionLayer, ProtectionError};

let csrf_layer = ServiceBuilder::new()
    .layer(HandleErrorLayer::new(
        |error: Box<dyn std::error::Error + Send + Sync>| async move {
            if error.downcast_ref::<ProtectionError>().is_some() {
                (StatusCode::FORBIDDEN, "Cross-origin request blocked").into_response()
            } else {
                StatusCode::INTERNAL_SERVER_ERROR.into_response()
            }
        },
    ))
    .layer(CrossOriginProtectionLayer::default());

No hidden form fields. No hx-headers configuration for htmx. Same-origin requests pass automatically because the browser attests to the origin. This is a clean fit for HDA applications where every form submission and htmx request originates from the same domain.

If you need to accept cross-origin requests from specific origins (SSO callbacks, webhooks), add them explicitly:

let csrf = CrossOriginProtectionLayer::default()
    .add_trusted_origin("https://sso.example.com")
    .expect("valid origin URL");

For the full argument behind origin-based CSRF validation and why token-based CSRF is unnecessary in modern browsers, read Filippo Valsorda’s analysis.

If you need to support browsers that do not send Sec-Fetch-Site headers (pre-2023), or you prefer a traditional token-based approach, axum_csrf provides a double-submit cookie pattern compatible with Axum 0.8.

Email confirmation

Confirm email addresses before activating accounts. Without confirmation, anyone can register with someone else’s email, and your application sends unwanted messages to non-users.

The flow uses a split token pattern: a 16-byte identifier for database lookup and a 16-byte verifier for constant-time comparison. Store the SHA-256 hash of the verifier, never the verifier itself. If the database leaks, attackers cannot reconstruct valid confirmation links.

Flow

  1. On registration, generate an identifier (16 random bytes) and a verifier (16 random bytes) using a CSPRNG (OsRng).
  2. Store in a confirmations table: identifier (indexed), SHA-256(verifier), user ID, expiration (24-48 hours), and action type (email_confirmation).
  3. Base64url-encode the concatenated identifier + verifier into a link: https://example.com/confirm?token=<encoded>.
  4. Send the link via email. See Email for sending with Lettre and testing with MailCrab.
  5. When the user clicks the link, require an active session (the user must be logged in). Split the token back into identifier and verifier. Look up by identifier. Check expiration. Constant-time compare SHA-256(received verifier) with the stored hash using the subtle crate.
  6. On success, set email_confirmed_at on the user record and delete the confirmation record.

Requiring an active session at step 5 prevents an attacker who intercepts the confirmation email (compromised mailbox, network interception) from confirming the account without knowing the password. The user must both possess the token and be authenticated.

Preventing enumeration

Never reveal whether an email is already registered. On registration:

  • Always display: “Check your email to complete registration.”
  • If the email is new, send a confirmation link.
  • If the email already exists, send a different message: “Someone attempted to register with your email. If this was you, you can log in or reset your password.”

Schedule the email step asynchronously so the response time is identical in both cases. A timing difference between “new account” and “existing account” is enough for an attacker to enumerate emails.

Confirmations table

A single table handles email confirmations, password resets, and email changes:

CREATE TABLE confirmations (
    identifier BYTEA PRIMARY KEY,
    verifier_hash BYTEA NOT NULL,
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    action_type TEXT NOT NULL,
    details JSONB,
    expires_at TIMESTAMPTZ NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_confirmations_user_id ON confirmations(user_id);

The action_type column distinguishes confirmation purposes. The details column holds action-specific data as JSON (for example, the new email address during an email change).

Password reset

Password resets follow the same split token pattern as email confirmation. The key differences are a shorter expiration and the requirement to invalidate all existing sessions after a successful reset.

Flow

  1. User submits their email on the reset form.
  2. Display: “If this email belongs to an account, you will receive reset instructions.” Never reveal whether the email has an account.
  3. Look up the user. If found, generate a split token, store it with a 30-minute expiration and action type password_reset, and email the link. If not found, do nothing. Schedule the work asynchronously for consistent response timing.
  4. When the user clicks the link, verify the token (same split-and-compare as email confirmation). Show a new password form with the token as a hidden field.
  5. On form submission, re-verify the token, hash the new password, update the user record, delete all reset tokens for this user, invalidate all sessions for this user, create a new session, and log them in.

Security considerations

  • 30-minute expiration. Reset tokens are high-value targets. Keep the window short.
  • Invalidate all sessions after a successful reset. If the password was compromised, existing sessions may belong to an attacker.
  • Allow multiple outstanding tokens. Don’t delete old tokens when a new reset is requested. The user may request a reset, not receive the email, and request again.
  • Delete tokens on login. If the user remembers their password and logs in normally, delete their outstanding reset tokens.
  • Require the new password on the same form as the token. Don’t split this into two steps. Re-verify the token on submission to prevent replay.

When to delegate authentication

Session-based auth works well for a single application with its own user base and straightforward login requirements. Delegate to an external Identity Provider when the requirements outgrow what the application should manage directly:

  • Multiple applications need SSO. Users log in once and access several services. A shared identity layer is easier to maintain than per-application auth.
  • Enterprise customers expect SAML or OIDC. B2B SaaS products typically need to integrate with customers’ corporate identity systems.
  • Compliance frameworks require it. SOC 2, HIPAA, and PCI-DSS audits favour dedicated identity infrastructure with built-in audit logging, brute-force protection, and pre-certified MFA controls. An external IdP gives auditors a clear separation of concerns.
  • Existing infrastructure. Your organisation already runs Active Directory, LDAP, or a corporate IdP that users expect to log in with.

Self-hosted identity providers

Keycloak is a full-featured open-source IdP (CNCF incubation project) supporting OAuth2, OIDC, SAML, and LDAP federation. It handles SSO, MFA, identity brokering, and user management. The trade-off is operational weight: it is a Java application with significant resource requirements.

Authentik is a lighter alternative with a more modern developer experience, supporting OAuth2, OIDC, SAML, LDAP, and SCIM.

Both align with this guide’s preference for self-hosted infrastructure.

Integration with OAuth2 Proxy

OAuth2 Proxy sits between users and your Axum application as a reverse proxy. It handles the OAuth2/OIDC flow with your IdP and forwards authenticated requests with identity headers:

use axum::http::HeaderMap;

async fn handler(headers: HeaderMap) -> Markup {
    let email = headers
        .get("X-Forwarded-Email")
        .and_then(|v| v.to_str().ok())
        .unwrap_or("anonymous");

    html! { p { "Logged in as " (email) } }
}

The application reads identity from trusted headers (X-Forwarded-User, X-Forwarded-Email) without implementing OAuth2 flows directly. The proxy strips any client-supplied identity headers before injecting authenticated values, preventing spoofing.

Your application must only be reachable through the proxy, never directly from the internet. Enforce this at the network level: firewall rules, container networking, or Tailscale ACLs. If a client can bypass the proxy, it can set X-Forwarded-User to any value.

Choosing an auth strategy

SituationApproach
Single app, simple loginSession auth (tower-sessions + argon2)
Single app, social login (GitHub, Google)oauth2 / openidconnect crate in Axum
Multiple apps needing SSOExternal IdP (Keycloak/Authentik) + OAuth2 Proxy
B2B SaaS, enterprise customersExternal IdP or managed service (Auth0, WorkOS)
SOC 2 / HIPAA / PCI-DSS complianceExternal IdP strongly recommended
Existing Active Directory / LDAPKeycloak

Start with session-based auth. Move to an external IdP when you hit one of the triggers above. The migration is additive: OAuth2 Proxy sits in front of your existing application, and the AuthUser extractor reads from proxy headers instead of session data.

Implementation resources

For AI coding agents implementing authentication: the secure-auth skill provides detailed security reference material covering cryptographic fundamentals, password hashing parameters, the split token pattern, session management, MFA (TOTP, WebAuthn, recovery codes), and security review checklists. Use it as context when building the patterns described in this section.

For access to the secure-auth skill and detailed implementation guidance, contact the author.

Gotchas

spawn_blocking for all password operations. Forgetting to offload argon2 to the blocking thread pool is the most common mistake. Under load, a single blocked tokio worker thread cascades into request timeouts across the application.

Session ID cycling must happen before inserting user data. Call session.cycle_id() before session.insert("user_id", ...). If you insert first and the cycle fails, the old (potentially attacker-controlled) session ID now has authenticated data.

tower-sessions version compatibility. The tower-sessions and tower-sessions-sqlx-store crates track tower-sessions-core versions independently. If Cargo reports a version conflict on tower-sessions-core, check that the published sqlx-store version matches your tower-sessions version. Pin both until they align.

Consistent error messages on auth forms. Every registration, login, and reset form must give the same response regardless of whether the email exists. This includes response timing. An async email-sending step after registration or reset prevents timing leaks.

CSRF on logout. Logout must be a POST request protected by CSRF, not a GET link. A GET-based logout allows any cross-site <img> tag to force a logout, which is a nuisance attack that can also be chained with session fixation.

Authorization

Authentication answers “who is this user?” Authorization answers “what can this user do?” The two concerns are separate. A user can be authenticated and still forbidden from accessing a resource.

Authorization is domain-dependent. An internal admin tool, a multi-tenant SaaS product, and a public content platform all need different models. There is no universal authorization framework worth adopting for every project. The Rust ecosystem reflects this: most production Axum applications build authorization with custom extractors rather than reaching for a policy engine. This section follows that approach, building on the AuthUser extractor from Authentication.

Adding roles to the user model

The simplest authorization model adds a role directly to the user record. This covers the majority of applications where users fall into a small number of categories with distinct access levels.

Add a migration:

CREATE TYPE user_role AS ENUM ('user', 'editor', 'admin');

ALTER TABLE users ADD COLUMN role user_role NOT NULL DEFAULT 'user';

A PostgreSQL enum constrains the value at the database level. New roles require a migration (ALTER TYPE user_role ADD VALUE 'moderator'), which is appropriate when roles change infrequently. If your roles change often or vary per deployment, use a TEXT column with application-level validation instead.

The corresponding Rust types:

#[derive(Debug, Clone, Copy, PartialEq, Eq, sqlx::Type)]
#[sqlx(type_name = "user_role", rename_all = "lowercase")]
pub enum Role {
    User,
    Editor,
    Admin,
}

#[derive(Debug, Clone, sqlx::FromRow)]
pub struct User {
    pub id: Uuid,
    pub email: String,
    pub email_confirmed_at: Option<OffsetDateTime>,
    pub password_hash: String,
    pub role: Role,
    pub created_at: OffsetDateTime,
    pub updated_at: OffsetDateTime,
}

sqlx::Type with type_name = "user_role" maps the Rust enum to the PostgreSQL enum. The rename_all = "lowercase" attribute matches the lowercase variants in the SQL definition.

Permission checking with extractors

Axum extractors are the natural place for authorization. They run before the handler body, they can reject requests early, and they make permission requirements visible in the handler’s function signature.

Role-based extractor

A generic extractor that requires a minimum role:

use axum::{
    extract::FromRequestParts,
    http::{request::Parts, StatusCode},
    response::{IntoResponse, Response},
};

pub struct RequireRole<const ROLE: u8>(pub User);

impl<S: Send + Sync, const ROLE: u8> FromRequestParts<S> for RequireRole<ROLE> {
    type Rejection = Response;

    async fn from_request_parts(
        parts: &mut Parts,
        state: &S,
    ) -> Result<Self, Self::Rejection> {
        let AuthUser(user) = AuthUser::from_request_parts(parts, state)
            .await
            .map_err(|e| e.into_response())?;

        if !user.role.has_at_least(ROLE) {
            return Err(StatusCode::FORBIDDEN.into_response());
        }

        Ok(RequireRole(user))
    }
}

The const generic approach is clean, but Rust does not yet support enum values as const generics. Use integer constants as a workaround:

impl Role {
    const fn level(self) -> u8 {
        match self {
            Role::User => 0,
            Role::Editor => 1,
            Role::Admin => 2,
        }
    }

    fn has_at_least(&self, required: u8) -> bool {
        self.level() >= required
    }
}

pub const EDITOR: u8 = 1;
pub const ADMIN: u8 = 2;

Handlers declare their required role in the signature:

async fn admin_dashboard(RequireRole<ADMIN>(user): RequireRole<ADMIN>) -> Markup {
    html! {
        h1 { "Admin dashboard" }
        p { "Logged in as " (user.email) }
    }
}

async fn edit_article(RequireRole<EDITOR>(user): RequireRole<EDITOR>) -> Markup {
    // Editors and admins can reach this handler
    html! { h1 { "Edit article" } }
}

A simpler alternative avoids const generics entirely. Define separate extractor types for each role:

pub struct RequireAdmin(pub User);

impl<S: Send + Sync> FromRequestParts<S> for RequireAdmin {
    type Rejection = Response;

    async fn from_request_parts(
        parts: &mut Parts,
        state: &S,
    ) -> Result<Self, Self::Rejection> {
        let AuthUser(user) = AuthUser::from_request_parts(parts, state)
            .await
            .map_err(|e| e.into_response())?;

        if user.role != Role::Admin {
            return Err(StatusCode::FORBIDDEN.into_response());
        }

        Ok(RequireAdmin(user))
    }
}

This is more verbose when you have many roles, but each extractor is self-contained and easy to understand. For most applications with two or three roles, separate types are the better choice.

Resource ownership checks

Role-based checks are not enough when access depends on who owns a resource. An editor should edit their own articles but not someone else’s. This is resource-level authorization, and it belongs in the handler, not in an extractor, because the handler is where you load the resource.

async fn update_article(
    AuthUser(user): AuthUser,
    State(state): State<AppState>,
    Path(article_id): Path<Uuid>,
    Form(form): Form<ArticleForm>,
) -> Result<impl IntoResponse, StatusCode> {
    let article = sqlx::query_as!(
        Article,
        "SELECT * FROM articles WHERE id = $1",
        article_id
    )
    .fetch_optional(&state.db)
    .await
    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?
    .ok_or(StatusCode::NOT_FOUND)?;

    // Ownership check: author or admin
    if article.author_id != user.id && user.role != Role::Admin {
        return Err(StatusCode::FORBIDDEN);
    }

    // Proceed with update...
    Ok(Redirect::to(&format!("/articles/{}", article_id)))
}

The pattern is straightforward: load the resource, check whether the user has access, proceed or reject. Resist the temptation to push this into middleware or an extractor. Resource-level checks depend on the specific resource being accessed, which makes them inherently handler-level logic.

Protecting route groups

For coarse-grained protection (all routes under /admin require an admin), apply the extractor as a route layer:

use axum::middleware;

let admin_routes = Router::new()
    .route("/admin/dashboard", get(admin_dashboard))
    .route("/admin/users", get(admin_users))
    .route("/admin/settings", get(admin_settings).post(update_settings))
    .route_layer(middleware::from_extractor::<RequireAdmin>());

let app = Router::new()
    .merge(admin_routes)
    .route("/", get(home))
    .route("/articles", get(list_articles))
    .layer(session_layer);

route_layer applies the extractor to all routes in the group. Any request to /admin/* that fails the admin check gets a 403 before the handler runs. The extractor still runs per-request, hitting the database each time, so the session-backed user lookup from AuthUser happens on every admin request.

For unauthenticated route groups mixed with authenticated ones, structure your router so the auth layer only wraps the routes that need it:

let public_routes = Router::new()
    .route("/", get(home))
    .route("/login", get(show_login).post(handle_login))
    .route("/register", get(show_register).post(handle_register));

let protected_routes = Router::new()
    .route("/dashboard", get(dashboard))
    .route("/settings", get(settings).post(update_settings))
    .route_layer(middleware::from_extractor::<AuthUser>());

let app = Router::new()
    .merge(public_routes)
    .merge(protected_routes)
    .layer(session_layer);

Returning meaningful errors

A bare StatusCode::FORBIDDEN is unhelpful to users. In an HDA application, return an HTML fragment that explains what went wrong:

use axum::response::{IntoResponse, Response};

pub enum AuthzError {
    Unauthenticated,
    Forbidden,
}

impl IntoResponse for AuthzError {
    fn into_response(self) -> Response {
        match self {
            AuthzError::Unauthenticated => {
                // Redirect to login with a return URL
                Redirect::to("/login").into_response()
            }
            AuthzError::Forbidden => {
                (StatusCode::FORBIDDEN, html! {
                    h1 { "Access denied" }
                    p { "You do not have permission to access this page." }
                    a href="/" { "Return to home" }
                }).into_response()
            }
        }
    }
}

Use AuthzError::Unauthenticated (not logged in) to redirect to login. Use AuthzError::Forbidden (logged in but insufficient permissions) to show a 403 page. The distinction matters: a 401 redirect invites the user to log in, while a 403 tells them their current account cannot access the resource.

Multi-tenancy authorization

When your application serves multiple tenants (organisations, teams, workspaces), authorization gains a tenant dimension. A user may be an admin in one organisation and a regular member in another.

The most common model for HDA applications is a shared database with a tenant_id column on tenant-scoped tables:

CREATE TABLE organisations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE memberships (
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    organisation_id UUID NOT NULL REFERENCES organisations(id) ON DELETE CASCADE,
    role user_role NOT NULL DEFAULT 'user',
    PRIMARY KEY (user_id, organisation_id)
);

CREATE TABLE projects (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organisation_id UUID NOT NULL REFERENCES organisations(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

A memberships table maps users to organisations with a role per membership. This replaces the single role column on the user record with a per-tenant role.

Tenant-scoped extractor

Build an extractor that resolves the current tenant from the request (subdomain, path parameter, or header) and verifies the user’s membership:

pub struct TenantUser {
    pub user: User,
    pub organisation_id: Uuid,
    pub role: Role,
}

impl<S: Send + Sync> FromRequestParts<S> for TenantUser {
    type Rejection = Response;

    async fn from_request_parts(
        parts: &mut Parts,
        state: &S,
    ) -> Result<Self, Self::Rejection> {
        let AuthUser(user) = AuthUser::from_request_parts(parts, state)
            .await
            .map_err(|e| e.into_response())?;

        // Extract organisation ID from path, e.g. /org/:org_id/projects
        let Path(org_id): Path<Uuid> = Path::from_request_parts(parts, state)
            .await
            .map_err(|_| StatusCode::BAD_REQUEST.into_response())?;

        let pool = parts
            .extensions
            .get::<PgPool>()
            .ok_or(StatusCode::INTERNAL_SERVER_ERROR.into_response())?;

        let membership = sqlx::query_as!(
            Membership,
            "SELECT role as \"role: Role\" FROM memberships \
             WHERE user_id = $1 AND organisation_id = $2",
            user.id,
            org_id
        )
        .fetch_optional(pool)
        .await
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR.into_response())?
        .ok_or(StatusCode::FORBIDDEN.into_response())?;

        Ok(TenantUser {
            user,
            organisation_id: org_id,
            role: membership.role,
        })
    }
}

Every query in a tenant-scoped handler then filters by organisation_id:

async fn list_projects(
    tenant: TenantUser,
    State(state): State<AppState>,
) -> Markup {
    let projects = sqlx::query_as!(
        Project,
        "SELECT * FROM projects WHERE organisation_id = $1 ORDER BY name",
        tenant.organisation_id
    )
    .fetch_all(&state.db)
    .await
    .expect("query failed");

    html! {
        h1 { "Projects" }
        ul {
            @for project in &projects {
                li { (project.name) }
            }
        }
    }
}

The WHERE organisation_id = $1 clause is the tenant boundary. Miss it on a single query and you leak data across tenants. This is the fundamental weakness of application-level tenant isolation: it depends on every query being correct.

For the three database-level isolation models (shared database with tenant column, schema-per-tenant, database-per-tenant), shared database with a tenant column is the right starting point for most applications. Schema-per-tenant and database-per-tenant add operational complexity (per-tenant migrations, connection routing) that is only justified when regulatory requirements or enterprise customers demand stronger isolation guarantees.

PostgreSQL Row-Level Security

PostgreSQL Row-Level Security (RLS) enforces tenant isolation at the database level rather than relying on application code to include WHERE tenant_id = $1 on every query. The database rejects or filters rows that violate the policy, regardless of what the application sends.

ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON projects
    USING (organisation_id = current_setting('app.organisation_id')::uuid);

Set the tenant context at the start of each request using SET LOCAL inside a transaction:

let mut tx = pool.begin().await?;
sqlx::query("SELECT set_config('app.organisation_id', $1, true)")
    .bind(org_id.to_string())
    .execute(&mut *tx)
    .await?;

// Queries in this transaction are automatically filtered by tenant
let projects = sqlx::query_as!(Project, "SELECT * FROM projects")
    .fetch_all(&mut *tx)
    .await?;

tx.commit().await?;

The third argument to set_config (true) scopes the setting to the current transaction. When the transaction ends, the setting resets.

RLS is powerful but carries significant operational gotchas:

  • Connection pool contamination. If you use set_config(..., false) (session-scoped instead of transaction-local), the setting persists on the connection. When that connection returns to the pool and is reused by a different tenant, it carries the previous tenant’s context. Always use true for transaction-local scope.
  • Superuser and table owner bypass. PostgreSQL superusers and table owners skip RLS entirely. Your development database (often running as a superuser) silently ignores all policies. Use ALTER TABLE ... FORCE ROW LEVEL SECURITY and test with a restricted role.
  • SQLx compile-time checking. SQLx’s query! macros connect to the database during compilation. If that database has RLS enabled and the compile-time connection uses a restricted role, queries fail at build time. Point DATABASE_URL at a superuser role for compilation and use a restricted role at runtime.
  • PgBouncer. RLS with session variables does not work with PgBouncer in statement pooling mode. Use transaction pooling or session pooling.

For most applications, application-level WHERE clauses with a well-tested tenant-scoped extractor are simpler and sufficient. Consider RLS when you need defence-in-depth for sensitive data, or when compliance requirements demand database-enforced isolation.

When to reach for a policy engine

Custom extractors handle role-based checks well. They become unwieldy when authorization rules are complex, frequently changing, or need to be auditable separately from application code.

Signs you have outgrown custom extractors:

  • Permissions depend on combinations of user attributes, resource attributes, and environmental conditions (time of day, IP range). This is attribute-based access control (ABAC), and encoding it in Rust conditionals becomes error-prone.
  • Non-developers (compliance officers, product managers) need to review or modify access policies.
  • You need an audit trail of policy changes separate from code deployments.

Two production-quality options exist in the Rust ecosystem:

  • Cedar (by Amazon, 3.6M+ downloads) provides a purpose-built policy language for RBAC and ABAC. Policies are human-readable text files evaluated by the Cedar engine. No Axum-specific integration exists; wrap it in a service called from your extractors or handlers.
  • Casbin (1M+ downloads) supports ACL, RBAC, and ABAC models through a configuration-driven approach. The axum-casbin crate provides Axum middleware integration.

Both are well-maintained. Cedar has stronger backing and a more expressive policy language. Casbin has broader ecosystem support and a ready-made Axum integration. Evaluate both if you reach the point of needing one. Most HDA applications with a handful of roles and straightforward ownership rules will not.

Gotchas

Authorization checks on every code path. A handler that loads a resource and modifies it needs the authorization check between load and modify, not just at the route level. Route-level checks confirm the user’s role. Handler-level checks confirm access to the specific resource. Both are needed.

Forgetting to scope queries by tenant. In a multi-tenant application, every query that touches tenant data must include the tenant filter. A single unscoped query leaks data across tenants. Code review should treat a missing WHERE organisation_id = $1 with the same severity as a SQL injection.

Caching user roles. If you cache the user’s role (in the session, in memory), a role change by an admin does not take effect until the cache expires or the user logs out. For most applications, querying the database on each request through the extractor is fast enough and avoids stale-role bugs. If you do cache, keep the TTL short.

Confusing 401 and 403. Return 401 (Unauthorized) when the user is not authenticated, prompting a login. Return 403 (Forbidden) when the user is authenticated but lacks permission. Mixing these up confuses both users and API consumers.

Horizontal privilege escalation. A user modifies a URL parameter (changing /articles/123/edit to /articles/456/edit) and accesses another user’s resource. Role checks alone do not prevent this. Every handler that operates on a specific resource must verify that the authenticated user has access to that particular resource.

Web Application Security

Rust eliminates entire classes of memory safety vulnerabilities, but web application security is broader than memory safety. Cross-site scripting, SQL injection, missing security headers, and misconfigured policies are all possible in Rust applications. This section covers the web-specific security concerns that require explicit attention in the Axum/Maud/htmx/SQLx stack.

For CSRF protection, session cookie configuration, and rate limiting on authentication endpoints, see Authentication. For input validation and sanitisation, see Form Handling and Validation.

OWASP Top 10 in this stack

The OWASP Top 10 (2021) is the standard classification of web application security risks. Not all categories require equal attention in every stack. This table maps each category to its relevance in a Rust/Axum/Maud/SQLx application.

#CategoryRelevanceNotes
A01Broken Access ControlHighApplication logic. Rust provides no automatic protection. Every route needs explicit authorisation checks.
A02Cryptographic FailuresMediumDepends on crate choices. Use audited crates (ring, rustls, argon2). Never roll custom cryptography.
A03InjectionLowSQLx uses parameterized queries. Maud auto-escapes HTML. Both mitigate the primary injection vectors by default. Risk remains if you bypass either.
A04Insecure DesignMediumArchitecture-level concern. Applies equally to all stacks. Threat modelling and secure design reviews are the mitigations.
A05Security MisconfigurationHighMissing security headers, permissive CORS, debug output in production, default session secrets. Requires explicit configuration.
A06Vulnerable and Outdated ComponentsMediumRust crates can have vulnerabilities. Run cargo audit in CI. Use cargo deny for policy enforcement.
A07Identification and Authentication FailuresMediumCovered in Authentication: argon2 hashing, session management, rate limiting.
A08Software and Data Integrity FailuresLowCargo.lock pins exact dependency versions. Cargo verifies checksums. CI/CD pipeline security is the residual risk.
A09Security Logging and Monitoring FailuresMediumOperational concern. Use the tracing crate for structured logging.
A10Server-Side Request ForgeryLowOnly relevant if your application makes outbound HTTP requests based on user-supplied URLs. Validate URL schemes and destinations if so.

The categories that need the most attention in this stack are A01 (Broken Access Control) and A05 (Security Misconfiguration). The Rust type system and the crate ecosystem handle A03 (Injection) and A08 (Integrity) well by default, but only if you use them correctly.

XSS prevention with Maud

Maud’s html! macro HTML-entity-escapes all interpolated values by default. The characters <, >, &, ", and ' are converted to their entity equivalents. This prevents the most common XSS vector: injecting <script> tags through user input.

use maud::html;

// Safe: user_input is escaped automatically
html! {
    p { (user_input) }
}
// If user_input = "<script>alert('xss')</script>"
// Renders: <p>&lt;script&gt;alert('xss')&lt;/script&gt;</p>

What bypasses escaping

PreEscaped() disables escaping for its argument. It exists for cases where you have trusted HTML that should be rendered as-is (content from a Markdown renderer, for example).

use maud::PreEscaped;

// DANGEROUS if content is untrusted:
html! {
    (PreEscaped(user_provided_html))
}

Treat every use of PreEscaped as a security boundary. If the input is not fully trusted, do not use it. If you must render user-supplied rich text, sanitise it with a dedicated HTML sanitiser (such as ammonia) before passing it to PreEscaped.

Risks that remain with auto-escaping

Maud performs HTML-entity escaping only. It does not do context-aware escaping, which means certain attack vectors survive even with escaping enabled.

javascript: URLs. If user input is used as an href or src value, HTML-entity escaping does not prevent javascript: scheme attacks because no < or > characters need escaping:

// VULNERABLE: user_url could be "javascript:alert(1)"
html! {
    a href=(user_url) { "Click here" }
}

Validate URLs on the server before rendering them. Only allow http:// and https:// schemes:

fn is_safe_url(url: &str) -> bool {
    url.starts_with("https://") || url.starts_with("http://") || url.starts_with("/")
}

CSS injection in style attributes. Setting a style attribute from user input can enable CSS-based attacks (data exfiltration via background-image URLs, UI redressing) even with HTML escaping. Do not interpolate user input into style attributes. Use CSS classes instead.

<script> and <style> element bodies. The HTML specification does not process entity escapes inside <script> and <style> elements. Maud will escape the content, which either mangles valid JavaScript/CSS or requires PreEscaped to work correctly. Never interpolate user input into <script> or <style> blocks. Pass data to JavaScript via data- attributes on HTML elements, where Maud’s escaping is effective.

Maud’s structured syntax makes it difficult to accidentally construct event handler attributes like onclick from user input, unlike string-based template engines where concatenation errors can create attribute injection. This is a genuine safety advantage of the macro approach.

SQL injection prevention with SQLx

SQLx uses bind parameters for all user-provided values. The database driver sends the query structure and parameter values separately over the wire, making injection impossible at the protocol level.

Compile-time checked queries

The sqlx::query! and sqlx::query_as! macros accept only string literals. You cannot pass a String or the result of format!(). The macro verifies the query against a live database at compile time, checking column names, types, and placeholder counts. SQL injection is structurally impossible in compile-time checked queries because there is no way to interpolate user input into the query string.

// Safe: query is a string literal, user_id is a bind parameter
let user = sqlx::query_as!(User, "SELECT * FROM users WHERE id = $1", user_id)
    .fetch_one(&pool)
    .await?;

Runtime queries

The sqlx::query() function accepts a &str query string. Parameters are bound via .bind() calls. The API deliberately accepts &str rather than String to create friction against format!() usage.

// Safe: parameterized query
let users = sqlx::query("SELECT * FROM users WHERE email = $1")
    .bind(&email)
    .fetch_all(&pool)
    .await?;

How to accidentally bypass parameterization

The primary risk is using format!() to build query strings:

// VULNERABLE: format! injects user input directly into the query string
let query = format!("SELECT * FROM users WHERE name = '{}'", user_input);
let result = sqlx::query(&query).fetch_all(&pool).await?;

This is the Rust equivalent of string concatenation in PHP or Python SQL. SQLx’s design discourages it (.query() takes &str, not String), but does not make it impossible.

Dynamic table or column names present a legitimate challenge because SQL bind parameters only work for values, not identifiers. If you need dynamic identifiers (user-selected sort column, for example), validate them against an allowlist:

fn validated_sort_column(input: &str) -> &str {
    match input {
        "name" | "email" | "created_at" => input,
        _ => "created_at", // safe default
    }
}

let order_by = validated_sort_column(&params.sort);
let query = format!("SELECT * FROM users ORDER BY {order_by}");
let users = sqlx::query(&query).fetch_all(&pool).await?;

The format!() here is safe because order_by can only be one of the validated values. The principle: bind parameters for values, allowlists for identifiers.

Content Security Policy

A Content Security Policy (CSP) header tells the browser which sources of content are permitted. A well-configured CSP is the strongest defence against XSS after output escaping, because it prevents the browser from executing injected scripts even if they make it into the HTML.

A baseline CSP for this stack

For an HDA application serving HTML from Axum, with htmx loaded from a same-origin file and CSS in external stylesheets:

default-src 'self';
script-src 'self';
style-src 'self';
img-src 'self' data:;
font-src 'self';
connect-src 'self';
form-action 'self';
frame-ancestors 'none';
base-uri 'self';
object-src 'none'

This policy allows scripts, styles, images, fonts, and connections only from the same origin. It blocks framing (frame-ancestors 'none'), restricts form targets (form-action 'self'), and disallows plugins (object-src 'none'). The data: source for images permits inline data URIs (common for small icons and placeholder images).

htmx and CSP complications

htmx creates three specific CSP challenges that you need to plan for.

Inline indicator styles. By default, htmx injects a <style> element into the page for its loading indicator CSS. This violates a style-src 'self' policy. Disable this by setting the includeIndicatorStyles configuration to false and providing the indicator CSS in your own stylesheet:

<meta name="htmx-config" content='{"includeIndicatorStyles": false}'>

The indicator CSS you need to include in your own stylesheet:

.htmx-indicator {
    opacity: 0;
    transition: opacity 200ms ease-in;
}

.htmx-request .htmx-indicator {
    opacity: 1;
}

.htmx-request.htmx-indicator {
    opacity: 1;
}

hx-on:* attributes. htmx’s hx-on:* attributes (e.g., hx-on:click, hx-on:htmx:after-swap) are functionally equivalent to inline event handlers. They require 'unsafe-inline' in script-src, which defeats the purpose of CSP for script control. Avoid hx-on:* attributes entirely. Use hx-trigger with server-driven patterns instead, or attach event listeners in external JavaScript files.

Nonce propagation on AJAX responses. htmx automatically copies the nonce attribute from inline scripts it finds in AJAX responses. This is intended as a convenience but undermines the CSP nonce security model: if an attacker can inject a <script> tag into a server response (via stored XSS), htmx will propagate the nonce to it, and the browser will execute it. The defence is to not rely on nonces for htmx-loaded content. Instead, serve all JavaScript from external files (script-src 'self') and do not use inline scripts in htmx responses.

The practical approach

The combination of these constraints points to a clear policy:

  1. Serve all JavaScript from same-origin files. No inline scripts.
  2. Serve all CSS from same-origin stylesheets. Disable htmx’s built-in indicator styles.
  3. Do not use hx-on:* attributes.
  4. Use script-src 'self' and style-src 'self' without nonces or 'unsafe-inline'.

This approach is simpler and more secure than a nonce-based policy. The trade-off is that you cannot use inline scripts or styles at all, which in an HDA application is rarely a limitation.

Security headers middleware

Beyond CSP, several HTTP response headers improve security. Rather than pulling in a dependency, write a middleware function that sets them all. This keeps the headers visible in your codebase and avoids relying on a third-party crate’s defaults.

use axum::{extract::Request, middleware::Next, response::Response};
use http::{header, HeaderValue};

pub async fn security_headers(req: Request, next: Next) -> Response {
    let mut res = next.run(req).await;
    let headers = res.headers_mut();

    headers.insert(
        header::X_CONTENT_TYPE_OPTIONS,
        HeaderValue::from_static("nosniff"),
    );
    headers.insert(
        header::X_FRAME_OPTIONS,
        HeaderValue::from_static("DENY"),
    );
    headers.insert(
        header::STRICT_TRANSPORT_SECURITY,
        HeaderValue::from_static("max-age=63072000; includeSubDomains"),
    );
    headers.insert(
        header::REFERRER_POLICY,
        HeaderValue::from_static("strict-origin-when-cross-origin"),
    );
    headers.insert(
        header::CONTENT_SECURITY_POLICY,
        HeaderValue::from_static(
            "default-src 'self'; script-src 'self'; style-src 'self'; \
             img-src 'self' data:; font-src 'self'; connect-src 'self'; \
             form-action 'self'; frame-ancestors 'none'; base-uri 'self'; \
             object-src 'none'"
        ),
    );
    headers.insert(
        HeaderName::from_static("permissions-policy"),
        HeaderValue::from_static("camera=(), microphone=(), geolocation=()"),
    );
    headers.insert(
        HeaderName::from_static("cross-origin-opener-policy"),
        HeaderValue::from_static("same-origin"),
    );

    res
}

Apply the middleware to your router:

use axum::{middleware, Router};

let app = Router::new()
    .route("/", get(index))
    // ... other routes
    .layer(middleware::from_fn(security_headers));

Add use http::HeaderName; for the headers that are not in the http crate’s built-in constants (permissions-policy, cross-origin-opener-policy).

What each header does

HeaderValuePurpose
X-Content-Type-OptionsnosniffPrevents browsers from MIME-type sniffing responses away from the declared Content-Type. Stops attacks that trick browsers into treating HTML as JavaScript.
X-Frame-OptionsDENYPrevents the page from being embedded in <iframe>, <frame>, or <object> elements. Blocks clickjacking. Superseded by CSP frame-ancestors but still needed for older browsers.
Strict-Transport-Securitymax-age=63072000; includeSubDomainsForces HTTPS for two years, including all subdomains. Only set this once you are committed to HTTPS (it is difficult to undo).
Referrer-Policystrict-origin-when-cross-originSends the full URL as referrer for same-origin requests, but only the origin (no path) for cross-origin requests. Prevents leaking internal URL paths to external sites.
Content-Security-PolicySee aboveControls which sources of content the browser will load. The primary defence against XSS beyond output escaping.
Permissions-Policycamera=(), microphone=(), geolocation=()Disables browser APIs your application does not use. Prevents third-party scripts (if any) from accessing the camera, microphone, or location.
Cross-Origin-Opener-Policysame-originIsolates the browsing context from cross-origin popups. Prevents Spectre-style side-channel attacks from cross-origin windows.

HSTS caution

Strict-Transport-Security with includeSubDomains cannot be easily reversed once browsers cache it. Before deploying, confirm that every subdomain supports HTTPS. Start with a shorter max-age (e.g., 300 for 5 minutes) during testing and increase it once you are confident.

Dependency auditing

Rust crates can have known vulnerabilities. Two tools catch them in CI.

cargo audit checks your Cargo.lock against the RustSec Advisory Database, which tracks reported vulnerabilities in Rust crates:

cargo install cargo-audit
cargo audit

cargo deny is broader. It checks advisories (same database as cargo audit), licence compliance, duplicate dependency versions, and can ban specific crates:

cargo install cargo-deny
cargo deny init   # creates deny.toml
cargo deny check

Run both in CI on every pull request. cargo audit is fast and catches known CVEs. cargo deny enforces policy (no GPL dependencies, no duplicate versions of security-critical crates, ban specific crates you have decided against).

Secure coding with AI agents

Empirical research shows that AI coding assistants introduce security vulnerabilities at measurable rates. A 2021 study by NYU researchers found that roughly 40% of GitHub Copilot’s generated code samples contained vulnerabilities across the MITRE CWE Top 25 categories. A Stanford user study found that participants using AI assistance wrote less secure code and were simultaneously more confident in its security.

These findings are relevant to this stack specifically:

  • unsafe blocks. AI agents may introduce unsafe when a safe alternative exists. Audit every unsafe block in AI-generated code for necessity and correctness.
  • PreEscaped with untrusted input. An agent solving an HTML rendering problem may reach for PreEscaped without considering the XSS implications.
  • format!() in SQL. An agent may build dynamic queries with string formatting instead of bind parameters, especially for complex queries with optional filters.
  • Error messages that leak internals. AI-generated error handlers often pass raw database errors or file paths through to HTTP responses.
  • Overly permissive defaults. CORS set to *, SameSite::None, Secure: false, missing security headers. AI tends to generate code that works rather than code that is secure.

The Building with AI Coding Agents section covers review practices, a security checklist, and workflow patterns for catching these issues systematically.

Gotchas

PreEscaped is the primary XSS risk in Maud. Search your codebase for every use of PreEscaped and verify that the input is either trusted or sanitised. This is the single most impactful security audit you can do in a Maud application.

format!() is the primary SQL injection risk in SQLx. Search for format! near sqlx::query calls. Prefer query! (compile-time checked, string literal only) over query() (runtime, accepts &str) wherever possible.

CSP breaks htmx if you do not plan for it. Deploy CSP early, before you have built features that depend on inline scripts or hx-on:* attributes. Retrofitting a strict CSP onto an existing application is significantly harder than building with one from the start.

HSTS is sticky. Once a browser sees a Strict-Transport-Security header with a long max-age, it will refuse HTTP connections to your domain until the max-age expires. Test with short values first.

Security headers are not set by default. Axum sends no security headers out of the box. The middleware above is not optional; without it, your application is missing basic protections that browsers check for.

Forms & Errors

Form Handling and Validation

HTML forms are the primary input mechanism in a hypermedia-driven application. The browser collects data, the server validates and processes it, and the response is HTML. There is no JSON serialisation layer, no client-side state management for form data, and no separate API to keep in sync.

This section covers extracting form data in Axum handlers, sanitising and validating it, building a custom ValidatedForm extractor that combines all three steps, displaying errors with Maud and htmx, and the Post/Redirect/Get pattern for safe form submissions.

Extracting form data

Use the Form<T> extractor from axum-extra (not the one in axum itself). The axum-extra version uses serde_html_form under the hood, which correctly handles multi-value fields: multiple <input> elements with the same name (checkboxes, for example) and <select> elements with the multiple attribute. The standard axum::extract::Form uses serde_urlencoded, which does not support these cases.

[dependencies]
axum-extra = { version = "0.10", features = ["form"] }

Define a struct with Deserialize:

use serde::Deserialize;

#[derive(Deserialize)]
struct CreateContact {
    name: String,
    email: String,
    phone: Option<String>,
}

Extract it in a handler:

use axum_extra::extract::Form;
use axum::response::Redirect;

async fn create_contact(
    Form(input): Form<CreateContact>,
) -> Redirect {
    // input.name, input.email, input.phone are ready to use
    Redirect::to("/contacts")
}

For multi-value fields, collect into a Vec:

#[derive(Deserialize)]
struct SurveyResponse {
    name: String,
    #[serde(rename = "interest")]
    interests: Vec<String>,
}
html! {
    fieldset {
        legend { "Interests" }
        label {
            input type="checkbox" name="interest" value="rust";
            " Rust"
        }
        label {
            input type="checkbox" name="interest" value="web";
            " Web development"
        }
        label {
            input type="checkbox" name="interest" value="databases";
            " Databases"
        }
    }
}

Each checked box sends interest=rust&interest=web, and serde_html_form collects them into the Vec<String>.

Wire the handler to a POST route:

use axum::{routing::{get, post}, Router};

let app = Router::new()
    .route("/contacts", get(list_contacts))
    .route("/contacts", post(create_contact));

The corresponding HTML form:

use maud::{html, Markup};

fn contact_form() -> Markup {
    html! {
        form method="post" action="/contacts" {
            label for="name" { "Name" }
            input #name type="text" name="name" required;

            label for="email" { "Email" }
            input #email type="email" name="email" required;

            label for="phone" { "Phone" }
            input #phone type="tel" name="phone";

            button type="submit" { "Save" }
        }
    }
}

The name attributes on the <input> elements must match the struct field names. serde handles the mapping. For fields with names that differ from Rust conventions, use #[serde(rename = "field-name")].

Option<String> fields map to inputs that may be left blank. If the field is absent or empty in the form submission, serde deserialises it as None.

Handling deserialisation failures

If the form body cannot be deserialised into the target struct (missing required fields, wrong types), Axum returns a 422 Unprocessable Entity by default. For a better user experience, accept a Result and handle the rejection:

use axum_extra::extract::FormRejection;

async fn create_contact(
    form: Result<Form<CreateContact>, FormRejection>,
) -> impl IntoResponse {
    match form {
        Ok(Form(input)) => {
            // process valid input
            Redirect::to("/contacts").into_response()
        }
        Err(_) => {
            // re-render the form with a general error
            (StatusCode::UNPROCESSABLE_ENTITY, contact_form()).into_response()
        }
    }
}

In practice, deserialisation failures are rare when the HTML form matches the struct. Validation errors (invalid email format, value out of range) are the common case.

Sanitising input

User input needs cleaning before validation. Leading and trailing whitespace, inconsistent casing, and stray non-alphanumeric characters cause validation failures that are not the user’s fault. The sanitizer crate provides a derive macro that declares sanitisation rules directly on struct fields, the same way validator declares validation rules.

[dependencies]
sanitizer = "1"

Add Sanitize alongside Deserialize:

use sanitizer::prelude::*;
use serde::Deserialize;

#[derive(Deserialize, Sanitize)]
struct CreateContact {
    #[sanitize(trim)]
    name: String,

    #[sanitize(trim, lower_case)]
    email: String,

    #[sanitize(trim)]
    phone: Option<String>,
}

Call .sanitize() to modify the struct in place:

let mut input = CreateContact {
    name: "  Alice   ".into(),
    email: " Alice@Example.COM ".into(),
    phone: None,
};
input.sanitize();
// input.name == "Alice"
// input.email == "alice@example.com"

Available sanitisers

SanitiserEffect
trimRemove leading and trailing whitespace
lower_caseConvert to lowercase
upper_caseConvert to UPPERCASE
camel_caseConvert to camelCase
snake_caseConvert to snake_case
screaming_snake_caseConvert to SCREAMING_SNAKE_CASE
numericRemove all non-numeric characters
alphanumericRemove all non-alphanumeric characters
e164Convert phone number to E.164 international format
clamp(min, max)Clamp an integer to a range
clamp(max)Truncate a string to a maximum length
custom(function_name)Apply a custom sanitisation function

Custom sanitisation functions

For rules beyond the built-ins, write a function that takes &str and returns String:

use sanitizer::StringSanitizer;

fn collapse_whitespace(input: &str) -> String {
    let mut s = StringSanitizer::from(input);
    s.trim();
    s.get()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

#[derive(Deserialize, Sanitize)]
struct CreatePost {
    #[sanitize(custom(collapse_whitespace))]
    title: String,
}

Sanitisation runs before validation. Trim whitespace, normalise casing, and clean up formatting first, then validate the cleaned values. This order matters: " " (a single space) fails a length(min = 1) check only if you trim it first.

Server-side validation with validator

The validator crate provides a Validate derive macro that adds declarative validation rules to structs. Validation runs on the server after sanitisation and before any database or business logic.

[dependencies]
validator = { version = "0.20", features = ["derive"] }

Add Validate to the struct:

use sanitizer::prelude::*;
use serde::Deserialize;
use validator::Validate;

#[derive(Deserialize, Sanitize, Validate)]
struct CreateContact {
    #[sanitize(trim)]
    #[validate(length(min = 1, max = 255, message = "Name is required"))]
    name: String,

    #[sanitize(trim, lower_case)]
    #[validate(email(message = "Enter a valid email address"))]
    email: String,

    #[sanitize(trim)]
    #[validate(length(max = 20, message = "Phone number too long"))]
    phone: Option<String>,
}

Built-in validators

ValidatorUsageChecks
email#[validate(email)]Valid email format per HTML5 spec
url#[validate(url)]Valid URL
length#[validate(length(min = 1, max = 100))]String or Vec length bounds
range#[validate(range(min = 0, max = 150))]Numeric value bounds
must_match#[validate(must_match(other = "password_confirm"))]Two fields have the same value
contains#[validate(contains(pattern = "@"))]String contains a substring
does_not_contain#[validate(does_not_contain(pattern = "admin"))]String does not contain a substring
regex#[validate(regex(path = *RE_PHONE))]Matches a compiled regex
custom#[validate(custom(function = "check_slug"))]Runs a custom function

Every validator accepts an optional message parameter that provides the error text shown to users. Without it, the crate produces a default message keyed by the validation rule name.

Custom validation functions

For rules that don’t fit the built-in validators, write a function that returns Result<(), ValidationError>:

use validator::ValidationError;

fn validate_no_profanity(value: &str) -> Result<(), ValidationError> {
    let blocked = ["spam", "scam"];
    if blocked.iter().any(|w| value.to_lowercase().contains(w)) {
        return Err(ValidationError::new("profanity")
            .with_message("Contains blocked content".into()));
    }
    Ok(())
}

#[derive(Deserialize, Sanitize, Validate)]
struct CreatePost {
    #[sanitize(trim)]
    #[validate(length(min = 1, max = 200))]
    title: String,

    #[sanitize(trim)]
    #[validate(custom(function = "validate_no_profanity"))]
    body: String,
}

Nested validation

Structs containing other validatable structs use #[validate(nested)]:

#[derive(Deserialize, Sanitize, Validate)]
struct Address {
    #[sanitize(trim)]
    #[validate(length(min = 1))]
    street: String,
    #[sanitize(trim)]
    #[validate(length(min = 1))]
    city: String,
}

#[derive(Deserialize, Sanitize, Validate)]
struct CreateUser {
    #[sanitize(trim)]
    #[validate(length(min = 1))]
    name: String,
    #[sanitize]
    #[validate(nested)]
    address: Address,
}

The ValidatedForm extractor

Every form handler follows the same sequence: deserialise the body, sanitise the fields, validate, then branch on the result. A custom ValidatedForm<T> extractor wraps axum-extra’s Form and performs all three steps, so handlers never repeat the boilerplate.

The extractor uses FormRejection only for deserialisation failures (malformed request bodies). Validation failures are not rejections; they are a normal part of form handling. The extractor returns both the sanitised input and any validation errors, so the handler always has access to the user’s data for re-rendering the form.

use axum::extract::{FromRequest, Request};
use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};
use axum_extra::extract::{Form, FormRejection};
use sanitizer::prelude::*;
use validator::{Validate, ValidationErrors};

pub struct ValidatedForm<T> {
    pub input: T,
    pub errors: Option<ValidationErrors>,
}

impl<S, T> FromRequest<S> for ValidatedForm<T>
where
    S: Send + Sync,
    T: serde::de::DeserializeOwned + Sanitize + Validate,
    Form<T>: FromRequest<S, Rejection = FormRejection>,
{
    type Rejection = FormRejection;

    async fn from_request(
        req: Request,
        state: &S,
    ) -> Result<Self, Self::Rejection> {
        let Form(mut input) = Form::<T>::from_request(req, state).await?;
        input.sanitize();
        let errors = input.validate().err();
        Ok(ValidatedForm { input, errors })
    }
}

The handler pattern becomes:

async fn create_contact(
    validated: ValidatedForm<CreateContact>,
) -> impl IntoResponse {
    if let Some(errors) = &validated.errors {
        return (
            StatusCode::UNPROCESSABLE_ENTITY,
            render_contact_form(&validated.input, errors),
        ).into_response();
    }

    save_contact(&validated.input).await;
    Redirect::to("/contacts").into_response()
}

The ValidatedForm extractor handles the mechanical work. The handler deals only with the business logic: render errors or save and redirect.

Place the ValidatedForm definition in a shared crate in your workspace (e.g., common or web). Every form handler across the application can use it.

HTML5 client-side validation

Use HTML5 validation attributes as the first line of defence. They provide instant feedback without a server round-trip and reduce unnecessary requests. The server always validates too, because client-side validation is trivially bypassed.

The relevant attributes:

AttributePurposeExample
requiredField must not be emptyinput required;
type="email"Must look like an emailinput type="email";
type="url"Must look like a URLinput type="url";
minlength / maxlengthText length boundsinput minlength="1" maxlength="255";
min / maxNumeric or date boundsinput type="number" min="0" max="150";
patternRegex matchinput pattern="[A-Za-z]+" title="Letters only";

Apply these in your Maud templates alongside the server-side validator rules. Keep the constraints consistent: if the server requires length(min = 1, max = 255), set required minlength="1" maxlength="255" on the input.

fn contact_form_fields(input: Option<&CreateContact>) -> Markup {
    let name_val = input.map(|i| i.name.as_str()).unwrap_or("");
    let email_val = input.map(|i| i.email.as_str()).unwrap_or("");

    html! {
        label for="name" { "Name" }
        input #name type="text" name="name" value=(name_val)
            required minlength="1" maxlength="255";

        label for="email" { "Email" }
        input #email type="email" name="email" value=(email_val)
            required;
    }
}

HTML5 validation is not a substitute for server-side validation. It is a UX optimisation that catches obvious mistakes before they hit the network.

Displaying validation errors with Maud

When validation fails, re-render the form with the user’s input preserved and error messages next to the relevant fields. The ValidationErrors struct from validator maps field names to a list of ValidationError values, each with a message field.

A helper to extract the first error message for a given field:

use validator::ValidationErrors;

fn field_error(errors: &ValidationErrors, field: &str) -> Option<String> {
    errors
        .field_errors()
        .get(field)
        .and_then(|errs| errs.first())
        .and_then(|e| e.message.as_ref())
        .map(|msg| msg.to_string())
}

An error message component:

fn field_error_message(errors: Option<&ValidationErrors>, field: &str) -> Markup {
    let msg = errors.and_then(|e| field_error(e, field));
    html! {
        @if let Some(msg) = msg {
            span.field-error role="alert" { (msg) }
        }
    }
}

Wire it into the form:

fn render_contact_form(
    input: &CreateContact,
    errors: &ValidationErrors,
) -> Markup {
    html! {
        form method="post" action="/contacts" {
            div.form-error role="alert" {
                p { "Please fix the errors below." }
            }

            div.field {
                label for="name" { "Name" }
                input #name type="text" name="name" value=(input.name)
                    required minlength="1" maxlength="255";
                (field_error_message(Some(errors), "name"))
            }

            div.field {
                label for="email" { "Email" }
                input #email type="email" name="email" value=(input.email)
                    required;
                (field_error_message(Some(errors), "email"))
            }

            button type="submit" { "Save" }
        }
    }
}

The handler using ValidatedForm:

async fn show_contact_form() -> Markup {
    contact_form()
}

async fn create_contact(
    validated: ValidatedForm<CreateContact>,
) -> impl IntoResponse {
    if let Some(errors) = &validated.errors {
        return (
            StatusCode::UNPROCESSABLE_ENTITY,
            render_contact_form(&validated.input, errors),
        ).into_response();
    }

    save_contact(&validated.input).await;
    Redirect::to("/contacts").into_response()
}

Inline field validation with htmx

The full-form pattern above works without JavaScript. For a more responsive experience, add inline validation that checks individual fields as the user fills them in, using htmx to swap error messages without a full page reload.

Create a validation endpoint that accepts a single field value and returns just the error markup:

#[derive(Deserialize)]
struct FieldValidation {
    name: Option<String>,
    email: Option<String>,
}

async fn validate_field(
    Form(input): Form<FieldValidation>,
) -> Markup {
    // Build a partial struct for validation
    let mut contact = CreateContact {
        name: input.name.clone().unwrap_or_default(),
        email: input.email.clone().unwrap_or_default(),
        phone: None,
    };
    contact.sanitize();

    let errors = contact.validate().err();
    // Determine which field was submitted and return its error
    if input.name.is_some() {
        return field_error_message(errors.as_ref(), "name");
    }
    if input.email.is_some() {
        return field_error_message(errors.as_ref(), "email");
    }
    html! {}
}

Add htmx attributes to the form inputs. Each field posts its value on blur and swaps the error message next to it:

fn contact_form_with_inline_validation(
    input: Option<&CreateContact>,
    errors: Option<&ValidationErrors>,
) -> Markup {
    let name_val = input.map(|i| i.name.as_str()).unwrap_or("");
    let email_val = input.map(|i| i.email.as_str()).unwrap_or("");

    html! {
        form method="post" action="/contacts" {
            div.field {
                label for="name" { "Name" }
                input #name type="text" name="name" value=(name_val)
                    required minlength="1" maxlength="255"
                    hx-post="/contacts/validate"
                    hx-trigger="blur"
                    hx-target="next .field-error-slot"
                    hx-swap="innerHTML";
                span.field-error-slot {
                    (field_error_message(errors, "name"))
                }
            }

            div.field {
                label for="email" { "Email" }
                input #email type="email" name="email" value=(email_val)
                    required
                    hx-post="/contacts/validate"
                    hx-trigger="blur"
                    hx-target="next .field-error-slot"
                    hx-swap="innerHTML";
                span.field-error-slot {
                    (field_error_message(errors, "email"))
                }
            }

            button type="submit" { "Save" }
        }
    }
}

Register the validation endpoint:

let app = Router::new()
    .route("/contacts/new", get(show_contact_form))
    .route("/contacts", post(create_contact))
    .route("/contacts/validate", post(validate_field));

This layered approach gives three levels of validation feedback:

  1. HTML5 attributes catch basic mistakes instantly in the browser.
  2. htmx inline validation checks fields against server rules on blur, before submission.
  3. Full-form server validation on POST is the final authority. It always runs, catching anything the first two layers missed.

The form works without JavaScript (levels 1 and 3). htmx enhances it progressively.

Post/Redirect/Get

The Post/Redirect/Get (PRG) pattern prevents duplicate form submissions when users refresh the page after a POST. Without it, refreshing re-submits the form, potentially creating duplicate records.

The pattern:

  1. The browser POSTs the form data.
  2. The server processes it and responds with a 303 See Other redirect.
  3. The browser follows the redirect with a GET request.
  4. Refreshing the page repeats only the GET, not the POST.

In Axum:

use axum::response::Redirect;

async fn create_contact(
    validated: ValidatedForm<CreateContact>,
) -> impl IntoResponse {
    if let Some(errors) = &validated.errors {
        return (
            StatusCode::UNPROCESSABLE_ENTITY,
            render_contact_form(&validated.input, errors),
        ).into_response();
    }

    save_contact(&validated.input).await;
    Redirect::to("/contacts").into_response()
}

Redirect::to() sends a 303 See Other by default, which is correct for PRG. The browser converts the redirect to a GET regardless of the original method.

For success feedback after the redirect (a “Contact saved” flash message), store the message in the session before redirecting and display it on the next GET. Session management is covered in the Authentication section.

When the form is submitted via htmx (not a full page navigation), PRG is unnecessary. htmx replaces a targeted DOM fragment, and there is no browser history entry for the POST. The server can return an HTML fragment directly. Use the HxRequest extractor from axum-htmx to branch:

use axum_htmx::HxRequest;

async fn create_contact(
    HxRequest(is_htmx): HxRequest,
    validated: ValidatedForm<CreateContact>,
) -> impl IntoResponse {
    if let Some(errors) = &validated.errors {
        return (
            StatusCode::UNPROCESSABLE_ENTITY,
            render_contact_form(&validated.input, errors),
        ).into_response();
    }

    save_contact(&validated.input).await;

    if is_htmx {
        // Return updated content fragment
        render_contact_list().await.into_response()
    } else {
        // Standard PRG for non-htmx submissions
        Redirect::to("/contacts").into_response()
    }
}

CSRF protection

Every form that performs a state-changing action (POST, PUT, DELETE) needs protection against cross-site request forgery. Without it, a malicious page can submit a hidden form to your application using the victim’s authenticated session. CSRF protection is not specific to authentication forms; it applies to every form in the application, including the contact form examples above.

Apply the CSRF middleware layer to the router so it covers all routes with form handlers. The setup, configuration, and layer ordering are covered in the Authentication section.

File uploads

For forms that include file uploads, the browser sends multipart/form-data instead of URL-encoded data. The axum-typed-multipart crate provides a derive macro that handles multipart parsing with the same type-safe pattern as Form<T>.

[dependencies]
axum-typed-multipart = { version = "0.16", features = ["tempfile_3"] }
tempfile = "3"

The tempfile_3 feature streams uploads to temporary files instead of holding them in memory.

Define the upload struct:

use axum_typed_multipart::{FieldData, TryFromMultipart, TypedMultipart};
use tempfile::NamedTempFile;

#[derive(TryFromMultipart)]
struct CreateDocument {
    title: String,

    #[form_data(limit = "10MB")]
    file: FieldData<NamedTempFile>,
}

FieldData<NamedTempFile> streams the upload to a temporary file on disk. The FieldData wrapper provides metadata: file.metadata.file_name for the original filename and file.metadata.content_type for the MIME type.

The handler:

async fn upload_document(
    TypedMultipart(input): TypedMultipart<CreateDocument>,
) -> impl IntoResponse {
    let file_name = input.file.metadata.file_name
        .unwrap_or_else(|| "unnamed".to_string());
    let content_type = input.file.metadata.content_type
        .unwrap_or_else(|| "application/octet-stream".parse().unwrap());

    // input.file.contents is the NamedTempFile
    // Move it to permanent storage or upload to S3
    let temp_path = input.file.contents.path();

    // ... process the file

    Redirect::to("/documents")
}

The form needs enctype="multipart/form-data":

html! {
    form method="post" action="/documents" enctype="multipart/form-data" {
        label for="title" { "Title" }
        input #title type="text" name="title" required;

        label for="file" { "File" }
        input #file type="file" name="file" required accept=".pdf,.doc,.docx";

        button type="submit" { "Upload" }
    }
}

For processing and storing uploaded files (S3-compatible storage, permanent paths, serving files back), see the File Storage section.

Alternatives

garde is an alternative validation crate with a different API style. Where validator uses string-based attribute arguments (#[validate(length(min = 1))]), garde uses Rust expressions (#[garde(length(min = 1))]) and supports context-dependent validation through a generic context parameter. Both crates are actively maintained. This guide uses validator because it is more widely adopted and its API is sufficient for typical web form validation.

Gotchas

Field names must match. The name attribute in the HTML form must match the struct field name exactly (or the #[serde(rename)] value). A mismatch causes deserialisation to silently use the default or fail entirely, depending on whether the field is Option.

Sanitise before validating. The ValidatedForm extractor handles this order automatically. If you call .validate() without sanitising first, a value like " " (whitespace) passes a length(min = 1) check even though it contains no meaningful content.

Validation runs after deserialisation. If serde cannot parse the form body at all (e.g., a required field is completely missing), Axum rejects the request before sanitisation or validation ever runs. The ValidatedForm extractor surfaces this as a FormRejection.

validator checks values, not business rules. Format and range checks belong on the struct. Rules that require database access (uniqueness, referential integrity) belong in the handler or service layer, after validation passes.

Optional fields need special handling with validator. #[validate(email)] on an Option<String> only validates the inner value when it is Some. An empty optional field passes validation, which is usually what you want. If a field should be non-empty when present, add #[validate(length(min = 1))].

CSRF protection is not optional. Every POST form needs CSRF middleware, not just login and registration. A contacts form, a settings page, a comment box: if it changes state, it needs protection. See CSRF protection for setup.

multipart/form-data for file uploads. Standard Form<T> only handles URL-encoded bodies. If the form includes a file input, the enctype must be multipart/form-data and the handler must use TypedMultipart<T> instead of Form<T>. The ValidatedForm extractor does not apply to multipart forms.

Error Handling

Rust has no exceptions. Functions that can fail return Result<T, E>, and the caller decides what to do with the error. This is a strength, but it means you need a deliberate strategy for how errors propagate through your web application and how they turn into HTTP responses.

The approach here uses two crates that serve different purposes. thiserror generates typed error enums for domain and library code, where callers need to match on specific failure modes. anyhow provides type-erased error propagation for application code, where you just want to bubble errors up with context.

thiserror for typed errors

thiserror is a derive macro that generates std::error::Error implementations for your error types. Use it when callers need to distinguish between different failure modes.

[dependencies]
thiserror = "2"

Define an error enum for a domain module:

use thiserror::Error;

#[derive(Debug, Error)]
pub enum UserError {
    #[error("user not found: {0}")]
    NotFound(i64),

    #[error("email already registered: {0}")]
    DuplicateEmail(String),

    #[error("invalid email format: {0}")]
    InvalidEmail(String),
}

The #[error("...")] attribute generates the Display implementation. Field values are interpolated using the same syntax as format!.

Wrapping source errors with #[from]

#[from] generates a From<T> implementation and sets the error’s source() return value, so the ? operator converts the source error automatically:

use thiserror::Error;

#[derive(Debug, Error)]
pub enum RepoError {
    #[error("database error")]
    Database(#[from] sqlx::Error),

    #[error("user not found: {0}")]
    NotFound(i64),

    #[error("email already registered: {0}")]
    DuplicateEmail(String),
}

A repository function can now use ? on SQLx calls and the error converts automatically:

async fn find_user(pool: &PgPool, id: i64) -> Result<User, RepoError> {
    let user = sqlx::query_as!(User, "SELECT * FROM users WHERE id = $1", id)
        .fetch_one(pool)
        .await?; // sqlx::Error -> RepoError::Database
    Ok(user)
}

Transparent forwarding

#[error(transparent)] delegates both Display and source() to the inner error. This is useful for catch-all variants:

#[derive(Debug, Error)]
pub enum AppError {
    #[error("not found: {0}")]
    NotFound(String),

    #[error(transparent)]
    Database(#[from] sqlx::Error),
}

When AppError::Database is displayed, it prints the SQLx error’s message directly rather than wrapping it.

When to use thiserror

Use thiserror in library crates, domain modules, and any code where the caller needs to match on the error variant to decide what to do. A repository crate that returns RepoError::NotFound lets the handler return a 404. A repository that returns anyhow::Error forces the handler to treat everything as a 500.

anyhow for application code

anyhow provides a single error type, anyhow::Error, that wraps any error implementing std::error::Error. It is designed for application code where you want to propagate errors with added context rather than define typed variants for every possible failure.

[dependencies]
anyhow = "1"

Adding context

The .context() and .with_context() methods attach human-readable messages to errors as they propagate up the call stack:

use anyhow::{Context, Result};

async fn load_config(path: &str) -> Result<Config> {
    let content = std::fs::read_to_string(path)
        .with_context(|| format!("failed to read config from {path}"))?;

    let config: Config = toml::from_str(&content)
        .context("failed to parse config")?;

    Ok(config)
}

Result here is anyhow::Result<T>, an alias for Result<T, anyhow::Error>. The ? operator converts any error into anyhow::Error and the .context() call wraps it with an additional message. When logged, the full chain is visible: “failed to parse config: expected =, found [ at line 3”.

Use .with_context(|| ...) when the message involves formatting (the closure is only evaluated on error). Use .context("...") for static strings.

bail! and ensure!

For errors that don’t originate from another error type:

use anyhow::{bail, ensure, Result};

fn validate_port(port: u16) -> Result<()> {
    ensure!(port >= 1024, "port {port} is in the privileged range");
    if port == 0 {
        bail!("port must not be zero");
    }
    Ok(())
}

bail! returns early with an error. ensure! is a conditional bail!.

When to use anyhow

Use anyhow in application-level code where you don’t need callers to match on specific error variants: startup logic, configuration loading, background tasks, and any place where the only reasonable response to an error is to log it and return a 500. Anyhow’s .context() method produces better diagnostic messages than a bare ?, because each layer of the call stack can explain what it was trying to do.

The AppError type

Web handlers need to convert errors into HTTP responses. Axum requires that both the success and error types in a handler’s Result implement IntoResponse. The standard pattern is a single AppError enum that maps each variant to an HTTP status code.

use axum::http::StatusCode;
use axum::response::{IntoResponse, Response};
use thiserror::Error;

#[derive(Debug, Error)]
pub enum AppError {
    #[error("not found: {0}")]
    NotFound(String),

    #[error("conflict: {0}")]
    Conflict(String),

    #[error("bad request: {0}")]
    BadRequest(String),

    #[error("unauthorized")]
    Unauthorized,

    #[error("database error")]
    Database(#[from] sqlx::Error),
}

impl IntoResponse for AppError {
    fn into_response(self) -> Response {
        let status = match &self {
            AppError::NotFound(_) => StatusCode::NOT_FOUND,
            AppError::Conflict(_) => StatusCode::CONFLICT,
            AppError::BadRequest(_) => StatusCode::BAD_REQUEST,
            AppError::Unauthorized => StatusCode::UNAUTHORIZED,
            AppError::Database(_) => StatusCode::INTERNAL_SERVER_ERROR,
        };

        (status, self.to_string()).into_response()
    }
}

Handlers return Result<impl IntoResponse, AppError>:

async fn get_user(
    State(pool): State<PgPool>,
    Path(id): Path<i64>,
) -> Result<Html<String>, AppError> {
    let user = sqlx::query_as!(User, "SELECT * FROM users WHERE id = $1", id)
        .fetch_optional(&pool)
        .await?  // sqlx::Error -> AppError::Database
        .ok_or_else(|| AppError::NotFound(format!("user {id}")))?;

    Ok(Html(render_user(&user)))
}

The ? operator on the SQLx call converts sqlx::Error into AppError::Database via the #[from] attribute. The .ok_or_else() on the Option produces AppError::NotFound when the query returns no rows.

Converting from domain errors

Domain modules often define their own error types. Add From implementations to convert them into AppError:

impl From<UserError> for AppError {
    fn from(err: UserError) -> Self {
        match err {
            UserError::NotFound(id) => AppError::NotFound(format!("user {id}")),
            UserError::DuplicateEmail(email) => {
                AppError::Conflict(format!("email {email} already registered"))
            }
            UserError::InvalidEmail(msg) => AppError::BadRequest(msg),
        }
    }
}

This mapping is where domain semantics meet HTTP semantics. A DuplicateEmail is a domain concept; a 409 Conflict is an HTTP concept. The From implementation is the bridge.

Mapping database errors

Most SQLx errors should map to a 500. The two cases worth handling explicitly are missing rows and unique constraint violations:

impl From<sqlx::Error> for AppError {
    fn from(err: sqlx::Error) -> Self {
        match &err {
            sqlx::Error::RowNotFound => {
                AppError::NotFound("record not found".to_string())
            }
            sqlx::Error::Database(db_err) if db_err.is_unique_violation() => {
                AppError::Conflict("duplicate record".to_string())
            }
            _ => AppError::Database(err),
        }
    }
}

Prefer fetch_optional() over fetch_one() when a missing row is a normal case rather than an error. fetch_optional() returns Ok(None) instead of Err(RowNotFound), which keeps the “not found” logic in the handler where you have more context about what was being looked up:

let user = sqlx::query_as!(User, "SELECT * FROM users WHERE id = $1", id)
    .fetch_optional(&pool)
    .await?
    .ok_or_else(|| AppError::NotFound(format!("user {id}")))?;

This produces a better error message (“user 42 not found”) than the generic “record not found” from the From<sqlx::Error> conversion.

Never expose raw database error messages to users. They can leak table names, column names, and constraint details. The From impl above replaces the database message with a generic string. Log the original error for debugging (covered below).

User-facing error pages

The plain text responses above work for development. For a production HDA application, render HTML error pages with Maud.

Define an error page component:

use maud::{html, Markup};
use axum::http::StatusCode;

fn error_page(status: StatusCode, message: &str) -> Markup {
    html! {
        h1 { (status.as_u16()) " " (status.canonical_reason().unwrap_or("Error")) }
        p { (message) }
    }
}

Wrap it in your site layout the same way you wrap any other page. The error page should look like part of the application, not a raw browser error.

Update IntoResponse to render HTML:

impl IntoResponse for AppError {
    fn into_response(self) -> Response {
        let (status, message) = match &self {
            AppError::NotFound(msg) => (StatusCode::NOT_FOUND, msg.clone()),
            AppError::Conflict(msg) => (StatusCode::CONFLICT, msg.clone()),
            AppError::BadRequest(msg) => (StatusCode::BAD_REQUEST, msg.clone()),
            AppError::Unauthorized => {
                (StatusCode::UNAUTHORIZED, "Unauthorized".to_string())
            }
            AppError::Database(_) => (
                StatusCode::INTERNAL_SERVER_ERROR,
                "An internal error occurred".to_string(),
            ),
        };

        (status, error_page(status, &message)).into_response()
    }
}

For 500 errors, show a generic message. The user does not need to know that the database connection timed out.

Error fragments for htmx requests

When an htmx request fails, you often want to return an error fragment that slots into the existing page rather than a full error page. Check the HX-Request header to branch:

use axum_htmx::HxRequest;

async fn delete_user(
    HxRequest(is_htmx): HxRequest,
    State(pool): State<PgPool>,
    Path(id): Path<i64>,
) -> Result<impl IntoResponse, AppError> {
    delete_user_by_id(&pool, id).await?;

    if is_htmx {
        Ok(Html("".to_string()).into_response())
    } else {
        Ok(Redirect::to("/users").into_response())
    }
}

For a more systematic approach, move the htmx check into the IntoResponse implementation. This requires access to the request headers, which IntoResponse does not have. One option is to store the HxRequest value in the AppError type or use a middleware that sets a response header. In practice, handling htmx errors at the handler level (as above) is simpler and more explicit.

Logging errors with tracing

Log errors in the IntoResponse implementation so every error is captured in one place. This avoids scattering tracing::error! calls across every handler.

impl IntoResponse for AppError {
    fn into_response(self) -> Response {
        let (status, message) = match &self {
            AppError::NotFound(msg) => {
                tracing::warn!(error = %self, "not found");
                (StatusCode::NOT_FOUND, msg.clone())
            }
            AppError::Conflict(msg) => {
                tracing::warn!(error = %self, "conflict");
                (StatusCode::CONFLICT, msg.clone())
            }
            AppError::BadRequest(msg) => {
                tracing::warn!(error = %self, "bad request");
                (StatusCode::BAD_REQUEST, msg.clone())
            }
            AppError::Unauthorized => {
                tracing::warn!("unauthorized request");
                (StatusCode::UNAUTHORIZED, "Unauthorized".to_string())
            }
            AppError::Database(err) => {
                tracing::error!(error = ?err, "database error");
                (
                    StatusCode::INTERNAL_SERVER_ERROR,
                    "An internal error occurred".to_string(),
                )
            }
        };

        (status, error_page(status, &message)).into_response()
    }
}

Expected errors (not found, bad request) log at warn level with Display formatting (%). Unexpected errors (database failures) log at error level with Debug formatting (?) to capture the full error chain. This distinction keeps log noise manageable while ensuring genuine problems are visible.

Supplementary logging with #[instrument]

For handlers where you need more context about what failed, add #[instrument(err)] to log the error along with the function’s arguments:

use tracing::instrument;

#[instrument(skip(pool), err)]
async fn get_user(
    State(pool): State<PgPool>,
    Path(id): Path<i64>,
) -> Result<Html<String>, AppError> {
    // If this returns Err, tracing logs the error with id as context
    let user = find_user(&pool, id).await?;
    Ok(Html(render_user(&user)))
}

skip(pool) prevents the database pool from being included in the log output (it produces enormous Debug output). err tells #[instrument] to log the error value when the function returns Err.

This is supplementary to the centralized logging in IntoResponse. Use it when you need to know which specific handler call failed and with what arguments. For most handlers, the centralized approach is sufficient.

The anyhow catch-all alternative

The AppError enum above requires a variant (or a From implementation) for every error source. As applications grow, this can mean a lot of conversion code. An alternative is to add an anyhow-based catch-all variant:

#[derive(Debug, Error)]
pub enum AppError {
    #[error("not found: {0}")]
    NotFound(String),

    #[error("conflict: {0}")]
    Conflict(String),

    #[error("bad request: {0}")]
    BadRequest(String),

    #[error("unauthorized")]
    Unauthorized,

    #[error(transparent)]
    Unexpected(#[from] anyhow::Error),
}

The Unexpected variant accepts any error that implements std::error::Error via anyhow’s blanket From implementation. Any error you haven’t explicitly handled falls through to this variant and becomes a 500.

This pattern is documented in the thiserror README and taught in Zero to Production in Rust. It trades explicit control for ergonomics: you no longer need to declare a variant or write a From impl for every error source, but unexpected errors all become opaque 500s with no chance for finer-grained status codes.

The trade-off is worth considering as your application grows. For a small application with a handful of error sources, explicit variants are manageable and give you full control over HTTP status codes. For a larger application with many fallible operations where most errors are genuinely unexpected, the catch-all reduces boilerplate.

Placement in the workspace

In a Cargo workspace, place the AppError type and its IntoResponse implementation in a shared crate (e.g., web or common). Domain error types (like UserError, OrderError) live in their respective domain crates with From implementations in the shared crate that bridge domain errors to AppError.

workspace/
├── crates/
│   ├── common/         # AppError, IntoResponse, shared types
│   ├── users/          # UserError, user domain logic
│   ├── orders/         # OrderError, order domain logic
│   └── web/            # Axum handlers, routes

Domain crates depend on thiserror. The shared crate depends on thiserror and axum. Domain crates do not depend on axum, keeping web concerns out of business logic.

Gotchas

Don’t expose internal details to users. Database errors, file paths, and stack traces belong in logs, not in HTTP responses. The IntoResponse implementation should return generic messages for 500 errors and log the real error separately.

sqlx::Error is non-exhaustive. Always include a catch-all arm when matching on it. New variants can be added in minor releases.

fetch_optional vs fetch_one. Use fetch_optional when a missing row is expected (looking up a user by ID). Use fetch_one when a missing row is a bug (fetching a record you just inserted). fetch_one returns Err(RowNotFound) on miss; fetch_optional returns Ok(None).

Order of From implementations matters. If AppError has both From<sqlx::Error> and From<RepoError>, and RepoError also wraps sqlx::Error, a bare ? on a SQLx call in a handler will use From<sqlx::Error> directly, bypassing the domain error’s semantics. Be explicit about which conversion you want.

thiserror 2.0 requires a direct dependency. Code using derive(Error) must declare thiserror as a direct dependency in its Cargo.toml. Relying on a transitive dependency no longer works as of thiserror 2.0.

Integrations

Server-Sent Events and Real-Time Updates

Server-Sent Events (SSE) push data from server to browser over a single HTTP connection. The browser opens a persistent connection, the server writes events to it as they occur, and the browser receives them in real time. No polling, no WebSocket handshake, no bidirectional protocol negotiation. For the class of problems that dominate hypermedia applications (notifications, progress bars, live feeds, status updates), SSE is the right tool.

This section covers implementing SSE endpoints in Axum, using Valkey pub/sub as the event distribution layer, consuming events in the browser with the htmx SSE extension, and common patterns for real-time features.

SSE fundamentals

SSE uses a simple text-based protocol over HTTP. The server responds with Content-Type: text/event-stream and writes events as plain text:

event: notification
data: <div class="alert">New message from Alice</div>

event: status
data: <span class="status">Processing complete</span>

Each event has an optional event: name and a data: payload, separated by blank lines. The browser’s EventSource API connects to the endpoint, receives events, and dispatches them by name. If the connection drops, the browser reconnects automatically.

SSE vs WebSockets

SSE is unidirectional: server to client only. WebSockets are bidirectional. Pick based on the data flow:

Use SSE whenUse WebSockets when
Server pushes updates to the browserClient and server both send messages
Notifications, progress bars, live feedsChat, collaborative editing, gaming
HTML fragments for htmx to swapBinary data or high-frequency bidirectional messaging
You want HTTP semantics (caching, auth, proxies)You need a persistent bidirectional channel

SSE works over standard HTTP, which means it passes through proxies, load balancers, and CDNs without special configuration. WebSockets require upgrade support at every layer. For HDA applications where the server renders HTML and pushes fragments to the browser, SSE is the natural fit.

SSE endpoints in Axum

Axum provides first-class SSE support through axum::response::sse. The key types are Sse (the response wrapper), Event (a single event), and KeepAlive (heartbeat configuration).

A minimal SSE endpoint

use axum::response::sse::{Event, KeepAlive, Sse};
use futures_util::stream::Stream;
use std::convert::Infallible;

async fn events() -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    let stream = futures_util::stream::repeat_with(|| {
        Event::default()
            .event("heartbeat")
            .data("alive")
    })
    .map(Ok)
    .throttle(std::time::Duration::from_secs(5));

    Sse::new(stream).keep_alive(KeepAlive::default())
}

The handler returns Sse<impl Stream<Item = Result<Event, E>>> where E: Into<BoxError>. Axum sets Content-Type: text/event-stream and Cache-Control: no-cache automatically.

Building events

Event uses a builder pattern:

// Named event with HTML data
Event::default()
    .event("notification")
    .data("<div class=\"alert\">New message</div>")

// Event with an ID (for reconnection tracking)
Event::default()
    .event("update")
    .id("42")
    .data("<span>Updated value</span>")

// Retry interval hint (tells the browser how long to wait before reconnecting)
Event::default()
    .retry(std::time::Duration::from_secs(5))
    .data("connected")

Each setter (event, data, id, retry) can only be called once per Event. Calling it twice panics. The data method handles newlines in the payload correctly, splitting them across multiple data: lines per the SSE specification.

Keep-alive

Proxies and load balancers close idle HTTP connections. KeepAlive sends periodic comment lines (:keepalive\n\n) to keep the connection open:

Sse::new(stream).keep_alive(
    KeepAlive::new()
        .interval(std::time::Duration::from_secs(15))
        .text("keepalive")
)

The default interval is 15 seconds. Always call .keep_alive() in production. Without it, connections through Nginx, Cloudflare, or other proxies will be silently dropped after their idle timeout.

Valkey pub/sub as the event bus

A single SSE endpoint connected to a single data source works for toy examples. In practice, events originate from many places (a background job finishes, another user edits a record, a workflow reaches a new stage) and each SSE client only cares about a subset of them. Valkey pub/sub provides the distribution layer, with per-resource channels as the organising principle.

Per-resource channels

Structure Valkey channels around the resources that generate events:

order:123           – status changes for order 123
project:456         – activity on project 456
user:789:notifications – notifications for user 789
task:abc:progress   – progress updates for background task abc

Any part of your application publishes to the relevant channel. Each SSE connection subscribes only to the channels it needs. This maps naturally to how HDA pages work: a page showing order 123 opens an SSE connection that subscribes to order:123. A dashboard subscribes to several channels at once.

This design is also what Valkey performs best with. PUBLISH is O(N) where N is the number of subscribers on that specific channel. Many channels with a handful of subscribers each is fast. A single channel with thousands of subscribers makes every publish slow. The Valkey documentation and Redis creator are explicit on this point: prefer many fine-grained channels over a few broad ones.

The architecture

[Handler A] ──publish──▶ Valkey channel: order:123   ──subscribe──▶ [SSE Client 1]
[Handler B] ──publish──▶ Valkey channel: project:456 ──subscribe──▶ [SSE Client 2]
[Restate]   ──publish──▶ Valkey channel: task:abc    ──subscribe──▶ [SSE Client 1]

Each SSE connection opens its own Valkey pub/sub subscriber and subscribes to the specific channels the authenticated user is authorised to see. When the SSE client disconnects, the Valkey connection drops and the subscriptions are cleaned up automatically.

Dependencies

[dependencies]
axum = "0.8"
redis = { version = "1.0", features = ["tokio-comp"] }
tokio = { version = "1", features = ["full"] }
tokio-stream = "0.1"
futures-util = "0.3"
async-stream = "0.3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"

The redis crate works with Valkey without modification. Valkey is API-compatible with Redis, so any Redis client library works as-is.

Application state

#[derive(Clone)]
pub struct AppState {
    pub db: sqlx::PgPool,
    pub valkey: redis::Client,
}

Store a redis::Client rather than a connection. The client is a lightweight handle that creates new connections on demand. Each SSE handler will create its own pub/sub connection from this client.

Publishing events

Any handler or background process publishes events to a resource-specific channel:

use redis::AsyncCommands;

pub async fn publish_event(
    valkey: &redis::Client,
    channel: &str,
    event_type: &str,
    html: &str,
) -> Result<(), redis::RedisError> {
    let mut conn = valkey.get_multiplexed_async_connection().await?;
    let payload = serde_json::json!({
        "event_type": event_type,
        "data": html
    });
    conn.publish(channel, payload.to_string()).await?;
    Ok(())
}

// Example: order status changed
publish_event(
    &state.valkey,
    "order:123",
    "status",
    "<span class=\"badge\">Shipped</span>",
).await?;

The publisher uses a regular multiplexed connection. Multiplexed connections are shared across callers, so you do not need to manage a connection pool for publishing. The subscriber connection is separate because Valkey requires a dedicated connection for subscriptions (it cannot execute other commands while subscribed).

The SSE handler

Each SSE handler authenticates the user, checks authorisation for the requested resource, then opens a dedicated Valkey subscriber for that resource’s channel:

use async_stream::try_stream;
use axum::extract::{Path, State};
use axum::response::sse::{Event, KeepAlive, Sse};
use futures_util::stream::Stream;
use futures_util::StreamExt;
use std::convert::Infallible;

pub async fn order_events(
    State(state): State<AppState>,
    Path(order_id): Path<i64>,
    user: AuthenticatedUser,
) -> Result<Sse<impl Stream<Item = Result<Event, Infallible>>>, AppError> {
    // Verify the user has access to this order
    let has_access = check_order_access(&state.db, user.id, order_id).await?;
    if !has_access {
        return Err(AppError::Forbidden);
    }

    let channel = format!("order:{order_id}");
    let client = state.valkey.clone();

    let stream = try_stream! {
        // Open a dedicated pub/sub connection for this SSE client
        let mut pubsub = client.get_async_pubsub().await.unwrap();
        pubsub.subscribe(&channel).await.unwrap();

        let mut messages = pubsub.into_on_message();
        while let Some(msg) = messages.next().await {
            let payload: String = msg.get_payload().unwrap();
            if let Ok(event) = serde_json::from_str::<serde_json::Value>(&payload) {
                let event_type = event["event_type"].as_str().unwrap_or("update");
                let data = event["data"].as_str().unwrap_or("");
                yield Event::default()
                    .event(event_type)
                    .data(data);
            }
        }
    };

    Ok(Sse::new(stream).keep_alive(KeepAlive::default()))
}

The authorisation check happens before the stream is created. If the user does not have access, the handler returns an error and no Valkey connection is opened. Once the SSE client disconnects (browser navigates away, tab closes, element removed from DOM), the stream is dropped, which drops the Valkey connection and automatically unsubscribes.

Subscribing to multiple channels

A page that needs events from several resources subscribes to all of them on a single connection:

pub async fn dashboard_events(
    State(state): State<AppState>,
    user: AuthenticatedUser,
) -> Result<Sse<impl Stream<Item = Result<Event, Infallible>>>, AppError> {
    // Determine which resources this user should receive updates for
    let channels = get_user_subscriptions(&state.db, user.id).await?;

    let client = state.valkey.clone();

    let stream = try_stream! {
        let mut pubsub = client.get_async_pubsub().await.unwrap();

        // Subscribe to all channels at once
        for channel in &channels {
            pubsub.subscribe(channel).await.unwrap();
        }

        let mut messages = pubsub.into_on_message();
        while let Some(msg) = messages.next().await {
            let channel_name = msg.get_channel_name().to_string();
            let payload: String = msg.get_payload().unwrap();
            if let Ok(event) = serde_json::from_str::<serde_json::Value>(&payload) {
                let event_type = event["event_type"].as_str().unwrap_or("update");
                let data = event["data"].as_str().unwrap_or("");
                yield Event::default()
                    .event(event_type)
                    .data(data);
            }
        }
    };

    Ok(Sse::new(stream).keep_alive(KeepAlive::default()))
}

The get_user_subscriptions function queries your database for the resources the user has access to and returns channel names like ["project:12", "project:45", "user:789:notifications"]. A single Valkey pub/sub connection can subscribe to any number of channels.

Wiring it together

use axum::{routing::get, Router};

#[tokio::main]
async fn main() {
    let valkey = redis::Client::open("redis://127.0.0.1:6379").unwrap();

    let state = AppState {
        db: pool,
        valkey,
    };

    let app = Router::new()
        .route("/events/orders/{id}", get(order_events))
        .route("/events/dashboard", get(dashboard_events))
        .with_state(state);

    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}

No global background tasks needed. Each SSE connection manages its own Valkey subscription lifecycle.

Security

SSE connections carry the same security requirements as any other authenticated endpoint, with additional considerations for long-lived connections.

Authentication

SSE uses a regular HTTP GET request. The browser’s EventSource API automatically sends cookies, so cookie-based session authentication works without any extra configuration. This is the recommended approach for HDA applications.

EventSource does not support custom HTTP headers. If your application uses Authorization: Bearer tokens, you cannot pass them through EventSource. Workarounds exist (tokens in query parameters, fetch-based SSE libraries), but they introduce their own risks. Stick with session cookies.

Authorisation at subscription time

Verify access before opening the Valkey subscription. The handler should:

  1. Authenticate the user from the session cookie.
  2. Extract the resource identifier from the request (path parameter, query parameter).
  3. Check that the user has permission to view the resource.
  4. Only then open the Valkey subscriber and begin streaming.

The order_events handler above demonstrates this pattern. The authorisation check is a standard database query, the same check you would run on a regular page load for that resource.

Per-resource channels provide security by architecture. Each SSE connection only receives messages from channels it explicitly subscribed to, and subscription is gated by a server-side authorisation check. There is no filtering step that could be bypassed or implemented incorrectly.

Do not use a single broadcast channel where all events flow to all connections and rely on server-side filtering. Every message passes through every connection’s filter logic, and any bug in that filter leaks data to unauthorised users.

Long-lived connection re-authorisation

SSE connections can persist for hours. Permissions change: a user’s role is downgraded, a project is archived, a session expires. The SSE connection established before the change continues streaming events unless you actively close it.

Strategies for handling this:

  • Periodic session validation. Check the user’s session in the stream loop every N minutes. If the session is expired or revoked, close the stream.
  • Revocation events. When a permission change occurs, publish a control event (e.g., to channel user:{id}:control) that the SSE handler listens for and uses to close the connection.
  • Short-lived connections. Set sse-close on a timer event and have the browser reconnect periodically. Each reconnection runs the full authorisation check.

The simplest approach for most applications is periodic session validation. Add it to the stream loop:

let stream = try_stream! {
    let mut pubsub = client.get_async_pubsub().await.unwrap();
    pubsub.subscribe(&channel).await.unwrap();
    let mut messages = pubsub.into_on_message();
    let mut last_auth_check = std::time::Instant::now();

    loop {
        // Re-check authorisation every 5 minutes
        if last_auth_check.elapsed() > std::time::Duration::from_secs(300) {
            let still_valid = check_order_access(&db, user_id, order_id).await;
            if !still_valid.unwrap_or(false) {
                break;
            }
            last_auth_check = std::time::Instant::now();
        }

        match tokio::time::timeout(
            std::time::Duration::from_secs(30),
            messages.next()
        ).await {
            Ok(Some(msg)) => {
                // ... yield event as before
            }
            Ok(None) => break,
            Err(_) => continue, // Timeout, loop back to re-check auth
        }
    }
};

The tokio::time::timeout ensures the loop does not block indefinitely waiting for a message, giving the authorisation check a chance to run even when the channel is quiet.

CSRF

An attacker’s page can create an EventSource pointing at your SSE endpoint. The victim’s browser sends cookies automatically, so the attacker’s page receives the victim’s event stream.

Mitigate this with SameSite=Lax or SameSite=Strict on session cookies. SameSite=Lax is the browser default and prevents cookies from being sent on cross-origin sub-resource requests, which includes EventSource connections initiated from a different origin. If your session cookies already use SameSite=Lax (they should), this attack is blocked.

As a defence-in-depth measure, validate the Origin header on SSE endpoints and reject requests from unexpected origins.

Connection exhaustion

Each SSE connection consumes a TCP connection, a file descriptor, and a Tokio task. An attacker could open thousands of connections to exhaust server resources.

Rate-limit SSE connections per user and per IP at your reverse proxy layer. Caddy and Nginx both support connection limits. Also set a maximum connection duration server-side (close and let the browser reconnect after a reasonable period, e.g., 30 minutes).

Consuming events with htmx

The htmx SSE extension connects to an SSE endpoint and swaps event data into the DOM. The Interactivity with htmx section covers the basic setup. Here is the full pattern.

Connecting and swapping

Place hx-ext="sse" and sse-connect on a container element. Child elements with sse-swap receive the data from matching event names:

div hx-ext="sse" sse-connect="/events/orders/123" {
    // Replaced when the server sends an event named "status"
    div sse-swap="status" hx-swap="innerHTML" {
        "Loading status..."
    }

    // Replaced on "activity" events
    div sse-swap="activity" hx-swap="innerHTML" {
        "No recent activity."
    }
}

When the server sends event: status\ndata: <span>Shipped</span>\n\n, htmx takes the data payload and swaps it into the element with sse-swap="status". The event name in the SSE stream must exactly match the sse-swap attribute value (case-sensitive).

Using SSE events as triggers

Instead of swapping SSE data directly, use an event as a trigger for a standard htmx request. This is useful when the SSE event signals “something changed” but the actual content comes from a separate endpoint:

div hx-ext="sse" sse-connect="/events/orders/123" {
    // When "updated" fires, fetch fresh order details
    div hx-get="/orders/123/details"
        hx-trigger="sse:updated"
        hx-swap="innerHTML" {
        "Loading order details..."
    }
}

This pattern keeps the SSE payload minimal (just a signal) and lets the triggered request fetch exactly the content it needs.

Closing the connection

The sse-close attribute closes the EventSource when a specific event arrives:

div hx-ext="sse" sse-connect="/events/tasks/abc/progress" sse-close="complete" {
    div sse-swap="progress" {
        "Starting..."
    }
}

When the server sends event: complete\ndata: done\n\n, the browser closes the SSE connection. Without sse-close, the connection stays open until the element is removed from the DOM or the page navigates away.

Reconnection behaviour

The htmx SSE extension reconnects automatically with exponential backoff. The default configuration starts at 500ms and backs off to a maximum of 60 seconds, with 30% jitter to avoid thundering herd reconnections. It attempts up to 50 reconnections before giving up.

The browser’s native EventSource also reconnects on its own, but the htmx extension’s backoff algorithm is more configurable and better suited to production use. Each reconnection is a new HTTP request, so it runs through authentication and authorisation again.

Patterns

Progress updates

A long-running operation (file upload, report generation, data import) publishes progress events to a task-specific channel. The browser shows a progress bar that updates in real time.

Server-side, publish progress from wherever the work happens:

pub async fn publish_progress(
    valkey: &redis::Client,
    task_id: &str,
    percent: u32,
    message: &str,
) -> Result<(), anyhow::Error> {
    let html = format!(
        r#"<div class="progress-bar" style="width: {percent}%">{percent}%</div>
        <p>{message}</p>"#
    );
    publish_event(valkey, &format!("task:{task_id}"), "progress", &html).await?;
    Ok(())
}

When the task finishes, publish a completion event:

publish_event(valkey, &format!("task:{task_id}"), "complete", "<p>Done.</p>").await?;

Client-side, connect to the task’s SSE endpoint:

fn progress_tracker(task_id: &str) -> Markup {
    html! {
        div hx-ext="sse"
            sse-connect=(format!("/events/tasks/{task_id}/progress"))
            sse-close="complete" {
            div sse-swap="progress" {
                div .progress-bar style="width: 0%" { "0%" }
                p { "Starting..." }
            }
        }
    }
}

The sse-close="complete" closes the connection when the task finishes. No lingering connections.

Notifications

A notification feed that updates across all open tabs for a user. The SSE connection subscribes to the user’s notification channel:

div #notifications hx-ext="sse" sse-connect="/events/notifications" {
    div sse-swap="notification" hx-swap="afterbegin" {
        // New notifications prepended here
    }
}

The hx-swap="afterbegin" prepends each new notification at the top of the container rather than replacing the entire contents. Each notification event delivers a self-contained HTML fragment.

The handler for /events/notifications subscribes to user:{user_id}:notifications, determined from the authenticated session.

Live data feeds

A dashboard element that refreshes when underlying data changes. Rather than streaming the data itself, use SSE as a signal to re-fetch:

div hx-ext="sse" sse-connect="/events/dashboard" {
    div hx-get="/dashboard/metrics"
        hx-trigger="sse:metrics-updated"
        hx-swap="innerHTML" {
        (render_metrics(&current_metrics))
    }
}

This pattern is simpler than pushing full HTML through the SSE stream, and it works well when the triggered endpoint already exists for initial page load.

Connection management

One Valkey connection per SSE client

Each SSE connection opens its own Valkey pub/sub connection. This is the simplest correct architecture. Valkey handles thousands of concurrent connections without issue. For most applications (hundreds of concurrent SSE clients), no further optimisation is needed.

If you reach tens of thousands of concurrent SSE connections on a single server process and Valkey connection count becomes a concern, introduce a subscription manager that deduplicates: when multiple SSE clients on the same server need the same channel, the manager subscribes once and fans out in-process via a tokio::broadcast channel. This is an optimisation, not a starting point.

Browser connection limits

Browsers limit the number of concurrent HTTP connections per domain. For HTTP/1.1, this limit is typically 6 connections. Each SSE connection consumes one of those slots. If a user opens multiple tabs, they can exhaust the limit quickly.

HTTP/2 multiplexes streams over a single TCP connection, so this limit does not apply. Most modern deployments use HTTP/2. If your reverse proxy (Caddy, Nginx) terminates TLS and serves over HTTP/2, this is a non-issue.

For pages that need events from many resources, prefer a single SSE connection that subscribes to multiple Valkey channels (as shown in the dashboard example) over multiple SSE connections from the same page.

Scaling beyond a single server

Per-resource channels work naturally across multiple application servers. When a handler on server A publishes to order:123, every server with a subscriber on that channel receives the message and delivers it to its local SSE clients. No coordination between servers is required because Valkey handles the distribution.

Integration with Restate

Restate workflows can publish progress events to Valkey as they execute. The SSE infrastructure picks them up and delivers them to the browser. A Restate workflow handler publishes status updates at each stage:

// Inside a Restate workflow step
publish_event(
    &valkey_client,
    &format!("task:{workflow_id}"),
    "progress",
    "<p>Step 2 of 4: Processing data...</p>",
).await?;

The browser connects to /events/tasks/{workflow_id}/progress with sse-close="complete" and sees each step’s status in real time. The Background Jobs and Durable Execution with Restate section covers Restate in detail.

When SSE is not enough

SSE is unidirectional. If your application requires the client to send messages back to the server over the same connection (chat with typing indicators, collaborative cursors, multiplayer game state), you need WebSockets.

Axum supports WebSocket upgrades directly. The tokio-tungstenite crate provides the underlying WebSocket implementation, and Axum’s axum::extract::ws module wraps it with an ergonomic API. For most HDA applications, SSE covers the real-time requirements. Reach for WebSockets only when you have a genuinely bidirectional communication need.

Gotchas

Valkey pub/sub is fire-and-forget. Messages are not persisted. If a subscriber is disconnected when a message is published, that message is lost. If you need guaranteed delivery, use Valkey Streams instead of pub/sub, or design your application so that a missed SSE event triggers a full refresh on reconnection.

The subscriber connection is dedicated. A Valkey connection in subscribe mode cannot execute other commands (GET, SET, PUBLISH). Each SSE client uses its own dedicated subscriber connection. Publishing happens on a separate multiplexed connection.

Avoid pattern subscriptions at scale. PSUBSCRIBE (e.g., user:42:*) is convenient but has a global performance cost. Every PUBLISH to any channel pays O(M) where M is the total number of active pattern subscriptions across all clients. With hundreds of concurrent SSE connections each using pattern subscriptions, every publish slows down. Prefer explicit SUBSCRIBE to the specific channels each client needs.

Proxy timeouts can close idle connections. Nginx defaults to a 60-second proxy read timeout. Caddy and Cloudflare have their own defaults. KeepAlive sends heartbeat comments to prevent this, but verify your proxy configuration allows long-lived connections. For Nginx, set proxy_read_timeout to a high value (3600s or more) on SSE endpoints.

SSE connections count against server resources. Each connected client holds an open TCP connection, a Valkey connection, and a Tokio task. For hundreds of concurrent connections this is negligible. At tens of thousands, monitor memory and file descriptor usage on both your application server and Valkey.

Event names cannot contain newlines. The Event::event() method panics if the name contains \n or \r. Event names should be simple identifiers like status, progress, or updated.

HTTP Client and External APIs

Most web applications need to talk to something beyond their own database: a payment processor, a weather service, an email API, a third-party webhook. In Rust, reqwest is the standard HTTP client for these outgoing requests. It provides an async API built on hyper and tokio, with built-in JSON support, TLS, connection pooling, and configurable timeouts.

This section covers building and configuring a reqwest client, designing typed Rust interfaces for external APIs, handling errors and retries, and testing external integrations with mock servers.

Dependencies

[dependencies]
reqwest = { version = "0.13", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }

The json feature on reqwest enables the .json() request builder method and the .json::<T>() response deserialiser. Both rely on serde, which you already have in the stack for form handling and database mapping.

Building a client

reqwest::Client manages a connection pool internally. Create one at application startup and share it through Axum state. Do not create a new client per request, as that discards the connection pool and TLS session cache.

use reqwest::Client;
use std::time::Duration;

let client = Client::builder()
    .timeout(Duration::from_secs(30))
    .connect_timeout(Duration::from_secs(5))
    .pool_max_idle_per_host(10)
    .build()
    .expect("failed to build HTTP client");

timeout sets the total time allowed for a request (connect + send + receive). connect_timeout sets the limit for TCP connection establishment alone. Both prevent your application from hanging indefinitely when an external service is unresponsive.

Add the client to your application state:

#[derive(Clone)]
pub struct AppState {
    pub db: sqlx::PgPool,
    pub http: reqwest::Client,
}

Making requests

GET with JSON response

use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct WeatherResponse {
    temperature: f64,
    conditions: String,
    wind_speed: f64,
}

async fn get_weather(
    client: &reqwest::Client,
    city: &str,
) -> Result<WeatherResponse, reqwest::Error> {
    let response = client
        .get("https://api.weather.example.com/v1/current")
        .query(&[("city", city)])
        .send()
        .await?
        .error_for_status()?
        .json::<WeatherResponse>()
        .await?;

    Ok(response)
}

.query() appends URL query parameters. .error_for_status() converts 4xx and 5xx responses into errors before attempting to read the body. Without it, a 404 response would try to deserialise the error body as WeatherResponse and fail with a confusing deserialisation error instead of a clear status code error.

POST with JSON body

use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize)]
struct CreateOrderRequest {
    product_id: String,
    quantity: u32,
    customer_email: String,
}

#[derive(Debug, Deserialize)]
struct CreateOrderResponse {
    order_id: String,
    status: String,
    total_cents: u64,
}

async fn create_order(
    client: &reqwest::Client,
    order: &CreateOrderRequest,
) -> Result<CreateOrderResponse, reqwest::Error> {
    client
        .post("https://api.orders.example.com/v1/orders")
        .json(order)
        .send()
        .await?
        .error_for_status()?
        .json::<CreateOrderResponse>()
        .await
}

.json(order) serialises the struct to JSON and sets Content-Type: application/json automatically.

Other response types

Not every API returns JSON. Use .text() for plain text, .bytes() for binary data:

let body = client.get(url).send().await?.text().await?;
let image = client.get(url).send().await?.bytes().await?;

Designing typed API clients

Wrap external API interactions in a dedicated struct. This gives you a single place for base URL configuration, authentication, and error mapping.

use reqwest::Client;
use serde::{Deserialize, Serialize};

pub struct WeatherClient {
    client: Client,
    base_url: String,
    api_key: String,
}

#[derive(Debug, Deserialize)]
pub struct CurrentWeather {
    pub temperature: f64,
    pub conditions: String,
    pub humidity: u32,
}

#[derive(Debug, Deserialize)]
pub struct Forecast {
    pub days: Vec<ForecastDay>,
}

#[derive(Debug, Deserialize)]
pub struct ForecastDay {
    pub date: String,
    pub high: f64,
    pub low: f64,
    pub conditions: String,
}

impl WeatherClient {
    pub fn new(client: Client, base_url: String, api_key: String) -> Self {
        Self {
            client,
            base_url,
            api_key,
        }
    }

    pub async fn current(&self, city: &str) -> Result<CurrentWeather, WeatherError> {
        let response = self
            .client
            .get(format!("{}/v1/current", self.base_url))
            .query(&[("city", city)])
            .bearer_auth(&self.api_key)
            .send()
            .await
            .map_err(WeatherError::Request)?;

        if !response.status().is_success() {
            return Err(WeatherError::status(response).await);
        }

        response.json().await.map_err(WeatherError::Request)
    }

    pub async fn forecast(
        &self,
        city: &str,
        days: u32,
    ) -> Result<Forecast, WeatherError> {
        let response = self
            .client
            .get(format!("{}/v1/forecast", self.base_url))
            .query(&[("city", city), ("days", &days.to_string())])
            .bearer_auth(&self.api_key)
            .send()
            .await
            .map_err(WeatherError::Request)?;

        if !response.status().is_success() {
            return Err(WeatherError::status(response).await);
        }

        response.json().await.map_err(WeatherError::Request)
    }
}

The base_url is configurable so tests can point it at a mock server and production can point it at the real API. The Client is injected rather than created internally, so the application controls timeouts and connection pooling centrally.

Error type for external APIs

Define a dedicated error type that distinguishes between network failures, unexpected status codes, and API-specific error responses:

use thiserror::Error;

#[derive(Debug, Error)]
pub enum WeatherError {
    #[error("request failed: {0}")]
    Request(#[from] reqwest::Error),

    #[error("API error {status}: {message}")]
    Api {
        status: u16,
        message: String,
    },
}

impl WeatherError {
    async fn status(response: reqwest::Response) -> Self {
        let status = response.status().as_u16();
        let message = response
            .text()
            .await
            .unwrap_or_else(|_| "unknown error".to_string());
        Self::Api { status, message }
    }
}

This lets callers match on the error to decide what to do. A 404 from the weather API might mean the city was not found (return a user-facing message). A 500 might mean the service is down (retry or degrade gracefully). A network error means the service was unreachable.

Wiring the client into application state

use std::time::Duration;

let http_client = reqwest::Client::builder()
    .timeout(Duration::from_secs(10))
    .connect_timeout(Duration::from_secs(5))
    .build()
    .expect("failed to build HTTP client");

let weather = WeatherClient::new(
    http_client.clone(),
    std::env::var("WEATHER_API_URL").expect("WEATHER_API_URL required"),
    std::env::var("WEATHER_API_KEY").expect("WEATHER_API_KEY required"),
);

let state = AppState {
    db: pool,
    http: http_client,
    weather,
};

Handlers access the client through state:

async fn weather_page(
    State(state): State<AppState>,
    Path(city): Path<String>,
) -> Result<impl IntoResponse, AppError> {
    let weather = state.weather.current(&city).await?;
    Ok(Html(render_weather(&weather)))
}

JSON serialisation patterns

Renaming fields

External APIs rarely use Rust’s snake_case convention. Use #[serde(rename)] or #[serde(rename_all)] to map between naming styles:

#[derive(Debug, Deserialize)]
#[serde(rename_all = "camelCase")]
struct PaymentResponse {
    payment_id: String,      // maps from "paymentId"
    total_amount: u64,       // maps from "totalAmount"
    currency_code: String,   // maps from "currencyCode"
}

For APIs that use a mix of conventions or have individual oddities:

#[derive(Debug, Deserialize)]
struct ApiResponse {
    #[serde(rename = "ID")]
    id: String,

    #[serde(rename = "created_at")]
    created: String,
}

Optional and default fields

External APIs evolve. Fields get added, deprecated, or become nullable. Use Option<T> for fields that might be absent, and #[serde(default)] for fields with sensible defaults:

#[derive(Debug, Deserialize)]
struct UserProfile {
    pub name: String,
    pub email: String,

    #[serde(default)]
    pub verified: bool,

    pub avatar_url: Option<String>,
}

Option<String> handles both a missing field and an explicit null value. #[serde(default)] fills in false if the field is absent.

Handling API envelope patterns

Many APIs wrap their data in a common envelope:

{
  "status": "ok",
  "data": { "temperature": 22.5, "conditions": "sunny" },
  "metadata": { "request_id": "abc123" }
}

Define a generic wrapper:

#[derive(Debug, Deserialize)]
struct ApiEnvelope<T> {
    status: String,
    data: T,
}

Then deserialise into the envelope and extract the inner value:

let envelope = response
    .json::<ApiEnvelope<CurrentWeather>>()
    .await
    .map_err(WeatherError::Request)?;

Ok(envelope.data)

Flattening nested structures

#[serde(flatten)] merges fields from a nested struct into the parent, useful when an API returns metadata alongside entity fields:

#[derive(Debug, Deserialize)]
struct ApiMetadata {
    request_id: String,
    timestamp: String,
}

#[derive(Debug, Deserialize)]
struct OrderWithMetadata {
    #[serde(flatten)]
    order: Order,

    #[serde(flatten)]
    metadata: ApiMetadata,
}

Authentication with external services

Bearer tokens

The most common pattern. A static API key or an OAuth2 access token passed in the Authorization header:

client
    .get(url)
    .bearer_auth(&api_key)
    .send()
    .await?

reqwest’s .bearer_auth() sets the Authorization: Bearer <token> header.

API keys in headers

Some APIs use a custom header instead of the standard Authorization header:

client
    .get(url)
    .header("X-API-Key", &api_key)
    .send()
    .await?

API keys in query parameters

Less secure (keys appear in server logs and URLs) but some APIs require it:

client
    .get(url)
    .query(&[("api_key", &api_key)])
    .send()
    .await?

Default headers

If every request to an API needs the same authentication header, set it as a default on the client:

use reqwest::header::{HeaderMap, HeaderValue};

let mut headers = HeaderMap::new();
headers.insert(
    "X-API-Key",
    HeaderValue::from_str(&api_key).expect("invalid API key"),
);

let client = Client::builder()
    .default_headers(headers)
    .build()?;

Default headers are sent on every request made by this client. This avoids repeating authentication on each call and ensures you never accidentally forget it.

OAuth2 token refresh

For APIs that issue short-lived access tokens, store the token and its expiry in your API client and refresh when needed:

use std::sync::Arc;
use tokio::sync::RwLock;

struct TokenState {
    access_token: String,
    expires_at: std::time::Instant,
}

pub struct OAuthClient {
    client: Client,
    base_url: String,
    client_id: String,
    client_secret: String,
    token: Arc<RwLock<Option<TokenState>>>,
}

impl OAuthClient {
    async fn get_token(&self) -> Result<String, reqwest::Error> {
        // Check if current token is still valid
        {
            let token = self.token.read().await;
            if let Some(state) = token.as_ref() {
                if state.expires_at > std::time::Instant::now() {
                    return Ok(state.access_token.clone());
                }
            }
        }

        // Refresh the token
        let response: TokenResponse = self
            .client
            .post(format!("{}/oauth/token", self.base_url))
            .form(&[
                ("grant_type", "client_credentials"),
                ("client_id", &self.client_id),
                ("client_secret", &self.client_secret),
            ])
            .send()
            .await?
            .json()
            .await?;

        let token_state = TokenState {
            access_token: response.access_token.clone(),
            expires_at: std::time::Instant::now()
                + std::time::Duration::from_secs(response.expires_in.saturating_sub(60)),
        };

        *self.token.write().await = Some(token_state);
        Ok(response.access_token)
    }

    pub async fn request_with_auth(
        &self,
        url: &str,
    ) -> Result<reqwest::Response, reqwest::Error> {
        let token = self.get_token().await?;
        self.client.get(url).bearer_auth(&token).send().await
    }
}

#[derive(Deserialize)]
struct TokenResponse {
    access_token: String,
    expires_in: u64,
}

The saturating_sub(60) refreshes the token 60 seconds before it actually expires, avoiding requests that race against expiry.

Error handling for external calls

External HTTP calls fail in ways that database calls and local operations do not. The network is unreliable, third-party services go down, and response formats change without warning.

Timeouts

Always set timeouts. A missing timeout means a single unresponsive external service can exhaust your server’s thread pool:

let client = Client::builder()
    .timeout(Duration::from_secs(10))     // total request timeout
    .connect_timeout(Duration::from_secs(5)) // TCP connect timeout
    .build()?;

For requests where you know the response should be fast, override per-request:

client.get(url)
    .timeout(Duration::from_secs(3))
    .send()
    .await?

Retries with reqwest-middleware

reqwest-middleware wraps reqwest::Client with a middleware chain. reqwest-retry adds automatic retries with exponential backoff for transient failures.

[dependencies]
reqwest-middleware = "0.4"
reqwest-retry = "0.7"
use reqwest_middleware::ClientBuilder;
use reqwest_retry::{
    RetryTransientMiddleware,
    policies::ExponentialBackoff,
};
use std::time::Duration;

let retry_policy = ExponentialBackoff::builder()
    .retry_bounds(
        Duration::from_millis(500),
        Duration::from_secs(10),
    )
    .build_with_max_retries(3);

let client = ClientBuilder::new(
    reqwest::Client::builder()
        .timeout(Duration::from_secs(10))
        .connect_timeout(Duration::from_secs(5))
        .build()
        .expect("failed to build HTTP client"),
)
.with(RetryTransientMiddleware::new_with_policy(retry_policy))
.build();

The retry middleware automatically retries on:

  • Connection timeouts and resets
  • HTTP 500, 502, 503, 504 (server errors)
  • HTTP 408 (request timeout)
  • HTTP 429 (too many requests)

It does not retry 4xx client errors (except 408 and 429), because those indicate a problem with the request itself.

The middleware client (ClientWithMiddleware) has the same API as reqwest::Client for making requests. If your API client struct wraps the client, swap reqwest::Client for reqwest_middleware::ClientWithMiddleware:

use reqwest_middleware::ClientWithMiddleware;

pub struct WeatherClient {
    client: ClientWithMiddleware,
    base_url: String,
    api_key: String,
}

Circuit breaking

A circuit breaker prevents your application from hammering a service that is already failing. After a threshold of consecutive failures, it “opens” the circuit and fails requests immediately for a cooldown period, then allows a probe request through to check if the service has recovered.

There is no dominant circuit breaker crate in the Rust ecosystem. For most applications, the combination of timeouts + retries with backoff is sufficient. The backoff itself acts as a partial circuit breaker: each retry waits longer, reducing pressure on the failing service.

For operations where reliability matters beyond what retries provide (payment processing, order fulfilment, webhook delivery), use Restate for durable execution. Restate persists the call, retries across process restarts, and provides exactly-once semantics. This is a fundamentally stronger guarantee than in-process retries, which are lost if the application crashes.

Mapping external errors to AppError

Bridge your API client errors to the application’s error type:

impl From<WeatherError> for AppError {
    fn from(err: WeatherError) -> Self {
        match err {
            WeatherError::Api { status, message } if status == 404 => {
                AppError::NotFound(format!("weather data: {message}"))
            }
            WeatherError::Api { status, message } => {
                tracing::error!(status, message, "external API error");
                AppError::BadGateway("external service returned an error".to_string())
            }
            WeatherError::Request(err) => {
                tracing::error!(error = ?err, "external request failed");
                AppError::BadGateway("external service unavailable".to_string())
            }
        }
    }
}

502 Bad Gateway is the appropriate HTTP status when your server is acting as a gateway to an upstream service and that service fails. Add a BadGateway variant to your AppError if you don’t have one.

Testing external API integrations

External HTTP calls are one of the most important things to test and one of the easiest to get wrong. wiremock starts a real HTTP server in your test process and lets you define expected requests and canned responses.

[dev-dependencies]
wiremock = "0.6"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }

A complete test

This test exercises the WeatherClient against a mock server:

use wiremock::{MockServer, Mock, ResponseTemplate};
use wiremock::matchers::{method, path, query_param, header};

#[tokio::test]
async fn test_current_weather() {
    // Start a mock HTTP server
    let mock_server = MockServer::start().await;

    // Define the expected request and response
    Mock::given(method("GET"))
        .and(path("/v1/current"))
        .and(query_param("city", "london"))
        .and(header("authorization", "Bearer test-key"))
        .respond_with(
            ResponseTemplate::new(200).set_body_json(serde_json::json!({
                "temperature": 18.5,
                "conditions": "cloudy",
                "humidity": 72
            })),
        )
        .expect(1)
        .mount(&mock_server)
        .await;

    // Create the client pointing at the mock server
    let client = WeatherClient::new(
        reqwest::Client::new(),
        mock_server.uri(),
        "test-key".to_string(),
    );

    // Call the method under test
    let weather = client.current("london").await.unwrap();

    assert_eq!(weather.temperature, 18.5);
    assert_eq!(weather.conditions, "cloudy");
    assert_eq!(weather.humidity, 72);
}

The mock server binds to a random available port on localhost. mock_server.uri() returns the base URL (e.g., http://127.0.0.1:54321). Because WeatherClient accepts a configurable base_url, no production code needs to change.

.expect(1) asserts that the mock was called exactly once. If the test ends without the expected call count, wiremock panics with a clear message showing which mocks were not satisfied.

Testing error responses

#[tokio::test]
async fn test_city_not_found() {
    let mock_server = MockServer::start().await;

    Mock::given(method("GET"))
        .and(path("/v1/current"))
        .respond_with(
            ResponseTemplate::new(404)
                .set_body_string("city not found"),
        )
        .mount(&mock_server)
        .await;

    let client = WeatherClient::new(
        reqwest::Client::new(),
        mock_server.uri(),
        "test-key".to_string(),
    );

    let err = client.current("atlantis").await.unwrap_err();
    match err {
        WeatherError::Api { status, .. } => assert_eq!(status, 404),
        _ => panic!("expected Api error, got {err:?}"),
    }
}

Testing timeout behaviour

#[tokio::test]
async fn test_timeout() {
    let mock_server = MockServer::start().await;

    Mock::given(method("GET"))
        .and(path("/v1/current"))
        .respond_with(
            ResponseTemplate::new(200)
                .set_body_json(serde_json::json!({"temperature": 20.0}))
                .set_delay(std::time::Duration::from_secs(10)),
        )
        .mount(&mock_server)
        .await;

    let client = WeatherClient::new(
        reqwest::Client::builder()
            .timeout(std::time::Duration::from_secs(1))
            .build()
            .unwrap(),
        mock_server.uri(),
        "test-key".to_string(),
    );

    let err = client.current("london").await.unwrap_err();
    assert!(matches!(err, WeatherError::Request(_)));
}

wiremock’s .set_delay() on the response template simulates a slow upstream service. The test verifies that the client’s timeout configuration works as expected.

Gotchas

Reuse the Client. Each reqwest::Client holds a connection pool. Creating a new client per request means no connection reuse, no TLS session caching, and a DNS lookup on every call. Build one client at startup and share it.

Always set timeouts. reqwest has no default timeout. Without one, a request to an unresponsive host blocks the Tokio task indefinitely. Set both timeout (total) and connect_timeout (connection establishment) on the client builder.

Call .error_for_status() or check status manually. A 404 or 500 response from an external API is not a reqwest error by default. The request succeeded at the HTTP level. If you skip the status check, you’ll get a confusing deserialisation error when serde tries to parse an error body as your response type.

Watch for #[serde(deny_unknown_fields)]. It’s tempting to add this to be strict about API responses, but external APIs add new fields all the time. Unknown fields should be silently ignored (serde’s default) to avoid breaking your application when a third-party adds a field to their response.

Token and credential storage. API keys, client secrets, and OAuth2 credentials belong in environment variables or a secrets manager, not in code. The Configuration and Secrets section covers this in detail.

Log external failures, but not credentials. When logging failed external requests for debugging, ensure you are not writing API keys, bearer tokens, or request bodies containing sensitive data to your logs. Log the URL, status code, and error message. Skip headers and bodies unless you have confirmed they contain no secrets.

Background Jobs and Durable Execution with Restate

Web applications run work outside the request/response cycle constantly: processing a payment after checkout, sending confirmation emails, generating reports, importing data. When any of these steps fails partway through, you need the operation to resume from where it left off, not start over or silently disappear.

Restate is a durable execution engine. It records every step of a handler in a persistent journal. If a step fails or the process crashes, Restate replays from the journal, skipping completed steps and retrying from the point of failure. You get automatic retries, exactly-once side effects, and workflows that survive process restarts, without writing retry logic or state machines yourself.

This section covers integrating Restate with an Axum application as separate workspace binaries, defining durable workflows, triggering them from HTTP handlers, and reporting progress back to the browser via Valkey pub/sub and SSE.

When to use Restate

The design principle for this stack is durable by default: any work that must not be silently lost goes through Restate.

Use Restate for:

  • Multi-step operations involving external services (payment + inventory + email)
  • Any side effect that must happen exactly once (sending a notification, charging a card, calling a third-party API)
  • Work that outlives an HTTP request timeout (report generation, data imports, file processing)
  • Operations that need automatic retries with persistence across process restarts
  • Coordinating work across multiple services with transactional guarantees

The bar for skipping Restate is high. A tokio::spawn that fires off a quick in-memory computation with no external effects is fine. But the moment the spawned task calls an external API, sends an email, or does anything the user expects to complete reliably, route it through Restate. The cost is one HTTP hop to the Restate server. What you get back is durability, observability, and automatic retry logic without writing any of it yourself.

How Restate works

Restate runs as a separate server process that sits between your application and your service handlers:

Axum app ──HTTP──▶ Restate Server (port 8080) ──HTTP──▶ Worker (port 9080)
                         │
                         ▼
                   Journal + State
                 (durable log + RocksDB)

Your Axum application sends requests to the Restate server’s ingress on port 8080. Restate forwards them to your worker process, which runs the actual handler logic using the Restate SDK on port 9080. Every operation the handler performs is recorded in Restate’s journal. If the handler crashes, Restate replays the journal against a new handler invocation: completed steps return their stored results without re-executing, and execution resumes from the failed step.

The Restate server is a single binary with no external dependencies. It stores its journal and state in an embedded RocksDB instance backed by a durable replicated log. The admin API on port 9070 handles service registration and provides a built-in UI for inspecting invocations.

Service types

The Restate Rust SDK provides three types of handlers, each defined as a trait with a proc macro.

Services (#[restate_sdk::service]) are stateless handlers. Multiple invocations run concurrently. Use these for independent operations like sending emails or calling external APIs.

Virtual objects (#[restate_sdk::object]) are stateful entities identified by a string key. Each object has isolated key/value state stored durably by Restate. Only one exclusive handler runs at a time per key, which guarantees state consistency without locks. Handlers marked #[shared] run concurrently with read-only state access.

Workflows (#[restate_sdk::workflow]) are a specialised form of virtual object. The run handler executes exactly once per workflow ID. Additional #[shared] handlers can query the workflow’s state or signal it through durable promises. Workflow state is retained for 24 hours after completion by default.

TypeStateConcurrency per keyUse case
ServiceNoneConcurrentStateless operations: send email, call API, transform data
Virtual ObjectPer-key K/VOne exclusive handler at a timeMutable state: counter, order tracker, rate limiter
WorkflowPer-workflow K/Vrun exclusive; #[shared] concurrentMulti-step processes: order fulfilment, onboarding, data pipeline

Durable execution primitives

Journaled side effects

ctx.run() executes a closure and persists the result in Restate’s journal. On replay, the stored result is returned without re-executing the closure. Wrap every non-deterministic operation (HTTP calls, database writes, random number generation) in ctx.run().

let payment_id: String = ctx
    .run(|| charge_payment(order.clone()))
    .name("charge_payment")
    .await?;

The .name() call labels the operation in the Restate UI for observability. It is optional but worth adding.

If the closure fails, Restate retries it with exponential backoff. The default retry policy retries indefinitely. Override it for operations that should fail fast:

use restate_sdk::prelude::*;
use std::time::Duration;

let result = ctx
    .run(|| call_flaky_service())
    .retry_policy(
        RunRetryPolicy::default()
            .initial_delay(Duration::from_millis(100))
            .exponentiation_factor(2.0)
            .max_attempts(5),
    )
    .name("flaky_service")
    .await?;

Two constraints on ctx.run() closures: you cannot use the Restate context (ctx) inside the closure (no state access, no nested run calls, no service calls), and the run call must be immediately awaited before making other context calls.

Terminal errors

Return a TerminalError from a ctx.run() closure to signal a permanent failure that should not be retried. A declined credit card or an invalid request are terminal; a network timeout is not.

use restate_sdk::prelude::*;

async fn charge_payment(order: Order) -> Result<String, HandlerError> {
    let resp = payment_client.charge(&order).await;
    match resp {
        Ok(charge) => Ok(charge.id),
        Err(e) if e.is_retryable() => Err(e.into()),
        Err(e) => Err(TerminalError::new(format!("Payment permanently failed: {e}")).into()),
    }
}

Durable state

Virtual objects and workflows have access to key/value state that is persisted by Restate. State changes are journaled alongside execution and survive crashes.

ctx.set("status", "processing".to_string());
let status: Option<String> = ctx.get("status").await?;
ctx.clear("status");

Durable timers

ctx.sleep() suspends the handler for a duration. Restate persists the timer. If the process crashes during the sleep, Restate resumes the handler on another invocation when the timer fires.

ctx.sleep(Duration::from_secs(60)).await?;

Workspace layout

The Axum web application and the Restate worker run as separate processes. Shared domain types live in a common crate. This follows the project’s workspace-with-multiple-crates pattern.

your-project/
├── Cargo.toml              (workspace root)
├── crates/
│   ├── web/                (Axum web application)
│   │   └── Cargo.toml
│   ├── worker/             (Restate service worker)
│   │   └── Cargo.toml
│   └── shared/             (shared domain types)
│       └── Cargo.toml

Worker dependencies

# crates/worker/Cargo.toml
[dependencies]
restate-sdk = "0.9"
tokio = { version = "1", features = ["full"] }
tracing = "0.1"
tracing-subscriber = "0.3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
reqwest = { version = "0.13", features = ["json"] }
redis = { version = "1.0", features = ["tokio-comp"] }
anyhow = "1"
shared = { path = "../shared" }

reqwest is for registering the worker with the Restate server on startup and for side effects that call external APIs. The redis crate is for publishing progress events to Valkey; the SSE infrastructure in the web application picks them up.

Shared types

Define domain types in the shared crate so both the web application and the worker use identical structs:

// crates/shared/src/lib.rs
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Order {
    pub id: String,
    pub customer_email: String,
    pub items: Vec<OrderItem>,
    pub total_cents: u64,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OrderItem {
    pub product_id: String,
    pub quantity: u32,
    pub price_cents: u64,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FulfilmentResult {
    pub order_id: String,
    pub payment_id: String,
    pub status: String,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Progress {
    pub percent: u32,
    pub message: String,
}

Defining the workflow

An order fulfilment workflow that processes payment, reserves inventory, sends a confirmation email, and reports progress at each stage. The workflow uses ctx.run() for each side effect, ctx.set() to track progress in durable state, and publishes to Valkey for real-time SSE updates.

// crates/worker/src/fulfilment.rs
use restate_sdk::prelude::*;
use shared::{FulfilmentResult, Order, Progress};

#[restate_sdk::workflow]
pub trait OrderFulfilment {
    async fn run(order: Json<Order>) -> Result<Json<FulfilmentResult>, HandlerError>;
    #[shared]
    async fn get_progress() -> Result<Json<Progress>, HandlerError>;
}

pub struct OrderFulfilmentImpl {
    pub valkey: redis::Client,
}

impl OrderFulfilment for OrderFulfilmentImpl {
    async fn run(
        &self,
        ctx: WorkflowContext<'_>,
        Json(order): Json<Order>,
    ) -> Result<Json<FulfilmentResult>, HandlerError> {
        let order_id = ctx.key().to_string();

        // Step 1: Process payment
        report_progress(&ctx, &self.valkey, &order_id, 0, "Processing payment...");
        let payment_id: String = ctx
            .run(|| charge_payment(order.clone()))
            .name("charge_payment")
            .await?;

        // Step 2: Reserve inventory
        report_progress(&ctx, &self.valkey, &order_id, 33, "Reserving inventory...");
        ctx.run(|| reserve_inventory(order.clone()))
            .name("reserve_inventory")
            .await?;

        // Step 3: Send confirmation email
        report_progress(&ctx, &self.valkey, &order_id, 66, "Sending confirmation...");
        ctx.run(|| send_confirmation(order.clone(), payment_id.clone()))
            .name("send_confirmation")
            .await?;

        // Step 4: Complete
        report_progress(&ctx, &self.valkey, &order_id, 100, "Order fulfilled.");

        Ok(Json(FulfilmentResult {
            order_id,
            payment_id,
            status: "fulfilled".to_string(),
        }))
    }

    async fn get_progress(
        &self,
        ctx: SharedWorkflowContext<'_>,
    ) -> Result<Json<Progress>, HandlerError> {
        let progress = ctx
            .get::<Progress>("progress")
            .await?
            .unwrap_or(Progress {
                percent: 0,
                message: "Waiting to start...".to_string(),
            });
        Ok(Json(progress))
    }
}

Each ctx.run() call wraps a side effect function that takes ownership of the data it needs. This is the standard pattern: clone the data before the closure so the closure owns its inputs. The side effect functions are regular async functions that call external services:

async fn charge_payment(order: Order) -> Result<String, anyhow::Error> {
    // Call payment provider API (see HTTP Client section for typed client patterns)
    let client = reqwest::Client::new();
    let resp: serde_json::Value = client
        .post("https://payments.example.com/v1/charges")
        .json(&serde_json::json!({
            "amount": order.total_cents,
            "currency": "gbp",
        }))
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;
    Ok(resp["id"].as_str().unwrap_or_default().to_string())
}

async fn reserve_inventory(order: Order) -> Result<(), anyhow::Error> {
    // Call inventory service
    Ok(())
}

async fn send_confirmation(order: Order, payment_id: String) -> Result<(), anyhow::Error> {
    // Send email via Lettre (see Email section)
    Ok(())
}

Progress reporting

The report_progress function does two things: it updates durable state (queryable via the get_progress handler) and publishes an event to Valkey for real-time SSE delivery. The durable state is the authoritative source; the Valkey publish is best-effort for pushing updates to the browser.

fn report_progress(
    ctx: &WorkflowContext<'_>,
    valkey: &redis::Client,
    order_id: &str,
    percent: u32,
    message: &str,
) {
    // Durable state, queryable via get_progress
    ctx.set(
        "progress",
        Progress {
            percent,
            message: message.to_string(),
        },
    );

    // Real-time push to SSE clients via Valkey (best-effort, limited retries)
    let valkey = valkey.clone();
    let channel = format!("order:{order_id}");
    let event_type = if percent >= 100 { "complete" } else { "progress" };
    let html = format!(
        r#"<div class="progress-bar" style="width: {percent}%">{percent}%</div>
        <p>{message}</p>"#
    );
    let payload = serde_json::json!({
        "event_type": event_type,
        "data": html,
    })
    .to_string();

    // Fire-and-forget: ignore Valkey publish failures
    let _ = ctx
        .run(|| async move {
            let mut conn = valkey.get_multiplexed_async_connection().await?;
            redis::cmd("PUBLISH")
                .arg(&channel)
                .arg(&payload)
                .query_async::<()>(&mut conn)
                .await?;
            Ok::<(), redis::RedisError>(())
        })
        .retry_policy(RunRetryPolicy::default().max_attempts(2))
        .await;
}

The retry policy limits Valkey publish retries to 2 attempts. Progress events are informational; if Valkey is temporarily unreachable, the workflow should continue. The let _ = discards the result so a failed publish never fails the workflow.

The Valkey event payload matches the format established in the Server-Sent Events section: a JSON object with event_type and data fields. The SSE handler parses this format and delivers it to the browser. When percent >= 100, the event type switches to "complete", which triggers sse-close="complete" in the browser and closes the SSE connection.

Running the worker

The worker binary starts the Restate SDK HTTP server and registers itself with the Restate server on startup.

// crates/worker/src/main.rs
use restate_sdk::prelude::*;
use std::time::Duration;

mod fulfilment;
use fulfilment::OrderFulfilmentImpl;

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt::init();

    let valkey_url =
        std::env::var("VALKEY_URL").unwrap_or_else(|_| "redis://127.0.0.1:6379".to_string());
    let valkey = redis::Client::open(valkey_url).expect("invalid VALKEY_URL");

    let worker_addr = std::env::var("WORKER_ADDR")
        .unwrap_or_else(|_| "0.0.0.0:9080".to_string());
    let restate_admin_url = std::env::var("RESTATE_ADMIN_URL")
        .unwrap_or_else(|_| "http://127.0.0.1:9070".to_string());
    let worker_url = std::env::var("WORKER_URL")
        .unwrap_or_else(|_| "http://127.0.0.1:9080".to_string());

    let endpoint = Endpoint::builder()
        .bind(OrderFulfilmentImpl { valkey }.serve())
        .build();

    // Start the SDK HTTP server in a background task
    let addr = worker_addr.parse().expect("invalid WORKER_ADDR");
    tokio::spawn(async move {
        HttpServer::new(endpoint).listen_and_serve(addr).await;
    });

    // Wait for the server to bind
    tokio::time::sleep(Duration::from_millis(500)).await;

    // Register with the Restate server
    register_deployment(&restate_admin_url, &worker_url).await;

    tracing::info!("Worker running on {worker_addr}");

    // Block until shutdown signal
    tokio::signal::ctrl_c().await.unwrap();
}

async fn register_deployment(admin_url: &str, worker_url: &str) {
    let client = reqwest::Client::new();
    match client
        .post(format!("{admin_url}/deployments"))
        .json(&serde_json::json!({
            "uri": worker_url,
            "force": true,
        }))
        .send()
        .await
    {
        Ok(resp) if resp.status().is_success() => {
            tracing::info!("Registered deployment at {worker_url}");
        }
        Ok(resp) => {
            let status = resp.status();
            let body = resp.text().await.unwrap_or_default();
            tracing::warn!("Restate registration returned {status}: {body}");
        }
        Err(e) => {
            tracing::warn!("Failed to register with Restate: {e}");
        }
    }
}

The "force": true field in the registration body tells Restate to update the deployment if it already exists. This handles the common development case where you restart the worker after code changes.

Registration calls the Restate admin API on port 9070. Restate performs service discovery automatically: it queries your worker’s endpoint, finds all bound services and their handlers, and registers them. After registration, the services are callable through the Restate ingress on port 8080.

Triggering workflows from Axum handlers

The Axum web application triggers Restate workflows by sending HTTP requests to the Restate ingress. This is a plain reqwest call; no Restate SDK is needed in the web application.

The Restate team is developing a standalone typed client that will be generated from the same trait declarations that define the service. This will replace the raw HTTP calls shown below with compile-time checked method calls. Until then, the ingress HTTP API is the integration point.

URL patterns

Service typeURL pattern
ServicePOST /ServiceName/handlerName
Virtual ObjectPOST /ObjectName/{key}/handlerName
Workflow (run)POST /WorkflowName/{workflowId}/run
Workflow (query/signal)POST /WorkflowName/{workflowId}/handlerName

Append /send to any URL for fire-and-forget invocation. Restate accepts the request immediately and returns an invocation ID. The handler runs in the background.

Starting a workflow

An Axum handler that creates an order, triggers the fulfilment workflow, and returns a confirmation page with live progress:

use axum::{extract::State, response::Html, Form};

async fn create_order(
    State(state): State<AppState>,
    Form(input): Form<OrderInput>,
) -> Result<Html<String>, AppError> {
    // Insert order into the database
    let order = insert_order(&state.db, &input).await?;

    // Trigger the fulfilment workflow (fire-and-forget)
    let resp = state
        .http
        .post(format!(
            "{}/OrderFulfilment/{}/run/send",
            state.restate_ingress_url, order.id,
        ))
        .json(&order)
        .send()
        .await
        .map_err(|e| {
            tracing::error!(error = ?e, "failed to trigger fulfilment workflow");
            AppError::BadGateway("could not start order processing".into())
        })?;

    if !resp.status().is_success() {
        tracing::error!(status = %resp.status(), "Restate rejected workflow");
        return Err(AppError::BadGateway("could not start order processing".into()));
    }

    Ok(Html(render_order_progress(&order)))
}

The /send suffix is critical. Without it, the HTTP call blocks until the entire workflow completes, which could take seconds or minutes. With /send, Restate returns immediately and the workflow runs in the background.

The confirmation page with SSE progress

The rendered page connects to the SSE endpoint established in the Server-Sent Events section. The SSE handler subscribes to the Valkey channel order:{id}, which the workflow publishes progress events to.

use maud::{html, Markup};

fn render_order_progress(order: &Order) -> String {
    html! {
        h2 { "Order " (order.id) " placed" }
        div hx-ext="sse"
            sse-connect=(format!("/events/orders/{}/progress", order.id))
            sse-close="complete" {
            div sse-swap="progress" hx-swap="innerHTML" {
                div .progress-bar style="width: 0%" { "0%" }
                p { "Starting fulfilment..." }
            }
        }
    }
    .into_string()
}

The sse-close="complete" attribute closes the SSE connection when the workflow publishes its final event with event_type: "complete". The full SSE wiring (the /events/orders/{id}/progress endpoint, Valkey subscriber, event delivery) is covered in the Server-Sent Events section.

Synchronous invocation

For cases where you need the workflow result before responding (rare, but sometimes necessary for short-running workflows):

let result: FulfilmentResult = state
    .http
    .post(format!(
        "{}/OrderFulfilment/{}/run",
        state.restate_ingress_url, order.id,
    ))
    .json(&order)
    .send()
    .await?
    .error_for_status()?
    .json()
    .await?;

Without /send, the call blocks until run completes and returns the workflow’s result directly.

Idempotency

Workflows are inherently idempotent: the run handler executes exactly once per workflow ID. Calling /OrderFulfilment/order-123/run twice with the same ID attaches the second call to the existing execution rather than starting a new one. Use the order ID (or another natural identifier) as the workflow ID to get this deduplication for free.

For services (which are not keyed), add an Idempotency-Key header to prevent duplicate processing:

state
    .http
    .post(format!("{}/EmailSender/send_confirmation", state.restate_ingress_url))
    .header("idempotency-key", &order.id)
    .json(&email_details)
    .send()
    .await?;

Restate caches the response for 24 hours, returning the cached result for duplicate calls.

Application state

Add the Restate ingress URL to your Axum application state:

#[derive(Clone)]
pub struct AppState {
    pub db: sqlx::PgPool,
    pub http: reqwest::Client,
    pub valkey: redis::Client,
    pub restate_ingress_url: String,
}

Read the URL from an environment variable at startup:

let restate_ingress_url = std::env::var("RESTATE_INGRESS_URL")
    .unwrap_or_else(|_| "http://127.0.0.1:8080".to_string());

Running Restate in development

Restate and Valkey run as Docker containers. The web application and worker run on the host, consistent with the project’s approach of Docker for backing services only.

Add Restate to your Docker Compose file alongside PostgreSQL and Valkey:

# docker-compose.yml
services:
  postgres:
    image: postgres:17
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: app
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app

  valkey:
    image: valkey/valkey:8
    ports:
      - "6379:6379"

  restate:
    image: docker.restate.dev/restatedev/restate:1.6
    ports:
      - "8080:8080"   # ingress (client requests)
      - "9070:9070"   # admin API + UI
      - "9071:9071"   # internal (cluster communication)

Development workflow

  1. Start the backing services:
docker compose up -d
  1. Run the worker (registers itself with Restate on startup):
cargo run -p worker
  1. Run the web application:
cargo run -p web
  1. Open the Restate UI at http://localhost:9070 to inspect registered services, active invocations, and their journal entries.

When the worker registers with Restate, the Restate server (running in Docker) needs to reach the worker (running on the host). On macOS and Windows, Docker Desktop maps host.docker.internal to the host. Set WORKER_URL=http://host.docker.internal:9080 when the worker registers. On Linux, use --network=host on the Restate container or set up bridge networking.

Re-registration on code changes

The worker registers with "force": true, so restarting the worker after code changes updates the deployment automatically. If you add or remove handlers, the updated service definitions are picked up on the next registration.

Deploying Restate in production

Restate is a single binary with no external dependencies. It stores its own state in an embedded data directory. Deploy it alongside your application, not as a managed service.

Single-server deployment

On a VPS running Docker Compose, add Restate as another service:

services:
  restate:
    image: docker.restate.dev/restatedev/restate:1.6
    restart: unless-stopped
    volumes:
      - restate_data:/target/restate-data
    ports:
      - "8080:8080"
    # Admin API should not be publicly exposed
    # Access via Tailscale or SSH tunnel

volumes:
  restate_data:

Persist the Restate data directory (/target/restate-data inside the container) to a volume. This preserves the journal and state across container restarts.

The admin API (port 9070) should not be exposed to the public internet. Access it through your private network (Tailscale, VPN, or SSH tunnel) for service registration and the UI.

The worker in production

Build the worker as a separate Docker image from the same workspace:

FROM rust:1.85 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release -p worker

FROM debian:bookworm-slim
COPY --from=builder /app/target/release/worker /usr/local/bin/
CMD ["worker"]

The worker registers with Restate on startup, so deploying a new version is: build the image, restart the container, and the new handlers are registered automatically.

Scaling

Restate handles hundreds of concurrent workflow invocations on a single server. Each invocation consumes minimal resources when suspended (waiting on a timer, sleeping, or paused between steps).

If you outgrow a single Restate server, Restate supports clustered deployment with partitioned state and replicated logs. This is a significant operational step. For most applications covered by this guide (content sites, CRUD apps, internal tools), a single Restate instance is sufficient.

Gotchas

Everything non-deterministic must be in ctx.run(). HTTP calls, database queries, reading the current time, generating random numbers. If it produces different results on different executions, wrap it. Forgetting this causes journal replay to diverge, which Restate detects and flags as an error.

Side effect functions must own their data. The closure passed to ctx.run() must be Send + 'static. Clone values before passing them into the closure rather than borrowing from the handler’s scope.

The worker is not your web server. The Restate SDK’s HttpServer serves the Restate protocol, not HTTP for browsers. Keep the Axum web application and the Restate worker as separate binaries. They share types through the workspace, not a runtime.

Registration must happen after code changes. When you add, remove, or rename handlers, the worker must re-register with Restate so it knows the updated service definitions. The auto-registration pattern shown above handles this. If you skip registration after a change, Restate routes requests to the old handler definitions and invocations fail.

Workflow IDs are unique. Calling run on a workflow ID that has already completed or is currently running attaches to the existing execution. If you need to re-run a workflow for the same entity, use a new ID (e.g., order-123-retry-1 or include a timestamp).

Valkey progress events are fire-and-forget. If the browser is not connected when a progress event is published, that event is lost. The get_progress shared handler provides a durable fallback: the browser can poll it on reconnection or use it as the initial state before the SSE connection is established.

Test with restate-sdk-testcontainers. The restate-sdk-testcontainers crate spins up a Restate server in Docker for integration tests, similar to how testcontainers works for PostgreSQL. Use it to test workflows end-to-end without a persistent Restate instance.

Pin the Restate server version. Use a specific image tag (e.g., restate:1.6) rather than latest. The Restate SDK and server have version compatibility ranges. The Rust SDK v0.9 supports Restate Server 1.3 through 1.6.

AI and LLM Integration

LLM features in a web application (content generation, summarisation, classification, conversational interfaces) are fundamentally HTTP calls to an inference API. The challenge is not the call itself but everything around it: provider abstraction, structured tool use, streaming partial responses to the browser, and surviving failures in calls that are expensive, slow, and rate-limited.

Rig is a Rust library for building LLM-powered applications. It provides a unified interface across providers (Anthropic, OpenAI, Ollama, Gemini, and others), typed tool definitions, streaming support, and an agent abstraction that handles multi-step tool-calling loops. This section covers integrating Rig with Axum handlers, defining tools, streaming responses to the browser via SSE, and making AI workflows durable with Restate.

Dependencies

[dependencies]
rig-core = "0.31"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
futures-util = "0.3"

Rig includes all providers by default. No feature flags are needed to enable Anthropic, OpenAI, or Ollama support.

Provider setup

Anthropic

Anthropic’s Claude models are the primary provider for the examples in this section. Create a client from the ANTHROPIC_API_KEY environment variable:

use rig::providers::anthropic;

let client = anthropic::Client::from_env();
let agent = client.agent("claude-sonnet-4-20250514")
    .preamble("You are a helpful assistant.")
    .build();

Client::from_env() reads ANTHROPIC_API_KEY from the environment. The model string matches Anthropic’s model ID format. Add the API key to your .env file for local development:

# .env
ANTHROPIC_API_KEY=sk-ant-...

OpenAI

OpenAI is a drop-in alternative. The agent code is identical apart from the client and model name:

use rig::providers::openai;

let client = openai::Client::from_env(); // reads OPENAI_API_KEY
let agent = client.agent("gpt-4o")
    .preamble("You are a helpful assistant.")
    .build();

This is the core value of Rig’s provider abstraction: your application code uses the Prompt, Chat, and StreamingPrompt traits. Swapping providers means changing two lines, not rewriting your handlers.

Ollama for local inference

Ollama runs open-weight models locally. It fits the self-hosted ethos of this stack and is useful for development without burning API credits, for privacy-sensitive workloads, and for running smaller models where latency to a cloud API is unnecessary overhead.

Add Ollama to your Docker Compose alongside other backing services:

# docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

volumes:
  ollama_data:

Pull a model after the container starts:

docker exec -it ollama ollama pull llama3.2

Create a Rig client pointing at the local instance:

use rig::providers::ollama;

let client = ollama::Client::from_env(); // reads OLLAMA_API_BASE_URL, default http://localhost:11434
let agent = client.agent("llama3.2")
    .preamble("You are a helpful assistant.")
    .build();

OLLAMA_API_BASE_URL defaults to http://localhost:11434. No API key is required.

Basic completions in Axum handlers

The simplest integration: an Axum handler that sends a prompt to the LLM and returns the response as HTML.

use axum::{extract::State, response::Html, Form};
use rig::completion::Prompt;
use serde::Deserialize;

#[derive(Deserialize)]
struct SummariseInput {
    text: String,
}

async fn summarise(
    State(state): State<AppState>,
    Form(input): Form<SummariseInput>,
) -> Result<Html<String>, AppError> {
    let prompt = format!(
        "Summarise the following text in 2-3 sentences:\n\n{}",
        input.text
    );

    let summary = state.agent.prompt(&prompt).await.map_err(|e| {
        tracing::error!(error = ?e, "LLM completion failed");
        AppError::BadGateway("AI service unavailable".into())
    })?;

    Ok(Html(format!("<div class=\"summary\">{summary}</div>")))
}

The Prompt trait’s .prompt() method sends a one-shot request and returns the full response as a String. The agent is stored in application state, shared across requests:

use rig::providers::anthropic;

#[derive(Clone)]
pub struct AppState {
    pub db: sqlx::PgPool,
    pub http: reqwest::Client,
    pub agent: rig::agent::Agent<rig::providers::anthropic::completion::CompletionModel>,
}

Building the agent at startup:

let anthropic = anthropic::Client::from_env();

let agent = anthropic
    .agent("claude-sonnet-4-20250514")
    .preamble("You are an assistant that summarises text concisely.")
    .temperature(0.3)
    .build();

let state = AppState {
    db: pool,
    http: reqwest::Client::new(),
    agent,
};

Lower temperature values (0.0 to 0.3) produce more deterministic output, which is appropriate for summarisation, classification, and extraction. Higher values (0.7 to 1.0) produce more creative output for generation tasks.

Chat with history

For multi-turn conversations, the Chat trait accepts a message and a history vector:

use rig::completion::{Chat, Message};

async fn chat(
    State(state): State<AppState>,
    Form(input): Form<ChatInput>,
) -> Result<Html<String>, AppError> {
    // Load chat history from session or database
    let history: Vec<Message> = load_chat_history(&state.db, input.session_id).await?;

    let response = state.agent.chat(&input.message, history).await.map_err(|e| {
        tracing::error!(error = ?e, "chat completion failed");
        AppError::BadGateway("AI service unavailable".into())
    })?;

    // Persist the new exchange
    save_chat_messages(&state.db, input.session_id, &input.message, &response).await?;

    Ok(Html(format!("<div class=\"message assistant\">{response}</div>")))
}

Store chat history in PostgreSQL rather than in-memory. Sessions expire, servers restart, and users expect conversations to persist.

Tool use

LLMs generate text. Tools let them take actions: query a database, call an API, perform calculations, look up current information. The model decides which tool to call and with what arguments, your code executes the tool, and the result feeds back into the model’s next response.

Rig defines tools through the Tool trait. Each tool is a Rust struct with typed arguments, typed output, and a JSON schema that tells the model what the tool does and what parameters it accepts.

Defining a tool

A tool that searches for products in a database:

use rig::tool::{Tool, ToolDyn};
use rig::completion::ToolDefinition;
use serde::{Deserialize, Serialize};
use serde_json::json;

#[derive(Debug, Deserialize)]
struct ProductSearchArgs {
    query: String,
    max_results: Option<u32>,
}

#[derive(Debug, thiserror::Error)]
#[error("product search failed: {0}")]
struct ProductSearchError(String);

#[derive(Serialize, Deserialize)]
struct ProductSearch {
    db: sqlx::PgPool,
}

impl Tool for ProductSearch {
    const NAME: &'static str = "search_products";

    type Error = ProductSearchError;
    type Args = ProductSearchArgs;
    type Output = String;

    async fn definition(&self, _prompt: String) -> ToolDefinition {
        ToolDefinition {
            name: "search_products".to_string(),
            description: "Search the product catalogue by name or description. Returns matching products with prices.".to_string(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query for product name or description"
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Maximum number of results to return (default 5)"
                    }
                },
                "required": ["query"]
            }),
        }
    }

    async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
        let max = args.max_results.unwrap_or(5) as i64;

        let products = sqlx::query_as!(
            Product,
            r#"
            SELECT id, name, description, price_cents
            FROM products
            WHERE to_tsvector('english', name || ' ' || description) @@ plainto_tsquery('english', $1)
            ORDER BY ts_rank(to_tsvector('english', name || ' ' || description), plainto_tsquery('english', $1)) DESC
            LIMIT $2
            "#,
            args.query,
            max,
        )
        .fetch_all(&self.db)
        .await
        .map_err(|e| ProductSearchError(e.to_string()))?;

        // Return results as a formatted string the model can reason about
        let formatted = products
            .iter()
            .map(|p| format!("- {} ({}): {}", p.name, format_price(p.price_cents), p.description))
            .collect::<Vec<_>>()
            .join("\n");

        if formatted.is_empty() {
            Ok("No products found matching the search query.".to_string())
        } else {
            Ok(formatted)
        }
    }
}

Key points about the Tool trait:

  • NAME: a static string identifier the model uses to invoke the tool.
  • Args: a deserializable struct. Rig parses the model’s JSON arguments into this type automatically.
  • Output: a serialisable type returned to the model. Strings work well because the model consumes the result as text.
  • definition(): returns a JSON Schema that describes the tool’s purpose and parameters. The model uses this to decide when and how to call the tool.
  • call(): the actual implementation. This is regular Rust code, so it can query databases, call APIs, read files, or do anything else.

Wiring tools into an agent

Build an agent with tools attached:

use rig::tool::ToolDyn;

let product_search = ProductSearch { db: pool.clone() };

let tools: Vec<Box<dyn ToolDyn>> = vec![Box::new(product_search)];

let agent = anthropic
    .agent("claude-sonnet-4-20250514")
    .preamble(
        "You are a shopping assistant. Use the search_products tool to find products \
         that match what the customer is looking for. Provide helpful recommendations \
         based on the search results."
    )
    .tools(tools)
    .max_tokens(1024)
    .build();

When the user prompts this agent, the model can decide to call search_products with appropriate arguments. Rig handles the loop automatically: it sends the prompt, receives a tool call, executes the tool, sends the result back to the model, and returns the final text response. A single .prompt() call can involve multiple round trips between your code and the model.

// The agent calls search_products internally, then responds with recommendations
let response = agent.prompt("I need a waterproof jacket for hiking").await?;

Multiple tools

Agents can use multiple tools. Define each tool separately and pass them all to the builder:

let tools: Vec<Box<dyn ToolDyn>> = vec![
    Box::new(ProductSearch { db: pool.clone() }),
    Box::new(OrderLookup { db: pool.clone() }),
    Box::new(InventoryCheck { http: http_client.clone() }),
];

let agent = anthropic
    .agent("claude-sonnet-4-20250514")
    .preamble("You are a customer service agent. You can search products, look up orders, and check inventory.")
    .tools(tools)
    .build();

The model chooses which tools to call based on the user’s query and the tool descriptions. Good tool descriptions are critical: the model relies on the description field in ToolDefinition to understand when each tool is appropriate.

Streaming LLM responses via SSE

LLM responses arrive token by token. Streaming them to the browser as they generate gives the user immediate feedback instead of a blank screen followed by a wall of text. Rig’s StreamingPrompt trait produces a stream of chunks that you can convert into Axum SSE events.

use axum::response::sse::{Event, KeepAlive, Sse};
use futures_util::{Stream, StreamExt};
use rig::streaming::StreamingPrompt;
use std::convert::Infallible;

async fn stream_response(
    State(state): State<AppState>,
    Form(input): Form<PromptInput>,
) -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    let prompt = input.prompt.clone();
    let agent = state.agent.clone();

    let stream = async_stream::stream! {
        match agent.stream_prompt(&prompt).await {
            Ok(mut completion_stream) => {
                let mut stream = completion_stream.stream().await.unwrap();
                while let Some(chunk) = stream.next().await {
                    match chunk {
                        Ok(rig::streaming::StreamedAssistantContent::Text(text)) => {
                            let html = format!("<span>{}</span>", text.text);
                            yield Ok(Event::default().event("chunk").data(html));
                        }
                        Ok(_) => {} // tool calls, usage data
                        Err(e) => {
                            tracing::error!(error = ?e, "stream error");
                            yield Ok(
                                Event::default()
                                    .event("error")
                                    .data("<span class=\"error\">Generation failed</span>"),
                            );
                            break;
                        }
                    }
                }
                // Signal completion
                yield Ok(Event::default().event("done").data("<span class=\"done\"></span>"));
            }
            Err(e) => {
                tracing::error!(error = ?e, "failed to start stream");
                yield Ok(
                    Event::default()
                        .event("error")
                        .data("<span class=\"error\">AI service unavailable</span>"),
                );
            }
        }
    };

    Sse::new(stream).keep_alive(KeepAlive::default())
}

The handler returns Sse<impl Stream>, which Axum sends as Content-Type: text/event-stream. Each text chunk from the model becomes an SSE event with an HTML fragment as its data.

On the browser side, htmx’s SSE extension consumes the events and swaps them into the page. The full SSE-to-htmx wiring (event subscription, sse-swap, connection lifecycle) is covered in the Server-Sent Events section. The relevant htmx markup:

<div hx-ext="sse"
     sse-connect="/ai/stream"
     sse-close="done">
    <div id="response" sse-swap="chunk" hx-swap="beforeend">
    </div>
</div>

sse-swap="chunk" appends each chunk event’s data to the target div. sse-close="done" closes the SSE connection when the stream completes.

Escaping HTML in streamed output

LLM output may contain characters that break HTML (<, >, &). If you render the output as raw HTML, you must escape it. Maud’s PreEscaped type handles this, but since streaming bypasses Maud’s template rendering, escape manually:

fn escape_html(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

// In the stream loop:
let html = format!("<span>{}</span>", escape_html(&text.text));

If you want the model to produce HTML (e.g., for formatted responses), sanitise the output instead of escaping it. Use a library like ammonia to strip dangerous tags while preserving safe formatting.

Durable AI workflows with Restate

LLM calls are expensive, slow (seconds, not milliseconds), and rate-limited. A crashed process that loses a partially complete AI workflow wastes money and time. Wrapping AI calls in Restate gives you automatic retries, exactly-once execution, and crash recovery for every step.

The pattern: each LLM call goes inside a ctx.run() closure. If the process crashes after the call completes but before the next step starts, Restate replays from the journal and skips the completed call, returning the stored result without re-invoking the model.

A content generation workflow

A workflow that generates a product description, translates it, and stores the results. Each step is independently durable.

// crates/worker/src/content_gen.rs
use restate_sdk::prelude::*;
use rig::completion::Prompt;
use rig::providers::anthropic;
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ContentRequest {
    pub product_id: String,
    pub product_name: String,
    pub product_details: String,
    pub target_languages: Vec<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ContentResult {
    pub product_id: String,
    pub description: String,
    pub translations: Vec<Translation>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Translation {
    pub language: String,
    pub text: String,
}

#[restate_sdk::workflow]
pub trait ContentGeneration {
    async fn run(request: Json<ContentRequest>) -> Result<Json<ContentResult>, HandlerError>;
    #[shared]
    async fn get_status() -> Result<String, HandlerError>;
}

pub struct ContentGenerationImpl;

impl ContentGeneration for ContentGenerationImpl {
    async fn run(
        &self,
        ctx: WorkflowContext<'_>,
        Json(request): Json<ContentRequest>,
    ) -> Result<Json<ContentResult>, HandlerError> {
        ctx.set("status", "Generating description...".to_string());

        // Step 1: Generate the product description
        let description: String = ctx
            .run(|| generate_description(request.clone()))
            .name("generate_description")
            .await?;

        // Step 2: Translate into each target language
        let mut translations = Vec::new();
        for language in &request.target_languages {
            ctx.set("status", format!("Translating to {language}..."));

            let translation: String = ctx
                .run(|| translate_text(description.clone(), language.clone()))
                .name(&format!("translate_{language}"))
                .await?;

            translations.push(Translation {
                language: language.clone(),
                text: translation,
            });
        }

        ctx.set("status", "Complete".to_string());

        Ok(Json(ContentResult {
            product_id: request.product_id,
            description,
            translations,
        }))
    }

    async fn get_status(
        &self,
        ctx: SharedWorkflowContext<'_>,
    ) -> Result<String, HandlerError> {
        Ok(ctx
            .get::<String>("status")
            .await?
            .unwrap_or_else(|| "Waiting to start...".to_string()))
    }
}

async fn generate_description(request: ContentRequest) -> Result<String, anyhow::Error> {
    let client = anthropic::Client::from_env();
    let agent = client
        .agent("claude-sonnet-4-20250514")
        .preamble(
            "You are a copywriter. Write a compelling product description \
             in 2-3 paragraphs. Be specific and highlight key features."
        )
        .temperature(0.7)
        .build();

    let prompt = format!(
        "Write a product description for: {}\n\nDetails: {}",
        request.product_name, request.product_details
    );

    Ok(agent.prompt(&prompt).await?)
}

async fn translate_text(text: String, language: String) -> Result<String, anyhow::Error> {
    let client = anthropic::Client::from_env();
    let agent = client
        .agent("claude-sonnet-4-20250514")
        .preamble(&format!(
            "You are a translator. Translate the following text into {language}. \
             Preserve the tone and style of the original."
        ))
        .temperature(0.3)
        .build();

    Ok(agent.prompt(&text).await?)
}

Each ctx.run() call wraps one LLM invocation. The side effect functions create their own Rig clients because Restate closures must be Send + 'static, which means they cannot borrow the handler’s context. Creating an Anthropic client is cheap (it is just an HTTP client with credentials), so this overhead is negligible compared to the LLM call itself.

If the worker crashes after generating the description but before the translations, Restate restarts the workflow and replays from the journal. The description step returns its stored result without calling the model again, and execution resumes with the first translation.

Triggering the workflow from Axum

Fire-and-forget from an Axum handler, with the workflow running in the background:

async fn generate_content(
    State(state): State<AppState>,
    Form(input): Form<ContentInput>,
) -> Result<Html<String>, AppError> {
    let request = ContentRequest {
        product_id: input.product_id.clone(),
        product_name: input.product_name,
        product_details: input.product_details,
        target_languages: vec!["fr".into(), "de".into(), "es".into()],
    };

    state
        .http
        .post(format!(
            "{}/ContentGeneration/{}/run/send",
            state.restate_ingress_url, input.product_id,
        ))
        .json(&request)
        .send()
        .await
        .map_err(|e| {
            tracing::error!(error = ?e, "failed to trigger content generation");
            AppError::BadGateway("could not start content generation".into())
        })?;

    Ok(Html(render_generation_progress(&input.product_id)))
}

The /send suffix makes the call fire-and-forget. The Restate workflow runs durably in the background. The rendered page can use SSE to display progress updates, following the same pattern shown in the Background Jobs section.

When to use Restate for AI calls

Wrap LLM calls in Restate when:

  • The call is part of a multi-step workflow where earlier steps are expensive to repeat
  • The result will be stored (database write, file creation) and losing it means re-running the model
  • You are chaining multiple model calls where later calls depend on earlier results
  • The operation is user-initiated and the user expects it to complete even if the server restarts

Skip Restate for:

  • Single low-latency completions served directly in the HTTP response (the basic handler pattern above)
  • Streaming responses where the user sees output in real time and can retry if it fails
  • Development and experimentation where durability adds friction

Prompt management

Hardcoded prompt strings work for simple cases. As your application grows, prompts need structure.

Preamble as configuration

Store system prompts in configuration rather than code. This lets you adjust model behaviour without redeploying:

#[derive(Clone)]
pub struct AiConfig {
    pub model: String,
    pub summarise_preamble: String,
    pub chat_preamble: String,
    pub temperature: f64,
}

impl AiConfig {
    pub fn from_env() -> Self {
        Self {
            model: std::env::var("AI_MODEL")
                .unwrap_or_else(|_| "claude-sonnet-4-20250514".to_string()),
            summarise_preamble: std::env::var("AI_SUMMARISE_PREAMBLE")
                .unwrap_or_else(|_| "You summarise text concisely in 2-3 sentences.".to_string()),
            chat_preamble: std::env::var("AI_CHAT_PREAMBLE")
                .unwrap_or_else(|_| "You are a helpful assistant.".to_string()),
            temperature: std::env::var("AI_TEMPERATURE")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(0.3),
        }
    }
}

Build agents from the configuration at startup:

let ai_config = AiConfig::from_env();
let anthropic = anthropic::Client::from_env();

let summarise_agent = anthropic
    .agent(&ai_config.model)
    .preamble(&ai_config.summarise_preamble)
    .temperature(ai_config.temperature)
    .build();

Prompt templates

For prompts that combine fixed instructions with dynamic data, format strings are sufficient:

let prompt = format!(
    "Classify the following support ticket into one of these categories: \
     billing, technical, account, other.\n\n\
     Respond with only the category name.\n\n\
     Ticket: {ticket_text}"
);

For more complex templates with conditional sections, build the prompt string with standard Rust string manipulation. There is no need for a dedicated templating engine for prompts. Rig’s .context() method on the agent builder is another option for injecting dynamic context alongside the preamble:

let agent = anthropic
    .agent("claude-sonnet-4-20250514")
    .preamble("You answer questions about the user's order history.")
    .context(&format!("Customer name: {}\nAccount since: {}", name, since))
    .build();

Context documents are sent alongside the preamble in every request, giving the model additional information without modifying the system prompt.

Retrieval-augmented generation

Retrieval-augmented generation (RAG) grounds an LLM’s answers in your data. Instead of relying on the model’s training data alone, you retrieve relevant documents from your database and include them in the prompt as context. The model answers based on what you provided, reducing hallucination and keeping responses current.

The pattern has three steps: embed the user’s query into a vector, search your database for similar documents, and inject the results into the prompt alongside the question.

The retrieval step

The Semantic Search section covers pgvector setup, embedding generation with Ollama, and similarity queries with SQLx. The functions below come directly from that section:

  • generate_embeddings() converts text into vectors via Ollama’s /api/embed endpoint
  • semantic_search() finds the most similar documents by cosine distance

If your application needs better retrieval quality, swap semantic_search() for the hybrid_search() function from the same section, which combines vector similarity with full-text search using Reciprocal Rank Fusion.

Building a RAG handler

Retrieve context, format it, and pass it to the agent in a single Axum handler:

use axum::{extract::State, response::Html, Form};
use rig::completion::Prompt;
use serde::Deserialize;

#[derive(Deserialize)]
struct AskInput {
    question: String,
}

async fn ask_with_context(
    State(state): State<AppState>,
    Form(input): Form<AskInput>,
) -> Result<Html<String>, AppError> {
    // Step 1: Embed the question
    let embeddings = generate_embeddings(
        &state.http,
        &state.config.ollama_url,
        &[&input.question],
    )
    .await
    .map_err(|e| {
        tracing::error!(error = ?e, "embedding generation failed");
        AppError::BadGateway("embedding service unavailable".into())
    })?;

    let query_embedding = embeddings
        .into_iter()
        .next()
        .ok_or_else(|| AppError::Internal("no embedding returned".into()))?;

    // Step 2: Retrieve relevant documents
    let documents = semantic_search(&state.db, query_embedding, 5).await?;

    // Step 3: Format context and build the prompt
    let context = documents
        .iter()
        .map(|doc| format!("## {}\n{}", doc.title, doc.content))
        .collect::<Vec<_>>()
        .join("\n\n");

    let prompt = format!(
        "Answer the question using only the provided documents. \
         If the documents do not contain enough information, say so.\n\n\
         {context}\n\n\
         Question: {}",
        input.question
    );

    let answer = state.agent.prompt(&prompt).await.map_err(|e| {
        tracing::error!(error = ?e, "RAG completion failed");
        AppError::BadGateway("AI service unavailable".into())
    })?;

    Ok(Html(format!(
        "<div class=\"answer\">{}</div>",
        escape_html(&answer)
    )))
}

The agent used here is the same one built at startup and stored in AppState, as shown in the basic completions section. The only difference is that the prompt now includes retrieved documents as context.

Context window management

Retrieved documents consume input tokens. A pgvector query returning five documents of 500 words each adds roughly 3,000 to 4,000 tokens to the prompt. Monitor this budget:

let max_context_chars = 8_000;
let context = if context.len() > max_context_chars {
    let truncated = &context[..max_context_chars];
    truncated
        .rfind("\n\n")
        .map(|pos| &truncated[..pos])
        .unwrap_or(truncated)
        .to_string()
} else {
    context
};

For large document sets, retrieve more candidates than you need (e.g., 10 to 20) and include only those that fit within your token budget. The similarity score from semantic_search() helps here: set a minimum threshold (e.g., 0.7) and discard documents below it.

Alternative: Rig’s dynamic_context

Rig provides a built-in RAG mechanism through .dynamic_context() on the agent builder. Combined with the rig-postgres companion crate, which implements VectorStoreIndex for pgvector, you can wire retrieval directly into the agent:

// Using rig-postgres (requires its own table schema)
let vector_store = PostgresVectorStore::default(embedding_model, pool);
let index = vector_store.index(embedding_model);

let agent = anthropic
    .agent("claude-sonnet-4-20250514")
    .preamble("Answer questions using the provided context.")
    .dynamic_context(5, index)
    .build();

With .dynamic_context(5, index), Rig automatically retrieves the top 5 similar documents before every prompt and injects them as context. This is convenient but less flexible: you cannot use hybrid search, you cannot filter results by similarity threshold, and rig-postgres requires its own table schema (id uuid, document jsonb, embedded_text text, embedding vector(N)) that differs from the typed columns established in the Semantic Search section. The manual approach gives you full control over retrieval and context formatting.

Agentic retrieval

Standard RAG retrieves context on every query regardless of whether the query needs it. “What is 2 + 2?” triggers a vector search that returns irrelevant results and wastes tokens. Agentic retrieval inverts this: the LLM decides when to search, what to search for, and whether to search again with a refined query.

This is a direct application of the Tool trait covered in the tool use section. Define a tool that wraps semantic search, attach it to an agent, and let the model decide when retrieval is appropriate.

A search tool

use rig::tool::Tool;
use rig::completion::ToolDefinition;
use serde::{Deserialize, Serialize};
use serde_json::json;

#[derive(Debug, Deserialize)]
struct SearchArgs {
    query: String,
    max_results: Option<u32>,
}

#[derive(Debug, thiserror::Error)]
#[error("knowledge base search failed: {0}")]
struct SearchError(String);

#[derive(Serialize, Deserialize)]
struct KnowledgeBaseSearch {
    db: sqlx::PgPool,
    http: reqwest::Client,
    ollama_url: String,
}

impl Tool for KnowledgeBaseSearch {
    const NAME: &'static str = "search_knowledge_base";

    type Error = SearchError;
    type Args = SearchArgs;
    type Output = String;

    async fn definition(&self, _prompt: String) -> ToolDefinition {
        ToolDefinition {
            name: "search_knowledge_base".to_string(),
            description: "Search the knowledge base for documents relevant to a query. \
                Use this when you need factual information to answer a question. \
                You can call this tool multiple times with different or refined \
                queries if the initial results are insufficient."
                .to_string(),
            parameters: json!({
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Natural language search query describing what information you need"
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Maximum number of documents to return (default 5)"
                    }
                },
                "required": ["query"]
            }),
        }
    }

    async fn call(&self, args: Self::Args) -> Result<Self::Output, Self::Error> {
        let limit = args.max_results.unwrap_or(5) as i64;

        let embeddings =
            generate_embeddings(&self.http, &self.ollama_url, &[&args.query])
                .await
                .map_err(|e| SearchError(e.to_string()))?;

        let query_embedding = embeddings
            .into_iter()
            .next()
            .ok_or_else(|| SearchError("no embedding returned".into()))?;

        let results = semantic_search(&self.db, query_embedding, limit)
            .await
            .map_err(|e| SearchError(e.to_string()))?;

        if results.is_empty() {
            return Ok("No relevant documents found.".to_string());
        }

        let formatted = results
            .iter()
            .map(|doc| {
                format!(
                    "## {} (relevance: {:.0}%)\n{}",
                    doc.title,
                    doc.similarity * 100.0,
                    doc.content
                )
            })
            .collect::<Vec<_>>()
            .join("\n\n");

        Ok(formatted)
    }
}

The tool wraps the same generate_embeddings() and semantic_search() functions from the Semantic Search section. The model receives the formatted results as text and reasons about them.

Two details in the tool definition matter for multi-turn retrieval:

  1. The description explicitly tells the model it can call the tool multiple times with different queries. Without this, models tend to search once and work with whatever comes back.
  2. Including the relevance percentage in the output helps the model judge whether the results are useful or whether a refined search is warranted.

Building the agent

use rig::tool::ToolDyn;

let search_tool = KnowledgeBaseSearch {
    db: pool.clone(),
    http: reqwest::Client::new(),
    ollama_url: config.ollama_url.clone(),
};

let tools: Vec<Box<dyn ToolDyn>> = vec![Box::new(search_tool)];

let agent = anthropic
    .agent("claude-sonnet-4-20250514")
    .preamble(
        "You are a knowledge assistant. You have access to a search tool that \
         queries the knowledge base. Use it when you need factual information to \
         answer a question. If your first search does not return relevant results, \
         try rephrasing the query or searching for related terms. When you have \
         enough information, answer the question directly. If you cannot find the \
         answer after searching, say so."
    )
    .tools(tools)
    .max_tokens(1024)
    .build();

The preamble instructs the model to search selectively and refine when needed. A single .prompt() call can trigger multiple search rounds: the model calls the tool, reads the results, decides they are too broad, calls the tool again with a more specific query, and synthesises an answer from the combined results. Rig manages this loop automatically, as described in the tool use section.

Wiring into an Axum handler

async fn ask_agent(
    State(state): State<AppState>,
    Form(input): Form<AskInput>,
) -> Result<Html<String>, AppError> {
    let answer = state.rag_agent.prompt(&input.question).await.map_err(|e| {
        tracing::error!(error = ?e, "agentic retrieval failed");
        AppError::BadGateway("AI service unavailable".into())
    })?;

    Ok(Html(format!(
        "<div class=\"answer\">{}</div>",
        escape_html(&answer)
    )))
}

The handler is simpler than the manual RAG handler because the agent manages retrieval internally. The trade-off is less control: you cannot inspect or filter the retrieved documents before they reach the model, and each query may trigger zero, one, or several search tool calls depending on the model’s judgement.

RAG vs agentic retrieval

Use standard RAG when:

  • Every query needs context from the knowledge base (e.g., a documentation Q&A system)
  • You want deterministic retrieval: same query always retrieves the same documents
  • You need to control exactly which documents the model sees

Use agentic retrieval when:

  • Queries vary widely and not all require retrieval (e.g., a general assistant that sometimes needs to look things up)
  • The model benefits from refining its search strategy based on initial results
  • You want the agent to combine multiple searches to answer complex questions

Both approaches can be made durable with Restate using the same patterns shown in the durable AI workflows section.

Gotchas

LLM calls are slow. A typical completion takes 1 to 10 seconds. Do not call them synchronously in a request that the user is waiting on unless you are streaming the response. For non-streaming use cases, trigger a Restate workflow and show progress.

Token limits are real. Each model has input and output token limits. If your prompt plus context exceeds the input limit, the API returns an error. Track prompt sizes, especially when injecting user-provided content or database results. Use .max_tokens() on the agent builder to cap output length.

Rate limits vary by provider. Anthropic, OpenAI, and other providers enforce rate limits on tokens per minute and requests per minute. Handle 429 Too Many Requests errors gracefully. Restate’s retry logic helps here: if a rate limit error is retryable, the journaled side effect retries automatically with backoff.

Model output is not safe HTML. Never insert raw LLM output into an HTML page without escaping or sanitising. Models can produce arbitrary text, including strings that look like HTML tags or script injections. Escape by default, sanitise only when you explicitly want formatted output.

Tool definitions need good descriptions. The model decides whether to call a tool based on the description field in ToolDefinition and the parameter descriptions. Vague descriptions lead to the model not calling tools when it should, or calling them with wrong arguments. Write descriptions as if explaining the tool to a colleague who will use it without seeing the implementation.

Rig creates HTTP clients internally. Each Rig provider client manages its own HTTP connection pool. This is separate from the reqwest::Client you use for other external API calls. Do not try to share a single reqwest::Client across both Rig and your own HTTP calls.

Ollama model availability. Ollama models must be pulled before use. If the model is not available locally, the API call fails. Pull models as part of your development setup, not at application startup. For production Ollama deployments, pre-bake models into the container image or volume.

Provider-specific features. Some features (vision, extended thinking, tool use with streaming) vary by provider and model. Test your specific provider/model combination. Rig’s unified interface covers the common surface, but edge cases may behave differently across providers.

Retrieved context is a prompt injection surface. In RAG, retrieved documents become part of the prompt. If your documents contain adversarial text (e.g., “Ignore previous instructions and…”), the model may follow it. This is a fundamental limitation of injecting external content into prompts. Sanitise stored content if it originates from untrusted sources, and do not treat model output from RAG queries as trusted.

Infrastructure

File Storage

File uploads and downloads are a core feature in most web applications: user avatars, document attachments, image galleries, CSV exports. The S3 API has become the standard interface for object storage, and every major provider implements it. Write your code against S3 once, then swap between a local development server and any production provider by changing environment variables.

This section covers setting up an S3-compatible storage backend, handling file uploads in Axum (both server-side and direct-to-storage), generating presigned URLs, and serving files back to users.

Dependencies

[dependencies]
rust-s3 = "0.37"
axum = { version = "0.8", features = ["multipart"] }
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
uuid = { version = "1", features = ["v4"] }

rust-s3 is a lightweight S3 client that works with any S3-compatible provider. It supports async operations via tokio out of the box, with a clean API centred on the Bucket type. The aws-sdk-s3 crate is the other option, but it pulls in a larger dependency tree and its API is more verbose. rust-s3 covers everything needed here.

The multipart feature on axum enables the Multipart extractor for handling file upload forms.

RustFS for local development

RustFS is an S3-compatible object storage server written in Rust. It serves as the local development replacement for production object storage, the same way PostgreSQL in Docker serves as the local database. RustFS is Apache 2.0 licensed, making it a good alternative to MinIO (AGPL).

Add RustFS to your Docker Compose file alongside your other backing services:

services:
  rustfs:
    image: rustfs/rustfs:latest
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      RUSTFS_ACCESS_KEY: rustfsadmin
      RUSTFS_SECRET_KEY: rustfsadmin
      RUSTFS_CONSOLE_ENABLE: "true"
    volumes:
      - rustfs-data:/data
      - rustfs-logs:/logs
    command: /data

volumes:
  rustfs-data:
  rustfs-logs:

Port 9000 exposes the S3 API. Port 9001 exposes a web console for browsing buckets and objects. Default credentials are rustfsadmin / rustfsadmin.

After starting the container, create a bucket for development. You can do this through the web console at http://localhost:9001, or with the MinIO client CLI (which works with any S3-compatible server):

mc alias set rustfs http://localhost:9000 rustfsadmin rustfsadmin
mc mb rustfs/uploads

Configuring the S3 client

Build the Bucket handle from environment variables so the same code works in development (RustFS) and production (Hetzner Object Storage or any other provider).

use s3::bucket::Bucket;
use s3::creds::Credentials;
use s3::Region;

pub fn create_bucket() -> Box<Bucket> {
    let region = Region::Custom {
        region: env_var("S3_REGION"),
        endpoint: env_var("S3_ENDPOINT"),
    };

    let credentials = Credentials::new(
        Some(&env_var("S3_ACCESS_KEY")),
        Some(&env_var("S3_SECRET_KEY")),
        None,
        None,
        None,
    )
    .expect("valid S3 credentials");

    let bucket_name = env_var("S3_BUCKET");
    Bucket::new(&bucket_name, region, credentials).expect("valid S3 bucket configuration")
}

fn env_var(name: &str) -> String {
    std::env::var(name).unwrap_or_else(|_| panic!("{name} must be set"))
}

Region::Custom accepts any endpoint URL, which is how you point the client at RustFS locally or Hetzner in production. The Bucket type is the main handle for all S3 operations: uploads, downloads, listing, deletion, and presigned URL generation.

Add the bucket to your application state:

#[derive(Clone)]
pub struct AppState {
    pub db: sqlx::PgPool,
    pub bucket: Box<Bucket>,
}

Environment variables

For local development with RustFS:

S3_ENDPOINT=http://localhost:9000
S3_REGION=us-east-1
S3_ACCESS_KEY=rustfsadmin
S3_SECRET_KEY=rustfsadmin
S3_BUCKET=uploads

For production with Hetzner Object Storage:

S3_ENDPOINT=https://fsn1.your-objectstorage.com
S3_REGION=fsn1
S3_ACCESS_KEY=<hetzner-access-key>
S3_SECRET_KEY=<hetzner-secret-key>
S3_BUCKET=prod-uploads

Hetzner Object Storage provides S3-compatible storage in European data centres (Falkenstein, Nuremberg, Helsinki). Generate access keys from the Hetzner Cloud Console. The endpoint must match the region where your bucket was created.

Upload handling in Axum

Server-side upload via multipart

The most straightforward approach: the browser sends the file to your Axum handler via a standard HTML multipart form, and the handler uploads it to S3. No client-side JavaScript required, which fits the HDA model well.

The HTML form:

use maud::{html, Markup};

fn upload_form() -> Markup {
    html! {
        form method="post" action="/files" enctype="multipart/form-data" {
            label {
                "Choose file"
                input type="file" name="file" required;
            }
            button type="submit" { "Upload" }
        }
    }
}

The handler:

use axum::{
    extract::{Multipart, State},
    response::{IntoResponse, Redirect},
};
use uuid::Uuid;

pub async fn upload_file(
    State(state): State<AppState>,
    mut multipart: Multipart,
) -> Result<impl IntoResponse, AppError> {
    let field = multipart
        .next_field()
        .await?
        .ok_or(AppError::BadRequest("no file provided".into()))?;

    let original_name = field
        .file_name()
        .unwrap_or("unnamed")
        .to_string();

    let content_type = field
        .content_type()
        .unwrap_or("application/octet-stream")
        .to_string();

    let data = field.bytes().await?;

    // Generate a unique key to avoid collisions
    let ext = original_name
        .rsplit('.')
        .next()
        .unwrap_or("bin");
    let key = format!("uploads/{}.{}", Uuid::new_v4(), ext);

    state
        .bucket
        .put_object_with_content_type(&key, &data, &content_type)
        .await
        .map_err(|e| AppError::Internal(format!("S3 upload failed: {e}")))?;

    // Store the key and original filename in your database
    // sqlx::query!("INSERT INTO files ...")

    Ok(Redirect::to("/files"))
}

field.bytes() reads the entire file into memory. This is fine for files up to a few megabytes (avatars, documents). For larger files, use the presigned URL approach described below.

The object key uses a UUID to avoid filename collisions and path traversal issues. Store the mapping between the generated key and the original filename in your database.

Adjusting the body size limit

Axum’s default body size limit is 2 MB. For file uploads, you’ll typically need to raise this on the upload route:

use axum::{extract::DefaultBodyLimit, routing::post, Router};

let app = Router::new()
    .route("/files", post(upload_file))
    .layer(DefaultBodyLimit::max(25 * 1024 * 1024)); // 25 MB

Apply the limit to specific routes rather than globally. A 25 MB limit on your file upload route is reasonable; the same limit on your login form is not.

Direct upload via presigned URL

For larger files, skip the server entirely. The server generates a presigned PUT URL, and the browser uploads directly to S3. This avoids buffering the file through your application server, reducing memory usage and latency.

The flow:

  1. The browser requests a presigned upload URL from your server.
  2. The server generates a presigned PUT URL with a short expiry.
  3. The browser uploads the file directly to S3 using that URL.
  4. The browser notifies the server that the upload is complete.

The handler that generates the presigned URL:

use axum::{extract::State, Json};
use serde::{Deserialize, Serialize};
use uuid::Uuid;

#[derive(Deserialize)]
pub struct PresignedUploadRequest {
    pub filename: String,
    pub content_type: String,
}

#[derive(Serialize)]
pub struct PresignedUploadResponse {
    pub upload_url: String,
    pub object_key: String,
}

pub async fn presigned_upload_url(
    State(state): State<AppState>,
    Json(req): Json<PresignedUploadRequest>,
) -> Result<Json<PresignedUploadResponse>, AppError> {
    let ext = req
        .filename
        .rsplit('.')
        .next()
        .unwrap_or("bin");
    let key = format!("uploads/{}.{}", Uuid::new_v4(), ext);

    let url = state
        .bucket
        .presign_put(&key, 3600, None, None)
        .await
        .map_err(|e| AppError::Internal(format!("presign failed: {e}")))?;

    Ok(Json(PresignedUploadResponse {
        upload_url: url,
        object_key: key,
    }))
}

The presigned URL is valid for 3600 seconds (one hour). The client uploads with a PUT request to that URL. No credentials are needed on the client side because the URL itself contains the authentication signature.

On the client, a small amount of JavaScript handles the direct upload:

async function uploadFile(file) {
    // Step 1: Get presigned URL from server
    const res = await fetch("/files/presign", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            filename: file.name,
            content_type: file.type,
        }),
    });
    const { upload_url, object_key } = await res.json();

    // Step 2: Upload directly to S3
    await fetch(upload_url, {
        method: "PUT",
        headers: { "Content-Type": file.type },
        body: file,
    });

    // Step 3: Notify server that upload is complete
    await fetch("/files/confirm", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ object_key }),
    });
}

The confirmation step (step 3) is where the server records the file in the database. Without it, orphaned objects accumulate in S3 from abandoned or failed uploads. Set up a lifecycle rule on your bucket to automatically delete objects older than 24 hours from the uploads/ prefix if they were never confirmed.

Choosing between the two approaches

ConcernServer-side multipartPresigned URL
SimplicitySimpler. Standard HTML form, no JavaScript.Requires JavaScript for the upload flow.
Server loadFile passes through your server. Memory proportional to file size.File goes directly to S3. Server only generates a URL.
File sizePractical up to ~25 MB.Works for files of any size.
Progress trackingRequires SSE or polling for progress on large files.Browser fetch can report upload progress natively.
HDA fitWorks naturally with forms and htmx.Requires a small JavaScript module.

Use server-side multipart as the default for typical uploads (documents, images, avatars). Switch to presigned URLs when files are large enough that buffering them through the server becomes a problem.

Serving files to users

Presigned GET URLs

The simplest way to serve a file: generate a presigned GET URL and redirect the browser to it.

pub async fn download_file(
    State(state): State<AppState>,
    Path(file_id): Path<Uuid>,
) -> Result<impl IntoResponse, AppError> {
    // Look up the file record in the database
    let file = sqlx::query_as!(
        FileRecord,
        "SELECT object_key, original_name FROM files WHERE id = $1",
        file_id
    )
    .fetch_optional(&state.db)
    .await?
    .ok_or(AppError::NotFound("file not found".into()))?;

    let url = state
        .bucket
        .presign_get(&file.object_key, 3600, None)
        .await
        .map_err(|e| AppError::Internal(format!("presign failed: {e}")))?;

    Ok(Redirect::temporary(&url))
}

The presigned URL expires after an hour. The browser follows the redirect and downloads directly from S3. This keeps file serving load off your application server.

Controlling Content-Disposition

By default, browsers display files inline if they can (images, PDFs). To force a download with the original filename, pass response-content-disposition as a query parameter on the presigned URL:

use std::collections::HashMap;

pub async fn download_file_as_attachment(
    State(state): State<AppState>,
    Path(file_id): Path<Uuid>,
) -> Result<impl IntoResponse, AppError> {
    let file = sqlx::query_as!(
        FileRecord,
        "SELECT object_key, original_name FROM files WHERE id = $1",
        file_id
    )
    .fetch_optional(&state.db)
    .await?
    .ok_or(AppError::NotFound("file not found".into()))?;

    let mut queries = HashMap::new();
    queries.insert(
        "response-content-disposition".to_string(),
        format!("attachment; filename=\"{}\"", file.original_name),
    );

    let url = state
        .bucket
        .presign_get(&file.object_key, 3600, Some(queries))
        .await
        .map_err(|e| AppError::Internal(format!("presign failed: {e}")))?;

    Ok(Redirect::temporary(&url))
}

Use attachment when the user explicitly clicks a download link. Use inline (or omit the header) when displaying an image or PDF in the browser.

Proxy handler for access-controlled files

Presigned URLs are convenient but have a limitation: once generated, anyone with the URL can access the file until it expires. For files that require per-request access control (private documents, paid content), proxy the file through your server instead.

use axum::{
    body::Body,
    http::{header, StatusCode},
    response::Response,
};

pub async fn serve_private_file(
    State(state): State<AppState>,
    Path(file_id): Path<Uuid>,
    user: AuthenticatedUser,
) -> Result<Response, AppError> {
    let file = sqlx::query_as!(
        FileRecord,
        "SELECT object_key, original_name, content_type, owner_id FROM files WHERE id = $1",
        file_id
    )
    .fetch_optional(&state.db)
    .await?
    .ok_or(AppError::NotFound("file not found".into()))?;

    // Check access
    if file.owner_id != user.id {
        return Err(AppError::Forbidden("not your file".into()));
    }

    let response = state
        .bucket
        .get_object(&file.object_key)
        .await
        .map_err(|e| AppError::Internal(format!("S3 get failed: {e}")))?;

    Ok(Response::builder()
        .status(StatusCode::OK)
        .header(header::CONTENT_TYPE, &file.content_type)
        .header(
            header::CONTENT_DISPOSITION,
            format!("inline; filename=\"{}\"", file.original_name),
        )
        .body(Body::from(response.to_vec()))
        .unwrap())
}

This approach loads the entire file into memory before sending it to the client. For large files behind access control, consider generating a short-lived presigned URL (60 seconds) after the access check passes, then redirecting. This gives you per-request authorisation without proxying the bytes.

Image thumbnails

For image-heavy applications (galleries, user avatars), serve resized thumbnails instead of full-size originals. Generate thumbnails at upload time and store them as separate S3 objects.

use image::imageops::FilterType;
use std::io::Cursor;

fn generate_thumbnail(data: &[u8], max_dimension: u32) -> Result<Vec<u8>, image::ImageError> {
    let img = image::load_from_memory(data)?;
    let thumb = img.resize(max_dimension, max_dimension, FilterType::Lanczos3);

    let mut buf = Vec::new();
    thumb.write_to(&mut Cursor::new(&mut buf), image::ImageFormat::WebP)?;
    Ok(buf)
}

Add the image crate to your dependencies:

[dependencies]
image = { version = "0.25", default-features = false, features = ["webp", "jpeg", "png"] }

Store the thumbnail alongside the original:

let thumb_data = generate_thumbnail(&data, 300)?;
let thumb_key = format!("uploads/thumb_{}.webp", file_id);

state
    .bucket
    .put_object_with_content_type(&thumb_key, &thumb_data, "image/webp")
    .await?;

WebP produces smaller files than JPEG at equivalent quality. Disable default features on the image crate and enable only the formats you need, as the full feature set pulls in decoders you won’t use.

For applications where thumbnail generation is slow or needs to handle many formats, move the processing to a background job via Restate and update the database record when the thumbnail is ready.

Deleting files

When a user deletes a record that has an associated file, delete the S3 object as well:

pub async fn delete_file(
    State(state): State<AppState>,
    Path(file_id): Path<Uuid>,
) -> Result<impl IntoResponse, AppError> {
    let file = sqlx::query_as!(
        FileRecord,
        "SELECT object_key FROM files WHERE id = $1",
        file_id
    )
    .fetch_optional(&state.db)
    .await?
    .ok_or(AppError::NotFound("file not found".into()))?;

    state
        .bucket
        .delete_object(&file.object_key)
        .await
        .map_err(|e| AppError::Internal(format!("S3 delete failed: {e}")))?;

    sqlx::query!("DELETE FROM files WHERE id = $1", file_id)
        .execute(&state.db)
        .await?;

    Ok(Redirect::to("/files"))
}

Delete the S3 object before the database record. If the S3 deletion fails, the database record remains and you can retry. If you delete the database record first and the S3 deletion fails, you have an orphaned object with no reference to it.

Database schema for file records

A minimal files table to track uploaded objects:

CREATE TABLE files (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    object_key TEXT NOT NULL,
    original_name TEXT NOT NULL,
    content_type TEXT NOT NULL,
    size_bytes BIGINT NOT NULL,
    owner_id UUID NOT NULL REFERENCES users(id),
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_files_owner ON files(owner_id);

The object_key is the S3 path. The original_name is what the user uploaded. Keep both: the key for S3 operations, the name for display and download headers.

Production providers

The S3 API is a de facto standard. The Region::Custom configuration in rust-s3 means any compliant provider works without code changes.

Hetzner Object Storage is a good default for European deployments. EUR 4.99/month includes 1 TB of storage and 1 TB of egress. Three EU regions: Falkenstein (fsn1), Nuremberg (nbg1), and Helsinki (hel1). Endpoints follow the pattern https://{region}.your-objectstorage.com. Generate access keys from the Hetzner Cloud Console.

Other S3-compatible providers worth considering:

  • Cloudflare R2: No egress fees. Good for globally distributed read-heavy workloads.
  • Backblaze B2: Cheap storage at $6/TB/month. Free egress to Cloudflare via the Bandwidth Alliance.
  • AWS S3: The original. More expensive, but the widest feature set and the most mature tooling ecosystem.
  • DigitalOcean Spaces: Simple pricing, CDN included. $5/month for 250 GB.

Switching providers means changing four environment variables. No code changes required.

Gotchas

Set Content-Type on upload. If you upload without setting the content type, S3 defaults to application/octet-stream. Browsers then download the file instead of displaying it inline, even for images and PDFs. Always pass the content type from the upload form to put_object_with_content_type.

Generate unique object keys. Never use the original filename as the S3 key. Users upload files named document.pdf constantly. Use a UUID or similar unique identifier. This also prevents path traversal attacks where a crafted filename like ../../etc/passwd could cause problems.

Handle the 2 MB default body limit. Axum rejects request bodies larger than 2 MB by default. If your upload handler returns a 413 Payload Too Large error, you forgot to raise the limit with DefaultBodyLimit::max(). Apply the higher limit only to upload routes.

Clean up orphaned objects. Presigned URL uploads can be abandoned partway through. Failed server-side uploads might write to S3 but crash before recording the database entry. Set an S3 lifecycle rule to expire unconfirmed objects after 24 hours, or run a periodic cleanup job that compares S3 contents against database records.

Delete S3 objects before database records. If the S3 delete fails, the database record survives and you can retry. The reverse order leaves orphaned objects you can’t find.

Watch for CORS with presigned URLs. When using direct browser uploads, the S3 endpoint must return appropriate CORS headers. Configure CORS on the bucket to allow PUT requests from your application’s origin. RustFS and most S3 providers support bucket-level CORS configuration.

Don’t store files in the database. It’s tempting to store small files as BYTEA columns in PostgreSQL. Resist this. It bloats your database, makes backups slower, and prevents you from using CDN or S3 features like presigned URLs. Object storage exists for a reason.

Email

Transactional email is the backbone of account management: registration confirmations, password resets, order receipts, notification digests. lettre is the standard Rust crate for composing and sending email via SMTP. It provides a message builder, SMTP transport with TLS, connection pooling, and async support via Tokio.

This section covers configuring lettre for SMTP, composing emails with Maud templates, sending asynchronously without blocking request handlers, and testing with MailCrab in development.

Dependencies

[dependencies]
lettre = { version = "0.11", features = ["tokio1", "tokio1-rustls", "aws-lc-rs"] }
tokio = { version = "1", features = ["full"] }
maud = { version = "0.26", features = ["axum"] }

The tokio1 feature enables AsyncSmtpTransport. tokio1-rustls provides TLS via rustls, consistent with the rest of the stack (reqwest uses rustls by default). The aws-lc-rs feature selects the crypto provider that rustls requires.

lettre’s default features include the synchronous SMTP transport, the message builder, hostname detection, and connection pooling. Adding the three features above layers async Tokio support on top.

SMTP configuration

Configure the SMTP transport from environment variables so the same code works in development (MailCrab) and production (your SMTP provider).

use lettre::{
    transport::smtp::authentication::Credentials,
    AsyncSmtpTransport, Tokio1Executor,
};

pub fn build_mailer() -> AsyncSmtpTransport<Tokio1Executor> {
    let host = env_var("SMTP_HOST");
    let username = std::env::var("SMTP_USERNAME").ok();
    let password = std::env::var("SMTP_PASSWORD").ok();

    let mut builder = if env_var_or("SMTP_TLS", "true") == "true" {
        AsyncSmtpTransport::<Tokio1Executor>::relay(&host)
            .expect("valid SMTP relay hostname")
    } else {
        AsyncSmtpTransport::<Tokio1Executor>::builder_dangerous(&host)
    };

    if let (Some(user), Some(pass)) = (username, password) {
        builder = builder.credentials(Credentials::new(user, pass));
    }

    if let Ok(port) = std::env::var("SMTP_PORT") {
        builder = builder.port(port.parse().expect("SMTP_PORT must be a number"));
    }

    builder.build()
}

fn env_var(name: &str) -> String {
    std::env::var(name).unwrap_or_else(|_| panic!("{name} must be set"))
}

fn env_var_or(name: &str, default: &str) -> String {
    std::env::var(name).unwrap_or_else(|_| default.to_string())
}

builder_dangerous disables TLS certificate verification. Use it only for local development with MailCrab, where there is no TLS. In production, relay() establishes an implicit TLS connection (port 465) and validates the server certificate.

Add the mailer to your application state:

use lettre::{AsyncSmtpTransport, Tokio1Executor};

#[derive(Clone)]
pub struct AppState {
    pub db: sqlx::PgPool,
    pub mailer: AsyncSmtpTransport<Tokio1Executor>,
}

AsyncSmtpTransport implements Clone and manages a connection pool internally, so sharing it through Axum state is safe and efficient.

Environment variables

For local development with MailCrab:

SMTP_HOST=localhost
SMTP_PORT=1025
SMTP_TLS=false

For production with an SMTP provider:

SMTP_HOST=smtp.example.com
SMTP_PORT=465
SMTP_TLS=true
SMTP_USERNAME=apikey
SMTP_PASSWORD=<your-smtp-api-key>

Most transactional email providers (Postmark, Resend, Amazon SES, Mailgun) expose an SMTP interface. Use their SMTP credentials here. No provider-specific SDK is needed.

Sender address

Define a default sender address in configuration rather than hardcoding it in every email:

use lettre::message::Mailbox;

pub fn default_sender() -> Mailbox {
    let address = env_var("EMAIL_FROM");
    address.parse().expect("EMAIL_FROM must be a valid email address")
}
EMAIL_FROM="MyApp <noreply@example.com>"

Composing messages

lettre’s Message::builder() provides a fluent API for constructing RFC-compliant email messages.

Plain text

use lettre::Message;
use lettre::message::header::ContentType;

let email = Message::builder()
    .from(default_sender())
    .to("user@example.com".parse().unwrap())
    .subject("Your password has been changed")
    .header(ContentType::TEXT_PLAIN)
    .body("Your password was changed successfully. If you did not make this change, contact support immediately.".to_string())
    .expect("valid email message");

HTML with plain text fallback

Every HTML email should include a plain text alternative. Some email clients only render plain text, and spam filters penalise HTML-only messages.

use lettre::message::MultiPart;

let email = Message::builder()
    .from(default_sender())
    .to("user@example.com".parse().unwrap())
    .subject("Confirm your email address")
    .multipart(MultiPart::alternative_plain_html(
        "Visit this link to confirm: https://example.com/confirm?token=abc123".to_string(),
        "<p>Click <a href=\"https://example.com/confirm?token=abc123\">here</a> to confirm your email address.</p>".to_string(),
    ))
    .expect("valid email message");

MultiPart::alternative_plain_html creates a multipart/alternative body with both variants. The email client picks whichever it prefers (typically HTML if available).

Attachments

use lettre::message::{Attachment, MultiPart, SinglePart};
use lettre::message::header::ContentType;

let pdf_bytes = std::fs::read("invoice.pdf").expect("read invoice");
let attachment = Attachment::new("invoice.pdf".to_string())
    .body(pdf_bytes, ContentType::parse("application/pdf").unwrap());

let email = Message::builder()
    .from(default_sender())
    .to("customer@example.com".parse().unwrap())
    .subject("Your invoice")
    .multipart(
        MultiPart::mixed()
            .multipart(MultiPart::alternative_plain_html(
                "Your invoice is attached.".to_string(),
                "<p>Your invoice is attached.</p>".to_string(),
            ))
            .singlepart(attachment),
    )
    .expect("valid email message");

MultiPart::mixed() combines the message body with attachments. Nest a MultiPart::alternative_plain_html inside it for the text content, then append attachments with .singlepart().

Email templates with Maud

Maud is already in the stack for HTML rendering. Use it for email templates too. This keeps all HTML generation in one system, with compile-time checking and the same component patterns as your page templates.

Email HTML is not web HTML. Email clients ignore external stylesheets, strip <style> tags inconsistently, and have limited CSS support. Inline styles on every element are the only reliable approach.

A base email layout

use maud::{html, Markup, PreEscaped, DOCTYPE};

pub fn email_layout(title: &str, content: Markup) -> Markup {
    html! {
        (DOCTYPE)
        html lang="en" {
            head {
                meta charset="utf-8";
                meta name="viewport" content="width=device-width, initial-scale=1.0";
                title { (title) }
            }
            body style="margin: 0; padding: 0; background-color: #f4f4f5; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;" {
                table role="presentation" width="100%" cellpadding="0" cellspacing="0" style="background-color: #f4f4f5;" {
                    tr {
                        td align="center" style="padding: 40px 20px;" {
                            table role="presentation" width="600" cellpadding="0" cellspacing="0" style="background-color: #ffffff; border-radius: 8px; overflow: hidden;" {
                                // Header
                                tr {
                                    td style="padding: 32px 40px 24px; background-color: #18181b;" {
                                        h1 style="margin: 0; color: #ffffff; font-size: 20px; font-weight: 600;" {
                                            "MyApp"
                                        }
                                    }
                                }
                                // Body
                                tr {
                                    td style="padding: 32px 40px;" {
                                        (content)
                                    }
                                }
                                // Footer
                                tr {
                                    td style="padding: 24px 40px; border-top: 1px solid #e4e4e7;" {
                                        p style="margin: 0; color: #71717a; font-size: 13px; line-height: 1.5;" {
                                            "You received this email because you have an account at MyApp. "
                                            "If you did not expect this, you can ignore it."
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Table-based layout is intentional. Email clients (particularly Outlook) have poor support for modern CSS layout. Tables are the only layout method that renders consistently across Gmail, Outlook, Apple Mail, and others. role="presentation" tells screen readers these tables are structural, not data.

A transactional email template

A password reset email, using the layout above:

pub fn password_reset_email(reset_url: &str, expires_in_hours: u32) -> (String, String) {
    let html_body = email_layout("Reset your password", html! {
        h2 style="margin: 0 0 16px; color: #18181b; font-size: 22px; font-weight: 600;" {
            "Reset your password"
        }
        p style="margin: 0 0 16px; color: #3f3f46; font-size: 15px; line-height: 1.6;" {
            "We received a request to reset the password for your account. "
            "Click the button below to choose a new password."
        }
        table role="presentation" cellpadding="0" cellspacing="0" style="margin: 24px 0;" {
            tr {
                td style="background-color: #18181b; border-radius: 6px;" {
                    a href=(reset_url) style="display: inline-block; padding: 12px 32px; color: #ffffff; font-size: 15px; font-weight: 600; text-decoration: none;" {
                        "Reset password"
                    }
                }
            }
        }
        p style="margin: 0 0 8px; color: #71717a; font-size: 13px; line-height: 1.5;" {
            "This link expires in " (expires_in_hours) " hours. If you did not request a password reset, ignore this email."
        }
        p style="margin: 0; color: #71717a; font-size: 13px; line-height: 1.5; word-break: break-all;" {
            "If the button does not work, copy and paste this URL into your browser: "
            (reset_url)
        }
    });

    let plain_body = format!(
        "Reset your password\n\n\
         We received a request to reset the password for your account.\n\n\
         Visit this link to choose a new password:\n{reset_url}\n\n\
         This link expires in {expires_in_hours} hours. \
         If you did not request a password reset, ignore this email."
    );

    (html_body.into_string(), plain_body)
}

The function returns both HTML and plain text bodies. The caller passes them to MultiPart::alternative_plain_html. Always include the raw URL in the HTML body for clients that strip links or where the button fails to render. The plain text version covers email clients that don’t render HTML.

Sending a templated email

use lettre::{AsyncTransport, Message};
use lettre::message::MultiPart;

pub async fn send_password_reset(
    mailer: &AsyncSmtpTransport<Tokio1Executor>,
    to: &str,
    reset_url: &str,
) -> Result<(), lettre::transport::smtp::Error> {
    let (html_body, plain_body) = password_reset_email(reset_url, 24);

    let email = Message::builder()
        .from(default_sender())
        .to(to.parse().unwrap())
        .subject("Reset your password")
        .multipart(MultiPart::alternative_plain_html(plain_body, html_body))
        .expect("valid email message");

    mailer.send(email).await?;
    Ok(())
}

Async sending patterns

Email delivery takes time. An SMTP handshake, TLS negotiation, and message transfer can take 500ms to several seconds, depending on the provider. Awaiting this inline in a request handler adds that latency directly to the user’s response time.

Fire-and-forget with tokio::spawn

For emails where delivery latency should not block the response (confirmations, notifications, receipts), spawn the send as a background task:

use axum::{extract::State, response::Redirect};
use lettre::AsyncTransport;

pub async fn handle_password_reset_request(
    State(state): State<AppState>,
    form: axum::extract::Form<PasswordResetForm>,
) -> Result<Redirect, AppError> {
    let user = sqlx::query_as!(
        User,
        "SELECT id, email FROM users WHERE email = $1",
        form.email
    )
    .fetch_optional(&state.db)
    .await?;

    // Always redirect, even if the email doesn't exist (prevents enumeration)
    if let Some(user) = user {
        let token = generate_reset_token(&state.db, user.id).await?;
        let reset_url = format!("https://example.com/reset?token={token}");
        let mailer = state.mailer.clone();

        tokio::spawn(async move {
            if let Err(e) = send_password_reset(&mailer, &user.email, &reset_url).await {
                tracing::error!(error = ?e, email = %user.email, "failed to send password reset email");
            }
        });
    }

    Ok(Redirect::to("/reset-sent"))
}

tokio::spawn moves the email send to a separate Tokio task. The handler returns the redirect immediately. If the send fails, it logs the error but does not surface it to the user.

This pattern is appropriate for most transactional email. The trade-off: if the application crashes between spawning the task and the send completing, the email is lost. For the vast majority of transactional emails (confirmations, notifications, receipts), this is acceptable. The user can re-trigger the action.

Durable delivery with Restate

For emails that must be delivered (invoice delivery, compliance notifications, onboarding sequences), a tokio::spawn that disappears on crash is not sufficient. Use Restate to make the send durable. Restate persists the operation, retries on failure, and survives process restarts. This is the same pattern described in the background jobs section: trigger a Restate workflow from the handler and let the workflow handle the email send with guaranteed delivery.

MailCrab for local development

MailCrab is an email testing server that accepts all SMTP traffic and displays it in a web interface. It replaces the need for a real SMTP provider during development. Point your application’s SMTP configuration at MailCrab and every email your application sends appears in the web UI, where you can inspect the HTML rendering, headers, and plain text.

MailCrab runs as a Docker container alongside your other backing services. Add it to your project’s Docker Compose file:

services:
  mailcrab:
    image: marlonb/mailcrab:latest
    ports:
      - "1025:1025"
      - "1080:1080"

Port 1025 is the SMTP server. Port 1080 is the web UI. Open http://localhost:1080 in your browser to see captured emails in real time.

MailCrab accepts all mail regardless of sender, recipient, or credentials. No accounts or configuration are needed. It stores messages in memory only, so restarting the container clears all captured email. This is a feature, not a limitation, for a development tool.

Gotchas

Inline styles in email HTML. External stylesheets and <style> blocks are stripped or ignored by many email clients. Gmail strips <style> tags entirely. Outlook uses the Word rendering engine. Put styles directly on elements with style="". This is tedious but necessary.

Always include a plain text body. HTML-only emails are more likely to be flagged as spam. Some corporate email clients render plain text only. MultiPart::alternative_plain_html makes this straightforward.

Don’t send email synchronously in handlers. An SMTP send can take several seconds. If you await it in the handler, the user waits for the email to be sent before seeing a response. Use tokio::spawn for fire-and-forget, or Restate for durable delivery.

Validate email addresses before sending. "user@example.com".parse::<lettre::Address>() validates the address format. Catch invalid addresses at form validation time, not at send time, to give users clear error messages.

Test with MailCrab, not your production provider. Sending test emails through a real provider risks hitting rate limits, landing on blocklists, and sending unintended email to real addresses. MailCrab catches everything locally.

Watch for from address restrictions. Most SMTP providers restrict which From addresses you can use. You typically need to verify the domain or specific address in the provider’s dashboard before sending from it. Emails with unverified From addresses are silently dropped or rejected.

Connection pooling is automatic. AsyncSmtpTransport pools connections internally. Do not create a new transport per request. Build one at startup and share it through application state, the same pattern as reqwest::Client and sqlx::PgPool.

Configuration and Secrets

Application configuration follows the twelve-factor methodology: store config in environment variables. Database connection strings, SMTP credentials, S3 keys, listen addresses, and feature flags all come from the environment. The same code runs in development and production. Only the source of the variables changes.

This section covers loading .env files in development with dotenvy, parsing environment variables into a typed Config struct at startup, protecting secrets with the secrecy crate, and managing secrets in production with Docker Compose and Terraform.

Dependencies

The app-config crate handles all environment variable parsing. It depends on dotenvy for .env file loading and secrecy for wrapping sensitive values.

# crates/config/Cargo.toml
[package]
name = "app-config"
edition.workspace = true
version.workspace = true

[lints]
workspace = true

[dependencies]
dotenvy.workspace = true
secrecy = "0.10"

Add secrecy to the workspace dependency table:

# Cargo.toml (workspace root)
[workspace.dependencies]
# ...existing dependencies...
secrecy = "0.10"

Then use secrecy.workspace = true in the config crate.

Loading .env files with dotenvy

dotenvy loads environment variables from a .env file at the project root. It is a maintained fork of the original dotenv crate, which was flagged as unmaintained in RUSTSEC-2021-0141.

dotenvy::dotenv() reads the .env file and calls std::env::set_var for each entry. Existing environment variables are not overridden, so production values injected by the deployment platform always take precedence over a stale .env file.

Rust 2024 edition safety

In Rust 2024 edition (the edition this project targets), std::env::set_var is unsafe. The underlying setenv libc call is not thread-safe: calling it while other threads read environment variables is a data race. Rust’s internal mutex only protects std::env calls, not libc-level getenv calls from other libraries.

The fix is to call dotenvy::dotenv() before the Tokio runtime starts, while the process is still single-threaded. Separate the synchronous entry point from the async runtime:

// crates/server/src/main.rs
use anyhow::Result;
use tracing_subscriber::EnvFilter;

fn main() {
    dotenvy::dotenv().ok();
    run();
}

#[tokio::main]
async fn run() {
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();

    let config = app_config::load();
    let db = app_db::connect(config.database_url.expose_secret()).await;

    let listener = tokio::net::TcpListener::bind(&config.listen_addr)
        .await
        .expect("failed to bind listener");
    tracing::info!("listening on {}", config.listen_addr);

    let app = app_web::router(db);
    axum::serve(listener, app).await.expect("server error");
}

dotenv().ok() silently ignores a missing .env file. In production, no .env file exists and all variables come from the deployment platform. In development, the .env file is present and its values are loaded before any threads are spawned.

The Config struct

Parse all environment variables into a single typed struct at startup. If a required variable is missing or malformed, the application panics immediately with a clear error message. Failing fast at startup is better than discovering a missing variable minutes later when a handler first needs it.

// crates/config/src/lib.rs
use secrecy::{ExposeSecret, SecretString};

pub struct Config {
    // Server
    pub listen_addr: String,

    // Database
    pub database_url: SecretString,

    // Email
    pub smtp_host: String,
    pub smtp_port: u16,
    pub smtp_tls: bool,
    pub smtp_username: Option<String>,
    pub smtp_password: Option<SecretString>,
    pub email_from: String,

    // Object storage
    pub s3_endpoint: String,
    pub s3_region: String,
    pub s3_bucket: String,
    pub s3_access_key: SecretString,
    pub s3_secret_key: SecretString,
}

pub fn load() -> Config {
    Config {
        listen_addr: format!(
            "{}:{}",
            optional("HOST", "127.0.0.1"),
            optional("PORT", "3000"),
        ),

        database_url: required_secret("DATABASE_URL"),

        smtp_host: required("SMTP_HOST"),
        smtp_port: parse("SMTP_PORT", 1025),
        smtp_tls: optional("SMTP_TLS", "false") == "true",
        smtp_username: std::env::var("SMTP_USERNAME").ok(),
        smtp_password: optional_secret("SMTP_PASSWORD"),
        email_from: required("EMAIL_FROM"),

        s3_endpoint: required("S3_ENDPOINT"),
        s3_region: optional("S3_REGION", "us-east-1"),
        s3_bucket: required("S3_BUCKET"),
        s3_access_key: required_secret("S3_ACCESS_KEY"),
        s3_secret_key: required_secret("S3_SECRET_KEY"),
    }
}

fn required(name: &str) -> String {
    std::env::var(name).unwrap_or_else(|_| panic!("{name} must be set"))
}

fn required_secret(name: &str) -> SecretString {
    SecretString::from(required(name))
}

fn optional_secret(name: &str) -> Option<SecretString> {
    std::env::var(name).ok().map(SecretString::from)
}

fn optional(name: &str, default: &str) -> String {
    std::env::var(name).unwrap_or_else(|_| default.to_string())
}

fn parse<T: std::str::FromStr>(name: &str, default: T) -> T
where
    T::Err: std::fmt::Debug,
{
    match std::env::var(name) {
        Ok(val) => val
            .parse()
            .unwrap_or_else(|_| panic!("{name} must be a valid {}", std::any::type_name::<T>())),
        Err(_) => default,
    }
}

Five helper functions cover every case: required panics on missing variables, required_secret wraps the value in SecretString, optional_secret returns Option<SecretString> for credentials that are only present in some environments, optional provides a default, and parse handles non-string types. These helpers include the variable name in the panic message, which std::env::var does not do on its own.

Sharing config through application state

Pass the Config into Axum’s application state alongside the database pool and other shared resources:

use app_config::Config;
use std::sync::Arc;

#[derive(Clone)]
pub struct AppState {
    pub config: Arc<Config>,
    pub db: sqlx::PgPool,
}

Wrap Config in Arc because it does not derive Clone (the SecretString fields implement Clone, but wrapping in Arc avoids cloning secret data on every request). Handlers access configuration through the state extractor:

use axum::extract::State;

async fn some_handler(State(state): State<AppState>) {
    let from = &state.config.email_from;
    // ...
}

Optional configuration with Option<T>

Use Option<T> for configuration that is only present in some environments. SMTP credentials are a common example: MailCrab in development accepts unauthenticated connections, while production SMTP providers require credentials.

pub smtp_username: Option<String>,
pub smtp_password: Option<SecretString>,

Read optional variables with .ok(), which converts Err(VarError::NotPresent) to None:

smtp_username: std::env::var("SMTP_USERNAME").ok(),
smtp_password: std::env::var("SMTP_PASSWORD").ok().map(SecretString::from),

This pattern works for any feature that is conditionally enabled: a Sentry DSN, a Redis URL for caching, external API keys for optional integrations. The handler checks if let Some(ref password) = config.smtp_password and adapts its behaviour.

Protecting secrets with secrecy

The secrecy crate wraps sensitive values in SecretBox<T>, which provides two protections:

  1. Memory zeroing. When a SecretBox is dropped, its contents are overwritten with zeros before deallocation. This prevents secrets from lingering in freed memory where a crash dump or memory scan could recover them.

  2. Debug redaction. The Debug implementation prints SecretBox<str>([REDACTED]) instead of the actual value. If a Config struct ends up in a panic message, log line, or error report, secrets are not exposed.

Access the underlying value explicitly with expose_secret():

use secrecy::ExposeSecret;

let pool = sqlx::PgPool::connect(config.database_url.expose_secret()).await?;

The explicit .expose_secret() call makes every point where a secret is used visible and auditable. A grep for expose_secret shows exactly where secrets flow in the codebase.

What to wrap

Wrap values that would cause damage if leaked: database connection strings (which contain passwords), API keys, SMTP passwords, S3 secret keys, session signing keys. Do not wrap non-sensitive values like hostnames, ports, or log levels.

.env.example

Commit a .env.example file to version control that documents every variable the application needs. New developers copy it to .env and fill in real values. Any time a new variable is added to the Config struct, add a corresponding entry to .env.example.

# Server
HOST=127.0.0.1
PORT=3000

# Database
DATABASE_URL=postgres://postgres:password@localhost:5432/myapp_dev

# Email (MailCrab in development)
SMTP_HOST=localhost
SMTP_PORT=1025
SMTP_TLS=false
EMAIL_FROM="MyApp <noreply@example.com>"

# Object storage (RustFS in development)
S3_ENDPOINT=http://localhost:9000
S3_REGION=us-east-1
S3_BUCKET=myapp
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin

# Optional
# SMTP_USERNAME=
# SMTP_PASSWORD=
# SENTRY_DSN=

Add .env to .gitignore to prevent committing real secrets. The .env.example file is safe to commit because it contains only placeholder values and local development defaults.

.env
!.env.example

Secrets in production

In production, environment variables come from the deployment platform, not a .env file. The Config::load() function does not care where the variables originate. It calls std::env::var(), which reads whatever is in the process environment.

Docker Compose

The simplest production pattern is an env_file directive in Docker Compose that points to a file on the host server:

# compose.prod.yaml
services:
  app:
    image: myapp:latest
    env_file:
      - .env.production
    ports:
      - "3000:3000"

The .env.production file lives on the server, not in the repository. It is created during provisioning and contains real credentials. Alternatively, set variables directly in the environment block:

services:
  app:
    image: myapp:latest
    environment:
      DATABASE_URL: postgres://user:${DB_PASSWORD}@db:5432/myapp
      SMTP_HOST: smtp.example.com
      SMTP_TLS: "true"

Variables referenced with ${} syntax are interpolated from the host shell environment, which allows Terraform or CI/CD to inject them without writing credentials into the Compose file.

Terraform provisioning

When deploying to a Hetzner VPS with Terraform, provision the .env.production file on the server during infrastructure setup:

resource "null_resource" "deploy_env" {
  provisioner "file" {
    content     = templatefile("env.production.tftpl", {
      database_url = var.database_url
      smtp_host    = var.smtp_host
      smtp_password = var.smtp_password
      s3_access_key = var.s3_access_key
      s3_secret_key = var.s3_secret_key
    })
    destination = "/opt/myapp/.env.production"
  }
}

The template file (env.production.tftpl) contains variable placeholders that Terraform fills in:

DATABASE_URL=${database_url}
SMTP_HOST=${smtp_host}
SMTP_PASSWORD=${smtp_password}
S3_ACCESS_KEY=${s3_access_key}
S3_SECRET_KEY=${s3_secret_key}

Terraform variables themselves come from terraform.tfvars (git-ignored) or TF_VAR_* environment variables set in CI/CD. The chain is: CI/CD secrets store → Terraform variables → provisioned file on server → Docker Compose env_file → process environment → Config::load().

GitHub Actions

For CI/CD pipelines, store secrets in GitHub Actions and pass them to deployment scripts:

jobs:
  deploy:
    steps:
      - name: Deploy
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
          SMTP_PASSWORD: ${{ secrets.SMTP_PASSWORD }}
        run: ./deploy.sh

GitHub encrypts secrets at rest and masks them in log output. They are available only to workflows running on the repository.

Gotchas

std::env::var does not include the variable name in its error. Err(VarError::NotPresent) does not say which variable was missing. Always use a wrapper like the required() helper above, which includes the name in the panic message.

dotenvy does not override existing variables. If DATABASE_URL is already set in the shell, the .env file value is ignored. This is correct behaviour: production values set by the platform should not be overridden by a .env file. Use dotenvy::dotenv_override() if you need the opposite (rarely useful).

Parse errors are confusing without context. "abc".parse::<u16>().unwrap() panics with a bare ParseIntError. The parse() helper above includes the variable name in the error, making startup failures easy to diagnose.

Secrets in panic messages. If you unwrap a String containing a database password and the operation panics, the password appears in the panic output. Wrapping secrets in SecretString prevents this. The Debug output shows [REDACTED] instead of the value.

Compile-time environment variables for SQLx. SQLx’s query! macro reads DATABASE_URL at compile time to check queries against the database schema. This uses dotenvy internally (SQLx depends on it). The compile-time .env file must contain a valid DATABASE_URL that points to a running database, even though the application reads it at runtime too.

Observability

Production applications need three categories of telemetry: logs (discrete events), traces (request flows across spans of work), and metrics (numeric measurements over time). The Rust ecosystem handles all three through the tracing crate for instrumentation and OpenTelemetry for export to a self-hosted Grafana stack.

The web server section introduces tracing basics: initialising a subscriber, adding TraceLayer, and controlling log levels with RUST_LOG. The error handling section covers logging errors with tracing::error! and #[instrument]. This section builds on that foundation, covering production subscriber configuration, OpenTelemetry export, metrics collection, and the self-hosted observability stack that receives it all.

Dependencies

tracing and tracing-subscriber are already workspace dependencies from the web server setup. Add the OpenTelemetry crates and the metrics stack:

[workspace.dependencies]
# Already present from web server section
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json", "registry"] }

# OpenTelemetry
opentelemetry = "0.31"
opentelemetry_sdk = { version = "0.31", features = ["rt-tokio"] }
opentelemetry-otlp = { version = "0.31", features = ["grpc-tonic"] }
tracing-opentelemetry = "0.32"
opentelemetry-appender-tracing = "0.31"

# Metrics
metrics = "0.24"
metrics-exporter-prometheus = "0.18"

The grpc-tonic feature on opentelemetry-otlp enables gRPC transport via Tonic. This is not a default feature as of 0.31, so it must be explicitly enabled. The alternative is HTTP/protobuf (http-proto feature), which is now the default. Either works; gRPC is the more established choice for OTLP.

Add json and registry features to tracing-subscriber if they are not already present. json enables structured JSON log output and registry provides the composable multi-layer subscriber.

Structured logging with tracing

tracing goes beyond traditional logging. Where log::info!("processing user {}", user_id) produces a flat string, tracing attaches structured key-value fields to both events and spans.

use tracing::{info, warn, info_span};

// Structured fields on an event
info!(user_id = 42, action = "login", "user authenticated");

// Variable name becomes field name automatically
let user_id = 42;
info!(user_id, "user authenticated");

// Debug vs Display formatting
info!(?some_struct, "debug format");   // uses Debug
info!(%some_value, "display format");  // uses Display

Spans represent a unit of work with a duration. Events occur within spans. When you nest spans, child spans carry their parent’s context, building a tree that traces the full path of a request through your application.

let span = info_span!("process_order", order_id = 1234);
let _guard = span.enter();

// Everything logged here is associated with the process_order span
info!("validating payment");
info!("updating inventory");

The #[instrument] attribute macro (covered in error handling) is the most common way to create spans. It wraps a function in a span named after the function, recording arguments as fields:

#[tracing::instrument(skip(pool))]
async fn create_order(pool: &PgPool, user_id: i64, items: &[Item]) -> Result<Order, AppError> {
    info!("creating order");
    // ...
}

This structured data is what makes observability work. Log aggregation systems like Loki can filter and group by field values. Trace backends like Tempo use spans to reconstruct request flows. None of this works with unstructured string logs.

Production subscriber configuration

The simple tracing_subscriber::fmt().init() from the web server section works for development. Production needs a multi-layer subscriber that sends telemetry to both stdout and OpenTelemetry.

tracing-subscriber’s architecture is built on composable layers stacked on a Registry:

use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};

pub fn init_telemetry() {
    // Filter: controls which spans and events are processed
    let env_filter = EnvFilter::try_from_default_env()
        .unwrap_or_else(|_| EnvFilter::new("info,tower_http=debug"));

    // Layer 1: format and print to stdout
    let fmt_layer = tracing_subscriber::fmt::layer()
        .json()
        .with_target(true)
        .with_thread_ids(false)
        .with_file(true)
        .with_line_number(true);

    tracing_subscriber::registry()
        .with(env_filter)
        .with(fmt_layer)
        .init();
}

Each layer handles one concern. The EnvFilter controls what gets processed (respects RUST_LOG). The fmt::layer() handles output formatting. Calling .json() switches from human-readable to structured JSON output, which log aggregation tools parse far more reliably than plain text.

When OpenTelemetry is configured (next section), two additional layers join the stack: one that exports spans as traces, and one that exports events as log records.

OpenTelemetry export

OpenTelemetry (OTel) is a vendor-neutral standard for telemetry data. The Rust application exports traces and logs via the OTLP protocol to an OpenTelemetry Collector, which forwards them to the storage backends (Tempo for traces, Loki for logs).

Setting up the trace exporter

tracing-opentelemetry provides a layer that converts tracing spans into OpenTelemetry spans and exports them via OTLP:

use opentelemetry::trace::TracerProvider;
use opentelemetry_otlp::SpanExporter;
use opentelemetry_sdk::{
    trace::SdkTracerProvider,
    Resource,
};
use tracing_opentelemetry::OpenTelemetryLayer;

fn init_tracer_provider() -> SdkTracerProvider {
    let exporter = SpanExporter::builder()
        .with_tonic()
        .build()
        .expect("failed to create OTLP span exporter");

    SdkTracerProvider::builder()
        .with_batch_exporter(exporter)
        .with_resource(
            Resource::builder()
                .with_service_name("my-app")
                .build(),
        )
        .build()
}

with_tonic() sends spans over gRPC to the collector. The exporter reads OTEL_EXPORTER_OTLP_ENDPOINT for the collector address (defaults to http://localhost:4317). with_batch_exporter batches spans before sending, reducing network overhead.

The Resource identifies this application in the observability stack. service.name is the minimum; it appears as the service label in Grafana.

Setting up the logs exporter

opentelemetry-appender-tracing bridges tracing events to OpenTelemetry log records. This sends your application’s log output through the same OTLP pipeline as traces, and automatically attaches the active trace ID and span ID to each log record. That attachment is what enables clicking from a log line in Loki directly to the corresponding trace in Tempo.

use opentelemetry_appender_tracing::layer::OpenTelemetryTracingBridge;
use opentelemetry_otlp::LogExporter;
use opentelemetry_sdk::logs::SdkLoggerProvider;

fn init_logger_provider() -> SdkLoggerProvider {
    let exporter = LogExporter::builder()
        .with_tonic()
        .build()
        .expect("failed to create OTLP log exporter");

    SdkLoggerProvider::builder()
        .with_batch_exporter(exporter)
        .build()
}

Combining everything

Wire up the full subscriber with all four layers:

use opentelemetry::trace::TracerProvider;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};

pub struct TelemetryGuard {
    tracer_provider: SdkTracerProvider,
    logger_provider: SdkLoggerProvider,
}

impl Drop for TelemetryGuard {
    fn drop(&mut self) {
        if let Err(e) = self.tracer_provider.shutdown() {
            eprintln!("failed to shutdown tracer provider: {e}");
        }
        if let Err(e) = self.logger_provider.shutdown() {
            eprintln!("failed to shutdown logger provider: {e}");
        }
    }
}

pub fn init_telemetry() -> TelemetryGuard {
    let tracer_provider = init_tracer_provider();
    let logger_provider = init_logger_provider();

    let env_filter = EnvFilter::try_from_default_env()
        .unwrap_or_else(|_| EnvFilter::new("info,tower_http=debug"));

    // Stdout: JSON for production, human-readable for development
    let fmt_layer = tracing_subscriber::fmt::layer()
        .json()
        .with_target(true);

    // Traces: tracing spans → OTel spans → OTLP → Tempo
    let tracer = tracer_provider.tracer("my-app");
    let otel_trace_layer = tracing_opentelemetry::layer().with_tracer(tracer);

    // Logs: tracing events → OTel log records → OTLP → Loki
    let otel_logs_layer = OpenTelemetryTracingBridge::new(&logger_provider);

    tracing_subscriber::registry()
        .with(env_filter)
        .with(fmt_layer)
        .with(otel_trace_layer)
        .with(otel_logs_layer)
        .init();

    TelemetryGuard {
        tracer_provider,
        logger_provider,
    }
}

The TelemetryGuard ensures providers flush pending telemetry on shutdown. Hold it in main:

#[tokio::main]
async fn main() {
    let _telemetry = init_telemetry();

    // ... build app, start server
}

When _telemetry drops (at the end of main or on graceful shutdown), the providers flush any buffered spans and log records to the collector. Without this, the last few seconds of telemetry are lost on shutdown.

Development vs production

In development, you may not want to run the full observability stack. Make OpenTelemetry export conditional:

pub fn init_telemetry() -> Option<TelemetryGuard> {
    let env_filter = EnvFilter::try_from_default_env()
        .unwrap_or_else(|_| EnvFilter::new("info,tower_http=debug"));

    let otel_enabled = std::env::var("OTEL_EXPORTER_OTLP_ENDPOINT").is_ok();

    if otel_enabled {
        let tracer_provider = init_tracer_provider();
        let logger_provider = init_logger_provider();

        let tracer = tracer_provider.tracer("my-app");
        let otel_trace_layer = tracing_opentelemetry::layer().with_tracer(tracer);
        let otel_logs_layer = OpenTelemetryTracingBridge::new(&logger_provider);

        tracing_subscriber::registry()
            .with(env_filter)
            .with(tracing_subscriber::fmt::layer().json().with_target(true))
            .with(otel_trace_layer)
            .with(otel_logs_layer)
            .init();

        Some(TelemetryGuard { tracer_provider, logger_provider })
    } else {
        tracing_subscriber::registry()
            .with(env_filter)
            .with(tracing_subscriber::fmt::layer().with_target(true))
            .init();

        None
    }
}

When OTEL_EXPORTER_OTLP_ENDPOINT is not set, the subscriber falls back to human-readable stdout logging with no OTel export. Set the variable to enable export:

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

The observability stack

The receiving end is a self-hosted Grafana stack: Loki for logs, Tempo for traces, Prometheus for metrics, and Grafana for dashboards. An OpenTelemetry Collector sits between the application and these backends, receiving OTLP data and routing it to the correct destination.

┌──────────┐    OTLP     ┌───────────────┐
│ Rust App │───────────── │  OTel         │──── Loki (logs)
│          │   gRPC/HTTP  │  Collector    │──── Tempo (traces)
└──────────┘              └───────────────┘──── Prometheus (metrics)
                                                    │
                                              ┌─────┘
                                              ▼
                                          Grafana (dashboards)

Local development

For development, Grafana provides an all-in-one Docker image that bundles the OTel Collector, Grafana, Loki, Tempo, and Prometheus in a single container. Add it to your Docker Compose file alongside the other backing services:

services:
  lgtm:
    image: grafana/otel-lgtm:latest
    ports:
      - "3000:3000"   # Grafana UI
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP HTTP receiver

Start the container and set the OTLP endpoint in your .env:

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

Open http://localhost:3000 to access Grafana. All data sources (Loki, Tempo, Prometheus) are pre-configured. No additional setup is needed.

Production

In production, run each component as a separate container: Grafana, Loki, Tempo, Prometheus, and the OpenTelemetry Collector. The Grafana documentation covers configuration for each component. The OTel Collector documentation covers the YAML pipeline configuration for routing OTLP data to each backend.

The key configuration is the collector’s pipeline, which routes each signal type to its destination:

# otel-collector.yaml (abbreviated)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

exporters:
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write

service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [otlphttp/loki]
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]

Loki 3.x accepts OTLP logs natively at its /otlp endpoint. Tempo accepts OTLP traces over gRPC on port 4317. Prometheus 3.x accepts remote writes with the --web.enable-remote-write-receiver flag. Use the otel/opentelemetry-collector-contrib Docker image, which includes all the required exporters.

Metrics with Prometheus

The metrics crate provides a facade for recording application metrics. metrics-exporter-prometheus exposes those metrics in Prometheus text format at a /metrics endpoint that Prometheus scrapes.

Setup

use metrics_exporter_prometheus::{Formatter, PrometheusBuilder, PrometheusHandle};

pub fn init_metrics() -> PrometheusHandle {
    PrometheusBuilder::new()
        .install_recorder()
        .expect("failed to install Prometheus recorder")
}

install_recorder() registers the Prometheus exporter as the global metrics recorder and returns a handle. Use the handle to render the metrics output in an Axum handler:

use axum::{routing::get, Router, Extension};

async fn metrics_handler(
    Extension(handle): Extension<PrometheusHandle>,
) -> String {
    handle.render()
}

let metrics_handle = init_metrics();

let app = Router::new()
    .route("/", get(index))
    .route("/metrics", get(metrics_handler))
    .layer(Extension(metrics_handle));

Prometheus scrapes http://your-app:3000/metrics on a configured interval (typically 15 seconds) and stores the time series data.

Recording metrics

The metrics crate provides three metric types:

use metrics::{counter, gauge, histogram};

// Counter: monotonically increasing (requests served, errors encountered)
counter!("http_requests_total", "method" => "GET", "path" => "/users").increment(1);

// Gauge: value that goes up and down (active connections, queue depth)
gauge!("active_connections").set(42.0);

// Histogram: distribution of values (request latency, response size)
histogram!("http_request_duration_seconds").record(0.035);

Labels (the "method" => "GET" pairs) create separate time series for each label combination. Use labels to break down metrics by dimensions you need to filter or group by in dashboards.

Request metrics middleware

Record HTTP request count and duration for every request with Tower middleware:

use axum::{extract::MatchedPath, middleware, http::Request, response::Response};
use std::time::Instant;

async fn track_metrics<B>(
    matched_path: Option<MatchedPath>,
    request: Request<B>,
    next: middleware::Next<B>,
) -> Response {
    let method = request.method().to_string();
    let path = matched_path
        .map(|p| p.as_str().to_string())
        .unwrap_or_else(|| "unknown".to_string());

    let start = Instant::now();
    let response = next.run(request).await;
    let duration = start.elapsed().as_secs_f64();
    let status = response.status().as_u16().to_string();

    counter!("http_requests_total", "method" => method.clone(), "path" => path.clone(), "status" => status).increment(1);
    histogram!("http_request_duration_seconds", "method" => method, "path" => path).record(duration);

    response
}

let app = Router::new()
    .route("/", get(index))
    .route_layer(middleware::from_fn(track_metrics))
    .route("/metrics", get(metrics_handler));

Use MatchedPath rather than the raw URI for the path label. Raw URIs with path parameters (e.g., /users/42, /users/73) create unbounded label cardinality, which bloats Prometheus storage. MatchedPath returns the route template (/users/:id), keeping cardinality bounded.

The /metrics endpoint is outside the route_layer scope so it does not record metrics about metrics scraping.

What to measure

Start with RED metrics for HTTP services:

  • Rate: http_requests_total (counter, by method/path/status)
  • Errors: filter http_requests_total where status is 5xx
  • Duration: http_request_duration_seconds (histogram, by method/path)

Add application-specific metrics as you identify monitoring needs:

  • db_query_duration_seconds (histogram) for slow query detection
  • background_jobs_total (counter, by job type and outcome)
  • active_sessions (gauge) for capacity planning
  • email_send_total (counter, by outcome: success/failure)

Resist adding metrics for everything up front. Start with RED, observe your application in production, and add metrics when you find yourself asking a question that the existing telemetry cannot answer.

Correlating requests across services

The value of the observability stack multiplies when signals are connected. A log line links to its trace. A metric spike links to the traces that caused it.

Trace IDs in logs

The opentelemetry-appender-tracing bridge automatically attaches the active trace ID and span ID to every log record exported via OTLP. When these logs land in Loki, Grafana can extract the trace ID and create a clickable link to the corresponding trace in Tempo.

Configure this in Grafana’s Loki data source settings by adding a derived field that matches the trace ID and links to Tempo. In the grafana/otel-lgtm development image, this correlation is pre-configured.

Propagation across services

If your application calls other services (via reqwest or similar), propagate the trace context so spans across services form a single trace. Inject the W3C traceparent header into outgoing requests:

use opentelemetry::global;
use opentelemetry::propagation::Injector;
use reqwest::header::HeaderMap;

struct HeaderInjector<'a>(&'a mut HeaderMap);

impl<'a> Injector for HeaderInjector<'a> {
    fn set(&mut self, key: &str, value: String) {
        if let Ok(header_name) = key.parse() {
            if let Ok(header_value) = value.parse() {
                self.0.insert(header_name, header_value);
            }
        }
    }
}

pub async fn call_other_service(client: &reqwest::Client, url: &str) -> reqwest::Result<String> {
    let mut headers = HeaderMap::new();
    global::get_text_map_propagator(|propagator| {
        propagator.inject(&mut HeaderInjector(&mut headers));
    });

    client
        .get(url)
        .headers(headers)
        .send()
        .await?
        .text()
        .await
}

Register the W3C propagator at startup, before initialising the subscriber:

use opentelemetry::global;
use opentelemetry_sdk::propagation::TraceContextPropagator;

global::set_text_map_propagator(TraceContextPropagator::new());

For a single-service application (which most projects in this guide will be), propagation is not needed. Add it when you split into multiple services and want end-to-end traces.

Environment variables

The OpenTelemetry SDK reads standard environment variables. Set these in production:

# Collector endpoint
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317

# Service identification
OTEL_SERVICE_NAME=my-app
OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production

# Log level filtering
RUST_LOG=info,tower_http=debug

# Prometheus scrape (configure in prometheus.yml, not as an env var)

OTEL_SERVICE_NAME overrides the service.name set in code. OTEL_RESOURCE_ATTRIBUTES adds arbitrary key-value pairs to the OTel resource (useful for environment, version, or region labels).

Gotchas

Shutdown order matters. The TelemetryGuard must outlive the Axum server. If the guard drops before in-flight requests complete, those requests’ spans and logs are lost. Structure main so the guard is declared before the server starts and drops after shutdown completes.

EnvFilter is applied once. The filter determines which spans and events reach any layer. If you filter at info level, the OTel layers will not receive debug spans either. For production, info is typically appropriate. Avoid trace or debug in production unless you are actively debugging, as the volume of OTel data grows rapidly.

gRPC vs HTTP for OTLP. The grpc-tonic feature adds Tonic as a dependency, which pulls in prost, hyper, and h2. If binary size or compile time is a concern, use the http-proto feature instead, which sends OTLP over HTTP/protobuf using reqwest. Both are functionally equivalent.

Prometheus label cardinality. Every unique combination of label values creates a separate time series in Prometheus. High-cardinality labels (user IDs, request IDs, raw URLs) cause storage bloat and query slowness. Keep labels to bounded sets: HTTP methods, route templates, status code classes, job types.

The metrics crate is a facade. Like log for logging, metrics defines the recording API but not the backend. If you forget to call PrometheusBuilder::new().install_recorder(), metric calls silently do nothing. Initialise the recorder early in startup.

OpenTelemetry crate versions move together. The opentelemetry, opentelemetry_sdk, and opentelemetry-otlp crates share a version number and must match. tracing-opentelemetry is one minor version ahead (0.32.x works with opentelemetry 0.31.x). Check compatibility when upgrading.

Operations

Testing

Rust’s type system catches many errors at compile time, but it does not verify business logic, database queries, or HTML output. Tests fill that gap. This section covers unit tests for domain logic and Maud components, integration tests against real databases and services using testcontainers, and testing Axum handlers with the axum_test crate.

Unit tests for domain logic

Standard #[test] functions in the same file as the code under test. Rust’s built-in test framework needs no external dependencies for pure logic.

pub fn slugify(input: &str) -> String {
    input
        .to_lowercase()
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { '-' })
        .collect::<String>()
        .trim_matches('-')
        .to_string()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn slugify_converts_spaces_and_lowercases() {
        assert_eq!(slugify("Hello World"), "hello-world");
    }

    #[test]
    fn slugify_strips_special_characters() {
        assert_eq!(slugify("Rust & Axum!"), "rust---axum-");
    }
}

Place unit tests in a #[cfg(test)] mod tests block at the bottom of the module file. They compile only during cargo test and have access to private items in the parent module.

For async code, use #[tokio::test]:

#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn sends_welcome_email() {
        let result = send_welcome("alice@example.com").await;
        assert!(result.is_ok());
    }
}

Testing Maud components

Maud components are Rust functions that return Markup. Test them by rendering to a string and asserting on the HTML output.

Simple assertions with into_string()

For small components with predictable output, compare the rendered string directly:

use maud::{html, Markup};

pub fn status_badge(active: bool) -> Markup {
    html! {
        @if active {
            span.badge.bg-success { "Active" }
        } @else {
            span.badge.bg-secondary { "Inactive" }
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn badge_renders_active() {
        let html = status_badge(true).into_string();
        assert_eq!(html, r#"<span class="badge bg-success">Active</span>"#);
    }

    #[test]
    fn badge_renders_inactive() {
        let html = status_badge(false).into_string();
        assert_eq!(html, r#"<span class="badge bg-secondary">Inactive</span>"#);
    }
}

into_string() consumes the Markup and returns the inner String. Maud produces deterministic output with no extra whitespace, so exact string comparison works reliably for small components.

Structured assertions with scraper

For larger components or pages where exact string matching is fragile, use the scraper crate to parse the HTML and query it with CSS selectors.

[dev-dependencies]
scraper = "0.23"

Define a few test helpers that the rest of the test suite can reuse:

// tests/support/html.rs
use scraper::{Html, Selector};

pub fn parse(markup: maud::Markup) -> Html {
    Html::parse_fragment(&markup.into_string())
}

pub fn select_one<'a>(doc: &'a Html, css: &str) -> scraper::ElementRef<'a> {
    let sel = Selector::parse(css).unwrap();
    doc.select(&sel)
        .next()
        .unwrap_or_else(|| panic!("no element matching '{css}'"))
}

pub fn select_count(doc: &Html, css: &str) -> usize {
    let sel = Selector::parse(css).unwrap();
    doc.select(&sel).count()
}

pub fn text_of(doc: &Html, css: &str) -> String {
    select_one(doc, css).text().collect()
}

Use the helpers to make structural assertions:

use crate::support::html::{parse, select_one, select_count, text_of};

#[test]
fn user_card_renders_name_and_link() {
    let doc = parse(user_card(42, "Alice"));

    assert_eq!(text_of(&doc, ".user-card h2"), "Alice");

    let link = select_one(&doc, "a.profile-link");
    assert_eq!(link.value().attr("href"), Some("/users/42"));
}

#[test]
fn user_list_renders_all_users() {
    let users = vec![
        User { id: 1, name: "Alice".into() },
        User { id: 2, name: "Bob".into() },
    ];
    let doc = parse(user_list(&users));

    assert_eq!(select_count(&doc, ".user-card"), 2);
}

This approach survives attribute reordering, whitespace changes, and additions to unrelated parts of the template. It breaks only when the structural contract changes, which is what you want.

Verifying HTML escaping

Maud auto-escapes interpolated content. Verify this in tests whenever a component renders user-provided input:

#[test]
fn escapes_user_input() {
    let doc = parse(user_card(1, "<script>alert('xss')</script>"));
    let html = select_one(&doc, ".user-card h2").inner_html();

    assert!(!html.contains("<script>"));
    assert!(html.contains("&lt;script&gt;"));
}

Integration tests with testcontainers

testcontainers starts real Docker containers for PostgreSQL, Redis, Restate, or any other service your application depends on. Each test gets isolated infrastructure that is torn down automatically when the test finishes.

[dev-dependencies]
testcontainers = "0.27"
testcontainers-modules = { version = "0.15", features = ["postgres", "redis"] }

The testcontainers-modules crate re-exports the core testcontainers crate, so you can import from either.

PostgreSQL

use testcontainers_modules::postgres::Postgres;
use testcontainers_modules::testcontainers::runners::AsyncRunner;
use sqlx::PgPool;

async fn start_postgres() -> (testcontainers::ContainerAsync<Postgres>, PgPool) {
    let container = Postgres::default()
        .with_db_name("test_db")
        .with_user("test")
        .with_password("test")
        .start()
        .await
        .expect("failed to start postgres container");

    let host = container.get_host().await.unwrap();
    let port = container.get_host_port_ipv4(5432).await.unwrap();
    let url = format!("postgres://test:test@{host}:{port}/test_db");

    let pool = PgPool::connect(&url).await.expect("failed to connect to test database");
    sqlx::migrate!().run(&pool).await.expect("failed to run migrations");

    (container, pool)
}

The container variable must stay in scope for the duration of the test. When it is dropped, testcontainers stops and removes the Docker container.

sqlx::migrate!() reads migrations from the migrations/ directory (relative to the crate’s Cargo.toml) and applies them to the fresh database. Every test starts with a clean schema.

Redis

use testcontainers_modules::redis::{Redis, REDIS_PORT};
use testcontainers_modules::testcontainers::runners::AsyncRunner;

async fn start_redis() -> (testcontainers::ContainerAsync<Redis>, String) {
    let container = Redis::default()
        .start()
        .await
        .expect("failed to start redis container");

    let host = container.get_host().await.unwrap();
    let port = container.get_host_port_ipv4(REDIS_PORT).await.unwrap();
    let url = format!("redis://{host}:{port}");

    (container, url)
}

Restate

Restate provides its own testcontainers integration through the restate-sdk-testcontainers crate:

[dev-dependencies]
restate-sdk = "0.9"
restate-sdk-testcontainers = "0.9"
use restate_sdk_testcontainers::TestEnvironment;

#[tokio::test]
async fn test_workflow() {
    let env = TestEnvironment::new()
        .start(my_service_endpoint)
        .await;

    let ingress_url = env.ingress_url();
    // Send requests to the Restate ingress...
}

TestEnvironment starts a Restate container, binds your service endpoint to a random port, health-checks it, and registers the endpoint with Restate’s admin API. The ingress_url() gives you the URL for sending requests through Restate.

For services without a dedicated testcontainers module, use GenericImage:

use testcontainers::{GenericImage, runners::AsyncRunner};
use testcontainers::core::{IntoContainerPort, WaitFor};

let container = GenericImage::new("my-service", "1.0")
    .with_exposed_port(8080.tcp())
    .with_wait_for(WaitFor::message_on_stdout("ready"))
    .start()
    .await
    .unwrap();

Testing Axum handlers

The axum_test crate provides an ergonomic test client for Axum applications. It handles request construction, response parsing, and cookie persistence.

[dev-dependencies]
axum-test = "0.22"

Basic setup

Create a function that builds the application Router with its state. This same function serves both production and tests:

#[derive(Clone)]
pub struct AppState {
    pub db: PgPool,
    // redis, config, etc.
}

pub fn app(state: AppState) -> Router {
    Router::new()
        .route("/users", get(list_users).post(create_user))
        .route("/users/{id}", get(get_user))
        .with_state(state)
}

In tests, build the state with a testcontainers-backed pool and pass it to TestServer:

use axum_test::TestServer;

#[tokio::test]
async fn test_list_users() {
    let (_pg, pool) = start_postgres().await;
    let state = AppState { db: pool };
    let server = TestServer::new(app(state)).unwrap();

    let response = server.get("/users").await;

    response.assert_status_ok();
    response.assert_header("content-type", "text/html; charset=utf-8");
}

The _pg binding keeps the PostgreSQL container alive for the duration of the test. Dropping it stops the container.

Testing HTML responses

Combine axum_test with the scraper helpers to assert on HTML structure:

#[tokio::test]
async fn test_user_list_renders_users() {
    let (_pg, pool) = start_postgres().await;
    seed_users(&pool).await;
    let server = TestServer::new(app(AppState { db: pool })).unwrap();

    let response = server.get("/users").await;
    response.assert_status_ok();

    let doc = Html::parse_document(&response.text());
    assert_eq!(select_count(&doc, ".user-card"), 3);
    assert_eq!(text_of(&doc, ".user-card:first-child h2"), "Alice");
}

Testing form submissions

Submit forms with .form() and verify the response:

#[tokio::test]
async fn test_create_user() {
    let (_pg, pool) = start_postgres().await;
    let server = TestServer::new(app(AppState { db: pool.clone() })).unwrap();

    let response = server
        .post("/users")
        .form(&[("name", "Charlie"), ("email", "charlie@example.com")])
        .await;

    response.assert_status_ok();

    // Verify the user was persisted
    let user = sqlx::query!("SELECT name FROM users WHERE email = 'charlie@example.com'")
        .fetch_one(&pool)
        .await
        .unwrap();
    assert_eq!(user.name, "Charlie");
}

Testing htmx requests

htmx adds an HX-Request: true header to every request. Add it in tests to exercise the fragment-returning code path:

#[tokio::test]
async fn test_htmx_search_returns_fragment() {
    let (_pg, pool) = start_postgres().await;
    seed_users(&pool).await;
    let server = TestServer::new(app(AppState { db: pool })).unwrap();

    let response = server
        .get("/users/search")
        .add_query_param("q", "alice")
        .add_header("HX-Request", "true")
        .await;

    response.assert_status_ok();

    let html = response.text();
    // Fragment response: no <html> wrapper, just the search results
    assert!(!html.contains("<!DOCTYPE"));
    assert!(html.contains("Alice"));
}

Cookie and session handling

Enable cookie persistence on the test server for testing authenticated flows:

#[tokio::test]
async fn test_authenticated_page() {
    let (_pg, pool) = start_postgres().await;
    seed_users(&pool).await;
    let server = TestServer::builder()
        .save_cookies()
        .build(app(AppState { db: pool }))
        .unwrap();

    // Log in (sets a session cookie)
    server
        .post("/login")
        .form(&[("email", "alice@example.com"), ("password", "correct-password")])
        .await
        .assert_status_ok();

    // Subsequent requests carry the session cookie
    let response = server.get("/dashboard").await;
    response.assert_status_ok();
    response.assert_text_contains("Welcome, Alice");
}

save_cookies() tells TestServer to persist cookies across requests, simulating a browser session.

Test fixtures and data setup

Seed test data with plain async functions that run after migrations:

async fn seed_users(pool: &PgPool) {
    sqlx::query!(
        "INSERT INTO users (name, email) VALUES ($1, $2), ($3, $4), ($5, $6)",
        "Alice", "alice@example.com",
        "Bob", "bob@example.com",
        "Charlie", "charlie@example.com"
    )
    .execute(pool)
    .await
    .unwrap();
}

Call seed_users(&pool).await at the start of any test that needs data. Each test gets its own database container with its own schema, so fixtures do not conflict between tests.

For larger fixture sets, organise seed functions by domain:

// tests/support/fixtures.rs
pub async fn seed_users(pool: &PgPool) { /* ... */ }
pub async fn seed_projects(pool: &PgPool) { /* ... */ }
pub async fn seed_users_with_projects(pool: &PgPool) {
    seed_users(pool).await;
    seed_projects(pool).await;
}

Shared test setup

Extract container startup and state construction into a helper to avoid repeating the boilerplate in every test:

// tests/support/mod.rs
pub mod fixtures;
pub mod html;

use testcontainers::ContainerAsync;
use testcontainers_modules::postgres::Postgres;

pub struct TestContext {
    pub server: axum_test::TestServer,
    pub pool: PgPool,
    _pg: ContainerAsync<Postgres>,
}

impl TestContext {
    pub async fn new() -> Self {
        let (pg, pool) = start_postgres().await;
        let state = AppState { db: pool.clone() };
        let server = axum_test::TestServer::new(app(state)).unwrap();

        Self { server, pool, _pg: pg }
    }
}

Tests become concise:

#[tokio::test]
async fn test_user_creation() {
    let ctx = TestContext::new().await;
    seed_users(&ctx.pool).await;

    let response = ctx.server.get("/users").await;
    response.assert_status_ok();
}

Running tests in CI

Container requirements

Testcontainers requires Docker. In GitHub Actions, Docker is available on the default ubuntu-latest runner. No special setup is needed.

# .github/workflows/test.yml
name: Test
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: dtolnay/rust-toolchain@stable

      - uses: Swatinem/rust-cache@v2

      - name: Run tests
        run: cargo test --workspace

rust-cache caches compiled dependencies between runs. Without it, every CI run rebuilds the dependency tree from scratch.

SQLx offline mode

SQLx compile-time query checking requires a live database during compilation. In CI, compilation happens before Docker containers start, so the build would fail without offline mode.

Generate the query cache locally (with your database running):

cargo sqlx prepare --workspace

Commit the resulting .sqlx/ directory to version control. When DATABASE_URL is absent at compile time, SQLx uses the cached metadata instead of connecting to a database.

In CI, verify the cache is current:

      - name: Check SQLx cache
        run: cargo sqlx prepare --workspace --check

This fails if any query has changed without regenerating the cache.

Parallel test execution

cargo test runs tests in parallel by default using one thread per test. Testcontainers starts a separate Docker container per test, so parallelism works without shared state conflicts. Each test has its own database.

If container startup becomes a bottleneck (each PostgreSQL container takes 2-3 seconds), reduce parallelism:

cargo test -- --test-threads=4

Or use the TESTCONTAINERS environment variable with reusable-containers feature to share containers across tests in the same binary. This trades isolation for speed.

End-to-end browser testing

For testing the fully rendered application in a real browser, the most mature option is Playwright running via Node.js. Playwright does not care what language your server is written in. Start the Rust application, point Playwright at http://localhost:PORT, and write tests against the rendered HTML.

Rust-native alternatives exist. fantoccini is an async WebDriver client that works with Chrome, Firefox, and Safari via their respective drivers. thirtyfour provides similar WebDriver-based testing with a richer query and waiting API. Both are mature and actively maintained.

For most teams, Playwright provides the best E2E testing experience: auto-waiting, tracing, video recording, and multi-browser support. The trade-off is a Node.js dependency in your test toolchain. If that dependency is unacceptable, fantoccini or thirtyfour are solid pure-Rust options.

E2E tests are a complement to the unit and integration tests described above, not a replacement. Run them separately from cargo test, typically as a dedicated CI step after the application is built and running.

Gotchas

Container startup time. Each PostgreSQL container takes 2-3 seconds to start. For a test suite with dozens of integration tests, this adds up. Consider grouping related assertions into fewer, larger tests rather than many small ones, or using the reusable-containers feature to share containers.

Docker must be running. Testcontainers communicates with the Docker daemon. Tests fail immediately if Docker is not available. In CI, this is handled by the default runner. Locally, ensure Docker Desktop or the Docker daemon is running before running cargo test.

Port mapping. Testcontainers maps container ports to random host ports. Always use container.get_host_port_ipv4(5432) to get the mapped port. Never hardcode port numbers.

Container variable lifetime. The container handle (ContainerAsync<Postgres>) must remain in scope for the test’s duration. If it is dropped, the container stops. A common mistake is discarding the handle:

// Wrong: container is dropped immediately
let pool = {
    let (container, pool) = start_postgres().await;
    pool
}; // container dropped here, database gone

// Right: keep the container alive
let (_container, pool) = start_postgres().await;
// _container lives until end of test

SQLx compile-time checking vs test databases. The query! macros check against DATABASE_URL at compile time. This is your development database, not the testcontainers database (which does not exist yet at compile time). The offline cache (.sqlx/) bridges this gap in CI. Locally, keep your development PostgreSQL running for compilation and let testcontainers handle test databases at runtime.

Continuous Integration and Delivery

Every commit should pass formatting checks, linting, compile-time query validation, and tests before it reaches the main branch. This section covers GitHub Actions workflows for Rust projects, building Docker images with cached dependency layers, and pushing images to a container registry.

GitHub Actions workflow structure

The workflows below use GitHub Actions. Forgejo Actions uses a compatible YAML format, so these workflows transfer to a self-hosted Forgejo instance with minor adjustments (runner labels, service hostnames). See the Forgejo section below for specifics. Keeping workflow files in .github/workflows/ works in both systems.

Split CI into parallel jobs for fast feedback. Formatting and linting fail quickly and cheaply. Tests take longer and need service containers. Running them in separate jobs means a formatting mistake fails in seconds, not after a full build.

# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  CARGO_TERM_COLOR: always

jobs:
  fmt:
    name: Formatting
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          components: rustfmt
      - run: cargo fmt --all -- --check

  clippy:
    name: Clippy
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          components: clippy
      - uses: Swatinem/rust-cache@v2
      - run: cargo clippy --all-targets --all-features -- -D warnings

  test:
    name: Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2
      - name: Run tests
        run: cargo test --workspace

  sqlx-check:
    name: SQLx Cache
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2
      - run: cargo install sqlx-cli --no-default-features --features postgres
      - run: cargo sqlx prepare --workspace --check

Four jobs, each with a single responsibility:

  • fmt checks formatting with rustfmt. No caching needed since it does not compile code.
  • clippy runs the Rust linter with warnings treated as errors (-D warnings). Caching avoids recompiling dependencies on every run.
  • test runs the full test suite. The testing section covers test setup in detail, including testcontainers for PostgreSQL and Redis, axum_test for handler testing, and SQLx offline mode for compile-time query checking without a live database.
  • sqlx-check verifies the .sqlx/ query cache is current. This catches stale query metadata before it reaches production.

Toolchain and caching

dtolnay/rust-toolchain installs the Rust toolchain. Specify the channel via the @ref syntax: @stable, @nightly, or a pinned version like @1.84.0. The components input adds tools like clippy and rustfmt.

Swatinem/rust-cache@v2 caches the ~/.cargo registry and ./target directory between runs. It builds cache keys from Cargo.toml, Cargo.lock, and the Rust compiler version, so a toolchain update or dependency change invalidates the cache automatically. Place it after the toolchain setup but before any build steps.

Without rust-cache, every CI run rebuilds the full dependency tree from scratch. For a typical Axum application with 100+ transitive dependencies, that adds several minutes to every pipeline.

The fmt job skips caching intentionally. cargo fmt --check only parses the source; it does not compile anything.

Service containers for integration tests

If your test suite needs PostgreSQL, Redis, or other services and you are not using testcontainers (which manages containers itself), GitHub Actions can start service containers:

  test:
    name: Test
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:17
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
          POSTGRES_DB: test_db
        ports:
          - 5432:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
      redis:
        image: redis:7
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2
      - name: Run tests
        run: cargo test --workspace
        env:
          DATABASE_URL: postgres://postgres:postgres@localhost:5432/test_db
          REDIS_URL: redis://localhost:6379

The services block starts Docker containers before the job’s steps run. Health checks ensure the service is ready before tests begin. When the job runs directly on the runner (not inside a container), services are accessible at localhost on the mapped port.

The testcontainers approach described in the testing section manages containers per test and does not need the services block. Both approaches work in CI. Testcontainers gives per-test isolation; service containers give a single shared instance.

Building Docker images

Rust compiles to a single static binary. The build image needs the full Rust toolchain; the runtime image only needs the binary. A multi-stage Dockerfile separates these concerns.

Dockerfile with cargo-chef

cargo-chef caches dependency compilation across Docker builds. Without it, changing a single line of application code triggers a full rebuild of every dependency. With it, dependencies only rebuild when Cargo.toml or Cargo.lock change. The speedup is typically 3-5x.

The pattern uses three stages:

# Stage 1: Compute a dependency-only build recipe
FROM lukemathwalker/cargo-chef:latest-rust-1 AS chef
WORKDIR /app

FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

# Stage 2: Build dependencies, then the application
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release --bin myapp

# Stage 3: Runtime
FROM debian:bookworm-slim AS runtime
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*
RUN groupadd -g 1001 appgroup && \
    useradd -u 1001 -g appgroup -m appuser
COPY --from=builder --chown=appuser:appgroup /app/target/release/myapp /usr/local/bin/myapp
USER appuser
ENTRYPOINT ["/usr/local/bin/myapp"]

How it works:

  1. The planner stage scans all Cargo.toml files and Cargo.lock and produces recipe.json, a manifest of your dependency tree with workspace structure.
  2. The builder stage first runs cargo chef cook with the recipe. This downloads and compiles all dependencies. Docker caches this layer. As long as recipe.json has not changed (meaning your dependencies are the same), this step is a cache hit on subsequent builds. Then it copies the full source and compiles the application. Only this final compilation step runs on each code change.
  3. The runtime stage copies the compiled binary into a minimal Debian image. ca-certificates is included for TLS connections. The application runs as a non-root user.

The lukemathwalker/cargo-chef base image bundles cargo-chef with the official Rust image. The latest-rust-1 tag tracks the latest Rust 1.x release.

Runtime image choices

debian:bookworm-slim (~80 MB) provides a good balance of size and debuggability. It includes a shell, basic tools, and glibc. For production troubleshooting, being able to docker exec into a container and run basic commands is worth the extra megabytes.

If image size or attack surface is a hard requirement, consider gcr.io/distroless/cc-debian12 (~20 MB, no shell) or scratch (empty, requires a fully statically linked binary built with musl). For most applications, bookworm-slim is the practical choice.

Pushing to a container registry

After the CI jobs pass, build the Docker image and push it to a registry. The image job depends on the test jobs so that broken code never produces an image.

GitHub Container Registry

  image:
    name: Build and Push Image
    runs-on: ubuntu-latest
    needs: [fmt, clippy, test, sqlx-check]
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4

      - uses: docker/setup-buildx-action@v3

      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - uses: docker/metadata-action@v5
        id: meta
        with:
          images: ghcr.io/${{ github.repository }}
          tags: |
            type=sha
            type=raw,value=latest,enable={{is_default_branch}}

      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

This job runs only on pushes to main (not on pull requests). The GITHUB_TOKEN is automatically available and has sufficient permissions for GHCR when the packages: write permission is set.

The metadata-action generates two tags: a short commit SHA (sha-abc1234) for traceability, and latest for the default branch. The commit SHA tag lets you trace any running container back to the exact commit that produced it.

cache-from: type=gha and cache-to: type=gha,mode=max use the GitHub Actions cache backend for Docker layer caching. Combined with cargo-chef inside the Dockerfile, this means dependency layers are cached both within Docker and across CI runs.

Self-hosted container registry

If you run a self-hosted Forgejo instance or another registry, the workflow is the same pattern with different login credentials:

      - uses: docker/login-action@v3
        with:
          registry: ${{ secrets.REGISTRY_URL }}
          username: ${{ secrets.REGISTRY_USERNAME }}
          password: ${{ secrets.REGISTRY_PASSWORD }}

Replace the images value in the metadata-action to match your registry URL. The rest of the workflow is identical.

Conventional commits

Conventional Commits is a format for commit messages that makes the history machine-readable. Each message starts with a type, an optional scope, and a description:

feat(auth): add password reset flow
fix(search): handle empty query parameter
docs: update deployment instructions
refactor(db): extract connection pool setup
chore(ci): update rust-toolchain to 1.84

The common types are feat (new feature), fix (bug fix), docs, refactor, test, chore, and ci. A ! after the type or scope (e.g. feat!: or feat(api)!:) signals a breaking change.

The value of this convention is consistent, scannable history. When every commit follows the same format, it becomes trivial to see what changed, filter commits by type, or generate changelogs automatically.

Several tools automate workflows around conventional commits. git-cliff generates changelogs from commit history using configurable templates. cocogitto provides commit validation, version bumping, and changelog generation in a single Rust-native binary. Both are actively maintained and integrate with CI pipelines. Choose based on what you actually need, whether that is changelog generation, commit validation, automated version bumps, or all three.

Enforcing the format in CI is optional but straightforward. A validation step on pull requests catches non-conforming commits before they reach the main branch.

Self-hosted CI with Forgejo

Forgejo is a self-hosted Git forge with its own CI system, Forgejo Actions. Workflow files use YAML syntax similar to GitHub Actions and can be placed in .forgejo/workflows/ or .github/workflows/. Many GitHub Actions concepts carry over: triggers, job matrices, service containers, and step syntax.

Forgejo Actions is still marked as experimental. The main differences from GitHub Actions:

  • The default runner image is minimal Debian with Node.js, not the full Ubuntu image GitHub provides. You may need to install additional build tools.
  • Service container networking uses the service label as the hostname (e.g. postgres rather than localhost) since both the job and the service run on the same Docker network.
  • Some GitHub Actions marketplace actions need modification to work with Forgejo.
  • The Forgejo Runner must be installed separately from the Forgejo instance, ideally on a different machine.

For a project that currently uses GitHub Actions and may migrate later, keep workflows in .github/workflows/. Forgejo falls back to that directory when .forgejo/workflows/ does not exist, which means the same workflow files work in both systems with minor adjustments for runner labels and service hostnames.

Gotchas

GitHub Actions cache limit. GitHub enforces a 10 GB total cache limit per repository. Rust builds produce large target directories. If caches are evicted frequently, check that Swatinem/rust-cache is configured with save-if: ${{ github.ref == 'refs/heads/main' }} to avoid filling the cache with every PR branch.

SQLx offline mode is required in CI. SQLx’s compile-time query checking connects to a live database. During CI compilation, no database is running yet. Run cargo sqlx prepare --workspace locally and commit the .sqlx/ directory. The sqlx-check job in the workflow above catches stale metadata.

cargo-chef and workspace layout. cargo chef prepare scans all Cargo.toml files in the workspace. If your workspace structure changes (adding or removing crates), the recipe changes and the dependency cache invalidates. This is correct behaviour. What can catch you off guard: the planner stage copies the entire source tree. If you have large non-Rust files (assets, data), consider adding a .dockerignore file to exclude them from the build context.

Docker BuildKit is required for cache mounts. The docker/setup-buildx-action enables BuildKit automatically in GitHub Actions. If building locally, ensure BuildKit is enabled (DOCKER_BUILDKIT=1 or Docker Desktop with BuildKit as default).

Container registry authentication in CI. The GITHUB_TOKEN has sufficient permissions for GHCR if the workflow sets permissions: packages: write. For self-hosted registries, store credentials as repository secrets, never in the workflow file.

Deployment

A single VPS running Docker Compose handles more traffic than most applications will ever see. A compiled Rust binary serving HTML fragments through Axum is fast enough that a 4 GB server comfortably sustains thousands of concurrent users. Start here. Add servers when monitoring shows you need them, not before.

This section covers the full deployment path: cross-compiling the application, packaging it as a Docker image, provisioning infrastructure with Terraform, orchestrating services with Docker Compose, and handling database backups. It starts with a single-server setup and describes how to grow into a multi-server architecture when the time comes.

The architecture progression

Three stages, from simplest to most complex:

Stage 1: Single VPS. One server runs everything: your application, PostgreSQL, Redis, Caddy, and the observability stack. This is the right starting point for most projects. It deploys in minutes and has no distributed-systems complexity.

Stage 2: Separate services. Split into two or three servers: one for your application, one for shared services (PostgreSQL, Redis, Grafana stack), and optionally one for your software forge (Forgejo). Block storage volumes hold persistent data. Tailscale connects the servers privately. This stage suits production workloads that need independent scaling or isolation between the database and the application.

Stage 3: Kubernetes. When Docker Compose on a small number of servers is genuinely constraining you, not before. The scaling strategy section covers the signals.

The Terraform configurations in this section provision stage 2 (separate stacks for volumes, services, and application). Collapsing them onto a single server for stage 1 is straightforward: put everything in one Compose file and skip the networking.

Hetzner Cloud

Hetzner Cloud offers VPS instances at a fraction of the price of AWS or DigitalOcean for equivalent specs. Check Hetzner’s pricing page for current rates.

Server lines

LineCPUUse case
CXShared Intel/AMDGeneral workloads
CAXShared ARM (Ampere Altra)Best price-performance ratio
CPXDedicated AMD EPYCCPU-intensive workloads
CCXDedicated high-memoryDatabases, caching

For a Rust web application, a CX23 or CX33 (4 vCPU / 8 GB) is a strong starting point. The CAX (ARM) line offers better price-performance, but requires ARM64 Docker images, which adds cross-compilation complexity. Stick with x86 (CX line) unless you have a reason to use ARM.

Regions

Hetzner operates data centres in Nuremberg (NBG1), Falkenstein (FSN1), Helsinki (HEL1), Ashburn (ASH), Hillsboro (HIL), and Singapore (SIN). EU regions include 20-60 TB of traffic. US and Singapore regions cost more and include less traffic. Choose the region closest to your users.

Block storage

Hetzner Volumes provide block storage that attaches to a server. Volumes persist independently of the server lifecycle, which is the entire point: you can destroy and recreate a server without losing data.

  • 10 GB minimum, 10 TB maximum
  • A volume can only attach to one server at a time
  • Volume and server must be in the same location
  • Data is stored with triple replication

Use volumes for PostgreSQL data directories, Redis persistence, and any other state that must survive a server rebuild.

Building the application

Cross-compile on your development machine (or in CI), then package the binary into a minimal Docker image. This avoids slow in-Docker Rust compilation entirely.

Cross-compilation with cargo-zigbuild

cargo-zigbuild replaces the system linker with Zig’s cross-compilation toolchain. It produces Linux binaries from macOS (or any host) without Docker, a Linux VM, or a separate cross-compilation toolchain.

Install cargo-zigbuild and Zig:

cargo install --locked cargo-zigbuild
brew install zig    # or: pip3 install ziglang

Add the target and build:

rustup target add x86_64-unknown-linux-gnu
cargo zigbuild --release --target x86_64-unknown-linux-gnu

The binary lands in target/x86_64-unknown-linux-gnu/release/. It is dynamically linked against glibc, which is fine because the runtime Docker image includes glibc.

To pin a specific minimum glibc version (Debian 12 ships glibc 2.36):

cargo zigbuild --release --target x86_64-unknown-linux-gnu.2.36

The glibc version suffix ensures the binary runs on any system with that glibc version or newer. This prevents surprises where a binary compiled against a newer glibc fails on an older host.

The Dockerfile

The Docker image does not compile anything. It copies the pre-built binary into a minimal base image.

FROM gcr.io/distroless/cc-debian12:nonroot
COPY target/x86_64-unknown-linux-gnu/release/myapp /usr/local/bin/myapp
ENTRYPOINT ["/usr/local/bin/myapp"]

distroless/cc-debian12 includes glibc, libgcc, CA certificates, and timezone data. Nothing else: no shell, no package manager, no utilities. The :nonroot tag runs as UID 65534 instead of root.

The resulting image is roughly 30 MB. Builds take seconds because there is no compilation step.

Build and tag the image:

cargo zigbuild --release --target x86_64-unknown-linux-gnu
docker build -t git.example.com/myorg/myapp:v1.0.0 .

Pushing to the registry

This section assumes a running Forgejo instance with its built-in container registry. The registry is enabled by default.

Log in and push:

docker login git.example.com
docker push git.example.com/myorg/myapp:v1.0.0

Authenticate with a Forgejo personal access token that has package:read and package:write scopes. Create one under Settings > Applications in Forgejo.

Infrastructure with Terraform

Terraform provisions the servers, volumes, firewall rules, and networking. The hcloud provider (v1.60+) is officially maintained by Hetzner and covers the full API.

Stack separation

Split infrastructure into three Terraform stacks (separate state files):

  1. Volumes stack: block storage for persistent data. Deploy first, destroy last (or never).
  2. Services stack: VPS for PostgreSQL, Redis, observability. References volumes by data source.
  3. Application stack: VPS for Caddy and the application. Can be destroyed and recreated without affecting data.

This separation protects persistent data. Rebuilding the application server does not touch the database volume. Rebuilding the services server does not touch the volumes themselves.

Shared variables

Each stack needs access to the Hetzner API token and common settings. Create a terraform.tfvars file (git-ignored) in each stack directory:

hcloud_token = "your-hetzner-api-token"
location     = "fsn1"
ssh_key_name = "deploy-key"

Stack 1: Volumes

# stacks/volumes/main.tf
terraform {
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.60"
    }
  }
}

variable "hcloud_token" { sensitive = true }
variable "location" { default = "fsn1" }

provider "hcloud" {
  token = var.hcloud_token
}

resource "hcloud_volume" "postgres" {
  name              = "postgres-data"
  size              = 50
  location          = var.location
  format            = "ext4"
  delete_protection = true

  labels = {
    service = "postgres"
  }
}

resource "hcloud_volume" "redis" {
  name              = "redis-data"
  size              = 10
  location          = var.location
  format            = "ext4"
  delete_protection = true

  labels = {
    service = "redis"
  }
}

resource "hcloud_volume" "grafana" {
  name              = "grafana-data"
  size              = 20
  location          = var.location
  format            = "ext4"
  delete_protection = true

  labels = {
    service = "grafana"
  }
}

output "postgres_volume_id" { value = hcloud_volume.postgres.id }
output "redis_volume_id" { value = hcloud_volume.redis.id }
output "grafana_volume_id" { value = hcloud_volume.grafana.id }

delete_protection = true prevents accidental terraform destroy from deleting the volumes. To remove a protected volume, you must first set delete_protection = false and apply, then destroy. This is intentional friction.

Stack 2: Services VPS

# stacks/services/main.tf
terraform {
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.60"
    }
  }
}

variable "hcloud_token" { sensitive = true }
variable "location" { default = "fsn1" }
variable "ssh_key_name" {}
variable "postgres_volume_id" {}
variable "redis_volume_id" {}
variable "grafana_volume_id" {}

provider "hcloud" {
  token = var.hcloud_token
}

data "hcloud_ssh_key" "deploy" {
  name = var.ssh_key_name
}

resource "hcloud_firewall" "services" {
  name = "services"

  # SSH (lock down to Tailscale once configured)
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "22"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # Tailscale UDP
  rule {
    direction  = "in"
    protocol   = "udp"
    port       = "41641"
    source_ips = ["0.0.0.0/0", "::/0"]
  }
}

resource "hcloud_server" "services" {
  name        = "services-1"
  server_type = "cx33"
  image       = "ubuntu-24.04"
  location    = var.location
  ssh_keys    = [data.hcloud_ssh_key.deploy.id]
  firewall_ids = [hcloud_firewall.services.id]

  user_data = file("${path.module}/cloud-init.yaml")
}

resource "hcloud_volume_attachment" "postgres" {
  volume_id = var.postgres_volume_id
  server_id = hcloud_server.services.id
  automount = true
}

resource "hcloud_volume_attachment" "redis" {
  volume_id = var.redis_volume_id
  server_id = hcloud_server.services.id
  automount = true
}

resource "hcloud_volume_attachment" "grafana" {
  volume_id = var.grafana_volume_id
  server_id = hcloud_server.services.id
  automount = true
}

output "services_ip" { value = hcloud_server.services.ipv4_address }

Pass the volume IDs from stack 1 via terraform.tfvars:

# stacks/services/terraform.tfvars
hcloud_token       = "your-token"
location           = "fsn1"
ssh_key_name       = "deploy-key"
postgres_volume_id = 12345678
redis_volume_id    = 12345679
grafana_volume_id  = 12345680

Stack 3: Application VPS

# stacks/app/main.tf
terraform {
  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.60"
    }
  }
}

variable "hcloud_token" { sensitive = true }
variable "location" { default = "fsn1" }
variable "ssh_key_name" {}
variable "domain" {}

provider "hcloud" {
  token = var.hcloud_token
}

data "hcloud_ssh_key" "deploy" {
  name = var.ssh_key_name
}

resource "hcloud_firewall" "app" {
  name = "app"

  # HTTP
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "80"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # HTTPS
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "443"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # SSH (lock down to Tailscale once configured)
  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "22"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  # Tailscale UDP
  rule {
    direction  = "in"
    protocol   = "udp"
    port       = "41641"
    source_ips = ["0.0.0.0/0", "::/0"]
  }
}

resource "hcloud_server" "app" {
  name        = "app-1"
  server_type = "cx23"
  image       = "ubuntu-24.04"
  location    = var.location
  ssh_keys    = [data.hcloud_ssh_key.deploy.id]
  firewall_ids = [hcloud_firewall.app.id]

  user_data = file("${path.module}/cloud-init.yaml")
}

output "app_ip" { value = hcloud_server.app.ipv4_address }

Cloud-init for server setup

Both VPS stacks use a cloud-init.yaml that installs Docker and Tailscale on first boot:

#cloud-config
package_update: true
package_upgrade: true

packages:
  - curl
  - ca-certificates

runcmd:
  # Install Docker
  - curl -fsSL https://get.docker.com | sh
  - systemctl enable docker
  - systemctl start docker

  # Install Tailscale
  - curl -fsSL https://tailscale.com/install.sh | sh

  # Create application directory
  - mkdir -p /opt/app

After Terraform provisions the server, SSH in and run tailscale up with an auth key to join the tailnet. Subsequent deployments happen over Tailscale.

Tailscale for secure networking

Tailscale builds a WireGuard-based mesh VPN between your servers. Each server gets a stable IP on the 100.x.y.z range. All traffic between servers is encrypted end-to-end. The free tier supports up to 3 users and 100 devices.

Why Tailscale

Without Tailscale, the services VPS must expose PostgreSQL (port 5432), Redis (port 6379), and the Grafana stack to the public internet, even if firewalled to specific IPs. With Tailscale, these services bind only to the Tailscale interface. They are invisible to the public internet entirely.

Tailscale also simplifies SSH access. Once your servers are on the tailnet, you can close port 22 in the Hetzner firewall and SSH over the Tailscale IP instead.

Setting up servers

On each server, install Tailscale (already done via cloud-init) and authenticate:

# Generate an auth key in the Tailscale admin console
# Use a reusable, tagged key: --advertise-tags=tag:server
tailscale up --authkey tskey-auth-xxxxx --advertise-tags=tag:server

Tagged auth keys disable key expiry, so the server stays connected indefinitely. Generate auth keys in the Tailscale admin console.

Verify connectivity:

# From the app server, ping the services server by Tailscale hostname
ping services-1

Tailscale’s MagicDNS assigns each machine a hostname on your tailnet. Use these hostnames in your application’s DATABASE_URL and Redis connection strings instead of public IPs.

Docker sidecar pattern

For services running inside Docker containers, use a Tailscale sidecar container. Other containers share its network stack via network_mode: service::

services:
  tailscale:
    image: tailscale/tailscale:latest
    hostname: app-1
    environment:
      TS_AUTHKEY: ${TS_AUTHKEY}
      TS_EXTRA_ARGS: --advertise-tags=tag:server
      TS_STATE_DIR: /var/lib/tailscale
      TS_USERSPACE: "false"
    volumes:
      - ts-state:/var/lib/tailscale
    devices:
      - /dev/net/tun:/dev/net/tun
    cap_add:
      - net_admin
    restart: unless-stopped

  app:
    image: git.example.com/myorg/myapp:latest
    network_mode: service:tailscale
    depends_on:
      - tailscale

network_mode: service:tailscale makes the app container reachable at the Tailscale IP. The Tailscale state volume (ts-state) preserves the node identity across container restarts.

The sidecar approach is most useful when the application itself needs to be reachable over Tailscale. For the simpler case where the application only connects to services over Tailscale (and the host already has Tailscale installed), the host-level Tailscale installation is sufficient and the sidecar is not needed.

Production Docker Compose

Services VPS

The services VPS runs PostgreSQL, Redis, and the observability stack. Each service stores data on a Hetzner Volume mounted to the host.

# /opt/app/compose.yaml (services VPS)
services:
  postgres:
    image: postgres:17-alpine
    restart: unless-stopped
    ports:
      - "100.x.y.z:5432:5432"
    volumes:
      - /mnt/postgres-data/pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: myapp
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: myapp
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U myapp"]
      interval: 10s
      timeout: 5s
      retries: 5
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    ports:
      - "100.x.y.z:6379:6379"
    volumes:
      - /mnt/redis-data:/data
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

Replace 100.x.y.z with the services server’s Tailscale IP. Binding to the Tailscale IP means PostgreSQL and Redis accept connections only from the tailnet, not from the public internet.

Volume mount paths (/mnt/postgres-data/, /mnt/redis-data/) correspond to where Hetzner automounts the attached volumes. Check the mount points with lsblk or df -h after Terraform provisions the server.

The observability stack (Grafana, Loki, Tempo, Prometheus, OTel Collector) runs on the same VPS. See the observability section for Docker Compose configuration of those services.

Application VPS

The application VPS runs Caddy and the application.

# /opt/app/compose.yaml (app VPS)
services:
  app:
    image: git.example.com/myorg/myapp:${TAG:-latest}
    restart: unless-stopped
    expose:
      - "3000"
    env_file:
      - .env.production
    healthcheck:
      test: ["CMD", "/usr/local/bin/myapp", "healthcheck"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  caddy:
    image: caddy:2-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data
      - caddy-config:/config
    depends_on:
      app:
        condition: service_healthy

volumes:
  caddy-data:
  caddy-config:

The application reads DATABASE_URL, REDIS_URL, and other configuration from .env.production. These connection strings use Tailscale hostnames:

DATABASE_URL=postgres://myapp:password@services-1:5432/myapp
REDIS_URL=redis://:password@services-1:6379
OTEL_EXPORTER_OTLP_ENDPOINT=http://services-1:4317

See the configuration section for the full .env.production setup and how Terraform provisions it.

Health check implementation

The distroless runtime image has no shell, so curl and wget are not available. Implement a health check subcommand in the application binary:

fn main() {
    dotenvy::dotenv().ok();

    if std::env::args().nth(1).as_deref() == Some("healthcheck") {
        match healthcheck() {
            Ok(()) => std::process::exit(0),
            Err(e) => {
                eprintln!("healthcheck failed: {e}");
                std::process::exit(1);
            }
        }
    }

    run();
}

fn healthcheck() -> Result<(), Box<dyn std::error::Error>> {
    let port = std::env::var("PORT").unwrap_or_else(|_| "3000".to_string());
    let url = format!("http://127.0.0.1:{port}/health");
    let resp = ureq::get(&url).call()?;
    if resp.status() == 200 { Ok(()) } else { Err("non-200 status".into()) }
}

Add ureq (a minimal synchronous HTTP client) to the server crate’s dependencies. The health check subcommand calls the application’s /health endpoint and exits with code 0 or 1. Docker uses the exit code to determine container health.

Caddy as reverse proxy

Caddy handles TLS termination and reverse proxying. Its defining feature is automatic HTTPS: point a domain at your server, and Caddy obtains and renews a Let’s Encrypt certificate with zero configuration.

Caddyfile

app.example.com {
    reverse_proxy app:3000
}

That is the entire configuration. Caddy obtains a TLS certificate for app.example.com, terminates HTTPS, and proxies requests to the application container on port 3000.

For multiple services behind different subdomains:

app.example.com {
    reverse_proxy app:3000
}

grafana.example.com {
    reverse_proxy services-1:3000
}

Caddy resolves services-1 via Tailscale’s MagicDNS when the Caddy container shares the host’s Tailscale network (or uses the Tailscale sidecar pattern).

Requirements

Caddy’s automatic HTTPS needs two things:

  1. A DNS A record pointing your domain to the server’s public IP.
  2. Ports 80 and 443 open and routed to Caddy (for the ACME challenge and HTTPS traffic).

Both are handled by the Terraform firewall configuration and your DNS provider. If you manage DNS through Hetzner, the hcloud provider supports DNS zone and record resources (hcloud_zone, hcloud_zone_record).

Why Caddy over Nginx

Nginx requires certbot, cron jobs, and manual renewal configuration for Let’s Encrypt certificates. Caddy does this automatically. For a reverse proxy in front of a Rust application, Caddy’s simpler configuration and automatic certificate management eliminate an entire category of operational work. Nginx’s performance advantage is irrelevant at the traffic levels where you are running one or two VPS instances.

Database deployment and backups

PostgreSQL runs in a Docker container on the services VPS, with its data directory mounted on a Hetzner Volume. The volume provides persistence and triple replication at the storage layer. Backups provide recovery from logical errors (accidental deletes, bad migrations) that replication cannot protect against.

pg_dump with cron

A daily pg_dump via cron is the simplest backup strategy and covers the majority of use cases.

Create the backup script on the services VPS:

#!/usr/bin/env bash
# /opt/app/scripts/backup-db.sh
set -euo pipefail

BACKUP_DIR="/opt/app/backups"
RETENTION_DAYS=14
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
DUMP_FILE="${BACKUP_DIR}/myapp-${TIMESTAMP}.dump"

mkdir -p "${BACKUP_DIR}"

# Dump the database (custom format, compressed)
docker exec postgres pg_dump \
  -U myapp \
  -Fc \
  myapp > "${DUMP_FILE}"

# Remove backups older than retention period
find "${BACKUP_DIR}" -name "myapp-*.dump" -mtime +${RETENTION_DAYS} -delete

echo "Backup complete: ${DUMP_FILE} ($(du -h "${DUMP_FILE}" | cut -f1))"

-Fc produces a custom-format dump: compressed, supports selective restore of individual tables, and works with pg_restore. It is smaller and more flexible than plain SQL dumps.

Scheduling with cron

# Run daily at 02:00 UTC
echo "0 2 * * * root /opt/app/scripts/backup-db.sh >> /var/log/db-backup.log 2>&1" \
  | sudo tee /etc/cron.d/db-backup

Off-site copies

A backup on the same server as the database is not a real backup. Copy dumps to an S3-compatible storage provider. Add this to the backup script after the dump:

# Upload to S3 (Hetzner Object Storage, Backblaze B2, etc.)
S3_BUCKET="s3://myapp-backups"
S3_ENDPOINT="https://fsn1.your-objectstorage.com"

aws s3 cp "${DUMP_FILE}" "${S3_BUCKET}/db/${TIMESTAMP}.dump" \
  --endpoint-url "${S3_ENDPOINT}"

# Remove remote backups older than 30 days
aws s3 ls "${S3_BUCKET}/db/" --endpoint-url "${S3_ENDPOINT}" \
  | awk '{print $4}' \
  | head -n -30 \
  | xargs -I{} aws s3 rm "${S3_BUCKET}/db/{}" --endpoint-url "${S3_ENDPOINT}"

Install the AWS CLI on the services VPS (apt install awscli) and configure it with your S3 credentials. The AWS CLI works with any S3-compatible provider.

Restoring from backup

# Stop the application first to prevent writes during restore
docker exec -i postgres pg_restore \
  -U myapp \
  -d myapp \
  --clean \
  --if-exists \
  < /opt/app/backups/myapp-20260227-020000.dump

--clean --if-exists drops existing objects before restoring, which handles the common case of restoring to an existing database. Test the restore process periodically. An untested backup is not a backup.

When pg_dump is not enough

pg_dump takes a full snapshot every time. For databases larger than a few GB, consider:

  • pgBackRest: supports incremental and differential backups, parallel restore, backup verification, and direct archiving to S3. It is the standard tool for serious PostgreSQL backup infrastructure.
  • WAL archiving with point-in-time recovery: continuous archiving of write-ahead logs enables recovery to any point in time, not just the last backup. This requires more configuration but provides the strongest recovery guarantees.

For most applications in this guide’s scope, daily pg_dump with off-site copies is sufficient.

Deploying updates

The simplest deployment workflow: pull the new image and recreate the container.

# On the app VPS
docker compose pull app
docker compose up -d app

docker compose up -d app recreates only the app service if its image has changed. There is a brief interruption (typically 1-3 seconds) while the old container stops and the new one starts and passes its health check. For most applications, this is acceptable.

Wrap this in a deployment script:

#!/usr/bin/env bash
# deploy.sh
set -euo pipefail

TAG="${1:?Usage: deploy.sh <tag>}"
export TAG

cd /opt/app
docker compose pull app
docker compose up -d app

echo "Deployed ${TAG}"
echo "Waiting for health check..."
docker compose exec app /usr/local/bin/myapp healthcheck
echo "Healthy."

Trigger the script from CI or run it manually over SSH (via Tailscale):

ssh services-1 "cd /opt/app && TAG=v1.2.3 ./deploy.sh v1.2.3"

Zero-downtime with Caddy

If a few seconds of downtime during deployment is not acceptable, use a blue-green approach with Caddy’s admin API. Run two instances of the application (blue and green). Deploy to the inactive one, verify it is healthy, then switch Caddy’s upstream.

Caddy exposes an admin API on localhost:2019 that can update routing without a restart. Switch the upstream from blue to green atomically:

curl -X PATCH http://localhost:2019/config/apps/http/servers/srv0/routes/0/handle/0/upstreams \
  -H "Content-Type: application/json" \
  -d '[{"dial": "green:3000"}]'

This is more complex to set up and maintain. Start with the simple recreate approach and add blue-green when you have a genuine need for zero-downtime deployments.

Scaling strategy

Docker Compose on a single VPS scales further than most people expect. A Rust application serving HTML is fast. PostgreSQL on a dedicated volume with proper indexes handles millions of rows. You will likely hit organisational complexity before you hit server capacity.

When to add servers

  • Database contention: the application and database compete for CPU or memory on the same server. Move the database to the services VPS (stage 2).
  • Backup impact: pg_dump on a busy database affects application latency. A separate services VPS isolates backup I/O.
  • Independent scaling: the application needs more CPU but the database does not (or vice versa). Separate servers let you size each independently.
  • Isolation requirements: security policy requires the database to be on a server with no public internet access.

When to consider Kubernetes

Stay on Docker Compose until you genuinely exhaust what a small number of VPS instances can provide. The signals that Kubernetes might be worth the operational cost:

  • Multiple application instances across servers: you need horizontal scaling beyond what a single server provides, and a load balancer in front of multiple app servers.
  • Auto-scaling: traffic is bursty enough that you need to add and remove capacity automatically.
  • Many services with complex dependencies: once you pass 10-20 containers with interdependent deployment ordering, Docker Compose files become fragile.
  • Multiple teams deploying independently: Kubernetes namespaces and RBAC provide isolation that Docker Compose does not.

If you reach this point, k3s is a lightweight Kubernetes distribution that runs on a single node or a small cluster. It installs in under a minute, uses roughly 500 MB of RAM, and provides full Kubernetes API compatibility. k3s on two or three Hetzner CX33 instances is a reasonable stepping stone before full managed Kubernetes.

Do not adopt Kubernetes because it seems like the professional choice. The operational complexity is real. For most Rust web applications, Docker Compose on one to three servers is the right answer for years.

Gotchas

Mount the Hetzner Volume before starting containers. If a volume is not mounted when Docker Compose starts, containers write to the server’s local disk. When the volume is later mounted, the data on local disk is hidden. Verify mount points with df -h before the first docker compose up.

Pin Docker image tags in production. Use myapp:v1.2.3, not myapp:latest. With latest, docker compose pull fetches whatever was most recently pushed, which makes deployments unpredictable and rollbacks impossible.

Caddy’s data volume is important. The caddy-data volume stores TLS certificates and ACME account keys. Losing this volume means Caddy must re-issue all certificates, which can hit Let’s Encrypt rate limits (50 certificates per registered domain per week). Keep the Caddy data volume on persistent storage.

Tailscale auth keys expire. Pre-authentication keys have a maximum lifetime of 90 days. Once a server joins the tailnet, the key is no longer needed, so this only matters for new server provisioning. Use tagged auth keys (--advertise-tags=tag:server) to disable node key expiry on the device itself.

Docker logging fills disks. Without max-size and max-file on the json-file logging driver, container logs grow without bound. Set these on every service in production Compose files.

Test your backups. Periodically restore a backup to a temporary database and verify the data. A backup that cannot be restored is not a backup.

Distroless has no shell. You cannot docker exec -it app bash into a distroless container. For debugging, run a temporary container with a full image (docker run -it --network container:app debian:bookworm-slim bash) that shares the application container’s network namespace. Or add a debug sidecar to the Compose file temporarily.

Web Application Performance

Rust already eliminates the largest performance tax in most web stacks: garbage collection pauses, interpreter overhead, and runtime type-checking. An Axum application serving HTML fragments through Maud starts fast and stays fast under load. The HDA architecture adds a structural advantage: no client-side framework bundle to download and parse, no JSON serialisation layer between server and browser, and smaller payloads because HTML fragments replace full JSON responses plus client-side rendering.

That said, performance work in any stack follows the same rule: measure first, then optimise what the measurements show. Adding caching, compression layers, or index hints without evidence of a real problem creates complexity that must be maintained, debugged, and reasoned about. Every technique in this section is worth knowing. None of them should be applied preemptively.

HTTP caching headers

HTTP caching is the highest-leverage performance tool available. A response that never reaches your server costs nothing to serve.

Cache-Control for dynamic responses

Set Cache-Control headers in Axum handlers using tuple responses:

use axum::{
    http::header,
    response::IntoResponse,
};
use maud::{html, Markup};

async fn product_page(/* ... */) -> impl IntoResponse {
    let markup: Markup = html! { /* ... */ };

    (
        [(header::CACHE_CONTROL, "public, max-age=300")],
        markup,
    )
}

For pages that must revalidate on every request but can still benefit from conditional caching:

(
    [(header::CACHE_CONTROL, "no-cache")],
    markup,
)

no-cache does not mean “don’t cache.” It means “cache, but revalidate with the server before using.” Combined with an ETag, the server can respond with 304 Not Modified and skip sending the body entirely.

ETags and conditional responses

An ETag is a fingerprint of the response content. When the browser sends the ETag back in an If-None-Match header, the server can return a 304 Not Modified if the content has not changed, saving bandwidth and rendering time.

use axum::{
    body::Body,
    extract::Request,
    http::{header, Response, StatusCode},
    response::IntoResponse,
};
use std::hash::{DefaultHasher, Hash, Hasher};

fn compute_etag(content: &str) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("\"{}\"", format!("{:x}", hasher.finish()))
}

async fn cacheable_page(req: Request) -> Response<Body> {
    let html = render_page();
    let etag = compute_etag(&html);

    // Check If-None-Match from the browser
    if let Some(if_none_match) = req.headers().get(header::IF_NONE_MATCH) {
        if if_none_match.to_str().ok() == Some(etag.as_str()) {
            return Response::builder()
                .status(StatusCode::NOT_MODIFIED)
                .header(header::ETAG, &etag)
                .header(header::CACHE_CONTROL, "public, max-age=60")
                .body(Body::empty())
                .unwrap();
        }
    }

    Response::builder()
        .status(StatusCode::OK)
        .header(header::ETAG, &etag)
        .header(header::CACHE_CONTROL, "public, max-age=60")
        .header(header::CONTENT_TYPE, "text/html; charset=utf-8")
        .body(Body::from(html))
        .unwrap()
}

For pages where generating the full HTML is itself expensive, ETags are less useful because you must still render the content to compute the hash. In those cases, consider using a version number, last-modified timestamp from the database, or a cache layer (covered below).

Blanket headers with tower-http

Apply Cache-Control to groups of routes using SetResponseHeaderLayer. This sets the header only if the handler did not already set one, so individual handlers can override the default.

use axum::{routing::get, Router};
use tower_http::set_header::SetResponseHeaderLayer;
use http::{header, HeaderValue};

let app = Router::new()
    .route("/products", get(list_products))
    .route("/products/{id}", get(product_page))
    .layer(
        SetResponseHeaderLayer::if_not_present(
            header::CACHE_CONTROL,
            HeaderValue::from_static("public, max-age=60"),
        )
    );
[dependencies]
tower-http = { version = "0.6", features = ["set-header"] }

Caching guidelines by content type

Content typeCache-ControlRationale
Static assets (CSS, JS, images)public, max-age=31536000, immutableContent-hashed filenames mean the URL changes when the file changes. Cache forever.
Public pages (product listing, homepage)public, max-age=60 to max-age=300Short TTL allows quick updates. Adjust based on how frequently content changes.
Personalised pages (dashboard, profile)private, no-cacheMust not be cached by shared proxies. Revalidate on every request.
HTMX fragmentsno-store or no-cacheFragments usually reflect current state. no-store prevents any caching. no-cache allows ETag-based revalidation.
API responses (JSON, if you have them)private, no-cache or short max-ageDepends on the data. Default to conservative.

Static asset caching with content-hashed filenames is covered in the CSS section. The immutable directive tells browsers not to revalidate even when the user reloads the page, eliminating conditional requests entirely for fingerprinted assets.

Response compression

Compressing responses reduces bandwidth and improves page load times, particularly on slower connections. tower-http provides a compression middleware that negotiates the algorithm from the client’s Accept-Encoding header.

[dependencies]
tower-http = { version = "0.6", features = ["compression-gzip", "compression-br"] }
use axum::{routing::get, Router};
use tower_http::compression::CompressionLayer;

let app = Router::new()
    .route("/", get(index))
    .layer(CompressionLayer::new());

CompressionLayer::new() enables all compiled-in algorithms and negotiates automatically. By default it skips images, gRPC responses, Server-Sent Events, and responses smaller than 32 bytes.

Choosing algorithms

Enable gzip and Brotli for broad browser support. Zstandard (zstd) compresses faster than both but lacks Safari support as of early 2026.

let compression = CompressionLayer::new()
    .gzip(true)
    .br(true)
    .zstd(false); // Omit until Safari support lands

Tuning the compression predicate

The default predicate is conservative. Raise the minimum response size to avoid compressing tiny responses where the overhead exceeds the savings:

use tower_http::compression::{
    CompressionLayer,
    predicate::{NotForContentType, SizeAbove, Predicate},
};

let predicate = SizeAbove::new(256)
    .and(NotForContentType::IMAGES)
    .and(NotForContentType::SSE)
    .and(NotForContentType::GRPC);

let compression = CompressionLayer::new().compress_when(predicate);

Compression level

For dynamic HTML responses, CompressionLevel::Default balances compression ratio against CPU cost. Avoid CompressionLevel::Best for dynamic content; the marginal size reduction does not justify the CPU cost per request.

use tower_http::compression::CompressionLevel;

let compression = CompressionLayer::new()
    .quality(CompressionLevel::Default);

For static assets, pre-compress at build time rather than compressing on every request. tower-http’s ServeDir supports pre-compressed files:

use tower_http::services::ServeDir;

let static_files = ServeDir::new("static")
    .precompressed_br()
    .precompressed_gzip();

Place pre-compressed files alongside the originals (app.css.br, app.css.gz). ServeDir selects the correct variant based on the request’s Accept-Encoding.

Database query performance

PostgreSQL with proper indexes handles far more traffic than most developers expect. Before reaching for caching or read replicas, check that queries are efficient.

EXPLAIN ANALYZE

EXPLAIN ANALYZE executes a query and reports the actual execution plan, timing, and row counts.

EXPLAIN (ANALYZE, BUFFERS)
SELECT u.id, u.name, count(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name;

Read the plan from the innermost (most indented) nodes outward. Look for:

  • Seq Scan on large tables. A sequential scan on a table with thousands of rows, where the query selects a small fraction, signals a missing index.
  • Estimated vs actual row divergence. When the planner estimates 10 rows but actual is 50,000, it picks a bad join strategy. Run ANALYZE tablename to update statistics.
  • Nested Loop with high loops count. Multiply actual time by loops to get the real elapsed time. A node showing actual time=0.05ms loops=10000 is really consuming 500ms.
  • Sort on disk. If a Sort node reports Sort Method: external merge Disk, increase work_mem or reduce the data being sorted.

BUFFERS is particularly useful: shared hit shows pages read from the PostgreSQL buffer cache, shared read shows pages fetched from disk. High read values on frequently-executed queries mean the working set exceeds shared_buffers.

Indexing strategies

B-tree (the default) covers equality, range, and sorting queries. Place equality columns before range columns in multicolumn indexes:

-- Speeds up WHERE tenant_id = $1 AND created_at > $2
CREATE INDEX idx_orders_tenant_created ON orders (tenant_id, created_at);

Partial indexes cover only rows matching a condition. Effective for queue-like patterns:

-- Only index unprocessed jobs
CREATE INDEX idx_jobs_pending ON jobs (priority, created_at)
    WHERE status = 'pending';

Covering indexes (INCLUDE) store extra columns in the index to enable index-only scans:

-- SELECT email, name FROM users WHERE email = $1
-- Both columns served from the index, no heap lookup
CREATE INDEX idx_users_email_covering ON users (email) INCLUDE (name);

GIN indexes handle JSONB, arrays, and full-text search vectors. They are larger and slower to update than B-tree but fast for containment lookups.

Do not create indexes speculatively. Every index slows down writes. Add indexes when EXPLAIN ANALYZE shows a problem. Use CREATE INDEX CONCURRENTLY on production tables to avoid locking writes during creation.

Monitor index usage

Find unused indexes that are slowing down writes for no benefit:

SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

The N+1 problem

One query to fetch a list, then one query per item to fetch related data. The most common performance problem in web applications.

// 1 query: fetch all orders
let orders = sqlx::query_as!(Order, "SELECT * FROM orders LIMIT 50")
    .fetch_all(&pool).await?;

// 50 queries: fetch user for each order
for order in &orders {
    let user = sqlx::query_as!(User,
        "SELECT name FROM users WHERE id = $1", order.user_id)
        .fetch_one(&pool).await?;
}

Fix with a JOIN:

let results = sqlx::query_as!(
    OrderWithUser,
    r#"SELECT o.id as order_id, o.total, u.name as user_name
       FROM orders o
       JOIN users u ON u.id = o.user_id
       LIMIT 50"#
)
.fetch_all(&pool).await?;

Or batch-load with ANY when JOINs are not practical:

let user_ids: Vec<i32> = orders.iter().map(|o| o.user_id).collect();
let users = sqlx::query_as!(User,
    "SELECT id, name FROM users WHERE id = ANY($1)", &user_ids)
    .fetch_all(&pool).await?;

Connection pool tuning

SQLx’s default pool settings are conservative. Tune them for web workloads:

use sqlx::postgres::PgPoolOptions;
use std::time::Duration;

let pool = PgPoolOptions::new()
    .max_connections(20)
    .min_connections(2)
    .acquire_timeout(Duration::from_secs(5))
    .idle_timeout(Duration::from_secs(300))
    .max_lifetime(Duration::from_secs(1800))
    .connect(&database_url)
    .await?;

Key adjustments:

  • max_connections: set based on PostgreSQL’s max_connections divided by the number of application instances. With PostgreSQL’s default of 100 and four app instances, 20 per instance leaves headroom for admin connections and migrations.
  • min_connections: set to 2-5 to avoid cold-start latency. The default (0) means the first requests after an idle period wait for TCP handshake, TLS negotiation, and authentication.
  • acquire_timeout: the default (30 seconds) is far too long for a web request. Set to 3-5 seconds. Fail fast with a 503 rather than making the user wait.

Keeping transactions short

Do not hold database connections open during external HTTP calls or other non-database I/O:

// Bad: holds a connection for the entire duration of the HTTP call
let mut tx = pool.begin().await?;
sqlx::query!("UPDATE orders SET status = 'processing' WHERE id = $1", id)
    .execute(&mut *tx).await?;
let result = reqwest::get("https://payment-api.example.com/charge").await?;
sqlx::query!("UPDATE orders SET status = $1 WHERE id = $2", result.status, id)
    .execute(&mut *tx).await?;
tx.commit().await?;

Under load, this drains the pool. Restructure to minimise the transaction scope, or use Restate for operations that need durable execution across external calls.

pg_stat_statements

Enable pg_stat_statements to identify the queries consuming the most cumulative database time. This is the production equivalent of EXPLAIN ANALYZE for individual queries.

In Docker Compose for development:

services:
  postgres:
    image: postgres:17
    command: >
      postgres
        -c shared_preload_libraries=pg_stat_statements
        -c pg_stat_statements.track=all
        -c track_io_timing=on

Then create the extension:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Find the queries consuming the most database time:

SELECT
    substring(query, 1, 200) AS query_preview,
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 2) AS mean_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

A query with a 2ms mean but 5 million daily calls contributes more total load than a 500ms query called 100 times. Optimise by total time, not mean time.

Redis as a caching layer

Redis adds a shared, network-accessible cache that works across multiple application instances. It also adds an infrastructure dependency, a consistency problem (stale data), and invalidation complexity. Only introduce it when you have measured a real bottleneck that cannot be solved with database optimisation or HTTP caching.

If you are running a single application instance and want in-process caching, consider moka first. It provides an async-aware concurrent cache with TTL support, avoids the network hop, and requires no additional infrastructure.

Setup

The application’s Redis pub/sub setup already includes a ConnectionManager. Reuse it for caching:

use redis::aio::ConnectionManager;

#[derive(Clone)]
struct AppState {
    db: sqlx::PgPool,
    redis: ConnectionManager,
}

ConnectionManager wraps a MultiplexedConnection with automatic reconnection. It is cheap to clone and safe to share across handlers.

Cache-aside pattern

The application checks the cache first. On a miss, it queries the database, stores the result, and returns it.

use redis::AsyncCommands;
use serde::{de::DeserializeOwned, Serialize};
use std::future::Future;

async fn cache_aside<T, F, Fut>(
    redis: &mut ConnectionManager,
    key: &str,
    ttl_seconds: u64,
    fetch_fn: F,
) -> anyhow::Result<T>
where
    T: Serialize + DeserializeOwned,
    F: FnOnce() -> Fut,
    Fut: Future<Output = anyhow::Result<T>>,
{
    // Check cache
    let cached: Option<String> = redis.get(key).await?;
    if let Some(json) = cached {
        return Ok(serde_json::from_str(&json)?);
    }

    // Cache miss: fetch from source
    let value = fetch_fn().await?;

    // Store with TTL
    let json = serde_json::to_string(&value)?;
    redis.set_ex(key, &json, ttl_seconds).await?;

    Ok(value)
}

Use it in a handler:

async fn get_product(
    State(state): State<AppState>,
    Path(id): Path<i64>,
) -> Result<impl IntoResponse, AppError> {
    let mut redis = state.redis.clone();

    let product = cache_aside(
        &mut redis,
        &format!("product:{id}"),
        300, // 5 minutes
        || async {
            sqlx::query_as!(Product, "SELECT * FROM products WHERE id = $1", id)
                .fetch_one(&state.db)
                .await
                .map_err(Into::into)
        },
    ).await?;

    Ok(render_product(&product))
}

Cache invalidation

Invalidation is the hard part. Two practical strategies:

TTL-based expiration. Set a time-to-live and accept that data may be stale for up to that duration. Simple, self-healing, no coordination needed. Choose the TTL based on how stale the data can acceptably be.

Explicit invalidation on write. Delete the cache entry when the underlying data changes:

async fn update_product(
    State(state): State<AppState>,
    Path(id): Path<i64>,
    Form(input): Form<ProductInput>,
) -> Result<impl IntoResponse, AppError> {
    sqlx::query!("UPDATE products SET name = $1, price = $2 WHERE id = $3",
        input.name, input.price, id)
        .execute(&state.db).await?;

    // Invalidate the cache entry
    let mut redis = state.redis.clone();
    let _: () = redis.del(format!("product:{id}")).await?;

    Ok(Redirect::to(&format!("/products/{id}")))
}

The practical approach is both: set a TTL as a safety net, and explicitly invalidate on known write paths. The TTL catches any invalidation you missed.

The problem compounds with list queries. When you update a product, which cached list queries include it? Unless you can answer that precisely, you end up invalidating aggressively (clearing all product-related caches on any product write) or accepting staleness. Neither is free.

Graceful degradation

Never let a Redis failure break a request. Fall back to the database:

match redis.get::<_, Option<String>>(&cache_key).await {
    Ok(Some(json)) => {
        if let Ok(product) = serde_json::from_str(&json) {
            return Ok(product);
        }
    }
    Ok(None) => {} // Cache miss
    Err(e) => {
        tracing::warn!("Redis error, falling back to database: {e}");
    }
}

// Fetch from database
let product = sqlx::query_as!(Product, "SELECT * FROM products WHERE id = $1", id)
    .fetch_one(&state.db).await?;

Redis anti-patterns

  • Never set keys without a TTL. Unbounded memory growth leads to eviction storms or out-of-memory errors. Always use set_ex.
  • Never use the KEYS command. It blocks the single-threaded Redis server while scanning the entire keyspace. Use SCAN for iteration.
  • Use pipelining for multiple operations. Serial single-operation calls waste round trips:
let mut pipe = redis::pipe();
pipe.get("key1").get("key2").get("key3");
let (v1, v2, v3): (Option<String>, Option<String>, Option<String>) =
    pipe.query_async(&mut redis).await?;

Profiling Rust applications

When the techniques above are not enough, or when you need to identify where time is actually being spent, reach for a profiler. The table below covers the tools that work well with async Rust and Axum.

NeedToolPlatformNotes
CPU hotspotscargo-flamegraphLinux, macOSGenerates interactive SVG flamegraphs. Requires debug = true in release profile. Uses perf on Linux, xctrace on macOS.
CPU hotspots (interactive)samplyLinux, macOSOpens results in Firefox Profiler’s web UI. Better macOS experience than flamegraph.
Heap allocation profilingdhatAllRequires a #[global_allocator] swap and feature flag. View results in the DHAT online viewer.
Async runtime debuggingtokio-consoleAllTerminal UI showing task states, wakeup counts, and poll durations. Requires tokio_unstable cfg flag. Development only.
MicrobenchmarkscriterionAllStatistics-driven benchmarking with regression detection. Supports async with the async_tokio feature.
Per-request latencytower-http TraceLayerAllAlready covered in the observability section. Instrument handlers with #[instrument] for function-level timing.
Memory growth analysisheaptrackLinuxNo code changes needed. Uses LD_PRELOAD to intercept allocations.

The general workflow: start with TraceLayer and #[instrument] spans in the observability section to identify which requests are slow. Use pg_stat_statements and EXPLAIN ANALYZE if the slowness is in the database. Reach for flamegraph or samply when the bottleneck is in application code. Use criterion to benchmark specific functions before and after optimisation.

Practices

Rust Best Practices for Web Development

Rust-specific patterns that come up repeatedly when building web applications with this stack. This section focuses on the Rust angle: ownership patterns in request handlers, async pitfalls, linting configuration, and dependency decisions. Topics that have dedicated sections elsewhere (Tower middleware, database performance, project structure) are summarised here with links to the full treatment.

Ownership and borrowing in web contexts

Shared application state

Axum handlers receive shared application state through the State extractor. Since multiple handlers run concurrently, the state must be wrapped in Arc:

use std::sync::Arc;
use axum::extract::State;

struct AppState {
    db: sqlx::PgPool,
    config: AppConfig,
}

async fn list_contacts(
    State(state): State<Arc<AppState>>,
) -> impl IntoResponse {
    let contacts = sqlx::query_as!(Contact, "SELECT * FROM contacts")
        .fetch_all(&state.db)
        .await?;
    // ...
}

Register the state once when building the router:

let state = Arc::new(AppState { db: pool, config });
let app = Router::new()
    .route("/contacts", get(list_contacts))
    .with_state(state);

Use State for application-wide data (database pools, configuration, service handles). Use Extension only for request-scoped data injected by middleware, such as the authenticated user. State is type-safe at compile time; a missing .with_state() call produces a compiler error. Extension is not: a missing .layer(Extension(...)) compiles but panics at runtime.

Extractors give you owned data

Axum extractors (Path, Query, Form, Json) deserialise into owned types. Handlers receive owned String, Vec<T>, and struct fields. This matches the request lifecycle: each request is independent, so its data is naturally owned by the handler that processes it.

The practical consequence is that you rarely fight the borrow checker inside handlers. Borrowing becomes relevant for intermediate operations within the handler body, not for the handler’s inputs and outputs.

The extractor ordering rule

Extractors that consume the request body (Form, Json, Bytes, String, Multipart) implement FromRequest. Only one body-consuming extractor can appear per handler, and it must be the last parameter. Non-body extractors (Path, Query, State, HeaderMap) implement FromRequestParts and can appear in any order.

// Correct: State and Path before Form (body-consuming)
async fn update_contact(
    State(state): State<Arc<AppState>>,
    Path(id): Path<i64>,
    Form(data): Form<ContactForm>,
) -> impl IntoResponse {
    // ...
}

When you need mutability

Mutable shared state is uncommon in web applications. Database pools handle their own synchronisation internally. Configuration is read-only after startup. If you genuinely need mutable shared state, choose based on whether the lock crosses an .await:

  • No .await while holding the lock: Use std::sync::Mutex (or parking_lot::Mutex). Cheaper and simpler.
  • Must hold across .await: Use tokio::sync::Mutex. This is rare and usually a sign the design can be restructured.
// Short synchronous critical section — std::sync::Mutex is correct
let counter = state.counter.lock().unwrap();
let value = *counter + 1;
drop(counter); // Release before any .await

Holding a std::sync::Mutex guard across .await produces a !Send future that the compiler will reject. This is the compiler protecting you from a deadlock.

If you find yourself reaching for Arc<Mutex<T>> frequently, consider whether a channel-based design or a database-backed approach is more appropriate.

Clone discipline

Developers coming from garbage-collected languages tend to .clone() defensively. In web handlers, most data is already owned, so cloning is less necessary than it first appears.

Clone when you need a second owner. Do not clone to satisfy the borrow checker when restructuring the code would eliminate the need. Arc::clone is cheap (an atomic increment). Cloning a Vec<String> with thousands of elements is not.

For shared read-only data, wrap in Arc rather than cloning. For values that are sometimes borrowed and sometimes owned, use Cow<'a, str>.

Async Rust pitfalls

Axum runs on the Tokio multi-threaded runtime. The runtime spawns one worker thread per CPU core and uses cooperative scheduling: tasks must yield at .await points. Understanding this model prevents the most common async mistakes.

Blocking the runtime

The single most damaging mistake in async Rust web applications. If you block a runtime thread with synchronous work, every other task scheduled on that thread stalls. With 4 worker threads and each request blocking for 100ms, throughput caps at 40 requests per second regardless of how many tasks are queued.

Blocking operations include:

  • Synchronous file I/O (std::fs)
  • CPU-intensive computation (image processing, password hashing, compression)
  • std::thread::sleep
  • Any third-party library call that does not return a future

Wrap blocking work with tokio::task::spawn_blocking:

use tokio::task;

async fn hash_password(password: String) -> Result<String, anyhow::Error> {
    task::spawn_blocking(move || {
        let salt = SaltString::generate(&mut OsRng);
        let hash = Argon2::default()
            .hash_password(password.as_bytes(), &salt)?
            .to_string();
        Ok(hash)
    })
    .await?
}

spawn_blocking moves the work to a dedicated thread pool that does not interfere with the async runtime.

Holding guards across .await

Any RAII guard (mutex lock, database transaction handle, file handle) held across an .await point blocks the resource for the entire time the task is suspended. Other tasks waiting for that resource stall.

// Wrong: lock held across .await
let mut data = state.cache.lock().unwrap();
let result = fetch_from_db(&state.db).await; // Task suspends here, lock held
data.insert(key, result);

// Right: drop the lock before awaiting
let needs_fetch = {
    let data = state.cache.lock().unwrap();
    !data.contains_key(&key)
};
if needs_fetch {
    let result = fetch_from_db(&state.db).await;
    let mut data = state.cache.lock().unwrap();
    data.insert(key, result);
}

Scope guards tightly. Drop them before .await.

Send bounds

Axum handlers must return Send futures because the multi-threaded runtime moves tasks between threads. The most common way to produce a !Send future is holding a !Send type (like an Rc or a std::sync::MutexGuard) across an .await. The compiler error messages for this are notoriously unhelpful, but the fix is almost always: restructure so the !Send value is dropped before the .await.

Task starvation

A long-running CPU-bound loop inside an async task starves all other tasks on that thread. Unlike Go’s goroutines, Rust async tasks do not pre-empt. They must voluntarily yield.

For loops that do significant work per iteration, either move the entire operation to spawn_blocking or insert periodic yields:

for item in large_collection {
    process(item);
    tokio::task::yield_now().await;
}

Cancellation safety

When tokio::select! resolves one branch, all other futures are dropped immediately. If a dropped future was partway through writing data or accumulating state, that work is lost. Use cancel-safe primitives (tokio channels, tokio::time::interval) and keep critical state outside futures that participate in select!.

async fn in traits

Native async fn in traits landed in Rust 1.75 (December 2023). Use it where possible. The limitation: native async trait methods are not dyn-compatible. If you need dyn Trait with async methods, the async-trait crate is still required.

Tower middleware

Tower middleware is covered in detail in the Web Server with Axum section. The key patterns for day-to-day use:

Execution order

Requests flow through layers outside-in; responses return inside-out. With Router::layer(), middleware executes bottom-to-top (the last .layer() call runs first on requests). With tower::ServiceBuilder, middleware executes top-to-bottom. This inversion is a common source of confusion.

Three approaches to custom middleware

  1. axum::middleware::from_fn: Write an async function. No Tower boilerplate. Use this for application-specific concerns (auth checks, request logging, header injection).
  2. Implement Layer + Service: Full Tower machinery. Needed when you must wrap the response body or share middleware across non-Axum services.
  3. tower-http crates: For standard concerns (tracing, compression, CORS, timeouts, request IDs), always use tower-http. Do not reimplement these.

Clone requirement

Services must be Clone because Axum clones them for concurrent request handling. Wrap middleware state in Arc to make cloning cheap.

Dependency management

Choosing when to pull in a crate and when to implement yourself is a recurring decision. The general principle: use crates for complex, security-sensitive, or system-level concerns; implement trivial operations yourself.

Use a crate when

  • Security is involved: Cryptography, TLS, password hashing, authentication. Getting these wrong has real consequences, and the ecosystem crates (argon2, rustls, jsonwebtoken) are well-audited.
  • The problem is complex and well-solved: Serialisation (serde), async runtime (tokio), HTTP (hyper/axum), database access (sqlx). These represent thousands of hours of work and extensive testing.
  • Unsafe code is required: Crates that isolate unsafe behind a safe API (database drivers, system interfaces) are doing work you should not duplicate.

Implement yourself when

  • The operation is trivial: A few lines of string manipulation or data transformation do not justify a dependency. If you need one function from a crate that brings in 30 transitive dependencies, write the function.
  • It is glue code between your types: Conversion traits, domain-specific validation, serialisation adapters between your own types belong in your codebase.
  • The crate is heavier than your need: Check cargo tree to see what a crate pulls in. Prefer lighter alternatives when they exist (futures-lite over futures, ureq over reqwest if you do not need async).

Evaluating crates

Check maintenance status (last release, open issues), transitive dependency count (cargo tree -d), and reverse dependencies on crates.io. Use --no-default-features and enable only what you need. Run cargo-deny in CI to check licenses, known advisories, and duplicate versions. For high-security applications, cargo-vet provides formal dependency auditing.

The rust-deps skill in Claude Code provides crate-specific guidance when you need it.

Performance

Rust web application performance is covered in depth in Web Application Performance. The Rust-specific patterns that matter most:

Where Rust helps automatically

  • No garbage collection pauses: Consistent p99 latency under sustained load. Memory is freed deterministically at scope boundaries.
  • Low per-request overhead: No interpreter, no JIT warmup. The first request is as fast as the millionth.
  • Fast serialisation: Serde significantly outperforms JSON libraries in most other languages.

Where Rust does not help

The database is almost always the bottleneck in CRUD applications. Rust’s speed does not compensate for missing indexes, N+1 query patterns, or an undersised connection pool. Run EXPLAIN ANALYZE before optimising Rust code.

Common Rust-specific performance mistakes

  • Excessive cloning in hot paths. Use Arc for shared read-only data, Cow<'a, str> for borrow-or-own scenarios, and Vec::with_capacity() when the size is known.
  • Blocking the async runtime (see above). A single blocking call degrades throughput for all concurrent requests.
  • Connection pool exhaustion. SQLx pools have a limited number of connections. Under load, handlers queue waiting. Size the pool appropriately (a starting point: 2x CPU cores) and monitor pool wait times.

Profiling tools

  • tokio-console: Real-time async task visualisation. Invaluable for diagnosing task starvation and blocked tasks.
  • cargo flamegraph: CPU profiling with flame graph output.
  • dhat: Heap allocation profiling.

Measure before optimising. The most common performance problems are architectural (blocking the runtime, slow queries, pool sizing) rather than language-level (clone costs, allocation patterns).

Code organisation

Project structure is covered in Project Structure. The conventions relevant to daily coding:

Module organisation within a crate

src/
├── handlers/      # Axum handler functions, one file per resource
├── models/        # Domain types, DTOs
├── db/            # Database access functions
├── errors.rs      # Crate-specific error types
├── lib.rs         # Public API re-exports
└── main.rs        # Binary entry point (if applicable)

Route organisation

Each feature module exposes a routes() function. Compose them with Router::nest():

pub fn routes() -> Router<Arc<AppState>> {
    Router::new()
        .route("/", get(list_contacts).post(create_contact))
        .route("/{id}", get(show_contact).put(update_contact))
        .route("/{id}/delete", post(delete_contact))
}
// In main router assembly
let app = Router::new()
    .nest("/contacts", contacts::routes())
    .nest("/invoices", invoices::routes())
    .with_state(state);

When to split into a new crate

Split when you have a clear boundary: a different domain, a different dependency set, or when compilation time for the crate becomes painful. A single crate with well-organised modules is preferable to many tiny crates with unclear boundaries.

Clippy lints and formatting

Workspace-level lint configuration

Configure lints once in the workspace root Cargo.toml. Each member crate opts in with [lints] workspace = true, giving a single source of truth for the entire project.

# Root Cargo.toml
[workspace.lints.clippy]
# Enable pedantic as warnings — selectively suppress noisy lints per crate
pedantic = { level = "warn", priority = -1 }

# Deny patterns that indicate bugs or incomplete code
unwrap_used = "deny"
expect_used = "deny"
panic = "deny"
dbg_macro = "deny"
todo = "deny"
print_stdout = "deny"
print_stderr = "deny"

# Async safety
await_holding_lock = "deny"
large_futures = "warn"

# Prevent suppressing lints with #[allow(...)]
allow_attributes = "deny"

[workspace.lints.rust]
unsafe_code = "deny"
# Each member crate's Cargo.toml
[lints]
workspace = true

Why deny unwrap, expect, and panic

These three cause the process to abort on failure. In a request handler, that takes down the entire server. Denying them forces proper error handling with ? and explicit error types. This is especially valuable when AI coding agents generate code, as they frequently reach for .unwrap() as the path of least resistance.

The strict deny applies everywhere, including tests and startup. In tests, replace .unwrap() with .expect("test: reason") or return Result from the test function. In main(), use .expect("fatal: reason") with a #[expect(clippy::expect_used)] annotation (see below).

Prefer #[expect] over #[allow]

When you genuinely need to suppress a lint, use #[expect(clippy::lint_name)] instead of #[allow(clippy::lint_name)]. The difference: #[expect] triggers a warning if the lint it suppresses is no longer produced. This means suppression annotations do not silently outlive their usefulness.

// Good: if the unwrap is later removed, the compiler warns that this
// #[expect] is unnecessary, prompting you to clean it up
#[expect(clippy::expect_used, reason = "fatal if database URL is missing")]
fn main() {
    let db_url = std::env::var("DATABASE_URL")
        .expect("DATABASE_URL must be set");
}

// Bad: #[allow] stays forever, silently suppressing the lint
// even after the code it was meant to cover has changed
#[allow(clippy::expect_used)]
fn main() { /* ... */ }

The allow_attributes = "deny" lint in the workspace configuration enforces this. Any #[allow(...)] produces a compiler error, requiring #[expect(...)] instead. This is particularly effective when working with AI coding agents, which tend to silence errors with #[allow] rather than fixing the underlying issue.

Commonly suppressed pedantic lints

The pedantic group is worth enabling but includes lints that produce noise in web application code. Suppress these per crate as needed:

# In a member crate's Cargo.toml if needed
[lints.clippy]
module_name_repetitions = "allow"  # contacts::ContactHandler is natural
must_use_candidate = "allow"       # Noisy for handler return types
missing_errors_doc = "allow"       # Useful for libraries, excessive for app code
missing_panics_doc = "allow"       # Redundant when panics are denied

clippy.toml

Create a clippy.toml at the workspace root for threshold-based configuration:

cognitive-complexity-threshold = 15
type-complexity-threshold = 200
too-many-lines-threshold = 100

rustfmt

Use default rustfmt settings. The value of consistent formatting across the Rust ecosystem outweighs individual preferences. A minimal rustfmt.toml if needed:

edition = "2021"
max_width = 100
use_field_init_shorthand = true

CI integration

Run both in CI to enforce standards:

cargo fmt --all -- --check
cargo clippy --all-targets --all-features -- -D warnings

The -D warnings flag promotes all warnings to errors, so lint violations fail the build.

Gotchas

  • Axum’s #[debug_handler] produces clear compiler errors when handler signatures are wrong. Use it during development, remove it before release.
  • Turbofish syntax (::<Type>) is needed more often than you might expect with SQLx queries. When query_as cannot infer the output type, be explicit: query_as::<_, Contact>(...).
  • Feature flag accumulation: Cargo features are additive and cannot be disabled by dependents. If crate A enables tokio/full and crate B only needs tokio/rt, the workspace gets tokio/full. Audit features with cargo tree -f '{p} {f}'.
  • Compilation times: Rust compilation is slow. Use cargo-watch for incremental rebuilds, the mold linker on Linux (or the default linker on macOS, which is already fast), and split your workspace into crates so only changed code recompiles.
  • Error messages from trait bounds: When Axum rejects a handler, the error often refers to missing trait implementations deep in Tower’s type system. Enable #[debug_handler] first. If the error persists, check that all extractors are in the right order and that the return type implements IntoResponse.
  • impl IntoResponse hides the type: Returning impl IntoResponse from handlers is ergonomic but means you cannot name the return type elsewhere. If you need to store handlers in a collection or return them from a function, use axum::response::Response as the concrete type.

Building with AI Coding Agents

AI coding agents work better with Rust than most developers expect. The compiler provides deterministic, actionable feedback at every step: propose code, compile, read the error, revise. In dynamically typed languages, an agent can produce code that runs but is subtly wrong. In Rust, the borrow checker, type system, and lifetime rules catch entire categories of errors before the code ever executes. This feedback loop turns the agent’s stochastic generation into a convergence process. The strictness that makes Rust harder to learn makes it easier for agents to get right.

That advantage has limits. Agents hallucinate crate names and API surfaces, produce code with security vulnerabilities at high rates, and make architectural decisions that compile but are not what you want. This section covers how to structure your project for effective agent collaboration, what agents handle well, where human judgment is essential, and how to review what they produce.

Structuring your project for AI agents

AI coding agents work from context. The more precise the context, the better the output. Without project-specific instructions, an agent falls back on its training data, which contains code from every version of every crate, every architectural style, and every level of quality. Give it a narrow lane to work in.

The CLAUDE.md file

Place a CLAUDE.md file in your project root. Claude Code reads this file automatically at the start of every session. Other tools have their own conventions (.cursor/rules/*.mdc for Cursor, .github/copilot-instructions.md for GitHub Copilot), but the content principles are the same.

Keep it lean. Target under 200 lines. The file is injected into the agent’s context window alongside its system prompt and your conversation, so every line competes for attention. A 500-line instruction file dilutes the instructions that matter.

Structure the file around three concerns:

What this project is. The tech stack, architectural style, and key crates with versions.

## Stack
- Axum 0.8 (web framework)
- Maud 0.26 (HTML templating)
- SQLx 0.8 (database, compile-time checked queries, Postgres)
- htmx 2.0 (interactivity)
- tower-sessions 0.14 (session management)

How to build and verify. The commands an agent needs to check its own work.

## Commands
- `cargo check` — fast type checking
- `cargo clippy --all-targets -- -D warnings` — lint with warnings as errors
- `cargo test --workspace` — full test suite
- `cargo sqlx prepare --workspace` — regenerate SQLx query cache

What conventions to follow. Error handling patterns, module organisation, naming conventions, anything the agent would get wrong without guidance.

## Conventions
- Error handling: thiserror for library crates, anyhow for application crates
- All handlers return Result<impl IntoResponse, AppError>
- HTML fragments for htmx requests, full pages for normal requests
- British English in user-facing strings

Hierarchical instruction files

For larger projects, place additional CLAUDE.md files in subdirectories. A file in crates/web/CLAUDE.md provides context specific to the web crate without cluttering the root file. Claude Code merges these automatically when working in that directory.

This mirrors how a team onboards a new developer: general project context first, then module-specific conventions as they start working in a particular area.

What to leave out

Do not duplicate what tools already enforce. Formatting rules belong in rustfmt.toml. Lint configuration belongs in clippy.toml or Cargo.toml lint sections. The instruction file covers what the agent cannot infer from tooling configuration.

Avoid task-specific instructions. “When writing a new handler, always add a test” is a good instruction. “Add a handler for /users/{id}/edit that returns an edit form” is a task, not a convention. Tasks belong in your conversation with the agent, not in the instruction file.

Cross-tool compatibility

AGENTS.md is an emerging cross-tool standard backed by the Linux Foundation’s Agentic AI Foundation, with support from Claude Code, Cursor, Copilot, Codex, and others. If your team uses multiple AI tools, an AGENTS.md file provides a single source of project context that all tools read. The content guidance is identical to what is described above for CLAUDE.md.

ToolInstruction fileNotes
Claude CodeCLAUDE.mdHierarchical, nested in subdirectories
Cursor.cursor/rules/*.mdcYAML frontmatter with activation modes
GitHub Copilot.github/copilot-instructions.mdSupported since late 2024
Cross-toolAGENTS.mdLinux Foundation backed, 60k+ projects

Using this guide as agent context

This guide is designed to work as context for AI coding agents. Each section is self-contained, uses explicit file paths and crate names, and avoids implied knowledge that requires reading other sections first.

When starting a task, give the agent the relevant section from this guide alongside your project’s instruction file. If you are implementing authentication, provide the authentication section. If you are setting up deployment, provide the deployment section. The agent gets current, opinionated guidance instead of drawing on its training data, which may reference deprecated APIs or different architectural patterns.

For Claude Code specifically, this works through the instruction file hierarchy. Reference sections by linking to them or by including the key patterns inline in a subdirectory CLAUDE.md:

## Auth patterns
- Session-based auth with tower-sessions and sqlx-store
- See the project guide's authentication section for implementation details
- Password hashing: argon2 crate, never store plaintext
- CSRF: tower-csrf middleware on all state-changing endpoints

The goal is not to paste entire sections into context. It is to give the agent enough anchoring information that it produces code consistent with your chosen patterns rather than inventing its own.

Writing effective prompts for Rust web development

Prompting an AI agent for Rust web development is more constrained than prompting for Python or JavaScript. The type system defines a narrow space of valid programs, and the more of that space you specify upfront, the better the output.

Specify crate versions

The single most impactful habit. Agents draw on training data spanning multiple years of crate releases. Axum 0.7 and 0.8 have different APIs. SQLx 0.7 and 0.8 changed their macro syntax. Stating the version explicitly prevents the agent from generating code for the wrong API surface.

Put versions in your instruction file so you do not repeat them in every prompt. When they appear in context, the agent uses them consistently.

Provide type signatures

Rust’s type system constrains solutions. Providing an explicit function signature gives the agent a precise target:

“Write a handler with this signature: async fn create_user(State(pool): State<PgPool>, Form(payload): Form<CreateUserForm>) -> Result<impl IntoResponse, AppError>. It should insert the user into the users table and redirect to /users/{id}.”

This is more effective than “write a handler that creates a user” because the agent does not need to guess the extractor types, error handling approach, or return type.

Show one example, ask for variations

Instead of describing a pattern from scratch, show the agent one working handler, test, or component and ask for similar ones. The agent matches the style, error handling, and conventions of the example rather than inventing its own. This produces more consistent codebases than generating each piece independently.

State the response format for htmx handlers

Agents default to generating full HTML pages. For htmx-driven applications, most handlers return HTML fragments. Be explicit: “This handler responds to an htmx request and returns an HTML fragment. Do not include <html>, <head>, or <body> tags.”

Provide the database schema

Include the relevant CREATE TABLE statements when asking for database-related code. This prevents the agent from hallucinating column names, types, or relationships. For sqlx::query_as! macros, the agent cannot run compile-time verification itself, so the schema serves as the source of truth.

Use iterative refinement

Ask the agent to review its own output before you accept it. “Review the code you just wrote for non-idiomatic Rust patterns, unnecessary allocations, and missing error cases. Fix any issues you find.” The OpenSSF’s security-focused guide for AI code assistant instructions specifically recommends this recursive self-review pattern over telling the agent it is an expert.

Patterns AI agents handle well

Agents perform best on tasks where the type system constrains the solution space and the pattern is well-represented in training data.

CRUD handlers. Standard create, read, update, delete operations with Axum extractors and SQLx queries. The combination of typed extractors, parameterised queries, and structured return types leaves little room for the agent to go wrong.

Trait implementations. Generating impl blocks for Display, From, Serialize, Deserialize, IntoResponse, and similar traits. The compiler defines the expected shape precisely.

Test scaffolding. Given a function signature and expected behaviour, agents produce solid test structures. Rust’s #[cfg(test)] module pattern and assert! macros are well-represented in training data. Review the assertions for correctness, since a test that always passes proves nothing.

Boilerplate and repetitive code. Migration files, configuration parsing, middleware setup, route registration. These follow established patterns with little variation.

Explaining compiler errors. When you hit a confusing borrow checker or lifetime error, asking the agent to explain it is often faster than searching for the error code. Current models understand Rust’s ownership semantics well enough to give accurate explanations.

Multi-file consistency. Agents that operate across files (Claude Code, Cursor, Windsurf) maintain synchronisation between handler definitions, route registration, and type declarations. This is one area where agents save significant manual coordination effort.

Areas where human judgment is needed

Agents generate code that compiles. Compiling is necessary but not sufficient. These areas require active human judgment, not just verification that the build passes.

Architectural decisions

Agents optimise for completing the immediate task. They do not consider how a piece of code fits into the broader system. An agent will happily put everything in main.rs if you do not specify a module structure. It will create a new database connection pool per request if you do not show it the shared state pattern. Architectural decisions, where to draw module boundaries, how to structure the workspace, when to extract a crate, remain human responsibilities.

Crate selection

Agents hallucinate crate names. They recommend crates that do not exist, suggest deprecated crates, or use the wrong crate for the job. In Shuttle’s 2025 testing of seven AI tools on the same Rust project, hallucinated crate versions were the most consistently reported problem across all tools. Always verify that a suggested crate exists on crates.io and check its maintenance status before adding it to Cargo.toml.

Ownership and lifetime design

Agents can fix individual borrow checker errors, but they sometimes fix them by adding unnecessary .clone() calls or wrapping things in Arc<Mutex<>> when a simpler restructuring would work. The resulting code compiles but carries hidden performance costs and obscures the ownership model. When an agent adds a clone to satisfy the compiler, consider whether the data flow should be restructured instead.

Performance-sensitive code paths

Agents produce code that is functionally correct but not necessarily performant. Hidden allocations (.collect::<Vec<_>>() when streaming would work), blocking calls in async contexts, holding locks across await points, these compile and pass tests but degrade under load. In hot paths, review the generated code for unnecessary allocations and synchronisation overhead.

Error handling granularity

Agents tend toward two extremes: either they use .unwrap() everywhere or they create overly granular error types for every possible failure. Neither is appropriate. Error handling requires judgment about which failures the caller can handle, which should propagate, and which need logging.

Sensitive business logic

Authorisation rules, pricing calculations, data retention policies, anything where a subtle bug has business consequences beyond a 500 error. These require understanding the domain, not just the type system. Use agents to generate the scaffolding, then write the core logic yourself or review it with particular care.

Review practices for AI-generated Rust code

Rust’s toolchain provides a review pipeline that catches more issues automatically than any other mainstream language. Use it.

The automated pipeline

Run these checks on every piece of AI-generated code, in order:

  1. cargo check — Fast type checking without full compilation. Catches type errors, borrow checker violations, and missing trait implementations. If this fails, the code has fundamental problems. Send the error back to the agent.

  2. cargo clippy --all-targets --all-features -- -D warnings — Clippy provides over 600 lints. It catches non-idiomatic patterns, common performance mistakes, and correctness issues. Treat warnings as errors. If clippy flags something, fix it before proceeding.

  3. cargo test --workspace — Run the full test suite. If the agent wrote tests, verify that they actually test meaningful behaviour. A test that asserts true == true passes but proves nothing.

What to look for in human review

After the automated pipeline passes, review the code for issues that tools cannot catch:

Unnecessary clones and allocations. Agents satisfy the borrow checker by adding .clone() where restructuring the data flow would be better. Look for clones of large types, clones inside loops, and String allocations where &str would suffice.

Over-engineering. Agents sometimes introduce unnecessary traits, generic parameters, or abstraction layers for code that does one thing. Three lines of straightforward code is better than a generic trait with one implementor.

Hidden unwrap() calls. Search generated code for .unwrap() and .expect(). In handler code, these cause panics that crash the request (or worse, the server). They should be replaced with proper error propagation using ? and typed errors.

Stale or hallucinated dependencies. Check Cargo.toml changes. Verify that any new crate the agent added actually exists, is maintained, and is the right tool for the job. Check the version number against crates.io.

SQL query correctness. SQLx’s compile-time checking validates syntax and types against the database schema, but it does not verify business logic. A query that returns the wrong rows or updates the wrong records compiles fine. Read the SQL.

Security review

AI-generated code contains security vulnerabilities at rates that warrant systematic review. Veracode’s 2025 study across 100+ LLMs found security flaws in 45% of generated code. The Stanford study found that developers using AI assistants wrote less secure code while believing it was more secure.

Rust’s memory safety eliminates one category of vulnerabilities (buffer overflows, use-after-free, null pointer dereferences), but does nothing for application-level security: SQL injection, XSS, hardcoded secrets, improper access control, information leakage through error messages.

See the web application security section for a thorough treatment of security practices and a review checklist specific to AI-generated code. The short version: never trust that generated code handles user input safely, always verify that secrets come from environment variables, and check that error responses do not leak internal details.

The research-plan-implement workflow

The techniques above address individual prompts and reviews. The broader question is how to organise an entire feature’s worth of AI-assisted work. A pattern that works well: split the work into three phases, each with its own fresh context.

Research. Explore the codebase, identify existing patterns, map the relevant types and modules. Compress the findings into a focused summary. This phase is about understanding what exists before deciding what to change. The agent is good at this: reading files, tracing call chains, summarising structure. The output is a short document, not code.

Plan. Using the research summary as input, produce a detailed execution plan: which files to create or modify, in what order, with what interfaces. Include test criteria and references to specific code locations. Human review at this stage is high leverage. Catching an architectural mistake in a plan costs one line of editing. Catching it after implementation costs a rework cycle.

Implement. Feed the plan and only the necessary source files into a fresh context. Work in chunks, testing between steps. The plan constrains the agent’s decisions, reducing the chance of it inventing its own architecture or drifting from the intended design.

Each phase starts with a clean context window. This matters because of context rot: agent performance degrades as the context fills with stale conversation history, abandoned approaches, and accumulated noise. Research suggests reasoning quality peaks around 40% context window utilisation. Long, sprawling sessions where research, planning, and implementation all happen in one thread produce worse results than short, focused sessions with clear inputs.

The key insight is that human leverage is highest at the research and planning stages, not at the code level. A wrong assumption in research multiplies into dozens of wrong lines of code. A plan that specifies the wrong module boundary produces a coherent but misguided implementation. Catch errors early, where they are cheap to fix.

Gotchas

Hallucinated crate versions are the most common problem. Agents confidently generate code using APIs from older versions of Axum, SQLx, tokio, and other rapidly evolving crates. Specifying versions in your instruction file mitigates this but does not eliminate it. Always verify that generated code uses the current API surface.

Agents break working code on subsequent edits. A common failure mode: the agent writes correct code for a feature, then on a later edit to the same file, modifies or deletes the earlier code. Review diffs carefully, not just the new code. Use version control to catch regressions.

Tests generated by agents need review. Agents produce tests that compile and pass but sometimes test the wrong thing or test trivial properties. A test for a create-user handler that never checks whether the user was actually persisted to the database is worse than useless, since it provides false confidence.

Agents fight the borrow checker with brute force. When an agent encounters a lifetime or borrowing error, it sometimes adds Arc<Mutex<>> wrapping, unnecessary clones, or 'static lifetime bounds rather than restructuring the code. The result compiles but is not idiomatic and may have performance implications. If an agent’s fix involves wrapping something in Arc<Mutex<>> that was not originally behind one, ask why the ownership model needs shared mutable state.

Context window limits affect large projects. Rust projects with deep module trees and many crates can exceed what an agent can hold in context. When working on a large workspace, guide the agent to the specific crates and files relevant to the task rather than expecting it to understand the entire project.

Further reading

These posts explore the practices touched on in this section in more depth:

  • AI Engineer vs. Sloperator — The distinction between producing quality code with AI tools and generating slop. Covers context rot, the research-plan-implement workflow, and how to configure projects for agent collaboration.
  • Context Engineering Is the Job — Context engineering as the core discipline of working with LLMs. How to gather, curate, and manage the information that goes into each generation step.
  • Thinking in Plans, Not Code — Progressive refinement from requirements through detailed plans before implementation. Why the planning phase, not the coding phase, is where quality is determined.
  • Code Review in AI-Augmented Development — How code review changes when AI generates the code. Right-sizing work units, reviewing plans before code, and triaging review effort toward high-risk areas.

© 2026, made with ❤️ by Daz