Web Application Performance

Rust already eliminates the largest performance taxes in most web stacks: garbage collection pauses, interpreter overhead, and runtime type-checking. An Axum application serving HTML fragments through Maud starts fast and stays fast under load. The HDA architecture adds a structural advantage: no client-side framework bundle to download and parse, no JSON serialisation layer between server and browser, and smaller payloads because HTML fragments replace full JSON responses plus client-side rendering.

That said, performance work in any stack follows the same rule: measure first, then optimise what the measurements show. Adding caching, compression layers, or index hints without evidence of a real problem creates complexity that must be maintained, debugged, and reasoned about. Every technique in this section is worth knowing. None of them should be applied preemptively.

HTTP caching headers

HTTP caching is the highest-leverage performance tool available. A response that never reaches your server costs nothing to serve.

Cache-Control for dynamic responses

Set Cache-Control headers in Axum handlers using tuple responses:

use axum::{
    http::header,
    response::IntoResponse,
};
use maud::{html, Markup};

async fn product_page(/* ... */) -> impl IntoResponse {
    let markup: Markup = html! { /* ... */ };

    (
        [(header::CACHE_CONTROL, "public, max-age=300")],
        markup,
    )
}

For pages that must revalidate on every request but can still benefit from conditional caching:

(
    [(header::CACHE_CONTROL, "no-cache")],
    markup,
)

no-cache does not mean “don’t cache.” It means “cache, but revalidate with the server before using.” Combined with an ETag, the server can respond with 304 Not Modified and skip sending the body entirely.

ETags and conditional responses

An ETag is a fingerprint of the response content. When the browser sends the ETag back in an If-None-Match header, the server can return a 304 Not Modified if the content has not changed, saving bandwidth and rendering time.

use axum::{
    body::Body,
    extract::Request,
    http::{header, Response, StatusCode},
    response::IntoResponse,
};
use std::hash::{DefaultHasher, Hash, Hasher};

fn compute_etag(content: &str) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("\"{:x}\"", hasher.finish())
}

async fn cacheable_page(req: Request) -> Response<Body> {
    let html = render_page();
    let etag = compute_etag(&html);

    // Check If-None-Match from the browser
    if let Some(if_none_match) = req.headers().get(header::IF_NONE_MATCH) {
        if if_none_match.to_str().ok() == Some(etag.as_str()) {
            return Response::builder()
                .status(StatusCode::NOT_MODIFIED)
                .header(header::ETAG, &etag)
                .header(header::CACHE_CONTROL, "public, max-age=60")
                .body(Body::empty())
                .unwrap();
        }
    }

    Response::builder()
        .status(StatusCode::OK)
        .header(header::ETAG, &etag)
        .header(header::CACHE_CONTROL, "public, max-age=60")
        .header(header::CONTENT_TYPE, "text/html; charset=utf-8")
        .body(Body::from(html))
        .unwrap()
}

For pages where generating the full HTML is itself expensive, ETags are less useful because you must still render the content to compute the hash. In those cases, consider using a version number, last-modified timestamp from the database, or a cache layer (covered below).
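A version-based validator can be sketched as a small helper: derive a weak ETag from the entity's id and a version column (for instance, the epoch value of an `updated_at` timestamp fetched with a cheap one-column query), so the server can answer 304 before rendering anything. The inputs here are illustrative, not from the handler above:

```rust
// Sketch: a weak ETag built from an entity id and a version number,
// e.g. the epoch seconds of an updated_at column. Both arguments are
// assumed to come from a lightweight one-column query.
fn version_etag(id: i64, version: i64) -> String {
    // "W/" marks a weak validator: it identifies a version of the
    // resource, not a byte-exact response body.
    format!("W/\"{id}-{version}\"")
}

fn main() {
    let etag = version_etag(42, 1_700_000_000);
    assert_eq!(etag, "W/\"42-1700000000\"");
    println!("{etag}");
}
```

In the handler, compare this value against If-None-Match exactly as in the compute_etag example above, but render the body only on a mismatch.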

Blanket headers with tower-http

Apply Cache-Control to groups of routes using SetResponseHeaderLayer. This sets the header only if the handler did not already set one, so individual handlers can override the default.

use axum::{routing::get, Router};
use tower_http::set_header::SetResponseHeaderLayer;
use http::{header, HeaderValue};

let app = Router::new()
    .route("/products", get(list_products))
    .route("/products/{id}", get(product_page))
    .layer(
        SetResponseHeaderLayer::if_not_present(
            header::CACHE_CONTROL,
            HeaderValue::from_static("public, max-age=60"),
        )
    );

The layer requires the set-header feature in Cargo.toml:

[dependencies]
tower-http = { version = "0.6", features = ["set-header"] }

Caching guidelines by content type

| Content type | Cache-Control | Rationale |
|---|---|---|
| Static assets (CSS, JS, images) | public, max-age=31536000, immutable | Content-hashed filenames mean the URL changes when the file changes. Cache forever. |
| Public pages (product listing, homepage) | public, max-age=60 to max-age=300 | Short TTL allows quick updates. Adjust based on how frequently content changes. |
| Personalised pages (dashboard, profile) | private, no-cache | Must not be cached by shared proxies. Revalidate on every request. |
| HTMX fragments | no-store or no-cache | Fragments usually reflect current state. no-store prevents any caching; no-cache allows ETag-based revalidation. |
| API responses (JSON, if you have them) | private, no-cache or short max-age | Depends on the data. Default to conservative. |

Static asset caching with content-hashed filenames is covered in the CSS section. The immutable directive tells browsers not to revalidate even when the user reloads the page, eliminating conditional requests entirely for fingerprinted assets.

Response compression

Compressing responses reduces bandwidth and improves page load times, particularly on slower connections. tower-http provides a compression middleware that negotiates the algorithm from the client’s Accept-Encoding header.

[dependencies]
tower-http = { version = "0.6", features = ["compression-gzip", "compression-br"] }

Then add the layer to the router:

use axum::{routing::get, Router};
use tower_http::compression::CompressionLayer;

let app = Router::new()
    .route("/", get(index))
    .layer(CompressionLayer::new());

CompressionLayer::new() enables all compiled-in algorithms and negotiates automatically. By default it skips images, gRPC responses, Server-Sent Events, and responses smaller than 32 bytes.

Choosing algorithms

Enable gzip and Brotli for broad browser support. Zstandard (zstd) compresses faster than both but lacks Safari support as of early 2026.

let compression = CompressionLayer::new()
    .gzip(true)
    .br(true)
    .zstd(false); // Omit until Safari support lands

Tuning the compression predicate

The default predicate is conservative. Raise the minimum response size to avoid compressing tiny responses where the overhead exceeds the savings:

use tower_http::compression::{
    CompressionLayer,
    predicate::{NotForContentType, SizeAbove, Predicate},
};

let predicate = SizeAbove::new(256)
    .and(NotForContentType::IMAGES)
    .and(NotForContentType::SSE)
    .and(NotForContentType::GRPC);

let compression = CompressionLayer::new().compress_when(predicate);

Compression level

For dynamic HTML responses, CompressionLevel::Default balances compression ratio against CPU cost. Avoid CompressionLevel::Best for dynamic content; the marginal size reduction does not justify the CPU cost per request.

use tower_http::compression::CompressionLevel;

let compression = CompressionLayer::new()
    .quality(CompressionLevel::Default);

For static assets, pre-compress at build time rather than compressing on every request. tower-http’s ServeDir supports pre-compressed files:

use tower_http::services::ServeDir;

let static_files = ServeDir::new("static")
    .precompressed_br()
    .precompressed_gzip();

Place pre-compressed files alongside the originals (app.css.br, app.css.gz). ServeDir selects the correct variant based on the request’s Accept-Encoding.
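A build step along these lines produces the variants; the file name is a stand-in for a real built asset, and a matching brotli invocation can be added if the brotli CLI is installed:

```shell
# Illustrative build step: pre-compress assets next to the originals.
mkdir -p static
printf 'body{margin:0}\n' > static/app.css    # stand-in for a built, fingerprinted asset
for f in static/*.css; do
  gzip -kf "$f"      # writes "$f.gz" and keeps the original
  # brotli -kf "$f"  # writes "$f.br" if the brotli CLI is available
done
ls static/
```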

Database query performance

PostgreSQL with proper indexes handles far more traffic than most developers expect. Before reaching for caching or read replicas, check that queries are efficient.

EXPLAIN ANALYZE

EXPLAIN ANALYZE executes a query and reports the actual execution plan, timing, and row counts.

EXPLAIN (ANALYZE, BUFFERS)
SELECT u.id, u.name, count(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON p.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name;

Read the plan from the innermost (most indented) nodes outward. Look for:

  • Seq Scan on large tables. A sequential scan on a table with thousands of rows, where the query selects a small fraction, signals a missing index.
  • Estimated vs actual row divergence. When the planner estimates 10 rows but actual is 50,000, it picks a bad join strategy. Run ANALYZE tablename to update statistics.
  • Nested Loop with high loops count. Multiply actual time by loops to get the real elapsed time. A node showing actual time=0.05ms loops=10000 is really consuming 500ms.
  • Sort on disk. If a Sort node reports Sort Method: external merge Disk, increase work_mem or reduce the data being sorted.

BUFFERS is particularly useful: shared hit shows pages read from the PostgreSQL buffer cache, shared read shows pages fetched from disk. High read values on frequently-executed queries mean the working set exceeds shared_buffers.
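The same statistics are available in aggregate; a hit-ratio query over pg_statio_user_tables gives a quick first check on whether the working set fits in memory (the ~0.99 figure is a rough heuristic, not a hard threshold):

```sql
-- Fraction of heap page reads served from shared_buffers rather than
-- disk. Sustained values well below ~0.99 on a hot working set suggest
-- the cache is too small for the workload.
SELECT round(
         sum(heap_blks_hit)::numeric
         / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0),
         4) AS cache_hit_ratio
FROM pg_statio_user_tables;
```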

Indexing strategies

B-tree (the default) covers equality, range, and sorting queries. Place equality columns before range columns in multicolumn indexes:

-- Speeds up WHERE tenant_id = $1 AND created_at > $2
CREATE INDEX idx_orders_tenant_created ON orders (tenant_id, created_at);

Partial indexes cover only rows matching a condition. Effective for queue-like patterns:

-- Only index unprocessed jobs
CREATE INDEX idx_jobs_pending ON jobs (priority, created_at)
    WHERE status = 'pending';

Covering indexes (INCLUDE) store extra columns in the index to enable index-only scans:

-- SELECT email, name FROM users WHERE email = $1
-- Both columns served from the index, no heap lookup
CREATE INDEX idx_users_email_covering ON users (email) INCLUDE (name);

GIN indexes handle JSONB, arrays, and full-text search vectors. They are larger and slower to update than B-tree but fast for containment lookups.

Do not create indexes speculatively. Every index slows down writes. Add indexes when EXPLAIN ANALYZE shows a problem. Use CREATE INDEX CONCURRENTLY on production tables to avoid locking writes during creation.
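For example (index name and columns illustrative):

```sql
-- Builds the index without taking a write-blocking lock. Note that
-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_orders_created ON orders (created_at);
```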

Monitor index usage

Find unused indexes that are slowing down writes for no benefit:

SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

The N+1 problem

One query to fetch a list, then one query per item to fetch related data. The most common performance problem in web applications.

// 1 query: fetch all orders
let orders = sqlx::query_as!(Order, "SELECT * FROM orders LIMIT 50")
    .fetch_all(&pool).await?;

// 50 queries: fetch user for each order
for order in &orders {
    let user = sqlx::query_as!(User,
        "SELECT name FROM users WHERE id = $1", order.user_id)
        .fetch_one(&pool).await?;
}

Fix with a JOIN:

let results = sqlx::query_as!(
    OrderWithUser,
    r#"SELECT o.id as order_id, o.total, u.name as user_name
       FROM orders o
       JOIN users u ON u.id = o.user_id
       LIMIT 50"#
)
.fetch_all(&pool).await?;

Or batch-load with ANY when JOINs are not practical:

let user_ids: Vec<i32> = orders.iter().map(|o| o.user_id).collect();
let users = sqlx::query_as!(User,
    "SELECT id, name FROM users WHERE id = ANY($1)", &user_ids)
    .fetch_all(&pool).await?;

Connection pool tuning

SQLx’s default pool settings are conservative. Tune them for web workloads:

use sqlx::postgres::PgPoolOptions;
use std::time::Duration;

let pool = PgPoolOptions::new()
    .max_connections(20)
    .min_connections(2)
    .acquire_timeout(Duration::from_secs(5))
    .idle_timeout(Duration::from_secs(300))
    .max_lifetime(Duration::from_secs(1800))
    .connect(&database_url)
    .await?;

Key adjustments:

  • max_connections: set based on PostgreSQL’s max_connections divided by the number of application instances. With PostgreSQL’s default of 100 and four app instances, 20 per instance leaves headroom for admin connections and migrations.
  • min_connections: set to 2-5 to avoid cold-start latency. The default (0) means the first requests after an idle period wait for TCP handshake, TLS negotiation, and authentication.
  • acquire_timeout: the default (30 seconds) is far too long for a web request. Set to 3-5 seconds. Fail fast with a 503 rather than making the user wait.
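When the acquire timeout fires, sqlx reports a pool-timeout error; mapping that case to a 503 in your error type keeps the fail-fast behaviour visible to clients and load balancers. A stdlib-only sketch, where the DbError enum stands in for matching on sqlx::Error::PoolTimedOut inside an AppError's IntoResponse impl:

```rust
// Sketch: choose an HTTP status from a database error kind.
// DbError is a stand-in for sqlx::Error in this illustration.
#[derive(Debug)]
enum DbError {
    PoolTimedOut,   // pool exhausted for the full acquire_timeout
    Other(String),  // any other database failure
}

fn status_for(e: &DbError) -> u16 {
    match e {
        // Capacity problem: signal clients and load balancers to back off
        DbError::PoolTimedOut => 503,
        // Application or database bug
        DbError::Other(_) => 500,
    }
}

fn main() {
    assert_eq!(status_for(&DbError::PoolTimedOut), 503);
    assert_eq!(status_for(&DbError::Other("syntax error".into())), 500);
    println!("ok");
}
```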

Keeping transactions short

Do not hold database connections open during external HTTP calls or other non-database I/O:

// Bad: holds a connection for the entire duration of the HTTP call
let mut tx = pool.begin().await?;
sqlx::query!("UPDATE orders SET status = 'processing' WHERE id = $1", id)
    .execute(&mut *tx).await?;
let response = reqwest::get("https://payment-api.example.com/charge").await?;
let new_status = if response.status().is_success() { "paid" } else { "failed" };
sqlx::query!("UPDATE orders SET status = $1 WHERE id = $2", new_status, id)
    .execute(&mut *tx).await?;
tx.commit().await?;

Under load, this drains the pool. Restructure to minimise the transaction scope, or use Restate for operations that need durable execution across external calls.
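One way to restructure the example above, assuming the order can tolerate two separate atomic updates rather than one transaction spanning the external call:

```rust
// Sketch: each UPDATE runs in its own implicit transaction, so no pool
// connection is held while the HTTP call is in flight.
sqlx::query!("UPDATE orders SET status = 'processing' WHERE id = $1", id)
    .execute(&pool).await?;

// No database connection is held across this await point.
let response = reqwest::get("https://payment-api.example.com/charge").await?;
let new_status = if response.status().is_success() { "paid" } else { "failed" };

sqlx::query!("UPDATE orders SET status = $1 WHERE id = $2", new_status, id)
    .execute(&pool).await?;
```

If the two writes must stay consistent with the external call's outcome even across crashes, that is exactly the durable-execution case Restate addresses.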

pg_stat_statements

Enable pg_stat_statements to identify the queries consuming the most cumulative database time. This is the production equivalent of EXPLAIN ANALYZE for individual queries.

In Docker Compose for development:

services:
  postgres:
    image: postgres:17
    command: >
      postgres
        -c shared_preload_libraries=pg_stat_statements
        -c pg_stat_statements.track=all
        -c track_io_timing=on

Then create the extension:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Find the queries consuming the most database time:

SELECT
    substring(query, 1, 200) AS query_preview,
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 2) AS mean_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

A query with a 2ms mean but 5 million daily calls contributes more total load than a 500ms query called 100 times. Optimise by total time, not mean time.

Redis as a caching layer

Redis adds a shared, network-accessible cache that works across multiple application instances. It also adds an infrastructure dependency, a consistency problem (stale data), and invalidation complexity. Only introduce it when you have measured a real bottleneck that cannot be solved with database optimisation or HTTP caching.

If you are running a single application instance and want in-process caching, consider moka first. It provides an async-aware concurrent cache with TTL support, avoids the network hop, and requires no additional infrastructure.
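A minimal moka sketch, assuming a moka dependency with the "future" feature, a Product type that derives Clone, and the pool and id variables from the surrounding handlers:

```rust
use moka::future::Cache;
use std::time::Duration;

// Build once at startup and store in AppState; the cache is cheap to clone.
let products: Cache<i64, Product> = Cache::builder()
    .max_capacity(10_000)                    // bound memory by entry count
    .time_to_live(Duration::from_secs(300))  // same role as a Redis TTL
    .build();

// try_get_with runs the loader only on a miss and deduplicates concurrent
// loads of the same key, which also guards against cache stampedes.
// Note the loader's error comes back wrapped in an Arc.
let product = products
    .try_get_with(id, async {
        sqlx::query_as!(Product, "SELECT * FROM products WHERE id = $1", id)
            .fetch_one(&pool)
            .await
    })
    .await?;
```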

Setup

The application’s Redis pub/sub setup already includes a ConnectionManager. Reuse it for caching:

use redis::aio::ConnectionManager;

#[derive(Clone)]
struct AppState {
    db: sqlx::PgPool,
    redis: ConnectionManager,
}

ConnectionManager wraps a MultiplexedConnection with automatic reconnection. It is cheap to clone and safe to share across handlers.

Cache-aside pattern

The application checks the cache first. On a miss, it queries the database, stores the result, and returns it.

use redis::AsyncCommands;
use serde::{de::DeserializeOwned, Serialize};
use std::future::Future;

async fn cache_aside<T, F, Fut>(
    redis: &mut ConnectionManager,
    key: &str,
    ttl_seconds: u64,
    fetch_fn: F,
) -> anyhow::Result<T>
where
    T: Serialize + DeserializeOwned,
    F: FnOnce() -> Fut,
    Fut: Future<Output = anyhow::Result<T>>,
{
    // Check cache
    let cached: Option<String> = redis.get(key).await?;
    if let Some(json) = cached {
        return Ok(serde_json::from_str(&json)?);
    }

    // Cache miss: fetch from source
    let value = fetch_fn().await?;

    // Store with TTL
    let json = serde_json::to_string(&value)?;
    redis.set_ex(key, &json, ttl_seconds).await?;

    Ok(value)
}

Use it in a handler:

async fn get_product(
    State(state): State<AppState>,
    Path(id): Path<i64>,
) -> Result<impl IntoResponse, AppError> {
    let mut redis = state.redis.clone();

    let product = cache_aside(
        &mut redis,
        &format!("product:{id}"),
        300, // 5 minutes
        || async {
            sqlx::query_as!(Product, "SELECT * FROM products WHERE id = $1", id)
                .fetch_one(&state.db)
                .await
                .map_err(Into::into)
        },
    ).await?;

    Ok(render_product(&product))
}

Cache invalidation

Invalidation is the hard part. Two practical strategies:

TTL-based expiration. Set a time-to-live and accept that data may be stale for up to that duration. Simple, self-healing, no coordination needed. Choose the TTL based on how stale the data can acceptably be.

Explicit invalidation on write. Delete the cache entry when the underlying data changes:

async fn update_product(
    State(state): State<AppState>,
    Path(id): Path<i64>,
    Form(input): Form<ProductInput>,
) -> Result<impl IntoResponse, AppError> {
    sqlx::query!("UPDATE products SET name = $1, price = $2 WHERE id = $3",
        input.name, input.price, id)
        .execute(&state.db).await?;

    // Invalidate the cache entry
    let mut redis = state.redis.clone();
    let _: () = redis.del(format!("product:{id}")).await?;

    Ok(Redirect::to(&format!("/products/{id}")))
}

The practical approach is both: set a TTL as a safety net, and explicitly invalidate on known write paths. The TTL catches any invalidation you missed.

The problem compounds with list queries. When you update a product, which cached list queries include it? Unless you can answer that precisely, you end up invalidating aggressively (clearing all product-related caches on any product write) or accepting staleness. Neither is free.
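One workable middle ground is a versioned namespace: list keys embed a per-entity-type version counter stored in Redis, and every write bumps the counter, so stale list keys are never read again and simply age out via their TTLs. This pattern is not described above, and the key shapes here are illustrative:

```rust
// Sketch: cache keys for list queries embed a namespace version that
// lives in Redis (say, at "products:ver") and is INCR'd on every
// product write. Old keys are never deleted; they become unreachable
// and expire via their TTLs.
fn product_list_key(version: u64, filter: &str, page: u32) -> String {
    format!("products:v{version}:list:{filter}:p{page}")
}

fn main() {
    // Before a write, the app reads version 7 and uses these keys...
    assert_eq!(product_list_key(7, "active", 1), "products:v7:list:active:p1");
    // ...after the INCR, the same logical query maps to fresh keys.
    assert_eq!(product_list_key(8, "active", 1), "products:v8:list:active:p1");
    println!("ok");
}
```

The cost is one extra Redis read per list lookup (to fetch the current version), traded against never having to enumerate which list keys a write affects.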

Graceful degradation

Never let a Redis failure break a request. Fall back to the database:

match redis.get::<_, Option<String>>(&cache_key).await {
    Ok(Some(json)) => {
        if let Ok(product) = serde_json::from_str(&json) {
            return Ok(product);
        }
    }
    Ok(None) => {} // Cache miss
    Err(e) => {
        tracing::warn!("Redis error, falling back to database: {e}");
    }
}

// Fetch from database
let product = sqlx::query_as!(Product, "SELECT * FROM products WHERE id = $1", id)
    .fetch_one(&state.db).await?;

Redis anti-patterns

  • Never set keys without a TTL. Unbounded memory growth leads to eviction storms or out-of-memory errors. Always use set_ex.
  • Never use the KEYS command. It blocks the single-threaded Redis server while scanning the entire keyspace. Use SCAN for iteration.
  • Use pipelining for multiple operations. Serial single-operation calls waste round trips:

let mut pipe = redis::pipe();
pipe.get("key1").get("key2").get("key3");
let (v1, v2, v3): (Option<String>, Option<String>, Option<String>) =
    pipe.query_async(&mut redis).await?;

Profiling Rust applications

When the techniques above are not enough, or when you need to identify where time is actually being spent, reach for a profiler. The table below covers the tools that work well with async Rust and Axum.

| Need | Tool | Platform | Notes |
|---|---|---|---|
| CPU hotspots | cargo-flamegraph | Linux, macOS | Generates interactive SVG flamegraphs. Requires debug = true in release profile. Uses perf on Linux, xctrace on macOS. |
| CPU hotspots (interactive) | samply | Linux, macOS | Opens results in Firefox Profiler’s web UI. Better macOS experience than flamegraph. |
| Heap allocation profiling | dhat | All | Requires a #[global_allocator] swap and feature flag. View results in the DHAT online viewer. |
| Async runtime debugging | tokio-console | All | Terminal UI showing task states, wakeup counts, and poll durations. Requires tokio_unstable cfg flag. Development only. |
| Microbenchmarks | criterion | All | Statistics-driven benchmarking with regression detection. Supports async with the async_tokio feature. |
| Per-request latency | tower-http TraceLayer | All | Already covered in the observability section. Instrument handlers with #[instrument] for function-level timing. |
| Memory growth analysis | heaptrack | Linux | No code changes needed. Uses LD_PRELOAD to intercept allocations. |

The general workflow: start with TraceLayer and #[instrument] spans in the observability section to identify which requests are slow. Use pg_stat_statements and EXPLAIN ANALYZE if the slowness is in the database. Reach for flamegraph or samply when the bottleneck is in application code. Use criterion to benchmark specific functions before and after optimisation.