Building with AI Coding Agents

AI coding agents work better with Rust than most developers expect. The compiler provides deterministic, actionable feedback at every step: propose code, compile, read the error, revise. In dynamically typed languages, an agent can produce code that runs but is subtly wrong. In Rust, the borrow checker, type system, and lifetime rules catch entire categories of errors before the code ever executes. This feedback loop turns the agent’s stochastic generation into a convergence process. The strictness that makes Rust harder to learn makes it easier for agents to get right.

That advantage has limits. Agents hallucinate crate names and API surfaces, produce code with security vulnerabilities at high rates, and make architectural decisions that compile but are not what you want. This section covers how to structure your project for effective agent collaboration, what agents handle well, where human judgment is essential, and how to review what they produce.

Structuring your project for AI agents

AI coding agents work from context. The more precise the context, the better the output. Without project-specific instructions, an agent falls back on its training data, which contains code from every version of every crate, every architectural style, and every level of quality. Give it a narrow lane to work in.

The CLAUDE.md file

Place a CLAUDE.md file in your project root. Claude Code reads this file automatically at the start of every session. Other tools have their own conventions (.cursor/rules/*.mdc for Cursor, .github/copilot-instructions.md for GitHub Copilot), but the content principles are the same.

Keep it lean. Target under 200 lines. The file is injected into the agent’s context window alongside its system prompt and your conversation, so every line competes for attention. A 500-line instruction file dilutes the instructions that matter.

Structure the file around three concerns:

What this project is. The tech stack, architectural style, and key crates with versions.

## Stack
- Axum 0.8 (web framework)
- Maud 0.26 (HTML templating)
- SQLx 0.8 (database, compile-time checked queries, Postgres)
- htmx 2.0 (interactivity)
- tower-sessions 0.14 (session management)

How to build and verify. The commands an agent needs to check its own work.

## Commands
- `cargo check` — fast type checking
- `cargo clippy --all-targets -- -D warnings` — lint with warnings as errors
- `cargo test --workspace` — full test suite
- `cargo sqlx prepare --workspace` — regenerate SQLx query cache

What conventions to follow. Error handling patterns, module organisation, naming conventions, anything the agent would get wrong without guidance.

## Conventions
- Error handling: thiserror for library crates, anyhow for application crates
- All handlers return Result<impl IntoResponse, AppError>
- HTML fragments for htmx requests, full pages for normal requests
- British English in user-facing strings
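The AppError convention above can be sketched in plain std Rust so the shape is visible. This is a hypothetical error type: in the real project, thiserror would derive Display and Error for the library crates, and an IntoResponse impl in the web crate would map variants to status codes.

```rust
use std::fmt;

// Hypothetical application error; a real crate would derive this with thiserror.
#[derive(Debug)]
enum AppError {
    NotFound(&'static str),
    Database(String),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::NotFound(what) => write!(f, "{what} not found"),
            AppError::Database(msg) => write!(f, "database error: {msg}"),
        }
    }
}

impl std::error::Error for AppError {}

// In the web crate, `impl IntoResponse for AppError` would map NotFound to
// 404 and Database to 500, without leaking the internal message to the client.
```

Having this type in the codebase gives the agent a concrete target: every handler's error path converges on one conversion point.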

Hierarchical instruction files

For larger projects, place additional CLAUDE.md files in subdirectories. A file in crates/web/CLAUDE.md provides context specific to the web crate without cluttering the root file. Claude Code merges these automatically when working in that directory.

This mirrors how a team onboards a new developer: general project context first, then module-specific conventions as they start working in a particular area.

What to leave out

Do not duplicate what tools already enforce. Formatting rules belong in rustfmt.toml. Lint configuration belongs in clippy.toml or Cargo.toml lint sections. The instruction file covers what the agent cannot infer from tooling configuration.

Avoid task-specific instructions. “When writing a new handler, always add a test” is a good instruction. “Add a handler for /users/{id}/edit that returns an edit form” is a task, not a convention. Tasks belong in your conversation with the agent, not in the instruction file.

Cross-tool compatibility

AGENTS.md is an emerging cross-tool standard backed by the Linux Foundation’s Agentic AI Foundation, with support from Claude Code, Cursor, Copilot, Codex, and others. If your team uses multiple AI tools, an AGENTS.md file provides a single source of project context that all tools read. The content guidance is identical to what is described above for CLAUDE.md.

- Claude Code: CLAUDE.md — hierarchical, nested in subdirectories
- Cursor: .cursor/rules/*.mdc — YAML frontmatter with activation modes
- GitHub Copilot: .github/copilot-instructions.md — supported since late 2024
- Cross-tool: AGENTS.md — Linux Foundation backed, 60k+ projects

Using this guide as agent context

This guide is designed to work as context for AI coding agents. Each section is self-contained, uses explicit file paths and crate names, and avoids implied knowledge that requires reading other sections first.

When starting a task, give the agent the relevant section from this guide alongside your project’s instruction file. If you are implementing authentication, provide the authentication section. If you are setting up deployment, provide the deployment section. The agent gets current, opinionated guidance instead of drawing on its training data, which may reference deprecated APIs or different architectural patterns.

For Claude Code specifically, this works through the instruction file hierarchy. Reference sections by linking to them or by including the key patterns inline in a subdirectory CLAUDE.md:

## Auth patterns
- Session-based auth with tower-sessions and sqlx-store
- See the project guide's authentication section for implementation details
- Password hashing: argon2 crate, never store plaintext
- CSRF: tower-csrf middleware on all state-changing endpoints

The goal is not to paste entire sections into context. It is to give the agent enough anchoring information that it produces code consistent with your chosen patterns rather than inventing its own.

Writing effective prompts for Rust web development

Prompting an AI agent for Rust web development is more constrained than prompting for Python or JavaScript. The type system defines a narrow space of valid programs, and the more of that space you specify upfront, the better the output.

Specify crate versions

The single most impactful habit. Agents draw on training data spanning multiple years of crate releases. Axum 0.7 and 0.8 have different APIs. SQLx 0.7 and 0.8 changed their macro syntax. Stating the version explicitly prevents the agent from generating code for the wrong API surface.

Put versions in your instruction file so you do not repeat them in every prompt. When they appear in context, the agent uses them consistently.

Provide type signatures

Rust’s type system constrains solutions. Providing an explicit function signature gives the agent a precise target:

“Write a handler with this signature: async fn create_user(State(pool): State<PgPool>, Form(payload): Form<CreateUserForm>) -> Result<impl IntoResponse, AppError>. It should insert the user into the users table and redirect to /users/{id}.”

This is more effective than “write a handler that creates a user” because the agent does not need to guess the extractor types, error handling approach, or return type.

Show one example, ask for variations

Instead of describing a pattern from scratch, show the agent one working handler, test, or component and ask for similar ones. The agent matches the style, error handling, and conventions of the example rather than inventing its own. This produces more consistent codebases than generating each piece independently.

State the response format for htmx handlers

Agents default to generating full HTML pages. For htmx-driven applications, most handlers return HTML fragments. Be explicit: “This handler responds to an htmx request and returns an HTML fragment. Do not include <html>, <head>, or <body> tags.”
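The fragment-versus-page distinction can be sketched with plain string building; the project itself would render these with Maud, and the function names here are hypothetical.

```rust
// An htmx response: just the swapped element, no <html>/<head>/<body>.
// Real code must HTML-escape `title` (Maud does this automatically).
fn todo_fragment(id: u64, title: &str) -> String {
    format!(r#"<li id="todo-{id}">{title}</li>"#)
}

// A normal (non-htmx) request gets the complete document.
fn full_page(body: &str) -> String {
    format!("<!DOCTYPE html><html><head><title>Todos</title></head><body>{body}</body></html>")
}
```

Spelling this out in the prompt stops the agent from wrapping every response in a full document skeleton.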

Provide the database schema

Include the relevant CREATE TABLE statements when asking for database-related code. This prevents the agent from hallucinating column names, types, or relationships. For sqlx::query_as! macros, the agent cannot run compile-time verification itself, so the schema serves as the source of truth.

Use iterative refinement

Ask the agent to review its own output before you accept it. “Review the code you just wrote for non-idiomatic Rust patterns, unnecessary allocations, and missing error cases. Fix any issues you find.” The OpenSSF’s security-focused guide for AI code assistant instructions specifically recommends this recursive self-review pattern over telling the agent it is an expert.

Patterns AI agents handle well

Agents perform best on tasks where the type system constrains the solution space and the pattern is well-represented in training data.

CRUD handlers. Standard create, read, update, delete operations with Axum extractors and SQLx queries. The combination of typed extractors, parameterised queries, and structured return types leaves little room for the agent to go wrong.

Trait implementations. Generating impl blocks for Display, From, Serialize, Deserialize, IntoResponse, and similar traits. The compiler defines the expected shape precisely.
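As a small illustration with a hypothetical newtype: the compiler dictates the exact method names and signatures, so the agent has almost no freedom to get the shape wrong.

```rust
use std::fmt;

// Hypothetical newtype; each impl block's shape is fixed by the trait definition.
struct UserId(u64);

impl fmt::Display for UserId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "user-{}", self.0)
    }
}

impl From<u64> for UserId {
    fn from(raw: u64) -> Self {
        UserId(raw)
    }
}
```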

Test scaffolding. Given a function signature and expected behaviour, agents produce solid test structures. Rust’s #[cfg(test)] module pattern and assert! macros are well-represented in training data. Review the assertions for correctness, since a test that always passes proves nothing.
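The scaffolding pattern looks like this, sketched with a hypothetical `slugify` function. The point of the review step is the assertion: it must check a real input/output pair, not a tautology.

```rust
// Hypothetical function under test.
fn slugify(title: &str) -> String {
    title
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_lowercase() } else { '-' })
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn replaces_spaces_and_lowercases() {
        // A meaningful assertion: checks actual behaviour,
        // not a tautology like assert!(true).
        assert_eq!(slugify("Hello World"), "hello-world");
    }
}
```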

Boilerplate and repetitive code. Migration files, configuration parsing, middleware setup, route registration. These follow established patterns with little variation.

Explaining compiler errors. When you hit a confusing borrow checker or lifetime error, asking the agent to explain it is often faster than searching for the error code. Current models understand Rust’s ownership semantics well enough to give accurate explanations.

Multi-file consistency. Agents that operate across files (Claude Code, Cursor, Windsurf) maintain synchronisation between handler definitions, route registration, and type declarations. This is one area where agents save significant manual coordination effort.

Areas where human judgment is needed

Agents generate code that compiles. Compiling is necessary but not sufficient. These areas require active human judgment, not just verification that the build passes.

Architectural decisions

Agents optimise for completing the immediate task. They do not consider how a piece of code fits into the broader system. An agent will happily put everything in main.rs if you do not specify a module structure. It will create a new database connection pool per request if you do not show it the shared state pattern. Architectural decisions (where to draw module boundaries, how to structure the workspace, when to extract a crate) remain human responsibilities.

Crate selection

Agents hallucinate crate names. They recommend crates that do not exist, suggest deprecated crates, or use the wrong crate for the job. In Shuttle’s 2025 testing of seven AI tools on the same Rust project, hallucinated crate versions were the most consistently reported problem across all tools. Always verify that a suggested crate exists on crates.io and check its maintenance status before adding it to Cargo.toml.

Ownership and lifetime design

Agents can fix individual borrow checker errors, but they sometimes fix them by adding unnecessary .clone() calls or wrapping things in Arc<Mutex<>> when a simpler restructuring would work. The resulting code compiles but carries hidden performance costs and obscures the ownership model. When an agent adds a clone to satisfy the compiler, consider whether the data flow should be restructured instead.
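A small illustration of the pattern, with hypothetical names: the first version clones an entire Vec so it can mutate while iterating, which compiles; the second restructures so no clone is needed at all.

```rust
// Agent-style fix: clone the whole Vec to allow mutation during iteration.
fn remove_empty_cloned(items: &mut Vec<String>) {
    // Iterating in reverse keeps the remaining indices valid after each remove.
    for (i, item) in items.clone().iter().enumerate().rev() {
        if item.is_empty() {
            items.remove(i);
        }
    }
}

// Restructured: retain expresses the intent directly, no clone, no index juggling.
fn remove_empty(items: &mut Vec<String>) {
    items.retain(|item| !item.is_empty());
}
```

Both pass the compiler and the tests; only the second reflects a sound ownership model.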

Performance-sensitive code paths

Agents produce code that is functionally correct but not necessarily performant. Hidden allocations (.collect::<Vec<_>>() when streaming would work), blocking calls in async contexts, and locks held across await points all compile and pass tests but degrade under load. In hot paths, review the generated code for unnecessary allocations and synchronisation overhead.
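The allocation case looks like this in miniature (hypothetical function names; the async and locking cases need runtime context and are omitted, though clippy's await_holding_lock lint catches some of them):

```rust
// Agent-style: collects an intermediate Vec that is only iterated once.
fn total_len_collected(lines: &[&str]) -> usize {
    let lengths: Vec<usize> = lines.iter().map(|l| l.len()).collect();
    lengths.into_iter().sum()
}

// Streaming: same result, no intermediate allocation.
fn total_len_streamed(lines: &[&str]) -> usize {
    lines.iter().map(|l| l.len()).sum()
}
```

Both are correct; in a hot path the first pays an allocation per call for nothing.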

Error handling granularity

Agents tend toward two extremes: either they use .unwrap() everywhere or they create overly granular error types for every possible failure. Neither is appropriate. Error handling requires judgment about which failures the caller can handle, which should propagate, and which need logging.

Sensitive business logic

Authorisation rules, pricing calculations, data retention policies, anything where a subtle bug has business consequences beyond a 500 error. These require understanding the domain, not just the type system. Use agents to generate the scaffolding, then write the core logic yourself or review it with particular care.

Review practices for AI-generated Rust code

Rust’s toolchain provides a review pipeline that catches more issues automatically than any other mainstream language. Use it.

The automated pipeline

Run these checks on every piece of AI-generated code, in order:

  1. cargo check — Fast type checking without full compilation. Catches type errors, borrow checker violations, and missing trait implementations. If this fails, the code has fundamental problems. Send the error back to the agent.

  2. cargo clippy --all-targets --all-features -- -D warnings — Clippy provides over 600 lints. It catches non-idiomatic patterns, common performance mistakes, and correctness issues. Treat warnings as errors. If clippy flags something, fix it before proceeding.

  3. cargo test --workspace — Run the full test suite. If the agent wrote tests, verify that they actually test meaningful behaviour. A test that asserts true == true passes but proves nothing.

What to look for in human review

After the automated pipeline passes, review the code for issues that tools cannot catch:

Unnecessary clones and allocations. Agents satisfy the borrow checker by adding .clone() where restructuring the data flow would be better. Look for clones of large types, clones inside loops, and String allocations where &str would suffice.

Over-engineering. Agents sometimes introduce unnecessary traits, generic parameters, or abstraction layers for code that does one thing. Three lines of straightforward code is better than a generic trait with one implementor.

Hidden unwrap() calls. Search generated code for .unwrap() and .expect(). In handler code, these cause panics that crash the request (or worse, the server). They should be replaced with proper error propagation using ? and typed errors.
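The replacement pattern, sketched with a hypothetical ID parser: the unwrap version panics on bad input, while the ? version hands the failure to the caller, who can turn it into a 4xx response.

```rust
use std::num::ParseIntError;

// Agent-style: panics on any non-numeric input, crashing the request.
fn parse_id_unwrap(raw: &str) -> u64 {
    raw.parse().unwrap()
}

// Propagated: the caller decides how the error becomes a response.
fn parse_id(raw: &str) -> Result<u64, ParseIntError> {
    let id: u64 = raw.parse()?;
    Ok(id)
}
```

In a handler, the error type would be the application's own (converted via From), not ParseIntError directly.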

Stale or hallucinated dependencies. Check Cargo.toml changes. Verify that any new crate the agent added actually exists, is maintained, and is the right tool for the job. Check the version number against crates.io.

SQL query correctness. SQLx’s compile-time checking validates syntax and types against the database schema, but it does not verify business logic. A query that returns the wrong rows or updates the wrong records compiles fine. Read the SQL.

Security review

AI-generated code contains security vulnerabilities at rates that warrant systematic review. Veracode’s 2025 study across 100+ LLMs found security flaws in 45% of generated code. The Stanford study found that developers using AI assistants wrote less secure code while believing it was more secure.

Rust’s memory safety eliminates one category of vulnerabilities (buffer overflows, use-after-free, null pointer dereferences), but does nothing for application-level security: SQL injection, XSS, hardcoded secrets, improper access control, information leakage through error messages.

See the web application security section for a thorough treatment of security practices and a review checklist specific to AI-generated code. The short version: never trust that generated code handles user input safely, always verify that secrets come from environment variables, and check that error responses do not leak internal details.

The research-plan-implement workflow

The techniques above address individual prompts and reviews. The broader question is how to organise an entire feature’s worth of AI-assisted work. A pattern that works well: split the work into three phases, each with its own fresh context.

Research. Explore the codebase, identify existing patterns, map the relevant types and modules. Compress the findings into a focused summary. This phase is about understanding what exists before deciding what to change. The agent is good at this: reading files, tracing call chains, summarising structure. The output is a short document, not code.

Plan. Using the research summary as input, produce a detailed execution plan: which files to create or modify, in what order, with what interfaces. Include test criteria and references to specific code locations. Human review at this stage is high leverage. Catching an architectural mistake in a plan costs one line of editing. Catching it after implementation costs a rework cycle.

Implement. Feed the plan and only the necessary source files into a fresh context. Work in chunks, testing between steps. The plan constrains the agent’s decisions, reducing the chance of it inventing its own architecture or drifting from the intended design.

Each phase starts with a clean context window. This matters because of context rot: agent performance degrades as the context fills with stale conversation history, abandoned approaches, and accumulated noise. Research suggests reasoning quality peaks around 40% context window utilisation. Long, sprawling sessions where research, planning, and implementation all happen in one thread produce worse results than short, focused sessions with clear inputs.

The key insight is that human leverage is highest at the research and planning stages, not at the code level. A wrong assumption in research multiplies into dozens of wrong lines of code. A plan that specifies the wrong module boundary produces a coherent but misguided implementation. Catch errors early, where they are cheap to fix.

Gotchas

Hallucinated crate versions are the most common problem. Agents confidently generate code using APIs from older versions of Axum, SQLx, tokio, and other rapidly evolving crates. Specifying versions in your instruction file mitigates this but does not eliminate it. Always verify that generated code uses the current API surface.

Agents break working code on subsequent edits. A common failure mode: the agent writes correct code for a feature, then on a later edit to the same file, modifies or deletes the earlier code. Review diffs carefully, not just the new code. Use version control to catch regressions.

Tests generated by agents need review. Agents produce tests that compile and pass but sometimes test the wrong thing or test trivial properties. A test for a create-user handler that never checks whether the user was actually persisted to the database is worse than useless, since it provides false confidence.

Agents fight the borrow checker with brute force. When an agent encounters a lifetime or borrowing error, it sometimes adds Arc<Mutex<>> wrapping, unnecessary clones, or 'static lifetime bounds rather than restructuring the code. The result compiles but is not idiomatic and may have performance implications. If an agent’s fix involves wrapping something in Arc<Mutex<>> that was not originally behind one, ask why the ownership model needs shared mutable state.

Context window limits affect large projects. Rust projects with deep module trees and many crates can exceed what an agent can hold in context. When working on a large workspace, guide the agent to the specific crates and files relevant to the task rather than expecting it to understand the entire project.

Further reading

These posts explore the practices touched on in this section in more depth:

  • AI Engineer vs. Sloperator — The distinction between producing quality code with AI tools and generating slop. Covers context rot, the research-plan-implement workflow, and how to configure projects for agent collaboration.
  • Context Engineering Is the Job — Context engineering as the core discipline of working with LLMs. How to gather, curate, and manage the information that goes into each generation step.
  • Thinking in Plans, Not Code — Progressive refinement from requirements through detailed plans before implementation. Why the planning phase, not the coding phase, is where quality is determined.
  • Code Review in AI-Augmented Development — How code review changes when AI generates the code. Right-sizing work units, reviewing plans before code, and triaging review effort toward high-risk areas.