PostgreSQL full-text search matches words. Semantic search matches meaning. A user searching for “how to fix a flat tire” finds documents about “tire puncture repair” even though no words overlap. This is possible because text is converted into high-dimensional vectors (embeddings) that encode meaning, and similar meanings produce similar vectors.
pgvector adds vector similarity search to PostgreSQL. It introduces a native vector column type with distance operators and index support. If you already run PostgreSQL, adding semantic search requires an extension, not a new service. Your embeddings live alongside your relational data, with full ACID guarantees and SQL for filtering.
This section covers pgvector setup, generating embeddings with a local model, storing and querying vectors from Rust with SQLx, and combining vector similarity with full-text search for hybrid retrieval. For building complete RAG pipelines that feed retrieved context to an LLM, see Retrieval-Augmented Generation in the AI and LLM Integration section.
pgvector setup
Enable the extension
Add a migration:
CREATE EXTENSION IF NOT EXISTS vector;
pgvector is distributed as a standard PostgreSQL extension. The official postgres Docker image does not bundle it, but the pgvector/pgvector image (built on the official image) does, and cloud-managed PostgreSQL services (AWS RDS, Supabase, Neon) include it as well.
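After the migration runs, a quick catalog query confirms the extension is active (the version you see will depend on what your image or provider ships):

```sql
-- Verify the extension is installed and check its version.
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
```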
Schema
Add a vector column sized to match your embedding model’s output dimensions. The example below uses 768 dimensions, which matches nomic-embed-text.
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(768),
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Unlike the tsvector column used for full-text search, embeddings cannot be a GENERATED ALWAYS column. Generating an embedding requires calling an external model, which PostgreSQL cannot do in a column expression. Your application generates the embedding and writes it alongside the content.
Indexing
Create an HNSW (Hierarchical Navigable Small World) index for approximate nearest neighbour search:
CREATE INDEX idx_documents_embedding ON documents
USING hnsw (embedding vector_cosine_ops);
HNSW is the recommended index type. It provides logarithmic search time and handles data updates without degrading recall. The alternative, IVFFlat, builds faster and uses less space, but its recall degrades as data changes because cluster centroids are not recalculated.
Without an index, pgvector performs exact nearest neighbour search via sequential scan. This is fine for small datasets (under ~100K vectors) but does not scale.
The vector_cosine_ops operator class matches cosine distance (<=>), which is the right choice for text embeddings. Other operator classes exist for L2 distance (vector_l2_ops), inner product (vector_ip_ops), and others.
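The operators can be tried directly on literal vectors. The values in the comments follow from the definitions: orthogonal unit vectors have cosine distance 1 and L2 distance √2, and <#> returns the negated inner product:

```sql
SELECT '[1,0,0]'::vector <=> '[0,1,0]'::vector AS cosine_distance,   -- 1
       '[1,0,0]'::vector <-> '[0,1,0]'::vector AS l2_distance,        -- 1.414...
       '[1,0,0]'::vector <#> '[0,1,0]'::vector AS neg_inner_product;  -- 0
```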
Tuning index parameters
HNSW accepts two build-time parameters:
CREATE INDEX idx_documents_embedding ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
- m controls the maximum number of connections per node (default 16). Higher values improve recall but increase index size and build time.
- ef_construction controls the search breadth during index building (default 64). Higher values produce a better-quality index at the cost of slower builds.
At query time, hnsw.ef_search controls how many nodes the search visits (default 40). Increase it when you need higher recall:
SET hnsw.ef_search = 100;
The defaults work well for most workloads. Benchmark against your actual data before changing them.
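When a single query needs higher recall without changing the session default, the setting can be scoped to one transaction with SET LOCAL, PostgreSQL's standard mechanism for transaction-scoped configuration (the query vector here is a placeholder):

```sql
BEGIN;
SET LOCAL hnsw.ef_search = 100;  -- reverts automatically at COMMIT/ROLLBACK
SELECT id FROM documents ORDER BY embedding <=> '[0.1, 0.2, 0.3]' LIMIT 10;
COMMIT;
```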
Generating embeddings
An embedding model converts text into a fixed-size vector. You need one to populate the embedding column and to convert search queries into vectors at query time.
Local embeddings with Ollama
Ollama runs embedding models locally. Alongside its native API, it exposes an OpenAI-compatible /v1/embeddings endpoint, so clients that speak that protocol also work; the examples below use the native /api/embed endpoint.
Pull an embedding model:
ollama pull nomic-embed-text
nomic-embed-text produces 768-dimension vectors, supports 8,192 token context, and runs on commodity hardware. It scores competitively with commercial APIs on retrieval benchmarks.
Generate an embedding via the API:
curl http://localhost:11434/api/embed -d '{
"model": "nomic-embed-text",
"input": "How to handle errors in Rust web applications"
}'
The response includes an embeddings array containing one vector per input string.
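The response body looks roughly like this, with one vector per input string; vector values are truncated here for illustration, and additional timing fields are omitted:

```json
{
  "model": "nomic-embed-text",
  "embeddings": [[0.0123, -0.0456, 0.0789]]
}
```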
Calling Ollama from Rust
Ollama’s /api/embed endpoint accepts JSON and returns JSON. Use reqwest directly:
use reqwest::Client;
use serde::{Deserialize, Serialize};
#[derive(Serialize)]
struct EmbedRequest {
model: String,
input: Vec<String>,
}
#[derive(Deserialize)]
struct EmbedResponse {
embeddings: Vec<Vec<f32>>,
}
pub async fn generate_embeddings(
client: &Client,
ollama_url: &str,
texts: &[&str],
) -> Result<Vec<Vec<f32>>, reqwest::Error> {
let response: EmbedResponse = client
.post(format!("{}/api/embed", ollama_url))
.json(&EmbedRequest {
model: "nomic-embed-text".to_string(),
input: texts.iter().map(|s| s.to_string()).collect(),
})
.send()
.await?
.json()
.await?;
Ok(response.embeddings)
}
Batch multiple texts in a single request. Ollama processes them together, which is faster than one request per text.
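For bulk imports, a small helper can split the corpus into fixed-size batches before calling generate_embeddings, one request per batch. The batch size of 64 used below is an illustrative assumption to tune against your hardware:

```rust
/// Split a corpus into fixed-size batches, each suitable for a single
/// /api/embed request. Only the final batch may be smaller than `batch_size`.
pub fn embedding_batches(texts: &[String], batch_size: usize) -> Vec<Vec<String>> {
    texts.chunks(batch_size).map(|chunk| chunk.to_vec()).collect()
}
```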
OpenAI as an alternative
If you prefer a hosted API, OpenAI’s text-embedding-3-small produces 1,536-dimension vectors at $0.02 per million tokens. Change the vector(768) column to vector(1536), swap the model name, and point the request at https://api.openai.com/v1/embeddings with a bearer token. The query patterns in this section work the same regardless of how the embedding was generated.
Storing and querying vectors with SQLx
The pgvector crate
The pgvector crate provides a Vector type that implements SQLx’s Encode and Decode traits.
[dependencies]
pgvector = { version = "0.4", features = ["sqlx"] }
Inserting documents with embeddings
use pgvector::Vector;
use sqlx::PgPool;
pub async fn insert_document(
pool: &PgPool,
title: &str,
content: &str,
embedding: Vec<f32>,
) -> Result<i64, sqlx::Error> {
let embedding = Vector::from(embedding);
sqlx::query_scalar!(
r#"
INSERT INTO documents (title, content, embedding)
VALUES ($1, $2, $3)
RETURNING id
"#,
title,
content,
embedding as _
)
.fetch_one(pool)
.await
}
The as _ cast tells SQLx to use the pgvector crate’s Encode implementation rather than trying to infer a type mapping for the vector column.
Similarity search
The <=> operator computes cosine distance. Lower distance means higher similarity. Order by distance ascending to get the most similar results first.
struct SimilarDocument {
id: i64,
title: String,
content: String,
similarity: f64,
}
pub async fn semantic_search(
pool: &PgPool,
query_embedding: Vec<f32>,
limit: i64,
) -> Result<Vec<SimilarDocument>, sqlx::Error> {
let embedding = Vector::from(query_embedding);
sqlx::query_as!(
SimilarDocument,
r#"
SELECT
id,
title,
content,
1 - (embedding <=> $1) as "similarity!"
FROM documents
ORDER BY embedding <=> $1
LIMIT $2
"#,
embedding as _,
limit
)
.fetch_all(pool)
.await
}
1 - cosine_distance converts the distance into a similarity score where 1.0 is identical. Since cosine distance ranges from 0.0 to 2.0, the score can in principle go as low as -1.0, though text embeddings rarely produce negative similarities in practice.
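To make the score concrete, here is the same computation in plain Rust: cosine similarity is the dot product of the two vectors divided by the product of their magnitudes, which is what 1 - (embedding <=> query) evaluates to. This is a sketch for intuition only; in production PostgreSQL does this work inside the index:

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Identical directions score 1.0; orthogonal vectors score 0.0.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}
```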
Filtered similarity search
Combine vector similarity with standard SQL filtering. The example below assumes the documents table has a text category column, which the earlier schema did not include. pgvector's HNSW index supports iterative scans (v0.8.0+), so filtered queries return the expected number of results even when the filter is selective:
pub async fn search_by_category(
pool: &PgPool,
query_embedding: Vec<f32>,
category: &str,
limit: i64,
) -> Result<Vec<SimilarDocument>, sqlx::Error> {
let embedding = Vector::from(query_embedding);
sqlx::query_as!(
SimilarDocument,
r#"
SELECT
id,
title,
content,
1 - (embedding <=> $1) as "similarity!"
FROM documents
WHERE category = $2
ORDER BY embedding <=> $1
LIMIT $3
"#,
embedding as _,
category,
limit
)
.fetch_all(pool)
.await
}
Hybrid search
Vector similarity alone achieves roughly 62% retrieval precision. Combining it with full-text search using Reciprocal Rank Fusion (RRF) pushes this to roughly 84%. RRF merges two ranked result lists by converting ranks into scores and summing them, so a document that ranks well in both lists scores highest.
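RRF itself is a few lines of arithmetic. The sketch below merges two ranked lists of document ids using the same 1 / (k + rank) scoring as the SQL query later in this section, with 1-based ranks matching row_number():

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score = sum over lists of 1 / (k + rank),
/// with 1-based ranks. A document ranking well in both lists scores highest.
fn rrf_merge(semantic: &[i64], fulltext: &[i64], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in [semantic, fulltext] {
        for (i, id) in list.iter().enumerate() {
            // row_number() starts at 1, so rank = i + 1
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut merged: Vec<(i64, f64)> = scores.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}
```

With k = 50, this mirrors the constant used in the SQL version.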
Schema for hybrid search
A table that supports both search strategies needs a tsvector column for FTS and a vector column for semantic search:
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(768),
search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(content, '')), 'B')
) STORED,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_documents_embedding ON documents
USING hnsw (embedding vector_cosine_ops);
CREATE INDEX idx_documents_search ON documents
USING gin (search_vector);
The hybrid search query
Run both search strategies, rank each result set independently, then merge with RRF:
WITH semantic AS (
SELECT id, title, content,
row_number() OVER (ORDER BY embedding <=> $1) AS rank
FROM documents
ORDER BY embedding <=> $1
LIMIT $3
),
fulltext AS (
SELECT id, title, content,
row_number() OVER (
ORDER BY ts_rank_cd(search_vector,
websearch_to_tsquery('english', $2)) DESC
) AS rank
FROM documents
WHERE search_vector @@ websearch_to_tsquery('english', $2)
LIMIT $3
),
combined AS (
SELECT id, title, content, rank, 'semantic' AS source FROM semantic
UNION ALL
SELECT id, title, content, rank, 'fulltext' AS source FROM fulltext
)
SELECT id, title, content,
sum(1.0 / (50 + rank)) AS score
FROM combined
GROUP BY id, title, content
ORDER BY score DESC
LIMIT $3;
The constant 50 in the RRF formula (1.0 / (50 + rank)) is a smoothing parameter. It prevents top-ranked results from dominating excessively. The original RRF paper used 60; values around 50–60 work well in practice.
Hybrid search in Rust
struct HybridResult {
id: i64,
title: String,
content: String,
score: f64,
}
pub async fn hybrid_search(
pool: &PgPool,
query_embedding: Vec<f32>,
query_text: &str,
limit: i64,
) -> Result<Vec<HybridResult>, sqlx::Error> {
let embedding = Vector::from(query_embedding);
sqlx::query_as!(
HybridResult,
r#"
WITH semantic AS (
SELECT id, title, content,
row_number() OVER (ORDER BY embedding <=> $1) AS rank
FROM documents
ORDER BY embedding <=> $1
LIMIT $3
),
fulltext AS (
SELECT id, title, content,
row_number() OVER (
ORDER BY ts_rank_cd(search_vector,
websearch_to_tsquery('english', $2)) DESC
) AS rank
FROM documents
WHERE search_vector @@ websearch_to_tsquery('english', $2)
LIMIT $3
),
combined AS (
SELECT id, title, content, rank FROM semantic
UNION ALL
SELECT id, title, content, rank FROM fulltext
)
SELECT
id as "id!",
title as "title!",
content as "content!",
sum(1.0 / (50 + rank))::float8 as "score!"
FROM combined
GROUP BY id, title, content
ORDER BY score DESC
LIMIT $3
"#,
embedding as _,
query_text,
limit
)
.fetch_all(pool)
.await
}
The caller generates an embedding from the query text, then passes both the embedding and the raw text. The embedding drives the semantic branch; the raw text drives the FTS branch.
pub async fn search(
pool: &PgPool,
http_client: &reqwest::Client,
ollama_url: &str,
query: &str,
limit: i64,
) -> Result<Vec<HybridResult>, sqlx::Error> {
let embeddings = generate_embeddings(http_client, ollama_url, &[query])
.await
.map_err(|e| sqlx::Error::Protocol(e.to_string()))?;
hybrid_search(pool, embeddings.into_iter().next().unwrap(), query, limit).await
}
When to use semantic search
Add pgvector when your application needs to match by meaning rather than keywords:
- Knowledge base search. Users describe problems in their own words; documents use different terminology.
- Recommendation. “Show me articles similar to this one” is a single vector distance query.
- RAG retrieval. An LLM needs relevant context from your data to generate grounded answers. See Retrieval-Augmented Generation in the AI and LLM Integration section.
- Classification and clustering. Group documents by semantic similarity without manual tagging.
Stick with full-text search when exact keyword matching, boolean queries, or phrase search are what users expect. The two approaches complement each other, as the hybrid search pattern above demonstrates.
pgvector vs dedicated vector databases
pgvector handles up to a few million vectors comfortably. Beyond that, index builds become slow and memory-intensive. Dedicated vector databases (Qdrant, Weaviate, Pinecone) are built for horizontal scaling to billions of vectors.
For most content-heavy and CRUD web applications, pgvector is the right choice. Your embeddings share a database with the data they describe, transactions keep them consistent, and there is no sync pipeline to maintain. The same reasoning that makes PostgreSQL FTS the right starting point for keyword search applies here: start with what you have, and graduate to a dedicated service only when you hit a specific limitation.
Gotchas
The vector type has index dimension limits. HNSW and IVFFlat indexes support at most 2,000 dimensions for vector and 4,000 for halfvec; the vector type itself stores up to 16,000, but without index support. Most embedding models produce 768 or 1,536 dimensions, which fit comfortably. OpenAI's text-embedding-3-large at 3,072 dimensions exceeds the index limit; reduce it to 1,536 via the API's dimensions parameter.
Embeddings are not free to generate. Every document insert or update requires an embedding model call. For bulk imports, batch the embedding requests. For Ollama, send multiple texts in a single /api/embed request.
HNSW index builds can spike memory. Building an HNSW index on a large table may consume significant memory. For tables with millions of rows, build the index during a maintenance window and monitor resource usage.
IVFFlat recall degrades silently. If you use IVFFlat instead of HNSW, recall drops as your data changes because cluster centroids are not recalculated. Rebuild the index periodically or use HNSW.
SELECT * fails with vector columns in query_as!. Just as with tsvector columns, SQLx’s compile-time macros need explicit column lists. List your columns explicitly, omitting or casting the embedding column unless you need its contents.
The extension must be installed in your development database. SQLx’s compile-time query! macros connect to the database during compilation. The vector extension must be enabled there. The same applies to cargo sqlx prepare for offline compilation in CI.