
    Choosing a vector database: pgvector vs Pinecone vs Qdrant

    An honest comparison of the three serious choices for production vector search in 2026 — what each one is good at, what they're not, and why pgvector wins more often than the marketing suggests.

    YAEL Engineering·24 Nov 2025·9 min read·1,742 words
    On this page
    • The decision in one paragraph
    • The feature comparison that matters
    • Why pgvector wins more often than you'd think
    • When pgvector hits its limit
    • When Pinecone makes sense
    • When Qdrant makes sense
    • Hybrid search — the under-appreciated win
    • Embedding model choices
    • Reranking — the highest-leverage step
    • Operational notes
    • FAQ
    • Can pgvector handle 100M vectors?
    • What about Weaviate / Milvus / Chroma?
    • Does Supabase support pgvector?
    • What about full-text + vector in one query?
    • How many dimensions do I need?
    • Can I use binary or scalar quantization?
    • Should I store the original text alongside the vector?
    • How do I update an embedding when the document changes?

    For ~80% of production RAG systems in 2026, pgvector running on the Postgres you already have is the right answer. It is fast enough up to tens of millions of vectors, it supports hybrid (vector + keyword) search natively, it joins to your relational data without a separate sync pipeline, and it costs you nothing beyond Postgres hosting. Pinecone is the right answer if you need >100M vectors with sub-100ms latency or if you don't want to run a database at all. Qdrant is the right answer if you need a self-hosted, feature-rich vector DB independent of your relational store. Weaviate, Milvus, and Chroma are all viable in their niches. The question isn't usually "which is best." It's "which is best for the size of the problem I actually have."

    We've shipped RAG on each of these. This is the honest take.

    The decision in one paragraph

    If you already have Postgres and your corpus is under ~10M vectors: pgvector. If you have over 100M vectors: Pinecone. If you mainly want a fully managed service: Pinecone, or managed Postgres with pgvector on Neon or Supabase. If you need self-hosted with rich filtering and don't want to use Postgres: Qdrant. If you have specific exotic needs (graph search, image+text hybrid): Weaviate. If you want the simplest possible local option: Chroma or LanceDB.

    The feature comparison that matters

    | | pgvector | Pinecone | Qdrant |
    |---|---|---|---|
    | Hosting | Self-host on Postgres or managed (Neon, Supabase) | Fully managed only | Self-host or managed cloud |
    | Max practical vectors | ~10-100M with HNSW | Effectively unlimited | ~100M self-hosted, more managed |
    | Filtering | Full SQL on relational columns | Metadata filters, limited | Rich payload filters |
    | Hybrid search | tsvector + RRF native | None (use a wrapper) | BM25 native |
    | Joins to relational data | Free | Manual sync | Manual sync |
    | Cost (low volume) | $0 over Postgres | ~$70/mo minimum | $0 self-hosted |
    | Cost (high volume) | Postgres scaling | Linear with vectors | Linear with infra |
    | Latency (P95) | ~10-50ms for under 10M vectors | ~30-80ms | ~10-40ms |
    | Index types | HNSW, IVFFlat | HNSW (managed) | HNSW |
    | Operational complexity | Whatever Postgres needs | None | Real |

    Why pgvector wins more often than you'd think

    The dominant argument: you already have a relational database. Your documents are stored relationally. Your tenant model is relational. Your auth is relational. Adding a separate vector DB means a sync pipeline — chunks get written to two places, and they get out of sync, and the bug is invisible until a user notices.

    With pgvector, the embedding is a column:

    sql
    create extension if not exists vector;
    
    create table documents (
      id          text primary key,
      org_id      text not null,
      title       text not null,
      body        text not null,
      embedding   vector(1536),
      created_at  timestamptz not null default now()
    );
    
    -- HNSW index for fast approximate nearest neighbor
    create index documents_embedding_idx
      on documents using hnsw (embedding vector_cosine_ops);
    
    -- Standard btree for tenant filtering
    create index documents_org_idx on documents(org_id);

    A tenant-scoped search becomes one query:

    sql
    select id, title,
           1 - (embedding <=> $1) as score
    from documents
    where org_id = $2
    order by embedding <=> $1
    limit 8;

    That's the entire vector search. With Postgres row-level security (RLS) in a multi-tenant setup, even the where org_id = $2 filter is enforced at the DB level. No sync pipeline. No second source of truth. No cross-system race conditions.
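    One practical detail when sending $1 from application code: pgvector accepts vectors in a bracketed text form such as [0.1,0.2,0.3], and some drivers will otherwise serialize a JavaScript array as a Postgres array literal. The sketch below shows one way to handle that. The helper name toVectorLiteral and the constant TENANT_SEARCH_SQL are hypothetical, and the explicit ::vector cast is an assumption about how your driver binds parameters.

```typescript
// Serialize a number array into pgvector's text input format, e.g. "[0.1,0.2,0.3]".
// `toVectorLiteral` is a hypothetical helper, not part of pgvector or any driver.
function toVectorLiteral(embedding: number[], expectedDims?: number): string {
  if (expectedDims !== undefined && embedding.length !== expectedDims) {
    throw new Error(`expected ${expectedDims} dimensions, got ${embedding.length}`);
  }
  for (const value of embedding) {
    // NaN and Infinity would produce text that Postgres rejects, so fail early.
    if (!Number.isFinite(value)) {
      throw new Error("embedding contains a non-finite value");
    }
  }
  return `[${embedding.join(",")}]`;
}

// The tenant-scoped query from the article, with an explicit cast on the
// vector parameter (assumption: the driver sends $1 as plain text).
const TENANT_SEARCH_SQL = `
  select id, title, 1 - (embedding <=> $1::vector) as score
  from documents
  where org_id = $2
  order by embedding <=> $1::vector
  limit 8`;
```

    Passing expectedDims (1536 for the table above) catches a mismatched embedding model before the query reaches the database.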

    When pgvector hits its limit

    The real ceiling is around 10-30M vectors per HNSW index, depending on your hardware and query patterns. Past that:

    • HNSW build time gets long (hours, not minutes)
    • Memory usage of the index becomes significant
    • Query latency starts to climb past 100ms

    You have three options at that scale:

    1. Shard by tenant (each tenant gets its own table)
    2. Use IVFFlat instead of HNSW (different trade-offs — faster build, slightly slower queries, more tuneable)
    3. Move to a dedicated vector DB

    Sharding by tenant is surprisingly effective because most queries are already tenant-scoped — you're searching within one tenant at a time anyway.
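    One way to get a table per tenant while still querying a single parent is Postgres declarative partitioning. The sketch below only generates the DDL for one tenant's partition. It assumes the parent table was created with partition by list (org_id), which the documents table shown earlier is not, and that the primary key includes org_id as partitioning requires. tenantPartitionDdl is a hypothetical helper name.

```typescript
// Build the DDL for one list partition per tenant.
// Assumption: the parent was created with `partition by list (org_id)`.
function tenantPartitionDdl(parentTable: string, orgId: string): string {
  // Only allow simple lowercase identifiers so nothing needs quoting or escaping.
  if (!/^[a-z_][a-z0-9_]*$/.test(parentTable)) {
    throw new Error(`unsafe parent table name: ${parentTable}`);
  }
  if (!/^[a-z0-9_]+$/.test(orgId)) {
    throw new Error(`unsafe tenant id: ${orgId}`);
  }
  const partitionName = `${parentTable}_${orgId}`;
  // Postgres truncates identifiers longer than 63 bytes, which could collide.
  if (partitionName.length > 63) {
    throw new Error(`partition name too long: ${partitionName}`);
  }
  return (
    `create table if not exists ${partitionName} ` +
    `partition of ${parentTable} for values in ('${orgId}');`
  );
}
```

    Queries that filter on org_id then touch only that tenant's partition, so each vector index stays small. Tenant IDs that do not match the strict pattern here would need a mapping to a safe partition name.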

    When Pinecone makes sense

    Pinecone's pitch: it's the easiest production vector DB to operate. You don't run a database. You make API calls. The latency is consistent. The scaling is automatic.

    Pick Pinecone when:

    • You have a non-engineering team building RAG and don't want to think about infra
    • Your corpus is genuinely huge (100M+ vectors)
    • You need cross-region replication out of the box
    • You can afford $70-700+/month in Pinecone fees on top of your other infra

    Pinecone's filtering is weaker than Postgres or Qdrant. Metadata filters work but they're constrained — no joins, limited operators, no nested filtering. If your search needs are "vector similarity plus some metadata exact-match," fine. If they're more complex, Pinecone gets awkward.

    When Qdrant makes sense

    Qdrant is the self-hosted middle ground. Rich payload filtering, HNSW-only (good for most cases), Rust-based and fast.

    Pick Qdrant when:

    • You can't or don't want to use Postgres
    • You need rich filter expressions on metadata
    • You have a self-hosted ops culture and the team to run it
    • You want sub-50ms P99 latency at 50M+ vectors

    The trade-off: it's a separate database. You have a sync pipeline again. You have separate backup and disaster-recovery operations. None of this is unique to Qdrant — same applies to Pinecone, Weaviate, and Milvus.

    Hybrid search — the under-appreciated win

    Pure dense-vector search misses keyword matches that humans would expect. "Find docs about kubernetes" returns docs that semantically match "container orchestration" — and the user wonders why we didn't return the doc literally titled "Kubernetes."

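    Hybrid search combines keyword ranking with dense vectors. Postgres can do both in one query using its built-in full-text search (ts_rank is a term-frequency ranking, not true BM25):

    sql
    select
      id, title,
      (
        0.5 * (1 - (embedding <=> $1))  -- vector (cosine) similarity
        + 0.5 * ts_rank(to_tsvector('english', body), plainto_tsquery('english', $2))  -- keyword rank, BM25-ish
      ) as score
    from documents
    where org_id = $3
    order by score desc
    limit 8;

    This is a weighted blend of the two scores rather than reciprocal rank fusion proper, which merges rank positions instead of raw scores. Because ts_rank is not normalized to the same 0-1 range as cosine similarity, treat the 0.5 weights as a starting point to tune. Ordering by the blended score also means the HNSW index cannot drive the sort, so keep the where clause selective. Hybrid retrieval is better than either method alone by 5-15% on quality benchmarks. Worth implementing.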
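    Reciprocal rank fusion proper works on rank positions rather than raw scores: run the vector query and the keyword query separately, then merge the two ordered ID lists. A minimal sketch follows, assuming each query returns document IDs best-first. The function name and the constant k = 60 (a commonly used default) are illustrative choices, not something the article prescribes.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document,
// where rank is the 1-based position of the document in that list.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, position) => {
      const rank = position + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    // Highest fused score first; ties broken by id so output is deterministic.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// Example: IDs from a vector query and a keyword query, best-first.
const fused = reciprocalRankFusion([
  ["a", "b", "c"], // vector ranking
  ["c", "a", "d"], // keyword ranking
]);
// fused order: a, c, b, d
```

    Because only ranks are used, the two retrievers never need their scores normalized against each other, which is the main reason to prefer this over a weighted sum.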

    Qdrant has native BM25 support. Pinecone doesn't — you'd run two queries and fuse them application-side.

    Embedding model choices

    The vector DB matters less than the embedding model. Choices:

    • OpenAI text-embedding-3-large (3072 dim) — strong baseline, expensive at scale
    • OpenAI text-embedding-3-small (1536 dim) — cheaper, ~95% of large's quality on most tasks
    • Cohere Embed v3 — competitive quality, often cheaper
    • BGE M3 / e5-large-v2 — self-hosted, near-OpenAI quality, free if you have the GPU
    • Voyage AI — niche but excellent for code embeddings

    Storage cost scales linearly with dimension. A 3072-dim vector at float32 is 12KB. A million vectors is 12GB just in the embedding column. Halve the dimension and you halve the storage.

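    sql
    -- Use a smaller embedding to save storage.
    -- Dropping the column also drops documents_embedding_idx.
    alter table documents drop column embedding;
    alter table documents add column embedding vector(768);
    -- Re-embed every row with the smaller model, then rebuild the index:
    create index documents_embedding_idx
      on documents using hnsw (embedding vector_cosine_ops);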
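    The storage arithmetic above is easy to reproduce for your own corpus. This is a rough sketch that counts only the raw vector payload; it ignores per-row overhead and the size of the index itself, and the function name is hypothetical.

```typescript
// Raw bytes needed for the embedding column alone.
// bytesPerDim is 4 for float32; 2 would model half precision.
function embeddingStorageBytes(
  dims: number,
  vectorCount: number,
  bytesPerDim = 4,
): number {
  return dims * bytesPerDim * vectorCount;
}

const perVector = embeddingStorageBytes(3072, 1);   // 12288 bytes, about 12KB
const largeModel = embeddingStorageBytes(3072, 1e6); // 12288000000 bytes, about 12GB
const smallModel = embeddingStorageBytes(1536, 1e6); // 6144000000 bytes, about 6GB
```

    Halving the dimension halves the payload, which is the trade the article describes.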

    Reranking — the highest-leverage step

    Vector search returns top-K candidates. A cross-encoder reranker (Cohere Rerank, BGE reranker) re-orders them by true relevance. This typically improves retrieval quality more than upgrading the embedding model.

    Reranking is independent of which vector DB you use:

    ts
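    // Assumes `embed`, `db` (returning rows directly), and a Cohere client
    // `cohere` are already in scope.
    type DocumentRow = { id: string; title: string; body: string };

    async function searchAndRerank(query: string, orgId: string) {
      const queryVec = await embed(query);
      // Over-fetch 50 candidates so the reranker has something to reorder
      const candidates = await db.query<DocumentRow[]>(
        `select id, title, body from documents
         where org_id = $1
         order by embedding <=> $2
         limit 50`,
        [orgId, queryVec],
      );
      // Rerank with Cohere: expensive but very effective
      const reranked = await cohere.rerank({
        model: "rerank-english-v3.0", // assumed model name; use the one your account offers
        query,
        documents: candidates.map((c) => c.body),
        topN: 8, // the TypeScript SDK uses camelCase here
      });
      // Each result carries the index of the document in the input array
      return reranked.results.map((r) => candidates[r.index]);
    }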

    Top-50 → rerank → top-8. The combined cost is ~$0.001 per query at Cohere's rates. The quality gain is dramatic.

    Operational notes

    A few things we've learned shipping these in production:

    • HNSW builds are slow. Build the index once after bulk-loading; don't rebuild on every insert. Postgres handles incremental updates fine.
    • IVFFlat needs a sample. When creating the index, Postgres samples to learn the centroids. Make sure you have representative data loaded first.
    • Embedding cost dominates. At high volume, paying OpenAI for embeddings can exceed the cost of running Postgres. Plan for it.
    • Drift. If you re-embed your corpus with a new model, you must re-embed queries with the same model. Cross-model search returns garbage. Add a model-version column to your documents table.
    sql
    alter table documents add column embedding_model text default 'text-embedding-3-small';
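    With the model recorded per row, a migration job can tell which rows still need re-embedding, and searches can be restricted to rows that match the query's model. The sketch below is one possible shape for that. The EmbeddedRow type, the helper name, and the SQL constant are hypothetical; only the embedding_model column comes from the article.

```typescript
interface EmbeddedRow {
  id: string;
  embeddingModel: string; // value of the embedding_model column
}

// Split rows into those already on the active model and those needing re-embedding.
function splitByEmbeddingModel(rows: EmbeddedRow[], activeModel: string) {
  const current: EmbeddedRow[] = [];
  const stale: EmbeddedRow[] = [];
  for (const row of rows) {
    (row.embeddingModel === activeModel ? current : stale).push(row);
  }
  return { current, stale };
}

// During a migration, only compare the query against rows embedded with the
// same model ($3), so mixed-model rows never appear in results.
const MODEL_SCOPED_SEARCH_SQL = `
  select id, title
  from documents
  where org_id = $2 and embedding_model = $3
  order by embedding <=> $1
  limit 8`;
```

    Rows in the stale bucket are the re-embedding backlog; once it is empty, the model filter can be dropped from the query.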

    FAQ

    Can pgvector handle 100M vectors?

    Technically yes, practically painful. Past ~30M per index, HNSW build times get long and memory pressure builds. Shard by tenant or move to a dedicated vector DB at that scale.

    What about Weaviate / Milvus / Chroma?

    Weaviate has strong filtering and a built-in vectorization pipeline — useful if you want one tool to do embedding + storage. Milvus is high-scale enterprise. Chroma is a delightful local-first tool that's not designed for high-volume production.

    Does Supabase support pgvector?

    Yes, natively. Same for Neon. Both are good hosted-pgvector options.

    What about full-text + vector in one query?

    Postgres does it natively via the hybrid pattern above. Most other vector DBs require either two queries (then app-side fusion) or a separate full-text store.

    How many dimensions do I need?

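    For modern embedding models, 1024-1536 dims is enough for most domains. Drop to 768 if storage matters, push to 3072 if quality matters and you can afford the cost. Note that pgvector can only index the vector type up to 2,000 dimensions, so 3072-dim embeddings need half precision (halfvec) or a reduced dimension before they can use HNSW or IVFFlat.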

    Can I use binary or scalar quantization?

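    Yes. pgvector supports half-precision (halfvec) and binary quantized (bit) vectors as of version 0.7.0. Storage drops 2x with half precision and up to 32x with binary quantization, with modest quality loss for the former and a larger loss for the latter that reranking can partly recover. Worth experimenting on read-heavy systems.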
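    To make the binary case concrete, here is a small sketch of sign-based quantization and Hamming distance in application code. It mirrors what pgvector's binary_quantize function does as I understand it (positive values become 1, everything else 0); the TypeScript function names are hypothetical and this is for illustration, not a replacement for the database functions.

```typescript
// One bit per dimension: "1" for positive values, "0" otherwise.
function binaryQuantize(embedding: number[]): string {
  return embedding.map((value) => (value > 0 ? "1" : "0")).join("");
}

// Number of positions where two equal-length bit strings differ.
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) {
    throw new Error("bit strings must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

const docBits = binaryQuantize([0.3, -0.1, 0, 2]);    // "1001"
const queryBits = binaryQuantize([0.5, 0.2, -1, -4]); // "1100"
const distance = hammingDistance(docBits, queryBits); // 2
```

    A float32 dimension takes 32 bits and a quantized one takes 1, which is where the 32x figure comes from. A common pattern is to search on the bit vectors first, then re-score the top candidates with the full-precision vectors.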

    Should I store the original text alongside the vector?

    Yes, in the same row. Querying the vector then needing to round-trip to fetch the text from another store is the pattern that pushes most teams off Pinecone.

    How do I update an embedding when the document changes?

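    Re-embed the new text, update the row. With pgvector, it's a single update that changes text and vector together. With a separate vector DB, you upsert the vector by ID in a second system and have to keep that write consistent with the source row. Either way, plan for it: drift between text and vector is the silent killer of RAG quality.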
