Vector search is table stakes for semantic search and RAG. But every platform comes with production tradeoffs you need to plan for.
Index construction is resource-intensive.
HNSW balances build cost, query speed, and accuracy better than most alternatives, but building one is nothing like creating a B-tree. Construction can require 2-3x the memory of the final index. At scale, builds take hours, consume tens of gigabytes of RAM, and compete with your production workloads for resources.
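As a rough illustration, here is what an HNSW build looks like with the open-source hnswlib library; the dimensions, element count, and parameter values are illustrative assumptions, not recommendations.

```python
import numpy as np
import hnswlib

dim = 768
num_elements = 100_000
data = np.random.rand(num_elements, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
# ef_construction and M are the knobs that trade build time and memory
# for graph quality: raising either improves recall but makes the build
# slower and hungrier.
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data)
```

Most platforms expose these same two parameters under some name, so they are worth understanding even if you never touch hnswlib directly.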
Memory pressure compounds quickly.
Vector databases keep indexes in memory for speed. This works at prototype scale. At production scale, memory becomes your primary cost driver and failure mode. One Weaviate user importing roughly 60 million objects saw memory usage climb to 130 GB and hit repeated out-of-memory failures, even after tightening cache limits. Mitigations exist: memory-mapped storage, quantization, and tiered architectures all help. But they require planning.
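Quantization is the most direct of those levers. A minimal sketch with the faiss library, using made-up data and parameter values chosen only for illustration:

```python
import numpy as np
import faiss

dim = 768
xb = np.random.rand(50_000, dim).astype(np.float32)

nlist = 256                      # coarse clusters for the IVF layer
m = 96                           # sub-quantizers; dim must divide evenly by m
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFPQ(quantizer, dim, nlist, m, 8)  # 8 bits per sub-code

index.train(xb)                  # product quantization needs a training pass
index.add(xb)
# Each vector now costs m bytes (96 B) instead of dim * 4 bytes (3,072 B),
# at the price of approximate distances and some recall loss.
```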
Semantic similarity isn’t keyword matching.
Vector search finds concepts, not exact matches. A query for “invoice 8675309” might return documents about invoices generally without ever surfacing that specific one. Teams that rip out keyword search entirely tend to watch user satisfaction drop before realizing they needed both all along.
Most mature platforms now support hybrid search, typically via techniques like Reciprocal Rank Fusion (RRF) that merge the ranked results of both approaches. Build hybrid from the start; retrofitting it later hurts.
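RRF itself is small enough to sketch in full. The k=60 constant is the value from the original RRF paper; the document IDs below are invented:

```python
def rrf_merge(ranked_lists, k=60):
    """Merge several ranked result lists via Reciprocal Rank Fusion."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["inv-8675309", "doc-12", "doc-7"]  # exact-match ranking
vector_hits = ["doc-7", "doc-12", "doc-99"]        # semantic ranking
print(rrf_merge([keyword_hits, vector_hits]))
# doc-7 and doc-12 rise to the top because both rankers agree on them.
```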
Filtering needs special handling.
Combining vector search with metadata filters sounds simple. Find similar properties where type equals “townhome.” In practice, it’s tricky. Filter before the search and you miss relevant results. Filter after and your top-k results might all get thrown out.
Some platforms handle this well. Others show order-of-magnitude latency spikes when filters get selective. Test with your actual filter cardinality before committing to an approach.
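If your platform only post-filters, the standard workaround is to over-fetch and filter client-side. A sketch of that pattern follows; search(), the metadata dict, and the multipliers are hypothetical stand-ins for whatever your stack provides:

```python
def filtered_top_k(query_vec, search, metadata, predicate, k=10, overfetch=5):
    # Over-fetch so a selective predicate (e.g. type == "townhome")
    # still leaves k survivors after filtering.
    candidates = search(query_vec, top_k=k * overfetch)
    hits = [doc for doc in candidates if predicate(metadata[doc])]
    if len(hits) < k:
        # The filter consumed the whole candidate set; widen the fetch.
        # At very low selectivity, a pre-filtered scan over only the
        # matching documents is usually cheaper than retrying.
        candidates = search(query_vec, top_k=k * overfetch * 10)
        hits = [doc for doc in candidates if predicate(metadata[doc])]
    return hits[:k]
```

The retry branch is exactly where the latency cliffs show up, which is why selective filters deserve a load test.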
Cost curves steepen at scale.
The path from 10,000 vectors to 10 million isn’t linear. Memory requirements grow, managed-service pricing tiers jump, and an embedding model change forces a full re-index. If you embed everything indiscriminately, prepare for a price shock. Self-hosted options trade dollars for operational complexity. Pick either, but budget for the full cost of the choice.
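The raw-memory arithmetic alone makes the jump concrete. The 1536-dimension figure below assumes an OpenAI-style embedding model and ignores index overhead entirely:

```python
dim, bytes_per_float = 1536, 4  # float32 embeddings
for n in (10_000, 10_000_000):
    gib = n * dim * bytes_per_float / 2**30
    print(f"{n:>12,} vectors -> {gib:7.2f} GiB raw")
# ~0.06 GiB at 10k vs ~57 GiB at 10M, before any HNSW graph overhead.
```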
Vector databases are worth adopting. They do things keyword search can’t. Just plan for the realities above.