Redis vector search performance depends on your index configuration. Learn to tune HNSW parameters and optimize RediSearch to hit sub-10ms retrieval latency.
Last month, our team spent about three days debugging why our RAG pipeline's retrieval latency spiked from 15ms to over 200ms during peak load. It turned out we were treating our vector store like a standard key-value cache, ignoring the nuances of the HNSW (Hierarchical Navigable Small World) algorithm.
If you’re already leveraging LLM Caching Strategies to Slash Latency and API Costs, you know that retrieval speed is the heartbeat of your application. When you move into the world of Redis vector search, you’re no longer just fetching bytes; you’re navigating high-dimensional graphs.
To get consistent performance, you have to stop thinking about Redis as a simple bucket of data. When you enable RediSearch, you’re building an in-memory graph. The most common pitfall I see is using default index parameters for production workloads.
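When you define your index using FT.CREATE, you choose between FLAT (brute force) and HNSW (approximate nearest neighbor). FLAT compares the query against every stored vector, so once a dataset grows well beyond a few thousand vectors, HNSW is usually the only practical option. However, HNSW requires careful tuning of two build-time parameters: M (the maximum number of links each node keeps in the graph) and EF_CONSTRUCTION (the size of the candidate list used while the graph is built).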
A higher M improves recall but increases memory usage. We first tried bumping M to 64, thinking more links meant faster traversal. It broke our memory budget within hours, causing massive OOM evictions. We learned the hard way that Partial Indexing Strategies to Boost Database Performance and Storage Efficiency apply here too: don't index fields you don't need for filtering.
Instead, we settled on an M of 16 (the RediSearch default, which is why the command below does not set it explicitly) and a higher EF_RUNTIME at query time. Here is the configuration that brought our P99 latency back down to around 8ms:
FT.CREATE idx:documents ON HASH PREFIX 1 doc: SCHEMA embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE metadata TAG
When you query, don't forget to set the EF_RUNTIME parameter. If your results aren't accurate enough, bump this number. Just be aware that it directly impacts your CPU cycles per request. It’s a constant trade-off between speed and precision.
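To make that concrete, here is a rough sketch of a KNN query that overrides EF_RUNTIME for a single request. The helper names and the value 100 are assumptions for illustration; the index and field names come from the FT.CREATE command above. The vector is packed as little-endian FLOAT32 bytes, which matches the native byte order on typical x86 and ARM hosts.

```python
import struct


def to_float32_blob(vector):
    """Pack a sequence of floats into little-endian FLOAT32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def knn_search_args(index, field, vector, k=10, ef_runtime=100):
    """Build an FT.SEARCH argument list for a KNN query.

    Hypothetical helper: ef_runtime=100 is a starting point, not a recommendation.
    """
    # EF_RUNTIME inside the KNN clause overrides the index-level value
    # for this query only.
    query = f"*=>[KNN {k} @{field} $vec EF_RUNTIME {ef_runtime}]"
    return [
        "FT.SEARCH", index, query,
        "PARAMS", 2, "vec", to_float32_blob(vector),
        "DIALECT", 2,  # vector queries need query dialect 2 or later
    ]


search_args = knn_search_args("idx:documents", "embedding", [0.0] * 1536)
print(search_args[2])
```

Raising `ef_runtime` widens the candidate list HNSW explores per query, which is where the extra CPU cost comes from.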
One of the biggest wins for ANN indexing is combining vector search with scalar filtering. If you’re building a complex RAG pipeline, you need to be careful about where that filter runs. As discussed in Implementing Metadata Filtering for Precise RAG Pipeline Retrieval, applying filters after the vector search is a waste of compute.
Always perform pre-filtering using TAG or NUMERIC fields in your index. This allows RediSearch to prune the search space before it even touches the vector graph. It’s the difference between scanning 1,000,000 vectors and scanning 50,000.
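In RediSearch query syntax, the filter goes in front of the `=>` and the KNN clause goes after it. The sketch below builds such a query string for the `metadata` TAG field from the index above. The helper names and the tag values are hypothetical, and the escaping is deliberately conservative: it backslash-escapes every character that is not a letter, digit or underscore.

```python
import re


def escape_tag(value):
    """Backslash-escape anything that is not a word character in a TAG value."""
    return re.sub(r"([^\w])", r"\\\1", value)


def filtered_knn_query(tag_field, tag_value, vector_field, k=10):
    """Build a query that filters on a TAG field, then runs KNN on the matches.

    Hypothetical helper; pass the result as the query argument of FT.SEARCH
    together with PARAMS 2 vec <blob> DIALECT 2.
    """
    tag_filter = f"@{tag_field}:{{{escape_tag(tag_value)}}}"
    return f"({tag_filter})=>[KNN {k} @{vector_field} $vec]"


print(filtered_knn_query("metadata", "finance", "embedding"))
print(filtered_knn_query("metadata", "team-a", "embedding"))
```

The same pattern works for NUMERIC fields by swapping the tag filter for a range expression such as `@year:[2020 2024]`, assuming the field is declared in the schema.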
1. How do I know if my M value is too high?
Check your memory usage with INFO MEMORY. If your Redis instance is nearing its maxmemory limit during index creation, your M value is likely too high for your available RAM.
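If you want to watch this from a script while the index builds, a small parser over the raw INFO MEMORY text is enough. This is a minimal sketch: the function name and the sample numbers are made up, while `used_memory` and `maxmemory` are the actual field names in the INFO output. Clients such as redis-py already return INFO as a dict, in which case you can skip the parsing step.

```python
def memory_usage_ratio(info_text):
    """Return used_memory / maxmemory from raw INFO MEMORY output.

    Returns None when maxmemory is 0, meaning no limit is configured.
    """
    fields = {}
    for line in info_text.splitlines():
        # Skip section headers such as "# Memory" and blank lines.
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    used = int(fields["used_memory"])
    limit = int(fields["maxmemory"])
    return None if limit == 0 else used / limit


# Made-up sample: 6 GiB used out of an 8 GiB limit.
sample = "# Memory\r\nused_memory:6442450944\r\nmaxmemory:8589934592\r\n"
print(memory_usage_ratio(sample))  # 0.75
```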
2. Why is my query latency inconsistent?
It’s often due to the EF_RUNTIME parameter. If you set it too high, you’ll see spikes in CPU usage. Try lowering it until you find the sweet spot between latency and recall accuracy.
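One way to find that sweet spot is to measure recall against an exact result set (for example, from a FLAT index over the same data) and pick the smallest EF_RUNTIME that clears your target. The sketch below assumes you supply a `search` callable that runs the query at a given EF_RUNTIME and returns document IDs; that callable, the function names and the candidate values are all hypothetical.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbours the approximate search also found."""
    exact = set(exact_ids)
    if not exact:
        return 0.0
    return len(exact & set(approx_ids)) / len(exact)


def smallest_ef_meeting(target_recall, candidates, search, exact_ids):
    """Return the smallest EF_RUNTIME whose recall reaches target_recall, else None."""
    for ef in sorted(candidates):
        if recall_at_k(search(ef), exact_ids) >= target_recall:
            return ef
    return None


# Stand-in for a real query function: pretends a larger EF finds more true neighbours.
exact = ["doc:1", "doc:2", "doc:3", "doc:4"]
fake_search = lambda ef: exact[: ef // 25]

print(smallest_ef_meeting(0.75, [25, 50, 75, 100], fake_search, exact))
```

In practice you would average recall over a batch of representative queries rather than a single one, and record latency for each candidate alongside it.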
3. Does the distance metric matter for performance?
Not significantly for raw retrieval speed, but COSINE is generally preferred for text embeddings (like OpenAI’s text-embedding-3-small). Stick to what your embedding model was trained for to ensure your distance calculations are actually meaningful.
I’m still experimenting with index quantization to see if we can shrink our memory footprint by another 20%. Moving forward, I’d suggest profiling your queries with FT.PROFILE before making any major configuration changes. It’s the only way to see exactly where your milliseconds are going.
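FT.PROFILE takes the same query arguments as FT.SEARCH, with the index name followed by SEARCH QUERY. As a small sketch, assuming you already build your FT.SEARCH calls as argument lists, a hypothetical wrapper like this converts one into the matching profile call:

```python
def profile_args(search_args):
    """Turn an FT.SEARCH argument list into the equivalent FT.PROFILE call."""
    command, index, *rest = search_args
    if command != "FT.SEARCH":
        raise ValueError("expected an FT.SEARCH argument list")
    # FT.PROFILE <index> SEARCH QUERY <query> [same options as FT.SEARCH]
    return ["FT.PROFILE", index, "SEARCH", "QUERY", *rest]


example = ["FT.SEARCH", "idx:documents", "*=>[KNN 10 @embedding $vec]",
           "PARAMS", 2, "vec", b"<vector bytes>", "DIALECT", 2]
print(profile_args(example)[:5])
```

The placeholder bytes stand in for a real FLOAT32 vector blob.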