
Introduction
Imagine you run an e-commerce platform with, say, 10 million products on your website. Your customers search for products by name, description, or attributes, and you want to surface the products most similar to each query. How do you go about it? Fuzzy or keyword search would fail miserably given the scale and the semantic nature of the problem.
When keyword search is insufficient, we need semantic search powered by vector similarity! Product descriptions can be encoded as vectors using embeddings, but how do we store these vectors and query them efficiently at scale? This is where VectorDBs come into play.
Traditional route
When we fire a query at a traditional database, it is a simple table lookup. Here, by contrast, your natural language query is first encoded into an n-dimensional vector and then compared against a database holding many such vectors.
This is where the magic happens: Vector Similarity Search.
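To make that first step concrete, here is a small, hypothetical sketch of encoding a query with a sentence-embedding model (the model name and query text are purely illustrative):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
query_vector = model.encode("wireless noise cancelling headphones")
print(query_vector.shape)  # (384,) - the query is now an n-dimensional vector
Every product description is encoded the same way, so search reduces to finding the stored vectors closest to query_vector.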
The Brute Force Problem
The most straightforward approach to find similar vectors is the K-Nearest Neighbors (KNN) exact search. Here’s how it works:
- Take your query vector
- Compare it with every single vector in the database
- Calculate the distance (cosine, Euclidean, etc.) for each comparison
- Sort all distances and return the K closest vectors

KNN exact search: Query vector compared against all N vectors in database
Why This Doesn’t Scale
Let’s put this into perspective with some numbers:
- Database size: 10 million product vectors
- Vector dimensions: 768 (typical for text embeddings)
- Query time: For each search, you need 10M distance calculations
- Memory: Vectors are typically memory-mapped or batched for efficient access
All of this comes with the additional overhead of the distance calculation itself: cosine similarity is computationally non-trivial, and its cost grows with the dimensionality of the vectors.
# Brute force search (runnable version of the pseudo-code)
import numpy as np

def calculate_distance(a, b):
    # Cosine distance = 1 - cosine similarity
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def knn_search(query_vector, database_vectors, k=5):
    distances = []
    for idx, vector in enumerate(database_vectors):  # O(N) - This is the problem!
        distance = calculate_distance(query_vector, vector)
        distances.append((distance, idx))

    # Sort by distance and return the top K: O(N log N)
    return sorted(distances)[:k]
The time complexity is O(N) for each query, where N is the number of vectors. With millions of vectors and hundreds of concurrent users, this becomes computationally prohibitive.
Real-world impact:
- Query latency: Several seconds per search
- CPU usage: Maxed out with just a few concurrent users
- Memory requirements: Entire dataset must fit in RAM
- Cost: Expensive hardware for acceptable performance
Can we do better than this?
The smarter route
Instead of going through all the vectors, we can use the inherent structure of the data to our advantage. Can we skip some of the vectors and still get approximate neighbors without losing relevance? It turns out we can, if we use clever data structures to store these vectors.
Here are some approaches to get approximate nearest neighbors:
- KD Trees
- Locality Sensitive Hashing (LSH)
- Annoy
- HNSW
To keep the scope brief, we will focus on HNSW in detail, since it is the approach most widely used in practice across the industry.
Textbook description of HNSW
HNSW stands for Hierarchical Navigable Small World, a powerful algorithm for performing approximate nearest neighbor (ANN) searches in high-dimensional vector spaces. It organizes data into a multi-layered, hierarchical graph structure, allowing for very fast similarity searches by using long-range “skip” connections in upper layers to quickly narrow down the search area, and fine-grained searches in lower layers to find the most accurate results.
Too many words!
Real World Travel Analogy
Imagine you live in a city in state $S_1$ and want to visit a remote village in state $S_k$. Instead of driving through every town, you use the hierarchical highway system:
Level 2 - Interstate Roads (State-to-State)
Goal: navigate from one state to another.
- States ($S_1$, $S_2$, $S_k$): Represent major regions in the vector space
- Solid arrows: Direct interstate highways between states
- Dashed arrows: Long-distance connections that can be skipped
- Navigate from $S_1$ -> $S_2$ … -> $S_k$ using the most efficient interstate route

Level 2: State-to-State travel using Interstate roads
Level 1 - Regional Travel (Inter-City)
Goal: navigate through cities.
- Gray circles: Regular cities/vectors within each state
- Red circles: Hub cities that connect to other states
- Hatched areas: Irrelevant regions skipped during greedy search
- Blue circle: Target destination in state $S_k$
- Search within the relevant regions, skip irrelevant areas

Level 1: Inter-city travel within states
Level 0 - Local Navigation (Village-to-Village)
Goal: navigate through villages.
- Connected nodes: Local road networks between nearby cities
- Green path: Actual search route taken from start to target
- Blue dotted connections: Final local connections to reach destination
- Navigate through local connections to find the exact target vector

Level 0: Village-to-village local navigation
Why This Works
- Hub Cities: Act as shortcuts connecting distant regions efficiently
- Hierarchical Levels: Each level provides different granularity of navigation
- Skip Irrelevant Areas: Never waste time in regions that don’t help
- Approximate Results: Fast navigation to very good destinations
This is exactly how HNSW finds approximate nearest neighbors: by organizing vectors cleverly, similarity search can be optimized for speed while still maintaining relevance.
How HNSW works
Here we will dive deep into internals of how HNSW works.
Skip lists and HNSW
To simulate this "skipping of vectors", HNSW borrows from a special data structure: the skip list. It lets us skip over many vectors and still arrive at good approximate nearest neighbors.
Visualization of a skip list
- Bottom level: All nodes (4, 7, 22, 34, 39, 47, 66, 69, 74, 91)
- Higher levels: Progressively fewer “hub” nodes (34, 47, 74 at top)
- Search path: Start from HEAD, use express lanes, drop down when needed
Here are some properties of a skip list that align very well with HNSW:
- Multi-level indexing: Higher levels contain fewer nodes for faster traversal
- Logarithmic search: O(log n) search time instead of O(n) linear search
- Express lanes: Upper levels act like highways to jump across large distances
- Progressive refinement: Start high, drop down levels to get more precise
- Probabilistic structure: Nodes randomly promoted to higher levels
- Efficient skipping: Can bypass many irrelevant nodes during search
The same skip list can be represented as a graph, as shown below:
Equivalent graph representation of a skip list
Notice: to find 66, the path is HEAD -> 34 -> 47 -> 66. You hop through just two intermediate nodes instead of scanning the whole list - this is the power of a skip list.
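To make the express-lane idea concrete, here is a minimal skip-list search sketch (a plain skip list, not HNSW itself); the node layout and values mirror the example above, and the head node is assumed to be a sentinel with key negative infinity:
import math

class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)  # one "next" pointer per level

def skiplist_search(head, target, top_level):
    # Walk right on each level while we don't overshoot the target, then drop down
    node = head  # head is a sentinel: SkipNode(-math.inf, top_level)
    for level in range(top_level, -1, -1):  # start on the top express lane
        while node.forward[level] is not None and node.forward[level].key <= target:
            node = node.forward[level]  # e.g. HEAD -> 34 -> 47 on the upper levels
    return node if node.key == target else None
HNSW generalizes this from a sorted list of keys to a graph of vectors: upper layers act as sparse express lanes, while layer 0 holds every vector.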
Core Algorithm of HNSW (Simplified)
Level assignment
What does it mean when we say level of vector A is 2?
- A vector can exist on multiple levels simultaneously: 0, 1, and 2
- Think of it like an employee who has access to level 2 clearance - they also have access to all levels below it (1 and 0)
- By default, every vector is present in level 0
How is this level assigned?
- Level is assigned using an exponential-like distribution
- Formula: max_level = floor(-ln(U) * mL), where U ~ uniform(0,1) is a random number and mL = 1/ln(M) is the level multiplier (normalization factor)
- The formula floor(-ln(U) / ln(M)) is equivalent but shows the dependency on M directly
- Some implementations allow tuning mL independently to control hierarchy height

Distribution of levels in HNSW
Why is exponential-like distribution used?
- It naturally favors lower levels - most vectors exist only at L=0
- Very few vectors exist at higher levels (exponentially decaying probability)
- Sampling from this distribution ensures hierarchical structure similar to skip lists
- Setting mL = 0 would create a single-layer graph (no hierarchy)
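As a quick sanity check of the formula above, here is a small sketch (assuming M = 16 purely for illustration) that samples levels and tallies how many vectors land at each one; exact counts will vary per run:
import math
import random
from collections import Counter

M = 16                      # illustrative value for max neighbors per node
mL = 1.0 / math.log(M)      # level multiplier

def assign_level():
    U = 1.0 - random.random()             # U ~ uniform(0, 1], avoids ln(0)
    return math.floor(-math.log(U) * mL)

levels = Counter(assign_level() for _ in range(1_000_000))
for level in sorted(levels):
    print(level, levels[level])           # most vectors at level 0, exponentially fewer above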
Close friends
- In a social setting, you have a few close friends that you regularly hang out with
- HNSW also likes to follow this pattern
- Each vector is allowed to keep M close friends
- Close friends = Nearest neighbors

Vector V has M=5 neighbors per node
- Dotted lines represent bidirectional edges between vectors
- Note on bidirectionality: When vector V connects to neighbor N, N must also add V to its neighbor list. If N already has M neighbors, it must prune one neighbor to stay within the limit (potentially V itself if V is the furthest). After pruning, edges are not guaranteed to be perfectly symmetric across the graph.
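A minimal sketch of this bookkeeping, under simplifying assumptions (a plain dict-of-sets graph and a generic distance function; the real HNSW heuristic also considers neighbor diversity, not just distance):
M = 5  # illustrative neighbor limit

def connect(v, n, graph, distance):
    # Add a bidirectional edge v <-> n, then prune any side that exceeds M neighbors
    for a, b in ((v, n), (n, v)):
        graph[a].add(b)
        if len(graph[a]) > M:
            # keep only the M closest neighbors; the furthest one gets dropped
            graph[a] = set(sorted(graph[a], key=lambda x: distance(a, x))[:M])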
New person in the town
- When a new person arrives in town, they don't randomly knock on every door; they get an introduction and explore the social network through connections
- Similarly, inserting a new vector means navigating the graph through existing connections to find the best neighbors
When inserting a new vector into HNSW, the algorithm doesn’t scan all existing vectors. Instead, it uses a greedy search process starting from an entry point:
The insertion process:
New vector -> Greedy search (K candidates) -> Connect (best M neighbors)
Step-by-step breakdown:
- Entry point: Start from the entry point provided by the layer above
- Greedy search: Navigate through the graph by examining neighbors of the current node, always moving to the neighbor closest to the new vector
- Candidate pool: During this search, maintain a list of the K closest vectors encountered (these are potential neighbors)
- Edge creation: From this candidate pool, select the top M closest vectors and create bidirectional edges to them
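A rough sketch of the greedy candidate search described above, under simplifying assumptions (a dict-of-lists graph for one layer, a generic distance function, and none of the paper's extra heuristics): expand the closest unexplored candidate and maintain a pool of the K best nodes seen so far.
import heapq

def search_layer(query, entry_point, graph, distance, K):
    # Greedy best-first search on a single layer; returns the K closest nodes found
    d0 = distance(query, entry_point)
    visited = {entry_point}
    candidates = [(d0, entry_point)]   # min-heap: closest unexplored node first
    best = [(-d0, entry_point)]        # max-heap via negation: current top-K pool

    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -best[0][0]:
            break                      # closest remaining candidate is worse than our worst "best"
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            d = distance(query, neighbor)
            if len(best) < K or d < -best[0][0]:
                heapq.heappush(candidates, (d, neighbor))
                heapq.heappush(best, (-d, neighbor))
                if len(best) > K:
                    heapq.heappop(best)   # evict the furthest node from the pool
    return sorted((-d, n) for d, n in best)   # (distance, node) pairs, closest first
During insertion, the new vector would then create bidirectional edges to the top M nodes of this pool.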
Why K > M?
- K controls the search breadth during insertion. A larger value means exploring more candidates before selecting neighbors
- M limits the number of edges per node, keeping the graph’s connectivity manageable
- By evaluating K candidates (e.g., 100) but only connecting to M neighbors (e.g., 16), we get better quality connections while maintaining efficient graph structure
Parameter naming clarification:
- During construction: The parameter K is called efConstruction - it controls how many candidates to evaluate when inserting a new vector
- During search/query: The parameter is called ef (or ef_search) - it controls how many candidates to evaluate when searching for neighbors
- Both control exploration breadth, but affect different phases (build time vs query time)
Key insight: K > M
- Larger K -> better neighbor selection (higher recall)
- Smaller M -> fewer edges per node (faster queries, lower memory)
Mutual Friends: The Bridge Between Circles
- When you want to meet someone from a distant social circle, you don't randomly reach out; you find a mutual friend who bridges both groups
- HNSW uses the same idea: routing nodes are vectors that bridge distant regions of the vector space
Example: Vector Q gets assigned Level = 2

Vector Q exists on multiple levels after being assigned max_level = 2
When new vector Q is inserted and gets assigned Level = 2:
- Q exists on L=2, L=1, and L=0: Same vector, but “copies” appear at each level in the hierarchy
- At each level, Q searches for and connects to different nearest neighbors:
- At L=2: Q finds its nearest neighbors among the ~3 vectors at L=2 (sparse, long-range connections)
- At L=1: Q finds its nearest neighbors among the larger set at L=1 (medium-range connections)
- At L=0: Q finds its nearest neighbors among ALL vectors at L=0 (dense, short-range connections)
Why multi-level neighbors solve the reachability problem:
- At L=0 alone: Each node limited to M neighbors means limited connectivity. Some vectors may be unreachable through the existing neighbor connections
- At L=2: With only ~3 vectors total, Q’s M neighbors provide connections to much broader regions of the vector space
- The hierarchy improves graph connectivity: L=2 provides global connectivity, L=1 provides regional connectivity, L=0 provides local precision
The bridging property emerges from scarcity:
- Fewer routing nodes at higher levels: L=2 has only 3 vectors -> Q’s M neighbors span broader regions, creating bridges to otherwise disconnected parts of the graph
- More routing nodes at lower levels: L=0 has all vectors -> Q’s M neighbors connect to nearby nodes, providing local precision
- Hierarchical navigation: Start at Q on L=2 (reach different regions), descend to Q on L=1 (refine), finally Q on L=0 (find precise neighbors)
Key insight: Q having different neighbors at each level enables efficient graph traversal. At L=0, M neighbors provide local connectivity. At L=2, those same M neighbor slots provide global connectivity across the vector space, solving reachability problems that plague single-level graphs.
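Tying it together, a rough sketch of the full query path (reusing the search_layer sketch from earlier, with layers[l] as the neighbor graph at level l): greedily descend the upper layers keeping a single best node, then run the wider search only on layer 0.
def hnsw_search(query, entry_point, layers, distance, ef_search, k):
    # layers[l] is the neighbor graph at level l; layers[0] contains every vector
    current = entry_point
    # Upper layers: cheap greedy descent, tracking only the single closest node
    for level in range(len(layers) - 1, 0, -1):
        current = search_layer(query, current, layers[level], distance, K=1)[0][1]
    # Bottom layer: wider beam search with ef_search candidates, then return the k best
    results = search_layer(query, current, layers[0], distance, K=ef_search)
    return results[:k]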
Performance Tests: Brute Force vs HNSWlib vs FAISS
Now that we understand how HNSW works, let’s see how it performs in practice. I ran some tests on 1M documents from the MS MARCO dataset.
Important: These are exploratory tests, not rigorous benchmarks. Full experimental code and configurations are available here: https://github.com/sagarsrc/inside-vectordb
Experimental Setup
Dataset: MS MARCO (1M documents, 768-dimensional embeddings)
Methods:
- Brute Force: sklearn cosine similarity (baseline, 100% recall)
- HNSWlib: M=32, ef_construction=100, ef_search=50
- FAISS HNSW: M=32, ef_construction=100, ef_search=50
Distance Metric: Cosine similarity (implemented as L2-normalized inner product)
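For reference, index construction and querying along these lines looks roughly like this with both libraries (a sketch only: embeddings is assumed to be an L2-normalized float32 array of shape (N, d), query_embedding a single normalized vector, and exact defaults may differ between library versions):
import numpy as np
import hnswlib
import faiss

N, d = embeddings.shape

# HNSWlib: cosine space, M=32, ef_construction=100, ef_search=50
hnsw_index = hnswlib.Index(space="cosine", dim=d)
hnsw_index.init_index(max_elements=N, M=32, ef_construction=100)
hnsw_index.add_items(embeddings, np.arange(N))
hnsw_index.set_ef(50)                                  # query-time ef
labels, dists = hnsw_index.knn_query(query_embedding, k=10)

# FAISS: HNSW over inner product on normalized vectors (equivalent to cosine)
faiss_index = faiss.IndexHNSWFlat(d, 32, faiss.METRIC_INNER_PRODUCT)
faiss_index.hnsw.efConstruction = 100
faiss_index.hnsw.efSearch = 50
faiss_index.add(embeddings)
scores, ids = faiss_index.search(query_embedding.reshape(1, -1), 10)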
Results
Performance Summary
| Method | Implementation | Latency | Throughput | Recall@10 | Speedup |
|---|---|---|---|---|---|
| Brute Force | sklearn | 699.43 ms | 1.4 QPS | 0.8100 (100%) | 1x |
| HNSWlib | C++ with Python bindings | 0.46 ms | 2,166 QPS | 0.7433 (91.8%) | 1,515x |
| FAISS HNSW | IndexHNSWFlat | 0.08 ms | 11,805 QPS | 0.7683 (94.9%) | 8,257x |
NOTE: The numbers are inflated because we are using 384-dimensional vectors; in practice dimensionality is often much higher, which would increase latency and reduce throughput.
Visual Comparisons

Latency comparison: FAISS achieves sub-millisecond queries

Throughput: FAISS processes 11,805 queries/second

Recall@10: HNSW maintains 91.8-94.9% accuracy relative to brute force

Speed vs Accuracy: 1500-8000x speedup with 5-8% recall loss

Build time comparison across methods

ef_search sensitivity analysis showing recall-latency tradeoffs
Key Observations
- HNSW delivers massive speedups: 1500-8000x faster than brute force, sacrificing only 5-8% recall
- Fair comparison: Both implementations use M=32, making this a true apples-to-apples comparison
- FAISS optimizations shine: FAISS’s highly optimized C++ implementation gives it 5x better throughput than HNSWlib
- Sub-millisecond queries at scale: Both HNSW methods achieve sub-millisecond latency on 1M vectors
- Recall-speed tradeoff: The ef_search parameter allows fine-tuning between recall and latency based on application needs
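As a concrete illustration of that last point, an ef_search sweep with HNSWlib might look like the sketch below (hnsw_index as built in the earlier sketch; queries and ground_truth, the exact top-10 ids per query from brute force, are assumed to exist):
import time
import numpy as np

for ef in (10, 25, 50, 100, 200):
    hnsw_index.set_ef(ef)                                # query-time exploration breadth
    start = time.perf_counter()
    labels, _ = hnsw_index.knn_query(queries, k=10)
    latency_ms = (time.perf_counter() - start) / len(queries) * 1000
    recall = np.mean([len(set(l) & set(g)) / 10.0
                      for l, g in zip(labels, ground_truth)])
    print(f"ef_search={ef:4d}  recall@10={recall:.3f}  avg latency={latency_ms:.2f} ms")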
Closing Notes
Vector databases power modern AI from semantic search to RAG systems. HNSW’s clever hierarchical design makes searching millions of vectors feel instant.
Key insights:
- Skip list hierarchy creates express lanes through vector space
- Random level assignment naturally creates routing nodes
- Multi-level neighbors solve graph connectivity: same vector, different connections at each level
- Parameters matter: M (memory), efConstruction (build quality), ef_search (query speed)
- Approximate wins: 94% recall with 1000x+ speedup
Full code and experiments on GitHub: https://github.com/sagarsrc/inside-vectordb
References
- Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
- Getting started with HNSWlib
- FAISS IndexHNSW C++ source code (GitHub)
- Generate a random Skip list
- Graph Visualization of Skip list
- Redis and Parameters of HNSW
- Hierarchical Navigable Small Worlds (HNSW) by Pinecone