Hybrid search — vector + full-text re-ranking
Pure vector search is great for semantic similarity (“find me FAQs about refund-like concepts”) but loses on exact-keyword precision (“the product code SKU-4429 exactly”). Pure full-text is the inverse. Hybrid search runs both, then re-ranks the union — best of both worlds.
This recipe combines .similarTo() and .whereFullText() results with a simple linear combination of scores. Cascade owns the two queries; you own the re-ranking math.
The pattern
Section titled “The pattern”Three steps:
- Run a vector query — top N candidates by semantic similarity.
- Run a full-text query — top N candidates by keyword match.
- Merge by document id, combine the two scores, sort, take the top K.
The vector query catches semantically-related results that don’t share words with the query; the full-text query catches exact matches the vector might rank too low.
The retrieval service
Section titled “The retrieval service”import { KnowledgeChunk } from "../models/knowledge-chunk/knowledge-chunk.model";import { embed } from "app/ai/services/embed.service";
type HybridSearchInput = { question: string; limit?: number; candidatesPerSide?: number; vectorWeight?: number;};
type ScoredChunk = { contentId: string; chunkIndex: number; text: string; score: number;};
export async function hybridSearch(input: HybridSearchInput) { const candidatesPerSide = input.candidatesPerSide ?? 20; const vectorWeight = input.vectorWeight ?? 0.6; const fullTextWeight = 1 - vectorWeight;
const [vectorResults, fullTextResults] = await Promise.all([ runVectorSide(input.question, candidatesPerSide), runFullTextSide(input.question, candidatesPerSide), ]);
const merged = mergeByKey(vectorResults, fullTextResults, vectorWeight, fullTextWeight);
merged.sort((a, b) => b.score - a.score);
return merged.slice(0, input.limit ?? 5);}Three services in one — the two side-queries and the merge step. Each piece stays under twenty lines.
The vector side
Section titled “The vector side”async function runVectorSide(question: string, limit: number): Promise<ScoredChunk[]> { const queryEmbedding = await embed(question);
return KnowledgeChunk.query() .similarTo("embedding", queryEmbedding) .limit(limit) .get<ScoredChunk>();}Cosine similarity returns 0-1 already; no normalisation needed before the merge.
The full-text side
Section titled “The full-text side”async function runFullTextSide(question: string, limit: number): Promise<ScoredChunk[]> { const results = await KnowledgeChunk.query() .whereFullText(["text"], question) .selectRaw(fullTextScoreExpression()) .limit(limit) .get<ScoredChunk>();
return normaliseScores(results);}The score expression is driver-specific — ts_rank(...) on Postgres, { $meta: "textScore" } on MongoDB. The Expressions guide covers selectRaw for the driver branches.
Full-text scores aren’t bounded by default (Postgres ts_rank can return anything; MongoDB textScore is unbounded too), so normalise to 0-1 before combining with cosine:
function normaliseScores(chunks: ScoredChunk[]): ScoredChunk[] { if (chunks.length === 0) { return []; }
const maxScore = Math.max(...chunks.map((chunk) => chunk.score));
if (maxScore === 0) { return chunks; }
return chunks.map((chunk) => ({ ...chunk, score: chunk.score / maxScore, }));}Min-max normalisation against the top of each side’s results. Naive but cheap, and good enough for most apps.
The merge step
Section titled “The merge step”function mergeByKey( vectorResults: ScoredChunk[], fullTextResults: ScoredChunk[], vectorWeight: number, fullTextWeight: number,): ScoredChunk[] { const merged = new Map<string, ScoredChunk>();
for (const chunk of vectorResults) { const key = chunkKey(chunk);
merged.set(key, { ...chunk, score: chunk.score * vectorWeight }); }
for (const chunk of fullTextResults) { const key = chunkKey(chunk); const existing = merged.get(key);
if (existing) { existing.score += chunk.score * fullTextWeight; continue; }
merged.set(key, { ...chunk, score: chunk.score * fullTextWeight }); }
return Array.from(merged.values());}
function chunkKey(chunk: ScoredChunk): string { return `${chunk.contentId}:${chunk.chunkIndex}`;}A row that appears in both sides gets the weighted sum of both scores. A row that appears in only one side gets only that side’s weighted score — still a candidate, just at a lower combined rank.
Tuning weights
Section titled “Tuning weights”The default vectorWeight: 0.6 favours semantic similarity slightly. The right weight is domain-specific:
- Documentation / FAQs — full-text matters more (users type exact terminology). Try
0.4. - Conversational / chat — vector matters more (users paraphrase). Try
0.7. - Product catalogs — depends on whether searches are exact-SKU or descriptive. Measure with real queries before picking.
Don’t over-tune. The signal-to-noise on a recommendation system rarely justifies more than 2-3 weight values explored.
Going further
Section titled “Going further”- Pure vector search — Vector search guide
- The RAG pipeline that calls this service — RAG recipe
- Driver-specific score expressions in full — Expressions guide
- Full-text setup (tsvector / $text indexes) — Full-text search recipe