Warlock.js v4

Similarity Retrieval

Most cache operations look up entries by exact key match. Similarity retrieval looks them up by meaning — give the cache a query vector, get back the stored entries whose vectors are closest to it. Same set / get model you already know, just a different lookup function.

// Embed once on the way in...
await cache.set("doc.support-policy", policy, {
  vector: await embed(policy.text),
  tags: ["docs"],
});

// ...then ask for the entries closest to a fresh query.
const hits = await cache.similar(await embed(userQuestion), {
  topK: 5,
  threshold: 0.7,
});

for (const hit of hits) {
  console.log(hit.key, hit.score, hit.value);
}

Why does this live here, in @warlock.js/cache? Because the cache already provides everything a vector store needs: TTL, eviction, tagging, namespacing, and deep-clone-on-read. Similarity is the same primitive with a different match function.

Typical use cases:

  • Semantic caching — skip an LLM round-trip when the incoming prompt is close enough to one you’ve already answered.
  • RAG retrieval — pull the top-k document chunks for a user query before handing them to a model.
  • Deduping near-duplicates — webhook payloads, support tickets, scraped articles.
  • Recommendations — “find items like this one” without building a full vector pipeline.

If exact-key lookup works for your case, use that: it’s faster and cheaper. Reach for similar() when you can’t know the exact key at query time, for example when the lookup is a free-form question rather than an identifier.
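
The semantic-caching use case can be sketched as a small wrapper around similar() and set(). This is a minimal sketch, not the library's own helper: the SimilarityCache interface only mirrors the two calls shown on this page, and askModel, the 0.9 threshold, the "1h" TTL, and the key scheme are all assumptions made for illustration.

```typescript
// Minimal shapes mirroring the calls shown on this page; the real types may differ.
type CachedHit<T> = { key: string; value: T; score: number };

interface SimilarityCache {
  set(key: string, value: unknown, options: { vector: number[]; ttl?: string }): Promise<unknown>;
  similar<T>(vector: number[], options: { topK: number; threshold: number }): Promise<CachedHit<T>[]>;
}

async function answerWithSemanticCache(
  cache: SimilarityCache,
  embed: (text: string) => Promise<number[]>,
  askModel: (prompt: string) => Promise<string>, // hypothetical LLM call
  prompt: string,
): Promise<{ answer: string; cached: boolean }> {
  const vector = await embed(prompt);

  // Reuse a stored answer only if a previous prompt is close enough.
  const [best] = await cache.similar<{ answer: string }>(vector, { topK: 1, threshold: 0.9 });
  if (best) return { answer: best.value.answer, cached: true };

  // Cache miss: pay for the round-trip once, then index the answer by its vector.
  const answer = await askModel(prompt);
  const key = `llm.${prompt.trim().toLowerCase()}`; // assumed key scheme; a hash may suit long prompts better
  await cache.set(key, { answer }, { vector, ttl: "1h" });
  return { answer, cached: false };
}
```

The threshold is the knob that matters here: set it too low and users get answers to questions they did not ask, set it too high and the cache rarely hits.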

Key terms:

  • Embedding / vector: a fixed-length array of numbers that represents a piece of text (or image, or audio). Two pieces of text with similar meaning produce vectors that point in similar directions.
  • Cosine similarity: a score in [-1, 1] measuring how aligned two vectors are. 1 means the same direction, 0 means orthogonal (unrelated), and -1 means opposite directions. For embeddings from typical models, scores mostly fall in [0, 1].
  • topK: return at most this many results, ordered by similarity (highest first).
  • Threshold: drop hits scoring below this value; the filter runs before topK truncation.
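
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The function below is a plain-TypeScript sketch of that formula, not the cache's internal code; returning 0 for a zero-length vector is a convention chosen here.

```typescript
// Cosine similarity between two equal-length vectors, in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Vector dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

Because the score depends only on direction, scaling a vector (for example, doubling every component) does not change its similarity to any other vector.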

The cache is embedding-agnostic: bring your own embedder (OpenAI, Cohere, a local model, anything that returns number[]). The cache stores and ranks vectors; it doesn’t compute them.
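
Since any function from text to number[] will do, a deterministic stand-in can be handy in unit tests where calling a real model is impractical. The Embedder type and toyEmbed below are hypothetical names, not part of the library, and the toy only captures word overlap, not meaning.

```typescript
// Hypothetical embedder shape: any async function from text to number[] fits.
type Embedder = (text: string) => Promise<number[]>;

// Toy stand-in for tests: hashes each word into one of `dimension` buckets.
// It reflects shared words only; swap in a real model for real retrieval.
function toyEmbed(text: string, dimension = 64): number[] {
  const vector = new Array<number>(dimension).fill(0);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 31 + word.charCodeAt(i)) >>> 0; // keep within unsigned 32 bits
    }
    vector[hash % dimension] += 1;
  }
  return vector;
}

// Wrap the synchronous toy so it matches the async shape a real embedder would have.
const embed: Embedder = async (text) => toyEmbed(text);
```

Every vector it returns has the same length, which is the one property the cache needs from an embedder.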

import { cache } from "@warlock.js/cache";

// `embedder` is your own embedding client; the cache does not provide one.
const text = "Refunds are issued within 14 days of purchase.";
const vector = await embedder.embed(text);

await cache.set("policy.refunds", { text }, {
  vector,
  tags: ["policies"],
  ttl: "30d",
});

The vector lives alongside the entry. Read it back with plain get:

const policy = await cache.get<{ text: string }>("policy.refunds");
// → { text: "Refunds are issued within 14 days of purchase." }

To query by meaning, embed the question and pass the resulting vector to similar():

const queryVec = await embedder.embed("How do I get my money back?");
const hits = await cache.similar<{ text: string }>(queryVec, {
  topK: 3,
  threshold: 0.7,
});
// e.g. hits[0] = { key: "policy.refunds", value: { text: "..." }, score: 0.89 }

Tag filters narrow the candidate pool before ranking — handy for multi-tenant setups or when one cache holds multiple knowledge bases.

const hits = await cache.similar(queryVec, {
  topK: 5,
  tags: ["docs"], // only entries tagged with "docs" are scored
});

A threshold sets a score floor; hits below it are dropped before topK is applied.

// Up to 10 results, but only ones scoring 0.8 or higher
const hits = await cache.similar(queryVec, { topK: 10, threshold: 0.8 });
// May return zero results if nothing clears the floor; that's a feature.

similar() only considers entries written with set({ vector }). A plain set adds the entry to the cache as KV — it’s invisible to similarity queries. This means you can mix vector-indexed and plain entries in the same cache without polluting your similarity results.

// Indexed for similarity:
await cache.set("doc.1", doc1, { vector: vec1 });
// Plain KV — invisible to similar():
await cache.set("session.abc", sessionData, "1h");
// Only doc.1 shows up here:
const hits = await cache.similar(queryVec, { topK: 10 });

Not every driver indexes vectors. The capability is opt-in per driver:

| Driver | Status | Notes |
| --- | --- | --- |
| memory | ✅ Brute force | Dev / small datasets only; O(N) per query |
| lru-memory | ✅ Brute force | Same as memory; eviction also drops vectors |
| memory-extended | ✅ Brute force | Inherits memory semantics |
| pg (with vector config) | ✅ pgvector | Production option; HNSW or IVFFlat index, native cosine <=> |
| pg (without vector config) | ❌ Throws CacheUnsupportedError | Run KV-only |
| redis | ❌ Throws CacheUnsupportedError | RediSearch support is on the backlog |
| file | ❌ Throws CacheUnsupportedError | No similarity index |
| null | n/a (similar() returns []) | Black-hole semantics preserved |

For dev, start with memory. For production, switch to pg with the vector config block — same code, real index.

  • CacheUnsupportedError — you called similar() on a driver that doesn’t index vectors, or set({ vector }) on the same. Check the matrix above.
  • CacheConfigurationError: Vector dimension mismatch — the query vector’s length doesn’t match what’s stored. This usually means the embedder changed (or the dimension config on the driver is wrong). Vectors are not portable across embedders — re-embed on a switch.
  • CacheConfigurationError: pgvector extension not installed — only on pg. Run CREATE EXTENSION vector; once, or remove options.pg.vector to fall back to KV-only.
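
A dimension mismatch is cheap to catch before the vector ever reaches the cache. The helper below is a hypothetical pre-flight check, not a library function; the driver still performs its own validation, and the expected dimension is whatever your embedder produces.

```typescript
// Hypothetical guard: fail fast with a clear message if a vector does not
// have the dimension the index was built with, or contains non-finite values.
function assertDimension(vector: number[], expected: number): void {
  if (vector.length !== expected) {
    throw new Error(
      `Expected a ${expected}-dimensional vector, got ${vector.length}. Did the embedder change?`,
    );
  }
  if (!vector.every((x) => Number.isFinite(x))) {
    throw new Error("Vector contains NaN or Infinity.");
  }
}

// Usage: check both stored vectors and query vectors against one constant.
const EXPECTED_DIMENSION = 3; // assumption for this example; use your embedder's real size
assertDimension([0.1, 0.2, 0.3], EXPECTED_DIMENSION);
```

Running the same check on writes and on queries makes an embedder switch show up at the first call rather than as a confusing error later.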

The memory family does brute-force scans. That’s fine for development and for a few thousand entries, but not for production knowledge bases at scale. Each query is O(N): it scores every vector-indexed entry. Past roughly 10k entries you’ll feel it.
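
To make the O(N) cost concrete, here is roughly what a brute-force scan involves. This is an illustrative sketch under assumed types (ScanEntry, ScanHit), not the memory driver's actual implementation, and it leaves out tag filtering for brevity.

```typescript
type ScanEntry<T> = { key: string; value: T; vector?: number[] };
type ScanHit<T> = { key: string; value: T; score: number };

// Compact cosine similarity; assumes both vectors have the same length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function bruteForceSimilar<T>(
  entries: ScanEntry<T>[],
  query: number[],
  options: { topK: number; threshold?: number },
): ScanHit<T>[] {
  const hits: ScanHit<T>[] = [];
  for (const entry of entries) {
    // One pass over every entry: this loop is the O(N) part.
    if (!entry.vector) continue; // plain KV entries are never scored
    const score = cosine(query, entry.vector);
    if (options.threshold !== undefined && score < options.threshold) continue;
    hits.push({ key: entry.key, value: entry.value, score });
  }
  hits.sort((x, y) => y.score - x.score); // highest score first
  return hits.slice(0, options.topK); // truncate after the threshold filter
}
```

Every query touches every stored vector, so latency grows linearly with the number of entries and with the vector dimension. An index such as HNSW avoids that by visiting only a small part of the data.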

For real production workloads, use the pg driver with vector config — pgvector’s HNSW index is sub-linear and battle-tested.

See also:

  • pg driver — Postgres + pgvector setup
  • Set Options — TTL, tags, conflict policy details
  • Tags — building tag-narrowed knowledge bases