THN Interview Prep

RAG ingest, retrieve & pack

RAG gives an LLM external knowledge at request time. Instead of hoping the model memorized facts, your system retrieves relevant, authorized evidence and packs it into the prompt.

RAG is not "add a vector database." It is an end-to-end pipeline: ingest -> chunk -> embed -> index -> retrieve -> rerank -> pack -> answer -> evaluate.
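The pipeline stages can be sketched end to end in a few toy functions. Everything here is illustrative, not a real library API: the "embedding" is a bag-of-words counter so the example runs without any model or vector service.

```python
# Illustrative end-to-end sketch: ingest -> chunk -> embed -> retrieve -> pack.
# All names are hypothetical; embed() is a toy stand-in for a real model.
from collections import Counter

def ingest_and_chunk(docs, size=80):
    """Split raw documents into fixed-size character chunks with lineage."""
    return [
        {"doc_id": d["id"], "text": d["text"][i:i + size]}
        for d in docs
        for i in range(0, len(d["text"]), size)
    ]

def embed(text):
    """Toy embedding: token counts stand in for a real embedding vector."""
    return Counter(text.lower().split())

def retrieve(query, index, k=2):
    """Score chunks by token overlap with the query and return the top k."""
    q = embed(query)
    scored = sorted(index, key=lambda c: sum((q & c["vec"]).values()), reverse=True)
    return scored[:k]

def pack(chunks):
    """Join evidence with source ids so the answer can cite them."""
    return "\n".join(f"[doc {c['doc_id']}] {c['text']}" for c in chunks)

docs = [{"id": 1, "text": "Refunds are processed within 14 days."},
        {"id": 2, "text": "The SKU format is ABC-1234."}]
index = [{**c, "vec": embed(c["text"])} for c in ingest_and_chunk(docs)]
context = pack(retrieve("how long do refunds take", index))
```

A real system swaps each stage for production components (a parser, an embedding model, an ANN index) but keeps the same shape.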


Full RAG workflow


In short: vector search finds semantic neighbors, keyword search catches exact identifiers, object storage keeps the original content, and metadata enforces access control and freshness.


What each storage layer stores

| Storage | Stores | Why it exists |
| --- | --- | --- |
| Object/blob storage | Original files, parsed text, OCR output, snapshots | Reproducibility, re-indexing, audit, source download |
| Relational/document DB | Document metadata, tenants, ACLs, chunk lineage, ingest status | Filtering, governance, delete/update workflows |
| Vector DB / ANN index | Embedding vectors plus metadata pointers | Semantic similarity search at scale |
| Keyword index | Tokens, terms, exact ids, BM25-style inverted index | Names, SKUs, error codes, proper nouns |
| Cache | Recent retrievals, embeddings, rerank results | Latency/cost reduction with tenant-safe keys |
| Trace/eval store | Queries, retrieved ids, answer outcome, ratings | Debugging and regression testing |

Do not store secrets in vectors or traces. Embeddings are not a privacy boundary.


Chunking choices

| Content | Good chunk strategy | Failure mode |
| --- | --- | --- |
| Policies/prose | Paragraph or section chunks with heading path | Losing the exception clause in a different chunk |
| Tables | Preserve rows, headers, units, and surrounding explanation | Numeric answers hallucinate because headers were removed |
| Code | Function/class-level chunks with repo path and symbol | Splitting imports from function body |
| Tickets/chats | Thread-aware chunks with time and resolution status | Retrieving the complaint but not the final fix |
| PDF/OCR | Clean layout noise, keep page numbers and confidence | Navigation/footer text dominates embeddings |
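For prose, the "heading path" strategy above can be shown with a minimal section-aware chunker. This is a sketch for markdown-style headings only; a hypothetical helper, not a production parser.

```python
# Minimal section-aware chunker: each chunk keeps its heading path so
# retrieval can show where the text came from. (Illustrative only.)

def chunk_by_heading(text):
    chunks, path, buf = [], [], []

    def flush():
        if buf:
            chunks.append({"heading_path": " > ".join(path),
                           "text": "\n".join(buf).strip()})
            buf.clear()

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            level = len(line) - len(line.lstrip("#"))
            # Truncate the path to the parent level, then append this heading.
            path[:] = path[:level - 1] + [line.lstrip("# ").strip()]
        elif line.strip():
            buf.append(line)
    flush()
    return chunks

doc = ("# Policy\n## Refunds\nRefunds take 14 days.\n"
       "## Exceptions\nDigital goods are final sale.")
parts = chunk_by_heading(doc)
# parts[0]["heading_path"] == "Policy > Refunds"
```

Keeping the heading path with each chunk is what prevents the "exception clause in a different chunk" failure: the retriever can surface sibling sections under the same parent heading.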

Vector database basics


Vector search is approximate nearest-neighbor search. It is good for semantic similarity, not guaranteed factual correctness. Always combine it with filters, reranking, citations, and evaluation.
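The similarity measure underneath can be shown with brute-force cosine scoring over toy vectors. This is a sketch: real indexes hold high-dimensional embeddings and use ANN structures (e.g. HNSW) that return approximate, not exact, neighbors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-d "embeddings"; real embeddings have hundreds to thousands of dims.
vectors = {"refund policy": [0.9, 0.1, 0.0],
           "shipping times": [0.1, 0.9, 0.2],
           "error code E42": [0.0, 0.2, 0.9]}

query = [0.8, 0.2, 0.1]
best = max(vectors, key=lambda k: cosine(query, vectors[k]))
# best == "refund policy"
```

High cosine similarity means "points in a similar direction in embedding space", not "is true" — which is exactly why filters, reranking, and citations stay in the loop.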


Context packing

Pack evidence in a deterministic order:

  1. Drop chunks the user cannot access.
  2. Drop stale or superseded versions.
  3. Merge duplicate/overlapping chunks.
  4. Prefer chunks with exact identifiers when the query contains identifiers.
  5. Keep source ids and short titles with each chunk.
  6. Reserve token budget for the user's question and final answer.
  7. Tell the model to answer only from evidence for factual claims.
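Most of the ordering above can be sketched as one deterministic packing function. This covers steps 1–3 and 5–6 under assumed chunk fields (`acl`, `score`, `superseded`, `source_id` are all hypothetical names); identifier boosting and the final instruction are left out for brevity.

```python
def pack_context(chunks, user_groups, budget_tokens=1000, reserve=200):
    """Deterministic packing: ACL -> freshness -> dedupe -> budgeted emit."""
    allowed = [c for c in chunks if c["acl"] & user_groups]      # 1. ACL filter
    fresh = [c for c in allowed if not c.get("superseded")]      # 2. drop stale
    seen, unique = set(), []
    for c in sorted(fresh, key=lambda c: -c["score"]):           # 3. dedupe by text
        if c["text"] not in seen:
            seen.add(c["text"])
            unique.append(c)
    packed, used = [], 0
    limit = budget_tokens - reserve                              # 6. reserve budget
    for c in unique:
        cost = len(c["text"].split())                            # crude token count
        if used + cost > limit:
            break
        packed.append(f"[{c['source_id']}] {c['text']}")         # 5. keep source ids
        used += cost
    return "\n".join(packed)

chunks = [
    {"source_id": "p1", "text": "Refunds take 14 days.", "acl": {"staff"}, "score": 0.9},
    {"source_id": "p2", "text": "Old: refunds take 30 days.", "acl": {"staff"},
     "score": 0.8, "superseded": True},
    {"source_id": "hr", "text": "Salary bands.", "acl": {"hr"}, "score": 0.7},
]
ctx = pack_context(chunks, user_groups={"staff"})
```

Note the unauthorized chunk and the superseded version are dropped before anything reaches the prompt, so the model never sees them.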

RAG failure states

Typical failure states — empty retrieval, unauthorized chunks leaking past ACLs, stale or superseded versions, and answers with unsupported claims — are covered in the questions below.

Interview questions

1. Why use hybrid search instead of only vectors?

  • Vectors are strong for semantics. Keyword search is stronger for exact ids, product names, codes, and rare terms. Fusion gives better recall.
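One common way to fuse the two result lists is reciprocal rank fusion (RRF), which scores each document by summing 1/(k + rank) across lists. A minimal sketch:

```python
def rrf(result_lists, k=60):
    """Reciprocal rank fusion: score(doc) = sum of 1/(k + rank) over lists."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d7"]     # semantic neighbors
keyword_hits = ["d1", "d9"]          # exact-id matches (e.g. a SKU)
fused = rrf([vector_hits, keyword_hits])
# fused[0] == "d1": it ranks well in both lists
```

RRF needs no score calibration between the two retrievers — only ranks — which is why it is a popular default for hybrid fusion.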

2. Where do ACL checks happen?

  • Ideally before ranking or at least before packing. Never rely on the model to ignore unauthorized chunks.

3. How do you debug a hallucinated RAG answer?

  • Inspect retrieved ids, ranking scores, source versions, packed context, final prompt, and whether the answer contained unsupported claims.

4. What happens when retrieval returns nothing?

  • The assistant should say it lacks evidence, ask a clarifying question, or route to another source. It should not invent.

5. How do you handle document updates?

  • Version documents, checksum ingest, mark old chunks superseded, dual-write during embedding migrations, and test query regressions.
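The checksum-and-supersede part of that answer can be sketched as follows; the catalog shape and function name are illustrative assumptions.

```python
import hashlib

def ingest_if_changed(doc_id, text, catalog):
    """Re-index only when content changed; mark older versions superseded."""
    checksum = hashlib.sha256(text.encode()).hexdigest()
    versions = catalog.setdefault(doc_id, [])
    if versions and versions[-1]["checksum"] == checksum:
        return "unchanged"                      # skip re-embedding entirely
    for v in versions:
        v["superseded"] = True                  # old chunks filtered at query time
    versions.append({"checksum": checksum,
                     "version": len(versions) + 1,
                     "superseded": False})
    return "indexed"

catalog = {}
r1 = ingest_if_changed("policy", "Refunds take 30 days.", catalog)   # first index
r2 = ingest_if_changed("policy", "Refunds take 30 days.", catalog)   # no-op
r3 = ingest_if_changed("policy", "Refunds take 14 days.", catalog)   # new version
```

Marking old versions superseded (rather than deleting them immediately) keeps the audit trail and lets retrieval-time filters do the hiding, which also supports dual-write windows during embedding migrations.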
