LESSON 3 of 6 Expert

Building RAG Pipelines

From chunking strategies and embedding models to vector databases, hybrid search, re-ranking, and evaluation — the complete RAG engineering guide.


RAG Engineering

Building a production RAG system goes far beyond “put documents in a vector store.” Each component requires careful engineering decisions.

Document Processing

Loading

Handle diverse formats: PDF (use unstructured.io or PyMuPDF), HTML (BeautifulSoup), DOCX (python-docx), spreadsheets, emails. Preserve structure — headings, tables, and lists carry semantic meaning.

Chunking Strategies

| Strategy | How It Works | Best For |
|---|---|---|
| Fixed-size | Split every N tokens with M overlap | Simple documents |
| Recursive | Split by headings → paragraphs → sentences | Structured content |
| Semantic | Group by embedding similarity | Mixed-topic documents |
| Parent-child | Small chunks for retrieval, return parent chunk for context | Precision + context |

Recommended starting point: Recursive splitting at 400 tokens with 50-token overlap.
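The recursive strategy can be sketched in a few lines: try coarse separators first and fall back to finer ones until every piece fits the budget. This is a minimal sketch, not a production splitter — token counts are approximated by whitespace words (a real pipeline would use the embedding model's tokenizer), and overlap handling is omitted for brevity.

```python
def recursive_split(text, max_tokens=400, separators=("\n\n", "\n", ". ")):
    """Split text by progressively finer separators until chunks fit the budget."""
    if len(text.split()) <= max_tokens:
        return [text]
    for sep in separators:
        parts = [p for p in text.split(sep) if p.strip()]
        if len(parts) > 1:
            chunks = []
            for part in parts:
                chunks.extend(recursive_split(part, max_tokens, separators))
            return chunks
    # No separator produced progress: hard-split by words as a last resort.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]
```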

Metadata Enrichment

Attach metadata to each chunk: source document, page number, section heading, date, author. This enables filtered retrieval and citation.
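A sketch of the idea, with a hypothetical `Chunk` container and a `filter_chunks` helper (both names are illustrative, not from any particular library):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)  # source, page, section, date, author

def filter_chunks(chunks, **criteria):
    """Keep only chunks whose metadata matches every given key/value pair."""
    return [c for c in chunks
            if all(c.metadata.get(k) == v for k, v in criteria.items())]

# Example: retrieve only from the Financials section, and cite page numbers.
chunks = [
    Chunk("Q3 revenue grew 12%.", {"source": "report.pdf", "page": 4, "section": "Financials"}),
    Chunk("The company was founded in 2010.", {"source": "about.html", "page": 1, "section": "Overview"}),
]
financials = filter_chunks(chunks, section="Financials")
```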

Embeddings

Convert text chunks to vectors that capture semantic meaning.

| Model | Dimensions | Quality | Speed / Hosting |
|---|---|---|---|
| OpenAI text-embedding-3-large | 3072 | Excellent | Fast (API) |
| Cohere embed-v4 | 1024 | Excellent | Fast (API) |
| bge-large-en-v1.5 | 1024 | Very good | Self-hosted |
| all-MiniLM-L6-v2 | 384 | Good | Very fast |

Choose based on your constraints: API models are easy to integrate but carry cost and privacy implications, while open-source models can run locally on your own hardware.

Vector Databases

| Database | Type | Best For |
|---|---|---|
| Pinecone | Managed cloud | Production, no ops overhead |
| Qdrant | Self-hosted or cloud | Flexible, great filtering |
| ChromaDB | Embedded | Prototyping, small datasets |
| pgvector | PostgreSQL extension | Already using Postgres |
| Weaviate | Self-hosted or cloud | Multi-modal search |

Retrieval Pipeline

Query embedding → find top-K nearest chunks → pass to LLM.
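The retrieval step reduces to a nearest-neighbor search over the index. The brute-force sketch below shows the logic; real vector databases replace the linear scan with approximate indexes (e.g. HNSW) for speed:

```python
import heapq
import math

def top_k(query_vec, index, k=5):
    """Return the k chunks whose vectors are most similar to the query.

    index: list of (chunk_text, vector) pairs. Brute-force O(n) scan --
    vector databases do this with approximate nearest-neighbor indexes.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    return heapq.nlargest(k, index, key=lambda item: cos(query_vec, item[1]))
```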

Combine vector search with BM25 keyword search using Reciprocal Rank Fusion (RRF):

$$RRF(d) = \sum_{r \in R} \frac{1}{k + r(d)}$$

Where $r(d)$ is the rank of document $d$ in each result set, and $k$ is typically 60.
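The formula translates directly into code. A minimal implementation of RRF over any number of ranked lists:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    result_lists: iterable of lists, each ordered best-first.
    A document's fused score is the sum over lists of 1 / (k + rank),
    with rank starting at 1 -- the RRF formula above.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear near the top of both the vector and BM25 lists accumulate the highest fused scores, even if neither ranker put them first.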

Best: Hybrid Search + Re-ranking

After hybrid retrieval, use a cross-encoder re-ranker (e.g., Cohere Rerank, bge-reranker-v2) to re-score the top 20-50 results and keep the top 5-10.

This dramatically improves precision at a small latency cost.
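The re-ranking stage itself is simple plumbing around the model. In this sketch, `score_pair` is a stand-in for a real cross-encoder call (e.g. Cohere's rerank API or a bge-reranker forward pass), passed in as a callable so the stage stays model-agnostic:

```python
def rerank(query, candidates, score_pair, keep=10):
    """Re-score retrieved candidates and keep the best few.

    candidates: the top 20-50 chunk texts from hybrid retrieval.
    score_pair(query, text) -> float: relevance score from a cross-encoder.
    """
    ranked = sorted(candidates, key=lambda text: score_pair(query, text), reverse=True)
    return ranked[:keep]
```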

Generation

Structure your generation prompt carefully:

Answer the question based ONLY on the provided context.
If the context doesn't contain enough information, say so.

Context:
{retrieved_chunks}

Question: {user_query}

Key techniques:

  • Instruct faithfulness — tell the model to only use provided context
  • Include citations — ask the model to reference which chunks it used
  • Handle uncertainty — instruct it to say “I don’t know” rather than hallucinate
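Putting the template and the techniques above together, a hypothetical `build_prompt` helper might number each chunk so the model can cite its sources:

```python
def build_prompt(user_query, retrieved_chunks):
    """Assemble the generation prompt; chunks are numbered [1], [2], ...
    so the model can cite which ones it used."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question based ONLY on the provided context.\n"
        "If the context doesn't contain enough information, say so.\n"
        "Cite the chunk numbers you used, e.g. [1].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )
```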

Evaluation with RAGAS

Evaluate your RAG pipeline on four dimensions:

  1. Faithfulness — Is the answer supported by the retrieved context?
  2. Context Relevancy — Are the retrieved chunks actually relevant?
  3. Answer Relevancy — Does the answer address the question?
  4. Answer Correctness — Is the answer factually correct? (requires ground truth)

Build an evaluation dataset of 50-100 question/answer pairs from your documents and run regular evaluations as you tune the pipeline.
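RAGAS computes its metrics with LLM judges; the harness around it is the part you own. The sketch below shows the shape of that regression loop, with a crude token-overlap score standing in for the real faithfulness/relevancy metrics (an assumption for illustration, not how RAGAS scores answers):

```python
def evaluate(pipeline, dataset):
    """Run a question set through the pipeline and report an average score.

    pipeline(question) -> (answer, retrieved_chunks).
    The token-overlap score is a stand-in for RAGAS metrics.
    """
    scores = []
    for question, reference in dataset:
        answer, _chunks = pipeline(question)
        ref_tokens = set(reference.lower().split())
        ans_tokens = set(answer.lower().split())
        scores.append(len(ref_tokens & ans_tokens) / max(len(ref_tokens), 1))
    return sum(scores) / len(scores)
```

Run this after every pipeline change and track the score over time; a drop tells you which change hurt before users do.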

Common Failure Modes

  • Wrong chunks retrieved → improve chunking or add re-ranking
  • Answer not faithful to context → strengthen the generation prompt
  • Missing information → check if documents are properly indexed
  • Contradictory chunks → add metadata filtering and recency weighting
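For the contradictory-chunks case, recency weighting can be as simple as an exponential decay on the retrieval score. This is one possible scheme, with the half-life chosen arbitrarily for illustration:

```python
import math
import time

def recency_weighted(score, timestamp, half_life_days=90, now=None):
    """Decay a retrieval score by document age so fresher chunks win ties.

    A chunk's score halves every `half_life_days`; timestamps are Unix epoch
    seconds. Tune the half-life to how fast your corpus goes stale.
    """
    now = now if now is not None else time.time()
    age_days = (now - timestamp) / 86400
    return score * math.pow(0.5, age_days / half_life_days)
```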

Key takeaway: RAG engineering is iterative. Start simple (basic vector search), measure with RAGAS, then add complexity (hybrid search, re-ranking, metadata filtering) where evaluation shows gaps.

Quick Quiz

Test what you just learned. Pick the best answer for each question.

Q1 What is the main trade-off when choosing chunk size for RAG?

Q2 What is 'hybrid search' in RAG?

Q3 What is a 're-ranker' in a RAG pipeline?

Q4 What does the RAGAS framework evaluate?