Lesson 1

Build a RAG system from scratch

Drag in a document, ask a question, get an answer grounded in it, with citations. A few hundred lines of readable Python, JavaScript, or C#, no heavyweight frameworks.

Follow along in:

Overview

What you'll build

Retrieval-Augmented Generation (RAG) makes a language model answer questions about your documents instead of guessing from its training data. Here's the whole pipeline you'll build, stage by stage:

          ┌─────────────┐
 drop ───▶│ documents/  │   PDF · DOCX · TXT · MD
 files    └──────┬──────┘
                 │  extract text (per page)
                 ▼
            ┌─────────┐
            │  chunk  │   ~1000 chars, 200 overlap, keep source+page
            └────┬────┘
                 │  build + cache index (.localrag/)
                 ▼
         ┌────────────────┐
         │   retriever    │   BM25 (keyword)  —or—  embeddings (semantic)
         └───────┬────────┘
                 │  top-k chunks for the question
                 ▼
         ┌────────────────┐     ┌───────────────────────────────┐
 ask ───▶│ grounding      │───▶ │ provider                      │──▶ answer
 question│ prompt         │     │ Claude Code · Ollama · Gemini │    + sources
         └────────────────┘     │ · OpenAI                      │
                                └───────────────────────────────┘

Why it matters: an LLM alone hallucinates facts about your private data. RAG grounds it: retrieve the relevant text first, then make the model answer from that text and cite it. That's the difference between a demo and something you can trust.

Polyglot: pick Python (the reference), Node.js, or C# with the selector above — every step's code and commands follow your choice all the way to the end. The ports are faithful module-for-module translations; a couple of gaps (semantic embeddings, the gemini/openai providers) are Python-only and called out where they come up.


Setup

Prerequisites

Install the runtime for your language, then let the dispatcher handle the rest. Python is the reference stack; the Node.js and C# / .NET 8 ports mirror it module-for-module.

Create a virtualenv & install the libraries

python -m venv venv && source venv/bin/activate
pip install pypdf python-docx rank-bm25 numpy requests python-dotenv flask

You only need Node.js 18+

node --version                  # confirm 18+
./run -l 1 --lang node test     # installs deps on first use, then indexes

You only need the .NET 8 SDK

dotnet --version                # confirm 8.x
./run -l 1 --lang csharp test   # restores + builds on first use, then indexes

The only AI you need to start is the Claude Code CLI. If you can run claude in your terminal, you're set — no API key. Every language defaults to it; Ollama is the local fallback.

Install

Dependencies — Linux · macOS · Windows

Same idea on every OS. No Docker. The quickest path is always ./run -l 1 (add --lang node or --lang csharp for the ports), which sets up everything on first use.

Linux / macOS

python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

Windows (PowerShell)

python -m venv venv; venv\Scripts\Activate.ps1
pip install -r requirements.txt

1 · Install Node.js 18+

sudo apt install -y nodejs npm              # Linux (Debian/Ubuntu)
brew install node                           # macOS
winget install -e --id OpenJS.NodeJS.LTS    # Windows

2 · Manual setup (optional — ./run does this for you)

cd node/lesson-1 && npm install

Windows without Git Bash (PowerShell / cmd)

cd node\lesson-1
npm install
node src\cli.js index      # validate; then: node src\cli.js web

1 · Install the .NET 8 SDK

# Linux: see learn.microsoft.com/dotnet/core/install/linux
brew install dotnet@8                          # macOS
winget install -e --id Microsoft.DotNet.SDK.8  # Windows

2 · Manual setup (optional — ./run does this for you)

cd dotnet/lesson-1 && dotnet restore && dotnet build -c Release

Windows without Git Bash (PowerShell / cmd)

cd dotnet\lesson-1
dotnet run -c Release -- index   # validate; then: dotnet run -c Release -- web

Default AI — Claude Code CLI (no API key)

npm install -g @anthropic-ai/claude-code && claude

Full per-OS, per-language guide: INSTALL (PDF). The Node index is cached separately in .localrag/node and the .NET index in .localrag/dotnet, so neither clobbers the Python one. Or just run ./run -l 1, which creates the venv and installs everything for you.

Step 1

Extract text from documents

RAG starts by turning files into plain text. We dispatch on file extension and normalize everything to a list of pages — each carrying its text, a page number, and the source filename (we need those for citations).

localrag/extract.py

from pathlib import Path
from typing import List, TypedDict

SUPPORTED_EXTS = {".pdf", ".docx", ".txt", ".md", ".markdown"}

class Page(TypedDict):
    source: str
    page_number: int
    text: str

def _extract_pdf(path: Path) -> List[Page]:
    from pypdf import PdfReader
    reader = PdfReader(str(path))
    pages = []
    for i, page in enumerate(reader.pages, start=1):
        text = (page.extract_text() or "").strip()
        if text:
            pages.append(Page(source=path.name, page_number=i, text=text))
    return pages

def extract_pages(path: Path) -> List[Page]:
    ext = path.suffix.lower()
    if ext == ".pdf":
        return _extract_pdf(path)
    if ext == ".docx":
        from docx import Document
        doc = Document(str(path))
        text = "\n".join(p.text for p in doc.paragraphs if p.text.strip())
        return [Page(source=path.name, page_number=1, text=text)] if text else []
    if ext in {".txt", ".md", ".markdown"}:
        text = path.read_text(encoding="utf-8", errors="replace").strip()
        return [Page(source=path.name, page_number=1, text=text)] if text else []
    return []

node/lesson-1/src/extract.js

import fs from "node:fs";
import path from "node:path";

export const SUPPORTED_EXTS = new Set([".pdf", ".docx", ".txt", ".md", ".markdown"]);

async function extractPdf(filePath) {
  const { default: pdfParse } = await import("pdf-parse");
  const data = await pdfParse(fs.readFileSync(filePath));
  const text = (data.text || "").trim();
  if (!text) return [];
  return [{ source: path.basename(filePath), page_number: 1, text }];
}

export async function extractPages(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (ext === ".pdf") return extractPdf(filePath);
  if (ext === ".docx") return extractDocx(filePath);
  if (ext === ".txt" || ext === ".md" || ext === ".markdown") {
    const text = fs.readFileSync(filePath, "utf-8").trim();
    return text ? [{ source: path.basename(filePath), page_number: 1, text }] : [];
  }
  return [];
}

dotnet/lesson-1/Extract.cs

public record Page(string Source, int PageNumber, string Text);

public static readonly HashSet<string> SupportedExts =
    new(StringComparer.OrdinalIgnoreCase) { ".pdf", ".docx", ".txt", ".md", ".markdown" };

private static List<Page> ExtractPdf(string path)
{
    var pages = new List<Page>();
    var name = Path.GetFileName(path);
    using var doc = PdfDocument.Open(path);
    var i = 0;
    foreach (var page in doc.GetPages())
    {
        i++;
        var text = (page.Text ?? string.Empty).Trim();
        if (text.Length > 0) pages.Add(new Page(name, i, text));
    }
    return pages;
}

public static List<Page> ExtractPages(string path)
{
    var ext = Path.GetExtension(path).ToLowerInvariant();
    if (ext == ".pdf") return ExtractPdf(path);
    if (ext == ".docx") return ExtractDocx(path);
    if (ext is ".txt" or ".md" or ".markdown")
    {
        var text = ReadTextFile(path).Trim();
        return text.Length > 0 ? new() { new(Path.GetFileName(path), 1, text) } : new();
    }
    return new();
}

Why pages? PDFs have real pages, so a citation like manual.pdf:4 points to the exact spot. Extraction is the unglamorous 80% of real RAG — garbage text in, garbage answers out.

Parity note (Node): pdf-parse returns whole-document text, so a PDF emits a single page (page_number 1) rather than one page per physical page. DOCX/TXT/MD match the Python reference exactly.

Parity note (C#): PdfPig reads per page, like the Python reference; DOCX/TXT/MD match the reference exactly.
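To make the page shape concrete, here is a minimal sketch of the plain-text branch together with the citation tag that falls out of it. The `extract_text_file` and `citation_tag` names and the temp-file usage are illustrative assumptions, not part of localrag.

```python
import tempfile
from pathlib import Path
from typing import List, TypedDict


class Page(TypedDict):
    source: str
    page_number: int
    text: str


def extract_text_file(path: Path) -> List[Page]:
    """Plain-text branch only: the whole file becomes page 1."""
    text = path.read_text(encoding="utf-8", errors="replace").strip()
    return [Page(source=path.name, page_number=1, text=text)] if text else []


def citation_tag(page: Page) -> str:
    """Hypothetical helper: the source:page tag that citations are built from."""
    return f"{page['source']}:{page['page_number']}"


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "notes.md"
    doc.write_text("Hold the power button for 10 seconds.\n", encoding="utf-8")
    empty = Path(tmp) / "empty.txt"
    empty.write_text("   \n", encoding="utf-8")

    pages = extract_text_file(doc)
    tags = [citation_tag(p) for p in pages]
    empty_pages = extract_text_file(empty)   # whitespace-only files yield no pages

print(tags)          # ['notes.md:1']
print(empty_pages)   # []
```

Files with no extractable text produce no pages at all, so they can never show up as a citation.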

Step 2

Split text into overlapping chunks

A whole document is too big to feed the model and too coarse to retrieve precisely. We split it into ~1000-character chunks. The key trick is overlap: each chunk repeats the tail of the previous one, so a sentence split across a boundary still appears intact somewhere.

localrag/chunk.py

def _split_text(text, size, overlap):
    text = " ".join(text.split())            # normalize whitespace
    if len(text) <= size:
        return [text] if text else []
    chunks, start, n = [], 0, len(text)
    while start < n:
        end = min(start + size, n)
        if end < n:                          # prefer a clean break near the limit
            window = text[start:end]
            for sep in (". ", "! ", "? ", "\n", " "):
                pos = window.rfind(sep)
                if pos > size // 2:
                    end = start + pos + len(sep)
                    break
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= n:
            break
        start = max(end - overlap, start + 1)   # step back to create overlap
    return chunks

node/lesson-1/src/chunk.js

function splitText(text, size, overlap) {
  text = text.split(/\s+/).filter(Boolean).join(" "); // normalize whitespace
  if (text.length <= size) return text ? [text] : [];

  const chunks = [];
  let start = 0;
  const n = text.length;
  while (start < n) {
    let end = Math.min(start + size, n);
    if (end < n) {                          // prefer a clean break near the limit
      const window = text.slice(start, end);
      for (const sep of [". ", "! ", "? ", "\n", " "]) {
        const pos = window.lastIndexOf(sep);
        if (pos > Math.floor(size / 2)) { end = start + pos + sep.length; break; }
      }
    }
    const chunk = text.slice(start, end).trim();
    if (chunk) chunks.push(chunk);
    if (end >= n) break;
    start = Math.max(end - overlap, start + 1);  // step back to create overlap
  }
  return chunks;
}

dotnet/lesson-1/Chunk.cs

public static List<string> SplitText(string text, int size, int overlap)
{
    // Normalize whitespace: split on whitespace runs, join with single spaces.
    text = string.Join(" ", text.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries));
    if (text.Length <= size)
        return text.Length > 0 ? new List<string> { text } : new();

    var chunks = new List<string>();
    var start = 0;
    var n = text.Length;
    while (start < n)
    {
        var end = Math.Min(start + size, n);
        if (end < n)                          // prefer a clean break near the limit
        {
            var window = text.Substring(start, end - start);
            foreach (var sep in new[] { ". ", "! ", "? ", "\n", " " })
            {
                var pos = window.LastIndexOf(sep, StringComparison.Ordinal);
                if (pos > size / 2) { end = start + pos + sep.Length; break; }
            }
        }
        var chunk = text.Substring(start, end - start).Trim();
        if (chunk.Length > 0) chunks.Add(chunk);
        if (end >= n) break;
        start = Math.Max(end - overlap, start + 1);  // step back to create overlap
    }
    return chunks;
}

Chunk size is a dial. Too large → imprecise retrieval; too small → lost context. 800–1200 chars with 10–20% overlap is a sane default. Each chunk keeps its source and page_number so we can cite it later. The algorithm is identical across all three languages.
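The listings above show only the splitter; how each chunk keeps its source and page_number is not shown. A minimal sketch of that step follows, assuming a simplified fixed-window splitter (no clean-break search) so the overlap is easy to see. `split_fixed` and `chunk_pages_sketch` are hypothetical names, not the lesson's `chunk_pages`.

```python
from typing import Dict, List


def split_fixed(text: str, size: int, overlap: int) -> List[str]:
    """Simplified splitter: fixed windows that step back by `overlap` characters."""
    text = " ".join(text.split())            # normalize whitespace
    chunks, start, n = [], 0, len(text)
    while start < n:
        end = min(start + size, n)
        chunks.append(text[start:end])
        if end >= n:
            break
        start = max(end - overlap, start + 1)   # always advance by at least 1
    return chunks


def chunk_pages_sketch(pages: List[Dict], size: int = 1000, overlap: int = 200) -> List[Dict]:
    """Attach each page's source and page_number to every chunk cut from it."""
    out = []
    for page in pages:
        for piece in split_fixed(page["text"], size, overlap):
            out.append({"source": page["source"],
                        "page_number": page["page_number"],
                        "text": piece})
    return out


# Tiny sizes so the overlap is visible by eye.
pages = [{"source": "manual.pdf", "page_number": 4, "text": "abcdefghij"}]
chunks = chunk_pages_sketch(pages, size=4, overlap=1)
print([c["text"] for c in chunks])   # ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one (the overlap of 1), and all three carry manual.pdf and page 4 for citation.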

Step 3

Index and cache

Extract + chunk every file in documents/, then cache the result so we don't redo the work on every question. We fingerprint each file by (path, mtime, size); if nothing changed, we reuse the cache.

localrag/store.py

import json

def is_stale(config):
    """True if the cache is missing, unreadable, or the documents folder changed."""
    index_path = config.cache_dir / "index.json"
    if not index_path.exists():
        return True
    try:
        data = json.loads(index_path.read_text())
    except (OSError, json.JSONDecodeError):   # treat a corrupt cache as stale, like the ports
        return True
    return data.get("fingerprint") != _fingerprint(discover_files(config.docs_dir))

def build_index(config):
    files = discover_files(config.docs_dir)
    chunks = []
    for path in files:
        chunks.extend(chunk_pages(extract_pages(path)))
    config.cache_dir.mkdir(parents=True, exist_ok=True)
    (config.cache_dir / "index.json").write_text(
        json.dumps({"fingerprint": _fingerprint(files), "chunks": chunks}))
    return chunks, len(files)

node/lesson-1/src/store.js

export function isStale(config) {
  // True if the cache is missing or the docs folder changed since last build.
  const ip = indexPath(config);
  if (!fs.existsSync(ip)) return true;
  let data;
  try { data = JSON.parse(fs.readFileSync(ip, "utf-8")); }
  catch { return true; }
  return !fingerprintsEqual(data.fingerprint, fingerprint(discoverFiles(config.docsDir)));
}

export async function buildIndex(config) {
  const files = discoverFiles(config.docsDir);
  const chunks = [];
  for (const filePath of files) {
    const pages = await extractPages(filePath);
    chunks.push(...chunkPages(pages));
  }
  fs.mkdirSync(config.cacheDir, { recursive: true });
  fs.writeFileSync(indexPath(config),
    JSON.stringify({ fingerprint: fingerprint(files), chunks }), "utf-8");
  return { chunks, fileCount: files.length };
}

dotnet/lesson-1/Store.cs

public static bool IsStale(Config config)
{
    var path = IndexPath(config);
    if (!File.Exists(path)) return true;
    IndexFile? data;
    try { data = JsonSerializer.Deserialize<IndexFile>(File.ReadAllText(path), JsonOpts); }
    catch { return true; }
    if (data is null) return true;
    return !FingerprintsEqual(data.Fingerprint, Fingerprint(Extract.DiscoverFiles(config.DocsDir)));
}

public static (List<Chunk> Chunks, int FileCount) BuildIndex(Config config)
{
    var files = Extract.DiscoverFiles(config.DocsDir);
    var chunks = new List<Chunk>();
    foreach (var path in files)
        chunks.AddRange(Chunking.ChunkPages(Extract.ExtractPages(path)));

    Directory.CreateDirectory(config.CacheDir);
    var index = new IndexFile(Fingerprint(files), ToRecords(chunks));
    File.WriteAllText(IndexPath(config), JsonSerializer.Serialize(index, JsonOpts));
    return (chunks, files.Count);
}

This is the "drop a file and ask again" loop. The index is just a JSON file — no database, no vector server to run. Perfect for a local demo and trivial to inspect.
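The fingerprint helper itself is not shown in the listings. One way it could work, as a sketch under assumptions: record a [path, mtime, size] entry per file and compare the lists. `fingerprint_sketch` is a hypothetical stand-in; the real `_fingerprint` may differ in shape and detail.

```python
import tempfile
from pathlib import Path
from typing import List


def fingerprint_sketch(files: List[Path]) -> List[list]:
    """Hypothetical stand-in for _fingerprint: one [path, mtime, size] entry per file.

    Lists (not tuples) are used so the value round-trips through JSON unchanged.
    """
    entries = []
    for path in sorted(files):
        stat = path.stat()
        entries.append([str(path), stat.st_mtime_ns, stat.st_size])
    return entries


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "manual.txt"
    doc.write_text("v1", encoding="utf-8")
    before = fingerprint_sketch([doc])
    unchanged = fingerprint_sketch([doc]) == before    # nothing touched: cache reusable

    doc.write_text("v2 with more text", encoding="utf-8")
    changed = fingerprint_sketch([doc]) != before      # size differs: cache is stale

print(unchanged, changed)   # True True
```

Checking metadata rather than hashing file contents keeps the staleness test cheap, at the cost of missing an edit that preserves both size and mtime.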

Step 4

Retrieve with BM25 (zero setup)

Given a question, which chunks are relevant? The simplest robust answer is BM25, a classic keyword-ranking algorithm. It needs no model and no embedding service, so it works with any provider — including Claude Code, which can't embed.

localrag/retriever.py

import re
from rank_bm25 import BM25Okapi

def _tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class Bm25Retriever:
    name = "bm25"
    def __init__(self, chunks):
        self.chunks = chunks
        self.bm25 = BM25Okapi([_tokenize(c["text"]) for c in chunks] or [[""]])

    def search(self, query, k):
        if not self.chunks:
            return []
        scores = self.bm25.get_scores(_tokenize(query))
        ranked = sorted(range(len(self.chunks)),
                        key=lambda i: scores[i], reverse=True)
        return [self.chunks[i] for i in ranked[:k]]

node/lesson-1/src/retriever.js

function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

class Bm25Retriever {
  // The constructor builds idf + per-doc term frequencies (k1=1.5, b=0.75) —
  // a from-scratch BM25 Okapi, since there's no rank-bm25 in Node.
  search(query, k) {
    if (!this.chunks.length || k <= 0) return [];
    const scores = this.scores(tokenize(query));
    const ranked = scores.map((_, i) => i).sort((a, b) => scores[b] - scores[a]);
    const top = ranked.slice(0, k);
    const best = scores[top[0]];
    if (best <= 0) return top.map((i) => this.chunks[i]);
    return top.filter((i) => scores[i] > 0).map((i) => this.chunks[i]);
  }
}

dotnet/lesson-1/Retriever.cs

// The constructor builds idf + per-doc term frequencies (k1=1.5, b=0.75) —
// a from-scratch BM25 Okapi, since there's no rank-bm25 in .NET.
public List<Chunk> Search(string query, int k)
{
    if (_chunks.Count == 0 || k <= 0) return new List<Chunk>();
    var scores = GetScores(Tokenize(query));
    var ranked = Enumerable.Range(0, _chunks.Count)
        .OrderByDescending(i => scores[i])
        .ToList();
    var top = ranked.Take(k).ToList();
    var best = scores[top[0]];
    if (best <= 0) return top.Select(i => _chunks[i]).ToList();
    return top.Where(i => scores[i] > 0).Select(i => _chunks[i]).ToList();
}

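A real bug worth knowing. The classic BM25 IDF term goes negative once a word appears in more than half of the chunks (and certainly when it appears in every chunk), which is common on a tiny corpus. An early version filtered with score > 0 and returned nothing. The fix: don't rely on an absolute cutoff. Return top-k by rank and let the grounding prompt judge relevance. Retrieval retrieves; the LLM decides. All three ports carry the fix; the Node and C# listings additionally drop non-positive scores, but only when the best hit scores above zero.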
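The failure is easy to reproduce. The sketch below scores a two-chunk corpus with the textbook Okapi formula, IDF = ln((N - n + 0.5) / (n + 0.5)); it is a from-scratch illustration and not necessarily what rank_bm25 or the ports compute internally. The sample chunks are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Textbook BM25 Okapi scores for one query over a list of documents."""
    toks = [tokenize(d) for d in docs]
    n_docs = len(toks)
    avgdl = sum(len(t) for t in toks) / n_docs
    df = Counter(term for t in toks for term in set(t))   # document frequency
    scores = []
    for t in toks:
        tf = Counter(t)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["reset the device with the power button",
        "the device warranty lasts 24 months"]
scores = bm25_scores("device", docs)          # "device" is in every chunk

positive_only = [d for d, s in zip(docs, scores) if s > 0]           # the buggy filter
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
top_k = [docs[i] for i in ranked[:1]]                                # the fix

print(all(s < 0 for s in scores))   # True
print(positive_only)                # []
print(len(top_k))                   # 1
```

With every score negative, the cutoff discards everything while ranking still returns the least-bad chunk.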

Step 5

The grounding prompt (anti-hallucination)

This is the heart of RAG. We hand the model the retrieved chunks as context and instruct it to answer from that context, cite sources, and clearly label anything it adds from general knowledge. The rules are the same in every language, though the listings below differ slightly in wording.

localrag/prompts.py

SYSTEM_PROMPT = """You are a careful assistant answering questions over a set
of the user's own documents. Follow these rules exactly:

1. Answer from the DOCUMENT CONTEXT below FIRST. For every claim that comes
   from the documents, cite the source like [filename:page].
2. If the answer is not contained in the document context, say so plainly:
   "This is not covered in your documents." You may then add general
   knowledge, but you MUST prefix it with
   "(general knowledge — not from your documents)".
3. Never invent document contents, quotes, or citations.
4. Be concise. Prefer the documents' own wording.
"""

def build_context(chunks):
    return "\n\n---\n\n".join(
        f"[{c['source']}:{c['page_number']}]\n{c['text']}" for c in chunks)

def build_user_prompt(question, chunks):
    context = build_context(chunks) if chunks else "(no relevant documents)"
    return f"DOCUMENT CONTEXT:\n{context}\n\nQUESTION:\n{question}"

node/lesson-1/src/prompts.js

export const SYSTEM_PROMPT = `You are a careful assistant answering questions over a set of
the user's own documents. Follow these rules exactly:

1. Answer from the DOCUMENT CONTEXT below FIRST. For every claim that comes from
the documents, cite the source like [filename:page].
2. If the answer is not in the context, say so plainly: "This is not covered in
your documents." Prefix any general knowledge with "(general knowledge — ...)".
3. Never invent document contents, quotes, or citations.
4. Be concise. Prefer the documents' own wording.`;

export function buildContext(chunks) {
  return chunks
    .map((c) => `[${c.source}:${c.page_number}]\n${c.text}`)
    .join("\n\n---\n\n");
}

export function buildUserPrompt(question, chunks) {
  const context = chunks.length ? buildContext(chunks) : "(no relevant documents found)";
  return `DOCUMENT CONTEXT:\n${context}\n\nQUESTION:\n${question}`;
}

dotnet/lesson-1/Prompts.cs

public const string SystemPrompt =
    "You are a careful assistant answering questions over a set of " +
    "the user's own documents. Follow these rules exactly:\n\n" +
    "1. Answer from the DOCUMENT CONTEXT below FIRST. Cite each claim like [filename:page].\n" +
    "2. If the answer is not in the context, say so plainly, then prefix any general " +
    "knowledge with \"(general knowledge — not from your documents)\".\n" +
    "3. Never invent document contents, quotes, or citations.\n" +
    "4. Be concise. Prefer the documents' own wording.\n";

public static string BuildContext(List<Chunk> chunks)
{
    var blocks = chunks.Select(c => $"[{c.Source}:{c.PageNumber}]\n{c.Text}");
    return string.Join("\n\n---\n\n", blocks);
}

public static string BuildUserPrompt(string question, List<Chunk> chunks)
{
    var context = chunks.Count > 0 ? BuildContext(chunks) : "(no relevant documents found)";
    return $"DOCUMENT CONTEXT:\n{context}\n\nQUESTION:\n{question}";
}

RAG quality = retrieval quality + prompt quality. A strict prompt with bad retrieval answers "not in your documents" to everything; a sloppy prompt with perfect retrieval still hallucinates. You need both. Cite, admit ignorance, label general knowledge — that's the minimum anti-hallucination contract.
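To see what the model actually receives, this sketch repeats the two Python prompt builders from the listing above and renders them for two made-up chunks, plus the empty-retrieval case. The sample chunk texts are invented for illustration.

```python
def build_context(chunks):
    # Each chunk is prefixed with its [source:page] tag so the model can cite it.
    return "\n\n---\n\n".join(
        f"[{c['source']}:{c['page_number']}]\n{c['text']}" for c in chunks)


def build_user_prompt(question, chunks):
    context = build_context(chunks) if chunks else "(no relevant documents)"
    return f"DOCUMENT CONTEXT:\n{context}\n\nQUESTION:\n{question}"


hits = [
    {"source": "manual.pdf", "page_number": 4,
     "text": "Hold the power button for 10 seconds."},
    {"source": "warranty.txt", "page_number": 1,
     "text": "Coverage lasts 24 months."},
]

grounded = build_user_prompt("How do I reset the device?", hits)
ungrounded = build_user_prompt("What is the capital of France?", [])

print(grounded)
print("=" * 40)
print(ungrounded)
```

The citation tags the model is asked to emit are exactly the bracketed headers in the context, which is what makes a cited answer checkable against the retrieved chunks.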

Step 6

A provider abstraction

The pipeline shouldn't care which AI answers. So we define one tiny interface and pick an implementation by name. Switching providers becomes a one-line env-var change.

localrag/providers/__init__.py

from typing import Protocol

class LLMProvider(Protocol):
    name: str
    def is_available(self) -> bool: ...
    def chat(self, system: str, user: str) -> str: ...

def get_provider(name, config):
    if name == "claude":
        from .claude_code import ClaudeCodeProvider; return ClaudeCodeProvider(config)
    if name == "ollama":
        from .ollama import OllamaProvider; return OllamaProvider(config)
    if name == "gemini":
        from .gemini import GeminiProvider; return GeminiProvider(config)
    if name == "openai":
        from .openai import OpenAIProvider; return OpenAIProvider(config)
    raise ValueError(f"Unknown provider '{name}'")

The default: Claude Code CLI — no API key

import shutil, subprocess

class ClaudeCodeProvider:
    name = "claude"
    def __init__(self, config):
        self.bin = config.claude_bin
    def is_available(self):
        return shutil.which(self.bin) is not None
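    def chat(self, system, user):
        result = subprocess.run(
            [self.bin, "-p", f"{system}\n\n{user}"],
            capture_output=True, text=True, timeout=180)
        if result.returncode != 0:   # surface CLI failures instead of returning an empty answer
            raise RuntimeError(result.stderr.strip() or "claude exited with an error")
        return result.stdout.strip()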

node/lesson-1/src/providers/index.js

import { ClaudeCodeProvider } from "./claudeCode.js";
import { OllamaProvider } from "./ollama.js";

export function getProvider(name, config) {
  name = (name || "").toLowerCase();
  if (name === "claude") return new ClaudeCodeProvider(config);
  if (name === "ollama") return new OllamaProvider(config);
  if (name === "gemini" || name === "openai") {
    throw new Error(`Provider '${name}' is not ported in Node yet — use the Python reference.`);
  }
  throw new Error(`Unknown provider '${name}'. Choose one of: claude, ollama.`);
}

The default: Claude Code CLI — no API key

export class ClaudeCodeProvider {
  constructor(config) { this.name = "claude"; this.bin = config.claudeBin; }
  isAvailable() { return resolveBin(this.bin) !== null; }
  chat(system, user) {
    const resolved = resolveBin(this.bin);   // find claude on PATH, no shell
    // The prompt goes on STDIN — no argv-length limits, no shell quoting.
    const out = execFileSync(resolved, ["-p"], {
      input: `${system}\n\n${user}`, encoding: "utf-8", timeout: 180000,
    });
    return out.trim();
  }
}

Parity note: the Node port ships claude + ollama. Asking for gemini/openai throws a clear error pointing back to the Python reference (./run -l 1 --provider gemini …).

dotnet/lesson-1/Providers/ILlmProvider.cs

public interface ILlmProvider
{
    string Name { get; }
    bool IsAvailable();
    string Chat(string system, string user);
}

public static class ProviderFactory
{
    public static ILlmProvider GetProvider(string name, Config config)
    {
        name = (name ?? string.Empty).ToLowerInvariant();
        return name switch
        {
            "claude" => new ClaudeCodeProvider(config),
            "ollama" => new OllamaProvider(config),
            "gemini" or "openai" => throw new InvalidOperationException(
                $"Provider '{name}' is not ported in C# yet — use the Python reference."),
            _ => throw new InvalidOperationException($"Unknown provider '{name}'."),
        };
    }
}

Parity note: the C# port ships claude + ollama. Asking for gemini/openai throws a clear error pointing back to the Python reference.

This is the pattern every LLM framework is built around: a provider interface plus adapters. Write it once by hand and LangChain's chat model classes stop looking like magic.
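One payoff of the interface is that a test double is just another adapter. The sketch below restates the Python Protocol and adds a canned provider for offline runs; `CannedProvider` and `ask` are hypothetical names, not part of localrag.

```python
from typing import List, Protocol, Tuple


class LLMProvider(Protocol):
    name: str
    def is_available(self) -> bool: ...
    def chat(self, system: str, user: str) -> str: ...


class CannedProvider:
    """Hypothetical offline adapter: returns a fixed reply and records each call."""
    name = "canned"

    def __init__(self, reply: str):
        self.reply = reply
        self.calls: List[Tuple[str, str]] = []

    def is_available(self) -> bool:
        return True

    def chat(self, system: str, user: str) -> str:
        self.calls.append((system, user))
        return self.reply


def ask(provider: LLMProvider, system: str, user: str) -> str:
    """Pipeline code depends only on the interface, never on a concrete provider."""
    if not provider.is_available():
        raise RuntimeError(f"Provider '{provider.name}' is not available")
    return provider.chat(system, user)


provider = CannedProvider("This is not covered in your documents.")
reply = ask(provider, "rules", "QUESTION:\nWhat is the capital of France?")
print(reply)
```

Because Protocol typing is structural, `CannedProvider` satisfies `LLMProvider` without inheriting from it, the same way the real adapters do.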

Step 7

Wire it together & run

The pipeline: ensure the index is fresh, retrieve top-k chunks, build the grounded prompt, call the provider, print the answer and its sources.

the core query function

def answer(question, retriever, config):
    hits = retriever.search(question, config.top_k)
    provider = get_provider(config.provider, config)
    reply = provider.chat(SYSTEM_PROMPT, build_user_prompt(question, hits))
    print(reply)
    sources = []
    for h in hits:
        tag = f"{h['source']}:{h['page_number']}"
        if tag not in sources:
            sources.append(tag)
    print("Sources:", ", ".join(sources) or "(none)")

Type this

python -m localrag index
python -m localrag ask "How do I reset the device?"

node/lesson-1/src/cli.js — the core query function

async function answer(question, retriever, config) {
  const hits = retriever.search(question, config.topK);
  const provider = getProvider(config.provider, config);
  const reply = await provider.chat(SYSTEM_PROMPT, buildUserPrompt(question, hits));
  console.log("\n" + reply.trim() + "\n");
  const sources = [];
  for (const h of hits) {
    const tag = `${h.source}:${h.page_number}`;
    if (!sources.includes(tag)) sources.push(tag);
  }
  console.log("Sources: " + (sources.join(", ") || "(none)"));
}

Type this

./run -l 1 --lang node index
./run -l 1 --lang node ask "How do I reset the device?"

dotnet/lesson-1/Program.cs — the core query function

void PrintAnswer(string question, IRetriever retriever)
{
    var hits = retriever.Search(question, config.TopK);
    var provider = ProviderFactory.GetProvider(config.Provider, config);
    var reply = provider.Chat(Prompts.SystemPrompt, Prompts.BuildUserPrompt(question, hits));
    Console.WriteLine("\n" + reply.Trim() + "\n");
    Console.WriteLine(hits.Count > 0
        ? "Sources: " + string.Join(", ", Engine.DedupSources(hits))
        : "Sources: (none)");
}

Type this

./run -l 1 --lang csharp index
./run -l 1 --lang csharp ask "How do I reset the device?"

Output

To reset the WidgetPro 3000, press and hold the power button for 10 seconds
until the status LED blinks blue three times. [sample_manual.md:1]

Sources: sample_manual.md:1

That's a complete RAG system. Everything after this is upgrades.

Step 8 · Upgrade

Semantic retrieval with embeddings

BM25 matches words. "How do I power-cycle it?" won't match a doc that says "restart" — no shared keywords. Embeddings match meaning: turn each chunk into a vector and rank by cosine similarity to the question.

localrag/retriever.py — EmbeddingRetriever

class EmbeddingRetriever:
    name = "embeddings"
    def __init__(self, chunks, config):
        import numpy as np
        from .providers import embed_texts
        self.chunks, self.config = chunks, config
        vectors = np.asarray(
            embed_texts(config.embed_provider, config,
                        [c["text"] for c in chunks]), dtype="float32")
        self._vectors = self._normalize(vectors)

    def search(self, query, k):
        import numpy as np
        from .providers import embed_texts
        q = self._normalize(np.asarray(embed_texts(
            self.config.embed_provider, self.config, [query]),
            dtype="float32"))[0]
        sims = self._vectors @ q              # cosine (vectors normalized)
        ranked = np.argsort(sims)[::-1][:k]
        return [self.chunks[i] for i in ranked]
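The `sims = self._vectors @ q` line works as cosine similarity only because both sides were scaled to unit length first. A minimal sketch of that idea in plain Python (so it runs without NumPy), using toy 2-D vectors rather than real embeddings; `normalize` and `rank_by_cosine` are illustrative names.

```python
import math
from typing import List


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank_by_cosine(query_vec: List[float], chunk_vecs: List[List[float]], k: int) -> List[int]:
    """Indices of the k chunks whose direction is closest to the query's."""
    q = normalize(query_vec)
    sims = [dot(normalize(v), q) for v in chunk_vecs]   # dot of unit vectors = cosine
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]


# Toy vectors: chunk 1 is the longest, but chunk 2 points the same way as the query.
chunk_vecs = [[1.0, 0.0], [10.0, 10.0], [0.0, 3.0]]
order = rank_by_cosine([0.0, 1.0], chunk_vecs, k=2)
print(order)   # [2, 1]
```

Without normalization the long vector [10, 10] would win on raw dot product (10 versus 3), even though [0, 3] is the better match in direction.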

Never dead-end: fall back to BM25

def build_retriever(chunks, config):
    if config.retriever == "embeddings":
        try:
            return EmbeddingRetriever(chunks, config)
        except Exception as exc:
            print(f"[localrag] Embeddings unavailable ({exc}). Falling back to BM25.")
    return Bm25Retriever(chunks)

Type this (needs Ollama running)

RAG_RETRIEVER=embeddings RAG_EMBED_PROVIDER=ollama python -m localrag ask "power-cycle steps?"

Semantic embeddings are Python-only. The Node port keeps the same "never dead-end" design: ask for embeddings and it prints a notice and falls back to BM25. For the full embeddings path, use the Python reference (./run -l 1).

node/lesson-1/src/retriever.js — graceful fallback

export function buildRetriever(chunks, config) {
  // Embeddings are not ported in Node; fall back to BM25 with a clear message,
  // mirroring the Python "never dead-end" design.
  if (config.retriever === "embeddings") {
    console.log(
      "[localrag] Embeddings not ported in Node. Falling back to BM25 " +
      "(use the Python reference for embeddings)."
    );
  }
  return new Bm25Retriever(chunks);
}

Semantic embeddings are Python-only. The C# port keeps the same "never dead-end" design: ask for embeddings and it prints a notice and falls back to BM25. For the full embeddings path, use the Python reference (./run -l 1).

dotnet/lesson-1/Retriever.cs — graceful fallback

public static class RetrieverFactory
{
    /// <summary>Pick a retriever from config. Embeddings fall back to BM25 in this port.</summary>
    public static IRetriever BuildRetriever(List<Chunk> chunks, Config config)
    {
        if (config.Retriever == "embeddings")
            Console.WriteLine("[localrag] Embeddings are not ported in C# yet. Falling back to BM25.");
        return new Bm25Retriever(chunks);
    }
}

BM25 vs embeddings: BM25 is free, instant, and great for exact terms and codes. Embeddings catch paraphrases but cost an embed call. Production often runs both (hybrid) and merges rankings. In Python you have both, so try the same question each way.
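The lesson does not implement hybrid retrieval. As an illustration of what "merging rankings" can mean, here is a sketch of reciprocal rank fusion, one common merging method: each item scores 1 / (k + rank) per list, summed across lists. The chunk ids and the constant k = 60 are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Hashable, List


def reciprocal_rank_fusion(rankings: List[List[Hashable]], k: int = 60) -> List[Hashable]:
    """Merge several best-first rankings into one, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)     # items missing from a list add nothing
    return sorted(scores, key=lambda item: scores[item], reverse=True)


bm25_order = ["c1", "c2", "c3"]        # hypothetical BM25 result, best first
embedding_order = ["c3", "c1", "c4"]   # hypothetical embedding result, best first

fused = reciprocal_rank_fusion([bm25_order, embedding_order])
print(fused)   # ['c1', 'c3', 'c2', 'c4']
```

Fusing by rank sidesteps the fact that BM25 scores and cosine similarities live on different scales and cannot be added directly.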

Step 9

A drag-and-drop web UI

A terminal is fine for you; a web page is better for a demo. A tiny Flask, Express, or ASP.NET minimal-API app (depending on your language) reuses the exact same engine. Three endpoints: serve the page, accept dropped files (save + reindex), and answer questions.

localrag/web.py

@app.post("/api/upload")
def upload():
    for f in request.files.getlist("files"):
        name = secure_filename(f.filename)
        if Path(name).suffix.lower() in SUPPORTED_EXTS:
            f.save(base_config.docs_dir / name)
    chunks, n = refresh_index(base_config)      # rebuild on every drop
    return jsonify({"files": _list_files(base_config), "chunks": len(chunks)})

@app.post("/api/ask")
def ask():
    data = request.get_json()
    return jsonify(answer_question(_request_config(), data["question"]))

Type this

python -m localrag web      # http://127.0.0.1:5000

node/lesson-1/src/web.js

app.post("/api/upload", uploadFiles, async (req, res) => {
  for (const f of req.files || []) {
    const name = secureFilename(f.originalname);
    if (SUPPORTED_EXTS.has(path.extname(name).toLowerCase())) {
      await fs.promises.writeFile(path.join(baseConfig.docsDir, name), f.buffer);
    }
  }
  const { chunks, fileCount } = await refreshIndex(baseConfig);   // rebuild on every drop
  res.json({ files: listFiles(baseConfig), indexed_files: fileCount, chunks: chunks.length });
});

Type this

./run -l 1 --lang node      # http://127.0.0.1:5000

dotnet/lesson-1/Program.cs

app.MapPost("/api/upload", async (HttpRequest request) =>
{
    var form = await request.ReadFormAsync();
    foreach (var f in form.Files.GetFiles("files"))
    {
        var name = SecureFilename(f.FileName);
        if (!Extract.SupportedExts.Contains(Path.GetExtension(name))) continue;
        await using var fs = File.Create(Path.Combine(baseConfig.DocsDir, name));
        await f.CopyToAsync(fs);
    }
    var (chunks, nFiles) = Engine.RefreshIndex(baseConfig);   // rebuild on every drop
    return Results.Json(new { files = ListFiles(baseConfig), chunks = chunks.Count });
});

Type this

./run -l 1 --lang csharp    # http://127.0.0.1:5000

Same pipeline, nicer surface. The front end is one HTML file with vanilla JS: a dropzone using the browser's drag-and-drop events, a fetch() to upload, then a question box that renders the answer plus clickable source chips. All three ports serve the same page. Drop a PDF, ask a question, watch it cite the file you just dropped.
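
All three upload handlers apply the same guard before writing to disk: sanitize the filename, then skip anything whose extension is not supported. A minimal stdlib-only sketch of that guard follows. The helper name safe_upload_name is hypothetical, the sanitizer is a simplified stand-in for werkzeug's secure_filename, and the SUPPORTED_EXTS value is assumed from the formats the lesson lists (PDF, DOCX, TXT, MD).

```python
import re
from pathlib import Path

# Assumed value for illustration; the real set lives in the localrag source.
SUPPORTED_EXTS = {".pdf", ".docx", ".txt", ".md"}


def safe_upload_name(raw_name):
    """Return a sanitized filename, or None if the upload should be skipped.

    Simplified stand-in for secure_filename plus the extension check;
    a hypothetical helper, not the lesson's code.
    """
    # Drop any directory part, whichever slash style the client used.
    base = raw_name.replace("\\", "/").split("/")[-1]
    # Keep only letters, digits, dot, dash, underscore; no leading dots.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    if not cleaned or Path(cleaned).suffix.lower() not in SUPPORTED_EXTS:
        return None
    return cleaned


print(safe_upload_name("Manual v2.PDF"))      # Manual_v2.PDF
print(safe_upload_name("../../etc/passwd"))   # None
```

The point of sanitizing first is that a browser-supplied filename is untrusted input: without it, a crafted name could write outside the documents folder.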

Step 10

See the anti-hallucination work

Ask one question your documents answer and one they don't, and watch the model stay honest:

Try both

python -m localrag ask "How long is the warranty?"
python -m localrag ask "What is the capital of France?"

Try both

./run -l 1 --lang node ask "How long is the warranty?"
./run -l 1 --lang node ask "What is the capital of France?"

Try both

./run -l 1 --lang csharp ask "How long is the warranty?"
./run -l 1 --lang csharp ask "What is the capital of France?"

Output

→ 24 months from purchase date [warranty.txt:1]        # grounded, cited

→ This is not covered in your documents.
  (general knowledge — not from your documents) The capital of France is Paris.

A tiny offline test locks in the retrieval core (no network, no LLM):

tests/test_smoke.py + run

def test_bm25_finds_reset_instructions():
    chunks = chunk_pages(extract_pages(SAMPLE), size=400, overlap=80)
    hits = Bm25Retriever(chunks).search("how do I reset the device", k=3)
    assert "power button" in hits[0]["text"].lower()

pytest -q      # 4 passed

Validate the port (indexes the sample corpus)

./run -l 1 --lang node test    # → "Indexed 4 file(s) into 44 chunk(s)."

Validate the port (indexes the sample corpus)

./run -l 1 --lang csharp test  # → "Indexed 4 file(s) into 32 chunk(s)."

The offline unit-test suite (tests/test_smoke.py) lives in the Python reference; the ports ship a test command that exercises the same extract → chunk → index path end to end.

That clean separation — cited facts vs. clearly-labeled general knowledge — is the payoff of the grounding prompt. It's what makes RAG trustworthy.
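
To make the mechanism concrete, here is a rough sketch of how a grounding prompt of this kind might be assembled. It is not the lesson's actual prompt, which is defined in an earlier step. The function name build_grounding_prompt, the chunk dict shape (source, page, text), and the exact rule wording are assumptions for illustration.

```python
NOT_COVERED = "This is not covered in your documents."


def build_grounding_prompt(question, chunks):
    """Assemble a prompt that asks for cited answers from the given chunks.

    `chunks` is a list of dicts with "source", "page", and "text" keys
    (an assumed shape for this sketch).
    """
    # Label every chunk with the same [file:page] tag the model should cite.
    context = "\n\n".join(
        f"[{c['source']}:{c['page']}]\n{c['text']}" for c in chunks
    )
    rules = (
        "Answer using only the context below.\n"
        "Cite every fact as [file:page].\n"
        f'If the context does not contain the answer, say "{NOT_COVERED}"\n'
        "You may then add general knowledge, clearly labeled as "
        "(general knowledge, not from your documents)."
    )
    return f"{rules}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"


prompt = build_grounding_prompt(
    "How long is the warranty?",
    [{"source": "warranty.txt", "page": 1, "text": "Warranty: 24 months."}],
)
print(prompt)
```

Tagging each chunk with its own [file:page] label inside the context is what lets the model copy a citation verbatim instead of inventing one.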

Try it yourself

Verify RAG on a brand-new document

Prove the app reads your files, not training data: feed it a story no model has ever seen. Download The_Magic_Turtle_Astronaut.pdf, a made-up legend, and drag it onto the web UI (python -m localrag web, ./run -l 1 --lang node, or ./run -l 1 --lang csharp).

Grounded — each answer should cite [file:page]

What was the name of the magic turtle, and what species was she?
Who discovered the turtle's secret, and how?
What was the spaceship called, and how long did the journey take?
What planet did Caretta discover, and which star does it orbit?
On what date was the habitable planet discovered?

NOT in the story — should say "not covered in your documents"

How much did the spaceship cost to build?
What did the turtle eat during the twelve-year voyage?
Who was the President of Earth when the mission launched?

The honest "not covered" answers are the real test: the prompt stays truthful instead of inventing. Talk trick: ask "What is the nearest star system to the Sun?" The story says Alpha Centauri (~4.25 ly), so the model answers from the document, with a citation, even though the fact is also general knowledge. Retrieval makes your text the preferred source.
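
If you want to check the answers above mechanically rather than by eye, a small classifier along these lines could help. It assumes citations look like the [warranty.txt:1] form shown in the Step 10 output and that refusals contain the phrase "not covered in your documents". The names CITATION_RE and classify_answer are hypothetical.

```python
import re

# Matches citations shaped like [warranty.txt:1]:
# a filename with an extension, a colon, then a page number.
CITATION_RE = re.compile(r"\[([^\[\]:]+\.\w+):(\d+)\]")
NOT_COVERED = "not covered in your documents"


def classify_answer(answer):
    """Return ("grounded", citations), ("not_covered", []), or ("unverified", [])."""
    citations = [(name, int(page)) for name, page in CITATION_RE.findall(answer)]
    if citations:
        return "grounded", citations
    if NOT_COVERED in answer.lower():
        return "not_covered", []
    # Neither cited nor an explicit refusal: worth a manual look.
    return "unverified", []


print(classify_answer("24 months from purchase date [warranty.txt:1]"))
# ('grounded', [('warranty.txt', 1)])
print(classify_answer("This is not covered in your documents."))
# ('not_covered', [])
```

An "unverified" result is the one to watch for: an answer that neither cites a file nor admits ignorance may be a hallucination.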

Recap

You built a RAG system — and understand every line

From scratch, you assembled:

Stage	What it does
Extraction	PDF/DOCX/TXT/MD → pages
Chunking	overlapping windows with source+page
Indexing	cached, auto-refreshing JSON index
Retrieval	BM25 and embeddings behind one interface
Grounding	a prompt that cites and admits ignorance
Providers	one interface, env-var switch; four back ends in Python, claude + ollama in the Node.js and C# ports
Web UI	drag-and-drop, same engine underneath

No LangChain, no vector database, no cloud, and you understand every line. That understanding is the point: frameworks become easy once you know what they automate. If you followed the Node.js or C# port, the Python reference adds semantic embeddings and the gemini/openai providers.

Exercises: merge BM25 + embedding rankings (hybrid); chunk on Markdown headings; show the exact sentence behind each citation; stream tokens to the web UI.
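
As one possible starting point for the Markdown-headings exercise, the sketch below splits a document into one section per heading. It is deliberately incomplete: it is not wired into the lesson's chunker, the function name split_on_headings and the returned dict shape are assumptions, and it does not treat fenced code blocks specially.

```python
import re

# ATX headings only: one to six '#' characters, whitespace, then the title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def split_on_headings(markdown_text):
    """Split Markdown into sections, one per ATX heading.

    Returns a list of {"heading": str, "text": str} dicts. Text before the
    first heading gets an empty heading. A '#' comment line inside a fenced
    code block would wrongly start a new section (left as part of the exercise).
    """
    sections = []
    heading, lines = "", []

    def flush():
        # Skip the empty preamble that exists before any content is seen.
        if heading or any(line.strip() for line in lines):
            sections.append({"heading": heading, "text": "\n".join(lines).strip()})

    for line in markdown_text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return sections


doc = "intro\n# Setup\nInstall it.\n## Reset\nHold the power button.\n"
for section in split_on_headings(doc):
    print(section)
```

From here, long sections would still need the overlapping-window split, with the heading kept alongside source and page so citations stay meaningful.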

Next → Lesson 2: MCP servers