Drag in a document, ask a question, get an answer grounded in it, with citations. A few hundred lines of readable Python, JavaScript, or C#, no heavyweight frameworks.
Retrieval-Augmented Generation (RAG) makes a language model answer questions about your documents instead of guessing from its training data. Here's the whole pipeline you'll build, stage by stage:
           ┌─────────────┐
  drop ───▶│ documents/  │  PDF · DOCX · TXT · MD
  files    └──────┬──────┘
                  │ extract text (per page)
                  ▼
             ┌─────────┐
             │  chunk  │  ~1000 chars, 200 overlap, keep source+page
             └────┬────┘
                  │ build + cache index (.localrag/)
                  ▼
          ┌────────────────┐
          │   retriever    │  BM25 (keyword) or embeddings (semantic)
          └───────┬────────┘
                  │ top-k chunks for the question
                  ▼
          ┌────────────────┐     ┌───────────────────────────────┐
  ask ───▶│   grounding    │───▶ │           provider            │──▶ answer
 question │     prompt     │     │ Claude Code · Ollama · Gemini │    + sources
          └────────────────┘     │           · OpenAI            │
                                 └───────────────────────────────┘
The Node.js and C# ports follow the same pipeline; the Python-only pieces (embeddings and the gemini/openai providers) are called out where they come up.
Install the runtime for your language, then let the ./run dispatcher handle the rest. Python is the reference stack; the Node.js and C# / .NET 8 ports mirror it module-for-module.
Python:
python -m venv venv && source venv/bin/activate
pip install pypdf python-docx rank-bm25 numpy requests python-dotenv flask
Node.js:
node --version # confirm 18+
./run -l 1 --lang node test # installs deps on first use, then indexes
C#:
dotnet --version # confirm 8.x
./run -l 1 --lang csharp test # restores + builds on first use, then indexes
If you can run claude in your terminal, you're set: no API key needed. Every language defaults to it; Ollama is the local fallback.
Same idea on every OS, no Docker. The quickest path is always ./run -l 1 (add --lang node or --lang csharp for a port); it sets up everything on first use.
Python, Linux/macOS:
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
Python, Windows (PowerShell):
python -m venv venv; venv\Scripts\Activate.ps1
pip install -r requirements.txt
Or just run ./run -l 1; it creates the venv and installs everything for you.
Node.js, install the runtime:
sudo apt install -y nodejs npm # Linux (Debian/Ubuntu)
brew install node # macOS
winget install -e --id OpenJS.NodeJS.LTS # Windows
Node.js, install dependencies (Linux/macOS):
cd node/lesson-1 && npm install
Node.js, Windows:
cd node\lesson-1
npm install
node src\cli.js index # validate; then: node src\cli.js web
The Node index is cached separately in .localrag/node, so it never clobbers the Python one.
C#, install the SDK:
# Linux: see learn.microsoft.com/dotnet/core/install/linux
brew install dotnet@8 # macOS
winget install -e --id Microsoft.DotNet.SDK.8 # Windows
C#, restore and build (Linux/macOS):
cd dotnet/lesson-1 && dotnet restore && dotnet build -c Release
C#, Windows:
cd dotnet\lesson-1
dotnet run -c Release -- index # validate; then: dotnet run -c Release -- web
The .NET index is cached separately in .localrag/dotnet, so it never clobbers the Python one.
Claude Code (the default provider):
npm install -g @anthropic-ai/claude-code && claude
RAG starts by turning files into plain text. We dispatch on file extension and normalize everything to a list of pages, each carrying its text, a page number, and the source filename (we need those for citations).
Python:
from pathlib import Path
from typing import List, TypedDict

SUPPORTED_EXTS = {".pdf", ".docx", ".txt", ".md", ".markdown"}

class Page(TypedDict):
    source: str
    page_number: int
    text: str

def _extract_pdf(path: Path) -> List[Page]:
    from pypdf import PdfReader
    reader = PdfReader(str(path))
    pages = []
    for i, page in enumerate(reader.pages, start=1):
        text = (page.extract_text() or "").strip()
        if text:  # skip pages with no extractable text (e.g. scans without a text layer)
            pages.append(Page(source=path.name, page_number=i, text=text))
    return pages

def extract_pages(path: Path) -> List[Page]:
    ext = path.suffix.lower()
    if ext == ".pdf":
        return _extract_pdf(path)
    if ext == ".docx":
        from docx import Document  # python-docx; DOCX has no fixed pages, so page_number is 1
        doc = Document(str(path))
        text = "\n".join(p.text for p in doc.paragraphs if p.text.strip())
        return [Page(source=path.name, page_number=1, text=text)] if text else []
    if ext in {".txt", ".md", ".markdown"}:
        text = path.read_text(encoding="utf-8", errors="replace").strip()
        return [Page(source=path.name, page_number=1, text=text)] if text else []
    return []  # unsupported extension
Node.js:
import fs from "node:fs";
import path from "node:path";

export const SUPPORTED_EXTS = new Set([".pdf", ".docx", ".txt", ".md", ".markdown"]);
async function extractPdf(filePath) {
  const { default: pdfParse } = await import("pdf-parse");
  const data = await pdfParse(fs.readFileSync(filePath));
  const text = (data.text || "").trim();
  if (!text) return [];
  // pdf-parse returns whole-document text, so the PDF becomes a single "page".
  return [{ source: path.basename(filePath), page_number: 1, text }];
}

export async function extractPages(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (ext === ".pdf") return extractPdf(filePath);
  if (ext === ".docx") return extractDocx(filePath); // extractDocx: defined elsewhere in the module (not shown)
  if (ext === ".txt" || ext === ".md" || ext === ".markdown") {
    const text = fs.readFileSync(filePath, "utf-8").trim();
    return text ? [{ source: path.basename(filePath), page_number: 1, text }] : [];
  }
  return [];
}
C#:
public record Page(string Source, int PageNumber, string Text);
public static readonly HashSet<string> SupportedExts =
    new(StringComparer.OrdinalIgnoreCase) { ".pdf", ".docx", ".txt", ".md", ".markdown" };

private static List<Page> ExtractPdf(string path)
{
    var pages = new List<Page>();
    var name = Path.GetFileName(path);
    using var doc = PdfDocument.Open(path);
    var i = 0;
    foreach (var page in doc.GetPages())
    {
        i++; // count every page, so numbers stay correct even when blank pages are skipped
        var text = (page.Text ?? string.Empty).Trim();
        if (text.Length > 0) pages.Add(new Page(name, i, text));
    }
    return pages;
}

public static List<Page> ExtractPages(string path)
{
    var ext = Path.GetExtension(path).ToLowerInvariant();
    if (ext == ".pdf") return ExtractPdf(path);
    if (ext == ".docx") return ExtractDocx(path); // ExtractDocx and ReadTextFile: defined elsewhere (not shown)
    if (ext is ".txt" or ".md" or ".markdown")
    {
        var text = ReadTextFile(path).Trim();
        return text.Length > 0 ? new() { new(Path.GetFileName(path), 1, text) } : new();
    }
    return new();
}
Keeping the page number is what lets a citation like manual.pdf:4 point to the exact spot. Extraction is the unglamorous 80% of real RAG: garbage text in, garbage answers out.
Node.js note: pdf-parse returns whole-document text, so a PDF emits a single page (page_number 1) rather than one page per physical page. DOCX/TXT/MD match the Python reference exactly.
A whole document is too big to feed the model and too coarse to retrieve precisely. We split it into ~1000-character chunks. The key trick is overlap: each chunk repeats the tail of the previous one, so a sentence split across a boundary (as long as it is shorter than the overlap) still appears intact somewhere.
Python:
def _split_text(text, size, overlap):
    text = " ".join(text.split())  # normalize whitespace (this also removes every newline)
    if len(text) <= size:
        return [text] if text else []
    chunks, start, n = [], 0, len(text)
    while start < n:
        end = min(start + size, n)
        if end < n:  # prefer a clean break near the limit
            window = text[start:end]
            for sep in (". ", "! ", "? ", "\n", " "):
                pos = window.rfind(sep)
                if pos > size // 2:  # only accept breaks in the back half of the window
                    end = start + pos + len(sep)
                    break
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= n:
            break
        start = max(end - overlap, start + 1)  # step back to create overlap; always make progress
    return chunks
Node.js:
function splitText(text, size, overlap) {
  text = text.split(/\s+/).filter(Boolean).join(" "); // normalize whitespace
  if (text.length <= size) return text ? [text] : [];
  const chunks = [];
  let start = 0;
  const n = text.length;
  while (start < n) {
    let end = Math.min(start + size, n);
    if (end < n) { // prefer a clean break near the limit
      const window = text.slice(start, end);
      for (const sep of [". ", "! ", "? ", "\n", " "]) {
        const pos = window.lastIndexOf(sep);
        if (pos > Math.floor(size / 2)) { end = start + pos + sep.length; break; }
      }
    }
    const chunk = text.slice(start, end).trim();
    if (chunk) chunks.push(chunk);
    if (end >= n) break;
    start = Math.max(end - overlap, start + 1); // step back to create overlap
  }
  return chunks;
}
C#:
public static List<string> SplitText(string text, int size, int overlap)
{
    // Normalize whitespace: split on whitespace runs, join with single spaces.
    text = string.Join(" ", text.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries));
    if (text.Length <= size)
        return text.Length > 0 ? new List<string> { text } : new();
    var chunks = new List<string>();
    var start = 0;
    var n = text.Length;
    while (start < n)
    {
        var end = Math.Min(start + size, n);
        if (end < n) // prefer a clean break near the limit
        {
            var window = text.Substring(start, end - start);
            foreach (var sep in new[] { ". ", "! ", "? ", "\n", " " })
            {
                var pos = window.LastIndexOf(sep, StringComparison.Ordinal);
                if (pos > size / 2) { end = start + pos + sep.Length; break; }
            }
        }
        var chunk = text.Substring(start, end - start).Trim();
        if (chunk.Length > 0) chunks.Add(chunk);
        if (end >= n) break;
        start = Math.Max(end - overlap, start + 1); // step back to create overlap
    }
    return chunks;
}
Each chunk also keeps its source and page_number so we can cite it later. The algorithm is identical across all three languages.
Extract + chunk every file in documents/, then cache the result so we don't redo the work on every question. We fingerprint each file by (path, mtime, size); if nothing changed, we reuse the cache.
Python:
import json

# discover_files, _fingerprint, extract_pages and chunk_pages come from elsewhere in the package.

def is_stale(config):
    """True if the cache is missing, unreadable, or the documents folder changed."""
    index_path = config.cache_dir / "index.json"
    if not index_path.exists():
        return True
    try:
        data = json.loads(index_path.read_text())
    except (OSError, json.JSONDecodeError):  # corrupt cache: rebuild, like the ports do
        return True
    return data.get("fingerprint") != _fingerprint(discover_files(config.docs_dir))

def build_index(config):
    files = discover_files(config.docs_dir)
    chunks = []
    for path in files:
        chunks.extend(chunk_pages(extract_pages(path)))
    config.cache_dir.mkdir(parents=True, exist_ok=True)
    (config.cache_dir / "index.json").write_text(
        json.dumps({"fingerprint": _fingerprint(files), "chunks": chunks}))
    return chunks, len(files)
Node.js:
export function isStale(config) {
  // True if the cache is missing or the docs folder changed since last build.
  const ip = indexPath(config);
  if (!fs.existsSync(ip)) return true;
  let data;
  try { data = JSON.parse(fs.readFileSync(ip, "utf-8")); }
  catch { return true; } // unreadable or corrupt cache: rebuild
  return !fingerprintsEqual(data.fingerprint, fingerprint(discoverFiles(config.docsDir)));
}

export async function buildIndex(config) {
  const files = discoverFiles(config.docsDir);
  const chunks = [];
  for (const filePath of files) {
    const pages = await extractPages(filePath);
    chunks.push(...chunkPages(pages));
  }
  fs.mkdirSync(config.cacheDir, { recursive: true });
  fs.writeFileSync(indexPath(config),
    JSON.stringify({ fingerprint: fingerprint(files), chunks }), "utf-8");
  return { chunks, fileCount: files.length };
}
C#:
public static bool IsStale(Config config)
{
    var path = IndexPath(config);
    if (!File.Exists(path)) return true;
    IndexFile? data;
    try { data = JsonSerializer.Deserialize<IndexFile>(File.ReadAllText(path), JsonOpts); }
    catch { return true; } // unreadable or corrupt cache: rebuild
    if (data is null) return true;
    return !FingerprintsEqual(data.Fingerprint, Fingerprint(Extract.DiscoverFiles(config.DocsDir)));
}

public static (List<Chunk> Chunks, int FileCount) BuildIndex(Config config)
{
    var files = Extract.DiscoverFiles(config.DocsDir);
    var chunks = new List<Chunk>();
    foreach (var path in files)
        chunks.AddRange(Chunking.ChunkPages(Extract.ExtractPages(path)));
    Directory.CreateDirectory(config.CacheDir);
    var index = new IndexFile(Fingerprint(files), ToRecords(chunks));
    File.WriteAllText(IndexPath(config), JsonSerializer.Serialize(index, JsonOpts));
    return (chunks, files.Count);
}
Given a question, which chunks are relevant? The simplest robust answer is BM25, a classic keyword-ranking algorithm. It needs no model and no embedding service, so it works with any provider, including Claude Code, which can't embed.
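rank_bm25 hides the scoring, while the Node.js and C# ports hand-roll it with k1=1.5 and b=0.75. For intuition, here is a minimal from-scratch BM25 Okapi scorer with those same parameters; bm25_scores is a hypothetical name. It uses the ln(1 + (N - n + 0.5) / (n + 0.5)) form of idf (N chunks, n of which contain the term), which never goes negative. Libraries differ on this detail, so its scores won't exactly match rank_bm25's.

```python
import math
import re
from collections import Counter
from typing import List

def tokenize(text: str) -> List[str]:
    # Same rule as the lesson's tokenizer: lowercase ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """BM25 Okapi score of every doc for the query (higher means more relevant)."""
    doc_tokens = [tokenize(d) for d in docs]
    n_docs = len(doc_tokens)
    if n_docs == 0:
        return []
    avgdl = sum(len(t) for t in doc_tokens) / n_docs           # average doc length in tokens
    df = Counter(term for t in doc_tokens for term in set(t))  # number of docs containing each term
    query_terms = tokenize(query)
    scores = []
    for tokens in doc_tokens:
        tf = Counter(tokens)  # term frequencies in this doc
        # Longer-than-average docs get a larger denominator (b controls how much).
        length_norm = k1 * (1 - b + b * len(tokens) / avgdl) if avgdl else k1
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))  # rare terms weigh more
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

docs = ["Press and hold the power button to reset the device.",
        "The warranty lasts 24 months.",
        "Clean the device with a dry cloth."]
print(bm25_scores("how do I reset the device", docs))
```

Each matching query term adds idf times a saturating function of its frequency: k1 caps how much repeated terms help, and b penalizes chunks longer than average.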
Python:
import re
from rank_bm25 import BM25Okapi

def _tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class Bm25Retriever:
    name = "bm25"

    def __init__(self, chunks):
        self.chunks = chunks
        # Placeholder document so BM25Okapi is never built on an empty corpus.
        self.bm25 = BM25Okapi([_tokenize(c["text"]) for c in chunks] or [[""]])

    def search(self, query, k):
        if not self.chunks:
            return []
        scores = self.bm25.get_scores(_tokenize(query))
        ranked = sorted(range(len(self.chunks)), key=lambda i: scores[i], reverse=True)
        return [self.chunks[i] for i in ranked[:k]]  # top-k by rank, no score cutoff
Node.js:
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

class Bm25Retriever {
  // The constructor builds idf + per-doc term frequencies (k1=1.5, b=0.75):
  // a from-scratch BM25 Okapi, since there's no rank-bm25 in Node. (Constructor and scores() not shown.)
  search(query, k) {
    if (!this.chunks.length || k <= 0) return [];
    const scores = this.scores(tokenize(query));
    const ranked = scores.map((_, i) => i).sort((a, b) => scores[b] - scores[a]);
    const top = ranked.slice(0, k);
    const best = scores[top[0]];
    // Nothing scored above zero: still return top-k by rank. Otherwise drop zero-score hits.
    if (best <= 0) return top.map((i) => this.chunks[i]);
    return top.filter((i) => scores[i] > 0).map((i) => this.chunks[i]);
  }
}
C#:
// The constructor builds idf + per-doc term frequencies (k1=1.5, b=0.75):
// a from-scratch BM25 Okapi, since there's no rank-bm25 in .NET.
public List<Chunk> Search(string query, int k)
{
    if (_chunks.Count == 0 || k <= 0) return new List<Chunk>();
    var scores = GetScores(Tokenize(query));
    var ranked = Enumerable.Range(0, _chunks.Count)
        .OrderByDescending(i => scores[i])
        .ToList();
    var top = ranked.Take(k).ToList();
    var best = scores[top[0]];
    // Nothing scored above zero: still return top-k by rank. Otherwise drop zero-score hits.
    if (best <= 0) return top.Select(i => _chunks[i]).ToList();
    return top.Where(i => scores[i] > 0).Select(i => _chunks[i]).ToList();
}
An earlier version kept only hits with score > 0 and returned nothing. The fix: don't apply an absolute cutoff. Return top-k by rank and let the grounding prompt judge relevance. Retrieval retrieves; the LLM decides. Python returns the top k outright; the Node.js and C# ports drop zero-score hits only when at least one hit scored above zero, so for k > 0 none of the three comes back empty on a non-empty index.
This is the heart of RAG. We hand the model the retrieved chunks as context and instruct it to answer from that context, cite sources, and clearly label anything it adds from general knowledge. The rules are the same in every language; the Node.js and C# snippets below condense their wording.
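A prompt instruction is a request, not a guarantee. One cheap safeguard is to compare the [filename:page] tags in the model's reply against the chunks you actually sent. unknown_citations is a hypothetical helper, not part of the lesson's code, and it assumes the model uses the bracketed format the prompt asks for.

```python
import re
from typing import Dict, List, Set

# Matches tags like [manual.pdf:4]: a name without brackets or colons, a colon, a page number.
CITATION = re.compile(r"\[([^\[\]:]+):(\d+)\]")

def unknown_citations(reply: str, chunks: List[Dict]) -> Set[str]:
    """Return citations in the reply that don't match any retrieved chunk."""
    allowed = {f"{c['source']}:{c['page_number']}" for c in chunks}
    cited = {f"{name}:{page}" for name, page in CITATION.findall(reply)}
    return cited - allowed

hits = [{"source": "warranty.txt", "page_number": 1, "text": "24 months from purchase date."}]
print(unknown_citations("See [manual.pdf:4] and [warranty.txt:1].", hits))
```

A non-empty result is a good reason to flag the answer in the UI instead of presenting it as grounded.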
Python:
SYSTEM_PROMPT = """You are a careful assistant answering questions over a set
of the user's own documents. Follow these rules exactly:
1. Answer from the DOCUMENT CONTEXT below FIRST. For every claim that comes
from the documents, cite the source like [filename:page].
2. If the answer is not contained in the document context, say so plainly:
"This is not covered in your documents." You may then add general
knowledge, but you MUST prefix it with
"(general knowledge — not from your documents)".
3. Never invent document contents, quotes, or citations.
4. Be concise. Prefer the documents' own wording.
"""

def build_context(chunks):
    # Each chunk is labeled with the exact tag the model should cite.
    return "\n\n---\n\n".join(
        f"[{c['source']}:{c['page_number']}]\n{c['text']}" for c in chunks)

def build_user_prompt(question, chunks):
    context = build_context(chunks) if chunks else "(no relevant documents)"
    return f"DOCUMENT CONTEXT:\n{context}\n\nQUESTION:\n{question}"
Node.js:
export const SYSTEM_PROMPT = `You are a careful assistant answering questions over a set of
the user's own documents. Follow these rules exactly:
1. Answer from the DOCUMENT CONTEXT below FIRST. For every claim that comes from
the documents, cite the source like [filename:page].
2. If the answer is not in the context, say so plainly: "This is not covered in
your documents." Prefix any general knowledge with "(general knowledge — ...)".
3. Never invent document contents, quotes, or citations.
4. Be concise. Prefer the documents' own wording.`;

export function buildContext(chunks) {
  return chunks
    .map((c) => `[${c.source}:${c.page_number}]\n${c.text}`)
    .join("\n\n---\n\n");
}

export function buildUserPrompt(question, chunks) {
  const context = chunks.length ? buildContext(chunks) : "(no relevant documents found)";
  return `DOCUMENT CONTEXT:\n${context}\n\nQUESTION:\n${question}`;
}
C#:
public const string SystemPrompt =
    "You are a careful assistant answering questions over a set of " +
    "the user's own documents. Follow these rules exactly:\n\n" +
    "1. Answer from the DOCUMENT CONTEXT below FIRST. Cite each claim like [filename:page].\n" +
    "2. If the answer is not in the context, say so plainly, then prefix any general " +
    "knowledge with \"(general knowledge — not from your documents)\".\n" +
    "3. Never invent document contents, quotes, or citations.\n" +
    "4. Be concise. Prefer the documents' own wording.\n";

public static string BuildContext(List<Chunk> chunks)
{
    var blocks = chunks.Select(c => $"[{c.Source}:{c.PageNumber}]\n{c.Text}");
    return string.Join("\n\n---\n\n", blocks);
}

public static string BuildUserPrompt(string question, List<Chunk> chunks)
{
    var context = chunks.Count > 0 ? BuildContext(chunks) : "(no relevant documents found)";
    return $"DOCUMENT CONTEXT:\n{context}\n\nQUESTION:\n{question}";
}
The pipeline shouldn't care which AI answers, so we define one tiny interface and pick an implementation by name. Switching providers becomes a one-line env-var change.
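Stripped to its core, picking a provider by name is a dictionary lookup. A minimal sketch, assuming a hypothetical RAG_PROVIDER environment variable and a hypothetical offline EchoProvider (handy in tests, since it never calls a model); the lesson's own get_provider uses an if-chain with lazy imports instead. Any class with name, is_available() and chat() fits, with no inheritance needed.

```python
import os
from typing import Any, Callable, Dict

class EchoProvider:
    """Hypothetical offline provider: answers with the last line of the user prompt."""
    name = "echo"

    def is_available(self) -> bool:
        return True  # nothing to install or reach

    def chat(self, system: str, user: str) -> str:
        lines = user.splitlines()
        return lines[-1] if lines else ""

# Name -> factory. A real registry would also map "claude", "ollama", and so on.
REGISTRY: Dict[str, Callable[[], Any]] = {"echo": EchoProvider}

def provider_from_env(default: str = "echo") -> Any:
    """Pick a provider via the (hypothetical) RAG_PROVIDER environment variable."""
    name = os.environ.get("RAG_PROVIDER", default).strip().lower()
    if name not in REGISTRY:
        raise ValueError(f"Unknown provider '{name}'. Choose one of: {', '.join(sorted(REGISTRY))}")
    return REGISTRY[name]()
```

With a registry like this, adding a back end is one dict entry and switching is one environment variable.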
Python:
from typing import Protocol

class LLMProvider(Protocol):
    name: str
    def is_available(self) -> bool: ...
    def chat(self, system: str, user: str) -> str: ...

def get_provider(name, config):
    # Lazy imports: a provider's module (and its dependencies) loads only when it is picked.
    if name == "claude":
        from .claude_code import ClaudeCodeProvider
        return ClaudeCodeProvider(config)
    if name == "ollama":
        from .ollama import OllamaProvider
        return OllamaProvider(config)
    if name == "gemini":
        from .gemini import GeminiProvider
        return GeminiProvider(config)
    if name == "openai":
        from .openai import OpenAIProvider
        return OpenAIProvider(config)
    raise ValueError(f"Unknown provider '{name}'")
Python, the Claude Code provider:
import shutil
import subprocess
raise ValueError(f"Unknown provider '{name}'")import shutil, subprocess
class ClaudeCodeProvider:
name = "claude"
def __init__(self, config):
self.bin = config.claude_bin
def is_available(self):
return shutil.which(self.bin) is not None
def chat(self, system, user):
result = subprocess.run(
[self.bin, "-p", f"{system}\n\n{user}"],
capture_output=True, text=True, timeout=180)
return result.stdout.strip()import { ClaudeCodeProvider } from "./claudeCode.js";
import { OllamaProvider } from "./ollama.js";

export function getProvider(name, config) {
  name = (name || "").toLowerCase();
  if (name === "claude") return new ClaudeCodeProvider(config);
  if (name === "ollama") return new OllamaProvider(config);
  if (name === "gemini" || name === "openai") {
    throw new Error(`Provider '${name}' is not ported in Node yet — use the Python reference.`);
  }
  throw new Error(`Unknown provider '${name}'. Choose one of: claude, ollama.`);
}
Node.js, the Claude Code provider:
import { execFileSync } from "node:child_process";

export class ClaudeCodeProvider {
  constructor(config) { this.name = "claude"; this.bin = config.claudeBin; }
  isAvailable() { return resolveBin(this.bin) !== null; } // resolveBin: PATH lookup helper (not shown)
  chat(system, user) {
    const resolved = resolveBin(this.bin); // find claude on PATH, no shell
    if (!resolved) throw new Error(`'${this.bin}' not found on PATH`);
    // The prompt goes on STDIN: no argv-length limits, no shell quoting.
    const out = execFileSync(resolved, ["-p"], {
      input: `${system}\n\n${user}`, encoding: "utf-8", timeout: 180000,
    });
    return out.trim();
  }
}
Node.js note: this port ships two providers, claude + ollama. Asking for gemini/openai throws a clear error pointing back to the Python reference (./run -l 1 --provider gemini …).
C#:
public interface ILlmProvider
{
    string Name { get; }
    bool IsAvailable();
    string Chat(string system, string user);
}

public static class ProviderFactory
{
    public static ILlmProvider GetProvider(string name, Config config)
    {
        name = (name ?? string.Empty).ToLowerInvariant();
        return name switch
        {
            "claude" => new ClaudeCodeProvider(config),
            "ollama" => new OllamaProvider(config),
            "gemini" or "openai" => throw new InvalidOperationException(
                $"Provider '{name}' is not ported in C# yet — use the Python reference."),
            _ => throw new InvalidOperationException($"Unknown provider '{name}'."),
        };
    }
}
C# note: this port ships claude + ollama. Asking for gemini/openai throws a clear error pointing back to the
Python reference.
Once you have written this interface yourself, a framework's ChatModel stops looking like magic.
The pipeline: ensure the index is fresh, retrieve top-k chunks, build the grounded prompt, call the provider, print the answer and its sources.
Python:
def answer(question, retriever, config):
    hits = retriever.search(question, config.top_k)
    provider = get_provider(config.provider, config)
    reply = provider.chat(SYSTEM_PROMPT, build_user_prompt(question, hits))
    print(reply)
    sources = []  # de-duplicated, in retrieval order
    for h in hits:
        tag = f"{h['source']}:{h['page_number']}"
        if tag not in sources:
            sources.append(tag)
    print("Sources:", ", ".join(sources) or "(none)")
Run it:
python -m localrag index
python -m localrag ask "How do I reset the device?"
Node.js:
async function answer(question, retriever, config) {
  const hits = retriever.search(question, config.topK);
  const provider = getProvider(config.provider, config);
  const reply = await provider.chat(SYSTEM_PROMPT, buildUserPrompt(question, hits));
  console.log("\n" + reply.trim() + "\n");
  const sources = []; // de-duplicated, in retrieval order
  for (const h of hits) {
    const tag = `${h.source}:${h.page_number}`;
    if (!sources.includes(tag)) sources.push(tag);
  }
  console.log("Sources: " + (sources.join(", ") || "(none)"));
}
Run it:
./run -l 1 --lang node index
./run -l 1 --lang node ask "How do I reset the device?"
C#:
void PrintAnswer(string question, IRetriever retriever)
{
    // config is captured from the enclosing scope.
    var hits = retriever.Search(question, config.TopK);
    var provider = ProviderFactory.GetProvider(config.Provider, config);
    var reply = provider.Chat(Prompts.SystemPrompt, Prompts.BuildUserPrompt(question, hits));
    Console.WriteLine("\n" + reply.Trim() + "\n");
    Console.WriteLine(hits.Count > 0
        ? "Sources: " + string.Join(", ", Engine.DedupSources(hits))
        : "Sources: (none)");
}
Run it:
./run -l 1 --lang csharp index
./run -l 1 --lang csharp ask "How do I reset the device?"
Sample output (any language; the exact wording depends on the model):
To reset the WidgetPro 3000, press and hold the power button for 10 seconds
until the status LED blinks blue three times. [sample_manual.md:1]
Sources: sample_manual.md:1
BM25 matches words. "How do I power-cycle it?" won't match a doc that says "restart", because they share no keywords. Embeddings match meaning: turn each chunk into a vector and rank by cosine similarity to the question.
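Without numpy or an embedding service, the ranking step just described reduces to a few lines: scale every vector to unit length, take dot products, sort. The vectors in this sketch are made-up 2-D toys rather than real embeddings, and rank_by_cosine is a hypothetical name; EmbeddingRetriever does the same math vectorized with numpy.

```python
import math
from typing import List, Sequence, Tuple

def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (an all-zero vector stays zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else [0.0] * len(v)

def rank_by_cosine(query: Sequence[float], docs: Sequence[Sequence[float]],
                   k: int) -> Tuple[List[int], List[float]]:
    """Indices of the k most similar docs, plus every similarity score."""
    q = normalize(query)
    # For unit vectors the dot product is exactly the cosine of the angle between them.
    sims = [sum(a * b for a, b in zip(q, normalize(d))) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: sims[i], reverse=True)
    return order[:k], sims

top, sims = rank_by_cosine([1.0, 1.0], [[1.0, 0.0], [2.0, 2.0], [0.0, -1.0]], k=2)
print(top, [round(s, 3) for s in sims])
```

Note that [2.0, 2.0] scores a perfect 1.0 against [1.0, 1.0]: cosine similarity ignores length and compares direction only.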
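Python:
class EmbeddingRetriever:
    name = "embeddings"

    def __init__(self, chunks, config):
        import numpy as np
        from .providers import embed_texts
        self.chunks, self.config = chunks, config
        vectors = np.asarray(
            embed_texts(config.embed_provider, config, [c["text"] for c in chunks]),
            dtype="float32")
        self._vectors = self._normalize(vectors)

    @staticmethod
    def _normalize(vectors):
        # Minimal version of the _normalize helper (its original body isn't shown):
        # scale each row to unit length so a dot product equals cosine similarity.
        import numpy as np
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        return vectors / np.maximum(norms, 1e-12)  # avoid dividing by zero for all-zero rows

    def search(self, query, k):
        import numpy as np
        from .providers import embed_texts
        q = self._normalize(np.asarray(
            embed_texts(self.config.embed_provider, self.config, [query]),
            dtype="float32"))[0]
        sims = self._vectors @ q  # cosine (vectors normalized)
        ranked = np.argsort(sims)[::-1][:k]
        return [self.chunks[i] for i in ranked]

def build_retriever(chunks, config):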
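    if config.retriever == "embeddings":
        try:
            return EmbeddingRetriever(chunks, config)
        except Exception as exc:  # any failure (e.g. no embedding service) falls back to BM25
            print(f"[localrag] Embeddings unavailable ({exc}). Falling back to BM25.")
    return Bm25Retriever(chunks)
Try it:
RAG_RETRIEVER=embeddings RAG_EMBED_PROVIDER=ollama python -m localrag ask "power-cycle steps?"
Node.js note: embeddings aren't ported yet, so this port prints a message and falls back to BM25; use the Python reference for embeddings (./run -l 1).
Node.js:
export function buildRetriever(chunks, config) {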
  // Embeddings are not ported in Node; fall back to BM25 with a clear message,
  // mirroring the Python "never dead-end" design.
  if (config.retriever === "embeddings") {
    console.log(
      "[localrag] Embeddings not ported in Node. Falling back to BM25 " +
      "(use the Python reference for embeddings)."
    );
  }
  return new Bm25Retriever(chunks);
}
C# note: embeddings aren't ported here either, so this port also falls back to BM25; use the Python reference for embeddings (./run -l 1).
C#:
public static class RetrieverFactory
{
    /// <summary>Pick a retriever from config. Embeddings fall back to BM25 in this port.</summary>
    public static IRetriever BuildRetriever(List<Chunk> chunks, Config config)
    {
        if (config.Retriever == "embeddings")
            Console.WriteLine("[localrag] Embeddings are not ported in C# yet. Falling back to BM25.");
        return new Bm25Retriever(chunks);
    }
}
A terminal is fine for you; a web page is better for a demo. A tiny Flask (Python), Express (Node.js), or ASP.NET minimal-API (C#) app reuses the exact same engine. Three endpoints: serve the page, accept dropped files (save + reindex), and answer questions.
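Never trust a browser-supplied filename: it can carry ../ segments or a full Windows path. The Python app passes names through secure_filename and the ports call their own secureFilename / SecureFilename. Here is a stdlib-only sketch of the same idea; safe_name is hypothetical, and its results won't always match those helpers character for character.

```python
import re
import unicodedata

def safe_name(filename: str) -> str:
    """Reduce an uploaded filename to a safe basename (hypothetical, simplified)."""
    # Fold accented letters to ASCII and drop anything that has no ASCII form.
    name = unicodedata.normalize("NFKD", filename).encode("ascii", "ignore").decode("ascii")
    # Treat both slash styles as separators and keep only the last path component.
    name = re.split(r"[\\/]", name)[-1]
    # Replace runs of disallowed characters with "_", then trim leading/trailing dots and underscores.
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("._")

print(safe_name("C:\\Users\\me\\Résumé final.pdf"))
```

An empty result means nothing safe is left (for example, a name of ..), so reject that upload rather than saving it.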
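Python:
from pathlib import Path
from flask import jsonify, request
from werkzeug.utils import secure_filename

# app, base_config, SUPPORTED_EXTS, refresh_index, _list_files, answer_question
# and _request_config are defined elsewhere in the module.

@app.post("/api/upload")
def upload():
    for f in request.files.getlist("files"):
        name = secure_filename(f.filename)  # drop any path the client sent
        if Path(name).suffix.lower() in SUPPORTED_EXTS:
            f.save(base_config.docs_dir / name)
    chunks, n = refresh_index(base_config)  # rebuild on every drop
    return jsonify({"files": _list_files(base_config), "chunks": len(chunks)})

@app.post("/api/ask")
def ask():
    data = request.get_json()
    return jsonify(answer_question(_request_config(), data["question"]))
Run it:
python -m localrag web # http://127.0.0.1:5000
Node.js:
app.post("/api/upload", uploadFiles, async (req, res) => {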
  // uploadFiles: multipart middleware (not shown) that fills req.files with in-memory uploads.
  for (const f of req.files || []) {
    const name = secureFilename(f.originalname);
    if (SUPPORTED_EXTS.has(path.extname(name).toLowerCase())) {
      await fs.promises.writeFile(path.join(baseConfig.docsDir, name), f.buffer);
    }
  }
  const { chunks, fileCount } = await refreshIndex(baseConfig); // rebuild on every drop
  res.json({ files: listFiles(baseConfig), indexed_files: fileCount, chunks: chunks.length });
});
Run it:
./run -l 1 --lang node # http://127.0.0.1:5000
C#:
app.MapPost("/api/upload", async (HttpRequest request) =>
{
    var form = await request.ReadFormAsync();
    foreach (var f in form.Files.GetFiles("files"))
    {
        var name = SecureFilename(f.FileName);
        if (!Extract.SupportedExts.Contains(Path.GetExtension(name))) continue;
        await using var fs = File.Create(Path.Combine(baseConfig.DocsDir, name)); // closed at the end of each iteration
        await f.CopyToAsync(fs);
    }
    var (chunks, nFiles) = Engine.RefreshIndex(baseConfig); // rebuild on every drop
    return Results.Json(new { files = ListFiles(baseConfig), chunks = chunks.Count });
});
Run it:
./run -l 1 --lang csharp # http://127.0.0.1:5000
The front end is a single page: a drop zone that uses fetch() to upload, then a question box that renders the answer plus clickable source chips. All three ports serve the same page. Drop a PDF, ask a question, and watch it cite the file you just dropped.
Ask something in your documents and something out of them, and watch the model stay honest:
Python:
python -m localrag ask "How long is the warranty?"
python -m localrag ask "What is the capital of France?"
Node.js:
./run -l 1 --lang node ask "How long is the warranty?"
./run -l 1 --lang node ask "What is the capital of France?"
C#:
./run -l 1 --lang csharp ask "How long is the warranty?"
./run -l 1 --lang csharp ask "What is the capital of France?"
Sample replies:
→ 24 months from purchase date [warranty.txt:1] # grounded, cited
→ This is not covered in your documents.
(general knowledge — not from your documents) The capital of France is Paris.
A tiny offline test locks in the retrieval core (no network, no LLM):
Python:
def test_bm25_finds_reset_instructions():
    chunks = chunk_pages(extract_pages(SAMPLE), size=400, overlap=80)
    hits = Bm25Retriever(chunks).search("how do I reset the device", k=3)
    assert "power button" in hits[0]["text"].lower()
Run the suite:
pytest -q # 4 passed
Node.js:
./run -l 1 --lang node test # → "Indexed 4 file(s) into 44 chunk(s)."
C#:
./run -l 1 --lang csharp test # → "Indexed 4 file(s) into 32 chunk(s)."
The pytest suite (tests/test_smoke.py) lives in the Python reference; the ports ship a test command that exercises the same extract → chunk → index path end to end.
Prove the app reads your files, not training data: feed it a story no model has ever
seen. Download
The_Magic_Turtle_Astronaut.pdf
(a made-up legend) and drag it onto the web UI
(./run -l 1, adding --lang node or --lang csharp to use a port). Then ask:
What was the name of the magic turtle, and what species was she?
Who discovered the turtle's secret, and how?
What was the spaceship called, and how long did the journey take?
What planet did Caretta discover, and which star does it orbit?
On what date was the habitable planet discovered?

How much did the spaceship cost to build?
What did the turtle eat during the twelve-year voyage?
Who was the President of Earth when the mission launched?
From scratch, you assembled:
| Stage | What it does |
|---|---|
| Extraction | PDF/DOCX/TXT/MD → pages |
| Chunking | overlapping windows with source+page |
| Indexing | cached, auto-refreshing JSON index |
| Retrieval | BM25 and embeddings behind one interface (embeddings: Python only) |
| Grounding | a prompt that cites and admits ignorance |
| Providers | one interface with an env-var switch; four back ends in Python, claude + ollama in the Node.js and C# ports |
| Web UI | drag-and-drop, same engine underneath |
Exercises: merge BM25 + embedding rankings (hybrid); chunk on Markdown headings; show the exact sentence behind each citation; stream tokens to the web UI.
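For the hybrid exercise, one common model-free way to merge two ranked lists is reciprocal rank fusion (RRF), a technique swapped in here rather than taken from the lesson. It uses only ranks, so BM25 scores and cosine similarities never have to share a scale. The function name is hypothetical, and k=60 is the constant usually quoted for RRF; treat it as a tunable assumption.

```python
from typing import Dict, Hashable, List, Sequence

def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Merge best-first rankings: each item earns 1 / (k + rank) from every list it appears in."""
    scores: Dict[Hashable, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=scores.get, reverse=True)

bm25_ids = ["c1", "c2", "c3"]       # e.g. chunk ids from Bm25Retriever, best first
embedding_ids = ["c3", "c1", "c4"]  # e.g. chunk ids from EmbeddingRetriever, best first
print(reciprocal_rank_fusion([bm25_ids, embedding_ids]))
```

Chunks that both retrievers rank highly rise to the top, while a chunk found by only one retriever still gets a place in the merged list.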