Most problems are one of two things: no AI provider configured, or the wrong model name. This page covers each provider — Claude Code, Ollama, Gemini, OpenAI — how to set one up (including a free Gemini key), which model to use, and the errors you're most likely to see.
The app answers with one provider at a time, chosen by RAG_PROVIDER (claude · ollama · gemini · openai). The default is claude, which needs no API key. Set values in a .env file at the repo root (copy .env.example first) or as environment variables, then restart the app. In the web UI you can also pick a provider from the dropdown for a single question.
cp .env.example .env # then edit .env
RAG_PROVIDER=ollama # claude | ollama | gemini | openai
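The selection rule above can be sketched in a few lines of Python. This is an illustration only, not the app's actual loader: `resolve_provider` is a hypothetical helper, and the trimming and lower-casing it applies are assumptions. Only the variable name `RAG_PROVIDER`, its four values, and the `claude` default come from this page.

```python
import os

# Values and default as documented on this page.
VALID_PROVIDERS = ("claude", "ollama", "gemini", "openai")
DEFAULT_PROVIDER = "claude"


def resolve_provider(env=None):
    """Return the provider named by RAG_PROVIDER, or the default if unset.

    Hypothetical helper: the real app may normalise or validate differently.
    """
    env = os.environ if env is None else env
    raw = env.get("RAG_PROVIDER", "").strip().lower()
    if not raw:
        # Unset or blank: fall back to the documented default.
        return DEFAULT_PROVIDER
    if raw not in VALID_PROVIDERS:
        raise ValueError(
            f"RAG_PROVIDER={raw!r} is not one of {', '.join(VALID_PROVIDERS)}"
        )
    return raw
```

Calling `resolve_provider()` with no argument reads the current process environment. Passing a plain dict makes the rule easy to check without touching real settings.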
Polyglot note: the Node.js and C# ports of Lesson 1 support claude and ollama. For gemini / openai, use the Python reference (./run -l 1).
Juggling these providers and keys across several projects or machines? agentvault manages Claude / Ollama / Gemini / OpenAI configs and rules in one place.
Ollama runs models on your own machine. Install it from ollama.com, then pull a model. (Prefer one command to pull + run any Ollama model? ai-runner wraps exactly that.)
404 (Not Found) / “model … not found”
Ollama is running, but the model you asked for isn't installed. The default is llama3.1:8b. Either pull it, or point the app at a model you already have:
ollama list # what you already have
ollama pull llama3.1:8b # download the default model (~4.9 GB)
# ...or use an installed model without pulling:
# macOS / Linux:
export OLLAMA_MODEL=qwen2.5:7b
# Windows PowerShell:
$env:OLLAMA_MODEL = "qwen2.5:7b"
To make it permanent, set OLLAMA_MODEL in .env.
The Ollama server isn't running or is on a different address. Start it and check the URL:
ollama serve # start the server
curl http://localhost:11434/api/tags # should list your models
# non-default host/port? set it:
export OLLAMA_URL=http://localhost:11434
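The two Ollama failure modes above (server unreachable, model not installed) can be told apart with a small script. The sketch below is not part of the app: the function names are hypothetical, and it assumes the `/api/tags` response is a JSON object with a `models` list whose entries carry a `name` field, which is what the `curl` check above returns.

```python
import json
import urllib.error
import urllib.request


def installed_models(tags_payload):
    """Extract model names from an already-parsed /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", [])]


def is_installed(wanted, names):
    """True if `wanted` matches an installed name.

    A bare name such as "llama3.1" is treated as "llama3.1:latest",
    which is how Ollama resolves an untagged model name.
    """
    if ":" not in wanted:
        wanted = wanted + ":latest"
    return wanted in names


def check_ollama(url="http://localhost:11434", model="llama3.1:8b"):
    """Return a short diagnosis string for the two failure modes."""
    try:
        with urllib.request.urlopen(f"{url}/api/tags", timeout=5) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, timeout, and similar.
        return f"server unreachable at {url}: {exc}"
    if is_installed(model, installed_models(payload)):
        return f"ok: {model} is installed"
    return f"model {model} not installed; try: ollama pull {model}"
```

Running `print(check_ollama())` against a local server reports which of the two cases applies. Pass `url` and `model` to match your OLLAMA_URL and OLLAMA_MODEL.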
| Use | Model | Pull |
|---|---|---|
| Chat / Q&A (recommended) | llama3.1:8b | ollama pull llama3.1:8b |
| Smaller / faster | qwen2.5:7b, llama3.2:1b | ollama pull qwen2.5:7b |
| Tool / function calling (Lesson 9) | llama3.1, qwen2.5, mistral-nemo | ollama pull llama3.1 |
| Embeddings (RAG_RETRIEVER=embeddings) | nomic-embed-text | ollama pull nomic-embed-text |
Uses your existing Claude Code CLI login, so there's nothing to configure.
Install the CLI and sign in once:
npm install -g @anthropic-ai/claude-code
claude # first run signs you in
claude --version
Custom install path? Point the app at it with CLAUDE_BIN=/full/path/to/claude.
No API key is needed — it reuses your Claude login. Claude Code can't produce embeddings, so
for RAG_RETRIEVER=embeddings use Ollama, Gemini, or OpenAI as the embed provider.
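Because Claude Code cannot embed, embeddings mode needs a second provider. The sketch below shows one plausible way to choose it. The precedence it uses (RAG_EMBED_PROVIDER first, then RAG_PROVIDER if that provider can embed) is an assumption for illustration and is not confirmed by the app; `pick_embed_provider` is a hypothetical name.

```python
# Providers this page lists as able to produce embeddings (claude is not one).
EMBED_CAPABLE = ("ollama", "gemini", "openai")


def pick_embed_provider(env):
    """Return the embed provider to use, or None to stay on BM25.

    Assumed precedence (not confirmed by the app): RAG_EMBED_PROVIDER
    first, then RAG_PROVIDER if that provider can embed.
    """
    if env.get("RAG_RETRIEVER", "bm25") != "embeddings":
        return None  # keyword retrieval needs no embed provider
    for key in ("RAG_EMBED_PROVIDER", "RAG_PROVIDER"):
        candidate = env.get(key, "").strip().lower()
        if candidate in EMBED_CAPABLE:
            return candidate
    return None  # nothing usable configured
```

With `RAG_PROVIDER=claude` and `RAG_RETRIEVER=embeddings`, this returns `None` until RAG_EMBED_PROVIDER names one of the three embed-capable providers.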
Gemini is Google's family of hosted models, with a generous free tier, which makes it a good no-cost cloud option.
Add the key to .env and select Gemini:
RAG_PROVIDER=gemini
GEMINI_API_KEY=AIza...your-key...
GEMINI_MODEL=gemini-2.5-flash # fast & free-tier friendly (default)
GEMINI_EMBED_MODEL=text-embedding-004
Which model? gemini-2.5-flash is the default — fast and free-tier friendly.
For higher quality use gemini-2.5-pro. Embeddings use text-embedding-004.
| You see | Cause & fix |
|---|---|
| 400 API key not valid / 403 | Wrong or missing GEMINI_API_KEY. Re-copy it from AI Studio (no quotes, no spaces). |
| 404 model not found | GEMINI_MODEL name is wrong or unavailable to your key. Use gemini-2.5-flash. |
| 429 / quota exceeded | Free-tier rate limit hit. Wait a minute and retry, or slow down requests. |
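The "wait and retry" advice for 429 can be automated with exponential backoff. This is a generic sketch, not the app's own retry logic: `RateLimited` and `with_backoff` are hypothetical names, and the caller is assumed to raise `RateLimited` when a request comes back with HTTP 429.

```python
import time


class RateLimited(Exception):
    """Raised by the caller's request function on HTTP 429 (hypothetical)."""


def with_backoff(call, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Run call(); on RateLimited wait base_delay, then 2x, 4x, ... and retry.

    The final failure is re-raised so the caller still sees the error.
    """
    for attempt in range(attempts):
        try:
            return call()
        except RateLimited:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so the waiting can be replaced in tests. In normal use, leave it at its default and wrap the request function: `with_backoff(lambda: ask(question))`, where `ask` stands in for whatever function sends the request.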
Hosted models from OpenAI. Note the API is paid (it needs billing set up), separate from a ChatGPT subscription.
Set these in .env:
RAG_PROVIDER=openai
OPENAI_API_KEY=sk-...your-key...
OPENAI_MODEL=gpt-4o-mini # cheap & capable default
OPENAI_EMBED_MODEL=text-embedding-3-small
# Local/compatible server (LM Studio, vLLM, etc.)? override the base URL:
OPENAI_BASE_URL=https://api.openai.com/v1
| You see | Cause & fix |
|---|---|
| 401 Incorrect API key | Wrong/expired OPENAI_API_KEY. Generate a new key. |
| 429 insufficient_quota | No billing / credit on the account. Add a payment method. |
| 404 on a custom endpoint | OPENAI_BASE_URL is wrong, or that server doesn't expose the requested model. |
| Symptom | Fix |
|---|---|
| Answer says “not covered in your documents” | That's correct behaviour when the answer isn't in your files — it's the anti-hallucination guard. Add relevant documents to documents/. |
| No sources / “nothing relevant found” | The corpus is empty or the question doesn't match. Drop files into documents/ (PDF/DOCX/TXT/MD); the index auto-refreshes. |
| Embeddings mode falls back to BM25 | The embed provider was unreachable. Set RAG_EMBED_PROVIDER to ollama/gemini/openai and ensure it's configured. |
| Web UI won't start / port in use | ./run auto-picks a free port. To force one: WEB_PORT=5050 ./run -l 1. |
| Changed .env but nothing changed | Restart the app — config is read at startup. |
| Variable | Default | What it does |
|---|---|---|
| RAG_PROVIDER | claude | Which AI answers: claude · ollama · gemini · openai |
| RAG_RETRIEVER | bm25 | Retrieval: bm25 (keyword) · embeddings (semantic) |
| OLLAMA_URL | http://localhost:11434 | Ollama server address |
| OLLAMA_MODEL | llama3.1:8b | Ollama chat model (must be pulled) |
| GEMINI_API_KEY | — | Gemini key (from AI Studio) |
| GEMINI_MODEL | gemini-2.5-flash | Gemini chat model |
| OPENAI_API_KEY | — | OpenAI key (needs billing) |
| OPENAI_MODEL | gpt-4o-mini | OpenAI chat model |
| CLAUDE_BIN | claude | Path to the Claude Code CLI |
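Since a missing provider setting is the most common problem, a quick preflight check can catch it before the app starts. The sketch below copies the defaults from the table above. The helper names are hypothetical, and treating a blank value as unset is an assumption about how the app reads its config.

```python
# Defaults copied from the reference table above.
DEFAULTS = {
    "RAG_PROVIDER": "claude",
    "RAG_RETRIEVER": "bm25",
    "OLLAMA_URL": "http://localhost:11434",
    "OLLAMA_MODEL": "llama3.1:8b",
    "GEMINI_MODEL": "gemini-2.5-flash",
    "OPENAI_MODEL": "gpt-4o-mini",
    "CLAUDE_BIN": "claude",
}

# Variables with no default that a provider cannot work without.
REQUIRED = {"gemini": ["GEMINI_API_KEY"], "openai": ["OPENAI_API_KEY"]}


def effective_config(env):
    """Merge env over the documented defaults; blank values count as unset."""
    merged = dict(DEFAULTS)
    merged.update({k: v for k, v in env.items() if v.strip()})
    return merged


def missing_settings(env):
    """List required variables that are unset for the selected provider."""
    cfg = effective_config(env)
    return [k for k in REQUIRED.get(cfg["RAG_PROVIDER"], []) if k not in cfg]
```

For example, `missing_settings(dict(os.environ))` (after `import os`) returns an empty list when the selected provider has everything it needs, and the names of the missing keys otherwise.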