RAG Works Great Until It Doesn’t
RAG is the default answer for giving LLMs access to your documentation. Chunk the docs, embed them, throw them in a vector database, and retrieve the top-K results when the user asks a question. It’s elegant. It scales. It’s also got failure modes that’ll make you scream at 2 AM.
Here’s the thing: your LLM confidently cites page 47 of your docs while completely missing the actual answer on page 3. The retriever grabbed the wrong chunks—chunks that looked semantically similar to the query but weren’t actually relevant. Or worse, the answer spans multiple chunks and the retriever only grabbed one. Or the chunk size was too small and you lost context. Or you updated the docs and forgot to re-embed them, so your system is hallucinating based on stale information.
RAG introduces a whole pipeline of potential failure points: chunking decisions, embedding models, vector database retrieval, context window limits, staleness. Each one can fail silently.
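The staleness failure in particular is easy to detect and rarely checked. A minimal sketch, assuming you store a content hash for each doc at embedding time (the `embedded_hashes` dict and `find_stale_docs` name are illustrative, not from any library):

```python
import hashlib
from pathlib import Path

def find_stale_docs(docs_dir, embedded_hashes):
    """Return docs whose content no longer matches the hash recorded
    when they were last embedded -- these are the silently stale ones."""
    stale = []
    for f in Path(docs_dir).glob("**/*.md"):
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        if embedded_hashes.get(str(f)) != digest:
            stale.append(str(f))
    return stale
```

Run this in CI or on a schedule and re-embed anything it flags; the failure mode stops being silent.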
What if you treated document retrieval the way developers treat code exploration? Browse, navigate, search—like using a filesystem.
The Virtual Filesystem Approach
Instead of chunking and embedding, expose your docs as a filesystem that the LLM can navigate with standard tools: ls, cat, grep. The LLM doesn’t guess what it needs; it explores the structure, looks at file names, reads what’s relevant, and knows exactly where it found something.
This works because LLMs are actually pretty good at filesystem navigation. They understand directory hierarchies, they can infer what a file contains from its name, and they can read just the part they need. They’re developers—let them use a filesystem like one.
When This Beats RAG
Structured documentation — API docs, codebases, wikis. If your docs have a clear hierarchy (README → guides → API reference → examples), a filesystem structure mirrors that. The LLM can start at the top and navigate down.
Semantic gaps — Your users ask questions that don’t map well to embeddings. A developer might ask “how do I authenticate?” but your embeddings might cluster that with “what authentication methods exist?” The LLM can browse the auth section directly and find both without guessing.
Relationships between docs — If the answer depends on understanding how two documents connect, the filesystem structure makes that obvious. The LLM can jump between files and trace dependencies.
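Those connections can even be extracted mechanically. A sketch, assuming markdown docs that link to each other with relative `[text](other.md)` links (the `doc_links` helper is hypothetical):

```python
import re
from pathlib import Path

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#]+\.md)\)")  # markdown links to local .md files

def doc_links(docs_dir):
    """Map each markdown file to the local docs it links to, so the
    relationships between documents are explicit and traceable."""
    graph = {}
    root = Path(docs_dir)
    for f in root.glob("**/*.md"):
        targets = []
        for target in LINK_RE.findall(f.read_text()):
            resolved = (f.parent / target).resolve()
            if resolved.is_file():
                targets.append(str(resolved.relative_to(root.resolve())))
        graph[str(f.relative_to(root))] = targets
    return graph
```

A chunked-and-embedded corpus throws this graph away; a filesystem keeps it for free.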
Small to medium corpora — If your docs fit in context (say, under 10 MB uncompressed), you don’t need the retrieval layer at all. The LLM can hold the whole structure in mind.
Control and auditability — The LLM’s file access is logged. You know exactly which docs it read. No mysterious embedding space to debug.
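The auditability point falls out naturally if every file read goes through one function. A sketch (the `read_doc` helper is illustrative, not part of any framework), which also guards against the model reading outside the docs root:

```python
import logging
from pathlib import Path

log = logging.getLogger("docs_audit")

def read_doc(docs_root, relpath):
    """Read a doc and log exactly which file was accessed. Unlike an
    embedding index, every lookup leaves an audit trail."""
    path = (Path(docs_root) / relpath).resolve()
    if Path(docs_root).resolve() not in path.parents:
        raise PermissionError(f"{relpath} escapes the docs root")
    log.info("doc_read path=%s", path)
    return path.read_text()
```

The path-containment check matters: once an LLM is choosing file paths, you want a hard boundary, not a prompt-level one.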
The Tradeoffs
This approach isn’t free:
- More LLM calls — The LLM browses, reads, backtracks, refines. That’s slower and more expensive than a single retrieval + context injection.
- Requires a capable model — Your LLM needs to navigate intelligently. Claude, GPT-4, Llama 3.1 work great. Smaller models struggle.
- Context window is still the ceiling — You can’t load docs bigger than your context window. For massive corpora, you’re back to RAG or hybrid approaches.
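Whether you're under that ceiling is cheap to estimate up front. A rough sketch using the common ~4-characters-per-token heuristic (both the function and the default threshold are illustrative; treat the result as an estimate, not a guarantee):

```python
from pathlib import Path

def fits_in_context(docs_dir, context_tokens=200_000, chars_per_token=4):
    """Rough check: could the whole corpus be loaded directly into the
    model's context window? Based on total file size, not a real tokenizer."""
    total_chars = sum(
        f.stat().st_size for f in Path(docs_dir).glob("**/*") if f.is_file()
    )
    return total_chars / chars_per_token <= context_tokens
```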
Implementation: MCP Filesystem Server
The easiest way is via the Model Context Protocol (MCP). Here’s a server config that exposes a docs directory:
```json
{
  "mcpServers": {
    "docs": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-filesystem", "/path/to/your/docs"]
    }
  }
}
```

Point Claude at this server, and you get list_directory, read_file, and write_file tools automatically. The LLM can navigate your docs as if they're on its local machine.
A Python Tool-Calling Loop
If you’re not using MCP, you can build a simple loop where the LLM calls filesystem tools:
```python
import json
from pathlib import Path

from anthropic import Anthropic

client = Anthropic()

TOOLS = [
    {"name": "ls", "description": "List directory contents",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"}}}},
    {"name": "cat", "description": "Read a file",
     "input_schema": {"type": "object",
                      "properties": {"filename": {"type": "string"}},
                      "required": ["filename"]}},
    {"name": "grep", "description": "Search files for a pattern",
     "input_schema": {"type": "object",
                      "properties": {"pattern": {"type": "string"}},
                      "required": ["pattern"]}},
]

def handle_tool(tool_name, tool_input, docs_root):
    """Execute filesystem tools, resolving paths against the docs root."""
    root = Path(docs_root)
    if tool_name == "ls":
        target = root / tool_input.get("path", ".")
        return json.dumps([str(p.relative_to(root)) for p in target.iterdir()])
    elif tool_name == "cat":
        return (root / tool_input["filename"]).read_text()
    elif tool_name == "grep":
        pattern = tool_input["pattern"].lower()
        matches = []
        for f in root.glob("**/*"):
            if f.is_file():
                try:
                    if pattern in f.read_text().lower():
                        matches.append(str(f.relative_to(root)))
                except (UnicodeDecodeError, OSError):
                    pass  # skip binary or unreadable files
        return json.dumps(matches)
    return f"Unknown tool: {tool_name}"

def query_docs(user_query, docs_path="/path/to/docs"):
    """Query docs via an LLM with filesystem access."""
    messages = [{"role": "user", "content": user_query}]
    system = (
        f"You're a documentation assistant. You have access to a filesystem "
        f"at {docs_path}. Use ls to explore structure, cat to read files, "
        f"grep to search. Start by understanding the hierarchy."
    )

    while True:
        response = client.messages.create(
            model="claude-opus-4-1-20250805",
            max_tokens=4096,
            system=system,
            tools=TOOLS,
            messages=messages,
        )

        if response.stop_reason != "tool_use":
            # No tool calls left: return the model's final text answer.
            return next(b.text for b in response.content if b.type == "text")

        # Echo the assistant turn, then answer every tool call it made.
        messages.append({"role": "assistant", "content": response.content})
        tool_results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": handle_tool(block.name, block.input, docs_path)}
            for block in response.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})
```

System Prompt for Filesystem Navigation
Here’s a prompt that tells the LLM how to explore:
```
You have access to a documentation filesystem. Your job is to answer user
questions by navigating the docs.

Strategy:
1. Start with ls to see the top-level structure
2. Infer what directories/files might be relevant
3. Use grep to search for keywords if the structure isn't obvious
4. Read relevant files with cat
5. Synthesize an answer backed by specific file references

Always cite which files you read. If the answer requires multiple files,
trace the connections.
```

When to Stick with RAG
RAG is still the better choice for:
- Large public corpora — Search across Wikipedia or a 10,000-document knowledge base.
- Speed-critical systems — RAG retrieval is faster than the LLM navigating a filesystem.
- Low-latency APIs — Embedding-based retrieval is easier to cache and optimize.
- User-facing search — Ranking documents by relevance for humans is different from ranking for LLM exploration.
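The choice can even be made per-request. A toy decision rule based on the tradeoffs above (the function and its thresholds are illustrative, not benchmarked):

```python
def choose_strategy(corpus_bytes, latency_budget_ms, has_clear_hierarchy):
    """Toy router: pick RAG or filesystem navigation per request.
    Thresholds are placeholders -- tune them against your own workload."""
    if corpus_bytes > 50_000_000:   # huge corpus: navigation won't scale
        return "rag"
    if latency_budget_ms < 2_000:   # tight budget: one retrieval call wins
        return "rag"
    if has_clear_hierarchy:
        return "filesystem"
    return "rag"
```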
The Verdict
RAG is a great default. But if your docs fit in context, have clear structure, and your users’ questions don’t map well to embeddings, a virtual filesystem is simpler and more transparent. It’s less magic, more control. Your LLM is a developer—let it use the tools developers use.
Start with the MCP server approach. If your docs are under 5 MB, try loading them entirely into context and see if the LLM can navigate without tools. Most of the time, it can. You might find that the simplest approach beats the fanciest pipeline.
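Loading everything into context can be sketched in a few lines; the file-path headers let the model cite where each passage came from (the `docs_as_context` helper and its format are assumptions, not a standard):

```python
from pathlib import Path

MAX_BYTES = 5 * 1024 * 1024  # the rough 5 MB threshold suggested above

def docs_as_context(docs_dir):
    """Concatenate every doc into one context string, with file-path
    headers so the model can still cite sources. Returns None if the
    corpus exceeds the size threshold."""
    root = Path(docs_dir)
    files = sorted(f for f in root.glob("**/*") if f.is_file())
    if sum(f.stat().st_size for f in files) > MAX_BYTES:
        return None
    parts = [f"=== {f.relative_to(root)} ===\n{f.read_text()}" for f in files]
    return "\n\n".join(parts)
```

Drop the result into the system prompt and skip tools entirely; if it returns None, fall back to the navigation loop.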