Your Code Deserves Better Than a Cloud Subscription Judging It in Real Time
Let’s paint a picture. You’re deep in a project, you’ve got a function called do_the_thing_for_real_this_time_v3, a variable named temp2, and a comment that just says # idk. GitHub Copilot is watching all of this. It’s learning from all of this. Somewhere in a Microsoft datacenter, your crimes against clean code are being ingested into a training pipeline.
Or maybe that’s not what bothers you. Maybe it’s the $10/month bill. Maybe it’s the fact that your company’s legal team sent a three-page email about not pasting proprietary code into AI tools. Maybe you just like owning your stack.
Whatever the reason, you’re here because you want an AI code assistant that runs on your own hardware, respects your privacy, and doesn’t require a credit card. The good news: the options are actually pretty solid now. The slightly less good news: each one has its own quirks, setup overhead, and personality.
Let’s dig in.
Why Self-Host Your Code Assistant?
Before the comparison, let’s be clear about what you’re trading off.
The case for self-hosting:
- Privacy. Your code never leaves your machine. That matters for proprietary work, client projects, or just basic paranoia (valid).
- No per-token costs. Once you’ve got local models running, completions are free. Run it 10 times or 10,000 times — same bill.
- No telemetry. No one is collecting your prompts, your completions, or your usage patterns.
- Works offline. Traveling? VPN restrictions? Air-gapped environment? Doesn’t matter.
- Model choice. You pick the model. Want a 7B parameter model that runs on a laptop? Go for it. Got a beefy GPU and want to run something larger? Also fine.
The trade-offs:
- Setup takes more than clicking “install extension.”
- Quality depends heavily on the model you run locally — a 3B model is not going to match GPT-4.
- You need hardware. A decent GPU or at least 16GB RAM helps a lot.
If those trade-offs are acceptable, let’s look at what’s actually available.
The Three Contenders
Continue.dev — The Flexible Swiss Army Knife
Continue.dev is a VS Code and JetBrains plugin that acts as a front-end for whatever AI backend you want to point it at. It doesn’t run models itself — it connects to things like Ollama, LiteLLM, OpenAI-compatible APIs, Anthropic, or basically anything that speaks the right protocol.
This is both its superpower and its one complication: you need to have a backend running separately.
What it does:
- Inline code completion (the gray ghost text that finishes your thoughts)
- Chat sidebar (ask questions, get explanations, refactor code)
- Context-aware: it can read your open files, selected code, terminal output, and more
- Configurable via a config.json file — which sounds annoying but is actually pretty clean once you get the hang of it
The config.json approach means you can switch models without reinstalling anything. Want to try a different model? Update one field. Want to point at a remote server instead of local? One line change.
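As a sketch of what that looks like (the host address here is a made-up example; field names match recent Continue versions, so check your installed version's docs), retargeting a remote Ollama box is just the apiBase value:

```json
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7B (remote)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://192.168.1.50:11434"
    }
  ]
}
```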
Best for: People who already have (or want to set up) Ollama, and want a polished IDE experience with full chat + autocomplete.
Verdict: The most practical choice for most self-hosters. Especially if you’re already in the Ollama ecosystem.
Tabby — The Server You Own
Tabby is a different beast. It’s a self-hosted server application — you run it on a machine, it exposes an HTTP API (documented with an OpenAPI spec), and then your IDE extensions connect to it. The VS Code extension is the main integration, with JetBrains support also available.
What makes Tabby interesting is that it’s designed around grounding completions in your own codebase. You can point it at your repos and it indexes them, building context from your actual code, not just a generic pretrained model. For teams working on large or specialized codebases, this is a genuinely useful differentiator.
What it does:
- Code completion (its core strength)
- Repository context indexing
- OpenAPI interface — tools can integrate against it
- Docker-friendly deployment
- Answer Engine feature for chat-style queries (newer addition)
The trade-off: Tabby is more operationally involved than Continue.dev. You’re managing a server process, dealing with model downloads and GPU configuration, and keeping it running. It’s not hard, but it’s closer to “running a service” than “installing a plugin.”
```shell
# Quick Tabby start with Docker (CPU only)
docker run -it \
  -v $HOME/.tabby:/data \
  -p 8080:8080 \
  tabbyml/tabby \
  serve --model TabbyML/StarCoder-1B
```
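A quick way to confirm the server is actually serving is to hit its completion API directly. This is a hedged sketch: the endpoint paths come from Tabby's OpenAPI docs, so verify them against your version.

```shell
# Build a completion request: Tabby's /v1/completions endpoint takes a
# language plus a code prefix to complete.
BODY='{"language": "python", "segments": {"prefix": "def fib(n):"}}'
echo "$BODY"

# With the server from the Docker command above running on :8080:
# curl -s http://localhost:8080/v1/health          # liveness check
# curl -s -X POST http://localhost:8080/v1/completions \
#   -H 'Content-Type: application/json' -d "$BODY"
```

If the POST returns JSON rather than a connection error, the server and model are working end to end.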
Best for: Teams or power users who want to index their own codebase and treat the AI assistant as a proper service in their infrastructure.
Verdict: More setup, more power. Excellent if codebase-aware completion is what you need.
Cody — The Polished One With an Escape Hatch
Cody is Sourcegraph’s AI coding assistant. It’s the most polished of the three out of the box, has a generous free tier, and works really well as a cloud product.
But it also has a self-hosted path via Sourcegraph Enterprise, which is why it belongs in this comparison.
What it does:
- Chat and autocomplete in VS Code, JetBrains, Neovim
- Deep codebase search via Sourcegraph’s code intelligence (this is where it genuinely shines)
- Command palette for common tasks (explain code, generate tests, fix bugs)
- Context from multiple repos simultaneously
The honest take on self-hosting: Cody self-hosted means running Sourcegraph, which is a substantial piece of infrastructure. It’s not a weekend project — it’s closer to “enterprise deployment.” If you’re a team already using Sourcegraph, adding Cody is a natural extension. If you’re a solo developer who just wants local completions, it’s overkill.
The free tier cloud version is legitimately good and doesn’t require any setup. It uses Anthropic models on the backend and comes with reasonable rate limits. For individuals, this might actually be the right call even if it’s not “self-hosted.”
Best for: Teams already in the Sourcegraph ecosystem, or individuals who want the cleanest cloud-based experience that isn’t GitHub Copilot.
Verdict: Great product, complex self-hosting story. Use the free cloud tier unless you’ve got Sourcegraph reasons.
The Practical Setup: Continue.dev + Ollama
This is the combo most people should start with. Here’s the full path from zero to working autocomplete.
Step 1: Install Ollama
Ollama handles model downloads, serving, and the API. It’s the easiest way to run local models.
```shell
# Linux/macOS
curl -fsSL https://ollama.com/install.sh | sh

# Pull a good coding model
ollama pull qwen2.5-coder:7b
```
qwen2.5-coder:7b is a solid choice — strong at code, reasonable resource requirements. If you’ve got the VRAM, deepseek-coder-v2:16b is excellent.
Verify it’s running:
```shell
ollama list
curl http://localhost:11434/api/tags
```
Step 2: Install Continue.dev
In VS Code: Extensions panel → search “Continue” → install the one by Continue.dev.
Once installed, it opens a sidebar. You’ll be prompted to configure your backend.
Step 3: Configure config.json
Open the command palette (Ctrl+Shift+P) → “Continue: Open config.json”
Here’s a working config that sets up both chat and autocomplete with Ollama:
```json
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  },
  "allowAnonymousTelemetry": false
}
```
That allowAnonymousTelemetry: false line is there because you came here for privacy.
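One optional refinement: autocomplete fires on nearly every keystroke, so latency matters more than raw capability there. A common pattern (assuming you also pull the smaller model with ollama pull qwen2.5-coder:1.5b) is to give ghost text a faster model while chat keeps the 7B:

```json
{
  "tabAutocompleteModel": {
    "title": "Autocomplete (fast)",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b",
    "apiBase": "http://localhost:11434"
  }
}
```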
Step 4: Use it
- Autocomplete: Just start typing. Ghost text appears. Tab to accept, Escape to dismiss.
- Chat: Click the Continue sidebar or hit Ctrl+L to open chat. Select code first to include it as context.
- Inline edit: Select code, hit Ctrl+I, type your instruction.
That’s it. You’ve got a local Copilot.
Comparison Table
| Feature | Continue.dev | Tabby | Cody |
|---|---|---|---|
| Self-hostable | Via backend (Ollama etc.) | Yes, native server | Enterprise only |
| Free tier | Yes (with local models) | Yes | Yes (cloud) |
| VS Code support | Yes | Yes | Yes |
| JetBrains support | Yes | Yes | Yes |
| Chat | Yes | Partial (newer) | Yes |
| Autocomplete | Yes | Yes (core feature) | Yes |
| Codebase indexing | Basic | Strong | Very strong |
| Setup difficulty | Medium | Medium-High | Low (cloud) / Very High (self-hosted) |
| Model flexibility | High | Medium | Low (opinionated) |
| Works offline | Yes | Yes | No (free tier) |
Which One Should You Actually Use?
Use Continue.dev if: You want to get something working this weekend, you’re already comfortable with or interested in Ollama, and you want chat + autocomplete without managing a separate server. This is the default recommendation for 80% of people reading this.
Use Tabby if: You’re working on a large proprietary codebase and want the assistant to actually understand your code patterns, not just generic completions. Also if you want to run it as a shared service for a small team.
Use Cody (free tier) if: You don’t have the hardware for local models, you don’t want to manage infrastructure, and you just want something that works well without being GitHub Copilot. The free cloud tier is genuinely solid.
Use GitHub Copilot if: You enjoy paying $10/month for the privilege of having Microsoft read your variable names. (It is, to be fair, very good. But you knew that already.)
Hardware Reality Check
Let’s not pretend you can run a useful 70B model on a MacBook Air. Here’s the honest breakdown:
- 4GB VRAM / 8GB RAM: 3B models work. Completions will be fast. Quality is “helpful but not magical.”
- 8GB VRAM / 16GB RAM: 7B models work well. This is the sweet spot for most people. qwen2.5-coder:7b is genuinely useful here.
- 16GB+ VRAM: 13B-34B models. Quality gets noticeably better. Deep completions, better context handling.
- 48GB+ VRAM or multi-GPU: 70B models (a 4-bit quant of a 70B weighs roughly 40GB, so a single 24GB card won’t cut it). Cloud-competitive quality. At this point you’re running a small server, which honestly sounds fun.
CPU-only is possible but slow. Tabby and Ollama both support it. If you’re on CPU, smaller models (1.5B-3B) are the realistic choice for anything resembling a responsive experience.
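If you want to sanity-check whether a model fits before downloading it, a rough rule of thumb is parameters × bits per weight ÷ 8, plus about 20% for the KV cache and runtime overhead. A quick sketch (the 4.5 bits/weight figure approximates a Q4_K_M quant; treat the result as a ballpark, not a guarantee):

```shell
# Ballpark VRAM estimate for a quantized model.
params_b=7    # parameters, in billions
bpw=4.5       # bits per weight (~Q4_K_M quantization)
awk -v p="$params_b" -v b="$bpw" \
    'BEGIN { printf "~%.1f GB\n", p * b / 8 * 1.2 }'
# prints ~4.7 GB for the 7B example
```

That lands the 7B example around 4.7 GB, which lines up with the 8GB-VRAM sweet spot above once you leave headroom for longer contexts.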
The Bottom Line
Self-hosted AI code assistants have crossed the threshold from “interesting experiment” to “actually useful daily driver.” Continue.dev with Ollama is the entry point most people should take — it’s practical, flexible, and the setup is well-documented.
Tabby is the right call if you’re thinking about this from a team infrastructure angle and want codebase-aware completions as a proper service.
Cody is excellent at what it does, but its self-hosted story is more complex than it looks. The free cloud tier is worth using if local inference isn’t your thing.
Your code can stay on your machine. Your variable names can remain your shameful secret. That’s the deal.
Have a different setup that works better for you? Running something exotic like LM Studio or a custom LiteLLM proxy? The comments are open, and I’m always curious what janky configurations people have gotten working in production.