Here’s the thing: GitHub Copilot is phenomenal. The code suggestions are sharp, it understands context, and it genuinely makes you faster. But then you read the fine print. Every completion request ships the code around your cursor to GitHub’s servers, and depending on your plan and settings, those snippets can be retained and used to improve the product. Every function, every half-finished security fix, every “this is so stupid” comment leaves your machine. Some teams can’t stomach that. Some regulatory regimes won’t allow it.
The good news? You don’t have to choose between productivity and privacy. There are three solid alternatives that bring AI code assistance into your editor and can run without your code leaving your machine (or your infrastructure). Let’s dig in.
The Copilot Problem
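Copilot costs $10/month for individuals or $19 per seat on Copilot Business, unless your employer picks up the tab. That’s fine. What’s not fine is the data flow. Every completion request is processed on GitHub’s servers. On individual plans, whether GitHub keeps your snippets for product improvement comes down to a settings toggle; on Business and Enterprise, GitHub says it doesn’t train on your code, but the code still leaves your network, and usage telemetry flows either way. Your proprietary algorithms, your security fixes, your hardcoded jokes: all of it leaves the building.
For many enterprise teams, that’s a non-starter. For self-hosters, it defeats the whole point.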
Enter the alternatives.
Continue.dev: Your Editor + Any LLM
Continue is the path of least resistance. It’s an open-source VS Code and JetBrains plugin that turns your editor into an LLM chat interface. The secret sauce: it doesn’t pick the LLM for you. You can point it at Ollama running locally, Claude, OpenAI, Mistral, whatever.
What Continue Does
- Tab completion: ghost text as you type; hit Tab to accept
- Chat: Cmd+L opens a side panel where you ask the model to generate code, refactor, or add tests
- Edit mode: highlight a block, hit Cmd+I, and ask it to rewrite
- Slash commands and context providers: @codebase to search your repo, /test to generate tests, /comment to add comments
- Multi-file context: it reads your editor state and can pull in related files
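Tab completion is a different kind of request from chat: the editor sends the code before and after your cursor, and a completion model fills in the gap. Here is a simplified sketch of that fill-in-the-middle (FIM) prompt, assuming StarCoder-style sentinel tokens. Other model families use their own tokens, and real editor plugins typically add more context (open files, imports) than this does.

```python
# Simplified fill-in-the-middle (FIM) prompt, the request shape behind tab
# completion. Assumes StarCoder-style sentinel tokens; other model families
# (SantaCoder, CodeLlama, DeepSeek Coder) use different tokens.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(document: str, cursor: int) -> str:
    """Split the file at the cursor and wrap both halves in FIM sentinels.

    The model generates text after FIM_MIDDLE; the editor renders that
    text as ghost text at the cursor.
    """
    if not 0 <= cursor <= len(document):
        raise ValueError("cursor must fall inside the document")
    prefix, suffix = document[:cursor], document[cursor:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Cursor sits on the empty, indented line inside add().
source = "def add(a, b):\n    \n\nprint(add(1, 2))\n"
cursor = source.index("    \n") + 4
prompt = build_fim_prompt(source, cursor)
```

The model only ever sees one string; everything after the middle sentinel is what it writes back, which is why code models trained on this format tend to beat general chat models at completion.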
Setup with Ollama (Local)
First, spin up Ollama and pull a model:
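```bash
ollama serve &        # start the server (skip if the Ollama desktop app is already running)
ollama pull mistral   # or neural-chat, codellama, etc.
```

Then add the models to Continue’s config file (~/.continue/config.json; this is the JSON format, and recent Continue releases default to config.yaml instead):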
```json
{
  "models": [
    {
      "title": "Ollama Local",
      "provider": "ollama",
      "model": "mistral",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Ollama Local",
    "provider": "ollama",
    "model": "mistral",
    "apiBase": "http://localhost:11434"
  },
  "contextProviders": [
    { "name": "codebase" },
    { "name": "diff" }
  ]
}
```

Done. Open VS Code, hit Cmd+L, and you’re chatting with Mistral running on your GPU.
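Before blaming Continue for a silent chat panel, it helps to confirm the Ollama server answers on its own. A minimal sketch using only the Python standard library and Ollama’s /api/generate endpoint, assuming the default port 11434 and that the mistral model is already pulled; the helper names are hypothetical.

```python
# Smoke test for a local Ollama server, independent of any editor plugin.
# Assumes Ollama's default port and that `ollama pull mistral` has completed.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def build_generate_payload(model: str, prompt: str) -> bytes:
    """JSON body for /api/generate; stream=False returns a single JSON object."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL,
             timeout: float = 120.0) -> str:
    """Send one prompt and return the generated text from the "response" field."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=build_generate_payload(model, prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

From a REPL, call generate("mistral", "Write a Python one-liner that reverses a string."). A connection error usually means ollama serve isn’t running; an error about the model usually means the pull didn’t finish.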
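Continue also supports Anthropic, OpenAI, Cohere, and other hosted providers (if privacy isn’t your #1 concern). You can also use different models for tab completion and chat: for example, a small code model for real-time suggestions and Claude for thoughtful refactoring. General chat models like Mistral work for autocomplete, but code models trained for fill-in-the-middle completion usually do better.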
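A sketch of that split setup as a small Python script that produces the same JSON shape as the Ollama example above. The model IDs (codellama:7b-code, claude-3-5-sonnet-latest) and the API-key placeholder are assumptions to replace with your own, and the hosted chat model in this sketch sends code to Anthropic.

```python
# Build a Continue config.json pairing a small local code model for tab
# completion with a hosted model for chat. Model IDs and the API key are
# placeholders (assumptions); the hosted chat model sends code off-machine.
import json
from pathlib import Path
from typing import Optional

# Small FIM-capable code model for ghost text, served locally by Ollama.
LOCAL_AUTOCOMPLETE = {
    "title": "Local autocomplete",
    "provider": "ollama",
    "model": "codellama:7b-code",
    "apiBase": "http://localhost:11434",
}

# Larger hosted model for chat and refactoring.
HOSTED_CHAT = {
    "title": "Claude",
    "provider": "anthropic",
    "model": "claude-3-5-sonnet-latest",
    "apiKey": "YOUR_ANTHROPIC_API_KEY",
}


def continue_config(chat_model: dict, autocomplete_model: dict) -> dict:
    """Assemble the same top-level keys as the Ollama-only example."""
    return {
        "models": [chat_model],
        "tabAutocompleteModel": autocomplete_model,
        "contextProviders": [{"name": "codebase"}, {"name": "diff"}],
    }


def write_config(config: dict, path: Optional[Path] = None) -> Path:
    """Write pretty-printed JSON (default ~/.continue/config.json), overwriting it."""
    target = Path(path) if path is not None else Path.home() / ".continue" / "config.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2) + "\n")
    return target
```

Call write_config(continue_config(HOSTED_CHAT, LOCAL_AUTOCOMPLETE)) to write it, and back up any existing config first, since this overwrites the file. Swapping HOSTED_CHAT for a second Ollama entry keeps everything local.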
Cody: Sourcegraph’s Codebase-Aware Alternative
Cody is Sourcegraph’s answer to Copilot. Same vibe—VS Code and JetBrains plugin, inline chat, code generation. The difference: Cody indexes your entire codebase and uses that context when generating code.
This matters. When you ask Cody to “add a function that calls the API handler,” it knows your codebase structure. It finds the exact endpoint definition, understands your error handling pattern, and generates code that actually fits. Continue can approximate this with its @codebase retrieval; Cody has it baked in, backed by Sourcegraph’s code search.
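To make “codebase context” concrete, here is a toy Python sketch of the retrieve-then-prompt pattern: score repository files against the request, then prepend the best matches to the prompt. It is not how Cody works internally (Cody relies on Sourcegraph’s search and code intelligence); the repository contents, scoring rule, and function names are all hypothetical.

```python
# Toy "codebase context": rank files by token overlap with the request, then
# prepend the best matches to the prompt. NOT Cody's actual algorithm; the
# repository, scoring rule, and names are hypothetical.
import re
from collections import Counter

# Splits identifiers like api_handler, ApiError, HTTPServer into words.
TOKEN_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def tokenize(text: str) -> Counter:
    return Counter(word.lower() for word in TOKEN_RE.findall(text))


def rank_files(query: str, files: dict[str, str], top_k: int = 2) -> list[str]:
    """Return up to top_k file paths sharing the most tokens with the query."""
    wanted = tokenize(query)
    scored = []
    for path, source in files.items():
        have = tokenize(source)
        score = sum(min(count, have[word]) for word, count in wanted.items())
        scored.append((score, path))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, then path
    return [path for score, path in scored[:top_k] if score > 0]


def build_prompt(query: str, files: dict[str, str], top_k: int = 2) -> str:
    """Prepend the retrieved files to the user's request."""
    context = "\n\n".join(f"# File: {p}\n{files[p]}" for p in rank_files(query, files, top_k))
    return f"{context}\n\n# Task: {query}\n"


REPO = {
    "api/handlers.py": "def api_handler(request):\n    raise ApiError('not found')\n",
    "api/errors.py": "class ApiError(Exception):\n    pass\n",
    "ui/theme.py": "PRIMARY_COLOR = '#336699'\n",
}
QUERY = "add a function that calls the API handler"
```

Here the handler and its error class make it into the prompt and the unrelated theme file doesn’t, which is the whole trick: the model can only match your patterns if it sees them. Real tools replace the word-overlap score with proper search, embeddings, or symbol graphs.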
The Trade-Off
Cody comes in flavors:
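- Cloud version (free + pro): Sourcegraph indexes your code, you get the context boost, but code leaves your machine
- Self-hosted enterprise: you run Sourcegraph and its index on your own infrastructure, and you can point Cody at an LLM endpoint you control so prompts stay in-house too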
For teams that care about privacy, the enterprise version is the play. For individuals? The cloud version works fine—Sourcegraph isn’t Microsoft, and they’re explicit about not using your code for training (though you should read their privacy policy).
Cody works with Claude, Llama, Mistral, and other models, and you can switch chat models from the chat panel.
Tabby: The Self-Hosted Server
Tabby is a different beast. Instead of a plugin that calls someone else’s API, Tabby is a self-hosted inference server that runs on your hardware (ideally a GPU). You deploy it once, and Tabby’s IDE plugins (VS Code, JetBrains, Vim/Neovim) talk to it over HTTP.
Why This Matters
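If you have a team, Tabby can be cheaper than per-seat Copilot licenses. Everyone points their editor at the same Tabby instance and shares the compute. Ten Copilot Business seats cost $190 a month ($2,280 a year), while a single RTX 4090 (launch price $1,599) is a one-time purchase plus power, and with a small completion model it can serve a small team.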
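The break-even arithmetic as a small Python helper. The inputs below use Copilot Business at $19 per seat per month and the RTX 4090’s $1,599 launch price for the card alone; the $40/month power figure is a hypothetical placeholder, and the rest of the machine is left out, so plug in your own quotes.

```python
# Break-even sketch: months until a shared inference box costs less than
# per-seat subscriptions. All inputs are assumptions to replace with real quotes.
def breakeven_months(hardware_cost: float, seats: int, seat_price_per_month: float,
                     running_cost_per_month: float = 0.0) -> float:
    """Hardware cost divided by monthly savings; inf if it never pays off."""
    monthly_savings = seats * seat_price_per_month - running_cost_per_month
    if monthly_savings <= 0:
        return float("inf")
    return hardware_cost / monthly_savings


# Card only (RTX 4090 launch price) vs ten Copilot Business seats at $19/month.
card_only = breakeven_months(1599, seats=10, seat_price_per_month=19)
# Same comparison with a hypothetical $40/month for power.
with_power = breakeven_months(1599, seats=10, seat_price_per_month=19,
                              running_cost_per_month=40)
```

That works out to about 8.4 months for the card alone and about 10.7 months with the assumed power bill. Add the host machine and the time someone spends maintaining it before calling it a win.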
Tabby Config Example
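Tabby takes its models as command-line flags. Run it with Docker, passing a completion model and a chat model:

```bash
docker run -d \
  --gpus all \
  -p 8080:8080 \
  -v $HOME/.tabby:/data \
  tabbyml/tabby \
  serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
```

Swap in larger models from Tabby’s model registry as your GPU allows; settings that should persist across runs (such as pointing Tabby at an external inference server) go in ~/.tabby/config.toml. Then install the Tabby extension in VS Code and point it at http://localhost:8080 (recent versions also ask for an auth token from Tabby’s web UI). Suggestions start once the models finish downloading.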
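Because Tabby is just an HTTP server, anything can talk to it, not only the IDE plugins. A minimal Python client sketch, assuming Tabby’s /v1/completions endpoint takes a language plus prefix/suffix segments, returns a list of choices, and accepts a Bearer token. Check the request shape against the API docs your Tabby version serves (typically a Swagger UI on the same port) before relying on it; the helper names are hypothetical.

```python
# Minimal Tabby completion client using only the standard library.
# Assumes the /v1/completions endpoint, a {language, segments{prefix, suffix}}
# request body, a {choices: [{text}]} response, and Bearer-token auth;
# verify against your Tabby version's API docs. Helper names are hypothetical.
import json
import urllib.request
from typing import Optional


def build_completion_request(language: str, prefix: str, suffix: str = "") -> dict:
    """Code before the cursor goes in prefix, code after it in suffix."""
    return {"language": language, "segments": {"prefix": prefix, "suffix": suffix}}


def first_completion(response: dict) -> str:
    """Return the first suggestion's text, or "" if none came back."""
    choices = response.get("choices") or []
    return choices[0].get("text", "") if choices else ""


def complete(prefix: str, suffix: str = "", language: str = "python",
             base_url: str = "http://localhost:8080", token: Optional[str] = None,
             timeout: float = 30.0) -> str:
    """POST one completion request to a Tabby server and return the suggestion."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(build_completion_request(language, prefix, suffix)).encode("utf-8")
    request = urllib.request.Request(f"{base_url}/v1/completions", data=body,
                                     headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return first_completion(json.loads(resp.read()))
```

Something like complete("def fib(n):\n    ", token="YOUR_TABBY_TOKEN") is enough to check that the shared server is reachable from a teammate’s machine before anyone installs a plugin.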
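Tabby runs models like SantaCoder and StarCoder (small code models trained for fill-in-the-middle completion) for suggestions and small instruct models like Mistral 7B for chat. They’re much smaller than the hosted models behind Copilot, so expect weaker results on tricky code, but completions are fast and the quality is solid for everyday work.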
Head-to-Head
| Feature | Continue | Cody | Tabby |
|---|---|---|---|
| Tab completion | ✓ | ✓ | ✓ |
| Inline edit | ✓ | ✓ | ✓ |
| Chat | ✓ | ✓ | ✓ (limited) |
| Codebase context | Limited | Excellent | Limited |
| Self-hosted | Yes (Ollama) | Yes (enterprise) | Yes (full) |
| Privacy (free tier) | ✓ | ✗ | ✓ |
| Per-seat cost | Free (with Ollama) | Free tier; paid Pro and Enterprise seats | Shared hardware |
| Model flexibility | Highest | Good | Good |
Model Picks
- Tab completion: CodeLlama 7B or 13B (fine-tuned for code), DeepSeek Coder, or SantaCoder. These are optimized for speed.
- Chat: Mistral 7B, Neural Chat, or anything quantized and under 13B. For serious refactoring, Claude or GPT-4.
- Balance: CodeLlama 34B if your GPU can handle it (24GB+ VRAM with 4-bit quantization), otherwise Mistral 7B.
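Where a figure like 24GB comes from: weight memory is roughly parameters × bits per weight ÷ 8. A Python sketch of that estimate, with a flat 2 GB allowance for the KV cache and runtime that is an assumption; real usage grows with context length and batch size.

```python
# Rough VRAM estimate: weights take params x bits-per-weight / 8 bytes.
# The flat overhead for KV cache and runtime is an assumed figure; real
# usage grows with context length and batch size.
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_in_vram(num_params: float, bits_per_weight: float, vram_gb: float,
                 overhead_gb: float = 2.0) -> bool:
    """True if weights plus the assumed overhead fit in the card's memory."""
    return weight_memory_gb(num_params, bits_per_weight) + overhead_gb <= vram_gb


for label, params, bits in [("CodeLlama 34B, 4-bit", 34e9, 4),
                            ("CodeLlama 34B, 16-bit", 34e9, 16),
                            ("Mistral 7B, 4-bit", 7e9, 4)]:
    print(f"{label}: ~{weight_memory_gb(params, bits):.1f} GB weights, "
          f"fits in 24 GB: {fits_in_vram(params, bits, 24)}")
```

A 4-bit 34B model needs about 17 GB for weights, which leaves headroom on a 24GB card; the same model at 16-bit needs about 68 GB and doesn’t come close.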
For a single developer on an Apple Silicon MacBook, Continue + Ollama (Mistral 7B) is hard to beat. You get privacy, zero monthly cost, and enough smarts for most code.
For a small team, Tabby on a shared GPU box beats Copilot on cost and control.
For large codebases where context matters, Cody’s self-hosted enterprise is the answer—if budget allows.
The Verdict
All three work. The choice depends on your constraints:
- Privacy-first, single user? Continue + Ollama.
- Team that wants shared infrastructure? Tabby.
- Enterprise with the budget and a large, complex codebase? Cody enterprise.
Run self-hosted, none of these send your code to train a model that competes with you. That alone is worth the 30-minute setup time.