SumGuy's Ramblings

Continue.dev vs Cody vs Tabby: AI Code Assistants That Live on Your Machine

Your Code Deserves Better Than a Cloud Subscription Judging It in Real Time

Let’s paint a picture. You’re deep in a project, you’ve got a function called do_the_thing_for_real_this_time_v3, a variable named temp2, and a comment that just says # idk. GitHub Copilot is watching all of this. It’s learning from all of this. Somewhere in a Microsoft datacenter, your crimes against clean code are being ingested into a training pipeline.

Or maybe that’s not what bothers you. Maybe it’s the $10/month bill. Maybe it’s the fact that your company’s legal team sent a three-page email about not pasting proprietary code into AI tools. Maybe you just like owning your stack.

Whatever the reason, you’re here because you want an AI code assistant that runs on your own hardware, respects your privacy, and doesn’t require a credit card. The good news: the options are actually pretty solid now. The slightly less good news: each one has its own quirks, setup overhead, and personality.

Let’s dig in.


Why Self-Host Your Code Assistant?

Before the comparison, let’s be clear about what you’re trading off.

The case for self-hosting:

- Privacy: your code never leaves your machine, so there is nothing to be ingested into anyone's training pipeline.
- Cost: local models are free to run once you have the hardware; no monthly subscription.
- Compliance: no awkward conversations with legal about pasting proprietary code into third-party tools.
- Offline: everything keeps working on a plane, on a train, or during a provider outage.

The trade-offs:

- Hardware: you need a machine with enough RAM (ideally a GPU) to run a useful model.
- Setup and upkeep: you become the operator of your own inference stack.
- Quality: local models have gotten very good, but the top cloud models are still ahead.

If those trade-offs are acceptable, let’s look at what’s actually available.


The Three Contenders

Continue.dev — The Flexible Swiss Army Knife

Continue.dev is a VS Code and JetBrains plugin that acts as a front-end for whatever AI backend you want to point it at. It doesn’t run models itself — it connects to things like Ollama, LiteLLM, OpenAI-compatible APIs, Anthropic, or basically anything that speaks the right protocol.

This is both its superpower and its one complication: you need to have a backend running separately.

What it does:

- Tab autocomplete in the editor, backed by whichever completion model you configure.
- A chat sidebar that can reference your open files and highlighted code.
- Inline edits: select code, describe the change, review the suggested diff.
- Connects to any supported backend: Ollama, LiteLLM, OpenAI-compatible APIs, Anthropic, and more.

The config.json approach means you can switch models without reinstalling anything. Want to try a different model? Update one field. Want to point at a remote server instead of local? One line change.
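For example, pointing Continue at an Ollama instance running on another machine is just a matter of changing apiBase. This is a minimal sketch of the models section; the hostname gpu-box.local is a placeholder for whatever your server is called:

```json
{
  "models": [
    {
      "title": "Qwen 2.5 Coder (remote)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://gpu-box.local:11434"
    }
  ]
}
```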

Best for: People who already have (or want to set up) Ollama, and want a polished IDE experience with full chat + autocomplete.

Verdict: The most practical choice for most self-hosters. Especially if you’re already in the Ollama ecosystem.


Tabby — The Server You Own

Tabby is a different beast. It’s a self-hosted server application — you run it on a machine, it exposes an HTTP API (documented with an OpenAPI spec), and your IDE extensions connect to it. The VS Code extension is the main integration, with JetBrains support also available.

What makes Tabby interesting is that it’s designed around indexing your own codebase. You can point it at your repos and it builds completion context from your actual code, rather than relying solely on a generic pretrained model. (This is retrieval-style context injection, not fine-tuning: your code augments the prompts, it doesn’t retrain the weights.) For teams working on large or specialized codebases, this is a genuinely useful differentiator.
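As a sketch of how repo indexing is wired up: older Tabby releases read a repository list from ~/.tabby/config.toml, while newer releases manage repositories through the web admin UI instead, so check the docs for your version. The repository name and URL below are placeholders:

```toml
# ~/.tabby/config.toml -- repository list for Tabby's code indexer
# (config-file approach; newer versions configure this in the admin UI)
[[repositories]]
name = "my-backend"                                  # placeholder name
git_url = "https://github.com/acme/my-backend.git"   # placeholder URL
```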

What it does:

- Code completion served from your own hardware, built around fill-in-the-middle suggestions.
- Repository indexing, so completions can draw on your actual codebase.
- A web admin UI for managing models, repositories, and usage.
- IDE extensions that talk to the server over HTTP.

The trade-off: Tabby is more operationally involved than Continue.dev. You’re managing a server process, dealing with model downloads and GPU configuration, and keeping it running. It’s not hard, but it’s closer to “running a service” than “installing a plugin.”

# Quick Tabby start with Docker (CPU only)
docker run -it \
  -v $HOME/.tabby:/data \
  -p 8080:8080 \
  tabbyml/tabby \
  serve --model TabbyML/StarCoder-1B
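Once the container is up, a quick way to confirm it is actually serving is to poke the HTTP API. The /v1/health and /v1/completions paths below are assumptions based on Tabby's published OpenAPI spec; confirm them against the Swagger page your own instance serves:

```shell
# Smoke-test a local Tabby server. Override TABBY_URL for a remote host.
TABBY_URL="${TABBY_URL:-http://localhost:8080}"

if curl -fsS "$TABBY_URL/v1/health" >/dev/null 2>&1; then
  REACHABLE=1
  echo "Tabby is up at $TABBY_URL"
  # Request a completion for a Python prefix; the response is JSON.
  curl -fsS -X POST "$TABBY_URL/v1/completions" \
    -H 'Content-Type: application/json' \
    -d '{"language": "python", "segments": {"prefix": "def fib(n):\n    "}}'
else
  REACHABLE=0
  echo "Tabby is not reachable at $TABBY_URL (is the container running?)"
fi
```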

Best for: Teams or power users who want to index their own codebase and treat the AI assistant as a proper service in their infrastructure.

Verdict: More setup, more power. Excellent if codebase-aware completion is what you need.


Cody — The Polished One With an Escape Hatch

Cody is Sourcegraph’s AI coding assistant. It’s the most polished of the three out of the box, has a generous free tier, and works really well as a cloud product.

But it also has a self-hosted path via Sourcegraph Enterprise, which is why it belongs in this comparison.

What it does:

- Chat that can pull in context from your codebase via Sourcegraph's code graph.
- Autocomplete in the editor.
- Built-in commands for common tasks like explaining code and generating unit tests.

The honest take on self-hosting: Cody self-hosted means running Sourcegraph, which is a substantial piece of infrastructure. It’s not a weekend project — it’s closer to “enterprise deployment.” If you’re a team already using Sourcegraph, adding Cody is a natural extension. If you’re a solo developer who just wants local completions, it’s overkill.

The free tier cloud version is legitimately good and doesn’t require any setup. It uses Anthropic models on the backend and has solid rate limits. For individuals, this might actually be the right call even if it’s not “self-hosted.”

Best for: Teams already in the Sourcegraph ecosystem, or individuals who want the cleanest cloud-based experience that isn’t GitHub Copilot.

Verdict: Great product, complex self-hosting story. Use the free cloud tier unless you’ve got Sourcegraph reasons.


The Practical Setup: Continue.dev + Ollama

This is the combo most people should start with. Here’s the full path from zero to working autocomplete.

Step 1: Install Ollama

Ollama handles model downloads, serving, and the API. It’s the easiest way to run local models.

# Linux/macOS
curl -fsSL https://ollama.com/install.sh | sh

# Pull a good coding model
ollama pull qwen2.5-coder:7b

qwen2.5-coder:7b is a solid choice — strong at code, reasonable resource requirements. If you’ve got the VRAM, deepseek-coder-v2:16b is excellent.

Verify it’s running:

ollama list
curl http://localhost:11434/api/tags
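If you want to confirm the model actually generates text, not just that the daemon is up, Ollama's /api/generate endpoint accepts a one-off prompt; with "stream": false it returns a single JSON object instead of a stream. A small check script, assuming you pulled qwen2.5-coder:7b as above:

```shell
# One-off generation against the local Ollama API to confirm the model loads.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

if curl -fsS "$OLLAMA_URL/api/tags" >/dev/null 2>&1; then
  OLLAMA_UP=1
  # First call will be slow while the model loads into memory.
  curl -s "$OLLAMA_URL/api/generate" -d '{
    "model": "qwen2.5-coder:7b",
    "prompt": "Write a Python one-liner that reverses a string.",
    "stream": false
  }'
else
  OLLAMA_UP=0
  echo "Ollama is not reachable at $OLLAMA_URL"
fi
```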

Step 2: Install Continue.dev

In VS Code: Extensions panel → search “Continue” → install the one by Continue.dev.

Once installed, it opens a sidebar. You’ll be prompted to configure your backend.

Step 3: Configure config.json

Open the command palette (Ctrl+Shift+P) → “Continue: Open config.json”

Here’s a working config that sets up both chat and autocomplete with Ollama:

{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7B",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b",
    "apiBase": "http://localhost:11434"
  },
  "allowAnonymousTelemetry": false
}

That allowAnonymousTelemetry: false line is there because you came here for privacy.

Step 4: Use it

That’s it. You’ve got a local Copilot. Start typing and completions appear as ghost text; Tab accepts them. By default, Ctrl+L (Cmd+L on macOS) sends your selection to the chat sidebar; check the extension’s keybindings if yours differ.


Comparison Table

| Feature | Continue.dev | Tabby | Cody |
|---|---|---|---|
| Self-hostable | Via backend (Ollama etc.) | Yes, native server | Enterprise only |
| Free tier | Yes (with local models) | Yes | Yes (cloud) |
| VS Code support | Yes | Yes | Yes |
| JetBrains support | Yes | Yes | Yes |
| Chat | Yes | Partial (newer) | Yes |
| Autocomplete | Yes | Yes (core feature) | Yes |
| Codebase indexing | Basic | Strong | Very strong |
| Setup difficulty | Medium | Medium–High | Low (cloud) / Very high (self-hosted) |
| Model flexibility | High | Medium | Low (opinionated) |
| Works offline | Yes | Yes | No (free tier) |

Which One Should You Actually Use?

Use Continue.dev if: You want to get something working this weekend, you’re already comfortable with or interested in Ollama, and you want chat + autocomplete without managing a separate server. This is the default recommendation for 80% of people reading this.

Use Tabby if: You’re working on a large proprietary codebase and want the assistant to actually understand your code patterns, not just generic completions. Also if you want to run it as a shared service for a small team.

Use Cody (free tier) if: You don’t have the hardware for local models, you don’t want to manage infrastructure, and you just want something that works well without being GitHub Copilot. The free cloud tier is genuinely solid.

Use GitHub Copilot if: You enjoy paying $10/month for the privilege of having Microsoft read your variable names. (It is, to be fair, very good. But you knew that already.)


Hardware Reality Check

Let’s not pretend you can run a useful 70B model on a MacBook Air. Here’s the honest breakdown, with rough numbers for 4-bit quantized models (your mileage varies with quantization and context length):

- ~7B models: roughly 6–8 GB of VRAM or unified memory. The sweet spot for most setups.
- ~14–16B models: roughly 12–16 GB. Noticeably better output if you have the headroom.
- 30B+ models: 24 GB and up. Workstation or server territory.

CPU-only is possible but slow. Tabby and Ollama both support it. If you’re on CPU, smaller models (1.5B–3B) are the realistic choice for anything resembling a responsive experience.


The Bottom Line

Self-hosted AI code assistants have crossed the threshold from “interesting experiment” to “actually useful daily driver.” Continue.dev with Ollama is the entry point most people should take — it’s practical, flexible, and the setup is well-documented.

Tabby is the right call if you’re thinking about this from a team infrastructure angle and want codebase-aware completions as a proper service.

Cody is excellent at what it does, but its self-hosted story is more complex than it looks. The free cloud tier is worth using if local inference isn’t your thing.

Your code can stay on your machine. Your variable names can remain your shameful secret. That’s the deal.


Have a different setup that works better for you? Running something exotic like LM Studio or a custom LiteLLM proxy? The comments are open, and I’m always curious what janky configurations people have gotten working in production.

