SumGuy's Ramblings

Stable Diffusion vs ComfyUI vs Fooocus: AI Image Generation at Home

So you’ve seen people generating incredible AI art on their home computers and you want in. You fire up Google, type “how to run Stable Diffusion locally,” and immediately get hit with a wall of acronyms: A1111, Forge, ComfyUI, Fooocus, SDXL, LoRA, VAE, ControlNet… and your eyes glaze over like a fresh Krispy Kreme donut.

Don’t worry. By the end of this article, you’ll know exactly which tool to pick, what GPU you need, and how to get generating without wanting to throw your computer out the window. Let’s break it all down.

Wait, What Even Is Stable Diffusion?

First, let’s clear up the biggest confusion: Stable Diffusion is not a program you run. It’s the underlying AI model — the brain, if you will. Think of it like an engine. You still need a car (the interface) to actually drive it.

Stable Diffusion was created by Stability AI and released as open source, which means anyone can build a frontend for it. And boy, did people build frontends. We’re going to look at the four most popular ones:

  1. Automatic1111 (A1111) — The original gangster
  2. Forge — A1111’s faster, leaner cousin
  3. ComfyUI — The node-based power tool
  4. Fooocus — The “just make it work” option

Each one uses the same underlying Stable Diffusion models but wraps them in a wildly different experience. It’s like how Chrome, Firefox, and Safari all render the same websites but feel completely different to use.

The Contenders: A Quick Overview

Automatic1111 (AUTOMATIC1111/stable-diffusion-webui)

This was THE way to run Stable Diffusion locally for most of 2023. It’s a web-based UI that gives you sliders, dropdowns, and text boxes to control every aspect of image generation. If Stable Diffusion interfaces were cars, A1111 would be a manual transmission sedan — not the fanciest ride, but it gets you everywhere and there’s a massive community of people who know how to fix it when it breaks.

Pros:

Cons:

Forge (lllyasviel/stable-diffusion-webui-forge)

Forge is what happens when someone looks at A1111 and says, “I like this, but what if it didn’t eat my entire GPU for breakfast?” Created by the same developer behind ControlNet (more on that later), Forge is essentially A1111 with significant performance optimizations under the hood.

Pros:

Cons:

If you were planning to use A1111, just use Forge instead. Seriously. It’s the same experience but faster. The only reason to pick A1111 over Forge is if you need a very specific extension that hasn’t been ported yet, which is increasingly rare.

ComfyUI

ComfyUI is where things get interesting — and by “interesting,” I mean “your screen looks like a conspiracy theory board with all the string connections.” ComfyUI uses a node-based workflow system where you visually connect processing blocks together. If A1111 is a microwave (push button, get food), ComfyUI is a full commercial kitchen where you control every burner individually.

Pros:

Cons:

Here’s the thing about ComfyUI that nobody tells you upfront: once you learn it, you’ll never want to go back. The node system means you can build workflows that do things no slider-based UI can touch. Want to generate an image, automatically upscale it, apply a different LoRA to specific regions, and save it with custom metadata — all in one click? ComfyUI does that. But you’ll spend a weekend learning how to connect the nodes first.

Fooocus

Fooocus is the Midjourney of local AI image generation. It was designed with one goal: make generating beautiful images as simple as possible. Type a prompt, hit generate, get a great image. That’s it. No sliders for CFG scale. No sampler selection dropdown with 47 options. No existential crisis about which VAE to use.

Pros:

Cons:

Fooocus is perfect for people who just want pretty pictures without a PhD in diffusion models. It’s also great as a first step before graduating to more complex tools.

SDXL vs SD 1.5: The Model Question

Before we talk about hardware, let’s address the elephant in the room: which base model should you use?

SD 1.5 (512x512 native resolution)

SDXL (1024x1024 native resolution)

SD 3.5 and Beyond

My recommendation: Start with SDXL if your GPU can handle it. SD 1.5 is not dead — it has an incredible ecosystem of fine-tuned models — but SDXL produces better results with less effort on prompting. If you’re on a potato GPU (4GB VRAM or less), SD 1.5 is your friend.
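If you go the SDXL route, you can grab the official base checkpoint directly from Hugging Face. A sketch assuming the standard repo layout for stabilityai/stable-diffusion-xl-base-1.0 (it's a ~7GB download, so `-c` lets you resume if it drops):

```shell
# Download the official SDXL base checkpoint into a central models directory
mkdir -p ~/ai-models/checkpoints
wget -c -P ~/ai-models/checkpoints \
  https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
```

Fine-tuned community checkpoints from CivitAI drop into the same folder.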

GPU Requirements: The Real Talk

Let’s cut through the marketing fluff and talk about what you actually need.

The Bare Minimum (4GB VRAM)

The Sweet Spot (8GB VRAM)

The Comfortable Zone (12GB+ VRAM)

The “Money Is No Object” Tier (16GB+ VRAM)

A Note on AMD GPUs

AMD support has improved dramatically. ROCm works on Linux for RX 6000 and 7000 series cards. Windows support is spottier. If you’re on AMD and Linux, you’re in decent shape. AMD on Windows? You’ll need DirectML backends, which work but are slower than CUDA equivalents. It’s getting better, but NVIDIA still has the smoother experience for AI workloads.
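For the Linux ROCm path, the usual recipe is the ROCm build of PyTorch plus an architecture override for cards ROCm doesn't officially list. A sketch, assuming an RX 6000-series card; the override value depends on your exact GPU, and the index URL version should be checked against pytorch.org:

```shell
# Install the ROCm build of PyTorch (check pytorch.org for the current ROCm version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.0

# Many RX 6000 (RDNA 2) cards need this override to be recognized;
# RX 7000 (RDNA 3) cards typically use 11.0.0 instead
export HSA_OVERRIDE_GFX_VERSION=10.3.0
python launch.py  # or however your chosen UI starts
```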

Docker Setups: Because Dependency Hell Is Real

If you’ve ever tried to install Python packages for AI projects, you know the special joy of version conflicts. Docker solves this by putting everything in a container. Here’s how to get each tool running with Docker.

General Prerequisites

# Install NVIDIA Container Toolkit (for GPU passthrough)
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU is visible inside Docker
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi

A1111 / Forge via Docker

# docker-compose.yml
services:
  forge:
    image: ghcr.io/lllyasviel/stable-diffusion-webui-forge:latest  # illustrative name; Forge has no official image, so pick a community build
    ports:
      - "7860:7860"
    volumes:
      - ./models:/app/models
      - ./outputs:/app/outputs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Several community-maintained Docker images exist for both A1111 and Forge. Look for ones that include common extensions pre-installed to save yourself setup time.

ComfyUI via Docker

# docker-compose.yml
services:
  comfyui:
    image: ghcr.io/ai-dock/comfyui:latest
    ports:
      - "8188:8188"
    volumes:
      - ./models:/opt/ComfyUI/models
      - ./output:/opt/ComfyUI/output
      - ./custom_nodes:/opt/ComfyUI/custom_nodes
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Pro tip: Mount your models directory as a shared volume across all your Docker containers. There’s no reason to have four copies of a 6GB SDXL model sitting on your drive. Your SSD will thank you.

Fooocus via Docker

# docker-compose.yml
services:
  fooocus:
    image: ghcr.io/lllyasviel/fooocus:latest  # illustrative name; check the Fooocus repo for its current Docker instructions
    ports:
      - "7865:7865"
    volumes:
      - ./models:/app/models
      - ./outputs:/app/outputs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

The Shared Models Trick

Here’s a sanity-saving tip. Create one central models directory and symlink or volume-mount it into each tool:

mkdir -p ~/ai-models/{checkpoints,loras,vae,controlnet,embeddings}

# Then in each docker-compose.yml, mount specific subdirectories:
# - ~/ai-models/checkpoints:/app/models/Stable-diffusion
# - ~/ai-models/loras:/app/models/Lora
# - ~/ai-models/vae:/app/models/VAE

This way you download a model once and every tool can use it. Disk space is precious when your models folder inevitably grows to 200GB.
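If you run the tools bare-metal instead of in Docker, symlinks do the same job as volume mounts. A sketch using placeholder paths under /tmp so it's safe to try; in practice BASE would be ~/ai-models and TOOL your UI's install directory (the target names match A1111/Forge's layout; ComfyUI uses models/checkpoints, models/loras, and so on):

```shell
# Demo paths; substitute your real directories
BASE=/tmp/ai-models-demo
TOOL=/tmp/webui-demo

mkdir -p "$BASE"/{checkpoints,loras,vae}
mkdir -p "$TOOL/models"

# A1111/Forge expects these directory names under models/
ln -sfn "$BASE/checkpoints" "$TOOL/models/Stable-diffusion"
ln -sfn "$BASE/loras"       "$TOOL/models/Lora"
ln -sfn "$BASE/vae"         "$TOOL/models/VAE"

ls -l "$TOOL/models"
```

The `-n` flag stops `ln` from nesting a new link inside an existing one if you rerun the script.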

ControlNet: The Secret Weapon

ControlNet deserves special mention because it’s a game-changer regardless of which interface you use. It lets you guide image generation using reference images for poses, edges, depth maps, and more. Think of it as giving the AI a coloring book outline instead of letting it freestyle.

In A1111/Forge: Install the ControlNet extension, download the models, select a preprocessor and model from the dropdown. Straightforward.

In ComfyUI: ControlNet is handled through nodes. More setup, but you get granular control over how strongly it influences different parts of the generation. You can even blend multiple ControlNet models with different weights on different steps.

In Fooocus: Basic ControlNet support is built in through the “Input Image” tab. Less flexible but it works out of the box.

The most commonly used ControlNet models:

LoRAs: Teaching Old Models New Tricks

LoRAs (Low-Rank Adaptations) are small add-on models that modify the base model’s behavior. Want your SDXL model to generate images in a specific art style? There’s a LoRA for that. Want it to know what a specific character looks like? LoRA. Want photorealistic skin textures? You guessed it.

LoRAs are typically 10-200MB (compared to the 2-7GB base models), making them easy to collect. And you will collect them. You’ll have a LoRA problem within a week. I believe in you.
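In A1111 and Forge, a LoRA is activated with a `<lora:filename:weight>` tag directly in the prompt (ComfyUI instead wires in a LoRA loader node). A tiny helper to illustrate the tag syntax; the function itself is made up for this sketch:

```python
def with_loras(prompt: str, loras: dict[str, float]) -> str:
    """Append A1111/Forge-style <lora:name:weight> tags to a prompt.

    `name` must match the LoRA's filename (minus extension) in the Lora folder.
    """
    tags = "".join(f" <lora:{name}:{weight}>" for name, weight in loras.items())
    return prompt + tags

print(with_loras("portrait of a knight, oil painting",
                 {"film_grain_v2": 0.6, "detail_tweaker": 0.8}))
# -> portrait of a knight, oil painting <lora:film_grain_v2:0.6> <lora:detail_tweaker:0.8>
```

Weights around 0.5-1.0 are typical; stacking several LoRAs at full strength tends to produce muddy results.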

Managing LoRAs across tools:

VRAM Optimization Tips

Running out of VRAM? Here are some tricks that work across all platforms:

  1. Enable xformers or torch SDP attention — Free speed and VRAM savings. Most modern setups use SDP by default.

  2. Use --medvram or --lowvram flags (A1111/Forge) — Trades speed for lower VRAM usage by moving model components to CPU as needed.

  3. FP16 / FP8 precision — Half-precision (FP16) is standard now. Forge and ComfyUI support FP8, which halves VRAM usage again with minimal quality loss.

  4. Tiled VAE decoding — Breaks the VAE decode step into tiles instead of processing the whole image at once. Essential for high-resolution images on limited VRAM.

  5. Model offloading — ComfyUI is especially good at this. It can move unused models to CPU RAM or even disk, keeping only what’s needed on the GPU.

  6. Close your browser tabs. Seriously. That Chrome session with 47 tabs is eating your VRAM. Your GPU can’t generate images AND render your YouTube playlist.
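For reference, here is roughly how the launch flags from the list above look on the command line. These are the commonly documented flag names for each project, but check each repo's README since they shift between versions:

```shell
# A1111 / Forge (in webui-user.sh or directly on the command line)
./webui.sh --medvram --xformers    # typical for 6-8GB cards
./webui.sh --lowvram               # 4GB cards; noticeably slower

# ComfyUI
python main.py --lowvram           # aggressive offloading to system RAM
python main.py --novram            # last resort: keep almost nothing on the GPU
```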

Workflow Comparison: Same Task, Different Tools

Let’s say you want to generate a portrait, upscale it 2x, and apply a film grain effect. Here’s how that looks in each tool:

In A1111/Forge:

  1. Type prompt, adjust settings, generate
  2. Send to Extras tab, select upscaler, upscale
  3. Send to img2img, apply film grain LoRA at low denoise
  4. Three separate steps, manual sending between tabs

In ComfyUI:

  1. Build a workflow once: KSampler -> Upscale -> Apply LoRA -> KSampler -> Save
  2. Click “Queue Prompt”
  3. One click. Done. Forever. Reuse that workflow for every portrait.

In Fooocus:

  1. Type prompt, generate (it auto-upscales based on your quality settings)
  2. Film grain? You’d need to manually apply that externally
  3. Simple but limited

This is where ComfyUI’s upfront complexity pays off. Build once, use forever.
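That reusability even extends beyond the UI: ComfyUI exposes an HTTP API, so a workflow you've exported via "Save (API Format)" can be queued from a script. A sketch assuming a local ComfyUI on the default port 8188 and a workflow file you exported yourself:

```python
import json
import urllib.request

def build_payload(workflow: dict, client_id: str = "my-script") -> bytes:
    """Wrap an API-format workflow in the JSON body that /prompt expects."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode()

def queue_workflow(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST the workflow to ComfyUI's /prompt endpoint and return its response."""
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=build_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # portrait_workflow.json is a workflow exported with "Save (API Format)"
    with open("portrait_workflow.json") as f:
        workflow = json.load(f)
    print(queue_workflow(workflow))
```

Swap the prompt text inside the loaded workflow dict before queueing and you have batch generation with no UI at all.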

The Beginner’s Decision Tree

Still not sure which to pick? Let me make it simple:

“I just want pretty pictures with minimal setup” -> Fooocus. Install it, type a prompt, be amazed. Graduate to something else when you hit its limits.

“I want control but don’t want to learn node programming” -> Forge. It’s A1111 but better. The slider-based interface is intuitive, the extension ecosystem is massive, and performance is great.

“I want maximum control and I’m willing to invest time learning” -> ComfyUI. The learning curve is real, but the payoff is enormous. Every serious AI artist I know ended up on ComfyUI eventually.

“I used A1111 a year ago and haven’t checked back” -> Switch to Forge. Same workflow, faster results. You’ll wonder why you didn’t switch sooner.

My Honest Recommendation

Install Fooocus first. Spend an evening generating images, understanding prompting, and figuring out what you actually want to do with AI image generation. Don’t overthink it.

Then install Forge when you want more control over sampling, LoRAs, and ControlNet. You’ll feel the power difference immediately.

Finally, when you find yourself thinking “I wish I could automate this multi-step process,” that’s when you open ComfyUI. Watch a couple of beginner workflow tutorials, download some community workflows from places like OpenArt or CivitAI, and start connecting nodes.

The beauty of local AI image generation is that you’re not locked in. Your models work across all these tools. Your LoRAs are portable. Your ControlNet models don’t care which frontend loads them. Switch freely, use what works for the task at hand, and remember: the best tool is the one that gets you generating instead of configuring.

Now stop reading and go make some weird AI art. Your GPU is getting bored.

