So you’ve seen people generating incredible AI art on their home computers and you want in. You fire up Google, type “how to run Stable Diffusion locally,” and immediately get hit with a wall of acronyms: A1111, Forge, ComfyUI, Fooocus, SDXL, LoRA, VAE, ControlNet… and your eyes glaze over like a fresh Krispy Kreme donut.
Don’t worry. By the end of this article, you’ll know exactly which tool to pick, what GPU you need, and how to get generating without wanting to throw your computer out the window. Let’s break it all down.
Wait, What Even Is Stable Diffusion?
First, let’s clear up the biggest confusion: Stable Diffusion is not a program you run. It’s the underlying AI model — the brain, if you will. Think of it like an engine. You still need a car (the interface) to actually drive it.
Stable Diffusion was created by Stability AI and released as open source, which means anyone can build a frontend for it. And boy, did people build frontends. We’re going to look at the four most popular ones:
- Automatic1111 (A1111) — The original gangster
- Forge — A1111’s faster, leaner cousin
- ComfyUI — The node-based power tool
- Fooocus — The “just make it work” option
Each one uses the same underlying Stable Diffusion models but wraps them in a wildly different experience. It’s like how Chrome, Firefox, and Safari all render the same websites but feel completely different to use.
The Contenders: A Quick Overview
Automatic1111 (AUTOMATIC1111/stable-diffusion-webui)
This was THE way to run Stable Diffusion locally for most of 2023. It’s a web-based UI that gives you sliders, dropdowns, and text boxes to control every aspect of image generation. If Stable Diffusion interfaces were cars, A1111 would be a manual transmission sedan — not the fanciest ride, but it gets you everywhere and there’s a massive community of people who know how to fix it when it breaks.
Pros:
- Enormous extension ecosystem
- More tutorials and guides than any other option
- Straightforward slider-and-button interface
- Supports basically every model, LoRA, and extension ever made
Cons:
- Noticeably slower than Forge on the same hardware
- VRAM hungry — struggles on cards with less than 8GB
- Development has slowed down compared to alternatives
- The settings page looks like a Boeing 747 cockpit
Forge (lllyasviel/stable-diffusion-webui-forge)
Forge is what happens when someone looks at A1111 and says, “I like this, but what if it didn’t eat my entire GPU for breakfast?” Created by the same developer behind ControlNet (more on that later), Forge is essentially A1111 with significant performance optimizations under the hood.
Pros:
- 30-75% faster than A1111 on the same hardware (not a typo)
- Uses significantly less VRAM — SDXL runs on 6GB cards
- Mostly compatible with A1111 extensions
- Same familiar interface if you’re coming from A1111
Cons:
- Slightly less extension compatibility than vanilla A1111
- Newer project, so fewer dedicated tutorials
- Some A1111-specific workflows need minor tweaks
- Can lag behind on certain bleeding-edge extensions
If you were planning to use A1111, just use Forge instead. Seriously. It’s the same experience but faster. The only reason to pick A1111 over Forge is if you need a very specific extension that hasn’t been ported yet, which is increasingly rare.
ComfyUI
ComfyUI is where things get interesting — and by “interesting,” I mean “your screen looks like a conspiracy theory board with all the string connections.” ComfyUI uses a node-based workflow system where you visually connect processing blocks together. If A1111 is a microwave (push button, get food), ComfyUI is a full commercial kitchen where you control every burner individually.
Pros:
- Maximum control over every step of the generation pipeline
- Workflows are shareable and reproducible (huge for consistency)
- Extremely memory efficient — often uses less VRAM than even Forge
- Native support for advanced techniques like IP-Adapter, AnimateDiff, etc.
- The fastest option for batch processing and complex pipelines
- Active, rapidly evolving development
Cons:
- Learning curve steeper than a San Francisco street
- The default interface looks like it was designed by electrical engineers for electrical engineers (because it kind of was)
- Debugging a broken workflow can feel like defusing a bomb
- Simple tasks require more setup than traditional UIs
Here’s the thing about ComfyUI that nobody tells you upfront: once you learn it, you’ll never want to go back. The node system means you can build workflows that do things no slider-based UI can touch. Want to generate an image, automatically upscale it, apply a different LoRA to specific regions, and save it with custom metadata — all in one click? ComfyUI does that. But you’ll spend a weekend learning how to connect the nodes first.
Fooocus
Fooocus is the Midjourney of local AI image generation. It was designed with one goal: make generating beautiful images as simple as possible. Type a prompt, hit generate, get a great image. That’s it. No sliders for CFG scale. No sampler selection dropdown with 47 options. No existential crisis about which VAE to use.
Pros:
- Genuinely the easiest way to generate images locally
- Produces excellent results out of the box with minimal prompting
- Built-in prompt enhancement (makes your simple prompts better automatically)
- Inpainting and outpainting work surprisingly well
- Very low VRAM usage with smart optimizations
Cons:
- Limited customization compared to other options
- Fewer models and extensions supported
- “Simple” means you can’t fine-tune when you need to
- Smaller community and fewer resources
- Development activity has been inconsistent
Fooocus is perfect for people who just want pretty pictures without a PhD in diffusion models. It’s also great as a first step before graduating to more complex tools.
SDXL vs SD 1.5: The Model Question
Before we talk about hardware, let’s address the elephant in the room: which base model should you use?
SD 1.5 (512x512 native resolution)
- VRAM needed: 4-6GB minimum
- Speed: Fast, even on older hardware
- Quality: Good, but showing its age
- Model ecosystem: Massive. Thousands of fine-tuned models and LoRAs
- Best for: Older GPUs, specific art styles with fine-tuned models, speed over quality
SDXL (1024x1024 native resolution)
- VRAM needed: 6-8GB minimum (more is better)
- Speed: 2-4x slower than SD 1.5
- Quality: Significantly better, especially for photorealism and text
- Model ecosystem: Growing rapidly, though still smaller than SD 1.5
- Best for: Anyone with a modern GPU who wants the best quality
SD 3.5 and Beyond
- VRAM needed: 8GB+ recommended
- Speed: Comparable to SDXL
- Quality: Improved text rendering, better prompt adherence
- Model ecosystem: Still developing
- Best for: Early adopters and those chasing cutting-edge quality
My recommendation: Start with SDXL if your GPU can handle it. SD 1.5 is not dead — it has an incredible ecosystem of fine-tuned models — but SDXL produces better results with less effort on prompting. If you’re on a potato GPU (4GB VRAM or less), SD 1.5 is your friend.
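Those native resolutions translate directly into compute cost. Stable Diffusion models denoise a latent image that is 1/8 of the output resolution per side, so doubling the resolution quadruples the latent area the model has to churn through. A quick back-of-envelope sketch:

```python
# Rough sketch: why SDXL costs more per image than SD 1.5.
# Both models work on a latent that is 1/8 the pixel resolution per side.
def latent_shape(width: int, height: int, channels: int = 4) -> tuple:
    """Latent tensor shape (C, H/8, W/8) for a given output resolution."""
    return (channels, height // 8, width // 8)

sd15 = latent_shape(512, 512)    # (4, 64, 64)
sdxl = latent_shape(1024, 1024)  # (4, 128, 128)

def numel(shape):
    n = 1
    for d in shape:
        n *= d
    return n

# SDXL's native latent has 4x the elements of SD 1.5's.
ratio = numel(sdxl) / numel(sd15)
print(sd15, sdxl, ratio)  # (4, 64, 64) (4, 128, 128) 4.0
```

That 4x latent area, on top of SDXL's much larger U-Net, goes a long way toward explaining the "2-4x slower" figure above.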
GPU Requirements: The Real Talk
Let’s cut through the marketing fluff and talk about what you actually need.
The Bare Minimum (4GB VRAM)
- Cards: GTX 1650, RX 580
- What works: SD 1.5 at 512x512, Fooocus with optimizations
- What doesn’t: SDXL (technically possible, painfully slow), large ControlNet models
- Reality check: You’ll be waiting. A lot. But it works.
The Sweet Spot (8GB VRAM)
- Cards: RTX 3060 Ti, RTX 3070, RTX 4060, RX 6600 XT
- What works: Everything at reasonable speeds. SDXL, ControlNet, LoRAs, the whole buffet.
- What doesn’t: Running multiple large models simultaneously, very high-resolution generation without tiling
- Reality check: This is where the fun starts. Most people should target this tier.
The Comfortable Zone (12GB+ VRAM)
- Cards: RTX 3060 12GB, RTX 4070, RTX 4070 Ti
- What works: Everything, fast. Multiple ControlNet models, high-res generation, batch processing.
- Reality check: The RTX 3060 12GB is a weird value king here. It’s slower than the 3060 Ti but has more VRAM, and VRAM matters more than raw speed for AI workloads.
The “Money Is No Object” Tier (16GB+ VRAM)
- Cards: RTX 4080, RTX 4090, RTX 5080, RTX 5090
- What works: Everything, immediately. Training LoRAs, running Flux models, basically anything the open-source community throws at you.
- Reality check: If you have a 4090 or 5090, you already know. Go nuts.
A Note on AMD GPUs
AMD support has improved dramatically. ROCm works on Linux for RX 6000 and 7000 series cards. Windows support is spottier. If you’re on AMD and Linux, you’re in decent shape. AMD on Windows? You’ll need DirectML backends, which work but are slower than CUDA equivalents. It’s getting better, but NVIDIA still has the smoother experience for AI workloads.
Docker Setups: Because Dependency Hell Is Real
If you’ve ever tried to install Python packages for AI projects, you know the special joy of version conflicts. Docker solves this by putting everything in a container. Here’s how to get each tool running with Docker.
General Prerequisites
```bash
# Install the NVIDIA Container Toolkit (for GPU passthrough).
# Assumes NVIDIA's apt repository is already set up -- see their install docs.
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify the GPU is visible inside Docker
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```
A1111 / Forge via Docker
```yaml
# docker-compose.yml
services:
  forge:
    # Placeholder image name -- substitute the community image you choose
    image: ghcr.io/lllyasviel/stable-diffusion-webui-forge:latest
    ports:
      - "7860:7860"
    volumes:
      - ./models:/app/models
      - ./outputs:/app/outputs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
The image name above is a placeholder: several community-maintained Docker images exist for both A1111 and Forge, and names and internal mount paths vary between maintainers, so check the README of whichever image you pick. Look for ones that include common extensions pre-installed to save yourself setup time.
ComfyUI via Docker
```yaml
# docker-compose.yml
services:
  comfyui:
    image: ghcr.io/ai-dock/comfyui:latest
    ports:
      - "8188:8188"
    volumes:
      - ./models:/opt/ComfyUI/models
      - ./output:/opt/ComfyUI/output
      - ./custom_nodes:/opt/ComfyUI/custom_nodes
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
Pro tip: Mount your models directory as a shared volume across all your Docker containers. There’s no reason to have four copies of a 6GB SDXL model sitting on your drive. Your SSD will thank you.
Fooocus via Docker
```yaml
# docker-compose.yml
services:
  fooocus:
    # Check the Fooocus repo's Docker docs for the current image name
    image: ghcr.io/lllyasviel/fooocus:latest
    ports:
      - "7865:7865"
    volumes:
      - ./models:/app/models
      - ./outputs:/app/outputs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
The Shared Models Trick
Here’s a sanity-saving tip. Create one central models directory and symlink or volume-mount it into each tool:
```bash
mkdir -p ~/ai-models/{checkpoints,loras,vae,controlnet,embeddings}

# Then in each docker-compose.yml, mount specific subdirectories:
#   - ~/ai-models/checkpoints:/app/models/Stable-diffusion
#   - ~/ai-models/loras:/app/models/Lora
#   - ~/ai-models/vae:/app/models/VAE
```
This way you download a model once and every tool can use it. Disk space is precious when your models folder inevitably grows to 200GB.
ControlNet: The Secret Weapon
ControlNet deserves special mention because it’s a game-changer regardless of which interface you use. It lets you guide image generation using reference images for poses, edges, depth maps, and more. Think of it as giving the AI a coloring book outline instead of letting it freestyle.
In A1111/Forge: Install the ControlNet extension, download the models, select a preprocessor and model from the dropdown. Straightforward.
In ComfyUI: ControlNet is handled through nodes. More setup, but you get granular control over how strongly it influences different parts of the generation. You can even blend multiple ControlNet models with different weights on different steps.
In Fooocus: Basic ControlNet support is built in through the “Input Image” tab. Less flexible but it works out of the box.
The most commonly used ControlNet models:
- Canny: Edge detection. Great for maintaining structure and outlines.
- OpenPose: Human pose estimation. Essential for character art.
- Depth: Depth maps for maintaining spatial relationships.
- IP-Adapter: Style and subject transfer from reference images (not strictly a ControlNet, but installed and used the same way). This one’s a personal favorite.
- Tile: Upscaling with detail preservation. Seriously useful.
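To make the "coloring book outline" idea concrete, here's a toy, pure-Python stand-in for a Canny-style preprocessor. Real pipelines use OpenCV's Canny; this is just gradient thresholding, but the output plays the same role: an edge map the ControlNet follows instead of freestyling.

```python
# Toy sketch of a Canny-type preprocessor: turn a reference image
# into an edge map that ControlNet uses as a structural outline.
# (Real pipelines use cv2.Canny; this is simple gradient thresholding.)
def edge_map(img, threshold=1):
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            dx = abs(img[y][x + 1] - img[y][x])  # horizontal gradient
            dy = abs(img[y + 1][x] - img[y][x])  # vertical gradient
            edges[y][x] = 1 if dx + dy > threshold else 0
    return edges

# A flat region produces no edges; a hard boundary does.
img = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
print(edge_map(img))
```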
LoRAs: Teaching Old Models New Tricks
LoRAs (Low-Rank Adaptations) are small add-on models that modify the base model’s behavior. Want your SDXL model to generate images in a specific art style? There’s a LoRA for that. Want it to know what a specific character looks like? LoRA. Want photorealistic skin textures? You guessed it.
LoRAs are typically 10-200MB (compared to the 2-7GB base models), making them easy to collect. And you will collect them. You’ll have a LoRA problem within a week. I believe in you.
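Why so small? A LoRA doesn't store new full weight matrices; it stores two thin low-rank factors per adapted layer. A back-of-envelope sketch with illustrative dimensions (not the exact SDXL architecture):

```python
# Why LoRAs are tiny: a full d_out x d_in weight update is replaced by
# B @ A with B: (d_out, r) and A: (r, d_in), where rank r is small.
def full_params(d_out: int, d_in: int) -> int:
    return d_out * d_in

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    return d_out * rank + rank * d_in

# Illustrative numbers: one 1280x1280 attention projection, rank 16.
d, r = 1280, 16
full = full_params(d, d)     # 1,638,400 params
lora = lora_params(d, d, r)  # 40,960 params -- 2.5% of the full matrix
print(full, lora, lora / full)
```

Repeat that 2.5% saving across every adapted layer and you get files measured in megabytes instead of gigabytes.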
Managing LoRAs across tools:
- Store them in your shared models directory under a `loras` folder
- Both A1111/Forge and ComfyUI use the same LoRA format
- Fooocus supports LoRAs but with fewer options for weight adjustment
- CivitAI is the primary marketplace for community LoRAs (it’s basically the Steam Workshop for AI models)
VRAM Optimization Tips
Running out of VRAM? Here are some tricks that work across all platforms:
- Enable xformers or torch SDP attention: free speed and VRAM savings. Most modern setups use SDP by default.
- Use the `--medvram` or `--lowvram` flags (A1111/Forge): trades speed for lower VRAM usage by moving model components to the CPU as needed.
- FP16 / FP8 precision: half precision (FP16) is standard now. Forge and ComfyUI support FP8, which roughly halves weight VRAM again with minimal quality loss.
- Tiled VAE decoding: breaks the VAE decode step into tiles instead of processing the whole image at once. Essential for high-resolution images on limited VRAM.
- Model offloading: ComfyUI is especially good at this. It can move unused models to CPU RAM or even disk, keeping only what’s needed on the GPU.
- Close your browser tabs. Seriously. That Chrome session with 47 tabs is eating your VRAM, and your GPU can’t generate images and render your YouTube playlist at the same time.
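To put numbers on the precision tips above, here's a rough sketch of weight memory alone at each precision. The ~2.6B parameter figure is a commonly cited ballpark for the SDXL U-Net; text encoders, the VAE, and activations all add more on top.

```python
# Rough weight-memory footprint at different precisions.
# ~2.6B parameters is a ballpark figure for the SDXL U-Net (assumption).
def weight_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1024**3

params = 2.6e9
fp32 = weight_gb(params, 4)  # ~9.7 GB: why full precision is rarely used
fp16 = weight_gb(params, 2)  # ~4.8 GB: the modern default
fp8  = weight_gb(params, 1)  # ~2.4 GB: how SDXL squeezes onto 6GB cards
print(round(fp32, 1), round(fp16, 1), round(fp8, 1))
```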
Workflow Comparison: Same Task, Different Tools
Let’s say you want to generate a portrait, upscale it 2x, and apply a film grain effect. Here’s how that looks in each tool:
In A1111/Forge:
- Type prompt, adjust settings, generate
- Send to Extras tab, select upscaler, upscale
- Send to img2img, apply film grain LoRA at low denoise
- Three separate steps, manual sending between tabs
In ComfyUI:
- Build a workflow once: KSampler -> Upscale -> Apply LoRA -> KSampler -> Save
- Click “Queue Prompt”
- One click. Done. Forever. Reuse that workflow for every portrait.
In Fooocus:
- Type prompt, generate (it auto-upscales based on your quality settings)
- Film grain? You’d need to manually apply that externally
- Simple but limited
This is where ComfyUI’s upfront complexity pays off. Build once, use forever.
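The "build once" part works because a ComfyUI workflow is, under the hood, just a JSON graph: node IDs mapped to a class_type and its inputs, where an input like `["4", 0]` means "output 0 of node 4". Here's a trimmed sketch of that API format; the node classes are ComfyUI's stock nodes, but this particular graph and its parameter values are illustrative.

```python
import json

# Minimal sketch of ComfyUI's API workflow format: node-id ->
# {class_type, inputs}, with ["4", 0] meaning "output 0 of node 4".
workflow = {
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "portrait photo, film grain", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "blurry, low quality", "clip": ["4", 1]}},
    "3": {"class_type": "KSampler",
          "inputs": {"model": ["4", 0], "positive": ["6", 0],
                     "negative": ["7", 0], "latent_image": ["5", 0],
                     "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "8": {"class_type": "VAEDecode",
          "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
    "9": {"class_type": "SaveImage",
          "inputs": {"images": ["8", 0], "filename_prefix": "portrait"}},
}

# Queuing it is one POST to the running server, e.g.:
# requests.post("http://127.0.0.1:8188/prompt", json={"prompt": workflow})
payload = json.dumps({"prompt": workflow})
print(len(workflow))  # 7 nodes
```

Because the whole graph is plain JSON, sharing a workflow is just sharing a file, which is exactly why community workflow libraries took off.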
The Beginner’s Decision Tree
Still not sure which to pick? Let me make it simple:
“I just want pretty pictures with minimal setup” -> Fooocus. Install it, type a prompt, be amazed. Graduate to something else when you hit its limits.
“I want control but don’t want to learn node programming” -> Forge. It’s A1111 but better. The slider-based interface is intuitive, the extension ecosystem is massive, and performance is great.
“I want maximum control and I’m willing to invest time learning” -> ComfyUI. The learning curve is real, but the payoff is enormous. Every serious AI artist I know ended up on ComfyUI eventually.
“I used A1111 a year ago and haven’t checked back” -> Switch to Forge. Same workflow, faster results. You’ll wonder why you didn’t switch sooner.
My Honest Recommendation
Install Fooocus first. Spend an evening generating images, understanding prompting, and figuring out what you actually want to do with AI image generation. Don’t overthink it.
Then install Forge when you want more control over sampling, LoRAs, and ControlNet. You’ll feel the power difference immediately.
Finally, when you find yourself thinking “I wish I could automate this multi-step process,” that’s when you open ComfyUI. Watch a couple of beginner workflow tutorials, download some community workflows from places like OpenArt or CivitAI, and start connecting nodes.
The beauty of local AI image generation is that you’re not locked in. Your models work across all these tools. Your LoRAs are portable. Your ControlNet models don’t care which frontend loads them. Switch freely, use what works for the task at hand, and remember: the best tool is the one that gets you generating instead of configuring.
Now stop reading and go make some weird AI art. Your GPU is getting bored.