Look, we need to talk about your Docker images.
I know you wrote that Dockerfile in 2021 during a sprint that was already two days over. I know it works. I know “it runs on my machine” and also on the EC2 instance you forgot to right-size. But your production image is 1.2 GB, and roughly 900 MB of that is build tools, dev dependencies, and whatever node_modules decided to drag along for the ride.
Multi-stage Docker builds have been around since Docker 17.05 (that’s 2017, folks), and yet I still see single-stage monstrosities shipping compilers to production like they’re expecting to do a hotfix inside the running container.
Let’s fix that.
What Are Multi-Stage Docker Builds?
Multi-stage builds let you use multiple FROM statements in a single Dockerfile. Each FROM starts a new “stage” with a fresh filesystem. The magic is that you can COPY --from= a previous stage, grabbing only the artifacts you need.
Think of it like cooking Thanksgiving dinner. Stage one is the kitchen: flour everywhere, twelve pots on the stove, the smoke alarm going off (normal). Stage two is the dining table: just the finished turkey, mashed potatoes, and pie. Nobody needs to see the war zone that produced them.
# Stage 1: The kitchen (build environment)
FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 2: The dining table (production)
FROM node:20-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./
RUN npm ci --omit=dev
EXPOSE 3000
CMD ["node", "dist/index.js"]
That COPY --from=builder is doing the heavy lifting. You’re reaching back into the build stage and grabbing only what you need. The compiler, the 847 dev dependencies, that one package that pulls in all of Chromium — all left behind.
The Problem: A Real-World Horror Story
Here’s a Dockerfile I’ve seen in the wild more times than I’d like to admit:
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
EXPOSE 3000
CMD ["node", "dist/index.js"]
Looks clean enough, right? Seven lines, easy to read. Here’s what’s actually in your production image:
| What’s Included | Approximate Size |
|---|---|
| Debian base (node:20 uses bookworm) | ~350 MB |
| Node.js runtime + npm + yarn | ~200 MB |
| Python (yes, node:20 includes Python) | ~50 MB |
| gcc, g++, make (build essentials) | ~150 MB |
| Your source code (including tests, docs, .git) | ~20 MB |
| node_modules (ALL of them, dev + prod) | ~300 MB |
| Your actual built app | ~5 MB |
| Total | ~1,075 MB |
You’re shipping 1 GB of stuff to serve 5 MB of JavaScript. That’s like renting a U-Haul to deliver a sandwich.
Node.js: Before and After
Before (The Single-Stage Sadness)
FROM node:20
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
EXPOSE 3000
CMD ["node", "dist/index.js"]
Image size: ~1,100 MB
After (Multi-Stage, Properly Done)
# ---- Build Stage ----
FROM node:20-bookworm AS builder
WORKDIR /app
# Copy dependency files first (cache optimization)
COPY package.json package-lock.json ./
# Install ALL dependencies (including devDependencies for building)
RUN npm ci
# Now copy source code
COPY tsconfig.json ./
COPY src/ ./src/
# Build the application
RUN npm run build
# Prune dev dependencies
RUN npm prune --omit=dev
# ---- Production Stage ----
FROM node:20-bookworm-slim AS production
# Don't run as root, you animal
RUN groupadd -r appuser && useradd -r -g appuser appuser
WORKDIR /app
# Copy only what we need from the builder
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
# Set ownership
RUN chown -R appuser:appuser /app
USER appuser
EXPOSE 3000
# For proper signal handling, consider running under an init like tini or dumb-init
CMD ["node", "dist/index.js"]
Image size: ~220 MB
That’s an 80% reduction. And we haven’t even gone full alpine yet.
Going Even Smaller with Alpine
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY tsconfig.json ./
COPY src/ ./src/
RUN npm run build
RUN npm prune --omit=dev
FROM node:20-alpine AS production
RUN addgroup -S appuser && adduser -S appuser -G appuser
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
RUN chown -R appuser:appuser /app
USER appuser
EXPOSE 3000
CMD ["node", "dist/index.js"]
Image size: ~130 MB
A quick word on Alpine though: it uses musl libc instead of glibc, which means some native Node.js modules (looking at you, bcrypt and sharp) might throw a tantrum. If you hit weird segfaults or build failures, switch back to -slim. It’s still tiny, and it won’t gaslight your C++ addons.
Python: Before and After
Python has its own special flavor of dependency chaos. Virtual environments, wheels, system packages — it’s a party.
Before (The Chunky One)
FROM python:3.12
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Image size: ~1,050 MB
After (Multi-Stage with Virtual Env)
# ---- Build Stage ----
FROM python:3.12-bookworm AS builder
WORKDIR /app
# Create a virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# ---- Production Stage ----
FROM python:3.12-slim-bookworm AS production
# Install only runtime system dependencies (if needed)
RUN apt-get update && \
apt-get install -y --no-install-recommends libpq5 && \
rm -rf /var/lib/apt/lists/*
# Copy the virtual environment from the builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
WORKDIR /app
COPY --chown=appuser:appuser . .
USER appuser
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Image size: ~180 MB
The trick here is the virtual environment. By creating it in the build stage and copying the entire /opt/venv directory to production, you get all your installed packages without any of the build toolchain (gcc, python headers, etc.) that was needed to compile them. One caveat: the venv must live at the same absolute path in both stages (here, /opt/venv), because its activation script and console-script shebangs hard-code that path.
Python Pro Tips
If you have packages that need C compilation (like psycopg2 instead of psycopg2-binary), the build stage handles it and you only ship the compiled .so files. Beautiful.
# In the build stage, install build deps
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc \
libpq-dev \
&& rm -rf /var/lib/apt/lists/*
That gcc and libpq-dev never touch your production image. They live and die in the build stage, like a summer fling that knew what it was.
Go: The Multi-Stage Dream
Go is where multi-stage builds really shine because Go compiles to a static binary. You can literally ship your app on scratch — a completely empty filesystem.
Before (Still Too Big)
FROM golang:1.22
WORKDIR /app
COPY . .
RUN go build -o server .
EXPOSE 8080
CMD ["./server"]
Image size: ~850 MB
Yes, 850 MB. The golang base image includes the entire Go toolchain, git, gcc, and a full Debian installation. For a 15 MB binary.
After (Multi-Stage to Scratch)
# ---- Build Stage ----
FROM golang:1.22-bookworm AS builder
WORKDIR /app
# Copy go.mod and go.sum first for dependency caching
COPY go.mod go.sum ./
RUN go mod download
# Copy source code
COPY . .
# Build a statically linked binary
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go build -ldflags="-w -s" -o /server .
# ---- Production Stage ----
FROM scratch
# Import CA certificates for HTTPS
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# Copy the binary
COPY --from=builder /server /server
EXPOSE 8080
ENTRYPOINT ["/server"]
Image size: ~12 MB
From 850 MB to 12 MB. That’s a 98.6% reduction. Your Go binary and some SSL certificates. That’s it. That’s the entire image.
The -ldflags="-w -s" strips debug information and symbol tables, saving another few MB. CGO_ENABLED=0 ensures a fully static binary with no C library dependencies, which is required for the scratch base image (since scratch has literally nothing in it — no shell, no libc, no ls, nothing).
When Scratch Is Too Bare
If you need CA certificates, timezone data, or a non-root user (scratch doesn’t even have /etc/passwd), use gcr.io/distroless/static-debian12 instead; its :debug tag variants also throw in a BusyBox shell for troubleshooting:
FROM gcr.io/distroless/static-debian12 AS production
COPY --from=builder /server /server
EXPOSE 8080
ENTRYPOINT ["/server"]
Image size: ~15 MB
Distroless gives you the absolute minimum a Linux userland needs: CA certs, timezone data, and passwd/group files. No shell, no package manager, no attack surface. Google uses these in production. You probably should too.
Image Size Comparison: The Scoreboard
| Language | Single-Stage | Multi-Stage (slim) | Multi-Stage (minimal) | Savings |
|---|---|---|---|---|
| Node.js | 1,100 MB | 220 MB | 130 MB (alpine) | 88% |
| Python | 1,050 MB | 180 MB | 150 MB (slim) | 86% |
| Go | 850 MB | 15 MB (distroless) | 12 MB (scratch) | 98.6% |
If those numbers don’t make you want to refactor your Dockerfiles right now, I don’t know what will. That’s real bandwidth, real storage, real cold-start time you’re saving. In a Kubernetes cluster with autoscaling, the difference between a 1 GB image and a 15 MB image is the difference between your pods coming up in 2 seconds and your pods coming up “eventually, probably.”
Build Cache Optimization: Layer Ordering Matters
Docker caches each layer. When a layer changes, every layer after it is invalidated. This means the order of your COPY and RUN instructions matters enormously.
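The invalidation rule is easy to simulate: each layer’s cache key is (roughly) a hash of its instruction chained with its parent’s key, so changing one instruction changes every key after it. Here’s a toy sketch of that chaining — not Docker’s actual algorithm, which for COPY also hashes the contents of the files being copied:

```python
import hashlib

def layer_ids(instructions):
    """Chain each layer's cache key off its parent's key, mimicking how
    one changed instruction invalidates every layer built on top of it.
    (Toy model: real BuildKit also hashes the files a COPY brings in.)"""
    ids, parent = [], ""
    for inst in instructions:
        parent = hashlib.sha256((parent + inst).encode()).hexdigest()[:12]
        ids.append(parent)
    return ids

good = layer_ids(["COPY package*.json ./", "RUN npm ci",
                  "COPY . .", "RUN npm run build"])
edit = layer_ids(["COPY package*.json ./", "RUN npm ci",
                  "COPY . . (source changed)", "RUN npm run build"])

# The first two layers still match (cache hits); the changed layer
# and everything after it get new keys (cache misses).
print([a == b for a, b in zip(good, edit)])  # [True, True, False, False]
```

That final line is the whole caching story in miniature: everything upstream of a change is reusable, everything downstream is rebuilt.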
Bad Layer Ordering
FROM node:20-slim AS builder
WORKDIR /app
# Source code changes invalidate EVERYTHING below
COPY . .
# Reinstalls all deps every time you change a typo
RUN npm ci
RUN npm run build
Every time you change a single line of code, Docker reinstalls all your dependencies. If your npm ci takes 45 seconds, that’s 45 seconds of your life you’ll never get back, multiplied by every build, forever.
Good Layer Ordering
FROM node:20-slim AS builder
WORKDIR /app
# Changes rarely
COPY package.json package-lock.json ./
# Cached unless deps change
RUN npm ci
# Source changes only bust this and below
COPY . .
RUN npm run build
Now dependency installation is cached until you actually change package.json or package-lock.json. Source code changes only invalidate the COPY . . and RUN npm run build layers.
This pattern applies to every language:
Python:
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
Go:
COPY go.mod go.sum ./
RUN go mod download
COPY . .
Rust:
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && echo "fn main() {}" > src/main.rs
RUN cargo build --release
COPY src/ ./src/
RUN touch src/main.rs && cargo build --release
(Rust’s version is slightly cursed because Cargo needs a valid src/main.rs to resolve dependencies. We create a dummy one, build to cache deps, then overwrite with real source. It’s ugly but it works.)
The .dockerignore File: Your First Line of Defense
Before you even think about multi-stage builds, create a .dockerignore file. It’s like .gitignore but for Docker build contexts.
Without it, COPY . . sends everything to the Docker daemon — including your .git directory (which can easily be 100+ MB), node_modules (which you’re about to reinstall anyway), test files, documentation, IDE configs, and that embarrassing TODO.md with “fix this later” written forty times.
# .dockerignore
# Version control
.git
.gitignore
# Dependencies (will be installed fresh in the build)
node_modules
__pycache__
*.pyc
.venv
vendor/
# Build artifacts
dist
build
*.egg-info
# IDE and editor files
.vscode
.idea
*.swp
*.swo
*~
# Docker files (no need to send these into the build context)
Dockerfile*
docker-compose*
.dockerignore
# Documentation and misc
*.md
LICENSE
docs/
# Environment files (NEVER ship these)
.env
.env.*
*.pem
*.key
# Tests (usually not needed in production)
tests/
test/
__tests__/
*.test.js
*.spec.js
coverage/
.nyc_output/
A good .dockerignore can reduce your build context from hundreds of MB to a few MB, which means faster builds even before Docker processes a single instruction.
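If you want a feel for what a pattern list excludes, here’s a deliberately simplified matcher. It’s a hypothetical sketch only: real .dockerignore matching follows Go’s filepath.Match semantics plus ** and ! negation, which this version ignores.

```python
from fnmatch import fnmatch

# A trimmed-down version of the ignore list above.
IGNORE = [".git", "node_modules", "dist", "*.md", "tests", ".env"]

def excluded(path):
    # Simplified rule: exclude a file if ANY path segment matches ANY
    # pattern. Real .dockerignore matching uses Go's filepath.Match
    # plus ** wildcards and ! negation.
    return any(fnmatch(part, pat)
               for part in path.split("/") for pat in IGNORE)

files = ["src/index.js", ".git/HEAD", "README.md",
         "node_modules/lodash/lodash.js", "tests/app.test.js", ".env"]
print([f for f in files if not excluded(f)])  # ['src/index.js']
```

Out of six files, only the one your build actually needs survives — which is roughly the ratio you should expect from a well-tended ignore file.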
Advanced: Using Build Arguments and Targets
Multi-stage builds pair beautifully with build arguments and target stages for different environments.
# ---- Base Stage ----
FROM node:20-bookworm-slim AS base
WORKDIR /app
COPY package.json package-lock.json ./
# ---- Development Stage ----
FROM base AS development
RUN npm install
COPY . .
CMD ["npm", "run", "dev"]
# ---- Build Stage ----
FROM base AS builder
RUN npm ci
COPY . .
RUN npm run build
RUN npm prune --omit=dev
# ---- Test Stage ----
FROM builder AS test
RUN npm install --include=dev
RUN npm test
# ---- Production Stage ----
FROM node:20-bookworm-slim AS production
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
RUN groupadd -r appuser && useradd -r -g appuser appuser
RUN chown -R appuser:appuser /app
USER appuser
EXPOSE 3000
CMD ["node", "dist/index.js"]
Now you can target specific stages:
# Run in development mode with hot reload
docker build --target development -t myapp:dev .
# Run tests in CI
docker build --target test -t myapp:test .
# Build production image
docker build --target production -t myapp:prod .
One Dockerfile, three purposes. No more maintaining Dockerfile.dev, Dockerfile.test, and Dockerfile.prod that inevitably drift apart.
Common Gotchas and How to Dodge Them
1. Forgetting CA Certificates
If your Go binary makes HTTPS requests and you’re using scratch, it will fail with certificate errors. Always copy CA certs:
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
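The same lookup happens in every runtime, not just Go: the TLS stack scans a handful of well-known filesystem paths for root certificates. A quick way to inspect where your system expects them is Python’s ssl module (shown here as an illustration; Go’s crypto/x509 probes similar locations):

```python
import ssl

# Where this system's TLS stack expects root certificates to live.
# In a scratch image none of these paths exist, so every HTTPS
# request fails certificate verification until you COPY the certs in.
paths = ssl.get_default_verify_paths()
print("cafile:", paths.cafile)
print("capath:", paths.capath)
```

Whatever paths it prints are exactly the locations that are empty in scratch, which is why the COPY line above is non-negotiable for anything that speaks HTTPS.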
2. Timezone Data Missing
Applications that deal with timezones need /usr/share/zoneinfo. Either copy it from the build stage or use distroless:
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
3. Running as Root
Just because scratch doesn’t have adduser doesn’t mean you have to run as root. You can set the user by UID:
USER 65534:65534
(65534 is the nobody user on most Linux systems.)
4. Not Using .dockerignore
I already covered this, but I’m mentioning it again because it’s that important. If your build context is over 50 MB, you’re probably doing it wrong.
5. Copying node_modules from Host
# DON'T do this
COPY node_modules ./node_modules
Your host node_modules might have platform-specific binaries compiled for macOS. Your container is Linux. This will end in tears. Always install dependencies inside the Docker build.
The Security Angle
Multi-stage builds aren’t just about size — they’re about security too.
Every package in your image is a potential attack vector. That gcc in your production container? It means an attacker who gets code execution can compile exploits. That wget? Great for downloading malware. That python interpreter shipped with the full Node image? Lovely for running attack scripts.
A minimal production image has:
- Fewer packages to have CVEs
- No compilers for building exploits
- No shells for interactive access (with scratch or distroless)
- No package managers for installing tools
When your security team runs a vulnerability scan, a 1 GB image might flag 200+ CVEs (most in packages your app doesn’t use). A distroless or scratch image? Maybe 2-3, all in your actual dependencies.
A Quick Checklist Before You Go
Here’s your “did I actually optimize this” checklist:
- Using multi-stage builds (separate build and runtime stages)
- Using -slim, -alpine, distroless, or scratch as the production base
- A .dockerignore file exists and excludes .git, node_modules, tests, and docs
- Dependencies are copied before source code (layer caching)
- Using npm ci instead of npm install (deterministic installs)
- Dev dependencies are not in the production image
- Running as a non-root user
- No secrets or credentials baked into the image
- Health check defined (either a HEALTHCHECK instruction or an orchestrator-level probe)
Wrapping Up
Multi-stage Docker builds are one of those rare optimizations that are easy to implement and have massive payoff. You get smaller images, faster deployments, better security, and lower costs — all from reorganizing a file that’s usually under 30 lines.
If you take one thing away from this article, let it be this: your production image should contain your application and its runtime dependencies. Nothing else. Not your compiler, not your test framework, not your linter, and definitely not the 47,000 files in node_modules that exist solely because someone’s package depends on is-odd which depends on is-number.
Now go audit your Dockerfiles. I’ll wait.