Docker Healthcheck Patterns That Actually Work

Your container’s serving requests, but the health check thinks it’s dead. So your orchestrator kills it. Then it restarts. Then it checks again. Restart. Repeat.

Health checks matter. Let’s do them right.

The HEALTHCHECK Instruction

Every Dockerfile can define a health check:

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

What each flag does:

--interval=30s: Run the check every 30 seconds (the default)
--timeout=3s: The check must complete within 3 seconds (the default is 30s)
--start-period=5s: Grace period for app startup. Checks still run, but failures during this window don’t count toward --retries (the default is 0s)
--retries=3: Mark the container unhealthy after 3 consecutive failures (the default)

After --retries consecutive failures, Docker marks the container as unhealthy. Note: Docker by itself doesn’t kill or restart unhealthy containers. Swarm does: it replaces unhealthy service tasks. Kubernetes ignores HEALTHCHECK entirely and relies on its own liveness and readiness probes. For single containers, unhealthy just means “the status says so.”
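To make the counting rules concrete, here is a small Python model of how the status moves between starting, healthy, and unhealthy. It’s a simplified sketch, not Docker’s actual code: Probe and health_status are names made up for this illustration, and it assumes each check result is stamped with the time it finished.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    at: float  # seconds since container start when the check finished
    ok: bool   # True if the check command exited 0


def health_status(probes, start_period=0.0, retries=3):
    """Replay check results and return 'starting', 'healthy', or 'unhealthy'."""
    status = "starting"
    streak = 0        # consecutive failures that count toward retries
    started = False   # True once a check passed or a failure was counted
    for p in probes:
        if p.ok:
            # Any success resets the streak and ends the grace period early.
            status, streak, started = "healthy", 0, True
            continue
        if not started and p.at < start_period:
            continue  # failure inside the grace period: ignored
        started = True
        streak += 1
        if streak >= retries:
            status = "unhealthy"
    return status
```

Three failures inside a 30-second start period leave the container in starting; the same three failures with no start period make it unhealthy.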

Check Types: What Works

curl (Most Common)

HEALTHCHECK --interval=10s --timeout=2s --retries=2 \
  CMD curl -f http://localhost:8080/health || exit 1

The -f flag exits nonzero if HTTP status is >= 400. Simple and effective.

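Gotcha: curl must be installed in the image. Alpine-based and slim Debian-based images (including the lightweight Python ones) generally don’t ship it, so you either install it or use a tool that’s already there.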

wget (Lightweight)

HEALTHCHECK --interval=10s --timeout=2s --retries=2 \
  CMD wget --quiet --tries=1 --spider http://localhost:8080/health || exit 1

Available out of the box in BusyBox-based images such as Alpine. --spider doesn’t download the body, just checks the status.

Native tools (Best)

If your app’s binary has a built-in health-check subcommand, use it:

HEALTHCHECK --interval=10s --timeout=2s --retries=2 \
  CMD ["/app/server", "health-check"]

Or use the language runtime that’s already in the image:

# Python
HEALTHCHECK --interval=10s --timeout=2s --retries=2 \
  CMD python -c "import sys, urllib.request; sys.exit(0 if urllib.request.urlopen('http://localhost:8080/health', timeout=2).status == 200 else 1)"

# Node.js
HEALTHCHECK --interval=10s --timeout=2s --retries=2 \
  CMD node -e "require('http').get('http://localhost:8080/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"
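One-liners like these get hard to read once you add timeouts and error handling. A small script file is an alternative. This is a minimal sketch using only the Python standard library; the check function and the /app/healthcheck.py path mentioned below are hypothetical, not something your image already has.

```python
import http.client
import urllib.error
import urllib.request


def check(url, timeout=2.0):
    """Return 0 if the endpoint answers 200 within the timeout, else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeouts, malformed URLs, and HTTP 4xx/5xx
        # (urlopen raises HTTPError for those) all count as a failed check.
        return 1


# To use it as a script, import sys and end the file with:
#   sys.exit(check("http://localhost:8080/health"))
```

Saved as, say, /app/healthcheck.py with that final line added, it could be wired up with HEALTHCHECK CMD ["python", "/app/healthcheck.py"].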

nc (netcat) for TCP

Just checking if a port is open:

HEALTHCHECK --interval=10s --timeout=2s --retries=2 \
  CMD nc -z localhost 8080 || exit 1

This works, but it only proves the port is listening, not that the app can actually serve requests.
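If the image has Python but no nc, the same TCP-level check can be approximated with the standard socket module. This is a sketch under that assumption; port_open is a name chosen for illustration, and it has the same limitation as nc: an open port says nothing about whether requests succeed.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, DNS failure, or timed out.
        return False
```

A health check script would exit 0 when port_open("localhost", 8080) is true and 1 otherwise.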

Good vs Bad Health Checks

Bad: Checking too much

# Don't do this
HEALTHCHECK CMD curl http://localhost/users && curl http://localhost/products && curl http://localhost/orders

If one endpoint is slow, the whole check times out and the container is marked unhealthy. Without -f, these calls also report success on HTTP errors. Overkill.

Good: Simple and focused

HEALTHCHECK --interval=10s --timeout=2s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

One lightweight endpoint that returns quickly.

Bad: Checking external dependencies

# Don't do this
HEALTHCHECK CMD curl http://external-api.example.com/status || exit 1

If the external service is down, your container gets marked unhealthy and, under an orchestrator, restarted. That’s not a health issue, that’s a dependency issue.

Good: Check yourself

HEALTHCHECK --interval=10s --timeout=2s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

The /health endpoint can report on the app’s own state (connection pool, queue depth, etc.) without calling out to third-party services.
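As an illustration of an endpoint that reports internal state, here is a sketch built on Python’s standard http.server. The make_health_handler function and the shape of the checks dictionary are assumptions made for this example; a real app would plug in its own probes (for instance, “is the connection pool non-empty”).

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_health_handler(checks):
    """Build a handler class; `checks` maps a name to a zero-arg callable returning bool."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            results = {}
            for name, probe in checks.items():
                try:
                    results[name] = bool(probe())
                except Exception:
                    results[name] = False  # a crashing probe counts as failed
            healthy = all(results.values())
            body = json.dumps(
                {"status": "ok" if healthy else "fail", "checks": results}
            ).encode()
            # 503 makes `curl -f` and `wget --spider` exit nonzero.
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep health probes out of the access log

    return HealthHandler


# Example wiring (blocks forever, so it is left commented out here):
#   handler = make_health_handler({"queue": lambda: True})
#   ThreadingHTTPServer(("0.0.0.0", 8080), handler).serve_forever()
```

Keeping each probe cheap and local matters more than the framework: the endpoint has to answer well inside the --timeout you configured.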

Tuning Parameters

Interval: How often to check?

API servers: 10-15s
Databases: 30s
Batch workers: 60s (they might be working on something)
Long-running tasks: 120s+

Default 30s is usually fine.

Timeout: How long to wait?

Quick endpoints: 1-2s
Database checks: 5s
Slow queries: 10s+

The default is 30s, which is generous. 2-3s is plenty for most HTTP endpoints.

Start-period: Grace period on startup

Java apps: 30-60s (JVM startup is slow)
Python: 5-10s
Go: 1-2s
Databases: 10-30s (init + recovery)

This is critical. If failures start counting before the app is ready, the container can be marked unhealthy while it’s still booting.

# Java app with slow startup
HEALTHCHECK --interval=10s --timeout=5s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

Retries: How many failures trigger unhealthy?

API servers: 2-3 (fail fast, but tolerate brief hiccups)
Databases: 5+ (tolerate maintenance operations)
Critical services: 1-2 (fail immediately)
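These three numbers together decide how long a dead app can go unnoticed. The arithmetic below is a rough sketch, assuming the app is past its start period and that each check starts one interval after the previous check finishes; detection_window is a name invented for this example.

```python
def detection_window(interval, timeout, retries):
    """Return (best_case, worst_case) seconds from a failure to 'unhealthy'."""
    # Best case: the app dies just before a check and every check fails instantly,
    # so only the waits between the failing checks add up.
    best = (retries - 1) * interval
    # Worst case: the app dies just after a passing check and every
    # failing check hangs for the full timeout.
    worst = retries * (interval + timeout)
    return best, worst


print(detection_window(10, 2, 3))   # API-style settings
print(detection_window(30, 30, 3))  # Docker's defaults
```

With interval=10s, timeout=2s, retries=3 the window is 20 to 36 seconds; with the defaults (30s, 30s, 3) it stretches to between 1 and 3 minutes.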

Using Health Status in Compose

depends_on with health checks:

services:
  db:
    image: postgres:latest
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  api:
    image: myapi:latest
    depends_on:
      db:
        condition: service_healthy

Now the API won’t start until Postgres is actually ready, not just running.

Checking Health Status

docker ps --format='table {{.Names}}\t{{.Status}}'
# NAMES      STATUS
# api        Up 2 minutes (healthy)
# db         Up 2 minutes (unhealthy)

docker inspect mycontainer | jq '.[0].State.Health'
# {
#   "Status": "healthy",
#   "FailingStreak": 0,
#   "Log": [
#     {
#       "Start": "2025-02-26T15:30:00.123Z",
#       "End": "2025-02-26T15:30:01.456Z",
#       "ExitCode": 0,
#       "Output": ""
#     }
#   ]
# }
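In scripts (CI jobs, deploy hooks) you often want to block until a container reports healthy. Here is a sketch that polls the Docker CLI from Python. The function names are made up for this example, and it assumes the docker binary is on PATH and the container defines a health check; without one, the format template fails and docker inspect exits nonzero.

```python
import subprocess
import time


def docker_health(container):
    """Ask the Docker CLI for the container's health status string."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def wait_until_healthy(container, timeout=60.0, poll=1.0, get_status=docker_health):
    """Poll until the status is 'healthy'; return False on 'unhealthy' or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(container)
        if status == "healthy":
            return True
        if status == "unhealthy":
            return False  # no point waiting further
        time.sleep(poll)  # still 'starting'
    return False
```

The get_status parameter exists only so the polling logic can be exercised without a Docker daemon; in normal use you would call wait_until_healthy("mycontainer") and leave it at the default.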

A Real Example: Node.js API

FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

# Health endpoint built into the app; node:20-alpine ships BusyBox wget but not curl
HEALTHCHECK --interval=15s --timeout=3s --start-period=10s --retries=3 \
  CMD wget -q --spider http://localhost:3000/health || exit 1

EXPOSE 3000
CMD ["node", "server.js"]

In your Node app, include a simple health endpoint:

app.get('/health', (req, res) => {
  res.status(200).json({ status: 'ok' });
});

In docker-compose:

services:
  api:
    build: .
    depends_on:
      redis:
        condition: service_healthy
    environment:
      REDIS_HOST: redis

  redis:
    image: redis:alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 2s
      retries: 3

Common Mistakes

No start-period: Container fails health checks while booting. Add --start-period.

Timeout too short: Check always times out. Increase --timeout.

Checking external services: Container goes unhealthy when your ISP hiccups. Check yourself only.

Not implementing an endpoint: If your app doesn’t have /health, add one. It’s 3 lines of code.

Health checks are your first line of defense. Get them right, and you won’t wake up to a page at 2 AM because your orchestrator killed a perfectly good container.