
Healthcheck vs Restart Policy: The Difference Matters

By SumGuy 5 min read

The Confusion

You’ve got a container running. It crashes sometimes. You add a restart policy: restart: always. Now it restarts automatically. Problem solved.

Then you notice something weird. The container is running, but it’s not working. It responds with 502 errors. You check the logs — the app inside is hung, but the process never exits, so the restart policy never fires.

You needed a healthcheck, not just a restart policy. Or maybe both, but they’re doing different things.

Restart Policy: “Is The Container Running?”

A restart policy answers one question: if the container exits, should we start it again?

docker-compose.yml
services:
  app:
    image: myapp:latest
    restart: on-failure:5

This says: if the container exits with a nonzero status, restart it, but give up after 5 retries. (In plain Compose the key is restart; restart_policy only exists under deploy: and applies in Swarm mode.)

The key word: exits. The container process actually stops. The container itself terminates.

Common restart policies:

no — never restart (the default)
always — restart whenever the container stops, even after a clean exit
on-failure[:max-retries] — restart only on a nonzero exit code
unless-stopped — like always, but not if you stopped it yourself

This is blunt. It’s “if the process dies, bring it back up.” But what if the process is running but completely broken? What if it’s stuck in an infinite loop? What if it’s consuming 100% CPU and hanging?

The restart policy won’t help. The container is still running.
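Conceptually, on-failure with a retry cap behaves like this supervisor loop (a sketch of what the daemon does internally; here `false` stands in for an app that crashes immediately):

```shell
# Sketch of "restart on failure, give up after 5 tries".
retries=0
until false; do                 # `false` = the crashing app
  status=$?
  retries=$((retries + 1))
  if [ "$retries" -ge 5 ]; then
    echo "giving up after $retries failed starts (last exit: $status)"
    break
  fi
  echo "process exited with $status, restarting ($retries/5)"
done
# Note: this loop only ever sees exits. A hung process never returns,
# so nothing here would catch it; that's the healthcheck's job.
```

The loop only regains control when the process exits, which is exactly why a restart policy can never notice a hang.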

Healthcheck: “Is The Container Healthy?”

A healthcheck answers a different question: is the container actually functioning?

services:
  app:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

This healthcheck runs curl http://localhost:8080/health every 30 seconds. If it fails 3 times in a row, the container is marked as “unhealthy.”
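The same check can also be baked into the image itself, so every container started from it gets the healthcheck by default (equivalent Dockerfile instruction; Compose-level healthcheck settings override whatever the image declares):

```dockerfile
# Dockerfile — same healthcheck, baked into the image
HEALTHCHECK --interval=30s --timeout=10s --retries=3 --start-period=40s \
  CMD curl -f http://localhost:8080/health || exit 1
```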

But note: marking it unhealthy doesn’t automatically restart the container. You need a restart policy for that.

The states:

starting — within start_period; failures don’t count yet
healthy — the last check passed
unhealthy — the check failed retries times in a row

Why You Need Both

A real example: your Node.js app has a memory leak. It’s running. It’s accepting connections. But it’s using 5GB of RAM and responding slowly.

You need both:

services:
  app:
    image: myapp:latest
    # If the process dies, restart it
    restart: on-failure:5
    # Monitor if it's actually healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    # When the healthcheck fails, kill and restart
    # (Docker doesn't do this automatically — you need orchestration)

Wait, there’s a problem. Docker marks the container unhealthy, but it doesn’t automatically restart it. You need container orchestration (Docker Swarm, Kubernetes, etc.) to actually restart unhealthy containers.

Without orchestration, a failing healthcheck just sets a flag.
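One common workaround on a single host is a small watcher that listens to Docker’s event stream and restarts containers the moment they report unhealthy. A sketch, assuming the docker CLI is available (the popular willfarrell/autoheal image packages this same idea):

```shell
# Sketch: restart any container whose health state flips to unhealthy.
# Docker emits a "health_status" event each time the state changes.
watch_and_heal() {
  docker events --filter 'event=health_status' \
    --format '{{.Status}} {{.Actor.Attributes.name}}' |
  while read -r line; do
    case "$line" in
      *unhealthy*)
        name=${line##* }        # container name is the last field
        echo "restarting $name"
        docker restart "$name"
        ;;
    esac
  done
}
```

Run it under a process supervisor (systemd, for example) so the watcher itself gets restarted too.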

Docker Compose Only (No Orchestration)

If you’re just using Docker Compose locally or on a single server, you can only rely on restart policies, not healthchecks.

services:
  app:
    image: myapp:latest
    restart: on-failure:5

The healthcheck tells you (via docker ps) that something’s wrong, but the container won’t restart on its own.

To get automatic restarts based on health, you need one of:

Docker Swarm (next section)
Kubernetes or another orchestrator
a watcher process on the host that restarts unhealthy containers for you

Docker Swarm + Healthcheck

If you’re using Swarm (not Compose), you can get automatic restarts:

Terminal window
docker service create \
  --name myapp \
  --health-cmd="curl -f http://localhost:8080/health || exit 1" \
  --health-interval=30s \
  --health-timeout=10s \
  --health-retries=3 \
  myapp:latest

Swarm monitors the healthcheck and replaces unhealthy tasks.
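In a stack file for docker stack deploy, the same settings look like this. Note that in Swarm mode the restart policy lives under deploy:, with max_attempts instead of a retry suffix:

```yaml
services:
  app:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      restart_policy:
        condition: on-failure
        max_attempts: 5
```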

Kubernetes Version

In Kubernetes, this is called a “liveness probe”:

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: app
      image: myapp:latest
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 40
        periodSeconds: 30
        timeoutSeconds: 10
        failureThreshold: 3

If the probe fails 3 times in a row, Kubernetes kills the container and restarts it according to the pod’s restartPolicy (Always by default).

Writing A Good Healthcheck

Your healthcheck should be realistic. It should test the actual thing that matters.

Bad healthcheck:

healthcheck:
  test: ["CMD", "test", "-f", "/tmp/app.pid"]

This just checks if a PID file exists. The app could be hung, and this would still pass.

Good healthcheck:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]

Makes an actual HTTP request. The app has to respond.
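The -f flag is what makes this work as a healthcheck: it turns HTTP error statuses (and connection failures) into a nonzero exit code, which is all Docker looks at. You can see it without Docker, assuming nothing is listening on localhost port 9:

```shell
# -f (--fail): exit nonzero on HTTP >= 400; connection failures are
# nonzero regardless (exit 7 = failed to connect).
curl -fsS --max-time 2 http://localhost:9/health \
  || echo "curl failed with exit code $?"
```

Without -f, curl exits 0 on a 500 response, and Docker would happily mark the container healthy.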

Better healthcheck:

health.sh
#!/bin/bash
# Check if the process is running
if ! pgrep -f "python app.py" > /dev/null; then
  exit 1
fi

# Check if HTTP endpoint responds
if ! curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health | grep -q "200"; then
  exit 1
fi

# Check if database is responsive
if ! python -c "import psycopg2; psycopg2.connect('dbname=mydb')" 2>/dev/null; then
  exit 1
fi

exit 0

Then in Compose:

healthcheck:
  test: ["CMD", "/app/health.sh"]
  interval: 30s
  timeout: 10s
  retries: 3

Debugging Healthcheck Issues

Check the status:

Terminal window
docker ps

Look for the STATUS column. You’ll see something like Up 5 minutes (healthy) or Up 2 minutes (unhealthy).

Or inspect:

Terminal window
docker inspect <container> --format='{{json .State.Health}}'

Shows something like:

{
  "Status": "unhealthy",
  "FailingStreak": 3,
  "Log": [
    {
      "Start": "2026-01-18T10:30:00Z",
      "End": "2026-01-18T10:30:05Z",
      "ExitCode": 1,
      "Output": "curl: (7) Failed to connect"
    }
  ]
}

The Output tells you why it failed.

The Bottom Line

A restart policy answers “is the process running?” A healthcheck answers “is it actually working?” Set them both up, add something that acts on the health status (an orchestrator or a watcher), and your containers will recover from most failures without human intervention.

