
Private Docker Registry with Harbor

By SumGuy 9 min read

Docker Hub is fine, until it isn’t

You’re deep in a CI/CD run. The pipeline goes to pull its base image. Docker Hub rate-limits you, again. Your deploy is blocked because a free-tier policy decided your IP looked too eager.

Or maybe you’re shipping proprietary software and the idea of your container images living on someone else’s servers makes your security team’s eye twitch. Or your images are 4 GB and pulling them across the internet on every deploy is burning time and money you don’t have.

Whatever brought you here, you’re about to set up a private container registry that doesn’t apologize for existing.

Enter Harbor.

Why not just registry:2?

You could spin up Docker’s official registry:2 image in about four minutes. It runs, it stores images, it technically does the job. It also has no UI, no real access control, no vulnerability scanning, and about as much operational visibility as a black box nailed to your server rack.

Harbor is what happens when someone looked at registry:2 and said “this needs to be a real product.” It’s a CNCF graduated project — not a weekend experiment — running in production at companies with actual uptime requirements.

Here’s what Harbor adds:

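| Feature | registry:2 | Harbor |
| --- | --- | --- |
| Web UI | No | Yes |
| RBAC | No | Yes |
| Vulnerability scanning | No | Yes (Trivy) |
| Image replication | No | Yes |
| Garbage collection | Manual | Scheduled + UI |
| Audit logs | No | Yes |
| Robot accounts | No | Yes |
| Helm chart repo | No | Yes |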

The operational difference is roughly the same as between sqlite3 at a terminal and a proper database with a dashboard. One is a tool, one is a platform.

Why self-host a registry at all?

Quick case for the skeptics:

Privacy. Proprietary code, baked-in configs, anything you’d rather not have on public infrastructure — it stays yours.

Performance. Pulling from your own network is dramatically faster than pulling from a remote registry. In a Kubernetes cluster that scales frequently, this compounds.

CI/CD speed. No more rate limits, no more waiting on Docker Hub’s infrastructure during your build rush hour. Your pipeline pulls what it needs, immediately.

Cost. Cloud registries charge for storage, egress, and sometimes per-image. Fixed-cost hardware you already own is just better math.

Compliance. Some industries have regulations about where data lives. “It’s on Docker Hub somewhere” is not an answer that satisfies auditors.

Deploying Harbor with Docker Compose

Harbor ships an official installer that generates a Compose stack for you. Download the offline installer — it bundles everything and doesn’t depend on Docker Hub during setup, which is ironic but practical.

Terminal window
wget https://github.com/goharbor/harbor/releases/download/v2.11.0/harbor-offline-installer-v2.11.0.tgz
tar xzvf harbor-offline-installer-v2.11.0.tgz
cd harbor
cp harbor.yml.tmpl harbor.yml

Edit harbor.yml before running anything:

harbor.yml
hostname: registry.yourdomain.com
https:
  port: 443
  certificate: /etc/harbor/certs/registry.yourdomain.com.crt
  private_key: /etc/harbor/certs/registry.yourdomain.com.key
harbor_admin_password: ChangeThisNow   # initial admin password
database:
  password: AlsoChangeThis             # password for the bundled database
data_volume: /data/harbor              # image layers are stored under this path
log:
  level: info
  local:                               # rotation settings sit under log.local in harbor.yml.tmpl
    rotate_count: 50
    rotate_size: 200M
    location: /var/log/harbor

The data_volume is where all your image layers live. Make sure that path has room — images add up fast and disk space surprises are never fun at 3 AM.
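
A quick pre-flight check on that path can save the 3 AM surprise. The following is a minimal sketch, not part of Harbor: the function names free_gib and has_room and the 50 GiB threshold in the example are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the free space, in whole GiB, on the filesystem that holds a path.
# df -P keeps the output on one line per filesystem; -k reports 1024-byte
# blocks; column 4 is "Available".
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if the path has at least the requested number of GiB free.
has_room() {
  local path="$1" min_gib="$2"
  [ "$(free_gib "$path")" -ge "$min_gib" ]
}

# Example (hypothetical threshold):
# has_room /data/harbor 50 || echo "data_volume is low on space" >&2
```

Run from cron or a monitoring agent, the same check works as an early warning long before pushes start failing.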

Run the installer with Trivy enabled:

Terminal window
sudo ./install.sh --with-trivy

That --with-trivy flag enables the vulnerability scanner. It takes a few minutes to load everything. When it’s done, Harbor is running as a Compose stack. Hit https://registry.yourdomain.com in your browser, log in as admin, and you’re in.

To bring it back up after a reboot:

Terminal window
cd /path/to/harbor
docker compose up -d

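The installer generates the Compose file from harbor.yml, so don’t edit it by hand. To change a setting, stop the stack, edit harbor.yml, rerun ./prepare (with --with-trivy if you installed it that way), and bring the stack back up.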

Configuring Docker to use your registry

If you have a valid TLS cert from a recognized CA, Docker trusts it out of the box:

Terminal window
docker login registry.yourdomain.com
docker tag myapp:latest registry.yourdomain.com/myproject/myapp:latest
docker push registry.yourdomain.com/myproject/myapp:latest

For home lab setups with self-signed certs, install your CA on each Docker host:

Terminal window
sudo mkdir -p /etc/docker/certs.d/registry.yourdomain.com/
sudo cp ca.crt /etc/docker/certs.d/registry.yourdomain.com/
sudo systemctl restart docker

Or, if you truly can’t do TLS (lab-only, never production), tell Docker to trust it as an insecure registry:

/etc/docker/daemon.json
{
  "insecure-registries": ["registry.yourdomain.com"]
}

Terminal window
sudo systemctl restart docker

The insecure route works but don’t let that config drift into production machines. Certificate trust is the correct path — it just takes five more minutes up front.
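
For the self-signed route, the CA and server certificate have to come from somewhere. Here is one way to produce them with openssl, as a sketch: the function name make_lab_certs, the "Home Lab CA" subject, and the validity periods are assumptions, and it relies on a stock openssl.cnf being present so the root certificate is marked as a CA.

```shell
#!/usr/bin/env bash
# Generate a private CA and a server certificate for one registry hostname.
# Usage: make_lab_certs <hostname> <output-dir>
make_lab_certs() {
  local host="$1" out="$2"
  mkdir -p "$out" || return 1

  # 1. Private CA: key plus self-signed root certificate (ten years).
  openssl genrsa -out "$out/ca.key" 4096 2>/dev/null || return 1
  openssl req -x509 -new -sha512 -days 3650 -subj "/CN=Home Lab CA" \
    -key "$out/ca.key" -out "$out/ca.crt" || return 1

  # 2. Server key and certificate signing request.
  openssl genrsa -out "$out/$host.key" 4096 2>/dev/null || return 1
  openssl req -new -sha512 -subj "/CN=$host" \
    -key "$out/$host.key" -out "$out/$host.csr" || return 1

  # 3. Extensions for the server cert. The subjectAltName entry matters:
  #    clients match the hostname against it, not against the CN.
  printf '%s\n' \
    'basicConstraints=CA:FALSE' \
    'keyUsage=digitalSignature,keyEncipherment' \
    'extendedKeyUsage=serverAuth' \
    "subjectAltName=DNS:$host" > "$out/v3.ext" || return 1

  # 4. Sign the request with the CA.
  openssl x509 -req -sha512 -days 825 -extfile "$out/v3.ext" \
    -CA "$out/ca.crt" -CAkey "$out/ca.key" -CAcreateserial \
    -in "$out/$host.csr" -out "$out/$host.crt" 2>/dev/null
}

# Example:
# make_lab_certs registry.yourdomain.com ./certs
```

The resulting ca.crt is the file to copy into /etc/docker/certs.d/ on each Docker host; the .crt and .key pair is what the certificate and private_key settings in harbor.yml point at.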

RBAC: projects, users, and robot accounts

Harbor organizes images into projects. A project is a namespace — registry.yourdomain.com/myproject/myapp lives in myproject. Permissions are set per project.

The roles you’ll actually use:

| Role | Capabilities |
| --- | --- |
| Guest | Pull only |
| Developer | Push and pull |
| Maintainer | Push, pull, delete tags |
| Project Admin | Full project control |

For human users, create accounts in the UI (or wire up LDAP/OIDC for enterprise auth). For CI/CD pipelines, use robot accounts — they’re purpose-built for automation and don’t depend on any human’s credentials.

Create a robot account: Project → Robot Accounts → New Robot Account. Name it, set an expiry, choose permissions. You get back a username and a token you’ll only see once — store it in your CI secret manager immediately.

Terminal window
# CI/CD login with a robot account
# Note the dollar sign: it's part of the username format
docker login registry.yourdomain.com \
  --username 'robot$myproject+ci-runner' \
  --password "$HARBOR_ROBOT_TOKEN"

That dollar sign in the username trips people up constantly in shell scripts and YAML. Single-quote the username whenever possible.
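
To see why the quoting matters, here is a small demonstration of what bash does with the same username under different quoting. The variable names single, double, and escaped are only for illustration.

```shell
#!/usr/bin/env bash
unset myproject   # make sure no variable with this name happens to exist

single='robot$myproject+ci-runner'     # single quotes: $ is kept literally
double="robot$myproject+ci-runner"     # double quotes: $myproject expands (to nothing)
escaped="robot\$myproject+ci-runner"   # double quotes plus backslash: literal again

printf '%s\n' "$single"    # robot$myproject+ci-runner
printf '%s\n' "$double"    # robot+ci-runner
printf '%s\n' "$escaped"   # robot$myproject+ci-runner
```

The double-quoted form silently sends the wrong username, so the login fails with an authentication error rather than anything that mentions quoting.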

Scope robot accounts to exactly what they need. Your build pipeline needs push access to one project — give it that. Nothing more.
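
Since a robot token with an expiry will eventually stop working, a scheduled job can warn before that happens. A minimal sketch, assuming you have recorded the expiry as a Unix timestamp when you created the account; the function names and the 14-day default window are hypothetical.

```shell
#!/usr/bin/env bash
# Whole days between "now" and an expiry time, both as Unix epoch seconds.
# Zero or a negative number means the token expires today or already has.
days_left() {
  local expires_at="$1" now="${2:-$(date +%s)}"
  echo $(( (expires_at - now) / 86400 ))
}

# Succeed when the token expires within the warning window (default 14 days).
expiring_soon() {
  local expires_at="$1" now="$2" window="${3:-14}"
  [ "$(days_left "$expires_at" "$now")" -le "$window" ]
}

# Example:
# expiring_soon "$ROBOT_EXPIRES_AT" "$(date +%s)" && echo "rotate the CI robot token" >&2
```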

Vulnerability scanning with Trivy

If you installed with --with-trivy, Harbor can scan images, either on demand or automatically on push. Reports appear in the UI per tag, grouped by severity (Critical, High, Medium, Low), with CVE IDs, affected packages, and whether a fix is available.

Enable automatic scanning: Project → Configuration → “Automatically scan images on push.” Toggle it on.

To block pulls of vulnerable images: Project → Configuration → “Prevent vulnerable images from running,” set a threshold. Set it to Critical at minimum — anything with a known critical CVE shouldn’t be deployable without a conscious decision to override it.

You can also trigger scans via the API, which is useful for gating CI:

Terminal window
curl -u "robot\$myproject+ci-runner:$HARBOR_TOKEN" \
  -X POST \
  "https://registry.yourdomain.com/api/v2.0/projects/myproject/repositories/myapp/artifacts/latest/scan"

Even just having the scan data visible is valuable without enforcement. You push an image, check Harbor, see “12 medium CVEs in the base OS layer” — that’s actionable next sprint. The data is there. Use it.

Image replication

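Harbor can sync images between registries. The two uses covered here are caching Docker Hub base images locally and mirroring images from a staging registry to production.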

Set up a replication endpoint: Administration → Registries → New Endpoint. Add Docker Hub as a source. Then create a replication rule — pull-based (Harbor fetches on schedule) or push-based (Harbor pushes on trigger).

A practical setup: replicate your base images (ubuntu, python, node, alpine) from Docker Hub into Harbor on a nightly schedule. Your builds pull from Harbor. Docker Hub’s rate limits become someone else’s problem.

Name: dockerhub-base-cache
Source: docker.io
Filter: library/{ubuntu,python,node,alpine}
Destination: /base-images/
Trigger: Scheduled, 0 0 2 * * * (Harbor cron has six fields, seconds first)

For staging-to-prod replication, use event-based triggers — push to staging, Harbor automatically mirrors to prod registry. Your prod deploy just pulls what’s already local.
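
A scheduled rule only picks up new tags on its next run. To seed an image into the cache project right away, one option is to pull, retag, and push it by hand. This is a sketch under assumptions: the base-images project name comes from the rule above, and the helper names harbor_ref and warm_cache are invented for the example.

```shell
#!/usr/bin/env bash
REGISTRY="${REGISTRY:-registry.yourdomain.com}"
CACHE_PROJECT="${CACHE_PROJECT:-base-images}"

# Map an official Docker Hub image (e.g. python:3.12) to its name in Harbor.
harbor_ref() {
  local image="${1#library/}"   # accept both python:3.12 and library/python:3.12
  echo "${REGISTRY}/${CACHE_PROJECT}/${image}"
}

# Pull from Docker Hub once, then push the same image into Harbor.
# Requires a prior docker login to the registry with push rights.
warm_cache() {
  local image="$1" target
  target="$(harbor_ref "$image")"
  docker pull "$image" &&
    docker tag "$image" "$target" &&
    docker push "$target"
}

# Example:
# warm_cache python:3.12
```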

The practical workflow

Here’s the end-to-end flow in a real pipeline:

.gitlab-ci.yml
build:
  stage: build
  script:
    - docker login registry.yourdomain.com
      -u 'robot$myapp+ci' -p "$HARBOR_TOKEN"
    - docker build -t registry.yourdomain.com/myapp/api:$CI_COMMIT_SHA .
    - docker push registry.yourdomain.com/myapp/api:$CI_COMMIT_SHA
    - docker tag registry.yourdomain.com/myapp/api:$CI_COMMIT_SHA
      registry.yourdomain.com/myapp/api:latest
    - docker push registry.yourdomain.com/myapp/api:latest
deploy:
  stage: deploy
  script:
    - ssh deploy@prod
      "docker pull registry.yourdomain.com/myapp/api:$CI_COMMIT_SHA"
    - ssh deploy@prod "docker compose up -d"

Your prod server pulls from your registry — no Docker Hub in the loop, no human credentials on that machine (robot account, scoped to pull-only), and every image was scanned before it landed there.

Build → Harbor (scanned on push) → CI gate (check scan results) → prod pull → deploy. That’s the loop. Once it’s running, it mostly hums along without needing attention.
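
The CI gate step could look roughly like the sketch below. The comparison logic is plain shell. The worst_severity helper is an assumption about Harbor's API: it expects the artifact endpoint to accept with_scan_overview=true and return a scan_overview object with a severity field, and it needs curl and jq. HARBOR_ROBOT_USER is a hypothetical variable holding the robot username. Check the response shape against your Harbor version before relying on it.

```shell
#!/usr/bin/env bash
# Map a severity label to a number so levels can be compared.
# Anything unrecognised (empty, Unknown, not yet scanned) ranks highest,
# so the gate fails closed.
severity_rank() {
  case "$1" in
    None)     echo 0 ;;
    Low)      echo 1 ;;
    Medium)   echo 2 ;;
    High)     echo 3 ;;
    Critical) echo 4 ;;
    *)        echo 9 ;;
  esac
}

# Succeed when the worst finding is strictly below the blocking threshold.
passes_gate() {
  local worst="$1" threshold="$2"
  [ "$(severity_rank "$worst")" -lt "$(severity_rank "$threshold")" ]
}

# Assumed API shape: read the overall severity of one artifact from Harbor.
worst_severity() {
  local project="$1" repo="$2" ref="$3"
  curl -fsS -u "${HARBOR_ROBOT_USER}:${HARBOR_TOKEN}" \
    "https://registry.yourdomain.com/api/v2.0/projects/${project}/repositories/${repo}/artifacts/${ref}?with_scan_overview=true" |
    jq -r '(.scan_overview // {}) | to_entries | .[0].value.severity // "Unknown"'
}

# Example gate step between build and deploy:
# passes_gate "$(worst_severity myapp api "$CI_COMMIT_SHA")" Critical || exit 1
```

Scans run asynchronously after a push, so a real gate would likely need to wait or retry until the scan has finished before trusting the result.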

Common gotchas

Certificate trust on every Docker host. This is the one people forget. Harbor works on your laptop, works in CI, fails mysteriously on the prod server because you didn’t install the cert there. Automate cert distribution as part of your infra provisioning — Ansible, cloud-init, whatever you’re using. One less thing to debug at deployment time.

Disk space is now your responsibility. Harbor doesn’t delete old image layers automatically. Deleted tags free the manifest, not the blobs — blobs wait for garbage collection. Configure GC: Administration → Clean Up. Schedule it weekly during off-hours. Also set tag retention rules per project — keep the last 10 tags, drop the rest. Your disk will thank you.

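Garbage collection still wants a quiet window. Older Harbor releases (before v2.1) switched to read-only while GC ran, so pushes were blocked. Current releases, including the v2.11 used here, run GC without blocking pushes, but it still competes for disk I/O. Run it during a maintenance window or at 2 AM when your pipeline isn’t active.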

Robot account tokens expire. If you set an expiry (you should), set a calendar reminder when you create the account. A pipeline that suddenly can’t push because the robot token expired is a fun incident to explain at standup.

Replication lag for new base images. If Harbor is your pull-through cache for Docker Hub, a brand-new base image won’t be in Harbor yet on first use. The first build after updating a base image will still hit Docker Hub. Warm the cache as part of your base image update process — pull it manually to Harbor, then your builds pick it up locally.

robot$ username handling in shells. The dollar sign is literal. Single-quote it in bash, escape it in YAML, handle it carefully in environment variable interpolation. Every environment handles this slightly differently and it will catch you at least once.

Is it worth it?

If you’re running more than one or two services in any kind of production-adjacent context, yes. The combination of actual access control, built-in scanning, and not being at Docker Hub’s mercy is worth the few hours to set it up.

Harbor runs comfortably on modest hardware — 2 vCPUs, 4 GB RAM, and whatever disk your images need. It’s not a resource hog. Once it’s running, it mostly gets out of your way.

Your future self — the one who’s not debugging a failed deploy because Docker Hub decided your pull count was suspicious — will appreciate it.

