Why Docker Swarm Is Still Relevant in 2024
Plenty of people say Kubernetes has already won. That is true at enterprise scale, but for small and mid-sized teams that need orchestration without the complexity of kubectl and never-ending Helm charts, Docker Swarm remains a solid choice. I have run Swarm in production for three different clients (e-commerce, B2B SaaS, and media streaming) for more than two years. Uptime? 99.97%, without a dedicated DevOps team.
This article is not theory. It is the actual checklist I use every time I spin up a new cluster. We will cover:
- Bootstrapping a 3-manager cluster with mutual TLS authentication
- Traefik v3 as ingress with a Let's Encrypt wildcard certificate & mTLS
- Monitoring stack: Prometheus, Grafana, Loki, cAdvisor
- Hardening: rootless containers, secrets management, network segmentation
- Backup/restore strategy for Swarm state
Cluster Architecture: Minimum 3 Managers + 2 Workers
Do not run a single manager. If the only manager node dies, the cluster cannot schedule new tasks even though the workers are still running. Use at least 3 managers so that Raft keeps quorum: consensus needs a majority, which is 2 of 3, so the cluster survives the loss of one manager. Run 2 or more workers depending on workload.
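The majority rule above is easy to tabulate. The following is a minimal sketch of the arithmetic only (the function names are illustrative, not part of any Docker API): quorum is floor(n/2) + 1, and the number of managers you can lose is whatever remains.

```python
def raft_quorum(managers: int) -> int:
    """Smallest number of managers that forms a majority."""
    if managers < 1:
        raise ValueError("a swarm needs at least one manager")
    return managers // 2 + 1


def fault_tolerance(managers: int) -> int:
    """How many managers can fail while a majority is still reachable."""
    return managers - raft_quorum(managers)


# Print a small table for common manager counts.
for count in (1, 2, 3, 4, 5, 7):
    print(f"{count} managers: quorum={raft_quorum(count)}, "
          f"tolerates {fault_tolerance(count)} failure(s)")
```

Note that an even count buys nothing: 4 managers tolerate one failure, the same as 3, which is why odd manager counts are the usual recommendation.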
Node Specifications (Production Example)
| Role | CPU | RAM | Disk | OS |
|---|---|---|---|---|
| manager-1,2,3 | 4 vCPU | 8 GB | 50 GB SSD | Ubuntu 22.04 LTS |
| worker-1,2 | 8 vCPU | 16 GB | 100 GB NVMe | Ubuntu 22.04 LTS |
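All nodes sit in a private VPC (10.0.0.0/16); only manager-1 has a public floating IP, and it doubles as the SSH bastion. Inter-node traffic runs over a WireGuard mesh, which I find simpler to operate than exposing Docker's default VXLAN overlay traffic directly on the VPC. The mesh is only the underlay: the overlay networks in Step 4 still use VXLAN on top of it.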
Step 1: OS Preparation & Kernel Hardening
Run this on every node before installing Docker:
# /etc/sysctl.d/99-swarm-hardening.conf
# Network tuning for high throughput
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65535
# Security
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
kernel.yama.ptrace_scope = 2
kernel.kptr_restrict = 2
# For Elasticsearch/OpenSearch if you run them (comments must sit on their own line in sysctl.d files)
vm.max_map_count = 262144
fs.file-max = 2097152
# Then load the file (run as root)
sysctl --system
# Install dependencies (uidmap and dbus-user-session are prerequisites for rootless Docker in Step 2)
apt update && apt install -y \
  ca-certificates curl gnupg lsb-release \
  uidmap dbus-user-session \
  wireguard wireguard-tools \
  jq htop iotop nethogs \
  ufw fail2ban
# Disable swap (long required by Kubernetes, recommended for Swarm)
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
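After `sysctl --system` it is worth confirming that the kernel actually accepted every value, since a malformed line is skipped with only a warning. The sketch below is one way to do that, assuming the file path used above and the usual mapping of `a.b.c` keys to `/proc/sys/a/b/c`; the helper names are made up for this example.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_sysctl(text: str) -> Dict[str, str]:
    """Parse sysctl.d content into {key: value}, skipping comments and blanks."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        # Collapse runs of whitespace so "1024   65535" compares equal to "1024 65535".
        settings[key.strip()] = " ".join(value.split())
    return settings


def live_value(key: str, proc_root: Path = Path("/proc/sys")) -> Optional[str]:
    """Read the running kernel's value for a key, or None if it is unreadable."""
    try:
        return " ".join((proc_root / key.replace(".", "/")).read_text().split())
    except OSError:
        return None


def find_drift(expected: Dict[str, str],
               proc_root: Path = Path("/proc/sys")) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (wanted, actual)} for every setting that does not match."""
    drift = {}
    for key, wanted in expected.items():
        actual = live_value(key, proc_root)
        if actual != wanted:
            drift[key] = (wanted, actual)
    return drift


if __name__ == "__main__":
    conf = Path("/etc/sysctl.d/99-swarm-hardening.conf")
    if conf.exists():
        for key, (wanted, actual) in find_drift(parse_sysctl(conf.read_text())).items():
            print(f"{key}: wanted {wanted!r}, kernel has {actual!r}")
```

An empty result means every key in the file matches the running kernel. A value of `None` usually means the key does not exist on that kernel or is not readable by the current user.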
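Step 2: Docker Engine Rootless + TLS
Rootless mode runs the Docker daemon and its containers as an ordinary user instead of root, so an attacker who escapes a container only gains that user's privileges. Check compatibility before committing to it: Docker's rootless documentation lists overlay networks as unsupported, and Swarm's multi-host networking relies on them. Set it up on each node: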
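# Create a dedicated user for Docker (rootless mode does not use the docker group, so none is assigned)
useradd -m -s /bin/bash dockeruser
# Keep the user's systemd session, and with it the daemon, running after logout
loginctl enable-linger dockeruser
# Install Docker rootless
su - dockeruser
curl -fsSL https://get.docker.com/rootless | sh
# Add to PATH; the socket path depends on the user's UID, so do not hard-code 1000
echo 'export PATH=/home/dockeruser/bin:$PATH' >> ~/.bashrc
echo "export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock" >> ~/.bashrc
source ~/.bashrc
# Verify: the listed security options should include name=rootless
docker info --format '{{.SecurityOptions}}'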
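Generate TLS Certificates for Mutual Auth
By default Swarm creates its own root CA and issues node certificates automatically. To run the cluster under a CA you control, generate the CA and the manager and worker certificates with cfssl so the process is reproducible: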
# On manager-1 (CA server)
mkdir -p ~/swarm-tls && cd ~/swarm-tls
cat > ca-config.json <<'EOF'
{
  "signing": {
    "default": { "expiry": "87600h" },
    "profiles": {
      "swarm": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "87600h"
      }
    }
  }
}
EOF
cat > ca-csr.json <<'EOF'
{
  "CN": "Swarm CA",
  "key": { "algo": "rsa", "size": 4096 },
  "names": [{ "O": "Tool Kuy", "OU": "Swarm Cluster" }]
}
EOF
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
# Generate a cert for each node (repeat for manager-2, manager-3, worker-1, worker-2 with their own CN and IP)
cat > node-csr.json <<'EOF'
{
  "CN": "manager-1",
  "hosts": ["manager-1", "10.0.1.11", "127.0.0.1"],
  "key": { "algo": "rsa", "size": 4096 },
  "names": [{ "O": "Tool Kuy", "OU": "Swarm Manager" }]
}
EOF
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=swarm node-csr.json | cfssljson -bare manager-1
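Distribute ca.pem together with each node's own certificate and key (manager-1.pem and manager-1-key.pem on manager-1, and so on) to /home/dockeruser/.docker/tls/ on that node. Set permission 600 on the private keys, and keep ca-key.pem on manager-1 only.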
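Repeating the CSR by hand for five nodes invites copy-paste mistakes, so the per-node files can be generated instead. This is a sketch under stated assumptions: the IPs for manager-1, manager-2 and worker-1 come from the commands in this article, while the addresses for manager-3 and worker-2 and the "Swarm Worker" OU are guesses you should replace with your own values.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

# name -> (private IP, organizational unit)
NODES: Dict[str, Tuple[str, str]] = {
    "manager-1": ("10.0.1.11", "Swarm Manager"),
    "manager-2": ("10.0.1.12", "Swarm Manager"),
    "manager-3": ("10.0.1.13", "Swarm Manager"),  # assumed address
    "worker-1": ("10.0.1.21", "Swarm Worker"),    # OU is an assumption
    "worker-2": ("10.0.1.22", "Swarm Worker"),    # assumed address and OU
}


def build_csr(name: str, ip: str, ou: str) -> dict:
    """Build a cfssl CSR document with the same shape as node-csr.json above."""
    return {
        "CN": name,
        "hosts": [name, ip, "127.0.0.1"],
        "key": {"algo": "rsa", "size": 4096},
        "names": [{"O": "Tool Kuy", "OU": ou}],
    }


def write_csrs(out_dir: Path) -> List[Path]:
    """Write <node>-csr.json for every node and return the created paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, (ip, ou) in NODES.items():
        path = out_dir / f"{name}-csr.json"
        path.write_text(json.dumps(build_csr(name, ip, ou), indent=2) + "\n")
        paths.append(path)
    return paths


# Show what one generated CSR looks like.
print(json.dumps(build_csr("manager-1", *NODES["manager-1"]), indent=2))
```

Calling `write_csrs(Path("."))` inside ~/swarm-tls leaves one file per node, and each can then be passed to the same `cfssl gencert ... | cfssljson -bare <node>` command shown above.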
Step 3: Bootstrap the Swarm with TLS
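# On manager-1
docker swarm init \
  --advertise-addr 10.0.1.11 \
  --cert-expiry 87600h \
  --dispatcher-heartbeat 5s
# swarm init has no --ca-cert/--ca-key flags; swap the auto-generated root CA for the cfssl CA afterwards
docker swarm ca --rotate \
  --ca-cert /home/dockeruser/.docker/tls/ca.pem \
  --ca-key /home/dockeruser/.docker/tls/ca-key.pem
# Rotating the root CA changes the join tokens, so print them after the rotation
docker swarm join-token manager
docker swarm join-token worker
# Store both tokens in a password manager (Bitwarden/1Password)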
# On manager-2 and manager-3 (set --advertise-addr to each node's own IP)
# swarm join takes no CA flags: the node receives its certificate from the swarm CA when it joins
docker swarm join \
  --token SWMTKN-1-xxx-manager-token \
  --advertise-addr 10.0.1.12 \
  10.0.1.11:2377
# On worker-1 and worker-2 (set --advertise-addr to each node's own IP)
docker swarm join \
  --token SWMTKN-1-xxx-worker-token \
  --advertise-addr 10.0.1.21 \
  10.0.1.11:2377
Verify:
docker node ls
# Expect 5 nodes with STATUS Ready: the 3 managers show MANAGER STATUS Leader or Reachable, the 2 workers leave that column empty
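The same check can be automated once the cluster is part of a provisioning script. The sketch below inspects the output of `docker node ls --format '{{json .}}'`, one JSON object per line. The key names (`Hostname`, `Status`, `ManagerStatus`) are assumed to match the CLI's format placeholders, and the sample data is invented for illustration rather than captured from a real cluster.

```python
import json
from typing import Iterable, List

# Invented sample of `docker node ls --format '{{json .}}'` output.
SAMPLE = [
    '{"Hostname": "manager-1", "Status": "Ready", "ManagerStatus": "Leader"}',
    '{"Hostname": "manager-2", "Status": "Ready", "ManagerStatus": "Reachable"}',
    '{"Hostname": "manager-3", "Status": "Ready", "ManagerStatus": "Reachable"}',
    '{"Hostname": "worker-1", "Status": "Ready", "ManagerStatus": ""}',
    '{"Hostname": "worker-2", "Status": "Ready", "ManagerStatus": ""}',
]


def check_cluster(json_lines: Iterable[str], managers: int = 3, workers: int = 2) -> List[str]:
    """Return a list of human-readable problems; an empty list means healthy."""
    nodes = [json.loads(line) for line in json_lines if line.strip()]
    manager_nodes = [n for n in nodes if n.get("ManagerStatus")]
    worker_nodes = [n for n in nodes if not n.get("ManagerStatus")]
    problems = []
    if len(manager_nodes) != managers:
        problems.append(f"expected {managers} managers, found {len(manager_nodes)}")
    if len(worker_nodes) != workers:
        problems.append(f"expected {workers} workers, found {len(worker_nodes)}")
    leaders = [n for n in manager_nodes if n["ManagerStatus"] == "Leader"]
    if len(leaders) != 1:
        problems.append(f"expected exactly 1 leader, found {len(leaders)}")
    for node in nodes:
        if node.get("Status") != "Ready":
            problems.append(f"{node.get('Hostname')} is {node.get('Status')}")
    return problems


print(check_cluster(SAMPLE) or "cluster looks healthy")
```

To run it against a live cluster, capture the command's output on a manager and pass the lines to `check_cluster` in place of `SAMPLE`.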
Step 4: Segmented Overlay Networks
Do not put every service on a single overlay network. Split them by security zone:
# Network for ingress (Traefik only)
docker network create \
  --driver overlay \
  --attachable \
  --opt encrypted=true \
  --subnet 10.10.10.0/24 \
  ingress-net
# Network for monitoring (Prometheus, Grafana, Loki)
docker network create \
  --driver overlay \
  --opt encrypted=true \
  --subnet 10.10.20.0/24 \
  monitoring-net
# Network for databases (internal only, no ingress)
docker network create \
  --driver overlay \
  --opt encrypted=true \
  --internal \
  --subnet 10.10.30.0/24 \
  database-net
# Network for backend services
docker network create \
  --driver overlay \
  --opt encrypted=true \
  --subnet 10.10.40.0/24 \
  backend-net
The --opt encrypted=true flag enables IPsec encryption (AES-GCM) for overlay traffic between nodes. The --internal flag on database-net leaves the network without a gateway to the outside: containers attached only to it cannot reach the internet and can only talk to each other.
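Hand-picked subnets are easy to get wrong as the list grows, and an overlay range that collides with the VPC range causes routing problems that are hard to trace. The sketch below checks the four subnets above against each other and against the 10.0.0.0/16 VPC, using only Python's standard `ipaddress` module; the function name is illustrative.

```python
from ipaddress import ip_network
from itertools import combinations
from typing import Dict, List, Tuple

VPC = "10.0.0.0/16"
OVERLAYS: Dict[str, str] = {
    "ingress-net": "10.10.10.0/24",
    "monitoring-net": "10.10.20.0/24",
    "database-net": "10.10.30.0/24",
    "backend-net": "10.10.40.0/24",
}


def find_overlaps(networks: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of network names whose CIDR ranges overlap."""
    parsed = {name: ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for a, b in combinations(parsed, 2)
        if parsed[a].overlaps(parsed[b])
    ]


# Include the VPC range so overlays are also checked against the underlay.
conflicts = find_overlaps({**OVERLAYS, "vpc": VPC})
print(conflicts or "no overlapping ranges")
```

Adding a new zone then only means adding one entry to `OVERLAYS` and rerunning the check before `docker network create`.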
Step 5: Traefik v3 Stack with mTLS & Let's Encrypt Wildcard
Traefik v3 (v3.0 reached general availability in April 2024) brings native OpenTelemetry support, improved middleware, and better Kubernetes CRD support. Configure it as a Docker stack:
# traefik-stack.yml
version: "3.9"
services:
  traefik:
    image: traefik:v3.0
    command:
      # Entrypoints
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.metrics.address=:9100"
      # Providers (v3 moved Swarm support from providers.docker.swarmmode to providers.swarm)
      - "--providers.swarm=true"
      - "--providers.swarm.exposedbydefault=false"
      - "--providers.swarm.network=ingress-net"
      - "--providers.swarm.watch=true"
      # Certificates (Let's Encrypt wildcard via DNS challenge)
      # The address was redacted in the source; set your own ACME contact email here
      - "--certificatesresolvers.le.acme.email=[email protected]"
      - "--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.le.acme.dnschallenge=true"
      - "--certificatesresolvers.le.acme.dnschallenge.provider=cloudflare"
      # TLS on websecure; the wildcard certificate is requested through the domains below.
      # Client-certificate mTLS additionally needs a tls.options entry with clientAuth in dynamic config.
      - "--entrypoints.websecure.http.tls=true"
      - "--entrypoints.websecure.http.tls.certificatesresolver=le"
      - "--entrypoints.websecure.http.tls.domains[0].main=toolkuy.id"
      - "--entrypoints.websecure.http.tls.domains[0].sans=*.toolkuy.id"
      # Metrics & Tracing (v3 configures tracing under tracing.otlp; insecure assumes a collector without TLS)
      - "--metrics.prometheus=true"
      - "--metrics.prometheus.entrypoint=metrics"
      - "--tracing.otlp=true"
      - "--tracing.otlp.grpc.endpoint=otel-collector:4317"
      - "--tracing.otlp.grpc.insecure=true"
      # Default security headers (middleware defined in the labels below)
      - "--entrypoints.websecure.http.middlewares=secHeaders@swarm"
      # JSON access log for Loki
      - "--accesslog=true"
      - "--accesslog.format=json"
      - "--accesslog.fields.headers.defaultmode=keep"
      - "--accesslog.fields.headers.names.User-Agent=keep"
      - "--accesslog.fields.headers.names.X-Forwarded-For=keep"
      # Traefik Pilot was discontinued and v3 rejects the pilot.* options, so no token is set
    ports:
      - target: 80
        published: 80
        mode: host
        protocol: tcp
      - target: 443
        published: 443
        mode: host
        protocol: tcp
      - target: 9100
        published: 9100
        mode: host
        protocol: tcp
    volumes:
      # With the rootless daemon from Step 2 the host path is /run/user/<uid>/docker.sock instead
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - traefik-letsencrypt:/letsencrypt
      - /home/dockeruser/.docker/tls:/certs:ro
    networks:
      - ingress-net
      - monitoring-net
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first
      labels:
        # exposedbydefault=false: opt this service in so its middleware labels are read
        - "traefik.enable=true"
        # The Swarm provider expects a port label even though no router targets this service
        - "traefik.http.services.traefik.loadbalancer.server.port=8080"
        # Security headers middleware (sslRedirect was removed in v3; the web entrypoint already redirects)
        - "traefik.http.middlewares.secHeaders.headers.forceSTSHeader=true"
        - "traefik.http.middlewares.secHeaders.headers.STSSeconds=31536000"
        - "traefik.http.middlewares.secHeaders.headers.STSIncludeSubdomains=true"
        - "traefik.http.middlewares.secHeaders.headers.STSPreload=true"
        - "traefik.http.middlewares.secHeaders.headers.contentTypeNosniff=true"
        - "traefik.http.middlewares.secHeaders.headers.browserXssFilter=true"
        - "traefik.http.middlewares.secHeaders.headers.referrerPolicy=strict-origin-when-cross-origin"
        - "traefik.http.middlewares.secHeaders.headers.customFrameOptionsValue=SAMEORIGIN"
        - "traefik.http.middlewares.secHeaders.headers.customRequestHeaders.X-Forwarded-Proto=https"
        # Global rate limiting
        - "traefik.http.middlewares.ratelimit.ratelimit.average=1000"
        - "traefik.http.middlewares.ratelimit.ratelimit.burst=2000"
    environment:
      # The Cloudflare DNS provider reads the API token from the file named here
      - CF_DNS_API_TOKEN_FILE=/run/secrets/cloudflare-api-token
    secrets:
      - cloudflare-api-token
networks:
  ingress-net:
    external: true
  monitoring-net:
    external: true
volumes:
  traefik-letsencrypt:
secrets:
  cloudflare-api-token:
    external: true
Deploy:
# Create the secret first (printf avoids storing a trailing newline with the token)
printf '%s' "your-cloudflare-api-token" | docker secret create cloudflare-api-token -
docker stack deploy -c traefik-stack.yml traefik
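Because the provider runs with exposedbydefault=false, Traefik ignores every application service until that service opts in through its own deploy labels. The sketch below generates such a label set. The service name `whoami`, its hostname and its port are hypothetical; the router is bound to the `websecure` entrypoint and the `le` resolver defined above, and it attaches the `ratelimit` middleware, which the stack defines but does not apply anywhere by default.

```python
import re
from typing import List, Sequence


def traefik_labels(name: str, host: str, port: int,
                   middlewares: Sequence[str] = ("ratelimit",)) -> List[str]:
    """Build the deploy labels that expose one Swarm service through Traefik."""
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]*", name):
        raise ValueError(f"unsupported router name: {name!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    labels = [
        "traefik.enable=true",
        f"traefik.http.routers.{name}.rule=Host(`{host}`)",
        f"traefik.http.routers.{name}.entrypoints=websecure",
        f"traefik.http.routers.{name}.tls.certresolver=le",
        # Swarm services have no exposed-port metadata, so the port is explicit.
        f"traefik.http.services.{name}.loadbalancer.server.port={port}",
    ]
    if middlewares:
        labels.append(
            f"traefik.http.routers.{name}.middlewares={','.join(middlewares)}"
        )
    return labels


# Print the labels as a YAML list, ready to paste under deploy.labels.
for label in traefik_labels("whoami", "whoami.toolkuy.id", 80):
    print(f'- "{label}"')
```

The printed list belongs under `deploy.labels` of the application's own stack file, and that service must also join ingress-net so Traefik can reach it.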