
Building a Production-Ready Docker Swarm with Traefik, Prometheus, and Security Hardening


Why Docker Swarm Is Still Relevant in 2024

Many people say Kubernetes has already won. That is true at enterprise scale, but for small to mid-sized teams that need orchestration without the complexity of kubectl and never-ending Helm charts, Docker Swarm remains a solid choice. I have run Swarm in production for three different clients (e-commerce, B2B SaaS, and media streaming) for more than two years. Uptime? 99.97%, without a dedicated DevOps team.

This article is not theory. It is the actual checklist I use every time I spin up a new cluster. We will cover:

  • Bootstrapping a 3-manager cluster with mutual TLS authentication
  • Traefik v3 as ingress with a Let's Encrypt wildcard certificate and mTLS
  • The monitoring stack: Prometheus, Grafana, Loki, cAdvisor
  • Hardening: rootless containers, secrets management, network segmentation
  • A backup/restore strategy for Swarm state

Cluster Architecture: 3 Managers + 2 Workers Minimum

Do not run a single manager. If the manager node dies, the cluster cannot schedule new tasks even though the workers are still alive. Use at least 3 managers for quorum (Raft consensus needs a majority: 2 of 3). You can run 2 or more workers depending on the workload.
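
The majority rule above can be checked with a little arithmetic. The following is a minimal sketch; the function names `raft_quorum` and `raft_tolerated_failures` are made up for illustration and are not part of Docker.

```shell
# Majority needed for Raft quorum with N managers: floor(N/2) + 1
raft_quorum() { echo $(( $1 / 2 + 1 )); }

# Manager failures the cluster can survive while keeping quorum: floor((N-1)/2)
raft_tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

# Print the trade-off for common manager counts
for n in 1 3 5 7; do
  echo "managers=$n quorum=$(raft_quorum "$n") tolerated_failures=$(raft_tolerated_failures "$n")"
done
```

An even manager count raises the quorum without raising the number of failures the cluster tolerates, which is why odd counts are the usual choice.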

Node Specifications (Production Example)

Role          | CPU    | RAM   | Disk        | OS
manager-1,2,3 | 4 vCPU | 8 GB  | 50 GB SSD   | Ubuntu 22.04 LTS
worker-1,2    | 8 vCPU | 16 GB | 100 GB NVMe | Ubuntu 22.04 LTS

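All nodes sit in a private VPC (10.0.0.0/16); only manager-1 has a public floating IP, and it serves as the SSH bastion. Inter-node traffic runs over a WireGuard mesh. Swarm overlay networks still use VXLAN on top of that underlay, so WireGuard protects the node-to-node path rather than replacing the overlay.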

Step 1: OS Preparation & Kernel Hardening

Run this on every node before installing Docker:

# /etc/sysctl.d/99-swarm-hardening.conf
# Network tuning for high throughput
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65535

# Security
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
kernel.yama.ptrace_scope = 2
kernel.kptr_restrict = 2
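# For Elasticsearch/OpenSearch, if used (comments in sysctl.d files must sit on their own line)
vm.max_map_count = 262144
fs.file-max = 2097152

# Apply the settings (shell command, not part of the conf file)
sysctl --system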

# Install dependencies
apt update && apt install -y \
  ca-certificates curl gnupg lsb-release \
  wireguard wireguard-tools \
  jq htop iotop nethogs \
  ufw fail2ban

# Disable swap (required for k8s, recommended for Swarm)
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab

Step 2: Docker Engine Rootless + TLS

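In rootless mode the Docker daemon and its containers run as a regular user, not root. If a container escape happens, the attacker only gains that user's privileges. Note that Docker's rootless-mode documentation lists overlay networks among its unsupported features, so test Swarm overlay networking in rootless mode before depending on it. Setup per node: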

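# Create a dedicated user for Docker
useradd -m -s /bin/bash dockeruser
# Rootless mode needs newuidmap/newgidmap; membership in the docker group is not required
apt install -y uidmap
# Keep the user's Docker daemon running after logout
loginctl enable-linger dockeruser

# Install rootless Docker
su - dockeruser
curl -fsSL https://get.docker.com/rootless | sh

# Add to PATH and point the CLI at this user's socket (the UID is not always 1000)
echo 'export PATH=/home/dockeruser/bin:$PATH' >> ~/.bashrc
echo "export DOCKER_HOST=unix:///run/user/$(id -u)/docker.sock" >> ~/.bashrc
source ~/.bashrc

# Verify: the security options should include name=rootless
docker info --format '{{.SecurityOptions}}'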

Generate TLS Certificates for Mutual Auth

For mutual TLS you need a CA plus a certificate for each manager and worker. Use cfssl to keep the process reproducible:

# On manager-1 (CA server)
mkdir -p ~/swarm-tls && cd ~/swarm-tls

cat > ca-config.json <<'EOF'
{
  "signing": {
    "default": { "expiry": "87600h" },
    "profiles": {
      "swarm": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "87600h"
      }
    }
  }
}
EOF

cat > ca-csr.json <<'EOF'
{
  "CN": "Swarm CA",
  "key": { "algo": "rsa", "size": 4096 },
  "names": [{ "O": "Tool Kuy", "OU": "Swarm Cluster" }]
}
EOF

cfssl gencert -initca ca-csr.json | cfssljson -bare ca

# Generate a cert for each node (repeat for manager-2, manager-3, worker-1, worker-2, adjusting CN and hosts)
cat > node-csr.json <<'EOF'
{
  "CN": "manager-1",
  "hosts": ["manager-1", "10.0.1.11", "127.0.0.1"],
  "key": { "algo": "rsa", "size": 4096 },
  "names": [{ "O": "Tool Kuy", "OU": "Swarm Manager" }]
}
EOF

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=swarm node-csr.json | cfssljson -bare manager-1

Distribute ca.pem together with each node's own certificate and key (for example manager-1.pem and manager-1-key.pem) to /home/dockeruser/.docker/tls/ on the respective node. Set permission 600 on the private key.
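
The placement and permission step can be sketched as a small helper that runs on the target node once the files have been copied over. This is an illustrative sketch only: the function name `install_node_tls` and the staging directory are assumptions, and how the files reach the node (scp, configuration management, and so on) is left open.

```shell
# install_node_tls SRC_DIR NODE DEST_DIR
# Copies ca.pem, NODE.pem and NODE-key.pem from SRC_DIR into DEST_DIR
# and restricts the private key to its owner.
install_node_tls() {
  src=$1
  node=$2
  dest=$3

  # The TLS directory itself should not be browsable by other users
  mkdir -p "$dest" || return 1
  chmod 700 "$dest" || return 1

  # Certificates are public material: readable by everyone is fine
  install -m 644 "$src/ca.pem" "$src/${node}.pem" "$dest/" || return 1

  # The private key must be readable and writable by its owner only
  install -m 600 "$src/${node}-key.pem" "$dest/" || return 1
}
```

On manager-1, for example, this could be called as `install_node_tls ~/swarm-tls manager-1 /home/dockeruser/.docker/tls`.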

Step 3: Bootstrap Swarm with TLS

# On manager-1
docker swarm init \
  --advertise-addr 10.0.1.11 \
  --cert-expiry 87600h \
  --dispatcher-heartbeat 5s

# Swarm issues node certificates from its own root CA;
# after init, rotate in the custom CA generated above
docker swarm ca --rotate \
  --ca-cert /home/dockeruser/.docker/tls/ca.pem \
  --ca-key /home/dockeruser/.docker/tls/ca-key.pem

# Print the join commands (with tokens) for managers and workers
# Store token-manager and token-worker in a password manager (Bitwarden/1Password)
docker swarm join-token manager
docker swarm join-token worker

# On manager-2 and manager-3 (set --advertise-addr to each node's own IP)
docker swarm join \
  --token SWMTKN-1-xxx-manager-token \
  --advertise-addr 10.0.1.12 \
  10.0.1.11:2377

# On worker-1 and worker-2 (set --advertise-addr to each node's own IP)
docker swarm join \
  --token SWMTKN-1-xxx-worker-token \
  --advertise-addr 10.0.1.21 \
  10.0.1.11:2377

Verify:

docker node ls
# Expect 5 nodes, all with STATUS Ready: the 3 managers show MANAGER STATUS Leader or Reachable, the 2 workers leave it empty
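
To make that check scriptable, the node list can be summarized with awk. This is a rough sketch: `summarize_nodes` is a hypothetical helper name, and it assumes input lines of the form "hostname status manager-status", where workers have an empty third field.

```shell
# Reads "hostname status manager_status" lines on stdin and prints
# how many nodes are Ready and how many are managers or workers.
summarize_nodes() {
  awk '
    $2 == "Ready" { ready++ }
    # Only managers report a manager status; workers leave the field empty
    $3 == "Leader" || $3 == "Reachable" { managers++ }
    END { printf "ready=%d managers=%d workers=%d\n", ready, managers, NR - managers }
  '
}
```

On a manager, feed it the formatted node list: `docker node ls --format '{{.Hostname}} {{.Status}} {{.ManagerStatus}}' | summarize_nodes`. For the cluster described here the summary should report 5 ready nodes, 3 managers, and 2 workers. A manager whose status is Unreachable is counted as a worker by this sketch, so a manager count below 3 is the signal to investigate.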

Step 4: Segmented Overlay Networks

Do not use a single overlay network for every service. Split them by security zone:

# Network for ingress (Traefik only)
docker network create \
  --driver overlay \
  --attachable \
  --opt encrypted=true \
  --subnet 10.10.10.0/24 \
  ingress-net

# Network for monitoring (Prometheus, Grafana, Loki)
docker network create \
  --driver overlay \
  --opt encrypted=true \
  --subnet 10.10.20.0/24 \
  monitoring-net

# Network for databases (internal only, no ingress)
docker network create \
  --driver overlay \
  --opt encrypted=true \
  --internal \
  --subnet 10.10.30.0/24 \
  database-net

# Network for backend services
docker network create \
  --driver overlay \
  --opt encrypted=true \
  --subnet 10.10.40.0/24 \
  backend-net

The --opt encrypted=true flag enables IPsec encryption between nodes (AES-GCM). The --internal flag on database-net leaves the network without an outbound gateway: containers attached to it cannot reach the internet and can only talk to each other.

Step 5: Traefik v3 Stack with mTLS & Let's Encrypt Wildcard

Traefik v3 (generally available since April 2024) brings native OTel metrics, improved middleware, and better Kubernetes CRD support. Configure it as a Docker stack:

# traefik-stack.yml
version: "3.9"

services:
  traefik:
    image: traefik:v3.0
    command:
      # Entrypoints
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.metrics.address=:9100"
      
      # Providers (Traefik v3 has a dedicated Swarm provider)
      - "--providers.swarm=true"
      - "--providers.swarm.exposedbydefault=false"
      - "--providers.swarm.network=ingress-net"
      - "--providers.swarm.watch=true"

      # Certificates (Let's Encrypt wildcard via DNS challenge)
      - "--certificatesresolvers.le.acme.email=<acme-contact-email>"  # placeholder: use your own address
      - "--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.le.acme.dnschallenge=true"
      - "--certificatesresolvers.le.acme.dnschallenge.provider=cloudflare"

      # TLS on the websecure entrypoint with the wildcard certificate
      # (client-certificate mTLS additionally requires a TLS option with clientAuth in dynamic configuration)
      - "--entrypoints.websecure.http.tls=true"
      - "--entrypoints.websecure.http.tls.certificatesresolver=le"
      - "--entrypoints.websecure.http.tls.domains[0].main=toolkuy.id"
      - "--entrypoints.websecure.http.tls.domains[0].sans=*.toolkuy.id"

      # Metrics & Tracing (assumes the OTLP collector listens on gRPC without TLS)
      - "--metrics.prometheus=true"
      - "--metrics.prometheus.entrypoint=metrics"
      - "--tracing.otlp=true"
      - "--tracing.otlp.grpc.endpoint=otel-collector:4317"
      - "--tracing.otlp.grpc.insecure=true"

      # Default security headers (middleware defined through labels on the Swarm provider)
      - "--entrypoints.websecure.http.middlewares=secHeaders@swarm"

      # JSON access log for Loki; drop headers by default and keep only the listed ones
      - "--accesslog=true"
      - "--accesslog.format=json"
      - "--accesslog.fields.headers.defaultmode=drop"
      - "--accesslog.fields.headers.names.User-Agent=keep"
      - "--accesslog.fields.headers.names.X-Forwarded-For=keep"

    ports:
      - target: 80
        published: 80
        mode: host
        protocol: tcp
      - target: 443
        published: 443
        mode: host
        protocol: tcp
      # Port 9100 (metrics) is not published on the host; Prometheus can scrape it over monitoring-net

    volumes:
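      # With rootless Docker the host socket is /run/user/<uid>/docker.sock; adjust the host path accordingly
      - /var/run/docker.sock:/var/run/docker.sock:ro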
      - traefik-letsencrypt:/letsencrypt
      - /home/dockeruser/.docker/tls:/certs:ro

    networks:
      - ingress-net
      - monitoring-net

    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 128M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
      update_config:
        parallelism: 1
        delay: 10s
        # Host-mode ports cannot be bound by two tasks on the same node, so stop the old task first
        order: stop-first
      labels:
        # With exposedbydefault=false these labels are only read when the service is enabled;
        # Swarm services also need a port label, so a placeholder port is declared
        - "traefik.enable=true"
        - "traefik.http.services.traefik-placeholder.loadbalancer.server.port=9999"

        # Security headers middleware (HTTP-to-HTTPS redirection is handled on the web entrypoint)
        - "traefik.http.middlewares.secHeaders.headers.forceSTSHeader=true"
        - "traefik.http.middlewares.secHeaders.headers.STSSeconds=31536000"
        - "traefik.http.middlewares.secHeaders.headers.STSIncludeSubdomains=true"
        - "traefik.http.middlewares.secHeaders.headers.STSPreload=true"
        - "traefik.http.middlewares.secHeaders.headers.contentTypeNosniff=true"
        - "traefik.http.middlewares.secHeaders.headers.browserXssFilter=true"
        - "traefik.http.middlewares.secHeaders.headers.referrerPolicy=strict-origin-when-cross-origin"
        - "traefik.http.middlewares.secHeaders.headers.customFrameOptionsValue=SAMEORIGIN"
        - "traefik.http.middlewares.secHeaders.headers.customRequestHeaders.X-Forwarded-Proto=https"

        # Global rate limiting
        - "traefik.http.middlewares.ratelimit.ratelimit.average=1000"
        - "traefik.http.middlewares.ratelimit.ratelimit.burst=2000"

    environment:
      # The ACME DNS client reads the Cloudflare token from the mounted secret file
      - CF_DNS_API_TOKEN_FILE=/run/secrets/cloudflare-api-token

    secrets:
      - cloudflare-api-token

networks:
  ingress-net:
    external: true
  monitoring-net:
    external: true

volumes:
  traefik-letsencrypt:

secrets:
  cloudflare-api-token:
    external: true

Deploy:

# Create the secret first (printf avoids a trailing newline in the stored token)
printf '%s' "your-cloudflare-api-token" | docker secret create cloudflare-api-token -

docker stack deploy -c traefik-stack.yml traefik

Service Labels for Traefik Auto-Discovery
