How To Set Up Docker Swarm Monitoring for Real Cluster Visibility
Will
April 28, 2026 • 6 min read

Docker Swarm is simple to stand up, but it gets harder to trust as the cluster grows. A service can look healthy from the manager node while tasks are restarting on worker nodes, memory is running low on one host, or a failed node is still sitting in the routing path. Without clear cluster-wide monitoring, those issues stay hidden until they turn into user-facing failures.
This guide gives you a practical path to reliable Docker Swarm monitoring. It covers the crucial metrics to watch, how Docker Swarm logging and monitoring fit together, which open source tools are commonly used, and where Dokploy can reduce the setup and maintenance burden.
Why Docker Swarm monitoring is different from single-host monitoring
Docker Swarm spreads replicas across multiple nodes, so a local view is never enough.
You’re not only monitoring Docker containers on one server; you’re also tracking the health of a swarm cluster in which services spread across manager and worker nodes can fail, restart, or reschedule independently.
A node-level problem can stay invisible unless your monitoring stack collects data across the full cluster.
That distributed design also changes failure handling. A failing node may still receive traffic while its tasks quietly restart or fail to reach the desired replica count, so comprehensive visibility has to include both host-level and service-level signals.
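The built-in CLI already hints at why both levels matter: node state and per-task service state live in separate views, and neither alone shows the whole picture. As a quick illustration (replace <service-name> with one of your services):

# Run on a manager node: cluster-wide node health (Ready/Down, Active/Drain)
docker node ls

# Per-task view for one service: which node each task runs on,
# its current state, and any recent failures
docker service ps --no-trunc <service-name>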
Key metrics to monitor in a Docker Swarm cluster
A swarm problem is usually a cluster problem, so what are the crucial metrics to monitor?
At the infrastructure level, track the following per node and per container:
- CPU and memory usage
- Disk usage
- Network throughput
At the service level, focus on desired replicas versus running replicas, restart rates, and task scheduling failures.
Those metrics tell you whether resource utilization is healthy and whether the swarm is actually keeping services available.
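You can spot replica drift from a manager node even before a monitoring stack is in place; the REPLICAS column of docker service ls reports running versus desired counts directly:

# A value like 3/5 means two tasks are missing from the desired count
docker service ls

Continuous monitoring still matters, because this snapshot view misses restart storms that happen between manual checks.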
Dokploy’s monitoring solution surfaces the same core signals in a real-time dashboard. The documented options include server and container refresh rates with a default of 20 seconds, automated cleanup through a cron job, retention controls, and per-service include and exclude filters.
The metrics port default is 4500, and for security, Dokploy protects metrics requests with a token.

That focus on visibility matches the broader container landscape.
CNCF’s 2023 annual survey found that more than 90% of surveyed organizations are using, piloting, or evaluating containers, and it explicitly notes that monitoring and observability become more challenging as container counts rise.
Docker Swarm logging and monitoring
The previous section covered metrics, but metrics alone do not explain failures.
Managing Docker Swarm logging and monitoring together is really one job: connecting cluster symptoms to root causes.
- Monitored metrics tell you that CPU is pinned, memory is rising, or a service is below its desired replica count.
- Logs tell you whether the real issue is an application crash, an image pull error, a bad configuration, or a network timeout.
In a swarm, logs are emitted per container and per node, so root-cause analysis depends on central aggregation rather than hopping between hosts.
Docker supports multiple logging drivers for that routing layer, including json-file, syslog, gelf, and fluentd.
The default is json-file, but without rotation, this can consume significant disk space. For cluster logging, the usual pattern is to configure a remote driver such as syslog, gelf, or fluentd in daemon.json or at container start, then forward service logs to a central collector.
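For example, a minimal /etc/docker/daemon.json that keeps the default json-file driver but caps its disk usage looks like this (max-size and max-file are standard json-file options):

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

To forward logs to a central collector instead, set "log-driver" to gelf or fluentd and point "log-opts" at your endpoint, for example "gelf-address": "udp://logs.example.internal:12201" (the address here is a placeholder). The daemon must be restarted for the change to take effect, and the new driver only applies to containers created afterwards.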
That setup gives you one place to analyze events across multiple services and nodes. With both metrics and logs in scope, the next step is choosing Docker Swarm monitoring tools.
Docker Swarm monitoring tools
The most common Docker Swarm monitoring open source stack is:
- Prometheus for metrics collection
- cAdvisor for container-level metrics
- node_exporter for host-level metrics
- Grafana for dashboards and alerting
Prometheus scrapes metrics endpoints over HTTP and stores time series data locally. cAdvisor exports container and machine-wide resource usage, while node_exporter is designed to monitor the host system itself.
For teams that want a quicker starting point, Swarmprom packages Prometheus, Grafana, and related components as a deployable Docker Swarm stack. It’s a common first monitoring stack for Swarm because it reduces the amount of manual wiring needed to deploy Prometheus, exporters, dashboards, and alert routing in one cluster-aware setup.
Deploying Prometheus and Grafana as a Swarm stack
A common choice is to run cAdvisor and node_exporter as global services, which means one replica per node. That way, every manager node and worker node automatically exposes targets for Prometheus to scrape, and new nodes join the monitoring surface as the cluster grows. Prometheus is then configured with scrape targets for those exporters, and Grafana sits on top to visualize the collected data.
A minimal Swarm pattern looks like this:
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    deploy:
      # One task per node, so every node exposes container metrics
      mode: global
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    privileged: true
    ports:
      - "8080:8080"

  node-exporter:
    image: prom/node-exporter:v1.8.0
    deploy:
      # One task per node for host-level metrics
      mode: global
    # Share the host PID namespace so process metrics are visible
    pid: host
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    ports:
      - "9100:9100"
In production, you add the usual mounts, networks, configs, and Prometheus scrape configuration. Swarmprom provides a broader stack layout and includes the pieces needed for a ready-made deployment.
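On the scrape side, assuming Prometheus runs as a service in the same stack (so it shares the stack’s overlay network), Swarm’s DNS can enumerate every task of a global service through the tasks.<service> name, which keeps the target list current as nodes join and leave. A minimal sketch:

scrape_configs:
  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080
  - job_name: 'node-exporter'
    dns_sd_configs:
      - names: ['tasks.node-exporter']
        type: 'A'
        port: 9100

With the stack file saved as, say, monitoring-stack.yml (the filename is yours to choose), deploy it from a manager node with docker stack deploy -c monitoring-stack.yml monitoring.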
Alerting in Grafana
Once dashboards are in place, alerting closes the loop.
Grafana alerting can notify when CPU stays high, memory pressure crosses a threshold, or running replicas drop below the expected count.
Grafana has native contact points for destinations such as Slack and PagerDuty, so alert routing does not require a separate notification product.
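The queries behind those alerts are ordinary PromQL against the exporters above. As a sketch using node_exporter’s standard Linux metric names:

# Percentage of CPU time spent busy, averaged per node over 5 minutes
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Percentage of memory still available per node
100 * node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

In Grafana, each expression becomes the query of an alert rule with a threshold condition, for example firing when CPU busy stays above 80 for ten minutes, routed to a Slack or PagerDuty contact point.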
Monitoring Docker Swarm with Dokploy
If building and maintaining that monitoring stack feels like a second project, Dokploy is the natural next option.
Dokploy offers a built-in monitoring section with:
- Real-time server and container metrics
- A 20-second default refresh interval
- Retention settings managed by an automated cron job
- Include and exclude service filtering
- Threshold-based notifications
- Token-protected metrics requests, served on port 4500 by default
For those who want the lower-level cluster detail, Dokploy’s Swarm API documents endpoints for swarm.getNodes, swarm.getNodeInfo, and swarm.getNodeApps with x-api-key authentication.
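The call shape looks roughly like the following; the host is a placeholder, and the exact base path is an assumption to verify against Dokploy’s API reference:

# Hypothetical endpoint path; verify against your instance's API docs
curl -H "x-api-key: $DOKPLOY_API_KEY" \
  "https://dokploy.example.com/api/swarm.getNodes"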

Dokploy's built-in monitoring dashboard is available on both self-hosted and Cloud deployments. The main distinction is that threshold-based alert notifications are currently a Cloud-only feature: self-hosted instances have full access to the metrics dashboard and configuration options, but alert routing to notification platforms requires the Cloud plan.
Conclusion
Docker Swarm monitoring only works when you treat the cluster as the unit of truth. You need cluster-wide metrics for CPU, memory, disk, network, replicas, restarts, and scheduling failures.
You also need centralized logging so the same dashboard that shows a problem can lead you to the reason it happened. Your choice of setup could be the difference between catching issues early and spending your time firefighting outages.
If you want to avoid building and maintaining a full observability stack from scratch, Dokploy is the faster path to evaluate. It gives you a documented monitoring workflow and a clear extension path, so you can get real-time swarm monitoring working with less operational setup than stitching Prometheus, exporters, dashboards, and routing together yourself.
Docker Swarm monitoring FAQs
What is Docker Swarm monitoring?
Docker Swarm monitoring is the practice of collecting and analyzing cluster-wide metrics, service health data, and logs across a Docker Swarm cluster so you can detect node failures, restart storms, replica drift, and resource pressure before they become outages.
What are the best Docker Swarm monitoring tools?
The most common open source stack is Prometheus, cAdvisor, node_exporter, and Grafana. Swarmprom is a popular way to deploy that stack on Docker Swarm with less manual setup. Dokploy is a good PaaS solution with built-in monitoring.
How do Docker Swarm logging and monitoring work together?
It works by routing logs emitted by containers on each node into a central collector using a remote-capable Docker logging driver such as syslog, gelf, or fluentd; the default json-file driver keeps logs on the local host. Metrics show that something broke. Centralized logs help you analyze why.
Is there an open source Docker Swarm monitoring option?
Yes. Dokploy is open source and includes a built-in monitoring dashboard in both its self-hosted and Cloud versions. A common open source stack is Prometheus, cAdvisor, node_exporter, and Grafana. Swarmprom packages these together as a ready-to-deploy Docker Swarm stack.