Container Monitoring Tools: 6 Great Tools and How to Choose

Container monitoring tools can help developers and DevOps teams monitor activity of running containers, collect logs from containers, and analyze data to provide observability and insight into container performance.

A key challenge in container monitoring is that containers are transient resources: they are destroyed when their work is complete or when the application scales down. Therefore, log data must be continuously collected and moved to a central location to prevent data loss. Container monitoring solutions provide visibility and real-time insight into these highly dynamic container environments.

This is part of a series of articles about container monitoring.

6 Container Monitoring Tools You Should Know

1. Lumigo

Lumigo is a cloud native observability tool, purpose-built to navigate the complexities of microservices. Through automated distributed tracing, Lumigo stitches together the many components of a containerized application and tracks every service in a request. When an error or failure occurs, you see not only the impacted service but the entire request in one visual map, so you can easily understand the root cause, limit the impact, and prevent future failures.

With deep debugging data into applications and infrastructure, developers have all the information they need to monitor and troubleshoot their containers without any of the manual work:

Automatic correlation of logs, metrics and traces into end-to-end visualization of requests and full system map of applications
Monitor and debug third-party APIs and managed services (e.g., Amazon DynamoDB, Twilio, Stripe)
Go from alert (in Slack, PagerDuty and other workflow tools) to root cause analysis in one click
Understand system behavior and explore performance and cost issues

Get started with a free trial of Lumigo for your microservice applications

2. Prometheus

License: Apache 2.0

GitHub: https://github.com/prometheus/prometheus

Prometheus is a mature and very popular open source tool for monitoring dynamic containerized environments. It is one of the projects that have received Graduated status from the Cloud Native Computing Foundation (CNCF). Prometheus was originally created and open sourced by SoundCloud to simplify the process of retrieving numerical metrics from a given metric endpoint and organizing them into time series.

Prometheus has three main components:

Exporters are self-contained processes that run alongside target resources and expose their metrics over an HTTP endpoint (conventionally /metrics).
Prometheus server performs service discovery, scrapes the metrics from exporters, stores them in its time series database for later visualization, and evaluates alerting rules against the stored data.
Alertmanager receives the alerts fired by the Prometheus server, then deduplicates, groups, and routes them as notifications to receivers such as email, Slack, or PagerDuty.
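
To make the exporter side concrete, here is a minimal sketch of the plain-text format that a Prometheus server reads when it scrapes an exporter. It uses only the Python standard library; the function name render_metrics and the metric container_memory_bytes are illustrative assumptions, and a real exporter would normally rely on an official client library instead.

```python
from typing import Dict, Iterable, Tuple

# One sample: (metric name, labels, value). All names are illustrative.
Sample = Tuple[str, Dict[str, str], float]


def render_metrics(help_text: Dict[str, str], samples: Iterable[Sample]) -> str:
    """Render gauge samples in the Prometheus text exposition format.

    Assumes samples are already grouped by metric name and that label
    values need no escaping (no quotes, backslashes, or newlines).
    """
    lines = []
    seen = set()
    for name, labels, value in samples:
        if name not in seen:
            # HELP and TYPE lines are written once, before the first sample.
            seen.add(name)
            lines.append(f"# HELP {name} {help_text.get(name, '')}".rstrip())
            lines.append(f"# TYPE {name} gauge")
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


print(render_metrics(
    {"container_memory_bytes": "Memory in use."},
    [("container_memory_bytes", {"container": "web"}, 1024.0)],
))
```

Serving this text over HTTP at an address the Prometheus server can reach is all that scraping requires in this sketch.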

Prometheus has become a de facto standard for monitoring cloud-native architectures. It is known for its simple service discovery, ease of use, powerful alerting capabilities, and strong integration with Kubernetes.

However, many in the industry find the pull-based architecture of Prometheus problematic, because every metric endpoint must be reachable from the Prometheus server. For short-lived jobs that cannot be scraped reliably, Prometheus provides the Pushgateway: jobs push their metrics to the gateway, and the server scrapes the gateway in their place.
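
As a rough illustration of the push side, the sketch below builds (but does not send) an HTTP request that hands one metric to a Pushgateway. The host pushgateway.example, the job name nightly_backup, the instance batch-1, and the helper build_push_request are all hypothetical; the sketch also assumes job and instance names are simple strings without slashes.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway_url: str, job: str, instance: str,
                       payload: str) -> urllib.request.Request:
    """Build a PUT request that replaces the metrics of one job/instance group."""
    path = f"/metrics/job/{quote(job, safe='')}/instance/{quote(instance, safe='')}"
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + path,
        data=payload.encode("utf-8"),  # body uses the text exposition format
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


req = build_push_request(
    "http://pushgateway.example:9091",  # hypothetical address
    "nightly_backup",
    "batch-1",
    "backup_duration_seconds 42.5\n",
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; left out so the sketch runs offline.
```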

3. cAdvisor

License: Apache 2.0

GitHub: https://github.com/google/cadvisor

Google’s Container Advisor (cAdvisor) is an open source tool for monitoring Docker containers. It is a runtime daemon that collects, aggregates, and exports resource usage and performance data for target containers.

cAdvisor is useful for monitoring resource isolation parameters, tracking historical resource usage, and generating histograms of historical data. This data is kept for each container, which makes it easy to analyze historical performance.

The cAdvisor build is provided as an image that can be installed on a Docker host. cAdvisor provides two interfaces:

A web UI—for users who want to monitor Docker containers directly.
A REST API—for users who want to integrate metrics with external applications through web service endpoints.
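
A common first step when consuming container stats through an API like this is turning a cumulative CPU counter into a usage rate. The sketch below assumes two samples of a cumulative CPU time counter in nanoseconds; the field names timestamp_s and cpu_total_ns are simplified stand-ins chosen for this example, not the field names of the actual cAdvisor payload.

```python
from typing import Mapping


def cpu_cores_used(prev: Mapping[str, float], curr: Mapping[str, float]) -> float:
    """Average number of CPU cores used between two counter samples."""
    elapsed_s = curr["timestamp_s"] - prev["timestamp_s"]
    if elapsed_s <= 0:
        raise ValueError("samples must be in increasing time order")
    used_ns = curr["cpu_total_ns"] - prev["cpu_total_ns"]
    # CPU nanoseconds consumed per wall-clock nanosecond equals cores in use.
    return used_ns / (elapsed_s * 1e9)


earlier = {"timestamp_s": 100.0, "cpu_total_ns": 5_000_000_000}
later = {"timestamp_s": 110.0, "cpu_total_ns": 10_000_000_000}
print(cpu_cores_used(earlier, later))  # half a core on average
```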

4. Grafana

License: AGPL-3.0 (GNU Affero General Public License v3)

GitHub: https://github.com/grafana/grafana

Grafana is an open source metric analysis and visualization suite. It lets you build custom dashboards with data from multiple sources, including Prometheus, Elasticsearch, MySQL, Postgres, and Redis. Grafana also has its own alerting system and a role-based access control (RBAC) system.

Grafana is well known to Prometheus users, due to its ability to effectively visualize metrics stored in Prometheus. Grafana has dozens of custom dashboards, some official and many created by the community, built for many types of data sources. This makes it easy for users to set up dashboards and start monitoring a variety of metrics.

5. Elasticsearch & Kibana

License: Elastic License 2.0, SSPL, or AGPL-3.0 (Apache 2.0 before version 7.11)

GitHub: https://github.com/elastic/elasticsearch

Elasticsearch is an open source search engine based on the Lucene library. It provides a distributed, multi-tenant full-text search engine with an HTTP web interface and schemaless JSON documents. Elasticsearch was written in Java, and makes it easy to store, search, and analyze data at scale.

Kibana is a free, open user interface for visualizing Elasticsearch data and exploring the Elastic Stack. Its uses range from tracking query load to understanding how requests flow through an application.

Kibana’s core comes with basic visualizations like histogram, line, pie and sunburst, and lets you search any document.

Put together, Elasticsearch and Kibana provide a flexible backend for monitoring Docker container logs. However, like Prometheus and Grafana, they require initial setup and configuration steps, as well as ongoing upgrades and maintenance. This can be time consuming, especially if you are not familiar with these tools.
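
To show what shipping container logs as schemaless JSON documents can look like, the sketch below serializes log records into the newline-delimited JSON body accepted by the Elasticsearch bulk endpoint (POST /_bulk with the application/x-ndjson content type). The index name container-logs, the record fields, and the helper build_bulk_body are assumptions made for this example; sending the request is left out.

```python
import json
from typing import Iterable, Mapping


def build_bulk_body(index: str, docs: Iterable[Mapping[str, object]]) -> str:
    """Serialize records as an action line plus a document line per record."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # document line
    # The bulk body must end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""


body = build_bulk_body("container-logs", [
    {"container": "web", "level": "info", "message": "started"},
    {"container": "web", "level": "error", "message": "upstream timeout"},
])
print(body, end="")
```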

6. Jaeger

License: Apache 2.0

GitHub: https://github.com/jaegertracing/jaeger

Jaeger is an end-to-end distributed tracing solution, open sourced by Uber Engineering, and now a Graduated project in the CNCF. It allows you to monitor and troubleshoot transactions in complex distributed systems.

The main challenge addressed by Jaeger is the difficulty of observability in modern microservices architectures. For example, when a service fails, it is often unclear how requests traveled between services over the network to complete a single business transaction, which makes debugging very difficult.

Jaeger uses tracing to enable root cause analysis, performance and latency optimization, and distributed transaction monitoring. It also integrates with Istio, a popular open source service mesh solution.
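
The idea behind trace-based root cause analysis can be sketched without any tracing library. In the conceptual model below, each span records its trace ID, its own ID, and its parent's ID, which is enough to rebuild the path a failing request took. The Span class and the failing_path function are hypothetical names for this illustration and are not part of the Jaeger API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a trace
    service: str
    error: bool = False


def failing_path(spans: List[Span], trace_id: str) -> List[str]:
    """Services from the root span down to the deepest span marked as failed."""
    by_id: Dict[str, Span] = {s.span_id: s for s in spans if s.trace_id == trace_id}

    def chain(span: Span) -> List[str]:
        path = [span.service]
        seen = {span.span_id}  # guards against malformed, cyclic parent links
        while span.parent_id in by_id and span.parent_id not in seen:
            span = by_id[span.parent_id]
            seen.add(span.span_id)
            path.append(span.service)
        return list(reversed(path))

    chains = [chain(s) for s in by_id.values() if s.error]
    return max(chains, key=len, default=[])


spans = [
    Span("t1", "a", None, "gateway", error=True),
    Span("t1", "b", "a", "orders", error=True),
    Span("t1", "c", "b", "payments", error=True),
    Span("t1", "d", "a", "catalog"),
]
print(failing_path(spans, "t1"))
```

Here the error surfaces at the gateway, but following the parent links shows that it originated in the payments service.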

How to Choose the Right Monitoring Tool for You

There are several open source and commercial container observability tools available. Most of them can effectively log, monitor, and trace containerized environments. The main difference between them is the level of effort required to set up, configure, and maintain the solution.

Here are key considerations for choosing the right monitoring tool:

Coverage of collected metrics: some tools collect only a few metrics, some collect many metrics you don’t really need, and others let you customize collection to add the metrics you require. When you are troubleshooting production under pressure, missing metrics are frustrating; at the same time, too many indicators make it hard to find the important signals. Look for a tool with a sensible set of default metrics and the ability to add custom ones.
Coverage of log formats: a typical application stack consists of several components, such as a database, a web server, and a message queue. Make sure you can collect logs from your applications, not just from the containers themselves. This is an important logging best practice that lets you gain insights from applications as well as troubleshoot them.
Collection of events: knowing why a service is restarting or crashing allows you to quickly triage the problem and find the root cause faster. Therefore, prefer container monitoring tools that can collect events directly from container runtimes and can consume Kubernetes health events.
Correlation of metrics, logs, and traces: a good container monitoring tool provides easy access to all observability data, whether it was collected through metrics, logs, or traces. A single UI that displays data from different sources is the key to interactive drill-down, quick troubleshooting, and fast recovery from production problems.
Machine learning and anomaly detection: threshold-based alerts are only effective for known, steady workloads, and they can generate too much noise in a dynamic environment. Make sure the solution you choose offers basic threshold alerts but also supports anomaly detection based on machine learning, which is more flexible and produces fewer false positives. Ensure the system doesn’t take too long to learn the baseline and does not require significant tweaking and training.
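
The difference between a fixed threshold and a learned baseline can be illustrated with a small sketch. The rolling z-score below is a simple statistical stand-in for anomaly detection, not a machine learning model, and the window size, z-score limit, and minimum deviation are arbitrary assumptions chosen for the example data.

```python
from statistics import mean, stdev
from typing import List, Sequence


def threshold_alerts(values: Sequence[float], limit: float) -> List[int]:
    """Indexes of samples that exceed a fixed limit."""
    return [i for i, v in enumerate(values) if v > limit]


def zscore_alerts(values: Sequence[float], window: int = 5,
                  z_limit: float = 3.0, min_sigma: float = 1.0) -> List[int]:
    """Indexes of samples that deviate strongly from the preceding window."""
    alerts = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        # Floor the deviation so a perfectly flat baseline does not divide by zero.
        sigma = max(stdev(baseline), min_sigma)
        if abs(values[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


cpu_percent = [20, 22, 21, 19, 20, 21, 60, 21, 20]
print(threshold_alerts(cpu_percent, limit=80))  # the spike stays under the limit
print(zscore_alerts(cpu_percent))               # the spike stands out from the baseline
```

In this data the jump to 60 percent never crosses the 80 percent threshold, yet it is far outside what the preceding samples suggest is normal, which is the kind of signal baseline-driven detection is meant to catch.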
