Container monitoring is a way to collect metrics and track the health of containerized applications and microservices architectures. This process can be difficult due to the ephemeral nature of containers and the limitations of traditional application performance monitoring tools.
The goal of container monitoring is to ensure that container workloads are performing as expected and running smoothly. Container monitoring is a subset of container observability, which also includes log analysis, notifications, and tracing.
This is part of an extensive series of guides about [microservices].
As containers grow in popularity, new challenges emerge that can expose businesses to financial damages, lost productivity, and regulatory penalties. Monitoring can help with early identification of performance issues, software bugs, and security incidents, and provide the information teams need to remediate these issues.
Container monitoring also helps limit the impact of outages by reducing the mean time to recovery (MTTR) for performance issues, and it provides data on the overall health of your applications. The ability to automatically raise alerts, monitor time series data, and troubleshoot issues improves the user experience and, ultimately, business outcomes.
Monitoring provides a holistic view of containerized infrastructure. Through data aggregation and visualization, you can optimize functionality and better identify the root cause of performance issues.
Logging is an essential part of any computing system. Without logging, it is difficult to identify and troubleshoot problems. A log is a record of events that occurred over a period of time.
Logging is a mechanism for capturing and recording information about a program so that it can be monitored and debugged, while the program is running, and afterwards. Logging helps developers understand what their code is doing, and helps application owners and maintainers understand how systems operate and behave at runtime.
There are several logging methods for containers, including Docker’s logging driver, Fluentd, and rsyslog. These tools can be used to troubleshoot container or log data issues.
Observability builds on logging to provide deep visibility into distributed systems, automating and accelerating problem identification and resolution. Observability platforms continuously instrument and collect telemetry data, and make it easy to add instrumentation to existing applications and infrastructure components.
Learn more in our detailed guide to container observability (coming soon)
Containers add a new dynamic layer to your infrastructure. Performance monitoring tools should be able to automatically discover all running containers, pick up container deployment changes immediately, and update their mapping of containers to hosts in real time.
A container orchestration tool, such as Kubernetes, deploys containers on the most appropriate hosts in your cluster. Containers can be rescheduled from one host to another, for example when a workload scales horizontally. This means that monitoring tools need to determine which host is running which container, and be able to persist log data even after containers shut down.
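The host-tracking requirement above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular monitoring product works: the `PlacementTracker` class, its method names, and the event shape (container ID, host, timestamp) are all hypothetical. The point it shows is that placement history is kept after a container stops, so older log lines can still be attributed to the right host.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Placement:
    container_id: str
    host: str
    started_at: float
    stopped_at: Optional[float] = None


class PlacementTracker:
    """Hypothetical tracker: records which host ran which container, and when."""

    def __init__(self) -> None:
        self.running: Dict[str, Placement] = {}  # containers currently alive
        self.history: List[Placement] = []       # containers that have stopped

    def on_start(self, container_id: str, host: str, ts: float) -> None:
        self.running[container_id] = Placement(container_id, host, ts)

    def on_stop(self, container_id: str, ts: float) -> None:
        placement = self.running.pop(container_id, None)
        if placement is not None:
            placement.stopped_at = ts
            self.history.append(placement)  # keep the record after shutdown

    def host_at(self, container_id: str, ts: float) -> Optional[str]:
        """Return the host that ran the container at time ts, if any."""
        for p in list(self.running.values()) + self.history:
            if p.container_id != container_id:
                continue
            if p.started_at <= ts and (p.stopped_at is None or ts <= p.stopped_at):
                return p.host
        return None


tracker = PlacementTracker()
tracker.on_start("web-1", "node-a", 10.0)
tracker.on_stop("web-1", 50.0)
tracker.on_start("web-2", "node-b", 55.0)
```

In a real deployment the start and stop events would come from the orchestrator's event stream rather than from manual calls.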
Containers use multiple tiers of resources, and it can be complex to monitor and identify resource utilization issues. Containers often run within clusters managed by an orchestrator like Kubernetes. In this case, there are multiple resource constraints:
When containerized applications experience performance issues or downtime, it can sometimes be difficult to pinpoint the cause, because these problems can originate from one or even several components in the containerized environment.
Unlike traditional application logs, container logs use console output streams called stdout and stderr. Containers use a logging driver to collect logs and deliver them to a destination.
When containers run in a cluster, each container has its own stdout and stderr log streams. To monitor application logs, you must parse these streams individually and combine them to get a full picture of the cluster. It is also important to identify the source of each log line (i.e., which container it belongs to) and to add the necessary metadata, such as container ID and container name.
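As a rough sketch of the combine-and-tag step described above, the following Python example attaches container metadata to each line and merges several time-ordered streams into one timeline. The `LogRecord` fields, the function names, and the sample data are illustrative assumptions; a real pipeline would read from a logging driver or a log shipper instead of in-memory lists.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class LogRecord:
    timestamp: float       # seconds since some common epoch
    container_id: str
    container_name: str
    stream: str            # "stdout" or "stderr"
    message: str


def tag_lines(
    lines: Iterable[Tuple[float, str]],
    container_id: str,
    container_name: str,
    stream: str,
) -> Iterator[LogRecord]:
    """Attach source metadata to raw (timestamp, message) pairs."""
    for timestamp, message in lines:
        yield LogRecord(timestamp, container_id, container_name, stream, message)


def merge_streams(*streams: Iterable[LogRecord]) -> Iterator[LogRecord]:
    """Merge streams that are each already sorted by timestamp."""
    return heapq.merge(*streams, key=lambda record: record.timestamp)


# Hypothetical sample data for two containers.
web_out = tag_lines([(1.0, "GET /"), (3.0, "GET /cart")], "a1b2", "web", "stdout")
db_err = tag_lines([(2.0, "slow query")], "c3d4", "db", "stderr")
timeline = list(merge_streams(web_out, db_err))
```

Because `heapq.merge` is lazy, the same approach works for long-running streams, provided each input stream is already in time order.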
Learn more in our detailed guide to container logging (coming soon)
Distributed tracing is a technique for tracking the execution of a distributed system, such as a microservices architecture or a containerized environment. It allows administrators to understand how different components of the system interact with each other and how requests are processed as they flow through the system.
Distributed tracing is important for container monitoring because it can help administrators understand the performance and behavior of containerized applications and the underlying infrastructure. By tracking the execution of a distributed system, administrators can identify bottlenecks and latency issues, understand the impact of changes to the system, and troubleshoot problems more effectively.
Distributed tracing works by inserting trace instrumentation into the code of the system being monitored. This instrumentation generates trace data as the system executes, which is then collected and analyzed by the tracing system. Distributed tracing tools typically provide features such as visualization and analysis, which can help administrators understand and analyze the trace data.
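To make the idea of trace instrumentation concrete, here is a deliberately simplified, single-threaded Python sketch. The `Tracer` and `Span` classes are hypothetical and are not the API of any tracing library. They only illustrate how instrumented code produces spans that share a trace ID and point to their parent span.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    trace_id: str              # shared by every span in one request
    span_id: str               # unique to this unit of work
    parent_id: Optional[str]   # span that caused this one, if any
    name: str
    start: float
    end: Optional[float] = None


class Tracer:
    """Toy tracer: collects finished spans in memory."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []  # open spans; assumes a single thread

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        current = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            name=name,
            start=time.monotonic(),
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.monotonic()
            self._stack.pop()
            self.finished.append(current)


tracer = Tracer()
with tracer.span("checkout") as root:
    with tracer.span("charge-card") as child:
        pass  # the instrumented work would happen here
```

In a real distributed system the trace ID and parent span ID travel between services, typically in request headers, and finished spans are exported to a tracing backend instead of an in-memory list.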
Container monitoring tools monitor running containers, collect container activity logs, and analyze data to provide insight into container performance. These tools typically provide the following features and capabilities:
Benefits that container monitoring solutions provide include:
Learn more in our detailed guide to container monitoring tools (coming soon)
A good monitoring system should provide an overview of the entire application with information about each component. Here are a few things to consider when choosing a container monitoring solution:
Large enterprises may need to use more than one tool to monitor different containerized applications. Before deciding which container monitoring tool or combination of tools is right for your business, it’s important to identify the metrics you need to monitor, and your proposed continuous monitoring workflow.
The biggest difference between containers and virtual machines in terms of monitoring is the need to shift the focus from individual containers to a pod or an entire cluster. Containerized applications are often built as microservices. As such, individual containers can only account for a small portion of the infrastructure’s performance.
In most cases, you run multiple containers of the same microservice for availability and scalability. As a result, viewing containers individually can produce misleading information, so focus on monitoring a specific set of containers as a unit. That said, you will sometimes need to see metrics for a specific container to debug a specific issue.
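One way to picture monitoring a set of containers as a unit is to roll per-container samples up by service. The sketch below assumes a hypothetical sample format (a dict with `service`, `container_id`, and `cpu_percent` keys). It is an illustration of the grouping idea, not a prescribed data model.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def aggregate_by_service(samples: Iterable[dict]) -> Dict[str, dict]:
    """Summarize CPU usage per service instead of per container."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for sample in samples:
        grouped[sample["service"]].append(sample["cpu_percent"])
    return {
        service: {
            "containers": len(values),
            "avg_cpu": mean(values),
            "max_cpu": max(values),  # keeps outliers visible for debugging
        }
        for service, values in grouped.items()
    }


# Hypothetical samples: two replicas of "cart", one of "auth".
samples = [
    {"service": "cart", "container_id": "c1", "cpu_percent": 20.0},
    {"service": "cart", "container_id": "c2", "cpu_percent": 80.0},
    {"service": "auth", "container_id": "c3", "cpu_percent": 10.0},
]
summary = aggregate_by_service(samples)
```

Keeping the maximum next to the average is one simple way to retain a pointer to the individual container that may need closer inspection.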
Traffic between containers
For containers, network traffic is much more complex than in monolithic applications. It’s important to understand how container network traffic flows, and monitor it accordingly.
Network traffic between containers on the same machine is just as significant as network traffic between different machines. In some cases, much more traffic flows from one container to another than from one machine to another. Therefore, it is essential to monitor traffic between containers, whether they are on the same machine or on different machines.
API traffic
Another aspect of container network monitoring is that microservices typically communicate with each other via a REST API. When monitoring HTTP response codes, it’s important to note that a high number of 5xx errors doesn’t necessarily mean your customers are directly impacted, because these errors can occur in container-to-container traffic.
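The distinction between customer-facing and container-to-container errors can be illustrated by computing the 5xx rate separately for each origin. In this sketch the `origin` label (`"external"` or `"internal"`) is an assumption: it presumes your request data already records whether a call came from outside the cluster or from another service.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def error_rates(requests: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return the share of 5xx responses per request origin."""
    totals: Counter = Counter()
    errors: Counter = Counter()
    for origin, status in requests:
        totals[origin] += 1
        if 500 <= status <= 599:
            errors[origin] += 1
    return {origin: errors[origin] / totals[origin] for origin in totals}


# Hypothetical traffic: errors occur only on internal calls.
requests = [
    ("external", 200),
    ("external", 200),
    ("internal", 503),
    ("internal", 200),
    ("internal", 500),
    ("internal", 200),
]
rates = error_rates(requests)
```

With this sample data, half of the internal calls fail while no external request does, which is the situation the paragraph above describes.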
Visualizing the network mesh
Finally, containers can create very complex network meshes, so it’s important to understand which microservices communicate with which microservices. This is less important for traditional infrastructure monitoring, where you could focus monitoring based on the VM name or network segment.
For containers, you need to create a full service map to visualize the traffic between your microservices. This gives you a better understanding of your network traffic and helps identify anomalous, unwanted, or malicious traffic.
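A service map is essentially a directed graph of which services talk to which. The following sketch builds one from flow records and flags edges that are not on an allow-list. The flow tuple format, the service names, and the `allowed` set are hypothetical inputs chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source service, destination service)


def build_service_map(flows: Iterable[Tuple[str, str, int]]) -> Dict[Edge, int]:
    """Sum bytes transferred for each source -> destination pair."""
    edges: Dict[Edge, int] = defaultdict(int)
    for source, destination, num_bytes in flows:
        edges[(source, destination)] += num_bytes
    return dict(edges)


def unexpected_edges(service_map: Dict[Edge, int], allowed: Set[Edge]) -> List[Edge]:
    """Return edges seen in traffic that are not expected to exist."""
    return sorted(edge for edge in service_map if edge not in allowed)


# Hypothetical flow records.
flows = [
    ("web", "cart", 1200),
    ("web", "cart", 800),
    ("cart", "db", 500),
    ("web", "db", 64),  # web is not expected to talk to db directly
]
service_map = build_service_map(flows)
anomalies = unexpected_edges(service_map, {("web", "cart"), ("cart", "db")})
```

The resulting edge dictionary can be passed to a graph visualization tool, and the allow-list comparison is one simple way to surface unwanted traffic.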
Containerized systems also require a different approach to alerting. For example, in a VM-based deployment, it is common to receive alerts when a VM restarts. This is not a good idea for containers, which are created and destroyed very frequently. Part of the container orchestrator’s job is to move containers to different nodes based on various factors.
This means that container starts and stops are not abnormal, so you usually don’t need to be notified. However, you should receive notifications if a container restarts too many times in a short period (in Kubernetes, this shows up as the CrashLoopBackOff status).
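The "alert on repeated restarts, not on every restart" rule can be sketched as a sliding-window counter. The `RestartMonitor` class and its thresholds below are hypothetical values for illustration. The sketch keys on container name, on the assumption that container IDs change when a container is recreated.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RestartMonitor:
    """Flag a container only when it restarts too often within a time window."""

    def __init__(self, max_restarts: int = 3, window_seconds: float = 600.0) -> None:
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_restart(self, container_name: str, timestamp: float) -> bool:
        """Record a restart and return True if an alert should fire."""
        events = self._events[container_name]
        events.append(timestamp)
        # Drop restarts that fell out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_restarts


monitor = RestartMonitor(max_restarts=3, window_seconds=600.0)
alerts = [monitor.record_restart("web", t) for t in (0.0, 100.0, 200.0, 300.0)]
later = monitor.record_restart("web", 2000.0)  # old restarts have expired
```

Here the first three restarts are treated as normal churn, the fourth within ten minutes triggers an alert, and a restart much later starts from a clean window.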
The same applies to resource usage. Depending on your setup, you might have autoscaling mechanisms that go into action when containers use too many resources on a machine. In this case, high resource usage is handled automatically and does not need an alert. You should be alerted when the autoscaling mechanism fails to find a suitable machine for the container, or if resource utilization is extreme, which could indicate an error or a cyber attack.
Containers are ephemeral by design. This means that the data you collect loses value quickly, which makes real-time analysis important. Monitoring tools that provide data visualization are useful here.
Monitoring tools provide a graphical interface to help uncover important changes and anomalies. There are also advanced monitoring solutions that combine machine learning (ML) with automatic alerting to support accurate and timely reporting of incidents to all parties involved.
Lumigo is a cloud native observability tool, purpose-built to navigate the complexities of microservices. Through agentless automated tracing, Lumigo stitches together asynchronous requests across the many distributed components that make up a cloud native app. From ECS to third-party APIs, Lumigo visualizes requests in one complete view and monitors every service that a request passes through. With the end-to-end observability that Lumigo provides, along with features that make debugging container apps easier, developers have what they need to find and fix errors and issues fast:
With Lumigo users can:
Get started with a free trial of Lumigo for your microservice applications