6 Essential Kubernetes Monitoring Best Practices

What Is Kubernetes Monitoring?

Kubernetes monitoring is the practice of observing the components of a Kubernetes cluster, including the nodes, pods, containers, and services, as well as the overall health of the cluster itself. It is typically done with a combination of tools and technologies: specialized monitoring tools such as Prometheus and Grafana, and logging tools such as Fluentd and Elasticsearch.

Kubernetes monitoring helps you ensure that your applications are running smoothly and that any issues or failures are detected and resolved quickly. By monitoring your Kubernetes cluster, you can identify resource constraints, service failures, and application crashes, and take steps to address these issues and others before they impact your users or your business. 

This is part of a series of articles about Kubernetes monitoring.

Stop Measuring Individual Containers

While monitoring individual containers can provide valuable insights, it’s important to focus on service-level or application-level metrics for a more complete view of your application’s health. This means tracking metrics such as request latency, error rates, and resource usage at the service or application level. 

By focusing on these higher-level metrics, you can get a better understanding of how your application is performing as a whole and identify issues that may be impacting multiple containers or services.

Always Alert on High Disk Usage

Disk space can quickly become a bottleneck in Kubernetes, especially when running large-scale applications. Monitoring disk usage and setting up alerts when disk usage reaches a certain threshold can help you avoid running out of disk space, which can cause applications to fail or crash. It’s recommended to set up alerts that trigger when disk usage reaches 75% or higher, giving you time to address the issue before it becomes critical.
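A sketch of the threshold check itself, using Python's standard library, might look like the following. In a real cluster this check would run per node (for example via a DaemonSet) and feed an alerting system rather than printing; the 75% threshold matches the recommendation above.

```python
import shutil

DISK_ALERT_THRESHOLD = 0.75  # alert at 75% usage, per the practice above

def disk_usage_fraction(path="/"):
    """Return the fraction of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def should_alert(used_fraction, threshold=DISK_ALERT_THRESHOLD):
    return used_fraction >= threshold

if should_alert(disk_usage_fraction("/")):
    print("ALERT: disk usage at or above 75%")
```

Keeping the threshold below 100% buys you time: the alert fires while there is still headroom to clean up logs, expand volumes, or reschedule workloads.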

Create an Effective Instrumentation Strategy

An instrumentation strategy is a plan for collecting and analyzing data about your application’s performance. It involves setting up monitoring tools and collecting metrics that can help you gain insights into how your application is running and where there may be issues or opportunities for optimization.

When creating an instrumentation strategy for Kubernetes, there are a few key factors to consider:

  • Metrics: Determine which metrics you want to collect and how often you want to collect them. This may include CPU and memory usage, network traffic, request latency, error rates, and more.
  • Monitoring tools: Decide which monitoring tools you want to use to collect and analyze your metrics. Popular choices for Kubernetes monitoring include Prometheus, Grafana, and Elasticsearch, among others.
  • Data storage: Determine where you want to store your monitoring data. This may include a time-series database, such as Prometheus or InfluxDB, or a log aggregator, such as Elasticsearch or Splunk.
  • Alerting: Decide which metrics you want to set up alerts for and how you want to be notified when an alert is triggered. This may include sending alerts to a chat platform, such as Slack, or sending an email or SMS message.
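To make the metrics side of this concrete, here is a minimal in-process metrics registry, a hypothetical stand-in for a real client library such as prometheus_client, instrumenting an imaginary request handler with a counter and a latency histogram-style sample list:

```python
from collections import defaultdict

class Metrics:
    """Minimal in-process metrics registry (a stand-in for a real
    client library such as prometheus_client)."""
    def __init__(self):
        self.counters = defaultdict(int)
        self.samples = defaultdict(list)

    def inc(self, name, value=1):
        self.counters[name] += value

    def observe(self, name, value):
        self.samples[name].append(value)

metrics = Metrics()

def handle_request(ok=True, latency_s=0.05):
    # Instrumentation wrapped around a hypothetical request handler.
    metrics.inc("requests_total")
    if not ok:
        metrics.inc("errors_total")
    metrics.observe("request_latency_seconds", latency_s)

for ok in (True, True, False, True):
    handle_request(ok=ok)

error_rate = metrics.counters["errors_total"] / metrics.counters["requests_total"]
print(f"error rate: {error_rate:.0%}")  # here: 25%
```

A real deployment would expose these values for scraping (Prometheus), ship them to a time-series database, and attach alert rules, which are the remaining bullets in the list above.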

By creating a comprehensive instrumentation strategy for your Kubernetes cluster, you can gain valuable insights into how your application is performing and identify issues before they impact your users or your business. It can also help you optimize resource usage and ensure that your application is running at peak efficiency.

Monitor End-User Experience

Monitoring the end-user experience of your application can help you identify issues that may not be apparent from system-level metrics. For example, you may notice high error rates or slow page load times for a particular user segment, indicating a problem with a specific feature or component. It often involves monitoring application logs and analyzing user behavior to gain insights into how the application is being used and how it can be improved.
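As a hedged illustration of segment-level analysis, the sketch below computes median page load time per user segment from hypothetical parsed log records. The numbers are invented; the point is that a per-segment breakdown can expose a problem that aggregate cluster metrics would hide.

```python
from statistics import median

# Hypothetical parsed application-log records: (user_segment, page_load_ms)
records = [
    ("mobile", 1800), ("mobile", 2400), ("mobile", 2100),
    ("desktop", 600), ("desktop", 750), ("desktop", 640),
]

def load_time_by_segment(records):
    """Group page load times by segment and report the median of each."""
    by_segment = {}
    for segment, ms in records:
        by_segment.setdefault(segment, []).append(ms)
    return {seg: median(vals) for seg, vals in by_segment.items()}

print(load_time_by_segment(records))
# In this made-up data, mobile's median load time is over 3x desktop's,
# pointing at a segment-specific problem worth investigating.
```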

Monitor the Cloud Environment

When running Kubernetes in a cloud environment, you may have additional considerations to keep in mind, such as the use of load balancers, auto-scaling, and cloud-specific monitoring tools. Some best practices for monitoring Kubernetes in a cloud environment include:

  • Monitor cloud infrastructure metrics: In addition to Kubernetes metrics, you should also monitor the cloud infrastructure that supports your Kubernetes cluster, such as load balancers, virtual machines (VMs), and storage resources. Many cloud providers offer their own monitoring tools, such as Amazon CloudWatch or Google Cloud Monitoring, that can be used to monitor cloud infrastructure metrics.
  • Monitor auto-scaling: It’s important to monitor auto-scaling events to ensure that your cluster is scaling up and down appropriately. This can help you optimize resource usage and avoid running out of resources during peak demand.
  • Monitor security: Security is a critical consideration when running Kubernetes in a cloud environment. You should monitor your cluster for potential security threats, such as unauthorized access or network vulnerabilities. Cloud providers often offer security monitoring tools, such as AWS CloudTrail or Google Cloud Security Command Center, that can be used to monitor security events.
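The auto-scaling check in particular can be sketched simply: compare desired replica counts against ready replicas and flag any lag. The snapshot below is hypothetical; in practice it would be read from the Kubernetes API or a cloud provider's monitoring API.

```python
# Hypothetical snapshot of deployment scaling state.
deployments = [
    {"name": "api",    "desired": 10, "ready": 10},
    {"name": "worker", "desired": 8,  "ready": 5},
]

def scaling_issues(deployments):
    """Flag deployments whose ready replicas lag the desired count."""
    return [d["name"] for d in deployments if d["ready"] < d["desired"]]

print(scaling_issues(deployments))  # ['worker'] has not finished scaling up
```

A persistent gap between desired and ready replicas during peak demand is exactly the kind of signal that deserves an alert before users feel it.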

By monitoring both your Kubernetes cluster and the underlying cloud infrastructure, you can gain valuable insights into how your application is performing in a cloud environment and identify issues before they impact your users or your business.

Use Out-of-the-box Dashboards and Alerts

Dashboards are visual representations of your monitoring data that provide a quick and easy way to view key performance indicators (KPIs) and identify issues. They often include graphs, charts, and tables that display metrics such as CPU usage, memory usage, network traffic, and more. Alerts are notifications that are triggered when certain conditions are met, such as when CPU usage exceeds a certain threshold or when a pod is in a non-running state. 

Out-of-the-box dashboards and alerts are pre-built monitoring tools that come with most Kubernetes monitoring solutions. They provide a starting point for monitoring your Kubernetes cluster and can help you quickly get up and running with your monitoring strategy. You can customize these dashboards and alerts to suit your specific use case and ensure that you’re monitoring the metrics that are most important to your application’s performance.
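The two alert conditions mentioned above, CPU above a threshold and a pod in a non-running state, can be sketched as a simple rule evaluation. The pod snapshot and the 80% threshold are assumptions for illustration; real values would come from the Kubernetes API or a monitoring agent.

```python
# Hypothetical pod status snapshot.
pods = [
    {"name": "web-1", "phase": "Running", "cpu_pct": 35},
    {"name": "web-2", "phase": "Running", "cpu_pct": 92},
    {"name": "job-1", "phase": "Pending", "cpu_pct": 0},
]

CPU_THRESHOLD = 80  # fire when CPU usage exceeds 80%

def evaluate_alerts(pods):
    """Return human-readable alerts for the two example conditions."""
    alerts = []
    for pod in pods:
        if pod["cpu_pct"] > CPU_THRESHOLD:
            alerts.append(f"{pod['name']}: high CPU ({pod['cpu_pct']}%)")
        if pod["phase"] != "Running":
            alerts.append(f"{pod['name']}: not running ({pod['phase']})")
    return alerts

for alert in evaluate_alerts(pods):
    print(alert)
```

Out-of-the-box alert rules in monitoring solutions encode conditions like these for you; customizing them is mostly a matter of adjusting thresholds and notification routing.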

Kubernetes Monitoring with Lumigo

Lumigo is a troubleshooting and monitoring platform, purpose-built for microservice-based applications. Developers using Kubernetes to orchestrate their containerized applications can use Lumigo to monitor, trace, and troubleshoot issues fast. Deployed with zero code changes and automated in one click, Lumigo stitches together every interaction between microservices and managed services into end-to-end stack traces. These traces, served alongside request payload data, give developers complete visibility into their container environments. Using Lumigo, developers get:

  • End-to-end virtual stack traces across every microservice and managed service that makes up a distributed application, in context
  • API visibility that makes all the data passed between services available and accessible, making it possible to perform root cause analysis without digging through logs
  • Distributed tracing that is deployed with no code changes and automated in one click
  • Unified platform to explore and query across microservices, see a real-time view of applications, and optimize performance

To try Lumigo for Kubernetes, check out our Kubernetes operator on GitHub.
