ECS Monitoring: Key Metrics and 5 Built-in Tools You Can Use

What Is Amazon Elastic Container Service (AWS ECS)?

Amazon Elastic Container Service (ECS) is a managed container orchestration service provided by Amazon Web Services (AWS). It allows developers to easily run and scale containerized applications in the cloud. ECS manages the underlying infrastructure, such as the container instances and the storage and networking resources, so that developers can focus on building and deploying their applications.

Developers can choose to run their containers on a fleet of EC2 instances registered to an ECS cluster. This gives developers more control over the underlying infrastructure, including the instance types used and features like EC2 Auto Scaling, but it also means the instances themselves need to be managed and monitored.

Alternatively, developers can use ECS with Fargate, which allows them to run their containers without having to manage any EC2 instances. Fargate automatically provisions the necessary resources for running the containers and scales them as needed. This can make it easier for developers to get started with ECS and to focus on building and deploying their applications, rather than managing the underlying infrastructure.

This is part of a series of articles about AWS ECS.

Monitoring Amazon ECS

There are several ways to monitor Amazon Elastic Container Service (ECS):

Using Amazon CloudWatch, which is a monitoring service provided by AWS. CloudWatch provides metrics and logs for ECS, such as CPU and memory usage for the container instances, the number of tasks and services running, and the number of containers that are being deployed or stopped.
Setting up CloudWatch alarms that can automatically trigger actions in response to changes in the metrics, such as sending a notification or scaling the number of container instances. This can help ensure that the ECS cluster is running smoothly and that any potential issues are addressed quickly.
Using the ECS Management Console, which provides a web-based user interface for managing and monitoring ECS clusters and tasks. The Management Console allows developers to view the status of the ECS cluster and the tasks that are running, and provides details such as the CPU and memory usage for each task.
Using the ECS API or the AWS CLI to access information about the ECS cluster and tasks programmatically, which can be useful for integrating monitoring into existing tools and processes.
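
As a rough illustration of the programmatic route, the sketch below uses the boto3 `describe_services` call and flags services that are running fewer tasks than desired. The function names `find_degraded_services` and `check_cluster` are hypothetical helpers, not part of any AWS SDK, and the live call assumes boto3 is installed and AWS credentials are configured.

```python
def find_degraded_services(response: dict) -> list:
    """Return services whose running task count is below the desired count.

    `response` is expected to have the shape of an ECS DescribeServices result.
    """
    degraded = []
    for service in response.get("services", []):
        desired = service.get("desiredCount", 0)
        running = service.get("runningCount", 0)
        if running < desired:
            degraded.append(
                {
                    "service": service.get("serviceName"),
                    "desired": desired,
                    "running": running,
                    "pending": service.get("pendingCount", 0),
                }
            )
    return degraded


def check_cluster(cluster: str, service_names: list) -> list:
    """Fetch live data with boto3 (needs AWS credentials) and summarize it."""
    import boto3  # imported lazily so the helper above works without boto3

    ecs = boto3.client("ecs")
    # DescribeServices accepts a limited number of services per call,
    # so larger clusters would need to be queried in batches.
    response = ecs.describe_services(cluster=cluster, services=service_names)
    return find_degraded_services(response)
```

Keeping the summarizing logic separate from the API call makes it easy to reuse with responses captured from the AWS CLI or from another tool.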

Monitoring ECS on EC2

When running Amazon Elastic Container Service (ECS) on Amazon Elastic Compute Cloud (EC2), there are several key metrics that you should monitor in order to ensure that your containers are running smoothly and efficiently. Some of the most important metrics to monitor include:

CPU and memory usage: You should monitor the CPU and memory usage of your EC2 instances, as well as the individual containers running on them. This will help you identify any potential performance bottlenecks and determine whether you need to scale up your EC2 instances or adjust the resource allocation for your containers.
Task and container counts: You should monitor the number of tasks and containers running on your EC2 instances, as well as their status. This will help you ensure that all of your tasks are running as expected and that you are not running out of capacity on your EC2 instances.
Cluster and service health: You should monitor the overall health of your ECS clusters and services, including the number of healthy and unhealthy tasks, the availability of your services, and any errors or issues that are reported by your tasks.
Network performance: You should monitor the network performance of your EC2 instances and containers, including the amount of data transferred, the number of connections, and any errors or latency issues. This will help you ensure that your containers have adequate network resources and are able to communicate with each other and other services as needed.

Monitoring ECS on Fargate

Monitoring Amazon Elastic Container Service (ECS) on AWS Fargate is similar to monitoring ECS on Amazon Elastic Compute Cloud (EC2), but there are some key differences to be aware of. One of the main differences is that with ECS on Fargate, you do not have access to the underlying hosts on which your containers are running. This means that you cannot use traditional server monitoring tools, such as host-level agents that collect operating system metrics, to monitor your Fargate tasks and services.

Instead, when monitoring ECS on Fargate, you must use tools and services that are specifically designed to work with containers, such as the Amazon CloudWatch Container Insights service. This service provides detailed metrics and logs for your Fargate tasks and services, including CPU and memory usage, network performance, and the number of running tasks and containers. You can use these metrics to monitor the performance and health of your Fargate environment, and set alarms and react to changes in your environment using CloudWatch.

Another key difference is that with ECS on Fargate, you do not have to worry about managing the underlying EC2 instances on which your containers are running. This means that you do not need to monitor the EC2 instances themselves, and can instead focus on monitoring the performance and health of your containers and the services that they are running.
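
To make the Container Insights approach more concrete, here is a minimal sketch that builds a CloudWatch `get_metric_data` request for one Fargate service. It assumes Container Insights is enabled on the cluster and uses the `ECS/ContainerInsights` namespace with the `CpuUtilized`, `CpuReserved`, `MemoryUtilized`, and `MemoryReserved` metrics. The helper names and the cluster and service values are placeholders.

```python
def build_fargate_queries(cluster: str, service: str, period_seconds: int = 60) -> list:
    """Build MetricDataQueries for CPU and memory use relative to reservation."""
    dimensions = [
        {"Name": "ClusterName", "Value": cluster},
        {"Name": "ServiceName", "Value": service},
    ]

    def metric(query_id: str, metric_name: str) -> dict:
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "ECS/ContainerInsights",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period_seconds,
                "Stat": "Average",
            },
            "ReturnData": False,  # only the computed percentages are returned
        }

    return [
        metric("cpu_used", "CpuUtilized"),
        metric("cpu_reserved", "CpuReserved"),
        metric("mem_used", "MemoryUtilized"),
        metric("mem_reserved", "MemoryReserved"),
        {"Id": "cpu_pct", "Expression": "100 * cpu_used / cpu_reserved",
         "Label": "CPU % of reservation"},
        {"Id": "mem_pct", "Expression": "100 * mem_used / mem_reserved",
         "Label": "Memory % of reservation"},
    ]


def fetch_last_hour(cluster: str, service: str) -> list:
    """Run the query against CloudWatch (needs boto3 and AWS credentials)."""
    import boto3
    from datetime import datetime, timedelta, timezone

    end = datetime.now(timezone.utc)
    response = boto3.client("cloudwatch").get_metric_data(
        MetricDataQueries=build_fargate_queries(cluster, service),
        StartTime=end - timedelta(hours=1),
        EndTime=end,
    )
    return response["MetricDataResults"]
```

Expressing utilization as a percentage of the reserved amount makes services of different sizes comparable on one chart.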

What Metrics to Monitor in AWS ECS

You’ll want to monitor the following metrics in your ECS clusters:

CPUReservation: This metric represents the percentage of CPU units reserved by running tasks in a cluster, relative to the total CPU units registered by the cluster's container instances. Monitoring CPUReservation can help you ensure that the cluster has enough unreserved CPU to place additional tasks and avoid resource contention.
CPUUtilization: This metric represents the percentage of CPU units in use, reported per cluster or per service. Monitoring CPUUtilization can help you identify services that may be overloading the CPU, as well as identify opportunities to optimize resource utilization.
MemoryUtilization: This metric represents the percentage of memory in use, reported per cluster or per service. Monitoring MemoryUtilization can help you identify services that reserve more memory than they need, as well as identify potential memory shortages that could impact performance.
Storage metrics: These metrics can help you identify issues with storage volumes and bind mounts. Common metrics include disk usage, disk read and write operations, disk read and write latency, and throughput.
I/O metrics: I/O metrics such as disk read and write operations, as well as network bytes sent and received, can be useful for identifying and troubleshooting performance issues related to input/output operations.
Network metrics: Network metrics such as packets sent and received, as well as network errors and retransmits, can be useful for identifying and troubleshooting network-related issues.
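
For clusters backed by EC2 instances, CloudWatch derives CPUReservation by dividing the CPU units reserved by running tasks by the CPU units registered by the container instances. The small worked example below reproduces that arithmetic. The function name and the sample numbers are illustrative only, with 1,024 CPU units corresponding to one vCPU.

```python
def cpu_reservation_percent(task_cpu_units: list, instance_cpu_units: list) -> float:
    """Percentage of registered CPU units that running tasks have reserved."""
    total_registered = sum(instance_cpu_units)
    if total_registered == 0:
        return 0.0  # no registered capacity, so nothing can be reserved
    return 100.0 * sum(task_cpu_units) / total_registered


# Two 2-vCPU instances (2048 units each) running three tasks.
reservation = cpu_reservation_percent([512, 512, 1024], [2048, 2048])
print(f"CPUReservation: {reservation:.1f}%")  # prints "CPUReservation: 50.0%"
```

A value approaching 100% suggests the scheduler may soon be unable to place new tasks, even if actual CPU utilization is low.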

5 Built-In Tools for ECS Monitoring

Container Instance Health

Amazon Elastic Container Service (ECS) includes a built-in tool called Container Instance Health, which allows you to monitor the health of your Amazon ECS container instances. Container Instance Health provides visibility into the state of your container instances, including any issues or errors that may be affecting their ability to run tasks.

To use Container Instance Health, you can log in to the Amazon ECS console and navigate to the Container Instances page. This page shows a list of all of the container instances in your ECS environment, along with their current health status and any associated issues or errors. You can use this information to quickly identify any container instances that may be experiencing problems, and take action to resolve the issues.
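
The same information can also be read through the API. The sketch below is one possible approach using boto3's `list_container_instances` and `describe_container_instances` calls. It treats an instance as suspect if it is not `ACTIVE`, if its agent is disconnected, or if the optional health status reports `IMPAIRED`. The helper names are hypothetical, and an instance in `DRAINING` state is flagged even though draining is usually intentional.

```python
def find_unhealthy_instances(response: dict) -> list:
    """Pick out container instances that look unable to run tasks.

    `response` is expected to have the shape of a DescribeContainerInstances result.
    """
    problems = []
    for instance in response.get("containerInstances", []):
        reasons = []
        if instance.get("status") != "ACTIVE":
            reasons.append("status=" + str(instance.get("status")))
        if not instance.get("agentConnected", False):
            reasons.append("agent disconnected")
        # healthStatus is only present when health details were requested.
        overall = instance.get("healthStatus", {}).get("overallStatus")
        if overall == "IMPAIRED":
            reasons.append("health=IMPAIRED")
        if reasons:
            problems.append({"instance": instance.get("ec2InstanceId"), "reasons": reasons})
    return problems


def check_container_instances(cluster: str) -> list:
    """Query ECS page by page (needs boto3 and AWS credentials)."""
    import boto3

    ecs = boto3.client("ecs")
    problems = []
    paginator = ecs.get_paginator("list_container_instances")
    for page in paginator.paginate(cluster=cluster):
        arns = page.get("containerInstanceArns", [])
        if not arns:
            continue  # describe_container_instances rejects an empty list
        response = ecs.describe_container_instances(
            cluster=cluster,
            containerInstances=arns,
            include=["CONTAINER_INSTANCE_HEALTH"],
        )
        problems.extend(find_unhealthy_instances(response))
    return problems
```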

CloudWatch

Amazon Elastic Container Service (ECS) integrates with Amazon CloudWatch, which is a monitoring service provided by AWS. With CloudWatch, you can collect and track metrics, set alarms, and automatically react to changes in your environment. You can use CloudWatch to monitor a variety of metrics for ECS, including CPU and memory usage, and the number of running tasks and containers.

To use CloudWatch to monitor your Amazon ECS environment, you can log in to the AWS Management Console and navigate to the CloudWatch dashboard. From here, you can view pre-configured dashboard widgets that show the key metrics for your ECS environment, such as the number of running tasks and containers, and the CPU and memory usage of your EC2 instances.

In addition to viewing these metrics on the dashboard, you can also use CloudWatch to create custom metrics and alarms. For example, you could create a metric that tracks the number of failed tasks in your ECS environment, and set up an alarm to notify you if the number of failed tasks exceeds a certain threshold. This can help you identify and respond to potential issues in your ECS environment in real-time.
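
As one example of such an alarm, the following sketch builds the parameters for a CloudWatch `put_metric_alarm` call that fires when a service's average `CPUUtilization` stays above a threshold for five consecutive minutes. It uses the built-in `AWS/ECS` metric rather than a custom failed-task metric, and the alarm name, the 80% default, and the SNS topic ARN are assumptions chosen for illustration.

```python
def build_cpu_alarm(cluster: str, service: str,
                    threshold_percent: float = 80.0,
                    sns_topic_arn: str = None) -> dict:
    """Build keyword arguments for cloudwatch.put_metric_alarm."""
    params = {
        "AlarmName": f"ecs-{cluster}-{service}-high-cpu",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 60,            # seconds per datapoint
        "EvaluationPeriods": 5,  # five consecutive breaching datapoints
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
    }
    if sns_topic_arn:
        params["AlarmActions"] = [sns_topic_arn]  # notify this topic on ALARM
    return params


def create_cpu_alarm(cluster: str, service: str, sns_topic_arn: str) -> None:
    """Create or update the alarm (needs boto3 and AWS credentials)."""
    import boto3

    params = build_cpu_alarm(cluster, service, sns_topic_arn=sns_topic_arn)
    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Requiring several consecutive breaching periods is a deliberate choice here: it reduces notifications caused by short CPU spikes.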

CloudWatch Container Insights

Amazon Elastic Container Service (ECS) integrates with Amazon CloudWatch Container Insights, which is a monitoring service that provides detailed metrics and logs for your Amazon ECS tasks and services. With CloudWatch Container Insights, you can monitor the performance and health of your containers in real-time, and gain valuable insights into how your containers are running.

To use CloudWatch Container Insights to monitor your Amazon ECS environment, first enable it for the cluster or as an account-level default, because it is an opt-in feature. You can then log in to the AWS Management Console, open CloudWatch, and navigate to Container Insights. From here, you can view pre-built dashboards that show the key metrics for your ECS environment at the cluster, service, and task level, such as the number of running tasks and containers, and CPU and memory usage.

In addition to viewing these metrics on the dashboard, you can also use CloudWatch Container Insights to collect and view detailed logs for your ECS tasks and services. This can be helpful for troubleshooting issues with your containers, and for gaining a better understanding of how your containers are interacting with other services and resources in your environment.
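
Container Insights is controlled by a cluster setting named `containerInsights`. The sketch below shows one way to switch it on for an existing cluster with boto3's `update_cluster_settings` call. The helper names are hypothetical and the cluster name is a placeholder.

```python
def container_insights_settings(cluster: str, enabled: bool = True) -> dict:
    """Build keyword arguments for ecs.update_cluster_settings."""
    return {
        "cluster": cluster,
        "settings": [
            {"name": "containerInsights", "value": "enabled" if enabled else "disabled"}
        ],
    }


def enable_container_insights(cluster: str) -> None:
    """Apply the setting to a live cluster (needs boto3 and AWS credentials)."""
    import boto3

    boto3.client("ecs").update_cluster_settings(**container_insights_settings(cluster))
```

Container Insights metrics are billed as custom metrics, so it can be worth enabling the setting per cluster rather than everywhere at once.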

Amazon EventBridge

Amazon Elastic Container Service (ECS) integrates with Amazon EventBridge, which is a serverless event bus service provided by AWS. With EventBridge, you can create rules that automatically trigger when certain events occur in your ECS environment. You can use these rules to create custom responses to events, such as sending an email or triggering a Lambda function.

To use EventBridge to monitor your Amazon ECS environment, you can log in to the AWS Management Console and navigate to the EventBridge dashboard. From here, you can create new rules that are triggered by events in your ECS environment, such as when a task is started or stopped, or when a task fails to run.

Once you have created your rules, EventBridge will automatically trigger the specified actions when the relevant events occur in your ECS environment. This can be helpful for responding to events in real-time, and for automating tasks and processes within your ECS environment.
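
A rule like this can also be created in code. The sketch below builds an event pattern that matches ECS "Task State Change" events for tasks that have stopped, then registers it with boto3's `put_rule` and `put_targets` calls. The rule name, target ID, and ARNs are placeholders, and the target (for example an SNS topic or Lambda function) is assumed to already permit invocation by EventBridge.

```python
import json


def stopped_task_pattern(cluster_arn: str = None) -> dict:
    """Event pattern matching ECS tasks whose last status is STOPPED."""
    detail = {"lastStatus": ["STOPPED"]}
    if cluster_arn:
        detail["clusterArn"] = [cluster_arn]  # limit the rule to one cluster
    return {
        "source": ["aws.ecs"],
        "detail-type": ["ECS Task State Change"],
        "detail": detail,
    }


def create_stopped_task_rule(rule_name: str, target_arn: str,
                             cluster_arn: str = None) -> None:
    """Create the rule and attach one target (needs boto3 and AWS credentials)."""
    import boto3

    events = boto3.client("events")
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(stopped_task_pattern(cluster_arn)),
        State="ENABLED",
    )
    events.put_targets(Rule=rule_name, Targets=[{"Id": "notify", "Arn": target_arn}])
```

The matched event includes the task's stop reason in its detail, which the target can use to tell routine scale-in apart from crashes.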

AWS CloudTrail

Amazon Elastic Container Service (ECS) integrates with AWS CloudTrail, which is a service that logs and records all API calls made to your AWS account. With CloudTrail, you can track changes and actions taken within your ECS environment, and gain valuable insights into how your environment is being used.

To use CloudTrail to monitor your Amazon ECS environment, you can log in to the AWS Management Console and navigate to the CloudTrail dashboard. From here, you can view the log files generated by CloudTrail, which contain detailed information about the API calls made to your ECS environment.

You can use these log files to track changes and actions taken within your ECS environment, such as when tasks are started or stopped, or when new container instances are launched. This can be helpful for understanding how your environment is being used, and for troubleshooting any issues that may arise.

CloudTrail does not send alerts on its own, but you can build alerts and notifications on top of it by delivering its events to CloudWatch Logs and defining metric filters and alarms, or by matching them with Amazon EventBridge rules. For example, you could create an alert that is triggered when a task is stopped through the API, when a container instance is deregistered, or when ECS API calls start returning errors. This can help you stay informed about important events within your ECS environment and respond quickly to any potential issues.
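
For ad hoc investigation, recent ECS API activity can also be pulled from CloudTrail's event history. The following sketch uses boto3's `lookup_events` paginator filtered by the `ecs.amazonaws.com` event source and reduces each record to a few fields. The helper names and the default list of event names are illustrative choices.

```python
import json


def summarize_ecs_events(events: list,
                         event_names=("RunTask", "StopTask", "UpdateService")) -> list:
    """Reduce CloudTrail lookup results to who did what, when, and any error."""
    summary = []
    for event in events:
        if event.get("EventName") not in event_names:
            continue
        # CloudTrailEvent holds the full record as a JSON string.
        detail = json.loads(event.get("CloudTrailEvent", "{}"))
        summary.append(
            {
                "name": event["EventName"],
                "user": event.get("Username"),
                "time": event.get("EventTime"),
                "error": detail.get("errorCode"),
            }
        )
    return summary


def recent_ecs_events(hours: int = 24) -> list:
    """Look up recent ECS management events (needs boto3 and AWS credentials)."""
    import boto3
    from datetime import datetime, timedelta, timezone

    end = datetime.now(timezone.utc)
    paginator = boto3.client("cloudtrail").get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[
            {"AttributeKey": "EventSource", "AttributeValue": "ecs.amazonaws.com"}
        ],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    events = [event for page in pages for event in page.get("Events", [])]
    return summarize_ecs_events(events)
```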

AWS ECS with Lumigo

For all the benefits that AWS ECS brings to developing and running containers, these distributed applications still need observability to ensure they run at the highest performance, with the greatest reliability to deliver seamless customer experiences.

Lumigo is a cloud native observability platform purpose-built for microservice applications that provides deep visibility into applications and infrastructure, enabling users to easily monitor and troubleshoot their applications running on Amazon ECS.

Trace applications end to end across Amazon ECS, AWS Lambda, the AWS services they consume, and third-party APIs
Easily monitor and debug ECS clusters and underlying services and tasks in real time
Set up automatic alerts to notify you in Slack, PagerDuty, and other workflow tools

Get started with Lumigo today!