The AWS CloudWatch Agent is a software package that facilitates the monitoring and collection of metrics and logs from Amazon EC2 instances and on-premises servers. Its primary function is to extend the native capabilities of Amazon CloudWatch, allowing for the tracking of additional system and application metrics.
The agent works by gathering data on various system parameters and operational metrics, such as CPU usage, memory consumption, and disk I/O activity, as well as application logs. This data is then transmitted to CloudWatch, where it can be visualized, monitored, and acted upon through alerts and automated actions.
The CloudWatch Agent operates by running as a background service on the host system. It can be configured to collect not just predefined metrics but also custom metrics specified by the user. For instance, through integration with third-party metric collectors like StatsD and collectd, the agent can gather detailed insights into application performance.
The CloudWatch agent offers the following features and capabilities.
The CloudWatch agent collects detailed system-level metrics, such as CPU utilization, memory usage, and disk I/O. This makes resource usage visible and speeds up the detection of performance bottlenecks. It complements AWS’s native monitoring tools by providing deeper insight into the health of EC2 instances and on-premises servers.
The agent also allows for the customization of metric collection, enabling the tracking of specific performance indicators relevant to an application.
The agent can monitor on-premises servers, enabling unified, hybrid monitoring across cloud and on-premises resources. This feature also simplifies migration processes. By establishing a monitoring baseline for on-premises servers, businesses can accurately assess performance before and after migration to AWS, ensuring a smooth transition.
The CloudWatch agent supports the retrieval of custom metrics through StatsD and collectd, popular third-party metric collectors. This integration enables the collection of detailed application and system performance metrics beyond what is available through standard monitoring.
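As a rough illustration of the StatsD path, the sketch below formats a metric in the StatsD line protocol and sends it over UDP. It assumes the agent's StatsD listener is enabled in the agent configuration and is listening on its default address (localhost, port 8125). The function names and the metric name myapp.requests are hypothetical.

```shell
#!/usr/bin/env bash
# Format a metric in the StatsD line protocol: <name>:<value>|<type>
# Common types: c (counter), g (gauge), ms (timer).
statsd_line() {
  local name="$1" value="$2" type="$3"
  printf '%s:%s|%s' "$name" "$value" "$type"
}

# Send one datagram to a StatsD listener.
# Assumption: the CloudWatch agent's StatsD listener is on 127.0.0.1:8125.
# This relies on bash's /dev/udp redirection, so it needs bash rather than
# plain sh (and a bash build with network redirections enabled).
statsd_send() {
  local host="${STATSD_HOST:-127.0.0.1}" port="${STATSD_PORT:-8125}"
  statsd_line "$@" > "/dev/udp/${host}/${port}"
}
```

For example, statsd_send myapp.requests 1 c would increment a counter. UDP is fire-and-forget, so the command does not fail when nothing is listening on the port.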
The agent can gather logs from various sources, enabling centralized log management in CloudWatch Logs. Centralized log management simplifies troubleshooting, security monitoring, and compliance auditing. Automated log file rotation and retention policies reduce the operational burden of managing large volumes of log data.
The CloudWatch agent collects several types of metrics.
CPU metrics include usage, idle time, and interrupt requests. They enable users to diagnose CPU performance issues and identify processes consuming excessive CPU resources. By monitoring these metrics, users can optimize their instances for better performance and cost efficiency.
Disk metrics collected by the CloudWatch agent include disk usage, read/write operations, and disk latency. Monitoring these metrics helps in identifying disk I/O bottlenecks and ensuring that storage performance matches application requirements. They help in capacity planning, alerting administrators to storage issues before they impact application availability or performance.
Memory utilization metrics reveal the total memory available, used memory, and cache and buffer usage. Monitoring memory metrics helps in detecting memory leaks or applications that are using more memory than expected. Efficient memory usage ensures applications run smoothly, enhancing performance and reducing the chance of crashes due to out-of-memory errors.
Network metrics include inbound and outbound traffic, packet drop rates, and connection counts. They help users assess network health, identify bottlenecks, and troubleshoot connectivity issues. Monitoring network metrics helps maintain application performance and availability by confirming that the network infrastructure can handle traffic loads efficiently.
Process and system metrics include the number of running processes, thread counts, and system uptime. Monitoring them allows for the identification of resource-intensive processes and potential system overloads. This insight is useful for maintaining application responsiveness and system stability.
Swap metrics include swap space utilization and swap in/out operations. They provide insight into memory pressure situations where swapping occurs. Minimizing swap operations matters for performance, because disk-based swap is significantly slower than memory access. Keeping swap usage low helps applications stay responsive.
Special network performance metrics, such as latency, packet loss, and jitter, are essential for applications that are sensitive to network performance, such as VoIP services, video streaming, and online gaming. Monitoring these metrics helps in ensuring a high-quality user experience by identifying and addressing network issues promptly.
To illustrate how you can use the AWS CloudWatch Agent to collect logs and metrics, let’s see how to use it to monitor Amazon EC2 instances.
To create an IAM role for the CloudWatch Agent, navigate to the IAM section of the AWS Management Console. Choose Roles from the sidebar, then click Create role. Select AWS service as the type of trusted entity, and choose EC2 as the service that will use this role.
For permissions, search for and attach the CloudWatchAgentServerPolicy, which grants the necessary permissions for metric and log collection. Name the role something descriptive like CloudWatchAgentRole, review the settings, and then create the role.
The required permissions include actions such as cloudwatch:PutMetricData, which allows the agent to publish custom metrics to CloudWatch, and logs:PutLogEvents, enabling it to send logs to CloudWatch Logs.
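The console steps above have an AWS CLI equivalent. The following is a sketch only, assuming the AWS CLI is installed and configured with credentials that are allowed to manage IAM. The helper function names are hypothetical. Unlike the console, the CLI does not create an instance profile automatically, so the sketch creates one with the same name as the role.

```shell
#!/usr/bin/env bash
ROLE_NAME="CloudWatchAgentRole"

# Write a trust policy that lets EC2 assume the role.
write_trust_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Create the role, attach the managed policy, and wrap the role in an
# instance profile so it can be attached to an EC2 instance.
create_agent_role() {
  local policy_file
  policy_file="$(mktemp)"
  write_trust_policy "$policy_file"

  aws iam create-role --role-name "$ROLE_NAME" \
    --assume-role-policy-document "file://${policy_file}"
  aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
  aws iam create-instance-profile --instance-profile-name "$ROLE_NAME"
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME"

  rm -f "$policy_file"
}
```

Calling create_agent_role once per account is enough. The same role can then be attached to any number of instances.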
To begin monitoring with the AWS CloudWatch Agent, first set up an Amazon EC2 instance. Choose an instance type (e.g., t2.micro for a low-traffic application) and an Amazon Machine Image (AMI) that suits your application’s OS requirements (e.g., Amazon Linux 2).
During setup, attach the CloudWatchAgentRole IAM role you created earlier to the instance. This provides the necessary permissions for the CloudWatch Agent to function.
Launch your EC2 instance from the AWS Management Console. Once it’s running, select your instance and click Connect to find instructions for SSH (Linux) or Remote Desktop (Windows).
For a Linux instance, you might use a command like:
ssh -i /path/to/your-key.pem ec2-user@your-instance-public-dns.amazonaws.com
After setting up the EC2 instance and attaching the IAM role, install the CloudWatch agent. On Amazon Linux 2, the package is available through yum:
sudo yum install amazon-cloudwatch-agent
The CloudWatch Agent configuration can be created or modified with the amazon-cloudwatch-agent-config-wizard command on your EC2 instance.
On Amazon Linux, the wizard resides in /opt/aws/amazon-cloudwatch-agent/bin, so you can start it with:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
When you run this command, a wizard prompts you to select the metrics and logs to collect, such as specifying CPU, disk, and memory utilization metrics, along with Apache or NGINX access logs if they’re relevant to your application. The wizard generates a JSON configuration file, which you can further edit manually if needed.
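To give a sense of what such a configuration file can look like, here is a minimal hand-written sketch, not the wizard's exact output. It collects a few CPU, memory, and disk metrics, enables the StatsD listener, and ships one log file. The function name, the log group name nginx-access, and the NGINX log path are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Write a minimal CloudWatch agent configuration to the given path.
# The heredoc delimiter is quoted so that ${aws:InstanceId} is written
# literally instead of being expanded by the shell.
write_agent_config() {
  cat > "$1" <<'EOF'
{
  "agent": { "metrics_collection_interval": 60 },
  "metrics": {
    "append_dimensions": { "InstanceId": "${aws:InstanceId}" },
    "metrics_collected": {
      "cpu": {
        "measurement": ["cpu_usage_idle", "cpu_usage_user", "cpu_usage_system"],
        "totalcpu": true
      },
      "mem": { "measurement": ["mem_used_percent"] },
      "disk": { "measurement": ["used_percent"], "resources": ["/"] },
      "statsd": { "service_address": ":8125" }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/nginx/access.log",
            "log_group_name": "nginx-access",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}
EOF
}
```

Running write_agent_config ./your-config-file.json produces a file you can adjust by hand, for example by adding more entries to collect_list.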
Once the configuration file exists, navigate to the directory containing it and run the following command to load the configuration and start the agent:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:./your-config-file.json -s
Ensure the agent is set to start on boot. For example, on systemd-based systems:
sudo systemctl enable amazon-cloudwatch-agent
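Before looking for data in CloudWatch, it can help to confirm that the agent actually started. The sketch below assumes the default Linux install paths and that the status action prints a small JSON document containing a "status" field. The function names are hypothetical.

```shell
#!/usr/bin/env bash
CTL=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl
AGENT_LOG=/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log

# Succeed if the given status JSON reports a running agent.
agent_is_running() {
  printf '%s\n' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"running"'
}

# Query the agent and print recent log lines if it is not running.
check_agent() {
  local status_json
  status_json="$(sudo "$CTL" -m ec2 -a status)"
  if agent_is_running "$status_json"; then
    echo "CloudWatch agent is running"
  else
    echo "CloudWatch agent is not running; recent log lines:" >&2
    sudo tail -n 20 "$AGENT_LOG" >&2
    return 1
  fi
}
```

If check_agent reports a failure, the log lines usually point to either a configuration error or missing IAM permissions.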
With the agent running, metrics and logs will start populating in CloudWatch. To view them, go to the CloudWatch section of the AWS Management Console and click All metrics under Metrics in the left-hand navigation. Metrics published by the agent appear under the CWAgent custom namespace by default; select it, then filter by your EC2 instance ID. You can graph agent metrics such as mem_used_percent or disk_used_percent, along with any custom metrics you’ve configured. Standard EC2 metrics such as CPUUtilization and DiskReadOps are published separately under the AWS/EC2 namespace.
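The same check can be done from a terminal. This sketch assumes the AWS CLI is configured, that the agent publishes to the default CWAgent namespace, and that the agent configuration adds an InstanceId dimension to its metrics. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# List the names of metrics the agent has published for one instance.
list_agent_metrics() {
  local instance_id="$1"
  aws cloudwatch list-metrics \
    --namespace CWAgent \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --query 'Metrics[].MetricName' \
    --output text
}
```

Call it with your own instance ID, for example list_agent_metrics i-0123456789abcdef0. An empty result shortly after startup is normal, since new metrics can take a few minutes to appear in listings.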
Lumigo is an observability and troubleshooting platform purpose-built for microservice-based applications running on AWS. Developers running applications on microservices can use Lumigo to monitor, trace, and troubleshoot issues fast. Deployed with zero code changes and automated in one click, Lumigo stitches together every interaction between micro and managed services into end-to-end stack traces. These traces, enriched with full request and response payload data, give you complete visibility into your microservices environments. Using Lumigo, developers get: