AWS CloudWatch Agent: The Basics and a Quick Tutorial

What Is the AWS CloudWatch Agent?

The AWS CloudWatch Agent is a software package that facilitates the monitoring and collection of metrics and logs from Amazon EC2 instances and on-premises servers. Its primary function is to extend the native capabilities of Amazon CloudWatch, allowing for the tracking of additional system and application metrics.

The agent works by gathering data on various system parameters and operational metrics, such as CPU usage, memory consumption, and disk I/O activity, as well as application logs. This data is then transmitted to CloudWatch, where it can be visualized, monitored, and acted upon through alerts and automated actions.

The CloudWatch Agent operates by running as a background service on the host system. It can be configured to collect not just predefined metrics but also custom metrics specified by the user. For instance, through integration with third-party metric collectors like StatsD and collectd, the agent can gather detailed insights into application performance.

Key Features of the AWS CloudWatch Agent

The CloudWatch agent offers the following features and capabilities.

Collection of System-Level Metrics

The CloudWatch agent collects detailed system-level metrics, such as CPU utilization, memory usage, and disk I/O. This gives closer visibility into resource usage and enables faster detection of performance bottlenecks. It complements AWS’s native monitoring tools by providing deeper insights into the health of EC2 instances and on-premises servers.

The agent also allows for the customization of metric collection, enabling the tracking of specific performance indicators relevant to an application.

Support for Hybrid Environments

The agent can monitor on-premises servers, enabling unified, hybrid monitoring across cloud and on-premises resources. This feature also simplifies migration processes. By establishing a monitoring baseline for on-premises servers, businesses can accurately assess performance before and after migration to AWS, ensuring a smooth transition.
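
As a minimal sketch of the on-premises case, the helper below starts the agent in its on-premises mode. It assumes the agent is already installed under /opt/aws/amazon-cloudwatch-agent and that AWS credentials are available to it on the server (for example through a named credentials profile). The function name start_agent_on_premises and the configuration path in the usage note are hypothetical.

```shell
# Path to the agent control script on a Linux host.
CTL=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl

# Load a configuration file and (re)start the agent in on-premises mode.
# $1: path to the agent's JSON configuration file.
start_agent_on_premises() {
  config_file="$1"
  sudo "$CTL" -a fetch-config -m onPremise -c "file:$config_file" -s
}
```

A call might look like start_agent_on_premises /etc/cwagent/config.json, where the path is a placeholder for wherever you keep the configuration file.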

Retrieval of Custom Metrics via StatsD and collectd

The CloudWatch agent supports the retrieval of custom metrics through StatsD and collectd, popular third-party metric collectors. This integration enables the collection of detailed application and system performance metrics beyond what is available through standard monitoring.
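
As a rough sketch of how this is configured, the snippet below writes a configuration fragment that enables both listeners. The statsd and collectd sections follow the agent's configuration schema. The port and intervals are example values, and the scratch directory is only an assumption so the fragment can be reviewed before merging it into a real agent configuration.

```shell
# Write an example fragment to a scratch directory (assumption: not the agent's live config).
CONFIG_DIR=$(mktemp -d)
STATSD_CONFIG="$CONFIG_DIR/statsd-collectd.json"

# The quoted heredoc delimiter keeps the JSON literal (no shell expansion).
cat > "$STATSD_CONFIG" <<'EOF'
{
  "metrics": {
    "metrics_collected": {
      "statsd": {
        "service_address": ":8125",
        "metrics_collection_interval": 10,
        "metrics_aggregation_interval": 60
      },
      "collectd": {
        "metrics_aggregation_interval": 60
      }
    }
  }
}
EOF

echo "Wrote $STATSD_CONFIG"
```

With the StatsD listener enabled, an application would send datagrams such as my_app.requests:1|c to UDP port 8125 on the host (the metric name here is hypothetical).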

Log Collection Capabilities

The agent can gather logs from various sources, enabling centralized log management in CloudWatch Logs. Centralized log management simplifies troubleshooting, security monitoring, and compliance auditing. Automated log file rotation and retention policies reduce the operational burden of managing large volumes of log data.
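
To make this concrete, here is a small sketch of a logs section for the agent configuration. The field names (file_path, log_group_name, log_stream_name, retention_in_days) come from the agent's configuration schema. The log file path and log group name are hypothetical, and the fragment is written to a scratch directory rather than to the agent's live configuration.

```shell
# Write an example logs fragment to a scratch directory for review.
CONFIG_DIR=$(mktemp -d)
LOGS_CONFIG="$CONFIG_DIR/logs.json"

# {instance_id} is a placeholder the agent resolves at runtime on EC2.
cat > "$LOGS_CONFIG" <<'EOF'
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/my-app/app.log",
            "log_group_name": "my-app-logs",
            "log_stream_name": "{instance_id}",
            "retention_in_days": 30
          }
        ]
      }
    }
  }
}
EOF

echo "Wrote $LOGS_CONFIG"
```

Each entry in collect_list maps one file (or file pattern) to a log group, so additional sources can be added as further objects in the array.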

Metrics Collected by the CloudWatch Agent

The CloudWatch agent collects several types of metrics.

CPU Metrics

These metrics include usage, idle time, and interrupt requests. They enable users to diagnose CPU performance issues and identify processes consuming excessive CPU resources. By monitoring these metrics, users can optimize their instances for better performance and cost efficiency.

Disk Metrics

Disk metrics collected by the CloudWatch agent include disk usage, read/write operations, and disk latency. Monitoring these metrics helps in identifying disk I/O bottlenecks and ensuring that storage performance matches application requirements. They help in capacity planning, alerting administrators to storage issues before they impact application availability or performance.

Memory Metrics

Memory utilization metrics reveal the total memory available, used memory, and cache and buffer usage. Monitoring memory metrics helps in detecting memory leaks or applications that are using more memory than expected. Efficient memory usage ensures applications run smoothly, enhancing performance and reducing the chance of crashes due to out-of-memory errors.
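
The CPU, disk, and memory metrics described above are selected in the metrics_collected section of the agent configuration. The fragment below is a sketch under assumptions: the measurement names are ones the agent supports on Linux, while the chosen subset, the 60-second interval, and the scratch file location are illustrative only.

```shell
# Write an example metrics fragment to a scratch directory for review.
CONFIG_DIR=$(mktemp -d)
SYSTEM_METRICS_CONFIG="$CONFIG_DIR/cpu-disk-mem.json"

cat > "$SYSTEM_METRICS_CONFIG" <<'EOF'
{
  "metrics": {
    "metrics_collected": {
      "cpu": {
        "measurement": ["cpu_usage_idle", "cpu_usage_iowait", "cpu_usage_user", "cpu_usage_system"],
        "totalcpu": true,
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["*"],
        "metrics_collection_interval": 60
      },
      "diskio": {
        "measurement": ["io_time", "reads", "writes"],
        "resources": ["*"],
        "metrics_collection_interval": 60
      },
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      }
    }
  }
}
EOF

echo "Wrote $SYSTEM_METRICS_CONFIG"
```

Collecting fewer measurements at a longer interval keeps the number of custom metrics, and therefore cost, lower, so it is usually worth starting with a small set like this and expanding as needed.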

Network Metrics

These metrics include inbound and outbound traffic, packet drop rates, and connection counts. They help explain network health, identify bottlenecks, and troubleshoot connectivity issues. Monitoring network metrics helps users maintain application performance and availability by ensuring the network infrastructure can handle traffic loads efficiently.

Process Metrics

These metrics include the number of processes in each state (such as running, sleeping, blocked, or zombie) and total thread counts. Monitoring them allows for the identification of resource-intensive processes and potential system overloads. This insight is useful for maintaining application responsiveness and system stability.

Swap Metrics

These metrics include swap space utilization and swap in/out operations. They provide insights into memory pressure situations where swapping occurs. Minimizing swap operations is crucial for performance, as disk-based swap operations are significantly slower than memory access. Keeping swap usage low ensures better application performance and responsiveness.
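
Network, process, and swap metrics are enabled the same way as other system metrics, through sections of metrics_collected. The following sketch assumes a Linux host whose primary interface is named eth0 (this varies by instance type and AMI). The measurement lists are an illustrative subset, and the file is written to a scratch directory.

```shell
# Write an example fragment to a scratch directory for review.
CONFIG_DIR=$(mktemp -d)
NET_PROC_SWAP_CONFIG="$CONFIG_DIR/net-processes-swap.json"

cat > "$NET_PROC_SWAP_CONFIG" <<'EOF'
{
  "metrics": {
    "metrics_collected": {
      "net": {
        "measurement": ["bytes_sent", "bytes_recv", "drop_in", "drop_out"],
        "resources": ["eth0"]
      },
      "netstat": {
        "measurement": ["tcp_established", "tcp_time_wait"]
      },
      "processes": {
        "measurement": ["running", "sleeping", "blocked", "zombies", "total_threads"]
      },
      "swap": {
        "measurement": ["swap_used_percent"]
      }
    }
  }
}
EOF

echo "Wrote $NET_PROC_SWAP_CONFIG"
```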

Special Network Performance Metrics

Special network performance metrics, such as latency, packet loss, and jitter, are essential for applications that are sensitive to network performance, such as VoIP services, video streaming, and online gaming. Monitoring these metrics helps in ensuring a high-quality user experience by identifying and addressing network issues promptly.

Collecting EC2 Logs and Metrics with the AWS CloudWatch Agent

To illustrate how you can use the AWS CloudWatch Agent to collect logs and metrics, let’s see how to use it to monitor Amazon EC2 instances.

Create an IAM Role

To create an IAM role for the CloudWatch Agent, navigate to the IAM section of the AWS Management Console. Choose Roles from the sidebar, then click Create role. Select AWS service as the type of trusted entity, and choose EC2 as the service that will use this role.

For permissions, search for and attach the CloudWatchAgentServerPolicy, which grants the necessary permissions for metric and log collection. Name the role something descriptive like CloudWatchAgentRole, review the settings, and then create the role.

The required permissions include actions such as cloudwatch:PutMetricData, which allows the agent to publish custom metrics to CloudWatch, and logs:PutLogEvents, enabling it to send logs to CloudWatch Logs.
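
If you prefer the command line to the console, the same role can be sketched with the AWS CLI. This assumes the AWS CLI is installed and configured with permission to manage IAM. The function name create_agent_role is hypothetical, and the instance profile is created explicitly because, unlike the console, the CLI does not create one for you.

```shell
ROLE_NAME="CloudWatchAgentRole"
TRUST_POLICY_FILE="$(mktemp)"

# Trust policy that lets EC2 instances assume the role.
cat > "$TRUST_POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role, attach the managed agent policy, and wrap the role
# in an instance profile so it can be attached to EC2 instances.
create_agent_role() {
  aws iam create-role \
    --role-name "$ROLE_NAME" \
    --assume-role-policy-document "file://$TRUST_POLICY_FILE" &&
  aws iam attach-role-policy \
    --role-name "$ROLE_NAME" \
    --policy-arn "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy" &&
  aws iam create-instance-profile \
    --instance-profile-name "$ROLE_NAME" &&
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$ROLE_NAME" \
    --role-name "$ROLE_NAME"
}
```

Running create_agent_role once per account is enough; the steps are chained with && so a failure in one stops the rest.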

Set Up an EC2 Instance

To begin monitoring with the AWS CloudWatch Agent, first set up an Amazon EC2 instance. Choose an instance type (e.g., t2.micro for a low-traffic application) and an Amazon Machine Image (AMI) that suits your application’s OS requirements (e.g., Amazon Linux 2).

During setup, attach the CloudWatchAgentRole IAM role you created earlier to the instance. This provides the necessary permissions for the CloudWatch Agent to function.
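
For an instance that is already running, the role can also be attached afterwards. The sketch below uses the AWS CLI and assumes an instance profile named CloudWatchAgentRole exists (the console creates one with the same name as the role). The function name and the instance ID in the usage note are hypothetical.

```shell
INSTANCE_PROFILE_NAME="CloudWatchAgentRole"

# Attach the instance profile to an existing EC2 instance.
# $1: EC2 instance ID.
attach_agent_role() {
  instance_id="$1"
  aws ec2 associate-iam-instance-profile \
    --instance-id "$instance_id" \
    --iam-instance-profile "Name=$INSTANCE_PROFILE_NAME"
}
```

A call might look like attach_agent_role i-0123456789abcdef0, substituting your own instance ID.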

Launch and Connect to the Instance

Launch your EC2 instance from the AWS Management Console. Once it’s running, select your instance and click Connect to find instructions for SSH (Linux) or Remote Desktop (Windows).

For a Linux instance, you might use a command like:

ssh -i /path/to/your-key.pem ec2-user@your-instance-public-dns.amazonaws.com

Install CloudWatch Agent and Modify Configuration

After setting up the EC2 instance and attaching the IAM role, install the CloudWatch agent. On Amazon Linux 2, the package can be installed with yum:

sudo yum install amazon-cloudwatch-agent

The CloudWatch Agent configuration can be created or modified using the amazon-cloudwatch-agent-config-wizard command on your EC2 instance.

On Amazon Linux, amazon-cloudwatch-agent-config-wizard resides in /opt/aws/amazon-cloudwatch-agent/bin, so you can run it as follows:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

When you run this command, a wizard prompts you to select the metrics and logs to collect, such as specifying CPU, disk, and memory utilization metrics, along with Apache or NGINX access logs if they’re relevant to your application. The wizard generates a JSON configuration file, which you can further edit manually if needed.
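
For orientation, a small configuration file of this kind might look like the sketch below. It is not the wizard's exact output: the selected metrics, the NGINX log path, the log group name, and the file name your-config-file.json in a scratch directory are all assumptions chosen for illustration.

```shell
# Write an example configuration to a scratch directory for review.
CONFIG_DIR=$(mktemp -d)
AGENT_CONFIG="$CONFIG_DIR/your-config-file.json"

# The quoted heredoc delimiter keeps ${aws:InstanceId} literal, so the
# agent (not the shell) resolves it when it runs on EC2.
cat > "$AGENT_CONFIG" <<'EOF'
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "cpu": { "measurement": ["cpu_usage_idle"], "totalcpu": true },
      "disk": { "measurement": ["used_percent"], "resources": ["*"] },
      "mem": { "measurement": ["mem_used_percent"] }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/nginx/access.log",
            "log_group_name": "nginx-access",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}
EOF

echo "Wrote $AGENT_CONFIG"
```

The append_dimensions entry tags every metric with the instance ID, which makes it easier to filter metrics per instance in the CloudWatch console.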

Start the Agent

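Once you have a configuration file, navigate to the directory containing it and run the following command. The fetch-config action loads the file, -m ec2 tells the agent it is running on an EC2 instance, and -s restarts the agent so the new configuration takes effect: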

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:./your-config-file.json -s

Ensure the agent is set to start on boot. For example, on systemd-based systems:

sudo systemctl enable amazon-cloudwatch-agent
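
Before moving on, it can help to confirm that the agent actually started. The sketch below relies on the control script's status action, which prints a small JSON document. The helper names status_is_running and check_agent are hypothetical, and the check assumes the output contains a "status" field with the value "running" when the agent is up.

```shell
CTL=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl
AGENT_LOG=/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log

# Succeeds when the status JSON read from stdin reports "status": "running".
status_is_running() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"running"'
}

# Query the agent and print a short verdict; returns non-zero if not running.
check_agent() {
  if sudo "$CTL" -m ec2 -a status | status_is_running; then
    echo "CloudWatch agent is running"
  else
    echo "CloudWatch agent is not running; see $AGENT_LOG" >&2
    return 1
  fi
}
```

If check_agent reports a problem, the agent log file referenced in the message is usually the first place to look for configuration or permission errors.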

Generate and View Metrics

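With the agent running, metrics and logs will start populating in CloudWatch. To view them, go to the CloudWatch section of the AWS Management Console. Click on All metrics in the Metrics section found on the left side. Next, under custom namespaces, choose CWAgent, and then filter by your EC2 instance ID (this works if your configuration adds the InstanceId dimension). Metrics published by the agent have names such as cpu_usage_idle, mem_used_percent, and disk_used_percent, alongside any custom metrics you’ve configured. Built-in EC2 metrics like CPUUtilization and DiskReadOps are listed separately under the AWS/EC2 namespace.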
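
The same check can be sketched from the command line. This assumes the AWS CLI is configured with permission to read CloudWatch metrics and that the agent publishes an InstanceId dimension. The function name list_agent_metrics and the instance ID in the usage note are hypothetical.

```shell
# List the metrics the agent has published for one instance.
# $1: EC2 instance ID.
list_agent_metrics() {
  instance_id="$1"
  aws cloudwatch list-metrics \
    --namespace CWAgent \
    --dimensions "Name=InstanceId,Value=$instance_id"
}
```

A call might look like list_agent_metrics i-0123456789abcdef0. An empty result shortly after startup is not necessarily an error, since new metrics can take a few minutes to appear.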

Lumigo: Cloud Native Monitoring for AWS

Lumigo is an observability and troubleshooting platform purpose-built for microservice-based applications running on AWS. Developers running applications on microservices can use Lumigo to monitor, trace, and troubleshoot issues fast. Deployed with zero code changes and automated in one click, Lumigo stitches together every interaction between micro and managed services into end-to-end stack traces. These traces, enriched with full request and response payload data, give you complete visibility into your microservices environments. Using Lumigo, developers get:

End-to-end virtual stack traces across every micro and managed service that makes up an application, in context
API visibility that makes all the data passed between services available and accessible, making it possible to perform root cause analysis with automatically correlated traces, logs, and metrics
Distributed tracing that is deployed with no code and automated in one click
Unified platform to explore and query across microservices, see a real-time view of applications, and optimize performance