
SQS Monitoring: 5 Key Metrics & Monitoring Them with CloudWatch

What Is AWS SQS Monitoring? 

AWS SQS monitoring tracks various metrics and operations of Amazon Simple Queue Service (SQS), a distributed message queuing service. It helps detect inefficiencies or issues in message processing, ensuring applications handle request loads effectively and reliably.

Developers can observe message throughput, queue statuses, and performance bottlenecks through monitoring. This ensures SQS operates efficiently, reducing downtime and improving the responsiveness of applications that depend on it.

The Importance of Monitoring AWS SQS 

Monitoring AWS SQS is crucial for maintaining system integrity and performance. It allows for immediate detection and rectification of issues that could disrupt message processing, which is essential for applications that require consistent and reliable communication across different components.

SQS monitoring also supports scaling and load-balancing decisions. By analyzing traffic patterns and queue performance, organizations can scale their resources accordingly, optimizing both cost and performance.

Key Metrics to Monitor in AWS SQS

Here are some of the main metrics that can be used to monitor the performance of SQS.

1. ApproximateAgeOfOldestMessage

The ApproximateAgeOfOldestMessage metric is useful for identifying delays in processing messages in SQS queues. It measures how long, in seconds, the oldest non-deleted message has been in the queue, indicating potential backlogs or inefficiencies.

A high age value suggests that consumers cannot process messages quickly enough, possibly due to insufficient resources or application errors. Monitoring this metric helps maintain smooth and timely data flow through applications.
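As a rough illustration, the check below flags a queue whose oldest message has stayed above an age threshold for several consecutive datapoints. The function name, the 300-second threshold, and the three-datapoint window are assumptions chosen for this example, not SQS defaults.

```python
from typing import Sequence


def backlog_is_stale(ages_seconds: Sequence[float],
                     threshold_seconds: float = 300.0,
                     consecutive: int = 3) -> bool:
    """Return True if the last `consecutive` datapoints all exceed the threshold.

    `ages_seconds` is assumed to hold ApproximateAgeOfOldestMessage values
    (in seconds), ordered from oldest to newest datapoint.
    """
    if consecutive <= 0:
        raise ValueError("consecutive must be positive")
    if len(ages_seconds) < consecutive:
        return False  # not enough data to decide either way
    recent = ages_seconds[-consecutive:]
    return all(age > threshold_seconds for age in recent)
```

Requiring several consecutive breaches, rather than a single one, avoids reacting to a brief spike that the consumers clear on their own.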

2. ApproximateNumberOfMessagesDelayed

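ApproximateNumberOfMessagesDelayed tracks the number of messages in the queue that are delayed and not yet available for processing, either because the queue is configured as a delay queue or because a message was sent with a delay parameter. This metric is useful for applications where timely processing is critical, as it helps identify issues in message delivery settings.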

Increased delays can affect application performance, leading to unsatisfactory user experiences. Regular monitoring ensures that delays are kept within acceptable thresholds and that any deviation is addressed promptly.

3. NumberOfEmptyReceives

Monitoring NumberOfEmptyReceives allows teams to evaluate the efficiency of message retrieval from the queue. This metric counts the ReceiveMessage calls that returned no messages, so a consistently high value indicates that consumers are polling more often than necessary.

A high rate of empty receives can result in higher costs and wasted computational resources. Effectively monitoring and adjusting polling rates can optimize costs and improve efficiency.
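One common way to cut empty receives is long polling, where a ReceiveMessage call waits for messages instead of returning immediately. The sketch below only builds the request parameters. The helper name and the queue URL are hypothetical, while `WaitTimeSeconds` and `MaxNumberOfMessages` are the real ReceiveMessage parameter names.

```python
def long_poll_params(queue_url: str,
                     wait_seconds: int = 20,
                     max_messages: int = 10) -> dict:
    """Build ReceiveMessage parameters that use long polling.

    SQS accepts WaitTimeSeconds from 0 to 20 and MaxNumberOfMessages
    from 1 to 10; values outside those ranges are rejected here.
    """
    if not 0 <= wait_seconds <= 20:
        raise ValueError("wait_seconds must be between 0 and 20")
    if not 1 <= max_messages <= 10:
        raise ValueError("max_messages must be between 1 and 10")
    return {
        "QueueUrl": queue_url,
        "WaitTimeSeconds": wait_seconds,      # > 0 enables long polling
        "MaxNumberOfMessages": max_messages,  # fetch in batches
    }
```

Assuming boto3 is installed and credentials are configured, the result could be passed as `boto3.client("sqs").receive_message(**params)`.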

4. NumberOfMessagesDeleted

NumberOfMessagesDeleted indicates the number of messages removed from the queue after successful processing. Tracking this helps confirm that messages are being effectively processed, not just received.

Monitoring deletions also helps verify that message handling complies with data processing policies and supports auditing of data flows within applications.

5. SentMessageSize

SentMessageSize represents the size of messages sent to the queue. Monitoring this metric is important for optimizing the queuing system’s performance, as larger messages can increase processing time and costs.

Keeping track of message sizes helps maintain efficiency, reduces transmission times, and can also aid in cost management by staying within service limits for message sizes.
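A minimal sketch of a pre-send size check follows. It measures only the UTF-8 encoded body, whereas the size SQS counts also includes message attributes, so treat the result as a lower bound. The function names are hypothetical, and the limit is passed in by the caller because it depends on the current SQS quota for your account.

```python
def body_size_bytes(body: str) -> int:
    """Return the size of a message body in bytes when encoded as UTF-8."""
    return len(body.encode("utf-8"))


def exceeds_limit(body: str, limit_bytes: int) -> bool:
    """Return True if the encoded body alone is larger than `limit_bytes`."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return body_size_bytes(body) > limit_bytes
```

Counting encoded bytes rather than characters matters because non-ASCII characters take more than one byte each.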

What Is AWS CloudWatch? 

CloudWatch is a monitoring and management service from Amazon Web Services that provides data and actionable insights for AWS, hybrid, and on-premises applications and infrastructure resources. It collects monitoring and operational data in the form of logs, metrics, and events, giving a unified view of AWS resources, applications, and services.

Organizations can use AWS CloudWatch to gain system-wide visibility into resource utilization, application performance, and operational health. This allows them to understand and respond to performance changes and to optimize resource utilization.

Creating CloudWatch Alarms for Amazon SQS Metrics

Here’s an overview of how to create a CloudWatch alarm to monitor SQS.

Access CloudWatch Metrics for Amazon SQS

To access CloudWatch metrics for your Amazon SQS queues using the Amazon SQS Console:

  1. Log into the Amazon SQS console.
  2. Select the queues whose metrics you want to view by checking their boxes.
  3. Click on the Monitoring tab to display various graphs related to SQS metrics.
  4. Hover over the information icons next to the graphs to get more details about each graph’s representation.
  5. You can adjust the time range for all graphs simultaneously or individually to better analyze the data over different intervals.

To monitor your SQS metrics via the CloudWatch console:

  1. Navigate to Metrics and select the SQS namespace.
  2. Choose the Queue Metrics dimension to view specific metrics.
  3. You can sort the metrics, graph specific metrics by selecting them, and use the search feature to filter for the metrics you are interested in.

Additionally, you can view all CloudWatch metrics by selecting View all CloudWatch metrics in the SQS console, which redirects you to a more detailed section in the CloudWatch console where further interactions with metrics are possible.
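The same metrics can also be read programmatically. The sketch below builds the parameters for CloudWatch's GetMetricStatistics operation using the `AWS/SQS` namespace and the `QueueName` dimension. The helper name, the queue name, and the default time window are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def sqs_metric_query(queue_name: str,
                     metric_name: str,
                     hours: int = 3,
                     period_seconds: int = 300,
                     statistic: str = "Average",
                     now: Optional[datetime] = None) -> dict:
    """Build GetMetricStatistics parameters for one SQS queue metric."""
    # Simplification: only accept periods that are multiples of 60 seconds.
    if period_seconds <= 0 or period_seconds % 60 != 0:
        raise ValueError("period_seconds must be a positive multiple of 60")
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "Namespace": "AWS/SQS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": [statistic],
    }
```

Assuming boto3 is installed and credentials are configured, the dictionary could be passed as `boto3.client("cloudwatch").get_metric_statistics(**params)`.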

Create a CloudWatch Alarm for Amazon SQS Metrics

To set up an alarm in AWS CloudWatch based on specific SQS metrics:

  1. Access the CloudWatch console by signing into your AWS Management Console.
  2. Navigate to Alarms and click on Create Alarm.
  3. Go to Select Metric, browse Metrics, and then navigate to SQS, followed by Queue Metrics.
  4. Select the queue and metric (for example, NumberOfMessagesSent) you want to monitor.

To configure the alarm:

  1. Choose the threshold for the metric that will trigger the alarm (e.g., more than 100 messages sent to the queue within an hour).
  2. Set the necessary conditions like the period (1 hour) and the statistic type (Sum).
  3. Configure what actions should be taken when the alarm state is met. For example, you might set up an Amazon SNS topic to send email notifications when the alarm is triggered.
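The configuration above (Sum of NumberOfMessagesSent over one hour, alarming above 100) can be sketched as parameters for CloudWatch's PutMetricAlarm operation. The helper name, the alarm naming scheme, and the topic ARN are hypothetical. The dictionary keys are the real PutMetricAlarm parameter names.

```python
def sent_messages_alarm(queue_name: str,
                        topic_arn: str,
                        threshold: float = 100,
                        period_seconds: int = 3600) -> dict:
    """Build PutMetricAlarm parameters for a 'too many messages sent' alarm."""
    return {
        "AlarmName": f"{queue_name}-messages-sent-high",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "NumberOfMessagesSent",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Sum",                  # total messages in the period
        "Period": period_seconds,            # 3600 seconds = 1 hour
        "EvaluationPeriods": 1,              # alarm after a single breaching period
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],         # SNS topic notified on ALARM
    }
```

Assuming boto3 is installed and credentials are configured, the result could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.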

To set up notifications:

  1. Select an existing SNS topic or create a new one by entering the required email addresses.
  2. Note that for email subscriptions on a new SNS topic, each recipient must confirm the subscription before notifications can be sent.
  3. Finalize by selecting Create Alarm.
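For the notification step, a minimal sketch of preparing email subscriptions for an SNS topic is shown below. The helper name, the topic ARN, and the addresses are hypothetical. `TopicArn`, `Protocol`, and `Endpoint` are the real parameter names of the SNS Subscribe operation.

```python
from typing import Iterable, List


def email_subscriptions(topic_arn: str, emails: Iterable[str]) -> List[dict]:
    """Build one Subscribe request per unique, non-empty email address."""
    seen = set()
    requests = []
    for email in emails:
        address = email.strip()
        key = address.lower()
        if not address or key in seen:
            continue  # skip blanks and duplicates (case-insensitive)
        seen.add(key)
        requests.append({
            "TopicArn": topic_arn,
            "Protocol": "email",
            "Endpoint": address,
        })
    return requests
```

Assuming boto3 is installed and credentials are configured, each dictionary could be passed as `boto3.client("sns").subscribe(**request)`. Each recipient still has to confirm the subscription before receiving alarm emails.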

Once configured, your alarms will help maintain the efficiency and performance of your SQS queues by alerting you whenever the specified conditions are met. This setup ensures that any potential issues are addressed promptly, maintaining the smooth operation of your applications.

Lumigo: Cloud Native Monitoring for AWS

Lumigo is a cloud-native observability and troubleshooting tool. Lumigo automatically enriches traces with complete in-context request and response payloads and correlates them to the appropriate logs and metrics. This unified view of all troubleshooting data enables users to solve cloud-native issues 80% faster than similar tools.

Monitoring AWS SQS with Lumigo:

  • Distributed tracing: Lumigo’s distributed tracing lets you follow a message’s path through the system, from the AWS SQS queue into downstream services such as a Node.js container and beyond. This helps you identify how errors occur and replicate them in a controlled environment.
  • Debug code errors: With Lumigo’s detailed error analysis, you can see the call stack, the exact location of the error in the system, and how it was thrown. This reduces the manual logging and debugging work for you and the other developers on your team.
  • Monitor key metrics you set: Lumigo can also monitor key metrics within your infrastructure, such as invocation counts, execution times, and message throughput, giving you a holistic view of how your system works from the inside out.
  • Analyze the logs: In Lumigo, you can aggregate logs from Lambda functions and other AWS services, allowing you to identify bottlenecks and failure points quickly. For example, you can inspect the logs from a Lambda function and a Node.js container to see where errors are being thrown.

Get started with a free trial of Lumigo for your microservice applications.