
Analytics and alerts with AWS CloudWatch and SNS

AWS offers 175+ services to developers for building applications on the AWS cloud. Running applications across so many services requires robust logging, monitoring, and notification services to collect logs, metrics, and other information. And given the popularity of serverless on AWS, these services must also address the special needs of serverless monitoring. In this article, we take a closer look at two of them: CloudWatch and SNS.

With the need for centralized logging in mind, AWS launched the CloudWatch service. It integrates with many of your systems, applications, and AWS services. It not only aggregates logs in one place but also provides a dashboard to view them, and the ability to search them for specific error codes or patterns. You can also forward these logs to other third-party systems for analytics and other purposes. Many teams use AWS S3 to archive these logs. Currently, more than 30 AWS services publish logs to CloudWatch.

CloudWatch has several components other than logs. Let’s take a closer look.

AWS CloudWatch Logs Insights

CloudWatch Logs Insights is a fully managed service that requires no setup or maintenance. It provides an interactive query and visualization interface that can scan large volumes of log data in seconds and return meaningful results. It supports several log formats and auto-discovers fields when the log format is JSON.

For logs from AWS services such as Amazon Route 53, AWS Lambda, AWS CloudTrail, and Amazon VPC, Logs Insights automatically discovers a set of service-specific fields, and it does the same for log events emitted as JSON. This lets you query and visualize those logs without defining a schema first.
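
As a rough illustration of what such a query looks like, the sketch below builds a Logs Insights query string and the arguments for the CloudWatch Logs StartQuery API. The log group name and the helper function names are hypothetical, and the code only prepares the request; it does not call AWS.

```python
import time

# Hypothetical log group name, used only for illustration.
LOG_GROUP = "/aws/lambda/my-function"


def build_error_query(pattern: str, limit: int = 20) -> str:
    """Return a Logs Insights query listing the newest events matching a pattern."""
    return (
        "fields @timestamp, @message "
        f"| filter @message like /{pattern}/ "
        "| sort @timestamp desc "
        f"| limit {limit}"
    )


def build_start_query_params(log_group, query, minutes_back=60, now=None):
    """Build keyword arguments for StartQuery.

    startTime and endTime are Unix timestamps in seconds.
    """
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes_back * 60,
        "endTime": end,
        "queryString": query,
    }


# A fixed "now" keeps the example deterministic.
params = build_start_query_params(
    LOG_GROUP, build_error_query("ERROR"), now=1_700_000_000
)
print(params["queryString"])
```

With boto3 installed and credentials configured, these arguments could be passed as `boto3.client("logs").start_query(**params)`, and the results polled with `get_query_results` using the returned query ID.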

AWS CloudWatch Alarms

A CloudWatch alarm watches a single CloudWatch metric, or the result of a math expression based on CloudWatch metrics, and performs one or more actions. An action is taken based on the value of the metric or expression relative to a threshold over a number of time periods. There are different types of actions. For example, an EC2 Auto Scaling group can scale based on a CPU threshold, or a notification can be sent to an AWS SNS topic, which in turn delivers an email or invokes a Lambda function.

You can also use composite alarms, which are defined by a rule expression over the states of other existing alarms. When the rule expression evaluates to true, the composite alarm goes into the ALARM state. Composite alarms cannot perform EC2 actions or auto-scaling, but they can send Amazon SNS notifications.
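
To make the rule expression concrete, here is a minimal sketch that assembles one in Python. The alarm names, the SNS topic ARN, and the helper functions are hypothetical; the resulting dictionary mirrors the parameters of the PutCompositeAlarm API but is not sent anywhere.

```python
def alarm_in_state(alarm_name: str, state: str = "ALARM") -> str:
    """Return a rule term such as ALARM("cpu-high")."""
    if state not in ("ALARM", "OK", "INSUFFICIENT_DATA"):
        raise ValueError(f"unknown alarm state: {state}")
    return f'{state}("{alarm_name}")'


def all_of(*terms: str) -> str:
    """Combine rule terms so that every one of them must hold."""
    return " AND ".join(terms)


# Both underlying alarms are assumed to exist already.
rule = all_of(alarm_in_state("cpu-high"), alarm_in_state("memory-low"))

composite_params = {
    "AlarmName": "cpu-and-memory-pressure",
    "AlarmRule": rule,
    # Hypothetical topic ARN; composite alarms can notify SNS topics.
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
}
print(rule)
```

With boto3, the dictionary could be passed as `boto3.client("cloudwatch").put_composite_alarm(**composite_params)`.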

An alarm can be in one of three states:

  • OK – Means the metric has not crossed the threshold.
  • ALARM – Means the alarm is triggered as the metric is outside of the defined threshold.
  • INSUFFICIENT_DATA – Means it doesn’t have enough information yet to determine whether the metric is within or outside of the threshold range.

Tips from the experts

  1. Use anomaly detection for dynamic thresholds

    Instead of static thresholds, use CloudWatch's anomaly detection feature to create alarms with dynamic thresholds that adapt to your application's normal patterns. This helps reduce false positives and ensures that alarms are triggered only when there are genuine anomalies.
  2. Create composite alarms for multi-metric monitoring

    Use composite alarms to combine multiple alarms into a single status, allowing you to monitor complex conditions that involve multiple metrics. For instance, you can set up a composite alarm that triggers only when both high CPU usage and low free memory conditions are met, reducing noise and improving actionable insights.
  3. Automate response actions with CloudWatch Event Rules

    Beyond notifications, use CloudWatch Event Rules to automatically trigger corrective actions when specific conditions are met. For example, configure an event rule to scale EC2 instances when a particular alarm is triggered, or to restart failed ECS tasks, making your architecture more resilient and self-healing.
  4. Leverage SNS message filtering

    Use SNS message filtering to reduce unnecessary notifications. By filtering messages based on attributes, you can ensure that only relevant notifications are sent to subscribers, reducing noise and helping teams focus on critical alerts.
  5. Integrate CloudWatch with AWS Lambda for custom alerts

    Use AWS Lambda as a target for CloudWatch alarms to create custom alert logic. For example, a Lambda function can enrich alarm data, integrate with third-party services, or implement custom notification workflows, enabling more tailored alert responses.
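  6. Use CloudWatch Logs Insights for log-based alerting

    CloudWatch Logs Insights queries run on demand, so to alert on their results, run them on a schedule (for example, from a Lambda function triggered by a scheduled rule) and publish the outcome as a custom metric or a notification. For near-real-time alerting on specific log patterns, such as error codes or unauthorized access attempts, metric filters on the log group are usually a better fit, because they feed CloudWatch metrics that alarms can watch directly.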
  7. Set up SNS delivery status logging

    Enable SNS delivery status logging to CloudWatch Logs to monitor the success or failure of message deliveries. This can help you diagnose issues with message delivery, such as bounced emails or failed SMS notifications, and take corrective action promptly.
  8. Implement cross-account CloudWatch and SNS

    For organizations with multiple AWS accounts, set up cross-account CloudWatch and SNS to centralize monitoring and alerting. Use AWS Organizations and resource policies to allow CloudWatch and SNS access across accounts, simplifying management and ensuring consistent alerting across environments.
  9. Optimize alarm actions to reduce alert fatigue

    To reduce alert fatigue, use a hierarchy of alarms with varying severities and actions. For example, configure minor alarms to only log incidents or send low-priority notifications, while critical alarms trigger high-priority actions such as paging on-call engineers.
  10. Monitor and manage CloudWatch costs

    CloudWatch metrics and logs can become expensive with high usage. Regularly review your CloudWatch usage, optimize the frequency and granularity of metrics, and apply retention policies to logs. Use CloudWatch Contributor Insights to identify the top contributors to log data volume and adjust logging levels accordingly.
Aviad Mor
CTO
Aviad Mor is the Co-Founder & CTO at Lumigo. Lumigo’s SaaS platform helps companies monitor and troubleshoot microservices applications while providing actionable insights that prevent business disruptions. Aviad has over a decade of experience in technology leadership, heading the development of core products in Check Point from inception to wide adoption.

AWS CloudWatch Events

AWS CloudWatch Events (the service that has since evolved into Amazon EventBridge) is an important part of an event-driven or serverless architecture on AWS. It lets you build systems that react to changes in AWS services such as EC2, ECS, EKS, S3, and DynamoDB. CloudWatch Events responds to these operational changes and triggers corrective action through targets such as Lambda functions, Kinesis streams, SNS topics, or SQS queues.

You can also schedule automated actions using CloudWatch Events. A scheduled rule fires at fixed times or intervals defined by a cron or rate expression.
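
The two schedule formats can be sketched as follows. The helper names and the rule name are hypothetical; the expression syntax follows the documented `rate(value unit)` and six-field `cron(...)` forms, where the unit is singular when the value is 1 and one of day-of-month or day-of-week must be `?`.

```python
def rate_expression(value: int, unit: str) -> str:
    """Build a rate expression such as rate(5 minutes)."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour', or 'day'")
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


def cron_expression(minute, hour, day_of_month, month, day_of_week, year) -> str:
    """Build a six-field cron expression; times are evaluated in UTC."""
    fields = (minute, hour, day_of_month, month, day_of_week, year)
    return "cron(" + " ".join(str(f) for f in fields) + ")"


# Parameters shaped like the PutRule API; the rule name is made up.
rule_params = {
    "Name": "nightly-cleanup",
    "ScheduleExpression": cron_expression(0, 12, "*", "*", "?", "*"),
    "State": "ENABLED",
}

print(rate_expression(5, "minute"))
print(rule_params["ScheduleExpression"])
```

With boto3, `boto3.client("events").put_rule(**rule_params)` would create the rule, after which targets are attached with `put_targets`.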

AWS CloudWatch works with the AWS SNS service to send notifications to users either through email or SMS. It can also trigger other actions using HTTPS calls, SQS, and Lambda functions.
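
When a Lambda function is subscribed to the SNS topic, the alarm notification arrives as a JSON string inside the SNS record. The handler below is a minimal sketch of unpacking it; the sample event is abbreviated and hand-written for illustration, and the summary format is an arbitrary choice.

```python
import json


def summarize_alarm(sns_message: str) -> str:
    """Turn the JSON body of a CloudWatch alarm notification into one line."""
    alarm = json.loads(sns_message)
    return (
        f'{alarm["AlarmName"]}: {alarm["OldStateValue"]} -> '
        f'{alarm["NewStateValue"]} ({alarm["NewStateReason"]})'
    )


def handler(event, context):
    """Lambda entry point for SNS-delivered alarm notifications."""
    return [summarize_alarm(record["Sns"]["Message"]) for record in event["Records"]]


# Abbreviated sample of the event shape Lambda receives from SNS.
sample_event = {
    "Records": [
        {
            "EventSource": "aws:sns",
            "Sns": {
                "Subject": 'ALARM: "high-cpu" in US East (N. Virginia)',
                "Message": json.dumps(
                    {
                        "AlarmName": "high-cpu",
                        "OldStateValue": "OK",
                        "NewStateValue": "ALARM",
                        "NewStateReason": "Threshold Crossed",
                    }
                ),
            },
        }
    ]
}

print(handler(sample_event, None))
```

From here the function could enrich the message or forward it to a third-party system, which is the kind of custom workflow a plain email subscription cannot provide.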

So, before talking about a few of the use cases, let’s first discuss what the AWS SNS service offers AWS users.

AWS SNS

AWS Simple Notification Service (SNS) is a fully managed messaging service that offers system-to-system as well as app-to-person (A2P) communication. Systems communicate with each other through a publish/subscribe pattern. You can also communicate directly with users via SMS, mobile push notifications, and email. The benefit of this service is that you can decouple your microservices: they exchange messages without needing to know about each other.

In SNS, a “topic” acts as a logical access point and communication channel. Publishers send messages to a topic and clients subscribe to that topic to receive these messages.
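
The topic model can be illustrated with a small in-memory sketch. This is not the SNS API: the `Topic` class is hypothetical, and its filter policy is a simplified stand-in for SNS message filtering that supports only exact matches on string attributes.

```python
from typing import Callable, Dict, List, Optional, Tuple

Attributes = Dict[str, str]
Handler = Callable[[str, Attributes], None]
FilterPolicy = Dict[str, List[str]]


class Topic:
    """In-memory stand-in for a topic: publishers and subscribers never meet."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscriptions: List[Tuple[Handler, FilterPolicy]] = []

    def subscribe(self, handler: Handler, filter_policy: Optional[FilterPolicy] = None) -> None:
        """Register a subscriber, optionally restricted by attribute values."""
        self._subscriptions.append((handler, filter_policy or {}))

    def publish(self, message: str, attributes: Optional[Attributes] = None) -> int:
        """Deliver the message to every matching subscriber; return the count."""
        attributes = attributes or {}
        delivered = 0
        for handler, policy in self._subscriptions:
            # Every key in the policy must match one of its allowed values.
            if all(attributes.get(key) in allowed for key, allowed in policy.items()):
                handler(message, attributes)
                delivered += 1
        return delivered


alerts = Topic("alerts")
everything: List[str] = []
critical_only: List[str] = []
alerts.subscribe(lambda msg, attrs: everything.append(msg))
alerts.subscribe(lambda msg, attrs: critical_only.append(msg), {"severity": ["critical"]})

alerts.publish("disk at 70%", {"severity": "warning"})
alerts.publish("service down", {"severity": "critical"})
print(everything, critical_only)
```

The publisher only knows the topic, so subscribers can be added or removed without touching the publishing code.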

Now, let’s see how these two services, CloudWatch and SNS, work together to fulfil several business use cases.

CloudWatch and SNS Use Cases

CloudWatch Alarm Based on a Static Threshold

You can create a CloudWatch alarm on any CloudWatch metric. Choose the metric, then define the threshold, the comparison, and the number of evaluation periods. When the metric breaches the threshold for the specified number of evaluation periods, the alarm goes into the ALARM state.
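
The evaluation rule can be sketched as a small function. This is a deliberately simplified model, assuming a "greater than" comparison and requiring every one of the most recent periods to breach; the real service also supports other comparison operators, M-out-of-N evaluation, and configurable handling of missing data.

```python
from typing import Sequence


def evaluate_alarm(datapoints: Sequence[float], threshold: float, evaluation_periods: int) -> str:
    """Return the alarm state for one datapoint per period, oldest first."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(datapoints) < evaluation_periods:
        # Not enough history yet to decide either way.
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


# CPU utilization samples in percent, threshold 80, three evaluation periods.
print(evaluate_alarm([50, 85, 90, 95], threshold=80, evaluation_periods=3))
print(evaluate_alarm([85, 90, 70], threshold=80, evaluation_periods=3))
print(evaluate_alarm([90], threshold=80, evaluation_periods=3))
```

Requiring several consecutive breaching periods is what keeps a single short spike from raising the alarm.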

Email Notification with CodePipeline

When you set up a pipeline in AWS CodePipeline, you may want to be notified when the execution state of the pipeline changes. AWS SNS provides a topic to which users can subscribe to receive these notifications by email.

Once the topic is created in SNS, you can create a rule in CloudWatch Events. The rule definition should set CodePipeline as the event source and Pipeline Execution State Change as the Event Type. For example, if you want to be notified of any FAILED status, set the Specific state(s) field to FAILED. To point the rule at your specific pipeline, define the pipeline details in the Event pattern. Finally, configure the SNS topic as the target. You are then all set to receive notifications when your CodePipeline fails.
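
The event pattern that the console builds from these choices looks roughly like the dictionary below. The pipeline name is hypothetical, and the `matches` function is a simplified local check written for this example, not the service's own matching logic.

```python
import json


def pipeline_failure_pattern(pipeline_name: str) -> dict:
    """Event pattern for FAILED executions of a single pipeline."""
    return {
        "source": ["aws.codepipeline"],
        "detail-type": ["CodePipeline Pipeline Execution State Change"],
        "detail": {"state": ["FAILED"], "pipeline": [pipeline_name]},
    }


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: each pattern key must exist in the event, and the
    event value must be one of the values the pattern lists."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = pipeline_failure_pattern("my-pipeline")

# Abbreviated sample event; real events carry more fields.
failed_event = {
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Pipeline Execution State Change",
    "detail": {"pipeline": "my-pipeline", "state": "FAILED"},
}
succeeded_event = {**failed_event, "detail": {"pipeline": "my-pipeline", "state": "SUCCEEDED"}}

print(json.dumps(pattern, indent=2))
print(matches(pattern, failed_event), matches(pattern, succeeded_event))
```

With boto3, the pattern would be supplied as a JSON string, for example `put_rule(Name=..., EventPattern=json.dumps(pattern))`, and the SNS topic ARN attached with `put_targets`.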

ELB Health Check Failure Notification

Many modern applications use Elastic Load Balancing (ELB) to distribute traffic across workloads running on EC2 instances. Say you have two EC2 instances behind an ELB, with an HTTP health check defined to ensure both instances are healthy. You can define an alarm on the ELB that sends a notification to the operations team if the healthy instance count is < 2.

[Figure: Dialog box for creating an alarm with AWS SNS]

Now, if you remove one of the instances from the ELB, the alarm fires and publishes to the configured SNS topic, which sends an email notification to your operations team so they can act on it.
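
The same alarm can be described programmatically. The sketch below assumes a Classic Load Balancer (the `AWS/ELB` namespace with its `HealthyHostCount` metric); the load balancer name, topic ARN, alarm name, and the choice of the `Minimum` statistic over a 60-second period are illustrative assumptions, not values from the dialog above.

```python
def healthy_host_alarm_params(load_balancer_name: str, topic_arn: str, min_healthy: int = 2) -> dict:
    """Build PutMetricAlarm-style parameters for a healthy-host alarm."""
    return {
        "AlarmName": f"{load_balancer_name}-healthy-hosts-below-{min_healthy}",
        "Namespace": "AWS/ELB",
        "MetricName": "HealthyHostCount",
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "Statistic": "Minimum",
        "Period": 60,  # seconds per evaluation period
        "EvaluationPeriods": 1,
        "Threshold": float(min_healthy),
        # Fires when the healthy host count drops below the threshold.
        "ComparisonOperator": "LessThanThreshold",
        # The SNS topic that emails the operations team.
        "AlarmActions": [topic_arn],
    }


alarm_params = healthy_host_alarm_params(
    "web-elb", "arn:aws:sns:us-east-1:123456789012:ops-alerts"
)
print(alarm_params["AlarmName"])
```

With boto3, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**alarm_params)`.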

There are many such use cases that can be implemented using CloudWatch and SNS.

Summary

Taken together, CloudWatch and SNS offer a wide range of features and configurations for many types of use cases. Both are critical components of event-driven and serverless applications.

Read more in our Serverless Monitoring Guide