Every application that runs in production needs monitoring to provide observability, reliability, and security, and serverless applications are no different. For applications on the AWS cloud, there is an out-of-the-box service known as AWS CloudWatch Logs that stores logs and lets you monitor, access, and query them.
CloudWatch Logs can be enabled for various AWS services such as EC2, EKS, ECS, Lambda, and many more. It gives you one central place to see all types of logs: app logs, audit logs, and infra logs. You can search logs for a specific error or pattern, filter on fields, and use a powerful query language to write custom queries and build dashboards.
There are a few concepts you need to understand about CloudWatch Logs:
CloudWatch Logs captures each activity that occurs in an application or resource. This is known as a log event. It is an activity record that contains the timestamp of when the event occurred and the raw message. The message must be UTF-8 encoded.
A single log event generally doesn’t tell you everything that happened at a given point in time, because that usually involves a sequence of events. CloudWatch Logs groups the sequence of events that come from the same source, and this is known as a log stream.
Log streams can be further grouped together based on the same retention, access control, and monitoring requirements. So a log group includes several log streams.
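To make the event, stream, and group hierarchy concrete, here is a minimal Python sketch. It assumes `logs_client` is a boto3 CloudWatch Logs client (`boto3.client("logs")`) and that the log group and log stream already exist. The function names and the example messages are hypothetical; the `timestamp` and `message` keys follow the shape expected by the `put_log_events` call.

```python
import time


def build_log_events(messages, start_ms=None):
    """Turn raw messages into log events: a millisecond timestamp plus the message."""
    if start_ms is None:
        start_ms = int(time.time() * 1000)
    # Keep the batch in chronological order by spacing events one millisecond apart.
    return [
        {"timestamp": start_ms + offset, "message": message}
        for offset, message in enumerate(messages)
    ]


def write_events(logs_client, group, stream, messages):
    """Write a batch of events to one log stream inside one log group."""
    return logs_client.put_log_events(
        logGroupName=group,
        logStreamName=stream,
        logEvents=build_log_events(messages),
    )
```

Each event belongs to exactly one stream, and each stream to exactly one group, which is why both names are required on the call.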
Raw events may be useful for developers to debug production issues, but you also want to see the pattern of application behavior over a period of time. For that, you can use metric filters, which observe these events and create data points in CloudWatch metrics.
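As an illustration, a metric filter that counts log events containing the term ERROR could be created along these lines. This is a sketch, not a recommended configuration: the filter name, metric name, and `MyApp` namespace are made up for the example, and `logs_client` is assumed to be a boto3 CloudWatch Logs client.

```python
def create_error_metric_filter(logs_client, log_group, namespace="MyApp"):
    """Publish a data point to CloudWatch metrics for each event containing ERROR."""
    return logs_client.put_metric_filter(
        logGroupName=log_group,
        filterName="error-count",          # hypothetical filter name
        filterPattern="ERROR",             # matches events that contain the term ERROR
        metricTransformations=[
            {
                "metricName": "ErrorCount",    # hypothetical metric name
                "metricNamespace": namespace,  # hypothetical namespace
                "metricValue": "1",            # each matching event adds 1
            }
        ],
    )
```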
Retention settings are assigned at the log group level and apply to every log stream in that log group. They tell CloudWatch Logs how long to keep log events; events older than the retention period are deleted.
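A retention period can be set with a single API call. The sketch below assumes `logs_client` is a boto3 CloudWatch Logs client. CloudWatch Logs only accepts certain retention values, so the helper rounds a desired period up to the next accepted one; the list of values is my reading of the API reference and should be checked against the current documentation.

```python
# Retention periods (in days) accepted by the API, as an assumption to verify.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def pick_retention_days(desired_days):
    """Round up to the nearest accepted retention period."""
    for days in ALLOWED_RETENTION_DAYS:
        if days >= desired_days:
            return days
    return ALLOWED_RETENTION_DAYS[-1]


def set_retention(logs_client, log_group, desired_days):
    """Apply the retention period to the whole log group."""
    days = pick_retention_days(desired_days)
    logs_client.put_retention_policy(logGroupName=log_group, retentionInDays=days)
    return days
```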
CloudWatch Logs is typically used in conjunction with several other AWS services:
AWS CloudTrail records the API calls made to your AWS services, including CloudWatch Logs, so you can audit how the CloudWatch Logs API is being called in your account. CloudTrail delivers its log files to an S3 bucket.
AWS Identity and Access Management (IAM) manages access control for AWS resources. You can use IAM to define which AWS services have permission to push logs to CloudWatch Logs.
Amazon Kinesis streams data in real time from one AWS service to another for further processing. In many use cases, CloudWatch Logs data is sent through Kinesis Data Streams to S3 for archiving. You can also use it to forward logs to third-party tools such as Splunk and ELK.
AWS Lambda runs function code in response to events. CloudWatch Logs can be integrated with Lambda to take action automatically, for example when an infrastructure failure occurs.
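For example, an IAM policy that lets a service write to CloudWatch Logs typically grants three actions: creating a log group, creating a log stream, and putting log events. The sketch below builds such a policy document in Python. The function name is hypothetical, and the default `"*"` resource is only for illustration; in practice you would likely scope it to a specific log group ARN.

```python
import json


def build_logging_policy(log_group_arn="*"):
    """Return an IAM policy document that allows writing to CloudWatch Logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": log_group_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, ready to paste into an IAM role.
    print(json.dumps(build_logging_policy(), indent=2))
```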
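Forwarding a log group to a Kinesis data stream is done with a subscription filter. A minimal sketch, assuming `logs_client` is a boto3 CloudWatch Logs client, `stream_arn` points to an existing Kinesis data stream, and `role_arn` is a role that CloudWatch Logs can assume to write to that stream. The filter name is hypothetical.

```python
def subscribe_to_stream(logs_client, log_group, stream_arn, role_arn):
    """Forward every event in the log group to a Kinesis data stream."""
    return logs_client.put_subscription_filter(
        logGroupName=log_group,
        filterName="forward-all",   # hypothetical filter name
        filterPattern="",           # an empty pattern matches every log event
        destinationArn=stream_arn,
        roleArn=role_arn,
    )
```

A consumer on the stream can then archive the records to S3 or hand them to a third-party tool.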
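When a log group is subscribed to a Lambda function, the function receives the log events as a base64-encoded, gzip-compressed JSON document under `event["awslogs"]["data"]`. The handler below is a sketch of how such a payload could be decoded and scanned for errors. What to do on a match (notify someone, trigger remediation) is left as a comment, since the article does not prescribe it.

```python
import base64
import gzip
import json


def decode_log_payload(event):
    """Decode the CloudWatch Logs payload delivered to a subscribed function."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context):
    payload = decode_log_payload(event)
    errors = [
        log_event["message"]
        for log_event in payload["logEvents"]
        if "ERROR" in log_event["message"]
    ]
    # A real handler might notify an on-call channel or start a remediation here.
    return {"logGroup": payload["logGroup"], "errorCount": len(errors)}
```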
AWS Lambda is the most popular service for building serverless applications. In addition to its serverless features, it automatically enables monitoring of Lambda functions. It creates a log group for each Lambda function to capture the function's logs. The log group is named like this: /aws/lambda/<function name>
In each log group, there will be a log stream for each instance of the Lambda function running.
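Given that naming convention, the log group for a function can be derived from the function name, and its most recently active log streams can be listed. A small sketch, assuming `logs_client` is a boto3 CloudWatch Logs client; the helper names are hypothetical.

```python
def lambda_log_group(function_name):
    """Build the log group name Lambda uses for a function."""
    return f"/aws/lambda/{function_name}"


def latest_stream_names(logs_client, function_name, limit=5):
    """Return the names of the most recently active log streams for a function."""
    response = logs_client.describe_log_streams(
        logGroupName=lambda_log_group(function_name),
        orderBy="LastEventTime",
        descending=True,
        limit=limit,
    )
    return [stream["logStreamName"] for stream in response["logStreams"]]
```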
The Lambda service integrates with AWS CloudWatch Logs and all the Lambda logs are pushed from the execution environment to CloudWatch Logs.
Lambda logs are directly available on the Lambda console along with CloudWatch metrics. To see the logs in detail, you can go to the CloudWatch console or use the AWS CLI.
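Logs can also be read programmatically. The sketch below pages through the events in a log group that match a filter pattern, assuming `logs_client` is a boto3 CloudWatch Logs client and `start_ms` is a start time in milliseconds since the epoch. The function name is hypothetical.

```python
def fetch_matching_events(logs_client, log_group, pattern, start_ms):
    """Collect the messages of all events matching a pattern, following pagination."""
    messages = []
    kwargs = {
        "logGroupName": log_group,
        "filterPattern": pattern,
        "startTime": start_ms,
    }
    while True:
        page = logs_client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return messages
        # Ask for the next page of results.
        kwargs["nextToken"] = token
```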
AWS provides the CloudWatch Logs Insights tool to enable you to search and analyze log data. This tool is very handy for addressing operational issues. CloudWatch Lambda Insights is an extension of CloudWatch Logs Insights.
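A Logs Insights query can be started and polled from code as well as from the console. The following is a sketch under a few assumptions: `logs_client` is a boto3 CloudWatch Logs client, the time range is given in seconds since the epoch, and the query string is just one example of finding recent error messages.

```python
import time

# Example query: the 20 most recent events whose message contains ERROR.
ERROR_QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| sort @timestamp desc "
    "| limit 20"
)


def run_insights_query(logs_client, log_group, start_s, end_s,
                       query=ERROR_QUERY, poll_seconds=1.0):
    """Start a Logs Insights query and wait until it leaves the running states."""
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=start_s,
        endTime=end_s,
        queryString=query,
    )["queryId"]
    while True:
        result = logs_client.get_query_results(queryId=query_id)
        if result["status"] not in ("Scheduled", "Running"):
            return result["status"], result.get("results", [])
        time.sleep(poll_seconds)
```

The function returns the final status together with the rows, so the caller can distinguish a completed query from a failed or cancelled one.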
CloudWatch Lambda Insights is built specifically for serverless applications running on Lambda. It collects CPU, memory, disk, and other infra-related resource usage and aggregates them to show data points. It also provides information for cold starts and Lambda instance-related issues.
Lambda Insights is not enabled by default. You can enable it using the Lambda console, the AWS CLI, the SAM CLI, or several other methods. Enabling it adds the Insights extension to the function as a Lambda layer and adds the required permission to the function's execution role if it is not already there.
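Those two steps can be sketched with boto3 as follows. The layer ARN differs by region and version, so it is passed in as a parameter here rather than hard-coded; look it up in the AWS documentation for your region. The managed policy name is the one I understand AWS provides for Lambda Insights, so treat it as an assumption to verify. `lambda_client` and `iam_client` are assumed to be boto3 Lambda and IAM clients.

```python
# Managed policy assumed to grant the permissions Lambda Insights needs.
INSIGHTS_POLICY_ARN = (
    "arn:aws:iam::aws:policy/CloudWatchLambdaInsightsExecutionRolePolicy"
)


def enable_lambda_insights(lambda_client, iam_client, function_name,
                           role_name, insights_layer_arn):
    """Add the Insights layer to a function and grant its role the policy."""
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    # Keep the layers that are already attached to the function.
    layers = [layer["Arn"] for layer in config.get("Layers", [])]
    if insights_layer_arn not in layers:
        layers.append(insights_layer_arn)
        lambda_client.update_function_configuration(
            FunctionName=function_name, Layers=layers
        )
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=INSIGHTS_POLICY_ARN)
    return layers
```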
The Lambda Insights dashboard can be used for a multi-function or single-function view. A multi-function view is important for monitoring metrics such as overall usage of memory and CPU across functions. A single-function view is used for monitoring the performance of individual requests of a function.
CloudWatch metrics are used to monitor application behavior and raise alarms. Lambda sends each event’s metric to CloudWatch. Based on these metrics, graphs and dashboards are created to monitor the patterns. You can also set CloudWatch alarms based on these metrics.
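As an example of such an alarm, the sketch below raises an alarm when a function reports one or more errors within a minute. It assumes `cloudwatch_client` is a boto3 CloudWatch client; the alarm name and the threshold are illustrative choices, not recommendations.

```python
def create_error_alarm(cloudwatch_client, function_name, threshold=1):
    """Alarm when the function's Errors metric reaches the threshold in one minute."""
    return cloudwatch_client.put_metric_alarm(
        AlarmName=f"{function_name}-errors",   # hypothetical naming scheme
        Namespace="AWS/Lambda",
        MetricName="Errors",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        Statistic="Sum",
        Period=60,                             # evaluate one-minute windows
        EvaluationPeriods=1,
        Threshold=threshold,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        TreatMissingData="notBreaching",       # no invocations means no alarm
    )
```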
For Lambda functions, you can filter these metrics based on the function name, resource, version, and alias.
There are three types of metrics at a high level:
Invocation metrics are used to monitor the outcome of an invocation. For example, if you need to count the number of errors per minute occurring for a function, you can view the Sum of the Errors metric with a period of one minute.
Some of the invocation metrics are Invocations, Errors, DeadLetterErrors, DestinationDeliveryFailures, and Throttles. View them with the Sum statistic.
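The errors-per-minute example above could be fetched like this. A sketch, assuming `cloudwatch_client` is a boto3 CloudWatch client; the function name and the default one-hour window are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def errors_per_minute(cloudwatch_client, function_name, minutes=60):
    """Return (timestamp, error count) pairs, one per minute, oldest first."""
    end = datetime.now(timezone.utc)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Errors",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,              # one data point per minute
        Statistics=["Sum"],     # invocation metrics are read with Sum
    )
    # Data points are not guaranteed to arrive in order, so sort them by time.
    points = sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
    return [(point["Timestamp"], point["Sum"]) for point in points]
```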
Performance metrics are used to monitor the performance of each Lambda function invocation. For example, if you need to find out how much time a function spends processing an event, you can view the Duration metric. For functions that read from a stream, you can also use IteratorAge to find out the age of the last record in the event.
Concurrency metrics are used to view the aggregate count of the number of instances of a function running based on a version, alias, or region. You can read ConcurrentExecutions, ProvisionedConcurrentExecutions, ProvisionedConcurrencyUtilization and UnreservedConcurrentExecutions metrics under the Max statistic.
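Performance and concurrency metrics can be read together in a single request. The sketch below builds two queries, Duration with the Average statistic and ConcurrentExecutions with the Maximum statistic, and sends them through `get_metric_data`. It assumes `cloudwatch_client` is a boto3 CloudWatch client and that `start` and `end` are `datetime` objects; the query ids and helper names are hypothetical.

```python
def build_metric_queries(function_name, period=60):
    """Build metric queries for a function's Duration and ConcurrentExecutions."""
    dimensions = [{"Name": "FunctionName", "Value": function_name}]
    wanted = [
        ("duration", "Duration", "Average"),
        ("concurrency", "ConcurrentExecutions", "Maximum"),
    ]
    return [
        {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        }
        for query_id, metric_name, stat in wanted
    ]


def fetch_metrics(cloudwatch_client, function_name, start, end):
    """Return a mapping from query id to the list of values for that metric."""
    response = cloudwatch_client.get_metric_data(
        MetricDataQueries=build_metric_queries(function_name),
        StartTime=start,
        EndTime=end,
    )
    return {result["Id"]: result["Values"] for result in response["MetricDataResults"]}
```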
Lambda is the most popular service in the AWS cloud for building event-driven workflows and serverless applications. By nature, these systems are built of distributed components, so it is very important to have a centralized service such as CloudWatch Logs to monitor their behavior. It not only helps to debug failures but also identifies performance bottlenecks and resource waste. You can also fine-tune your scaling configurations based on these metrics.