How to Do AWS Lambda Monitoring and 8 Critical Best Practices

What Is AWS Lambda Monitoring?

AWS Lambda is a serverless compute service that lets you run functions on demand. It is event-driven, and you do not need to provision compute or storage resources to run your functions. However, it does require close monitoring to ensure application performance, health, and reliability.

AWS Lambda monitoring means gaining visibility into the state of your Lambda workloads. We’ll review three ways to monitor your Lambda functions:

Using Amazon CloudWatch to extend visibility into logs and set up alerts.
Using AWS X-Ray for distributed tracing.
Using a dedicated serverless monitoring solution like Lumigo to achieve complete observability, resolve issues like Lambda cold starts and timeouts, and monitor upstream AWS services.

This is part of an extensive series of guides about IaaS.

Why Is AWS Lambda Monitoring Important?

AWS Lambda monitoring is essential for several reasons, as it helps you maintain the performance, reliability, and efficiency of your serverless applications:

Identify performance issues—monitoring can help you detect performance bottlenecks, high latency, or resource constraints that may impact your Lambda functions’ response times and overall performance. By addressing these issues proactively, you can ensure a smooth user experience.
Error detection and troubleshooting—monitoring Lambda functions enables you to capture error messages, exceptions, and other anomalies that may occur during execution. By analyzing logs and traces, you can quickly identify the root causes of these issues and implement appropriate fixes.
Optimize resource usage and costs—monitoring provides insights into your Lambda functions’ resource usage, such as memory, CPU, and execution duration. By analyzing this data, you can make informed decisions about resource allocation, scaling, and optimization, helping you control costs while maintaining the desired performance levels.
Maintain reliability and availability—ensuring the reliability and availability of your serverless applications is critical for maintaining user satisfaction and trust. Monitoring allows you to detect and address issues before they escalate, minimizing the potential impact on your users.
Gain insights into application behavior—by monitoring your Lambda functions, you can gain valuable insights into how your serverless applications are behaving under various conditions, such as changes in traffic patterns or resource constraints. This can help you make data-driven decisions to improve your applications’ design and architecture.
Ensure compliance and security—monitoring Lambda functions can help you identify and address security vulnerabilities, unauthorized access attempts, or potential breaches. Additionally, monitoring can assist you in meeting compliance requirements by providing visibility into your serverless infrastructure’s security posture.
Proactive alerting and notifications—setting up alarms and notifications based on specific metrics or events can help you proactively address issues before they impact your users. This can significantly reduce the time it takes to detect and resolve problems, improving the overall stability of your serverless applications.

Key Concepts of AWS Lambda Monitoring

Monitoring helps you detect performance problems, outages, and errors in your Lambda workloads. Because Lambda-based applications frequently combine multiple services, it is vital to monitor each one. In particular, closely monitor the performance, throughput, and errors of the event sources that trigger your Lambda functions.

Here are the key concepts that serverless observability in AWS relies on:

Metrics: Lambda automatically publishes metrics for each function as time series data, including service-level indicators such as request count, error rate, and invocation duration. You can also create custom metrics for your specific use cases.
Logs: Amazon CloudWatch Logs is the default logging service for Lambda. Lambda logs record discrete, timestamped events in a serverless application, such as a failure, an error, or a state change. If you prefer, you can use a third-party logging system.
Alerts: CloudWatch alarms notify operators when metrics fall outside expected bounds. You can receive notifications directly from CloudWatch, or integrate a full monitoring system with CloudWatch to monitor and report on anomalies.
Visualization: displaying metrics in a visual format and organizing them in dashboards.
Distributed tracing: following the full path of a request through a system built of several microservices.

AWS Lambda Metrics

Here are the primary AWS Lambda metrics you can access via Amazon CloudWatch.

Lambda invocation metrics include:

Invocations—the total number of times Lambda executes the function code (including successful and failed executions).
Errors—the number of Lambda invocations resulting in function errors.
DeadLetterErrors—the number of failed attempts Lambda makes to send an event to the dead letter queue (for asynchronous invocations).
Throttles—the number of throttled invocation requests (Lambda rejects these requests intentionally).
ProvisionedConcurrencyInvocations—the number of times Lambda executes the function code on provisioned concurrency.
ProvisionedConcurrencySpilloverInvocations—the number of times Lambda executes the function code on standard concurrency while provisioned concurrency is all in use.

Lambda performance metrics include:

Duration—the time the function code takes to process an event.
IteratorAge—the age of the oldest event record (for event source mappings reading from streams).

Lambda concurrency metrics include:

ConcurrentExecutions: the number of function instances that are processing events.
ProvisionedConcurrentExecutions: the number of function instances that are processing events on provisioned concurrency.
ProvisionedConcurrencyUtilization: the ProvisionedConcurrentExecutions value divided by the total allocated provisioned concurrency (for an alias or version).
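
Several of these metrics are most useful as ratios. The following is a minimal sketch of how an error rate, a throttle rate, and the provisioned concurrency utilization defined above might be computed from raw values. The `MetricWindow` container and the function names are hypothetical, and the throttle rate assumes that throttled requests are not counted in Invocations.

```python
from dataclasses import dataclass


@dataclass
class MetricWindow:
    """Metric values for one function over one period (hypothetical container)."""
    invocations: int
    errors: int
    throttles: int
    provisioned_concurrent_executions: int
    allocated_provisioned_concurrency: int


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def error_rate(w: MetricWindow) -> float:
    # Share of executed invocations that ended in a function error.
    return _ratio(w.errors, w.invocations)


def throttle_rate(w: MetricWindow) -> float:
    # Assumption: throttled requests are not included in Invocations,
    # so total demand is executed plus throttled requests.
    return _ratio(w.throttles, w.invocations + w.throttles)


def provisioned_concurrency_utilization(w: MetricWindow) -> float:
    # In-use provisioned concurrency divided by the allocated amount.
    return _ratio(w.provisioned_concurrent_executions,
                  w.allocated_provisioned_concurrency)


# Illustrative values only, not real measurements.
window = MetricWindow(invocations=1000, errors=25, throttles=250,
                      provisioned_concurrent_executions=8,
                      allocated_provisioned_concurrency=10)
print(f"error rate: {error_rate(window):.1%}")
print(f"throttle rate: {throttle_rate(window):.1%}")
print(f"provisioned concurrency utilization: "
      f"{provisioned_concurrency_utilization(window):.1%}")
```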

Lambda Monitoring with AWS CloudWatch

Lambda functions integrate with CloudWatch automatically. Here is how it works:

Lambda records various standard metrics automatically and publishes them to CloudWatch metrics.
By default, AWS durably stores logs from Lambda function invocations in a CloudWatch log stream.
The Lambda console provides a monitoring tab that offers a view into integrated CloudWatch metrics per each function.
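
Among the log lines Lambda writes for each invocation is a REPORT line summarizing duration and memory use. As a rough illustration, the sketch below parses such a line into numbers. The sample line and its exact field layout are assumptions based on the commonly seen format, and `parse_report_line` is a hypothetical helper, not part of any AWS library. Treating the presence of `Init Duration` as a cold start indicator is a heuristic.

```python
import re

# Field names as they commonly appear in a REPORT line (assumed layout).
_FIELD = re.compile(
    r"(Billed Duration|Init Duration|Max Memory Used|Memory Size|Duration): ([\d.]+)"
)
_REQUEST_ID = re.compile(r"RequestId: (\S+)")


def parse_report_line(line: str) -> dict:
    """Extract timing and memory figures from one REPORT log line."""
    if not line.startswith("REPORT"):
        raise ValueError("not a REPORT line")
    values = {name: float(number) for name, number in _FIELD.findall(line)}
    request_id = _REQUEST_ID.search(line)
    init_ms = values.get("Init Duration")  # absent when no init phase is reported
    return {
        "request_id": request_id.group(1) if request_id else None,
        "duration_ms": values["Duration"],
        "billed_duration_ms": values["Billed Duration"],
        "memory_size_mb": values["Memory Size"],
        "max_memory_used_mb": values["Max Memory Used"],
        "init_duration_ms": init_ms,
        "cold_start": init_ms is not None,  # heuristic, see the note above
        "memory_utilization": values["Max Memory Used"] / values["Memory Size"],
    }


# Made-up sample line for illustration.
sample = (
    "REPORT RequestId: 8f5c1a2e-0000-4000-8000-123456789abc\t"
    "Duration: 102.25 ms\tBilled Duration: 103 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 51 MB\tInit Duration: 200.12 ms"
)
print(parse_report_line(sample))
```

A function whose memory utilization stays low across many invocations may be over-provisioned, while one that runs close to its configured memory is a candidate for a larger allocation.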

Alarms in CloudWatch

CloudWatch lets you create alarms to monitor metrics and push notifications when metrics exceed typical values.

Here are several options:

Create composite alarms combining several alarms to get more effective notifications.
Create alarms manually in the console.
Create alarms using an AWS SAM template to define the alarm with the application’s resources.
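
Beyond the options listed above, an alarm can also be defined through the CloudWatch API. The sketch below builds the arguments for an alarm on the Errors metric of a single function. It assumes the AWS SDK for Python (boto3) is installed and credentials are configured if you actually create the alarm. The helper names, the alarm naming scheme, and the threshold are illustrative choices, not recommendations from this guide.

```python
def error_alarm_params(function_name: str, sns_topic_arn: str,
                       threshold: float = 1.0) -> dict:
    """Build keyword arguments for CloudWatch's PutMetricAlarm operation."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "AlarmDescription": f"Function errors for {function_name}",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,                # evaluate one-minute sums
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no invocations means no errors
        "AlarmActions": [sns_topic_arn],     # where the notification goes
    }


def create_alarm(params: dict) -> None:
    """Send the alarm definition to CloudWatch (needs boto3 and credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

For example, `create_alarm(error_alarm_params("checkout-handler", topic_arn))` would create the alarm, where `checkout-handler` and `topic_arn` stand in for your own function name and SNS topic ARN.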

Custom metrics

CloudWatch can track application-specific custom metrics. By default, CloudWatch stores metrics at standard resolution, which provides one-minute granularity. You can instead define a custom metric as high resolution to get one-second granularity. High-resolution metrics offer more immediate insight into sub-minute activity and let you set alarms that evaluate over 10-second or 30-second periods, so they can fire more quickly.

You can also build graphs, statistical analyses, and dashboards on custom metrics. Rather than measuring the performance of the Lambda functions themselves, custom metrics let you track statistics in the application domain. A single metric can have multiple dimensions for later analysis. Note that each custom metric must be published to a namespace, which isolates groups of custom metrics from one another. As a result, a namespace typically corresponds to a workload or application domain.
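
One way to publish a custom metric from a Lambda function is the CloudWatch embedded metric format, in which the function writes a specially structured JSON log line and CloudWatch extracts the metric from it. This approach is not described in this guide, so treat the following as a minimal sketch under that assumption. The namespace, metric, and dimension names are hypothetical.

```python
import json
import time


def emf_record(namespace: str, metric_name: str, value: float,
               dimensions: dict, unit: str = "Count",
               high_resolution: bool = False) -> str:
    """Return one embedded-metric-format log line as a JSON string."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                # One dimension set that uses every supplied dimension key.
                "Dimensions": [list(dimensions)],
                "Metrics": [{
                    "Name": metric_name,
                    "Unit": unit,
                    # 1 = high resolution, 60 = standard resolution
                    "StorageResolution": 1 if high_resolution else 60,
                }],
            }],
        },
        metric_name: value,  # the metric value sits at the top level
    }
    record.update(dimensions)  # dimension values also sit at the top level
    return json.dumps(record)


# Inside a Lambda function, printing the line sends it to the function's logs.
print(emf_record("CheckoutService", "OrdersPlaced", 3,
                 dimensions={"PaymentMethod": "card"},
                 high_resolution=True))
```

Writing metrics through the log stream avoids a separate API call per data point, which is one reason this pattern is sometimes preferred inside short-lived functions.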

Learn more in our detailed guide to AWS Lambda CloudWatch

Distributed Tracing for Lambda with AWS X-Ray

AWS X-Ray lets you visualize the components of your serverless application, identify performance bottlenecks, and troubleshoot requests that are causing errors. Lambda functions send trace data directly to X-Ray, which processes the data to generate service maps and searchable trace summaries.

If the service that calls your Lambda function has X-Ray tracing enabled, Lambda automatically sends traces to X-Ray. This works for any upstream service, such as Amazon API Gateway or an application instrumented with the X-Ray SDK. The upstream service samples incoming requests and adds a trace header that tells Lambda whether to send a trace for that request.

You can also trace requests without trace headers by enabling active tracing in your function configuration, under Configuration > Monitoring tools > X-Ray > Active tracing.
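
The trace header mentioned above is a short string of semicolon-separated key=value pairs. The sketch below parses it to show what it carries. It assumes the commonly documented layout (`Root`, `Parent`, `Sampled`) and that the Lambda runtime exposes the header in the `_X_AMZN_TRACE_ID` environment variable. The function names are hypothetical.

```python
import os
from typing import Optional


def parse_trace_header(header: str) -> dict:
    """Split an X-Ray trace header into its trace ID, parent ID, and sampling flag."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip fragments that are not key=value pairs
            fields[key] = value
    return {
        "trace_id": fields.get("Root"),
        "parent_id": fields.get("Parent"),
        "sampled": fields.get("Sampled") == "1",
    }


def current_trace() -> Optional[dict]:
    """Read the trace header of the current invocation, if one is present."""
    header = os.environ.get("_X_AMZN_TRACE_ID")
    return parse_trace_header(header) if header else None


# Example header in the assumed layout.
example = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
print(parse_trace_header(example))
```

Including the trace ID in your own log lines is a simple way to connect a log entry to the matching trace.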

A trace records information about a request as it is processed by one or more services. For a Lambda function, the function segment of the trace can contain three subsegments:

Initialization: shows the initialization phase of the Lambda execution lifecycle. In this phase, Lambda uses the configured resources to create or unfreeze the execution environment, download the function code and all layers, initialize extensions, initialize the runtime, and run the function’s initialization code.
Invocation: shows the phase in which Lambda invokes the function handler. It starts with runtime and extension registration and ends when the runtime is ready to send the response.
Overhead: shows the phase between the time the runtime sends the response and the signal for the next invocation. During this time, the runtime completes all work related to the invocation and prepares to freeze the Lambda sandbox.

The image below shows how X-Ray visualizes a distributed trace.

Learn more in our detailed guide to AWS X-Ray

End-to-End Lambda Monitoring with Lumigo

Lumigo is a serverless monitoring platform that lets developers effortlessly find Lambda cold starts, understand their impact, and fix them.

Lumigo can help you:

Solve cold starts – easily obtain cold start-related metrics for your Lambda functions, including cold start %, average cold duration, and enabled provisioned concurrency. Generate real-time alerts on cold starts, so you’ll know instantly when a function is under-provisioned and can adjust provisioned concurrency.
Find and fix issues in seconds with visual debugging – Lumigo builds a virtual stack trace of all services participating in the transaction. Everything is displayed in a visual map that can be searched and filtered.
Automatic distributed tracing – with one click and no manual code/configuration changes, Lumigo visualizes your entire environment, including your Lambdas, other AWS services, and every API call and external SaaS service.
Identify and remove performance bottlenecks – see the end-to-end execution duration of each service, and which services run sequentially and in parallel. Lumigo automatically identifies your worst latency offenders, including AWS Lambda cold starts.
Correlate cross-system issues – use a single dashboard to manage issues across all the services in the account. Correlate errors over time to get cross-system insights, and control notifications from a single place. For example, correlating periods of DynamoDB throttling with Lambda timeouts can help you find root causes with ease.

8 Best Practices for AWS Lambda Monitoring

To effectively monitor AWS Lambda functions and maintain the performance, reliability, and efficiency of your serverless applications, consider the following best practices:

Configure custom metrics and logs: While AWS Lambda automatically generates default metrics and logs, creating custom metrics and logs tailored to your specific application can provide you with more targeted insights. This can help you track custom performance indicators, application-specific events, or business metrics.
Use structured logging: When logging data from your Lambda functions, use structured formats like JSON. Structured logging makes it easier to search, filter, and analyze log data in Amazon CloudWatch Logs or other log management tools.
Monitor cold starts: Cold starts occur when a new container is created to handle a Lambda function’s invocation. This can result in increased latency. Keep track of cold start occurrences and their impact on your function’s performance, and optimize your function’s configuration to minimize their effects.
Optimize function timeouts and memory allocation: Monitor function timeouts and memory usage to find the optimal balance between performance and cost. Adjusting these settings can help improve the execution time and efficiency of your Lambda functions.
Monitor concurrency and throttling: Keep track of concurrent executions and throttling events to ensure your Lambda functions can handle the desired request load. Adjust the concurrency limits and provisioned concurrency settings as needed to maintain performance and avoid throttling.
Monitor security-related events: Track events related to security, such as unauthorized access attempts, configuration changes, and potential vulnerabilities. This helps maintain the security posture of your serverless applications and meet compliance requirements.
Use third-party monitoring tools: In addition to Amazon CloudWatch and AWS X-Ray, consider using third-party monitoring tools like Datadog, New Relic, or Dynatrace for more in-depth insights and enhanced visualization capabilities.
Establish a monitoring baseline: Define a baseline for your Lambda functions’ performance and resource usage under normal conditions. This helps you identify deviations from the norm, detect potential issues, and gauge the effectiveness of optimizations.
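
Two of the practices above, structured logging and cold start tracking, can be combined in a few lines of handler code. The following is a minimal sketch, not a definitive implementation. It assumes a Python function whose `context` object provides `aws_request_id` and `get_remaining_time_in_millis()`, relies on module-level code running once per execution environment to detect cold starts, and uses hypothetical field names in the log record.

```python
import json
import time

# Module scope runs once per execution environment, so this flag is True
# only for the first invocation that the environment handles.
_cold_start = True


def log_event(level: str, message: str, **fields) -> None:
    """Write one JSON log line so that log tools can filter on its fields."""
    record = {
        "level": level,
        "message": message,
        "timestamp_ms": int(time.time() * 1000),
        **fields,
    }
    print(json.dumps(record))


def handler(event, context):
    global _cold_start
    was_cold, _cold_start = _cold_start, False

    started = time.perf_counter()
    result = {"ok": True}  # placeholder for the real work of the function
    duration_ms = round((time.perf_counter() - started) * 1000, 3)

    log_event(
        "INFO",
        "invocation finished",
        request_id=context.aws_request_id,
        cold_start=was_cold,
        duration_ms=duration_ms,
        # How close the invocation came to the configured timeout.
        remaining_ms=context.get_remaining_time_in_millis(),
    )
    return result
```

Because every line is JSON with consistent keys, a query such as "all records where `cold_start` is true" becomes a simple field filter rather than a text search, and the `remaining_ms` field gives an early signal when invocations drift toward their timeout.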
