AWS Lambda Monitoring with CloudWatch, X-Ray, and Lumigo

What Is AWS Lambda Monitoring?

AWS Lambda is a serverless runtime platform that lets you run code functions on-demand. It’s an event-driven service that does not require provisioning compute or storage resources to support functions. However, it does require close monitoring to ensure application performance, health, and reliability.

AWS Lambda serverless monitoring involves achieving visibility into the state of your Lambda workloads. Lambda provides various metrics to help you understand the application’s state and lets you integrate with Amazon CloudWatch to extend visibility into logs and set up alerts. Amazon also provides AWS X-Ray for distributed tracing. However, to achieve complete observability and resolve issues like Lambda cold starts, timeouts, and problems integrating with other AWS services, you will need a dedicated serverless monitoring solution.

In this article:

  • Key Concepts of AWS Lambda Monitoring
  • AWS Lambda Metrics
  • Lambda Monitoring with AWS CloudWatch
  • Monitoring AWS Lambda with AWS X-Ray
  • Lambda Monitoring with Lumigo

Key Concepts of AWS Lambda Monitoring

Monitoring helps you detect performance problems, outages, and errors in your Lambda workloads. Because Lambda-based applications frequently combine multiple services, it is vital to monitor each one. In particular, closely monitor the performance, throughput, and errors of the event sources that trigger your Lambda functions.

Here are the key concepts that serverless observability in AWS relies on:

  • Metrics—the Lambda service automatically publishes metrics for Lambda functions, including time series data and service-level indicators such as request rate, error rate, and invocation duration. You can also create metrics for your specific use cases.
  • Logs—Amazon CloudWatch is the default logging service in Lambda. Lambda logs show discrete events with timestamped records in serverless applications, like a failure, an error, or a state transformation. If you prefer, you can use a third-party logging system.
  • Alerts—CloudWatch alarms notify operators when metrics fall outside expected bounds. You can receive notifications directly from CloudWatch, or integrate a full monitoring system with CloudWatch to monitor and report on anomalies.
  • Visualization—displaying metrics in a visual format and organizing them in dashboards.
  • Distributed tracing—identifying the full path of a request in a system built of several microservices.
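
To make the logging concept above more concrete, here is a minimal sketch of emitting one timestamped, structured record per event. It assumes the function writes to standard output, which Lambda forwards to CloudWatch Logs by default. The helper name `log_event` and the field names are hypothetical, not part of any AWS API.

```python
import json
import time


def log_event(level: str, message: str, **fields) -> str:
    """Build one JSON log line, print it, and return it.

    The field names used here are illustrative assumptions; any
    consistent schema would work.
    """
    record = {
        "timestamp_ms": int(time.time() * 1000),  # when the event happened
        "level": level,
        "message": message,
        **fields,  # extra context, e.g. an order ID or an error type
    }
    line = json.dumps(record)
    print(line)  # stdout is what ends up in the function's log stream
    return line


# Example: record a failure as a discrete, searchable event.
line = log_event("ERROR", "payment failed", order_id="A-123", retryable=True)
```

Logging one JSON object per line keeps each event self-describing, which tends to make later filtering and querying easier than with free-form text.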

AWS Lambda Metrics

Here are the primary AWS Lambda metrics you can access via Amazon CloudWatch.

Lambda invocation metrics include:

  • Invocations—the total number of times Lambda executes the function code (including successful and failed executions).
  • Errors—the number of Lambda invocations resulting in function errors.
  • DeadLetterErrors—the number of failed attempts Lambda makes to send an event to the dead letter queue (for asynchronous invocations).
  • Throttles—the number of throttled invocation requests (Lambda rejects these requests intentionally).
  • ProvisionedConcurrencyInvocations—the number of times Lambda executes the function code on provisioned concurrency.
  • ProvisionedConcurrencySpilloverInvocations—the number of times Lambda executes the function code on standard concurrency while provisioned concurrency is all in use.
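
The invocation metrics above are raw counts, so a common first step is to turn them into rates. The sketch below is one possible way to do that, assuming you have already retrieved per-minute Sum datapoints for each metric. The sample numbers are made up for illustration. Because throttled requests are rejected before the function code runs, this sketch adds Throttles to Invocations when computing the throttle rate; that denominator is a design choice, not an AWS-defined formula.

```python
from typing import Sequence


def error_rate(errors: Sequence[float], invocations: Sequence[float]) -> float:
    """Fraction of executed invocations that ended in a function error."""
    total = sum(invocations)
    return sum(errors) / total if total else 0.0


def throttle_rate(throttles: Sequence[float], invocations: Sequence[float]) -> float:
    """Fraction of attempted requests that Lambda rejected by throttling."""
    attempted = sum(invocations) + sum(throttles)
    return sum(throttles) / attempted if attempted else 0.0


# Hypothetical per-minute Sum datapoints for one function.
invocations = [120, 95, 0, 210]
errors = [3, 0, 0, 14]
throttles = [0, 5, 0, 0]

print(f"error rate:    {error_rate(errors, invocations):.2%}")
print(f"throttle rate: {throttle_rate(throttles, invocations):.2%}")
```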

Lambda performance metrics include:

  • Duration—the time the function code takes to process an event.
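  • IteratorAge—the age of the last record in the event (for event source mappings that read from streams), measured from when the stream receives the record to when the event source mapping sends the event to the function.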

Lambda concurrency metrics include:

  • ConcurrentExecutions—the number of function instances that are processing events.
  • ProvisionedConcurrentExecutions—the number of function instances processing events on provisioned concurrency.
  • ProvisionedConcurrencyUtilization—the ProvisionedConcurrentExecutions value divided by the total allocated provisioned concurrency (for an alias or version).
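
The utilization ratio described above is simple arithmetic. Here is a small sketch of the calculation with made-up numbers; the function name and the sample values are illustrative assumptions.

```python
def provisioned_concurrency_utilization(in_use: int, allocated: int) -> float:
    """ProvisionedConcurrentExecutions divided by allocated provisioned concurrency."""
    if allocated <= 0:
        raise ValueError("allocated provisioned concurrency must be positive")
    return in_use / allocated


# Hypothetical alias with 10 provisioned instances, 8 of them busy.
utilization = provisioned_concurrency_utilization(in_use=8, allocated=10)
print(f"utilization: {utilization:.0%}")
```

A value that stays close to 1.0 suggests that additional requests are likely to spill over to standard concurrency, which is what ProvisionedConcurrencySpilloverInvocations counts.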

Lambda Monitoring with AWS CloudWatch

Lambda functions integrate with CloudWatch automatically. Here is how it works:

  • Lambda records various standard metrics automatically and publishes them to CloudWatch metrics.
  • By default, AWS durably stores logs from Lambda function invocations in a CloudWatch log stream.
  • The Lambda console provides a monitoring tab that offers a view into integrated CloudWatch metrics for each function.

Alarms in CloudWatch

CloudWatch lets you create alarms to monitor metrics and push notifications when metrics exceed typical values.

Here are several options:

  • Create composite alarms combining several alarms to get more effective notifications.
  • Create alarms manually in the console.
  • Create alarms using an AWS SAM template to define the alarm with the application’s resources.
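
Besides the options listed above, alarms can also be created programmatically. The sketch below only builds the parameter dictionary for the CloudWatch `PutMetricAlarm` API, so it runs without AWS credentials. The function name `build_error_alarm`, the alarm naming scheme, and the default threshold are assumptions chosen for illustration.

```python
def build_error_alarm(
    function_name: str,
    threshold: float = 1.0,
    period_seconds: int = 60,
    evaluation_periods: int = 1,
) -> dict:
    """Return PutMetricAlarm parameters for an alarm on a function's Errors metric."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Treat minutes with no invocations as healthy rather than unknown.
        "TreatMissingData": "notBreaching",
    }


alarm = build_error_alarm("checkout-handler")  # hypothetical function name
print(alarm["AlarmName"])

# With boto3 installed and credentials configured, the dict could be sent as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```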

Custom metrics
CloudWatch can track application-specific custom metrics. By default, metrics are stored at standard resolution, which provides one-minute granularity. You can define a custom metric as either standard or high resolution; high resolution provides one-second granularity. High-resolution metrics offer more immediate insight into sub-minute activity and let alarms evaluate over 10-second or 30-second periods, so they trigger sooner.

You can also build graphs, statistical analyses, and dashboards on custom metrics. Rather than measuring the performance of the Lambda functions themselves, custom metrics let you track statistics in the application domain. A single metric can have multiple dimensions for later analysis. Note that each custom metric must be published to a namespace, which isolates groups of custom metrics from one another. In practice, a namespace typically corresponds to a workload or application domain.
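
As an illustration of namespaces, dimensions, and resolution, the sketch below assembles a request payload for the CloudWatch `PutMetricData` API. It only builds the dictionary, so it runs without AWS access. The namespace, metric name, and dimension values are hypothetical.

```python
def build_metric_payload(
    namespace: str,
    name: str,
    value: float,
    dimensions: dict,
    high_resolution: bool = False,
    unit: str = "Count",
) -> dict:
    """Return PutMetricData parameters for a single custom metric datapoint."""
    return {
        "Namespace": namespace,  # groups the metrics of one workload
        "MetricData": [
            {
                "MetricName": name,
                "Dimensions": [
                    {"Name": key, "Value": val} for key, val in dimensions.items()
                ],
                "Value": value,
                "Unit": unit,
                # 1 = high resolution (one second), 60 = standard (one minute)
                "StorageResolution": 1 if high_resolution else 60,
            }
        ],
    }


payload = build_metric_payload(
    namespace="MyShop/Orders",  # hypothetical application namespace
    name="OrdersPlaced",
    value=3,
    dimensions={"Region": "eu-west", "PaymentMethod": "card"},
    high_resolution=True,
)
print(payload["MetricData"][0]["StorageResolution"])

# With boto3 installed and credentials configured, the dict could be sent as:
#   boto3.client("cloudwatch").put_metric_data(**payload)
```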

Learn more in our detailed guide to AWS Lambda CloudWatch

Monitoring AWS Lambda with AWS X-Ray

AWS X-Ray lets you visualize the components of your serverless application, identify performance bottlenecks, and troubleshoot requests that are causing errors. Lambda functions send trace data directly to X-Ray, which processes the data to generate service maps and searchable trace summaries.

If the service that calls your Lambda function has X-Ray tracing enabled, Lambda automatically sends the trace to X-Ray. This works with any upstream service, such as Amazon API Gateway or any application instrumented with the X-Ray SDK. These upstream services sample incoming requests and add a trace header that tells Lambda whether to send a trace.
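
To show what the trace header carries, here is a minimal parsing sketch. It assumes the usual `X-Amzn-Trace-Id` layout of semicolon-separated `Key=Value` fields (`Root`, `Parent`, `Sampled`). The sample value is illustrative, and `parse_trace_header` is a hypothetical helper, not part of the X-Ray SDK.

```python
def parse_trace_header(header: str) -> dict:
    """Split a trace header into its Key=Value fields."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip fragments that have no "="
            fields[key] = value
    return fields


# Illustrative header value in the documented format.
sample = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"

parsed = parse_trace_header(sample)
is_sampled = parsed.get("Sampled") == "1"
print(parsed["Root"], is_sampled)
```

Inside a running function, the same value is typically exposed through the `_X_AMZN_TRACE_ID` environment variable, so a helper like this can be used to attach the trace ID to log lines.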

You can also trace requests without trace headers by enabling active tracing in your function configuration, under Configuration > Monitoring tools > X-Ray > Active tracing.

A trace records information about requests processed by one or more services. The trace contains three subsegments:

  • Initialization—shows the initialization phase of the Lambda execution lifecycle. In this step, Lambda uses the configured resources to create or unfreeze the execution environment, download function code and all layers, initialize extensions, initialize runtime, and execute function initialization code.
  • Invocation—shows the phase in which Lambda invokes the function handler. It starts with registering the runtime and ends when the runtime is ready to send a response.
  • Overhead—shows the phase between the moment the runtime sends the response and the signal for the next invocation. During this time, the runtime completes all work related to the invocation and prepares to freeze the Lambda sandbox.
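
The three subsegments above can be compared by duration to see where time goes. The sketch below assumes a segment shaped like an X-Ray segment document, where each subsegment has a `name` plus `start_time` and `end_time` in epoch seconds. The sample timings are invented for illustration.

```python
def subsegment_durations_ms(segment: dict) -> dict:
    """Map each subsegment name to its duration in milliseconds."""
    return {
        sub["name"]: round((sub["end_time"] - sub["start_time"]) * 1000, 3)
        for sub in segment.get("subsegments", [])
    }


# Hypothetical function segment for a cold invocation.
sample_segment = {
    "name": "my-function",
    "subsegments": [
        {"name": "Initialization", "start_time": 100.000, "end_time": 100.250},
        {"name": "Invocation", "start_time": 100.250, "end_time": 100.750},
        {"name": "Overhead", "start_time": 100.750, "end_time": 100.760},
    ],
}

durations = subsegment_durations_ms(sample_segment)
for name, ms in durations.items():
    print(f"{name}: {ms} ms")
```

A large Initialization share relative to Invocation is one hint that cold starts contribute noticeably to latency.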

The image below shows how X-Ray visualizes a distributed trace.

[Image: X-Ray visualization of a distributed trace]

Learn more in our detailed guide to AWS X-Ray

Lambda Monitoring with Lumigo

Lumigo is a serverless monitoring platform that lets developers effortlessly find Lambda cold starts, understand their impact, and fix them.

Lumigo can help you:

  • Solve cold starts – easily obtain cold start-related metrics for your Lambda functions, including cold start %, average cold duration, and enabled provisioned concurrency. Generate real-time alerts on cold starts, so you’ll know instantly when a function is under-provisioned and can adjust provisioned concurrency.
  • Find and fix issues in seconds with visual debugging – Lumigo builds a virtual stacktrace of all services participating in the transaction. Everything is displayed in a visual map that can be searched and filtered.
  • Automatic distributed tracing – with one click and no manual code/configuration changes, Lumigo visualizes your entire environment, including your Lambdas, other AWS services, and every API call and external SaaS service.
  • Identify and remove performance bottlenecks – see the end-to-end execution duration of each service, and which services run sequentially and in parallel. Lumigo automatically identifies your worst latency offenders, including AWS Lambda cold starts.
  • Correlate cross-system issues – use a single dashboard to manage issues across all the different services in the account. Correlate errors over time to get cross-system insights, and control notifications from a single place. For example, correlate time periods with DynamoDB throttles and Lambda timeouts to find root causes with ease.