
Top 8 Lambda Metrics and Defining Your Custom Metrics

What Are Lambda Metrics?

Lambda metrics are performance indicators that provide insights into the operation of AWS Lambda functions. These metrics help developers and system administrators monitor the behavior of their serverless applications, identifying potential issues and optimizing performance. Key metrics include invocation count, execution duration, error rates, and throttling incidents. 

By analyzing these metrics, users can fine-tune their Lambda functions to achieve better reliability and efficiency. Without these metrics, it would be nearly impossible to diagnose performance bottlenecks, unexpected errors, or scalability challenges. 

Metrics give a granular view of how the function performs over time and under different loads, allowing proactive management and quicker resolution of issues. This is particularly important in a production environment where downtime or performance degradation can have a significant impact.

This is part of a series of articles about serverless monitoring.

What Is Amazon CloudWatch? 

Amazon CloudWatch is a monitoring and observability service that allows users to collect, analyze, and act on data from various AWS resources, including Lambda functions. CloudWatch provides a unified view of operational health across different services, making it easier to track performance metrics, log files, and set alarms for critical conditions. 

It supports both out-of-the-box and custom metrics, helping to ensure that applications run smoothly. Users can create dashboards for real-time monitoring and set up automated responses to specific conditions, such as triggering notifications or Lambda functions when a metric crosses a threshold.
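
For example, here is a minimal sketch of such an automated response, using boto3 to create an alarm that notifies an SNS topic whenever a function is throttled. The function name and topic ARN are placeholders:

import boto3

cloudwatch = boto3.client('cloudwatch')

# Notify an SNS topic if the function is throttled at all in a 5-minute window.
# 'MyLambdaFunction' and the topic ARN below are placeholders.
cloudwatch.put_metric_alarm(
    AlarmName='MyLambdaFunction-throttles',
    Namespace='AWS/Lambda',
    MetricName='Throttles',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'MyLambdaFunction'}],
    Statistic='Sum',
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    TreatMissingData='notBreaching',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:lambda-alerts']
)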

Types of Lambda Metrics Available on CloudWatch 

CloudWatch supports several metrics related to AWS Lambda functions.

Execution Metrics

Execution metrics aid in understanding how frequently functions are called. Key metrics include the total number of invocations and the proportion of success versus failure rates. Monitoring these metrics helps track usage patterns and detect anomalies, such as sudden spikes in invocation rates or unexpected failures. 

Tracking the invocation count over different time periods enables developers to identify trends and make data-driven decisions about scaling and optimizing Lambda functions. For example, consistently high invocation rates may indicate the need for optimization or resource adjustments, while unexpected drops in the invocation count could signal underlying issues needing immediate investigation.
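
As an illustration, the sketch below (boto3, with MyLambdaFunction as a placeholder function name) pulls the hourly invocation count for the past 24 hours so the trend can be inspected or charted:

import datetime
import boto3

cloudwatch = boto3.client('cloudwatch')
now = datetime.datetime.now(datetime.timezone.utc)

# Hourly invocation counts for the last 24 hours for a placeholder function.
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/Lambda',
    MetricName='Invocations',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'MyLambdaFunction'}],
    StartTime=now - datetime.timedelta(hours=24),
    EndTime=now,
    Period=3600,           # one data point per hour
    Statistics=['Sum']
)

# Data points are returned unordered, so sort by timestamp before printing.
for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], int(point['Sum']))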

Performance Metrics

Performance metrics focus on how efficiently a Lambda function executes. Two key performance metrics are the duration of each invocation and the overhead introduced by function initialization (cold starts). Monitoring these helps to optimize code performance and reduce costs, as AWS charges are based partly on the duration of execution.

Latency and execution duration metrics provide useful data for identifying performance bottlenecks. High latency or prolonged execution times could indicate inefficient code or external service dependencies that need optimization. 
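
For example, a sketch like the following (boto3, placeholder function name) retrieves the p95 duration in 5-minute buckets, which is often more revealing than the average when hunting for bottlenecks:

import datetime
import boto3

cloudwatch = boto3.client('cloudwatch')
now = datetime.datetime.now(datetime.timezone.utc)

# Fetch the p95 execution duration for the past 6 hours in 5-minute buckets.
# 'MyLambdaFunction' is a placeholder function name.
response = cloudwatch.get_metric_data(
    MetricDataQueries=[{
        'Id': 'p95_duration',
        'MetricStat': {
            'Metric': {
                'Namespace': 'AWS/Lambda',
                'MetricName': 'Duration',
                'Dimensions': [{'Name': 'FunctionName', 'Value': 'MyLambdaFunction'}]
            },
            'Period': 300,
            'Stat': 'p95'
        }
    }],
    StartTime=now - datetime.timedelta(hours=6),
    EndTime=now
)

result = response['MetricDataResults'][0]
for ts, value in zip(result['Timestamps'], result['Values']):
    print(ts, round(value, 1), 'ms')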

Concurrency Metrics

Concurrency metrics measure how many instances of a Lambda function run simultaneously. Key metrics include concurrent invocations and the maximum concurrency reached. Monitoring these metrics helps in understanding peak usage times and planning capacity. AWS imposes limits on concurrent executions, so it's important to track these metrics to avoid throttling.

By observing concurrency metrics, users can better manage the scaling behavior of Lambda functions. Consistently hitting the concurrency limit might require a re-evaluation of function efficiency, reserved concurrency for critical functions, or an increase in the account's concurrency limit.
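
If a critical function keeps running into the shared limit, one option is to reserve concurrency for it. A minimal sketch, assuming boto3 and placeholder names and values:

import boto3

lambda_client = boto3.client('lambda')

# Reserve a slice of account concurrency for a critical function so other
# functions cannot starve it. The function name and value are placeholders.
lambda_client.put_function_concurrency(
    FunctionName='MyLambdaFunction',
    ReservedConcurrentExecutions=100
)

# Check how much concurrency remains available to everything else.
settings = lambda_client.get_account_settings()
print(settings['AccountLimit']['UnreservedConcurrentExecutions'])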

Asynchronous Invocation Metrics

Asynchronous invocation metrics provide insights into the performance and reliability of Lambda functions triggered by asynchronous events. These metrics include the number of events successfully processed, retries due to failures, and events pushed to dead letter queues (DLQs). 

Tracking the success and failure of asynchronous invocations helps identify issues quickly, enabling timely corrective actions. For example, a high retry count might indicate underlying problems in the event source or function logic that need attention. 
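
Retry behavior and failure destinations for asynchronous invocations are configurable. The sketch below (boto3, with the function name and SQS queue ARN as placeholders) limits retries and routes events that still fail to a queue for later inspection:

import boto3

lambda_client = boto3.client('lambda')

# Limit retries for asynchronous invocations and route events that still fail
# to an SQS queue for inspection. All names and ARNs are placeholders.
lambda_client.put_function_event_invoke_config(
    FunctionName='MyLambdaFunction',
    MaximumRetryAttempts=1,              # default is 2
    MaximumEventAgeInSeconds=3600,       # discard events older than one hour
    DestinationConfig={
        'OnFailure': {'Destination': 'arn:aws:sqs:us-east-1:123456789012:failed-events'}
    }
)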

Key Metrics to Monitor in Lambda with CloudWatch 

1. Invocations

The invocations metric indicates the demand for the function and can guide capacity planning and scaling decisions. Monitoring this metric helps users identify usage patterns, assess the impact of changes, and detect anomalies.

Understanding invocation patterns helps in preparing for periods of increased usage. For example, a sudden spike in invocations might require scaling adjustments or bug fixes to handle the load. Monitoring this metric ensures that the function can handle varying levels of traffic effectively.

2. Duration

The duration metric measures the time a Lambda function takes to execute. This is crucial for performance optimization, as longer durations can increase costs and impact user experience. Duration is affected by code complexity, external dependencies, and resource allocation.

Optimizing the duration involves identifying bottlenecks and streamlining code execution. Reducing execution time improves efficiency and can lead to significant cost savings, especially for high-frequency functions. 

3. Errors

The errors metric tracks the number of failed execution attempts due to issues within the Lambda function. Monitoring this metric helps in quickly identifying and addressing problems that could affect application reliability and user experience. Factors such as code errors, misconfigurations, and resource limitations can contribute to this metric.

Analyzing error patterns and troubleshooting issues promptly can prevent minor issues from escalating into major problems. Regularly monitoring this metric allows for quick diagnosis and resolution, ensuring the function performs reliably under different conditions.
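
A practical way to monitor this is to alarm on the error rate rather than the raw error count, using CloudWatch metric math. A sketch, assuming boto3 and a placeholder function name, SNS topic, and 5% threshold:

import boto3

cloudwatch = boto3.client('cloudwatch')

# Alarm when more than 5% of invocations fail over a 5-minute window.
cloudwatch.put_metric_alarm(
    AlarmName='MyLambdaFunction-error-rate',
    Metrics=[
        {'Id': 'error_rate', 'Expression': '100 * errors / invocations',
         'Label': 'Error rate (%)', 'ReturnData': True},
        {'Id': 'errors', 'ReturnData': False, 'MetricStat': {
            'Metric': {'Namespace': 'AWS/Lambda', 'MetricName': 'Errors',
                       'Dimensions': [{'Name': 'FunctionName', 'Value': 'MyLambdaFunction'}]},
            'Period': 300, 'Stat': 'Sum'}},
        {'Id': 'invocations', 'ReturnData': False, 'MetricStat': {
            'Metric': {'Namespace': 'AWS/Lambda', 'MetricName': 'Invocations',
                       'Dimensions': [{'Name': 'FunctionName', 'Value': 'MyLambdaFunction'}]},
            'Period': 300, 'Stat': 'Sum'}},
    ],
    EvaluationPeriods=1,
    Threshold=5.0,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:lambda-alerts']
)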

4. Throttles

The throttles metric indicates the number of invocations that were limited due to reaching concurrency or resource limits. This is useful for understanding capacity constraints and planning for scalability. Frequent throttling suggests the need for adjustments in function configuration or account limits.

Monitoring throttling events helps in determining whether to request an increase in the account's concurrency limit or optimize existing resources. Addressing throttle issues ensures that the Lambda function can handle peak loads without service degradation.

5. DeadLetterErrors

The DeadLetterErrors metric counts the number of times Lambda attempted to send a discarded asynchronous event to a dead letter queue (DLQ) but failed, typically because of missing permissions, a misconfigured target, or throttling on the queue. This metric is particularly important for asynchronous invocations, because a non-zero value means failed events are being lost instead of captured for debugging.

By analyzing the events in DLQs, developers can identify patterns and root causes of failures, leading to more robust and fault-tolerant applications. This metric is essential for maintaining high reliability and ensuring that critical events are not overlooked.
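
Note that a DLQ (or an on-failure destination) has to be configured before failed events are captured at all. A minimal sketch of attaching an SQS queue as a DLQ, assuming boto3 and placeholder names and ARNs:

import boto3

lambda_client = boto3.client('lambda')

# Attach an SQS queue as the function's dead letter queue. The execution role
# also needs sqs:SendMessage permission on the queue; without it,
# DeadLetterErrors will increase instead of events landing in the queue.
lambda_client.update_function_configuration(
    FunctionName='MyLambdaFunction',
    DeadLetterConfig={'TargetArn': 'arn:aws:sqs:us-east-1:123456789012:my-function-dlq'}
)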

6. IteratorAge

The IteratorAge metric measures the age of the last record processed from a streaming event source, such as Kinesis or DynamoDB Streams. It indicates how far behind the Lambda function is in processing events. Monitoring this metric is important for real-time applications requiring timely processing of events.

High iterator age can signal performance issues or backlog in event processing. Regular monitoring helps maintain low latency and ensures that events are processed in a timely manner.
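
A common safeguard is an alarm on the maximum iterator age. A sketch, assuming boto3, placeholder names, and a 5-minute threshold:

import boto3

cloudwatch = boto3.client('cloudwatch')

# Alarm if the stream consumer falls more than 5 minutes behind for 5 minutes.
# IteratorAge is reported in milliseconds; names and ARNs are placeholders.
cloudwatch.put_metric_alarm(
    AlarmName='MyLambdaFunction-iterator-age',
    Namespace='AWS/Lambda',
    MetricName='IteratorAge',
    Dimensions=[{'Name': 'FunctionName', 'Value': 'MyLambdaFunction'}],
    Statistic='Maximum',
    Period=60,
    EvaluationPeriods=5,
    Threshold=300000,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:lambda-alerts']
)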

7. ConcurrentExecutions

The ConcurrentExecutions metric tracks the number of Lambda function instances running simultaneously. This metric is important for understanding capacity utilization and managing resource allocation. By keeping an eye on concurrent executions, developers can ensure that functions scale appropriately to meet demand.

Frequent monitoring helps in identifying trends and making informed decisions about resource allocation. This ensures that the function can handle varying workloads efficiently without reaching concurrency limits, maintaining performance and reliability.

8. UnreservedConcurrentExecutions

The UnreservedConcurrentExecutions metric tracks the concurrent executions of functions that do not have reserved concurrency, i.e., how much of the account's shared (unreserved) concurrency pool is currently in use. Monitoring this metric helps in ensuring that there is enough headroom for all the functions that rely on that shared pool.

This metric is useful for maintaining a balance between reserved and unreserved concurrency, ensuring that critical functions always have enough capacity to run. Regular monitoring helps in optimizing resource allocation and avoiding contention issues.

Defining Custom Metrics in AWS Lambda 

There are two ways to define custom Lambda metrics in AWS.

Embedded Metric Format (EMF)

Embedded Metric Format (EMF) is a feature provided by AWS that allows developers to publish custom metrics directly from their AWS Lambda functions. EMF enables the embedding of metric data in JSON format within structured log events, which are then automatically extracted and sent to Amazon CloudWatch. This approach simplifies the process of creating custom metrics and enhances observability.

To use EMF, you incorporate specific JSON structures in your Lambda function logs. These JSON structures contain key-value pairs that represent the metrics you want to capture. CloudWatch automatically parses these logs, extracts the metrics, and makes them available for visualization and analysis.

For example, to track custom metrics like ProcessingTime and ItemCount within a Lambda function, you can log the following JSON:

{
  "_aws": {
    "Timestamp": 1597923590000,
    "CloudWatchMetrics": [
      {
        "Namespace": "MyApp/Metrics",
        "Dimensions": [["FunctionName"]],
        "Metrics": [
          {"Name": "ProcessingTime", "Unit": "Milliseconds"},
          {"Name": "ItemCount", "Unit": "Count"}
        ]
      }
    ]
  },
  "FunctionName": "MyLambdaFunction",
  "ProcessingTime": 123,
  "ItemCount": 27
}

This JSON structure is logged by the Lambda function, and CloudWatch extracts the metrics ProcessingTime and ItemCount under the MyApp/Metrics namespace. By using EMF, you can easily create and manage custom metrics without the need for additional API calls, simplifying the monitoring of application behaviors.
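
Inside a Python Lambda function, emitting this structure can be as simple as printing the JSON to standard output, since Lambda forwards stdout to CloudWatch Logs. A minimal sketch, where the event payload shape and metric values are placeholders:

import json
import time

def handler(event, context):
    start = time.time()
    items = event.get('items', [])   # placeholder payload shape

    # ... business logic would go here ...

    # Emit the same EMF envelope shown above; CloudWatch picks it up from the
    # function's logs and turns it into ProcessingTime and ItemCount metrics.
    print(json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "MyApp/Metrics",
                "Dimensions": [["FunctionName"]],
                "Metrics": [
                    {"Name": "ProcessingTime", "Unit": "Milliseconds"},
                    {"Name": "ItemCount", "Unit": "Count"}
                ]
            }]
        },
        "FunctionName": context.function_name,
        "ProcessingTime": (time.time() - start) * 1000,
        "ItemCount": len(items)
    }))
    return {"processed": len(items)}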

CloudWatch PutMetricData API

The CloudWatch PutMetricData API provides another method for defining custom metrics in AWS Lambda. This API allows you to send metric data directly to Amazon CloudWatch from your Lambda functions. This method offers greater flexibility and control over the metric data you send.

To use the PutMetricData API, you call the put_metric_data operation from the AWS SDK (boto3 in Python) within your Lambda code. This involves specifying the metric namespace, metric names, and the data points you wish to capture.

Here’s an example of how to send custom metrics using the PutMetricData API:

import boto3
import datetime

# CloudWatch client used to publish the custom metrics
cloudwatch = boto3.client('cloudwatch')

def put_custom_metrics():
    # Use a timezone-aware UTC timestamp so CloudWatch records the data points correctly
    now = datetime.datetime.now(datetime.timezone.utc)
    response = cloudwatch.put_metric_data(
        Namespace='MyApp/Metrics',
        MetricData=[
            {
                'MetricName': 'ProcessingTime',
                'Dimensions': [
                    {
                        'Name': 'FunctionName',
                        'Value': 'MyLambdaFunction'
                    },
                ],
                'Timestamp': now,
                'Value': 123.0,
                'Unit': 'Milliseconds'
            },
            {
                'MetricName': 'ItemCount',
                'Dimensions': [
                    {
                        'Name': 'FunctionName',
                        'Value': 'MyLambdaFunction'
                    },
                ],
                'Timestamp': now,
                'Value': 27.0,
                'Unit': 'Count'
            },
        ]
    )
    return response

if __name__ == "__main__":
    response = put_custom_metrics()
    print(response)

In this example, the put_metric_data call sends the ProcessingTime and ItemCount metrics to the MyApp/Metrics namespace. This data can then be visualized and analyzed in CloudWatch. Using the PutMetricData API allows for precise control over the metrics sent and is suitable for scenarios requiring more complex metric data handling.

AWS Lambda Observability, Debugging, and Performance Made Easy with Lumigo

Lumigo is a serverless monitoring platform that lets developers effortlessly find Lambda cold starts, understand their impact, and fix them.

Lumigo can help you:

  • Solve cold starts easily – obtain cold start-related metrics for your Lambda functions, including cold start %, average cold duration, and enabled provisioned concurrency. Generate real-time alerts on cold starts, so you’ll know instantly when a function is under-provisioned and can adjust provisioned concurrency.
  • Find and fix issues in seconds with visual debugging – Lumigo builds a virtual stack trace of all services participating in the transaction. Everything is displayed in a visual map that can be searched and filtered.
  • Automatic distributed tracing – with one click and no manual code changes, Lumigo visualizes your entire environment, including your Lambdas, other AWS services, and every API call and external SaaS service.
  • Identify and remove performance bottlenecks – see the end-to-end execution duration of each service, and which services run sequentially and in parallel. Lumigo automatically identifies your worst latency offenders, including AWS Lambda cold starts.
  • Serverless-specific smart alerts – using machine learning, Lumigo’s predictive analytics identifies and alerts on issues before they impact application performance or costs, including alerts about AWS Lambda cold starts.

Get a free account with Lumigo and resolve Lambda issues in seconds.