AWS Lambda Throttle: Detection, Prevention, and Management

What Is AWS Lambda Throttling? 

Throttling in AWS Lambda refers to the process of limiting the concurrent execution of Lambda functions to prevent overwhelming the service and maintain resource usage within account-level and regional quotas. AWS enforces these limits to ensure fair usage among users, optimize performance, and protect against system abuse. 

When the number of concurrent executions exceeds these limits, Lambda starts throttling new invocation requests. For synchronous invocations, the caller receives a 429 (Too Many Requests) error; for asynchronous event sources such as Amazon S3 or Amazon SNS, throttled events are retried automatically according to the applicable retry policy. Monitoring and managing concurrency levels can help you avoid throttling and keep your Lambda functions running smoothly.
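
For synchronous callers, throttling surfaces as an error you can catch in code. Below is a minimal Python (boto3) sketch that invokes a hypothetical function named my-function and detects the throttling error; the function name is an assumption, and in practice you would pair this with the backoff strategy discussed later in this article.

```python
import json

import boto3
from botocore.exceptions import ClientError

lambda_client = boto3.client("lambda")


def invoke_once(payload: dict):
    """Synchronously invoke a Lambda function and surface throttling explicitly."""
    try:
        response = lambda_client.invoke(
            FunctionName="my-function",        # hypothetical function name
            InvocationType="RequestResponse",  # synchronous invocation
            Payload=json.dumps(payload).encode("utf-8"),
        )
        return json.loads(response["Payload"].read())
    except ClientError as err:
        # A throttled synchronous invocation comes back as HTTP 429, which the
        # SDK exposes with the "TooManyRequestsException" error code.
        if err.response["Error"]["Code"] == "TooManyRequestsException":
            print("Invocation throttled (429): consider retrying with backoff")
        raise
```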

This is part of a series of articles about serverless monitoring

What Are the Reasons for Lambda Throttling? 

Lambda throttling can occur due to several reasons. Some of the primary reasons include:

  • Concurrent executions limit: Each AWS account has a default soft limit of 1000 concurrent executions per region for all Lambda functions combined. If the total number of concurrent executions across all functions in a region exceeds this limit, additional requests will be throttled. You can request a limit increase from AWS if you need more concurrency.
  • Custom concurrency limits: You can set custom concurrency limits for individual Lambda functions to allocate specific resources or control the rate at which they scale. If a function reaches its custom concurrency limit, additional requests to that function will be throttled.
  • Burst concurrency limit: Lambda can scale concurrency rapidly in response to a traffic spike, but only at a burst rate that AWS sets per region; it is not derived from the function’s allocated memory. If incoming traffic grows faster than Lambda can add execution environments, the excess requests are throttled even though the overall concurrency limit has not been reached.
  • Provisioned Concurrency: When using Provisioned Concurrency, you can set the number of pre-warmed instances for your Lambda function. If the number of requests exceeds the provisioned concurrency, additional requests will be throttled or served using the standard (non-pre-warmed) concurrency, depending on the function’s configuration.
  • Account-level service limits: AWS imposes additional account-level quotas, for example on how quickly concurrency can scale and on the rate of Lambda API requests. If you exceed these quotas, you may experience throttling. You can request an increase for adjustable quotas from AWS if needed; the sketch after this list shows how to inspect your current concurrency quotas.
  • Downstream service throttling: Your Lambda function may depend on other AWS services or external APIs with their own rate limits. If those services start throttling requests from your Lambda function, it could cause a backlog of requests in Lambda, eventually leading to Lambda throttling.
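
To see how close you are to these quotas, and to carve out concurrency for an individual function, you can query and configure them through the Lambda API. The Python (boto3) sketch below uses a hypothetical function name, and the 100-execution reservation is an arbitrary example value.

```python
import boto3

lambda_client = boto3.client("lambda")

# Account-level concurrency quota and unreserved pool for this region.
settings = lambda_client.get_account_settings()
limits = settings["AccountLimit"]
print("Concurrent executions quota:", limits["ConcurrentExecutions"])
print("Unreserved concurrency pool:", limits["UnreservedConcurrentExecutions"])

# Reserve concurrency for one function (hypothetical name). This guarantees the
# function up to 100 concurrent executions and also caps it at that number.
lambda_client.put_function_concurrency(
    FunctionName="my-function",
    ReservedConcurrentExecutions=100,
)

# Confirm the per-function setting.
print(lambda_client.get_function_concurrency(FunctionName="my-function"))
```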

How Can You Prevent Lambda Throttling?

To address AWS Lambda throttling and ensure the smooth functioning of your applications, you can employ the following strategies:

  • Request a concurrency limit increase: If you find that the default concurrency limit is insufficient for your application’s needs, you can request a limit increase from AWS Support.
  • Set custom concurrency limits: For individual Lambda functions with specific concurrency requirements, you can set custom concurrency limits. This allows you to allocate resources according to your application’s needs while preventing overloading other functions in your account.
  • Use provisioned concurrency: Provisioned concurrency allows you to pre-warm a certain number of instances of your Lambda function to reduce cold starts and ensure low-latency responses. By configuring the appropriate amount of provisioned concurrency, you can reduce the chances of Lambda throttling due to bursts in traffic.
  • Implement retries with backoff: When throttling occurs, design your applications to handle it by retrying with exponential backoff (ideally with jitter). This approach helps prevent overwhelming your function with retries while giving it time to scale up and accommodate additional requests; see the sketch after this list.
  • Use Amazon API Gateway: You can use Amazon API Gateway to control the rate at which requests are sent to your Lambda functions. API Gateway lets you set up throttling rules, including rate limits and burst limits, that can help prevent Lambda from being overloaded with requests; see the configuration sketch after this list.
  • Optimize function performance: Ensure that your Lambda functions are optimized for performance by reducing the execution time, minimizing package size, and allocating the appropriate amount of memory. Faster functions can handle more requests within the same concurrency limits.
  • Monitor your functions: Use AWS CloudWatch to monitor your Lambda functions’ performance, including metrics such as invocation count, duration, and throttled invocations. Monitoring these metrics can help you identify potential bottlenecks and adjust your concurrency settings accordingly.
  • Handle downstream service throttling: If your Lambda functions depend on other AWS services or external APIs with their own rate limits, ensure that you handle throttling from those services appropriately. This may include implementing retries with backoff, using circuit breakers, or implementing request queues to manage the flow of requests to downstream services.
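
As a concrete illustration of the retry-with-backoff strategy above, here is a minimal Python (boto3) sketch that retries throttled synchronous invocations with exponential backoff and full jitter. The function name and retry parameters are assumptions, and the AWS SDKs already apply some retry behavior of their own; this simply makes the policy explicit.

```python
import json
import random
import time

import boto3
from botocore.exceptions import ClientError

lambda_client = boto3.client("lambda")


def invoke_with_backoff(payload: dict, max_attempts: int = 5):
    """Retry throttled invocations with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            response = lambda_client.invoke(
                FunctionName="my-function",        # hypothetical function name
                InvocationType="RequestResponse",
                Payload=json.dumps(payload).encode("utf-8"),
            )
            return json.loads(response["Payload"].read())
        except ClientError as err:
            if err.response["Error"]["Code"] != "TooManyRequestsException":
                raise  # only retry throttling errors here
            # Full jitter: sleep a random duration up to an exponentially growing cap.
            time.sleep(random.uniform(0, min(30, 2 ** attempt)))
    raise RuntimeError(f"Still throttled after {max_attempts} attempts")
```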
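
If you front your functions with Amazon API Gateway, you can also throttle at the API layer before requests ever reach Lambda. The Python (boto3) sketch below sets stage-wide default rate and burst limits on a REST API; the API ID, stage name, and limit values are placeholders for illustration.

```python
import boto3

apigateway = boto3.client("apigateway")

# Apply stage-wide default throttling to a REST API stage. The "*/*" method
# setting key means "all resources and methods" on this stage.
apigateway.update_stage(
    restApiId="a1b2c3d4e5",   # hypothetical REST API ID
    stageName="prod",         # hypothetical stage name
    patchOperations=[
        {"op": "replace", "path": "/*/*/throttling/rateLimit", "value": "100"},
        {"op": "replace", "path": "/*/*/throttling/burstLimit", "value": "200"},
    ],
)
```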

By employing these strategies, you can effectively manage Lambda throttling and ensure the reliable, performant operation of your serverless applications.

Identifying and Managing Throttling with CloudWatch

AWS CloudWatch can assist in both identifying and managing Lambda function throttling. Here’s how you can accomplish this:

Identifying AWS Lambda throttling with CloudWatch:

  • Understand AWS Lambda metrics: AWS Lambda automatically sends several metrics to CloudWatch, including “Invocations”, “Errors”, “Duration”, and “Throttles”. The “Throttles” metric represents the number of Lambda function execution attempts that were throttled due to invocation rates exceeding the current limits.
  • Analyze metrics in CloudWatch: Use the CloudWatch console to visualize these metrics. Monitor the “Throttles” metric for spikes or consistent high levels, which could indicate throttling.
  • Set CloudWatch alarms: You can create alarms in CloudWatch that trigger when the “Throttles” metric exceeds a certain threshold. When the alarm state changes, CloudWatch can send notifications through Amazon SNS, which you can receive via email, SMS, or other channels; the sketch after this list shows one way to create such an alarm.
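
For example, the following Python (boto3) sketch creates an alarm that fires whenever a function records any throttles in a five-minute window. The function name, alarm name, and SNS topic ARN are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="my-function-throttles",   # hypothetical alarm name
    Namespace="AWS/Lambda",
    MetricName="Throttles",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],  # hypothetical function
    Statistic="Sum",
    Period=300,                          # evaluate in 5-minute windows
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",     # no data means no throttles
    AlarmActions=[
        "arn:aws:sns:us-east-1:123456789012:lambda-alerts",  # hypothetical SNS topic
    ],
)
```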

Managing AWS Lambda throttling with CloudWatch:

  • Understanding the cause: If you’ve confirmed that throttling is happening, it’s crucial to understand why. It could be due to your function reaching its concurrency limit, or because of an issue with downstream resources. Analyze CloudWatch logs and X-Ray traces (if enabled) to diagnose the issue.
  • Request a concurrency limit increase: If you consistently hit the concurrency limit, and it’s not due to a temporary spike in invocations, consider requesting a limit increase from AWS.
  • Use reserved concurrency: By setting a reserved concurrency limit for critical functions, you can ensure that they always have some capacity to run, and that they don’t consume all available concurrency.
  • Implement retry and backoff logic: If your Lambda function is throttled due to downstream resource limitations, consider implementing retry logic with exponential backoff and jitter. This strategy gradually increases the wait time between retries, reducing the likelihood of further throttling.
  • Use provisioned concurrency for predictable workloads: If you can predict the load on your Lambda function, consider using provisioned concurrency. This feature keeps a specified number of execution environments initialized and ready to respond immediately, preventing cold starts and reducing the risk of throttling due to concurrency limits; see the sketch after this list.
  • Monitoring and adjusting: Continuously monitor CloudWatch metrics, alarms, and logs to ensure that your changes are effective and adjust your strategies as needed.
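
As an illustration of the provisioned concurrency point above, the Python (boto3) sketch below keeps 20 execution environments warm for an alias of a hypothetical function. Provisioned concurrency applies to a published version or alias rather than $LATEST; the names and numbers here are assumptions.

```python
import boto3

lambda_client = boto3.client("lambda")

FUNCTION_NAME = "my-function"   # hypothetical function name
ALIAS = "live"                  # hypothetical alias pointing at a published version

# Keep 20 execution environments initialized and ready for this alias.
lambda_client.put_provisioned_concurrency_config(
    FunctionName=FUNCTION_NAME,
    Qualifier=ALIAS,
    ProvisionedConcurrentExecutions=20,
)

# Check the allocation status (IN_PROGRESS until the environments are READY).
status = lambda_client.get_provisioned_concurrency_config(
    FunctionName=FUNCTION_NAME,
    Qualifier=ALIAS,
)
print(status["Status"], status.get("AllocatedProvisionedConcurrentExecutions"))
```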

Remember, effective management of Lambda throttling requires understanding your workload patterns, making data-driven decisions, and monitoring the impact of changes you make.

Identify and Resolve Lambda Throttling with Lumigo

Lumigo is a monitoring and troubleshooting platform for microservices. It can help you identify performance issues with Lambda functions, as well as issues in services across your serverless environment. Using the information surfaced in Lumigo, you can address throttled functions, fine-tune provisioned concurrency, minimize performance issues, and reduce cost.

  • Set up alerts to track functions that are either over-provisioned or under-provisioned.
  • Customize alerts to help you right-size provisioned concurrency and maintain a balance between performance and cost.
  • Gain visibility into functions with cold starts and identify functions with timeouts.
  • Drill into function details and see recent invocations, check metrics, and search logs.
