When we invoke a Lambda function, the AWS Lambda service creates an instance of the function inside an execution environment. It runs the handler method implemented by the function to process the event that triggered it. Once the process is complete, the execution environment does not shut down immediately. It stays warm, waiting for further events; if no activity occurs for a while (on the order of minutes, reportedly up to around 30, though AWS does not guarantee an exact duration), Lambda shuts the idle instance down.
However, real-world applications receive many requests within milliseconds of each other, so there will be situations where one instance is serving an event while other invocations arrive in parallel. In that case, Lambda initializes another instance to handle the additional requests. As more events come in, Lambda initializes more instances and routes requests based on instance availability. When request volume decreases, Lambda scales down by stopping unused instances.
In Lambda, the number of instances serving requests at a given time is known as concurrency. However, the bursting of these instances cannot be infinite. It starts with an initial burst of between 500 and 3,000 instances, depending on the Region where the Lambda function runs.
Burst concurrency limits
After the initial burst, Lambda can scale further by 500 instances per minute until there are enough instances to serve all requests or the maximum concurrency limit is reached. When incoming throughput exceeds what the available instances can absorb, requests start failing with a throttling error (HTTP 429).
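The scaling behavior described above can be sketched as a small model. This is illustrative arithmetic, not an AWS API; the function names are made up, and the initial burst (500–3,000) and 500-per-minute ramp are the published defaults.

```python
# Illustrative model (not an AWS API) of Lambda's burst scaling:
# an initial burst (500-3000 depending on Region), then +500 instances
# per minute until demand is met or the account limit is reached.

def available_concurrency(minutes_elapsed, demand, initial_burst=3000,
                          ramp_per_minute=500, account_limit=10000):
    """Instances Lambda can have running after `minutes_elapsed` minutes."""
    scaled = initial_burst + ramp_per_minute * minutes_elapsed
    return min(scaled, demand, account_limit)

def throttled_requests(minutes_elapsed, demand, **kwargs):
    """Concurrent requests that would fail with HTTP 429 at that moment."""
    return max(0, demand - available_concurrency(minutes_elapsed, demand, **kwargs))
```

For example, 4,500 concurrent requests arriving in a Region with a 3,000-instance burst limit would see 1,500 throttled at first, and none after the ramp catches up a few minutes later.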
Concurrency limits are defined at two levels:
Account – Defaults to 1,000 concurrent executions per account per Region. It can be raised by requesting a limit increase from AWS.
Function – Configured per function (reserved concurrency). If not defined, the function draws from the account-level unreserved pool. You can reserve up to 900 units of concurrency across your functions, because the remaining 100 are always kept unreserved for functions that don't define their own limit. It is recommended to set a function-level limit so that one function's unreasonable scaling doesn't starve the other functions in the same account.
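The account-level arithmetic behind these two limits can be sketched as follows. This is an illustrative check, not an AWS API; the function names and reservation amounts are hypothetical.

```python
# Illustrative model (not an AWS API) of the account concurrency pool:
# with the default limit of 1000, at most 900 can be reserved, because
# Lambda always keeps 100 unreserved for functions with no reserved
# concurrency setting of their own.

ACCOUNT_LIMIT = 1000
MIN_UNRESERVED = 100

def unreserved_pool(reservations):
    """Concurrency left for functions without a reserved limit."""
    return ACCOUNT_LIMIT - sum(reservations.values())

def can_reserve(reservations, function_name, amount):
    """A new reservation is allowed only if >= 100 stays unreserved."""
    proposed = dict(reservations, **{function_name: amount})
    return unreserved_pool(proposed) >= MIN_UNRESERVED

reservations = {"checkout": 400, "payments": 300}
```

With 700 already reserved, a further 200 can still be reserved (bringing the total to the 900 maximum), but a further 300 cannot.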
Lambda concurrency is handled automatically by AWS, so developers don't have to manage it themselves. However, we need to understand how it works so that we can configure it properly and avoid the problems it can introduce.
Let’s first talk about what problems it creates, namely cold starts:
When we invoke a Lambda function for the first time, the service downloads the code package from S3 along with its dependencies, creates a container, and starts the runtime before it executes the handler. This whole duration (excluding the execution of the handler itself) is known as the cold start time.
Now, consider a scenario where we receive a high volume of requests in the morning: Lambda keeps spinning up new instances to serve them all. Each new instance takes time to initialize (from well under a second to several seconds, and historically far longer for VPC-attached functions), which adds latency to those requests' responses. Requests may even time out, since API Gateway has a 29-second maximum integration timeout. A sudden spike of requests in a short period can therefore significantly hurt the user experience.
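The timeout interaction is simple arithmetic, but worth making explicit. A minimal sketch, assuming a request served by a cold instance pays the full initialization time on top of the handler's execution time:

```python
# Illustrative arithmetic: a request served by a cold instance pays
# initialization time on top of handler time. API Gateway enforces a
# 29-second maximum integration timeout, so a long cold start can cause
# a gateway timeout even when the handler itself is fast.

API_GATEWAY_TIMEOUT_S = 29.0

def times_out(cold_start_s, handler_s):
    """True if the end-to-end latency would breach API Gateway's limit."""
    return cold_start_s + handler_s > API_GATEWAY_TIMEOUT_S
```

A 2-second cold start plus a 1-second handler is fine; a 28-second cold start (the kind seen with old-style VPC ENI creation) followed by even a fast handler is not.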
Adding to the above scenario, suppose the Lambda function is configured to connect to VPC resources. Before November 2019, its scaling depended on the number of ENIs available in the VPC, because every execution environment needed an ENI created on the fly to bridge your own VPC and the Lambda service's VPC.
Also, creating the ENI while the invocation was happening added to the cold start latency and hurt the overall response time of a request. So, invoking Lambda in a VPC used to be a very costly affair and was not recommended unless absolutely required.
However, AWS has since introduced several solutions to reduce cold starts and latency: provisioned concurrency, improved VPC network performance for Lambda functions, and Application Auto Scaling for Lambda functions.
As discussed in this article, there will be scenarios where the number of requests exceeds what a single instance of a Lambda function can handle, requiring Lambda to spin up additional instances. However, spinning up additional instances at request time adds cold start latency.
Using provisioned concurrency, we can define how many instances should be kept initialized and warm ahead of time, ensuring there are configured instances ready to serve requests. Provisioned concurrency is applied to an alias or a published version of the function.
Once all the provisioned instances are fully in use and more requests still need to be served, Lambda spins up new on-demand instances, subject to the reserved concurrency defined at the function level and the account-level limit.
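The spillover behavior can be sketched as a small dispatch model. This is illustrative, not an AWS API; the default of 5 provisioned instances and a 100-unit function limit are example numbers.

```python
# Illustrative model (not an AWS API): concurrent requests are served
# first by provisioned (pre-warmed) instances; overflow spills over to
# on-demand instances, which pay a cold start, up to the function's
# concurrency limit; anything beyond that is throttled (HTTP 429).

def dispatch(concurrent_requests, provisioned=5, function_limit=100):
    warm = min(concurrent_requests, provisioned)
    cold = min(max(0, concurrent_requests - provisioned),
               function_limit - provisioned)
    throttled = max(0, concurrent_requests - function_limit)
    return {"warm": warm, "cold_start": cold, "throttled": throttled}
```

With 5 provisioned instances and a limit of 100: 3 concurrent requests are all served warm, 40 requests mean 35 cold starts, and 120 requests mean 20 throttled.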
By default, the Lambda service is not connected to your own VPC's resources. It runs in a separate VPC managed by AWS, from which it can access any resource available over the internet. However, it cannot reach the private resources of your VPC, such as an RDS database or EC2 instances in a private subnet.
To let a function reach these private VPC resources, Lambda creates an ENI (Elastic Network Interface) and performs a cross-account attachment, giving Lambda network access to them. Under the original design, an ENI was required per concurrent execution environment, so the total number of ENIs needed depended on the function's configuration and its concurrency, and every VPC has a limit on how many ENIs can be created.
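AWS's pre-2019 VPC guidance gave a rough formula for how many ENIs a function would need: projected peak concurrency multiplied by the function's memory in GB divided by 3 GB. A minimal sketch of that estimate (hedged: the exact published formula may have varied slightly):

```python
import math

# Rough ENI capacity estimate from AWS's pre-2019 VPC guidance:
# required ENIs ~= peak concurrency * (memory in GB / 3 GB).
# Treat this as an approximation of the old sizing rule, not a
# current AWS API or formula.

def projected_enis(peak_concurrency, memory_gb):
    """Approximate ENIs a VPC-attached function needed pre-Hyperplane."""
    return math.ceil(peak_concurrency * memory_gb / 3.0)
```

For example, 1,000 concurrent executions of a 1.5 GB function would have needed roughly 500 ENIs, enough to exhaust a typical VPC's ENI quota on its own.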
This design created two major issues: creating ENIs at invocation time added significant cold start latency, and the VPC's ENI limit put a hard ceiling on how far the function could scale.
To address these issues, AWS launched Hyperplane, which provides NAT capabilities from the Lambda VPC to customer VPCs. Instead of mapping network interfaces in your VPC directly to individual Lambda execution environments, they are mapped to a shared Hyperplane ENI, and the functions connect through it.
This solution brings a few benefits: the ENI is shared across execution environments, so far fewer ENIs are needed, and it is created when the function is created (or its VPC configuration is updated) rather than at invocation time.
Remember that because Hyperplane creates the network interface when the Lambda function is created, function creation may take up to 90 seconds, and invocations that arrive while the interface is still being created may see delayed responses.
AWS Lambda also integrates with Application Auto Scaling, a web service that enables automatic scaling of AWS resources. You can configure Application Auto Scaling to manage the provisioned concurrency of a Lambda function.
There are two methods of scaling: schedule-based and utilization-based. If you have a use case in which you can anticipate the peak traffic, use schedule-based scaling. Otherwise, use utilization-based scaling. To increase provisioned concurrency based on the need at runtime, you can use the Application Auto Scaling API to register a target and create a scaling policy.
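To make the register-target-and-policy step concrete, here is a sketch of the request shapes for boto3's `application-autoscaling` client (`register_scalable_target` and `put_scaling_policy`). The function name and alias are hypothetical; the parameters are built as plain dicts so the shape is visible without AWS access, and the target-tracking policy uses the predefined `LambdaProvisionedConcurrencyUtilization` metric.

```python
# Builds request parameters for the Application Auto Scaling API, in the
# shape accepted by boto3's "application-autoscaling" client. The
# function name and alias below are illustrative placeholders.

def scaling_requests(function_name, alias, min_cap, max_cap, target=0.7):
    # Lambda scalable targets are addressed as "function:<name>:<alias>".
    resource_id = f"function:{function_name}:{alias}"
    register = {
        "ServiceNamespace": "lambda",
        "ResourceId": resource_id,
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "MinCapacity": min_cap,
        "MaxCapacity": max_cap,
    }
    policy = {
        "PolicyName": f"{function_name}-pc-utilization",
        "ServiceNamespace": "lambda",
        "ResourceId": resource_id,
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    }
    return register, policy
```

In practice you would pass these to `client.register_scalable_target(**register)` and `client.put_scaling_policy(**policy)` on a boto3 `application-autoscaling` client, after which provisioned concurrency scales between the min and max to hold utilization near the target.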
With a recent upgrade, AWS Lambda now supports measuring concurrent executions per function; earlier, this was only available in aggregate across all functions in an account. It also supports collecting metrics for all versions and aliases of a function. Some of the main CloudWatch metrics related to concurrency are ConcurrentExecutions, UnreservedConcurrentExecutions, ProvisionedConcurrentExecutions, ProvisionedConcurrencyUtilization, and ProvisionedConcurrencySpilloverInvocations.
Application scaling is an important feature for any cloud application, and for serverless it is a must. AWS Lambda has evolved over time, adding many features to support scaling and concurrency and to reduce cold start time, and it has greatly improved the networking between the Lambda VPC and customer VPCs.