Lambda concurrency is the number of requests that a function is serving at any given time. When a function is invoked, Lambda allocates an instance of the function to process the event. Once the function code finishes running, that instance can handle another request.
However, if the function is invoked while the request is still being processed, Lambda allocates another instance – and this increases the concurrency of the function. The total concurrency of all functions in an AWS account is subject to a per-region quota.
Lambda offers two types of concurrency controls: reserved concurrency and provisioned concurrency.
Let’s start by defining concurrency: concurrency is the number of requests that your Lambda functions are processing at the same time. A request is an event that triggers an invocation of a Lambda function.
By default, AWS Lambda gives you a pool of 1000 concurrent executions per AWS account. All Lambda functions in this account share this pool.
If your functions receive up to 1000 simultaneous requests, AWS executes those requests from this shared (public) pool. If, for example, one Lambda function is already handling 1000 concurrent requests and a second Lambda function then receives requests, the second function’s requests will be rejected.
This is called throttling – the second function is blocked from receiving additional requests because there is no available concurrency.
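A rough way to judge whether a workload risks throttling is to estimate its concurrency as request rate multiplied by average duration. The sketch below assumes a steady request rate. The function name and the sample numbers are illustrative, not taken from any AWS API.

```python
import math

ACCOUNT_POOL = 1000  # default shared pool per account, as described above


def estimate_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Approximate how many invocations are in flight at the same time."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def fits_in_pool(requests_per_second: float, avg_duration_seconds: float,
                 pool: int = ACCOUNT_POOL) -> bool:
    """True if the estimated concurrency stays within the given pool."""
    return estimate_concurrency(requests_per_second, avg_duration_seconds) <= pool
```

For example, a function that receives 200 requests per second and takes half a second per request needs roughly 100 concurrent instances, well within the default pool. At 5000 requests per second with the same duration, the estimate exceeds the pool and some requests would be throttled.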
How can you avoid throttling?
Some applications do not tolerate this risk of throttling. In this case, you can specify a certain number of executions for every Lambda – this is called reserved concurrency. Each Lambda function has a reserved concurrency parameter, which lets you save a specific amount of concurrency for the use of this function.
Keep in mind that AWS Lambda always keeps a minimum of 100 concurrent executions in the public pool. With the default account limit of 1000, you can therefore reserve concurrency for specific Lambda functions up to a total of 900 concurrent executions.
If you select the option Use unreserved account concurrency for a Lambda function, that function will not have any reserved concurrency. It will run out of the public pool, and if no concurrency is available there, it will be blocked. This setting means that a function will be limited to concurrency in the public pool, and will not be able to use the reserved concurrency of other functions – even if they are not currently using it.
The same is true in the other direction. If you specify reserved concurrency for a Lambda function, the function’s total number of concurrent executions cannot exceed this number. Reserved concurrency is also the concurrency limit for the function – any requests beyond this limit are throttled.
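The rules above can be sketched as a small in-memory model. This is not how Lambda is implemented. It is a simplified illustration that assumes the default account limit of 1000, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

ACCOUNT_LIMIT = 1000   # default account pool (assumed; it can be raised by AWS)
MIN_UNRESERVED = 100   # concurrency that always stays in the public pool


@dataclass
class AccountPool:
    reserved: dict = field(default_factory=dict)   # function name -> reserved concurrency
    in_flight: dict = field(default_factory=dict)  # function name -> running invocations

    def unreserved_capacity(self) -> int:
        """Concurrency left in the public pool after all reservations."""
        return ACCOUNT_LIMIT - sum(self.reserved.values())

    def reserve(self, name: str, amount: int) -> None:
        """Reserve concurrency, keeping at least MIN_UNRESERVED in the public pool."""
        others = sum(v for k, v in self.reserved.items() if k != name)
        if ACCOUNT_LIMIT - others - amount < MIN_UNRESERVED:
            raise ValueError("reservation would leave fewer than 100 unreserved")
        self.reserved[name] = amount

    def try_invoke(self, name: str) -> bool:
        """Return True if the invocation is admitted, False if it is throttled."""
        running = self.in_flight.get(name, 0)
        if name in self.reserved:
            # Reserved concurrency is both a guarantee and a cap for this function.
            admitted = running < self.reserved[name]
        else:
            # Functions without a reservation share only the public pool.
            shared_running = sum(
                n for f, n in self.in_flight.items() if f not in self.reserved
            )
            admitted = shared_running < self.unreserved_capacity()
        if admitted:
            self.in_flight[name] = running + 1
        return admitted
```

In this model, a function with 900 reserved executions is throttled on its 901st concurrent request, while a function without a reservation is throttled once the remaining 100 executions in the public pool are in use, even if the reserved capacity sits idle.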
There are several reasons to suppress or limit the concurrency of a Lambda function:
Use the AWS Lambda console to manage reserved concurrency settings for your functions.
To define reserved concurrency:
Keep in mind that the amount you can reserve is bounded by the total concurrency available for your AWS account (which is 1000 by default, but can be increased), minus the 100 concurrent executions that Lambda always keeps unreserved. If you have already reserved concurrency for other functions, you will have less concurrency available.
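Reserved concurrency can also be set programmatically rather than through the console. The following is a minimal sketch using the boto3 Lambda client's `put_function_concurrency` and `delete_function_concurrency` calls. The helper names and the function name `my-function` are hypothetical, and the client is passed in so the helpers can be exercised without AWS credentials.

```python
def set_reserved_concurrency(lambda_client, function_name: str, amount: int) -> int:
    """Reserve `amount` concurrent executions for one function.

    `lambda_client` is expected to be a boto3 Lambda client,
    for example boto3.client("lambda").
    """
    response = lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=amount,
    )
    return response["ReservedConcurrentExecutions"]


def clear_reserved_concurrency(lambda_client, function_name: str) -> None:
    """Remove the reservation so the function uses unreserved account concurrency."""
    lambda_client.delete_function_concurrency(FunctionName=function_name)


# Example usage (requires boto3 and AWS credentials; "my-function" is a placeholder):
#   import boto3
#   client = boto3.client("lambda")
#   set_reserved_concurrency(client, "my-function", 50)
```

Setting the value to 0 throttles all invocations of the function, which can be useful as a temporary off switch.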
A major problem with Lambda concurrency is the possibility of Lambda cold starts. A cold start is the first request handled by new Lambda workers and may take up to five seconds to execute. This can significantly impact both latency and user experience (read more in our guide to Lambda performance).
A cold start occurs because Lambda must initialize the worker (a container that runs your functions on the server) and the function module before passing the request to the handler function. When invoking a function, Lambda needs a container to be ready. After invoking a function, a container stays warm for a limited amount of time (usually 30-45 minutes) before it is shut down.
Provisioned concurrency lets you define a required amount of concurrency and pre-prepare the necessary containers, enabling your Lambda function to immediately run and respond to requests – with no cold starts.
Provisioned concurrency lets AWS Lambda prepare containers for functions in advance. This means you can invoke functions with double-digit millisecond latency, even if the function was not called previously.
Important note: While reserved concurrency is offered at no additional cost (included in the cost of regular Lambda invocations), provisioned concurrency incurs extra costs. See provisioned concurrency costs below.
To configure provisioned concurrency for your Lambda functions:
After a few minutes, the initialization process is complete. You can now use the published version or alias of the function with provisioned concurrency.
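The same configuration can be sketched with the boto3 Lambda client. Provisioned concurrency is set on a published version or alias, which the API calls the `Qualifier`. The helper names, the polling interval and the alias name `live` are assumptions made for illustration.

```python
import time


def enable_provisioned_concurrency(lambda_client, function_name: str,
                                   qualifier: str, amount: int) -> None:
    """Request `amount` pre-initialized instances for a published version or alias."""
    lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
        ProvisionedConcurrentExecutions=amount,
    )


def wait_until_ready(lambda_client, function_name: str, qualifier: str,
                     poll_seconds: float = 15, max_polls: int = 40) -> bool:
    """Poll until the configuration is READY; return False if polling runs out."""
    for _ in range(max_polls):
        config = lambda_client.get_provisioned_concurrency_config(
            FunctionName=function_name,
            Qualifier=qualifier,
        )
        status = config["Status"]
        if status == "READY":
            return True
        if status == "FAILED":
            raise RuntimeError(config.get("StatusReason", "provisioning failed"))
        time.sleep(poll_seconds)  # still IN_PROGRESS, try again later
    return False


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   client = boto3.client("lambda")
#   enable_provisioned_concurrency(client, "my-function", "live", 10)
#   wait_until_ready(client, "my-function", "live")
```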
Provisioned concurrency is different from reserved concurrency – while reserved concurrency is included in the regular cost of AWS Lambda, provisioned concurrency costs extra.
After activating provisioned concurrency, you pay a set fee for the Lambda containers that are active and awaiting requests (even if you do not actually use the containers), and a slightly lower fee for actual function invocation time. The cost per request remains the same.
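The pricing structure described above can be expressed as a small calculator. This is a simplified sketch: it assumes every invocation runs on a provisioned instance when provisioned concurrency is enabled, and it takes all rates as inputs rather than hard-coding AWS prices. The class and parameter names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    per_request: float                      # fee per invocation
    duration_per_gb_second: float           # fee for time spent running code
    provisioned_per_gb_second: float = 0.0  # standby fee for provisioned instances


def monthly_cost(rates: Rates, requests: int, avg_duration_s: float,
                 memory_gb: float, provisioned_instances: int = 0,
                 hours: float = 730) -> float:
    """Estimate one month of cost from request, duration and standby fees."""
    request_cost = requests * rates.per_request
    duration_cost = requests * avg_duration_s * memory_gb * rates.duration_per_gb_second
    # The standby fee is charged for the whole period, whether or not requests arrive.
    standby_cost = (provisioned_instances * memory_gb * hours * 3600
                    * rates.provisioned_per_gb_second)
    return request_cost + duration_cost + standby_cost
```

Comparing the result for an on-demand `Rates` object with one that has a lower duration rate and a non-zero standby rate shows the trade-off: provisioned concurrency pays off only when the provisioned instances are busy for a large share of the time.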
Regular cost of AWS Lambda without provisioned concurrency (for x86 hardware architecture, in the US East region):
Cost of AWS Lambda with provisioned concurrency enabled (for x86 hardware architecture, in the US East region):
Real-world applications often receive many requests within a span of milliseconds, so an instance will frequently be busy serving one event while other invocations arrive in parallel. In that case, Lambda initializes another instance to handle the additional requests. As more events come in, Lambda initializes more instances and routes requests to them based on availability. When the number of requests decreases, Lambda scales down by stopping unused instances.
In Lambda, the number of instances serving requests at a given time is known as concurrency. Lambda cannot add instances without limit, however. Scaling starts with an initial burst of 500 to 3000 instances, depending on the Region where the Lambda function runs.
Burst concurrency limits
After the initial burst, a function can scale further by 500 instances per minute until there are enough instances to serve all requests or the maximum concurrency limit is reached. If requests arrive faster than instances can scale, the excess requests fail with a throttling error (status code 429).
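The scaling behavior described above (an initial burst, then 500 more instances per minute up to the concurrency limit) can be sketched as simple arithmetic. The function names are illustrative, and the burst size and limit are inputs because they depend on the Region and the account.

```python
from typing import Optional


def available_instances(minutes_since_burst: int, initial_burst: int,
                        concurrency_limit: int, per_minute: int = 500) -> int:
    """Instances Lambda can have running after the given number of minutes."""
    return min(concurrency_limit, initial_burst + per_minute * minutes_since_burst)


def minutes_to_reach(target: int, initial_burst: int, concurrency_limit: int,
                     per_minute: int = 500) -> Optional[int]:
    """Minutes until `target` instances can run, or None if the limit prevents it."""
    if target > concurrency_limit:
        return None  # requests beyond the limit are throttled, however long you wait
    if target <= initial_burst:
        return 0
    # Integer ceiling division: each minute adds `per_minute` instances.
    return -(-(target - initial_burst) // per_minute)
```

With an initial burst of 500 and a limit of 5000, a sudden need for 2000 concurrent instances takes about three minutes to satisfy. Requests that arrive before enough instances exist are throttled.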
Amazon defines a limit on the number of concurrent requests that can be processed by Lambda users. Concurrency limits are defined at two levels:
Account – The default is 1000 concurrent executions per account, per Region. It can be raised by requesting an increase from AWS.
Function – This limit is configured for each function individually. If it is not defined, the function uses the account-level concurrency limit. With the default account limit, you can assign up to 900 in total across function-level limits, because the remaining 100 are kept for functions that do not define a concurrency limit. It is recommended to define a function-level limit so that one function’s excessive scaling does not impact the other functions in the same account.
If you see that your functions need more than the standard limit, you can request a quota increase by submitting a support ticket to AWS.
AWS Lambda also integrates with Application Auto Scaling, a web service that enables automatic scaling of AWS resources. You can configure Application Auto Scaling to manage the provisioned concurrency of a Lambda function.
There are two methods of scaling: schedule-based and utilization-based. If you have a use case in which you can anticipate the peak traffic, use schedule-based scaling. Otherwise, use utilization-based scaling. To increase provisioned concurrency based on the need at runtime, you can use the Application Auto Scaling API to register a target and create a scaling policy.
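Registering a target and creating a utilization-based policy can be sketched with the boto3 `application-autoscaling` client. The request shapes follow the `register_scalable_target` and `put_scaling_policy` calls. The helper names, the policy name, the 0.7 target utilization and the function and alias names are assumptions chosen for illustration.

```python
DIMENSION = "lambda:function:ProvisionedConcurrency"


def scalable_target_request(function_name: str, alias: str,
                            min_capacity: int, max_capacity: int) -> dict:
    """Arguments for register_scalable_target on a function alias."""
    return {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": DIMENSION,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def utilization_policy_request(function_name: str, alias: str,
                               target_utilization: float = 0.7) -> dict:
    """Arguments for put_scaling_policy using target tracking on utilization."""
    return {
        "PolicyName": f"{function_name}-{alias}-pc-utilization",  # hypothetical name
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization",
            },
        },
    }


def apply_utilization_scaling(autoscaling_client, function_name: str, alias: str,
                              min_capacity: int, max_capacity: int) -> None:
    """Register the alias as a scalable target, then attach the policy."""
    autoscaling_client.register_scalable_target(
        **scalable_target_request(function_name, alias, min_capacity, max_capacity)
    )
    autoscaling_client.put_scaling_policy(
        **utilization_policy_request(function_name, alias)
    )


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   client = boto3.client("application-autoscaling")
#   apply_utilization_scaling(client, "my-function", "live", 1, 50)
```

For schedule-based scaling, the same registered target would be paired with a scheduled action instead of a target tracking policy.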
AWS Lambda lets you measure the sum of all concurrent executions for a Lambda function. It also supports collecting metrics for all versions and aliases of a function. Below are some of the main metrics related to concurrency:
Lumigo is a serverless monitoring solution. As an AWS Advanced Technology Partner and an AWS launch partner, Lumigo supports Lambda and provisioned concurrency. You can use the Lumigo platform to gain visibility into all cold starts in serverless applications. These insights can help you fine-tune provisioned concurrency, minimize performance issues, and reduce costs.
Here is how to use Lumigo to fine-tune provisioned concurrency: