Dec 03 2019
AWS Lambda is a serverless compute service that offers provisioned concurrency, a feature that gives you more control over the performance of your serverless applications. Provisioned concurrency can help you avoid cold starts and latency issues in serverless functions.
Provisioned concurrency enables serverless functions to adapt to sudden bursts of traffic and significant scaling events. You can use provisioned concurrency to build scalable serverless applications with predictable latency. The feature lets you set a desired concurrency for all aliases and versions of each function.
AWS Lambda uses provisioned concurrency to prepare containers for your functions ahead of time. This helps ensure that when the containers are called, they can be invoked with double-digit millisecond latency.
Cold starts can negatively impact Lambda performance. A cold start occurs on the first request handled by a new Lambda worker and may take up to five seconds to execute, which can significantly hurt both latency and user experience.
Cold starts occur because Lambda needs to initialize a new worker, as well as your function module, before it can pass the request to your handler function. To keep a ready supply of workers without holding on to them indefinitely, Lambda keeps workers warm for only a limited amount of time after executing, typically 30-45 minutes, before spinning them down. Any request that arrives when no warm worker is available pays the cold start penalty.
Once enabled, Provisioned Concurrency will keep your desired number of concurrent executions initialized and ready to respond to requests. This means an end to cold starts!
Consider the example of a food delivery service. The service might experience cold starts during delivery peaks at lunch and dinner time. To solve the problem, they could increase Provisioned Concurrency just before the lunchtime and dinnertime spikes, so when the users flood in, there won't be any cold starts.
Since these spikes happen at predictable times, you can use AWS Auto Scaling to adjust Provisioned Concurrency on a schedule (learn more about this below).
Provisioned concurrency works seamlessly with the existing on-demand scaling behavior. When more requests come in than the provisioned capacity can handle, Lambda simply spills over to on-demand scaling. These spillover invocations can still experience cold starts, but they will be few and far between, provided you have configured enough provisioned concurrency.
Here is a quick rundown of how you can configure provisioned concurrency for your Lambda functions from the AWS Management Console:
1. Open your function in the Lambda console and publish a version or create an alias (provisioned concurrency cannot be applied to the unpublished $LATEST version).
2. In the function's concurrency settings, add a provisioned concurrency configuration.
3. Select the alias or version, enter the desired amount of provisioned concurrency, and save.
After several minutes, the initialization process completes. You can then use the published alias of the function with the Provisioned Concurrency feature.
To modify these settings, you can also use the AWS CLI, AWS SDK, and AWS CloudFormation.
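For example, here is a minimal sketch using the AWS CLI (the yc-test function and canary alias match the examples later in this post; the concurrency value of 10 is arbitrary):

aws --region sa-east-1 lambda put-provisioned-concurrency-config --function-name yc-test --qualifier canary --provisioned-concurrent-executions 10

You can then poll the status of the configuration, which reports IN_PROGRESS while the workers are being initialized and READY once they are available:

aws --region sa-east-1 lambda get-provisioned-concurrency-config --function-name yc-test --qualifier canary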
The basic way to monitor provisioned concurrency is to look at the following CloudWatch metrics:
- ProvisionedConcurrentExecutions: the number of concurrent executions currently running on provisioned concurrency
- ProvisionedConcurrencyInvocations: the number of invocations served by provisioned concurrency
- ProvisionedConcurrencySpilloverInvocations: the number of invocations that spilled over to on-demand concurrency
- ProvisionedConcurrencyUtilization: the fraction of the allocated provisioned concurrency that is in use
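As a quick sketch, here is one way to pull the utilization metric for the canary alias with the AWS CLI (the timestamps are placeholders; the Resource dimension takes the function:alias pair):

aws --region sa-east-1 cloudwatch get-metric-statistics --namespace AWS/Lambda --metric-name ProvisionedConcurrencyUtilization --dimensions Name=FunctionName,Value=yc-test Name=Resource,Value=yc-test:canary --start-time 2019-12-03T10:00:00 --end-time 2019-12-03T11:00:00 --period 60 --statistics Maximum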
For more advanced monitoring, and to receive proactive alerts about cold starts or insufficient provisioned concurrency, jump to provisioned concurrency with Lumigo below.
Be aware that Provisioned Concurrency comes out of your regional concurrency limit. You can configure Provisioned Concurrency on multiple aliases and/or versions of a function, all of which count towards the regional concurrency limit.
But what if the function in question also has a reserved concurrency configured? In that case, the total Provisioned Concurrency across all its versions cannot exceed its reserved concurrency:
sum(Provisioned Concurrency of all versions) <= reserved concurrency
The reverse is also true: for a function with existing Provisioned Concurrency, you need to choose a reserved concurrency value equal to or greater than the sum of its Provisioned Concurrency.
Provisioned Concurrency also has a slightly different pricing model.
On-demand concurrency charges you based on:
- the number of requests ($0.20 per 1M requests)
- the invocation duration ($0.0000166667 per GB-second, which works out to $0.06 per GB-hour; prices here are for us-east-1 at the time of writing)
Provisioned Concurrency has a slightly lower duration cost, but introduces an extra uptime component to the pricing:
- the number of requests ($0.20 per 1M requests)
- the invocation duration ($0.035 per GB-hour)
- the uptime of the provisioned concurrency ($0.015 per GB-hour), which you pay for whether or not the capacity is used
Here are two pricing examples:
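As an illustrative sketch using the prices above, assume a function with 1 GB of memory that keeps 10 provisioned concurrent executions fully busy for one hour:

Provisioned Concurrency: 10 GB-hours x $0.015 (uptime) + 10 GB-hours x $0.035 (duration) = $0.15 + $0.35 = $0.50, plus request charges.

On-demand concurrency: the same workload costs 10 GB-hours x $0.06 = $0.60, plus request charges.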
Eagle-eyed readers might notice that $0.035 + $0.015 = $0.05 per GB-hour for a fully utilized concurrent execution, which is $0.01 (about 16%) cheaper than on-demand concurrency! So a system with high Provisioned Concurrency utilization can save on Lambda costs too 😀
Provisioned Concurrency also works with AWS Auto Scaling, which allows you to configure scaling actions based on utilization level (think EC2 auto-scaling) or on a schedule (think cron).
In both cases, you have to first register the alias as a scaling target for AWS Auto Scaling. You can do this with the AWS CLI, like this:
aws --region sa-east-1 application-autoscaling register-scalable-target --service-namespace lambda --resource-id function:yc-test:canary --scalable-dimension lambda:function:ProvisionedConcurrency --min-capacity 1 --max-capacity 100
From now on, I will be able to configure scaling policies and scheduled actions against the canary alias on the function yc-test.
Earlier, we mentioned the new ProvisionedConcurrencyUtilization metric, which shows you how much of the Provisioned Concurrency you are actually using. It can be a useful indicator that you have over-provisioned, and it can be used to auto-scale Provisioned Concurrency as traffic patterns change.
To auto-scale Provisioned Concurrency, you can configure a scaling policy against this metric. You will need to run a command like this:
aws --region sa-east-1 application-autoscaling put-scaling-policy --service-namespace lambda --scalable-dimension lambda:function:ProvisionedConcurrency --resource-id function:yc-test:canary --policy-name TestPolicy --policy-type TargetTrackingScaling --target-tracking-scaling-policy-configuration file://config.json
Where config.json contains:
{
  "TargetValue": 0.7,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
  }
}
You can see the auto-generated CloudWatch Alarms in the CloudWatch console.
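If you prefer the CLI, you can also list these alarms by name prefix (a sketch; Application Auto Scaling typically names its target-tracking alarms with a TargetTracking prefix, so adjust the prefix if yours differ):

aws --region sa-east-1 cloudwatch describe-alarms --alarm-name-prefix TargetTracking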
We start with no Provisioned Concurrency on the alias and a steady stream of traffic going to on-demand concurrency. Once the scaling policy is configured and the alarm triggers, Provisioned Concurrency is automatically added to the alias. The provisioned workers then start to take over invocations from on-demand concurrency as soon as the Provisioned Concurrency is ready.
At this point, our Provisioned Concurrency utilization is very low compared to our 70% threshold (see config.json above).
So as the traffic goes up, AWS Auto Scaling should take care of adding more Provisioned Concurrency to the alias.
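If you want to watch this happen, one way (sketched here against the same alias as before) is to list the recent scaling activities:

aws --region sa-east-1 application-autoscaling describe-scaling-activities --service-namespace lambda --resource-id function:yc-test:canary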
We can configure a scheduled action to enable Provisioned Concurrency with a command like this:
aws --region sa-east-1 application-autoscaling put-scheduled-action --service-namespace lambda --scheduled-action-name TestScheduledAction --resource-id function:yc-test:canary --scalable-dimension lambda:function:ProvisionedConcurrency --scalable-target-action MinCapacity=20,MaxCapacity=20 --schedule "at(2019-11-28T11:05:00)"
This would configure 20 Provisioned Concurrency against the canary alias on the yc-test function. You can see the scheduled scaling actions with the following command:
aws --region sa-east-1 application-autoscaling describe-scheduled-actions --service-namespace lambda
And at exactly 11:05am UTC, I can see the Provisioned Concurrency being added to the specified alias.
As before, the new Provisioned Concurrency takes a few minutes to provision. From the CloudWatch metrics, I can see it start to take over invocations as the new workers come into active service.
If you want to enable and disable Provisioned Concurrency at the same time each day, you can use cron expressions with the --schedule value.
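For example, here is a sketch of a pair of scheduled actions (the action names, times, and capacities are illustrative) that scale the canary alias up to 20 at 11:00 UTC every day and back down to 0 at 14:00 UTC:

aws --region sa-east-1 application-autoscaling put-scheduled-action --service-namespace lambda --scheduled-action-name ScaleUpDaily --resource-id function:yc-test:canary --scalable-dimension lambda:function:ProvisionedConcurrency --scalable-target-action MinCapacity=20,MaxCapacity=20 --schedule "cron(0 11 * * ? *)"

aws --region sa-east-1 application-autoscaling put-scheduled-action --service-namespace lambda --scheduled-action-name ScaleDownDaily --resource-id function:yc-test:canary --scalable-dimension lambda:function:ProvisionedConcurrency --scalable-target-action MinCapacity=0,MaxCapacity=0 --schedule "cron(0 14 * * ? *)"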
Check out the AWS CLI documentation for more details on the application-autoscaling commands.
Lumigo is a serverless monitoring solution. As an AWS Advanced Technology Partner and an AWS launch partner, Lumigo supports Lambda and provisioned concurrency. You can use the Lumigo platform to gain visibility into all cold starts in serverless applications. These insights can help you fine-tune provisioned concurrency, minimize performance issues, and reduce costs.
Here is how to use Lumigo to fine-tune provisioned concurrency: