Provisioned Concurrency - the end of cold starts


AWS today announced Provisioned Concurrency, an exciting new feature that means Lambda customers no longer have to worry about cold starts.

And we at Lumigo are proud to be an official launch partner for Provisioned Concurrency for AWS Lambda.

In this post, we’ll drill into the problems cold starts pose, explore how Provisioned Concurrency resolves them, and explain some rough edges you need to understand when it comes to working with this new feature.

The topic of cold starts has been perhaps the most frequently and fiercely debated topic in the serverless community.

For many, it’s a non-issue because their primary workload is data processing, so spikes in latency don’t negatively impact user experience.

Or maybe their traffic pattern is so uniform and stable that there are seldom spikes that cause a flurry of cold starts.

Or, perhaps they are writing Lambda functions in Node.js, Python or Golang where you can optimize cold start durations to an acceptable range (<500ms) with minimal effort. That means even when cold starts happen the response time is still within the application’s SLA.

However, for many others, cold starts are a major stumbling block preventing them from adopting serverless.

Java and .NET functions often experience cold starts that last for several seconds! For user-facing APIs, that is clearly not desirable, especially when you consider that slow response times can impact revenue as well as user experience, as Amazon found out a decade ago.

And in a complex system, APIs often have to call other APIs, which means cold starts can compound as well.

While it might be feasible for some to rewrite existing applications as they migrate to serverless, many enterprises have millions of lines of existing Java or .NET code that is simply not economically viable to rewrite for the sake of moving to serverless.

And some companies, such as those in the food delivery industry (e.g. JustEat or Deliveroo), experience very spiky traffic around certain times of the day.

These are predictable spikes that occur around the same time each day, but they would cause massive numbers of cold starts nonetheless. In fact, these spikes can also push the application into other limits, such as how quickly Lambda can scale out after the initial burst capacity.


It’s for these reasons that we are so excited by the new Provisioned Concurrency feature.


Provisioned concurrency is a game-changer

It requires no code change to existing functions and works for all Lambda runtimes. Once enabled, Provisioned Concurrency will keep your desired number of concurrent executions initialized and ready to respond to requests. No more cold starts!

Provisioned Concurrency can be enabled, disabled and adjusted on the fly using the AWS Management Console, AWS CLI, AWS SDK or CloudFormation. For the aforementioned food delivery services, it means they could increase the Provisioned Concurrency just before lunchtime and dinnertime spikes. So when the users flood in, there won’t be any cold starts. Since these spikes happen at predictable times, you can also use AWS Auto Scaling to adjust it on a schedule. More on this later.
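
For example, here is how you might enable it from the AWS CLI (the function and alias names are placeholders):

aws lambda put-provisioned-concurrency-config --function-name my-function --qualifier canary --provisioned-concurrent-executions 100

To disable it again, you can call delete-provisioned-concurrency-config with the same function name and qualifier.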


It also works seamlessly with the existing on-demand scaling behavior. When more requests come in than the Provisioned Concurrency can handle, Lambda simply spills over to on-demand scaling. As before, you will see cold starts for these spillover invocations, but they should be few and far between if you have configured a sufficient amount of Provisioned Concurrency.


There are also a number of additional CloudWatch metrics for you to monitor the behavior of Provisioned Concurrency:

  • ProvisionedConcurrentExecutions – concurrent executions using Provisioned Concurrency.
  • ProvisionedConcurrencyUtilization – fraction of Provisioned Concurrency in use.
  • ProvisionedConcurrencyInvocations – number of invocations using Provisioned Concurrency.
  • ProvisionedConcurrencySpilloverInvocations – number of invocations that are above Provisioned Concurrency.
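
For instance, non-zero spillover is a sign that you may have under-provisioned. Here's a minimal sketch of a CloudWatch alarm on that metric using the AWS CLI (the alarm name, threshold and function/alias are illustrative):

aws cloudwatch put-metric-alarm --alarm-name pc-spillover --namespace AWS/Lambda --metric-name ProvisionedConcurrencySpilloverInvocations --dimensions Name=FunctionName,Value=yc-test Name=Resource,Value=yc-test:canary --statistic Sum --period 60 --evaluation-periods 5 --threshold 0 --comparison-operator GreaterThanThreshold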

Provisioned Concurrency alleviates the need for Lambda warmers and other workarounds the community has often adopted. It also enables workloads that were previously difficult to migrate to serverless, such as:

  • User-facing APIs that run on Java/.NET runtimes.
  • Microservices with many API-to-API calls.
  • APIs with very spiky traffic patterns.

And now, let’s dig into the nitty-gritty details!


How provisioned concurrency works

You can configure Provisioned Concurrency on a Lambda alias or version. It’s important to remember that you cannot configure it against the unpublished $LATEST version, nor against any alias that points to $LATEST.

After you enable Provisioned Concurrency, Lambda will provision the requested number of concurrent executions. This can take a minute or two, and you can check on its progress in the meantime.
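
You can also poll the status from the AWS CLI, for example:

aws lambda get-provisioned-concurrency-config --function-name yc-test --qualifier canary

The response includes a Status field that moves from IN_PROGRESS to READY (or FAILED), along with the requested, allocated and available concurrency counts.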

Once fully provisioned, the Status will change to Ready. Invocations will then be handled by the Provisioned Concurrency ahead of regular on-demand concurrency.

As you can see in the graph above, 1 out of 1 invocations was handled by Provisioned Concurrency.

However, as before, the first invocation still reports an Init Duration (the time it takes to initialize the function module) in the REPORT message in CloudWatch Logs. This initialization no longer happens as part of the first invocation; instead, it happens when Lambda provisions the Provisioned Concurrency. The duration is included in the first invocation’s REPORT message purely so that it is reported somewhere.

You can see evidence of this in the X-Ray trace for the first invocation.

As you can see from this trace, the Initialization step was performed ahead of the first invocation. In fact, if you look closely, the Initialization step happened over 30 minutes before the first invocation!

Note: based on conversations with AWS, there may be changes to the way initialization duration is reported in CloudWatch Logs and X-Ray.

Lumigo hearts Provisioned Concurrency

Here at Lumigo we have already fallen head over heels for this awesome new feature. And we want to help you make the most of it too.

As an AWS Advanced Technology Partner, we’re proud to be an AWS launch partner and announce immediate support for Provisioned Concurrency in the Lumigo platform.

When you log into Lumigo you’ll be able to view useful information about cold starts and Provisioned Concurrency at a glance.

You can see the average cold start duration as well as the percentage of invocations that were cold starts. These help you understand the impact cold starts have on your user experience and whether you should consider enabling Provisioned Concurrency on those functions.

For example, you can sort by the Cold Start % column and quickly identify functions that experience cold starts frequently. You can also sort by the Avg Cold Duration column to find functions with poor cold start performance.


Combined with your knowledge about these functions – whether they serve API requests or perform background data processing – you can then make informed decisions about where to apply Provisioned Concurrency.

Once enabled, we also show you the total configured Provisioned Concurrency for each function in the same view, so you get all the relevant information at your fingertips!


And that’s not all. 

You can also opt to receive preconfigured ‘Lumigo insight’ alerts with recommendations to help you determine which of your functions are over- or under-provisioned. These help you right-size your Provisioned Concurrency configuration and strike the best balance between performance and cost.

Lumigo also provides alerts with recommendations on when to consider using the new feature.

Finally, we have also added support for Provisioned Concurrency in the lumigo-cli tool. When you run the analyze-lambda-cold-starts command, it will calculate the total Provisioned Concurrency for each function and how fully utilized they have been over the specified timeframe.


Where does the Provisioned Concurrency go?

Ultimately, the Provisioned Concurrency is always provisioned against a version. When you configure Provisioned Concurrency on an alias, it’s passed to the underlying version.

For instance, take the following configuration, where Provisioned Concurrency is configured on the canary alias, which currently points at version 10.

If you invoke version 10 directly instead of the alias…

… you will see that Provisioned Concurrency was used and the invocation was not a cold start.

Can you combine Provisioned Concurrency?

What happens if you configure Provisioned Concurrency on an alias as well as the version it’s associated with?

Luckily, you can’t.

This is good news because it avoids so much unnecessary complexity and confusion.

Equally, if you have two aliases that point to the same version, you’re prevented from configuring Provisioned Concurrency on both aliases.

In the case of a weighted alias, you have to configure its Provisioned Concurrency before you add routing configuration.

OK. That’s good, less confusion.

What if we start with two aliases, both pointing at different versions, and both have configured Provisioned Concurrency?

And what if we then enable routing on the canary alias with version 11, which already has 10 Provisioned Concurrency through the production alias?

Turns out, you can’t do that.

What if we configure Provisioned Concurrency on a version (v12) and then set up routing configuration against it?

Nope, not allowed either.

This is good. The rules are enforced consistently. The bottom line is, each version can only have one set of Provisioned Concurrency.

Working with concurrency limits

Another thing to remember is that Provisioned Concurrency comes out of your regional concurrency limit. You can configure Provisioned Concurrency on multiple aliases and/or versions of a function, all of which count towards the regional concurrency limit.

But what if the function in question also has reserved concurrency configured? In that case, the total Provisioned Concurrency across all its versions cannot exceed its reserved concurrency:

    sum(Provisioned Concurrency of all versions) <= reserved concurrency

The reverse is also true.

For a function with existing Provisioned Concurrency, you need to choose a reserved concurrency value equal to or greater than the sum of its Provisioned Concurrency.
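
Reserved concurrency itself is set with put-function-concurrency; given the rule above, a value lower than the function's total Provisioned Concurrency should be rejected:

aws lambda put-function-concurrency --function-name yc-test --reserved-concurrent-executions 100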

Pricing changes

Provisioned Concurrency also has a slightly different pricing model.

On-demand concurrency charges you based on:

  • Invocation duration: $0.06 per GB-hour, rounded up to the nearest 100ms
  • Requests: $0.20 per 1M requests

Provisioned Concurrency has a slightly lower duration cost, but introduces an extra uptime component to the pricing:

  • Invocation duration: $0.035 per GB-hour, rounded up to the nearest 100ms
  • Requests: $0.20 per 1M requests
  • Uptime: $0.015 per GB-hour, rounded up to the nearest 5 minutes

This means that if you configure 1 Provisioned Concurrency on a function with 1GB of memory, you will pay $0.015 per hour for it (rounded up to the next 5-minute block) even if there are no invocations. If you configure 10 Provisioned Concurrency for this function, you’ll pay $0.15 per hour for them, and so on.

Eagle-eyed readers might notice that $0.035 + $0.015 = $0.05 per GB-hour for a fully utilized concurrent execution, which is $0.01 (16%) cheaper than on-demand concurrency! So a system with high Provisioned Concurrency utilization can also save on Lambda costs 😀
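
As a back-of-the-envelope example, assuming the prices above (they vary by region) and a fully utilized function:

    10 Provisioned Concurrency x 1GB, running for 24 hours:
      uptime:   10 x 1GB x 24h x $0.015 = $3.60
      duration: 10 x 1GB x 24h x $0.035 = $8.40
      total:    $12.00, vs $14.40 (10 x 1GB x 24h x $0.06) with on-demand concurrency

Request charges are the same either way, so they drop out of the comparison.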

Behavior during deployments

Now let’s see how Provisioned Concurrency behaves when it comes to rolling out updates to the alias. When that happens, the alias’ Provisioned Concurrency is first removed from the old version, then applied to the new version. This process is not instant as Lambda needs to provision the desired concurrency against the new version.

However, traffic is routed to the new version straight away. This creates a window of time when requests against the alias would not fall under any Provisioned Concurrency!

This can be problematic, to say the very least. It means that every time you deploy a new version of your code, you lose the Provisioned Concurrency for a few minutes. This introduces cold starts and makes deployments less graceful than they should be.

Again, this is an issue that will likely be addressed soon.

In the meantime, it can be mitigated with a weighted alias, since Provisioned Concurrency is distributed across the two versions according to their respective weights.

As you gradually ramp up the traffic to the new version, you don’t lose the existing Provisioned Concurrency all at once. So provided that you have some headroom, you will be able to gradually route all traffic to the new version without incurring cold starts.
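
For illustration, here is what shifting 10% of the canary alias’s traffic to a new version 11 might look like with the AWS CLI (the version number and weight are placeholders):

aws lambda update-alias --function-name yc-test --name canary --routing-config 'AdditionalVersionWeights={"11"=0.1}'

You would then repeat this with increasing weights before finally pointing the alias at the new version.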

In fact, during this gradual deployment, changes to the weighting take a little while to happen. Meanwhile, if you refresh the console, you’ll see the old weighting configuration and the status of the Provisioned Concurrency as “In progress”. It’s as if Lambda is reserving the Provisioned Concurrency before committing to the new weighting configuration.

A minute or so later, the new weighting will be reflected in the console and the Provisioned Concurrency is in the “Ready” status.

The good news is that, while these changes are happening the service is not affected. While I gradually upped the percentage of traffic to a new version, there were no cold starts.

This process can be automated with CodeDeploy, which has built-in support for gradual deployments across two versions.

Another workaround would be to configure Provisioned Concurrency on the respective versions directly. However, it requires more orchestration:

  1. Create new version
  2. Configure Provisioned Concurrency on the new version
  3. Wait for the Provisioned Concurrency to be Ready
  4. Update alias to point to this new version
  5. Disable Provisioned Concurrency on the old version

This can be automated with better tooling, but at the time of writing I’m not aware of tools that support this workflow out-of-the-box.
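
For illustration, here is a rough shell sketch of those five steps, assuming a function called yc-test with a live alias (error handling omitted):

# 1. publish a new version
VERSION=$(aws lambda publish-version --function-name yc-test --query Version --output text)

# 2. configure Provisioned Concurrency on the new version
aws lambda put-provisioned-concurrency-config --function-name yc-test --qualifier "$VERSION" --provisioned-concurrent-executions 10

# 3. wait for it to become READY
until [ "$(aws lambda get-provisioned-concurrency-config --function-name yc-test --qualifier "$VERSION" --query Status --output text)" = "READY" ]; do sleep 10; done

# 4. point the alias at the new version, remembering the old one
OLD_VERSION=$(aws lambda get-alias --function-name yc-test --name live --query FunctionVersion --output text)
aws lambda update-alias --function-name yc-test --name live --function-version "$VERSION"

# 5. remove Provisioned Concurrency from the old version
aws lambda delete-provisioned-concurrency-config --function-name yc-test --qualifier "$OLD_VERSION"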

Autoscaling

Provisioned Concurrency also works with AWS Auto Scaling, which allows you to configure scaling actions based on utilization level (think EC2 auto-scaling) or on a schedule (think cron).

In both cases, you have to first register the alias as a scaling target for AWS Auto Scaling. You can do this with the AWS CLI, like this:

aws --region sa-east-1 application-autoscaling register-scalable-target --service-namespace lambda --resource-id function:yc-test:canary --scalable-dimension lambda:function:ProvisionedConcurrency --min-capacity 1 --max-capacity 100

From now on, I will be able to configure scaling policies and scheduled actions against the canary alias on the function yc-test.

Scaling by utilization

Earlier, we mentioned the new ProvisionedConcurrencyUtilization metric. It shows you how much of the Provisioned Concurrency you are actually using.

It can be a useful indicator that you have over-provisioned, and it can be used to auto-scale the Provisioned Concurrency as the traffic pattern changes.

To auto-scale Provisioned Concurrency, you can configure a scaling policy against this metric with a command like this:

aws --region sa-east-1 application-autoscaling put-scaling-policy --service-namespace lambda --scalable-dimension lambda:function:ProvisionedConcurrency --resource-id function:yc-test:canary --policy-name TestPolicy --policy-type TargetTrackingScaling --target-tracking-scaling-policy-configuration file://config.json

My config.json file looks like this:

{
  "TargetValue": 0.7,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
  }
}

When you run the command, you will get a response containing the policy ARN and the auto-generated CloudWatch Alarms.

You can see the auto-generated CloudWatch Alarms in the CloudWatch console.

I started with no Provisioned Concurrency on the alias and a steady stream of traffic going to on-demand concurrency. Once the scaling policy was configured and the alarm triggered, Provisioned Concurrency was automatically added to the alias:

The provisioned executions then start to take over invocations from on-demand concurrency once the Provisioned Concurrency is ready.

At this point, our Provisioned Concurrency utilization is very low compared to our 70% threshold (see config.json above).


So as the traffic goes up, AWS Auto Scaling should take care of adding more Provisioned Concurrency to the alias.

However, at the time of writing, the auto-generated alarms use Average utilization rather than Maximum utilization. Even as the traffic starts to outpace our Provisioned Concurrency and invocations spill over to on-demand, the alarm still shows we’re way below the scaling threshold.

As a result, auto-scaling does not behave as expected. This problem has been reported to the Lambda team and should be addressed in the near future. In the meantime, a temporary workaround is to modify the auto-generated alarms yourself and change them to use Maximum utilization instead.

Scheduled scaling

For the aforementioned use case, where food delivery services experience predictable spikes at the same time each day, we can configure a scheduled action to enable Provisioned Concurrency with a command like this:

aws --region sa-east-1 application-autoscaling put-scheduled-action --service-namespace lambda --scheduled-action-name TestScheduledAction --resource-id function:yc-test:canary --scalable-dimension lambda:function:ProvisionedConcurrency --scalable-target-action MinCapacity=20,MaxCapacity=20 --schedule "at(2019-11-28T11:05:00)"

This would configure 20 Provisioned Concurrency against the canary alias on the yc-test function. You can see the scheduled scaling actions with the following command:

aws --region sa-east-1 application-autoscaling describe-scheduled-actions --service-namespace lambda

And at exactly 11:05am UTC, I can see the Provisioned Concurrency being added to the specified alias.

As before, the new Provisioned Concurrency takes a few minutes to provision. From the CloudWatch metrics, I can see it starting to take over invocations as it comes into active service.

If you want to enable and disable Provisioned Concurrency at the same time each day, you can use cron expressions with the --schedule value.
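
For example, a pair of scheduled actions might scale up before lunchtime and back down afterwards (the action names, times and capacities are illustrative):

aws --region sa-east-1 application-autoscaling put-scheduled-action --service-namespace lambda --scheduled-action-name ScaleUpBeforeLunch --resource-id function:yc-test:canary --scalable-dimension lambda:function:ProvisionedConcurrency --scalable-target-action MinCapacity=20,MaxCapacity=20 --schedule "cron(30 10 * * ? *)"

aws --region sa-east-1 application-autoscaling put-scheduled-action --service-namespace lambda --scheduled-action-name ScaleDownAfterLunch --resource-id function:yc-test:canary --scalable-dimension lambda:function:ProvisionedConcurrency --scalable-target-action MinCapacity=1,MaxCapacity=1 --schedule "cron(0 14 * * ? *)"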

Check out the AWS CLI documentation for more details on the application-autoscaling commands.

Wrap-up

It’s an exciting day for AWS Lambda customers as well as those who are still evaluating its merits. One of the longest-standing and most fiercely debated shortcomings of the platform has become a thing of the past.

To summarize, in this post we discussed:

  • The problems cold starts pose and why they are a blocker for some AWS customers.
  • What Provisioned Concurrency is and how it works.
  • How it works with auto-scaling.
  • How Lumigo makes it easier for you to work with Provisioned Concurrency and make smart choices about when to use it.
  • Rough edges and workarounds for deploying updates and auto-scaling.

I hope you have enjoyed this post. Here at Lumigo we’ll continue to bring you features that will make working with Provisioned Concurrency even easier.

