AWS Lambda Cold Starts: Solving the Problem


What are Lambda Cold Starts?

Cold starts can be a killer for Lambda performance, especially if you’re developing a customer-facing application that needs to operate in real time. They happen because, if your Lambda function is not already running, AWS needs to deploy your code and spin up a new container before the request can begin. The request therefore takes much longer to execute, because your function can start running only once the container is ready.

A “cold start” is the first request that a new Lambda worker handles. This request takes longer to process because the Lambda service needs to:

  1. find a space in its EC2 fleet to allocate the worker
  2. initialize the worker
  3. initialize your function module

before it can pass the request to your handler function.

The fact of the matter is that cold starts are a necessary byproduct of the scalability of serverless. Cold starts reflect the AWS Lambda startup time required to “warm up” containers before functions become operational.

AWS needs a ready supply of containers to spin up when functions are invoked. That means functions are kept warm for a limited amount of time (usually 30 – 45 minutes) after executing, before being spun down so that the container is free for any new function that is invoked.

Source: AWS

Cold starts in Lambda account for less than 0.25% of requests but the impact can be huge, sometimes requiring 5 seconds to execute the code. This issue is particularly relevant to applications that need to run executions in real-time, or those that rely on split-second timing.

What is Provisioned Concurrency and How Can it Solve Cold Starts?

AWS Lambda provides Provisioned Concurrency, a feature that gives you more control over the performance of serverless applications. Using Provisioned Concurrency, you can avoid cold starts and startup latency issues for your Lambda functions.

Provisioned Concurrency lets you build scalable serverless applications with predictable latency. You can set the desired concurrency for all versions or aliases of a function. AWS Lambda prepares containers for your functions, ensuring they can be invoked with double-digit millisecond latency after they are called. This means serverless functions can adapt to sudden bursts of traffic, or significant scaling events, without increasing latency.

However, Provisioned Concurrency comes at a cost. You are billed for it from the moment you enable it until you disable it, rounded up to the nearest 5 minutes. The price is calculated from the amount of concurrency you configure (the number of concurrent function invocations that can be served without cold start latency) and the amount of memory you allocate.

This means you must set provisioned concurrency carefully – specify just enough concurrency for your workloads, to avoid incurring unnecessary costs.
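
To get a feel for how concurrency, memory, and time combine, here is a rough estimator. It is a sketch under stated assumptions: the function name and parameters are hypothetical, billed time is assumed to round up to the next 5-minute increment, and `pricePerGbSecond` is a placeholder that you would replace with the current rate for your region. Regular invocation charges are not included.

```javascript
const BILLING_INCREMENT_SECONDS = 5 * 60

// Estimate the Provisioned Concurrency charge for one function.
// pricePerGbSecond is a placeholder: look up the current rate for your region.
function estimateProvisionedCost ({ concurrency, memoryMb, enabledSeconds, pricePerGbSecond }) {
  // Assumption: billed time is rounded up to the next 5-minute increment.
  const billedSeconds =
    Math.ceil(enabledSeconds / BILLING_INCREMENT_SECONDS) * BILLING_INCREMENT_SECONDS
  const gbSeconds = concurrency * (memoryMb / 1024) * billedSeconds
  return { billedSeconds, gbSeconds, cost: gbSeconds * pricePerGbSecond }
}

// Example: 10 instances at 1024 MB, enabled for just over 5 minutes.
const estimate = estimateProvisionedCost({
  concurrency: 10,
  memoryMb: 1024,
  enabledSeconds: 301,
  pricePerGbSecond: 0.000004 // hypothetical rate, for illustration only
})
console.log(estimate.billedSeconds, estimate.gbSeconds)
```

In this example, 301 seconds of enabled time is billed as 600 seconds, which is why short experiments with a large concurrency value can cost more than expected.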

How to Turn On Provisioned Concurrency

The steps below explain how you can configure Provisioned Concurrency for your Lambda functions using the AWS Management Console.

  1. In the AWS Lambda console, choose an existing Lambda function.
  2. In the Actions drop-down menu, choose the Publish new version option. This will let you apply settings to an alias or a published version of a function.


  3. You can add a description for the version, but this is optional. When done, choose the Publish option.
  4. In the Actions drop-down menu, choose the Create alias option. For each alias, enter a name.
  5. In the Version drop-down menu, choose 1, and then choose Create.


  6. Find the Concurrency card and then choose the Add option.
  7. Choose the Alias radio button for Qualifier Type, and then choose the function alias you selected previously in the Alias drop-down menu. Define the required value for Provisioned Concurrency – the number specifies the number of function instances that will run continuously. Choose Save.

Warning – additional costs: provisioned concurrency is billed in addition to regular invocation costs for AWS Lambda. You pay for provisioned concurrency as if additional instances of your function were invoked and running on a continuous basis.

  8. After you complete step 7, go to the Lambda console. The Provisioned Concurrency card should display the In progress status.

The initialization process will be complete after several minutes, and then you can use the published alias of your function with the Provisioned Concurrency feature.


The above steps apply to the AWS Management Console. You can also use AWS CloudFormation, the AWS CLI, and AWS SDK to modify these settings.
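
As an illustration of the SDK route, the sketch below mirrors the console steps: publish a version, create an alias for it, then set Provisioned Concurrency on the alias. It assumes the AWS SDK for JavaScript v2 (`aws-sdk`) and its `publishVersion`, `createAlias`, and `putProvisionedConcurrencyConfig` operations. The helper name `enableProvisionedConcurrency` and its parameters are hypothetical, and the client is passed in so the function itself has no hard dependency on the SDK.

```javascript
// Sketch: publish a version, point a new alias at it, and enable
// Provisioned Concurrency on that alias.
// `lambda` is expected to be an AWS SDK v2 Lambda client, for example:
//   const AWS = require('aws-sdk')
//   const lambda = new AWS.Lambda()
async function enableProvisionedConcurrency (lambda, { functionName, aliasName, instances }) {
  // Provisioned Concurrency applies to a published version or an alias, not $LATEST.
  const { Version } = await lambda
    .publishVersion({ FunctionName: functionName })
    .promise()

  // Note: createAlias fails if the alias already exists.
  await lambda
    .createAlias({ FunctionName: functionName, Name: aliasName, FunctionVersion: Version })
    .promise()

  return lambda
    .putProvisionedConcurrencyConfig({
      FunctionName: functionName,
      Qualifier: aliasName,
      ProvisionedConcurrentExecutions: instances
    })
    .promise()
}

module.exports = { enableProvisionedConcurrency }
```

A call might look like `enableProvisionedConcurrency(lambda, { functionName: 'my-function', aliasName: 'live', instances: 5 })`, where the function and alias names are placeholders. As in the console flow, the configuration takes some time to become ready after the call returns.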

Using Lumigo to Fine Tune Provisioned Concurrency

Lumigo is a serverless monitoring solution with strong support for AWS Lambda. We are an AWS Advanced Technology Partner and an AWS launch partner, providing support for Provisioned Concurrency in our platform since its launch.

The Lumigo platform gives you unprecedented visibility into cold starts in your serverless applications, and helps you fine tune Provisioned Concurrency, to ensure you minimize performance issues and conserve costs.

  1. Create a free Lumigo account and connect a tracer to your Lambdas

You can get a free Lumigo account here. Installation takes only a few minutes, and you can start monitoring and debugging your AWS Lambda functions, with full distributed tracing. Log into your Lumigo account.

  2. Gain insights into cold starts and Provisioned Concurrency

In the main Lumigo console, you can view information about Provisioned Concurrency as well as cold starts. Information about cold starts and Provisioned Concurrency is displayed in the same view, ensuring you can analyze all important data from within one display.

The console shows the following metrics for cold starts:

  • For each Lambda function, what percentage of invocations are cold starts
  • Average duration of cold starts, indicating how serious the problem is
  • Number of provisioned concurrency instances that are enabled for the function, so you can see at a glance if there is enough provisioned concurrency to prevent cold starts

  3. Set up alerts

Lumigo can send you recommendations, designed to help you keep track of functions that are under-provisioned or over-provisioned. You can easily customize these alerts. The goal is to help you right-size your Provisioned Concurrency and ensure a balance between cost and performance.

  4. Use the Lumigo CLI tool to get info about Provisioned Concurrency directly from the command line

In the Lumigo CLI, once you run the analyze-lambda-cold-starts command, it calculates the total Provisioned Concurrency per function, and determines how the function was utilized over the predefined time frame.

More Ways to Improve Lambda Cold Start Performance

Monitor to Identify How Cold Starts are Affecting Your Application

Even if you use provisioned concurrency correctly, cold starts can happen. It is important to monitor your applications and identify how cold starts affect performance. Cold starts increase latency for some requests, and you need to identify which requests are affected, and whether they impact your end users.

Both CloudWatch Logs and X-Ray can help you identify where and when cold starts are occurring in your application, although doing so requires some detective work on your part. A serverless-focused monitoring platform like Lumigo makes it much easier to monitor how cold starts are affecting your application.
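
To give an idea of the kind of deduction involved with CloudWatch Logs, the sketch below summarizes cold starts from the REPORT lines that Lambda writes at the end of each invocation. It assumes that a REPORT line contains an `Init Duration` field only when the invocation was a cold start, so treat the result as an approximation. The function name and the sample lines are made up for illustration.

```javascript
// Summarize cold starts from Lambda REPORT log lines.
// Assumption: "Init Duration" appears only on cold start invocations.
function summarizeColdStarts (logLines) {
  const reports = logLines.filter(line => line.startsWith('REPORT'))
  const initDurations = reports
    .map(line => /Init Duration: ([\d.]+) ms/.exec(line))
    .filter(match => match !== null)
    .map(match => parseFloat(match[1]))

  const total = reports.length
  const coldStarts = initDurations.length
  return {
    invocations: total,
    coldStarts,
    coldStartPercent: total === 0 ? 0 : (coldStarts / total) * 100,
    avgInitDurationMs: coldStarts === 0
      ? 0
      : initDurations.reduce((sum, ms) => sum + ms, 0) / coldStarts
  }
}

// Made-up sample lines in the shape of Lambda REPORT entries.
const sample = [
  'REPORT RequestId: a1\tDuration: 12.50 ms\tBilled Duration: 13 ms\tMemory Size: 128 MB\tMax Memory Used: 60 MB\tInit Duration: 180.00 ms',
  'REPORT RequestId: a2\tDuration: 3.10 ms\tBilled Duration: 4 ms\tMemory Size: 128 MB\tMax Memory Used: 60 MB',
  'REPORT RequestId: a3\tDuration: 2.90 ms\tBilled Duration: 3 ms\tMemory Size: 128 MB\tMax Memory Used: 60 MB',
  'REPORT RequestId: a4\tDuration: 15.00 ms\tBilled Duration: 15 ms\tMemory Size: 128 MB\tMax Memory Used: 61 MB\tInit Duration: 220.00 ms'
]
console.log(summarizeColdStarts(sample))
```

This only covers one function's log lines at a time; correlating the affected requests with what end users actually experienced is the harder part.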

In the Lumigo dashboard, you can see at a glance the functions with the most cold starts. When you see functions with a high percentage of cold starts, such as the graphql-api-prod-listSports function below (with 57.36% of its invocations being cold starts), these are functions that you need to pay special attention to!

You can drill into each of these functions further and see how bad these cold starts are in terms of duration. After all, if the cold start duration is short then the cold starts have a much smaller impact on our user experience when they happen. The worst-case scenario is when the cold start duration is long and cold starts happen frequently!

Furthermore, you can set up alerts in Lumigo so you can be notified when your functions experience a high percentage of cold starts. This is a great way to keep an eye on those user-facing functions where you’re concerned about end-to-end latency.

Reduce the Number of Packages

We’ve seen that the biggest impact on AWS Lambda cold start times is not the size of the package but the initialization time when the package is actually loaded for the first time.

The more packages you use, the longer it will take for the container to load them. Tools such as Browserify and Serverless Plugin Optimize can help you reduce the number of packages.

Related Research – Web Frameworks Implication on Serverless Cold Start Performance in NodeJS

Use Node.js, Python or Golang

If you write Lambda functions in Node.js, Python or Golang, you can optimize cold start durations to an acceptable range (<500ms) with minimal effort. That means that even when cold starts happen the response time is still within the application’s SLA.

In one experiment, Nathan Malishev found that Python, Node.js and Go took much less time to initialize than Java or .NET, with Python initializing at least twice as fast as Java, depending on memory allocation.

Source: Lambda Cold Starts, A Language Comparison – Nathan Malishev


What Affects Cold Start Duration? Our Experiment

To understand cold starts better, let’s take a closer look at the main factors that affect them. It turns out that different types of requests can result in different cold start times. We designed an experiment to better understand these effects.

Type 1 vs. Type 2 Cold Starts

Michael Hart originally observed noticeable differences between two types of cold starts:

  1. Cold starts that happen immediately after a code change
  2. Other cold starts (e.g. when Lambda needs to scale up the number of workers to match traffic demand)

Perhaps some additional steps need to be performed during the first cold start after a code deployment, which would explain why the first cold start after a code change takes longer than the others.

In practice, most of the cold starts you will see in the wild are of the second type, and that is where we should focus. However, I was really intrigued by this discovery and ran several experiments myself.

Experiment Design

In one such experiment, I measured the roundtrip duration for a few different functions:

  • control: a hello world function with no dependencies whatsoever.
module.exports.handler = async event => {
  return {
     statusCode: 200,
     body: '{}'
  }
}
  • AWS SDK is bundled but not required: the same function as control, but the deployment artifact includes the Node.js AWS SDK (even though the function doesn’t actually require it), which results in a 9.5MB deployment artifact.
  • control with big assets: the same function as control, but the deployment artifact includes two large MP3 files, which results in a 60.2MB deployment artifact.
  • require bundled AWS SDK: a function that requires the AWS SDK during module initialization. This function bundles the AWS SDK as part of its deployment artifact (9.5MB).
const AWS = require('aws-sdk')

module.exports.handler = async event => {
  return {
     statusCode: 200,
     body: '{}'
  }
}
  • require AWS SDK via Layer: the same function as require bundled AWS SDK but the AWS SDK is not bundled in the deployment artifact. Instead, the AWS SDK is injected via a Lambda layer.
  • require built-in AWS SDK: the same function as require bundled AWS SDK but the AWS SDK is not bundled in the deployment artifact. Instead, it’s using the AWS SDK that is included in the Lambda execution environment.

For each of these functions, I collected 200 data points for the post deploy cold starts (type 1) and 1000 data points for the other cold starts (type 2). The results are as follows.

There are a few things you can learn from these data.

Performance Impact of Cold Start Type

Type 1 cold starts (immediately after a code deployment) consistently take longer than type 2, especially as we look at the tail latencies (p99).
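
For readers who want to compute tail latencies from their own measurements, here is a minimal nearest-rank percentile sketch. The function name is hypothetical and the sample durations are made up for illustration; they are not the data from this experiment.

```javascript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile (samples, p) {
  if (samples.length === 0) throw new Error('no samples')
  const sorted = [...samples].sort((a, b) => a - b)
  const rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(rank, 1) - 1]
}

// Made-up roundtrip durations in milliseconds, not the experiment's data.
const durationsMs = [210, 190, 205, 980, 200, 195, 215, 220, 198, 202]
console.log({
  p50: percentile(durationsMs, 50),
  p99: percentile(durationsMs, 99)
})
```

With these made-up samples a single slow request barely moves the median but dominates the p99, which is why tail latencies are the more revealing number when comparing cold start types.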

Performance Impact of Deployment Artifact Size

The size of the deployment artifact has an impact on cold start time even if the function does not actually require what is bundled in it. The following three tests all have the same function code:

module.exports.handler = async event => {
  return {
     statusCode: 200,
     body: '{}'
  }
}

The only difference is in the size of the deployment artifact. As you can see below, bundling the Node.js AWS SDK in the deployment artifact adds 20–60ms to the roundtrip latency for a cold start. But when that artifact gets much bigger, so too does the latency impact.

When the artifact is 60MB, this adds a whopping 250–450ms!

So, deployment size does impact cold start, but the impact is fairly small if it’s just the AWS SDK.

Performance Impact of Dependency Source

Oftentimes, the AWS SDK is an unavoidable dependency. But it turns out that where the AWS SDK comes from matters too. It’s fastest to use the AWS SDK that’s built into the Lambda execution environment. Interestingly, it’s also much faster to load the AWS SDK via Layers than to bundle it in the deployment artifact! The difference is much more significant than the aforementioned 20–60ms, which suggests that there are additional factors at play.

Before you decide to never bundle the AWS SDK in your deployment artifacts, there are other factors to consider.

For example, if you use the built-in AWS SDK then you effectively lose immutable infrastructure. There have also been instances when people’s functions suddenly break when AWS upgraded the version of the AWS SDK. Read this post for more details.

If you use Lambda Layers, you take on additional operational overhead, since a layer requires a separate deployment and you still have to update every function that references it. Read this post for why Lambda Layers are not a silver bullet and should be used sparingly.

That being said, for Serverless framework users, there is a clever plugin called serverless-layers which sidesteps a lot of the operational issues with Lambda Layers. Effectively, it doesn’t use Layers as a way to share code but uses them purely as an optimization. During each deployment it checks whether your dependencies have changed and, if so, packages and deploys the dependencies as a Lambda layer (just for that project) and updates all the functions to reference the layer.

But wait! There’s more.

Performance Impact of the “Require” Statement

Just what is taking the time when this line of code runs during module initialization?

const AWS = require('aws-sdk')

Behind the scenes, the Node runtime must resolve the dependency and check if aws-sdk exists in any of the paths on the NODE_PATH. And when the module folder is found, it has to run the initialization logic on the aws-sdk module and resolve all of its dependencies and so on.

All of this takes CPU cycles and filesystem IO calls, and that’s where we incur the latency overhead.

So, if your function just needs the DynamoDB client then you can save yourself a lot of cold start time by requiring ONLY the DynamoDB client.

const DynamoDB = require('aws-sdk/clients/dynamodb')
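
If you want to check this for your own dependencies, a rough way is to time the `require` call itself. The helper below is a sketch with a hypothetical name, `timeRequire`. It uses the built-in `zlib` module only so that it runs anywhere; in a real function you would compare `aws-sdk` against `aws-sdk/clients/dynamodb`.

```javascript
// Measure how long require() takes the first time a module is loaded.
// Later require() calls for the same module hit Node's module cache,
// so only the first measurement per process is meaningful.
function timeRequire (moduleName) {
  const start = process.hrtime.bigint()
  const loaded = require(moduleName)
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6
  return { loaded, elapsedMs }
}

// 'zlib' is a built-in module, used here only to keep the sketch self-contained.
const { elapsedMs } = timeRequire('zlib')
console.log(`require('zlib') took ${elapsedMs.toFixed(2)} ms`)
```

Numbers measured on a development machine will differ from those in the Lambda execution environment, so the useful comparison is between two variants measured in the same place.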

And since a lot of the cold start time is going towards resolving dependencies, what if we remove the need for runtime dependency resolution altogether?

Performance Impact of webpack

By using a bundler like webpack, we can resolve all the dependencies ahead of time and shake them down to only the code that we actually need.

This creates savings in two ways:

  • smaller deployment artifact
  • no runtime resolution

And the result is awesome!

So, if you’re running Node.js and want to minimize your Lambda cold start time, the most effective thing you can do is to be mindful of what you require in your code and then apply webpack. It addresses several of the contributing factors to cold start latency simultaneously.

For the Serverless framework users out there, you can use the serverless-webpack plugin to do this for you.

Get Lumigo and Identify, Fix and Prevent Cold Starts

Lumigo is a serverless monitoring platform that lets developers effortlessly find Lambda cold starts, understand their impact, and fix them.

Out of the box, Lumigo tracks key cold start metrics for your Lambda functions, including cold start %, average cold duration, and enabled provisioned concurrency. It also generates real-time alerts on cold starts, so you’ll know instantly when a function is under-provisioned.

Beyond cold starts, Lumigo can help you:

  • Find and fix issues in seconds with visual debugging – Lumigo builds a virtual stack trace of all services participating in the transaction. Everything is displayed in a visual map that can be searched and filtered.
  • Automatic distributed tracing – with one click and no manual code changes, Lumigo visualizes your entire environment, including your Lambdas, other AWS services, and every API call and external SaaS service.
  • Identify and remove performance bottlenecks – see the end-to-end execution duration of each service, and which services run sequentially and in parallel. Lumigo automatically identifies your worst latency offenders, including AWS Lambda cold starts.
  • Serverless-specific smart alerts – using machine learning, Lumigo’s predictive analytics identifies and alerts on issues before they impact application performance or costs, including alerts about AWS Lambda cold starts.

Get a free account with Lumigo and let us help you eliminate Lambda cold starts today