AWS Lambda performance tuning is the practice of optimizing your serverless functions so that they run efficiently and cost-effectively. The primary focus areas are minimizing cold start latency, right-sizing memory allocation, reducing execution duration, and managing dependencies, all of which directly impact user experience and application responsiveness.
Performance tuning can also include optimizing the code itself, selecting appropriate runtime environments, and ensuring that external resources such as databases and APIs are accessed efficiently. By implementing these optimizations, you can achieve faster execution times, reduced costs, and improved scalability for your serverless applications.
This is part of our comprehensive guide to performance testing in a cloud native world.
AWS Lambda benchmarks are standardized tests used to measure the performance of Lambda functions under various conditions. These benchmarks typically evaluate key performance metrics such as cold start duration, warm start duration, execution time, memory usage, and scalability under load.
Cold start benchmarks measure the time it takes for a Lambda function to initialize from scratch, which includes provisioning the runtime environment and loading the code. Warm start benchmarks assess the performance when the function is invoked while the execution environment is already initialized.
Other important benchmarks might include the impact of different runtime environments (e.g., Node.js, Python, Java), the influence of memory allocation settings on execution speed and cost, and the efficiency of handling network requests and database connections. By regularly benchmarking your AWS Lambda functions, you can identify performance bottlenecks and make informed decisions about optimizations.
Here are some quick tips you can use immediately to improve performance:
AWS Lambda Power Tuning is an open source tool (get it from the official GitHub repo) that optimizes the memory and CPU allocation for Lambda functions to achieve the best performance at the lowest cost. This process involves systematically testing different memory configurations to find the optimal balance between execution time and cost.
Lambda functions allow users to allocate memory from 128 MB to 10,240 MB, with the CPU allocation scaling proportionally with the memory. Higher memory allocations typically result in faster execution times because of more CPU power, but they also incur higher costs. Power tuning helps identify the sweet spot where the function runs efficiently without over-allocating resources.
AWS Lambda Power Tuning automates this process: it runs your Lambda function with various memory configurations, collects execution time and cost data, and visualizes the results. The tool provides recommendations based on these results, allowing you to choose the configuration that best meets your performance and cost requirements.
To use the AWS Lambda Power Tuning tool, you need to deploy it using AWS Step Functions, which orchestrate the tuning process. Once deployed, the tool will run your Lambda function multiple times with different memory settings and generate a comprehensive report. This report includes a cost/performance trade-off graph, making it easy to visualize the impact of different configurations.
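For illustration, here is a minimal sketch of kicking off a tuning run with the AWS SDK for JavaScript v3, assuming the state machine has already been deployed. The input fields follow the tool’s documented execution input; all ARNs below are placeholders.

```typescript
import { SFNClient, StartExecutionCommand } from "@aws-sdk/client-sfn";

const sfn = new SFNClient({ region: "us-east-1" });

// Placeholder ARNs -- substitute the state machine created by the
// Power Tuning deployment and the function you want to tune.
const input = {
  lambdaARN: "arn:aws:lambda:us-east-1:123456789012:function:my-function",
  powerValues: [128, 256, 512, 1024, 2048, 3008], // memory settings to test
  num: 10,          // invocations per memory setting
  payload: {},      // event passed to each test invocation
  strategy: "cost", // optimize for "cost", "speed", or "balanced"
};

await sfn.send(
  new StartExecutionCommand({
    stateMachineArn:
      "arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine",
    input: JSON.stringify(input),
  })
);
```

The execution output typically includes the recommended power value along with a link to the cost/performance visualization.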
To optimize Lambda performance, you must have performance monitoring in place: it is critical to measure and understand how your functions behave during invocation. These metrics help you fine-tune configuration and get the best performance out of your functions.
By default, CloudWatch logs the details of each Lambda execution to a log group and log stream. From these log details, CloudWatch displays dashboards for various metrics such as “number of requests”, “execution duration”, and “error rate”, and these metrics can be used to create custom alarms.
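As a sketch, here is how such an alarm might be created with the AWS SDK for JavaScript v3; the function name, threshold, and evaluation periods are illustrative.

```typescript
import {
  CloudWatchClient,
  PutMetricAlarmCommand,
} from "@aws-sdk/client-cloudwatch";

const cloudwatch = new CloudWatchClient({ region: "us-east-1" });

// Alarm when the average execution duration of "my-function"
// (a placeholder name) exceeds 3 seconds over 3 consecutive minutes.
await cloudwatch.send(
  new PutMetricAlarmCommand({
    AlarmName: "my-function-slow-executions",
    Namespace: "AWS/Lambda",
    MetricName: "Duration",
    Dimensions: [{ Name: "FunctionName", Value: "my-function" }],
    Statistic: "Average",
    Period: 60,      // seconds per evaluation window
    EvaluationPeriods: 3,
    Threshold: 3000, // Duration is reported in milliseconds
    ComparisonOperator: "GreaterThanThreshold",
  })
);
```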
CloudWatch only shows details for the Lambda function itself, but what if you need to know how downstream AWS services (e.g., DynamoDB, S3) are doing for each Lambda invocation? X-Ray is useful for viewing a function’s execution together with its downstream services, which helps you track and debug Lambda function issues involving other services.
Note: Adding X-Ray to your Node.js code package adds almost 6 MB and will also add to the compute time of your function execution. It is recommended to enable it only when you need to debug an issue, and to remove it afterwards.
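If you do enable X-Ray temporarily, instrumentation is a thin wrapper around your AWS SDK clients. A minimal Node.js sketch, assuming the aws-xray-sdk-core package, AWS SDK v3, active tracing enabled on the function, and a placeholder table name:

```typescript
import * as AWSXRay from "aws-xray-sdk-core";
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

// Wrapping the client records a subsegment for every downstream
// DynamoDB call, so each invocation's trace shows the time spent there.
const dynamodb = AWSXRay.captureAWSv3Client(new DynamoDBClient({}));

export const handler = async () => {
  // "my-table" is a placeholder table name.
  return dynamodb.send(
    new GetItemCommand({
      TableName: "my-table",
      Key: { pk: { S: "user#123" } },
    })
  );
};
```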
Learn more in our detailed guide to Lambda monitoring
AWS Lambda lets you allocate memory from 128 MB to 10,240 MB, in 1-MB increments. Although you only specify the RAM, a linearly proportional amount of CPU power gets allocated to the Lambda function by AWS. When the allocated memory crosses roughly 1,792 MB, the function receives the equivalent of one full vCPU (one vCPU-second of credits per second).
If you have a single-threaded app, you shouldn’t select more than about 1.8 GB of RAM: the code cannot make use of the additional CPU, so the cost increases with no benefit. Conversely, if you select less than 1.8 GB for multi-threaded, CPU-bound code, the extra threads won’t help reduce Lambda execution time.
Lambda billing is accurate to 100-ms increments. For that reason, when allocating memory to a function, consider that choosing the smallest RAM setting may reduce the memory cost but increase latency, and the longer execution time may end up outweighing the cost savings.
How can you balance memory and Lambda execution time?
As we know, Lambda cost depends on both memory allocation and execution time. If we need to reduce the Lambda execution time, we can try increasing memory (and, by extension, CPU) to process requests faster. However, past a certain point, increasing the memory for a function no longer improves execution time: CPU allocation tops out (at up to six vCPUs at the maximum memory setting), and code that cannot run in parallel gains nothing from the extra cores.
If your application leans more towards computation logic (i.e. it’s CPU-centric), increasing the memory makes sense as it will reduce the execution time drastically and save on cost per execution.
It’s also worth noting that AWS charges for Lambda execution in increments of 100 ms: if the average execution time for your function is 110 ms, you are billed for 200 ms. Increasing memory to bring execution time below 100 ms can therefore deliver a worthwhile cost saving.
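To make the arithmetic concrete, here is a sketch that compares the two cases, assuming the commonly quoted on-demand price of $0.0000166667 per GB-second:

```typescript
const PRICE_PER_GB_SECOND = 0.0000166667; // illustrative on-demand price

// Cost of one invocation under 100-ms billing granularity.
function invocationCost(memoryMb: number, durationMs: number): number {
  const billedMs = Math.ceil(durationMs / 100) * 100; // round up to 100 ms
  return (memoryMb / 1024) * (billedMs / 1000) * PRICE_PER_GB_SECOND;
}

// 1,024 MB finishing in 110 ms is billed as 200 ms...
console.log(invocationCost(1024, 110)); // ≈ $0.0000033
// ...while 2,048 MB finishing in 90 ms is billed as only 100 ms:
console.log(invocationCost(2048, 90)); // ≈ $0.0000033 -- same cost, faster
```

In this example, doubling the memory nearly halves the latency at no extra cost, because the billed duration drops by a full increment.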
There are a few open-source tools available that claim to help you find the best power configuration. However, monitoring memory usage and execution time – through CloudWatch Logs, X-Ray, or a commercial tool like Lumigo – and adjusting configurations accordingly is a better option. Right-sizing even a small number of your functions can make a big difference to your overall AWS cost.
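Once you have settled on a memory size, applying it is a one-call configuration change. A sketch with a placeholder function name and value:

```typescript
import {
  LambdaClient,
  UpdateFunctionConfigurationCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Set the memory (and, proportionally, the CPU) for "my-function".
await lambda.send(
  new UpdateFunctionConfigurationCommand({
    FunctionName: "my-function",
    MemorySize: 1024, // MB, chosen from your tuning results
  })
);
```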
Learn more in our detailed guide to AWS Lambda Limits
When a Lambda function is invoked for the first time, AWS downloads the code from S3, downloads all the dependencies, creates a container, and starts the application before executing the code. This whole duration (excluding the execution of the code itself) is the cold start time.
A cold start accounts for a significant amount of total execution time, and can significantly affect the performance and latency of your Lambda functions.
To address this issue, AWS introduced a feature called provisioned concurrency, which warms up Lambda execution environments in advance. These environments are available for immediate code execution, with no need to wait for functions to start up.
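A minimal sketch of enabling provisioned concurrency with the AWS SDK, assuming the function has a published version or alias (it cannot be applied to $LATEST); the names and numbers are placeholders:

```typescript
import {
  LambdaClient,
  PutProvisionedConcurrencyConfigCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Keep 25 execution environments warm for the "prod" alias.
await lambda.send(
  new PutProvisionedConcurrencyConfigCommand({
    FunctionName: "my-function",
    Qualifier: "prod", // version or alias
    ProvisionedConcurrentExecutions: 25,
  })
);
```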
Learn more in our detailed guide to Lambda cold start performance
AWS Lambda handles scalability for you, creating a new execution environment to handle each concurrent request. So why should you be concerned? The truth is that nothing comes with infinite resources, so when optimizing Lambda performance you need to consider the concurrency execution limits:
Account-level – By default, it is 1,000 per region across all functions.
Function level – By default, the “Unreserved Account Concurrency” option is selected when you create a new function. That means the function can potentially use all of the concurrency available at the account level (1,000 minus the concurrency used by other functions). This is not best practice: if one function takes up the entire account limit, other functions may be impacted by throttling errors. That’s why it is recommended to always configure “reserved concurrency”, supporting a bulkhead pattern (a sketch follows the note below).
Note – AWS always keeps an unreserved concurrency pool with a minimum of 100 concurrent executions to process requests from functions that don’t have a specific limit set up. So in practice, you will only be able to allocate up to 900 as reserved concurrency.
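Configuring reserved concurrency is a single API call; the function name and limit below are placeholders:

```typescript
import {
  LambdaClient,
  PutFunctionConcurrencyCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Cap "my-function" at 100 concurrent executions so it can never
// exhaust the account-level pool shared by other functions.
await lambda.send(
  new PutFunctionConcurrencyCommand({
    FunctionName: "my-function",
    ReservedConcurrentExecutions: 100,
  })
);
```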
When designing for concurrency in Lambda, you should always consider the limitations of other integrated services such as DynamoDB or RDS, and adjust a function’s concurrency limit based on the maximum number of connections those services can handle.
Concurrency is a good way to handle a large volume of requests, but a sharp spike can still hurt application performance: creating new execution environments entails cold start time, which may cause higher response latency while the spike is absorbed.
AWS launched Provisioned Concurrency for Lambda at re:Invent 2019 to handle these types of use cases. It provides the option to provision execution environments in advance when creating a function, and it can also be auto-scaled based on CloudWatch metrics or scheduled for a particular time or day, depending on requirements.
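Metric-based scaling of provisioned concurrency is configured through Application Auto Scaling. A sketch using target tracking on the built-in utilization metric, with a placeholder alias and illustrative capacity bounds:

```typescript
import {
  ApplicationAutoScalingClient,
  RegisterScalableTargetCommand,
  PutScalingPolicyCommand,
} from "@aws-sdk/client-application-auto-scaling";

const autoscaling = new ApplicationAutoScalingClient({ region: "us-east-1" });
const resourceId = "function:my-function:prod"; // function:name:alias

// Register the alias's provisioned concurrency as a scalable target.
await autoscaling.send(
  new RegisterScalableTargetCommand({
    ServiceNamespace: "lambda",
    ResourceId: resourceId,
    ScalableDimension: "lambda:function:ProvisionedConcurrency",
    MinCapacity: 5,
    MaxCapacity: 100,
  })
);

// Scale to keep provisioned-concurrency utilization around 70%.
await autoscaling.send(
  new PutScalingPolicyCommand({
    PolicyName: "keep-utilization-at-70-percent",
    ServiceNamespace: "lambda",
    ResourceId: resourceId,
    ScalableDimension: "lambda:function:ProvisionedConcurrency",
    PolicyType: "TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration: {
      TargetValue: 0.7,
      PredefinedMetricSpecification: {
        PredefinedMetricType: "LambdaProvisionedConcurrencyUtilization",
      },
    },
  })
);
```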
As an official partner for the launch, Lumigo now provides a range of provisioned concurrency metrics and alerts.
Read more in our detailed guide to provisioned concurrency
Lumigo is a serverless monitoring platform that lets developers effortlessly find Lambda cold starts, understand their impact, and fix them.
Get a free account with Lumigo to resolve Lambda issues in seconds
Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of performance testing.