AWS Lambda performance tuning is the practice of optimizing your serverless functions so that they run efficiently and cost-effectively. The primary focus areas are minimizing cold start latency, right-sizing memory allocation, reducing execution duration, and managing dependencies, all of which directly impact user experience and application responsiveness.
Performance tuning can also include optimizing the code itself, selecting appropriate runtime environments, and ensuring that external resources such as databases and APIs are accessed efficiently. By implementing these optimizations, you can achieve faster execution times, reduced costs, and improved scalability for your serverless applications.
This is part of our comprehensive guide to performance testing in a cloud native world.
AWS Lambda benchmarks are standardized tests used to measure the performance of Lambda functions under various conditions. These benchmarks typically evaluate key performance metrics such as cold start duration, warm start duration, execution time, memory usage, and scalability under load.
Cold start benchmarks measure the time it takes for a Lambda function to initialize from scratch, which includes provisioning the runtime environment and loading the code. Warm start benchmarks assess the performance when the function is invoked while the execution environment is already initialized.
Other important benchmarks might include the impact of different runtime environments (e.g., Node.js, Python, Java), the influence of memory allocation settings on execution speed and cost, and the efficiency of handling network requests and database connections. By regularly benchmarking your AWS Lambda functions, you can identify performance bottlenecks and make informed decisions about optimizations.
Here are some quick tips you can use immediately to improve performance:
AWS Lambda Power Tuning is an open source tool (get it from the official GitHub repo) that optimizes the memory and CPU allocation for Lambda functions to achieve the best performance at the lowest cost. This process involves systematically testing different memory configurations to find the optimal balance between execution time and cost.
Lambda functions allow users to allocate memory from 128 MB to 10,240 MB, with the CPU allocation scaling proportionally with the memory. Higher memory allocations typically result in faster execution times because of more CPU power, but they also incur higher costs. Power tuning helps identify the sweet spot where the function runs efficiently without over-allocating resources.
AWS Lambda Power Tuning automates this process: it runs your Lambda function with various memory configurations, collects execution time and cost data, and visualizes the results. The tool provides recommendations based on these results, allowing you to choose the configuration that best meets your performance and cost requirements.
To use the AWS Lambda Power Tuning tool, you need to deploy it using AWS Step Functions, which orchestrate the tuning process. Once deployed, the tool will run your Lambda function multiple times with different memory settings and generate a comprehensive report. This report includes a cost/performance trade-off graph, making it easy to visualize the impact of different configurations.
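For illustration, here is a minimal sketch of kicking off a tuning run with the AWS SDK for JavaScript v3, assuming the state machine has already been deployed. The input fields follow the tool’s documented execution input; all ARNs below are placeholders.

```typescript
import { SFNClient, StartExecutionCommand } from "@aws-sdk/client-sfn";

const sfn = new SFNClient({ region: "us-east-1" });

// Placeholder ARNs -- substitute the state machine created by the
// Power Tuning deployment and the function you want to tune.
const input = {
  lambdaARN: "arn:aws:lambda:us-east-1:123456789012:function:my-function",
  powerValues: [128, 256, 512, 1024, 2048, 3008], // memory settings to test
  num: 10,          // invocations per memory setting
  payload: {},      // event passed to each test invocation
  strategy: "cost", // optimize for "cost", "speed", or "balanced"
};

await sfn.send(
  new StartExecutionCommand({
    stateMachineArn:
      "arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine",
    input: JSON.stringify(input),
  })
);
```

The execution output typically includes the recommended power value along with a link to the cost/performance visualization.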
To optimize Lambda performance, you must have performance monitoring in place: it is critical to measure and understand how your functions behave during invocation. These metrics help you fine-tune configuration and get the best performance out of your functions.
By default, CloudWatch logs the details of each Lambda execution to a log group and log stream. From these log details, CloudWatch displays dashboards for various metrics such as “number of requests”, “execution duration”, and “error rate”, and these metrics can be used to create custom alarms.
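As a sketch, here is how such an alarm might be created with the AWS SDK for JavaScript v3; the function name, threshold, and evaluation periods are illustrative.

```typescript
import {
  CloudWatchClient,
  PutMetricAlarmCommand,
} from "@aws-sdk/client-cloudwatch";

const cloudwatch = new CloudWatchClient({ region: "us-east-1" });

// Alarm when the average execution duration of "my-function"
// (a placeholder name) exceeds 3 seconds over 3 consecutive minutes.
await cloudwatch.send(
  new PutMetricAlarmCommand({
    AlarmName: "my-function-slow-executions",
    Namespace: "AWS/Lambda",
    MetricName: "Duration",
    Dimensions: [{ Name: "FunctionName", Value: "my-function" }],
    Statistic: "Average",
    Period: 60,      // seconds per evaluation window
    EvaluationPeriods: 3,
    Threshold: 3000, // Duration is reported in milliseconds
    ComparisonOperator: "GreaterThanThreshold",
  })
);
```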
CloudWatch only shows details for the Lambda function itself, but what if you need to know how downstream AWS services (e.g., DynamoDB, S3) are doing for each Lambda invocation? X-Ray is useful for viewing a function’s execution together with its downstream services, which helps you track and debug Lambda function issues involving other services.
Note: Adding X-Ray to your Node.js code package adds almost 6 MB and will also add to the compute time of your function execution. It is recommended to enable it only when you need to debug an issue, and to remove it afterwards.
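If you do enable X-Ray temporarily, instrumentation is a thin wrapper around your AWS SDK clients. A minimal Node.js sketch, assuming the aws-xray-sdk-core package, AWS SDK v3, active tracing enabled on the function, and a placeholder table name:

```typescript
import * as AWSXRay from "aws-xray-sdk-core";
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

// Wrapping the client records a subsegment for every downstream
// DynamoDB call, so each invocation's trace shows the time spent there.
const dynamodb = AWSXRay.captureAWSv3Client(new DynamoDBClient({}));

export const handler = async () => {
  // "my-table" is a placeholder table name.
  return dynamodb.send(
    new GetItemCommand({
      TableName: "my-table",
      Key: { pk: { S: "user#123" } },
    })
  );
};
```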
Learn more in our detailed guide to Lambda monitoring
AWS Lambda lets you allocate memory from 128 MB to 10,240 MB, in 1-MB increments. Although you only specify the RAM, a linearly proportional amount of CPU power gets allocated to the Lambda function by AWS. When the allocated memory crosses roughly 1,792 MB, the function receives the equivalent of one full vCPU (one vCPU-second of credits per second).
If you have a single-threaded app, you shouldn’t select more than about 1.8 GB of RAM: the code cannot make use of the additional CPU, so the cost increases with no benefit. Conversely, if you select less than 1.8 GB for multi-threaded, CPU-bound code, the extra threads won’t help reduce Lambda execution time.
Lambda billing is accurate to 100-ms increments. For that reason, when allocating memory to a function, consider that choosing the smallest RAM setting may reduce the memory cost but increase latency, and the longer execution time may end up outweighing the cost savings.
How can you balance memory and Lambda execution time?
As we know, Lambda cost depends on both memory allocation and execution time. If we need to reduce the Lambda execution time, we can try increasing memory (and, by extension, CPU) to process requests faster. However, past a certain point, increasing the memory for a function no longer improves execution time: CPU allocation tops out (at up to six vCPUs at the maximum memory setting), and code that cannot run in parallel gains nothing from the extra cores.
If your application leans more towards computation logic (i.e. it’s CPU-centric), increasing the memory makes sense as it will reduce the execution time drastically and save on cost per execution.
It’s also worth noting that AWS charges for Lambda execution in increments of 100 ms: if the average execution time for your function is 110 ms, you are billed for 200 ms. Increasing memory to bring execution time below 100 ms can therefore deliver a worthwhile cost saving.
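To make the arithmetic concrete, here is a sketch that compares the two cases, assuming the commonly quoted on-demand price of $0.0000166667 per GB-second:

```typescript
const PRICE_PER_GB_SECOND = 0.0000166667; // illustrative on-demand price

// Cost of one invocation under 100-ms billing granularity.
function invocationCost(memoryMb: number, durationMs: number): number {
  const billedMs = Math.ceil(durationMs / 100) * 100; // round up to 100 ms
  return (memoryMb / 1024) * (billedMs / 1000) * PRICE_PER_GB_SECOND;
}

// 1,024 MB finishing in 110 ms is billed as 200 ms...
console.log(invocationCost(1024, 110)); // ≈ $0.0000033
// ...while 2,048 MB finishing in 90 ms is billed as only 100 ms:
console.log(invocationCost(2048, 90)); // ≈ $0.0000033 -- same cost, faster
```

In this example, doubling the memory nearly halves the latency at no extra cost, because the billed duration drops by a full increment.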
There are a few open-source tools available that claim to help you find the best power configuration. However, monitoring memory usage and execution time – through CloudWatch Logs, X-Ray, or a commercial tool like Lumigo – and adjusting configurations accordingly is a better option. Right-sizing even a small number of your functions can make a big difference to your overall AWS cost.
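Once you have settled on a memory size, applying it is a one-call configuration change. A sketch with a placeholder function name and value:

```typescript
import {
  LambdaClient,
  UpdateFunctionConfigurationCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Set the memory (and, proportionally, the CPU) for "my-function".
await lambda.send(
  new UpdateFunctionConfigurationCommand({
    FunctionName: "my-function",
    MemorySize: 1024, // MB, chosen from your tuning results
  })
);
```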
Learn more in our detailed guide to AWS Lambda Limits
When a Lambda function is invoked for the first time, AWS downloads the code from S3, downloads all the dependencies, creates a container, and starts the application before executing the code. This whole duration (excluding the execution of the code itself) is the cold start time.
A cold start accounts for a significant amount of total execution time, and can significantly affect the performance and latency of your Lambda functions.
To address this issue, AWS introduced a feature called provisioned concurrency, which warms up Lambda execution environments in advance. These environments are available for immediate code execution, with no need to wait for functions to start up.
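A minimal sketch of enabling provisioned concurrency with the AWS SDK, assuming the function has a published version or alias (it cannot be applied to $LATEST); the names and numbers are placeholders:

```typescript
import {
  LambdaClient,
  PutProvisionedConcurrencyConfigCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Keep 25 execution environments warm for the "prod" alias.
await lambda.send(
  new PutProvisionedConcurrencyConfigCommand({
    FunctionName: "my-function",
    Qualifier: "prod", // version or alias
    ProvisionedConcurrentExecutions: 25,
  })
);
```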
Learn more in our detailed guide to Lambda cold start performance
AWS Lambda handles scalability for you, creating a new execution environment to handle each concurrent request. So why should you be concerned? The truth is that nothing comes with infinite resources, so when optimizing Lambda performance you need to consider the concurrency execution limits:
Account-level – By default, it is 1,000 per region across all functions.
Function level – By default, the “Unreserved Account Concurrency” option is selected when you create a new function. That means the function can potentially use all of the concurrency available at the account level (1,000 minus the concurrency used by other functions). This is not best practice: if one function takes up the entire account limit, other functions may be impacted by throttling errors. That’s why it is recommended to always configure “reserved concurrency”, supporting a bulkhead pattern (a sketch follows the note below).
Note – AWS always keeps an unreserved concurrency pool with a minimum of 100 concurrent executions to process requests from functions that don’t have a specific limit set up. So in practice, you will only be able to allocate up to 900 as reserved concurrency.
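Configuring reserved concurrency is a single API call; the function name and limit below are placeholders:

```typescript
import {
  LambdaClient,
  PutFunctionConcurrencyCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Cap "my-function" at 100 concurrent executions so it can never
// exhaust the account-level pool shared by other functions.
await lambda.send(
  new PutFunctionConcurrencyCommand({
    FunctionName: "my-function",
    ReservedConcurrentExecutions: 100,
  })
);
```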
When designing for concurrency in Lambda, you should always consider the limitations of other integrated services such as DynamoDB or RDS, and adjust a function’s concurrency limit based on the maximum number of connections those services can handle.
Concurrency is a good way to handle a large volume of requests, but a sharp spike can still hurt application performance: creating new execution environments entails cold start time, which may cause higher response latency while the spike is absorbed.
AWS launched Provisioned Concurrency for Lambda at re:Invent 2019 to handle these types of use cases. It provides the option to provision execution environments in advance when creating a function, and it can also be auto-scaled based on CloudWatch metrics or scheduled for a particular time or day, depending on requirements.
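Metric-based scaling of provisioned concurrency is configured through Application Auto Scaling. A sketch using target tracking on the built-in utilization metric, with a placeholder alias and illustrative capacity bounds:

```typescript
import {
  ApplicationAutoScalingClient,
  RegisterScalableTargetCommand,
  PutScalingPolicyCommand,
} from "@aws-sdk/client-application-auto-scaling";

const autoscaling = new ApplicationAutoScalingClient({ region: "us-east-1" });
const resourceId = "function:my-function:prod"; // function:name:alias

// Register the alias's provisioned concurrency as a scalable target.
await autoscaling.send(
  new RegisterScalableTargetCommand({
    ServiceNamespace: "lambda",
    ResourceId: resourceId,
    ScalableDimension: "lambda:function:ProvisionedConcurrency",
    MinCapacity: 5,
    MaxCapacity: 100,
  })
);

// Scale to keep provisioned-concurrency utilization around 70%.
await autoscaling.send(
  new PutScalingPolicyCommand({
    PolicyName: "keep-utilization-at-70-percent",
    ServiceNamespace: "lambda",
    ResourceId: resourceId,
    ScalableDimension: "lambda:function:ProvisionedConcurrency",
    PolicyType: "TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration: {
      TargetValue: 0.7,
      PredefinedMetricSpecification: {
        PredefinedMetricType: "LambdaProvisionedConcurrencyUtilization",
      },
    },
  })
);
```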
As an official partner for the launch, Lumigo now provides a range of provisioned concurrency metrics and alerts.
Read more in our detailed guide to provisioned concurrency
Lumigo is a serverless monitoring platform that lets developers effortlessly find Lambda cold starts, understand their impact, and fix them.
Get a free account with Lumigo to resolve Lambda issues in seconds
Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of performance testing.