Lambda is a managed AWS service that promises to take care of server infrastructure for you. However, it still requires careful design to get the best performance out of the computation capabilities it provides, and avoid latency and service disruption for users.
This is part of our comprehensive guide to performance testing in a cloud native world.
Before we get into the details of AWS Lambda optimization, here are some quick tips you can immediately put to use to improve your performance.
To optimize Lambda performance, you must have performance monitoring in place. It is critical to measure and understand how functions behave during invocation; the resulting metrics will help you fine-tune configuration and get the best performance out of your functions.
CloudWatch, by default, logs the details of each Lambda execution to a LogGroup/LogStream. Using those log details, CloudWatch displays dashboards for various metrics like "number of requests", "execution duration", "error rate" and many more, and these metrics can be used to create custom alarms.
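For example, here is a minimal sketch of creating such an alarm with the AWS SDK for JavaScript v3 (the function name, threshold and region are placeholders):

```typescript
import {
  CloudWatchClient,
  PutMetricAlarmCommand,
} from "@aws-sdk/client-cloudwatch";

const cloudwatch = new CloudWatchClient({ region: "us-east-1" });

// Alarm when the (placeholder) function reports more than 5 errors
// within a single one-minute period.
await cloudwatch.send(
  new PutMetricAlarmCommand({
    AlarmName: "my-function-errors",
    Namespace: "AWS/Lambda", // Lambda publishes its metrics in this namespace
    MetricName: "Errors",
    Dimensions: [{ Name: "FunctionName", Value: "my-function" }],
    Statistic: "Sum",
    Period: 60, // seconds
    EvaluationPeriods: 1,
    Threshold: 5,
    ComparisonOperator: "GreaterThanThreshold",
    TreatMissingData: "notBreaching", // no invocations is not an error state
  })
);
```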
CloudWatch only shows details for Lambda functions, but what if we need to know how downstream AWS services (e.g. DynamoDB, S3) are doing for each Lambda invocation? X-Ray lets you view a function's execution together with its calls to downstream services, which helps you track and debug Lambda issues that involve other services.
Note: Adding X-Ray to your Node.js code package adds almost 6 MB and will also add to the compute time of your function execution. It is recommended to use it only when needed to debug an issue, then remove it.
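As a rough illustration, tracing a downstream call looks something like the sketch below (assuming active tracing is enabled on the function; the DynamoDB table and key are hypothetical):

```typescript
import { captureAWSv3Client } from "aws-xray-sdk-core";
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";

// Wrapping the client records every DynamoDB call as a subsegment,
// so it appears under the function's trace in the X-Ray console.
const dynamo = captureAWSv3Client(new DynamoDBClient({}));

export const handler = async () => {
  // This downstream call is now traced end to end.
  return dynamo.send(
    new GetItemCommand({
      TableName: "my-table", // hypothetical table
      Key: { pk: { S: "example" } },
    })
  );
};
```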
Learn more about Lambda monitoring:
AWS Lambda provides memory ranging from 128 MB to 3,008 MB in 64 MB increments. Although we only specify the RAM, AWS allocates a linearly proportional amount of CPU power to the Lambda function. When the allocated memory crosses the 1,792 MB threshold, the function receives the equivalent of one full vCPU (one vCPU-second of credits per second).
If you have a single-threaded app, you shouldn't select more than 1.8 GB of RAM, as the code cannot make use of the additional CPU and the cost will only increase. Conversely, if you have selected less than 1.8 GB of RAM, multi-threaded CPU-bound code won't help reduce Lambda execution time, because less than one full vCPU is available to share between threads.
Lambda billing is accurate to 100 ms increments. For that reason, when allocating memory to a function, you need to consider that choosing the smallest RAM allocation may reduce the memory cost but increase latency, and the longer execution time may outweigh the cost savings.
How can you balance memory and Lambda execution time?
As we know, Lambda cost depends on both memory allocation and execution time. If we need to reduce Lambda execution time, we can try increasing memory (and, by extension, CPU) to process the work faster. However, past a certain limit, increasing a function's memory won't improve execution time, as AWS currently offers a maximum of two vCPUs per function.
If your application leans more towards computation logic (i.e. it’s CPU-centric), increasing the memory makes sense as it will reduce the execution time drastically and save on cost per execution.
Also, it's worth paying attention to the fact that AWS charges for Lambda execution in increments of 100 ms: if the average execution time for your function is 110 ms, you will be charged for 200 ms. Increasing memory to bring execution time down below 100 ms can therefore deliver a worthwhile cost saving.
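To make the rounding effect concrete, here is a back-of-the-envelope sketch (the per-GB-second price is a placeholder; check the Lambda pricing page for your region's current rate):

```typescript
// Rough per-invocation cost under 100 ms billing granularity.
const PRICE_PER_GB_SECOND = 0.0000166667; // placeholder rate

function costPerInvocation(memoryMb: number, durationMs: number): number {
  const billedMs = Math.ceil(durationMs / 100) * 100; // 110 ms bills as 200 ms
  return (memoryMb / 1024) * (billedMs / 1000) * PRICE_PER_GB_SECOND;
}

// Doubling the memory to get under the 100 ms boundary costs the same
// per invocation here, while cutting user-facing latency:
console.log(costPerInvocation(512, 110)); // billed as 200 ms
console.log(costPerInvocation(1024, 95)); // billed as 100 ms
```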
There are a few open-source tools available that claim to help you find the best power configuration. However, monitoring memory usage and execution time – through CloudWatch Logs, X-Ray or a commercial tool like Lumigo – is a better option. You can then adjust configurations accordingly. Tuning even a small number of your functions can make a big difference in overall AWS cost.
Learn more in our detailed guide to AWS Lambda Limits
When we invoke a Lambda function for the first time, Lambda downloads the code from S3, downloads all the dependencies, creates a container and starts the application before it executes the code. This whole duration (everything except the execution of the code itself) is the cold start time.
A cold start accounts for a significant amount of total execution time, and can significantly affect the performance and latency of your Lambda functions.
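One way to see where that time goes: everything at module scope runs once, during initialization, while only the handler body runs on every invocation. A minimal Node.js sketch:

```typescript
import { S3Client, ListBucketsCommand } from "@aws-sdk/client-s3";

// Module scope executes during the cold start, when the execution
// environment is created -- keep expensive setup (clients, config)
// here so warm invocations can reuse it.
const s3 = new S3Client({});

export const handler = async () => {
  // Only this body runs on each invocation; a warm environment
  // skips the initialization above entirely.
  return s3.send(new ListBucketsCommand({}));
};
```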
To address this issue, AWS came up with a feature called provisioned concurrency, which warms up Lambda execution environments in advance. These environments are immediately available to execute code, with no need to wait for functions to start up.
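Configuring it is a one-call operation; here is a sketch with the AWS SDK for JavaScript v3 (function name, alias and count are placeholders; provisioned concurrency must target a published version or alias, not $LATEST):

```typescript
import {
  LambdaClient,
  PutProvisionedConcurrencyConfigCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Keep 10 execution environments initialized for the "live" alias.
await lambda.send(
  new PutProvisionedConcurrencyConfigCommand({
    FunctionName: "my-function",
    Qualifier: "live",
    ProvisionedConcurrentExecutions: 10,
  })
);
```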
Learn more in our detailed guide to Lambda cold start performance
AWS Lambda handles scalability for you, creating a new execution environment to handle each concurrent request. So why should you be concerned? Nothing comes with infinite resources, and when optimizing Lambda performance you need to consider concurrency execution limits at two levels:
Account-level – By default, it is 1,000 per region across all functions.
Function level – By default, the "Unreserved Account Concurrency" option is selected when we create a new function. That means the function can potentially use all of the available concurrency at the account level (1,000 minus the concurrency used by other functions). This is not best practice: if one function takes up the entire account limit, other functions may be hit with throttling errors. That's why it is recommended to always configure reserved concurrency, supporting a bulkhead pattern (a configuration sketch follows below).
Note – AWS always keeps an unreserved concurrency pool with a minimum of 100 concurrent executions to process requests from functions that don't have a specific limit set up. So in practice, you will only be able to allocate up to 900 as reserved concurrency.
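Setting a reserved concurrency limit is a single API call; a minimal sketch (function name and limit are placeholders):

```typescript
import {
  LambdaClient,
  PutFunctionConcurrencyCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Cap this function at 100 concurrent executions so it can neither
// starve other functions nor be starved by them (bulkhead pattern).
await lambda.send(
  new PutFunctionConcurrencyCommand({
    FunctionName: "my-function",
    ReservedConcurrentExecutions: 100,
  })
);
```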
When designing concurrency in Lambda, you should always consider the limits of other integrated services like DynamoDB or RDS: adjust a function's concurrency limit based on the maximum number of connections those services can handle.
Concurrency is a good option for handling a large volume of requests, but a sharp spike in traffic will still hurt application performance: creating new execution environments incurs cold start time, which may mean higher response latency while the spike is absorbed.
AWS launched Provisioned Concurrency for Lambda at re:Invent 2019 to handle these types of use cases. It lets you provision execution environments in advance when creating a function, and the provisioned capacity can also be auto-scaled based on CloudWatch metrics, or scheduled for particular times or days, depending on requirements.
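For the auto-scaling case, provisioned concurrency is managed through Application Auto Scaling. The sketch below (function alias, bounds and target utilization are placeholders) registers a function alias as a scalable target and tracks roughly 70% utilization of the provisioned environments:

```typescript
import {
  ApplicationAutoScalingClient,
  RegisterScalableTargetCommand,
  PutScalingPolicyCommand,
} from "@aws-sdk/client-application-auto-scaling";

const scaling = new ApplicationAutoScalingClient({ region: "us-east-1" });
const resourceId = "function:my-function:live"; // placeholder function:alias

// Allow provisioned concurrency to float between 5 and 50 environments.
await scaling.send(
  new RegisterScalableTargetCommand({
    ServiceNamespace: "lambda",
    ResourceId: resourceId,
    ScalableDimension: "lambda:function:ProvisionedConcurrency",
    MinCapacity: 5,
    MaxCapacity: 50,
  })
);

// Scale to keep utilization of provisioned environments near 70%.
await scaling.send(
  new PutScalingPolicyCommand({
    PolicyName: "pc-utilization",
    ServiceNamespace: "lambda",
    ResourceId: resourceId,
    ScalableDimension: "lambda:function:ProvisionedConcurrency",
    PolicyType: "TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration: {
      TargetValue: 0.7,
      PredefinedMetricSpecification: {
        PredefinedMetricType: "LambdaProvisionedConcurrencyUtilization",
      },
    },
  })
);
```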
As an official partner for the launch, Lumigo provides a range of provisioned concurrency metrics and alerts.
Read more in our detailed guide to provisioned concurrency
In this article, we’ve explored various aspects of Lambda function performance. Serverless computing offers numerous advantages to development teams ready to embrace the shift in mindset that it requires, but to get the most out of it we need to ensure the right balance between performance and cost. Following the best practices laid out here will go a long way to achieving that balance, but monitoring is critical to understanding the behavior of your serverless application in order to fine-tune performance.
We have authored in-depth guides on several other topics that can be useful as you explore the world of performance testing.
Serverless architectures can get complex, and suffer from low visibility. Learn how to monitor serverless environments, understand distributed tracing, and learn to debug common problems.
See top articles in our serverless debugging guide:
Discover how to monitor serverless environments using logs, first party cloud monitoring tools like Amazon CloudWatch, and dedicated serverless monitoring solutions like Lumigo.
See top articles in our serverless monitoring guide:
Authored by Tigera
Learn how to monitor Kubernetes, the world’s leading container orchestrator, to improve the performance and security of large-scale containerized applications.