If you work on a serverless project, you have probably run into the problem of handling AWS Lambda timeouts. Lambda functions are short-lived: the maximum Lambda timeout is 900 seconds (15 minutes). This limit can be difficult to manage and can cause issues in production applications. We'll look at AWS Lambda timeout limits, timeout errors, how to monitor them, and best practices for handling them effectively.
A Lambda serverless application is made up of three major components. Each of these components can time out, affecting your serverless application.
The following table summarizes the timeout limits and important considerations for each component.
Serverless Component | Max Timeout | Comments |
API Gateway | 50 milliseconds – 29 seconds | Configurable |
Lambda Function | 900 seconds (15 minutes) | Also limited to 1,000 concurrent executions per Region by default. If not handled, can lead to throttling issues. |
DynamoDB Streams | 40,000 write capacity units per table | |
S3 | No timeout by default, can be configured to 5-10 seconds | Unlimited objects per bucket |
Downstream Applications | Check your applications | |
Learn more in our detailed guide to AWS Lambda limits.
There are two AWS-native solutions you can use to monitor Lambda: CloudWatch and X-Ray. CloudWatch provides the Duration metric, which shows how long a Lambda function takes to run over a given period. Its Average statistic can be used as a baseline for setting the function timeout limit.
However, CloudWatch doesn't show how much time each downstream call takes. This information is critical to setting a timeout limit for integrated services. AWS X-Ray can help you discover suitable timeout values for downstream services, because it shows the execution time taken by each downstream system. In one example trace, it shows the execution time of S3 GET (171 ms) and S3 PUT (178 ms) requests.
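As a rough illustration of turning Duration data into a timeout baseline, the sketch below takes a set of observed durations and suggests a limit from a high percentile plus headroom. The helper names (`percentile`, `suggestTimeoutMs`), the 99th percentile, the 1.5x headroom factor, and the sample values are all assumptions made for this example, not AWS recommendations.

```javascript
// Nearest-rank percentile on an ascending array of numbers.
function percentile(sortedValues, p) {
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(rank - 1, 0)];
}

// Hypothetical helper: suggest a timeout (ms) from observed durations (ms).
// The default percentile and headroom factor are illustrative assumptions.
function suggestTimeoutMs(durationsMs, { p = 99, headroom = 1.5 } = {}) {
  if (durationsMs.length === 0) {
    throw new Error('need at least one duration sample');
  }
  const sorted = [...durationsMs].sort((a, b) => a - b);
  return Math.ceil(percentile(sorted, p) * headroom);
}

// Made-up sample of invocation durations in milliseconds.
const samples = [120, 135, 150, 171, 178, 200, 240, 900];
console.log(suggestTimeoutMs(samples)); // 1350
```

Using a percentile instead of the average keeps occasional slow invocations from being cut off, at the cost of a longer limit.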
Now, let’s take a few scenarios and understand how these AWS limits might cause timeout errors in a serverless application.
Problem: A REST API implemented as a Lambda function is exposed through API Gateway. The function calls a third-party service to retrieve data, but for some reason the service is not responding. The function has a timeout of 15 minutes, so it keeps waiting for the response. However, the limit for API Gateway is 29 seconds, so the user receives a timeout error (HTTP 504) after 29 seconds while the function keeps running. Not only is this a poor experience for the user, it also results in additional cost.
Solution: For APIs, it's always better to define your own timeout at the function level, and it should be very short, around 3-6 seconds. A short timeout ensures that the function doesn't wait an unreasonable time for a downstream response.
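A minimal sketch of this idea in Node.js, assuming the downstream call returns a promise: race the call against a timer and return an error response as soon as the budget is used up. The names `withTimeout`, `makeHandler`, and `unresponsiveService` are hypothetical, and the 3000 ms budget is just one value from the 3-6 second range above.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Build a Lambda-style async handler that gives up after `budgetMs`
// instead of waiting for API Gateway's 29-second limit.
function makeHandler(callThirdParty, budgetMs) {
  return async (event) => {
    try {
      const data = await withTimeout(callThirdParty(event), budgetMs);
      return { statusCode: 200, body: JSON.stringify(data) };
    } catch (err) {
      // For brevity, every failure is reported as a gateway timeout.
      return { statusCode: 504, body: JSON.stringify({ message: err.message }) };
    }
  };
}

// Hypothetical stand-in for a third-party service that never responds.
const unresponsiveService = () => new Promise(() => {});
const handler = makeHandler(unresponsiveService, 3000);
```

Note that `Promise.race` only stops waiting; it does not cancel the underlying request. A real client would also need its own timeout or abort option to release the connection.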
Problem: A REST API calls multiple services. It reads from a DynamoDB table, calls an API, and then stores the data back in the DynamoDB table. If the API is not responding, the function waits for the response until it reaches the timeout set at the function level (let's assume 6s), and then times out. Here, one integration point causes the whole function to time out.
Solution: Set a timeout for each integration point, so that the function can handle the timeout error, process the request with the data it has, and avoid wasting execution time. Here, a timeout has to be defined for all three integrations so that each response can be handled effectively.
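One possible shape for per-integration timeouts is sketched below, assuming each integration accepts an `AbortSignal` and rejects when it is aborted. `callWithLimit`, `handleRequest`, `fakeCall`, and the fallback values are hypothetical names and data used only for illustration; the 50 ms limit in the usage keeps the demo fast and is not a suggested value.

```javascript
// Run one integration call with its own time limit. On timeout or error,
// return `fallback` so the handler can continue with the data it has.
async function callWithLimit(call, ms, fallback) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await call(controller.signal);
  } catch (err) {
    return fallback;
  } finally {
    clearTimeout(timer);
  }
}

// Three sequential integrations, each with the same per-call limit.
async function handleRequest(integrations, limitMs = 2000) {
  const item = await callWithLimit(integrations.readItem, limitMs, null);
  const enrichment = await callWithLimit(integrations.callApi, limitMs, { source: 'fallback' });
  const saved = await callWithLimit(integrations.writeItem, limitMs, false);
  return { item, enrichment, saved };
}

// Hypothetical stand-in: resolves with `value` after `delayMs`,
// or rejects if the signal is aborted first.
function fakeCall(value, delayMs) {
  return (signal) => new Promise((resolve, reject) => {
    const t = setTimeout(() => resolve(value), delayMs);
    signal.addEventListener('abort', () => {
      clearTimeout(t);
      reject(new Error('aborted'));
    });
  });
}

handleRequest({
  readItem: fakeCall({ id: 1 }, 5),
  callApi: fakeCall({ source: 'api' }, 500), // too slow for a 50 ms limit
  writeItem: fakeCall(true, 5),
}, 50).then((result) => console.log(result));
```

In this run the slow API call is abandoned and replaced by its fallback, while the two DynamoDB stand-ins complete normally.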
Problem: To solve the above two problems, most developers use fixed AWS Lambda timeout limits at the function and integration level, hardcoded in the code or config. However, fixed values either waste part of the available execution time or leave no room for recovery.
In the first approach, the function timeout is set to 6s and each integration call is limited to 2s. Even though the whole invocation (including all three calls) could finish within 6s, the API integration call times out because it cannot complete within 2s. It has not been given the best chance to complete the request. Similarly, in the second approach, if the timeout is set too high for each call, the function times out without any chance of recovery. The function has a 6s timeout and each integration call has a 5s timeout, so the three calls can take up to 15s, plus 1s for handling the response at the function level, for a worst case of 16s. In this case, requests are allowed too much time to execute and cause the function to time out.
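The arithmetic behind the second approach can be checked with a few lines of code. This is only a sketch: `worstCaseMs` and `fitsWithin` are hypothetical helper names, and the calls are assumed to run sequentially.

```javascript
// Worst case when sequential calls each use their full timeout,
// plus the time reserved for handling the response.
function worstCaseMs(callTimeoutsMs, handlingMs) {
  return callTimeoutsMs.reduce((sum, t) => sum + t, 0) + handlingMs;
}

// True if the worst case still fits inside the function timeout.
function fitsWithin(functionTimeoutMs, callTimeoutsMs, handlingMs) {
  return worstCaseMs(callTimeoutsMs, handlingMs) <= functionTimeoutMs;
}

// Second approach: three calls at 5s each, 1s handling, 6s function timeout.
console.log(worstCaseMs([5000, 5000, 5000], 1000)); // 16000
console.log(fitsWithin(6000, [5000, 5000, 5000], 1000)); // false
```

A check like this only catches the second failure mode. The first approach passes it with no handling time reserved, yet still cuts off a call that needs more than 2s.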
Solution: To make better use of the invocation time, set each timeout based on the amount of invocation time left. The timeout must also leave time for recovery steps, such as returning a meaningful error or returning a fallback result based on the circuit breaker pattern.
Let's take one programming language as an example. If your function is written in Node.js, the Lambda handler receives a context object as input. This object has a method, context.getRemainingTimeInMillis(), which returns the approximate remaining execution time, in milliseconds, of the invocation that is currently executing. To set the timeout for the currently running function, we can use this code:
const server = app.listen();
server.setTimeout(6000); // 6 seconds, expressed in milliseconds
And to set the timeout for each API call, we can use this code:
app.post('/xxx', function (req, res) {
  // `context` is assumed to be the Lambda context object, available in this scope.
  // Reserve 500 ms for recovery steps.
  req.setTimeout(context.getRemainingTimeInMillis() - 500);
});
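The same idea can be sketched without a web framework, directly in a Lambda-style handler. The example assumes a downstream client that accepts a `timeoutMs` option; that option, `budgetFor`, `makeBudgetedHandler`, and the 503 and 504 responses are hypothetical choices for illustration. Only `context.getRemainingTimeInMillis()` comes from the Lambda Node.js runtime.

```javascript
const RECOVERY_MS = 500; // reserved for building an error or fallback response

// Time budget for the next downstream call, based on what is left
// of the current invocation. Never returns a negative value.
function budgetFor(context, reserveMs = RECOVERY_MS) {
  return Math.max(context.getRemainingTimeInMillis() - reserveMs, 0);
}

// `callDownstream` is a hypothetical function that accepts { timeoutMs }
// and rejects when the downstream call takes longer than that.
function makeBudgetedHandler(callDownstream) {
  return async (event, context) => {
    const timeoutMs = budgetFor(context);
    if (timeoutMs === 0) {
      // Not enough time left to even attempt the call.
      return { statusCode: 503, body: JSON.stringify({ message: 'no time budget left' }) };
    }
    try {
      const data = await callDownstream({ timeoutMs });
      return { statusCode: 200, body: JSON.stringify(data) };
    } catch (err) {
      return { statusCode: 504, body: JSON.stringify({ message: err.message }) };
    }
  };
}

// Local usage with a fake context that reports 6 seconds remaining.
const fakeContext = { getRemainingTimeInMillis: () => 6000 };
const echoBudget = async ({ timeoutMs }) => ({ timeoutMs });
makeBudgetedHandler(echoBudget)({}, fakeContext).then((res) => console.log(res.body));
```

Because the budget is recomputed from the context on every invocation, the same code adapts when the configured function timeout changes.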
In this article, we've looked at various scenarios in which timeouts can lead to a bad user experience, not to mention added cost on your account. So, apply common sense. If a function is taking more time than allotted, there could well be a problem that needs proper attention, rather than simply a higher timeout limit. Monitoring is the best way to identify these gaps and fine-tune your timeout configuration. Learn how easy AWS Lambda monitoring can be with Lumigo.
Related content: Read our guide to AWS Lambda async.