Jan 18 2023
Amazon DynamoDB is a fully managed NoSQL database service provided by AWS. As a managed service, it frees us from tasks such as hardware provisioning, configuration, scaling, replication, and patching. Although AWS does most of the heavy lifting, we still need to be vigilant about how we’re using DynamoDB and understand it in the larger context of our applications. In this article we’ll cover the common issues that DynamoDB users should look out for and how they impact your application, as well as the key metrics to monitor to get a comprehensive view into the health and performance of DynamoDB.
Common DynamoDB Issues and Troubleshooting
Limited Visibility
As with other AWS services, it can be difficult to see exactly what’s happening inside DynamoDB. There are many layers of abstraction on top of DynamoDB, which makes it difficult to troubleshoot.
For example, when requests are throttled even though the table appears to have enough capacity available, there is no easy way to find out why. Nothing is logged that makes the cause obvious, so you have to go through the documentation to understand how the requests were handled. You can try the APIs that AWS provides in its SDKs, but this approach is not easy either.
Also, when there are errors, DynamoDB isn’t great at explaining what went wrong. You can see and chart the performance of each query you run, but that’s pretty much the limit when it comes to debugging performance issues; there’s nothing more you can do out of the box.
Permissions
AWS manages all permissions to services and data using IAM policies and roles, but DynamoDB IAM policies can be hard to reason about, and there are many of them. Getting them right is critical for privacy and security. Overly broad permissions can cause data leaks, while missing permissions can block you from accessing data through the console or from an application, depending on the use case. The error message makes it clear that it’s a permission issue, but it often gives little indication of which policy needs to change, causing further delays in debugging the problem.
Code debugging and exception handling
Debugging any code that works with AWS services or runs on AWS Lambda is always challenging. Because the errors are not very verbose or clear, it’s difficult to understand what the issues are and how to fix them.
The lack of tools to debug AWS services adds to the difficulty of debugging and exception handling. There are some tools on the market to make this easier. For example, Lumigo’s execution tags can help in tagging DB save attempts. You simply tag such DB responses to easily search for them in Lumigo and debug issues whenever necessary.
Potential for high latency and its impact
Slow processing of data
When the data runs to hundreds of GBs or to TBs, or when too many transformations or computations run over a large data set, processing slows down. Unoptimized code and poor system design can have the same effect. If processing takes a long time, latency increases and the entire system slows down.
Timed-out requests
When DynamoDB is not able to respond to a request within the specified time, the request times out, which can in turn fail any part of the system that depends on it. A bad network connection or nodes that are too busy or unresponsive can cause timeouts, so make sure you allow enough time for the request to complete when creating an AWS SDK client in your applications.
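As a rough sketch of what allowing enough time can look like with the Python SDK (boto3), the settings below give the client explicit connect and read timeouts plus a retry policy. The specific values are assumptions chosen for illustration, not recommendations, and `make_dynamodb_client` is a hypothetical helper name.

```python
# Illustrative values only (assumptions); tune them against your own latency data.
CLIENT_SETTINGS = {
    "connect_timeout": 2,  # seconds to wait while opening a connection
    "read_timeout": 5,     # seconds to wait for a response once connected
    "retries": {"max_attempts": 5, "mode": "standard"},
}


def make_dynamodb_client(settings=CLIENT_SETTINGS):
    """Hypothetical helper: build a DynamoDB client with explicit timeouts."""
    # Imported lazily so the settings can be inspected without the SDK installed.
    import boto3
    from botocore.config import Config

    return boto3.client("dynamodb", config=Config(**settings))
```

Timeouts that are too short turn slow responses into failures, while timeouts that are too long let a stuck request hold up everything that depends on it, so the right values depend on the latency your application can tolerate.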
Read/Write
DynamoDB read and write capacity units and their configuration define the performance of our queries. Based on the use cases and load on a given DynamoDB table, we need to ensure there are enough units for both reads and writes. An operation on an item larger than what a single unit covers (4 KB for a read, 1 KB for a write) consumes multiple units, which uses up capacity faster and makes throttling and high latency more likely. You should calculate how many units you need and configure tables accordingly when creating them to avoid high latencies.
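The calculation can be sketched in a few lines. This example assumes the standard unit sizes (one read unit covers up to 4 KB, one write unit covers up to 1 KB), treats an eventually consistent read as half the cost of a strongly consistent one, and ignores transactional operations. The function names and the sample workload are made up for illustration.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read unit covers up to 4 KB
WRITE_UNIT_BYTES = 1024     # one write unit covers up to 1 KB


def read_units_per_item(item_bytes, strongly_consistent=True):
    """Read capacity units consumed by reading one item once."""
    units = max(1, math.ceil(item_bytes / READ_UNIT_BYTES))
    # An eventually consistent read costs half as much.
    return units if strongly_consistent else units / 2


def write_units_per_item(item_bytes):
    """Write capacity units consumed by writing one item once."""
    return max(1, math.ceil(item_bytes / WRITE_UNIT_BYTES))


def required_capacity(item_bytes, reads_per_sec, writes_per_sec,
                      strongly_consistent=True):
    """Return (read units, write units) needed per second for a workload."""
    rcu = math.ceil(
        read_units_per_item(item_bytes, strongly_consistent) * reads_per_sec
    )
    wcu = write_units_per_item(item_bytes) * writes_per_sec
    return rcu, wcu


# Hypothetical workload: 6 KB items, 100 reads/s, 20 writes/s
print(required_capacity(6 * 1024, reads_per_sec=100, writes_per_sec=20))
# (200, 120)
```

Note how a 6 KB item costs two read units but six write units, which is why write-heavy workloads with large items use up capacity quickly.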
Throughput
DynamoDB provides two capacity modes: provisioned and on-demand. Provisioned mode lets you set a limit on the number of reads and writes per second. This is best when the application is not expected to have bursts or traffic spikes. If traffic exceeds the configured limit, requests are throttled, and the application can experience high latency and even failed requests.
On-demand mode scales capacity automatically with traffic, so requests are far less likely to be throttled. It is more accommodating, but because it scales dynamically and bills per request, it can also cause a spike in billing.
Performance metrics to look for with DynamoDB
DynamoDB exposes a few metrics we can use to gauge its performance. Using these, we can decide whether a DynamoDB table needs any tuning.
Throttled Requests
Each resource in DynamoDB (table or index) has a defined throughput limit. Whenever an operation exceeds this limit, the requests are throttled. Depending on the type of operation, the number of throttled requests is incremented. For batch requests, the throttled requests count is incremented only if all the requests in the batch are throttled.
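A common way for a client to react to throttling is to retry with exponential backoff and jitter. The AWS SDKs already retry throttled requests on their own, so the sketch below is only meant to show the idea. `ThrottledError` is a hypothetical stand-in for the SDK's throttling exception, and the delay values are assumptions.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error raised by the SDK."""


def backoff_delay(attempt, base=0.05, cap=2.0):
    """Full jitter: a random wait between 0 and min(cap, base * 2**attempt)."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_backoff(operation, max_attempts=5, sleep=time.sleep):
    """Run operation(), retrying with backoff while it raises ThrottledError."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(backoff_delay(attempt))
```

The random jitter keeps many throttled clients from retrying at the same instant, which would otherwise recreate the spike that caused the throttling.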
Latency
Latency is the time a service takes to respond to a request. DynamoDB promises single-digit millisecond latencies regardless of data size or scale. Even as we scale our application, we shouldn’t see any change in the performance of queries on DynamoDB, so any change in latency could indicate that something is wrong in the data pipeline.
Errors
When there is an error during the execution of a request, DynamoDB returns an HTTP error response (as all requests are submitted over HTTP) with three components: an HTTP status code, an exception name, and an error message. Any error with a 5xx status code should be looked into, as it indicates a server-side problem. 4xx errors indicate a problem with the request itself, such as invalid parameters or missing permissions, although throttling errors are also reported with a 400 status code.
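One way to act on the status code and exception name is a small classifier that decides whether a failed request is worth retrying. This is a minimal sketch: the function name and return labels are made up for illustration, and the set of retryable exception names is deliberately short, not exhaustive.

```python
# Exception names that arrive with HTTP 400 but still signal throttling
# rather than a malformed request (a partial list, not exhaustive).
RETRYABLE_CLIENT_ERRORS = {
    "ProvisionedThroughputExceededException",
    "ThrottlingException",
    "RequestLimitExceeded",
}


def classify_error(status_code, exception_name):
    """Return 'retry' or 'fix_request' for a DynamoDB error response."""
    if 500 <= status_code <= 599:
        return "retry"        # server-side problem, usually transient
    if exception_name in RETRYABLE_CLIENT_ERRORS:
        return "retry"        # throttled: back off and try again
    return "fix_request"      # e.g. validation or permission problems


print(classify_error(400, "ValidationException"))  # fix_request
```

Counting the two outcomes separately also helps in monitoring: a rise in retryable errors points at capacity, while a rise in the other kind points at the calling code or its permissions.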
Capacity
DynamoDB exposes read and write capacity metrics for tables and global secondary indexes. Whenever the provisioned read or write capacity is exceeded for a given table, subsequent requests can be throttled. It’s important to closely watch consumed read and write capacity units so that requests aren’t throttled and system performance doesn’t degrade.
Monitoring and Troubleshooting DynamoDB
DynamoDB, like other AWS services, integrates with Amazon CloudWatch, the centralized place for monitoring activity across AWS services. DynamoDB publishes its metrics to CloudWatch, and what you chart and alarm on is configurable.
CloudWatch collects this information and stores it for a period of time, and the same metrics can be reported to other monitoring tools. With these metrics, you can monitor the performance of DynamoDB and catch issues as they happen. The following is a small list of such metrics:
- ConsumedReadCapacityUnits
- ConsumedWriteCapacityUnits
- ReadThrottleEvents
- WriteThrottleEvents
- SystemErrors
- UserErrors
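Any of these can be pulled programmatically. The sketch below builds the arguments for CloudWatch's `get_metric_statistics` call for a table-level metric such as ConsumedReadCapacityUnits. The helper names and the table name `orders` are hypothetical, and some metrics in the list may be reported under different dimensions than `TableName`.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(table_name, metric_name, minutes=60, period=300, now=None):
    """Build keyword arguments for CloudWatch's get_metric_statistics call."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,  # seconds covered by each data point
        "Statistics": ["Sum"],
    }


def fetch_metric(table_name, metric_name):
    """Hypothetical helper: fetch data points, oldest first (needs boto3)."""
    import boto3  # imported lazily so the query builder works without the SDK

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **build_metric_query(table_name, metric_name)
    )
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])


query = build_metric_query("orders", "ReadThrottleEvents")
print(query["Namespace"], query["MetricName"])
```

Keeping the query construction separate from the network call makes it easy to check the parameters in a unit test before pointing the code at a real account.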
To troubleshoot these issues, developers can use CloudWatch with Lumigo, which provides one-click distributed tracing to monitor and troubleshoot managed services like DynamoDB.
Optimizing DynamoDB for Performance
Once we understand what’s causing issues with DynamoDB, we can easily tune the configuration to improve performance.
Read/Writes
DynamoDB enforces read and write capacity limits, throttling requests that exceed them during high-traffic periods. These limits can therefore also cause high latency. To fix this, monitor read and write capacity usage and raise the limits where needed so that fewer read and write operations are throttled.
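To judge whether a limit needs raising, it helps to turn consumed capacity into a utilization percentage. The sketch below assumes you have the Sum of consumed units for one CloudWatch period and the provisioned units per second for the table. The function name and the sample numbers are made up for illustration.

```python
def utilization_percent(consumed_sum, period_seconds, provisioned_units):
    """Percent of provisioned capacity used during one monitoring period."""
    if provisioned_units <= 0:
        raise ValueError("provisioned_units must be positive")
    # The Sum covers the whole period, so divide to get units per second.
    average_per_second = consumed_sum / period_seconds
    return 100.0 * average_per_second / provisioned_units


# Hypothetical reading: 24,000 units consumed in 5 minutes, 100 units provisioned
print(utilization_percent(24000, 300, 100))  # 80.0
```

A table that sits near 100% for long stretches is a candidate for higher limits, while one that rarely passes a small fraction is probably over-provisioned.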
On-Demand vs. Provisioned
Whether you use on-demand or provisioned capacity can have a huge impact on the performance of applications. If we know what capacity a DynamoDB table needs and are sure that there will be no surprises or unexpected hikes in read or write traffic, we can provision the capacity ourselves to reduce cost.
On the other hand, if there is a real chance of unexpected traffic or load hikes, we should let DynamoDB scale on demand so that read and write requests are served immediately rather than throttled, thereby keeping latency to a minimum.
Autoscaling
When you have autoscaling enabled on a DynamoDB table or index, you can control how DynamoDB scales its capacity. You can have it scale only for reads, only for writes, or both. You can also set a target utilization so that DynamoDB keeps the auto-scaled capacity near that target.
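With boto3, these settings are expressed through the Application Auto Scaling API. The sketch below only builds the two request payloads for a table. The helper name, the table name `orders`, and the capacity bounds and 70% target are assumptions for illustration.

```python
def autoscaling_requests(table_name, dimension="Read", min_units=5,
                         max_units=100, target=70.0):
    """Build payloads for register_scalable_target and put_scaling_policy."""
    if dimension not in ("Read", "Write"):
        raise ValueError("dimension must be 'Read' or 'Write'")
    common = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": f"table/{table_name}",
        "ScalableDimension": f"dynamodb:table:{dimension}CapacityUnits",
    }
    # Bounds within which capacity is allowed to move.
    target_request = {**common, "MinCapacity": min_units, "MaxCapacity": max_units}
    # Target tracking keeps utilization near `target` percent.
    policy_request = {
        **common,
        "PolicyName": f"{table_name}-{dimension.lower()}-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": f"DynamoDB{dimension}CapacityUtilization"
            },
        },
    }
    return target_request, policy_request


target_request, policy_request = autoscaling_requests("orders", "Write")
print(target_request["ScalableDimension"])  # dynamodb:table:WriteCapacityUnits
```

To apply them, you would pass the payloads to `register_scalable_target(**target_request)` and `put_scaling_policy(**policy_request)` on a `boto3.client("application-autoscaling")` client, calling the helper once for reads and once for writes if both should scale.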
Throughput
Unlike traditional databases, Amazon DynamoDB scales out instead of scaling up to improve query performance. This means it can add more partitions, and with them more storage, whenever data grows, and it spreads the data across those partitions so that throughput is not affected. You can also specify the level of throughput you need, but be sure to design your applications, and your partition keys in particular, so that requests are spread evenly and make full use of DynamoDB’s design.
Summary
DynamoDB is a popular NoSQL database used in many modern applications. Although Amazon already takes care of most of the optimization and performance tuning, you need to monitor it closely to make sure you’re getting the most out of it.