Monitoring AWS DynamoDB performance and latency


Amazon DynamoDB is a fully managed NoSQL database service provided by AWS and is tailor-made for serverless applications. Because it is fully managed, we don’t have to worry about operational tasks such as hardware provisioning, instance configuration, scaling, replication, or software patching. But just because AWS does most of the heavy lifting doesn’t mean we can stop being vigilant about how we use DynamoDB, or that we don’t need to understand it in the larger context of our serverless applications. In this article, we’ll cover the common performance issues that DynamoDB users should look out for, how they can impact your application, and the key metrics to monitor for a comprehensive view of DynamoDB’s health and performance.

Why It’s Important to Monitor DynamoDB

To keep serverless applications running smoothly, we need to make sure that every component and service in our environment performs well. That means continuously monitoring DynamoDB to keep an eye on its availability, reliability, and performance. Monitoring data also makes it much easier to debug applications when an error occurs. It’s a good idea to monitor all the AWS services you use, so that debugging stays manageable even when several components fail at once.

Common DynamoDB Issues and Troubleshooting

The way DynamoDB is set up can cause difficulty in understanding and debugging issues when they arise. It’s important to understand what these problems can be and design a solution around them to avoid having to worry about them later on. Below are some common issues you could run into.

Limited Visibility

As with many AWS services, DynamoDB’s internals are hidden behind several layers of abstraction. AWS adds these layers to make the service easier to work with, but they also make it difficult to see exactly what’s happening and to troubleshoot performance issues.

For example, requests can be throttled even when the table as a whole has enough capacity available, typically because traffic is concentrated on a single partition. There’s nothing in the log messages to make this obvious, meaning you’ll have to go through the documentation to understand how requests are handled.

We can absolutely use the APIs provided by AWS in their SDKs for managing and gaining visibility into almost everything, but they’re definitely not the best or the easiest way. 

Also, when there are errors, DynamoDB isn’t so great when it comes to explaining what’s going wrong. There are ways of seeing the performance of each query we run on it and charting this. But that’s pretty much the limit when it comes to debugging performance issues—there’s nothing more you can do out of the box.

Permissions

AWS manages all permissions to services and data using IAM policies and roles, which can be very confusing even for a seasoned AWS engineer. This can also be a very early hurdle to pass while setting up DynamoDB for a project. 

Getting the permissions right is very important for privacy and security. But DynamoDB IAM policies can be vague, and there are a lot of them. There are also multiple ways of setting permissions for DynamoDB. You can build policies interactively in the AWS console, or define them as JSON documents and upload them via the console.

Not having the right permissions can cause data leaks or prevent you from accessing data, either through the console or via an application, depending on the use case. In such cases, even though the error message makes it clear that it’s a permission issue, there is often no clear indication of which permission is required or missing. This can cause further delays in debugging the problem. So spending time on understanding and applying the right policies and roles associated with DynamoDB permissions will be helpful.
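As a rough illustration of a narrowly scoped policy, the sketch below builds an IAM policy document that allows only a handful of actions on a single table and its indexes. The helper name, region, account ID, and table name are placeholders made up for this example; the action names and the ARN layout follow the usual DynamoDB conventions.

```python
import json


def table_policy(region: str, account_id: str, table: str, actions: list[str]) -> dict:
    """Build an IAM policy document scoped to one DynamoDB table (hypothetical helper)."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                # Cover the table and its indexes, nothing else.
                "Resource": [table_arn, f"{table_arn}/index/*"],
            }
        ],
    }


# Placeholder values, not taken from any real account.
policy = table_policy(
    region="us-east-1",
    account_id="123456789012",
    table="Orders",
    actions=["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
)
print(json.dumps(policy, indent=2))
```

Keeping the action list explicit like this makes it easier to spot which permission is missing when a request is denied.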

Code Debugging and Exception Handling

Debugging any code that works with AWS services or runs on AWS Lambda is always challenging. Because the errors are not very verbose or clear, it’s difficult to understand what the issues are and how to fix them. And because the exceptions are not well defined, it’s challenging to handle them in a meaningful way and make decisions based on the exceptions. We can definitely write custom exception classes to make things a little easier, but it’s still not easy to understand what exception classes from the AWS SDK we should consider as the base classes for all possible exceptions.
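One way to make exceptions easier to act on is to sort errors into "retry" and "fix the request" groups as early as possible. The sketch below assumes the error arrives as a dictionary shaped like the `response` attribute of botocore’s `ClientError` (an `Error` entry with `Code` and `Message`, plus `ResponseMetadata` with `HTTPStatusCode`). The exception classes and the set of retryable codes are our own choices for illustration, not something the SDK defines.

```python
class DynamoError(Exception):
    """Base class for application-level DynamoDB errors (hypothetical)."""

    def __init__(self, code: str, message: str, status: int):
        super().__init__(f"{code} ({status}): {message}")
        self.code = code
        self.status = status


class RetryableDynamoError(DynamoError):
    """Throttling or server-side failures that are worth retrying."""


class FatalDynamoError(DynamoError):
    """Errors that need a code, data, or permission fix."""


# Error codes treated as retryable in this sketch; adjust for your workload.
RETRYABLE_CODES = {
    "ProvisionedThroughputExceededException",
    "ThrottlingException",
    "RequestLimitExceeded",
    "InternalServerError",
}


def to_app_error(response: dict) -> DynamoError:
    """Convert an error response dict into one of the classes above."""
    error = response.get("Error", {})
    code = error.get("Code", "Unknown")
    message = error.get("Message", "")
    status = response.get("ResponseMetadata", {}).get("HTTPStatusCode", 0)
    retryable = code in RETRYABLE_CODES or status >= 500
    cls = RetryableDynamoError if retryable else FatalDynamoError
    return cls(code, message, status)


err = to_app_error({
    "Error": {"Code": "ProvisionedThroughputExceededException", "Message": "slow down"},
    "ResponseMetadata": {"HTTPStatusCode": 400},
})
print(type(err).__name__, err)
```

Calling code can then catch `RetryableDynamoError` in one place and let everything else surface as a bug to fix.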

The lack of tools to debug AWS services adds to the difficulty of debugging and exception handling. There are some tools on the market to make this easier. For example, Lumigo’s execution tags can help in tagging DB save attempts. You simply tag such DB responses to easily search for them in Lumigo and debug issues whenever necessary.

Potential for High Latency and Its Impact

There are many factors that affect the performance of DynamoDB, and each cause has a different effect on the overall system. Let’s look at some of the potential causes for the high latency of DynamoDB queries.

Slow processing of data

When data runs to hundreds of GBs or into TBs, or when too many transformations or computations are applied to large datasets, processing becomes slow. There can be other reasons for slow processing as well, such as unoptimized code, unnecessary transformations, or poor system design. Whatever the cause, if processing the data takes a long time, it increases latency and slows down the entire system.

Timed-out requests

One cause of timed-out requests is the slow processing of data. There can, of course, be many other reasons, such as a bad network or nodes that are too busy or unresponsive. But when DynamoDB is not able to respond to a request within the time specified, the request will time out, and every part of the system that depends on that request fails with it. So make sure you allow enough time for the request to complete when you create an AWS SDK client in your applications.
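When choosing client timeouts, it helps to know how long a single logical request can take once retries are included. The sketch below computes that upper bound from a connect timeout, a read timeout, and a retry count. The capped exponential backoff schedule and its default values are assumptions for illustration, not the exact algorithm of any particular SDK.

```python
def worst_case_seconds(
    connect_timeout: float,
    read_timeout: float,
    max_attempts: int,
    base_delay: float = 0.05,
    max_delay: float = 20.0,
) -> float:
    """Upper bound on the time one logical request can take, retries included."""
    per_attempt = connect_timeout + read_timeout
    # Capped exponential backoff between attempts: base, 2*base, 4*base, ...
    backoff = sum(min(max_delay, base_delay * 2**i) for i in range(max_attempts - 1))
    return max_attempts * per_attempt + backoff


# Three attempts with a 1 s connect and 2 s read timeout (made-up settings).
budget = worst_case_seconds(1.0, 2.0, max_attempts=3, base_delay=0.5, max_delay=10.0)
print(f"worst case: {budget:.1f} s")
```

If the result is longer than the timeout of the function or service making the call, the caller will give up before the client does, so one of the two needs adjusting.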

Read/Write

DynamoDB read and write capacity units and their configuration define the performance of our queries. Based on the use cases and the load on a given DynamoDB table, we have to make sure we have enough units for both reads and writes. One read unit covers a strongly consistent read of an item up to 4 KB, and one write unit covers a write of an item up to 1 KB. If an item is larger than that, the operation consumes multiple units, so the table’s capacity is used up faster and requests are more likely to be throttled. So, you should calculate how many read and write units you need while creating tables and configure them accordingly to avoid high latencies. Also, the maximum read and write units vary depending on the AWS region we choose.
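The arithmetic behind this estimate can be sketched as follows, assuming the standard unit sizes (4 KB per read unit for a strongly consistent read, half that cost for an eventually consistent read, and 1 KB per write unit). The function names and the sample workload are illustrative.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one read unit covers up to 4 KB
WRITE_UNIT_BYTES = 1024      # one write unit covers up to 1 KB


def read_units(item_bytes: int, reads_per_second: float, strongly_consistent: bool = True) -> float:
    """Read capacity units needed per second for items of the given size."""
    units_per_read = math.ceil(item_bytes / READ_UNIT_BYTES)
    if not strongly_consistent:
        units_per_read /= 2  # eventually consistent reads cost half
    return units_per_read * reads_per_second


def write_units(item_bytes: int, writes_per_second: float) -> float:
    """Write capacity units needed per second for items of the given size."""
    return math.ceil(item_bytes / WRITE_UNIT_BYTES) * writes_per_second


# A 6,000-byte item read 100 times and written 50 times per second (made-up load).
print(read_units(6000, 100))
print(read_units(6000, 100, strongly_consistent=False))
print(write_units(6000, 50))
```

Because sizes are rounded up per item, trimming an item from just over 4 KB to just under it halves the cost of every read.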

Throughput

DynamoDB provides two capacity modes: provisioned and on-demand. The provisioned mode avoids surprises when it comes to billing, as you set a limit on the number of reads and writes per second. This is best when the application is not expected to have any bursts in requests or see any unexpected traffic. But if and when traffic exceeds the limit set, the application is going to experience throttled requests, high latency, and even failed requests, which can impact business.

On the other hand, the on-demand mode is designed to absorb whatever traffic comes into DynamoDB without capacity planning. It is more accommodating and generally more performant under unpredictable load. But because it scales dynamically based on the load, it can also cause a spike in billing. You will have to decide which mode to configure based on your use cases.
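The billing trade-off between the two modes can be estimated with simple arithmetic: provisioned mode charges for reserved capacity per hour, while on-demand mode charges for the request units actually consumed. In the sketch below, every price is a parameter you must supply from the current AWS price list; the numbers in the usage example are placeholders, not real prices.

```python
def provisioned_cost(read_units: int, write_units: int, hours: float,
                     price_per_rcu_hour: float, price_per_wcu_hour: float) -> float:
    """Provisioned mode bills for the capacity you reserve, used or not."""
    return hours * (read_units * price_per_rcu_hour + write_units * price_per_wcu_hour)


def on_demand_cost(read_requests: int, write_requests: int,
                   price_per_million_reads: float, price_per_million_writes: float) -> float:
    """On-demand mode bills for the request units actually consumed."""
    return ((read_requests / 1_000_000) * price_per_million_reads
            + (write_requests / 1_000_000) * price_per_million_writes)


# Placeholder prices and traffic, for illustration only.
reserved = provisioned_cost(100, 50, hours=720, price_per_rcu_hour=0.001, price_per_wcu_hour=0.002)
pay_per_use = on_demand_cost(20_000_000, 5_000_000, price_per_million_reads=1.0, price_per_million_writes=2.0)
print(f"provisioned: {reserved:.2f}, on-demand: {pay_per_use:.2f}")
```

Running this with your own traffic figures gives a quick sense of where the break-even point sits before you commit to a mode.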

Performance Metrics to Look for with DynamoDB

DynamoDB exposes a few metrics we can use to gauge its performance. Using these, we can decide whether a DynamoDB table needs any tuning. Below are some of these metrics.

Throttled Requests

Each resource in DynamoDB (table or index) has a defined throughput limit. Whenever an operation exceeds this limit, the request is throttled and the throttled requests count is incremented. For batch requests, though, the count is incremented only if every request in the batch is throttled.
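The batch rule is easy to misread, so here is a simplified model of it. It assumes that each throttled item in a batch counts as one event in ReadThrottleEvents or WriteThrottleEvents, while the request-level ThrottledRequests counter only moves when every item in the batch was throttled. The function is a toy model for reasoning about dashboards, not an AWS API.

```python
def throttle_metric_increments(item_throttled: list[bool], is_read: bool) -> dict[str, int]:
    """Model how one batch request affects the throttling metrics."""
    event_metric = "ReadThrottleEvents" if is_read else "WriteThrottleEvents"
    return {
        # Each throttled item in the batch counts as one throttle event.
        event_metric: sum(item_throttled),
        # The request-level counter moves only when every item was throttled.
        "ThrottledRequests": 1 if item_throttled and all(item_throttled) else 0,
    }


partial = throttle_metric_increments([True, False, True], is_read=True)
total = throttle_metric_increments([True, True], is_read=False)
print(partial)
print(total)
```

The practical consequence is that a flat throttled requests chart does not prove batches are healthy; the per-event throttle metrics need watching too.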

Latency 

Latency is the time taken by DynamoDB to respond to a request. The higher the latency, the slower the whole system will be. DynamoDB promises single-digit millisecond latencies no matter what size of data we’re working with or at what scale. This means, even if we scale our application gradually, we shouldn’t see any change in the performance of the queries on DynamoDB. Any change in latency means there could be something wrong in the data pipeline.
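Averages tend to hide the slow requests that users actually notice, so latency is usually tracked as percentiles. The sketch below computes nearest-rank percentiles over a list of latency samples and checks the 99th percentile against a 10 ms budget, which is our reading of "single-digit millisecond". The sample values are made up.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_report(samples_ms: list[float], budget_ms: float = 10.0) -> dict:
    """Summarize latency samples and flag whether the tail stays within budget."""
    p50 = percentile(samples_ms, 50)
    p99 = percentile(samples_ms, 99)
    return {"p50": p50, "p99": p99, "within_budget": p99 < budget_ms}


# Mostly fast requests with one slow outlier (made-up data, in milliseconds).
report = latency_report([2, 3, 3, 4, 5, 4, 3, 2, 40, 3])
print(report)
```

Here the median looks healthy while the tail does not, which is exactly the kind of change in latency worth investigating.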

Errors 

Whenever there is an error during the execution of a query, DynamoDB returns an HTTP error response (as all requests are submitted through HTTP) with three components: an HTTP status code, an exception name, and an error message. You need to monitor errors to make sure everything is working as expected.

Any error that contains a 5xx series status code should be looked into, as this is a system error. But the chances of this occurring are very slim. The chances of 4xx series exceptions being thrown, on the other hand, are high, as these represent a bad request or human error. This can be resolved by changing the query or fixing any other error in the request.
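A minimal way to keep an eye on this split is to bucket the status codes your application sees. The sketch below tallies a list of HTTP status codes into system errors (5xx), user errors (4xx), and successes; the bucket names are our own.

```python
from collections import Counter


def tally_errors(status_codes: list[int]) -> Counter:
    """Bucket HTTP status codes into system errors, user errors, and the rest."""
    buckets = Counter()
    for status in status_codes:
        if 500 <= status <= 599:
            buckets["system_error"] += 1  # DynamoDB-side problem, worth an alert
        elif 400 <= status <= 499:
            buckets["user_error"] += 1    # bad request, fix the caller
        elif 200 <= status <= 299:
            buckets["ok"] += 1
        else:
            buckets["other"] += 1
    return buckets


counts = tally_errors([200, 400, 400, 500, 200])
print(counts)
```

A sudden rise in the user error bucket after a deployment usually points at a changed query or a malformed request rather than at DynamoDB itself.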

Capacity 

DynamoDB reports read and write capacity metrics for tables and global secondary indexes. Whenever the configured read or write capacity is exceeded for a given table, further requests can be throttled until consumption drops back under the limit. So it’s important to closely watch the consumed read and write capacity units so that requests aren’t throttled and the performance of the system doesn’t suffer.
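Consumed capacity is most useful when expressed as a fraction of what is provisioned. The sketch below assumes the consumed figure is a total over a reporting period (as a Sum datapoint would be), so it first divides by the period length to get a per-second average. The function name and sample numbers are illustrative.

```python
def utilization(consumed_sum: float, period_seconds: int, provisioned_units: float) -> float:
    """Fraction of provisioned capacity used, given a total for one period."""
    if provisioned_units <= 0:
        raise ValueError("provisioned_units must be positive")
    average_per_second = consumed_sum / period_seconds
    return average_per_second / provisioned_units


# 4,800 units consumed in one minute against 100 provisioned units per second.
used = utilization(4800, period_seconds=60, provisioned_units=100)
print(f"{used:.0%} of provisioned capacity")
```

Because this is an average over the period, short bursts within the minute can still hit the limit even when the figure looks comfortable.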

Monitoring DynamoDB

DynamoDB, like any other AWS service, comes ready to be integrated with Amazon CloudWatch, the centralized place for monitoring activity and events across AWS services. DynamoDB reports operational data about its tables and indexes to CloudWatch. As usual, all of this is configurable and can be tuned to whatever degree we need.

CloudWatch collects all the information as log messages and stores them for a configured period of time. This comes in handy since we might need to retain log messages for debugging purposes. CloudWatch also makes it simple to search through these logs with various filters, rendering debugging and finding issues a lot easier.

DynamoDB also exposes a good number of important metrics that can be reported to CloudWatch and other monitoring tools. With these metrics, it’s easy to monitor the performance of DynamoDB and also catch issues as and when they happen. The following is a small list of such metrics:

  • ConsumedReadCapacityUnits
  • ConsumedWriteCapacityUnits
  • ReadThrottleEvents
  • WriteThrottleEvents
  • SystemErrors
  • UserErrors
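To pull one of these metrics programmatically, a request needs the metric namespace, the metric name, a dimension identifying the table, and a time window. The sketch below only builds that parameter set for CloudWatch’s GetMetricStatistics operation, so it runs without AWS access. The table name and the helper are placeholders, and some metrics may be reported with different dimensions than `TableName`.

```python
from datetime import datetime, timedelta, timezone


def metric_query(table: str, metric: str, minutes: int = 60, period: int = 60) -> dict:
    """Parameters for a CloudWatch GetMetricStatistics call on one table."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric,
        "Dimensions": [{"Name": "TableName", "Value": table}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,          # seconds per datapoint
        "Statistics": ["Sum"],
    }


params = metric_query("Orders", "ReadThrottleEvents")
print(params["MetricName"], params["StartTime"], params["EndTime"])
# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("cloudwatch").get_metric_statistics(**params)
```

Keeping the parameters in a plain dictionary makes it straightforward to loop over the metric names in the list above.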

DynamoDB Monitoring Tools

There are various tools available to monitor DynamoDB and its performance, a few of which are discussed below. 

Amazon CloudWatch 

As already mentioned, CloudWatch is another service within AWS that allows for easy monitoring of all other AWS services. It’s a snap to configure and get started with, as well as convenient because it’s already part of the AWS suite. You can set up alerts and charts for monitoring DynamoDB metrics and for alerting the team when something doesn’t look right.
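As an example of such an alert, the sketch below builds the parameters for a CloudWatch alarm that fires when a table records any write throttling for five consecutive minutes. It only constructs the parameter set for the PutMetricAlarm operation; the alarm name, table name, SNS topic ARN, and the chosen periods are assumptions to adapt.

```python
def throttle_alarm(table: str, topic_arn: str, threshold: float = 0) -> dict:
    """Parameters for a CloudWatch PutMetricAlarm call on write throttling."""
    return {
        "AlarmName": f"{table}-write-throttles",
        "Namespace": "AWS/DynamoDB",
        "MetricName": "WriteThrottleEvents",
        "Dimensions": [{"Name": "TableName", "Value": table}],
        "Statistic": "Sum",
        "Period": 60,               # evaluate one-minute totals
        "EvaluationPeriods": 5,     # for five periods in a row
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # No datapoints means no throttling, so do not alarm on gaps.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }


# Placeholder topic ARN, not a real resource.
alarm = throttle_alarm("Orders", "arn:aws:sns:us-east-1:123456789012:ops-alerts")
print(alarm["AlarmName"])
# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Treating missing data as not breaching matters here, since a table with no throttling may report no datapoints at all for this metric.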

Lumigo

Lumigo is a cloud-native observability platform that, like DynamoDB, was tailor-made for serverless applications. With best-in-breed distributed tracing, Lumigo helps developers visualize their applications end-to-end. Deployed with zero code changes, Lumigo connects every component in modern applications, from AWS serverless and containerized services like DynamoDB to third-party integrations, making it easy and quick to find and fix bugs, errors, or performance issues.

Opsview Monitor 

Opsview Monitor readily integrates with DynamoDB and other AWS services to paint an easily understandable picture of what’s going on. It offers almost all the features that the other tools on the list offer, plus Opsview provides a trial or free account to test the tool before signing up for the service.

ManageEngine

ManageEngine’s Applications Manager is pretty well known in the application monitoring space. The tool makes it easy to auto-discover all services in the AWS stack along with your DynamoDB database. It provides simple charts for quickly getting an overall idea of the health of your AWS services. You can also configure ManageEngine to send out alerts in case something goes wrong.

Optimizing DynamoDB for Performance

Once we understand what’s causing issues with DynamoDB, we can easily tune the configuration to improve performance. Below is a short list of configuration parameters that can be tuned for better DynamoDB performance.

Read/Writes

DynamoDB uses read and write capacity limits to throttle requests during high-traffic situations. But these capacity limits can also cause high latency, since throttled requests have to be retried. To fix this, monitor read and write capacity usage and raise the limits where needed so that fewer read and write operations are throttled.

On-Demand vs. Provisioned 

Whether you use on-demand or provisioned capacity for your DynamoDB tables can have a huge impact on the performance of applications. If we know what capacity a table needs and are sure that there will be no surprises or unexpected hikes in read or write traffic, we can provision the capacity ourselves to reduce cost.

On the other hand, if there is even a slight chance of unexpected traffic or a load hike, we should let DynamoDB scale on demand so that read and write requests are served immediately instead of being throttled, thereby keeping latency to a minimum.

Autoscaling 

When you have autoscaling enabled on a DynamoDB table or index, you can control how DynamoDB scales its provisioned capacity. You can have it scale only for reads, only for writes, or both. You can even set a target utilization so that DynamoDB keeps consumed capacity close to that percentage of the auto-scaled capacity.
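The idea behind target utilization can be sketched with a little arithmetic: pick the capacity at which current consumption would equal the target fraction, then clamp it to the configured minimum and maximum. This is a simplified model for building intuition, not the actual scaling algorithm, and the parameter names are our own.

```python
import math


def target_capacity(consumed_per_second: float, target_utilization: float,
                    min_units: int, max_units: int) -> int:
    """Capacity that would put utilization at the target, clamped to the limits."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    desired = math.ceil(consumed_per_second / target_utilization)
    return max(min_units, min(max_units, desired))


# 150 units per second consumed, aiming for 75% utilization (made-up numbers).
print(target_capacity(150, 0.75, min_units=5, max_units=1000))
```

A lower target leaves more headroom for bursts at the price of paying for capacity that mostly sits idle.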

Throughput 

Unlike traditional databases, Amazon DynamoDB scales out instead of scaling up to improve query performance. This means it can add more storage whenever data grows. Along with this, DynamoDB partitions the data and spreads requests across those partitions so that throughput is not affected; choosing a partition key that distributes requests evenly is the easiest way to improve throughput. Also, you can specify the level of throughput needed, but be sure to design your applications to make complete use of DynamoDB’s design.
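To see why partitioning makes key choice matter, the toy simulation below hashes partition keys onto a fixed number of partitions and counts the requests each one receives. DynamoDB’s real hash function and partition layout are internal, so SHA-256 and the partition count here are stand-ins, and the traffic is made up.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int) -> int:
    """Map a partition key to a partition with a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def load_by_partition(requests: list[str], partitions: int) -> Counter:
    """Count how many requests land on each partition."""
    return Counter(partition_for(key, partitions) for key in requests)


# Well-spread keys versus one dominant key.
spread = load_by_partition([f"user-{i}" for i in range(1000)], partitions=4)
skewed = load_by_partition(["user-1"] * 900 + [f"user-{i}" for i in range(100)], partitions=4)
print("spread:", dict(spread))
print("skewed:", dict(skewed))
```

In the skewed case, one partition absorbs at least 900 of the 1,000 requests, which is how a table can throttle while its overall capacity looks underused.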

Summary

DynamoDB is a popular NoSQL database used in many modern applications. Although Amazon already takes care of most of the optimization and performance tuning, you need to monitor it closely to make sure you’re getting the most out of it.

There are several monitoring tools that plug into DynamoDB easily and extract the information needed to plot charts, automate monitoring and alerting, and even help uncover underlying problems that hurt performance and latency. Using these tools, you can optimize DynamoDB’s query performance and improve latency for a better user experience.