Performance testing is the process of evaluating computer systems to see how quickly they respond to user requests. Performance testing can uncover issues that negatively impact the user experience, and provide insights on how to fix them.
Traditionally, performance testing focused on client-server systems deployed on-premises. Its goal was to build servers that could withstand peaks in application load and still deliver satisfactory performance.
In today’s cloud native world, performance testing has many new meanings—organizations are testing the performance of cloud computing systems, serverless applications, containerized architectures, and web applications. This is part of an extensive series of guides about IaaS.
In this article, you will learn:
While the computing world has changed dramatically over the past two decades, the same types of performance tests are still run today in cloud native environments.
|Baseline Test||A baseline test is a performance test that runs a system under normal expected load. This test provides a benchmark to help you identify performance issues under irregular conditions. Also, if you run the test when a service is new, you can get rid of any obvious errors and start assessing real performance.|
|Load Test||The traditional load test evaluates a system to see how it performs with higher than normal load.|
|Stress Test||A stress test pushes your system to its limits, checking when it crashes, and whether it does so gracefully. During testing, load on the system is gradually increased until it reaches the point of failure. This is sometimes referred to as a “stepped” test because the load is increased in steps.|
|Spike Test||A spike test evaluates a system with normal load, followed by a sudden jump to peak-level traffic (for example, jumping from 1,000 to 10,000 concurrent users). Many systems can crash due to sudden spikes in load. This test shows how your system reacts to spikes in traffic or transaction volume.|
|Soak Test||A soak test (or durability test) is performed under conditions that can have a cumulative effect on system performance. The test reveals performance problems that can occur due to long-term stress on the system, such as memory leaks, resource leaks, data corruption and other factors that can degrade performance over time. A common practice is to run soak tests at 80% of maximum capacity.|
|Volume Test||A volume test is similar to a stress test or load test, but instead of testing application loads, it tests the amount of data being processed, which can also have an impact on application performance. The test can have two variants: either data is incrementally added to the databases, or a large amount of data is populated before the test, and then the system is evaluated. In general, normal use of an application constantly increases database size, so you should perform volume testing on a regular basis.|
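The test types above differ mainly in how the number of simulated users changes over time. As a minimal sketch (the function names and numbers are illustrative assumptions, not taken from any particular load testing tool), a stepped stress profile and a spike profile could be generated like this:

```python
from typing import List


def stepped_profile(start: int, step: int, steps: int, hold: int) -> List[int]:
    """Return users-per-tick for a stepped (stress) test.

    Load starts at `start` users and grows by `step` users after every
    `hold` ticks, for `steps` load levels in total.
    """
    profile: List[int] = []
    for level in range(steps):
        profile.extend([start + level * step] * hold)
    return profile


def spike_profile(normal: int, peak: int, before: int, during: int, after: int) -> List[int]:
    """Return users-per-tick for a spike test: normal load, a sudden jump, then recovery."""
    return [normal] * before + [peak] * during + [normal] * after


stress = stepped_profile(start=100, step=100, steps=4, hold=2)
spike = spike_profile(normal=1_000, peak=10_000, before=3, during=2, after=3)
print(stress)
print(spike)
```

A load generator would then read one value per tick and adjust the number of active virtual users accordingly. A soak profile is simply a constant value repeated for a long duration.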
Performance testing is an old discipline, and many of the old principles are still relevant today. Here is a performance testing process that can be used to test modern cloud native environments, such as cloud systems, serverless, and containerized applications.
In the rest of this article, we’ll explain how to monitor and improve performance in modern computing environments:
Cloud computing is changing the way end users deploy, monitor, and use applications. The cloud provides a large, elastic pool of computing, storage, and network resources, so you can scale applications as needed.
However, even though cloud applications are able to dynamically scale and adapt to load, it is more important than ever to measure their performance. Performance issues can be complex to detect and resolve in a cloud environment, and can have a major impact on users.
In the cloud, performance testing focuses on changing the number of concurrent users accessing the application, trying different load profiles, and measuring different performance indicators, such as throughput and latency. Testing can be done at the virtual machine level, at a service level, and at an entire application level.
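To make the idea of varying concurrent users and measuring throughput and latency concrete, here is a small sketch using only the Python standard library. The `fake_request` function is a placeholder assumption standing in for a real call to the system under test, and the report fields are illustrative rather than a standard format.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_load_test(request_fn: Callable[[], None], concurrent_users: int,
                  requests_per_user: int) -> Dict[str, float]:
    """Call `request_fn` from several threads and summarize latency and throughput."""

    def user_session() -> List[float]:
        latencies = []
        for _ in range(requests_per_user):
            started = time.perf_counter()
            request_fn()
            latencies.append(time.perf_counter() - started)
        return latencies

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        futures = [pool.submit(user_session) for _ in range(concurrent_users)]
        all_latencies = [lat for f in futures for lat in f.result()]
    elapsed = time.perf_counter() - wall_start

    ordered = sorted(all_latencies)
    # Nearest-rank 95th percentile, clamped to a valid index.
    p95_index = min(len(ordered) - 1, max(0, math.ceil(0.95 * len(ordered)) - 1))
    return {
        "requests": len(ordered),
        "throughput_rps": len(ordered) / elapsed,
        "mean_latency_s": statistics.mean(ordered),
        "p95_latency_s": ordered[p95_index],
    }


def fake_request() -> None:
    # Placeholder for a real request to the system under test (assumption).
    time.sleep(0.01)


report = run_load_test(fake_request, concurrent_users=5, requests_per_user=4)
print(report)
```

Running the same function with different values of `concurrent_users` gives a simple way to compare load profiles. Dedicated load testing tools add features such as ramp-up, distributed load generation, and reporting.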
The table below shows performance testing types commonly used in a cloud environment:
|Type of Performance Test||How it is Used in a Cloud Environment|
|Stress test||Checks the responsiveness and stability of cloud infrastructure under very high loads.|
|Load test||Verifies that the system performs well when used by a normal number of concurrent users.|
|Browser testing||Ensures compatibility and performance of user-facing systems across browsers and devices.|
|Latency testing||Measures the time it takes to receive a response to a request made to a cloud-based service or API.|
|Targeted infrastructure test||Isolates each component or layer of your application and tests if it can deliver the required performance.|
|Failover test||Checks the system’s ability to replace a failed component under high load conditions with minimal service disruption.|
|Capacity test||Defines a benchmark of the maximum traffic or load a cloud system can effectively handle at a given scale.|
|Soak test||Measures the performance of a cloud system under a given load over a long period of time, as a realistic test of a production environment.|
Storage tiering is a commonly used concept in the cloud. It involves moving data between storage services or storage classes at different stages of its lifecycle. For example:
In the cloud, each storage service or service tier has its own availability and performance criteria and its own SLAs. When monitoring performance, you must be aware of these criteria and monitor each tier according to them. For example, if you experience high latency on an SSD disk or fast-access object storage, it should raise an alert – but if there is high latency when retrieving data from an archive, this is normal.
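One way to express tier-aware alerting is to keep a separate latency threshold per tier. The sketch below is illustrative only: the tier names and threshold values are hypothetical, and real thresholds should be derived from the documented SLA of each storage service.

```python
from typing import Dict, List, Tuple

# Hypothetical latency thresholds per storage tier, in milliseconds.
# Real values should come from each service's documented SLA.
TIER_LATENCY_THRESHOLD_MS: Dict[str, float] = {
    "ssd_block": 10.0,
    "hot_object": 200.0,
    "archive": 12 * 60 * 60 * 1000.0,  # retrieval may legitimately take hours
}


def latency_alerts(samples: List[Tuple[str, float]]) -> List[str]:
    """Return an alert message for each sample that exceeds its tier's threshold."""
    alerts = []
    for tier, latency_ms in samples:
        threshold = TIER_LATENCY_THRESHOLD_MS[tier]
        if latency_ms > threshold:
            alerts.append(f"{tier}: {latency_ms:.0f} ms exceeds {threshold:.0f} ms")
    return alerts


observed = [("ssd_block", 45.0), ("hot_object", 120.0), ("archive", 5_000.0)]
print(latency_alerts(observed))
```

Here the 45 ms reading on the SSD tier raises an alert, while a 5-second retrieval from the archive tier does not, because it is well within what that tier is expected to deliver.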
Learn more in the detailed guide to storage tiering.
In the cloud, performance and cost are directly related. If a cloud resource has unsatisfactory performance, there are usually several ways to improve performance by provisioning more resources:
In the cloud, performance and cost are two sides of the same coin. In many cases, to remediate performance issues, you will need to increase costs. This makes it important to be aware of cloud cost parameters, the relative cost of different cloud services, current utilization, budgets and thresholds set by the organization.
Learn more in the detailed guide to cloud cost.
Amazon Web Services (AWS) is the most popular public cloud platform. While there are hundreds of AWS services, some of the most common metrics monitored on AWS are:
AWS provides several monitoring services, including:
Learn more in the detailed guide to AWS monitoring.
Microsoft Azure is another market-leading public cloud offering. Common performance metrics on Azure include:
Azure provides a number of first-party monitoring services:
Learn more in the detailed guide to Azure monitoring.
Google Cloud Platform is Google’s public cloud service. It offers Google Cloud Monitoring, a service that provides data about the performance, availability, and health of your cloud applications.
Google Cloud Monitoring is based on three foundations:
Serverless computing is a way to design and deliver software with no knowledge of the underlying infrastructure—computing resources are provided as an automated, scalable cloud service.
In a traditional data center, or when using an infrastructure as a service (IaaS) model in the cloud, the server’s computing resources are fixed, and paid for regardless of the amount of computing work the server performs. In serverless computing, billing is performed only when the client’s code is actually running.
Serverless computing does not eliminate servers, but its purpose is to remove computing resource considerations from the software design and development process.
However, the serverless model raises significant challenges with regard to performance testing and application debugging:
AWS Lambda is the most popular serverless platform today, with 96% market share according to the BMC State of Serverless report. Therefore, understanding how to diagnose and tune performance issues in Lambda is top of mind for any serverless practitioner.
We’ll briefly cover three ways to improve performance in AWS Lambda.
AWS Lambda allocates memory to serverless functions, ranging from 128 MB to 10,240 MB in 1 MB increments. CPU power is allocated to the function in direct proportion to the amount of memory. At 1,769 MB, a function has the equivalent of one full vCPU of processing capacity.
If your application is single-threaded, avoid allocating more than roughly 1.8 GB of memory purely to gain CPU power, because beyond that point you pay for additional vCPUs that the application cannot use. Conversely, if the application is multithreaded and you allocate less than 1.8 GB, it cannot take advantage of multithreading to improve performance.
AWS Lambda costs depend on memory allocation and execution time. Increasing memory (and thus CPU) can shorten execution time, which can offset the higher price per unit of time. However, Lambda offers at most 6 vCPU cores per function, so beyond a certain point, increasing memory will not reduce execution time.
If your application is CPU intensive, increasing memory can significantly reduce execution time and cost per execution.
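The trade-off can be checked with simple arithmetic: the compute cost of an invocation is memory (in GB) multiplied by duration (in seconds) multiplied by a price per GB-second. In the sketch below, the default unit price and the measured durations are assumptions for illustration; use your own measurements and the current price list for your region.

```python
def invocation_cost(memory_mb: int, duration_ms: float,
                    price_per_gb_second: float = 0.0000166667) -> float:
    """Compute the cost of one invocation: GB allocated x seconds run x unit price."""
    return (memory_mb / 1024) * (duration_ms / 1000) * price_per_gb_second


# Hypothetical measurements for a CPU-bound function at three memory settings
# (memory in MB mapped to average duration in milliseconds).
measurements = {512: 4000.0, 1024: 1900.0, 2048: 1000.0}

for memory_mb, duration_ms in measurements.items():
    cost = invocation_cost(memory_mb, duration_ms)
    print(f"{memory_mb} MB: {duration_ms:.0f} ms, ${cost:.8f} per invocation")

cheapest = min(measurements, key=lambda mb: invocation_cost(mb, measurements[mb]))
print("cheapest setting:", cheapest, "MB")
```

With these sample numbers, the 1,024 MB setting is cheapest because the duration dropped by more than half when memory doubled. At 2,048 MB the function is faster still, but the cost returns to the 512 MB level, so the choice depends on whether latency or cost matters more.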
AWS Lambda handles scalability automatically. But there are limits to the number of concurrent requests it allows. When optimizing Lambda performance, you should consider limiting concurrent executions.
There are two ways to limit concurrency in AWS Lambda:
A unique problem in serverless environments is cold starts—when a function is invoked, it takes time for it to power up and start serving user requests, and in the interim, users may experience latency.
AWS has introduced the concept of provisioned concurrency, which can help resolve this problem. It lets you provision a preset amount of concurrency in Lambda, ensuring that you have an appropriate number of function instances running, ready to serve user requests with no delay. You can change provisioned concurrency automatically using CloudWatch metrics, or by defining a schedule based on known application loads.
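A rough way to size concurrency is to multiply the request rate by the average duration of a request. The sketch below applies that estimate; the `headroom` parameter is an assumption of this example (a safety margin), not an AWS setting.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_s: float,
                         headroom: float = 0.0) -> int:
    """Estimate concurrent executions as request rate x average duration.

    `headroom` is an optional safety margin (0.25 means 25% extra); it is an
    assumption of this sketch, not an AWS setting.
    """
    return math.ceil(requests_per_second * avg_duration_s * (1 + headroom))


print(required_concurrency(100, 0.5))        # 100 req/s at 500 ms each
print(required_concurrency(100, 0.5, 0.25))  # same load with 25% headroom
```

An estimate like this is only a starting point. Actual traffic is rarely uniform, so it is worth validating the number against observed concurrency metrics.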
Learn more in our detailed guide to AWS Lambda performance
Serverless systems are a “black box”—engineers do not have knowledge about their inner workings. Serverless function code is only executed during the request, and it is not known where exactly it is executed on the hardware.
Push data monitoring is the traditional style of monitoring, where an agent running on a server collects data and saves it to a central server. In serverless, you don’t have access to the server, so monitoring should be built into the serverless function itself. This means that the developer has to include the monitoring library as part of their code, and initialize it while the process is running.
Push data monitoring in a serverless environment can help you collect data like function execution time, memory usage, recording user experience, function payloads, network performance and database performance.
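As an illustration of building monitoring into the function itself, the following sketch wraps a handler in a decorator that records execution time, status, and payload size for every invocation. The `EMITTED` list is a stand-in for a real monitoring backend, and the record fields are hypothetical rather than the format of any specific monitoring library.

```python
import functools
import json
import time
from typing import Any, Callable, Dict, List

# Stand-in for a monitoring backend; a real library would send these
# records over the network (assumption of this sketch).
EMITTED: List[str] = []


def monitored(handler: Callable[[Dict[str, Any], Any], Any]) -> Callable[[Dict[str, Any], Any], Any]:
    """Wrap a function handler so each invocation pushes a metrics record."""

    @functools.wraps(handler)
    def wrapper(event: Dict[str, Any], context: Any) -> Any:
        started = time.perf_counter()
        status = "ok"
        try:
            return handler(event, context)
        except Exception:
            status = "error"
            raise
        finally:
            # Runs on both success and failure, so every call is recorded.
            record = {
                "function": handler.__name__,
                "status": status,
                "duration_ms": round((time.perf_counter() - started) * 1000, 3),
                "payload_bytes": len(json.dumps(event)),
            }
            EMITTED.append(json.dumps(record))

    return wrapper


@monitored
def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    return {"greeting": f"hello {event['name']}"}


print(handler({"name": "world"}, None))
print(EMITTED[-1])
```

Because the record is emitted in a `finally` block, failed invocations are captured as well as successful ones, which is useful when diagnosing errors in production.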
Pull data monitoring is a new type of monitoring, in which services are built to report metrics on their own. Because services are temporary and not running statically, it can be effective to build telemetry functions into the service, and have it report essential metrics. This requires careful planning, because there can be multiple services running in diverse locations at different times.
Here are some of the key metrics you should measure for serverless functions:
Learn more in our detailed guide to serverless monitoring
Serverless applications are highly fragmented. You will not always have the ability to run these components locally. Distributed architecture can cause problems in many areas of the stack. The ability to drill down into code is critical to quickly fixing defects.
Remote debugging is difficult in serverless applications, because you do not have access to the server and operating system. It can also incur costs, because you’ll need to spin up instances of a function in order to test them. Developers often don’t understand where the problem is and why it happened.
In order to effectively debug serverless applications, it is essential to have dedicated tools that can provide information about what is happening in the environment and where the problems lie. In particular, it is important to get access to stack traces of serverless functions that incurred errors, to be able to debug and resolve production issues. Serverless observability tools have been developed to address these challenges.
Learn more in our detailed guide to serverless debugging
The following tools can help you monitor and debug serverless functions in the most popular serverless platform, AWS Lambda.
The primary source of information about how AWS applications work is AWS CloudWatch Logs. This also applies to Lambda functions. By default, Lambda functions send data to CloudWatch, and CloudWatch creates a LogGroup for each function. Each LogGroup has multiple log streams, which contain the logs generated for each Lambda function instance.
Log streams contain log events. Click on an event to see more information about the specific Lambda function that was called. This is the most basic way to debug issues in a Lambda function.
CloudWatch log events are useful, but limited in the information they provide about serverless issues. AWS built X-Ray to answer more complex questions like:
X-Ray creates a mapping of application components, and enables end-to-end tracking of requests. It can be used to debug applications both during application development and in production.
The Lumigo platform is the leading monitoring and debugging platform for serverless and microservices applications. It deploys in minutes, with no code changes, and enables:
Containers are a crucial part of the cloud native environment, and a foundation of most DevOps environments. They are a lightweight encapsulation of software and configuration, which makes it easy to deploy applications and IT systems in an automated and repeatable manner. Docker is the de-facto standard for container engines, and Kubernetes is the most popular orchestration tool used to manage large numbers of containers.
According to the 2020 CNCF Survey, 92% of cloud native users run containers in production, and 83% of them use Kubernetes in production. 23% of organizations have over 5,000 containers, and 12% say they have over 50 Kubernetes clusters running in production—indicating growing enterprise use of containers. In production and large enterprise deployments, performance considerations become critical.
One of the important things to understand is that containers are not virtual machines. A virtual machine runs as a software representation of a computer that is independent of the physical host, but containers depend on the host’s operating system, kernel and file system.
This means that, for example, a virtual machine can be hosted on a computer running Windows, while its workloads run Linux, or vice versa. On the other hand, containers running on a Linux host must run Linux, because they share the underlying operating system kernel.
A container running on a host appears to be completely isolated from other containers. But internally, all containers running on a particular host use that host’s kernel and file system. This shared usage of host resources has a profound effect on performance optimization—which only gets more complicated when you run orchestrators like Kubernetes.
Learn more in our detailed guide to containerized architecture.
Broadly speaking, Kubernetes runs two types of containers:
Containers are organized into pods, which are deployed on hosts called nodes (physical or virtual machines). For both types of containers, the following three metrics are crucial for evaluating performance as part of a Kubernetes deployment.
Monitoring memory usage at the pod and node level can provide valuable insight into cluster performance and the ability to successfully run workloads. Containers whose memory usage exceeds their configured limit are terminated. Also, if a node is running out of available memory, the kubelet marks the node as being under memory pressure and starts reclaiming resources by evicting pods.
Like memory, disk space is a critical resource for each container. If the kubelet detects that the root volume is running out of disk space, pod scheduling issues can occur. When available disk space on a node falls below a certain threshold, the node is flagged as having “disk pressure”, and the kubelet may also reclaim node-level resources.
You also need to track the usage level of storage volumes used by Kubernetes pods. Storage volumes provide persistent storage, which survives even after a specific container or pod shuts down. This allows you to predict problems at the application or service level.
Tracking the number of CPUs used by pods and nodes, compared to the configured requests and limits, can provide valuable insights into cluster performance. If there are insufficient CPU resources available at the node level, the node will throttle the CPU resources available to each pod, which can cause performance issues.
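Comparing usage against configured limits can be automated once the numbers are available. The sketch below assumes usage figures have already been collected from a metrics source; the `PodUsage` structure and the 90% threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodUsage:
    name: str
    cpu_millicores: int        # observed CPU usage
    cpu_limit_millicores: int  # configured CPU limit
    memory_mib: int            # observed memory usage
    memory_limit_mib: int      # configured memory limit


def pods_at_risk(pods: List[PodUsage], threshold: float = 0.9) -> List[str]:
    """Return a warning for every pod using more than `threshold` of a limit."""
    warnings = []
    for pod in pods:
        if pod.cpu_millicores > threshold * pod.cpu_limit_millicores:
            warnings.append(f"{pod.name}: CPU near limit, throttling likely")
        if pod.memory_mib > threshold * pod.memory_limit_mib:
            warnings.append(f"{pod.name}: memory near limit, may be terminated")
    return warnings


sample = [
    PodUsage("api", cpu_millicores=480, cpu_limit_millicores=500,
             memory_mib=300, memory_limit_mib=512),
    PodUsage("worker", cpu_millicores=200, cpu_limit_millicores=1000,
             memory_mib=1000, memory_limit_mib=1024),
]
for warning in pods_at_risk(sample):
    print(warning)
```

The two resources are reported differently on purpose: exceeding a CPU limit slows a workload down, while exceeding a memory limit stops it.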
Here are a few best practices you can use to improve performance of applications running on Kubernetes:
Learn more in the detailed guide to Kubernetes in production.
Here are a few best practices and considerations for monitoring Kubernetes deployments. You should establish careful monitoring at both the cluster and pod level.
The purpose of cluster monitoring is to monitor the health of the entire Kubernetes cluster. As an administrator, you need to know if all nodes in the cluster are functioning normally, the workload capacity they are running, the number of applications running on each node, and the resource utilization of the entire cluster.
You can use Kubernetes metrics to monitor how specific pods and their workloads are behaving. Pay special attention to:
Container metrics are primarily provided by the cAdvisor utility which comes with Kubernetes. For more extensive monitoring capabilities, a common choice is Prometheus, an open source monitoring tool built for cloud native environments.
Learn more in our detailed guide to Kubernetes monitoring
Prometheus is a cloud native tool for monitoring time-series events across containerized environments. It works natively with Kubernetes, integrating seamlessly with this orchestration platform. Prometheus supports various graphs and dashboards.
Prometheus applies a multi-dimensional data model using metric names and key-value pairs to identify time series data. It offers the use of PromQL, a flexible query language that helps leverage the dimensionality of this model.
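To show what a multi-dimensional data model means in practice, here is a toy in-memory store in which each time series is identified by a metric name plus a set of key-value labels. It is a conceptual sketch, not how Prometheus is implemented, and the class and metric names are illustrative. The `select` call plays a role similar to a PromQL selector such as `http_requests_total{method="GET"}`.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]
Sample = Tuple[float, float]  # (timestamp, value)


class TinySeriesStore:
    """Toy in-memory store keyed by metric name plus label pairs."""

    def __init__(self) -> None:
        self._series: Dict[SeriesKey, List[Sample]] = defaultdict(list)

    def append(self, name: str, labels: Dict[str, str], ts: float, value: float) -> None:
        # Each distinct combination of name and labels is its own time series.
        self._series[(name, frozenset(labels.items()))].append((ts, value))

    def select(self, name: str, **matchers: str) -> Dict[SeriesKey, List[Sample]]:
        """Return every series with this name whose labels include all matchers."""
        wanted = set(matchers.items())
        return {
            key: samples
            for key, samples in self._series.items()
            if key[0] == name and wanted <= key[1]
        }


store = TinySeriesStore()
store.append("http_requests_total", {"method": "GET", "code": "200"}, 1.0, 10)
store.append("http_requests_total", {"method": "GET", "code": "500"}, 1.0, 1)
store.append("http_requests_total", {"method": "POST", "code": "200"}, 1.0, 4)

get_series = store.select("http_requests_total", method="GET")
print(len(get_series), "series match method=GET")
```

Because labels are part of the series identity, the same metric can be sliced by any dimension (method, status code, and so on) without defining a new metric for each combination.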
Prometheus does not rely on distributed storage. It uses autonomous single server nodes and collects time-series data over HTTP via a pull model. Pushing time series is also supported through an intermediary gateway, and monitoring targets are discovered through service discovery or static configuration.
Learn more in our detailed guide to Prometheus monitoring.
Web application developers today can automatically push their code through build, test, and deploy, but are not always sure how the code will perform in production. Web application performance testing is the solution to this problem, and should be an important part of your testing strategy.
A poorly performing website or web application will also do worse on SEO and subsequently get less traffic than competitors, and is likely to have lower engagement, lower conversion, and lower revenues. Effective website and web application performance testing can improve all these metrics by ensuring the development team pays consistent attention to performance.
Learn more in the detailed guide to web performance
Here are a few tools web application developers can use to test and improve performance on an ongoing basis.
PageSpeed Insights by Google checks several on-page and back-end factors of a web page, and reports on their effect on page load time. It provides a performance score for desktop and mobile, shows which elements have the biggest impact on page load time, and provides suggestions for improvement.
GTmetrix, a free tool which is based on the Google Lighthouse performance benchmark, provides five different reports showcasing website performance:
GTmetrix helps visualize page performance and lets you set up alerts to notify about performance issues. It also provides extensive support for testing performance on mobile devices.
Pingdom is a commercial offering that provides website uptime monitoring, page speed monitoring, real user monitoring (RUM) showing how actual visitors experience your web pages, and synthetic session monitoring. Pingdom has a global infrastructure with testing servers in 100 countries. It sends notifications by email and SMS, and integrates with collaboration and alerting tools like Slack and PagerDuty.
WebPageTest is a free tool that lets you test your web pages from 40 locations, using 25 browsers. It performs in-depth performance tests covering topics like:
Website Speed Test analyzes a website’s images, which are responsible for a large percentage of load time on most web pages. It inspects image format, fit, compression, and quality options, and provides suggestions for optimizing images, which can have a dramatic impact on page load time. The image analysis tool is integrated with WebPageTest.
Here are several simple best practices that can improve performance for your web pages.
Optimizing your code by removing unnecessary spaces, commas and other unneeded characters, as well as comments, formatting, and unused code, can significantly speed up your page. To compress text files even further, use minification and uglification tools such as cssnano and UglifyJS.
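To illustrate what minification does, here is a deliberately naive CSS minifier written with regular expressions. It is a sketch for demonstration only. It does not handle edge cases such as strings containing braces or whitespace that is significant in selectors, so production builds should rely on a dedicated minifier.

```python
import re


def minify_css(css: str) -> str:
    """Very small CSS minifier: strips comments and unneeded whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # remove comments
    css = re.sub(r"\s+", " ", css)                         # collapse whitespace
    css = re.sub(r"\s*([{}:;,])\s*", r"\1", css)           # trim around punctuation
    css = css.replace(";}", "}")                           # drop last semicolon in a block
    return css.strip()


source = """
/* primary button */
.button {
    color: #ffffff;
    margin: 0 auto;
}
"""
minified = minify_css(source)
print(minified)
print(len(source), "->", len(minified), "characters")
```

Even on this tiny stylesheet the output is substantially shorter than the input, and the savings add up across large CSS and JavaScript bundles.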
Whenever a page is redirected to another page, the visitor faces a delay waiting for the HTTP request/response cycle to complete. It is not uncommon to see web pages redirected over three times, and each redirect adds a delay for the user. Redirect loops and errors can also result in a broken user experience.
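Redirect chains and loops can be audited offline once you have a map of which URL redirects where. The sketch below assumes such a map has already been collected (for example from a site crawl); the URLs and the `max_hops` default are illustrative.

```python
from typing import Dict, List


def redirect_chain(start_url: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follow a redirect map from `start_url` and return every URL visited.

    Raises ValueError on a redirect loop or when `max_hops` is exceeded.
    """
    chain = [start_url]
    current = start_url
    while current in redirects:
        current = redirects[current]
        if current in chain:
            raise ValueError(f"redirect loop detected at {current}")
        chain.append(current)
        if len(chain) - 1 > max_hops:
            raise ValueError("too many redirects")
    return chain


# Hypothetical redirect map, e.g. collected from a site crawl.
site_redirects = {
    "http://example.com/": "https://example.com/",
    "https://example.com/": "https://www.example.com/",
    "https://www.example.com/": "https://www.example.com/home",
}
chain = redirect_chain("http://example.com/", site_redirects)
print(len(chain) - 1, "redirects:", " -> ".join(chain))
```

In this example the visitor goes through three redirects before reaching the final page. Pointing the first URL directly at the final destination would remove two of those round trips.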
Server response time is influenced by the amount of traffic received, the resources used by each page, software running on the server, and the hosting solution used. To speed up server response time, find and fix performance bottlenecks such as delayed database queries, slow routing, and insufficient memory or CPU resources on the server. Aim for a server response time of under 200ms.
Resize images to the required size before using them on a web page, and provide several versions of images for responsive designs. Ensure image files are compressed for the web. Use CSS sprites to combine images like buttons and icons into one large image—the file then loads immediately with fewer HTTP requests, and the web page shows only the relevant portions. This can significantly reduce page load time.
Learn more in the detailed guide to image optimization
CSS is commonly used to style and transform images on websites. Using CSS commands, developers can adjust position for images, resize images, add backgrounds or borders, and apply filters like grayscale or blur.
In general, CSS image effects have a negative effect on page performance. There are two main performance concerns:
Learn more in the detailed guide to CSS images
Use next-generation image and video formats, which can provide much better compression ratios with higher quality:
Learn more in the detailed guide to next-generation image formats
You can optimize website images and videos by selecting the most appropriate format and compression, and delivering media files efficiently using content delivery networks (CDN).
There are many CMS plugins and tools available that can help automate image and video optimization. These plugins let you automatically convert images to the most appropriate format, and apply the best quality parameters to reduce file size while retaining quality.
Learn more in the detailed guide to video optimization
Video content is increasingly used on web pages and can be a major component in page load time. There are many ways to optimize video content to improve performance and user experience, but some of them are complex and may require dedicated software or hardware.
Cloud-based video services offer video APIs, which offer advanced video optimization capabilities on the fly:
Lazy loading is a common and effective technique, which involves loading images only when the website visitor needs to see them.
Typically, the technique detects the user’s viewport, and loads images as the user scrolls down the page and sees them. This can significantly reduce the amount of data loaded to the user’s browser when they initially visit a page, and conserve bandwidth because images that are never viewed by the user do not need to be downloaded.
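The underlying decision is a simple geometric check: does an image overlap the visible area of the page, possibly extended by a small margin so that loading starts just before the image scrolls into view? In a browser this is normally handled by the native `loading="lazy"` attribute or the Intersection Observer API. The Python sketch below only illustrates the logic, and its names and pixel values are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Image:
    src: str
    top: int     # distance from the top of the page, in pixels
    height: int  # rendered height, in pixels


def images_to_load(images: List[Image], scroll_y: int, viewport_height: int,
                   margin: int = 200) -> List[str]:
    """Return the images that overlap the viewport, extended by `margin` pixels."""
    visible_top = scroll_y - margin
    visible_bottom = scroll_y + viewport_height + margin
    return [
        img.src for img in images
        if img.top < visible_bottom and img.top + img.height > visible_top
    ]


page = [Image("hero.jpg", 0, 600), Image("chart.png", 1500, 400), Image("footer.png", 4000, 300)]
print(images_to_load(page, scroll_y=0, viewport_height=800))
print(images_to_load(page, scroll_y=1000, viewport_height=800))
```

On the initial view only the hero image qualifies. The chart is requested once the user scrolls toward it, and the footer image is never downloaded unless the user reaches the bottom of the page.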
Learn more in the detailed guide to lazy loading