Performance testing is the process of evaluating computer systems to see how quickly they respond to user requests. Performance testing can uncover issues that negatively impact the user experience, and provide insights on how to fix them.
Traditionally, performance testing focused on client-server systems deployed on-premises. Its goal was to build servers that could withstand peaks in application load and still deliver satisfactory performance.
In today’s cloud native world, performance testing has many new meanings—organizations are testing the performance of cloud computing systems, serverless applications, containerized architectures, and web applications.
While the computing world has changed dramatically over the past two decades, the same types of performance tests are still run today in cloud native environments.
|Baseline test||A baseline test is a performance test that runs a system under normal expected load. This test provides a benchmark to help you identify performance issues under irregular conditions. Also, if you run the test when a service is new, you can get rid of any obvious errors and start assessing real performance.|
|Load test||The traditional load test evaluates a system to see how it performs with higher than normal load.|
|Stress test||A stress test pushes your system to its limits, checking when it crashes, and whether it does so gracefully. During testing, load on the system is gradually increased until it reaches the point of failure. This is sometimes referred to as a “stepped” test because the increase in load is done gradually.|
|Spike test||A spike test evaluates a system under normal load, followed by a sudden jump to peak traffic (for example, jumping from 1,000 to 10,000 concurrent users). Many systems can crash due to sudden spikes in load. This test shows how your system reacts to spikes in traffic or transaction volume.|
|Soak test||A soak test (or durability test) is performed under conditions that can have a cumulative effect on system performance. The test reveals performance problems that can occur due to long-term stress on the system, such as memory leaks, resource leaks, data corruption and other factors that can degrade performance over time. A common practice is to run soak tests at around 80% of maximum capacity.|
|Volume test||A volume test is similar to a stress test or load test, but instead of testing application loads, it tests the amount of data being processed, which can also have an impact on application performance. The test can have two variants: either data is incrementally added to the databases, or a large amount of data is populated before the test, and then the system is evaluated. In general, normal use of an application constantly increases database size, so you should perform volume testing on a regular basis.|
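The load shapes described for these test types can be sketched as simple schedules of concurrent users per minute. This is a minimal illustration, not a load testing tool: the function names, the one-minute granularity, and the sample numbers are assumptions made for the example, and the 80% default for the soak profile follows the figure mentioned above.

```python
from typing import List


def baseline_profile(normal_users: int, minutes: int) -> List[int]:
    """Constant, normal expected load for the whole test."""
    return [normal_users] * minutes


def stress_profile(start_users: int, step: int,
                   minutes_per_step: int, steps: int) -> List[int]:
    """Stepped increase: hold each load level, then add more users."""
    profile: List[int] = []
    for i in range(steps):
        profile.extend([start_users + i * step] * minutes_per_step)
    return profile


def spike_profile(normal_users: int, peak_users: int, minutes_before: int,
                  spike_minutes: int, minutes_after: int) -> List[int]:
    """Normal load, a sudden jump to peak load, then back to normal."""
    return ([normal_users] * minutes_before
            + [peak_users] * spike_minutes
            + [normal_users] * minutes_after)


def soak_profile(max_capacity: int, minutes: int,
                 fraction: float = 0.8) -> List[int]:
    """Sustained load at a fixed fraction of maximum capacity."""
    return [round(max_capacity * fraction)] * minutes


if __name__ == "__main__":
    print(stress_profile(start_users=100, step=100, minutes_per_step=2, steps=3))
    print(spike_profile(1_000, 10_000, minutes_before=2, spike_minutes=1, minutes_after=2))
```

A schedule like this would typically be fed to whatever load generator you use; in a real stress test the stepping continues until the system fails rather than for a fixed number of steps.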
Performance testing is an old discipline, and many of the old principles are still relevant today. Here is a performance testing process that can be used to test modern cloud native environments, such as cloud systems, serverless, and containerized applications.
In the rest of this article, we’ll explain how to monitor and improve performance in modern computing environments:
Cloud computing is changing the way end users deploy, monitor, and use applications. The cloud provides a virtually unlimited pool of computing, storage, and network resources, so you can scale applications as needed.
However, even though cloud applications are able to dynamically scale and adapt to load, it is more important than ever to measure their performance. Performance issues can be complex to detect and resolve in a cloud environment, and can have a major impact on users.
In the cloud, performance testing focuses on changing the number of concurrent users accessing the application, trying different load profiles, and measuring different performance indicators, such as throughput and latency. Testing can be done at the virtual machine level, at a service level, and at an entire application level.
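As a rough illustration of the indicators mentioned above, the following sketch summarizes a list of measured response times into throughput and latency percentiles. The function name `summarize` and the sample data are assumptions for this example; in practice the latencies would come from your load testing tool.

```python
import statistics
from typing import Dict, Sequence


def summarize(latencies_ms: Sequence[float], test_duration_s: float) -> Dict[str, float]:
    """Summarize one test run: throughput plus mean and percentile latency."""
    ordered = sorted(latencies_ms)
    # quantiles(..., n=100) returns 99 cut points; index 49 is the median.
    cut_points = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "throughput_rps": len(ordered) / test_duration_s,
        "mean_ms": statistics.fmean(ordered),
        "p50_ms": cut_points[49],
        "p95_ms": cut_points[94],
        "p99_ms": cut_points[98],
    }


if __name__ == "__main__":
    # Hypothetical run: 100 requests completed in 10 seconds.
    sample = [float(ms) for ms in range(1, 101)]
    for name, value in summarize(sample, test_duration_s=10).items():
        print(f"{name}: {value:.1f}")
```

Percentiles are usually more informative than the mean, because a small share of very slow requests can be hidden by an acceptable average.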
The table below shows performance testing types commonly used in a cloud environment:
|Type of Performance Test||How it is Used in a Cloud Environment|
|Stress test||Checks the responsiveness and stability of cloud infrastructure under very high loads.|
|Load test||Verifies that the system performs well when used by a normal number of concurrent users.|
|Browser testing||Ensures compatibility and performance of user-facing systems across browsers and devices.|
|Latency testing||Measures the time it takes to receive a response to a request made to a cloud-based service or API.|
|Targeted infrastructure test||Isolates each component or layer of your application and tests if it can deliver the required performance.|
|Failover test||Checks the system’s ability to replace a failed component under high load conditions with minimal service disruption.|
|Capacity test||Defines a benchmark of the maximum traffic or load a cloud system can effectively handle at a given scale.|
|Soak test||Measures the performance of a cloud system under a given load over a long period of time, as a realistic test of a production environment.|
Amazon Web Services (AWS) is the most popular public cloud platform. While there are hundreds of AWS services, some of the most common metrics monitored on AWS are:
AWS provides several monitoring services, including:
Learn more in the detailed guide to AWS monitoring.
Microsoft Azure is another market-leading public cloud offering. Common performance metrics on Azure include:
Azure provides a number of first-party monitoring services:
Learn more in the detailed guide to Azure monitoring.
Google Cloud Platform is Google’s public cloud service. It offers Google Cloud Monitoring, a service that provides data about the performance, availability, and health of your cloud applications.
Google Cloud Monitoring is based on three foundations:
Serverless computing is a way to design and deliver software with no knowledge of the underlying infrastructure—computing resources are provided as an automated, scalable cloud service.
In a traditional data center, or when using an infrastructure as a service (IaaS) model in the cloud, the server’s computing resources are fixed, and paid for regardless of the amount of computing work the server performs. In serverless computing, billing is performed only when the client’s code is actually running.
Serverless computing does not eliminate servers, but its purpose is to remove computing resource considerations from the software design and development process.
However, the serverless model raises significant challenges with regard to performance testing and application debugging:
The main impact of these challenges is low observability over what is running and how workloads are performing.
AWS Lambda is the most popular serverless platform today, with 96% market share according to the BMC State of Serverless report. Therefore, understanding how to diagnose and tune performance issues in Lambda is a priority for any serverless practitioner.
We’ll briefly cover three ways to improve performance in AWS Lambda.
AWS Lambda lets you configure the memory available to a function, from 128 MB to 10,240 MB. CPU power is allocated to the function in proportion to the amount of memory. At 1,769 MB, a function has the equivalent of one full vCPU, and larger memory settings give it access to additional vCPUs.
If your application is single-threaded, avoid allocating much more than 1.8 GB of RAM unless the function actually needs the memory, because you will pay for additional vCPUs that the app cannot use. Conversely, if the application is multi-threaded and you allocate less than 1.8 GB of RAM, it will not be able to use multi-threading to improve performance.
AWS Lambda costs depend on memory allocation and execution time. Increasing memory (and thus CPU) can speed up processing, and if execution time drops enough, the shorter run offsets the higher price per millisecond. However, AWS currently offers up to 6 vCPU cores per function, so beyond a certain point, increasing memory will not reduce execution time.
If your application is CPU intensive, increasing memory can significantly reduce execution time and cost per execution.
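The trade-off between memory and execution time can be checked with simple arithmetic, since compute charges are based on memory multiplied by duration (GB-seconds). The sketch below is illustrative only: the price constant is an assumed rate, the durations are hypothetical measurements, and per-request charges are left out. Check current AWS pricing for your region and architecture before relying on the numbers.

```python
# Assumed illustrative rate in USD per GB-second; verify against current AWS pricing.
PRICE_PER_GB_SECOND = 0.0000166667


def gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """Billable compute for one invocation, in GB-seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def compute_cost(memory_mb: int, duration_ms: float, invocations: int = 1) -> float:
    """Compute charge only (no per-request fee) for a number of invocations."""
    return gb_seconds(memory_mb, duration_ms) * PRICE_PER_GB_SECOND * invocations


if __name__ == "__main__":
    # Hypothetical measurements for a CPU-bound function at two memory settings.
    measurements = {512: 800.0, 1024: 350.0}
    for memory_mb, duration_ms in measurements.items():
        cost = compute_cost(memory_mb, duration_ms, invocations=1_000_000)
        print(f"{memory_mb} MB, {duration_ms} ms: ${cost:.2f} per million invocations")
```

In this made-up case, doubling memory more than halves the duration, so the larger setting is both faster and cheaper. If the duration only dropped slightly, the larger setting would cost more.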
AWS Lambda handles scalability automatically. But there are limits to the number of concurrent requests it allows. When optimizing Lambda performance, you should consider limiting concurrent executions.
There are two ways to limit concurrency in AWS Lambda:
A unique problem in serverless environments is cold starts—when a function is invoked, it takes time for it to power up and start serving user requests, and in the interim, users may experience latency.
AWS has introduced the concept of provisioned concurrency, which can help resolve this problem. It lets you provision a preset amount of concurrency in Lambda, ensuring that you have an appropriate number of function instances running, ready to serve user requests with no delay. You can change provisioned concurrency automatically using CloudWatch metrics, or by defining a schedule based on known application loads.
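One way to estimate how much concurrency to provision is to multiply the expected request rate by the average function duration. The sketch below applies that estimate to a schedule of known loads. The function names and the hourly figures are hypothetical, and the result is a starting point to validate with real metrics, not an exact sizing rule.

```python
import math
from typing import Dict


def required_concurrency(requests_per_second: float, avg_duration_s: float) -> int:
    """Estimate concurrent executions: request rate times average duration."""
    return math.ceil(requests_per_second * avg_duration_s)


def provisioned_schedule(hourly_rps: Dict[int, float],
                         avg_duration_s: float) -> Dict[int, int]:
    """Map each hour of the day to an estimated provisioned concurrency value."""
    return {hour: required_concurrency(rps, avg_duration_s)
            for hour, rps in hourly_rps.items()}


if __name__ == "__main__":
    # Hypothetical expected load (requests per second) at selected hours.
    expected_load = {9: 40, 12: 200, 22: 3}
    print(provisioned_schedule(expected_load, avg_duration_s=0.5))
```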
Learn more in our detailed guide to AWS Lambda performance
Serverless systems are a “black box”—engineers do not have knowledge about their inner workings. Serverless function code is only executed during the request, and it is not known where exactly it is executed on the hardware.
However, it is possible to monitor and resolve issues in serverless environments. It just requires looking at different metrics than those you are used to in a traditional server-based environment.
Push data monitoring is the traditional style of monitoring, where an agent running on a server collects data and sends it to a central server. In serverless, you don’t have access to the server, so monitoring should be built into the serverless function itself. This means that the developer has to include the monitoring library as part of their code, and initialize it when the function runs.
Push data monitoring in a serverless environment can help you collect data such as function execution time, memory usage, user experience, function payloads, network performance, and database performance.
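To illustrate what building monitoring into the function itself can look like, here is a minimal sketch of a decorator that times a handler and emits a structured record. The `emit` function and the `EMITTED` list are stand-ins invented for this example; a real monitoring library would send the record to its own backend.

```python
import functools
import json
import time
from typing import Any, Callable, Dict, List

# Stand-in for a real metrics backend (an assumption for this sketch).
EMITTED: List[Dict[str, Any]] = []


def emit(record: Dict[str, Any]) -> None:
    """Record a metric and print it as a structured log line."""
    EMITTED.append(record)
    print(json.dumps(record))


def monitored(handler: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a function handler to report its outcome and execution time."""
    @functools.wraps(handler)
    def wrapper(event: Dict[str, Any], context: Any = None) -> Any:
        started = time.perf_counter()
        outcome = "success"
        try:
            return handler(event, context)
        except Exception:
            outcome = "error"
            raise
        finally:
            emit({
                "function": handler.__name__,
                "outcome": outcome,
                "duration_ms": round((time.perf_counter() - started) * 1000, 3),
            })
    return wrapper


@monitored
def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Hypothetical function handler used to demonstrate the decorator."""
    return {"statusCode": 200, "body": event.get("name", "world")}


if __name__ == "__main__":
    handler({"name": "test"})
```

Because the record is emitted in a `finally` block, failed invocations are reported as well as successful ones.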
Pull data monitoring is a new type of monitoring, in which services are built to report metrics on their own. Because services are temporary and not running statically, it can be effective to build telemetry functions into the service, and have it report essential metrics. This requires careful planning, because there can be multiple services running in diverse locations at different times.
Here are some of the key metrics you should measure for serverless functions:
Learn more in our detailed guide to serverless monitoring
Serverless applications are highly fragmented. You will not always have the ability to run these components locally. Distributed architecture can cause problems in many areas of the stack. The ability to drill down into code is critical to quickly fixing defects.
Remote debugging is difficult in serverless applications, because you do not have access to the server and operating system. It can also incur costs, because you’ll need to spin up instances of a function in order to test them. Developers often don’t understand where the problem is and why it happened.
In order to effectively debug serverless applications, it is essential to have dedicated tools that can provide information about what is happening in the environment and where the problems lie. In particular, it is important to get access to stack traces of serverless functions that incurred errors, to be able to debug and resolve production issues. Serverless observability tools have been developed to address these challenges.
Learn more in our detailed guide to serverless debugging
The following tools can help you monitor and debug serverless functions in the most popular serverless platform, AWS Lambda.
The primary source of information about how AWS applications work is AWS CloudWatch Logs. This also applies to Lambda functions. By default, Lambda functions send data to CloudWatch, and CloudWatch creates a LogGroup for each function. Each LogGroup has multiple log streams, which contain the logs generated for each Lambda function instance.
Log streams contain log events. Click on an event to see more information about the specific Lambda function that was called. This is the most basic way to debug issues in a Lambda function.
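Among the log events Lambda writes for each invocation is a `REPORT` line summarizing duration and memory usage. The sketch below extracts those fields with a regular expression. The sample line is made up for illustration, and the pattern assumes the commonly seen field layout, so adjust it if your log lines differ.

```python
import re
from typing import Any, Dict, Optional

REPORT_PATTERN = re.compile(
    r"REPORT RequestId:\s*(?P<request_id>\S+)\s+"
    r"Duration:\s*(?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration:\s*(?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size:\s*(?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used:\s*(?P<max_memory_mb>\d+) MB"
    r"(?:\s+Init Duration:\s*(?P<init_ms>[\d.]+) ms)?"
)


def parse_report(line: str) -> Optional[Dict[str, Any]]:
    """Return the metrics from a REPORT line, or None for other log lines."""
    match = REPORT_PATTERN.search(line)
    if match is None:
        return None
    init_ms = match.group("init_ms")
    return {
        "request_id": match.group("request_id"),
        "duration_ms": float(match.group("duration_ms")),
        "billed_ms": float(match.group("billed_ms")),
        "memory_mb": int(match.group("memory_mb")),
        "max_memory_mb": int(match.group("max_memory_mb")),
        # An init duration is only reported when the invocation was a cold start.
        "cold_start": init_ms is not None,
    }


if __name__ == "__main__":
    # Made-up sample line for illustration.
    sample = ("REPORT RequestId: example-request-id\tDuration: 15.74 ms\t"
              "Billed Duration: 16 ms\tMemory Size: 128 MB\t"
              "Max Memory Used: 56 MB\tInit Duration: 130.57 ms")
    print(parse_report(sample))
```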
CloudWatch log events are useful, but limited in the information they provide about serverless issues. AWS built X-Ray to answer more complex questions like:
X-Ray creates a mapping of application components, and enables end-to-end tracking of requests. It can be used to debug applications both during application development and in production.
The Lumigo platform is the leading monitoring and debugging platform for serverless and microservices applications. It deploys in minutes, with no code changes, and enables:
Containers are a crucial part of the cloud native environment, and a foundation of most DevOps environments. They are a lightweight encapsulation of software and configuration, which makes it easy to deploy applications and IT systems in an automated and repeatable manner. Docker is the de-facto standard for container engines, and Kubernetes is the most popular orchestration tool used to manage large numbers of containers.
According to the 2020 CNCF Survey, 92% of cloud native users run containers in production, and 83% of them use Kubernetes in production. 23% of organizations have over 5,000 containers, and 12% say they have over 50 Kubernetes clusters running in production—indicating growing enterprise use of containers. In production and large enterprise deployments, performance considerations become critical.
One of the important things to understand is that containers are not virtual machines. A virtual machine runs as a software representation of a computer that is independent of the physical host, but containers depend on the host’s operating system, kernel and file system.
This means that, for example, a virtual machine can be hosted on a computer running Windows, while its workloads run Linux, or vice versa. On the other hand, containers running on the default Linux host must run Linux, because they share the resources of the underlying operating system kernel.
A container running on a host appears to be completely isolated from other containers. But internally, all containers running on a particular host use that host’s kernel and file system. This shared usage of host resources has a profound effect on performance optimization—which only gets more complicated when you run orchestrators like Kubernetes.
Learn more in our detailed guide to containerized architecture
Broadly speaking, Kubernetes runs two types of containers:
Containers are organized into pods, which are deployed on hosts called nodes. Nodes can be physical or virtual machines. For both types of containers, the following three metrics are crucial for evaluating performance as part of a Kubernetes deployment.
Monitoring memory usage at the pod and node level can provide valuable insight into cluster performance and the ability to successfully run workloads. Containers whose memory usage exceeds their configured limit are terminated. Also, if a node is running out of available memory, the kubelet flags the node as being under memory pressure and starts reclaiming resources, which can include evicting pods.
Like memory, disk space is a critical resource for each container. So if kubelet detects that the root volume is running out of disk space, pod scheduling issues can occur. On specific nodes, when available disk space goes below a certain threshold, the node is flagged as having “disk pressure”, and kubelet may also reclaim node level resources.
You also need to track the usage level of storage volumes used by Kubernetes pods. Storage volumes provide persistent storage, which survives even after a specific container or pod shuts down. This allows you to predict problems at the application or service level.
Tracking the amount of CPU used by pods and nodes, compared to the configured requests and limits, can provide valuable insights into cluster performance. If there are insufficient CPU resources available at the node level, the node will throttle the CPU resources available to each pod, which can cause performance issues.
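Comparing usage against configured values is simple arithmetic once CPU quantities are converted to a common unit. The sketch below converts Kubernetes-style CPU quantities to millicores and reports usage as a percentage of the configured request and limit. The function names and the 90% warning threshold are assumptions for this example, and only the plain, `m`, and `n` suffixes are handled.

```python
from typing import Dict, Union


def to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '2', '1.5', '500m', or '250000000n'."""
    if quantity.endswith("n"):  # nanocores, as reported by some metrics sources
        return round(int(quantity[:-1]) / 1_000_000)
    if quantity.endswith("m"):  # millicores
        return int(quantity[:-1])
    return round(float(quantity) * 1000)  # whole or fractional cores


def cpu_report(usage: str, request: str, limit: str,
               warn_pct: float = 90.0) -> Dict[str, Union[int, float, bool]]:
    """Compare current CPU usage with the configured request and limit."""
    used = to_millicores(usage)
    pct_of_limit = 100 * used / to_millicores(limit)
    return {
        "used_millicores": used,
        "pct_of_request": 100 * used / to_millicores(request),
        "pct_of_limit": pct_of_limit,
        "throttling_risk": pct_of_limit >= warn_pct,
    }


if __name__ == "__main__":
    # Hypothetical container using 450m with a 250m request and a 500m limit.
    print(cpu_report(usage="450m", request="250m", limit="500m"))
```

Usage well above the request but close to the limit suggests the request is set too low for scheduling purposes and that the container may soon be throttled.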
Here are a few best practices you can use to improve performance of applications running on Kubernetes:
Learn more in the detailed guide to Kubernetes in production
Here are a few best practices and considerations for monitoring Kubernetes deployments. You should establish careful monitoring at both the cluster and pod level.
The purpose of cluster monitoring is to monitor the health of the entire Kubernetes cluster. As an administrator, you need to know if all nodes in the cluster are functioning normally, the workload capacity they are running, the number of applications running on each node, and the resource utilization of the entire cluster.
You can use Kubernetes metrics to monitor how specific pods and their workloads are behaving. Pay special attention to:
Container metrics are primarily provided by the cAdvisor utility which comes with Kubernetes. For more extensive monitoring capabilities, a common choice is Prometheus, an open source monitoring tool built for cloud native environments.
Learn more in the detailed guides to:
Application performance is a broad term that refers to the efficiency and speed of any software application. It’s about how fast the application responds to user requests, how smoothly it runs, and how well it accomplishes its primary task. This is a critical aspect of software development because it directly affects the user experience. If an application is slow or frequently crashes, users will quickly abandon it in favor of a better-performing alternative.
Application performance is not just about speed. It’s a holistic measure of an application’s efficiency, reliability, and overall user satisfaction. It’s a fundamental quality that every developer should strive to achieve in their software.
Learn more in the detailed guide to application performance monitoring
Java is a versatile and widely-used programming language known for its “write once, run anywhere” capability. However, it’s often criticized for its slower performance compared to languages like C or C++. Here are a few ways to improve Java application performance.
Firstly, always use the latest version of Java. Each new version brings performance improvements and optimizations that can significantly speed up your application. Secondly, use appropriate data structures. The right data structure for the right job can greatly enhance performance. For example, using an ArrayList instead of a LinkedList when frequent random access is needed can improve speed.
Additionally, avoid creating unnecessary objects. Object creation and garbage collection can be costly in terms of performance. So, reuse objects when possible and nullify them when they’re no longer needed.
Learn more in the detailed guide to Java performance
Golang, also known as Go, is lauded for its simplicity and efficiency. It’s built with performance in mind, but there are still ways to optimize your Golang applications.
One way is by understanding and effectively using Goroutines. Goroutines are lightweight threads managed by the Go runtime. They’re cheap to create and can significantly boost performance when used for concurrent tasks.
Another tip is to use built-in functions and packages whenever possible. Go has a rich standard library full of optimized packages, so take full advantage of them. Lastly, minimize garbage collection by reducing heap allocations. This can be achieved by reusing objects and avoiding unnecessary pointer usage.
Learn more in the detailed guide to Golang performance
Python is known for its readability and ease of use, not for its speed. However, with the right practices, Python performance can be significantly improved.
Using built-in functions and libraries is one of the key ways to boost Python performance. In CPython, the standard interpreter, many of these functions are implemented in C, which usually makes them much faster than equivalent pure-Python code. Also, consider using list comprehensions instead of traditional loops to build lists, as they are often faster.
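A quick way to check these claims on your own interpreter is to time equivalent implementations with the standard `timeit` module. The sketch below compares a manual loop with the built-in `sum`, and an append loop with a list comprehension. The function names and input sizes are arbitrary choices for this example, and results will vary by machine and Python version.

```python
import timeit
from typing import Dict, Iterable, List


def total_loop(values: Iterable[int]) -> int:
    """Sum values with an explicit Python loop."""
    total = 0
    for value in values:
        total += value
    return total


def squares_loop(n: int) -> List[int]:
    """Build a list of squares with a loop and append()."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n: int) -> List[int]:
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]


def compare(n: int = 100_000, repeats: int = 20) -> Dict[str, float]:
    """Time each variant and return total seconds per variant."""
    values = list(range(n))
    candidates = {
        "loop sum": lambda: total_loop(values),
        "built-in sum": lambda: sum(values),
        "append loop": lambda: squares_loop(n),
        "list comprehension": lambda: squares_comprehension(n),
    }
    return {name: timeit.timeit(fn, number=repeats) for name, fn in candidates.items()}


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.3f} s")
```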
Furthermore, consider using an alternative interpreter with a JIT (Just-In-Time) compiler, such as PyPy. A JIT compiler can significantly speed up Python code by compiling frequently executed code into machine code at runtime.
Learn more in the detailed guide to optimizing Python
For JavaScript, first use the ‘use strict’ directive. This enables strict mode, which can catch common coding mistakes and “unsafe” actions, thus preventing potential performance hits.
Secondly, minimize DOM manipulations. DOM operations are expensive, so try to batch your changes and update the DOM as few times as possible. Also, avoid using global variables as they can slow down lookups.
PHP is a popular server-side scripting language. It’s simple to use but needs careful handling to optimize performance. Here are some PHP performance tips.
Firstly, use the latest PHP version. Each new version comes with performance improvements and new features that can boost speed. Secondly, use native PHP functions whenever possible. These functions are faster and more efficient than custom code.
Also, consider using a PHP accelerator like OPcache. PHP accelerators can significantly improve performance by caching precompiled script bytecode, thereby reducing the need for PHP to load and parse scripts with each request.
Web application developers today can automatically push their code through build, test, and deploy, but are not always sure how the code will perform in production. Web application performance testing is the solution to this problem, and should be an important part of your testing strategy.
A poorly performing website or web application will also do worse on SEO and subsequently get less traffic than competitors, and is likely to have lower engagement, lower conversion, and lower revenues. Effective website and web application performance testing can improve all these metrics by ensuring the development team pays consistent attention to performance.
Here are a few tools web application developers can use to test and improve performance on an ongoing basis.
PageSpeed Insights by Google checks several on-page and back-end factors of a web page, and reports on their effect on page load time. It provides a performance score for desktop and mobile, shows which elements have the biggest impact on page load time, and provides suggestions for improvement.
GTmetrix, a free tool which is based on the Google Lighthouse performance benchmark, provides five different reports showcasing website performance:
GTmetrix helps visualize page performance and lets you set up alerts to notify you about performance issues. It also provides extensive support for testing performance on mobile devices.
Pingdom is a commercial offering that provides website uptime and page speed monitoring, real user monitoring (RUM) showing how actual visitors experience your web pages, and synthetic session monitoring. Pingdom has a global infrastructure with testing servers in 100 countries. It sends notifications by email and SMS, and integrates with collaboration and alerting tools like Slack and PagerDuty.
WebPageTest is a free tool that lets you test your web pages from 40 locations, using 25 browsers. It performs in-depth performance tests covering topics like:
Website Speed Test analyzes a website’s images, which are responsible for a large percentage of load time on most web pages. It inspects image format, fit, compression, and quality options, and provides suggestions for optimizing images, which can have a dramatic impact on page load time. The image analysis tool is integrated with WebPageTest.
Here are several simple best practices that can improve performance for your web pages.
Minifying your code by removing spaces, commas and other unnecessary characters, as well as comments, formatting, and unused code, can significantly speed up your page. To automate this, use minification and uglification tools such as cssnano and UglifyJS.
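To show what minification does to a file, here is a deliberately naive sketch that strips comments and unnecessary whitespace from CSS using regular expressions. It is not how production minifiers work: it does not parse the stylesheet, so it can break CSS that contains these characters inside strings or URLs. The function name `minify_css` is an assumption for this example.

```python
import re


def minify_css(css: str) -> str:
    """Naive CSS minifier: removes comments and unnecessary whitespace."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # strip comments
    css = re.sub(r"\s+", " ", css)                        # collapse whitespace
    css = re.sub(r"\s*([{};,])\s*", r"\1", css)           # trim around punctuation
    css = re.sub(r":\s+", ":", css)                       # trim after colons
    css = css.replace(";}", "}")                          # drop last semicolon in a block
    return css.strip()


if __name__ == "__main__":
    source = """
    /* button styles */
    .button {
        color: red;
        margin: 0 auto;
    }
    """
    minified = minify_css(source)
    print(minified)
    print(f"{len(source)} characters reduced to {len(minified)}")
```

For real projects, a dedicated tool that parses the code is the safer choice, because it can apply the same reductions without changing behavior.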
Whenever a page is redirected to another page, the visitor faces a delay waiting for the HTTP request/response cycle to complete. It is not uncommon to see web pages redirected over three times, and each redirect adds a delay for the user. Not to mention redirect loops and errors that can result in a broken user experience.
Server response time is influenced by the amount of traffic received, the resources used by each page, software running on the server, and the hosting solution used. To speed up server response time, find and fix performance bottlenecks such as slow database queries, slow routing, and insufficient memory or CPU resources on the server. Aim for a server response time of under 200 ms.
Resize images to the required size before using them on a web page, and provide several versions of images for responsive designs. Ensure image files are compressed for the web. Use CSS sprites to combine images like buttons and icons into one large image—the file then loads immediately with fewer HTTP requests, and the web page shows only the relevant portions. This can significantly reduce page load time.
Learn more in the detailed guide to image optimization
Use next-generation image and video formats, which can provide much better compression ratios with higher quality:
Learn more in the detailed guide to next-generation image formats
You can optimize website images and videos by selecting the most appropriate format and compression, and delivering media files efficiently using content delivery networks (CDN).
There are many CMS plugins and tools available that can help automate image and video optimization. These plugins let you automatically convert images to the most appropriate format, and apply the best quality parameters to reduce file size while retaining quality.
Learn more in the detailed guide to video optimization
Lazy loading is a common and effective technique, which involves loading images only when the website visitor needs to see them.
Typically, the technique detects the user’s viewport, and loads images as the user scrolls down the page and sees them. This can significantly reduce the amount of data loaded to the user’s browser when they initially visit a page, and conserve bandwidth because images that are never viewed by the user do not need to be downloaded.
Learn more in the detailed guide to lazy loading
Lumigo, together with several partner websites, has authored a large repository of content that can help you learn about many aspects of performance testing for cloud native and web applications. Check out the articles below for objective, concise reviews of key performance testing topics.
Authored by Lumigo
Learn how to monitor serverless applications in production, making them observable and easy to maintain and troubleshoot.
See top articles in our serverless monitoring guide:
Authored by Lumigo
Learn how to debug serverless applications. Understand the differences between debugging for monolithic and microservices apps, and understand serverless testing.
See top articles in our serverless debugging guide:
Authored by Lumigo
Learn how to optimize AWS Lambda performance, overcoming challenges like short-running functions, timeouts, and cold starts.
See top articles in our AWS Lambda performance guide:
Authored by NetApp
Learn to monitor workloads on AWS using first-party and third-party tools, and discover best practices for performance and cost optimization.
See top articles in the AWS monitoring guide:
Authored by Cloudinary
Learn how to optimize images and use compression, quality settings, CDN and other techniques to dramatically reduce page load time.
See top articles in the image optimization guide:
Authored by Cloudinary
Discover next-generation image formats that can help you improve web performance. New formats like WebP and JPEG-XR deliver high quality with improved compression.
See top articles in our image formats guide:
Authored by Granulate
Authored by Cloudinary
Learn how to optimize video content for higher performance and improved user experience, by using the latest compression and streaming technology, and automatically adjusting videos to user requirements.
See top articles in our video optimization guide:
Authored by Cloudinary
Below are additional articles that can help you learn about performance testing topics.