Performance testing generates metrics across multiple system layers. This guide defines each metric, explains how it is measured, and provides industry-accepted thresholds for setting pass/fail criteria.
Where to start
If you are new to performance testing or unsure which metrics to focus on, start with these three:
| Metric | What it tells you | Typical threshold |
| --- | --- | --- |
| Response time (RT) | How long users wait for a response | Varies by industry; typically < 500 ms for internet services |
| Throughput (TPS/QPS) | How many transactions or queries the system handles per second | Higher is better; depends on your business volume |
| Error rate | What percentage of requests fail | < 0.6% (success rate >= 99.4%) |
These three metrics -- response time, throughput, and error rate -- correspond to the Duration, Rate, and Errors of the RED method widely used in SRE and observability practices. Once you have a baseline for these, expand to the resource-level and component-level metrics described in the following sections.
System performance metrics
Response time
Response time (RT) is the elapsed time from when a client sends a request to when it receives a complete response. In a performance test, RT is measured from the moment load is applied until the server returns a result. RT is typically reported in seconds or milliseconds.
Average vs. percentile response time
Average response time is the mean RT across all requests for the same transaction while the system runs at steady state. Averages are useful as a general indicator, but they can mask outliers. A system with a 200 ms average RT might still have a p99 of 3 seconds, meaning 1% of users experience unacceptable latency. When available, use percentile metrics (p50, p90, p95, p99) alongside the average for a more accurate picture.
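The average-versus-tail gap described above can be sketched with a nearest-rank percentile. This is a minimal illustration with made-up latency samples, not code from any particular load tool:

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample covering p% of all samples."""
    s = sorted(samples_ms)
    rank = max(1, math.ceil(p / 100 * len(s)))
    return s[rank - 1]

# 98 fast requests and 2 slow outliers: the average looks healthy,
# but p99 reveals the tail latency that 1% of users actually hit.
samples = [200] * 98 + [3000] * 2
avg = sum(samples) / len(samples)   # 256.0 ms
p99 = percentile(samples, 99)       # 3000 ms
```

Reporting p99 (3000 ms) alongside the average (256 ms) makes the hidden outliers visible.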
Response time categories:
| Category | Description |
| --- | --- |
| Simple transaction RT | Single-step operations such as a query or lookup |
| Complex transaction RT | Multi-step operations involving joins, calculations, or external calls |
| Special transaction RT | Transactions with known performance characteristics that differ from the norm (document the specifics when defining these) |
Industry benchmarks for online transactions
| Industry | Acceptable RT |
| --- | --- |
| Internet services | < 500 ms (e.g., Taobao targets 10 ms) |
| Financial services | < 1 s (< 3 s for complex transactions) |
| Insurance | < 3 s |
| Manufacturing | < 5 s |
For batch processing, acceptable duration depends on the data volume and the available time window. As a general guideline, large-scale batch jobs (such as those run during promotional events like Double 11 or 99 Promotion) should complete within 2 hours.
Throughput
Throughput measures the number of operations processed per unit of time. Three common throughput metrics exist:
| Metric | Full name | What it measures |
| --- | --- | --- |
| TPS | Transactions Per Second | End-to-end business transactions completed per second |
| QPS | Queries Per Second | Database or API queries processed per second |
| HPS | Hits Per Second | HTTP requests (clicks) received by the server per second |
For a simple application where each user action generates a single request, TPS, QPS, and HPS are equivalent. In more complex systems, TPS reflects the full business workflow, QPS counts individual query operations, and HPS counts raw HTTP hits.
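The three counts differ only in which events you bucket per second. A minimal sketch of per-second bucketing (the timestamps are invented for illustration): applied to completed transactions it yields TPS, to individual queries QPS, and to raw HTTP hits HPS.

```python
from collections import Counter

def per_second_counts(timestamps):
    """Bucket event timestamps (seconds since test start) into 1 s windows."""
    return Counter(int(t) for t in timestamps)

counts = per_second_counts([0.1, 0.4, 0.9, 1.2, 1.5, 2.7])
peak = max(counts.values())  # peak rate of 3 events in one second
```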
Higher values indicate greater processing capacity. The following ranges are typical:
| Industry | Typical TPS range |
| --- | --- |
| E-commerce | 10,000 -- 1,000,000 |
| Financial services (excluding internet-based) | 1,000 -- 50,000 |
| Insurance (excluding internet-based) | 100 -- 100,000 |
| Medium-sized internet services | 1,000 -- 50,000 |
| Small internet services | 500 -- 10,000 |
| Manufacturing | 10 -- 5,000 |
Concurrent users
Concurrent users is the number of users actively logged in and performing operations at a given point in time. In test configurations, this metric is represented as Virtual Users (VU).
The relationship between concurrent users and system capacity depends on the connection model:
Persistent connections: The maximum number of concurrent users directly represents the system's concurrency capacity.
Short-lived connections: The maximum number of concurrent users does not equal concurrency capacity, because it also depends on system architecture and throughput. If the system provides high throughput and connections are reused efficiently, concurrent users can exceed the number of simultaneous connections.
Choosing the right test mode
For most systems that use short-lived connections, throughput-based testing (TPS or RPS mode) is more representative than concurrency-based testing. Performance Testing (PTS) supports RPS mode, which simplifies throughput test setup and measurement.
Note: Performance tests typically aim to measure system processing capacity rather than maximum concurrent users. A small number of concurrent users generating high request rates can stress a system just as effectively as a large number of idle users.
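One common way to relate concurrency, throughput, and response time is Little's Law (concurrency = throughput x response time). The guide above does not prescribe this formula, so treat the following as a supplementary sketch for sizing virtual users in an RPS-mode test:

```python
import math

def required_vus(target_tps, avg_rt_seconds):
    """Little's Law estimate: virtual users needed to sustain target_tps
    when each request takes avg_rt_seconds on average."""
    return math.ceil(target_tps * avg_rt_seconds)

vus = required_vus(target_tps=1000, avg_rt_seconds=0.25)  # 250 VUs for 1000 TPS at 250 ms RT
```

This also illustrates the note above: at a 250 ms RT, only 250 busy users are needed to drive 1,000 TPS.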
Error rate
Error rate, also called failure ratio (FR), is the percentage of failed transactions out of all transactions:
Error rate = (Failed transactions / Total transactions) x 100%

In a stable system, request failures are typically caused by timeouts, so the error rate approximates the timeout rate.
Threshold: Error rate should stay below 0.6% (6 per mille), corresponding to a success rate of 99.4% or higher.
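The formula and threshold check above translate directly to code; the failure counts below are made-up examples:

```python
def error_rate(failed, total):
    """Error rate = (failed transactions / total transactions) x 100%."""
    return failed / total * 100 if total else 0.0

rate = error_rate(failed=45, total=10000)  # 0.45%
passed = rate < 0.6                        # within the 0.6% threshold
```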
Resource metrics
Resource metrics identify infrastructure bottlenecks. Monitor these on the server side during load tests.
CPU
| Metric | Threshold | Description |
| --- | --- | --- |
| CPU utilization (overall) | <= 75% | Non-idle CPU time (user + sys + wait) as a share of total CPU time |
| CPU sys% | <= 30% | Time spent in kernel mode |
| CPU wait% | <= 5% | Time spent waiting for I/O |
| CPU load | < number of CPU cores | Average run queue length (load average) |
These thresholds apply to every CPU, including single-core instances. The load threshold specifically requires that the system load average remain below the total number of CPU cores.
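The load threshold can be checked with the standard library on Linux and macOS (`os.getloadavg` is not available on Windows); a minimal sketch:

```python
import os

def load_within_threshold():
    """Compare the 1-minute load average to the core count,
    per the CPU load threshold above (load < number of cores)."""
    load1, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    return load1 < cores, load1, cores

ok, load1, cores = load_within_threshold()
```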
Memory
High memory utilization alone does not indicate a bottleneck. Modern operating systems use free memory for caching, so 100% memory utilization does not necessarily mean a memory bottleneck. Monitor swap usage as the primary indicator of memory pressure instead.
Threshold: Swap usage should stay below 70%. Exceeding this value typically degrades system performance.
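On Linux, swap usage can be derived from the SwapTotal and SwapFree values reported in /proc/meminfo. A sketch with made-up numbers (values in KB, as /proc/meminfo reports them):

```python
def swap_usage_percent(swap_total_kb, swap_free_kb):
    """Percentage of swap space in use; returns 0 if no swap is configured."""
    if swap_total_kb == 0:
        return 0.0
    return (swap_total_kb - swap_free_kb) / swap_total_kb * 100

usage = swap_usage_percent(swap_total_kb=4194304, swap_free_kb=2097152)  # 50.0
within_threshold = usage < 70
```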
Disk throughput
Disk throughput is the volume of data read from or written to disk per unit of time while the disk operates without failure.
| Metric | Description |
| --- | --- |
| IOPS | Input/output operations per second |
| Disk busy% | Percentage of time the disk is actively processing requests |
| Disk queue length | Number of pending I/O requests |
| Average service time | Mean time to complete a single I/O operation |
| Average wait time | Mean time an I/O request waits in the queue |
| Disk usage | Percentage of disk capacity consumed |
Threshold: Disk busy% should stay below 70%.
Network throughput
Network throughput is the volume of data transmitted over the network per unit of time while the network operates without failure, measured in bytes per second (B/s) or megabits per second (Mbps).
Threshold: Network utilization should not exceed 70% of the maximum link or device capacity.
Kernel parameters
The following OS kernel parameters affect system capacity under load. Review and tune these before running performance tests.
| Parameter | Unit | Description |
| --- | --- | --- |
| Maxuprc | -- | Maximum processes per user |
| Max_thread_proc | -- | Maximum threads per process |
| Filecache_max | Bytes | Maximum physical memory for file I/O cache |
| Ninode | -- | Maximum in-memory inodes (HFS) |
| Nkthread | -- | Maximum concurrent kernel threads |
| Nproc | -- | Maximum concurrent processes |
| Nstrpty | -- | Maximum stream-based pseudo terminal slaves |
| Maxdsiz | Bytes | Maximum data segment size per process |
| maxdsiz_64bit | Bytes | Maximum data segment size per process (64-bit) |
| maxfiles_lim | -- | Maximum file descriptors per process |
| maxssiz_64bit | Bytes | Maximum stack size per process |
| Maxtsiz | Bytes | Maximum text segment size per process |
| nflocks | -- | Maximum file locks |
| maxtsiz_64bit | Bytes | Maximum text segment size per process (64-bit) |
| msgmni | -- | Maximum System V IPC message queue IDs |
| msgtql | -- | Maximum System V IPC messages |
| npty | -- | Maximum BSD pseudo TTYs |
| nstrtel | -- | Kernel-supported telnet device files |
| nswapdev | -- | Maximum swap devices |
| nswapfs | -- | Maximum swap file systems |
| semmni | -- | System V IPC semaphore IDs |
| semmns | -- | System V IPC semaphores |
| shmmax | Bytes | Maximum System V shared memory segment size |
| shmmni | -- | System V shared memory IDs |
| shmseg | -- | Maximum System V shared memory segments per process |
Middleware metrics
Common metrics for Java-based middleware (such as Tomcat and WebLogic) fall into three categories: garbage collection (GC), thread pool, and JDBC connections.
| Category | Metric | Unit | Description |
| --- | --- | --- | --- |
| GC | GC frequency | -- | Partial garbage collection events per time interval |
| GC | Full GC frequency | -- | Full garbage collection events per time interval |
| GC | Average full GC duration | Seconds | Mean time to complete a full GC cycle |
| GC | Maximum full GC duration | Seconds | Longest observed full GC pause |
| GC | Heap usage | % | Percentage of JVM heap memory in use |
| Thread pool | Active thread count | -- | Threads currently processing requests |
| Thread pool | Pending user requests | -- | Requests queued and waiting for a thread |
| JDBC | Active JDBC connections | -- | Database connections currently in use |
Recommended baselines (under normal system performance):
| Metric | Guideline |
| --- | --- |
| Active threads | Min: 50, Max: 200 |
| Active JDBC connections | Min: 50, Max: 200 |
| Full GC frequency | Should be infrequent; frequent full GC indicates memory pressure |
| JVM heap size | Set both minimum and maximum to 1024 MB as a starting point |
Database metrics
Common metrics for relational databases (such as MySQL) cover SQL execution, throughput, cache efficiency, and locking.
| Category | Metric | Unit | Description |
| --- | --- | --- | --- |
| SQL | Execution duration | Microseconds | Time to run a single SQL statement |
| Throughput | QPS | -- | Queries processed per second |
| Throughput | TPS | -- | Transactions committed per second |
| Cache hit ratio | Key buffer hit ratio | % | Index buffer cache effectiveness |
| Cache hit ratio | InnoDB buffer hit ratio | % | InnoDB buffer pool cache effectiveness |
| Cache hit ratio | Query cache hit ratio | % | Query result cache effectiveness |
| Cache hit ratio | Table cache hit ratio | % | Table metadata cache effectiveness |
| Cache hit ratio | Thread cache hit ratio | % | Thread reuse effectiveness |
| Lock | Lock waits | -- | Number of times a query waited for a lock |
| Lock | Lock wait time | Microseconds | Total time spent waiting for locks |
Thresholds:
| Metric | Guideline |
| --- | --- |
| SQL execution duration | Lower is better; target microsecond-level execution |
| Cache hit ratios | >= 95% |
| Lock waits and wait time | Lower is better; high values indicate contention |
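As one example, the InnoDB buffer hit ratio can be derived from the MySQL status counters `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that went to disk), as exposed by `SHOW GLOBAL STATUS`. A sketch with made-up counter values:

```python
def innodb_buffer_hit_ratio(read_requests, disk_reads):
    """Hit ratio (%) = (1 - disk reads / logical read requests) x 100.
    Counter names follow MySQL's SHOW GLOBAL STATUS output."""
    if read_requests == 0:
        return 100.0  # no reads yet; treat as fully cached
    return (1 - disk_reads / read_requests) * 100

ratio = innodb_buffer_hit_ratio(read_requests=1_000_000, disk_reads=20_000)
meets_threshold = ratio >= 95  # per the >= 95% guideline above
```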
Frontend metrics
Frontend metrics capture the end-user experience in browser-based applications.
Page rendering
| Metric | Unit | Description |
| --- | --- | --- |
| First Contentful Paint (FCP) | ms | Time until the first visible content appears after navigating to a URL |
| OnLoad event time | ms | Time until the browser fires the onload event |
| Time to fully loaded | ms | Time until all page resources, including asynchronously loaded content, finish loading |
Page composition
| Metric | Unit | Description |
| --- | --- | --- |
| Page size | KB | Total size of all downloaded resources |
| Request count | -- | Total number of network requests required to load the page (lower is better) |
Network timing
| Metric | Unit | Description |
| --- | --- | --- |
| DNS lookup time | ms | Time to resolve the domain name |
| Connection time | ms | Time to establish a TCP/IP connection |
| Server time | ms | Time the server spends processing the request |
| Transfer time | ms | Time to transmit the response body |
| Wait time | ms | Time waiting for a resource to become available |
Thresholds: Minimize page size (use compression) and aim for the lowest practical page rendering times.
Stability metrics
Stability testing determines whether the system maintains consistent performance over extended periods under sustained load.
Test conditions: Apply load at 80% of the system's maximum capacity or at the expected daily peak.
Duration requirements:
| System type | Minimum stable run time |
| --- | --- |
| Business-hours system (8 h/day) | 8 hours |
| 24/7 system | 24 hours |
Pass criteria:
TPS remains flat with no significant fluctuations.
No resource leaks (memory, connections, file handles) or exceptions occur.
If the system cannot sustain stable performance, expect degradation or crashes as load duration increases.
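One way to quantify "TPS remains flat" is the coefficient of variation (standard deviation divided by mean) of per-interval TPS samples. The 5% tolerance below is an assumed value for illustration, not one this guide defines:

```python
import statistics

def tps_is_stable(tps_samples, max_cv=0.05):
    """Return True if TPS fluctuation (coefficient of variation) stays
    within max_cv. max_cv=0.05 (5%) is an assumed tolerance."""
    mean = statistics.fmean(tps_samples)
    cv = statistics.pstdev(tps_samples) / mean
    return cv <= max_cv

stable = tps_is_stable([1000, 990, 1010, 1005, 995])  # small jitter: stable
```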
Batch processing metrics
Batch processing throughput is the volume of data processed per unit of time, typically measured in records per second. Processing efficiency is the primary metric for estimating batch processing time windows.
Considerations:
Multiple batch jobs may run simultaneously with overlapping time windows. Account for this when planning capacity.
Long-running batch jobs can degrade online transaction performance. Schedule batch windows to minimize overlap with peak traffic.
Thresholds:
For large data volumes, keep the batch window as short as possible.
Batch processing must not degrade online transaction response times.
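The batch window follows directly from record count and processing throughput. A sketch with hypothetical volumes, checked against the 2-hour guideline above:

```python
def batch_window_hours(record_count, records_per_second):
    """Estimated batch window (hours) = records / throughput / 3600."""
    return record_count / records_per_second / 3600

hours = batch_window_hours(record_count=720_000_000, records_per_second=100_000)  # 2.0
within_window = hours <= 2
```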
Scalability metrics
Scalability quantifies the relationship between added resources and the resulting performance gain in a clustered or distributed deployment:
Scalability = (Performance increase / Original performance) / (Resource increase / Original resources) x 100%

Run multiple tests with incrementally added resources to observe the scalability trend. A highly scalable system shows a linear or near-linear relationship between resources and performance. Large-scale distributed systems often have high scalability.
Thresholds:
Ideal: Linear scaling (100%).
Acceptable: Performance increase of 70% or more relative to the resource increase.
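The scalability formula applies directly to before/after measurements. A sketch with hypothetical numbers:

```python
def scalability_percent(perf_before, perf_after, res_before, res_after):
    """Scalability = (perf increase / original perf)
                   / (resource increase / original resources) x 100%."""
    perf_gain = (perf_after - perf_before) / perf_before
    res_gain = (res_after - res_before) / res_before
    return perf_gain / res_gain * 100

# Doubling nodes (4 -> 8) raised TPS from 10,000 to 17,000:
# 70% scalability, which just meets the "acceptable" threshold above.
s = scalability_percent(10_000, 17_000, 4, 8)
```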
Reliability metrics
Reliability testing validates that the system recovers from failures without data loss or extended downtime.
Hot standby
Evaluate the following during failover and failback:
| Checkpoint | What to verify |
| --- | --- |
| Failover success | Does the standby node take over, and how long does the switchover take? |
| Business continuity during failover | Is service interrupted during the switchover? |
| Failback success | Does the primary node resume, and how long does the switchback take? |
| Business continuity during failback | Is service interrupted during the switchback? |
| Data integrity | How much data, if any, is lost during the switchback? |
Run these tests under realistic load using a pressure generation tool to match production conditions.
Cluster architecture
| Checkpoint | What to verify |
| --- | --- |
| Node failure | Is service interrupted when a cluster node fails? |
| Node addition | Does adding a new node require a system restart? |
| Node recovery restart | Does the system need to be restarted when a recovered node is re-added to the cluster? |
| Node recovery continuity | Is service interrupted when a recovered node is re-added to the cluster? |
| Switchover duration | How long does the node switchover take? |
Run cluster reliability tests under load to produce results consistent with production behavior.
Backup and recovery
| Checkpoint | What to verify |
| --- | --- |
| Backup success | Does the backup complete, and how long does it take? |
| Backup automation | Is the backup process scripted and repeatable? |
| Recovery success | Does the recovery complete, and how long does it take? |
| Recovery automation | Is the recovery process scripted and repeatable? |
Choose the right metrics for your test
Not every test requires every metric. Select metrics based on your test objective:
| Test objective | Recommended metrics |
| --- | --- |
| Measure system capacity | Response time, TPS/QPS, error rate, CPU, memory |
| Validate frontend performance | FCP, OnLoad time, page size, request count, network timing |
| Verify stability under sustained load | TPS trend over time, resource utilization (check for leaks) |
| Evaluate batch processing | Processing throughput (records/s), batch window duration |
| Confirm scalability | Scalability ratio across incremental resource additions |
| Test reliability and failover | Failover/failback time, data loss, service continuity |
| Assess concurrent user capacity | Virtual users, connection limits, response time under concurrency |
After collecting metrics, document the test prerequisites -- including workload profiles, data volumes, and system resource specifications -- to make the results reproducible and comparable. When verifying system performance capacity, specify the metric requirements in your test plan based on the metric definitions described in this guide.