
Performance Testing: Test metrics

Last Updated: Mar 11, 2026

Performance testing generates metrics across multiple system layers. This guide defines each metric, explains how it is measured, and provides industry-accepted thresholds for setting pass/fail criteria.

Where to start

If you are new to performance testing or unsure which metrics to focus on, start with these three:

| Metric | What it tells you | Typical threshold |
| --- | --- | --- |
| Response time (RT) | How long users wait for a response | Varies by industry; typically < 500 ms for internet services |
| Throughput (TPS/QPS) | How many transactions or queries the system handles per second | Higher is better; depends on your business volume |
| Error rate | What percentage of requests fail | < 0.6% (success rate >= 99.4%) |

These three metrics -- duration (latency), rate (throughput), and errors -- correspond to the RED method (Rate, Errors, Duration) widely used in SRE and observability practices. Once you have a baseline for these, expand to the resource-level and component-level metrics described in the following sections.

System performance metrics

Response time

Response time (RT) is the elapsed time from when a client sends a request to when it receives a complete response. In a performance test, RT is measured from the moment load is applied until the server returns a result. RT is typically reported in seconds or milliseconds.

Average vs. percentile response time

Average response time is the mean RT across all requests for the same transaction while the system runs at steady state. Averages are useful as a general indicator, but they can mask outliers. A system with a 200 ms average RT might still have a p99 of 3 seconds, meaning 1% of users experience unacceptable latency. When available, use percentile metrics (p50, p90, p95, p99) alongside the average for a more accurate picture.
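The gap between an average and a tail percentile is easy to demonstrate. The sketch below (Python, using the nearest-rank percentile method; the latency values are invented for illustration) shows a data set whose average looks healthy while the p99 exposes the slow tail:

```python
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: the value at the p% position of the sorted data."""
    ordered = sorted(samples)
    rank = round(p / 100 * len(ordered))            # 1-based nearest-rank position
    index = max(0, min(len(ordered) - 1, rank - 1))
    return ordered[index]

# Hypothetical latencies (ms) collected during one steady-state test run
latencies_ms = [120, 150, 180, 200, 210, 230, 250, 400, 900, 3000]

print(statistics.mean(latencies_ms))   # 564 -- the average hides the tail
print(percentile(latencies_ms, 50))    # 210
print(percentile(latencies_ms, 99))    # 3000 -- the slow 1% the average masks
```

Load-testing tools usually report these percentiles directly; the point is to read them together with the average, not in place of it.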

Response time categories:

| Category | Description |
| --- | --- |
| Simple transaction RT | Single-step operations such as a query or lookup |
| Complex transaction RT | Multi-step operations involving joins, calculations, or external calls |
| Special transaction RT | Transactions with known performance characteristics that differ from the norm (document the specifics when defining these) |

Industry benchmarks for online transactions

| Industry | Acceptable RT |
| --- | --- |
| Internet services | < 500 ms (e.g., Taobao targets 10 ms) |
| Financial services | < 1 s (< 3 s for complex transactions) |
| Insurance | < 3 s |
| Manufacturing | < 5 s |

For batch processing, acceptable duration depends on the data volume and the available time window. As a general guideline, large-scale batch jobs (such as those run during promotional events like Double 11 or 99 Promotion) should complete within 2 hours.

Throughput

Throughput measures the number of operations processed per unit of time. Three common throughput metrics exist:

| Metric | Full name | What it measures |
| --- | --- | --- |
| TPS | Transactions Per Second | End-to-end business transactions completed per second |
| QPS | Queries Per Second | Database or API queries processed per second |
| HPS | Hits Per Second | HTTP requests (hits) received by the server per second |

For a simple application where each user action generates a single request, TPS, QPS, and HPS are equivalent. In more complex systems, TPS reflects the full business workflow, QPS counts individual query operations, and HPS counts raw HTTP hits.
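The relationship between the three metrics can be made concrete. In the sketch below (Python; the workload numbers and the 3-requests-per-transaction ratio are assumptions for illustration), each business transaction fans out into multiple HTTP requests, so HPS is a multiple of TPS over the same window:

```python
def throughput(operations, window_seconds):
    """Operations completed per second over a fixed measurement window."""
    return operations / window_seconds

# Hypothetical workload: each business transaction issues 3 HTTP requests
transactions_completed = 1200
requests_per_transaction = 3
window_seconds = 60

tps = throughput(transactions_completed, window_seconds)
hps = throughput(transactions_completed * requests_per_transaction, window_seconds)
print(tps, hps)  # 20.0 60.0
```

When comparing runs, always state which of the three you measured; a "throughput" number without its unit of work is ambiguous.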

Higher values indicate greater processing capacity. The following ranges are typical:

| Industry | Typical TPS range |
| --- | --- |
| E-commerce | 10,000 -- 1,000,000 |
| Financial services (excluding internet-based) | 1,000 -- 50,000 |
| Insurance (excluding internet-based) | 100 -- 100,000 |
| Medium-sized internet services | 1,000 -- 50,000 |
| Small internet services | 500 -- 10,000 |
| Manufacturing | 10 -- 5,000 |

Concurrent users

Concurrent users is the number of users actively logged in and performing operations at a given point in time. In test configurations, this metric is represented as Virtual Users (VU).

The relationship between concurrent users and system capacity depends on the connection model:

  • Persistent connections: The maximum number of concurrent users directly represents the system's concurrency capacity.

  • Short-lived connections: The maximum number of concurrent users does not equal concurrency capacity, because it also depends on system architecture and throughput. If the system provides high throughput and connections are reused efficiently, concurrent users can exceed the number of simultaneous connections.

Choosing the right test mode

For most systems that use short-lived connections, throughput-based testing (TPS or RPS mode) is more representative than concurrency-based testing. Performance Testing (PTS) supports RPS mode, which simplifies throughput test setup and measurement.

Note: Performance tests typically aim to measure system processing capacity rather than maximum concurrent users. A small number of concurrent users generating high request rates can stress a system just as effectively as a large number of idle users.

Error rate

Error rate, also called failure ratio (FR), is the percentage of failed transactions out of all transactions:

Error rate = (Failed transactions / Total transactions) x 100%

In a stable system, most request failures are caused by timeouts, so the error rate closely tracks the timeout rate.

Threshold: Error rate should stay below 0.6% (6 per mille), corresponding to a success rate of 99.4% or higher.
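The formula and threshold translate directly into a pass/fail check. A minimal sketch in Python (the transaction counts are invented for illustration):

```python
def error_rate_pct(failed, total):
    """Error rate as a percentage of all transactions."""
    return failed / total * 100

def meets_error_sla(failed, total, max_error_pct=0.6):
    """True if the run stays within the 0.6% error-rate threshold."""
    return error_rate_pct(failed, total) <= max_error_pct

print(error_rate_pct(5, 1000))    # 0.5
print(meets_error_sla(5, 1000))   # True  (99.5% success rate)
print(meets_error_sla(7, 1000))   # False (99.3% success rate)
```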

Resource metrics

Resource metrics identify infrastructure bottlenecks. Monitor these on the server side during load tests.

CPU

| Metric | Threshold | Description |
| --- | --- | --- |
| CPU utilization (overall) | <= 75% | Total CPU usage across user, sys, wait, and idle modes |
| CPU sys% | <= 30% | Time spent in kernel mode |
| CPU wait% | <= 5% | Time spent waiting for I/O |
| CPU load | < number of CPU cores | Average run queue length |

These thresholds apply to every CPU, including single-core instances. The load threshold specifically requires that the system load average remain below the total number of CPU cores.
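The load threshold can be checked with the standard library on POSIX systems. A minimal sketch (Python; `os.getloadavg` is not available on Windows):

```python
import os

def cpu_load_ok():
    """True if the 1-minute load average is below the number of CPU cores.

    os.getloadavg() returns the 1-, 5-, and 15-minute load averages
    (POSIX only); os.cpu_count() returns the total core count.
    """
    one_minute, _, _ = os.getloadavg()
    return one_minute < os.cpu_count()
```

During a load test, sample this periodically on each server under test rather than once at the end.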

Memory

High memory utilization alone does not indicate a bottleneck: modern operating systems use otherwise-free memory for file caching, so utilization near 100% is normal. Monitor swap usage as the primary indicator of memory pressure instead.

Threshold: Swap usage should stay below 70%. Exceeding this value typically degrades system performance.

Disk throughput

Disk throughput is the volume of data read from or written to disk per unit of time while the disk is operating normally.

| Metric | Description |
| --- | --- |
| IOPS | Input/output operations per second |
| Disk busy% | Percentage of time the disk is actively processing requests |
| Disk queue length | Number of pending I/O requests |
| Average service time | Mean time to complete a single I/O operation |
| Average wait time | Mean time an I/O request waits in the queue |
| Disk usage | Percentage of disk capacity consumed |

Threshold: Disk busy% should stay below 70%.

Network throughput

Network throughput is the volume of data transmitted over the network per unit of time while the network is operating normally, measured in bytes per second (B/s) or megabits per second (Mbps).

Threshold: Network utilization should not exceed 70% of the maximum link or device capacity.

Kernel parameters

The following OS kernel parameters affect system capacity under load. Review and tune these before running performance tests.

| Parameter | Unit | Description |
| --- | --- | --- |
| Maxuprc | -- | Maximum processes per user |
| Max_thread_proc | -- | Maximum threads per process |
| Filecache_max | Bytes | Maximum physical memory for file I/O cache |
| Ninode | -- | Maximum in-memory inodes (HFS) |
| Nkthread | -- | Maximum concurrent kernel threads |
| Nproc | -- | Maximum concurrent processes |
| Nstrpty | -- | Maximum stream-based pseudo terminal slaves |
| Maxdsiz | Bytes | Maximum data segment size per process |
| maxdsiz_64bit | Bytes | Maximum data segment size per process (64-bit) |
| maxfiles_lim | -- | Maximum file descriptors per process |
| maxssiz_64bit | Bytes | Maximum stack size per process |
| Maxtsiz | Bytes | Maximum text segment size per process |
| nflocks | -- | Maximum file locks |
| maxtsiz_64bit | Bytes | Maximum text segment size per process (64-bit) |
| msgmni | -- | Maximum System V IPC message queue IDs |
| msgtql | -- | Maximum System V IPC messages |
| npty | -- | Maximum BSD pseudo TTYs |
| nstrtel | -- | Kernel-supported telnet device files |
| nswapdev | -- | Maximum swap devices |
| nswapfs | -- | Maximum swap file systems |
| semmni | -- | System V IPC semaphore IDs |
| semmns | -- | System V IPC semaphores |
| shmmax | Bytes | Maximum System V shared memory segment size |
| shmmni | -- | System V shared memory IDs |
| shmseg | -- | Maximum System V shared memory segments per process |

Middleware metrics

Common metrics for Java-based middleware (such as Tomcat and WebLogic) fall into three categories: garbage collection (GC), thread pool, and JDBC connections.

| Category | Metric | Unit | Description |
| --- | --- | --- | --- |
| GC | GC frequency | -- | Partial garbage collection events per time interval |
| GC | Full GC frequency | -- | Full garbage collection events per time interval |
| GC | Average full GC duration | Seconds | Mean time to complete a full GC cycle |
| GC | Maximum full GC duration | Seconds | Longest observed full GC pause |
| GC | Heap usage | % | Percentage of JVM heap memory in use |
| Thread pool | Active thread count | -- | Threads currently processing requests |
| Thread pool | Pending user requests | -- | Requests queued and waiting for a thread |
| JDBC | Active JDBC connections | -- | Database connections currently in use |

Recommended baselines (under normal system performance):

| Metric | Guideline |
| --- | --- |
| Active threads | Min: 50, Max: 200 |
| Active JDBC connections | Min: 50, Max: 200 |
| Full GC frequency | Should be infrequent; frequent full GC indicates memory pressure |
| JVM heap size | Set both minimum and maximum to 1024 MB as a starting point |

Database metrics

Common metrics for relational databases (such as MySQL) cover SQL execution, throughput, cache efficiency, and locking.

| Category | Metric | Unit | Description |
| --- | --- | --- | --- |
| SQL | Execution duration | Microseconds | Time to run a single SQL statement |
| Throughput | QPS | -- | Queries processed per second |
| Throughput | TPS | -- | Transactions committed per second |
| Cache hit ratio | Key buffer hit ratio | % | Index buffer cache effectiveness |
| Cache hit ratio | InnoDB buffer hit ratio | % | InnoDB buffer pool cache effectiveness |
| Cache hit ratio | Query cache hit ratio | % | Query result cache effectiveness |
| Cache hit ratio | Table cache hit ratio | % | Table metadata cache effectiveness |
| Cache hit ratio | Thread cache hit ratio | % | Thread reuse effectiveness |
| Lock | Lock waits | -- | Number of times a query waited for a lock |
| Lock | Lock wait time | Microseconds | Total time spent waiting for locks |

Thresholds:

| Metric | Guideline |
| --- | --- |
| SQL execution duration | Lower is better; target microsecond-level execution |
| Cache hit ratios | >= 95% |
| Lock waits and wait time | Lower is better; high values indicate contention |

Frontend metrics

Frontend metrics capture the end-user experience in browser-based applications.

Page rendering

| Metric | Unit | Description |
| --- | --- | --- |
| First Contentful Paint (FCP) | ms | Time until the first visible content appears after navigating to a URL |
| OnLoad event time | ms | Time until the browser fires the onLoad event (all synchronous resources downloaded) |
| Time to fully loaded | ms | Time until all onLoad JavaScript completes and all dynamic or lazy-loaded content finishes loading |

Page composition

| Metric | Unit | Description |
| --- | --- | --- |
| Page size | KB | Total size of all downloaded resources |
| Request count | -- | Total number of network requests required to load the page (lower is better) |

Network timing

| Metric | Unit | Description |
| --- | --- | --- |
| DNS lookup time | ms | Time to resolve the domain name |
| Connection time | ms | Time to establish a TCP/IP connection |
| Server time | ms | Time the server spends processing the request |
| Transfer time | ms | Time to transmit the response body |
| Wait time | ms | Time waiting for a resource to become available |

Thresholds: Minimize page size (use compression) and aim for the lowest practical page rendering times.
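Browser developer tools and real-user-monitoring products report these network phases directly. For a server-side approximation, the first two phases can be measured with the standard library alone; a minimal sketch in Python (the host and port are whatever endpoint you choose to probe):

```python
import socket
import time

def dns_and_connect_ms(host, port=80, timeout=5.0):
    """Return (DNS lookup ms, TCP connect ms) for one host:port probe."""
    t0 = time.perf_counter()
    family, _, _, _, address = socket.getaddrinfo(
        host, port, proto=socket.IPPROTO_TCP)[0]
    t1 = time.perf_counter()                      # DNS resolution done
    with socket.socket(family, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(address)
        t2 = time.perf_counter()                  # TCP handshake done
    return (t1 - t0) * 1000.0, (t2 - t1) * 1000.0
```

Server time, transfer time, and wait time require instrumenting the full request, which is better left to the load-testing tool itself.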

Stability metrics

Stability testing determines whether the system maintains consistent performance over extended periods under sustained load.

Test conditions: Apply load at 80% of the system's maximum capacity or at the expected daily peak.

Duration requirements:

| System type | Minimum stable run time |
| --- | --- |
| Business-hours system (8 h/day) | 8 hours |
| 24/7 system | 24 hours |

Pass criteria:

  • TPS remains flat with no significant fluctuations.

  • No resource leaks (memory, connections, file handles) or exceptions occur.

If the system cannot sustain stable performance, expect degradation or crashes as load duration increases.

Batch processing metrics

Batch processing throughput is the volume of data processed per unit of time, typically measured in records per second. Processing efficiency is the primary metric for estimating batch processing time windows.
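Estimating a batch window from measured throughput is simple arithmetic. A sketch in Python (the record count and throughput are hypothetical, chosen to land exactly on the 2-hour guideline mentioned earlier for promotion-scale jobs):

```python
def batch_window_hours(total_records, records_per_second):
    """Estimated wall-clock window for a batch job at a measured throughput."""
    return total_records / records_per_second / 3600

# Hypothetical promotion-scale job: 360 million records at 50,000 records/s
print(batch_window_hours(360_000_000, 50_000))  # 2.0 -- at the guideline limit
```

If the estimate exceeds the available window, either raise throughput (parallelism, tuning) or shrink the per-run data volume.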

Considerations:

  • Multiple batch jobs may run simultaneously with overlapping time windows. Account for this when planning capacity.

  • Long-running batch jobs can degrade online transaction performance. Schedule batch windows to minimize overlap with peak traffic.

Thresholds:

  • For large data volumes, keep the batch window as short as possible.

  • Batch processing must not degrade online transaction response times.

Scalability metrics

Scalability quantifies the relationship between added resources and the resulting performance gain in a clustered or distributed deployment:

Scalability = (Performance increase / Original performance) / (Resource increase / Original resources) x 100%

Run multiple tests with incrementally added resources to observe the scalability trend. A highly scalable system shows a linear or near-linear relationship between resources and performance. Large-scale distributed systems often have high scalability.

Thresholds:

  • Ideal: Linear scaling (100%).

  • Acceptable: Performance increase of 70% or more relative to the resource increase.
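The formula and thresholds above reduce to a few lines of code. A sketch in Python (the node counts and TPS figures are invented to illustrate a run that just meets the 70% bar):

```python
def scalability_pct(perf_before, perf_after, resources_before, resources_after):
    """Scalability as defined above: relative performance gain over relative resource gain."""
    performance_gain = (perf_after - perf_before) / perf_before
    resource_gain = (resources_after - resources_before) / resources_before
    return performance_gain / resource_gain * 100

# Hypothetical run: doubling nodes (4 -> 8) lifts TPS from 10,000 to 17,000
print(scalability_pct(10_000, 17_000, 4, 8))  # 70.0 -- just meets the bar
```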

Reliability metrics

Reliability testing validates that the system recovers from failures without data loss or extended downtime.

Hot standby

Evaluate the following during failover and failback:

| Checkpoint | What to verify |
| --- | --- |
| Failover success | Does the standby node take over, and how long does the switchover take? |
| Business continuity during failover | Is service interrupted during the switchover? |
| Failback success | Does the primary node resume, and how long does the switchback take? |
| Business continuity during failback | Is service interrupted during the switchback? |
| Data integrity | How much data, if any, is lost during the switchback? |

Run these tests under realistic load using a pressure generation tool to match production conditions.

Cluster architecture

| Checkpoint | What to verify |
| --- | --- |
| Node failure | Is service interrupted when a cluster node fails? |
| Node addition | Does adding a new node require a system restart? |
| Node recovery restart | Does the system need to be restarted when a recovered node is re-added to the cluster? |
| Node recovery continuity | Is service interrupted when a recovered node is re-added to the cluster? |
| Switchover duration | How long does the node switchover take? |

Run cluster reliability tests under load to produce results consistent with production behavior.

Backup and recovery

| Checkpoint | What to verify |
| --- | --- |
| Backup success | Does the backup complete, and how long does it take? |
| Backup automation | Is the backup process scripted and repeatable? |
| Recovery success | Does the recovery complete, and how long does it take? |
| Recovery automation | Is the recovery process scripted and repeatable? |

Choose the right metrics for your test

Not every test requires every metric. Select metrics based on your test objective:

| Test objective | Recommended metrics |
| --- | --- |
| Measure system capacity | Response time, TPS/QPS, error rate, CPU, memory |
| Validate frontend performance | FCP, OnLoad time, page size, request count, network timing |
| Verify stability under sustained load | TPS trend over time, resource utilization (check for leaks) |
| Evaluate batch processing | Processing throughput (records/s), batch window duration |
| Confirm scalability | Scalability ratio across incremental resource additions |
| Test reliability and failover | Failover/failback time, data loss, service continuity |
| Assess concurrent user capacity | Virtual users, connection limits, response time under concurrency |

After collecting metrics, document the test prerequisites -- including workload profiles, data volumes, and system resource specifications -- to make the results reproducible and comparable. When verifying system performance capacity, specify the metric requirements in your test plan based on the metric definitions described in this guide.