Performance testing generates metrics across multiple system layers. This guide defines each metric, explains how it is measured, and provides industry-accepted thresholds for setting pass/fail criteria.
Where to start
If you are new to performance testing or unsure which metrics to focus on, start with these three:
| Metric | What it tells you | Typical threshold |
| --- | --- | --- |
| Response time (RT) | How long users wait for a response | Varies by industry; typically < 500 ms for internet services |
| Throughput (TPS/QPS) | How many transactions or queries the system handles per second | Higher is better; depends on your business volume |
| Error rate | What percentage of requests fail | < 0.6% (success rate >= 99.4%) |
These three metrics -- response time, throughput, and error rate -- correspond to the Duration, Rate, and Errors of the RED method widely used in SRE and observability practices. Once you have a baseline for these, expand to the resource-level and component-level metrics described in the following sections.
System performance metrics
Response time
Response time (RT) is the elapsed time from when a client sends a request to when it receives a complete response. In a performance test, RT is measured from the moment load is applied until the server returns a result. RT is typically reported in seconds or milliseconds.
Average vs. percentile response time
Average response time is the mean RT across all requests for the same transaction while the system runs at steady state. Averages are useful as a general indicator, but they can mask outliers. A system with a 200 ms average RT might still have a p99 of 3 seconds, meaning 1% of users experience unacceptable latency. When available, use percentile metrics (p50, p90, p95, p99) alongside the average for a more accurate picture.
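The average-versus-tail gap described above can be sketched with a nearest-rank percentile. This is a minimal illustration with made-up latency samples, not code from any particular load tool:

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample covering p% of all samples."""
    s = sorted(samples_ms)
    rank = max(1, math.ceil(p / 100 * len(s)))
    return s[rank - 1]

# 98 fast requests and 2 slow outliers: the average looks healthy,
# but p99 reveals the tail latency that 1% of users actually hit.
samples = [200] * 98 + [3000] * 2
avg = sum(samples) / len(samples)   # 256.0 ms
p99 = percentile(samples, 99)       # 3000 ms
```

Reporting p99 (3000 ms) alongside the average (256 ms) makes the hidden outliers visible.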
Response time categories:
| Category | Description |
| --- | --- |
| Simple transaction RT | Single-step operations such as a query or lookup |
| Complex transaction RT | Multi-step operations involving joins, calculations, or external calls |
| Special transaction RT | Transactions with known performance characteristics that differ from the norm (document the specifics when defining these) |
Industry benchmarks for online transactions
| Industry | Acceptable RT |
| --- | --- |
| Internet services | < 500 ms (e.g., Taobao targets 10 ms) |
| Financial services | < 1 s (< 3 s for complex transactions) |
| Insurance | < 3 s |
| Manufacturing | < 5 s |
For batch processing, acceptable duration depends on the data volume and the available time window. As a general guideline, large-scale batch jobs (such as those run during promotional events like Double 11 or 99 Promotion) should complete within 2 hours.
Throughput
Throughput measures the number of operations processed per unit of time. Three common throughput metrics exist:
| Metric | Full name | What it measures |
| --- | --- | --- |
| TPS | Transactions Per Second | End-to-end business transactions completed per second |
| QPS | Queries Per Second | Database or API queries processed per second |
| HPS | Hits Per Second | HTTP requests (clicks) received by the server per second |
For a simple application where each user action generates a single request, TPS, QPS, and HPS are equivalent. In more complex systems, TPS reflects the full business workflow, QPS counts individual query operations, and HPS counts raw HTTP hits.
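The three counts differ only in which events you bucket per second. A minimal sketch of per-second bucketing (the timestamps are invented for illustration): applied to completed transactions it yields TPS, to individual queries QPS, and to raw HTTP hits HPS.

```python
from collections import Counter

def per_second_counts(timestamps):
    """Bucket event timestamps (seconds since test start) into 1 s windows."""
    return Counter(int(t) for t in timestamps)

counts = per_second_counts([0.1, 0.4, 0.9, 1.2, 1.5, 2.7])
peak = max(counts.values())  # peak rate of 3 events in one second
```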
Higher values indicate greater processing capacity. The following ranges are typical:
| Industry | Typical TPS range |
| --- | --- |
| E-commerce | 10,000 -- 1,000,000 |
| Financial services (excluding internet-based) | 1,000 -- 50,000 |
| Insurance (excluding internet-based) | 100 -- 100,000 |
| Medium-sized internet services | 1,000 -- 50,000 |
| Small internet services | 500 -- 10,000 |
| Manufacturing | 10 -- 5,000 |
Concurrent users
Concurrent users is the number of users actively logged in and performing operations at a given point in time. In test configurations, this metric is represented as Virtual Users (VU).
The relationship between concurrent users and system capacity depends on the connection model:
Persistent connections: The maximum number of concurrent users directly represents the system's concurrency capacity.
Short-lived connections: The maximum number of concurrent users does not equal concurrency capacity, because it also depends on system architecture and throughput. If the system provides high throughput and connections are reused efficiently, concurrent users can exceed the number of simultaneous connections.
Choosing the right test mode
For most systems that use short-lived connections, throughput-based testing (TPS or RPS mode) is more representative than concurrency-based testing. Performance Testing (PTS) supports RPS mode, which simplifies throughput test setup and measurement.
Note: Performance tests typically aim to measure system processing capacity rather than maximum concurrent users. A small number of concurrent users generating high request rates can stress a system just as effectively as a large number of idle users.
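One common way to relate concurrency, throughput, and response time is Little's Law (concurrency = throughput x response time). The guide above does not prescribe this formula, so treat the following as a supplementary sketch for sizing virtual users in an RPS-mode test:

```python
import math

def required_vus(target_tps, avg_rt_seconds):
    """Little's Law estimate: virtual users needed to sustain target_tps
    when each request takes avg_rt_seconds on average."""
    return math.ceil(target_tps * avg_rt_seconds)

vus = required_vus(target_tps=1000, avg_rt_seconds=0.25)  # 250 VUs for 1000 TPS at 250 ms RT
```

This also illustrates the note above: at a 250 ms RT, only 250 busy users are needed to drive 1,000 TPS.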
Error rate
Error rate, also called failure ratio (FR), is the percentage of failed transactions out of all transactions:
Error rate = (Failed transactions / Total transactions) x 100%

In a stable system, request failures are typically caused by timeouts, so the error rate approximates the timeout rate.
Threshold: Error rate should stay below 0.6% (6 per mille), corresponding to a success rate of 99.4% or higher.
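The formula and threshold check above translate directly to code; the failure counts below are made-up examples:

```python
def error_rate(failed, total):
    """Error rate = (failed transactions / total transactions) x 100%."""
    return failed / total * 100 if total else 0.0

rate = error_rate(failed=45, total=10000)  # 0.45%
passed = rate < 0.6                        # within the 0.6% threshold
```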
Resource metrics
Resource metrics identify infrastructure bottlenecks. Monitor these on the server side during load tests.
CPU
| Metric | Threshold | Description |
| --- | --- | --- |
| CPU utilization (overall) | <= 75% | Non-idle CPU time (user + sys + wait) as a share of total CPU time |
| CPU sys% | <= 30% | Time spent in kernel mode |
| CPU wait% | <= 5% | Time spent waiting for I/O |
| CPU load | < number of CPU cores | Average run queue length (load average) |
These thresholds apply to every CPU, including single-core instances. The load threshold specifically requires that the system load average remain below the total number of CPU cores.
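The load threshold can be checked with the standard library on Linux and macOS (`os.getloadavg` is not available on Windows); a minimal sketch:

```python
import os

def load_within_threshold():
    """Compare the 1-minute load average to the core count,
    per the CPU load threshold above (load < number of cores)."""
    load1, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    return load1 < cores, load1, cores

ok, load1, cores = load_within_threshold()
```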
Memory
High memory utilization alone does not indicate a bottleneck. Modern operating systems use free memory for caching, so 100% memory utilization does not necessarily mean a memory bottleneck. Monitor swap usage as the primary indicator of memory pressure instead.
Threshold: Swap usage should stay below 70%. Exceeding this value typically degrades system performance.
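On Linux, swap usage can be derived from the SwapTotal and SwapFree values reported in /proc/meminfo. A sketch with made-up numbers (values in KB, as /proc/meminfo reports them):

```python
def swap_usage_percent(swap_total_kb, swap_free_kb):
    """Percentage of swap space in use; returns 0 if no swap is configured."""
    if swap_total_kb == 0:
        return 0.0
    return (swap_total_kb - swap_free_kb) / swap_total_kb * 100

usage = swap_usage_percent(swap_total_kb=4194304, swap_free_kb=2097152)  # 50.0
within_threshold = usage < 70
```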
Disk throughput
Disk throughput is the volume of data read from or written to disk per unit of time while the disk operates without failure.
| Metric | Description |
| --- | --- |
| IOPS | Input/output operations per second |
| Disk busy% | Percentage of time the disk is actively processing requests |
| Disk queue length | Number of pending I/O requests |
| Average service time | Mean time to complete a single I/O operation |
| Average wait time | Mean time an I/O request waits in the queue |
| Disk usage | Percentage of disk capacity consumed |
Threshold: Disk busy% should stay below 70%.
Network throughput
Network throughput is the volume of data transmitted over the network per unit of time while the network operates without failure, measured in bytes per second (B/s) or megabits per second (Mbps).
Threshold: Network utilization should not exceed 70% of the maximum link or device capacity.
Kernel parameters
The following OS kernel parameters affect system capacity under load. Review and tune these before running performance tests.
| Parameter | Unit | Description |
| --- | --- | --- |
| Maxuprc | -- | Maximum processes per user |
| Max_thread_proc | -- | Maximum threads per process |
| Filecache_max | Bytes | Maximum physical memory for file I/O cache |
| Ninode | -- | Maximum in-memory inodes (HFS) |
| Nkthread | -- | Maximum concurrent kernel threads |
| Nproc | -- | Maximum concurrent processes |
| Nstrpty | -- | Maximum stream-based pseudo terminal slaves |
| Maxdsiz | Bytes | Maximum data segment size per process |
| maxdsiz_64bit | Bytes | Maximum data segment size per process (64-bit) |
| maxfiles_lim | -- | Maximum file descriptors per process |
| maxssiz_64bit | Bytes | Maximum stack size per process |
| Maxtsiz | Bytes | Maximum text segment size per process |
| nflocks | -- | Maximum file locks |
| maxtsiz_64bit | Bytes | Maximum text segment size per process (64-bit) |
| msgmni | -- | Maximum System V IPC message queue IDs |
| msgtql | -- | Maximum System V IPC messages |
| npty | -- | Maximum BSD pseudo TTYs |
| nstrtel | -- | Kernel-supported telnet device files |
| nswapdev | -- | Maximum swap devices |
| nswapfs | -- | Maximum swap file systems |
| semmni | -- | System V IPC semaphore IDs |
| semmns | -- | System V IPC semaphores |
| shmmax | Bytes | Maximum System V shared memory segment size |
| shmmni | -- | System V shared memory IDs |
| shmseg | -- | Maximum System V shared memory segments per process |
Middleware metrics
Common metrics for Java-based middleware (such as Tomcat and WebLogic) fall into three categories: garbage collection (GC), thread pool, and JDBC connections.
| Category | Metric | Unit | Description |
| --- | --- | --- | --- |
| GC | GC frequency | -- | Partial garbage collection events per time interval |
| GC | Full GC frequency | -- | Full garbage collection events per time interval |
| GC | Average full GC duration | Seconds | Mean time to complete a full GC cycle |
| GC | Maximum full GC duration | Seconds | Longest observed full GC pause |
| GC | Heap usage | % | Percentage of JVM heap memory in use |
| Thread pool | Active thread count | -- | Threads currently processing requests |
| Thread pool | Pending user requests | -- | Requests queued and waiting for a thread |
| JDBC | Active JDBC connections | -- | Database connections currently in use |
Recommended baselines (under normal system performance):
| Metric | Guideline |
| --- | --- |
| Active threads | Min: 50, Max: 200 |
| Active JDBC connections | Min: 50, Max: 200 |
| Full GC frequency | Should be infrequent; frequent full GC indicates memory pressure |
| JVM heap size | Set both minimum and maximum to 1024 MB as a starting point |
Database metrics
Common metrics for relational databases (such as MySQL) cover SQL execution, throughput, cache efficiency, and locking.
| Category | Metric | Unit | Description |
| --- | --- | --- | --- |
| SQL | Execution duration | Microseconds | Time to run a single SQL statement |
| Throughput | QPS | -- | Queries processed per second |
| Throughput | TPS | -- | Transactions committed per second |
| Cache hit ratio | Key buffer hit ratio | % | Index buffer cache effectiveness |
| Cache hit ratio | InnoDB buffer hit ratio | % | InnoDB buffer pool cache effectiveness |
| Cache hit ratio | Query cache hit ratio | % | Query result cache effectiveness |
| Cache hit ratio | Table cache hit ratio | % | Table metadata cache effectiveness |
| Cache hit ratio | Thread cache hit ratio | % | Thread reuse effectiveness |
| Lock | Lock waits | -- | Number of times a query waited for a lock |
| Lock | Lock wait time | Microseconds | Total time spent waiting for locks |
Thresholds:
| Metric | Guideline |
| --- | --- |
| SQL execution duration | Lower is better; target microsecond-level execution |
| Cache hit ratios | >= 95% |
| Lock waits and wait time | Lower is better; high values indicate contention |
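As one example, the InnoDB buffer hit ratio can be derived from the MySQL status counters `Innodb_buffer_pool_read_requests` (logical reads) and `Innodb_buffer_pool_reads` (reads that went to disk), as exposed by `SHOW GLOBAL STATUS`. A sketch with made-up counter values:

```python
def innodb_buffer_hit_ratio(read_requests, disk_reads):
    """Hit ratio (%) = (1 - disk reads / logical read requests) x 100.
    Counter names follow MySQL's SHOW GLOBAL STATUS output."""
    if read_requests == 0:
        return 100.0  # no reads yet; treat as fully cached
    return (1 - disk_reads / read_requests) * 100

ratio = innodb_buffer_hit_ratio(read_requests=1_000_000, disk_reads=20_000)
meets_threshold = ratio >= 95  # per the >= 95% guideline above
```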
Frontend metrics
Frontend metrics capture the end-user experience in browser-based applications.
Page rendering
| Metric | Unit | Description |
| --- | --- | --- |
| First Contentful Paint (FCP) | ms | Time until the first visible content appears after navigating to a URL |
| OnLoad event time | ms | Time until the browser fires the onload event |
| Time to fully loaded | ms | Time until all page resources, including asynchronously loaded content, finish loading |
Page composition
| Metric | Unit | Description |
| --- | --- | --- |
| Page size | KB | Total size of all downloaded resources |
| Request count | -- | Total number of network requests required to load the page (lower is better) |
Network timing
| Metric | Unit | Description |
| --- | --- | --- |
| DNS lookup time | ms | Time to resolve the domain name |
| Connection time | ms | Time to establish a TCP/IP connection |
| Server time | ms | Time the server spends processing the request |
| Transfer time | ms | Time to transmit the response body |
| Wait time | ms | Time waiting for a resource to become available |
Thresholds: Minimize page size (use compression) and aim for the lowest practical page rendering times.
Stability metrics
Stability testing determines whether the system maintains consistent performance over extended periods under sustained load.
Test conditions: Apply load at 80% of the system's maximum capacity or at the expected daily peak.
Duration requirements:
| System type | Minimum stable run time |
| --- | --- |
| Business-hours system (8 h/day) | 8 hours |
| 24/7 system | 24 hours |
Pass criteria:
TPS remains flat with no significant fluctuations.
No resource leaks (memory, connections, file handles) or exceptions occur.
If the system cannot sustain stable performance, expect degradation or crashes as load duration increases.
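One way to quantify "TPS remains flat" is the coefficient of variation (standard deviation divided by mean) of per-interval TPS samples. The 5% tolerance below is an assumed value for illustration, not one this guide defines:

```python
import statistics

def tps_is_stable(tps_samples, max_cv=0.05):
    """Return True if TPS fluctuation (coefficient of variation) stays
    within max_cv. max_cv=0.05 (5%) is an assumed tolerance."""
    mean = statistics.fmean(tps_samples)
    cv = statistics.pstdev(tps_samples) / mean
    return cv <= max_cv

stable = tps_is_stable([1000, 990, 1010, 1005, 995])  # small jitter: stable
```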
Batch processing metrics
Batch processing throughput is the volume of data processed per unit of time, typically measured in records per second. Processing efficiency is the primary metric for estimating batch processing time windows.
Considerations:
Multiple batch jobs may run simultaneously with overlapping time windows. Account for this when planning capacity.
Long-running batch jobs can degrade online transaction performance. Schedule batch windows to minimize overlap with peak traffic.
Thresholds:
For large data volumes, keep the batch window as short as possible.
Batch processing must not degrade online transaction response times.
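The batch window follows directly from record count and processing throughput. A sketch with hypothetical volumes, checked against the 2-hour guideline above:

```python
def batch_window_hours(record_count, records_per_second):
    """Estimated batch window (hours) = records / throughput / 3600."""
    return record_count / records_per_second / 3600

hours = batch_window_hours(record_count=720_000_000, records_per_second=100_000)  # 2.0
within_window = hours <= 2
```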
Scalability metrics
Scalability quantifies the relationship between added resources and the resulting performance gain in a clustered or distributed deployment:
Scalability = (Performance increase / Original performance) / (Resource increase / Original resources) x 100%

Run multiple tests with incrementally added resources to observe the scalability trend. A highly scalable system shows a linear or near-linear relationship between resources and performance. Large-scale distributed systems often have high scalability.
Thresholds:
Ideal: Linear scaling (100%).
Acceptable: Performance increase of 70% or more relative to the resource increase.
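The scalability formula applies directly to before/after measurements. A sketch with hypothetical numbers:

```python
def scalability_percent(perf_before, perf_after, res_before, res_after):
    """Scalability = (perf increase / original perf)
                   / (resource increase / original resources) x 100%."""
    perf_gain = (perf_after - perf_before) / perf_before
    res_gain = (res_after - res_before) / res_before
    return perf_gain / res_gain * 100

# Doubling nodes (4 -> 8) raised TPS from 10,000 to 17,000:
# 70% scalability, which just meets the "acceptable" threshold above.
s = scalability_percent(10_000, 17_000, 4, 8)
```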
Reliability metrics
Reliability testing validates that the system recovers from failures without data loss or extended downtime.
Hot standby
Evaluate the following during failover and failback:
| Checkpoint | What to verify |
| --- | --- |
| Failover success | Does the standby node take over, and how long does the switchover take? |
| Business continuity during failover | Is service interrupted during the switchover? |
| Failback success | Does the primary node resume, and how long does the switchback take? |
| Business continuity during failback | Is service interrupted during the switchback? |
| Data integrity | How much data, if any, is lost during the switchback? |
Run these tests under realistic load using a pressure generation tool to match production conditions.
Cluster architecture
| Checkpoint | What to verify |
| --- | --- |
| Node failure | Is service interrupted when a cluster node fails? |
| Node addition | Does adding a new node require a system restart? |
| Node recovery restart | Does the system need to be restarted when a recovered node is re-added to the cluster? |
| Node recovery continuity | Is service interrupted when a recovered node is re-added to the cluster? |
| Switchover duration | How long does the node switchover take? |
Run cluster reliability tests under load to produce results consistent with production behavior.
Backup and recovery
| Checkpoint | What to verify |
| --- | --- |
| Backup success | Does the backup complete, and how long does it take? |
| Backup automation | Is the backup process scripted and repeatable? |
| Recovery success | Does the recovery complete, and how long does it take? |
| Recovery automation | Is the recovery process scripted and repeatable? |
Choose the right metrics for your test
Not every test requires every metric. Select metrics based on your test objective:
| Test objective | Recommended metrics |
| --- | --- |
| Measure system capacity | Response time, TPS/QPS, error rate, CPU, memory |
| Validate frontend performance | FCP, OnLoad time, page size, request count, network timing |
| Verify stability under sustained load | TPS trend over time, resource utilization (check for leaks) |
| Evaluate batch processing | Processing throughput (records/s), batch window duration |
| Confirm scalability | Scalability ratio across incremental resource additions |
| Test reliability and failover | Failover/failback time, data loss, service continuity |
| Assess concurrent user capacity | Virtual users, connection limits, response time under concurrency |
After collecting metrics, document the test prerequisites -- including workload profiles, data volumes, and system resource specifications -- to make the results reproducible and comparable. When verifying system performance capacity, specify the metric requirements in your test plan based on the metric definitions described in this guide.