This guide defines the technical requirements for planning and executing performance tests with Performance Testing Service (PTS). Follow these requirements to evaluate your system's real-world capacity, prevent production risks, and validate that your infrastructure handles expected traffic patterns.
Scope
These requirements apply to all projects that involve performance testing. The guide covers:
System environments
Test metrics
Business models and test models
Data volume
Test types
Business sessions and scenarios
Monitoring and bottleneck analysis
Tuning
Distributed stress testing with PTS
System environments
Choose an environment
System environments are classified into production, test, and other environments. Performance tests typically run in either a production environment or a dedicated test environment. Each has trade-offs:
| Environment | Advantages | Disadvantages |
|---|---|---|
| Production | Accurate measurements against real infrastructure | Requires test data cleanup; risk to live services if not scheduled carefully |
| Test | Isolated from live traffic; controllable risk | Expensive to replicate production at full scale; results may not reflect production behavior |
Production environment: Run stress tests during off-peak hours to avoid affecting live services. Alibaba Cloud full-link stress testing provides a more efficient approach for production testing. After testing, clean up test data or filter it from Business Intelligence (BI) reports.
Test environment: Build at a fraction of production scale -- typically half, quarter, or one-eighth. Common strategies include deploying separate test clusters for specific applications within production, or sharing databases between environments. Import masked data from production (typically the last 6 to 12 months) to maintain data relevance.
Test environment requirements
Match the test environment to production as closely as possible:
| Requirement | Details |
|---|---|
| Architecture | Same system architecture as production |
| Instance specifications | Same Elastic Compute Service (ECS) instance types or container specifications |
| Software versions | Same operating system, middleware, database, and application versions |
| Configuration parameters | Same operating system, middleware, database, and application parameters |
| Data volume | Same order of magnitude as production |
| Scaling approach | Reduce the number of load generators and scale down other resources proportionally. Target half or quarter of the production configuration |
Investigate the test environment
Before testing, investigate each layer of the test environment to establish a baseline for monitoring and bottleneck analysis:
| Layer | What to investigate | Purpose |
|---|---|---|
| System architecture | System composition, layer functions, differences from production | Bottleneck analysis and production performance evaluation |
| Operating system | OS platform and version | Tool monitoring |
| Middleware | Middleware types and versions | Tool monitoring and bottleneck analysis |
| Database | Database types and versions | Tool monitoring and bottleneck analysis |
| Application | Running instances and their parameters | Issue detection and bottleneck analysis |
Use an Application Performance Management (APM) tool such as Application Real-Time Monitoring Service (ARMS) to trace issues across the middleware, database, and application layers.
Test metrics
Metric categories
Track metrics across four categories:
| Category | Key metrics |
|---|---|
| Business | Concurrent users, transactions per second (TPS), success rate, response time (RT) |
| Resource | CPU utilization, memory utilization, disk I/O, network I/O, kernel parameters (semaphores, open file count) |
| Application | Idle threads, database connections, garbage collection (GC) count, full GC count, method duration |
| Frontend | Page load time, DNS resolution time, connection time, transfer time |
Define metric thresholds
Define thresholds for each metric before testing. Without clear thresholds, test results lack actionable meaning -- different stakeholders (development, operations, business) have different expectations and care about different metrics.
Business metric benchmarks:
| Metric | Guideline |
|---|---|
| Response time (RT) | Keep under 1 second for most services. High-performance systems like Taobao typically achieve RT in the tens of milliseconds. Specify whether your threshold applies to mean, median, p95, or p99 values -- aggregating response times as a simple mean can mask latency spikes that affect real users |
| TPS | Varies by system scale: 50--1,000 for small-to-medium businesses, 1,000--50,000 for banks, 30,000--300,000 for high-traffic platforms like Taobao |
| Success rate | Keep above 99.6% under load, in line with industry practice |
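To see why the choice of aggregate matters when setting an RT threshold, compare the mean against p99 on a latency sample. A minimal sketch using Python's statistics module; the sample values are invented for illustration:

```python
import statistics

# Hypothetical response-time sample in milliseconds: mostly fast,
# with a few slow outliers of the kind a simple mean hides.
latencies_ms = [20] * 95 + [900] * 5

mean_ms = statistics.mean(latencies_ms)
# statistics.quantiles(data, n=100) returns the 1st..99th percentiles,
# so index 98 is p99.
p99_ms = statistics.quantiles(latencies_ms, n=100)[98]

print(f"mean={mean_ms:.0f} ms, p99={p99_ms:.0f} ms")
# The mean sits comfortably under a 100 ms threshold, while p99
# exposes the spikes that real users actually experience.
```

Here the mean is 64 ms, yet one request in twenty takes 900 ms, which is why the guideline above asks you to state whether a threshold applies to the mean, median, p95, or p99.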
Resource metric guidelines:
Limit CPU utilization to 75% or below
Avoid swap partition usage entirely
Ideally, when a system reaches its capacity limit, a resource (CPU, memory, I/O) should be the bottleneck rather than an application-level issue. Adding resources to a resource-bound system scales its capacity; adding resources to an application-bound system does not
Business models
Role of business models
Each business operation (login, search, checkout, payment) consumes different amounts of system resources. The mix of business types and their relative proportions determines overall system capacity. If the test traffic mix does not match production, the results have no practical value.
For example, in e-commerce, different promotion types and product categories shift the ratio of read-heavy versus write-heavy operations. Accurate traffic modeling in PTS captures these patterns and reveals the true system bottleneck.
Select business types
Choose high-volume, high-risk, and high-growth business operations as representative test cases.
For systems already in production:
Collect business types and volumes at different peak periods. If the mix varies significantly across time periods, define multiple business models
Identify time windows with high resource consumption or anomalies during peak hours, and investigate their root causes
Review past production incidents. If an incident was caused by a business operation that was excluded from previous tests, add it to future test models
For systems not yet in production:
Determine business types and proportions through stakeholder interviews and requirements analysis
Assess whether specific business operations could spike during promotions or events
After initial test runs, review resource consumption per operation. If a low-proportion operation consumes disproportionate resources, adjust its weight in the model
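For a system already in production, the business model can be derived directly from per-operation request counts at a peak period. A sketch of that calculation; the operation names and counts are made up for illustration:

```python
from collections import Counter

# Hypothetical per-operation request counts collected during a
# production peak (for example, from access logs).
peak_requests = Counter({
    "search": 52_000,
    "product_detail": 31_000,
    "add_to_cart": 9_000,
    "checkout": 5_000,
    "payment": 3_000,
})

total = sum(peak_requests.values())
# Business model: each operation's share of peak traffic.
model = {op: count / total for op, count in peak_requests.items()}

for op, share in sorted(model.items(), key=lambda kv: -kv[1]):
    print(f"{op:15s} {share:6.1%}")
```

Repeating this for several peak windows shows whether one model suffices or, as noted above, the mix shifts enough to warrant multiple business models.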
Data volume
Basic data volume
Basic data volume refers to existing data in the database -- historical records, reference data, and accumulated business data. The volume of existing data directly affects query performance: a search across thousands of records behaves very differently from one across millions.
Requirements:
Match the test environment's data volume to the same order of magnitude as production
Account for data growth over the next three years. If data grows rapidly, load additional data into the test environment
When inserting test accounts into a production environment, plan thorough data preparation and cleanup logic
Full-link stress testing also requires the same order of magnitude between test and production environments
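Sizing test data for three years of growth is a simple compound-growth projection. A sketch; the current row count and the 40% annual growth rate are assumed figures, not recommendations:

```python
# Project the data volume needed to cover three years of growth.
# current_rows and annual_growth are assumptions for illustration.
current_rows = 50_000_000
annual_growth = 0.40
years = 3

projected_rows = current_rows * (1 + annual_growth) ** years
print(f"Load about {projected_rows:,.0f} rows into the test database")
```

If growth is fast, load the projected volume rather than today's volume, so query plans and index behavior in the test environment match what production will face.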
Parameterized data volume
Parameterized data drives the variability of test inputs (user IDs, product IDs, search terms). Small parameter sets cause cache hits that inflate performance results.
Requirements:
Maximize the parameterized data volume. If needed, clear caches or generate additional data programmatically
Distribute parameterized data realistically. If a business operation has geographic distribution patterns (for example, region-specific inventory queries), reflect that in the test data
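A large parameter pool with a realistic value distribution can be generated programmatically. A sketch using weighted sampling; the region names, weights, and pool size are assumptions standing in for a real geographic traffic distribution:

```python
import random

# Build a large parameterized data pool so repeated requests do not
# keep hitting the same cached keys.
random.seed(42)  # reproducible pool for repeatable test runs

regions = ["east", "north", "south", "west"]
weights = [0.5, 0.2, 0.2, 0.1]          # assumed share of traffic
pool_size = 100_000

parameter_rows = [
    {"user_id": f"user-{i:06d}",
     "region": random.choices(regions, weights=weights)[0]}
    for i in range(pool_size)
]

print(len(parameter_rows), parameter_rows[0])
```

A pool this size, exported as a data file, keeps cache hit ratios during the test close to what production sees instead of inflating throughput numbers.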
Test models
Test models are derived from business models. In most cases they are identical. However, if a business operation cannot be simulated (due to technical constraints or security policies), remove it from the test model and recalculate the remaining proportions.
If a removed operation carries significant risk (for example, a payment flow), evaluate that risk separately. If the risk is high, find an alternative testing approach rather than simply excluding it.
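Recalculating the remaining proportions after removing an operation is a normalization step. A sketch with illustrative operation names and weights:

```python
# Business model as proportions that sum to 1.0 (illustrative values).
business_model = {"search": 0.50, "product_detail": 0.30,
                  "checkout": 0.15, "payment": 0.05}

removed = "payment"   # e.g. cannot be simulated due to security policy
remaining = {op: w for op, w in business_model.items() if op != removed}
scale = sum(remaining.values())         # 0.95 in this example
test_model = {op: w / scale for op, w in remaining.items()}

print(test_model)   # proportions again sum to 1.0
```

The relative ratios of the remaining operations are preserved; only the absolute shares are rescaled so the test model still sums to 100%.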
Test types
Performance test types fall into two broad categories -- load testing and stress testing -- with several specialized variants:
| Test type | VUs / throughput | Duration | When to use | Required? |
|---|---|---|---|---|
| Single-transaction benchmark testing | Low | Short | Establish baseline performance for individual operations | Optional |
| Single-transaction load testing | Incremental | Medium | Evaluate resource consumption for a specific operation. Recommended for systems not yet in production | Optional |
| Mixed-transaction load testing (capacity testing) | Average production | Medium to long | Determine overall system capacity under realistic mixed workloads | Required |
| Mixed-transaction stress testing | Above average | Medium | Find the breaking point by pushing beyond expected peak load | Optional |
| Business mutation testing | Spike on single operation | Short | Validate system behavior when a specific operation spikes unexpectedly | Optional |
| Mixed-transaction stability testing | Average production | Long (hours+) | Verify sustained performance over extended periods | Required |
| Mixed-transaction reliability testing | Average production | Long | Test failover, recovery, and degradation under fault conditions | Optional |
| Batch testing | Varies | Varies | Measure batch processing performance | Optional |
| Batch impact testing on mixed transactions | Mixed | Long | Assess how batch jobs affect concurrent online transaction performance | Optional |
Progression strategy
Start with simpler test types and progress toward more complex ones. Run single-transaction benchmarks first to establish baselines, then move to mixed-transaction load testing, and then to stress and stability testing. At minimum, run the two required types: mixed-transaction load testing and mixed-transaction stability testing.
No single test type uncovers all risks. Different test types target different failure modes. Prioritize based on your system's risk profile.
Business sessions
A business session is an ordered sequence of API calls that represents a complete user workflow. For example, a typical e-commerce session might follow this pattern:
Browse the product catalog
Search for a specific product
Open a product detail page
Add the item to the cart
Log in
Complete checkout
Make payment
The accuracy of these sessions directly affects how well tests reflect real-world performance.
Requirements:
Design sessions based on actual production business rules and user workflows
Include realistic pauses between steps to simulate user think time. Without think time, the test generates unrealistic request rates that do not match production traffic patterns
Add checkpoints (assertions) at key points to verify server responses. For details, see Interface response parameters
Parameterize all variable data (user credentials, product IDs, search queries) and maximize the data volume
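The session structure above can be sketched as data: an ordered list of steps, each with a randomized think-time range. The paths and pause ranges below are illustrative, not a real PTS configuration (actual scenarios are built in the PTS console or via recording):

```python
import random

# A business session as an ordered list of steps with think time.
session = [
    {"step": "browse_catalog", "path": "/catalog",  "think_s": (1, 3)},
    {"step": "search",         "path": "/search",   "think_s": (1, 5)},
    {"step": "product_detail", "path": "/item",     "think_s": (2, 8)},
    {"step": "add_to_cart",    "path": "/cart/add", "think_s": (1, 2)},
    {"step": "login",          "path": "/login",    "think_s": (1, 2)},
    {"step": "checkout",       "path": "/checkout", "think_s": (2, 5)},
    {"step": "payment",        "path": "/pay",      "think_s": (0, 0)},
]

def think_time(step, rng=random):
    """Randomized pause between steps to mimic a real user."""
    low, high = step["think_s"]
    return rng.uniform(low, high)

# Without think time, one virtual user fires requests back-to-back,
# producing a request rate no real user ever generates.
total_pause = sum(think_time(s) for s in session)
print(f"{len(session)} steps, ~{total_pause:.1f} s of think time")
```

Randomizing within a range, rather than using a fixed pause, avoids lock-step request waves from many virtual users pausing for identical durations.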
Scenarios
A stress testing scenario combines multiple HTTP/HTTPS API calls or URLs into a workload that simulates real production traffic. Each scenario defines the load profile: pressure mode, ramp-up method, and duration.
The critical requirement: The TPS proportion of each business operation in the test must match the actual business proportion during production peak hours.
RPS mode and concurrency mode
PTS supports two load modes. Choose the mode that matches how your system receives real traffic:
Open workload model (RPS mode -- recommended): You control the arrival rate of requests. Use this mode when your system faces uncontrolled incoming traffic, such as a public-facing web application. This is the most common scenario.
Closed workload model (concurrency mode): You control the number of concurrent users. Use this mode when your system controls admission, such as a call center with a fixed number of agents.
If your real system is an open workload but you test with a closed workload model, the test results do not reflect actual production behavior. The concurrency model implicitly throttles request rates based on response time, which masks performance issues that surface under real traffic conditions.
Example: Two interfaces A and B have a production ratio of 1:4, with response times of 1 ms and 100 ms respectively.
| Mode | Configuration | Result |
|---|---|---|
| RPS mode (recommended) | Set A to 100 RPS, B to 400 RPS | Direct 1:4 ratio. The business model matches production |
| Concurrency mode | Set A to 1 concurrent user, B to 400 concurrent users | Must compensate for the 100x RT difference. Ratio becomes 1:400 in concurrency to achieve 1:4 in throughput |
RPS mode eliminates the need to calculate concurrency adjustments based on response time differences, giving more precise control over the traffic mix.
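The compensation in concurrency mode follows Little's Law: concurrency = throughput x response time. A sketch reproducing the A/B example above:

```python
def concurrency_for(target_rps: float, rt_seconds: float) -> float:
    """Little's Law: concurrent users = arrival rate * response time."""
    return target_rps * rt_seconds

# Interface A: 100 RPS at 1 ms RT; interface B: 400 RPS at 100 ms RT.
users_a = concurrency_for(100, 0.001)   # 0.1 concurrent users
users_b = concurrency_for(400, 0.100)   # 40.0 concurrent users

# A 1:4 throughput mix needs a 1:400 concurrency mix, because B's
# response time is 100x longer than A's.
print(f"A:B concurrency ratio = 1:{users_b / users_a:.0f}")
```

Worse, response times shift as load rises, so a concurrency mix tuned for one load level drifts away from the intended 1:4 throughput ratio during the test; RPS mode holds the ratio by construction.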
Monitoring
Why comprehensive monitoring matters
Monitoring during performance tests serves two purposes: real-time bottleneck detection and post-test root cause analysis. Without comprehensive metrics across all layers, identifying the source of a bottleneck becomes guesswork.
Metrics to monitor
| Layer | Key metrics |
|---|---|
| Operating system | CPU utilization (User, Sys, Wait, Idle), memory utilization (including swap), disk I/O, network I/O, kernel parameters |
| Middleware | Thread pools, Java Database Connectivity (JDBC) connection pools, Java Virtual Machine (JVM) metrics (GC frequency, full GC frequency, heap size) |
| Database | Slow SQL statements, locks, cache hit ratio, sessions, process count |
| Application | Method duration, synchronous vs. asynchronous processing, buffering, cache behavior |
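At the operating-system layer, even the standard library can take a coarse periodic sample. A minimal sketch (Unix-only, since `os.getloadavg` is not available on Windows); a real test would rely on a full monitoring stack or agent for per-mode CPU, memory, and I/O breakdowns:

```python
import os
import shutil
import time

def os_snapshot(path="/"):
    """Sample a few OS-layer metrics using only the standard library.
    This is a sketch, not a substitute for proper monitoring tooling."""
    load1, load5, load15 = os.getloadavg()      # run-queue pressure
    disk = shutil.disk_usage(path)
    return {
        "timestamp": time.time(),
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
        "disk_used_pct": 100 * disk.used / disk.total,
    }

print(os_snapshot())
```

Sampling at a fixed interval for the duration of the test, and keeping the timestamps, is what makes post-test correlation with TPS and RT curves possible.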
Bottleneck analysis
Performance bottlenecks typically fall into four categories: operating system resources, middleware configuration, database issues, and application logic.
Analysis focus by layer
| Layer | Focus areas |
|---|---|
| Operating system | CPU, memory, disk I/O, network I/O |
| Middleware | Thread pools, JDBC connection pools, JVM metrics (GC frequency, full GC frequency, heap size) |
| Database | Slow SQL statements, lock waits, deadlocks, cache hit ratio, sessions, process count |
| Application | Method duration, algorithm efficiency, synchronous vs. asynchronous processing, cache usage, buffering |
| Load generator | Resource consumption on the load generator machines. Load generators are rarely the bottleneck; PTS has built-in protection and scheduling mechanisms that handle this automatically |
Targeted analysis at each layer enables efficient tuning: fix the actual bottleneck rather than adding resources that have no impact.
Tuning
After identifying bottlenecks, tune the system and retest to verify improvements. Common tuning areas include:
| Layer | Tuning targets |
|---|---|
| Middleware | Thread pool sizes, database connection pool sizes, JVM parameters |
| Database | Slow SQL optimization, deadlock and lock wait resolution, cache hit ratio improvement, process and session parameter adjustments |
| Application | Method execution time, algorithm optimization, synchronous-to-asynchronous conversion, cache strategy, buffer sizing |
| System resources | High CPU or memory consumption is usually caused by suboptimal application settings or parameter configurations -- not by a lack of hardware. Address application-level issues before scaling resources |
Distributed stress testing with PTS
Performance Testing Service (PTS) is a SaaS-based platform for distributed stress testing. No additional installation or deployment is required.
Traffic simulation
PTS generates test traffic from Content Delivery Network (CDN) nodes across hundreds of cities and multiple carriers worldwide. This approach simulates realistic end-user access patterns, including geographic distribution and carrier-specific routing.
Scenario design
Record scenarios using mainstream browser recording plug-ins, or orchestrate sequential and parallel API execution through a visual interface -- no coding required
Parameterize requests using data files, built-in functions, string operations, and values extracted from responses. PTS acts as a data factory: request parameters can be assembled through simple configuration rather than custom code
Maintain cookie and session state across API calls; extended instructions support configurable think time and traffic throttling
Debug scenarios and individual APIs before running full-scale tests
Load control
Two load modes: Concurrency mode and requests per second (RPS) mode
Fast startup: Launch stress tests within minutes
Real-time adjustment: Traffic changes take effect within seconds
Pulse capability: Generate instantaneous spikes of millions of queries per second (QPS)
Safety controls: Stop stress testing traffic immediately when needed
Reporting
PTS collects real-time data during each test, including concurrency, TPS, RT, and sampled logs for each API. Reports are generated automatically for post-test analysis.
Supported industries
PTS supports workloads across e-commerce, media, finance and insurance, logistics, advertising and marketing, and social networking.