This guide defines the technical requirements for planning and executing performance tests with Performance Testing Service (PTS). Follow these requirements to evaluate your system's real-world capacity, prevent production risks, and validate that your infrastructure handles expected traffic patterns.
Scope
These requirements apply to all projects that involve performance testing. The guide covers:
System environments
Test metrics
Business models and test models
Data volume
Test types
Business sessions and scenarios
Monitoring and bottleneck analysis
Tuning
Distributed stress testing with PTS
System environments
Choose an environment
System environments are classified into production, test, and other environments. Performance tests typically run in either a production environment or a dedicated test environment. Each has trade-offs:
| Environment | Advantages | Disadvantages |
|---|---|---|
| Production | Accurate measurements against real infrastructure | Requires test data cleanup; risk to live services if not scheduled carefully |
| Test | Isolated from live traffic; controllable risk | Expensive to replicate production at full scale; results may not reflect production behavior |
Production environment: Run stress tests during off-peak hours to avoid affecting live services. Alibaba Cloud full-link stress testing provides a more efficient approach for production testing. After testing, clean up test data or filter it from Business Intelligence (BI) reports.
Test environment: Build at a fraction of production scale -- typically half, quarter, or one-eighth. Common strategies include deploying separate test clusters for specific applications within production, or sharing databases between environments. Import masked data from production (typically the last 6 to 12 months) to maintain data relevance.
Test environment requirements
Match the test environment to production as closely as possible:
| Requirement | Details |
|---|---|
| Architecture | Same system architecture as production |
| Instance specifications | Same Elastic Compute Service (ECS) instance types or container specifications |
| Software versions | Same operating system, middleware, database, and application versions |
| Configuration parameters | Same operating system, middleware, database, and application parameters |
| Data volume | Same order of magnitude as production |
| Scaling approach | Reduce the number of load generators and scale down other resources proportionally. Target half or quarter of the production configuration |
Investigate the test environment
Before testing, investigate each layer of the test environment to establish a baseline for monitoring and bottleneck analysis:
| Layer | What to investigate | Purpose |
|---|---|---|
| System architecture | System composition, layer functions, differences from production | Bottleneck analysis and production performance evaluation |
| Operating system | OS platform and version | Tool monitoring |
| Middleware | Middleware types and versions | Tool monitoring and bottleneck analysis |
| Database | Database types and versions | Tool monitoring and bottleneck analysis |
| Application | Running instances and their parameters | Issue detection and bottleneck analysis |
Use an Application Performance Management (APM) tool such as Application Real-Time Monitoring Service (ARMS) to trace issues across the middleware, database, and application layers.
Test metrics
Metric categories
Track metrics across four categories:
| Category | Key metrics |
|---|---|
| Business | Concurrent users, transactions per second (TPS), success rate, response time (RT) |
| Resource | CPU utilization, memory utilization, disk I/O, network I/O, kernel parameters (semaphores, open file count) |
| Application | Idle threads, database connections, garbage collection (GC) count, full GC count, method duration |
| Frontend | Page load time, DNS resolution time, connection time, transfer time |
Define metric thresholds
Define thresholds for each metric before testing. Without clear thresholds, test results lack actionable meaning -- different stakeholders (development, operations, business) have different expectations and care about different metrics.
Business metric benchmarks:
| Metric | Guideline |
|---|---|
| Response time (RT) | Keep under 1 second for most services. High-performance systems like Taobao typically achieve RT in the tens of milliseconds. Specify whether your threshold applies to mean, median, p95, or p99 values -- aggregating response times as a simple mean can mask latency spikes that affect real users |
| TPS | Varies by system scale: 50--1,000 for small-to-medium businesses, 1,000--50,000 for banks, 30,000--300,000 for high-traffic platforms like Taobao |
| Success rate | Keep above 99.6% under load, in line with industry practice |
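To see why the choice of aggregate matters when setting an RT threshold, compare the mean against p99 on a latency sample. A minimal sketch using Python's statistics module; the sample values are invented for illustration:

```python
import statistics

# Hypothetical response-time sample in milliseconds: mostly fast,
# with a few slow outliers of the kind a simple mean hides.
latencies_ms = [20] * 95 + [900] * 5

mean_ms = statistics.mean(latencies_ms)
# statistics.quantiles(data, n=100) returns the 1st..99th percentiles,
# so index 98 is p99.
p99_ms = statistics.quantiles(latencies_ms, n=100)[98]

print(f"mean={mean_ms:.0f} ms, p99={p99_ms:.0f} ms")
# The mean sits comfortably under a 100 ms threshold, while p99
# exposes the spikes that real users actually experience.
```

Here the mean is 64 ms, yet one request in twenty takes 900 ms, which is why the guideline above asks you to state whether a threshold applies to the mean, median, p95, or p99.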
Resource metric guidelines:
Limit CPU utilization to 75% or below
Avoid swap partition usage entirely
Ideally, when a system reaches its capacity limit, a resource (CPU, memory, I/O) should be the bottleneck rather than an application-level issue. Adding resources to a resource-bound system scales its capacity; adding resources to an application-bound system does not
Business models
Role of business models
Each business operation (login, search, checkout, payment) consumes different amounts of system resources. The mix of business types and their relative proportions determines overall system capacity. If the test traffic mix does not match production, the results have no practical value.
For example, in e-commerce, different promotion types and product categories shift the ratio of read-heavy versus write-heavy operations. Accurate traffic modeling in PTS captures these patterns and reveals the true system bottleneck.
Select business types
Choose high-volume, high-risk, and high-growth business operations as representative test cases.
For systems already in production:
Collect business types and volumes at different peak periods. If the mix varies significantly across time periods, define multiple business models
Identify time windows with high resource consumption or anomalies during peak hours, and investigate their root causes
Review past production incidents. If an incident was caused by a business operation that was excluded from previous tests, add it to future test models
For systems not yet in production:
Determine business types and proportions through stakeholder interviews and requirements analysis
Assess whether specific business operations could spike during promotions or events
After initial test runs, review resource consumption per operation. If a low-proportion operation consumes disproportionate resources, adjust its weight in the model
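For a system already in production, the business model can be derived directly from per-operation request counts at a peak period. A sketch of that calculation; the operation names and counts are made up for illustration:

```python
from collections import Counter

# Hypothetical per-operation request counts collected during a
# production peak (for example, from access logs).
peak_requests = Counter({
    "search": 52_000,
    "product_detail": 31_000,
    "add_to_cart": 9_000,
    "checkout": 5_000,
    "payment": 3_000,
})

total = sum(peak_requests.values())
# Business model: each operation's share of peak traffic.
model = {op: count / total for op, count in peak_requests.items()}

for op, share in sorted(model.items(), key=lambda kv: -kv[1]):
    print(f"{op:15s} {share:6.1%}")
```

Repeating this for several peak windows shows whether one model suffices or, as noted above, the mix shifts enough to warrant multiple business models.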
Data volume
Basic data volume
Basic data volume refers to existing data in the database -- historical records, reference data, and accumulated business data. The volume of existing data directly affects query performance: a search across thousands of records behaves very differently from one across millions.
Requirements:
Match the test environment's data volume to the same order of magnitude as production
Account for data growth over the next three years. If data grows rapidly, load additional data into the test environment
When inserting test accounts into a production environment, plan thorough data preparation and cleanup logic
Full-link stress testing also requires the same order of magnitude between test and production environments
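Sizing test data for three years of growth is a simple compound-growth projection. A sketch; the current row count and the 40% annual growth rate are assumed figures, not recommendations:

```python
# Project the data volume needed to cover three years of growth.
# current_rows and annual_growth are assumptions for illustration.
current_rows = 50_000_000
annual_growth = 0.40
years = 3

projected_rows = current_rows * (1 + annual_growth) ** years
print(f"Load about {projected_rows:,.0f} rows into the test database")
```

If growth is fast, load the projected volume rather than today's volume, so query plans and index behavior in the test environment match what production will face.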
Parameterized data volume
Parameterized data drives the variability of test inputs (user IDs, product IDs, search terms). Small parameter sets cause cache hits that inflate performance results.
Requirements:
Maximize the parameterized data volume. If needed, clear caches or generate additional data programmatically
Distribute parameterized data realistically. If a business operation has geographic distribution patterns (for example, region-specific inventory queries), reflect that in the test data
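A large parameter pool with a realistic value distribution can be generated programmatically. A sketch using weighted sampling; the region names, weights, and pool size are assumptions standing in for a real geographic traffic distribution:

```python
import random

# Build a large parameterized data pool so repeated requests do not
# keep hitting the same cached keys.
random.seed(42)  # reproducible pool for repeatable test runs

regions = ["east", "north", "south", "west"]
weights = [0.5, 0.2, 0.2, 0.1]          # assumed share of traffic
pool_size = 100_000

parameter_rows = [
    {"user_id": f"user-{i:06d}",
     "region": random.choices(regions, weights=weights)[0]}
    for i in range(pool_size)
]

print(len(parameter_rows), parameter_rows[0])
```

A pool this size, exported as a data file, keeps cache hit ratios during the test close to what production sees instead of inflating throughput numbers.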
Test models
Test models are derived from business models. In most cases they are identical. However, if a business operation cannot be simulated (due to technical constraints or security policies), remove it from the test model and recalculate the remaining proportions.
If a removed operation carries significant risk (for example, a payment flow), evaluate that risk separately. If the risk is high, find an alternative testing approach rather than simply excluding it.
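Recalculating the remaining proportions after removing an operation is a normalization step. A sketch with illustrative operation names and weights:

```python
# Business model as proportions that sum to 1.0 (illustrative values).
business_model = {"search": 0.50, "product_detail": 0.30,
                  "checkout": 0.15, "payment": 0.05}

removed = "payment"   # e.g. cannot be simulated due to security policy
remaining = {op: w for op, w in business_model.items() if op != removed}
scale = sum(remaining.values())         # 0.95 in this example
test_model = {op: w / scale for op, w in remaining.items()}

print(test_model)   # proportions again sum to 1.0
```

The relative ratios of the remaining operations are preserved; only the absolute shares are rescaled so the test model still sums to 100%.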
Test types
Performance test types fall into two broad categories -- load testing and stress testing -- with several specialized variants:
| Test type | VUs / throughput | Duration | When to use | Required? |
|---|---|---|---|---|
| Single-transaction benchmark testing | Low | Short | Establish baseline performance for individual operations | Optional |
| Single-transaction load testing | Incremental | Medium | Evaluate resource consumption for a specific operation. Recommended for systems not yet in production | Optional |
| Mixed-transaction load testing (capacity testing) | Average production | Medium to long | Determine overall system capacity under realistic mixed workloads | Required |
| Mixed-transaction stress testing | Above average | Medium | Find the breaking point by pushing beyond expected peak load | Optional |
| Business mutation testing | Spike on single operation | Short | Validate system behavior when a specific operation spikes unexpectedly | Optional |
| Mixed-transaction stability testing | Average production | Long (hours+) | Verify sustained performance over extended periods | Required |
| Mixed-transaction reliability testing | Average production | Long | Test failover, recovery, and degradation under fault conditions | Optional |
| Batch testing | Varies | Varies | Measure batch processing performance | Optional |
| Batch impact testing on mixed transactions | Mixed | Long | Assess how batch jobs affect concurrent online transaction performance | Optional |
Progression strategy
Start with simpler test types and progress toward more complex ones. Run single-transaction benchmarks first to establish baselines, then move to mixed-transaction load testing, and then to stress and stability testing. At minimum, run the two required types: mixed-transaction load testing and mixed-transaction stability testing.
No single test type uncovers all risks. Different test types target different failure modes. Prioritize based on your system's risk profile.
Business sessions
A business session is an ordered sequence of API calls that represents a complete user workflow. For example, a typical e-commerce session might follow this pattern:
Browse the product catalog
Search for a specific product
Open a product detail page
Add the item to the cart
Log in
Complete checkout
Make payment
The accuracy of these sessions directly affects how well tests reflect real-world performance.
Requirements:
Design sessions based on actual production business rules and user workflows
Include realistic pauses between steps to simulate user think time. Without think time, the test generates unrealistic request rates that do not match production traffic patterns
Add checkpoints (assertions) at key points to verify server responses. For details, see Interface response parameters
Parameterize all variable data (user credentials, product IDs, search queries) and maximize the data volume
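The session structure above can be sketched as data: an ordered list of steps, each with a randomized think-time range. The paths and pause ranges below are illustrative, not a real PTS configuration (actual scenarios are built in the PTS console or via recording):

```python
import random

# A business session as an ordered list of steps with think time.
session = [
    {"step": "browse_catalog", "path": "/catalog",  "think_s": (1, 3)},
    {"step": "search",         "path": "/search",   "think_s": (1, 5)},
    {"step": "product_detail", "path": "/item",     "think_s": (2, 8)},
    {"step": "add_to_cart",    "path": "/cart/add", "think_s": (1, 2)},
    {"step": "login",          "path": "/login",    "think_s": (1, 2)},
    {"step": "checkout",       "path": "/checkout", "think_s": (2, 5)},
    {"step": "payment",        "path": "/pay",      "think_s": (0, 0)},
]

def think_time(step, rng=random):
    """Randomized pause between steps to mimic a real user."""
    low, high = step["think_s"]
    return rng.uniform(low, high)

# Without think time, one virtual user fires requests back-to-back,
# producing a request rate no real user ever generates.
total_pause = sum(think_time(s) for s in session)
print(f"{len(session)} steps, ~{total_pause:.1f} s of think time")
```

Randomizing within a range, rather than using a fixed pause, avoids lock-step request waves from many virtual users pausing for identical durations.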
Scenarios
A stress testing scenario combines multiple HTTP/HTTPS API calls or URLs into a workload that simulates real production traffic. Each scenario defines the load profile: pressure mode, ramp-up method, and duration.
The critical requirement: The TPS proportion of each business operation in the test must match the actual business proportion during production peak hours.
RPS mode and concurrency mode
PTS supports two load modes. Choose the mode that matches how your system receives real traffic:
Open workload model (RPS mode -- recommended): You control the arrival rate of requests. Use this mode when your system faces uncontrolled incoming traffic, such as a public-facing web application. This is the most common scenario.
Closed workload model (concurrency mode): You control the number of concurrent users. Use this mode when your system controls admission, such as a call center with a fixed number of agents.
If your real system is an open workload but you test with a closed workload model, the test results do not reflect actual production behavior. The concurrency model implicitly throttles request rates based on response time, which masks performance issues that surface under real traffic conditions.
Example: Two interfaces A and B have a production ratio of 1:4, with response times of 1 ms and 100 ms respectively.
| Mode | Configuration | Result |
|---|---|---|
| RPS mode (recommended) | Set A to 100 RPS, B to 400 RPS | Direct 1:4 ratio. The business model matches production |
| Concurrency mode | Set A to 1 concurrent user, B to 400 concurrent users | Must compensate for the 100x RT difference. Ratio becomes 1:400 in concurrency to achieve 1:4 in throughput |
RPS mode eliminates the need to calculate concurrency adjustments based on response time differences, giving more precise control over the traffic mix.
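The compensation in concurrency mode follows Little's Law: concurrency = throughput x response time. A sketch reproducing the A/B example above:

```python
def concurrency_for(target_rps: float, rt_seconds: float) -> float:
    """Little's Law: concurrent users = arrival rate * response time."""
    return target_rps * rt_seconds

# Interface A: 100 RPS at 1 ms RT; interface B: 400 RPS at 100 ms RT.
users_a = concurrency_for(100, 0.001)   # 0.1 concurrent users
users_b = concurrency_for(400, 0.100)   # 40.0 concurrent users

# A 1:4 throughput mix needs a 1:400 concurrency mix, because B's
# response time is 100x longer than A's.
print(f"A:B concurrency ratio = 1:{users_b / users_a:.0f}")
```

Worse, response times shift as load rises, so a concurrency mix tuned for one load level drifts away from the intended 1:4 throughput ratio during the test; RPS mode holds the ratio by construction.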
Monitoring
Why comprehensive monitoring matters
Monitoring during performance tests serves two purposes: real-time bottleneck detection and post-test root cause analysis. Without comprehensive metrics across all layers, identifying the source of a bottleneck becomes guesswork.
Metrics to monitor
| Layer | Key metrics |
|---|---|
| Operating system | CPU utilization (User, Sys, Wait, Idle), memory utilization (including swap), disk I/O, network I/O, kernel parameters |
| Middleware | Thread pools, Java Database Connectivity (JDBC) connection pools, Java Virtual Machine (JVM) metrics (GC frequency, full GC frequency, heap size) |
| Database | Slow SQL statements, locks, cache hit ratio, sessions, process count |
| Application | Method duration, synchronous vs. asynchronous processing, buffering, cache behavior |
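At the operating-system layer, even the standard library can take a coarse periodic sample. A minimal sketch (Unix-only, since `os.getloadavg` is not available on Windows); a real test would rely on a full monitoring stack or agent for per-mode CPU, memory, and I/O breakdowns:

```python
import os
import shutil
import time

def os_snapshot(path="/"):
    """Sample a few OS-layer metrics using only the standard library.
    This is a sketch, not a substitute for proper monitoring tooling."""
    load1, load5, load15 = os.getloadavg()      # run-queue pressure
    disk = shutil.disk_usage(path)
    return {
        "timestamp": time.time(),
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
        "disk_used_pct": 100 * disk.used / disk.total,
    }

print(os_snapshot())
```

Sampling at a fixed interval for the duration of the test, and keeping the timestamps, is what makes post-test correlation with TPS and RT curves possible.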
Bottleneck analysis
Performance bottlenecks typically fall into four categories: operating system resources, middleware configuration, database issues, and application logic.
Analysis focus by layer
| Layer | Focus areas |
|---|---|
| Operating system | CPU, memory, disk I/O, network I/O |
| Middleware | Thread pools, JDBC connection pools, JVM metrics (GC frequency, full GC frequency, heap size) |
| Database | Slow SQL statements, lock waits, deadlocks, cache hit ratio, sessions, process count |
| Application | Method duration, algorithm efficiency, synchronous vs. asynchronous processing, cache usage, buffering |
| Load generator | Resource consumption on the load generator machines. Load generators are rarely the bottleneck; PTS has built-in protection and scheduling mechanisms that handle this automatically |
Targeted analysis at each layer enables efficient tuning: fix the actual bottleneck rather than adding resources that have no impact.
Tuning
After identifying bottlenecks, tune the system and retest to verify improvements. Common tuning areas include:
| Layer | Tuning targets |
|---|---|
| Middleware | Thread pool sizes, database connection pool sizes, JVM parameters |
| Database | Slow SQL optimization, deadlock and lock wait resolution, cache hit ratio improvement, process and session parameter adjustments |
| Application | Method execution time, algorithm optimization, synchronous-to-asynchronous conversion, cache strategy, buffer sizing |
| System resources | High CPU or memory consumption is usually caused by suboptimal application settings or parameter configurations -- not by a lack of hardware. Address application-level issues before scaling resources |
Distributed stress testing with PTS
Performance Testing Service (PTS) is a SaaS-based platform for distributed stress testing. No additional installation or deployment is required.
Traffic simulation
PTS generates test traffic from Content Delivery Network (CDN) nodes across hundreds of cities and multiple carriers worldwide. This approach simulates realistic end-user access patterns, including geographic distribution and carrier-specific routing.
Scenario design
Record scenarios using mainstream browser recording plug-ins, or orchestrate sequential and parallel API execution through a visual interface -- no coding required
Parameterize requests using data files, built-in functions, string operations, and values extracted from responses. PTS acts as a data factory: request parameters can be assembled through simple configuration rather than custom code
Maintain cookie and session state across API calls; extended instructions support configurable think time and traffic throttling
Debug scenarios and individual APIs before running full-scale tests
Load control
Two load modes: Concurrency mode and requests per second (RPS) mode
Fast startup: Launch stress tests within minutes
Real-time adjustment: Traffic changes take effect within seconds
Pulse capability: Generate instantaneous spikes of millions of queries per second (QPS)
Safety controls: Stop stress testing traffic immediately when needed
Reporting
PTS collects real-time data during each test, including concurrency, TPS, RT, and sampled logs for each API. Reports are generated automatically for post-test analysis.
Supported industries
PTS supports workloads across e-commerce, media, finance and insurance, logistics, advertising and marketing, and social networking.