Performance Testing: Technical guide for performance testing

Last Updated: Mar 11, 2026

This guide defines the technical requirements for planning and executing performance tests with Performance Testing Service (PTS). Follow these requirements to evaluate your system's real-world capacity, prevent production risks, and validate that your infrastructure handles expected traffic patterns.

Scope

These requirements apply to all projects that involve performance testing. The guide covers:

  • System environments

  • Test metrics

  • Business models and test models

  • Data volume

  • Test types

  • Business sessions and scenarios

  • Monitoring and bottleneck analysis

  • Tuning

  • Distributed stress testing with PTS

System environments

Choose an environment

System environments are classified into production, test, and other environments. Performance tests typically run in either a production environment or a dedicated test environment. Each has trade-offs:

| Environment | Advantages | Disadvantages |
| --- | --- | --- |
| Production | Accurate measurements against real infrastructure | Requires test data cleanup; risk to live services if not scheduled carefully |
| Test | Isolated from live traffic; controllable risk | Expensive to replicate production at full scale; results may not reflect production behavior |

Production environment: Run stress tests during off-peak hours to avoid affecting live services. Alibaba Cloud full-link stress testing provides a more efficient approach for production testing. After testing, clean up test data or filter it from Business Intelligence (BI) reports.

Test environment: Build at a fraction of production scale -- typically half, quarter, or one-eighth. Common strategies include deploying separate test clusters for specific applications within production, or sharing databases between environments. Import masked data from production (typically the last 6 to 12 months) to maintain data relevance.
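When a test environment runs at a fraction of production scale, capacity targets must be scaled by the same fraction. The sketch below illustrates the arithmetic; the numbers are hypothetical, and the linear-scaling assumption only holds when architecture, configuration, and data magnitude match production as described in the requirements below.

```python
def scaled_tps_target(production_peak_tps: float, scale_ratio: float) -> float:
    """Estimate the TPS target for a scaled-down test environment.

    scale_ratio is the fraction of production capacity deployed in the
    test environment (e.g. 0.5 for half scale, 0.25 for quarter scale).
    Assumes roughly linear scaling, which holds only when the architecture,
    software versions, parameters, and data magnitude match production.
    """
    return production_peak_tps * scale_ratio

# Hypothetical example: production peaks at 2,000 TPS; the test
# environment is built at quarter scale.
print(scaled_tps_target(2000, 0.25))  # → 500.0
```

A quarter-scale environment that sustains 500 TPS suggests (but does not prove) that production can sustain 2,000 TPS; verify the extrapolation against production monitoring data.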

Test environment requirements

Match the test environment to production as closely as possible:

| Requirement | Details |
| --- | --- |
| Architecture | Same system architecture as production |
| Instance specifications | Same Elastic Compute Service (ECS) instance types or container specifications |
| Software versions | Same operating system, middleware, database, and application versions |
| Configuration parameters | Same operating system, middleware, database, and application parameters |
| Data volume | Same order of magnitude as production |
| Scaling approach | Reduce the number of load generators and scale down other resources proportionally. Target half or quarter of the production configuration |

Investigate the test environment

Before testing, investigate each layer of the test environment to establish a baseline for monitoring and bottleneck analysis:

| Layer | What to investigate | Purpose |
| --- | --- | --- |
| System architecture | System composition, layer functions, differences from production | Bottleneck analysis and production performance evaluation |
| Operating system | OS platform and version | Tool monitoring |
| Middleware | Middleware types and versions | Tool monitoring and bottleneck analysis |
| Database | Database types and versions | Tool monitoring and bottleneck analysis |
| Application | Running instances and their parameters | Issue detection and bottleneck analysis |
Note

Use an Application Performance Management (APM) tool such as Application Real-Time Monitoring Service (ARMS) to trace issues across the middleware, database, and application layers.

Test metrics

Metric categories

Track metrics across four categories:

| Category | Key metrics |
| --- | --- |
| Business | Concurrent users, transactions per second (TPS), success rate, response time (RT) |
| Resource | CPU utilization, memory utilization, disk I/O, network I/O, kernel parameters (semaphores, open file count) |
| Application | Idle threads, database connections, garbage collection (GC) count, full GC count, method duration |
| Frontend | Page load time, DNS resolution time, connection time, transfer time |

Define metric thresholds

Define thresholds for each metric before testing. Without clear thresholds, test results lack actionable meaning -- different stakeholders (development, operations, business) have different expectations and care about different metrics.

Business metric benchmarks:

| Metric | Guideline |
| --- | --- |
| Response time (RT) | Keep under 1 second for most services. High-performance systems like Taobao typically achieve RT in the tens of milliseconds. Specify whether your threshold applies to mean, median, p95, or p99 values -- aggregating response times as a simple mean can mask latency spikes that affect real users |
| TPS | Varies by system scale: 50--1,000 for small-to-medium businesses, 1,000--50,000 for banks, 30,000--300,000 for high-traffic platforms like Taobao |
| Success rate | Industry standard exceeds 99.6% under load |
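The point about means masking latency spikes can be made concrete with a small sketch (sample values are invented for illustration; the nearest-rank percentile shown here is one of several common definitions):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p/100 * n) in sorted order."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 98 fast responses plus two 2-second outliers (all in milliseconds).
rts = [20] * 98 + [2000] * 2

mean_rt = sum(rts) / len(rts)
print(mean_rt)               # 59.6 -- the mean still looks "under 1 second"
print(percentile(rts, 50))   # 20   -- the median hides the spikes too
print(percentile(rts, 99))   # 2000 -- p99 exposes what 2% of users experience
```

This is why a threshold such as "RT under 1 second" should state the aggregate it applies to: the same traffic passes a mean-based threshold and fails a p99-based one.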

Resource metric guidelines:

  • Limit CPU utilization to 75% or below

  • Avoid swap partition usage entirely

  • Ideally, when a system reaches its capacity limit, a resource (CPU, memory, I/O) should be the bottleneck rather than an application-level issue. Adding resources to a resource-bound system scales its capacity; adding resources to an application-bound system does not

Business models

Role of business models

Each business operation (login, search, checkout, payment) consumes different amounts of system resources. The mix of business types and their relative proportions determines overall system capacity. If the test traffic mix does not match production, the results have no practical value.

For example, in e-commerce, different promotion types and product categories shift the ratio of read-heavy versus write-heavy operations. Accurate traffic modeling in PTS captures these patterns and reveals the true system bottleneck.

Select business types

Choose high-volume, high-risk, and high-growth business operations as representative test cases.

For systems already in production:

  • Collect business types and volumes at different peak periods. If the mix varies significantly across time periods, define multiple business models

  • Identify time windows with high resource consumption or anomalies during peak hours, and investigate their root causes

  • Review past production incidents. If an incident was caused by a business operation that was excluded from previous tests, add it to future test models

For systems not yet in production:

  • Determine business types and proportions through stakeholder interviews and requirements analysis

  • Assess whether specific business operations could spike during promotions or events

  • After initial test runs, review resource consumption per operation. If a low-proportion operation consumes disproportionate resources, adjust its weight in the model
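The collected volumes translate into model proportions by simple normalization. The sketch below assumes hypothetical operation names and peak-hour counts:

```python
def business_model(peak_volumes: dict) -> dict:
    """Convert peak-hour transaction counts into model proportions summing to 1."""
    total = sum(peak_volumes.values())
    return {op: count / total for op, count in peak_volumes.items()}

# Hypothetical peak-hour counts collected from production logs.
volumes = {"browse": 60000, "search": 25000, "checkout": 10000, "payment": 5000}
print(business_model(volumes))
# → {'browse': 0.6, 'search': 0.25, 'checkout': 0.1, 'payment': 0.05}
```

If the proportions differ significantly across peak periods (for example, a payment-heavy evening peak versus a browse-heavy lunchtime peak), build one model per period rather than averaging them.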

Data volume

Basic data volume

Basic data volume refers to existing data in the database -- historical records, reference data, and accumulated business data. The volume of existing data directly affects query performance: a search across thousands of records behaves very differently from one across millions.

Requirements:

  • Match the test environment's data volume to the same order of magnitude as production

  • Account for data growth over the next three years. If data grows rapidly, load additional data into the test environment

  • When inserting test accounts into a production environment, plan thorough data preparation and cleanup logic

  • Full-link stress testing likewise requires that test data volume be on the same order of magnitude as production

Parameterized data volume

Parameterized data drives the variability of test inputs (user IDs, product IDs, search terms). Small parameter sets cause cache hits that inflate performance results.

Requirements:

  • Maximize the parameterized data volume. If needed, clear caches or generate additional data programmatically

  • Distribute parameterized data realistically. If a business operation has geographic distribution patterns (for example, region-specific inventory queries), reflect that in the test data
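One way to satisfy both requirements is to generate a large parameter file whose rows follow the target distribution. This is a sketch under assumptions: the file name, region names, and weights are invented, and the output format is a generic CSV rather than any PTS-specific format.

```python
import csv
import random

def write_param_file(path, n_rows, region_weights):
    """Generate a parameter file of user IDs whose regions follow a target distribution.

    region_weights maps a region name to its share of traffic. A large
    pool keeps repeat hits rare, avoiding cache effects that would
    inflate test results.
    """
    regions = list(region_weights)
    weights = [region_weights[r] for r in regions]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["user_id", "region"])
        for i in range(n_rows):
            writer.writerow([f"user-{i:07d}", random.choices(regions, weights)[0]])

# Hypothetical regional split; scale n_rows into the millions in practice.
write_param_file("users.csv", 100_000, {"east": 0.5, "south": 0.3, "north": 0.2})
```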

Test models

Test models are derived from business models. In most cases they are identical. However, if a business operation cannot be simulated (due to technical constraints or security policies), remove it from the test model and recalculate the remaining proportions.

Important

If a removed operation carries significant risk (for example, a payment flow), evaluate that risk separately. If the risk is high, find an alternative testing approach rather than simply excluding it.
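Recalculating the remaining proportions is a renormalization: drop the excluded operations and rescale the rest to sum to 1. The operation names and weights below are hypothetical.

```python
def remove_and_renormalize(model: dict, excluded: set) -> dict:
    """Drop operations that cannot be simulated and rescale the rest to sum to 1."""
    kept = {op: p for op, p in model.items() if op not in excluded}
    total = sum(kept.values())
    return {op: p / total for op, p in kept.items()}

business = {"browse": 0.6, "search": 0.25, "checkout": 0.1, "payment": 0.05}
# Suppose the payment flow cannot be simulated against the real gateway;
# its risk must then be evaluated separately, as noted above.
test_model = remove_and_renormalize(business, {"payment"})
print({op: round(p, 3) for op, p in test_model.items()})
# → {'browse': 0.632, 'search': 0.263, 'checkout': 0.105}
```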

Test types

Performance test types fall into two broad categories -- load testing and stress testing -- with several specialized variants:

| Test type | VUs / throughput | Duration | When to use | Required? |
| --- | --- | --- | --- | --- |
| Single-transaction benchmark testing | Low | Short | Establish baseline performance for individual operations | Optional |
| Single-transaction load testing | Incremental | Medium | Evaluate resource consumption for a specific operation. Recommended for systems not yet in production | Optional |
| Mixed-transaction load testing (capacity testing) | Average production | Medium to long | Determine overall system capacity under realistic mixed workloads | Required |
| Mixed-transaction stress testing | Above average | Medium | Find the breaking point by pushing beyond expected peak load | Optional |
| Business mutation testing | Spike on single operation | Short | Validate system behavior when a specific operation spikes unexpectedly | Optional |
| Mixed-transaction stability testing | Average production | Long (hours+) | Verify sustained performance over extended periods | Required |
| Mixed-transaction reliability testing | Average production | Long | Test failover, recovery, and degradation under fault conditions | Optional |
| Batch testing | Varies | Varies | Measure batch processing performance | Optional |
| Batch impact testing on mixed transactions | Mixed | Long | Assess how batch jobs affect concurrent online transaction performance | Optional |

Progression strategy

Start with simpler test types and progress toward more complex ones. Run single-transaction benchmarks first to establish baselines, then move to mixed-transaction load testing, and then to stress and stability testing. At minimum, run the two required types: mixed-transaction load testing and mixed-transaction stability testing.

Note

No single test type uncovers all risks. Different test types target different failure modes. Prioritize based on your system's risk profile.

Business sessions

A business session is an ordered sequence of API calls that represents a complete user workflow. For example, a typical e-commerce session might follow this pattern:

  1. Browse the product catalog

  2. Search for a specific product

  3. Open a product detail page

  4. Add the item to the cart

  5. Log in

  6. Complete checkout

  7. Make payment

The accuracy of these sessions directly affects how well tests reflect real-world performance.

Requirements:

  • Design sessions based on actual production business rules and user workflows

  • Include realistic pauses between steps to simulate user think time. Without think time, the test generates unrealistic request rates that do not match production traffic patterns

  • Add checkpoints (assertions) at key points to verify server responses. For details, see Interface response parameters

  • Parameterize all variable data (user credentials, product IDs, search queries) and maximize the data volume
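The requirements above can be sketched as a scripted session. This is an illustrative skeleton, not PTS syntax: the endpoints, think-time ranges, and the stub transport are all hypothetical, and a real script would issue HTTP requests and assert on each response.

```python
import random
import time

# Hypothetical e-commerce session: (method, path, think-time range in seconds).
SESSION = [
    ("GET",  "/catalog",        (1.0, 3.0)),
    ("GET",  "/search?q=shoes", (1.0, 3.0)),
    ("GET",  "/product/42",     (2.0, 5.0)),   # reading a detail page takes longer
    ("POST", "/cart",           (0.5, 1.5)),
    ("POST", "/login",          (0.5, 1.5)),
    ("POST", "/checkout",       (1.0, 2.0)),
    ("POST", "/payment",        (0.0, 0.0)),   # last step: no trailing think time
]

def run_session(send, think_scale=1.0):
    """Execute the steps in order, pausing between them to simulate user think time.

    `send` performs one request and returns the response; a checkpoint
    (assertion) after each call would verify the server response.
    """
    for method, path, (lo, hi) in SESSION:
        send(method, path)
        time.sleep(random.uniform(lo, hi) * think_scale)

# Stub transport for illustration only.
run_session(lambda method, path: print(method, path), think_scale=0)
```

Setting all think times to zero, as a load script without pauses effectively does, would multiply the request rate per user several-fold and distort the traffic pattern.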

Scenarios

A stress testing scenario combines multiple HTTP/HTTPS API calls or URLs into a workload that simulates real production traffic. Each scenario defines the load profile: pressure mode, ramp-up method, and duration.

The critical requirement: The TPS proportion of each business operation in the test must match the actual business proportion during production peak hours.

RPS mode and concurrency mode

PTS supports two load modes. Choose the mode that matches how your system receives real traffic:

  • Open workload model (RPS mode -- recommended): You control the arrival rate of requests. Use this mode when your system faces uncontrolled incoming traffic, such as a public-facing web application. This is the most common scenario.

  • Closed workload model (concurrency mode): You control the number of concurrent users. Use this mode when your system controls admission, such as a call center with a fixed number of agents.

Important

If your real system is an open workload but you test with a closed workload model, the test results do not reflect actual production behavior. The concurrency model implicitly throttles request rates based on response time, which masks performance issues that surface under real traffic conditions.

Example: Two interfaces A and B have a production ratio of 1:4, with response times of 1 ms and 100 ms respectively.

| Mode | Configuration | Result |
| --- | --- | --- |
| RPS mode (recommended) | Set A to 100 RPS, B to 400 RPS | Direct 1:4 ratio. The business model matches production |
| Concurrency mode | Set A to 1 concurrent user, B to 400 concurrent users | Must compensate for the 100x RT difference. Ratio becomes 1:400 in concurrency to achieve 1:4 in throughput |

RPS mode eliminates the need to calculate concurrency adjustments based on response time differences, giving more precise control over the traffic mix.
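The concurrency adjustment in the example follows from Little's Law (average concurrency = arrival rate × average response time), sketched below with the same numbers:

```python
def required_concurrency(rps: float, rt_seconds: float) -> float:
    """Little's Law: average concurrency = arrival rate x average response time."""
    return rps * rt_seconds

# Interfaces A and B: 1:4 throughput ratio, RTs of 1 ms and 100 ms.
conc_a = required_concurrency(100, 0.001)
conc_b = required_concurrency(400, 0.100)
print(round(conc_a, 3), round(conc_b, 3))  # 0.1 40.0
print(round(conc_b / conc_a))              # 400 -- a 1:400 concurrency
                                           # ratio for a 1:4 traffic mix
```

Worse, this adjustment is only valid while response times stay at their assumed values; once the system degrades under load, RTs shift and the concurrency-mode mix drifts away from the intended 1:4 ratio, whereas RPS mode holds it fixed.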

Monitoring

Why comprehensive monitoring matters

Monitoring during performance tests serves two purposes: real-time bottleneck detection and post-test root cause analysis. Without comprehensive metrics across all layers, identifying the source of a bottleneck becomes guesswork.

Metrics to monitor

| Layer | Key metrics |
| --- | --- |
| Operating system | CPU utilization (User, Sys, Wait, Idle), memory utilization (including swap), disk I/O, network I/O, kernel parameters |
| Middleware | Thread pools, Java Database Connectivity (JDBC) connection pools, Java Virtual Machine (JVM) metrics (GC frequency, full GC frequency, heap size) |
| Database | Slow SQL statements, locks, cache hit ratio, sessions, process count |
| Application | Method duration, synchronous vs. asynchronous processing, buffering, cache behavior |

Bottleneck analysis

Performance bottlenecks typically fall into four categories: operating system resources, middleware configuration, database issues, and application logic.

Analysis focus by layer

| Layer | Focus areas |
| --- | --- |
| Operating system | CPU, memory, disk I/O, network I/O |
| Middleware | Thread pools, JDBC connection pools, JVM metrics (GC frequency, full GC frequency, heap size) |
| Database | Slow SQL statements, lock waits, deadlocks, cache hit ratio, sessions, process count |
| Application | Method duration, algorithm efficiency, synchronous vs. asynchronous processing, cache usage, buffering |
| Load generator | Resource consumption on the load generator machines. In most cases, load generators are unlikely to be the bottleneck. PTS has built-in protection and scheduling mechanisms that handle this automatically |

Targeted analysis at each layer enables efficient tuning: fix the actual bottleneck rather than adding resources that have no impact.

Tuning

After identifying bottlenecks, tune the system and retest to verify improvements. Common tuning areas include:

| Layer | Tuning targets |
| --- | --- |
| Middleware | Thread pool sizes, database connection pool sizes, JVM parameters |
| Database | Slow SQL optimization, deadlock and lock wait resolution, cache hit ratio improvement, process and session parameter adjustments |
| Application | Method execution time, algorithm optimization, synchronous-to-asynchronous conversion, cache strategy, buffer sizing |
| System resources | High CPU or memory consumption is usually caused by suboptimal application settings or parameter configurations -- not by a lack of hardware. Address application-level issues before scaling resources |

Distributed stress testing with PTS

Performance Testing Service (PTS) is a SaaS-based platform for distributed stress testing. No additional installation or deployment is required.

Traffic simulation

PTS generates test traffic from Content Delivery Network (CDN) nodes across hundreds of cities and multiple carriers worldwide. This approach simulates realistic end-user access patterns, including geographic distribution and carrier-specific routing.

Scenario design

  • Record scenarios using mainstream browser recording plug-ins, or orchestrate sequential and parallel API execution through a visual interface -- no coding required

  • Parameterize requests using data files, built-in functions, string operations, and response extraction. PTS serves as a data factory that formats request parameters through simple encoding

  • Maintain cookie and session state across API calls, with extended instructions that support multiple forms of think time and traffic regulation

  • Debug scenarios and individual APIs before running full-scale tests

Load control

  • Two load modes: Concurrency mode and requests per second (RPS) mode

  • Fast startup: Launch stress tests within minutes

  • Real-time adjustment: Traffic changes take effect within seconds

  • Pulse capability: Generate instantaneous spikes of millions of queries per second (QPS)

  • Safety controls: Stop stress testing traffic immediately when needed

Reporting

PTS collects real-time data during each test, including concurrency, TPS, RT, and sampled logs for each API. Reports are generated automatically for post-test analysis.

Supported industries

PTS supports workloads across e-commerce, media, finance and insurance, logistics, advertising and marketing, and social networking.