Run the TPC-DS benchmark against EMR Serverless StarRocks to measure online analytical processing (OLAP) query performance across 99 complex SQL queries on 24 tables. The benchmark measures response time—the elapsed time from submitting a query to receiving results.
This test is based on the TPC-DS benchmark but does not satisfy all TPC-DS requirements. Results may differ from official TPC-DS published results.
How it works
TPC-DS, developed by the Transaction Processing Performance Council (TPC), is an industry-standard decision support benchmark for data management systems. It runs 99 complex SQL queries against 24 tables.
The test script automates the full pipeline:
Generate a dataset at the scale factor you specify.
Create the database and tables, then load the dataset.
Run all 99 queries and return per-query response times.
Reference results
The following results were measured on a backend (BE) configured with 8 compute units (CUs), equivalent to 8 CPU cores and 32 GB of memory, running against a 3 GB (SF=3) dataset. All 99 queries completed in 3,799 ms total, and 88 of the 99 queries completed in under 50 ms.
| Query | Time (ms) | Query | Time (ms) | Query | Time (ms) |
|---|---|---|---|---|---|
| Query01 | 34 | Query34 | 33 | Query67 | 31 |
| Query02 | 36 | Query35 | 35 | Query68 | 33 |
| Query03 | 26 | Query36 | 30 | Query69 | 40 |
| Query04 | 57 | Query37 | 31 | Query70 | 34 |
| Query05 | 40 | Query38 | 37 | Query71 | 29 |
| Query06 | 29 | Query39 | 33 | Query72 | 32 |
| Query07 | 35 | Query40 | 36 | Query73 | 28 |
| Query08 | 33 | Query41 | 34 | Query74 | 40 |
| Query09 | 31 | Query42 | 25 | Query75 | 48 |
| Query10 | 33 | Query43 | 31 | Query76 | 28 |
| Query11 | 45 | Query44 | 30 | Query77 | 40 |
| Query12 | 26 | Query45 | 27 | Query78 | 36 |
| Query13 | 31 | Query46 | 33 | Query79 | 29 |
| Query14 | 91 | Query47 | 50 | Query80 | 120 |
| Query15 | 29 | Query48 | 36 | Query81 | 36 |
| Query16 | 50 | Query49 | 49 | Query82 | 28 |
| Query17 | 36 | Query50 | 38 | Query83 | 39 |
| Query18 | 33 | Query51 | 30 | Query84 | 26 |
| Query19 | 31 | Query52 | 26 | Query85 | 37 |
| Query20 | 32 | Query53 | 32 | Query86 | 28 |
| Query21 | 32 | Query54 | 32 | Query87 | 32 |
| Query22 | 33 | Query55 | 27 | Query88 | 46 |
| Query23 | 67 | Query56 | 49 | Query89 | 32 |
| Query24 | 39 | Query57 | 41 | Query90 | 28 |
| Query25 | 32 | Query58 | 37 | Query91 | 32 |
| Query26 | 36 | Query59 | 37 | Query92 | 35 |
| Query27 | 32 | Query60 | 48 | Query93 | 29 |
| Query28 | 32 | Query61 | 32 | Query94 | 65 |
| Query29 | 33 | Query62 | 42 | Query95 | 57 |
| Query30 | 37 | Query63 | 34 | Query96 | 27 |
| Query31 | 46 | Query64 | 130 | Query97 | 29 |
| Query32 | 32 | Query65 | 29 | Query98 | 25 |
| Query33 | 52 | Query66 | 87 | Query99 | 38 |
The three most time-intensive queries are Query64 (130 ms), Query80 (120 ms), and Query14 (91 ms), which involve multi-join aggregations typical of complex retail analytics workloads.
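To check your own numbers against this table programmatically, you can summarize the reference times in a few lines of Python. This is a minimal sketch. `REFERENCE_MS` and `summarize` are hypothetical names that are not part of the benchmark package, and the list copies the table above in query order.

```python
"""Summarize the SF=3 reference results (8-CU BE) from the table above."""

# Reference response times in ms, Query01..Query99, copied from the table.
REFERENCE_MS = [
    34, 36, 26, 57, 40, 29, 35, 33, 31, 33,  # Query01-10
    45, 26, 31, 91, 29, 50, 36, 33, 31, 32,  # Query11-20
    32, 33, 67, 39, 32, 36, 32, 32, 33, 37,  # Query21-30
    46, 32, 52, 33, 35, 30, 31, 37, 33, 36,  # Query31-40
    34, 25, 31, 30, 27, 33, 50, 36, 49, 38,  # Query41-50
    30, 26, 32, 32, 27, 49, 41, 37, 37, 48,  # Query51-60
    32, 42, 34, 130, 29, 87, 31, 33, 40, 34,  # Query61-70
    29, 32, 28, 40, 48, 28, 40, 36, 29, 120,  # Query71-80
    36, 28, 39, 26, 37, 28, 32, 46, 32, 28,  # Query81-90
    32, 35, 29, 65, 57, 27, 29, 25, 38,  # Query91-99
]


def summarize(times_ms, threshold_ms=50):
    """Return the total time, the count of queries under the threshold, and the 3 slowest."""
    total = sum(times_ms)
    under = sum(1 for t in times_ms if t < threshold_ms)
    # Label each time as Query01..Query99, then sort slowest first.
    labelled = [(f"Query{i:02d}", t) for i, t in enumerate(times_ms, start=1)]
    slowest = sorted(labelled, key=lambda pair: pair[1], reverse=True)[:3]
    return {"total_ms": total, "under_threshold": under, "slowest": slowest}


if __name__ == "__main__":
    print(summarize(REFERENCE_MS))
```

Running the module reports a total of 3799 ms, 88 queries under 50 ms, and Query64, Query80, and Query14 as the three slowest, which matches the figures in this section.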
Prerequisites
Before you begin, make sure you have:
An Elastic Compute Service (ECS) instance, used to generate data, load it into StarRocks, and run the test clients. The ECS instance must have:
CentOS or Alibaba Cloud Linux as the operating system
An Enterprise SSD (ESSD) data disk with capacity greater than your target dataset size
A public IP address
The same region and virtual private cloud (VPC) as your EMR Serverless StarRocks instance
An EMR Serverless StarRocks instance (go to EMR Serverless > StarRocks > Instances tab, then click Create Instance)
Create a new EMR Serverless StarRocks instance for each test run. Upgrading or downgrading an existing instance introduces variables that affect result accuracy.
For steps to create these resources, see Create instances and Create an instance.
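Before you generate the dataset, you can confirm that the ECS data disk has room for it. The following sketch relies on this guide's statement that each unit of scale factor produces roughly 1 GB of data. The 1.2 headroom factor and the helper names `required_bytes` and `has_room` are illustrative assumptions.

```python
"""Check that a data disk has room for the generated TPC-DS dataset (a sketch)."""
import shutil

GIB = 1024 ** 3  # 1 GiB, used as a slightly conservative stand-in for "1 GB"


def required_bytes(scale_factor: float, headroom: float = 1.2) -> int:
    """Estimate bytes needed: about 1 GB per scale-factor unit, times a headroom factor.

    The 1 GB-per-SF ratio follows this guide; the headroom factor is an assumption.
    """
    return int(scale_factor * GIB * headroom)


def has_room(path: str, scale_factor: float, headroom: float = 1.2) -> bool:
    """Return True if the filesystem containing `path` has more free space than needed."""
    free = shutil.disk_usage(path).free
    return free > required_bytes(scale_factor, headroom)
```

For example, `has_room("/mnt/disk1", scale_factor=3)` checks the mount point used by the default dataset path. Substitute the mount point of your ESSD data disk if it differs.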
Run the benchmark
Step 1: Download the benchmark package
Log on to your ECS instance, then run the following commands to download and extract the benchmark package:
sudo wget https://starrocks-oss.oss-cn-beijing.aliyuncs.com/public-access/starrocks-tpcds-benchmark-for-serverless.zip
sudo yum install -y unzip
unzip starrocks-tpcds-benchmark-for-serverless.zip
For steps to connect to your ECS instance, see Connect to an instance.
Step 2: Configure the test package
Go to the extracted directory:
cd tpcds-poc-1.0
Open the configuration file:
vim conf/starrocks.conf
Set the following parameters:
Important: Use the internal endpoint for mysql_host, not the public endpoint.
Important: Make sure the ESSD data disk capacity is greater than the scale_factor value. For example, a scale factor of 3 generates approximately 3 GB of data.
| Parameter | Description | Default |
|---|---|---|
| mysql_host | Internal endpoint of the frontend (FE) on your EMR Serverless StarRocks instance. Find it on the Instance Details tab under FE Details > internal endpoint. | None |
| mysql_port | Query port of the FE. Find it on the Instance Details tab under FE Details > query port. | 9030 |
| mysql_user | Username used to log on to the instance. | admin |
| mysql_password | Password used to log on to the instance. | None |
| database | Database name for the test. Use the default value. | tpcds |
| http_port | HTTP port of the FE. Find it on the Instance Details tab under FE Details > HTTP port. | 8030 |
| scale_factor | Scale factor (SF) of the test dataset, in GB. Controls how much data is generated. | 3 |
| dataset_generate_root_path | Local path where the generated dataset is stored. | /mnt/disk1/starrocks-benchmark/datasets |
Example configuration:
# for mysql cmd
mysql_host: fe-c-***-internal.starrocks.aliyuncs.com
mysql_port: 9030
mysql_user: admin
mysql_password: ****
database: tpcds
# cluster ports
http_port: 8030
be_heartbeat_port: 9050
broker_port: 8000
# benchmark config
scale_factor: 3
dataset_generate_path: /mnt/disk1/starrocks-tpcds-benchmark/datasets
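Mistakes in conf/starrocks.conf, such as a public endpoint in `mysql_host`, show up only after a run has started. A quick pre-flight check can catch them earlier. This Python sketch assumes that the file uses the simple `key: value` layout with `#` comment lines shown in the example. `parse_conf`, `check_conf`, and the `-internal` hostname heuristic (taken from the sample endpoint) are assumptions, not features of the benchmark package.

```python
"""Parse and sanity-check conf/starrocks.conf before a run (a sketch)."""
from typing import Dict, List


def parse_conf(text: str) -> Dict[str, str]:
    """Parse `key: value` lines, skipping blank lines and `#` comment lines."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def check_conf(conf: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means no check failed."""
    problems: List[str] = []
    # Heuristic (assumption): internal FE endpoints contain "-internal", as in the sample.
    if "-internal" not in conf.get("mysql_host", ""):
        problems.append("mysql_host does not look like an internal endpoint")
    if not conf.get("mysql_port", "").isdigit():
        problems.append("mysql_port must be a number")
    try:
        if float(conf.get("scale_factor", "")) <= 0:
            problems.append("scale_factor must be positive")
    except ValueError:
        problems.append("scale_factor must be a number")
    return problems
```

For example, `check_conf(parse_conf(open("conf/starrocks.conf").read()))` returns an empty list when none of these checks fail. A passing check only rules out obvious typos. It does not prove that the ECS instance can reach the FE.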
Step 3: Run the test
Switch to the bin directory, then run the full benchmark:
cd bin
sh run_tpcds.sh
This creates the database and tables, generates the dataset, loads the data, and runs all 99 queries.
You can also run individual stages as needed:
| Command | Description |
|---|---|
| sh run_tpcds.sh query | Run the 99 queries only (skips data generation and loading) |
| sh run_tpcds.sh gen_data | Regenerate the test dataset |
| sh run_tpcds.sh reload | Reload data from an existing gen_data directory |
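If you drive the benchmark from another tool, you can map the stages in the table onto a small wrapper. This Python sketch assumes that it is started from the tpcds-poc-1.0 directory, so that `bin` is the script directory. `build_command` and `run_stage` are hypothetical helpers, and they pass through only the stage names documented above.

```python
"""Run one stage of run_tpcds.sh from the bin directory (a sketch)."""
import subprocess
from typing import List, Optional

# Stage arguments documented for run_tpcds.sh; None runs the full pipeline.
STAGES = {None, "query", "gen_data", "reload"}


def build_command(stage: Optional[str] = None) -> List[str]:
    """Return the argv for `sh run_tpcds.sh [stage]`, rejecting undocumented stages."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return ["sh", "run_tpcds.sh"] + ([stage] if stage else [])


def run_stage(stage: Optional[str] = None, bin_dir: str = "bin") -> str:
    """Run the stage with `bin_dir` as the working directory and return its stdout.

    check=True raises subprocess.CalledProcessError if the script exits non-zero.
    """
    result = subprocess.run(
        build_command(stage), cwd=bin_dir, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Capturing the output keeps the result text available for later parsing. The cost is that no progress appears until the stage finishes, so for long data-generation runs you may prefer to let the output stream to the terminal.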
Step 4: View test results
Results are returned directly after the queries complete. Queries run against the database named tpcds3.
The output format is:
Database from which data is queried: tpcds3
SQL Time(ms)
Query01 34
Query02 36
...
All time(ms): 3799
Compare your results with the reference results in the Reference results section above. Significant deviations may indicate configuration differences, resource contention, or instance sizing.
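To make the comparison with the reference results repeatable, you can parse the output format shown above. This sketch assumes that each result line is a query name followed by whitespace and an integer time, and that the total appears as `All time(ms): N`. The 50% tolerance and the names `parse_results` and `deviations` are arbitrary, illustrative choices.

```python
"""Parse run_tpcds.sh output and flag large deviations from a reference (a sketch)."""
import re
from typing import Dict, Optional, Tuple

QUERY_LINE = re.compile(r"^(Query\d+)\s+(\d+)$")
TOTAL_LINE = re.compile(r"^All time\(ms\):\s*(\d+)$")


def parse_results(output: str) -> Tuple[Dict[str, int], Optional[int]]:
    """Return per-query times in ms and the reported total (None if absent)."""
    times: Dict[str, int] = {}
    total: Optional[int] = None
    for raw in output.splitlines():
        line = raw.strip()
        match = QUERY_LINE.match(line)
        if match:
            times[match.group(1)] = int(match.group(2))
            continue
        match = TOTAL_LINE.match(line)
        if match:
            total = int(match.group(1))
    return times, total


def deviations(
    times: Dict[str, int], reference: Dict[str, int], tolerance: float = 0.5
) -> Dict[str, Tuple[int, int]]:
    """Return {query: (measured, reference)} where the relative difference exceeds tolerance."""
    flagged: Dict[str, Tuple[int, int]] = {}
    for query, ms in times.items():
        ref = reference.get(query)
        if ref and abs(ms - ref) / ref > tolerance:  # skip missing or zero references
            flagged[query] = (ms, ref)
    return flagged
```

Pass the reference times from the table as `reference`, for example `{"Query01": 34, "Query02": 36}`, and review any flagged queries first.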