Run TPC-H on EMR Serverless StarRocks to measure OLAP query latency before you make capacity decisions or validate that the service meets your performance requirements.
This test is based on the TPC-H benchmark but does not meet all TPC-H benchmark requirements. Results cannot be compared with published TPC-H benchmark results.
Reference results
The following results were collected on a backend (BE) with 8 compute units (CUs), equivalent to 8 CPU cores and 32 GB of memory, against a scale factor (SF) of 1 (1 GB of raw data).
All 22 queries completed in approximately 10 seconds total.
| Query | Time (seconds) |
|---|---|
| q1 | 0.339 |
| q2 | 0.213 |
| q3 | 0.262 |
| q4 | 0.210 |
| q5 | 0.778 |
| q6 | 0.103 |
| q7 | 0.346 |
| q8 | 3.229 |
| q9 | 0.611 |
| q10 | 0.557 |
| q11 | 0.100 |
| q12 | 0.151 |
| q13 | 0.347 |
| q14 | 0.098 |
| q15 | 0.175 |
| q16 | 0.142 |
| q17 | 0.248 |
| q18 | 0.518 |
| q19 | 0.130 |
| q20 | 0.400 |
| q21 | 0.600 |
| q22 | 0.065 |
| Total | ~10.026 |
Test environment:
| Component | Specification |
|---|---|
| BE compute units | 8 CUs (8 CPU cores, 32 GB memory) |
| ECS instance type | ecs.g6e.4xlarge |
| ECS operating system | CentOS 7.9 |
| ECS data disk | Enterprise SSD (ESSD) |
| Scale factor | SF 1 (1 GB raw data) |
TPC-H overview
TPC-H is a decision support benchmark. It consists of 22 business-oriented ad hoc queries against a simulated sales data warehouse of 8 tables. The benchmark measures query response time: the time from query submission until the results are returned.
Use the scale factor (SF) to control data volume: 1 SF equals 1 GB of raw data, and the generated data ranges from 1 GB to 3 TB depending on the SF you choose. Because the SF counts only raw data, also account for index storage when you size the data disk.
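Before you generate data, you can check that the data disk has room for the raw dataset plus index storage. The following is a minimal sketch, assuming GNU coreutils df (as shipped with CentOS 7.9) and that data lands under /mnt/disk1 as in the default configuration. The helper name check_disk_space and its HEADROOM multiplier are hypothetical: this guide does not state how much extra space index storage needs, so choose a factor from your own measurements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn if DIR has less than SF * HEADROOM GB available.
# SF is the scale_factor (1 SF = 1 GB of raw data). HEADROOM is an assumed
# multiplier for index storage and other overhead, not a documented value.
check_disk_space() {
  local dir="$1" sf="$2" headroom="$3"
  local avail_gb need_gb
  # GNU df: print only the "avail" column in whole GB, then keep the digits.
  avail_gb=$(df -BG --output=avail "$dir" | tail -n 1 | tr -dc '0-9')
  if [[ -z "$avail_gb" ]]; then
    echo "cannot read free space for ${dir}" >&2
    return 1
  fi
  # Bash arithmetic is integer-only, so pass whole-number SF and HEADROOM values.
  need_gb=$(( sf * headroom ))
  if (( avail_gb >= need_gb )); then
    echo "OK: ${avail_gb} GB available, ${need_gb} GB estimated"
  else
    echo "WARN: ${avail_gb} GB available, ${need_gb} GB estimated" >&2
    return 1
  fi
}
```

For example, check_disk_space /mnt/disk1 1 3 checks for 3 GB at SF 1, where 3 is only a placeholder headroom factor.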
For the full specification, see TPC Benchmark H Standard Specification.
Prerequisites
Before you begin, make sure you have:
- An Elastic Compute Service (ECS) instance with the following specifications:
  - Instance type: ecs.g6e.4xlarge
  - Operating system: CentOS 7.9
  - Data disk: Enterprise SSD (ESSD), sized for your SF value and index storage
- An EMR Serverless StarRocks instance. For reproducible results, create a new instance for each test rather than resizing an existing one.
- The ECS instance and the EMR Serverless StarRocks instance deployed in the same virtual private cloud (VPC) and region.
For setup instructions, see Create an ECS instance and Create an EMR Serverless StarRocks instance.
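Once both instances are in place, you can confirm from the ECS instance that the FE query port is reachable before you edit any configuration. This is a minimal sketch, assuming bash (which supports /dev/tcp redirections) and the coreutils timeout command; check_fe_port is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within 5 s.
# Uses bash's built-in /dev/tcp pseudo-device, so no extra packages are needed.
check_fe_port() {
  local host="$1" port="${2:-9030}"   # 9030 is the default FE query port
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port} (check the VPC, region, and endpoint)" >&2
    return 1
  fi
}
```

Pass the internal FE endpoint and query port of your instance. If the MySQL client is installed, mysql -h <login_host> -P 9030 -u admin -p -e "SELECT 1" goes one step further and also verifies the login credentials.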
Run the TPC-H benchmark
Step 1: Download and configure the test package
- Log in to the ECS instance. For instructions, see Connect to an ECS instance.
- Download and decompress the benchmark package:
  ```shell
  wget https://emr-olap.oss-cn-beijing.aliyuncs.com/packages/starrocks-benchmark-for-serverless.tar.gz
  tar xzvf starrocks-benchmark-for-serverless.tar.gz
  ```
- Go to the package directory:
  ```shell
  cd starrocks-benchmark-for-serverless
  ```
- Edit the configuration file:
  ```shell
  vim group_vars/all
  ```
  The configuration file contains the following parameters:
  ```yaml
  # mysql client config
  login_host: fe-c-8764bab92bc6****-internal.starrocks.aliyuncs.com
  login_port: 9030
  login_user: admin
  login_password: xxxx

  # oss config
  bucket: ""
  endpoint: ""
  access_key_id: ""
  access_key_secret: ""

  # benchmark config
  scale_factor: 1
  work_dir_root: /mnt/disk1/starrocks-benchmark/workdirs
  dataset_generate_root_path: /mnt/disk1/starrocks-benchmark/datasets
  ```
  Connection parameters (required):

  | Parameter | Description |
  |---|---|
  | login_host | Internal endpoint of the frontend (FE) on your EMR Serverless StarRocks instance. Find it on the Instance Details tab under FE Details > internal endpoint. Use the internal endpoint, not the public endpoint. |
  | login_port | Query port of the FE. Default: 9030. Find it on the Instance Details tab under FE Details > query port. |
  | login_user | Initial username for logging in to the instance. |
  | login_password | Password for logging in to the instance. |

  Object Storage Service (OSS) parameters (optional):
  If specified, the generated dataset is stored in OSS.

  | Parameter | Description |
  |---|---|
  | bucket | Name of your OSS bucket. |
  | endpoint | Endpoint for accessing OSS. |
  | access_key_id | AccessKey ID of your Alibaba Cloud account. |
  | access_key_secret | AccessKey secret of your Alibaba Cloud account. |

  Benchmark parameters:

  | Parameter | Default | Description |
  |---|---|---|
  | scale_factor | 1 | Data volume to generate. Unit: GB. 1 SF = 1 GB of raw data. |
  | work_dir_root | /mnt/disk1/starrocks-benchmark/workdirs | Root directory for storing SQL statements and other test artifacts. |
  | dataset_generate_root_path | /mnt/disk1/starrocks-benchmark/datasets | Path where the generated dataset is stored. If an OSS bucket is specified, it is mounted to this path. |
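If you prefer to script the configuration instead of editing it in vim, one sed substitution per key is enough, because group_vars/all stores each parameter as a top-level "key: value" line. This is a minimal sketch: set_config is a hypothetical helper, it assumes GNU sed (-i without a backup suffix), and it does not escape |, &, or backslashes in values, so edit such values (for example, complex passwords) by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the value of a top-level "key: value" line in a
# file such as group_vars/all. Assumes GNU sed. Values containing |, &, or
# backslashes are not escaped and must be edited by hand.
set_config() {
  local key="$1" value="$2" file="${3:-group_vars/all}"
  # Refuse to guess: only rewrite keys that already exist in the file.
  if ! grep -q "^${key}:" "$file"; then
    echo "key not found: ${key}" >&2
    return 1
  fi
  sed -i "s|^${key}:.*|${key}: ${value}|" "$file"
}
```

For example, set_config login_host <your internal FE endpoint> followed by set_config scale_factor 1. Wrap values that contain YAML special characters such as # or : in quotes inside the value itself.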
Step 2: Run the test
Run the end-to-end TPC-H test:
```shell
bin/run_tpch.sh
```
This command creates the database, tables, and 22 SQL queries, generates the dataset, loads the data, and runs all queries.
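After the data is loaded, comparing row counts with the TPC-H cardinalities is a quick way to confirm the load completed. The TPC-H specification fixes NATION at 25 rows and REGION at 5 rows, scales SUPPLIER, CUSTOMER, PART, PARTSUPP, and ORDERS linearly with SF, and scales LINEITEM only approximately (6,001,215 rows at SF 1). In the sketch below, expected_rows is a hypothetical helper, and the lowercase table names are an assumption; check queries/ddl/create_tables.sql in the working directory for the names the script actually creates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the expected row count of a TPC-H table for an
# integer scale factor, following the cardinalities in the TPC-H specification.
# LINEITEM is only approximately 6,000,000 rows per SF, so it is prefixed with "~".
expected_rows() {
  local table="$1" sf="$2"
  case "$table" in
    region)   echo 5 ;;
    nation)   echo 25 ;;
    supplier) echo $(( 10000 * sf )) ;;
    customer) echo $(( 150000 * sf )) ;;
    part)     echo $(( 200000 * sf )) ;;
    partsupp) echo $(( 800000 * sf )) ;;
    orders)   echo $(( 1500000 * sf )) ;;
    lineitem) echo "~$(( 6000000 * sf ))" ;;
    *)        echo "unknown table: ${table}" >&2; return 1 ;;
  esac
}
```

For example, in a MySQL client session connected to the FE, run SELECT COUNT(*) on each table in the benchmark database and compare the result with expected_rows <table> 1.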
You can also run individual phases:
- Reload the dataset only:
  ```shell
  bin/run_tpch.sh reload
  ```
- Run the query test only:
  ```shell
  bin/run_tpch.sh query
  ```
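Query times can vary between runs, so one option is to repeat the query-only phase several times and compare the logs. This is a minimal sketch, assuming you run it from the package directory; repeat_query_test and the RUN_TPCH override variable are hypothetical names, and the logs are written to the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "bin/run_tpch.sh query" N times, saving each run's
# terminal output to query_run_<i>.log in the current directory.
# RUN_TPCH (hypothetical) lets you point at a different script path.
repeat_query_test() {
  local runs="$1" i rc
  for (( i = 1; i <= runs; i++ )); do
    "${RUN_TPCH:-bin/run_tpch.sh}" query 2>&1 | tee "query_run_${i}.log"
    rc=${PIPESTATUS[0]}   # exit status of the benchmark script, not of tee
    if (( rc != 0 )); then
      echo "run ${i} failed with exit code ${rc}" >&2
      return "$rc"
    fi
  done
}
```

For example, repeat_query_test 3 keeps three logs that you can compare query by query. Run the full bin/run_tpch.sh once first so that the data is already loaded.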
Step 3: View results
After bin/run_tpch.sh completes, the timing of each query is printed to the terminal. Each line shows the query name and the time taken.
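To turn the per-query output into a single number, you can sum the times. This is a minimal sketch, assuming each timing line starts with the query name (q1 to q22, optionally followed by a colon) and ends with the time in seconds; the exact output format of run_tpch.sh is not documented here, so adjust the patterns if your output differs. sum_query_times is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read benchmark output on stdin and sum the times of lines
# that look like "q<N> <seconds>" (an optional colon after the name is allowed).
sum_query_times() {
  awk '
    $1 ~ /^q[0-9]+:?$/ && $NF ~ /^[0-9]+(\.[0-9]+)?$/ {
      total += $NF
      n++
    }
    END { printf "%d queries, total %.3f s\n", n, total }
  '
}
```

For example, capture a run with bin/run_tpch.sh query | tee query.log, then run sum_query_times < query.log.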
The working directory path is also printed at the end. Switch to that directory to inspect query statements, table creation SQL, and run logs:
```text
<work_dir>/
├── config          # Configuration for run.sh and run_mysql.sh
├── logs            # Logs from the most recent run
│   ├── *.sql.err
│   ├── *.sql.out
│   └── run.log
├── queries         # The 22 TPC-H SQL queries
│   ├── ddl
│   │   └── create_tables.sql
│   └── *.sql
├── run_mysql.sh
├── run.sh          # Runs the queries in the TPC-H performance test
└── tpch_tools      # dbgen toolkit
```
To browse logs directly:
```shell
cd <work_dir>/logs
```
In the reference test, the working directory is /mnt/disk1/starrocks-benchmark/workdirs/tpc_h/sf1.
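With 22 queries, scanning every log by hand is slow. Assuming each *.sql.err file captures the error output of the matching query (so an empty file means the query reported no error), the following sketch lists only the non-empty ones. find_failed_queries is a hypothetical helper; pass it the logs directory of your run, which in the reference test is /mnt/disk1/starrocks-benchmark/workdirs/tpc_h/sf1/logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list non-empty *.sql.err files in a logs directory.
# Assumes a non-empty .sql.err file means the matching query wrote an error.
find_failed_queries() {
  local logs_dir="$1"
  # -size +0c matches files larger than 0 bytes; sort for stable output.
  find "$logs_dir" -maxdepth 1 -type f -name '*.sql.err' -size +0c | sort
}
```

An empty result suggests that no query wrote to its error file; for any file it lists, check the matching .sql.out file and run.log.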