E-MapReduce: Test plan

Last Updated: Mar 26, 2026

Run TPC-H on EMR Serverless StarRocks to measure OLAP query latency, either to inform capacity decisions or to validate that the service meets your performance requirements.

This test is based on the TPC-H benchmark but does not meet all TPC-H benchmark requirements. Results cannot be compared with published TPC-H benchmark results.

Reference results

The following results were collected on a backend (BE) with 8 compute units (CUs), which corresponds to 8 CPU cores and 32 GB of memory, at a scale factor (SF) of 1 (1 GB of raw data).

All 22 queries completed in approximately 10 seconds total.

Query Time (seconds)
q1 0.339
q2 0.213
q3 0.262
q4 0.210
q5 0.778
q6 0.103
q7 0.346
q8 3.229
q9 0.611
q10 0.557
q11 0.100
q12 0.151
q13 0.347
q14 0.098
q15 0.175
q16 0.142
q17 0.248
q18 0.518
q19 0.130
q20 0.400
q21 0.600
q22 0.065
Total ~10.026

Test environment:

Component             Specification
BE compute units      8 CUs (8 CPU cores, 32 GB memory)
ECS instance type     ecs.g6e.4xlarge
ECS operating system  CentOS 7.9
ECS data disk         Enterprise SSD (ESSD)
Scale factor          SF 1 (1 GB raw data)

TPC-H overview

TPC-H is a decision support benchmark. It consists of 22 business-oriented ad hoc queries against a simulated sales data warehouse of 8 tables. The benchmark measures query response time, that is, the time from query submission until results are returned.

Use the scale factor (SF) to control data volume: SF 1 corresponds to 1 GB of raw data. Dataset sizes range from 1 GB to 3 TB depending on the SF you choose.

The SF value covers raw data volume only. When sizing your disk, also account for index storage.
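As a rough sizing aid, the sketch below multiplies the raw data volume (SF x 1 GB) by a headroom factor and rounds up. The function name estimate_disk_gb and the default factor of 3 are hypothetical placeholders for illustration, not values from this benchmark; measure actual usage on your own instance before settling on a disk size.

```shell
#!/usr/bin/env bash
# Rough disk estimate in GB: scale factor (1 SF = 1 GB raw data) times a
# headroom factor, rounded up to a whole GB.
# The headroom factor is a hypothetical placeholder, not a measured value.
estimate_disk_gb() {
  local sf="$1"
  local headroom="${2:-3}"
  awk -v sf="$sf" -v h="$headroom" 'BEGIN {
    v = sf * h
    r = int(v)
    if (r < v) r++      # round up so the estimate never undershoots
    print r
  }'
}
```

For example, estimate_disk_gb 100 1.5 prints an estimate for SF 100 with 50% headroom.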

For the full specification, see TPC Benchmark H Standard Specification.

Prerequisites

Before you begin, make sure you have:

  • An Elastic Compute Service (ECS) instance with the following specifications:

    • Instance type: ecs.g6e.4xlarge

    • Operating system: CentOS 7.9

    • Data disk: Enterprise SSD (ESSD), sized for your SF value and index storage

  • An EMR Serverless StarRocks instance. For reproducible results, create a new instance for each test rather than resizing an existing one.

  • The ECS instance and the EMR Serverless StarRocks instance in the same virtual private cloud (VPC) and region.

For setup instructions, see Create an ECS instance and Create an EMR Serverless StarRocks instance.

Run the TPC-H benchmark

Step 1: Download and configure the test package

  1. Log in to the ECS instance. For instructions, see Connect to an ECS instance.

  2. Download and decompress the benchmark package:

    wget https://emr-olap.oss-cn-beijing.aliyuncs.com/packages/starrocks-benchmark-for-serverless.tar.gz
    tar xzvf starrocks-benchmark-for-serverless.tar.gz
  3. Go to the package directory:

    cd starrocks-benchmark-for-serverless
  4. Edit the configuration file:

    vim group_vars/all

    The configuration file contains the following parameters:

    # mysql client config
    login_host: fe-c-8764bab92bc6****-internal.starrocks.aliyuncs.com
    login_port: 9030
    login_user: admin
    login_password: xxxx
    
    # oss config
    bucket: ""
    endpoint: ""
    access_key_id: ""
    access_key_secret: ""
    
    # benchmark config
    scale_factor: 1
    work_dir_root: /mnt/disk1/starrocks-benchmark/workdirs
    dataset_generate_root_path: /mnt/disk1/starrocks-benchmark/datasets

    Connection parameters (required):

    Parameter Description
    login_host Internal endpoint of the frontend (FE) on your EMR Serverless StarRocks instance. Find it on the Instance Details tab under FE Details > internal endpoint. Use the internal endpoint, not the public endpoint.
    login_port Query port of the FE. Default: 9030. Find it on the Instance Details tab under FE Details > query port.
    login_user Initial username for logging in to the instance.
    login_password Password for logging in to the instance.

    Object Storage Service (OSS) parameters (optional):

    If specified, the generated dataset is stored in OSS.

    Parameter Description
    bucket Name of your OSS bucket.
    endpoint Endpoint for accessing OSS.
    access_key_id AccessKey ID of your Alibaba Cloud account.
    access_key_secret AccessKey secret of your Alibaba Cloud account.

    Benchmark parameters:

    Parameter                   Default                                  Description
    scale_factor                1                                        Data volume to generate. Unit: GB. 1 SF = 1 GB of raw data.
    work_dir_root               /mnt/disk1/starrocks-benchmark/workdirs  Root directory for storing SQL statements and other test artifacts.
    dataset_generate_root_path  /mnt/disk1/starrocks-benchmark/datasets  Path where the generated dataset is stored. If an OSS bucket is specified, it is mounted to this path.
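Before starting a run, it can help to confirm that the required connection parameters are filled in. The following is a minimal sketch, assuming group_vars/all keeps the flat "key: value" layout shown above. The helper names get_config_value and check_connection_config are hypothetical and are not part of the benchmark package.

```shell
#!/usr/bin/env bash
# Print the value of a top-level "key: value" entry from a config file.
# Assumes the flat layout shown above; this is not a general YAML parser.
get_config_value() {
  local key="$1" file="$2"
  awk -v k="$key" '
    $0 ~ ("^" k ":") {
      sub("^" k ":[ \t]*", "")   # drop the key and separator
      sub(/[ \t]+$/, "")         # drop trailing whitespace
      gsub(/^"|"$/, "")          # drop surrounding double quotes
      print
      exit
    }' "$file"
}

# Return non-zero if any required connection parameter is missing or empty.
check_connection_config() {
  local file="$1" key status=0
  for key in login_host login_port login_user login_password; do
    if [ -z "$(get_config_value "$key" "$file")" ]; then
      echo "missing value for $key" >&2
      status=1
    fi
  done
  return "$status"
}
```

Calling check_connection_config group_vars/all from the package directory reports each empty parameter on standard error. The OSS parameters are deliberately not checked because they are optional.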

Step 2: Run the test

Run the end-to-end TPC-H test:

bin/run_tpch.sh

This command creates the database, tables, and 22 SQL queries, generates the dataset, loads the data, and runs all queries.

You can also run individual phases:

  • Reload the dataset only:

    bin/run_tpch.sh reload
  • Run the query test only:

    bin/run_tpch.sh query
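Because the query phase can be run on its own, it is easy to repeat it and compare runs. The sketch below is one way to do that; the helper name run_repeated and the log file naming are assumptions for illustration and are not part of the benchmark package.

```shell
#!/usr/bin/env bash
# Run a command several times, saving each run's output to its own log file.
# Usage: run_repeated <count> <log_dir> <command> [args...]
run_repeated() {
  local count="$1" log_dir="$2" i
  shift 2
  mkdir -p "$log_dir"
  for i in $(seq 1 "$count"); do
    # Capture stdout and stderr of this run in run_<i>.log
    "$@" > "$log_dir/run_${i}.log" 2>&1
  done
}
```

For example, run_repeated 3 /tmp/tpch-runs bin/run_tpch.sh query would run the query phase three times from the package directory and leave one log per run in /tmp/tpch-runs (a hypothetical location).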

Step 3: View results

After bin/run_tpch.sh completes, query results are printed to the terminal. Each line shows the query name and the time taken.
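To condense the per-query timings into a total and the slowest query, a small filter such as the following may be useful. It is a sketch that assumes each result line has exactly two whitespace-separated fields, a query name and a time in seconds. The actual output format of the script may differ, so adjust the pattern as needed. The function name summarize_timings is hypothetical.

```shell
#!/usr/bin/env bash
# Read "<query> <seconds>" lines on stdin; print count, total, and slowest query.
# Lines that do not match this assumed two-field format are ignored.
summarize_timings() {
  awk '
    NF == 2 && $2 ~ /^[0-9]+(\.[0-9]+)?$/ {
      total += $2
      if ($2 + 0 > max) { max = $2 + 0; slowest = $1 }
      n++
    }
    END {
      printf "queries=%d total=%.3f slowest=%s (%.3f)\n", n, total, slowest, max
    }'
}
```

Pipe saved terminal output into the function, for example summarize_timings < results.txt, where results.txt is a hypothetical file holding the printed timings.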

The working directory path is also printed at the end. Switch to that directory to inspect query statements, table creation SQL, and run logs:

<work_dir>/
├── config          # Configurations for run.sh and run_mysql.sh
├── logs            # Most recent run logs
│   ├── *.sql.err
│   ├── *.sql.out
│   └── run.log
├── queries         # The 22 TPC-H SQL queries
│   ├── ddl
│   │   └── create_tables.sql
│   └── *.sql
├── run_mysql.sh
├── run.sh          # Queries run in the TPC-H performance test
└── tpch_tools      # dbgen toolkit

To browse logs directly:

cd <work_dir>/logs
In the reference test, the working directory is /mnt/disk1/starrocks-benchmark/workdirs/tpc_h/sf1.
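When checking a run, a quick first pass is to look for queries that wrote anything to their error log. The sketch below lists them, assuming a non-empty *.sql.err file indicates a problem worth inspecting; that is an assumption, since such a file could also hold harmless warnings. The function name list_failed_queries is hypothetical.

```shell
#!/usr/bin/env bash
# List queries whose *.sql.err log in the given logs directory is non-empty.
# Prints one query name per line, sorted, with the .sql.err suffix removed.
list_failed_queries() {
  local logs_dir="$1"
  find "$logs_dir" -maxdepth 1 -type f -name '*.sql.err' -size +0c \
    | sed -e 's|.*/||' -e 's|\.sql\.err$||' \
    | sort
}
```

For the reference test, this could be called as list_failed_queries /mnt/disk1/starrocks-benchmark/workdirs/tpc_h/sf1/logs. Empty output means no error log contained any text.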