
E-MapReduce: Test plan

Last Updated: Mar 17, 2026

Run the TPC-DS benchmark against EMR Serverless StarRocks to measure online analytical processing (OLAP) query performance across 99 complex SQL queries on 24 tables. The benchmark measures response time: the elapsed time from submitting a query to receiving results.

This test is based on the TPC-DS benchmark but does not satisfy all TPC-DS requirements. Results may differ from official TPC-DS published results.

How it works

TPC-DS, developed by the Transaction Processing Performance Council (TPC), is an industry-standard benchmark for decision support systems. It runs a total of 99 complex SQL queries against 24 tables.

The test script automates the full pipeline:

  1. Generate a dataset at the scale factor you specify.

  2. Create the database and tables, then load the dataset.

  3. Run all 99 queries and return per-query response times.

Reference results

The following results are from a BE configured with 8 compute units (CUs), equivalent to 8 CPU cores and 32 GB of memory, running against a 3 GB (SF=3) dataset. All 99 queries completed in 3,799 ms total. Most queries complete in under 50 ms.

Query     Time (ms)    Query     Time (ms)    Query     Time (ms)
Query01   34           Query34   33           Query67   31
Query02   36           Query35   35           Query68   33
Query03   26           Query36   30           Query69   40
Query04   57           Query37   31           Query70   34
Query05   40           Query38   37           Query71   29
Query06   29           Query39   33           Query72   32
Query07   35           Query40   36           Query73   28
Query08   33           Query41   34           Query74   40
Query09   31           Query42   25           Query75   48
Query10   33           Query43   31           Query76   28
Query11   45           Query44   30           Query77   40
Query12   26           Query45   27           Query78   36
Query13   31           Query46   33           Query79   29
Query14   91           Query47   50           Query80   120
Query15   29           Query48   36           Query81   36
Query16   50           Query49   49           Query82   28
Query17   36           Query50   38           Query83   39
Query18   33           Query51   30           Query84   26
Query19   31           Query52   26           Query85   37
Query20   32           Query53   32           Query86   28
Query21   32           Query54   32           Query87   32
Query22   33           Query55   27           Query88   46
Query23   67           Query56   49           Query89   32
Query24   39           Query57   41           Query90   28
Query25   32           Query58   37           Query91   32
Query26   36           Query59   37           Query92   35
Query27   32           Query60   48           Query93   29
Query28   32           Query61   32           Query94   65
Query29   33           Query62   42           Query95   57
Query30   37           Query63   34           Query96   27
Query31   46           Query64   130          Query97   29
Query32   32           Query65   29           Query98   25
Query33   52           Query66   87           Query99   38

The three most time-intensive queries are Query64 (130 ms), Query80 (120 ms), and Query14 (91 ms), which involve multi-join aggregations typical of complex retail analytics workloads.

Prerequisites

Before you begin, make sure you have:

  • An Elastic Compute Service (ECS) instance, used to generate data, import data into StarRocks, and run the test client. The instance must have:

    • CentOS or Alibaba Cloud Linux as the operating system

    • An Enterprise SSD (ESSD) data disk with capacity greater than your target dataset size

    • A public IP address

    • The same region and virtual private cloud (VPC) as your EMR Serverless StarRocks instance

  • An EMR Serverless StarRocks instance (go to EMR Serverless > StarRocks > Instances tab, then click Create Instance)

Important

Create a new EMR Serverless StarRocks instance for each test run. Upgrading or downgrading an existing instance introduces variables that affect result accuracy.

For steps to create these resources, see Create instances and Create an instance.

Run the benchmark

Step 1: Download the benchmark package

Log on to your ECS instance, then run the following commands to download and extract the benchmark package:

sudo wget https://starrocks-oss.oss-cn-beijing.aliyuncs.com/public-access/starrocks-tpcds-benchmark-for-serverless.zip
sudo yum install -y unzip
unzip starrocks-tpcds-benchmark-for-serverless.zip

For steps to connect to your ECS instance, see Connect to an instance.

Step 2: Configure the test package

  1. Go to the extracted directory:

       cd tpcds-poc-1.0
  2. Open the configuration file:

       vim conf/starrocks.conf
  3. Set the following parameters:

    Important

    Use the internal endpoint for mysql_host, not the public endpoint.

    Important

    Make sure the ESSD data disk capacity is greater than the scale_factor value. For example, a scale factor of 3 generates approximately 3 GB of data.

    mysql_host (no default)
      Internal endpoint of the frontend (FE) on your EMR Serverless StarRocks instance. Find it in the Instance Details tab under FE Details > internal endpoint.
    mysql_port (default: 9030)
      Query port of the FE. Find it in the Instance Details tab under FE Details > query port.
    mysql_user (default: admin)
      Username to log on to the instance.
    mysql_password (no default)
      Password to log on to the instance.
    database (default: tpcds)
      Database name for the test. Use the default value.
    http_port (default: 8030)
      HTTP port of the FE. Find it in the Instance Details tab under FE Details > HTTP port.
    scale_factor (default: 3)
      Scale factor (SF) of the test dataset, in GB. Controls how much data is generated.
    dataset_generate_root_path (default: /mnt/disk1/starrocks-benchmark/datasets)
      Local path where the generated dataset is stored.
       # for mysql cmd
       mysql_host: fe-c-***-internal.starrocks.aliyuncs.com
       mysql_port: 9030
       mysql_user: admin
       mysql_password: ****
       database: tpcds
    
       # cluster ports
       http_port: 8030
       be_heartbeat_port: 9050
       broker_port: 8000
    
       # benchmark config
       scale_factor: 3
       dataset_generate_path: /mnt/disk1/starrocks-tpcds-benchmark/datasets
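
Before running the benchmark, it can help to sanity-check the edited file. The following Python sketch is a hypothetical helper (not part of the benchmark package) that parses the `key: value` format shown above, skips `#` comments, and verifies that scale_factor is a positive number:

```python
# Minimal parser for the starrocks.conf "key: value" format shown above.
# Hypothetical sanity-check helper; not part of the benchmark package.

def parse_conf(text):
    """Parse 'key: value' lines, skipping blank lines and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        conf[key.strip()] = value.strip()
    return conf

# Placeholder values for illustration; substitute your own configuration.
sample = """\
# for mysql cmd
mysql_host: fe-c-xxx-internal.starrocks.aliyuncs.com
mysql_port: 9030
scale_factor: 3
"""

conf = parse_conf(sample)
assert conf["mysql_port"] == "9030"
assert int(conf["scale_factor"]) > 0  # scale factor must be a positive number
```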

Step 3: Run the test

Switch to the bin directory, then run the full benchmark:

cd bin
sh run_tpcds.sh

This creates the database and tables, generates the dataset, loads the data, and runs all 99 queries.

You can also run individual stages as needed:

Command                     Description
sh run_tpcds.sh query       Run only the 99 queries (skips data generation and loading).
sh run_tpcds.sh gen_data    Regenerate the test dataset.
sh run_tpcds.sh reload      Reload data from an existing gen_data directory.

Step 4: View test results

Results are returned directly after the queries complete. Queries run against a database whose name combines the configured database name and the scale factor; with the defaults (database tpcds, scale factor 3), that is tpcds3.

The output format is:

Database from which data is queried: tpcds3
SQL         Time(ms)
Query01     34
Query02     36
...
All time(ms):   3799
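
If you want the results in machine-readable form, the output is simple to parse. The Python sketch below is a hypothetical helper that assumes the exact format shown above (one `QueryNN  time` line per query, followed by an `All time(ms):` total):

```python
# Parse the benchmark output shown above into per-query timings and a total.
# Hypothetical helper; assumes the exact line format printed by run_tpcds.sh.

def parse_results(output):
    """Return ({query_name: time_ms}, total_ms) from the benchmark output."""
    times = {}
    total = None
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("Query"):
            times[parts[0]] = int(parts[1])
        elif line.startswith("All time(ms):"):
            total = int(parts[-1])
    return times, total

sample = """\
Database from which data is queried: tpcds3
SQL         Time(ms)
Query01     34
Query02     36
All time(ms):   3799
"""

times, total = parse_results(sample)
assert times["Query01"] == 34 and total == 3799
```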

Compare your results with the reference results in the Reference results section above. Significant deviations may indicate configuration differences, resource contention, or a different instance size.
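
One simple way to make "significant deviation" concrete is a per-query ratio against the reference numbers. The sketch below is an illustrative convention, not an official tolerance: the 2x cutoff and the function name are assumptions, and the dictionaries hold only an excerpt of the reference table:

```python
# Flag queries whose measured time exceeds the reference time by more than a
# chosen factor. The 2x default threshold is an illustrative assumption, not
# an official tolerance.

def flag_slow_queries(measured, reference, factor=2.0):
    """Return query names where measured_ms > factor * reference_ms."""
    return sorted(
        q for q, ms in measured.items()
        if q in reference and ms > factor * reference[q]
    )

# Excerpt of the reference table above; replace `measured` with your own run.
reference = {"Query01": 34, "Query02": 36, "Query64": 130}
measured  = {"Query01": 36, "Query02": 95, "Query64": 140}

print(flag_slow_queries(measured, reference))  # → ['Query02']
```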