All Products
Search
Document Center

E-MapReduce:Run a TPC-H performance test

Last Updated:Jun 21, 2026

This topic describes how to use the TPC-H benchmark to run a performance test for Online Analytical Processing (OLAP) query scenarios.

TPC-H

The following description is from TPC Benchmark H (TPC-H):

"The TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions."

For details, see the TPCH Specification.

Note

This TPC-H implementation is based on the TPC-H benchmark but is not fully compliant with its official requirements. Therefore, these results are not comparable with officially published TPC-H benchmark results.

Dataset

The TPC-H benchmark was developed by the Transaction Processing Performance Council (TPC) to simulate decision support applications. It is widely used in academia and industry to evaluate the performance of decision support technologies.

TPC-H models a data warehouse for a sales system based on a real-world production environment. It contains eight tables, and the data volume can be set from 1 GB to 3 TB. The benchmark includes 22 queries, and the primary metric is the response time of each query, measured from submission to result return. The test results provide a comprehensive overview of the system's query processing capability. For more information, see the TPC-H benchmark.

Procedure

This test uses the 22 queries from the TPC-H benchmark. The test results are directly affected by the data volume. The TPC-H data generator uses a scale factor (SF) to control the generated data volume, where 1 SF corresponds to 1 GB of data.

Note

The specified data volume refers only to the raw data and does not include the space required for indexes. Therefore, you must provision additional storage space when you prepare the environment.

  1. Prepare the environment.

    Prepare the required resources for the OLAP query scenario.

    Note

    For consistent results, use a new instance for each test. Avoid using instances that have been scaled up or down.

    1. Create an ECS instance. For more information, see Create an ECS instance.

      The ECS instance for this test is configured as follows:

      • Instance type: ecs.g6e.4xlarge

      • Operating system: CentOS 7.9

      • Data disk: Enterprise SSD (ESSD). The disk capacity depends on the test data volume.

    2. Create a StarRocks instance. For more information, see Create an instance.

      For this test, the backend (BE) is configured with 8 compute units (CUs), providing 8 CPU cores and 32 GB of memory. You can select a configuration based on your business requirements.

      Note

      Create the StarRocks instance and the ECS instance in the same region and Virtual Private Cloud (VPC).

  2. Configure the toolkit parameters.

    1. Log on to the ECS instance. For more information, see Connect to an ECS instance.

    2. Run the following commands to download and decompress the starrocks-benchmark toolkit.

      wget https://emr-olap.oss-cn-beijing.aliyuncs.com/packages/starrocks-benchmark-for-serverless.tar.gz
      tar xzvf starrocks-benchmark-for-serverless.tar.gz
    3. Run the following command to change to the starrocks-benchmark-for-serverless directory.

      cd starrocks-benchmark-for-serverless
    4. Run the vim group_vars/all command to configure the benchmark test parameters.

      # mysql client config
      login_host: fe-c-8764bab92bc6****-internal.starrocks.aliyuncs.com
      login_port: 9030
      login_user: admin
      login_password: xxxx
      # oss config
      bucket: ""
      endpoint: ""
      access_key_id: ""
      access_key_secret: ""
      # benchmark config
      scale_factor: 1
      work_dir_root: /mnt/disk1/starrocks-benchmark/workdirs
      dataset_generate_root_path: /mnt/disk1/starrocks-benchmark/datasets

      The following table describes the parameters.

      Parameter

      Description

      Notes

      login_host

      The internal endpoint of the frontend (FE) of the StarRocks instance.

      On the Instance Details tab of your StarRocks instance, find the endpoint in the FE Details section under Internal Endpoint.

      Important

      Avoid using a public endpoint.

      StarRocks client connection parameters.

      login_port

      The query port of the FE of the StarRocks instance. The default value is 9030.

      On the Instance Details tab of your StarRocks instance, find the port in the FE Details section under Query Port.

      login_user

      The initial user of the StarRocks instance.

      login_password

      The password for the initial user of the StarRocks instance.

      bucket

      The name of the OSS bucket.

      Optional. OSS configuration. If you configure these parameters, the toolkit generates and stores the test dataset in OSS.

      endpoint

      The endpoint for accessing OSS.

      access_key_id

      The AccessKey ID of your Alibaba Cloud account.

      access_key_secret

      The AccessKey Secret of your Alibaba Cloud account.

      scale_factor

      The scale factor for the dataset, which controls the data volume. The default value is 1. The unit is GB.

      Benchmark configuration.

      work_dir_root

      The root working directory. It stores TPC-H table creation SQL, executed SQL, and other artifacts. The default value is /mnt/disk1/starrocks-benchmark/workdirs.

      dataset_generate_root_path

      The path for storing the generated test dataset. The default value is /mnt/disk1/starrocks-benchmark/datasets.

      If you configure OSS, the specified bucket is mounted to this path.

  3. Run the following command to perform an automated, end-to-end TPC-H test.

    bin/run_tpch.sh

    This command automates the entire TPC-H test process, which includes creating the database and tables, generating and loading the dataset, and running the 22 SQL queries.

    Other operations:

    • To load only the dataset, run the following command.

      bin/run_tpch.sh reload
    • To run only the TPC-H query test, run the following command.

      bin/run_tpch.sh query
  4. View the results.

    • Test result summary

      When the bin/run_tpch.sh command finishes, it prints the test results, which look similar to the following:

      TASK [tpc_h : debug] ****************************************************************************************************************************************
      ok: [10.1.**.**] => {
          "command_output.stdout_lines": [
              "[info] 2022-03-01 09:51:23.295 | Run sql queries started.",
              "[info] 2022-03-01 09:51:23.330 | Run q10.sql started.",
              "[info] 2022-03-01 09:51:23.913 | Run q10.sql finished. Time taken: 0:00:00, .557 seconds",
              "[info] 2022-03-01 09:51:23.923 | Run q11.sql started.",
              "[info] 2022-03-01 09:51:24.026 | Run q11.sql finished. Time taken: 0:00:00, .100 seconds",
              "[info] 2022-03-01 09:51:24.038 | Run q12.sql started.",
              "[info] 2022-03-01 09:51:24.192 | Run q12.sql finished. Time taken: 0:00:00, .151 seconds",
              "[info] 2022-03-01 09:51:24.204 | Run q13.sql started.",
              "[info] 2022-03-01 09:51:24.553 | Run q13.sql finished. Time taken: 0:00:00, .347 seconds",
              "[info] 2022-03-01 09:51:24.563 | Run q14.sql started.",
              "[info] 2022-03-01 09:51:24.665 | Run q14.sql finished. Time taken: 0:00:00, .098 seconds",
              "[info] 2022-03-01 09:51:24.675 | Run q15.sql started.",
              "[info] 2022-03-01 09:51:24.852 | Run q15.sql finished. Time taken: 0:00:00, .175 seconds",
              "[info] 2022-03-01 09:51:24.864 | Run q16.sql started.",
              "[info] 2022-03-01 09:51:25.008 | Run q16.sql finished. Time taken: 0:00:00, .142 seconds",
              "[info] 2022-03-01 09:51:25.018 | Run q17.sql started.",
              "[info] 2022-03-01 09:51:25.269 | Run q17.sql finished. Time taken: 0:00:00, .248 seconds",
              "[info] 2022-03-01 09:51:25.280 | Run q18.sql started.",
              "[info] 2022-03-01 09:51:25.800 | Run q18.sql finished. Time taken: 0:00:00, .518 seconds",
              "[info] 2022-03-01 09:51:25.810 | Run q19.sql started.",
              "[info] 2022-03-01 09:51:25.943 | Run q19.sql finished. Time taken: 0:00:00, .130 seconds",
              "[info] 2022-03-01 09:51:25.953 | Run q1.sql started.",
              "[info] 2022-03-01 09:51:26.295 | Run q1.sql finished. Time taken: 0:00:00, .339 seconds",
              "[info] 2022-03-01 09:51:26.305 | Run q20.sql started.",
              "[info] 2022-03-01 09:51:26.708 | Run q20.sql finished. Time taken: 0:00:00, .400 seconds",
              "[info] 2022-03-01 09:51:26.720 | Run q21.sql started.",
              "[info] 2022-03-01 09:51:27.323 | Run q21.sql finished. Time taken: 0:00:00, .600 seconds",
              "[info] 2022-03-01 09:51:27.334 | Run q22.sql started.",
              "[info] 2022-03-01 09:51:27.403 | Run q22.sql finished. Time taken: 0:00:00, .065 seconds",
              "[info] 2022-03-01 09:51:27.415 | Run q2.sql started.",
              "[info] 2022-03-01 09:51:27.632 | Run q2.sql finished. Time taken: 0:00:00, .213 seconds",
              "[info] 2022-03-01 09:51:27.648 | Run q3.sql started.",
              "[info] 2022-03-01 09:51:27.917 | Run q3.sql finished. Time taken: 0:00:00, .262 seconds",
              "[info] 2022-03-01 09:51:27.936 | Run q4.sql started.",
              "[info] 2022-03-01 09:51:28.150 | Run q4.sql finished. Time taken: 0:00:00, .210 seconds",
              "[info] 2022-03-01 09:51:28.172 | Run q5.sql started.",
              "[info] 2022-03-01 09:51:28.954 | Run q5.sql finished. Time taken: 0:00:00, .778 seconds",
              "[info] 2022-03-01 09:51:28.976 | Run q6.sql started.",
              "[info] 2022-03-01 09:51:29.080 | Run q6.sql finished. Time taken: 0:00:00, .103 seconds",
              "[info] 2022-03-01 09:51:29.096 | Run q7.sql started.",
              "[info] 2022-03-01 09:51:29.445 | Run q7.sql finished. Time taken: 0:00:00, .346 seconds",
              "[info] 2022-03-01 09:51:29.460 | Run q8.sql started.",
              "[info] 2022-03-01 09:51:32.692 | Run q8.sql finished. Time taken: 0:00:03, 3.229 seconds",
              "[info] 2022-03-01 09:51:32.703 | Run q9.sql started.",
              "[info] 2022-03-01 09:51:33.318 | Run q9.sql finished. Time taken: 0:00:00, .611 seconds",
              "[info] 2022-03-01 09:51:33.324 | Run sql queries finished. Time taken: 0:00:10, 10.026 seconds"
          ]
      }
      TASK [tpc_h : debug] ********************************************************************************************************************************************************************************
      ok: [10.1.0.91] => {
          "work_dir": "/mnt/disk1/starrocks-benchmark/workdirs/tpc_h/sf1"
      }
    • Detailed test results

      After you successfully run the bin/run_tpch.sh command, the system builds the entire working directory for the TPC-H test and outputs the path to the <work_dir> directory. You can then switch to this directory to view related information, such as query statements, table creation statements, and execution logs.

      TASK [tpc_h : debug] ****************************************************
      ok: [10.1.0.91] => {
          "work_dir": /mnt/disk1/starrocks-benchmark/workdirs/tpc_h/sf1
      }
      Note

      In this example, <work_dir> is /mnt/disk1/starrocks-benchmark/workdirs/tpc_h/sf1.

      Run the cd <work_dir>/logs command to change to the logs subdirectory and view the test results and detailed SQL output.

      The <work_dir> directory has the following structure:

      <work_dir>/
      ├── config # Configurations for the run.sh and run_mysql.sh scripts.
      ├── logs # Logs from the most recent TPC-H run.
      │   ├── *.sql.err
      │   ├── *.sql.out
      │   └── run.log
      ├── queries # The 22 TPC-H SQL queries.
      │   ├── ddl
      │   │   └── create_tables.sql # TPC-H table creation SQL.
      │   └── *.sql
      ├── run_mysql.sh
      ├── run.sh # Runs all TPC-H queries.
      └── tpch_tools # The dbgen toolkit.