All Products
Search
Document Center

E-MapReduce:Test plan

Last Updated:Dec 12, 2024

This topic describes how to use TPC Benchmark DS (TPC-DS) to test the performance of online analytical processing (OLAP) queries.

Overview of TPC-DS

TPC-DS is a standard benchmark formulated by the Transaction Processing Performance Council (TPC), the most well-known organization that defines measurement benchmarks for data management systems. The measurement results of this benchmark are also published by TPC.

In this test, a total of 99 complex SQL queries are tested against 24 tables in accordance with the TPC-DS benchmark test. The test focuses on the response time of each query, which is the amount of time required from submitting a query to receiving the returned results.

Note

The test in this example is performed based on the TPC-DS benchmark, but the test does not meet all requirements of the TPC-DS benchmark test. As a result, the test results in this example may not match the published results of the TPC-DS benchmark test.

Preparations

Prepare the basic environment and TPC-DS data that are required to test OLAP queries.

Note

To reduce the variables that may impact the test results, we recommend that you use new instances each time you perform a test, instead of upgrading or downgrading the specifications of existing instances.

Step

Description

  1. Create an Elastic Compute Service (ECS) instance.

Log on to the Alibaba Cloud Management Console and create an ECS instance. The ECS instance is used to generate data, import data to StarRocks, and test clients. We recommend that you create an ECS instance that meets the following requirements:

  • Operating system: CentOS or Alibaba Cloud Linux.

  • Data disk: Enterprise SSD (ESSD). You can determine the capacity based on the amount of data used in the test.

  • Region and VPC: Make sure that the ECS instance and the E-MapReduce (EMR) Serverless StarRocks instance that you want to use reside in the same region and are connected by using the same virtual private cloud (VPC).

  • Public IP address: A public IP address must be assigned to the ECS instance.

For more information, see Create instances.

  1. Create an EMR Serverless StarRocks instance.

In this test, a backend (BE) that has eight compute units (CUs) is used. This indicates that the EMR Serverless StarRocks instance is configured with eight CPU cores and 32 GB of memory. You can also configure the computing resources based on your business requirements.

Log on to the EMR console. In the left-side navigation pane, choose EMR Serverless > StarRocks. On the Instances tab, click Create Instance.

For more information, see Create an instance.

  1. Download and decompress the benchmark test package for StarRocks.

Log on to the ECS instance that you created in Step 1 and run the following commands on the ECS instance to download the benchmark test package:

sudo wget https://starrocks-oss.oss-cn-beijing.aliyuncs.com/public-access/starrocks-tpcds-benchmark-for-serverless.zip
sudo yum install -y unzip
unzip starrocks-tpcds-benchmark-for-serverless.zip

Procedure

Step 1: Configure the test package

  1. Log on to the created ECS instance. For more information, see Connect to an instance.

  2. Run the following command to go to the directory in which the test package is stored:

    cd tpcds-poc-1.0
  3. Run the vim conf/starrocks.conf command and modify the following parameters based on your business requirements:

    # for mysql cmd
    mysql_host: fe-c-***-internal.starrocks.aliyuncs.com
    mysql_port: 9030
    mysql_user: admin
    mysql_password: ****
    database: tpcds
    
    # cluster ports
    http_port: 8030
    be_heartbeat_port: 9050
    broker_port: 8000
    
    # benchmark config
    scale_factor: 3
    dataset_generate_path: /mnt/disk1/starrocks-tpcds-benchmark/datasets

    The following table describes the parameters.

    Parameter

    Description

    mysql_host

    The internal endpoint of the frontend (FE) on the EMR Serverless StarRocks instance.

    You can go to the Instance Details tab of the EMR Serverless StarRocks instance to view the internal endpoint of the instance in the FE Details section.

    Important

    We recommend that you do not use a public endpoint.

    mysql_port

    The query port of the FE on the EMR Serverless StarRocks instance. Default value: 9030.

    You can go to the Instance Details tab of the EMR Serverless StarRocks instance to view the query port of the instance in the FE Details section.

    mysql_user

    The username used to log on to the EMR Serverless StarRocks instance. Default value: admin.

    mysql_password

    The password used to log on to the EMR Serverless StarRocks instance.

    database

    The name of the database that is created for the EMR Serverless StarRocks instance. Use the default value.

    http_port

    The HTTP port of the FE on the EMR Serverless StarRocks instance. Default value: 8030.

    You can go to the Instance Details tab of the EMR Serverless StarRocks instance to view the HTTP port of the instance in the FE Details section.

    scale_factor

    The scale factor (SF) of the test dataset, which controls the amount of data to be generated. Default value: 3. Unit: GB.

    Important

    When you create the EMR Serverless StarRocks instance, make sure that the capacity of the disk is greater than the size of the test dataset to be generated.

    dataset_generate_root_path

    The path in which the generated test dataset is stored. Default value: /mnt/disk1/starrocks-benchmark/datasets.

Step 2: Run a test

Run the following commands to enable an automatic TPC-DS test. During the automatic TPC-DS test, a database, tables, and a dataset are created, data is loaded, and queries are run.

cd bin
sh run_tpcds.sh

You can also perform the following operations based on your business requirements:

  • Run the following command to run queries:

    sh run_tpcds.sh query
  • Run the following command to generate test data again:

    sh run_tpcds.sh gen_data
  • If the gen_data directory is generated and you need to import test data again, run the following command to refresh the directory:

    sh run_tpcds.sh reload

Step 3: View test results

The test results are directly returned after the queries are run. The SQL queries are run on the database named tpcds3. The following results show the response time of 99 queries:

Database from which data is queried: tpcds3
SQL		Time(ms)
Query01		34
Query02		36
Query03		26
Query04		57
Query05		40
Query06		29
Query07		35
Query08		33
Query09		31
Query10		33
Query11		45
Query12		26
Query13		31
Query14		91
Query15		29
Query16		50
Query17		36
Query18		33
Query19		31
Query20		32
Query21		32
Query22		33
Query23		67
Query24		39
Query25		32
Query26		36
Query27		32
Query28		32
Query29		33
Query30		37
Query31		46
Query32		32
Query33		52
Query34		33
Query35		35
Query36		30
Query37		31
Query38		37
Query39		33
Query40		36
Query41		34
Query42		25
Query43		31
Query44		30
Query45		27
Query46		33
Query47		50
Query48		36
Query49		49
Query50		38
Query51		30
Query52		26
Query53		32
Query54		32
Query55		27
Query56		49
Query57		41
Query58		37
Query59		37
Query60		48
Query61		32
Query62		42
Query63		34
Query64		130
Query65		29
Query66		87
Query67		31
Query68		33
Query69		40
Query70		34
Query71		29
Query72		32
Query73		28
Query74		40
Query75		48
Query76		28
Query77		40
Query78		36
Query79		29
Query80		120
Query81		36
Query82		28
Query83		39
Query84		26
Query85		37
Query86		28
Query87		32
Query88		46
Query89		32
Query90		28
Query91		32
Query92		35
Query93		29
Query94		65
Query95		57
Query96		27
Query97		29
Query98		25
Query99		38
All time(ms):	3799