ApsaraDB for SelectDB provides high-performance and easy-to-use data analytics services. It performs well in scenarios such as wide table aggregation, multi-table joins, and high-concurrency point queries. This topic describes how to evaluate the performance of SelectDB on the 1000 GB Star Schema Benchmark (SSB) test dataset using SSB standard and SSB flat tests.
Overview
The Star Schema Benchmark (SSB) is a lightweight performance test dataset for data warehouse scenarios. SSB provides a simplified star schema dataset based on the Transaction Processing Performance Council - H (TPC-H) benchmark. It primarily tests the performance of multi-table join queries in a star schema. The industry also commonly flattens SSB into a wide table model, known as SSB flat, to test the performance of query engines.
Standard test datasets, including SSB, often differ significantly from actual business scenarios. Some tests also involve parameter tuning for the specific dataset. Therefore, standard test results reflect database performance only in specific scenarios. You should use your actual business data for further testing.
Preparations
Step 1: Prepare a destination instance
Prepare an instance.
If you have a destination instance, ensure that its configuration meets the following requirements.
If you do not have a destination instance, create an instance.
The instance used for the performance test in this topic meets the following requirements.
The kernel version is 4.1 or later.
If the kernel version of your destination instance is earlier than 4.1, you must upgrade the instance. For more information, see Upgrade the kernel version.
Specifications must be 96 cores and 384 GB of memory or higher. This test uses an instance with 96 cores and 384 GB of memory.
Cluster cache space must be 1200 GB or more. This test uses 1200 GB of cache space.
Set the
streaming_load_max_mbparameter to its maximum value.During the test, the tool uploads data to SelectDB using Stream Load. The data volume for this test exceeds the default Stream Load limit of 10,240 MB. You must set the streaming_load_max_mb parameter for the BE to the maximum value of 10,240,000 MB. For more information about how to change parameters, see Parameter settings.
Create a destination database for the test data.
If you already have a destination database, you can skip this operation.
Connect to the instance. For more information, see Connect to an ApsaraDB for SelectDB instance using a MySQL client.
Create the database.
This test uses a destination database named test_db. Run the following statement.
CREATE DATABASE test_db;
Step 2: Prepare a test server
The following dependency installation scripts are for Linux servers. If your server uses a different operating system, you must modify the installation scripts accordingly.
Notes
Note the following for your server.
If you plan to use Git to download the SSB test tool on the server, you must enable a public endpoint for the server.
New ECS instance: When you purchase an ECS instance, select Assign Public IPv4 Address for Public IP.
Existing ECS instance without a public endpoint: To enable a public endpoint for an ECS instance, see Enable a public endpoint.
This test generates about 1000 GB of data files. Ensure that the server has sufficient disk space.
Procedure
Create a destination server.
If you already have a destination server, you can skip this step.
If you do not have a destination server, create a custom ECS instance and select Alibaba Cloud Linux for the image.
Install the MySQL client dependency.
yum install mysql(Optional) Install Git.
This test uses Git to download the SSB tool. If you have already obtained the SSB tool by other means and plan to upload it to the server manually, you can skip this step.
yum install git
Step 3: Ensure network connectivity
Ensure that the destination server where the SSB test tool will be installed can connect to the SelectDB instance.
Request a public endpoint for the SelectDB instance. For more information, see Request and release a public endpoint.
If the destination server for the SSB test tool is an Alibaba Cloud server in the same VPC as the ApsaraDB for SelectDB instance, you can skip this step.
Add the IP address of the destination server for the SSB test tool to the whitelist of the ApsaraDB for SelectDB instance. For more information, see Set a whitelist.
Step 4: Understand the test dataset
In this test, SSB generates 1000 GB of data and imports it into SelectDB to test the performance of SelectDB. The following section describes the data tables in the 1000 GB test dataset.
SSB table name | Number of rows | Notes |
lineorder | 5999989709 | Order details table. |
customer | 30000000 | Customer information table. |
part | 2000000 | Part information table. |
supplier | 2000000 | Supplier information table. |
dates | 2556 | Date table. |
lineorder_flat | 5999989709 | Flattened wide table. |
Procedure
The following scripts are for Linux servers. If your server uses a different operating system, you must modify the scripts accordingly.
Step 1: Log on to the destination server
If your server is an Alibaba Cloud ECS instance, see Connect to an ECS instance for logon instructions.
For other types of servers, refer to the relevant product documentation.
Step 2: Download and install the SSB data generation tool
Download the tool.
This test uses Git to download the tool. Run the following script.
git clone https://github.com/apache/doris.git && cd ./doris/tools/ssb-toolsYou can also download the tool from ssb-tools and manually upload it to the destination server.
Compilation tools
Run the following script to compile the tool.
sh bin/build-ssb-dbgen.sh
Step 3: Generate the SSB test dataset
Generating a large data volume takes a long time. The actual time required depends on the server performance.
Run the script to generate the test dataset in the installation directory of the test tool.
Syntax:
sh bin/gen-ssb-data.sh -s <yourAimDataNum>Parameter description:
yourAimDataNum:
Meaning: The size of the data to generate using SSB.
Unit: GB
This is a medium-scale test that requires generating a 1000 GB (1 TB) test dataset. This step can take a long time. Run the task in the background.
nohup sh bin/gen-ssb-data.sh -s 1000 > gen-ssb-data.log 2>&1 &The execution results are saved in the gen-ssb-data.log file in the tool's installation directory. You can view this file to verify that the process ran correctly.
The test dataset is saved in the ssb-data directory within the bin directory of the tool's installation path. The data files have a .tbl suffix.
If a "bang!" warning appears during data generation and you have confirmed that there is sufficient disk space for the test data, this may be caused by a concurrency control issue with the generation tool. This warning can be ignored.
Step 4: Use a script to create SSB test tables for SelectDB
Configure SelectDB instance information
Before you run the table creation script, configure the SelectDB instance information in the
doris-cluster.conffile. This file is located in thessb-tools/conf/directory of the tool's installation path. Example:# Any of FE host export FE_HOST='selectdb-cn-****.selectdbfe.rds.aliyuncs.com' # http_port in fe.conf export FE_HTTP_PORT=8080 # query_port in fe.conf export FE_QUERY_PORT=9030 # Doris username export USER='admin' # Doris password export PASSWORD='****' # The database where SSB tables located export DB='test_db'Parameter description:
Parameter
Description
FE_HOST
The endpoint of the SelectDB instance.
Get the VPC endpoint or public endpoint from the Network Information section on the instance details page in the SelectDB console.
FE_HTTP_PORT
The HTTP protocol port of the SelectDB instance.
Get the HTTP protocol port from the Network Information section on the instance details page in the SelectDB console.
FE_QUERY_PORT
The MySQL protocol port of the SelectDB instance. Get the MySQL protocol port from the Network Information section on the instance details page in the SelectDB console.
USER
The account for the SelectDB instance.
After you create a SelectDB instance, the system creates an admin account by default.
PASSWORD
The password for the SelectDB instance account.
If you set USER to the admin account but forgot the password, reset the account password in the console.
DB
The name of the destination database in the SelectDB instance for data import.
Create tables
In the tool's installation directory, run the following script to create the test tables. After the script is executed, the tables from the test dataset are created in the destination database of the SelectDB instance.
sh bin/create-ssb-tables.sh -s 1000
Step 5: Import data into SelectDB
Importing a large data volume takes a long time. The actual time required depends on the server performance.
In the tool's installation directory, run the following script to import all data from the SSB test set and the SSB flat wide table into SelectDB.
sh bin/load-ssb-data.shThis is a medium-scale test that requires importing the generated 1000 GB (1 TB) test dataset into SelectDB. This step can take a long time. Run the task in the background.
nohup sh bin/load-ssb-data.sh > load-ssb-data.log 2>&1 &The execution results are saved in the load-ssb-data.log file in the tool's installation directory. You can view this file to verify that the process ran correctly.
Step 6: Test query performance
Running a batch test on a large data volume takes a long time. The actual time required depends on the server performance.
SSB standard testing evaluates database performance in complex star schema query scenarios, such as multi-table joins, aggregations, and filtering.
SSB flat testing evaluates database performance on wide table structures and tests the impact of a flattened data model on query efficiency.
Batch test query SQL performance
SSB standard test
Run the SQL script for the SSB standard test to batch-execute the SQL statements in the test set.
Syntax:
sh bin/run-ssb-queries.sh -s <yourAimDataNum>Parameter description:
yourAimDataNum: Ensures that the query runs against the correct dataset scale. The value must match the scale used for data generation. For example, if you used
-s 1000to generate data, you must also use-s 1000to run queries.After the script is executed, the console window displays the performance of each SQL statement from the test set in SelectDB.
This is a medium-scale test that queries a 1000 GB (1 TB) test dataset. This step can take a long time. Run the task in the background.
nohup sh bin/run-ssb-queries.sh -s 1000 > run-ssb-queries.log 2>&1 &For more information about the SQL statements for batch testing, see ssb-queries.
The query performance results are saved in the run-ssb-queries.log file in the tool's installation directory. You can view this file to obtain information about the query process and the test results. For the test results on 1000 GB of data from this document, see Test results.
SSB-flat test
Run the SQL script for the SSB-flat test to batch-execute the SQL statements in the test set.
Syntax:
sh bin/run-ssb-flat-queries.sh -s <yourAimDataNum>Parameters:
yourAimDataNum: Ensures that the query runs against the correct dataset scale, which must match the scale used for data generation. For example, if you used
-s 1000to generate data, you must also use-s 1000to run queries.After the script runs, the console window displays the performance of each SQL statement from the test set in SelectDB.
This medium-scale test queries a 1000 GB (1 TB) test dataset. Because this step can take a long time, you can run the task in the background.
nohup sh bin/run-ssb-flat-queries.sh -s 1000 > run-ssb-flat-queries.log 2>&1 &For details about the SQL statements for batch testing, see ssb-flat-queries.
The query performance results and information about the query process are saved to the run-ssb-flat-queries.log file in the tool's installation directory. For the test results on 1000 GB of data, see Test results.
Test single query SQL performance
You can also test the performance of a single SQL statement in SelectDB as follows:
Connect to the SelectDB instance. For more information, see Connect to an ApsaraDB for SelectDB instance using DMS.
Run the target SQL statement.
SSB standard test
Obtain the target SQL statement from SSB standard test query statements and run it.
You can also select and run one of the SQL statements used in this test.
--Q1.1 SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE FROM lineorder, dates WHERE lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25; --Q1.2 SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE FROM lineorder, dates WHERE lo_orderdate = d_datekey AND d_yearmonth = 'Jan1994' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35; --Q1.3 SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE FROM lineorder, dates WHERE lo_orderdate = d_datekey AND d_weeknuminyear = 6 AND d_year = 1994 AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35; --Q2.1 SELECT SUM(lo_revenue), d_year, p_brand FROM lineorder, dates, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_category = 'MFGR#12' AND s_region = 'AMERICA' GROUP BY d_year, p_brand ORDER BY p_brand; --Q2.2 SELECT SUM(lo_revenue), d_year, p_brand FROM lineorder, dates, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228' AND s_region = 'ASIA' GROUP BY d_year, p_brand ORDER BY d_year, p_brand; --Q2.3 SELECT SUM(lo_revenue), d_year, p_brand FROM lineorder, dates, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_brand = 'MFGR#2239' AND s_region = 'EUROPE' GROUP BY d_year, p_brand ORDER BY d_year, p_brand; --Q3.1 SELECT c_nation, s_nation, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND c_region = 'ASIA' AND s_region = 'ASIA' AND d_year >= 1992 AND d_year <= 1997 GROUP BY c_nation, s_nation, d_year ORDER BY d_year ASC, REVENUE DESC; --Q3.2 SELECT c_city, s_city, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND d_year >= 1992 AND d_year <= 1997 GROUP BY c_city, s_city, d_year ORDER BY d_year ASC, REVENUE DESC; --Q3.3 SELECT c_city, s_city, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND ( c_city = 'UNITED KI1' OR c_city = 'UNITED KI5' ) AND ( s_city = 'UNITED KI1' OR s_city = 'UNITED KI5' ) AND d_year >= 1992 AND d_year <= 1997 GROUP BY c_city, s_city, d_year ORDER BY d_year ASC, REVENUE DESC; --Q3.4 SELECT c_city, s_city, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND ( c_city = 'UNITED KI1' OR c_city = 'UNITED KI5' ) AND ( s_city = 'UNITED KI1' OR s_city = 'UNITED KI5' ) AND d_yearmonth = 'Dec1997' GROUP BY c_city, s_city, d_year ORDER BY d_year ASC, REVENUE DESC; --Q4.1 SELECT d_year, c_nation, SUM(lo_revenue - lo_supplycost) AS PROFIT FROM dates, customer, supplier, part, lineorder WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_partkey = p_partkey AND lo_orderdate = d_datekey AND c_region = 'AMERICA' AND s_region = 'AMERICA' AND ( p_mfgr = 'MFGR#1' OR p_mfgr = 'MFGR#2' ) GROUP BY d_year, c_nation ORDER BY d_year, c_nation; --Q4.2 SELECT d_year, s_nation, p_category, SUM(lo_revenue - lo_supplycost) AS PROFIT FROM dates, customer, supplier, part, lineorder WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_partkey = p_partkey AND lo_orderdate = d_datekey AND c_region = 'AMERICA' AND s_region = 'AMERICA' AND ( d_year = 1997 OR d_year = 1998 ) AND ( p_mfgr = 'MFGR#1' OR p_mfgr = 'MFGR#2' ) GROUP BY d_year, s_nation, p_category ORDER BY d_year, s_nation, p_category; --Q4.3 SELECT d_year, s_city, p_brand, SUM(lo_revenue - lo_supplycost) AS PROFIT FROM dates, customer, supplier, part, lineorder WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_partkey = p_partkey AND lo_orderdate = d_datekey AND s_nation = 'UNITED STATES' AND ( d_year = 1997 OR d_year = 1998 ) AND p_category = 'MFGR#14' GROUP BY d_year, s_city, p_brand ORDER BY d_year, s_city, p_brand;SSB-flat test
Obtain the target SQL statement from SSB-flat test query statements and run it.
You can also select and run one of the SQL statements used in this test.
--Q1.1 SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE LO_ORDERDATE >= 19930101 AND LO_ORDERDATE <= 19931231 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25; --Q1.2 SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE LO_ORDERDATE >= 19940101 AND LO_ORDERDATE <= 19940131 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35; --Q1.3 SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE weekofyear(LO_ORDERDATE) = 6 AND LO_ORDERDATE >= 19940101 AND LO_ORDERDATE <= 19941231 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35; --Q2.1 SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY YEAR, P_BRAND ORDER BY YEAR, P_BRAND; --Q2.2 SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY YEAR, P_BRAND ORDER BY YEAR, P_BRAND; --Q2.3 SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY YEAR, P_BRAND ORDER BY YEAR, P_BRAND; --Q3.1 SELECT C_NATION, S_NATION, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND LO_ORDERDATE >= 19920101 AND LO_ORDERDATE <= 19971231 GROUP BY C_NATION, S_NATION, YEAR ORDER BY YEAR ASC, revenue DESC; --Q3.2 SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND LO_ORDERDATE >= 19920101 AND LO_ORDERDATE <= 19971231 GROUP BY C_CITY, S_CITY, YEAR ORDER BY YEAR ASC, revenue DESC; --Q3.3 SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_CITY IN ('UNITED KI1', 'UNITED KI5') AND S_CITY IN ('UNITED KI1', 'UNITED KI5') AND LO_ORDERDATE >= 19920101 AND LO_ORDERDATE <= 19971231 GROUP BY C_CITY, S_CITY, YEAR ORDER BY YEAR ASC, revenue DESC; --Q3.4 SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_CITY IN ('UNITED KI1', 'UNITED KI5') AND S_CITY IN ('UNITED KI1', 'UNITED KI5') AND LO_ORDERDATE >= 19971201 AND LO_ORDERDATE <= 19971231 GROUP BY C_CITY, S_CITY, YEAR ORDER BY YEAR ASC, revenue DESC; --Q4.1 SELECT (LO_ORDERDATE DIV 10000) AS YEAR, C_NATION, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND P_MFGR IN ('MFGR#1', 'MFGR#2') GROUP BY YEAR, C_NATION ORDER BY YEAR ASC, C_NATION ASC; --Q4.2 SELECT (LO_ORDERDATE DIV 10000) AS YEAR, S_NATION, P_CATEGORY, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND LO_ORDERDATE >= 19970101 AND LO_ORDERDATE <= 19981231 AND P_MFGR IN ('MFGR#1', 'MFGR#2') GROUP BY YEAR, S_NATION, P_CATEGORY ORDER BY YEAR ASC, S_NATION ASC, P_CATEGORY ASC; --Q4.3 SELECT (LO_ORDERDATE DIV 10000) AS YEAR, S_CITY, P_BRAND, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND LO_ORDERDATE >= 19970101 AND LO_ORDERDATE <= 19981231 AND P_CATEGORY = 'MFGR#14' GROUP BY YEAR, S_CITY, P_BRAND ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC;
Test results
The following table shows the 1000 GB query performance results of the SSB standard test and the SSB-flat test. The tests were run on a SelectDB instance with kernel version 4.1.1, 96 cores, 384 GB of memory, and 1200 GB of cluster cache space.
Query | SSB 1000 | SSB-Flat 1000 |
Q1.1 | 0.14 | 0.07 |
Q1.2 | 0.08 | 0.03 |
Q1.3 | 0.08 | 0.08 |
Q2.1 | 0.56 | 0.36 |
Q2.2 | 0.65 | 0.33 |
Q2.3 | 0.43 | 0.25 |
Q3.1 | 1.46 | 0.62 |
Q3.2 | 0.55 | 0.38 |
Q3.3 | 0.42 | 0.23 |
Q3.4 | 0.12 | 0.04 |
Q4.1 | 1.64 | 0.9 |
Q4.2 | 0.5 | 0.17 |
Q4.3 | 0.27 | 0.12 |
Total | 6.9 | 3.58 |