本文将为您介绍如何使用TPC-H(商业智能计算测试)对OLAP查询场景和Key/Value点查场景进行性能测试。
TPC-H简介
以下文字描述引用自TPC Benchmark™ H (TPC-H)规范:
TPC-H是一个决策支持基准,由一套面向业务的临时查询和并发数据修改组成。选择的查询和填充数据库的数据具有广泛的行业相关性。该基准测试说明了决策支持系统可以检查大量数据,执行高度复杂的查询,并解答关键的业务问题。
详情请参见TPCH Specification。
本文的TPC-H的实现基于TPC-H的基准测试,并不能与已发布的TPC-H基准测试结果相比较,本文中的测试并不符合TPC-H基准测试的所有要求。
数据集简介
TPC-H(商业智能计算测试)是美国交易处理效能委员会(TPC,Transaction Processing Performance Council)组织制定的用来模拟决策支持类应用的一个测试集。目前在学术界和工业界普遍采用它来评价决策支持技术方面应用的性能。
TPC-H是根据真实的生产运行环境来建模,模拟了一套销售系统的数据仓库。其共包含8张表,数据量可设定从1 GB~3 TB不等。其基准测试共包含了22个查询,主要评价指标各个查询的响应时间,即从提交查询到结果返回所需时间。其测试结果可综合反映系统处理查询时的能力。详情请参见TPC-H基准。
场景说明
本测试场景主要包含如下内容:
OLAP查询场景测试,主要使用列存表,直接使用TPC-H测试中的22条查询语句进行测试。
Key/Value点查场景测试,主要使用行存表,针对ORDERS使用行存表后,进行主键过滤的点查。
数据更新场景,主要用于测试OLAP引擎在有主键的情况下数据更新的性能。
测试数据量会直接影响测试结果,TPC-H的生成工具中使用SF(scale factor)控制生成数据量的大小,1SF对应1GB。
以上的数据量仅针对原始数据的数据量,不包括索引等空间占用,因此在准备环境时,您需要预留更多的空间。
准备工作
您需要准备OLAP查询场景和Key/Value点查场景所需的基础环境和TPC-H数据。具体内容如下:
为了减少可能对测试结果有影响的变量,建议每次使用新创建的实例进行测试,不要使用升降配的实例。
基础环境准备
操作步骤
操作说明
1.创建ECS实例
登录阿里云,创建一个ECS实例,用于数据生成、向Hologres导入数据和客户端测试。建议规格具体如下:
ecs.g6.4xlarge规格。
CentOS 8.4系统。
ESSD数据盘,具体数据容量根据需要测试的数据量大小决定。
建议ECS与Hologres实例在相同地域,使用相同的VPC网络。
更多关于创建ECS的操作,请参见创建实例。
2.创建Hologres实例
本次测试环境使用了独享(按量付费)的实例,由于该实例仅用于测试使用,计算资源配置选择64核256GB。您可以根据实际业务需求,选择计算资源配置。
登录阿里云,进入Hologres管理控制台,在实例列表页面,单击新增引擎实例。
更多关于创建Hologres实例的操作,请参见购买Hologres。
3.创建测试数据库
您需要登录创建的Hologres实例,创建一个数据库。本文示例中命名数据库为tpch_1sf。
更多关于创建数据库的操作,请参见新建数据库。
生成TPC-H数据
您需要准备数据生成工具,即通过远程连接ECS实例,实现下载、编译并生成TPC-H数据。具体说明如下:
操作步骤
操作说明
1.准备数据生成工具
准备数据生成工具,您需要执行如下内容:
远程连接ECS实例。
更多关于连接方式的说明,请参见连接方式概述。
连接实例后,执行
yum update
,更新所有库。执行
yum install git
,安装git。执行
yum install gcc
,安装gcc。执行
git clone https://github.com/gregrahn/tpch-kit.git
,下载TPC-H数据生成代码。执行
cd tpch-kit/dbgen
,进入数据生成工具代码目录。执行
make
,编译数据生成工具代码。
2.生成数据
编译成功后,您可以执行
./dbgen --help
查看代码生成工具的相关参数。您可以执行如下内容生成TPC-H数据用于下文的测试,同时您也可以查看生成的文件:本文示例仅生成1 GB数据,您可以运行如下代码生成数据。
./dbgen -vf -s 1
如果您需要生成更多数据量的数据,可以调整SF的参数。例如,您可以使用如下代码生成1 TB数据。
./dbgen -vf -s 1000
一般情况下,32CU可以运行TPC-H SF10的数据量,256CU可以运行TPC-H SF50的数据量。
生成的数据存储在ECS中,您可以在数据生成后使用如下代码查看生成的文件。可以看到生成工具生成了8个数据文件,每个数据文件都对应一张数据集中的表。
ls | grep '.*.tbl'
OLAP查询场景测试
OLAP查询场景测试,您需要创建表并导入数据。数据导入完成后,为更好的执行查询,您可以收集各张表的特征信息,待后台Compaction(压缩)完成后,使用pgbench等工具进行查询测试。具体步骤如下:
创建表
本文内容主要基于psql进行数据导入操作,您需要先在ECS中运行如下命令安装psql。
yum install postgresql-server -y
安装psql后,您可以使用如下命令登录Hologres实例。更多关于参数的说明,请参见连接psql参数说明。
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database>
使用psql连接Hologres后,您可以使用如下建表语句创建数据库表。
DROP TABLE IF EXISTS LINEITEM; BEGIN; CREATE TABLE LINEITEM ( L_ORDERKEY bigint NOT NULL, L_PARTKEY int NOT NULL, L_SUPPKEY int NOT NULL, L_LINENUMBER int NOT NULL, L_QUANTITY DECIMAL(15, 2) NOT NULL, L_EXTENDEDPRICE DECIMAL(15, 2) NOT NULL, L_DISCOUNT DECIMAL(15, 2) NOT NULL, L_TAX DECIMAL(15, 2) NOT NULL, L_RETURNFLAG text NOT NULL, L_LINESTATUS text NOT NULL, L_SHIPDATE date NOT NULL, L_COMMITDATE date NOT NULL, L_RECEIPTDATE date NOT NULL, L_SHIPINSTRUCT text NOT NULL, L_SHIPMODE text NOT NULL, L_COMMENT text NOT NULL, PRIMARY KEY (L_ORDERKEY, L_LINENUMBER) ); CALL set_table_property ('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY'); CALL set_table_property ('LINEITEM', 'segment_key', 'L_SHIPDATE'); CALL set_table_property ('LINEITEM', 'distribution_key', 'L_ORDERKEY'); CALL set_table_property ('LINEITEM', 'bitmap_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE'); CALL set_table_property ('LINEITEM', 'dictionary_encoding_columns', 'l_comment:off,l_returnflag,l_linestatus,l_shipinstruct,l_shipmode'); COMMIT; DROP TABLE IF EXISTS ORDERS; BEGIN; CREATE TABLE ORDERS ( O_ORDERKEY bigint NOT NULL PRIMARY KEY, O_CUSTKEY int NOT NULL, O_ORDERSTATUS text NOT NULL, O_TOTALPRICE DECIMAL(15, 2) NOT NULL, O_ORDERDATE date NOT NULL, O_ORDERPRIORITY text NOT NULL, O_CLERK text NOT NULL, O_SHIPPRIORITY int NOT NULL, O_COMMENT text NOT NULL ); CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE'); CALL set_table_property ('ORDERS', 'colocate_with', 'lineitem'); CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY'); CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY'); CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'o_comment:off,o_orderstatus,o_orderpriority,o_clerk'); COMMIT; DROP TABLE IF EXISTS PARTSUPP; BEGIN; CREATE TABLE PARTSUPP ( PS_PARTKEY int NOT NULL, PS_SUPPKEY int NOT NULL, PS_AVAILQTY int NOT NULL, PS_SUPPLYCOST DECIMAL(15, 2) NOT NULL, PS_COMMENT text NOT NULL, PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY) ); CALL set_table_property ('PARTSUPP', 'distribution_key', 'PS_PARTKEY'); CALL set_table_property ('PARTSUPP', 'colocate_with', 'lineitem'); CALL set_table_property ('PARTSUPP', 'bitmap_columns', 'ps_availqty'); CALL set_table_property ('PARTSUPP', 'dictionary_encoding_columns', ''); COMMIT; DROP TABLE IF EXISTS PART; BEGIN; CREATE TABLE PART ( P_PARTKEY int NOT NULL PRIMARY KEY, P_NAME text NOT NULL, P_MFGR text NOT NULL, P_BRAND text NOT NULL, P_TYPE text NOT NULL, P_SIZE int NOT NULL, P_CONTAINER text NOT NULL, P_RETAILPRICE DECIMAL(15, 2) NOT NULL, P_COMMENT text NOT NULL ); CALL set_table_property ('PART', 'distribution_key', 'P_PARTKEY'); CALL set_table_property ('PART', 'colocate_with', 'lineitem'); CALL set_table_property ('PART', 'bitmap_columns', 'P_SIZE,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER'); CALL set_table_property ('PART', 'dictionary_encoding_columns', 'p_name:off,p_comment:off,p_mfgr,p_brand,p_type,p_container'); COMMIT; DROP TABLE IF EXISTS CUSTOMER; BEGIN; CREATE TABLE CUSTOMER ( C_CUSTKEY int NOT NULL PRIMARY KEY, C_NAME text NOT NULL, C_ADDRESS text NOT NULL, C_NATIONKEY int NOT NULL, C_PHONE text NOT NULL, C_ACCTBAL DECIMAL(15, 2) NOT NULL, C_MKTSEGMENT text NOT NULL, C_COMMENT text NOT NULL ); CALL set_table_property ('CUSTOMER', 'distribution_key', 'C_CUSTKEY'); CALL set_table_property ('CUSTOMER', 'colocate_with', 'lineitem'); CALL set_table_property ('CUSTOMER', 'bitmap_columns', 'C_NATIONKEY,C_MKTSEGMENT'); CALL set_table_property ('CUSTOMER', 'dictionary_encoding_columns', 'c_name:off,c_address:off,c_phone:off,c_comment:off,c_mktsegment'); COMMIT; DROP TABLE IF EXISTS SUPPLIER; BEGIN; CREATE TABLE SUPPLIER ( S_SUPPKEY int NOT NULL PRIMARY KEY, S_NAME text NOT NULL, S_ADDRESS text NOT NULL, S_NATIONKEY int NOT NULL, S_PHONE text NOT NULL, S_ACCTBAL DECIMAL(15, 2) NOT NULL, S_COMMENT text NOT NULL ); CALL set_table_property ('SUPPLIER', 'distribution_key', 'S_SUPPKEY'); CALL set_table_property ('SUPPLIER', 'colocate_with', 'lineitem'); CALL set_table_property ('SUPPLIER', 'bitmap_columns', 'S_NATIONKEY'); CALL set_table_property ('SUPPLIER', 'dictionary_encoding_columns', ''); COMMIT; DROP TABLE IF EXISTS NATION; BEGIN; CREATE TABLE NATION ( N_NATIONKEY int NOT NULL PRIMARY KEY, N_NAME text NOT NULL, N_REGIONKEY int NOT NULL, N_COMMENT text NOT NULL ); CALL set_table_property ('NATION', 'distribution_key', 'N_NATIONKEY'); CALL set_table_property ('NATION', 'colocate_with', 'lineitem'); CALL set_table_property ('NATION', 'bitmap_columns', ''); CALL set_table_property ('NATION', 'dictionary_encoding_columns', ''); COMMIT; DROP TABLE IF EXISTS REGION; BEGIN; CREATE TABLE REGION ( R_REGIONKEY int NOT NULL PRIMARY KEY, R_NAME text NOT NULL, R_COMMENT text ); CALL set_table_property ('REGION', 'distribution_key', 'R_REGIONKEY'); CALL set_table_property ('REGION', 'colocate_with', 'lineitem'); CALL set_table_property ('REGION', 'bitmap_columns', ''); CALL set_table_property ('REGION', 'dictionary_encoding_columns', ''); COMMIT;
数据表创建完毕后,您可以在psql中执行
tpch_1sf=# \dt
查看是否创建成功。tpch_1sf=# \dt List of relations Schema | Name | Type | Owner --------+----------+-------+-------------------- public | customer | table | tpch_1sf_developer public | lineitem | table | tpch_1sf_developer public | nation | table | tpch_1sf_developer public | orders | table | tpch_1sf_developer public | part | table | tpch_1sf_developer public | partsupp | table | tpch_1sf_developer public | region | table | tpch_1sf_developer public | supplier | table | tpch_1sf_developer (8 rows)
导入数据
您可以使用如下方式导入数据:
本文主要使用
COPY FROM STDIN
的方式导入数据。更多关于COPY FROM STDIN
的详细操作,请参见使用COPY命令导入或导出本地数据。此处会将tbl数据文件导入Hologres创建的表中,tbl为准备工作中生成的TPC-H数据。您也可以在数据生成工具的目录中,使用如下Shell脚本导入数据。更多关于参数的说明,请参见参数说明。
for i in `ls *.tbl`; do echo $i; name=`echo $i| cut -d'.' -f1`; PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY $name from stdin with delimiter '|' csv;" < $i; done
收集统计信息
为了更好的执行查询,您可以在psql中使用如下语句,收集各张表的特征信息,建议等后台Compaction完成后(CPU消耗下降并平稳)再执行查询。
--收集表的统计信息 analyze nation; analyze region; analyze lineitem; analyze orders; analyze customer; analyze part; analyze partsupp; analyze supplier; --针对非主键的JOIN KEY收集统计信息 analyze lineitem (l_orderkey,l_partkey,l_suppkey); analyze orders (o_custkey); analyze partsupp(ps_partkey,ps_suppkey);
执行查询
为了方便统计查询信息,您需要使用pgbench工具。您可以使用如下命令安装pgbench工具。
yum install postgresql-contrib -y
为了避免因工具兼容性问题影响测试,请您安装版本为13及以上的pgbench工具。如果您本地已经安装pgbench工具,请确保其版本为9.6以上。您可以通过执行如下命令查看当前工具版本。
pgbench --version
为了方便查询,您可以直接单击下载TPCH 22条查询语句,并将下载的语句上传至ECS。更多关于上传数据的操作,请参见上传本地文件到ECS实例。
进入ECS,并访问上传文件的目录,使用如下Shell命令解压缩文件。
unzip tpch_query_updated_20210721.zip
您可以使用如下方式执行TPC-H的查询语句。查询语句具体内容,请参见TPC-H 22条查询语句。
使用pgbench工具,执行单条查询。
TPC-H共有22条语句,例如您需要执行Q6语句,则具体语句为
6.sql
PGUSER=<AccessID> PGPASSWORD=<AccessKey> PGDATABASE=<Database> pgbench -h <Endpoint> -p <Port> -c <Client_Num> -T <Query_Seconds> -n -f xxx.sql
使用Shell脚本,批量执行22条查询,并将结果输出到文件hologres_tpch_test.out中。
rm -f hologres_tpch_test.out echo `date +"%Y-%m-%d %H:%M:%S"` begin >> ./hologres_tpch_test.out for i in {1..22} do PGUSER=<AccessID> PGPASSWORD=<AccessKey> PGDATABASE=<Database> pgbench -h <Endpoint> -p <Port> -c <Client_Num> -T <Query_Seconds> -n -f ./tpch_query_updated_20210721/q${i}.sql >> ./hologres_tpch_test.out done
具体参数说明如下所示:
参数
说明
PGUSER
当前阿里云账号的AccessKey ID。
您可以单击AccessKey 管理,获取AccessKey ID。
PGPASSWORD
当前阿里云账号的AccessKey Secret。
您可以单击AccessKey 管理,获取AccessKey Secret。
-h
Hologres实例的网络地址(Endpoint)。
您可以进入Hologres管理控制台的实例详情页,从网络信息区域获取网络地址。
-p
Hologres实例的端口(Port)。
您可以进入Hologres管理控制台的实例详情页,从网络信息区域获取实例端口。
-d
Hologres指定实例中的数据库名(Database)。
-c
客户端数目,即并发度(Client_Num)。
本示例取值为1,由于该测试仅测试查询性能,不测试并发,将并发度置为1即可。
-T
每个客户端需要执行的每个Query的压测时长(Query_Seconds,秒)。例如本示例取值为300。
-f
压力测试的SQL语句。
当您执行单条查询时,执行语句为
xxx.sql
TPC-H共有22条语句,如您需要执行Q6语句,则具体语句为6.sql
当您执行批量查询时,批量脚本中该参数的取值无需修改。
TPC-H 22条查询语句具体内容,请参见TPC-H 22条查询语句。
您可以通过查看hologres_tpch_test.out得到查询结果。其中,
transaction type
表示执行具体的SQL文件。latency average
表示对应SQL文件的3次查询的平均时间。2021-03-23 03:50:54 begin pghost: hgpostcn-cn-oew21c935002-cn-hangzhou.hologres.aliyuncs.com pgport: 80 nclients: 1 nxacts: 3 dbName: tpch_100 transaction type: ./tpch_data_tpch_query/1.sql scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 number of transactions per client: 3 number of transactions actually processed: 3/3 latency average = 76.936 ms tps = 12.997850 (including connections establishing) tps = 15.972757 (excluding connections establishing) ...
Key/Value点查场景测试
Key/Value点查场景测试,您需要创建表并导入数据。数据导入完成后,可以生成查询语句并进行查询测试。具体步骤如下:
创建表
Key/Value点查场景测试继续使用OLAP查询场景创建的数据库,使用TPCH数据集中的ORDERS表进行测试。您在使用psql连接Hologres后即可运行如下命令创建表。psql连接Hologres操作,请参见连接psql参数说明。
说明由于点查场景需要使用行存表,所以不能使用OLAP查询场景中使用的表,需要新创建一张表。
DROP TABLE IF EXISTS public.orders_row; BEGIN; CREATE TABLE public.orders_row( O_ORDERKEY BIGINT NOT NULL PRIMARY KEY ,O_CUSTKEY INT NOT NULL ,O_ORDERSTATUS TEXT NOT NULL ,O_TOTALPRICE DECIMAL(15,2) NOT NULL ,O_ORDERDATE TIMESTAMPTZ NOT NULL ,O_ORDERPRIORITY TEXT NOT NULL ,O_CLERK TEXT NOT NULL ,O_SHIPPRIORITY INT NOT NULL ,O_COMMENT TEXT NOT NULL ); CALL SET_TABLE_PROPERTY('public.orders_row', 'orientation', 'row'); CALL SET_TABLE_PROPERTY('public.orders_row', 'clustering_key', 'o_orderkey'); CALL SET_TABLE_PROPERTY('public.orders_row', 'distribution_key', 'o_orderkey'); COMMIT;
导入数据
您可以使用如下方式导入数据:
COPY方式导入数据
本文主要使用
COPY FROM STDIN
的方式导入数据。更多关于COPY FROM STDIN
的详细操作,请参见使用COPY命令导入或导出本地数据。此处会将tbl数据文件导入Hologres创建的表中,tbl为准备工作中生成的TPC-H数据。您也可以在数据生成工具的目录中,使用如下Shell脚本导入数据。更多关于参数的说明,请参见参数说明。
PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY public.orders_row from stdin with delimiter '|' csv;" < orders.tbl
INSERT INTO方式导入数据
在OLAP场景测试时已经导入了ORDERS表的数据,您可以运行如下SQL语句导入数据。
INSERT INTO public.orders_row SELECT * FROM public.orders;
执行查询
生成查询语句。
Key/Value点查场景主要有两种查询场景,具体查询语句如下:
查询方式
查询语句
说明
单值筛选
SELECT column_a ,column_b ,... ,column_x FROM table_x WHERE pk = value_x ;
此查询语句主要用于单值筛选,即
WHERE
的SQL语句取值唯一。多值筛选
SELECT column_a ,column_b ,... ,column_x FROM table_x WHERE pk IN ( value_a, value_b,..., value_x ) ;
此查询语句主要用于多值筛选,即
WHERE
的SQL语句可以取多个值。您可以使用如下脚本生成所需的SQL语句,该脚本会生成2条SQL语句。其中
kv_query_single.sql
表示单值筛选的SQL。kv_query_in.sql
表示多值筛选的SQL,该脚本会随机生成一个针对10个值筛选的SQL。rm -rf kv_query mkdir kv_query cd kv_query echo " \set column_values random(1,99999999) select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey =:column_values; " >> kv_query_single.sql echo " \set column_values1 random(1,99999999) \set column_values2 random(1,99999999) \set column_values3 random(1,99999999) \set column_values4 random(1,99999999) \set column_values5 random(1,99999999) \set column_values6 random(1,99999999) \set column_values7 random(1,99999999) \set column_values8 random(1,99999999) \set column_values9 random(1,99999999) select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey in(:column_values1,:column_values2,:column_values3,:column_values4,:column_values5,:column_values6,:column_values7,:column_values8,:column_values9); " >> kv_query_in.sql
为了方便统计查询信息,您需要使用pgbench工具。您可以使用如下命令安装pgbench工具。
yum install postgresql-contrib -y
为了避免因工具兼容性问题影响测试,请您安装版本为13及以上的pgbench工具。如果您本地已经安装pgbench工具,请确保其版本为9.6以上。您可以通过执行如下命令查看当前工具版本。
pgbench --version
执行测试语句。
针对单值筛选的场景,使用pgbench工具进行压测。您需要在生成SQL的目录执行如下命令,更多参数说明,请参见参数说明。
PGUSER=<AccessID> PGPASSWORD=<AccessKey> PGDATABASE=<Database> pgbench -h <Endpoint> -p <Port> -c <Client_Num> -T <Query_Seconds> -M prepared -n -f kv_query_single.sql
针对多值筛选的场景,使用pgbench工具进行压测。您需要在生成SQL的目录执行如下命令,更多参数说明,请参见参数说明。
PGUSER=<AccessID> PGPASSWORD=<AccessKey> PGDATABASE=<Database> pgbench -h <Endpoint> -p <Port> -c <Client_Num> -T <Query_Seconds> -M prepared -n -f kv_query_in.sql
数据更新场景
该场景用于测试OLAP引擎在有主键情况下数据更新的性能,以及在主键冲突时更新整行数据。
生成查询
echo " \set O_ORDERKEY random(1,99999999) INSERT INTO public.orders_row(o_orderkey,o_custkey,o_orderstatus,o_totalprice,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment) VALUES (:O_ORDERKEY,1,'demo',1.1,'2021-01-01','demo','demo',1,'demo') on conflict(o_orderkey) do update set (o_orderkey,o_custkey,o_orderstatus,o_totalprice,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment)= ROW(excluded.*); " > /root/insert_on_conflict.sql
插入及更新
PGUSER=<AccessID> PGPASSWORD=<AccessKey> PGDATABASE=<Database> pgbench -h <Endpoint> -p 80 -c <Client_Num> -T <Query_Seconds> -M prepared -n -f /root/insert_on_conflict.sql
示例结果
transaction type: Custom query scaling factor: 1 query mode: prepared number of clients: 249 number of threads: 1 duration: 60 s number of transactions actually processed: 1923038 tps = 32005.850214 (including connections establishing) tps = 36403.145722 (excluding connections establishing)
Flink实时写入场景
该场景用于测试实时数据写入能力。
Hologres DDL
该场景Hologres的表拥有10列,其中
key
列为主键,Hologres DDL如下。DROP TABLE IF EXISTS flink_insert; BEGIN ; CREATE TABLE IF NOT EXISTS flink_insert( key INT PRIMARY KEY ,value1 TEXT ,value2 TEXT ,value3 TEXT ,value4 TEXT ,value5 TEXT ,value6 TEXT ,value7 TEXT ,value8 TEXT ,value9 TEXT ); CALL SET_TABLE_PROPERTY('flink_insert', 'orientation', 'row'); CALL SET_TABLE_PROPERTY('flink_insert', 'clustering_key', 'key'); CALL SET_TABLE_PROPERTY('flink_insert', 'distribution_key', 'key'); COMMIT;
Flink作业脚本
使用Flink全托管自带的随机数发生器向Hologres写入数据,当主键冲突时选择整行更新,单行数据量超过512 B,Flink作业脚本如下。
CREATE TEMPORARY TABLE flink_case_1_source ( key INT, value1 VARCHAR, value2 VARCHAR, value3 VARCHAR, value4 VARCHAR, value5 VARCHAR, value6 VARCHAR, value7 VARCHAR, value8 VARCHAR, value9 VARCHAR ) WITH ( 'connector' = 'datagen', -- optional options -- 'rows-per-second' = '1000000000', 'fields.key.min'='1', 'fields.key.max'='2147483647', 'fields.value1.length' = '57', 'fields.value2.length' = '57', 'fields.value3.length' = '57', 'fields.value4.length' = '57', 'fields.value5.length' = '57', 'fields.value6.length' = '57', 'fields.value7.length' = '57', 'fields.value8.length' = '57', 'fields.value9.length' = '57' ); -- 创建 Hologres 结果表 CREATE TEMPORARY TABLE flink_case_1_sink ( key INT, value1 VARCHAR, value2 VARCHAR, value3 VARCHAR, value4 VARCHAR, value5 VARCHAR, value6 VARCHAR, value7 VARCHAR, value8 VARCHAR, value9 VARCHAR ) WITH ( 'connector' = 'hologres', 'dbname'='<yourDbname>', --Hologres的数据库名称。 'tablename'='<yourTablename>', --Hologres用于接收数据的表名称。 'username'='<yourUsername>', --当前阿里云账号的AccessKey ID。 'password'='<yourPassword>', --当前阿里云账号的AccessKey Secret。 'endpoint'='<yourEndpoint>', --当前Hologres实例VPC网络的Endpoint。 'connectionSize' = '10', --默认为3 'jdbcWriteBatchSize' = '1024', --默认为256 'jdbcWriteBatchByteSize' = '2147483647', --默认为20971520 'mutatetype'='insertorreplace' --插入或整行替换已有数据 ); -- 进行 ETL 操作并写入数据 insert into flink_case_1_sink select key, value1, value2, value3, value4, value5, value6, value7, value8, value9 from flink_case_1_source ;
参数说明请参见Hologres结果表。
示例结果
在Hologres的管理控制台的监控信息页面,即可看到RPS的数值。
TPC-H 22条查询语句
TPCH 22条查询语句如下所示,您可以单击表格中的链接进行查看。
名称 | 查询语句 | |||
TPCH 22条查询语句 | ||||
- | - |
Q1
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '120' day group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
Q2
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 48 and p_type like '%STEEL' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' ) order by s_acctbal desc, n_name, s_name, p_partkey limit 100;
Q3
select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = 'MACHINERY' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1995-03-23' and l_shipdate > date '1995-03-23' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10;
Q4
select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '1996-07-01' and o_orderdate < date '1996-07-01' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
Q5
select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and o_orderdate >= date '1996-01-01' and o_orderdate < date '1996-01-01' + interval '1' year group by n_name order by revenue desc;
Q6
select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date '1996-01-01' and l_shipdate < date '1996-01-01' + interval '1' year and l_discount between 0.02 - 0.01 and 0.02 + 0.01 and l_quantity < 24;
Q7
select supp_nation, cust_nation, l_year, sum(volume) as revenue from ( select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = 'CANADA' and n2.n_name = 'BRAZIL') or (n1.n_name = 'BRAZIL' and n2.n_name = 'CANADA') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year;
Q8
select o_year, sum(case when nation = 'BRAZIL' then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) as volume, n2.n_name as nation from part, supplier, lineitem, orders, customer, nation n1, nation n2, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'LARGE ANODIZED COPPER' ) as all_nations group by o_year order by o_year;
Q9
select nation, o_year, sum(amount) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%maroon%' ) as profit group by nation, o_year order by nation, o_year desc;
Q10
select c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '1993-02-01' and o_orderdate < date '1993-02-01' + interval '3' month and l_returnflag = 'R' and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue desc limit 20;
Q11
select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'EGYPT' group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.0001000000 from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'EGYPT' ) order by value desc;
Q12
select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from orders, lineitem where o_orderkey = l_orderkey and l_shipmode in ('FOB', 'AIR') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '1997-01-01' and l_receiptdate < date '1997-01-01' + interval '1' year group by l_shipmode order by l_shipmode;
Q13
select c_count, count(*) as custdist from ( select c_custkey, count(o_orderkey) as c_count from customer left outer join orders on c_custkey = o_custkey and o_comment not like '%special%deposits%' group by c_custkey ) c_orders group by c_count order by custdist desc, c_count desc;
Q14
select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from lineitem, part where l_partkey = p_partkey and l_shipdate >= date '1997-06-01' and l_shipdate < date '1997-06-01' + interval '1' month;
Q15
with revenue0(SUPPLIER_NO, TOTAL_REVENUE) as ( select l_suppkey, sum(l_extendedprice * (1 - l_discount)) from lineitem where l_shipdate >= date '1995-02-01' and l_shipdate < date '1995-02-01' + interval '3' month group by l_suppkey ) select s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, revenue0 where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue0 ) order by s_suppkey;
Q16
select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type not like 'SMALL ANODIZED%' and p_size in (47, 15, 37, 30, 46, 16, 18, 6) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size;
Q17
select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#51' and p_container = 'WRAP PACK' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey );
Q18
select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 312 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate limit 100;
Q19
select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#52' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 3 and l_quantity <= 3 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#43' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 12 and l_quantity <= 12 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#52' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 21 and l_quantity <= 21 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' );
Q20
select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like 'drab%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date '1996-01-01' and l_shipdate < date '1996-01-01' + interval '1' year ) ) and s_nationkey = n_nationkey and n_name = 'KENYA' order by s_name;
Q21
select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'PERU' group by s_name order by numwait desc, s_name limit 100;
Q22
select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( select substring(c_phone from 1 for 2) as cntrycode, c_acctbal from customer where substring(c_phone from 1 for 2) in ('24', '32', '17', '18', '12', '14', '22') and c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00 and substring(c_phone from 1 for 2) in ('24', '32', '17', '18', '12', '14', '22') ) and not exists ( select * from orders where o_custkey = c_custkey ) ) as custsale group by cntrycode order by cntrycode;