本文将为您介绍如何使用TPC-H(商业智能计算测试)对OLAP查询场景和Key/Value点查场景进行性能测试。
TPC-H简介
以下文字描述引用自TPC Benchmark™ H (TPC-H)规范:
TPC-H是一个决策支持基准,由一套面向业务的临时查询和并发数据修改组成。选择的查询和填充数据库的数据具有广泛的行业相关性。该基准测试说明了决策支持系统可以检查大量数据,执行高度复杂的查询,并解答关键的业务问题。
详情请参见TPCH Specification。
数据集简介
TPC-H(商业智能计算测试)是美国交易处理效能委员会(TPC,Transaction Processing Performance Council)组织制定的用来模拟决策支持类应用的一个测试集。目前在学术界和工业界普遍采用它来评价决策支持技术方面应用的性能。
TPC-H是根据真实的生产运行环境来建模,模拟了一套销售系统的数据仓库。其共包含8张表,数据量可设定从1 GB~3 TB不等。其基准测试共包含了22个查询,主要评价指标各个查询的响应时间,即从提交查询到结果返回所需时间。其测试结果可综合反映系统处理查询时的能力。详情请参见TPC-H基准。
场景说明
- OLAP查询场景测试,主要使用列存表,直接使用TPC-H测试中的22条查询语句进行测试。
- Key/Value点查场景测试,主要使用行存表,针对ORDERS使用行存表后,进行主键过滤的点查。
- 数据更新场景,主要用于测试OLAP引擎在有主键的情况下数据更新的性能。
准备工作
- 基础环境准备
操作步骤 操作说明 1.创建ECS实例 登录阿里云,创建一个ECS实例,用于数据生成、向Hologres导入数据和客户端测试。建议规格具体如下: - ecs.g6.4xlarge规格。
- CentOS 8.4系统。
- ESSD数据盘,具体数据容量根据需要测试的数据量大小决定。
- 建议ECS与Hologres实例在相同地域,使用相同的VPC网络。
2.创建Hologres实例 本次测试环境使用了独享(按量付费)的实例,由于该实例仅用于测试使用,计算资源配置选择64核256GB。您可以根据实际业务需求,选择计算资源配置。
登录阿里云,进入Hologres管理控制台,单击新增引擎实例。
更多关于创建Hologres实例的操作,请参见购买Hologres。
3.创建测试数据库 您需要登录创建的Hologres实例,创建一个数据库。本文示例中命名数据库为tpch_1sf。 更多关于创建数据库的操作,请参见新建数据库。
- 生成TPC-H数据您需要准备数据生成工具,即通过远程连接ECS实例,实现下载、编译并生成TPC-H数据。具体说明如下:
操作步骤 操作说明 1.准备数据生成工具 准备数据生成工具,您需要执行如下内容: - 远程连接ECS实例。
更多关于连接方式的说明,请参见连接方式概述。
- 连接实例后,执行
yum update
,更新所有库。 - 执行
yum install git
,安装git。 - 执行
yum install gcc
,安装gcc。 - 执行
git clone https://github.com/gregrahn/tpch-kit.git
,下载TPC-H数据生成代码。 - 执行
cd tpch-kit/dbgen
,进入数据生成工具代码目录。 - 执行
make
,编译数据生成工具代码。
2.生成数据 编译成功后,您可以执行 ./dbgen --help
查看代码生成工具的相关参数。您可以执行如下内容生成TPC-H数据用于下文的测试,同时您也可以查看生成的文件:-
本文示例仅生成1 GB数据,您可以运行如下代码生成数据。
如果您需要生成更多数据量的数据,可以调整SF的参数。例如,您可以使用如下代码生成1 TB数据。./dbgen -vf -s 1
一般情况下,32CU可以运行TPC-H SF10的数据量,256CU可以运行TPC-H SF50的数据量。./dbgen -vf -s 1000
- 生成的数据存储在ECS中,您可以在数据生成后使用如下代码查看生成的文件。可以看到生成工具生成了8个数据文件,每个数据文件都对应一张数据集中的表。
ls | grep '.*.tbl'
- 远程连接ECS实例。
OLAP查询场景测试
OLAP查询场景测试,您需要创建表并导入数据。数据导入完成后,为更好的执行查询,您可以收集各张表的特征信息,待后台Compaction(压缩)完成后,使用pgbench等工具进行查询测试。具体步骤如下:
Key/Value点查场景测试
Key/Value点查场景测试,您需要创建表并导入数据。数据导入完成后,可以生成查询语句并进行查询测试。具体步骤如下:
数据更新场景
该场景用于测试OLAP引擎在有主键情况下数据更新的性能,以及在主键冲突时更新整行数据。
- 生成查询
echo " \set O_ORDERKEY random(1,99999999) INSERT INTO public.orders_row(o_orderkey,o_custkey,o_orderstatus,o_totalprice,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment) VALUES (:O_ORDERKEY,1,'demo',1.1,'2021-01-01','demo','demo',1,'demo') on conflict(o_orderkey) do update set (o_orderkey,o_custkey,o_orderstatus,o_totalprice,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment)= ROW(excluded.*); " > /root/insert_on_conflict.sql
- 插入及更新
PGUSER=<AccessID> PGPASSWORD=<AccessKey> PGDATABASE=<Database> pgbench -h <Endpoint> -p 80 -c <Client_Num> -T <Query_Seconds> -M prepared -n -f /root/insert_on_conflict.sql
- 示例结果
transaction type: Custom query scaling factor: 1 query mode: prepared number of clients: 249 number of threads: 1 duration: 60 s number of transactions actually processed: 1923038 tps = 32005.850214 (including connections establishing) tps = 36403.145722 (excluding connections establishing)
Flink实时写入场景
该场景用于测试实时数据写入能力。
- Hologres DDL
该场景Hologres的表拥有10列,其中
key
列为主键,Hologres DDL如下。DROP TABLE IF EXISTS flink_insert; BEGIN ; CREATE TABLE IF NOT EXISTS flink_insert( key INT PRIMARY KEY ,value1 TEXT ,value2 TEXT ,value3 TEXT ,value4 TEXT ,value5 TEXT ,value6 TEXT ,value7 TEXT ,value8 TEXT ,value9 TEXT ); CALL SET_TABLE_PROPERTY('flink_insert', 'orientation', 'row'); CALL SET_TABLE_PROPERTY('flink_insert', 'clustering_key', 'key'); CALL SET_TABLE_PROPERTY('flink_insert', 'distribution_key', 'key'); COMMIT;
- Flink作业脚本
使用Flink全托管自带的随机数发生器向Hologres写入数据,当主键冲突时选择整行更新,单行数据量超过512 B,Flink作业脚本如下。
参数说明请参见Hologres结果表。CREATE TEMPORARY TABLE flink_case_1_source ( key INT, value1 VARCHAR, value2 VARCHAR, value3 VARCHAR, value4 VARCHAR, value5 VARCHAR, value6 VARCHAR, value7 VARCHAR, value8 VARCHAR, value9 VARCHAR ) WITH ( 'connector' = 'datagen', -- optional options -- 'rows-per-second' = '1000000000', 'fields.key.min'='1', 'fields.key.max'='2147483647', 'fields.value1.length' = '57', 'fields.value2.length' = '57', 'fields.value3.length' = '57', 'fields.value4.length' = '57', 'fields.value5.length' = '57', 'fields.value6.length' = '57', 'fields.value7.length' = '57', 'fields.value8.length' = '57', 'fields.value9.length' = '57' ); -- 创建 Hologres 结果表 CREATE TEMPORARY TABLE flink_case_1_sink ( key INT, value1 VARCHAR, value2 VARCHAR, value3 VARCHAR, value4 VARCHAR, value5 VARCHAR, value6 VARCHAR, value7 VARCHAR, value8 VARCHAR, value9 VARCHAR ) WITH ( 'connector' = 'hologres', 'dbname'='<yourDbname>', --Hologres的数据库名称。 'tablename'='<yourTablename>', --Hologres用于接收数据的表名称。 'username'='<yourUsername>', --当前阿里云账号的AccessKey ID。 'password'='<yourPassword>', --当前阿里云账号的AccessKey Secret。 'endpoint'='<yourEndpoint>', --当前Hologres实例VPC网络的Endpoint。 'connectionSize' = '10', --默认为3 'jdbcWriteBatchSize' = '1024', --默认为256 'jdbcWriteBatchByteSize' = '2147483647', --默认为20971520 'mutatetype'='insertorreplace' --插入或整行替换已有数据 ); -- 进行 ETL 操作并写入数据 insert into flink_case_1_sink select key, value1, value2, value3, value4, value5, value6, value7, value8, value9 from flink_case_1_source ;
- 示例结果
在Hologres的管理控制台的监控信息页面,即可看到RPS的数值。
TPC-H 22条查询语句
名称 | 查询语句 | |||
---|---|---|---|---|
TPCH 22条查询语句 | Q1 | Q2 | Q3 | Q4 |
Q5 | Q6 | Q7 | Q8 | |
Q9 | Q10 | Q11 | Q12 | |
Q13 | Q14 | Q15 | Q16 | |
Q17 | Q18 | Q19 | Q20 | |
Q21 | Q22 | - | - |
- Q1
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '120' day group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
- Q2
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 48 and p_type like '%STEEL' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' ) order by s_acctbal desc, n_name, s_name, p_partkey limit 100;
- Q3
select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = 'MACHINERY' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1995-03-23' and l_shipdate > date '1995-03-23' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10;
- Q4
select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '1996-07-01' and o_orderdate < date '1996-07-01' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
- Q5
select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and o_orderdate >= date '1996-01-01' and o_orderdate < date '1996-01-01' + interval '1' year group by n_name order by revenue desc;
- Q6
select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date '1996-01-01' and l_shipdate < date '1996-01-01' + interval '1' year and l_discount between 0.02 - 0.01 and 0.02 + 0.01 and l_quantity < 24;
- Q7
select supp_nation, cust_nation, l_year, sum(volume) as revenue from ( select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = 'CANADA' and n2.n_name = 'BRAZIL') or (n1.n_name = 'BRAZIL' and n2.n_name = 'CANADA') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year;
- Q8
select o_year, sum(case when nation = 'BRAZIL' then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) as volume, n2.n_name as nation from part, supplier, lineitem, orders, customer, nation n1, nation n2, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'LARGE ANODIZED COPPER' ) as all_nations group by o_year order by o_year;
- Q9
select nation, o_year, sum(amount) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%maroon%' ) as profit group by nation, o_year order by nation, o_year desc;
- Q10
select c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '1993-02-01' and o_orderdate < date '1993-02-01' + interval '3' month and l_returnflag = 'R' and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue desc limit 20;
- Q11
select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'EGYPT' group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.0001000000 from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'EGYPT' ) order by value desc;
- Q12
select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from orders, lineitem where o_orderkey = l_orderkey and l_shipmode in ('FOB', 'AIR') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '1997-01-01' and l_receiptdate < date '1997-01-01' + interval '1' year group by l_shipmode order by l_shipmode;
- Q13
select c_count, count(*) as custdist from ( select c_custkey, count(o_orderkey) as c_count from customer left outer join orders on c_custkey = o_custkey and o_comment not like '%special%deposits%' group by c_custkey ) c_orders group by c_count order by custdist desc, c_count desc;
- Q14
select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from lineitem, part where l_partkey = p_partkey and l_shipdate >= date '1997-06-01' and l_shipdate < date '1997-06-01' + interval '1' month;
- Q15
with revenue0(SUPPLIER_NO, TOTAL_REVENUE) as ( select l_suppkey, sum(l_extendedprice * (1 - l_discount)) from lineitem where l_shipdate >= date '1995-02-01' and l_shipdate < date '1995-02-01' + interval '3' month group by l_suppkey ) select s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, revenue0 where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue0 ) order by s_suppkey;
- Q16
select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type not like 'SMALL ANODIZED%' and p_size in (47, 15, 37, 30, 46, 16, 18, 6) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size;
- Q17
select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#51' and p_container = 'WRAP PACK' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey );
- Q18
select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 312 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate limit 100;
- Q19
select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#52' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 3 and l_quantity <= 3 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#43' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 12 and l_quantity <= 12 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#52' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 21 and l_quantity <= 21 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' );
- Q20
select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like 'drab%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date '1996-01-01' and l_shipdate < date '1996-01-01' + interval '1' year ) ) and s_nationkey = n_nationkey and n_name = 'KENYA' order by s_name;
- Q21
select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'PERU' group by s_name order by numwait desc, s_name limit 100;
- Q22
select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( select substring(c_phone from 1 for 2) as cntrycode, c_acctbal from customer where substring(c_phone from 1 for 2) in ('24', '32', '17', '18', '12', '14', '22') and c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00 and substring(c_phone from 1 for 2) in ('24', '32', '17', '18', '12', '14', '22') ) and not exists ( select * from orders where o_custkey = c_custkey ) ) as custsale group by cntrycode order by cntrycode;