全部产品
Search
文档中心

实时数仓Hologres:测试方案介绍

更新时间:Feb 02, 2024

本文将为您介绍如何使用TPC-H(商业智能计算测试)对OLAP查询场景和Key/Value点查场景进行性能测试。

TPC-H简介

以下文字描述引用自TPC Benchmark™ H (TPC-H)规范:

TPC-H是一个决策支持基准,由一套面向业务的临时查询和并发数据修改组成。选择的查询和填充数据库的数据具有广泛的行业相关性。该基准测试说明了决策支持系统可以检查大量数据,执行高度复杂的查询,并解答关键的业务问题。

详情请参见TPCH Specification

说明

本文的TPC-H的实现基于TPC-H的基准测试,并不能与已发布的TPC-H基准测试结果相比较,本文中的测试并不符合TPC-H基准测试的所有要求。

数据集简介

TPC-H(商业智能计算测试)是美国交易处理效能委员会(TPC,Transaction Processing Performance Council)组织制定的用来模拟决策支持类应用的一个测试集。目前在学术界和工业界普遍采用它来评价决策支持技术方面应用的性能。

TPC-H是根据真实的生产运行环境来建模,模拟了一套销售系统的数据仓库。其共包含8张表,数据量可设定从1 GB~3 TB不等。其基准测试共包含了22个查询,主要评价指标各个查询的响应时间,即从提交查询到结果返回所需时间。其测试结果可综合反映系统处理查询时的能力。详情请参见TPC-H基准

场景说明

本测试场景主要包含如下内容:

测试数据量会直接影响测试结果,TPC-H的生成工具中使用SF(scale factor)控制生成数据量的大小,1SF对应1GB。

说明

以上的数据量仅针对原始数据的数据量,不包括索引等空间占用,因此在准备环境时,您需要预留更多的空间。

准备工作

您需要准备OLAP查询场景和Key/Value点查场景所需的基础环境和TPC-H数据。具体内容如下:

说明

为了减少可能对测试结果有影响的变量,建议每次使用新创建的实例进行测试,不要使用升降配的实例。

  • 基础环境准备

    操作步骤

    操作说明

    1.创建ECS实例

    登录阿里云,创建一个ECS实例,用于数据生成、向Hologres导入数据和客户端测试。建议规格具体如下:

    • ecs.g6.4xlarge规格。

    • CentOS 8.4系统。

    • ESSD数据盘,具体数据容量根据需要测试的数据量大小决定。

    • 建议ECS与Hologres实例在相同地域,使用相同的VPC网络。

    更多关于创建ECS的操作,请参见创建实例

    2.创建Hologres实例

    本次测试环境使用了独享(按量付费)的实例,由于该实例仅用于测试使用,计算资源配置选择64核256GB。您可以根据实际业务需求,选择计算资源配置。

    登录阿里云,进入Hologres管理控制台,在实例列表页面,单击新增引擎实例

    更多关于创建Hologres实例的操作,请参见购买Hologres

    3.创建测试数据库

    您需要登录创建的Hologres实例,创建一个数据库。本文示例中命名数据库为tpch_1sf

    更多关于创建数据库的操作,请参见新建数据库

  • 生成TPC-H数据

    您需要准备数据生成工具,即通过远程连接ECS实例,实现下载、编译并生成TPC-H数据。具体说明如下:

    操作步骤

    操作说明

    1.准备数据生成工具

    准备数据生成工具,您需要执行如下内容:

    1. 远程连接ECS实例。

      更多关于连接方式的说明,请参见连接方式概述

    2. 连接实例后,执行yum update,更新所有库。

    3. 执行yum install git,安装git。

    4. 执行yum install gcc,安装gcc。

    5. 执行git clone https://github.com/gregrahn/tpch-kit.git,下载TPC-H数据生成代码。

    6. 执行cd tpch-kit/dbgen,进入数据生成工具代码目录。

    7. 执行make,编译数据生成工具代码。

    2.生成数据

    编译成功后,您可以执行./dbgen --help查看代码生成工具的相关参数。您可以执行如下内容生成TPC-H数据用于下文的测试,同时您也可以查看生成的文件:

    • 本文示例仅生成1 GB数据,您可以运行如下代码生成数据。

      ./dbgen -vf -s 1

      如果您需要生成更多数据量的数据,可以调整SF的参数。例如,您可以使用如下代码生成1 TB数据。

      ./dbgen -vf -s 1000

      一般情况下,32CU可以运行TPC-H SF10的数据量,256CU可以运行TPC-H SF50的数据量。

    • 生成的数据存储在ECS中,您可以在数据生成后使用如下代码查看生成的文件。可以看到生成工具生成了8个数据文件,每个数据文件都对应一张数据集中的表。

      ls | grep '.*.tbl'

OLAP查询场景测试

OLAP查询场景测试,您需要创建表并导入数据。数据导入完成后,为更好的执行查询,您可以收集各张表的特征信息,待后台Compaction(压缩)完成后,使用pgbench等工具进行查询测试。具体步骤如下:

  1. 创建表

    1. 本文内容主要基于psql进行数据导入操作,您需要先在ECS中运行如下命令安装psql。

      yum install postgresql-server -y
    2. 安装psql后,您可以使用如下命令登录Hologres实例。更多关于参数的说明,请参见连接psql参数说明

      PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database>
    3. 使用psql连接Hologres后,您可以使用如下建表语句创建数据库表。

      DROP TABLE IF EXISTS LINEITEM;
      
      BEGIN;
      CREATE TABLE LINEITEM (
          L_ORDERKEY bigint NOT NULL,
          L_PARTKEY int NOT NULL,
          L_SUPPKEY int NOT NULL,
          L_LINENUMBER int NOT NULL,
          L_QUANTITY DECIMAL(15, 2) NOT NULL,
          L_EXTENDEDPRICE DECIMAL(15, 2) NOT NULL,
          L_DISCOUNT DECIMAL(15, 2) NOT NULL,
          L_TAX DECIMAL(15, 2) NOT NULL,
          L_RETURNFLAG text NOT NULL,
          L_LINESTATUS text NOT NULL,
          L_SHIPDATE date NOT NULL,
          L_COMMITDATE date NOT NULL,
          L_RECEIPTDATE date NOT NULL,
          L_SHIPINSTRUCT text NOT NULL,
          L_SHIPMODE text NOT NULL,
          L_COMMENT text NOT NULL,
          PRIMARY KEY (L_ORDERKEY, L_LINENUMBER)
      );
      CALL set_table_property ('LINEITEM', 'clustering_key', 'L_SHIPDATE,L_ORDERKEY');
      CALL set_table_property ('LINEITEM', 'segment_key', 'L_SHIPDATE');
      CALL set_table_property ('LINEITEM', 'distribution_key', 'L_ORDERKEY');
      CALL set_table_property ('LINEITEM', 'bitmap_columns', 'L_RETURNFLAG,L_LINESTATUS,L_SHIPINSTRUCT,L_SHIPMODE');
      CALL set_table_property ('LINEITEM', 'dictionary_encoding_columns', 'l_comment:off,l_returnflag,l_linestatus,l_shipinstruct,l_shipmode');
      COMMIT;
      
      DROP TABLE IF EXISTS ORDERS;
      
      BEGIN;
      CREATE TABLE ORDERS (
          O_ORDERKEY bigint NOT NULL PRIMARY KEY,
          O_CUSTKEY int NOT NULL,
          O_ORDERSTATUS text NOT NULL,
          O_TOTALPRICE DECIMAL(15, 2) NOT NULL,
          O_ORDERDATE date NOT NULL,
          O_ORDERPRIORITY text NOT NULL,
          O_CLERK text NOT NULL,
          O_SHIPPRIORITY int NOT NULL,
          O_COMMENT text NOT NULL
      );
      CALL set_table_property ('ORDERS', 'segment_key', 'O_ORDERDATE');
      CALL set_table_property ('ORDERS', 'colocate_with', 'lineitem');
      CALL set_table_property ('ORDERS', 'distribution_key', 'O_ORDERKEY');
      CALL set_table_property ('ORDERS', 'bitmap_columns', 'O_ORDERSTATUS,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY');
      CALL set_table_property ('ORDERS', 'dictionary_encoding_columns', 'o_comment:off,o_orderstatus,o_orderpriority,o_clerk');
      COMMIT;
      
      DROP TABLE IF EXISTS PARTSUPP;
      
      BEGIN;
      CREATE TABLE PARTSUPP (
          PS_PARTKEY int NOT NULL,
          PS_SUPPKEY int NOT NULL,
          PS_AVAILQTY int NOT NULL,
          PS_SUPPLYCOST DECIMAL(15, 2) NOT NULL,
          PS_COMMENT text NOT NULL,
          PRIMARY KEY (PS_PARTKEY, PS_SUPPKEY)
      );
      CALL set_table_property ('PARTSUPP', 'distribution_key', 'PS_PARTKEY');
      CALL set_table_property ('PARTSUPP', 'colocate_with', 'lineitem');
      CALL set_table_property ('PARTSUPP', 'bitmap_columns', 'ps_availqty');
      CALL set_table_property ('PARTSUPP', 'dictionary_encoding_columns', '');
      COMMIT;
      
      DROP TABLE IF EXISTS PART;
      
      BEGIN;
      CREATE TABLE PART (
          P_PARTKEY int NOT NULL PRIMARY KEY,
          P_NAME text NOT NULL,
          P_MFGR text NOT NULL,
          P_BRAND text NOT NULL,
          P_TYPE text NOT NULL,
          P_SIZE int NOT NULL,
          P_CONTAINER text NOT NULL,
          P_RETAILPRICE DECIMAL(15, 2) NOT NULL,
          P_COMMENT text NOT NULL
      );
      CALL set_table_property ('PART', 'distribution_key', 'P_PARTKEY');
      CALL set_table_property ('PART', 'colocate_with', 'lineitem');
      CALL set_table_property ('PART', 'bitmap_columns', 'P_SIZE,P_MFGR,P_BRAND,P_TYPE,P_CONTAINER');
      CALL set_table_property ('PART', 'dictionary_encoding_columns', 'p_name:off,p_comment:off,p_mfgr,p_brand,p_type,p_container');
      COMMIT;
      
      DROP TABLE IF EXISTS CUSTOMER;
      
      BEGIN;
      CREATE TABLE CUSTOMER (
          C_CUSTKEY int NOT NULL PRIMARY KEY,
          C_NAME text NOT NULL,
          C_ADDRESS text NOT NULL,
          C_NATIONKEY int NOT NULL,
          C_PHONE text NOT NULL,
          C_ACCTBAL DECIMAL(15, 2) NOT NULL,
          C_MKTSEGMENT text NOT NULL,
          C_COMMENT text NOT NULL
      );
      CALL set_table_property ('CUSTOMER', 'distribution_key', 'C_CUSTKEY');
      CALL set_table_property ('CUSTOMER', 'colocate_with', 'lineitem');
      CALL set_table_property ('CUSTOMER', 'bitmap_columns', 'C_NATIONKEY,C_MKTSEGMENT');
      CALL set_table_property ('CUSTOMER', 'dictionary_encoding_columns', 'c_name:off,c_address:off,c_phone:off,c_comment:off,c_mktsegment');
      COMMIT;
      
      DROP TABLE IF EXISTS SUPPLIER;
      
      BEGIN;
      CREATE TABLE SUPPLIER (
          S_SUPPKEY int NOT NULL PRIMARY KEY,
          S_NAME text NOT NULL,
          S_ADDRESS text NOT NULL,
          S_NATIONKEY int NOT NULL,
          S_PHONE text NOT NULL,
          S_ACCTBAL DECIMAL(15, 2) NOT NULL,
          S_COMMENT text NOT NULL
      );
      CALL set_table_property ('SUPPLIER', 'distribution_key', 'S_SUPPKEY');
      CALL set_table_property ('SUPPLIER', 'colocate_with', 'lineitem');
      CALL set_table_property ('SUPPLIER', 'bitmap_columns', 'S_NATIONKEY');
      CALL set_table_property ('SUPPLIER', 'dictionary_encoding_columns', '');
      COMMIT;
      
      DROP TABLE IF EXISTS NATION;
      
      BEGIN;
      CREATE TABLE NATION (
          N_NATIONKEY int NOT NULL PRIMARY KEY,
          N_NAME text NOT NULL,
          N_REGIONKEY int NOT NULL,
          N_COMMENT text NOT NULL
      );
      CALL set_table_property ('NATION', 'distribution_key', 'N_NATIONKEY');
      CALL set_table_property ('NATION', 'colocate_with', 'lineitem');
      CALL set_table_property ('NATION', 'bitmap_columns', '');
      CALL set_table_property ('NATION', 'dictionary_encoding_columns', '');
      COMMIT;
      
      DROP TABLE IF EXISTS REGION;
      
      BEGIN;
      CREATE TABLE REGION (
          R_REGIONKEY int NOT NULL PRIMARY KEY,
          R_NAME text NOT NULL,
          R_COMMENT text
      );
      CALL set_table_property ('REGION', 'distribution_key', 'R_REGIONKEY');
      CALL set_table_property ('REGION', 'colocate_with', 'lineitem');
      CALL set_table_property ('REGION', 'bitmap_columns', '');
      CALL set_table_property ('REGION', 'dictionary_encoding_columns', '');
      COMMIT;

      数据表创建完毕后,您可以在psql中执行tpch_1sf=# \dt查看是否创建成功。

      tpch_1sf=# \dt
                     List of relations
       Schema |   Name   | Type  |       Owner        
      --------+----------+-------+--------------------
       public | customer | table | tpch_1sf_developer
       public | lineitem | table | tpch_1sf_developer
       public | nation   | table | tpch_1sf_developer
       public | orders   | table | tpch_1sf_developer
       public | part     | table | tpch_1sf_developer
       public | partsupp | table | tpch_1sf_developer
       public | region   | table | tpch_1sf_developer
       public | supplier | table | tpch_1sf_developer
      (8 rows)
  2. 导入数据

    您可以使用如下方式导入数据:

    • 本文主要使用COPY FROM STDIN的方式导入数据。更多关于COPY FROM STDIN的详细操作,请参见使用COPY命令导入或导出本地数据。此处会将tbl数据文件导入Hologres创建的表中,tbl准备工作中生成的TPC-H数据。

    • 您也可以在数据生成工具的目录中,使用如下Shell脚本导入数据。更多关于参数的说明,请参见参数说明

      for i in `ls *.tbl`; do
          echo $i;
          name=`echo $i| cut -d'.' -f1`;
          PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY $name from stdin with delimiter '|' csv;" < $i;
      done
  3. 收集统计信息

    为了更好的执行查询,您可以在psql中使用如下语句,收集各张表的特征信息,建议等后台Compaction完成后(CPU消耗下降并平稳)再执行查询。

    --收集表的统计信息
    analyze nation;
    analyze region;
    analyze lineitem;
    analyze orders;
    analyze customer;
    analyze part;
    analyze partsupp;
    analyze supplier;
    --针对非主键的JOIN KEY收集统计信息
    analyze lineitem (l_orderkey,l_partkey,l_suppkey);
    analyze orders (o_custkey);
    analyze partsupp(ps_partkey,ps_suppkey);
  4. 执行查询

    1. 为了方便统计查询信息,您需要使用pgbench工具。您可以使用如下命令安装pgbench工具。

      yum install postgresql-contrib -y

      为了避免因工具兼容性问题影响测试,请您安装版本为13及以上的pgbench工具。如果您本地已经安装pgbench工具,请确保其版本为9.6以上。您可以通过执行如下命令查看当前工具版本。

      pgbench --version
    2. 为了方便查询,您可以直接单击下载TPCH 22条查询语句,并将下载的语句上传至ECS。更多关于上传数据的操作,请参见上传本地文件到ECS实例

    3. 进入ECS,并访问上传文件的目录,使用如下Shell命令解压缩文件。

      unzip tpch_query_updated_20210721.zip
    4. 您可以使用如下方式执行TPC-H的查询语句。查询语句具体内容,请参见TPC-H 22条查询语句

      • 使用pgbench工具,执行单条查询。

        TPC-H共有22条语句,例如您需要执行Q6语句,则具体语句为6.sql

        PGUSER=<AccessID> PGPASSWORD=<AccessKey> PGDATABASE=<Database> pgbench -h <Endpoint> -p <Port> -c <Client_Num> -T <Query_Seconds> -n -f xxx.sql
      • 使用Shell脚本,批量执行22条查询,并将结果输出到文件hologres_tpch_test.out中。

        rm -f hologres_tpch_test.out
        echo `date +"%Y-%m-%d %H:%M:%S"` begin >> ./hologres_tpch_test.out
        for i in {1..22}
        do
            PGUSER=<AccessID> PGPASSWORD=<AccessKey> PGDATABASE=<Database> pgbench -h <Endpoint> -p <Port> -c <Client_Num> -T <Query_Seconds> -n -f ./tpch_query_updated_20210721/q${i}.sql >> ./hologres_tpch_test.out
        done

      具体参数说明如下所示:

      参数

      说明

      PGUSER

      当前阿里云账号的AccessKey ID。

      您可以单击AccessKey 管理,获取AccessKey ID。

      PGPASSWORD

      当前阿里云账号的AccessKey Secret。

      您可以单击AccessKey 管理,获取AccessKey Secret。

      -h

      Hologres实例的网络地址(Endpoint)。

      您可以进入Hologres管理控制台的实例详情页,从网络信息区域获取网络地址。

      -p

      Hologres实例的端口(Port)。

      您可以进入Hologres管理控制台的实例详情页,从网络信息区域获取实例端口。

      -d

      Hologres指定实例中的数据库名(Database)。

      -c

      客户端数目,即并发度(Client_Num)。

      本示例取值为1,由于该测试仅测试查询性能,不测试并发,将并发度置为1即可。

      -T

      每个客户端需要执行的每个Query的压测时长(Query_Seconds,秒)。例如本示例取值为300。

      -f

      压力测试的SQL语句。

      • 当您执行单条查询时,执行语句为xxx.sqlTPC-H共有22条语句,如您需要执行Q6语句,则具体语句为6.sql

      • 当您执行批量查询时,批量脚本中该参数的取值无需修改。

      TPC-H 22条查询语句具体内容,请参见TPC-H 22条查询语句

    5. 您可以通过查看hologres_tpch_test.out得到查询结果。其中,transaction type表示执行具体的SQL文件。latency average表示对应SQL文件的3次查询的平均时间。

      2021-03-23 03:50:54 begin
      pghost: hgpostcn-cn-oew21c935002-cn-hangzhou.hologres.aliyuncs.com pgport: 80 nclients: 1 nxacts: 3 dbName: tpch_100
      transaction type: ./tpch_data_tpch_query/1.sql
      scaling factor: 1
      query mode: simple
      number of clients: 1
      number of threads: 1
      number of transactions per client: 3
      number of transactions actually processed: 3/3
      latency average = 76.936 ms
      tps = 12.997850 (including connections establishing)
      tps = 15.972757 (excluding connections establishing)
      ...

Key/Value点查场景测试

Key/Value点查场景测试,您需要创建表并导入数据。数据导入完成后,可以生成查询语句并进行查询测试。具体步骤如下:

  1. 创建表

    Key/Value点查场景测试继续使用OLAP查询场景创建的数据库,使用TPCH数据集中的ORDERS表进行测试。您在使用psql连接Hologres后即可运行如下命令创建表。psql连接Hologres操作,请参见连接psql参数说明

    说明

    由于点查场景需要使用行存表,所以不能使用OLAP查询场景中使用的表,需要新创建一张表。

    DROP TABLE IF EXISTS public.orders_row;
    
    BEGIN;
    CREATE TABLE public.orders_row(
        O_ORDERKEY       BIGINT            NOT NULL PRIMARY KEY
        ,O_CUSTKEY       INT            NOT NULL
        ,O_ORDERSTATUS   TEXT           NOT NULL
        ,O_TOTALPRICE    DECIMAL(15,2)  NOT NULL
        ,O_ORDERDATE     TIMESTAMPTZ    NOT NULL
        ,O_ORDERPRIORITY TEXT           NOT NULL
        ,O_CLERK         TEXT           NOT NULL
        ,O_SHIPPRIORITY  INT            NOT NULL
        ,O_COMMENT       TEXT           NOT NULL
    );
    CALL SET_TABLE_PROPERTY('public.orders_row', 'orientation', 'row');
    CALL SET_TABLE_PROPERTY('public.orders_row', 'clustering_key', 'o_orderkey');
    CALL SET_TABLE_PROPERTY('public.orders_row', 'distribution_key', 'o_orderkey');
    COMMIT;
  2. 导入数据

    您可以使用如下方式导入数据:

    • COPY方式导入数据

      • 本文主要使用COPY FROM STDIN的方式导入数据。更多关于COPY FROM STDIN的详细操作,请参见使用COPY命令导入或导出本地数据。此处会将tbl数据文件导入Hologres创建的表中,tbl准备工作中生成的TPC-H数据。

      • 您也可以在数据生成工具的目录中,使用如下Shell脚本导入数据。更多关于参数的说明,请参见参数说明

        PGUSER=<AccessID> PGPASSWORD=<AccessKey> psql -p <Port> -h <Endpoint> -d <Database> -c "COPY public.orders_row from stdin with delimiter '|' csv;" < orders.tbl
    • INSERT INTO方式导入数据

      在OLAP场景测试时已经导入了ORDERS表的数据,您可以运行如下SQL语句导入数据。

      INSERT INTO public.orders_row
      SELECT  *
      FROM    public.orders;
  3. 执行查询

    1. 生成查询语句。

      Key/Value点查场景主要有两种查询场景,具体查询语句如下:

      查询方式

      查询语句

      说明

      单值筛选

      SELECT  column_a
              ,column_b
              ,...
              ,column_x
      FROM    table_x
      WHERE   pk = value_x
      ;

      此查询语句主要用于单值筛选,即WHERE的SQL语句取值唯一。

      多值筛选

      SELECT  column_a
              ,column_b
              ,...
              ,column_x
      FROM    table_x
      WHERE   pk IN ( value_a, value_b,..., value_x )
      ;

      此查询语句主要用于多值筛选,即WHERE的SQL语句可以取多个值。

      您可以使用如下脚本生成所需的SQL语句,该脚本会生成2条SQL语句。其中kv_query_single.sql表示单值筛选的SQL。kv_query_in.sql表示多值筛选的SQL,该脚本会随机生成一个针对10个值筛选的SQL。

      rm -rf kv_query
      mkdir kv_query
      cd kv_query
      echo "
      \set column_values random(1,99999999)
      select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey =:column_values;
      " >> kv_query_single.sql
      echo "
      \set column_values1 random(1,99999999)
      \set column_values2 random(1,99999999)
      \set column_values3 random(1,99999999)
      \set column_values4 random(1,99999999)
      \set column_values5 random(1,99999999)
      \set column_values6 random(1,99999999)
      \set column_values7 random(1,99999999)
      \set column_values8 random(1,99999999)
      \set column_values9 random(1,99999999)
      select O_ORDERKEY,O_CUSTKEY,O_ORDERSTATUS,O_TOTALPRICE,O_ORDERDATE,O_ORDERPRIORITY,O_CLERK,O_SHIPPRIORITY,O_COMMENT from public.orders_row WHERE o_orderkey in(:column_values1,:column_values2,:column_values3,:column_values4,:column_values5,:column_values6,:column_values7,:column_values8,:column_values9);
      " >> kv_query_in.sql
    2. 为了方便统计查询信息,您需要使用pgbench工具。您可以使用如下命令安装pgbench工具。

      yum install postgresql-contrib -y

      为了避免因工具兼容性问题影响测试,请您安装版本为13及以上的pgbench工具。如果您本地已经安装pgbench工具,请确保其版本为9.6以上。您可以通过执行如下命令查看当前工具版本。

      pgbench --version
    3. 执行测试语句。

      • 针对单值筛选的场景,使用pgbench工具进行压测。您需要在生成SQL的目录执行如下命令,更多参数说明,请参见参数说明

        PGUSER=<AccessID> PGPASSWORD=<AccessKey> PGDATABASE=<Database> pgbench -h <Endpoint> -p <Port> -c <Client_Num> -T <Query_Seconds> -M prepared -n -f kv_query_single.sql
      • 针对多值筛选的场景,使用pgbench工具进行压测。您需要在生成SQL的目录执行如下命令,更多参数说明,请参见参数说明

        PGUSER=<AccessID> PGPASSWORD=<AccessKey> PGDATABASE=<Database> pgbench -h <Endpoint> -p <Port> -c <Client_Num> -T <Query_Seconds> -M prepared -n -f kv_query_in.sql

数据更新场景

该场景用于测试OLAP引擎在有主键情况下数据更新的性能,以及在主键冲突时更新整行数据。

  • 生成查询

    echo "
    \set O_ORDERKEY random(1,99999999)
    INSERT INTO public.orders_row(o_orderkey,o_custkey,o_orderstatus,o_totalprice,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment) VALUES (:O_ORDERKEY,1,'demo',1.1,'2021-01-01','demo','demo',1,'demo') on conflict(o_orderkey) do update set (o_orderkey,o_custkey,o_orderstatus,o_totalprice,o_orderdate,o_orderpriority,o_clerk,o_shippriority,o_comment)= ROW(excluded.*);
    " > /root/insert_on_conflict.sql
  • 插入及更新

    PGUSER=<AccessID> PGPASSWORD=<AccessKey> PGDATABASE=<Database> pgbench -h <Endpoint> -p 80 -c <Client_Num> -T <Query_Seconds> -M prepared -n -f /root/insert_on_conflict.sql
  • 示例结果

    transaction type: Custom query
    scaling factor: 1
    query mode: prepared
    number of clients: 249
    number of threads: 1
    duration: 60 s
    number of transactions actually processed: 1923038
    tps = 32005.850214 (including connections establishing)
    tps = 36403.145722 (excluding connections establishing)

Flink实时写入场景

该场景用于测试实时数据写入能力。

  • Hologres DDL

    该场景Hologres的表拥有10列,其中key列为主键,Hologres DDL如下。

    DROP TABLE IF EXISTS flink_insert;
    
    BEGIN ;
    CREATE TABLE IF NOT EXISTS flink_insert(
      key INT PRIMARY KEY
      ,value1 TEXT
      ,value2 TEXT
      ,value3 TEXT
      ,value4 TEXT
      ,value5 TEXT
      ,value6 TEXT
      ,value7 TEXT
      ,value8 TEXT
      ,value9 TEXT
    );
    CALL SET_TABLE_PROPERTY('flink_insert', 'orientation', 'row');
    CALL SET_TABLE_PROPERTY('flink_insert', 'clustering_key', 'key');
    CALL SET_TABLE_PROPERTY('flink_insert', 'distribution_key', 'key');
    COMMIT;
  • Flink作业脚本

    使用Flink全托管自带的随机数发生器向Hologres写入数据,当主键冲突时选择整行更新,单行数据量超过512 B,Flink作业脚本如下。

    CREATE TEMPORARY TABLE flink_case_1_source (
        key INT,
        value1 VARCHAR,
        value2 VARCHAR,
        value3 VARCHAR,
        value4 VARCHAR,
        value5 VARCHAR,
        value6 VARCHAR,
        value7 VARCHAR,
        value8 VARCHAR,
        value9 VARCHAR
      )
    WITH (
        'connector' = 'datagen',
         -- optional options --
        'rows-per-second' = '1000000000',
        'fields.key.min'='1',
        'fields.key.max'='2147483647',
        'fields.value1.length' = '57',
        'fields.value2.length' = '57',
        'fields.value3.length' = '57',
        'fields.value4.length' = '57',
        'fields.value5.length' = '57',
        'fields.value6.length' = '57',
        'fields.value7.length' = '57',
        'fields.value8.length' = '57',
        'fields.value9.length' = '57'
      );
    
    -- 创建 Hologres 结果表
    CREATE TEMPORARY TABLE flink_case_1_sink (
        key INT,
        value1 VARCHAR,
        value2 VARCHAR,
        value3 VARCHAR,
        value4 VARCHAR,
        value5 VARCHAR,
        value6 VARCHAR,
        value7 VARCHAR,
        value8 VARCHAR,
        value9 VARCHAR
      )
    WITH (
        'connector' = 'hologres',
        'dbname'='<yourDbname>',  --Hologres的数据库名称。
        'tablename'='<yourTablename>',  --Hologres用于接收数据的表名称。
        'username'='<yourUsername>',  --当前阿里云账号的AccessKey ID。
        'password'='<yourPassword>',  --当前阿里云账号的AccessKey Secret。
        'endpoint'='<yourEndpoint>',  --当前Hologres实例VPC网络的Endpoint。
        'connectionSize' = '10',  --默认为3
        'jdbcWriteBatchSize' = '1024',  --默认为256
        'jdbcWriteBatchByteSize' = '2147483647',  --默认为20971520
        'mutatetype'='insertorreplace'  --插入或整行替换已有数据
      );
    
    -- 进行 ETL 操作并写入数据
    insert into flink_case_1_sink
    select key,
      value1,
      value2,
      value3,
      value4,
      value5,
      value6,
      value7,
      value8,
      value9
    from
      flink_case_1_source
    ;

    参数说明请参见Hologres结果表

  • 示例结果

    在Hologres的管理控制台的监控信息页面,即可看到RPS的数值。RPS

TPC-H 22条查询语句

TPCH 22条查询语句如下所示,您可以单击表格中的链接进行查看。

名称

查询语句

TPCH 22条查询语句

Q1

Q2

Q3

Q4

Q5

Q6

Q7

Q8

Q9

Q10

Q11

Q12

Q13

Q14

Q15

Q16

Q17

Q18

Q19

Q20

Q21

Q22

-

-

  • Q1

    select
            l_returnflag,
            l_linestatus,
            sum(l_quantity) as sum_qty,
            sum(l_extendedprice) as sum_base_price,
            sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
            sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
            avg(l_quantity) as avg_qty,
            avg(l_extendedprice) as avg_price,
            avg(l_discount) as avg_disc,
            count(*) as count_order
    from
            lineitem
    where
            l_shipdate <= date '1998-12-01' - interval '120' day
    group by
            l_returnflag,
            l_linestatus
    order by
            l_returnflag,
            l_linestatus;
  • Q2

    select
            s_acctbal,
            s_name,
            n_name,
            p_partkey,
            p_mfgr,
            s_address,
            s_phone,
            s_comment
    from
            part,
            supplier,
            partsupp,
            nation,
            region
    where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and p_size = 48
            and p_type like '%STEEL'
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'EUROPE'
            and ps_supplycost = (
                    select
                            min(ps_supplycost)
                    from
                            partsupp,
                            supplier,
                            nation,
                            region
                    where
                            p_partkey = ps_partkey
                            and s_suppkey = ps_suppkey
                            and s_nationkey = n_nationkey
                            and n_regionkey = r_regionkey
                            and r_name = 'EUROPE'
            )
    order by
            s_acctbal desc,
            n_name,
            s_name,
            p_partkey
    limit 100;
  • Q3

    select
            l_orderkey,
            sum(l_extendedprice * (1 - l_discount)) as revenue,
            o_orderdate,
            o_shippriority
    from
            customer,
            orders,
            lineitem
    where
            c_mktsegment = 'MACHINERY'
            and c_custkey = o_custkey
            and l_orderkey = o_orderkey
            and o_orderdate < date '1995-03-23'
            and l_shipdate > date '1995-03-23'
    group by
            l_orderkey,
            o_orderdate,
            o_shippriority
    order by
            revenue desc,
            o_orderdate
    limit 10;
  • Q4

    select
            o_orderpriority,
            count(*) as order_count
    from
            orders
    where
            o_orderdate >= date '1996-07-01'
            and o_orderdate < date '1996-07-01' + interval '3' month
            and exists (
                    select
                            *
                    from
                            lineitem
                    where
                            l_orderkey = o_orderkey
                            and l_commitdate < l_receiptdate
            )
    group by
            o_orderpriority
    order by
            o_orderpriority;
  • Q5

    select
            n_name,
            sum(l_extendedprice * (1 - l_discount)) as revenue
    from
            customer,
            orders,
            lineitem,
            supplier,
            nation,
            region
    where
            c_custkey = o_custkey
            and l_orderkey = o_orderkey
            and l_suppkey = s_suppkey
            and c_nationkey = s_nationkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = 'EUROPE'
            and o_orderdate >= date '1996-01-01'
            and o_orderdate < date '1996-01-01' + interval '1' year
    group by
            n_name
    order by
            revenue desc;
  • Q6

    select
            sum(l_extendedprice * l_discount) as revenue
    from
            lineitem
    where
            l_shipdate >= date '1996-01-01'
            and l_shipdate < date '1996-01-01' + interval '1' year
            and l_discount between 0.02 - 0.01 and 0.02 + 0.01
            and l_quantity < 24;
  • Q7

    select
            supp_nation,
            cust_nation,
            l_year,
            sum(volume) as revenue
    from
            (
                    select
                            n1.n_name as supp_nation,
                            n2.n_name as cust_nation,
                            extract(year from l_shipdate) as l_year,
                            l_extendedprice * (1 - l_discount) as volume
                    from
                            supplier,
                            lineitem,
                            orders,
                            customer,
                            nation n1,
                            nation n2
                    where
                            s_suppkey = l_suppkey
                            and o_orderkey = l_orderkey
                            and c_custkey = o_custkey
                            and s_nationkey = n1.n_nationkey
                            and c_nationkey = n2.n_nationkey
                            and (
                                    (n1.n_name = 'CANADA' and n2.n_name = 'BRAZIL')
                                    or (n1.n_name = 'BRAZIL' and n2.n_name = 'CANADA')
                            )
                            and l_shipdate between date '1995-01-01' and date '1996-12-31'
            ) as shipping
    group by
            supp_nation,
            cust_nation,
            l_year
    order by
            supp_nation,
            cust_nation,
            l_year;
  • Q8

    select
            o_year,
            sum(case
                    when nation = 'BRAZIL' then volume
                    else 0
            end) / sum(volume) as mkt_share
    from
            (
                    select
                            extract(year from o_orderdate) as o_year,
                            l_extendedprice * (1 - l_discount) as volume,
                            n2.n_name as nation
                    from
                            part,
                            supplier,
                            lineitem,
                            orders,
                            customer,
                            nation n1,
                            nation n2,
                            region
                    where
                            p_partkey = l_partkey
                            and s_suppkey = l_suppkey
                            and l_orderkey = o_orderkey
                            and o_custkey = c_custkey
                            and c_nationkey = n1.n_nationkey
                            and n1.n_regionkey = r_regionkey
                            and r_name = 'AMERICA'
                            and s_nationkey = n2.n_nationkey
                            and o_orderdate between date '1995-01-01' and date '1996-12-31'
                            and p_type = 'LARGE ANODIZED COPPER'
            ) as all_nations
    group by
            o_year
    order by
            o_year;
  • Q9

    select
            nation,
            o_year,
            sum(amount) as sum_profit
    from
            (
                    select
                            n_name as nation,
                            extract(year from o_orderdate) as o_year,
                            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
                    from
                            part,
                            supplier,
                            lineitem,
                            partsupp,
                            orders,
                            nation
                    where
                            s_suppkey = l_suppkey
                            and ps_suppkey = l_suppkey
                            and ps_partkey = l_partkey
                            and p_partkey = l_partkey
                            and o_orderkey = l_orderkey
                            and s_nationkey = n_nationkey
                            and p_name like '%maroon%'
            ) as profit
    group by
            nation,
            o_year
    order by
            nation,
            o_year desc;
  • Q10

    select
            c_custkey,
            c_name,
            sum(l_extendedprice * (1 - l_discount)) as revenue,
            c_acctbal,
            n_name,
            c_address,
            c_phone,
            c_comment
    from
            customer,
            orders,
            lineitem,
            nation
    where
            c_custkey = o_custkey
            and l_orderkey = o_orderkey
            and o_orderdate >= date '1993-02-01'
            and o_orderdate < date '1993-02-01' + interval '3' month
            and l_returnflag = 'R'
            and c_nationkey = n_nationkey
    group by
            c_custkey,
            c_name,
            c_acctbal,
            c_phone,
            n_name,
            c_address,
            c_comment
    order by
            revenue desc
    limit 20;
  • Q11

    select
            ps_partkey,
            sum(ps_supplycost * ps_availqty) as value
    from
            partsupp,
            supplier,
            nation
    where
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = 'EGYPT'
    group by
            ps_partkey having
                    sum(ps_supplycost * ps_availqty) > (
                            select
                                    sum(ps_supplycost * ps_availqty) * 0.0001000000
                            from
                                    partsupp,
                                    supplier,
                                    nation
                            where
                                    ps_suppkey = s_suppkey
                                    and s_nationkey = n_nationkey
                                    and n_name = 'EGYPT'
                    )
    order by
            value desc;
  • Q12

    select
            l_shipmode,
            sum(case
                    when o_orderpriority = '1-URGENT'
                            or o_orderpriority = '2-HIGH'
                            then 1
                    else 0
            end) as high_line_count,
            sum(case
                    when o_orderpriority <> '1-URGENT'
                            and o_orderpriority <> '2-HIGH'
                            then 1
                    else 0
            end) as low_line_count
    from
            orders,
            lineitem
    where
            o_orderkey = l_orderkey
            and l_shipmode in ('FOB', 'AIR')
            and l_commitdate < l_receiptdate
            and l_shipdate < l_commitdate
            and l_receiptdate >= date '1997-01-01'
            and l_receiptdate < date '1997-01-01' + interval '1' year
    group by
            l_shipmode
    order by
            l_shipmode;
  • Q13

    select
            c_count,
            count(*) as custdist
    from
            (
                    select
                            c_custkey,
                            count(o_orderkey) as c_count
                    from
                            customer left outer join orders on
                                    c_custkey = o_custkey
                                    and o_comment not like '%special%deposits%'
                    group by
                            c_custkey
            ) c_orders
    group by
            c_count
    order by
            custdist desc,
            c_count desc;
  • Q14

    select
            100.00 * sum(case
                    when p_type like 'PROMO%'
                            then l_extendedprice * (1 - l_discount)
                    else 0
            end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
    from
            lineitem,
            part
    where
            l_partkey = p_partkey
            and l_shipdate >= date '1997-06-01'
            and l_shipdate < date '1997-06-01' + interval '1' month;
  • Q15

    with revenue0(SUPPLIER_NO, TOTAL_REVENUE)  as
        (
        select
            l_suppkey,
            sum(l_extendedprice * (1 - l_discount))
        from
            lineitem
        where
            l_shipdate >= date '1995-02-01'
            and l_shipdate < date '1995-02-01' + interval '3' month
        group by
            l_suppkey
        )
    select
        s_suppkey,
        s_name,
        s_address,
        s_phone,
        total_revenue
    from
        supplier,
        revenue0
    where
        s_suppkey = supplier_no
        and total_revenue = (
            select
                max(total_revenue)
            from
                revenue0
        )
    order by
        s_suppkey;
  • Q16

    select
            p_brand,
            p_type,
            p_size,
            count(distinct ps_suppkey) as supplier_cnt
    from
            partsupp,
            part
    where
            p_partkey = ps_partkey
            and p_brand <> 'Brand#45'
            and p_type not like 'SMALL ANODIZED%'
            and p_size in (47, 15, 37, 30, 46, 16, 18, 6)
            and ps_suppkey not in (
                    select
                            s_suppkey
                    from
                            supplier
                    where
                            s_comment like '%Customer%Complaints%'
            )
    group by
            p_brand,
            p_type,
            p_size
    order by
            supplier_cnt desc,
            p_brand,
            p_type,
            p_size;
  • Q17

    select
            sum(l_extendedprice) / 7.0 as avg_yearly
    from
            lineitem,
            part
    where
            p_partkey = l_partkey
            and p_brand = 'Brand#51'
            and p_container = 'WRAP PACK'
            and l_quantity < (
                    select
                            0.2 * avg(l_quantity)
                    from
                            lineitem
                    where
                            l_partkey = p_partkey
            );
  • Q18

    select
            c_name,
            c_custkey,
            o_orderkey,
            o_orderdate,
            o_totalprice,
            sum(l_quantity)
    from
            customer,
            orders,
            lineitem
    where
            o_orderkey in (
                    select
                            l_orderkey
                    from
                            lineitem
                    group by
                            l_orderkey having
                                    sum(l_quantity) > 312
            )
            and c_custkey = o_custkey
            and o_orderkey = l_orderkey
    group by
            c_name,
            c_custkey,
            o_orderkey,
            o_orderdate,
            o_totalprice
    order by
            o_totalprice desc,
            o_orderdate
    limit 100;
  • Q19

    select
            sum(l_extendedprice* (1 - l_discount)) as revenue
    from
            lineitem,
            part
    where
            (
                    p_partkey = l_partkey
                    and p_brand = 'Brand#52'
                    and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
                    and l_quantity >= 3 and l_quantity <= 3 + 10
                    and p_size between 1 and 5
                    and l_shipmode in ('AIR', 'AIR REG')
                    and l_shipinstruct = 'DELIVER IN PERSON'
            )
            or
            (
                    p_partkey = l_partkey
                    and p_brand = 'Brand#43'
                    and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
                    and l_quantity >= 12 and l_quantity <= 12 + 10
                    and p_size between 1 and 10
                    and l_shipmode in ('AIR', 'AIR REG')
                    and l_shipinstruct = 'DELIVER IN PERSON'
            )
            or
            (
                    p_partkey = l_partkey
                    and p_brand = 'Brand#52'
                    and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
                    and l_quantity >= 21 and l_quantity <= 21 + 10
                    and p_size between 1 and 15
                    and l_shipmode in ('AIR', 'AIR REG')
                    and l_shipinstruct = 'DELIVER IN PERSON'
            );
  • Q20

    select
            s_name,
            s_address
    from
            supplier,
            nation
    where
            s_suppkey in (
                    select
                            ps_suppkey
                    from
                            partsupp
                    where
                            ps_partkey in (
                                    select
                                            p_partkey
                                    from
                                            part
                                    where
                                            p_name like 'drab%'
                            )
                            and ps_availqty > (
                                    select
                                            0.5 * sum(l_quantity)
                                    from
                                            lineitem
                                    where
                                            l_partkey = ps_partkey
                                            and l_suppkey = ps_suppkey
                                            and l_shipdate >= date '1996-01-01'
                                            and l_shipdate < date '1996-01-01' + interval '1' year
                            )
            )
            and s_nationkey = n_nationkey
            and n_name = 'KENYA'
    order by
            s_name;
  • Q21

    select
            s_name,
            count(*) as numwait
    from
            supplier,
            lineitem l1,
            orders,
            nation
    where
            s_suppkey = l1.l_suppkey
            and o_orderkey = l1.l_orderkey
            and o_orderstatus = 'F'
            and l1.l_receiptdate > l1.l_commitdate
            and exists (
                    select
                            *
                    from
                            lineitem l2
                    where
                            l2.l_orderkey = l1.l_orderkey
                            and l2.l_suppkey <> l1.l_suppkey
            )
            and not exists (
                    select
                            *
                    from
                            lineitem l3
                    where
                            l3.l_orderkey = l1.l_orderkey
                            and l3.l_suppkey <> l1.l_suppkey
                            and l3.l_receiptdate > l3.l_commitdate
            )
            and s_nationkey = n_nationkey
            and n_name = 'PERU'
    group by
            s_name
    order by
            numwait desc,
            s_name
    limit 100;
  • Q22

    select
            cntrycode,
            count(*) as numcust,
            sum(c_acctbal) as totacctbal
    from
            (
                    select
                            substring(c_phone from 1 for 2) as cntrycode,
                            c_acctbal
                    from
                            customer
                    where
                            substring(c_phone from 1 for 2) in
                                    ('24', '32', '17', '18', '12', '14', '22')
                            and c_acctbal > (
                                    select
                                            avg(c_acctbal)
                                    from
                                            customer
                                    where
                                            c_acctbal > 0.00
                                            and substring(c_phone from 1 for 2) in
                                                    ('24', '32', '17', '18', '12', '14', '22')
                            )
                            and not exists (
                                    select
                                            *
                                    from
                                            orders
                                    where
                                            o_custkey = c_custkey
                            )
            ) as custsale
    group by
            cntrycode
    order by
            cntrycode;