本文介绍如何将TPC-DS测试数据导入AnalyticDB MySQL中。

对数据做兼容加工处理

如果表的字段没有设置默认值,生成的字段值默认是NULL。

使用管道符“|”作为字段分隔符的时候,“a,NULL,c,d,NULL”导出到文本文件是“a||c|d|”的格式,在使用LOAD DATA的方法导入AnalyticDB的时候,会产生报错提示导入失败,所以需要对NULL值做一些处理。

把NULL替换为0,int、bigint、varchar、date字段类型的都会替换
#!/bin/bash
# 用0来替换第一个字段的NULL值,把^|替换成0|
# 用0来替换中间字段的NULL值, 把||替换成|0|
# 用0来替换最后一个字段的NULL值,把|$替换成|0
for s_f in `ls *dat`
do
    echo "$s_f"
    i=1
    while [ `egrep '\|\||^\||\|$' $s_f |wc -l` -gt 0 ]
    do 
        echo $i
        sed 's/^|/0|/g;s/||/|0|/g;s/|$/|0/g' -i $s_f
        ((i++))
    done
done
把date字段的0值替换成 0000-00-00

1092|AAAAAAAACEEAAAAA|2001-10-27|0|Manufa....
16252|AAAAAAAAMHPDAAAA|0|1999-10-27|0|7.94|0|1001....
16252|AAAAAAAAMHPDAAAA|0|0|0|7.94|0|1001..

for s_f in item.dat store.dat web_page.dat web_site.dat call_center.dat
do
# 处理第一、第二个date都是NULL的
sed 's/^\([A-Za-z0-9]*|[A-Za-z0-9]*\)|0|0|\(.*\)/\1|0000-00-00|0000-00-00|\2/' -i $s_f

# 处理第二个date是NULL的
sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*|[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)|0|\(.*\)/\1|0000-00-00|\2/' -i $s_f

# 处理第一个date是NULL的
sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*\)|0|\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}|.*\)/\1|0000-00-00|\2/' -i $s_f

done

导入数据到AnalyticDB MySQL

使用LOAD DATA LOCAL INFILE的方式把dsdgen生成的数据文本导入到AnalyticDB MySQL。

说明
  • 如果是在Linux环境运行生成的数据文本,每行的结束符是'\n'。
  • 如果是在Windows环境运行生成的数据文本,每行的结束符是'\r\n'。
LOAD DATA LOCAL INFILE 'call_center.dat' INTO
TABLE call_center FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_page.dat' INTO
TABLE catalog_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_returns.dat'
INTO TABLE catalog_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_sales.dat'
INTO TABLE catalog_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer_address.dat'
INTO TABLE customer_address FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer.dat' INTO
TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE
'customer_demographics.dat' INTO TABLE customer_demographics FIELDS TERMINATED
BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'date_dim.dat' INTO
TABLE date_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'dbgen_version.dat'
INTO TABLE dbgen_version FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE
'household_demographics.dat' INTO TABLE household_demographics FIELDS
TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'income_band.dat' INTO
TABLE income_band FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'inventory.dat' INTO
TABLE inventory FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'item.dat' INTO TABLE
item FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'promotion.dat' INTO
TABLE promotion FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'reason.dat' INTO TABLE
reason FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'ship_mode.dat' INTO
TABLE ship_mode FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store.dat' INTO TABLE
store FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_returns.dat'
INTO TABLE store_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_sales.dat' INTO
TABLE store_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'time_dim.dat' INTO
TABLE time_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'warehouse.dat' INTO
TABLE warehouse FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_page.dat' INTO
TABLE web_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_returns.dat' INTO
TABLE web_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_sales.dat' INTO
TABLE web_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_site.dat' INTO
TABLE web_site FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';

在AnalyticDB MySQL中,还可以通过OSS外表方式导入测试数据,请参见通过外表导入OSS数据

下表列出了TPC-DS测试数据集中的表数据条数。

表名 数据条数
store_sales 2,879,987,999
catalog_sales 1,439,980,416
web_sales 720,000,376
store_returns 287,999,764
catalog_returns 143,996,756
inventory 783,000,000
web_returns 71,997,522
customer 12,000,000
customer_address 6,000,000
item 300,000
customer_demographics 1,920,800
date_dim 73,049
time_dim 86,400
catalog_page 30,000
web_page 3,000
store 1,002
promotion 1,500
household_demographics 7,200
web_site 54
call_center 42
reason 65
warehouse 20
ship_mode 20
income_band 20