This topic describes how to import TPC-DS test data to AnalyticDB for MySQL.

Process data for compatibility

If no default value is specified for a field in a table, the default value is NULL.

When a vertical bar (|) is used as the field separator, "a,NULL,c,d,NULL" is converted into "a||c|d|" in the generated text files. This leads to import failures when you import data to AnalyticDB for MySQL by using the LOAD DATA statement. Therefore, NULL values must be replaced.

Replace the NULL values in all fields of the INT, BIGINT, VARCHAR, and DATE types with 0
#!/bin/bash
# Replace NULL with 0 in the first field to convert ^| into 0|.
# Replace NULL with 0 in the fields in the middle to convert || into |0|.
# Replace NULL with 0 in the last field to convert |$ into |0.
for s_f in `ls *dat`
do
    echo "$s_f"
    i=1
    while [ `egrep '\|\||^\||\|$' $s_f |wc -l` -gt 0 ]
    do 
        echo $i
        sed 's/^|/0|/g;s/||/|0|/g;s/|$/|0/g' -i $s_f
        ((i++))
    done
done
Replace the 0 value in date-typed fields with 0000-00-00.

1092|AAAAAAAACEEAAAAA|2001-10-27|0|Manufa....
16252|AAAAAAAAMHPDAAAA|0|1999-10-27|0|7.94|0|1001....
16252|AAAAAAAAMHPDAAAA|0|0|0|7.94|0|1001..

for s_f in item.dat store.dat web_page.dat web_site.dat call_center.dat
do
# Process the first and second date-typed fields whose value is NULL.
sed 's/^\([A-Za-z0-9]*|[A-Za-z0-9]*\)|0|0|\(.*\)/\1|0000-00-00|0000-00-00|\2/' -i $s_f

# Process the second date-typed field whose value is NULL.
sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*|[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)|0|\(.*\)/\1|0000-00-00|\2/' -i $s_f

# Process the first date-typed field whose value is NULL.
sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*\)|0|\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}|.*\)/\1|0000-00-00|\2/' -i $s_f

done

Import data to AnalyticDB for MySQL

The following section describes how to import data files generated by dsdgen to AnalyticDB for MySQL by using the LOAD DATA LOCAL INFILE statement.

Note
  • If you run the data files in Linux, each row ends with \n.
  • If you run the data files in Windows, each row ends with \r\n.
LOAD DATA LOCAL INFILE 'call_center.dat' INTO
TABLE call_center FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_page.dat' INTO
TABLE catalog_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_returns.dat'
INTO TABLE catalog_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_sales.dat'
INTO TABLE catalog_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer_address.dat'
INTO TABLE customer_address FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer.dat' INTO
TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE
'customer_demographics.dat' INTO TABLE customer_demographics FIELDS TERMINATED
BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'date_dim.dat' INTO
TABLE date_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'dbgen_version.dat'
INTO TABLE dbgen_version FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE
'household_demographics.dat' INTO TABLE household_demographics FIELDS
TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'income_band.dat' INTO
TABLE income_band FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'inventory.dat' INTO
TABLE inventory FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'item.dat' INTO TABLE
item FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'promotion.dat' INTO
TABLE promotion FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'reason.dat' INTO TABLE
reason FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'ship_mode.dat' INTO
TABLE ship_mode FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store.dat' INTO TABLE
store FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_returns.dat'
INTO TABLE store_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_sales.dat' INTO
TABLE store_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'time_dim.dat' INTO
TABLE time_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'warehouse.dat' INTO
TABLE warehouse FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_page.dat' INTO
TABLE web_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_returns.dat' INTO
TABLE web_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_sales.dat' INTO
TABLE web_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_site.dat' INTO
TABLE web_site FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';

You can also use foreign tables to import data from Object Storage Service (OSS) to AnalyticDB for MySQL. For more information, see Import OSS data to AnalyticDB for MySQL by using external tables.

The following table lists the number of rows in different tables of the TPC-DS dataset.

Table name Number of rows
store_sales 2,879,987,999
catalog_sales 1,439,980,416
web_sales 720,000,376
store_returns 287,999,764
catalog_returns 143,996,756
inventory 783,000,000
web_returns 71,997,522
customer 12,000,000
customer_address 6,000,000
item 300,000
customer_demographics 1,920,800
date_dim 73,049
time_dim 86,400
catalog_page 30,000
web_page 3,000
store 1,002
promotion 1,500
household_demographics 7,200
web_site 54
call_center 42
reason 65
warehouse 20
ship_mode 20
income_band 20