This topic describes how to import TPC-DS test data to AnalyticDB for MySQL.
Process data for compatibility
If no default value is specified for a field in a table, the default value is NULL.
When a vertical bar (|) is used as the field separator, "a,NULL,c,d,NULL" is converted into "a||c|d|" in the generated text files. This leads to import failures when you import data to AnalyticDB for MySQL by using the LOAD DATA statement. Therefore, NULL values must be replaced.
- Replace the NULL values in all fields of the INT, BIGINT, VARCHAR, and DATE types with 0
#!/bin/bash # Replace NULL with 0 in the first field to convert ^| into 0|. # Replace NULL with 0 in the fields in the middle to convert || into |0|. # Replace NULL with 0 in the last field to convert |$ into |0. for s_f in `ls *dat` do echo "$s_f" i=1 while [ `egrep '\|\||^\||\|$' $s_f |wc -l` -gt 0 ] do echo $i sed 's/^|/0|/g;s/||/|0|/g;s/|$/|0/g' -i $s_f ((i++)) done done
- Replace the 0 value in date-typed fields with 0000-00-00.
1092|AAAAAAAACEEAAAAA|2001-10-27|0|Manufa.... 16252|AAAAAAAAMHPDAAAA|0|1999-10-27|0|7.94|0|1001.... 16252|AAAAAAAAMHPDAAAA|0|0|0|7.94|0|1001.. for s_f in item.dat store.dat web_page.dat web_site.dat call_center.dat do # Process the first and second date-typed fields whose value is NULL. sed 's/^\([A-Za-z0-9]*|[A-Za-z0-9]*\)|0|0|\(.*\)/\1|0000-00-00|0000-00-00|\2/' -i $s_f # Process the second date-typed field whose value is NULL. sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*|[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)|0|\(.*\)/\1|0000-00-00|\2/' -i $s_f # Process the first date-typed field whose value is NULL. sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*\)|0|\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}|.*\)/\1|0000-00-00|\2/' -i $s_f done
Import data to AnalyticDB for MySQL
The following section describes how to import data files generated by dsdgen to AnalyticDB for MySQL by using the LOAD DATA LOCAL INFILE statement.
- If you run the data files in Linux, each row ends with \n.
- If you run the data files in Windows, each row ends with \r\n.
LOAD DATA LOCAL INFILE 'call_center.dat' INTO
TABLE call_center FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_page.dat' INTO
TABLE catalog_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_returns.dat'
INTO TABLE catalog_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_sales.dat'
INTO TABLE catalog_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer_address.dat'
INTO TABLE customer_address FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer.dat' INTO
TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE
'customer_demographics.dat' INTO TABLE customer_demographics FIELDS TERMINATED
BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'date_dim.dat' INTO
TABLE date_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'dbgen_version.dat'
INTO TABLE dbgen_version FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE
'household_demographics.dat' INTO TABLE household_demographics FIELDS
TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'income_band.dat' INTO
TABLE income_band FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'inventory.dat' INTO
TABLE inventory FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'item.dat' INTO TABLE
item FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'promotion.dat' INTO
TABLE promotion FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'reason.dat' INTO TABLE
reason FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'ship_mode.dat' INTO
TABLE ship_mode FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store.dat' INTO TABLE
store FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_returns.dat'
INTO TABLE store_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_sales.dat' INTO
TABLE store_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'time_dim.dat' INTO
TABLE time_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'warehouse.dat' INTO
TABLE warehouse FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_page.dat' INTO
TABLE web_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_returns.dat' INTO
TABLE web_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_sales.dat' INTO
TABLE web_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_site.dat' INTO
TABLE web_site FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
You can also use foreign tables to import data from Object Storage Service (OSS) to AnalyticDB for MySQL. For more information, see Import OSS data to AnalyticDB for MySQL by using external tables.
The following table lists the number of rows in different tables of the TPC-DS dataset.
Table name | Number of rows |
---|---|
store_sales | 2,879,987,999 |
catalog_sales | 1,439,980,416 |
web_sales | 720,000,376 |
store_returns | 287,999,764 |
catalog_returns | 143,996,756 |
inventory | 783,000,000 |
web_returns | 71,997,522 |
customer | 12,000,000 |
customer_address | 6,000,000 |
item | 300,000 |
customer_demographics | 1,920,800 |
date_dim | 73,049 |
time_dim | 86,400 |
catalog_page | 30,000 |
web_page | 3,000 |
store | 1,002 |
promotion | 1,500 |
household_demographics | 7,200 |
web_site | 54 |
call_center | 42 |
reason | 65 |
warehouse | 20 |
ship_mode | 20 |
income_band | 20 |