Load 1 TB of TPC-DS test data into AnalyticDB for MySQL to run performance benchmarks. Two methods are available: importing from pre-staged OSS paths (recommended for Data Lakehouse Edition clusters) and generating data locally with dsdgen and importing via LOAD DATA.
This TPC-DS implementation is derived from the TPC-DS benchmark specification but does not comply with all its requirements and is not comparable to published TPC-DS benchmark results.
Dataset overview
The TPC-DS dataset contains 24 tables. At the 1 TB scale factor, the largest fact table (store_sales) contains approximately 2.9 billion rows, and the total dataset spans billions of rows across all fact tables.
| Table name | Number of rows |
|---|---|
| store_sales | 2,879,987,999 |
| catalog_sales | 1,439,980,416 |
| web_sales | 720,000,376 |
| inventory | 783,000,000 |
| store_returns | 287,999,764 |
| catalog_returns | 143,996,756 |
| web_returns | 71,997,522 |
| customer | 12,000,000 |
| customer_address | 6,000,000 |
| customer_demographics | 1,920,800 |
| item | 300,000 |
| time_dim | 86,400 |
| date_dim | 73,049 |
| catalog_page | 30,000 |
| web_page | 3,000 |
| store | 1,002 |
| promotion | 1,500 |
| household_demographics | 7,200 |
| web_site | 54 |
| call_center | 42 |
| reason | 65 |
| warehouse | 20 |
| ship_mode | 20 |
| income_band | 20 |
Import from OSS external tables (recommended)
AnalyticDB for MySQL provides pre-staged TPC-DS data in Object Storage Service (OSS) buckets across 16 regions. Rather than generating and uploading data yourself, create external tables that point directly to these OSS paths and then copy the data into your internal tables.
This method is available only for Data Lakehouse Edition clusters.
Prerequisites
Before you begin, ensure that you have:
A Data Lakehouse Edition cluster
The internal tables already created — see Create test tables
Step 1: Create an external database
CREATE EXTERNAL DATABASE IF NOT EXISTS external_tpcds;Step 2: Create 24 external tables
Each external table maps to an OSS path for your cluster's region. The examples below use the China (Beijing) path. Replace the LOCATION value with the path for your region.
OSS paths by region
| Region | Path |
|---|---|
| China (Hangzhou) | oss://dataset-cn-hangzhou-external/TPCDS/1TB |
| China (Zhangjiakou) | oss://dataset-cn-zhangjiakou-external/TPCDS/1TB |
| China (Beijing) | oss://dataset-cn-beijing-external/TPCDS/1TB |
| China (Shanghai) | oss://dataset-cn-shanghai-external/TPCDS/1TB |
| China (Shenzhen) | oss://dataset-cn-shenzhen-external/TPCDS/1TB |
| China (Qingdao) | oss://dataset-cn-qingdao-external/TPCDS/1TB |
| China (Guangzhou) | oss://dataset-cn-guangzhou-external/TPCDS/1TB |
| China (Hong Kong) | oss://dataset-cn-hongkong-external/TPCDS/1TB |
| Singapore | oss://dataset-ap-southeast-1-external/TPCDS/1TB |
| Malaysia (Kuala Lumpur) | oss://dataset-ap-southeast-3-external/TPCDS/1TB |
| Japan (Tokyo) | oss://dataset-ap-northeast-1-external/TPCDS/1TB |
| Indonesia (Jakarta) | oss://dataset-ap-southeast-5-external/TPCDS/1TB |
| Germany (Frankfurt) | oss://dataset-eu-central-1-external/TPCDS/1TB |
| US (Silicon Valley) | oss://dataset-us-west-1-external/TPCDS/1TB/ |
| UK (London) | oss://dataset-eu-west-1-external/TPCDS/1TB |
| US (Virginia) | oss://dataset-us-east-1-external/TPCDS/1TB |
All external tables use pipe-delimited text format (ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE). Run the following statements to create all 24 tables:
CREATE EXTERNAL TABLE external_tpcds.call_center
(
cc_call_center_sk BIGINT not null,
cc_call_center_id CHAR(16) not null,
cc_rec_start_date DATE,
cc_rec_end_date DATE,
cc_closed_date_sk BIGINT,
cc_open_date_sk BIGINT,
cc_name VARCHAR(50),
cc_class VARCHAR(50),
cc_employees INT,
cc_sq_ft INT,
cc_hours CHAR(20),
cc_manager VARCHAR(40),
cc_mkt_id INT,
cc_mkt_class CHAR(50),
cc_mkt_desc VARCHAR(100),
cc_market_manager VARCHAR(40),
cc_division INT,
cc_division_name VARCHAR(50),
cc_company INT,
cc_company_name CHAR(50),
cc_street_number CHAR(10),
cc_street_name VARCHAR(60),
cc_street_type CHAR(15),
cc_suite_number CHAR(10),
cc_city VARCHAR(60),
cc_county VARCHAR(30),
cc_state CHAR(2),
cc_zip CHAR(10),
cc_country VARCHAR(20),
cc_gmt_offset DECIMAL(5,2),
cc_tax_percentage DECIMAL(5,2),
dummy varchar
)ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/call_center';
CREATE EXTERNAL TABLE external_tpcds.catalog_page
(
cp_catalog_page_sk BIGINT not null,
cp_catalog_page_id VARCHAR(16) not null,
cp_start_date_sk BIGINT,
cp_end_date_sk BIGINT,
cp_department VARCHAR(50),
cp_catalog_number INT,
cp_catalog_page_number INT,
cp_description VARCHAR(100),
cp_type VARCHAR(100),
dummy varchar
)ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_page';
CREATE EXTERNAL TABLE external_tpcds.catalog_returns
(
cr_returned_date_sk BIGINT,
cr_returned_time_sk BIGINT,
cr_item_sk BIGINT not null,
cr_refunded_customer_sk BIGINT,
cr_refunded_cdemo_sk BIGINT,
cr_refunded_hdemo_sk BIGINT,
cr_refunded_addr_sk BIGINT,
cr_returning_customer_sk BIGINT,
cr_returning_cdemo_sk BIGINT,
cr_returning_hdemo_sk BIGINT,
cr_returning_addr_sk BIGINT,
cr_call_center_sk BIGINT,
cr_catalog_page_sk BIGINT ,
cr_ship_mode_sk BIGINT ,
cr_warehouse_sk BIGINT ,
cr_reason_sk BIGINT ,
cr_order_number BIGINT not null,
cr_return_quantity INT,
cr_return_amount DECIMAL(7,2),
cr_return_tax DECIMAL(7,2),
cr_return_amt_inc_tax DECIMAL(7,2),
cr_fee DECIMAL(7,2),
cr_return_ship_cost DECIMAL(7,2),
cr_refunded_cash DECIMAL(7,2),
cr_reversed_charge DECIMAL(7,2),
cr_store_credit DECIMAL(7,2),
cr_net_loss DECIMAL(7,2),
dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_returns';
CREATE EXTERNAL TABLE external_tpcds.catalog_sales
(
cs_sold_date_sk BIGINT,
cs_sold_time_sk BIGINT,
cs_ship_date_sk BIGINT,
cs_bill_customer_sk BIGINT,
cs_bill_cdemo_sk BIGINT,
cs_bill_hdemo_sk BIGINT,
cs_bill_addr_sk BIGINT,
cs_ship_customer_sk BIGINT,
cs_ship_cdemo_sk BIGINT,
cs_ship_hdemo_sk BIGINT,
cs_ship_addr_sk BIGINT,
cs_call_center_sk BIGINT,
cs_catalog_page_sk BIGINT,
cs_ship_mode_sk BIGINT,
cs_warehouse_sk BIGINT,
cs_item_sk BIGINT not null,
cs_promo_sk BIGINT,
cs_order_number BIGINT not null,
cs_quantity INT,
cs_wholesale_cost DECIMAL(7,2),
cs_list_price DECIMAL(7,2),
cs_sales_price DECIMAL(7,2),
cs_ext_discount_amt DECIMAL(7,2),
cs_ext_sales_price DECIMAL(7,2),
cs_ext_wholesale_cost DECIMAL(7,2),
cs_ext_list_price DECIMAL(7,2),
cs_ext_tax DECIMAL(7,2),
cs_coupon_amt DECIMAL(7,2),
cs_ext_ship_cost DECIMAL(7,2),
cs_net_paid DECIMAL(7,2),
cs_net_paid_inc_tax DECIMAL(7,2),
cs_net_paid_inc_ship DECIMAL(7,2),
cs_net_paid_inc_ship_tax DECIMAL(7,2),
cs_net_profit DECIMAL(7,2),
dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_sales';
CREATE EXTERNAL TABLE external_tpcds.customer
(
c_customer_sk BIGINT NOT NULL,
c_customer_id CHAR(16) NOT NULL,
c_current_cdemo_sk BIGINT,
c_current_hdemo_sk BIGINT,
c_current_addr_sk BIGINT,
c_first_shipto_date_sk BIGINT,
c_first_sales_date_sk BIGINT,
c_salutation CHAR(10),
c_first_name CHAR(20),
c_last_name CHAR(30),
c_preferred_cust_flag char(1),
c_birth_day INT,
c_birth_month INT,
c_birth_year INT,
c_birth_country VARCHAR(20),
c_login CHAR(13),
c_email_address CHAR(50),
c_last_review_date_sk BIGINT,
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/customer';
CREATE EXTERNAL TABLE external_tpcds.customer_address
(
ca_address_sk BIGINT NOT NULL,
ca_address_id VARCHAR(16) NOT NULL,
ca_street_number VARCHAR(10),
ca_street_name VARCHAR(60),
ca_street_type VARCHAR(15),
ca_suite_number VARCHAR(10),
ca_city VARCHAR(60),
ca_county VARCHAR(30),
ca_state VARCHAR(2),
ca_zip VARCHAR(10),
ca_country VARCHAR(20),
ca_gmt_offset DECIMAL(5,2),
ca_location_type VARCHAR(20),
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_address';
CREATE EXTERNAL TABLE external_tpcds.customer_demographics
(
cd_demo_sk BIGINT not null,
cd_gender char(1),
cd_marital_status char(1),
cd_education_status char(20),
cd_purchase_estimate INT,
cd_credit_rating char(10),
cd_dep_count INT,
cd_dep_employed_count INT,
cd_dep_college_count INT,
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_demographics';
CREATE EXTERNAL TABLE external_tpcds.date_dim
(
d_date_sk BIGINT not null,
d_date_id CHAR(16) not null,
d_date DATE,
d_month_seq INT,
d_week_seq INT,
d_quarter_seq INT,
d_year INT,
d_dow INT,
d_moy INT,
d_dom INT,
d_qoy INT,
d_fy_year INT,
d_fy_quarter_seq INT,
d_fy_week_seq INT,
d_day_name CHAR(9),
d_quarter_name CHAR(6),
d_holiday CHAR(1),
d_weekend CHAR(1),
d_following_holiday CHAR(1),
d_first_dom INT,
d_last_dom INT,
d_same_day_ly INT,
d_same_day_lq INT,
d_current_day CHAR(1),
d_current_week CHAR(1),
d_current_month CHAR(1),
d_current_quarter CHAR(1),
d_current_year CHAR(1),
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/date_dim';
CREATE EXTERNAL TABLE external_tpcds.household_demographics
(
hd_demo_sk BIGINT not null,
hd_income_band_sk BIGINT,
hd_buy_potential CHAR(15),
hd_dep_count INT,
hd_vehicle_count INT,
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/household_demographics';
CREATE EXTERNAL TABLE external_tpcds.income_band
(
ib_income_band_sk BIGINT not null,
ib_lower_bound INT,
ib_upper_bound INT,
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/income_band';
CREATE EXTERNAL TABLE external_tpcds.inventory
(
inv_date_sk BIGINT not null,
inv_item_sk BIGINT not null,
inv_warehouse_sk BIGINT not null,
inv_quantity_on_hand INT,
dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/inventory';
CREATE EXTERNAL TABLE external_tpcds.item
(
i_item_sk BIGINT not null,
i_item_id CHAR(16) not null,
i_rec_start_date DATE,
i_rec_end_date DATE,
i_item_desc VARCHAR(200),
i_current_price DECIMAL(7,2),
i_wholesale_cost DECIMAL(7,2),
i_brand_id INT,
i_brand CHAR(50),
i_class_id INT,
i_class CHAR(50),
i_category_id INT,
i_category CHAR(50),
i_manufact_id INT,
i_manufact CHAR(50),
i_size CHAR(20),
i_formulation CHAR(20),
i_color CHAR(20),
i_units CHAR(10),
i_container CHAR(10),
i_manager_id INT,
i_product_name char(50),
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/item';
CREATE EXTERNAL TABLE external_tpcds.promotion
(
p_promo_sk BIGINT not null,
p_promo_id CHAR(16) not null,
p_start_date_sk BIGINT,
p_end_date_sk BIGINT,
p_item_sk BIGINT,
p_cost DECIMAL(15,2),
p_response_target INT,
p_promo_name CHAR(50),
p_channel_dmail CHAR(1),
p_channel_email CHAR(1),
p_channel_catalog CHAR(1),
p_channel_tv CHAR(1),
p_channel_radio CHAR(1),
p_channel_press CHAR(1),
p_channel_event CHAR(1),
p_channel_demo CHAR(1),
p_channel_details VARCHAR(100),
p_purpose CHAR(15),
p_discount_active CHAR(1),
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/promotion';
CREATE EXTERNAL TABLE external_tpcds.reason
(
r_reason_sk BIGINT not null,
r_reason_id CHAR(16) not null,
r_reason_desc CHAR(100),
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/reason';
CREATE EXTERNAL TABLE external_tpcds.ship_mode
(
sm_ship_mode_sk BIGINT,
sm_ship_mode_id CHAR(16) not null,
sm_type CHAR(30),
sm_code CHAR(10),
sm_carrier CHAR(20),
sm_contract CHAR(20),
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/ship_mode';
CREATE EXTERNAL TABLE external_tpcds.store_returns
(
sr_returned_date_sk BIGINT,
sr_return_time_sk BIGINT,
sr_item_sk BIGINT not null,
sr_customer_sk BIGINT,
sr_cdemo_sk BIGINT,
sr_hdemo_sk BIGINT,
sr_addr_sk BIGINT,
sr_store_sk BIGINT,
sr_reason_sk BIGINT,
sr_ticket_number BIGINT not null,
sr_return_quantity INT,
sr_return_amt DECIMAL(7,2),
sr_return_tax DECIMAL(7,2),
sr_return_amt_inc_tax DECIMAL(7,2),
sr_fee DECIMAL(7,2),
sr_return_ship_cost DECIMAL(7,2),
sr_refunded_cash DECIMAL(7,2),
sr_reversed_charge DECIMAL(7,2),
sr_store_credit DECIMAL(7,2),
sr_net_loss DECIMAL(7,2),
dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/store_returns';
CREATE EXTERNAL TABLE external_tpcds.store_sales
(
ss_sold_date_sk BIGINT,
ss_sold_time_sk BIGINT,
ss_item_sk BIGINT not null,
ss_customer_sk BIGINT,
ss_cdemo_sk BIGINT,
ss_hdemo_sk BIGINT,
ss_addr_sk BIGINT,
ss_store_sk BIGINT,
ss_promo_sk BIGINT,
ss_ticket_number BIGINT not null,
ss_quantity INT,
ss_wholesale_cost DECIMAL(7,2),
ss_list_price DECIMAL(7,2),
ss_sales_price DECIMAL(7,2),
ss_ext_discount_amt DECIMAL(7,2),
ss_ext_sales_price DECIMAL(7,2),
ss_ext_wholesale_cost DECIMAL(7,2),
ss_ext_list_price DECIMAL(7,2),
ss_ext_tax DECIMAL(7,2),
ss_coupon_amt DECIMAL(7,2),
ss_net_paid DECIMAL(7,2),
ss_net_paid_inc_tax DECIMAL(7,2),
ss_net_profit DECIMAL(7,2),
dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/store_sales';
CREATE EXTERNAL TABLE external_tpcds.store
(
s_store_sk BIGINT not null,
s_store_id CHAR(16) not null,
s_rec_start_date DATE,
s_rec_end_date DATE,
s_closed_date_sk BIGINT,
s_store_name VARCHAR(50),
s_number_employees INT,
s_floor_space INT,
s_hours CHAR(20),
s_manager VARCHAR(40),
s_market_id INT,
s_geography_class VARCHAR(100),
s_market_desc VARCHAR(100),
s_market_manager VARCHAR(40),
s_division_id INT,
s_division_name VARCHAR(50),
s_company_id INT,
s_company_name VARCHAR(50),
s_street_number VARCHAR(10),
s_street_name VARCHAR(60),
s_street_type CHAR(15),
s_suite_number CHAR(10),
s_city VARCHAR(60),
s_county VARCHAR(30),
s_state CHAR(2),
s_zip CHAR(10),
s_country VARCHAR(20),
s_gmt_offset DECIMAL(5,2),
s_tax_percentage DECIMAL(5,2),
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/store';
CREATE EXTERNAL TABLE external_tpcds.time_dim
(
t_time_sk BIGINT not null,
t_time_id CHAR(16) not null,
t_time INT,
t_hour INT,
t_minute INT,
t_second INT,
t_am_pm CHAR(2),
t_shift CHAR(20),
t_sub_shift CHAR(20),
t_meal_time CHAR(20),
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/time_dim';
CREATE EXTERNAL TABLE external_tpcds.warehouse
(
w_warehouse_sk BIGINT not null,
w_warehouse_id CHAR(16) not null,
w_warehouse_name VARCHAR(20),
w_warehouse_sq_ft INT,
w_street_number CHAR(10),
w_street_name VARCHAR(60),
w_street_type CHAR(15),
w_suite_number CHAR(10),
w_city VARCHAR(60),
w_county VARCHAR(30),
w_state CHAR(2),
w_zip CHAR(10),
w_country VARCHAR(20),
w_gmt_offset DECIMAL(5,2),
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/warehouse';
CREATE EXTERNAL TABLE external_tpcds.web_page
(
wp_web_page_sk BIGINT not null,
wp_web_page_id CHAR(16) not null,
wp_rec_start_date DATE,
wp_rec_end_date DATE,
wp_creation_date_sk BIGINT,
wp_access_date_sk BIGINT,
wp_autogen_flag CHAR(1),
wp_customer_sk BIGINT,
wp_url VARCHAR(100),
wp_type CHAR(50),
wp_char_count INT,
wp_link_count INT,
wp_image_count INT,
wp_max_ad_count INT,
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/web_page';
CREATE EXTERNAL TABLE external_tpcds.web_returns
(
wr_returned_date_sk BIGINT,
wr_returned_time_sk BIGINT,
wr_item_sk BIGINT not null,
wr_refunded_customer_sk BIGINT,
wr_refunded_cdemo_sk BIGINT,
wr_refunded_hdemo_sk BIGINT,
wr_refunded_addr_sk BIGINT,
wr_returning_customer_sk BIGINT,
wr_returning_cdemo_sk BIGINT,
wr_returning_hdemo_sk BIGINT,
wr_returning_addr_sk BIGINT,
wr_web_page_sk BIGINT,
wr_reason_sk BIGINT,
wr_order_number BIGINT not null,
wr_return_quantity INT,
wr_return_amt DECIMAL(7,2),
wr_return_tax DECIMAL(7,2),
wr_return_amt_inc_tax DECIMAL(7,2),
wr_fee DECIMAL(7,2),
wr_return_ship_cost DECIMAL(7,2),
wr_refunded_cash DECIMAL(7,2),
wr_reversed_charge DECIMAL(7,2),
wr_account_credit DECIMAL(7,2),
wr_net_loss DECIMAL(7,2),
dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/web_returns';
CREATE EXTERNAL TABLE external_tpcds.web_sales
(
ws_sold_date_sk BIGINT,
ws_sold_time_sk BIGINT,
ws_ship_date_sk BIGINT,
ws_item_sk BIGINT not null,
ws_bill_customer_sk BIGINT,
ws_bill_cdemo_sk BIGINT,
ws_bill_hdemo_sk BIGINT,
ws_bill_addr_sk BIGINT,
ws_ship_customer_sk BIGINT,
ws_ship_cdemo_sk BIGINT,
ws_ship_hdemo_sk BIGINT,
ws_ship_addr_sk BIGINT,
ws_web_page_sk BIGINT,
ws_web_site_sk BIGINT,
ws_ship_mode_sk BIGINT,
ws_warehouse_sk BIGINT,
ws_promo_sk BIGINT,
ws_order_number BIGINT not null,
ws_quantity INT,
ws_wholesale_cost DECIMAL(7,2),
ws_list_price DECIMAL(7,2),
ws_sales_price DECIMAL(7,2),
ws_ext_discount_amt DECIMAL(7,2),
ws_ext_sales_price DECIMAL(7,2),
ws_ext_wholesale_cost DECIMAL(7,2),
ws_ext_list_price DECIMAL(7,2),
ws_ext_tax DECIMAL(7,2),
ws_coupon_amt DECIMAL(7,2),
ws_ext_ship_cost DECIMAL(7,2),
ws_net_paid DECIMAL(7,2),
ws_net_paid_inc_tax DECIMAL(7,2),
ws_net_paid_inc_ship DECIMAL(7,2),
ws_net_paid_inc_ship_tax DECIMAL(7,2),
ws_net_profit DECIMAL(7,2),
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/web_sales';
CREATE EXTERNAL TABLE external_tpcds.web_site
(
web_site_sk BIGINT not null,
web_site_id CHAR(16) not null,
web_rec_start_date DATE,
web_rec_end_date DATE,
web_name VARCHAR(50),
web_open_date_sk BIGINT,
web_close_date_sk BIGINT,
web_class VARCHAR(50),
web_manager VARCHAR(40),
web_mkt_id INT,
web_mkt_class VARCHAR(50),
web_mkt_desc VARCHAR(100),
web_market_manager VARCHAR(40),
web_company_id INT,
web_company_name CHAR(50),
web_street_number CHAR(10),
web_street_name VARCHAR(60),
web_street_type CHAR(15),
web_suite_number CHAR(10),
web_city VARCHAR(60),
web_county VARCHAR(30),
web_state CHAR(2),
web_zip CHAR(10),
web_country VARCHAR(20),
web_gmt_offset DECIMAL(5,2),
web_tax_percentage DECIMAL(5,2),
dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 'oss://dataset-cn-beijing-external/TPCDS/1TB/web_site';Step 3: Copy data into internal tables
Run INSERT INTO ... SELECT * FROM external_tpcds.<table> for each of the 24 tables. For information about creating the internal tables, see Create test tables.
INSERT OVERWRITE INTO promotion SELECT * FROM external_tpcds.promotion;
INSERT INTO web_site SELECT * FROM external_tpcds.web_site;
INSERT OVERWRITE INTO web_sales SELECT * FROM external_tpcds.web_sales;
INSERT OVERWRITE INTO web_returns SELECT * FROM external_tpcds.web_returns;
INSERT OVERWRITE INTO web_page SELECT * FROM external_tpcds.web_page;
INSERT INTO warehouse SELECT * FROM external_tpcds.warehouse;
INSERT OVERWRITE INTO time_dim SELECT * FROM external_tpcds.time_dim;
INSERT OVERWRITE INTO store_sales SELECT * FROM external_tpcds.store_sales;
INSERT OVERWRITE INTO store_returns SELECT * FROM external_tpcds.store_returns;
INSERT INTO store SELECT * FROM external_tpcds.store;
INSERT OVERWRITE INTO household_demographics SELECT * FROM external_tpcds.household_demographics;
INSERT INTO ship_mode SELECT * FROM external_tpcds.ship_mode;
INSERT INTO reason SELECT * FROM external_tpcds.reason;
INSERT INTO call_center SELECT * FROM external_tpcds.call_center;
INSERT OVERWRITE INTO item SELECT * FROM external_tpcds.item;
INSERT OVERWRITE INTO inventory SELECT * FROM external_tpcds.inventory;
INSERT INTO income_band SELECT * FROM external_tpcds.income_band;
INSERT INTO date_dim SELECT * FROM external_tpcds.date_dim;
INSERT OVERWRITE INTO customer_demographics SELECT * FROM external_tpcds.customer_demographics;
INSERT OVERWRITE INTO customer_address SELECT * FROM external_tpcds.customer_address;
INSERT OVERWRITE INTO customer SELECT * FROM external_tpcds.customer;
INSERT OVERWRITE INTO catalog_sales SELECT * FROM external_tpcds.catalog_sales;
INSERT OVERWRITE INTO catalog_returns SELECT * FROM external_tpcds.catalog_returns;
INSERT OVERWRITE INTO catalog_page SELECT * FROM external_tpcds.catalog_page;Step 4: Collect statistics
After the data is loaded, collect histogram statistics on all 24 tables. The query optimizer uses these statistics to generate efficient execution plans — skipping this step degrades query performance.
ANALYZE TABLE call_center UPDATE HISTOGRAM ;
ANALYZE TABLE catalog_page UPDATE HISTOGRAM ;
ANALYZE TABLE catalog_returns UPDATE HISTOGRAM ;
ANALYZE TABLE catalog_sales UPDATE HISTOGRAM ;
ANALYZE TABLE customer UPDATE HISTOGRAM ;
ANALYZE TABLE customer_address UPDATE HISTOGRAM ;
ANALYZE TABLE customer_demographics UPDATE HISTOGRAM ;
ANALYZE TABLE date_dim UPDATE HISTOGRAM ;
ANALYZE TABLE household_demographics UPDATE HISTOGRAM ;
ANALYZE TABLE income_band UPDATE HISTOGRAM ;
ANALYZE TABLE inventory UPDATE HISTOGRAM ;
ANALYZE TABLE item UPDATE HISTOGRAM ;
ANALYZE TABLE promotion UPDATE HISTOGRAM ;
ANALYZE TABLE reason UPDATE HISTOGRAM ;
ANALYZE TABLE ship_mode UPDATE HISTOGRAM ;
ANALYZE TABLE store UPDATE HISTOGRAM ;
ANALYZE TABLE store_returns UPDATE HISTOGRAM ;
ANALYZE TABLE store_sales UPDATE HISTOGRAM ;
ANALYZE TABLE time_dim UPDATE HISTOGRAM ;
ANALYZE TABLE warehouse UPDATE HISTOGRAM ;
ANALYZE TABLE web_page UPDATE HISTOGRAM ;
ANALYZE TABLE web_returns UPDATE HISTOGRAM ;
ANALYZE TABLE web_sales UPDATE HISTOGRAM ;
ANALYZE TABLE web_site UPDATE HISTOGRAM ;For more information about statistics, see Statistics.
Import using LOAD DATA
Use this method if you do not have a Data Lakehouse Edition cluster. It requires generating the TPC-DS data locally with dsdgen and then loading it into the cluster.
Prerequisites
Before you begin, ensure that you have:
The internal tables already created — see Create test tables
Step 1: Generate test data
Download the TPC-DS data generation tool dsdgen from the TPC official website and compile it to produce the
dsdgenbinary.Create a directory for the output files.
mkdir data1tbGenerate 1 TB of test data.
Parameter Description Example -scScale factor in GB. Use 1000for 1 TB.1000-dirOutput directory for the generated files. data1tb-TERMINATEWhether to add a field delimiter at the end of each row. Nomits the trailing delimiter;Yadds a vertical bar (|).N-PARALLELTotal number of chunks to split the dataset into. Run dsdgen once per chunk. 5-CHILDThe chunk number generated by the current command. 1./dsdgen -sc 1000 -dir data1tb -TERMINATE NTo generate data faster, split the work across parallel runs. The following example divides 1 TB into five chunks:
mkdir data1tb_5 ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 1 ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 2 ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 3 ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 4 ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 5dsdgen produces 25 pipe-delimited
.datfiles, one per table:call_center.dat catalog_page.dat catalog_returns.dat catalog_sales.dat customer_address.dat customer.dat customer_demographics.dat date_dim.dat dbgen_version.dat household_demographics.dat income_band.dat inventory.dat item.dat promotion.dat reason.dat ship_mode.dat store.dat store_returns.dat store_sales.dat time_dim.dat warehouse.dat web_page.dat web_returns.dat web_sales.dat web_site.datFor more information about dsdgen options, see the TPC-DS specification.
Step 2: Preprocess data for compatibility
The LOAD DATA statement fails on consecutive pipe delimiters, which dsdgen uses to represent NULL values (for example, a||c means the middle field is NULL). Run the following preprocessing scripts before importing.
Replace NULL values in integer, string, and date fields with `0`
#!/bin/bash
# Replace NULL values in the first field with 0 to convert ^| into 0|.
# Replace NULL values in the middle fields with 0 to convert || into |0|.
# Replace NULL values in the last field with 0 to convert |$ into |0.
for s_f in `ls *dat`
do
echo "$s_f"
i=1
while [ `egrep '\|\||^\||\|$' $s_f |wc -l` -gt 0 ]
do
echo $i
sed 's/^|/0|/g;s/||/|0|/g;s/|$/|0/g' -i $s_f
((i++))
done
doneFix date fields — replace `0` with `0000-00-00`
The previous script sets all NULL values to 0, including DATE fields. Run this script on the five tables that contain DATE columns to convert those 0 values into a valid date placeholder:
for s_f in item.dat store.dat web_page.dat web_site.dat call_center.dat
do
# Process the first and second date fields whose values are NULL (represented by 0).
sed 's/^\([A-Za-z0-9]*|[A-Za-z0-9]*\)|0|0|\(.*\)/\1|0000-00-00|0000-00-00|\2/' -i $s_f
# Process the second date fields whose values are NULL (represented by 0).
sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*|[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)|0|\(.*\)/\1|0000-00-00|\2/' -i $s_f
# Process the first date fields whose values are NULL (represented by 0).
sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*\)|0|\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}|.*\)/\1|0000-00-00|\2/' -i $s_f
doneStep 3: Load data into AnalyticDB for MySQL
Run LOAD DATA LOCAL INFILE for each .dat file.
Use LINES TERMINATED BY '\n' for files generated on Linux and LINES TERMINATED BY '\r\n' for files generated on Windows.
LOAD DATA LOCAL INFILE 'call_center.dat' INTO
TABLE call_center FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_page.dat' INTO
TABLE catalog_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_returns.dat'
INTO TABLE catalog_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_sales.dat'
INTO TABLE catalog_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer_address.dat'
INTO TABLE customer_address FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer.dat' INTO
TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer_demographics.dat'
INTO TABLE customer_demographics FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'date_dim.dat' INTO
TABLE date_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'dbgen_version.dat'
INTO TABLE dbgen_version FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'household_demographics.dat'
INTO TABLE household_demographics FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'income_band.dat' INTO
TABLE income_band FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'inventory.dat' INTO
TABLE inventory FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'item.dat' INTO TABLE
item FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'promotion.dat' INTO
TABLE promotion FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'reason.dat' INTO TABLE
reason FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'ship_mode.dat' INTO
TABLE ship_mode FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store.dat' INTO TABLE
store FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_returns.dat'
INTO TABLE store_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_sales.dat' INTO
TABLE store_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'time_dim.dat' INTO
TABLE time_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'warehouse.dat' INTO
TABLE warehouse FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_page.dat' INTO
TABLE web_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_returns.dat' INTO
TABLE web_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_sales.dat' INTO
TABLE web_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_site.dat' INTO
TABLE web_site FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';Step 4: Collect statistics
After the data is loaded, collect histogram statistics on all 24 tables. The query optimizer uses these statistics to generate efficient execution plans — skipping this step degrades query performance.
ANALYZE TABLE call_center UPDATE HISTOGRAM ;
ANALYZE TABLE catalog_page UPDATE HISTOGRAM ;
ANALYZE TABLE catalog_returns UPDATE HISTOGRAM ;
ANALYZE TABLE catalog_sales UPDATE HISTOGRAM ;
ANALYZE TABLE customer UPDATE HISTOGRAM ;
ANALYZE TABLE customer_address UPDATE HISTOGRAM ;
ANALYZE TABLE customer_demographics UPDATE HISTOGRAM ;
ANALYZE TABLE date_dim UPDATE HISTOGRAM ;
ANALYZE TABLE household_demographics UPDATE HISTOGRAM ;
ANALYZE TABLE income_band UPDATE HISTOGRAM ;
ANALYZE TABLE inventory UPDATE HISTOGRAM ;
ANALYZE TABLE item UPDATE HISTOGRAM ;
ANALYZE TABLE promotion UPDATE HISTOGRAM ;
ANALYZE TABLE reason UPDATE HISTOGRAM ;
ANALYZE TABLE ship_mode UPDATE HISTOGRAM ;
ANALYZE TABLE store UPDATE HISTOGRAM ;
ANALYZE TABLE store_returns UPDATE HISTOGRAM ;
ANALYZE TABLE store_sales UPDATE HISTOGRAM ;
ANALYZE TABLE time_dim UPDATE HISTOGRAM ;
ANALYZE TABLE warehouse UPDATE HISTOGRAM ;
ANALYZE TABLE web_page UPDATE HISTOGRAM ;
ANALYZE TABLE web_returns UPDATE HISTOGRAM ;
ANALYZE TABLE web_sales UPDATE HISTOGRAM ;
ANALYZE TABLE web_site UPDATE HISTOGRAM ;For more information about statistics, see Statistics.
What's next
With the dataset loaded, run the TPC-DS benchmark queries against your cluster. Download the official TPC-DS query set from the TPC official website.