AnalyticDB: Import test data

Last Updated: Mar 28, 2026

Load 1 TB of TPC-DS test data into AnalyticDB for MySQL to run performance benchmarks. Two methods are available: importing from pre-staged OSS paths (recommended for Data Lakehouse Edition clusters) and generating data locally with dsdgen and importing via LOAD DATA.

Note

This TPC-DS implementation is derived from the TPC-DS benchmark specification but does not comply with all its requirements and is not comparable to published TPC-DS benchmark results.

Dataset overview

The TPC-DS dataset contains 24 tables. At the 1 TB scale factor, the largest fact table (store_sales) contains approximately 2.9 billion rows, and the 24 tables together contain about 6.3 billion rows.

Table name              Number of rows
store_sales             2,879,987,999
catalog_sales           1,439,980,416
web_sales               720,000,376
inventory               783,000,000
store_returns           287,999,764
catalog_returns         143,996,756
web_returns             71,997,522
customer                12,000,000
customer_address        6,000,000
customer_demographics   1,920,800
item                    300,000
time_dim                86,400
date_dim                73,049
catalog_page            30,000
web_page                3,000
store                   1,002
promotion               1,500
household_demographics  7,200
web_site                54
call_center             42
reason                  65
warehouse               20
ship_mode               20
income_band             20

Import from OSS external tables (recommended)

AnalyticDB for MySQL provides pre-staged TPC-DS data in Object Storage Service (OSS) buckets across 16 regions. Rather than generating and uploading data yourself, create external tables that point directly to these OSS paths and then copy the data into your internal tables.

Important

This method is available only for Data Lakehouse Edition clusters.

Prerequisites

Before you begin, ensure that you have:

  • A Data Lakehouse Edition cluster

  • The internal tables already created — see Create test tables

Step 1: Create an external database

CREATE EXTERNAL DATABASE IF NOT EXISTS external_tpcds;

Step 2: Create 24 external tables

Each external table maps to an OSS path for your cluster's region. The examples below use the China (Beijing) path. Replace the LOCATION value with the path for your region.

OSS paths by region

Region                   Path
China (Hangzhou)         oss://dataset-cn-hangzhou-external/TPCDS/1TB
China (Zhangjiakou)      oss://dataset-cn-zhangjiakou-external/TPCDS/1TB
China (Beijing)          oss://dataset-cn-beijing-external/TPCDS/1TB
China (Shanghai)         oss://dataset-cn-shanghai-external/TPCDS/1TB
China (Shenzhen)         oss://dataset-cn-shenzhen-external/TPCDS/1TB
China (Qingdao)          oss://dataset-cn-qingdao-external/TPCDS/1TB
China (Guangzhou)        oss://dataset-cn-guangzhou-external/TPCDS/1TB
China (Hong Kong)        oss://dataset-cn-hongkong-external/TPCDS/1TB
Singapore                oss://dataset-ap-southeast-1-external/TPCDS/1TB
Malaysia (Kuala Lumpur)  oss://dataset-ap-southeast-3-external/TPCDS/1TB
Japan (Tokyo)            oss://dataset-ap-northeast-1-external/TPCDS/1TB
Indonesia (Jakarta)      oss://dataset-ap-southeast-5-external/TPCDS/1TB
Germany (Frankfurt)      oss://dataset-eu-central-1-external/TPCDS/1TB
US (Silicon Valley)      oss://dataset-us-west-1-external/TPCDS/1TB
UK (London)              oss://dataset-eu-west-1-external/TPCDS/1TB
US (Virginia)            oss://dataset-us-east-1-external/TPCDS/1TB
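
Because every LOCATION value in the example statements uses the China (Beijing) bucket name, a single substitution can retarget all 24 statements to another region. The following is a minimal sketch, assuming you have saved the statements to a local file. The function name rewrite_region and the file name create_external_tables.sql are hypothetical and not part of any AnalyticDB tooling.

```shell
#!/bin/bash
# Sketch: retarget the example LOCATION paths to another region's bucket.
# rewrite_region is a hypothetical helper used only for illustration.
# $1 is the region segment of the bucket name taken from the region table,
# for example cn-hangzhou or ap-southeast-1.
rewrite_region() {
    # Reads SQL on standard input and writes the rewritten SQL to standard output.
    sed "s#oss://dataset-cn-beijing-external/#oss://dataset-$1-external/#g"
}
```

For example, `rewrite_region cn-hangzhou < create_external_tables.sql > create_external_tables_hangzhou.sql` writes a copy whose paths point to the China (Hangzhou) bucket. Review the output before running it.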

All external tables use pipe-delimited text format (ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE). Run the following statements to create all 24 tables:

CREATE EXTERNAL TABLE external_tpcds.call_center
(
  cc_call_center_sk             BIGINT not null,
  cc_call_center_id             CHAR(16) not null,
  cc_rec_start_date             DATE,
  cc_rec_end_date               DATE,
  cc_closed_date_sk             BIGINT,
  cc_open_date_sk               BIGINT,
  cc_name                       VARCHAR(50),
  cc_class                      VARCHAR(50),
  cc_employees                  INT,
  cc_sq_ft                      INT,
  cc_hours                      CHAR(20),
  cc_manager                    VARCHAR(40),
  cc_mkt_id                     INT,
  cc_mkt_class                  CHAR(50),
  cc_mkt_desc                   VARCHAR(100),
  cc_market_manager             VARCHAR(40),
  cc_division                   INT,
  cc_division_name              VARCHAR(50),
  cc_company                    INT,
  cc_company_name               CHAR(50),
  cc_street_number              CHAR(10),
  cc_street_name                VARCHAR(60),
  cc_street_type                CHAR(15),
  cc_suite_number               CHAR(10),
  cc_city                       VARCHAR(60),
  cc_county                     VARCHAR(30),
  cc_state                      CHAR(2),
  cc_zip                        CHAR(10),
  cc_country                    VARCHAR(20),
  cc_gmt_offset                 DECIMAL(5,2),
  cc_tax_percentage             DECIMAL(5,2),
  dummy varchar
)ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/call_center';


CREATE EXTERNAL TABLE external_tpcds.catalog_page
(
  cp_catalog_page_sk     BIGINT not null,
  cp_catalog_page_id     VARCHAR(16) not null,
  cp_start_date_sk   BIGINT,
  cp_end_date_sk     BIGINT,
  cp_department      VARCHAR(50),
  cp_catalog_number  INT,
  cp_catalog_page_number INT,
  cp_description     VARCHAR(100),
  cp_type        VARCHAR(100),
  dummy varchar
)ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_page';

CREATE EXTERNAL TABLE external_tpcds.catalog_returns
(
  cr_returned_date_sk        BIGINT,
  cr_returned_time_sk        BIGINT,
  cr_item_sk             BIGINT not null,
  cr_refunded_customer_sk    BIGINT,
  cr_refunded_cdemo_sk       BIGINT,
  cr_refunded_hdemo_sk       BIGINT,
  cr_refunded_addr_sk        BIGINT,
  cr_returning_customer_sk   BIGINT,
  cr_returning_cdemo_sk      BIGINT,
  cr_returning_hdemo_sk      BIGINT,
  cr_returning_addr_sk       BIGINT,
  cr_call_center_sk      BIGINT,
  cr_catalog_page_sk         BIGINT ,
  cr_ship_mode_sk        BIGINT ,
  cr_warehouse_sk        BIGINT ,
  cr_reason_sk           BIGINT ,
  cr_order_number        BIGINT not null,
  cr_return_quantity         INT,
  cr_return_amount       DECIMAL(7,2),
  cr_return_tax          DECIMAL(7,2),
  cr_return_amt_inc_tax      DECIMAL(7,2),
  cr_fee             DECIMAL(7,2),
  cr_return_ship_cost        DECIMAL(7,2),
  cr_refunded_cash       DECIMAL(7,2),
  cr_reversed_charge         DECIMAL(7,2),
  cr_store_credit        DECIMAL(7,2),
  cr_net_loss            DECIMAL(7,2),
  dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_returns';

CREATE EXTERNAL TABLE external_tpcds.catalog_sales
(
  cs_sold_date_sk           BIGINT,
  cs_sold_time_sk           BIGINT,
  cs_ship_date_sk           BIGINT,
  cs_bill_customer_sk       BIGINT,
  cs_bill_cdemo_sk          BIGINT,
  cs_bill_hdemo_sk          BIGINT,
  cs_bill_addr_sk           BIGINT,
  cs_ship_customer_sk       BIGINT,
  cs_ship_cdemo_sk          BIGINT,
  cs_ship_hdemo_sk          BIGINT,
  cs_ship_addr_sk           BIGINT,
  cs_call_center_sk         BIGINT,
  cs_catalog_page_sk        BIGINT,
  cs_ship_mode_sk           BIGINT,
  cs_warehouse_sk           BIGINT,
  cs_item_sk                BIGINT not null,
  cs_promo_sk               BIGINT,
  cs_order_number           BIGINT not null,
  cs_quantity               INT,
  cs_wholesale_cost         DECIMAL(7,2),
  cs_list_price             DECIMAL(7,2),
  cs_sales_price            DECIMAL(7,2),
  cs_ext_discount_amt       DECIMAL(7,2),
  cs_ext_sales_price        DECIMAL(7,2),
  cs_ext_wholesale_cost     DECIMAL(7,2),
  cs_ext_list_price         DECIMAL(7,2),
  cs_ext_tax                DECIMAL(7,2),
  cs_coupon_amt             DECIMAL(7,2),
  cs_ext_ship_cost          DECIMAL(7,2),
  cs_net_paid               DECIMAL(7,2),
  cs_net_paid_inc_tax       DECIMAL(7,2),
  cs_net_paid_inc_ship      DECIMAL(7,2),
  cs_net_paid_inc_ship_tax  DECIMAL(7,2),
  cs_net_profit             DECIMAL(7,2),
  dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_sales';

CREATE EXTERNAL TABLE external_tpcds.customer
(
  c_customer_sk         BIGINT NOT NULL,
  c_customer_id         CHAR(16) NOT NULL,
  c_current_cdemo_sk        BIGINT,
  c_current_hdemo_sk        BIGINT,
  c_current_addr_sk         BIGINT,
  c_first_shipto_date_sk    BIGINT,
  c_first_sales_date_sk     BIGINT,
  c_salutation          CHAR(10),
  c_first_name          CHAR(20),
  c_last_name           CHAR(30),
  c_preferred_cust_flag     char(1),
  c_birth_day           INT,
  c_birth_month         INT,
  c_birth_year          INT,
  c_birth_country       VARCHAR(20),
  c_login           CHAR(13),
  c_email_address       CHAR(50),
  c_last_review_date_sk     BIGINT,
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer';

CREATE EXTERNAL TABLE external_tpcds.customer_address
(
  ca_address_sk      BIGINT NOT NULL,
  ca_address_id      VARCHAR(16) NOT NULL,
  ca_street_number   VARCHAR(10),
  ca_street_name     VARCHAR(60),
  ca_street_type     VARCHAR(15),
  ca_suite_number    VARCHAR(10),
  ca_city        VARCHAR(60),
  ca_county      VARCHAR(30),
  ca_state       VARCHAR(2),
  ca_zip         VARCHAR(10),
  ca_country         VARCHAR(20),
  ca_gmt_offset      DECIMAL(5,2),
  ca_location_type   VARCHAR(20),
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_address';

CREATE EXTERNAL TABLE external_tpcds.customer_demographics
(
  cd_demo_sk                BIGINT not null,
  cd_gender                 char(1),
  cd_marital_status         char(1),
  cd_education_status       char(20),
  cd_purchase_estimate      INT,
  cd_credit_rating          char(10),
  cd_dep_count              INT,
  cd_dep_employed_count     INT,
  cd_dep_college_count      INT,
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_demographics';

CREATE EXTERNAL TABLE external_tpcds.date_dim
(
  d_date_sk                 BIGINT not null,
  d_date_id                 CHAR(16) not null,
  d_date                    DATE,
  d_month_seq               INT,
  d_week_seq                INT,
  d_quarter_seq             INT,
  d_year                    INT,
  d_dow                     INT,
  d_moy                     INT,
  d_dom                     INT,
  d_qoy                     INT,
  d_fy_year                 INT,
  d_fy_quarter_seq          INT,
  d_fy_week_seq             INT,
  d_day_name                CHAR(9),
  d_quarter_name            CHAR(6),
  d_holiday                 CHAR(1),
  d_weekend                 CHAR(1),
  d_following_holiday       CHAR(1),
  d_first_dom               INT,
  d_last_dom                INT,
  d_same_day_ly             INT,
  d_same_day_lq             INT,
  d_current_day             CHAR(1),
  d_current_week            CHAR(1),
  d_current_month           CHAR(1),
  d_current_quarter         CHAR(1),
  d_current_year            CHAR(1),
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/date_dim';

CREATE EXTERNAL TABLE external_tpcds.household_demographics
(
  hd_demo_sk                BIGINT not null,
  hd_income_band_sk         BIGINT,
  hd_buy_potential          CHAR(15),
  hd_dep_count              INT,
  hd_vehicle_count          INT,
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/household_demographics';

CREATE EXTERNAL TABLE external_tpcds.income_band
(
  ib_income_band_sk         BIGINT not null,
  ib_lower_bound            INT,
  ib_upper_bound            INT,
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/income_band';

CREATE EXTERNAL TABLE external_tpcds.inventory
(
  inv_date_sk               BIGINT not null,
  inv_item_sk               BIGINT not null,
  inv_warehouse_sk          BIGINT not null,
  inv_quantity_on_hand      INT,
  dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/inventory';


CREATE EXTERNAL TABLE external_tpcds.item
(
  i_item_sk                 BIGINT not null,
  i_item_id                 CHAR(16) not null,
  i_rec_start_date          DATE,
  i_rec_end_date            DATE,
  i_item_desc               VARCHAR(200),
  i_current_price           DECIMAL(7,2),
  i_wholesale_cost          DECIMAL(7,2),
  i_brand_id                INT,
  i_brand                   CHAR(50),
  i_class_id                INT,
  i_class                   CHAR(50),
  i_category_id             INT,
  i_category                CHAR(50),
  i_manufact_id             INT,
  i_manufact                CHAR(50),
  i_size                    CHAR(20),
  i_formulation             CHAR(20),
  i_color                   CHAR(20),
  i_units                   CHAR(10),
  i_container               CHAR(10),
  i_manager_id              INT,
  i_product_name            char(50),
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/item';

CREATE EXTERNAL TABLE external_tpcds.promotion
(
  p_promo_sk                BIGINT not null,
  p_promo_id                CHAR(16) not null,
  p_start_date_sk           BIGINT,
  p_end_date_sk             BIGINT,
  p_item_sk                 BIGINT,
  p_cost                    DECIMAL(15,2),
  p_response_target         INT,
  p_promo_name              CHAR(50),
  p_channel_dmail           CHAR(1),
  p_channel_email           CHAR(1),
  p_channel_catalog         CHAR(1),
  p_channel_tv              CHAR(1),
  p_channel_radio           CHAR(1),
  p_channel_press           CHAR(1),
  p_channel_event           CHAR(1),
  p_channel_demo            CHAR(1),
  p_channel_details         VARCHAR(100),
  p_purpose                 CHAR(15),
  p_discount_active         CHAR(1),
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/promotion';

CREATE EXTERNAL TABLE external_tpcds.reason
(
  r_reason_sk     BIGINT not null,
  r_reason_id     CHAR(16) not null,
  r_reason_desc   CHAR(100),
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/reason';

CREATE EXTERNAL TABLE external_tpcds.ship_mode
(
  sm_ship_mode_sk           BIGINT,
  sm_ship_mode_id           CHAR(16) not null,
  sm_type                   CHAR(30),
  sm_code                   CHAR(10),
  sm_carrier                CHAR(20),
  sm_contract               CHAR(20),
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/ship_mode';

CREATE EXTERNAL TABLE external_tpcds.store_returns
(
  sr_returned_date_sk       BIGINT,
  sr_return_time_sk         BIGINT,
  sr_item_sk                BIGINT not null,
  sr_customer_sk            BIGINT,
  sr_cdemo_sk               BIGINT,
  sr_hdemo_sk               BIGINT,
  sr_addr_sk                BIGINT,
  sr_store_sk               BIGINT,
  sr_reason_sk              BIGINT,
  sr_ticket_number          BIGINT not null,
  sr_return_quantity        INT,
  sr_return_amt             DECIMAL(7,2),
  sr_return_tax             DECIMAL(7,2),
  sr_return_amt_inc_tax     DECIMAL(7,2),
  sr_fee                    DECIMAL(7,2),
  sr_return_ship_cost       DECIMAL(7,2),
  sr_refunded_cash          DECIMAL(7,2),
  sr_reversed_charge        DECIMAL(7,2),
  sr_store_credit           DECIMAL(7,2),
  sr_net_loss               DECIMAL(7,2),
  dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store_returns';

CREATE EXTERNAL TABLE external_tpcds.store_sales
(
  ss_sold_date_sk           BIGINT,
  ss_sold_time_sk           BIGINT,
  ss_item_sk                BIGINT not null,
  ss_customer_sk            BIGINT,
  ss_cdemo_sk               BIGINT,
  ss_hdemo_sk               BIGINT,
  ss_addr_sk                BIGINT,
  ss_store_sk               BIGINT,
  ss_promo_sk               BIGINT,
  ss_ticket_number          BIGINT not null,
  ss_quantity               INT,
  ss_wholesale_cost         DECIMAL(7,2),
  ss_list_price             DECIMAL(7,2),
  ss_sales_price            DECIMAL(7,2),
  ss_ext_discount_amt       DECIMAL(7,2),
  ss_ext_sales_price        DECIMAL(7,2),
  ss_ext_wholesale_cost     DECIMAL(7,2),
  ss_ext_list_price         DECIMAL(7,2),
  ss_ext_tax                DECIMAL(7,2),
  ss_coupon_amt             DECIMAL(7,2),
  ss_net_paid               DECIMAL(7,2),
  ss_net_paid_inc_tax       DECIMAL(7,2),
  ss_net_profit             DECIMAL(7,2),
  dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store_sales';

CREATE EXTERNAL TABLE external_tpcds.store
(
  s_store_sk                BIGINT not null,
  s_store_id                CHAR(16) not null,
  s_rec_start_date          DATE,
  s_rec_end_date            DATE,
  s_closed_date_sk          BIGINT,
  s_store_name              VARCHAR(50),
  s_number_employees        INT,
  s_floor_space             INT,
  s_hours                   CHAR(20),
  s_manager                 VARCHAR(40),
  s_market_id               INT,
  s_geography_class         VARCHAR(100),
  s_market_desc             VARCHAR(100),
  s_market_manager          VARCHAR(40),
  s_division_id             INT,
  s_division_name           VARCHAR(50),
  s_company_id              INT,
  s_company_name            VARCHAR(50),
  s_street_number           VARCHAR(10),
  s_street_name             VARCHAR(60),
  s_street_type             CHAR(15),
  s_suite_number            CHAR(10),
  s_city                    VARCHAR(60),
  s_county                  VARCHAR(30),
  s_state                   CHAR(2),
  s_zip                     CHAR(10),
  s_country                 VARCHAR(20),
  s_gmt_offset              DECIMAL(5,2),
  s_tax_percentage          DECIMAL(5,2),
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store';

CREATE EXTERNAL TABLE external_tpcds.time_dim
(
  t_time_sk                 BIGINT not null,
  t_time_id                 CHAR(16) not null,
  t_time                    INT,
  t_hour                    INT,
  t_minute                  INT,
  t_second                  INT,
  t_am_pm                   CHAR(2),
  t_shift                   CHAR(20),
  t_sub_shift               CHAR(20),
  t_meal_time               CHAR(20),
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/time_dim';

CREATE EXTERNAL TABLE external_tpcds.warehouse
(
  w_warehouse_sk            BIGINT not null,
  w_warehouse_id            CHAR(16) not null,
  w_warehouse_name          VARCHAR(20),
  w_warehouse_sq_ft         INT,
  w_street_number           CHAR(10),
  w_street_name             VARCHAR(60),
  w_street_type             CHAR(15),
  w_suite_number            CHAR(10),
  w_city                    VARCHAR(60),
  w_county                  VARCHAR(30),
  w_state                   CHAR(2),
  w_zip                     CHAR(10),
  w_country                 VARCHAR(20),
  w_gmt_offset              DECIMAL(5,2),
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/warehouse';

CREATE EXTERNAL TABLE external_tpcds.web_page
(
  wp_web_page_sk            BIGINT not null,
  wp_web_page_id            CHAR(16) not null,
  wp_rec_start_date         DATE,
  wp_rec_end_date           DATE,
  wp_creation_date_sk       BIGINT,
  wp_access_date_sk         BIGINT,
  wp_autogen_flag           CHAR(1),
  wp_customer_sk            BIGINT,
  wp_url                    VARCHAR(100),
  wp_type                   CHAR(50),
  wp_char_count             INT,
  wp_link_count             INT,
  wp_image_count            INT,
  wp_max_ad_count           INT,
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_page';

CREATE EXTERNAL TABLE external_tpcds.web_returns
(
  wr_returned_date_sk       BIGINT,
  wr_returned_time_sk       BIGINT,
  wr_item_sk                BIGINT not null,
  wr_refunded_customer_sk   BIGINT,
  wr_refunded_cdemo_sk      BIGINT,
  wr_refunded_hdemo_sk      BIGINT,
  wr_refunded_addr_sk       BIGINT,
  wr_returning_customer_sk  BIGINT,
  wr_returning_cdemo_sk     BIGINT,
  wr_returning_hdemo_sk     BIGINT,
  wr_returning_addr_sk      BIGINT,
  wr_web_page_sk            BIGINT,
  wr_reason_sk              BIGINT,
  wr_order_number           BIGINT not null,
  wr_return_quantity        INT,
  wr_return_amt             DECIMAL(7,2),
  wr_return_tax             DECIMAL(7,2),
  wr_return_amt_inc_tax     DECIMAL(7,2),
  wr_fee                    DECIMAL(7,2),
  wr_return_ship_cost       DECIMAL(7,2),
  wr_refunded_cash          DECIMAL(7,2),
  wr_reversed_charge        DECIMAL(7,2),
  wr_account_credit         DECIMAL(7,2),
  wr_net_loss               DECIMAL(7,2),
  dummy varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_returns';

CREATE EXTERNAL TABLE external_tpcds.web_sales
(
  ws_sold_date_sk           BIGINT,
  ws_sold_time_sk           BIGINT,
  ws_ship_date_sk           BIGINT,
  ws_item_sk                BIGINT not null,
  ws_bill_customer_sk       BIGINT,
  ws_bill_cdemo_sk          BIGINT,
  ws_bill_hdemo_sk          BIGINT,
  ws_bill_addr_sk           BIGINT,
  ws_ship_customer_sk       BIGINT,
  ws_ship_cdemo_sk          BIGINT,
  ws_ship_hdemo_sk          BIGINT,
  ws_ship_addr_sk           BIGINT,
  ws_web_page_sk            BIGINT,
  ws_web_site_sk            BIGINT,
  ws_ship_mode_sk           BIGINT,
  ws_warehouse_sk           BIGINT,
  ws_promo_sk               BIGINT,
  ws_order_number           BIGINT not null,
  ws_quantity               INT,
  ws_wholesale_cost         DECIMAL(7,2),
  ws_list_price             DECIMAL(7,2),
  ws_sales_price            DECIMAL(7,2),
  ws_ext_discount_amt       DECIMAL(7,2),
  ws_ext_sales_price        DECIMAL(7,2),
  ws_ext_wholesale_cost     DECIMAL(7,2),
  ws_ext_list_price         DECIMAL(7,2),
  ws_ext_tax                DECIMAL(7,2),
  ws_coupon_amt             DECIMAL(7,2),
  ws_ext_ship_cost          DECIMAL(7,2),
  ws_net_paid               DECIMAL(7,2),
  ws_net_paid_inc_tax       DECIMAL(7,2),
  ws_net_paid_inc_ship      DECIMAL(7,2),
  ws_net_paid_inc_ship_tax  DECIMAL(7,2),
  ws_net_profit             DECIMAL(7,2),
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_sales';

CREATE EXTERNAL TABLE external_tpcds.web_site
(
  web_site_sk               BIGINT not null,
  web_site_id               CHAR(16) not null,
  web_rec_start_date        DATE,
  web_rec_end_date          DATE,
  web_name                  VARCHAR(50),
  web_open_date_sk          BIGINT,
  web_close_date_sk         BIGINT,
  web_class                 VARCHAR(50),
  web_manager               VARCHAR(40),
  web_mkt_id                INT,
  web_mkt_class             VARCHAR(50),
  web_mkt_desc              VARCHAR(100),
  web_market_manager        VARCHAR(40),
  web_company_id            INT,
  web_company_name          CHAR(50),
  web_street_number         CHAR(10),
  web_street_name           VARCHAR(60),
  web_street_type           CHAR(15),
  web_suite_number          CHAR(10),
  web_city                  VARCHAR(60),
  web_county                VARCHAR(30),
  web_state                 CHAR(2),
  web_zip                   CHAR(10),
  web_country               VARCHAR(20),
  web_gmt_offset            DECIMAL(5,2),
  web_tax_percentage        DECIMAL(5,2),
  dummy varchar
) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
STORED AS TEXTFILE
LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_site';

Step 3: Copy data into internal tables

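Run one INSERT ... SELECT * FROM external_tpcds.<table> statement for each of the 24 tables. The statements below use INSERT OVERWRITE INTO for some tables and INSERT INTO for the others. For information about creating the internal tables, see Create test tables.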

INSERT OVERWRITE INTO promotion SELECT * FROM external_tpcds.promotion;
INSERT INTO web_site SELECT * FROM external_tpcds.web_site;
INSERT OVERWRITE INTO web_sales SELECT * FROM external_tpcds.web_sales;
INSERT OVERWRITE INTO web_returns SELECT * FROM external_tpcds.web_returns;
INSERT OVERWRITE INTO web_page SELECT * FROM external_tpcds.web_page;
INSERT INTO warehouse SELECT * FROM external_tpcds.warehouse;
INSERT OVERWRITE INTO time_dim SELECT * FROM external_tpcds.time_dim;
INSERT OVERWRITE INTO store_sales SELECT * FROM external_tpcds.store_sales;
INSERT OVERWRITE INTO store_returns SELECT * FROM external_tpcds.store_returns;
INSERT INTO store SELECT * FROM external_tpcds.store;
INSERT OVERWRITE INTO household_demographics SELECT * FROM external_tpcds.household_demographics;
INSERT INTO ship_mode SELECT * FROM external_tpcds.ship_mode;
INSERT INTO reason SELECT * FROM external_tpcds.reason;
INSERT INTO call_center SELECT * FROM external_tpcds.call_center;
INSERT OVERWRITE INTO item SELECT * FROM external_tpcds.item;
INSERT OVERWRITE INTO inventory SELECT * FROM external_tpcds.inventory;
INSERT INTO income_band SELECT * FROM external_tpcds.income_band;
INSERT INTO date_dim SELECT * FROM external_tpcds.date_dim;
INSERT OVERWRITE INTO customer_demographics SELECT * FROM external_tpcds.customer_demographics;
INSERT OVERWRITE INTO customer_address SELECT * FROM external_tpcds.customer_address;
INSERT OVERWRITE INTO customer SELECT * FROM external_tpcds.customer;
INSERT OVERWRITE INTO catalog_sales SELECT * FROM external_tpcds.catalog_sales;
INSERT OVERWRITE INTO catalog_returns SELECT * FROM external_tpcds.catalog_returns;
INSERT OVERWRITE INTO catalog_page SELECT * FROM external_tpcds.catalog_page;
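
One way to confirm that the copy completed is to compare each table's row count with the dataset overview table. The sketch below is an assumption about how you might do this and is not part of the documented procedure. The helper names expected_rows, count_query, and check_count are hypothetical, and only three of the 24 expected counts are included.

```shell
#!/bin/bash
# Sketch: compare loaded row counts with the dataset overview table.
# expected_rows, count_query, and check_count are hypothetical helpers.

# Expected 1 TB row counts copied from the dataset overview (subset only).
expected_rows() {
    case "$1" in
        store_sales) echo 2879987999 ;;
        web_site)    echo 54 ;;
        call_center) echo 42 ;;
        *)           return 1 ;;   # table not listed in this sketch
    esac
}

# Print the query to run in your SQL client for one table.
count_query() {
    printf 'SELECT COUNT(*) FROM %s;\n' "$1"
}

# Compare the count returned by the client with the expected value.
# $1 = table name, $2 = actual row count reported by the client
check_count() {
    want=$(expected_rows "$1") || { echo "$1: no expected count"; return 2; }
    if [ "$2" = "$want" ]; then
        echo "$1: OK"
    else
        echo "$1: expected $want, got $2"
        return 1
    fi
}
```

Run the output of `count_query web_site` in your SQL client, then pass the result to `check_count web_site <count>`. Extend expected_rows with the remaining tables as needed.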

Step 4: Collect statistics

After the data is loaded, collect histogram statistics on all 24 tables. The query optimizer uses these statistics to generate efficient execution plans — skipping this step degrades query performance.

ANALYZE TABLE call_center UPDATE HISTOGRAM;
ANALYZE TABLE catalog_page UPDATE HISTOGRAM;
ANALYZE TABLE catalog_returns UPDATE HISTOGRAM;
ANALYZE TABLE catalog_sales UPDATE HISTOGRAM;
ANALYZE TABLE customer UPDATE HISTOGRAM;
ANALYZE TABLE customer_address UPDATE HISTOGRAM;
ANALYZE TABLE customer_demographics UPDATE HISTOGRAM;
ANALYZE TABLE date_dim UPDATE HISTOGRAM;
ANALYZE TABLE household_demographics UPDATE HISTOGRAM;
ANALYZE TABLE income_band UPDATE HISTOGRAM;
ANALYZE TABLE inventory UPDATE HISTOGRAM;
ANALYZE TABLE item UPDATE HISTOGRAM;
ANALYZE TABLE promotion UPDATE HISTOGRAM;
ANALYZE TABLE reason UPDATE HISTOGRAM;
ANALYZE TABLE ship_mode UPDATE HISTOGRAM;
ANALYZE TABLE store UPDATE HISTOGRAM;
ANALYZE TABLE store_returns UPDATE HISTOGRAM;
ANALYZE TABLE store_sales UPDATE HISTOGRAM;
ANALYZE TABLE time_dim UPDATE HISTOGRAM;
ANALYZE TABLE warehouse UPDATE HISTOGRAM;
ANALYZE TABLE web_page UPDATE HISTOGRAM;
ANALYZE TABLE web_returns UPDATE HISTOGRAM;
ANALYZE TABLE web_sales UPDATE HISTOGRAM;
ANALYZE TABLE web_site UPDATE HISTOGRAM;

For more information about statistics, see Statistics.

Import using LOAD DATA

Use this method if you do not have a Data Lakehouse Edition cluster. It requires generating the TPC-DS data locally with dsdgen and then loading it into the cluster.

Prerequisites

Before you begin, ensure that you have:

  • The internal tables already created. See Create test tables.

Step 1: Generate test data

  1. Download the TPC-DS data generation tool dsdgen from the TPC official website and compile it to produce the dsdgen binary.

  2. Create a directory for the output files.

    mkdir data1tb

  3. Generate 1 TB of test data.

    Parameter   Example   Description
    -sc         1000      Scale factor in GB. Use 1000 for 1 TB.
    -dir        data1tb   Output directory for the generated files.
    -TERMINATE  N         Whether to add a field delimiter at the end of each row. N omits the trailing delimiter; Y adds a vertical bar (|).
    -PARALLEL   5         Total number of chunks to split the dataset into. Run dsdgen once per chunk.
    -CHILD      1         The chunk number generated by the current command.

    ./dsdgen -sc 1000 -dir data1tb -TERMINATE N

    To generate data faster, split the work across parallel runs. The following example divides 1 TB into five chunks:

    mkdir data1tb_5
    
    ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 1
    ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 2
    ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 3
    ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 4
    ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 5

    dsdgen produces 25 pipe-delimited .dat files: one for each of the 24 tables, plus dbgen_version.dat, which records metadata about the generator run:

    call_center.dat             catalog_page.dat      catalog_returns.dat
    catalog_sales.dat           customer_address.dat  customer.dat
    customer_demographics.dat   date_dim.dat          dbgen_version.dat
    household_demographics.dat  income_band.dat       inventory.dat
    item.dat                    promotion.dat         reason.dat
    ship_mode.dat               store.dat             store_returns.dat
    store_sales.dat             time_dim.dat          warehouse.dat
    web_page.dat                web_returns.dat       web_sales.dat
    web_site.dat

    For more information about dsdgen options, see the TPC-DS specification.
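
The five chunk commands shown earlier run one after another when entered in a single shell. A minimal sketch of launching them concurrently follows. It assumes that dsdgen is in the current directory, that the data1tb_5 directory exists, and that the machine has enough CPU cores and disk bandwidth for simultaneous runs. The function name run_chunks and the DRY_RUN variable are hypothetical.

```shell
#!/bin/bash
# Sketch: run the dsdgen chunks concurrently instead of one by one.
# run_chunks and DRY_RUN are hypothetical names used only in this example.

# $1 = total number of chunks (the value passed to -PARALLEL).
# With DRY_RUN=1 the commands are printed instead of executed.
run_chunks() {
    total=$1
    child=1
    while [ "$child" -le "$total" ]; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL $total -CHILD $child"
        else
            # Start this chunk in the background so the next one can begin.
            ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL "$total" -CHILD "$child" &
        fi
        child=$((child + 1))
    done
    wait   # block until every background chunk has finished
}
```

Call `run_chunks 5` to start all five chunks, or set `DRY_RUN=1` first to preview the commands. With -PARALLEL, dsdgen typically adds the chunk number and chunk count to each output file name (for example, store_sales_1_5.dat), so check the generated file names before you run the later steps.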

Step 2: Preprocess data for compatibility

The LOAD DATA statement fails on consecutive pipe delimiters, which dsdgen uses to represent NULL values (for example, a||c means the middle field is NULL). Run the following preprocessing scripts before importing.

Replace NULL values in integer, string, and date fields with `0`

#!/bin/bash
# Replace NULL values in the first field with 0 to convert ^| into 0|.
# Replace NULL values in the middle fields with 0 to convert || into |0|.
# Replace NULL values in the last field with 0 to convert |$ into |0.
# One pass turns ||| into |0||, so repeat until no empty field remains.
for s_f in *.dat
do
    echo "$s_f"
    i=1
    # grep -q stops at the first match instead of counting every matching line.
    while grep -qE '\|\||^\||\|$' "$s_f"
    do
        echo "$i"
        sed -i 's/^|/0|/;s/||/|0|/g;s/|$/|0/' "$s_f"
        ((i++))
    done
done
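
The loop in the script above is needed because the g flag of sed replaces only non-overlapping matches, so a run of three delimiters still contains an empty field after one pass. The sketch below demonstrates this on a single made-up line. The function name fill_nulls_once is hypothetical and applies one pass of the same substitutions to standard input.

```shell
#!/bin/bash
# Sketch: show why one sed pass is not enough for consecutive NULL fields.
# fill_nulls_once is a hypothetical helper that applies one pass of the
# NULL-to-0 substitutions to the lines on standard input.
fill_nulls_once() {
    sed 's/^|/0|/;s/||/|0|/g;s/|$/|0/'
}

# Two adjacent NULL fields: the first pass fills only one of them.
printf '1|||4\n' | fill_nulls_once                     # prints 1|0||4
# A second pass fills the remaining empty field.
printf '1|||4\n' | fill_nulls_once | fill_nulls_once   # prints 1|0|0|4
```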

Fix date fields — replace `0` with `0000-00-00`

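The previous script sets all NULL values to 0, including those in DATE fields. Run this script on the five files whose third and fourth fields are the DATE columns for the record start and end dates (for example, i_rec_start_date and i_rec_end_date in item.dat) to convert those 0 values into the date placeholder 0000-00-00: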

#!/bin/bash
# In each of these files, fields 3 and 4 are the record start and end dates.
for s_f in item.dat store.dat web_page.dat web_site.dat call_center.dat
do
    # Both date fields are NULL (0|0): replace both.
    sed -i 's/^\([A-Za-z0-9]*|[A-Za-z0-9]*\)|0|0|\(.*\)/\1|0000-00-00|0000-00-00|\2/' "$s_f"

    # Only the second date field is NULL (date|0): replace it.
    sed -i 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*|[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)|0|\(.*\)/\1|0000-00-00|\2/' "$s_f"

    # Only the first date field is NULL (0|date): replace it.
    sed -i 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*\)|0|\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}|.*\)/\1|0000-00-00|\2/' "$s_f"
done
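
To check the three substitutions before running them on full-size files, you can apply them to a few sample rows. The sketch below wraps the same sed expressions in a filter that reads standard input. The function name fix_dates is hypothetical, and the sample row is made up for illustration.

```shell
#!/bin/bash
# Sketch: apply the three date substitutions to lines on standard input.
# fix_dates is a hypothetical helper; the expressions are the ones used
# in the date-fix script, applied in the same order.
fix_dates() {
    sed -e 's/^\([A-Za-z0-9]*|[A-Za-z0-9]*\)|0|0|\(.*\)/\1|0000-00-00|0000-00-00|\2/' \
        -e 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*|[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)|0|\(.*\)/\1|0000-00-00|\2/' \
        -e 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*\)|0|\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}|.*\)/\1|0000-00-00|\2/'
}

# Made-up sample row whose end date (field 4) was NULL and is now 0.
printf '1|AAAAAAAABAAAAAAA|1997-10-27|0|x\n' | fix_dates
# prints 1|AAAAAAAABAAAAAAA|1997-10-27|0000-00-00|x
```

Rows whose third and fourth fields already hold dates pass through unchanged, because none of the three patterns match them.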

Step 3: Load data into AnalyticDB for MySQL

Run LOAD DATA LOCAL INFILE for each .dat file.

Note

Use LINES TERMINATED BY '\n' for files generated on Linux and LINES TERMINATED BY '\r\n' for files generated on Windows.
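
If you are unsure which terminator a file uses, you can inspect its first line. The following sketch prints the LINES TERMINATED BY value to use. The function name line_terminator is hypothetical, and the sketch assumes that the whole file uses the same line ending as its first line.

```shell
#!/bin/bash
# Sketch: report which LINES TERMINATED BY value matches a data file.
# line_terminator is a hypothetical helper, not an AnalyticDB command.
# Assumes the first line is representative of the whole file.
line_terminator() {
    cr=$(printf '\r')
    if head -n 1 "$1" | grep -q "${cr}\$"; then
        printf '%s\n' '\r\n'   # a carriage return precedes the newline
    else
        printf '%s\n' '\n'     # plain newline
    fi
}
```

For example, `line_terminator call_center.dat` prints the literal text `\n` for a file with Linux line endings. Substitute the printed value into the statements below.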

LOAD DATA LOCAL INFILE 'call_center.dat'
INTO TABLE call_center FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_page.dat'
INTO TABLE catalog_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_returns.dat'
INTO TABLE catalog_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'catalog_sales.dat'
INTO TABLE catalog_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer_address.dat'
INTO TABLE customer_address FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer.dat'
INTO TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'customer_demographics.dat'
INTO TABLE customer_demographics FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'date_dim.dat'
INTO TABLE date_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'dbgen_version.dat'
INTO TABLE dbgen_version FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'household_demographics.dat'
INTO TABLE household_demographics FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'income_band.dat'
INTO TABLE income_band FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'inventory.dat'
INTO TABLE inventory FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'item.dat'
INTO TABLE item FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'promotion.dat'
INTO TABLE promotion FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'reason.dat'
INTO TABLE reason FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'ship_mode.dat'
INTO TABLE ship_mode FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store.dat'
INTO TABLE store FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_returns.dat'
INTO TABLE store_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'store_sales.dat'
INTO TABLE store_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'time_dim.dat' INTO
TABLE time_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'warehouse.dat' INTO
TABLE warehouse FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_page.dat' INTO
TABLE web_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_returns.dat' INTO
TABLE web_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_sales.dat' INTO
TABLE web_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
LOAD DATA LOCAL INFILE 'web_site.dat' INTO
TABLE web_site FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
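
If you prefer not to maintain the statement list by hand, the statements can be generated from the files themselves. The following is a rough sketch, not part of the product tooling: gen_load_sql is a hypothetical helper, it assumes each file is named after its table, and it picks the line terminator by checking the first line for a carriage return.

```shell
#!/bin/bash
# Sketch: emit one LOAD DATA statement per .dat file in a directory.
# Assumes every file is named <table>.dat.
gen_load_sql() {
    local dir=$1 f table eol
    for f in "$dir"/*.dat; do
        [ -e "$f" ] || continue          # skip when no .dat files exist
        table=$(basename "$f" .dat)
        # A carriage return on the first line suggests Windows line endings.
        if head -n 1 "$f" | grep -q $'\r'; then
            eol='\r\n'
        else
            eol='\n'
        fi
        # %s prints the terminator literally, so the SQL contains '\n' or '\r\n'.
        printf "LOAD DATA LOCAL INFILE '%s' INTO TABLE %s FIELDS TERMINATED BY '|' LINES TERMINATED BY '%s';\n" \
            "$f" "$table" "$eol"
    done
}
```

For example, gen_load_sql /path/to/data > load.sql writes the statements to a file that you can review before running it with a MySQL client that has local file loading enabled. The directory path and the output file name here are placeholders.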

Step 4: Collect statistics

After the data is loaded, collect histogram statistics on all 24 tables. The query optimizer uses these statistics to generate efficient execution plans — skipping this step degrades query performance.

ANALYZE TABLE call_center UPDATE HISTOGRAM;
ANALYZE TABLE catalog_page UPDATE HISTOGRAM;
ANALYZE TABLE catalog_returns UPDATE HISTOGRAM;
ANALYZE TABLE catalog_sales UPDATE HISTOGRAM;
ANALYZE TABLE customer UPDATE HISTOGRAM;
ANALYZE TABLE customer_address UPDATE HISTOGRAM;
ANALYZE TABLE customer_demographics UPDATE HISTOGRAM;
ANALYZE TABLE date_dim UPDATE HISTOGRAM;
ANALYZE TABLE household_demographics UPDATE HISTOGRAM;
ANALYZE TABLE income_band UPDATE HISTOGRAM;
ANALYZE TABLE inventory UPDATE HISTOGRAM;
ANALYZE TABLE item UPDATE HISTOGRAM;
ANALYZE TABLE promotion UPDATE HISTOGRAM;
ANALYZE TABLE reason UPDATE HISTOGRAM;
ANALYZE TABLE ship_mode UPDATE HISTOGRAM;
ANALYZE TABLE store UPDATE HISTOGRAM;
ANALYZE TABLE store_returns UPDATE HISTOGRAM;
ANALYZE TABLE store_sales UPDATE HISTOGRAM;
ANALYZE TABLE time_dim UPDATE HISTOGRAM;
ANALYZE TABLE warehouse UPDATE HISTOGRAM;
ANALYZE TABLE web_page UPDATE HISTOGRAM;
ANALYZE TABLE web_returns UPDATE HISTOGRAM;
ANALYZE TABLE web_sales UPDATE HISTOGRAM;
ANALYZE TABLE web_site UPDATE HISTOGRAM;

For more information about statistics, see Statistics.
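
Before benchmarking, it may also be worth confirming that the loaded row counts match the dataset overview. The sketch below only generates the check queries; gen_count_checks is a hypothetical helper, and the four tables listed are a sample taken from the overview table that you would extend to all 24.

```shell
#!/bin/bash
# Sketch: emit COUNT(*) checks that place the actual row count next to
# the expected count from the dataset overview (1 TB scale).
gen_count_checks() {
    local table expected
    while read -r table expected; do
        [ -n "$table" ] || continue
        printf "SELECT '%s' AS table_name, COUNT(*) AS actual_rows, %s AS expected_rows FROM %s;\n" \
            "$table" "$expected" "$table"
    done <<'EOF'
store_sales 2879987999
store 1002
date_dim 73049
call_center 42
EOF
}

checks=$(gen_count_checks)
echo "$checks"
```

Run the generated statements in the same client session you used for loading and compare the two count columns for each table.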

What's next

With the dataset loaded, run the TPC-DS benchmark queries against your cluster. Download the official TPC-DS query set from the TPC official website.