All Products
Search
Document Center

AnalyticDB:Import test data

Last Updated:Sep 20, 2024

This topic describes how to build a TPC-DS test dataset and import 1,000 GB of TPC-DS test data to AnalyticDB for MySQL.

Note

The implementation of TPC-DS in this topic is derived from the TPC-DS benchmark and is not comparable to published TPC-DS benchmark results. The implementation does not comply with all the requirements of the TPC-DS benchmark.

The following table describes the number of rows in different tables of the TPC-DS dataset.

Table name

Number of rows

store_sales

2,879,987,999

catalog_sales

1,439,980,416

web_sales

720,000,376

store_returns

287,999,764

catalog_returns

143,996,756

inventory

783,000,000

web_returns

71,997,522

customer

12,000,000

customer_address

6,000,000

item

300,000

customer_demographics

1,920,800

date_dim

73,049

time_dim

86,400

catalog_page

30,000

web_page

3,000

store

1,002

promotion

1,500

household_demographics

7,200

web_site

54

call_center

42

reason

65

warehouse

20

ship_mode

20

income_band

20

Use OSS external tables to import data (recommended)

Important

You can perform the following operations only for Data Lakehouse Edition clusters.

  1. Create an external database.

    CREATE EXTERNAL DATABASE IF NOT EXISTS external_tpcds;
  2. Create 24 external tables.

    Note

    AnalyticDB for MySQL provides Object Storage Service (OSS) paths that can be used to store TPC-DS test data. Replace the OSS path specified by the LOCATION parameter based on the region in which the AnalyticDB for MySQL cluster resides.

    OSS paths for different regions

    Region

    Path

    China (Hangzhou)

    oss://dataset-cn-hangzhou-external/TPCDS/1TB

    China (Zhangjiakou)

    oss://dataset-cn-zhangjiakou-external/TPCDS/1TB

    China (Beijing)

    oss://dataset-cn-beijing-external/TPCDS/1TB

    China (Shanghai)

    oss://dataset-cn-shanghai-external/TPCDS/1TB

    China (Shenzhen)

    oss://dataset-cn-shenzhen-external/TPCDS/1TB

    China (Qingdao)

    oss://dataset-cn-qingdao-external/TPCDS/1TB

    China (Guangzhou)

    oss://dataset-cn-guangzhou-external/TPCDS/1TB

    China (Hong Kong)

    oss://dataset-cn-hongkong-external/TPCDS/1TB

    Singapore

    oss://dataset-ap-southeast-1-external/TPCDS/1TB

    Malaysia (Kuala Lumpur)

    oss://dataset-ap-southeast-3-external/TPCDS/1TB

    Japan (Tokyo)

    oss://dataset-ap-northeast-1-external/TPCDS/1TB

    Indonesia (Jakarta)

    oss://dataset-ap-southeast-5-external/TPCDS/1TB

    Germany (Frankfurt)

    oss://dataset-eu-central-1-external/TPCDS/1TB

    US (Silicon Valley)

    oss://dataset-us-west-1-external/TPCDS/1TB/

    UK (London)

    oss://dataset-eu-west-1-external/TPCDS/1TB

    US (Virginia)

    oss://dataset-us-east-1-external/TPCDS/1TB

    CREATE EXTERNAl TABLE external_tpcds.call_center
    (
      cc_call_center_sk             BIGINT not null,
      cc_call_center_id             CHAR(16) not null,
      cc_rec_start_date             DATE,
      cc_rec_end_date               DATE,
      cc_closed_date_sk             BIGINT,
      cc_open_date_sk               BIGINT,
      cc_name                       VARCHAR(50),
      cc_class                      VARCHAR(50),
      cc_employees                  INT,
      cc_sq_ft                      INT,
      cc_hours                      CHAR(20),
      cc_manager                    VARCHAR(40),
      cc_mkt_id                     INT,
      cc_mkt_class                  CHAR(50),
      cc_mkt_desc                   VARCHAR(100),
      cc_market_manager             VARCHAR(40),
      cc_division                   INT,
      cc_division_name              VARCHAR(50),
      cc_company                    INT,
      cc_company_name               CHAR(50),
      cc_street_number              CHAR(10),
      cc_street_name                VARCHAR(60),
      cc_street_type                CHAR(15),
      cc_suite_number               CHAR(10),
      cc_city                       VARCHAR(60),
      cc_county                     VARCHAR(30),
      cc_state                      CHAR(2),
      cc_zip                        CHAR(10),
      cc_country                    VARCHAR(20),
      cc_gmt_offset                 DECIMAL(5,2),
      cc_tax_percentage             DECIMAL(5,2),
      dummy varchar
    )ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/call_center';
    
    
    CREATE EXTERNAl TABLE external_tpcds.catalog_page
    (
      cp_catalog_page_sk     BIGINT not null,
      cp_catalog_page_id     VARCHAR(16) not null,
      cp_start_date_sk   BIGINT,
      cp_end_date_sk     BIGINT,
      cp_department      VARCHAR(50),
      cp_catalog_number  INT,
      cp_catalog_page_number INT,
      cp_description     VARCHAR(100),
      cp_type        VARCHAR(100),
      dummy varchar
    )ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_page';
    
    CREATE EXTERNAl TABLE external_tpcds.catalog_returns
    (
      cr_returned_date_sk        BIGINT,
      cr_returned_time_sk        BIGINT,
      cr_item_sk             BIGINT not null,
      cr_refunded_customer_sk    BIGINT,
      cr_refunded_cdemo_sk       BIGINT,
      cr_refunded_hdemo_sk       BIGINT,
      cr_refunded_addr_sk        BIGINT,
      cr_returning_customer_sk   BIGINT,
      cr_returning_cdemo_sk      BIGINT,
      cr_returning_hdemo_sk      BIGINT,
      cr_returning_addr_sk       BIGINT,
      cr_call_center_sk      BIGINT,
      cr_catalog_page_sk         BIGINT ,
      cr_ship_mode_sk        BIGINT ,
      cr_warehouse_sk        BIGINT ,
      cr_reason_sk           BIGINT ,
      cr_order_number        BIGINT not null,
      cr_return_quantity         INT,
      cr_return_amount       DECIMAL(7,2),
      cr_return_tax          DECIMAL(7,2),
      cr_return_amt_inc_tax      DECIMAL(7,2),
      cr_fee             DECIMAL(7,2),
      cr_return_ship_cost        DECIMAL(7,2),
      cr_refunded_cash       DECIMAL(7,2),
      cr_reversed_charge         DECIMAL(7,2),
      cr_store_credit        DECIMAL(7,2),
      cr_net_loss            DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_returns';
    
    CREATE EXTERNAl TABLE external_tpcds.catalog_sales
    (
      cs_sold_date_sk           BIGINT,
      cs_sold_time_sk           BIGINT,
      cs_ship_date_sk           BIGINT,
      cs_bill_customer_sk       BIGINT,
      cs_bill_cdemo_sk          BIGINT,
      cs_bill_hdemo_sk          BIGINT,
      cs_bill_addr_sk           BIGINT,
      cs_ship_customer_sk       BIGINT,
      cs_ship_cdemo_sk          BIGINT,
      cs_ship_hdemo_sk          BIGINT,
      cs_ship_addr_sk           BIGINT,
      cs_call_center_sk         BIGINT,
      cs_catalog_page_sk        BIGINT,
      cs_ship_mode_sk           BIGINT,
      cs_warehouse_sk           BIGINT,
      cs_item_sk                BIGINT not null,
      cs_promo_sk               BIGINT,
      cs_order_number           BIGINT not null,
      cs_quantity               INT,
      cs_wholesale_cost         DECIMAL(7,2),
      cs_list_price             DECIMAL(7,2),
      cs_sales_price            DECIMAL(7,2),
      cs_ext_discount_amt       DECIMAL(7,2),
      cs_ext_sales_price        DECIMAL(7,2),
      cs_ext_wholesale_cost     DECIMAL(7,2),
      cs_ext_list_price         DECIMAL(7,2),
      cs_ext_tax                DECIMAL(7,2),
      cs_coupon_amt             DECIMAL(7,2),
      cs_ext_ship_cost          DECIMAL(7,2),
      cs_net_paid               DECIMAL(7,2),
      cs_net_paid_inc_tax       DECIMAL(7,2),
      cs_net_paid_inc_ship      DECIMAL(7,2),
      cs_net_paid_inc_ship_tax  DECIMAL(7,2),
      cs_net_profit             DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_sales';
    
    CREATE EXTERNAl TABLE external_tpcds.customer
    (
      c_customer_sk         BIGINT NOT NULL,
      c_customer_id         CHAR(16) NOT NULL,
      c_current_cdemo_sk        BIGINT,
      c_current_hdemo_sk        BIGINT,
      c_current_addr_sk         BIGINT,
      c_first_shipto_date_sk    BIGINT,
      c_first_sales_date_sk     BIGINT,
      c_salutation          CHAR(10),
      c_first_name          CHAR(20),
      c_last_name           CHAR(30),
      c_preferred_cust_flag     char(1),
      c_birth_day           INT,
      c_birth_month         INT,
      c_birth_year          INT,
      c_birth_country       VARCHAR(20),
      c_login           CHAR(13),
      c_email_address       CHAR(50),
      c_last_review_date_sk     BIGINT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer';
    
    CREATE EXTERNAl TABLE external_tpcds.customer_address
    (
      ca_address_sk      BIGINT NOT NULL,
      ca_address_id      VARCHAR(16) NOT NULL,
      ca_street_number   VARCHAR(10),
      ca_street_name     VARCHAR(60),
      ca_street_type     VARCHAR(15),
      ca_suite_number    VARCHAR(10),
      ca_city        VARCHAR(60),
      ca_county      VARCHAR(30),
      ca_state       VARCHAR(2),
      ca_zip         VARCHAR(10),
      ca_country         VARCHAR(20),
      ca_gmt_offset      DECIMAL(5,2),
      ca_location_type   VARCHAR(20),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_address';
    
    CREATE EXTERNAl TABLE external_tpcds.customer_demographics
    (
      cd_demo_sk                BIGINT not null,
      cd_gender                 char(1),
      cd_marital_status         char(1),
      cd_education_status       char(20),
      cd_purchase_estimate      INT,
      cd_credit_rating          char(10),
      cd_dep_count              INT,
      cd_dep_employed_count     INT,
      cd_dep_college_count      INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_demographics';
    
    CREATE EXTERNAl TABLE external_tpcds.date_dim
    (
      d_date_sk                 BIGINT not null,
      d_date_id                 CHAR(16) not null,
      d_date                    DATE,
      d_month_seq               INT,
      d_week_seq                INT,
      d_quarter_seq             INT,
      d_year                    INT,
      d_dow                     INT,
      d_moy                     INT,
      d_dom                     INT,
      d_qoy                     INT,
      d_fy_year                 INT,
      d_fy_quarter_seq          INT,
      d_fy_week_seq             INT,
      d_day_name                CHAR(9),
      d_quarter_name            CHAR(6),
      d_holiday                 CHAR(1),
      d_weekend                 CHAR(1),
      d_following_holiday       CHAR(1),
      d_first_dom               INT,
      d_last_dom                INT,
      d_same_day_ly             INT,
      d_same_day_lq             INT,
      d_current_day             CHAR(1),
      d_current_week            CHAR(1),
      d_current_month           CHAR(1),
      d_current_quarter         CHAR(1),
      d_current_year            CHAR(1),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/date_dim';
    
    CREATE EXTERNAl TABLE external_tpcds.household_demographics
    (
      hd_demo_sk                BIGINT not null,
      hd_income_band_sk         BIGINT,
      hd_buy_potential          CHAR(15),
      hd_dep_count              INT,
      hd_vehicle_count          INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/household_demographics';
    
    CREATE EXTERNAl TABLE external_tpcds.income_band
    (
      ib_income_band_sk         BIGINT not null,
      ib_lower_bound            INT,
      ib_upper_bound            INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/income_band';
    
    CREATE EXTERNAl TABLE external_tpcds.inventory
    (
      inv_date_sk               BIGINT not null,
      inv_item_sk               BIGINT not null,
      inv_warehouse_sk          BIGINT not null,
      inv_quantity_on_hand      INT,
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/inventory';
    
    
    CREATE EXTERNAl TABLE external_tpcds.item
    (
      i_item_sk                 BIGINT not null,
      i_item_id                 CHAR(16) not null,
      i_rec_start_date          DATE,
      i_rec_end_date            DATE,
      i_item_desc               VARCHAR(200),
      i_current_price           DECIMAL(7,2),
      i_wholesale_cost          DECIMAL(7,2),
      i_brand_id                INT,
      i_brand                   CHAR(50),
      i_class_id                INT,
      i_class                   CHAR(50),
      i_category_id             INT,
      i_category                CHAR(50),
      i_manufact_id             INT,
      i_manufact                CHAR(50),
      i_size                    CHAR(20),
      i_formulation             CHAR(20),
      i_color                   CHAR(20),
      i_units                   CHAR(10),
      i_container               CHAR(10),
      i_manager_id              INT,
      i_product_name            char(50),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/item';
    
    CREATE EXTERNAl TABLE external_tpcds.promotion
    (
      p_promo_sk                BIGINT not null,
      p_promo_id                CHAR(16) not null,
      p_start_date_sk           BIGINT,
      p_end_date_sk             BIGINT,
      p_item_sk                 BIGINT,
      p_cost                    DECIMAL(15,2),
      p_response_target         INT,
      p_promo_name              CHAR(50),
      p_channel_dmail           CHAR(1),
      p_channel_email           CHAR(1),
      p_channel_catalog         CHAR(1),
      p_channel_tv              CHAR(1),
      p_channel_radio           CHAR(1),
      p_channel_press           CHAR(1),
      p_channel_event           CHAR(1),
      p_channel_demo            CHAR(1),
      p_channel_details         VARCHAR(100),
      p_purpose                 CHAR(15),
      p_discount_active         CHAR(1),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/promotion';
    
    CREATE EXTERNAl TABLE external_tpcds.reason
    (
      r_reason_sk     BIGINT not null,
      r_reason_id     CHAR(16) not null,
      r_reason_desc   CHAR(100),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/reason';
    
    CREATE EXTERNAl TABLE external_tpcds.ship_mode
    (
      sm_ship_mode_sk           BIGINT,
      sm_ship_mode_id           CHAR(16) not null,
      sm_type                   CHAR(30),
      sm_code                   CHAR(10),
      sm_carrier                CHAR(20),
      sm_contract               CHAR(20),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/ship_mode';
    
    CREATE EXTERNAl TABLE external_tpcds.store_returns
    (
      sr_returned_date_sk       BIGINT,
      sr_return_time_sk         BIGINT,
      sr_item_sk                BIGINT not null,
      sr_customer_sk            BIGINT,
      sr_cdemo_sk               BIGINT,
      sr_hdemo_sk               BIGINT,
      sr_addr_sk                BIGINT,
      sr_store_sk               BIGINT,
      sr_reason_sk              BIGINT,
      sr_ticket_number          BIGINT not null,
      sr_return_quantity        INT,
      sr_return_amt             DECIMAL(7,2),
      sr_return_tax             DECIMAL(7,2),
      sr_return_amt_inc_tax     DECIMAL(7,2),
      sr_fee                    DECIMAL(7,2),
      sr_return_ship_cost       DECIMAL(7,2),
      sr_refunded_cash          DECIMAL(7,2),
      sr_reversed_charge        DECIMAL(7,2),
      sr_store_credit           DECIMAL(7,2),
      sr_net_loss               DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store_returns';
    
    CREATE EXTERNAl TABLE external_tpcds.store_sales
    (
      ss_sold_date_sk           BIGINT,
      ss_sold_time_sk           BIGINT,
      ss_item_sk                BIGINT not null,
      ss_customer_sk            BIGINT,
      ss_cdemo_sk               BIGINT,
      ss_hdemo_sk               BIGINT,
      ss_addr_sk                BIGINT,
      ss_store_sk               BIGINT,
      ss_promo_sk               BIGINT,
      ss_ticket_number          BIGINT not null,
      ss_quantity               INT,
      ss_wholesale_cost         DECIMAL(7,2),
      ss_list_price             DECIMAL(7,2),
      ss_sales_price            DECIMAL(7,2),
      ss_ext_discount_amt       DECIMAL(7,2),
      ss_ext_sales_price        DECIMAL(7,2),
      ss_ext_wholesale_cost     DECIMAL(7,2),
      ss_ext_list_price         DECIMAL(7,2),
      ss_ext_tax                DECIMAL(7,2),
      ss_coupon_amt             DECIMAL(7,2),
      ss_net_paid               DECIMAL(7,2),
      ss_net_paid_inc_tax       DECIMAL(7,2),
      ss_net_profit             DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store_sales';
    
    CREATE EXTERNAl TABLE external_tpcds.store
    (
      s_store_sk                BIGINT not null,
      s_store_id                CHAR(16) not null,
      s_rec_start_date          DATE,
      s_rec_end_date            DATE,
      s_closed_date_sk          BIGINT,
      s_store_name              VARCHAR(50),
      s_number_employees        INT,
      s_floor_space             INT,
      s_hours                   CHAR(20),
      s_manager                 VARCHAR(40),
      s_market_id               INT,
      s_geography_class         VARCHAR(100),
      s_market_desc             VARCHAR(100),
      s_market_manager          VARCHAR(40),
      s_division_id             INT,
      s_division_name           VARCHAR(50),
      s_company_id              INT,
      s_company_name            VARCHAR(50),
      s_street_number           VARCHAR(10),
      s_street_name             VARCHAR(60),
      s_street_type             CHAR(15),
      s_suite_number            CHAR(10),
      s_city                    VARCHAR(60),
      s_county                  VARCHAR(30),
      s_state                   CHAR(2),
      s_zip                     CHAR(10),
      s_country                 VARCHAR(20),
      s_gmt_offset              DECIMAL(5,2),
      s_tax_percentage          DECIMAL(5,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store';
    
    CREATE EXTERNAl TABLE external_tpcds.time_dim
    (
      t_time_sk                 BIGINT not null,
      t_time_id                 CHAR(16) not null,
      t_time                    INT,
      t_hour                    INT,
      t_minute                  INT,
      t_second                  INT,
      t_am_pm                   CHAR(2),
      t_shift                   CHAR(20),
      t_sub_shift               CHAR(20),
      t_meal_time               CHAR(20),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/time_dim';
    
    CREATE EXTERNAl TABLE external_tpcds.warehouse
    (
      w_warehouse_sk            BIGINT not null,
      w_warehouse_id            CHAR(16) not null,
      w_warehouse_name          VARCHAR(20),
      w_warehouse_sq_ft         INT,
      w_street_number           CHAR(10),
      w_street_name             VARCHAR(60),
      w_street_type             CHAR(15),
      w_suite_number            CHAR(10),
      w_city                    VARCHAR(60),
      w_county                  VARCHAR(30),
      w_state                   CHAR(2),
      w_zip                     CHAR(10),
      w_country                 VARCHAR(20),
      w_gmt_offset              DECIMAL(5,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/warehouse';
    
    CREATE EXTERNAl TABLE external_tpcds.web_page
    (
      wp_web_page_sk            BIGINT not null,
      wp_web_page_id            CHAR(16) not null,
      wp_rec_start_date         DATE,
      wp_rec_end_date           DATE,
      wp_creation_date_sk       BIGINT,
      wp_access_date_sk         BIGINT,
      wp_autogen_flag           CHAR(1),
      wp_customer_sk            BIGINT,
      wp_url                    VARCHAR(100),
      wp_type                   CHAR(50),
      wp_char_count             INT,
      wp_link_count             INT,
      wp_image_count            INT,
      wp_max_ad_count           INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_page';
    
    CREATE EXTERNAl TABLE external_tpcds.web_returns
    (
      wr_returned_date_sk       BIGINT,
      wr_returned_time_sk       BIGINT,
      wr_item_sk                BIGINT not null,
      wr_refunded_customer_sk   BIGINT,
      wr_refunded_cdemo_sk      BIGINT,
      wr_refunded_hdemo_sk      BIGINT,
      wr_refunded_addr_sk       BIGINT,
      wr_returning_customer_sk  BIGINT,
      wr_returning_cdemo_sk     BIGINT,
      wr_returning_hdemo_sk     BIGINT,
      wr_returning_addr_sk      BIGINT,
      wr_web_page_sk            BIGINT,
      wr_reason_sk              BIGINT,
      wr_order_number           BIGINT not null,
      wr_return_quantity        INT,
      wr_return_amt             DECIMAL(7,2),
      wr_return_tax             DECIMAL(7,2),
      wr_return_amt_inc_tax     DECIMAL(7,2),
      wr_fee                    DECIMAL(7,2),
      wr_return_ship_cost       DECIMAL(7,2),
      wr_refunded_cash          DECIMAL(7,2),
      wr_reversed_charge        DECIMAL(7,2),
      wr_account_credit         DECIMAL(7,2),
      wr_net_loss               DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_returns';
    
    CREATE EXTERNAl TABLE external_tpcds.web_sales
    (
      ws_sold_date_sk           BIGINT,
      ws_sold_time_sk           BIGINT,
      ws_ship_date_sk           BIGINT,
      ws_item_sk                BIGINT not null,
      ws_bill_customer_sk       BIGINT,
      ws_bill_cdemo_sk          BIGINT,
      ws_bill_hdemo_sk          BIGINT,
      ws_bill_addr_sk           BIGINT,
      ws_ship_customer_sk       BIGINT,
      ws_ship_cdemo_sk          BIGINT,
      ws_ship_hdemo_sk          BIGINT,
      ws_ship_addr_sk           BIGINT,
      ws_web_page_sk            BIGINT,
      ws_web_site_sk            BIGINT,
      ws_ship_mode_sk           BIGINT,
      ws_warehouse_sk           BIGINT,
      ws_promo_sk               BIGINT,
      ws_order_number           BIGINT not null,
      ws_quantity               INT,
      ws_wholesale_cost         DECIMAL(7,2),
      ws_list_price             DECIMAL(7,2),
      ws_sales_price            DECIMAL(7,2),
      ws_ext_discount_amt       DECIMAL(7,2),
      ws_ext_sales_price        DECIMAL(7,2),
      ws_ext_wholesale_cost     DECIMAL(7,2),
      ws_ext_list_price         DECIMAL(7,2),
      ws_ext_tax                DECIMAL(7,2),
      ws_coupon_amt             DECIMAL(7,2),
      ws_ext_ship_cost          DECIMAL(7,2),
      ws_net_paid               DECIMAL(7,2),
      ws_net_paid_inc_tax       DECIMAL(7,2),
      ws_net_paid_inc_ship      DECIMAL(7,2),
      ws_net_paid_inc_ship_tax  DECIMAL(7,2),
      ws_net_profit             DECIMAL(7,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_sales';
    
    CREATE EXTERNAl TABLE external_tpcds.web_site
    (
      web_site_sk               BIGINT not null,
      web_site_id               CHAR(16) not null,
      web_rec_start_date        DATE,
      web_rec_end_date          DATE,
      web_name                  VARCHAR(50),
      web_open_date_sk          BIGINT,
      web_close_date_sk         BIGINT,
      web_class                 VARCHAR(50),
      web_manager               VARCHAR(40),
      web_mkt_id                INT,
      web_mkt_class             VARCHAR(50),
      web_mkt_desc              VARCHAR(100),
      web_market_manager        VARCHAR(40),
      web_company_id            INT,
      web_company_name          CHAR(50),
      web_street_number         CHAR(10),
      web_street_name           VARCHAR(60),
      web_street_type           CHAR(15),
      web_suite_number          CHAR(10),
      web_city                  VARCHAR(60),
      web_county                VARCHAR(30),
      web_state                 CHAR(2),
      web_zip                   CHAR(10),
      web_country               VARCHAR(20),
      web_gmt_offset            DECIMAL(5,2),
      web_tax_percentage        DECIMAL(5,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_site';
  3. Import data from the external tables to corresponding AnalyticDB for MySQL internal tables. For information about how to create internal tables for tests, see Create test tables.

    INSERT OVERWRITE INTO promotion
    SELECT *
    FROM external_tpcds.promotion;
    
    INSERT INTO web_site
    SELECT *
    FROM external_tpcds.web_site;
    
    INSERT OVERWRITE INTO web_sales
    SELECT *
    FROM external_tpcds.web_sales;
    
    INSERT OVERWRITE INTO web_returns
    SELECT *
    FROM external_tpcds.web_returns;
    
    INSERT OVERWRITE INTO web_page
    SELECT *
    FROM external_tpcds.web_page;
    
    INSERT INTO warehouse
    SELECT *
    FROM external_tpcds.warehouse;
    
    INSERT OVERWRITE INTO time_dim
    SELECT *
    FROM external_tpcds.time_dim;
    
    INSERT OVERWRITE INTO store_sales
    SELECT *
    FROM external_tpcds.store_sales;
    
    INSERT OVERWRITE INTO store_returns
    SELECT *
    FROM external_tpcds.store_returns;
    
    INSERT INTO store
    SELECT *
    FROM external_tpcds.store;
    
    
    INSERT OVERWRITE INTO household_demographics
    SELECT *
    FROM external_tpcds.household_demographics;
    
    INSERT INTO ship_mode
    SELECT *
    FROM external_tpcds.ship_mode;
    
    INSERT INTO reason
    SELECT *
    FROM external_tpcds.reason;
    
    INSERT INTO call_center
    SELECT *
    FROM external_tpcds.call_center;
    
    INSERT OVERWRITE INTO item
    SELECT *
    FROM external_tpcds.item;
    
    INSERT OVERWRITE INTO inventory
    SELECT *
    FROM external_tpcds.inventory;
    
    INSERT INTO income_band
    SELECT *
    FROM external_tpcds.income_band;
    
    INSERT INTO date_dim
    SELECT *
    FROM external_tpcds.date_dim;
    
    INSERT OVERWRITE INTO customer_demographics
    SELECT *
    FROM external_tpcds.customer_demographics;
    
    INSERT OVERWRITE INTO customer_address
    SELECT *
    FROM external_tpcds.customer_address;
    
    INSERT OVERWRITE INTO customer
    SELECT *
    FROM external_tpcds.customer;
    
    INSERT OVERWRITE INTO catalog_sales
    SELECT *
    FROM external_tpcds.catalog_sales;
    
    INSERT OVERWRITE INTO catalog_returns
    SELECT *
    FROM external_tpcds.catalog_returns;
    
    INSERT OVERWRITE INTO catalog_page
    SELECT *
    FROM external_tpcds.catalog_page;
  4. Collect statistics.

    ANALYZE TABLE call_center UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_page UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE customer UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_address UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE date_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE household_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE income_band UPDATE HISTOGRAM ;
    ANALYZE TABLE inventory UPDATE HISTOGRAM ;
    ANALYZE TABLE item UPDATE HISTOGRAM ;
    ANALYZE TABLE promotion UPDATE HISTOGRAM ;
    ANALYZE TABLE reason UPDATE HISTOGRAM ;
    ANALYZE TABLE ship_mode UPDATE HISTOGRAM ;
    ANALYZE TABLE store UPDATE HISTOGRAM ;
    ANALYZE TABLE store_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE store_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE time_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE warehouse UPDATE HISTOGRAM ;
    ANALYZE TABLE web_page UPDATE HISTOGRAM ;
    ANALYZE TABLE web_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE web_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE web_site UPDATE HISTOGRAM ;
    Note

    The query optimizer converts queries into execution plans that are executed by the execution engine. The quality of an execution plan affects its query performance. Statistics about data columns can be used to help the query optimizer generate high-quality execution plans. After you import data, you must collect histogram statistics on all tables to obtain the optimal performance. For more information about statistics, see Statistics.

Execute the LOAD DATA statement to import data

  1. Construct data.

    1. Download the TPC-DS data generation tool dsdgen from the TPC official website and compile the downloaded file to generate a binary executable file named dsdgen.

    2. Create a directory in which you want to store data files.

      mkdir data1tb
    3. Construct test data.

      ./dsdgen -sc 1000 -dir data1tb -TERMINATE N

      The following table describes the parameters.

      Parameter

      Description

      Example

      -sc

      The size of the test data. A value of 10 specifies 10 GB. A value of 1000 specifies 1000 GB, which is equivalent to 1 TB.

      1000

      -dir

      The directory in which you want to write data files.

      data1tb

      -TERMINATE

      Specifies whether to add a field delimiter at the end of each row. Valid values:

      • N: No field delimiter is added at the end of each row.

      • Y: A field delimiter such as a vertical bar (|) is added at the end of each row.

      N

      -PARALLEL

      The total number of chunks.

      One command can generate only one chunk. Therefore, this parameter specifies the times you need to run the command.

      5

      -CHILD

      The serial number of the chunk that the current command generates.

      1

      In the following sample code, 1 TB of test data is divided into five chunks:

      mkdir data1tb_5
      
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 1
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 2
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 3
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 4
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 5

      Test data files are generated in the text format by using the dsdgen commands. A vertical bar (|) is used as the default field delimiter, and each row contains only a single data entry.

      call_center.dat
      catalog_page.dat
      catalog_returns.dat
      catalog_sales.dat
      customer_address.dat
      customer.dat
      customer_demographics.dat
      date_dim.dat
      dbgen_version.dat
      household_demographics.dat
      income_band.dat
      inventory.dat
      item.dat
      promotion.dat
      reason.dat
      ship_mode.dat
      store.dat
      store_returns.dat
      store_sales.dat
      time_dim.dat
      warehouse.dat
      web_page.dat
      web_returns.dat
      web_sales.dat
      web_site.dat

      For more information about how to use dsdgen, see tpc-ds.

  2. Process data for compatibility.

    If no default value is specified for a field in a table, the default value is NULL.

    When a vertical bar (|) is used as the field delimiter, "a,NULL,c,d,NULL" is converted into "a||c|d|" in the generated text files. This leads to import failures when you import data to AnalyticDB for MySQL by executing the LOAD DATA statement. Therefore, NULL values must be replaced.

    • Replace NULL values in all fields of the INT, BIGINT, VARCHAR, and DATE types with 0

      • #!/bin/bash
        # Replace NULL values in the first field with 0 to convert ^| into 0|.
        # Replace NULL values in the middle fields with 0 to convert || into |0|.
        # Replace NULL values in the last field with 0 to convert |$ into |0.
        for s_f in `ls *dat`
        do
            echo "$s_f"
            i=1
            while [ `egrep '\|\||^\||\|$' $s_f |wc -l` -gt 0 ]
            do 
                echo $i
                sed 's/^|/0|/g;s/||/|0|/g;s/|$/|0/g' -i $s_f
                ((i++))
            done
        done
    • Replace the 0 value in date fields with 0000-00-00

      • 
        1092|AAAAAAAACEEAAAAA|2001-10-27|0|Manufa....
        16252|AAAAAAAAMHPDAAAA|0|1999-10-27|0|7.94|0|1001....
        16252|AAAAAAAAMHPDAAAA|0|0|0|7.94|0|1001..
        
        for s_f in item.dat store.dat web_page.dat web_site.dat call_center.dat
        do
        # Process the first and second date fields whose values are NULL (represented by 0).
        sed 's/^\([A-Za-z0-9]*|[A-Za-z0-9]*\)|0|0|\(.*\)/\1|0000-00-00|0000-00-00|\2/' -i $s_f
        
        # Process the second date fields whose values are NULL (represented by 0).
        sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*|[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)|0|\(.*\)/\1|0000-00-00|\2/' -i $s_f
        
        # Process the first date fields whose values are NULL (represented by 0).
        sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*\)|0|\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}|.*\)/\1|0000-00-00|\2/' -i $s_f
        
        done
  3. Execute the LOAD DATA LOCAL INFILE statement to import the data files that are generated by dsdgen to AnalyticDB for MySQL.

    Note
    • If the data files are generated in a Linux environment, end each row with \n.

    • If the data files are generated in a Windows environment, end each row with \r\n.

    LOAD DATA LOCAL INFILE 'call_center.dat' INTO
    TABLE call_center FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'catalog_page.dat' INTO
    TABLE catalog_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'catalog_returns.dat'
    INTO TABLE catalog_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'catalog_sales.dat'
    INTO TABLE catalog_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'customer_address.dat'
    INTO TABLE customer_address FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'customer.dat' INTO
    TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE
    'customer_demographics.dat' INTO TABLE customer_demographics FIELDS TERMINATED
    BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'date_dim.dat' INTO
    TABLE date_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'dbgen_version.dat'
    INTO TABLE dbgen_version FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE
    'household_demographics.dat' INTO TABLE household_demographics FIELDS
    TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'income_band.dat' INTO
    TABLE income_band FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'inventory.dat' INTO
    TABLE inventory FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'item.dat' INTO TABLE
    item FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'promotion.dat' INTO
    TABLE promotion FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'reason.dat' INTO TABLE
    reason FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'ship_mode.dat' INTO
    TABLE ship_mode FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'store.dat' INTO TABLE
    store FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'store_returns.dat'
    INTO TABLE store_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'store_sales.dat' INTO
    TABLE store_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'time_dim.dat' INTO
    TABLE time_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'warehouse.dat' INTO
    TABLE warehouse FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_page.dat' INTO
    TABLE web_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_returns.dat' INTO
    TABLE web_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_sales.dat' INTO
    TABLE web_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_site.dat' INTO
    TABLE web_site FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
  4. Collect statistics.

    ANALYZE TABLE call_center UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_page UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE customer UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_address UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE date_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE household_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE income_band UPDATE HISTOGRAM ;
    ANALYZE TABLE inventory UPDATE HISTOGRAM ;
    ANALYZE TABLE item UPDATE HISTOGRAM ;
    ANALYZE TABLE promotion UPDATE HISTOGRAM ;
    ANALYZE TABLE reason UPDATE HISTOGRAM ;
    ANALYZE TABLE ship_mode UPDATE HISTOGRAM ;
    ANALYZE TABLE store UPDATE HISTOGRAM ;
    ANALYZE TABLE store_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE store_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE time_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE warehouse UPDATE HISTOGRAM ;
    ANALYZE TABLE web_page UPDATE HISTOGRAM ;
    ANALYZE TABLE web_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE web_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE web_site UPDATE HISTOGRAM ;
    Note

    The query optimizer converts queries into execution plans that are executed by the execution engine. The quality of an execution plan affects its query performance. Statistics about data columns can be used to help the query optimizer generate high-quality execution plans. After you import data, you must collect histogram statistics on all tables to obtain the optimal performance. For more information about statistics, see Statistics.