すべてのプロダクト
Search
ドキュメントセンター

AnalyticDB:テストデータのインポート

最終更新日:Sep 23, 2024

このトピックでは、TPC-DSテストデータセットを構築し、1,000 GBのTPC-DSテストデータをAnalyticDB for MySQLにインポートする方法について説明します。

説明

このトピックのTPC-DSの実装は、TPC-DSベンチマークから派生したものであり、公開されたTPC-DSベンチマークの結果とは比較できません。 実装は、TPC-DSベンチマークのすべての要件に準拠していません。

次の表に、TPC-DSデータセットのさまざまなテーブルの行数を示します。

テーブル名

行数

store_sales

2,879,987,999

catalog_sales

1,439,980,416

web_sales

720,000,376

store_returns

287,999,764

catalog_returns

143,996,756

在庫

783,000,000

web_returns

71,997,522

customer

12,000,000

customer_address

6,000,000

項目

300,000

customer_demographics

1,920,800

date_dim

73,049

time_dim

86,400

catalog_page

30,000

web_page

3,000

store

1,002

プロモーション

1,500

household_demographics

7,200

web_site

54

call_center

42

理由

65

倉庫

中国 (深セン) の Enterprise Edition トランジットルーターへ 20 GB の

ship_mode

中国 (深セン) の Enterprise Edition トランジットルーターへ 20 GB の

income_band

中国 (深セン) の Enterprise Edition トランジットルーターへ 20 GB の

OSS外部テーブルを使用したデータのインポート (推奨)

重要

以下の操作は、 Data Lakehouse Editionクラスター。

  1. 外部データベースを作成します。

    CREATE EXTERNAL DATABASE IF NOT EXISTS external_tpcds;
  2. 24の外部テーブルを作成します。

    説明

    AnalyticDB for MySQLは、TPC-DSテストデータの保存に使用できるObject Storage Service (OSS) パスを提供します。 AnalyticDB for MySQLクラスターが存在するリージョンに基づいて、LOCATIONパラメーターで指定されたOSSパスを置き換えます。

    OSSパスの異なるリージョン

    リージョン

    パス

    中国 (杭州)

    oss:// dataset-cn-hangzhou-external/TPCDS/1テラバイト

    中国 (張家口)

    oss:// dataset-cn-zhangjiakou-external/TPCDS/1テラバイト

    中国 (北京)

    oss:// dataset-cn-beijing-external/TPCDS/1テラバイト

    中国 (上海)

    oss:// dataset-cn-shanghai-external/TPCDS/1テラバイト

    中国 (深セン)

    oss:// dataset-cn-shenzhen-external/TPCDS/1テラバイト

    中国 (青島)

    oss:// dataset-cn-qingdao-external/TPCDS/1テラバイト

    中国 (広州)

    oss:// dataset-cn-guangzhou-external/TPCDS/1テラバイト

    中国 (香港)

    oss:// dataset-cn-hongkong-external/TPCDS/1テラバイト

    シンガポール

    oss:// dataset-ap-southeast-1-external/TPCDS/1テラバイト

    マレーシア (クアラルンプール)

    oss:// dataset-ap-southeast-3-external/TPCDS/1テラバイト

    日本 (東京)

    oss:// dataset-ap-northeast-1-external/TPCDS/1テラバイト

    インドネシア (ジャカルタ)

    oss:// dataset-ap-southeast-5-external/TPCDS/1テラバイト

    ドイツ (フランクフルト)

    oss:// dataset-eu-central-1-external/TPCDS/1テラバイト

    米国 (シリコンバレー)

    oss:// dataset-us-west-1-external/TPCDS/1テラバイト /

    イギリス (ロンドン)

    oss:// dataset-eu-west-1-external/TPCDS/1テラバイト

    米国 (バージニア)

    oss:// dataset-us-east-1-external/TPCDS/1テラバイト

    CREATE EXTERNAl TABLE external_tpcds.call_center
    (
      cc_call_center_sk             BIGINT not null,
      cc_call_center_id             CHAR(16) not null,
      cc_rec_start_date             DATE,
      cc_rec_end_date               DATE,
      cc_closed_date_sk             BIGINT,
      cc_open_date_sk               BIGINT,
      cc_name                       VARCHAR(50),
      cc_class                      VARCHAR(50),
      cc_employees                  INT,
      cc_sq_ft                      INT,
      cc_hours                      CHAR(20),
      cc_manager                    VARCHAR(40),
      cc_mkt_id                     INT,
      cc_mkt_class                  CHAR(50),
      cc_mkt_desc                   VARCHAR(100),
      cc_market_manager             VARCHAR(40),
      cc_division                   INT,
      cc_division_name              VARCHAR(50),
      cc_company                    INT,
      cc_company_name               CHAR(50),
      cc_street_number              CHAR(10),
      cc_street_name                VARCHAR(60),
      cc_street_type                CHAR(15),
      cc_suite_number               CHAR(10),
      cc_city                       VARCHAR(60),
      cc_county                     VARCHAR(30),
      cc_state                      CHAR(2),
      cc_zip                        CHAR(10),
      cc_country                    VARCHAR(20),
      cc_gmt_offset                 DECIMAL(5,2),
      cc_tax_percentage             DECIMAL(5,2),
      dummy varchar
    )ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/call_center';
    
    
    CREATE EXTERNAl TABLE external_tpcds.catalog_page
    (
      cp_catalog_page_sk     BIGINT not null,
      cp_catalog_page_id     VARCHAR(16) not null,
      cp_start_date_sk   BIGINT,
      cp_end_date_sk     BIGINT,
      cp_department      VARCHAR(50),
      cp_catalog_number  INT,
      cp_catalog_page_number INT,
      cp_description     VARCHAR(100),
      cp_type        VARCHAR(100),
      dummy varchar
    )ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_page';
    
    CREATE EXTERNAl TABLE external_tpcds.catalog_returns
    (
      cr_returned_date_sk        BIGINT,
      cr_returned_time_sk        BIGINT,
      cr_item_sk             BIGINT not null,
      cr_refunded_customer_sk    BIGINT,
      cr_refunded_cdemo_sk       BIGINT,
      cr_refunded_hdemo_sk       BIGINT,
      cr_refunded_addr_sk        BIGINT,
      cr_returning_customer_sk   BIGINT,
      cr_returning_cdemo_sk      BIGINT,
      cr_returning_hdemo_sk      BIGINT,
      cr_returning_addr_sk       BIGINT,
      cr_call_center_sk      BIGINT,
      cr_catalog_page_sk         BIGINT ,
      cr_ship_mode_sk        BIGINT ,
      cr_warehouse_sk        BIGINT ,
      cr_reason_sk           BIGINT ,
      cr_order_number        BIGINT not null,
      cr_return_quantity         INT,
      cr_return_amount       DECIMAL(7,2),
      cr_return_tax          DECIMAL(7,2),
      cr_return_amt_inc_tax      DECIMAL(7,2),
      cr_fee             DECIMAL(7,2),
      cr_return_ship_cost        DECIMAL(7,2),
      cr_refunded_cash       DECIMAL(7,2),
      cr_reversed_charge         DECIMAL(7,2),
      cr_store_credit        DECIMAL(7,2),
      cr_net_loss            DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_returns';
    
    CREATE EXTERNAl TABLE external_tpcds.catalog_sales
    (
      cs_sold_date_sk           BIGINT,
      cs_sold_time_sk           BIGINT,
      cs_ship_date_sk           BIGINT,
      cs_bill_customer_sk       BIGINT,
      cs_bill_cdemo_sk          BIGINT,
      cs_bill_hdemo_sk          BIGINT,
      cs_bill_addr_sk           BIGINT,
      cs_ship_customer_sk       BIGINT,
      cs_ship_cdemo_sk          BIGINT,
      cs_ship_hdemo_sk          BIGINT,
      cs_ship_addr_sk           BIGINT,
      cs_call_center_sk         BIGINT,
      cs_catalog_page_sk        BIGINT,
      cs_ship_mode_sk           BIGINT,
      cs_warehouse_sk           BIGINT,
      cs_item_sk                BIGINT not null,
      cs_promo_sk               BIGINT,
      cs_order_number           BIGINT not null,
      cs_quantity               INT,
      cs_wholesale_cost         DECIMAL(7,2),
      cs_list_price             DECIMAL(7,2),
      cs_sales_price            DECIMAL(7,2),
      cs_ext_discount_amt       DECIMAL(7,2),
      cs_ext_sales_price        DECIMAL(7,2),
      cs_ext_wholesale_cost     DECIMAL(7,2),
      cs_ext_list_price         DECIMAL(7,2),
      cs_ext_tax                DECIMAL(7,2),
      cs_coupon_amt             DECIMAL(7,2),
      cs_ext_ship_cost          DECIMAL(7,2),
      cs_net_paid               DECIMAL(7,2),
      cs_net_paid_inc_tax       DECIMAL(7,2),
      cs_net_paid_inc_ship      DECIMAL(7,2),
      cs_net_paid_inc_ship_tax  DECIMAL(7,2),
      cs_net_profit             DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/catalog_sales';
    
    CREATE EXTERNAl TABLE external_tpcds.customer
    (
      c_customer_sk         BIGINT NOT NULL,
      c_customer_id         CHAR(16) NOT NULL,
      c_current_cdemo_sk        BIGINT,
      c_current_hdemo_sk        BIGINT,
      c_current_addr_sk         BIGINT,
      c_first_shipto_date_sk    BIGINT,
      c_first_sales_date_sk     BIGINT,
      c_salutation          CHAR(10),
      c_first_name          CHAR(20),
      c_last_name           CHAR(30),
      c_preferred_cust_flag     char(1),
      c_birth_day           INT,
      c_birth_month         INT,
      c_birth_year          INT,
      c_birth_country       VARCHAR(20),
      c_login           CHAR(13),
      c_email_address       CHAR(50),
      c_last_review_date_sk     BIGINT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer';
    
    CREATE EXTERNAl TABLE external_tpcds.customer_address
    (
      ca_address_sk      BIGINT NOT NULL,
      ca_address_id      VARCHAR(16) NOT NULL,
      ca_street_number   VARCHAR(10),
      ca_street_name     VARCHAR(60),
      ca_street_type     VARCHAR(15),
      ca_suite_number    VARCHAR(10),
      ca_city        VARCHAR(60),
      ca_county      VARCHAR(30),
      ca_state       VARCHAR(2),
      ca_zip         VARCHAR(10),
      ca_country         VARCHAR(20),
      ca_gmt_offset      DECIMAL(5,2),
      ca_location_type   VARCHAR(20),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_address';
    
    CREATE EXTERNAl TABLE external_tpcds.customer_demographics
    (
      cd_demo_sk                BIGINT not null,
      cd_gender                 char(1),
      cd_marital_status         char(1),
      cd_education_status       char(20),
      cd_purchase_estimate      INT,
      cd_credit_rating          char(10),
      cd_dep_count              INT,
      cd_dep_employed_count     INT,
      cd_dep_college_count      INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/customer_demographics';
    
    CREATE EXTERNAl TABLE external_tpcds.date_dim
    (
      d_date_sk                 BIGINT not null,
      d_date_id                 CHAR(16) not null,
      d_date                    DATE,
      d_month_seq               INT,
      d_week_seq                INT,
      d_quarter_seq             INT,
      d_year                    INT,
      d_dow                     INT,
      d_moy                     INT,
      d_dom                     INT,
      d_qoy                     INT,
      d_fy_year                 INT,
      d_fy_quarter_seq          INT,
      d_fy_week_seq             INT,
      d_day_name                CHAR(9),
      d_quarter_name            CHAR(6),
      d_holiday                 CHAR(1),
      d_weekend                 CHAR(1),
      d_following_holiday       CHAR(1),
      d_first_dom               INT,
      d_last_dom                INT,
      d_same_day_ly             INT,
      d_same_day_lq             INT,
      d_current_day             CHAR(1),
      d_current_week            CHAR(1),
      d_current_month           CHAR(1),
      d_current_quarter         CHAR(1),
      d_current_year            CHAR(1),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/date_dim';
    
    CREATE EXTERNAl TABLE external_tpcds.household_demographics
    (
      hd_demo_sk                BIGINT not null,
      hd_income_band_sk         BIGINT,
      hd_buy_potential          CHAR(15),
      hd_dep_count              INT,
      hd_vehicle_count          INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/household_demographics';
    
    CREATE EXTERNAl TABLE external_tpcds.income_band
    (
      ib_income_band_sk         BIGINT not null,
      ib_lower_bound            INT,
      ib_upper_bound            INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/income_band';
    
    CREATE EXTERNAl TABLE external_tpcds.inventory
    (
      inv_date_sk               BIGINT not null,
      inv_item_sk               BIGINT not null,
      inv_warehouse_sk          BIGINT not null,
      inv_quantity_on_hand      INT,
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/inventory';
    
    
    CREATE EXTERNAl TABLE external_tpcds.item
    (
      i_item_sk                 BIGINT not null,
      i_item_id                 CHAR(16) not null,
      i_rec_start_date          DATE,
      i_rec_end_date            DATE,
      i_item_desc               VARCHAR(200),
      i_current_price           DECIMAL(7,2),
      i_wholesale_cost          DECIMAL(7,2),
      i_brand_id                INT,
      i_brand                   CHAR(50),
      i_class_id                INT,
      i_class                   CHAR(50),
      i_category_id             INT,
      i_category                CHAR(50),
      i_manufact_id             INT,
      i_manufact                CHAR(50),
      i_size                    CHAR(20),
      i_formulation             CHAR(20),
      i_color                   CHAR(20),
      i_units                   CHAR(10),
      i_container               CHAR(10),
      i_manager_id              INT,
      i_product_name            char(50),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/item';
    
    CREATE EXTERNAl TABLE external_tpcds.promotion
    (
      p_promo_sk                BIGINT not null,
      p_promo_id                CHAR(16) not null,
      p_start_date_sk           BIGINT,
      p_end_date_sk             BIGINT,
      p_item_sk                 BIGINT,
      p_cost                    DECIMAL(15,2),
      p_response_target         INT,
      p_promo_name              CHAR(50),
      p_channel_dmail           CHAR(1),
      p_channel_email           CHAR(1),
      p_channel_catalog         CHAR(1),
      p_channel_tv              CHAR(1),
      p_channel_radio           CHAR(1),
      p_channel_press           CHAR(1),
      p_channel_event           CHAR(1),
      p_channel_demo            CHAR(1),
      p_channel_details         VARCHAR(100),
      p_purpose                 CHAR(15),
      p_discount_active         CHAR(1),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/promotion';
    
    CREATE EXTERNAl TABLE external_tpcds.reason
    (
      r_reason_sk     BIGINT not null,
      r_reason_id     CHAR(16) not null,
      r_reason_desc   CHAR(100),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/reason';
    
    CREATE EXTERNAl TABLE external_tpcds.ship_mode
    (
      sm_ship_mode_sk           BIGINT,
      sm_ship_mode_id           CHAR(16) not null,
      sm_type                   CHAR(30),
      sm_code                   CHAR(10),
      sm_carrier                CHAR(20),
      sm_contract               CHAR(20),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/ship_mode';
    
    CREATE EXTERNAl TABLE external_tpcds.store_returns
    (
      sr_returned_date_sk       BIGINT,
      sr_return_time_sk         BIGINT,
      sr_item_sk                BIGINT not null,
      sr_customer_sk            BIGINT,
      sr_cdemo_sk               BIGINT,
      sr_hdemo_sk               BIGINT,
      sr_addr_sk                BIGINT,
      sr_store_sk               BIGINT,
      sr_reason_sk              BIGINT,
      sr_ticket_number          BIGINT not null,
      sr_return_quantity        INT,
      sr_return_amt             DECIMAL(7,2),
      sr_return_tax             DECIMAL(7,2),
      sr_return_amt_inc_tax     DECIMAL(7,2),
      sr_fee                    DECIMAL(7,2),
      sr_return_ship_cost       DECIMAL(7,2),
      sr_refunded_cash          DECIMAL(7,2),
      sr_reversed_charge        DECIMAL(7,2),
      sr_store_credit           DECIMAL(7,2),
      sr_net_loss               DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store_returns';
    
    CREATE EXTERNAl TABLE external_tpcds.store_sales
    (
      ss_sold_date_sk           BIGINT,
      ss_sold_time_sk           BIGINT,
      ss_item_sk                BIGINT not null,
      ss_customer_sk            BIGINT,
      ss_cdemo_sk               BIGINT,
      ss_hdemo_sk               BIGINT,
      ss_addr_sk                BIGINT,
      ss_store_sk               BIGINT,
      ss_promo_sk               BIGINT,
      ss_ticket_number          BIGINT not null,
      ss_quantity               INT,
      ss_wholesale_cost         DECIMAL(7,2),
      ss_list_price             DECIMAL(7,2),
      ss_sales_price            DECIMAL(7,2),
      ss_ext_discount_amt       DECIMAL(7,2),
      ss_ext_sales_price        DECIMAL(7,2),
      ss_ext_wholesale_cost     DECIMAL(7,2),
      ss_ext_list_price         DECIMAL(7,2),
      ss_ext_tax                DECIMAL(7,2),
      ss_coupon_amt             DECIMAL(7,2),
      ss_net_paid               DECIMAL(7,2),
      ss_net_paid_inc_tax       DECIMAL(7,2),
      ss_net_profit             DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store_sales';
    
    CREATE EXTERNAl TABLE external_tpcds.store
    (
      s_store_sk                BIGINT not null,
      s_store_id                CHAR(16) not null,
      s_rec_start_date          DATE,
      s_rec_end_date            DATE,
      s_closed_date_sk          BIGINT,
      s_store_name              VARCHAR(50),
      s_number_employees        INT,
      s_floor_space             INT,
      s_hours                   CHAR(20),
      s_manager                 VARCHAR(40),
      s_market_id               INT,
      s_geography_class         VARCHAR(100),
      s_market_desc             VARCHAR(100),
      s_market_manager          VARCHAR(40),
      s_division_id             INT,
      s_division_name           VARCHAR(50),
      s_company_id              INT,
      s_company_name            VARCHAR(50),
      s_street_number           VARCHAR(10),
      s_street_name             VARCHAR(60),
      s_street_type             CHAR(15),
      s_suite_number            CHAR(10),
      s_city                    VARCHAR(60),
      s_county                  VARCHAR(30),
      s_state                   CHAR(2),
      s_zip                     CHAR(10),
      s_country                 VARCHAR(20),
      s_gmt_offset              DECIMAL(5,2),
      s_tax_percentage          DECIMAL(5,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/store';
    
    CREATE EXTERNAl TABLE external_tpcds.time_dim
    (
      t_time_sk                 BIGINT not null,
      t_time_id                 CHAR(16) not null,
      t_time                    INT,
      t_hour                    INT,
      t_minute                  INT,
      t_second                  INT,
      t_am_pm                   CHAR(2),
      t_shift                   CHAR(20),
      t_sub_shift               CHAR(20),
      t_meal_time               CHAR(20),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/time_dim';
    
    CREATE EXTERNAl TABLE external_tpcds.warehouse
    (
      w_warehouse_sk            BIGINT not null,
      w_warehouse_id            CHAR(16) not null,
      w_warehouse_name          VARCHAR(20),
      w_warehouse_sq_ft         INT,
      w_street_number           CHAR(10),
      w_street_name             VARCHAR(60),
      w_street_type             CHAR(15),
      w_suite_number            CHAR(10),
      w_city                    VARCHAR(60),
      w_county                  VARCHAR(30),
      w_state                   CHAR(2),
      w_zip                     CHAR(10),
      w_country                 VARCHAR(20),
      w_gmt_offset              DECIMAL(5,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/warehouse';
    
    CREATE EXTERNAl TABLE external_tpcds.web_page
    (
      wp_web_page_sk            BIGINT not null,
      wp_web_page_id            CHAR(16) not null,
      wp_rec_start_date         DATE,
      wp_rec_end_date           DATE,
      wp_creation_date_sk       BIGINT,
      wp_access_date_sk         BIGINT,
      wp_autogen_flag           CHAR(1),
      wp_customer_sk            BIGINT,
      wp_url                    VARCHAR(100),
      wp_type                   CHAR(50),
      wp_char_count             INT,
      wp_link_count             INT,
      wp_image_count            INT,
      wp_max_ad_count           INT,
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_page';
    
    CREATE EXTERNAl TABLE external_tpcds.web_returns
    (
      wr_returned_date_sk       BIGINT,
      wr_returned_time_sk       BIGINT,
      wr_item_sk                BIGINT not null,
      wr_refunded_customer_sk   BIGINT,
      wr_refunded_cdemo_sk      BIGINT,
      wr_refunded_hdemo_sk      BIGINT,
      wr_refunded_addr_sk       BIGINT,
      wr_returning_customer_sk  BIGINT,
      wr_returning_cdemo_sk     BIGINT,
      wr_returning_hdemo_sk     BIGINT,
      wr_returning_addr_sk      BIGINT,
      wr_web_page_sk            BIGINT,
      wr_reason_sk              BIGINT,
      wr_order_number           BIGINT not null,
      wr_return_quantity        INT,
      wr_return_amt             DECIMAL(7,2),
      wr_return_tax             DECIMAL(7,2),
      wr_return_amt_inc_tax     DECIMAL(7,2),
      wr_fee                    DECIMAL(7,2),
      wr_return_ship_cost       DECIMAL(7,2),
      wr_refunded_cash          DECIMAL(7,2),
      wr_reversed_charge        DECIMAL(7,2),
      wr_account_credit         DECIMAL(7,2),
      wr_net_loss               DECIMAL(7,2),
      dummy varchar
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_returns';
    
    CREATE EXTERNAl TABLE external_tpcds.web_sales
    (
      ws_sold_date_sk           BIGINT,
      ws_sold_time_sk           BIGINT,
      ws_ship_date_sk           BIGINT,
      ws_item_sk                BIGINT not null,
      ws_bill_customer_sk       BIGINT,
      ws_bill_cdemo_sk          BIGINT,
      ws_bill_hdemo_sk          BIGINT,
      ws_bill_addr_sk           BIGINT,
      ws_ship_customer_sk       BIGINT,
      ws_ship_cdemo_sk          BIGINT,
      ws_ship_hdemo_sk          BIGINT,
      ws_ship_addr_sk           BIGINT,
      ws_web_page_sk            BIGINT,
      ws_web_site_sk            BIGINT,
      ws_ship_mode_sk           BIGINT,
      ws_warehouse_sk           BIGINT,
      ws_promo_sk               BIGINT,
      ws_order_number           BIGINT not null,
      ws_quantity               INT,
      ws_wholesale_cost         DECIMAL(7,2),
      ws_list_price             DECIMAL(7,2),
      ws_sales_price            DECIMAL(7,2),
      ws_ext_discount_amt       DECIMAL(7,2),
      ws_ext_sales_price        DECIMAL(7,2),
      ws_ext_wholesale_cost     DECIMAL(7,2),
      ws_ext_list_price         DECIMAL(7,2),
      ws_ext_tax                DECIMAL(7,2),
      ws_coupon_amt             DECIMAL(7,2),
      ws_ext_ship_cost          DECIMAL(7,2),
      ws_net_paid               DECIMAL(7,2),
      ws_net_paid_inc_tax       DECIMAL(7,2),
      ws_net_paid_inc_ship      DECIMAL(7,2),
      ws_net_paid_inc_ship_tax  DECIMAL(7,2),
      ws_net_profit             DECIMAL(7,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_sales';
    
    CREATE EXTERNAl TABLE external_tpcds.web_site
    (
      web_site_sk               BIGINT not null,
      web_site_id               CHAR(16) not null,
      web_rec_start_date        DATE,
      web_rec_end_date          DATE,
      web_name                  VARCHAR(50),
      web_open_date_sk          BIGINT,
      web_close_date_sk         BIGINT,
      web_class                 VARCHAR(50),
      web_manager               VARCHAR(40),
      web_mkt_id                INT,
      web_mkt_class             VARCHAR(50),
      web_mkt_desc              VARCHAR(100),
      web_market_manager        VARCHAR(40),
      web_company_id            INT,
      web_company_name          CHAR(50),
      web_street_number         CHAR(10),
      web_street_name           VARCHAR(60),
      web_street_type           CHAR(15),
      web_suite_number          CHAR(10),
      web_city                  VARCHAR(60),
      web_county                VARCHAR(30),
      web_state                 CHAR(2),
      web_zip                   CHAR(10),
      web_country               VARCHAR(20),
      web_gmt_offset            DECIMAL(5,2),
      web_tax_percentage        DECIMAL(5,2),
      dummy varchar
    ) ROW FORMAT DELIMITED FIELDS TERMINATED BY  '|'
    STORED AS TEXTFILE
    LOCATION  'oss://dataset-cn-beijing-external/TPCDS/1TB/web_site';
  3. 外部テーブルから対応するAnalyticDB for MySQL内部テーブルにデータをインポートします。 テスト用の内部テーブルの作成方法については、「テストテーブルの作成」をご参照ください。

    INSERT OVERWRITE INTO promotion
    SELECT *
    FROM external_tpcds.promotion;
    
    INSERT INTO web_site
    SELECT *
    FROM external_tpcds.web_site;
    
    INSERT OVERWRITE INTO web_sales
    SELECT *
    FROM external_tpcds.web_sales;
    
    INSERT OVERWRITE INTO web_returns
    SELECT *
    FROM external_tpcds.web_returns;
    
    INSERT OVERWRITE INTO web_page
    SELECT *
    FROM external_tpcds.web_page;
    
    INSERT INTO warehouse
    SELECT *
    FROM external_tpcds.warehouse;
    
    INSERT OVERWRITE INTO time_dim
    SELECT *
    FROM external_tpcds.time_dim;
    
    INSERT OVERWRITE INTO store_sales
    SELECT *
    FROM external_tpcds.store_sales;
    
    INSERT OVERWRITE INTO store_returns
    SELECT *
    FROM external_tpcds.store_returns;
    
    INSERT INTO store
    SELECT *
    FROM external_tpcds.store;
    
    
    INSERT OVERWRITE INTO household_demographics
    SELECT *
    FROM external_tpcds.household_demographics;
    
    INSERT INTO ship_mode
    SELECT *
    FROM external_tpcds.ship_mode;
    
    INSERT INTO reason
    SELECT *
    FROM external_tpcds.reason;
    
    INSERT INTO call_center
    SELECT *
    FROM external_tpcds.call_center;
    
    INSERT OVERWRITE INTO item
    SELECT *
    FROM external_tpcds.item;
    
    INSERT OVERWRITE INTO inventory
    SELECT *
    FROM external_tpcds.inventory;
    
    INSERT INTO income_band
    SELECT *
    FROM external_tpcds.income_band;
    
    INSERT INTO date_dim
    SELECT *
    FROM external_tpcds.date_dim;
    
    INSERT OVERWRITE INTO customer_demographics
    SELECT *
    FROM external_tpcds.customer_demographics;
    
    INSERT OVERWRITE INTO customer_address
    SELECT *
    FROM external_tpcds.customer_address;
    
    INSERT OVERWRITE INTO customer
    SELECT *
    FROM external_tpcds.customer;
    
    INSERT OVERWRITE INTO catalog_sales
    SELECT *
    FROM external_tpcds.catalog_sales;
    
    INSERT OVERWRITE INTO catalog_returns
    SELECT *
    FROM external_tpcds.catalog_returns;
    
    INSERT OVERWRITE INTO catalog_page
    SELECT *
    FROM external_tpcds.catalog_page;
  4. 統計を収集します。

    ANALYZE TABLE call_center UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_page UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE customer UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_address UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE date_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE household_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE income_band UPDATE HISTOGRAM ;
    ANALYZE TABLE inventory UPDATE HISTOGRAM ;
    ANALYZE TABLE item UPDATE HISTOGRAM ;
    ANALYZE TABLE promotion UPDATE HISTOGRAM ;
    ANALYZE TABLE reason UPDATE HISTOGRAM ;
    ANALYZE TABLE ship_mode UPDATE HISTOGRAM ;
    ANALYZE TABLE store UPDATE HISTOGRAM ;
    ANALYZE TABLE store_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE store_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE time_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE warehouse UPDATE HISTOGRAM ;
    ANALYZE TABLE web_page UPDATE HISTOGRAM ;
    ANALYZE TABLE web_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE web_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE web_site UPDATE HISTOGRAM ;
    説明

    クエリオプティマイザは、クエリを実行エンジンによって実行される実行プランに変換します。 実行プランの品質は、クエリのパフォーマンスに影響します。 データ列に関する統計を使用すると、クエリオプティマイザが高品質の実行計画を生成できるようになります。 データをインポートした後、最適なパフォーマンスを得るために、すべてのテーブルのヒストグラム統計を収集する必要があります。 統計の詳細については、「統計」をご参照ください。

LOAD DATAステートメントを実行してデータをインポートする

  1. データを構築します。

    1. TPC-DSデータ生成ツールdsdgenをTPC公式Webサイトからダウンロードし、ダウンロードしたファイルをコンパイルして、dsdgenというバイナリ実行可能ファイルを生成します。

    2. データファイルを格納するディレクトリを作成します。

      mkdir data1tb
    3. テストデータを作成します。

      ./dsdgen -sc 1000 -dir data1tb -TERMINATE N

      下表に、各パラメーターを説明します。

      パラメーター

      説明

      -sc

      テストデータのサイズ。 値10は10 GBを指定します。 1000の値は1000 GBを指定します。これは1テラバイトに相当します。

      1000

      -dir

      データファイルを書き込むディレクトリ。

      data1tb

      -終了

      各行の末尾にフィールド区切り文字を追加するかどうかを指定します。 有効な値:

      • N: 各行の末尾にフィールド区切り文字を追加しません。

      • Y: 各行の末尾に縦棒 (|) などのフィールド区切り文字を追加します。

      N

      -パラレル

      チャンクの総数。

      1つのコマンドで生成できるチャンクは1つだけです。 したがって、このパラメーターは、コマンドを実行するために必要な時間を指定します。

      5

      -子供

      現在のコマンドが生成するチャンクのシリアル番号。

      1

      次のサンプルコードでは、1テラバイトのテストデータを5つのチャンクに分割します。

      mkdir data1tb_5
      
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 1
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 2
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 3
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 4
      ./dsdgen -sc 1000 -dir data1tb_5 -TERMINATE N -PARALLEL 5 -CHILD 5

      テストデータファイルは、dsdgenコマンドを使用してテキスト形式で生成されます。 縦バー (|) はデフォルトのフィールド区切り文字として使用され、各行には1つのデータエントリのみが含まれます。

      call_center.dat
      catalog_page.dat
      catalog_returns.dat
      catalog_sales.dat
      customer_address.dat
      customer.dat
      customer_demographics.dat
      date_dim.dat
      dbgen_version.dat
      household_demographics.dat
      income_band.dat
      inventory.dat
      item.dat
      promotion.dat
      reason.dat
      ship_mode.dat
      store.dat
      store_returns.dat
      store_sales.dat
      time_dim.dat
      warehouse.dat
      web_page.dat
      web_returns.dat
      web_sales.dat
      web_site.dat

      dsdgenの使用方法の詳細については、「tpc-ds」をご参照ください。

  2. 互換性のためにデータを処理します。

    テーブルのフィールドにデフォルト値が指定されていない場合、デフォルト値はNULLです。

    縦棒 (|) をフィールドの区切り文字として使用した場合、生成されたテキストファイルでは、「a,NULL,c,d,NULL」が「a | | c | d | 」に変換されます。 LOAD dataステートメントを実行してAnalyticDB for MySQLにデータをインポートすると、インポートが失敗します。 したがって、NULL値を置き換える必要があります。

    • INT、BIGINT、VARCHAR、およびDATE型のすべてのフィールドのNULL値を0に置き換えます

      • #!/bin/bash
        # Replace NULL values in the first field with 0 to convert ^| into 0|.
        # Replace NULL values in the middle fields with 0 to convert || into |0|.
        # Replace NULL values in the last field with 0 to convert |$ into |0.
        for s_f in `ls *dat`
        do
            echo "$s_f"
            i=1
            while [ `egrep '\|\||^\||\|$' $s_f |wc -l` -gt 0 ]
            do 
                echo $i
                sed 's/^|/0|/g;s/||/|0|/g;s/|$/|0/g' -i $s_f
                ((i++))
            done
        done
    • 日付フィールドの0値を0000-00-00

      • 
        1092|AAAAAAAACEEAAAAA|2001-10-27|0|Manufa....
        16252|AAAAAAAAMHPDAAAA|0|1999-10-27|0|7.94|0|1001....
        16252|AAAAAAAAMHPDAAAA|0|0|0|7.94|0|1001..
        
        for s_f in item.dat store.dat web_page.dat web_site.dat call_center.dat
        do
        # Process the first and second date fields whose values are NULL (represented by 0).
        sed 's/^\([A-Za-z0-9]*|[A-Za-z0-9]*\)|0|0|\(.*\)/\1|0000-00-00|0000-00-00|\2/' -i $s_f
        
        # Process the second date fields whose values are NULL (represented by 0).
        sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*|[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)|0|\(.*\)/\1|0000-00-00|\2/' -i $s_f
        
        # Process the first date fields whose values are NULL (represented by 0).
        sed 's/^\([0-9A-Za-z]*|[A-Za-z0-9]*\)|0|\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}|.*\)/\1|0000-00-00|\2/' -i $s_f
        
        done
  3. LOAD DATA LOCAL INFILEステートメントを実行して、dsdgenによって生成されたデータファイルをAnalyticDB for MySQLにインポートします。

    説明
    • データファイルがLinux環境で生成される場合は、各行の末尾に \nを付けます。

    • データファイルがWindows環境で生成される場合は、各行の末尾に \r\nを付けます。

    LOAD DATA LOCAL INFILE 'call_center.dat' INTO
    TABLE call_center FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'catalog_page.dat' INTO
    TABLE catalog_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'catalog_returns.dat'
    INTO TABLE catalog_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'catalog_sales.dat'
    INTO TABLE catalog_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'customer_address.dat'
    INTO TABLE customer_address FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'customer.dat' INTO
    TABLE customer FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE
    'customer_demographics.dat' INTO TABLE customer_demographics FIELDS TERMINATED
    BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'date_dim.dat' INTO
    TABLE date_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'dbgen_version.dat'
    INTO TABLE dbgen_version FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE
    'household_demographics.dat' INTO TABLE household_demographics FIELDS
    TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'income_band.dat' INTO
    TABLE income_band FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'inventory.dat' INTO
    TABLE inventory FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'item.dat' INTO TABLE
    item FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'promotion.dat' INTO
    TABLE promotion FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'reason.dat' INTO TABLE
    reason FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'ship_mode.dat' INTO
    TABLE ship_mode FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'store.dat' INTO TABLE
    store FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'store_returns.dat'
    INTO TABLE store_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'store_sales.dat' INTO
    TABLE store_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'time_dim.dat' INTO
    TABLE time_dim FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'warehouse.dat' INTO
    TABLE warehouse FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_page.dat' INTO
    TABLE web_page FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_returns.dat' INTO
    TABLE web_returns FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_sales.dat' INTO
    TABLE web_sales FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
    LOAD DATA LOCAL INFILE 'web_site.dat' INTO
    TABLE web_site FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';
  4. 統計を収集します。

    ANALYZE TABLE call_center UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_page UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE catalog_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE customer UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_address UPDATE HISTOGRAM ;
    ANALYZE TABLE customer_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE date_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE household_demographics UPDATE HISTOGRAM ;
    ANALYZE TABLE income_band UPDATE HISTOGRAM ;
    ANALYZE TABLE inventory UPDATE HISTOGRAM ;
    ANALYZE TABLE item UPDATE HISTOGRAM ;
    ANALYZE TABLE promotion UPDATE HISTOGRAM ;
    ANALYZE TABLE reason UPDATE HISTOGRAM ;
    ANALYZE TABLE ship_mode UPDATE HISTOGRAM ;
    ANALYZE TABLE store UPDATE HISTOGRAM ;
    ANALYZE TABLE store_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE store_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE time_dim UPDATE HISTOGRAM ;
    ANALYZE TABLE warehouse UPDATE HISTOGRAM ;
    ANALYZE TABLE web_page UPDATE HISTOGRAM ;
    ANALYZE TABLE web_returns UPDATE HISTOGRAM ;
    ANALYZE TABLE web_sales UPDATE HISTOGRAM ;
    ANALYZE TABLE web_site UPDATE HISTOGRAM ;
    説明

    クエリオプティマイザは、クエリを実行エンジンによって実行される実行プランに変換します。 実行プランの品質は、クエリのパフォーマンスに影響します。 データ列に関する統計を使用すると、クエリオプティマイザが高品質の実行計画を生成できるようになります。 データをインポートした後、最適なパフォーマンスを得るために、すべてのテーブルのヒストグラム統計を収集する必要があります。 統計の詳細については、「統計」をご参照ください。