You can create an Object Storage Service (OSS) schema and table in Data Lake Analytics (DLA) to cleanse data in files of different types in OSS and write the result data to AnalyticDB for MySQL V3.0. This way, you can perform millisecond-level multidimensional analysis, pivoting, and business exploration on hundreds of billions of result data records.

  • Activate DLA, AnalyticDB for MySQL V3.0, and OSS. All three services must be deployed in the same region.

  • Create a database and table in AnalyticDB for MySQL V3.0. For more information, see CREATE DATABASE and CREATE TABLE.

    The following statements show example table schemas:

      CREATE TABLE shipping (
      id bigint NOT NULL COMMENT '',
      origin_state varchar NOT NULL COMMENT '',
      origin_zip varchar NOT NULL COMMENT '',
      destination_state varchar NOT NULL COMMENT '',
      destination_zip varchar NOT NULL COMMENT '',
      package_weight int NOT NULL COMMENT '',
      PRIMARY KEY (id)
      )
      DISTRIBUTED BY HASH(id)
      COMMENT '';
      CREATE TABLE order_table (
         customer_id bigint NOT NULL COMMENT '',
         order_id varchar NOT NULL COMMENT '',
         order_time timestamp NOT NULL COMMENT '',
         order_amount double NOT NULL COMMENT '',
         order_type varchar NOT NULL COMMENT '',
         address varchar NOT NULL COMMENT '',
         city varchar NOT NULL COMMENT '',
         order_season bigint COMMENT '',
         PRIMARY KEY (customer_id,order_time)
       )
       DISTRIBUTED BY HASH(customer_id)
       PARTITION BY VALUE(DATE_FORMAT(order_time, '%Y%m%d')) LIFECYCLE 30
       COMMENT '';
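Before you can cleanse the OSS data, DLA needs a schema and an external table that map to the source files. The following statements are a minimal sketch; the schema name, bucket name, paths, and file format are placeholders that you must replace with your own values:

```sql
-- Create an OSS schema in DLA. The location is a placeholder.
CREATE SCHEMA oss_demo_schema WITH DBPROPERTIES (
  catalog = 'oss',
  location = 'oss://my-bucket/datasets/'
);

-- Map an external table to CSV files under the shipping/ folder.
-- The columns mirror the shipping table in AnalyticDB for MySQL V3.0.
CREATE EXTERNAL TABLE oss_demo_schema.shipping_src (
  id bigint,
  origin_state varchar(64),
  origin_zip varchar(16),
  destination_state varchar(64),
  destination_zip varchar(16),
  package_weight int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'oss://my-bucket/datasets/shipping/';
```

DLA infers one row per line from the delimited-text definition, so every CSV file under the `LOCATION` folder is exposed through the external table.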

Usage notes

  • AnalyticDB for MySQL V3.0 uses primary key logic. If an extract, transform, load (ETL) task that is performed by using the INSERT FROM SELECT statement fails, you must retry the entire task.
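    An ETL task of this kind is a single INSERT FROM SELECT statement that reads from a DLA external table and writes into the AnalyticDB for MySQL V3.0 table. The sketch below assumes a DLA external table named `oss_demo_schema.shipping_src` and a destination database named `adb_demo`; both names are placeholders:

    ```sql
    -- Cleanse and load: rows with the same primary key value (id)
    -- overwrite earlier writes, so a full retry of this statement
    -- after a failure does not produce duplicate records.
    INSERT INTO adb_demo.shipping
    SELECT id, origin_state, origin_zip,
           destination_state, destination_zip, package_weight
    FROM oss_demo_schema.shipping_src
    WHERE package_weight > 0;
    ```

    The primary key makes the retry safe, but the statement itself is all-or-nothing: there is no way to resume a partially completed INSERT FROM SELECT.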

  • Data that is written to AnalyticDB for MySQL V3.0 becomes visible to queries after a latency. The latency depends on the configuration of the AnalyticDB for MySQL V3.0 cluster.

  • We recommend that you split data into multiple files before you perform ETL tasks. For example, if 200 GB of data is stored in OSS, you can split the data into 100 folders that each store 2 GB of data, create 100 tables in DLA, and perform a separate ETL task for each table. If an ETL task fails, you need to retry only that task.
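    The split can be expressed as one external table per OSS folder, each driving its own INSERT FROM SELECT task. The sketch below shows the pattern for the first folder; the schema, table, and path names are placeholders, and the remaining folders follow the same pattern:

    ```sql
    -- One external table per 2 GB folder (part_001 through part_100).
    CREATE EXTERNAL TABLE oss_demo_schema.order_src_001 (
      customer_id bigint,
      order_id varchar(64),
      order_time timestamp,
      order_amount double,
      order_type varchar(32),
      address varchar(256),
      city varchar(64),
      order_season bigint
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 'oss://my-bucket/orders/part_001/';

    -- One ETL task per folder. If this statement fails, retry only
    -- this statement; the other 99 tasks are unaffected.
    INSERT INTO adb_demo.order_table
    SELECT customer_id, order_id, order_time, order_amount,
           order_type, address, city, order_season
    FROM oss_demo_schema.order_src_001;
    ```

    Because each task touches a disjoint slice of the source data, a failure is isolated to one 2 GB unit of work instead of the full 200 GB load.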