You can create Object Storage Service (OSS) schemas and tables in Data Lake Analytics (DLA), cleanse data from different types of files in OSS, and then write the results to AnalyticDB for MySQL 3.0. This allows you to perform millisecond-level multidimensional analysis and business exploration on hundreds of billions of rows.
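Before you can map OSS files to tables, you create an OSS schema in DLA. The following is a minimal sketch; the schema name, bucket, and path are placeholders for your own values.

```sql
-- Create a DLA schema that points at an OSS location (names are examples).
CREATE SCHEMA oss_demo_schema WITH DBPROPERTIES (
    catalog = 'oss',
    location = 'oss://my-bucket/etl-data/'
);
```

Tables created in this schema can then reference files under the specified OSS path.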
The following statements show the schemas of the destination tables in AnalyticDB for MySQL.
CREATE TABLE shipping (
    id bigint NOT NULL COMMENT '',
    origin_state varchar NOT NULL COMMENT '',
    origin_zip varchar NOT NULL COMMENT '',
    destination_state varchar NOT NULL COMMENT '',
    destination_zip varchar NOT NULL COMMENT '',
    package_weight int NOT NULL COMMENT '',
    PRIMARY KEY (id)
)
DISTRIBUTED BY HASH(id);
CREATE TABLE order_table (
    customer_id bigint NOT NULL COMMENT '',
    order_id varchar NOT NULL COMMENT '',
    order_time timestamp NOT NULL COMMENT '',
    order_amount double NOT NULL COMMENT '',
    order_type varchar NOT NULL COMMENT '',
    address varchar NOT NULL COMMENT '',
    city varchar NOT NULL COMMENT '',
    order_season bigint COMMENT '',
    PRIMARY KEY (customer_id, order_time)
)
DISTRIBUTED BY HASH(customer_id)
PARTITION BY VALUE(DATE_FORMAT(order_time, '%Y%m%d')) LIFECYCLE 30;
When a primary key conflict occurs, AnalyticDB for MySQL updates the existing row instead of reporting an error. If an INSERT FROM SELECT ETL task fails, the entire task must be retried.
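An ETL task of this kind is a single INSERT FROM SELECT statement that reads from a DLA table over OSS and writes into the AnalyticDB for MySQL table. The sketch below assumes a hypothetical DLA source table named oss_orders in a schema named oss_demo_schema; the season derivation is only an example of a cleansing step.

```sql
-- Read raw order data from OSS through DLA, derive order_season,
-- and write the result into AnalyticDB for MySQL.
-- On a primary key conflict, the existing row is updated.
INSERT INTO adb_demo_schema.order_table
SELECT customer_id,
       order_id,
       order_time,
       order_amount,
       order_type,
       address,
       city,
       CEIL(MONTH(order_time) / 3) AS order_season
FROM oss_demo_schema.oss_orders;
```

If this statement fails partway through, rerunning it is safe: rows that were already written are simply updated on the retry.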
Data written to AnalyticDB for MySQL becomes visible to queries after a short delay. The length of the delay depends on the cluster configuration of AnalyticDB for MySQL.
We recommend that you split a large ETL task into small batches. For example, if your business permits, 200 GB of OSS data can be split into 100 folders of 2 GB each. In DLA, you can then create 100 external tables, one per folder, and run the ETL task for each table separately. If a single ETL task fails, you only need to retry that task instead of reprocessing all the data.
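The batching approach above can be sketched as follows. The table name, OSS path, file format, and column list are all placeholders; repeat the pair of statements for each folder, adjusting the batch suffix and LOCATION.

```sql
-- One DLA external table per OSS folder (batch_001 of 100).
-- The CSV layout and paths here are assumptions for illustration.
CREATE EXTERNAL TABLE oss_demo_schema.oss_orders_batch_001 (
    customer_id bigint,
    order_id varchar,
    order_time timestamp,
    order_amount double,
    order_type varchar,
    address varchar,
    city varchar
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'oss://my-bucket/etl-data/orders/batch_001/';

-- ETL for this batch only; on failure, retry just this statement.
INSERT INTO adb_demo_schema.order_table
SELECT customer_id, order_id, order_time, order_amount,
       order_type, address, city,
       CEIL(MONTH(order_time) / 3) AS order_season
FROM oss_demo_schema.oss_orders_batch_001;
```

Because a primary key conflict results in an update, retrying a failed batch does not create duplicate rows.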