This topic describes how to synchronize data of all tables with the same schema in a MaxCompute project to Data Lake Analytics (DLA). The schema in the MaxCompute project is mapped to the schema specified by schema_name in DLA.

Background information

MaxCompute, formerly referred to as ODPS, is a fast and fully hosted data warehousing solution. It can process terabytes or petabytes of data. You can create a MaxCompute schema and MaxCompute table in Data Lake Analytics (DLA). Then, you can use a MySQL client or MySQL command-line tool to connect to the MaxCompute schema in DLA. Then, you can execute standard SQL statements to read data from MaxCompute.

Preparations

Before you read data from and write data to MaxCompute tables, make the following preparations in DLA:
  1. Activate DLA.
  2. Manage DLA accounts

Before you read data from and write data to MaxCompute tables, make the following preparations in MaxCompute:
  1. Activate MaxCompute and DataWorks
  2. Create tables
  3. Use DataWorks to create a MaxCompute table and prepare test data. For more information about how to create a MaxCompute table in DataWorks, see Create a MaxCompute table.

    This example shows how to execute the following DDL statement to create the bank_data table. For more information about how to create this table, see "Create the bank_data table" in Create tables and import data.

    CREATE TABLE IF NOT EXISTS bank_data
     (
      age             BIGINT COMMENT 'Age',
      job             STRING COMMENT 'Job type',
      marital         STRING COMMENT 'Marital status',
      education       STRING COMMENT 'Education level',
      default         STRING COMMENT 'Credit card used',
      housing         STRING COMMENT 'Mortgage',
      loan            STRING COMMENT 'Loan',
      contact         STRING COMMENT 'Contact',
      month           STRING COMMENT 'Month',
      day_of_week     STRING COMMENT 'Day of the week',
      duration        STRING COMMENT 'Duration',
      campaign        BIGINT COMMENT 'Number of contacts during the campaign',
      pdays           DOUBLE COMMENT 'Interval from the last contact',
      previous        DOUBLE COMMENT 'Number of contacts with the customer',
      poutcome        STRING COMMENT 'Result of the previous marketing campaign',
      emp_var_rate    DOUBLE COMMENT 'Employment variation rate',
      cons_price_idx  DOUBLE COMMENT 'Consumer price index',
      cons_conf_idx   DOUBLE COMMENT 'Consumer confidence index',
      euribor3m       DOUBLE COMMENT 'Euro deposit rate',
      nr_employed     DOUBLE COMMENT 'Number of employees',
      y               BIGINT COMMENT 'Time deposit available or not'
     );​

    Import the banking data to the bank_data table. For more information, see "Upload a local file to import its data to the bank_data table" in Create tables and import data.

Step 1: Create a MaxCompute schema in DLA

  1. Log on to the DLA console.
  2. In the left-side navigation pane, choose Serverless Presto > SQL access point. On the SQL access point page, find your virtual private cloud (VPC) in the VPC Network section and click Log on in DMS in the Actions column. Then, execute the following statement to create a schema:
    CREATE SCHEMA maxcpmpute_test WITH DBPROPERTIES (
    catalog = 'odps',
    project = 'dla_project',
    user = '<access-key>',
    password = '<access-secret>'
    )
    Parameter Description
    catalog The type of the schema.
    project The name of the MaxCompute project.
    user The AccessKey ID of the Alibaba Cloud account that is used to access MaxCompute.
    password The AccessKey secret that corresponds to the AccessKey ID of the Alibaba Cloud account.

Step 2: Synchronize data of tables from MaxCompute to DLA

Note
  • Data of external tables in MaxCompute cannot be synchronized to DLA.
  • If you want to synchronize data of a partitioned table from MaxCompute to DLA, the partition key columns in the table are defined as common columns and listed after other columns in the statement for creating a table in DLA.

Method 1: Run the MSCK REPAIR command to synchronize data of all tables from a MaxCompute project to DLA

msck repair database <schema_name>

Method 2: Synchronize data of a non-partitioned table from MaxCompute to DLA

You can execute the CREATE TABLE statement to synchronize data of a specified non-partitioned table from MaxCompute to DLA. For example, you can execute the following SQL statement to synchronize data of the bank_data table from MaxCompute to DLA.

CREATE external TABLE bank_data
(
 age             BIGINT COMMENT 'Age',
 job             STRING COMMENT 'Job type',
 marital         STRING COMMENT 'Marital status',
 education       STRING COMMENT 'Education level',
 default         STRING COMMENT 'Credit card used',
 housing         STRING COMMENT 'Mortgage',
 loan            STRING COMMENT 'Loan',
 contact         STRING COMMENT 'Contact information',
 month           STRING COMMENT 'Month',
 day_of_week     STRING COMMENT 'Day of the week',
 duration        STRING COMMENT 'Duration',
 campaign        BIGINT COMMENT 'Number of contacts during the campaign',
 pdays           DOUBLE COMMENT 'Interval from the last contact',
 previous        DOUBLE COMMENT 'Number of contacts with the customer',
 poutcome        STRING COMMENT 'Result of the previous marketing campaign',
 emp_var_rate    DOUBLE COMMENT 'Employment variation rate',
 cons_price_idx  DOUBLE COMMENT 'Consumer price index',
 cons_conf_idx   DOUBLE COMMENT 'Consumer confidence index',
 euribor3m       DOUBLE COMMENT 'Euro deposit rate',
 nr_employed     DOUBLE COMMENT 'Number of employees',
 y               BIGINT COMMENT 'Time deposit available or not'
);

Method 2: Synchronize data of a partitioned table from MaxCompute to DLA

The following partitioned table is created in MaxCompute:

CREATE TABLE IF NOT EXISTS part_datatype_test_complex (
  int_test int,
  bigint_test bigint,
  double_test double,
  string_test string,
  datetime_test datetime,
  boolean_test boolean,
  array_test array<INT>,
  map_test map<string, string>,
  struct_test_1 struct<x:INT, y:INT>,
  struct_test_2 struct<x:string, y:string>
)
partitioned by (pt bigint, region string);

You can execute the following statement to create a table in DLA:

CREATE EXTERNAL TABLE IF NOT EXISTS part_datatype_test_complex (
  int_test int,
  bigint_test bigint,
  double_test double,
  string_test string,
  datetime_test datetime,
  boolean_test boolean,
  array_test string,
  map_test string,
  struct_test_1 string,
  struct_test_2 string
)
partitioned by (pt bigint, region string);

Step 3: Read data from the MaxCompute table

After the table is created in DLA, you can use a MySQL client or MySQL command-line tool to connect to the MaxCompute schema in DLA and execute SELECT statements to read data of the table from MaxCompute to DLA.