All Products
Search
Document Center

Procedure

Last Updated: Apr 05, 2020

Step 1: Create a MaxCompute schema

  1. Log on to the Data Lake Analytics (DLA) console.

  2. In the left-side navigation pane, click Endpoint. In the endpoint list, find the target endpoint and click Log on in DMS. On the DMS for Data Lake Analytics page, execute the following SQL statement to create a schema:

    1. CREATE SCHEMA maxcpmpute_test WITH DBPROPERTIES (
    2. catalog = 'odps',
    3. location = 'http://dt.<maxcompute-region>.maxcompute.aliyun.com',
    4. endpoint = 'http://service.<maxcompute-region>.maxcompute.aliyun.com/api',
    5. project = 'dla_project',
    6. user = '<access-key>',
    7. password = '<access-secret>'
    8. )
    Parameter Description
    catalog The type of the schema.
    location The classic network Tunnel endpoint in the region where MaxCompute is activated.

    For the classic network Tunnel endpoints in each region, see MaxCompute endpoints

    endpoint The classic network endpoint in the region where MaxCompute is activated.

    For the classic network endpoints in each region, see MaxCompute endpoints

    project The name of the MaxCompute project.
    user The AccessKey ID of the Alibaba Cloud account used to access MaxCompute.
    password The AccessKey secret corresponding to the AccessKey ID.

Step 2: Synchronize the MaxCompute table information

Note:

  • External tables in MaxCompute cannot be synchronized to DLA.

  • If the MaxCompute table to be synchronized is a partition table, the partition columns are defined as normal columns in the DLA table creation statement, and the definitions of the partition columns are sorted after all other columns.

Method 1: Use MSCK REPAIR to synchronize all the tables of the MaxCompute project

  1. msck repair database <schema_name>

Run the preceding command to synchronize all the tables in the MaxCompute project corresponding to schema_name to DLA.

Method 2: Synchronize the information of a single table (non-partition table)

You can use CREATE TABLE to synchronize the information of a specified table. For example, you can use the following SQL statement to synchronize the bank_data table information.

  1. CREATE external TABLE bank_data
  2. (
  3. age BIGINT COMMENT 'age',
  4. job STRING COMMENT 'job type',
  5. marital STRING COMMENT 'marital status',
  6. education STRING COMMENT 'education level',
  7. default STRING COMMENT 'credit card ownership',
  8. housing STRING COMMENT 'mortgage',
  9. loan STRING COMMENT 'loan',
  10. contact STRING COMMENT 'contact information',
  11. month STRING COMMENT 'month',
  12. day_of_week STRING COMMENT 'day of the week',
  13. duration STRING COMMENT 'duration',
  14. campaign BIGINT COMMENT 'contact times during the campaign',
  15. pdays DOUBLE COMMENT 'interval from the last contact',
  16. previous DOUBLE COMMENT 'previous contact times with the customer',
  17. poutcome STRING COMMENT 'result of the previous marketing campaign',
  18. emp_var_rate DOUBLE COMMENT 'employment change rate',
  19. cons_price_idx DOUBLE COMMENT 'consumer price index',
  20. cons_conf_idx DOUBLE COMMENT 'consumer confidence index',
  21. euribor3m DOUBLE COMMENT 'euro deposit rate',
  22. nr_employed DOUBLE COMMENT 'number of employees',
  23. y BIGINT COMMENT 'whether time deposit is available'
  24. );

Method 2: Synchronize the information of a single table (partition table)

For example, MaxCompute has the following partition table:

  1. CREATE TABLE IF NOT EXISTS part_datatype_test_complex (
  2. int_test int,
  3. bigint_test bigint,
  4. double_test double,
  5. string_test string,
  6. datetime_test datetime,
  7. boolean_test boolean,
  8. array_test array<INT>,
  9. map_test map<string, string>,
  10. struct_test_1 struct<x:INT, y:INT>,
  11. struct_test_2 struct<x:string, y:string>
  12. )
  13. partitioned by (pt bigint, region string);

The corresponding table creation statement in DLA is as follows:

  1. CREATE EXTERNAL TABLE IF NOT EXISTS part_datatype_test_complex (
  2. int_test int,
  3. bigint_test bigint,
  4. double_test double,
  5. string_test string,
  6. datetime_test datetime,
  7. boolean_test boolean,
  8. array_test string,
  9. map_test string,
  10. struct_test_1 string,
  11. struct_test_2 string,
  12. pt bigint,
  13. region string
  14. )

Step 3: Read the MaxCompute table data

After the table is created, you can use the MySQL client or MySQL CLI to connect to the MaxCompute schema in DLA, and then use SELECT to read the MaxCompute table data.