AnalyticDB for MySQL: Access MaxCompute

Last Updated: Jan 23, 2024

This topic describes how to use AnalyticDB for MySQL Data Lakehouse Edition (V3.0) Spark SQL to read data from and write data to MaxCompute.

Background information

MaxCompute is a fast, fully managed data warehousing service that can process terabytes or petabytes of data.

Prerequisites

  • An AnalyticDB for MySQL Data Lakehouse Edition (V3.0) cluster is created. For more information, see Create a cluster.

  • A database account is created.

  • A job resource group is created. For more information, see Create a resource group.

  • A MaxCompute project is created in the same region as the Data Lakehouse Edition (V3.0) cluster. For more information, see Create a MaxCompute project. In this topic, a project named spark_on_maxcompute is created in the China (Hangzhou) region.

Procedure

You can choose to read data from and write data to MaxCompute in batch or interactive mode. For information about the differences between the two modes, see Spark SQL execution modes.

Use the batch mode to access MaxCompute

Step 1: Configure a Spark job

  1. Log on to the AnalyticDB for MySQL console.

  2. In the upper-left corner of the page, select the region where the cluster resides.

  3. In the left-side navigation pane, click Clusters.

  4. On the Data Lakehouse Edition (V3.0) tab, find the cluster and click the cluster ID.

  5. In the left-side navigation pane, choose Job Development > SQL Development.

  6. On the SQLConsole tab, select the Spark engine and the job resource group.

  7. On the SQLConsole tab, enter the following configurations in the SET Key=Value; format.

    SET spark.sql.catalog.odps = org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog;
    SET spark.hadoop.odps.access.id = LTAI5tMnzDS5EyqqMsT****;
    SET spark.hadoop.odps.access.key = A2kHFzEgFidOKrDKqAbJIPt8****;
    SET spark.hadoop.odps.endpoint = http://service.cn-hangzhou.maxcompute.aliyun-inc.com/api;
    SET spark.hadoop.odps.project = spark_on_maxcompute;
    SET spark.adb.connectors = odps;

    The following parameters are used:

    • spark.sql.catalog.<catalog_name>: the catalog implementation class that Spark SQL uses to access the data source. Set the value to org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog. You can specify a custom value for catalog_name. In this example, odps is used.

    • spark.hadoop.odps.access.id: the AccessKey ID of an Alibaba Cloud account or a RAM user that is used to access MaxCompute. For information about how to obtain your AccessKey ID, see Obtain an AccessKey pair.

    • spark.hadoop.odps.access.key: the AccessKey secret of an Alibaba Cloud account or a RAM user that is used to access MaxCompute. For information about how to obtain your AccessKey secret, see Obtain an AccessKey pair.

    • spark.hadoop.odps.endpoint: the Virtual Private Cloud (VPC) endpoint of the MaxCompute project. For information about the VPC endpoint of each region, see Endpoints.

    • spark.hadoop.odps.project: the name of the MaxCompute project.

    • spark.adb.connectors: the data source of the Spark job. Set the value to odps.

  8. In the message that appears, click Proceed.

Step 2: Use Spark SQL to read data from and write data to MaxCompute

  1. Enter the following statement and click Execute (F8). In the message that appears, click Proceed to create a MaxCompute table.

    CREATE TABLE odps.spark_on_maxcompute.spark_test (k1 INT, k2 STRING)
    PARTITIONED BY (part INT);
    Important

    In this example, the odps.spark_on_maxcompute.spark_test table name consists of three parts:

    • The first part is the catalog name, which is specified in the spark.sql.catalog.<catalog_name> parameter. In this example, odps is used.

    • The second part is the name of the MaxCompute project.

    • The third part is the name of the MaxCompute table.

    You can also reference the table in the <maxcompute_project_name>.<table_name> format after you execute the USE <catalog_name>; statement. If you also execute the USE <maxcompute_project_name>; statement, you can reference the table by the table name alone, as shown in the following sketch.
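    For example, the following statements are a minimal sketch, based on the catalog name and project name that are used in this topic, that creates the same table by using only the table name:

    USE odps;                 -- the catalog name specified in spark.sql.catalog.odps
    USE spark_on_maxcompute;  -- the MaxCompute project name
    CREATE TABLE spark_test (k1 INT, k2 STRING) PARTITIONED BY (part INT);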

  2. Enter the following statement and click Execute (F8). In the message that appears, click Proceed to write data to the MaxCompute table.

    INSERT INTO odps.spark_on_maxcompute.spark_test
    VALUES (1, 'aaa', 101), (2, 'bbb', 102);
  3. Enter the following statement and click Execute (F8). In the message that appears, click Proceed to query data in the MaxCompute table.

    SELECT * FROM odps.spark_on_maxcompute.spark_test;
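
    Because the preceding INSERT statement writes two rows, the query is expected to return a result similar to the following output:

    k1    k2     part
    1     aaa    101
    2     bbb    102
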
    Note

    In the DataWorks console, you can go to the MaxCompute project to check whether the table is created and whether the data is written to it.

Use the interactive mode to access MaxCompute

Step 1: Configure a Spark job

  1. Log on to the AnalyticDB for MySQL console.

  2. In the upper-left corner of the page, select the region where the cluster resides.

  3. In the left-side navigation pane, click Clusters.

  4. On the Data Lakehouse Edition (V3.0) tab, find the cluster and click the cluster ID.

  5. In the left-side navigation pane, choose Clusters > Resource Groups.

  6. Find the job resource group that you want to manage and click Advanced Settings in the Actions column.

  7. Click the icon next to Other Settings and configure the following parameters.

    • spark.sql.catalog.<catalog_name>: the catalog implementation class that Spark SQL uses to access the data source. Set the value to org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog. You can specify a custom value for catalog_name. In this example, odps is used.

    • spark.hadoop.odps.access.id: the AccessKey ID of an Alibaba Cloud account or a RAM user that is used to access MaxCompute. For information about how to obtain your AccessKey ID, see Obtain an AccessKey pair.

    • spark.hadoop.odps.access.key: the AccessKey secret of an Alibaba Cloud account or a RAM user that is used to access MaxCompute. For information about how to obtain your AccessKey secret, see Obtain an AccessKey pair.

    • spark.hadoop.odps.endpoint: the Virtual Private Cloud (VPC) endpoint of the MaxCompute project. For information about the VPC endpoint of each region, see Endpoints.

    • spark.hadoop.odps.project: the name of the MaxCompute project.

    • spark.adb.connectors: the data source of the Spark job. Set the value to odps.

  8. Click Start. The settings take effect when Running is displayed on the Advanced Settings (Spark Thrift Server Settings) page.

Step 2: Use Spark SQL to read data from and write data to MaxCompute

  1. In the left-side navigation pane, choose Job Development > SQL Development.

  2. On the SQLConsole tab, select the Spark engine and the job resource group.

  3. Enter the following statement and click Execute (F8) to create a MaxCompute table.

    CREATE TABLE odps.spark_on_maxcompute.spark_test (k1 INT, k2 STRING)
    PARTITIONED BY (part INT);
    Important

    In this example, the odps.spark_on_maxcompute.spark_test table name consists of three parts:

    • The first part is the catalog name, which is specified in the spark.sql.catalog.<catalog_name> parameter. In this example, odps is used.

    • The second part is the name of the MaxCompute project.

    • The third part is the name of the MaxCompute table.

    You can also reference the table in the <maxcompute_project_name>.<table_name> format after you execute the USE <catalog_name>; statement. If you also execute the USE <maxcompute_project_name>; statement, you can reference the table by the table name alone, as shown in the following sketch.
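    For example, the following statements are a minimal sketch, based on the catalog name and project name that are used in this topic, that creates the same table by using only the table name:

    USE odps;                 -- the catalog name specified in spark.sql.catalog.odps
    USE spark_on_maxcompute;  -- the MaxCompute project name
    CREATE TABLE spark_test (k1 INT, k2 STRING) PARTITIONED BY (part INT);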

  4. Enter the following statement and click Execute (F8) to write data to the MaxCompute table.

    INSERT INTO odps.spark_on_maxcompute.spark_test
    VALUES (1, 'aaa', 101), (2, 'bbb', 102);
  5. Enter the following statement and click Execute (F8) to query data in the MaxCompute table.

    SELECT * FROM odps.spark_on_maxcompute.spark_test;
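
    Because the preceding INSERT statement writes two rows, the query is expected to return a result similar to the following output:

    k1    k2     part
    1     aaa    101
    2     bbb    102
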
    Note

    In the DataWorks console, you can go to the MaxCompute project to check whether the table is created and whether the data is written to it.

Supported SQL statements and data types

Supported SQL statements

  • SHOW DATABASES

  • SHOW TABLES

  • CREATE TABLE

  • DROP TABLE

  • SELECT

  • INSERT INTO

  • INSERT INTO PARTITION

Important

The INSERT OVERWRITE PARTITION statement is not supported.
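
For example, the following statement is a minimal sketch, based on the spark_test table that is created in this topic, that writes an additional row to a specific partition by using INSERT INTO PARTITION:

INSERT INTO odps.spark_on_maxcompute.spark_test PARTITION (part = 103)
VALUES (3, 'ccc');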

Supported field types

  • BOOLEAN

  • DOUBLE

  • INT

  • BIGINT

  • STRING

  • DECIMAL

  • DATE

  • TIMESTAMP

  • FLOAT

  • SMALLINT

  • TINYINT

  • BINARY

  • ARRAY

  • MAP

  • STRUCT

Supported partition field types

  • INT

  • BIGINT

  • STRING

  • TINYINT

  • SMALLINT
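
For example, the following statement is a hypothetical sketch (the type_demo table name is an assumption for illustration) that combines several of the supported field types with a STRING partition column:

CREATE TABLE odps.spark_on_maxcompute.type_demo (
  c_boolean BOOLEAN,
  c_bigint  BIGINT,
  c_decimal DECIMAL(10, 2),
  c_date    DATE,
  c_array   ARRAY<STRING>,
  c_struct  STRUCT<x: INT, y: DOUBLE>
)
PARTITIONED BY (dt STRING);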