
E-MapReduce:Read from and write to MaxCompute

Last Updated:Dec 22, 2025

E-MapReduce (EMR) Serverless Spark includes a built-in MaxCompute DataSource that is based on the Spark DataSource V2 API. To connect to MaxCompute, you must add the required configurations during development. This topic describes how to read data from and write data to MaxCompute using EMR Serverless Spark.

Background information

MaxCompute, formerly known as Open Data Processing Service (ODPS), is a fast and fully managed data warehouse solution that can process exabyte-scale data. It provides storage and batch processing for structured data, data warehousing solutions, and analytics modeling services. For more information, see What is MaxCompute?.

Limits

  • This topic applies only to the following engine versions:

    • esr-4.x: esr-4.6.0 and later.

    • esr-3.x: esr-3.5.0 and later.

    • esr-2.x: esr-2.9.0 and later.

  • The operations in this topic require you to enable the open storage feature for MaxCompute. For more information, see Enable open storage.

  • The MaxCompute Endpoint that you use must support the Storage API. If it does not, switch to an Endpoint that does. For more information, see Data transmission resources.

Notes

If you use the pay-as-you-go billing method for open storage, you are charged for the logical size of the data that you read or write after you exceed the free quota of 1 TB.
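As a quick sanity check, the billable volume described above can be sketched as follows. This is a hypothetical helper, not the official billing formula; the 1 TB free quota comes from the note above, and computing 1 TB as 1 TiB (1024^4 bytes) is an assumption.

```python
# Sketch: estimate the logical bytes charged under pay-as-you-go open storage.
# The 1 TB free quota is from the note above; treating 1 TB as 1024**4 bytes
# is an assumption, and actual billing is determined by MaxCompute.

FREE_QUOTA_BYTES = 1024**4  # 1 TB free quota

def billable_bytes(logical_bytes_read_written: int) -> int:
    """Logical bytes that exceed the free quota and are therefore charged."""
    return max(0, logical_bytes_read_written - FREE_QUOTA_BYTES)
```

For example, reading and writing a logical total of 2 TB would leave 1 TB subject to charges.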

Procedure

Step 1: Create a session to connect to MaxCompute

You can create an SQL session or a Notebook session to connect to MaxCompute. For more information, see Session Manager.

Create an SQL session to connect to MaxCompute

  1. Go to the Sessions page.

    1. Log on to the EMR console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. On the Spark page, click the name of the workspace that you want to manage.

    4. In the left-side navigation pane of the EMR Serverless Spark page, choose Operation Center > Sessions.

  2. On the SQL Session tab, click Create SQL Session.

  3. On the Create SQL Session page, configure the parameters and click Create.

    • Name: The name of the SQL session. Example: mc_sql_compute.

    • Spark Configuration: The Spark configurations used to connect to MaxCompute.

      Important: To access a MaxCompute project for which the schema feature (also known as the three-layer model) is enabled, you must also set the spark.sql.catalog.odps.enableNamespaceSchema parameter to true in the Spark configurations. For more information about the parameters, see Spark Connector. For more information about schemas, see Schema operations.

    spark.sql.catalog.odps                        org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog
    spark.sql.extensions                          org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions
    spark.sql.sources.partitionOverwriteMode      dynamic
    spark.hadoop.odps.tunnel.quota.name           pay-as-you-go
    spark.hadoop.odps.project.name                <project_name>
    spark.hadoop.odps.end.point                   https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api
    spark.hadoop.odps.access.id                   <accessId>
    spark.hadoop.odps.access.key                  <accessKey>

    Replace the following information as needed:

    • <project_name>: Your MaxCompute project name.

    • https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api: Your MaxCompute Endpoint. For more information, see Endpoints.

    • <accessId>: The AccessKey ID of the Alibaba Cloud account used to access the MaxCompute service.

    • <accessKey>: The AccessKey secret of the Alibaba Cloud account used to access the MaxCompute service.
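If you script session setup or local testing, the placeholder substitution above can be captured in a small helper. This is a hypothetical convenience, not part of EMR Serverless Spark: the keys mirror the configuration block above, and the helper simply rejects any <...> placeholder that was left unreplaced.

```python
# Sketch: fill in the MaxCompute connection placeholders and catch any that
# were left unreplaced. The keys mirror the Spark configuration shown above.

CONFIG_TEMPLATE = {
    "spark.sql.catalog.odps": "org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog",
    "spark.sql.extensions": "org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions",
    "spark.sql.sources.partitionOverwriteMode": "dynamic",
    "spark.hadoop.odps.tunnel.quota.name": "pay-as-you-go",
    "spark.hadoop.odps.project.name": "<project_name>",
    "spark.hadoop.odps.end.point": "https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api",
    "spark.hadoop.odps.access.id": "<accessId>",
    "spark.hadoop.odps.access.key": "<accessKey>",
}

def render_config(project: str, endpoint: str, access_id: str, access_key: str) -> dict:
    """Return the Spark configuration with placeholders replaced."""
    config = dict(CONFIG_TEMPLATE)
    config["spark.hadoop.odps.project.name"] = project
    config["spark.hadoop.odps.end.point"] = endpoint
    config["spark.hadoop.odps.access.id"] = access_id
    config["spark.hadoop.odps.access.key"] = access_key
    unreplaced = [key for key, value in config.items() if value.startswith("<")]
    if unreplaced:
        raise ValueError(f"placeholders not replaced: {unreplaced}")
    return config
```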

Create a Notebook session to connect to MaxCompute

  1. Go to the Notebook Sessions tab.

    1. Log on to the EMR console.

    2. In the left-side navigation pane, choose EMR Serverless > Spark.

    3. On the Spark page, find the desired workspace and click the name of the workspace.

    4. In the left-side navigation pane of the EMR Serverless Spark page, choose Operation Center > Sessions.

    5. Click the Notebook Sessions tab.

  2. Click Create Notebook Session.

  3. On the Create Notebook Session page, configure the parameters and click Create.

    • Name: The name of the Notebook session. Example: mc_notebook_compute.

    • Spark Configuration: The Spark configurations used to connect to MaxCompute.

      Important: To access a MaxCompute project for which the schema feature is enabled, you must also set the spark.sql.catalog.odps.enableNamespaceSchema parameter to true in the Spark configurations. For more information about the parameters, see Spark Connector. For more information about schemas, see Schema operations.

    spark.sql.catalog.odps                        org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog
    spark.sql.extensions                          org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions
    spark.sql.sources.partitionOverwriteMode      dynamic
    spark.hadoop.odps.tunnel.quota.name           pay-as-you-go
    spark.hadoop.odps.project.name                <project_name>
    spark.hadoop.odps.end.point                   https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api
    spark.hadoop.odps.access.id                   <accessId>
    spark.hadoop.odps.access.key                  <accessKey>

    Replace the following information as needed:

    • <project_name>: Your MaxCompute project name.

    • https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api: Your MaxCompute Endpoint. For more information, see Endpoints.

    • <accessId>: The AccessKey ID of the Alibaba Cloud account used to access the MaxCompute service.

    • <accessKey>: The AccessKey secret of the Alibaba Cloud account used to access the MaxCompute service.

Step 2: Query or write data in MaxCompute

Use SparkSQL to write and query data in MaxCompute

  1. On the EMR Serverless Spark page, click Data Development in the left-side navigation pane.

  2. On the Development tab, click the icon for creating a new development item.

  3. Create a SparkSQL job.

    1. In the dialog box that appears, enter a name, such as mc_load_task. For Type, keep the default, SparkSQL, and then click OK.

    2. Copy the following code to the tab of the new SparkSQL job (mc_load_task).

      CREATE TABLE odps.default.mc_table (name STRING, num BIGINT);
      
      INSERT INTO odps.default.mc_table (name, num) VALUES ('Alice', 100),('Bob', 200);
      
      SELECT * FROM odps.default.mc_table;
    3. From the database drop-down list, select a database. From the Compute drop-down list, select the SQL session that you created in Step 1: Create a session to connect to MaxCompute, such as mc_sql_compute.

    4. Click Run to execute the SparkSQL job.

      After the job runs successfully, the results appear on the Execution Results tab.


  4. View the created table in the MaxCompute console.

    1. Log on to the MaxCompute console. In the upper-left corner, select a region.

    2. On the Projects page, find the project you created and click Manage in the Actions column.

    3. Click the Tables tab.

      You can see a new table named mc_table in the MaxCompute console.

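The session configuration above sets spark.sql.sources.partitionOverwriteMode to dynamic. That setting does not affect the unpartitioned mc_table used in this walkthrough, but it matters as soon as you write to a partitioned table: INSERT OVERWRITE then replaces only the partitions present in the incoming rows instead of truncating the whole table. The following sketch shows what such statements could look like; the table mc_part_table and partition column pt are illustrative names, not part of the walkthrough, and the statements are written here as Python strings so they can be passed to spark.sql (or pasted, without the quotes, into the SQL editor).

```python
# Sketch: dynamic partition overwrite against a hypothetical partitioned
# table. mc_part_table and the partition column pt are illustrative names.

create_sql = """
CREATE TABLE odps.default.mc_part_table (name STRING, num BIGINT)
PARTITIONED BY (pt STRING)
"""

# With spark.sql.sources.partitionOverwriteMode=dynamic, this statement
# overwrites only the partitions that occur in the inserted rows
# (here pt='20250101'); all other partitions are left untouched.
overwrite_sql = """
INSERT OVERWRITE TABLE odps.default.mc_part_table PARTITION (pt)
VALUES ('Alice', 100, '20250101')
"""
```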

Use a Notebook to write and query data in MaxCompute

  1. On the EMR Serverless Spark page, click Data Development in the left-side navigation pane.

  2. On the Development tab, click the icon for creating a new development item.

  3. Create a Notebook.

    1. In the dialog box that appears, enter a name, such as mc_load_task, select Interactive Development > Notebook for Type, and then click OK.

    2. From the session drop-down list, select the running Notebook session that you created in Step 1: Create a session to connect to MaxCompute, such as mc_notebook_compute.

    3. Write and execute the code.

      1. In a Python cell, enter the following command to create a table.

        spark.sql("""
        CREATE TABLE odps.default.mc_table (name STRING, num BIGINT)
        """)
        
      2. In a new Python cell, enter the following command to insert data.

        spark.sql("INSERT INTO odps.default.mc_table (name, num) VALUES ('Alice', 100),('Bob', 200)")
      3. In a new Python cell, enter the following command to query data.

        spark.sql("SELECT * FROM odps.default.mc_table").show()

        After the query is successfully executed, the results are displayed on the Execution Results tab.


  4. View the created table in the MaxCompute console.

    1. Log on to the MaxCompute console. In the upper-left corner, select a region.

    2. On the Projects page, find the project you created and click Manage in the Actions column.

    3. Click the Tables tab.

      You can see a new table named mc_table in the MaxCompute console.

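In a notebook, you may prefer to build the INSERT statement from Python data rather than hard-coding the literals as in the cell above. A minimal sketch, assuming string and integer columns only: sql_literal and insert_sql are hypothetical helpers, not part of any API, and not a general-purpose SQL escaper.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (strings and integers only)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, int):
        return str(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")

def insert_sql(table, columns, rows):
    """Build an INSERT INTO statement like the one used in the notebook cell."""
    cols = ", ".join(columns)
    values = ",".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({cols}) VALUES {values}"

# Reproduces the statement from the notebook cell; in a session you would
# pass the result to spark.sql(stmt).
stmt = insert_sql(
    "odps.default.mc_table", ["name", "num"], [("Alice", 100), ("Bob", 200)]
)
```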

FAQ

Why do I receive an "Access Denied" error when I query a MaxCompute table?

  • Symptom: When you query a MaxCompute table, the following error message appears.

    Access Denied - Not allowed to use storage api service on current endpoint
  • Cause: The current user is not authorized to use the MaxCompute Storage API service, or the Endpoint in use does not support the Storage API feature.

  • Solution:

    • Check whether the open storage feature is enabled.

      In the MaxCompute console, go to Tenants > Tenant Property. If the feature is not enabled, refer to Enable open storage to enable and configure it.

    • Check whether the current Endpoint supports the Storage API feature. If not, switch to an Endpoint that supports the Storage API. For more information, see Supported regions.

References

This topic provides examples that use SparkSQL and Notebook jobs. To learn how to read data from and write data to MaxCompute using other types of jobs, see Develop a batch or streaming job.