E-MapReduce (EMR) Serverless Spark includes a built-in MaxCompute DataSource that is based on the Spark DataSource V2 API. To connect to MaxCompute, you must add the required configurations during development. This topic describes how to read data from and write data to MaxCompute using EMR Serverless Spark.
Background information
MaxCompute, formerly known as Open Data Processing Service (ODPS), is a fast and fully managed data warehouse solution that can process exabyte-scale data. It provides storage and batch processing for structured data, data warehousing solutions, and analytics modeling services. For more information, see What is MaxCompute?.
Prerequisites
A workspace is created in EMR Serverless Spark. For more information, see Create a workspace.
A MaxCompute project is created and open storage is enabled. For more information, see Create a MaxCompute project and Enable open storage.
The examples in this topic use the pay-as-you-go billing method for open storage.
Limits
This topic applies only to the following Spark engine versions:
esr-4.x: esr-4.6.0 and later.
esr-3.x: esr-3.5.0 and later.
esr-2.x: esr-2.9.0 and later.
The operations in this topic require you to enable the open storage feature for MaxCompute. For more information, see Enable open storage.
The MaxCompute endpoint that you use must support the Storage API. If it does not, switch to an endpoint that does. For more information, see Data transmission resources.
Notes
If you use the pay-as-you-go billing method for open storage, you are charged for the logical size of the data that you read or write after you exceed the free quota of 1 TB.
Procedure
Step 1: Create a session to connect to MaxCompute
You can create an SQL session or a Notebook session to connect to MaxCompute. For more information, see Session Manager.
Create an SQL session to connect to MaxCompute
Go to the Sessions page.
Log on to the EMR console.
In the left-side navigation pane, choose EMR Serverless > Spark.
On the Spark page, click the name of the workspace that you want to manage.
In the left-side navigation pane of the EMR Serverless Spark page, choose Operation Center > Sessions.
On the SQL Session tab, click Create SQL Session.
On the Create SQL Session page, configure the parameters and click Create.
Parameter
Description
Name
The name of the SQL session. For example, mc_sql_compute.
Spark Configuration
The Spark configurations to connect to MaxCompute.

spark.sql.catalog.odps                      org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog
spark.sql.extensions                        org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions
spark.sql.sources.partitionOverwriteMode    dynamic
spark.hadoop.odps.tunnel.quota.name         pay-as-you-go
spark.hadoop.odps.project.name              <project_name>
spark.hadoop.odps.end.point                 https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api
spark.hadoop.odps.access.id                 <accessId>
spark.hadoop.odps.access.key                <accessKey>

Important: To access a MaxCompute project for which the three-layer model (schema feature) is enabled, you must also set the spark.sql.catalog.odps.enableNamespaceSchema parameter to true in the Spark configurations. For more information about the parameters, see Spark Connector. For more information about schemas, see Schema operations.

Replace the following values as needed:
<project_name>: Your MaxCompute project name.
https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api: Your MaxCompute endpoint. For more information, see Endpoints.
<accessId>: The AccessKey ID of the Alibaba Cloud account used to access MaxCompute.
<accessKey>: The AccessKey secret of the Alibaba Cloud account used to access MaxCompute.
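If you build a Spark session yourself outside the console, the same connector settings can be applied programmatically. The following is a minimal PySpark sketch, assuming an environment where the MaxCompute (ODPS) Spark connector is on the classpath; the placeholder values are the same as above and must be replaced:

from pyspark.sql import SparkSession

# Minimal sketch: the connector settings from the table above, set on the builder.
# Assumes the MaxCompute (ODPS) Spark connector JAR is available to the application.
spark = (
    SparkSession.builder
    .appName("mc-demo")
    .config("spark.sql.catalog.odps", "org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog")
    .config("spark.sql.extensions", "org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions")
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .config("spark.hadoop.odps.tunnel.quota.name", "pay-as-you-go")
    .config("spark.hadoop.odps.project.name", "<project_name>")  # replace with your project name
    .config("spark.hadoop.odps.end.point", "https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api")  # replace with your endpoint
    .config("spark.hadoop.odps.access.id", "<accessId>")  # replace with your AccessKey ID
    .config("spark.hadoop.odps.access.key", "<accessKey>")  # replace with your AccessKey secret
    .getOrCreate()
)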
Create a Notebook session to connect to MaxCompute
Go to the Notebook Sessions tab.
Log on to the EMR console.
In the left-side navigation pane, choose EMR Serverless > Spark.
On the Spark page, find the desired workspace and click the name of the workspace.
In the left-side navigation pane of the EMR Serverless Spark page, choose Operation Center > Sessions.
Click the Notebook Sessions tab.
Click the Create Notebook Session button.
On the Create Notebook Session page, configure the parameters and click the Create button.
Parameter
Description
Name
The name of the Notebook session. For example, mc_notebook_compute.
Spark Configuration
The Spark configurations to connect to MaxCompute.

spark.sql.catalog.odps                      org.apache.spark.sql.execution.datasources.v2.odps.OdpsTableCatalog
spark.sql.extensions                        org.apache.spark.sql.execution.datasources.v2.odps.extension.OdpsExtensions
spark.sql.sources.partitionOverwriteMode    dynamic
spark.hadoop.odps.tunnel.quota.name         pay-as-you-go
spark.hadoop.odps.project.name              <project_name>
spark.hadoop.odps.end.point                 https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api
spark.hadoop.odps.access.id                 <accessId>
spark.hadoop.odps.access.key                <accessKey>

Important: To access a MaxCompute project for which the three-layer model (schema feature) is enabled, you must also set the spark.sql.catalog.odps.enableNamespaceSchema parameter to true in the Spark configurations. For more information about the parameters, see Spark Connector. For more information about schemas, see Schema operations.

Replace the following values as needed:
<project_name>: Your MaxCompute project name.
https://service.cn-hangzhou-vpc.maxcompute.aliyun-inc.com/api: Your MaxCompute endpoint. For more information, see Endpoints.
<accessId>: The AccessKey ID of the Alibaba Cloud account used to access MaxCompute.
<accessKey>: The AccessKey secret of the Alibaba Cloud account used to access MaxCompute.
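After the Notebook session is running, a quick way to confirm that the odps catalog is wired up is to list what it exposes. A minimal sketch for a Python cell (the namespaces and tables returned depend on your project):

# List the namespaces and tables visible through the odps catalog.
spark.sql("SHOW NAMESPACES IN odps").show()
spark.sql("SHOW TABLES IN odps.default").show()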
Step 2: Query or write data in MaxCompute
Use SparkSQL to write and query data in MaxCompute
On the EMR Serverless Spark page, click Data Development in the navigation pane on the left.
On the Development tab, click the create icon to create a SparkSQL job.
In the dialog box that appears, enter a name, such as mc_load_task. For Type, keep the default, SparkSQL, and then click OK.
Copy the following code to the tab of the new SparkSQL job (mc_load_task).

CREATE TABLE odps.default.mc_table (name STRING, num BIGINT);

INSERT INTO odps.default.mc_table (name, num) VALUES ('Alice', 100), ('Bob', 200);

SELECT * FROM odps.default.mc_table;

From the database drop-down list, select a database. From the Compute drop-down list, select the SQL session that you created in Step 1: Create a session to connect to MaxCompute, such as mc_sql_compute.
Click Run to execute the SparkSQL job.
After the job runs successfully, the results appear on the Execution Results tab.
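The session configuration above sets spark.sql.sources.partitionOverwriteMode to dynamic, so an INSERT OVERWRITE rewrites only the partitions it actually touches rather than the whole table. The following sketch illustrates this with a hypothetical partitioned table (mc_part_table and the ds values are illustrative and are not created elsewhere in this topic); you can paste the inner SQL statements into the SQL editor, or run them from a notebook cell as shown:

# Hypothetical partitioned table to illustrate dynamic partition overwrite.
spark.sql("CREATE TABLE odps.default.mc_part_table (name STRING, num BIGINT) PARTITIONED BY (ds STRING)")

# Seed two partitions.
spark.sql("INSERT INTO odps.default.mc_part_table PARTITION (ds) VALUES ('Alice', 100, '20240101'), ('Bob', 200, '20240102')")

# With dynamic mode, only partition ds=20240101 is rewritten; ds=20240102 is untouched.
spark.sql("INSERT OVERWRITE TABLE odps.default.mc_part_table PARTITION (ds) VALUES ('Carol', 300, '20240101')")

spark.sql("SELECT * FROM odps.default.mc_part_table ORDER BY ds").show()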

View the created table in the MaxCompute console.
Log on to the MaxCompute console. In the upper-left corner, select a region.
On the Projects page, find the project you created and click Manage in the Actions column.
Click the Tables tab.
You can see a new table named mc_table in the MaxCompute console.
Use a Notebook to write and query data in MaxCompute
On the EMR Serverless Spark page, click Data Development in the left navigation pane.
On the Development tab, click the create icon to create a notebook.
In the dialog box that appears, enter a name, such as mc_load_task, select Notebook for Type, and then click OK.
From the session drop-down list, select the running Notebook session that you created in Step 1: Create a session to connect to MaxCompute, such as mc_notebook_compute.
Write and execute the code.
In a Python cell, enter the following command to create a table.
spark.sql(""" CREATE TABLE odps.default.mc_table (name STRING, num BIGINT); """)In a new Python cell, enter the following command to insert data.
spark.sql("INSERT INTO odps.default.mc_table (name, num) VALUES ('Alice', 100),('Bob', 200);")In a new Python cell, enter the following command to query data.
spark.sql("SELECT * FROM odps.default.mc_table;").show()After the query is successfully executed, the results are displayed on the Execution Results tab.

View the created table in the MaxCompute console.
Log on to the MaxCompute console. In the upper-left corner, select a region.
On the Projects page, find the project you created and click Manage in the Actions column.
Click the Tables tab.
You can see a new table named mc_table in the MaxCompute console.
References
This topic provides examples that use SparkSQL and Notebook jobs. To learn how to read data from and write data to MaxCompute using other types of jobs, see Develop a batch or streaming job.