This topic describes how to configure Alibaba Cloud EMR Serverless Spark for read-only access to shared data across clouds using Delta Sharing in Databricks Unity Catalog. This method helps break down data silos and supports scenarios such as cross-cloud data analytics and data synchronization.
Core flow
The process involves configuring data sharing in Databricks and then providing the resulting access credential to an Alibaba Cloud EMR Serverless Spark task.
On the Databricks side: Create a Share and a Recipient. Add the tables to be shared to the Share. Then, generate a credential file that contains an access token for the Recipient.
Credential transfer: Upload the downloaded credential file to Alibaba Cloud Object Storage Service (OSS) to make it accessible to the Spark task.
On the Serverless Spark side: Create and run a Spark task that is configured with the Delta Sharing connector. This task reads the credential file from OSS, accesses the Databricks data sharing endpoint over the public network, and then pulls the data for computation.
Procedure
Step 1: Configure data sharing in Databricks
In this step, you define which data to share in Databricks and create a secure credential for external access.
Create a Share
Log in to your Databricks workspace. In the navigation pane on the left, go to Catalog > Delta Sharing > Shared by me.

Click Share data to create a new Share. A Share is a logical container used to organize and manage the tables that you want to share.
Add assets
In the new Share, enter a Share name and then add data assets. Select one or more tables that you want to share externally.
Serverless Spark uses the configured Share name and the corresponding tables to access the shared data.

Create a recipient
If you have not created a Recipient, you can create a Recipient from the drop-down list. You can also return to the Share page and click Add recipient to create a new data Recipient.
Configure the recipient and generate a credential file
Generate a credential file that contains an access token for the Recipient.
Recipient type: Select Open. The Open type is used for open sharing with any client that supports the Delta Sharing protocol, such as EMR Serverless Spark, and generates a portable credential file. The Databricks type is used for internal sharing between Databricks platforms.
Note: Make sure that the Recipient type is set to Open. If the Open option is not selectable, go to Organization > View Delta Sharing Settings in the upper-right corner and enable External delta sharing.


Authentication method: Select Token.
Token lifecycle: Set the validity period for the Token. The maximum period is 365 days.
After the process is complete, an Activation link is generated.
Download the credential file
Open the Activation link from the previous step in a browser. The system prompts you to download a credential file with a .share extension.
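The downloaded .share file is a plain JSON profile as defined by the open Delta Sharing protocol: it records the sharing server endpoint and a bearer token. The following sketch parses a sample profile so you can see what the Spark task later consumes; the endpoint and token values are placeholders, not real credentials.

```python
import json

# Sample contents of a .share profile file (placeholder values).
# Per the open Delta Sharing protocol, the file contains the sharing
# server endpoint and the bearer token generated for the Recipient.
sample_profile = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://example.cloud.databricks.com/api/2.0/delta-sharing/metastores/<metastore-id>",
  "bearerToken": "<token>",
  "expirationTime": "2025-12-31T00:00:00.000Z"
}
"""

profile = json.loads(sample_profile)
print(profile["endpoint"])
print("token present:", bool(profile["bearerToken"]))
```

Because the token in this file grants read access to the shared tables, treat the file itself as a secret.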
Step 2: Prepare the Alibaba Cloud environment
In this step, you securely store the credential generated by Databricks on Alibaba Cloud and prepare the runtime environment for the Spark task.
Upload the credential file to OSS
Upload the .share credential file that you downloaded in the previous step to your OSS bucket. Record the full OSS path of the file, for example, oss://your-bucket/path/to/credential.share.
Note: Security recommendations:
Store the credential file in a bucket with private permissions.
Make sure that the RAM role used to run the Spark task has read permissions for the OSS file.
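The read permission described above can be granted with a minimal RAM policy. The following is a sketch that scopes oss:GetObject to the example credential path used in this topic; replace the bucket and path with your own values.

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["oss:GetObject"],
      "Resource": ["acs:oss:*:*:your-bucket/path/to/credential.share"]
    }
  ]
}
```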
Configure public network access
You must prepare a virtual private cloud (VPC) that has public network access and configure a compatible vSwitch for Serverless Spark. This setup allows Serverless Spark to access Databricks Delta Sharing over the public network. Alibaba Cloud provides NAT Gateway to enable public network access for a VPC. For more information, see NAT Gateway. For a list of zones and vSwitches that Serverless Spark supports, see List of supported zones and vSwitches.
Step 3: Access data in Serverless Spark
In this step, you write and configure a Spark task that reads Databricks data by using the credential file.
Add network connectivity
When you add network connectivity in the workspace, select the VPC that you configured for public network access and its corresponding vSwitch. For more information, see Add network connectivity.
Prepare and upload the SQL file
Prepare an SQL file that writes the shared dataset to Data Lake Formation (DLF).
```sql
-- Create a temporary table.
-- Replace the credential file path and the shared dataset.
-- demo_share is the name of the Share created earlier.
-- default.alitable is the shared schema and table.
CREATE TEMPORARY TABLE dbc_delta_sharing
USING deltaSharing
LOCATION 'oss://your-bucket/path/to/credential.share#demo_share.default.alitable';

SELECT * FROM dbc_delta_sharing LIMIT 10;

-- Create a database and a table.
CREATE DATABASE IF NOT EXISTS demo_day_ss_dlf;
DROP TABLE IF EXISTS demo_day_ss_dlf.dw_songs_ss_dlf;
CREATE TABLE demo_day_ss_dlf.dw_songs_ss_dlf
SELECT * FROM dbc_delta_sharing LIMIT 10;

SELECT * FROM demo_day_ss_dlf.dw_songs_ss_dlf LIMIT 10;
```
Upload the SQL file to the Serverless Spark workspace by using the file management feature. For more information, see Manage files.
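The LOCATION value in the SQL above follows the Delta Sharing convention of <profile-file>#<share>.<schema>.<table>. The following small helper assembles and sanity-checks such a path; the function name is illustrative, and the arguments are the placeholder names used in this topic.

```python
def sharing_location(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build a Delta Sharing LOCATION string: <profile>#<share>.<schema>.<table>."""
    for part in (share, schema, table):
        # The fragment after '#' is split on '.', so each name must be
        # non-empty and must not itself contain a dot.
        if not part or "." in part:
            raise ValueError(f"invalid name: {part!r}")
    return f"{profile_path}#{share}.{schema}.{table}"

location = sharing_location(
    "oss://your-bucket/path/to/credential.share",
    "demo_share", "default", "alitable",
)
print(location)
# oss://your-bucket/path/to/credential.share#demo_share.default.alitable
```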
Add a data catalog
In the Add Data Catalog dialog box, select DLF Data Catalog. For more information about data catalogs, see Manage data catalogs.
Create a Spark task
Notebook task
Create a Notebook session
On the Session Manager page, create a Notebook session. Configure the following parameters and leave the others at their default settings.
Parameter
Description
Name
Enter a name for the Notebook session.
Engine Version
The latest version is recommended. This topic uses esr-4.6.0.
Network Connectivity
Select the network connectivity that you created.
Spark Configuration
Add the following configuration parameters.
```
spark.jars.packages              io.delta:delta-sharing-spark_2.12:3.1.0
spark.sql.extensions             io.delta.sql.DeltaSparkSessionExtension,org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions
spark.sql.catalog.spark_catalog  org.apache.spark.sql.delta.catalog.DeltaCatalog
```
Create a Notebook task
On the Data Development page, create a Notebook task of the Interactive Development type. Copy and save the following code.
```python
# Create a temporary table.
# Replace the credential file path and the shared dataset.
# demo_share is the name of the Share created earlier.
# default.alitable is the shared schema and table.
spark.sql("CREATE TEMPORARY TABLE dbc_delta_sharing USING deltaSharing LOCATION 'oss://your-bucket/path/to/credential.share#demo_share.default.alitable'")
spark.sql("SELECT * FROM dbc_delta_sharing LIMIT 10").show()
```
View the results
After the task is run, the results are displayed as shown in the following figure.

SQL task
On the Data Development page, create an SQL task of the Batch Job type. For more information about how to create an SQL task, see Develop a batch job or a streaming job.
Configure task parameters
On the task configuration page, configure the following parameters and leave the others at their default settings.
Parameter
Description
SQL File
The file required to submit the task.
Set Type to Workspace Resource and upload the SQL file from the drop-down list.
Engine Version
The latest version is recommended. This topic uses esr-4.6.0.
Network Connectivity
Select the network connectivity that you created.
Spark Configuration
Add the following configuration parameters.
```
spark.jars.packages              io.delta:delta-sharing-spark_2.12:3.1.0
spark.sql.extensions             io.delta.sql.DeltaSparkSessionExtension,org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions
spark.sql.catalog.spark_catalog  org.apache.spark.sql.delta.catalog.DeltaCatalog
```
Run the task and view the results
Click Run to submit the task. After the task is run, go to the run history section. In the Actions column of the task, click Log Details to view the log information.

Query the DLF data.

References
For more information about how to use other types of Spark tasks to read data from Databricks Delta Sharing, see delta-sharing.