
E-MapReduce:Access Databricks data through Delta Sharing in Unity Catalog

Last Updated: Jan 05, 2026

This topic describes how to configure Alibaba Cloud EMR Serverless Spark for read-only access to shared data across clouds using Delta Sharing in Databricks Unity Catalog. This method helps break down data silos and supports scenarios such as cross-cloud data analytics and data synchronization.

Core flow

The process involves configuring data sharing in Databricks and then providing the resulting access credential to an Alibaba Cloud EMR Serverless Spark task.

  1. On the Databricks side: Create a Share and a Recipient. Add the tables to be shared to the Share. Then, generate a credential file that contains an access token for the Recipient.

  2. Credential transfer: Upload the downloaded credential file to Alibaba Cloud Object Storage Service (OSS) to make it accessible to the Spark task.

  3. On the Serverless Spark side: Create and run a Spark task that is configured with the Delta Sharing connector. This task reads the credential file from OSS, accesses the Databricks data sharing endpoint over the public network, and then pulls the data for computation.
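The locator that ties these steps together is the Delta Sharing table URL, of the form `<profile-path>#<share>.<schema>.<table>`. The following Python sketch shows how that string is assembled; the bucket path, Share name, schema, and table are the placeholder values used throughout this topic, not fixed names:

```python
def sharing_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build a Delta Sharing table locator: <profile-path>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"

url = sharing_table_url(
    "oss://your-bucket/path/to/credential.share",  # .share profile stored in OSS
    "demo_share",  # Share name created in Databricks
    "default",     # schema (database) inside the Share
    "alitable",    # shared table
)
print(url)
# oss://your-bucket/path/to/credential.share#demo_share.default.alitable
```

The same string is what the Spark SQL statements later pass in the LOCATION clause.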

Procedure

Step 1: Configure data sharing in Databricks

In this step, you define which data to share in Databricks and create a secure credential for external access.

  1. Create a Share

    • Log in to your Databricks workspace. In the navigation pane on the left, go to Catalog > Delta Sharing > Shared by me.

    • Click Share data to create a new Share. A Share is a logical container used to organize and manage the tables that you want to share.

  2. Add assets

    • In the new Share, enter a Share name and then add data assets.

    • Select one or more tables that you want to share externally.

      Serverless Spark uses the configured Share name and the corresponding tables to access the shared data.

  3. Create a recipient

    • If you have not created a Recipient, create one from the drop-down list. Alternatively, return to the Share page and click Add recipient to create a new data Recipient.


  4. Configure the recipient and generate a credential file

    • Generate a credential file that contains an access token for the Recipient.


      • Recipient type: Select Open. The Open type is used for open sharing with any client that supports the Delta Sharing protocol, such as EMR Serverless Spark. It generates a portable credential file. The Databricks type is used for internal sharing between Databricks platforms.

        Note

        Make sure that the Recipient type is set to Open. If the Open option is not selectable, go to Organization > View Delta Sharing Settings in the upper-right corner and enable External delta sharing.

      • Authentication method: Select Token.

      • Token lifecycle: Set the validity period for the Token. The maximum period is 365 days.

    • After the process is complete, an Activation link is generated.

  5. Download the credential file

    • Open the Activation link from the previous step in a browser. The system prompts you to download a credential file with a .share extension.
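The downloaded .share file is a small JSON profile that tells a Delta Sharing client where the sharing server is and how to authenticate. The sketch below parses a representative version-1 profile; the endpoint and token values are placeholders, not real credentials:

```python
import json

# Representative contents of a .share profile (placeholder values).
sample_profile = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://example-databricks-host/api/2.0/delta-sharing/metastores/example-metastore-id",
  "bearerToken": "example-recipient-token",
  "expirationTime": "2026-01-01T00:00:00.000Z"
}
"""

profile = json.loads(sample_profile)
print(profile["endpoint"])                 # sharing server URL the Spark task will call
print(profile["shareCredentialsVersion"])  # 1
```

Because the bearerToken grants read access to the shared tables, treat the file itself as a secret.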

Step 2: Prepare the Alibaba Cloud environment

In this step, you securely store the credential generated by Databricks on Alibaba Cloud and prepare the runtime environment for the Spark task.

  1. Upload the credential file to OSS

    • Upload the .share credential file that you downloaded in the previous step to your OSS bucket.

    • Record the full OSS path of the file, for example, oss://your-bucket/path/to/credential.share.

      Note

      Security recommendations:

      • Store the credential file in a bucket with private permissions.

      • Make sure that the RAM role used to run the Spark task has read permissions for the OSS file.

  2. Configure public network access

    You must prepare a virtual private cloud (VPC) that has public network access and configure a compatible vSwitch for Serverless Spark. This setup allows Serverless Spark to access Databricks Delta Sharing over the public network. Alibaba Cloud provides NAT Gateway to enable public network access for a VPC. For more information, see NAT Gateway. For a list of zones and vSwitches that Serverless Spark supports, see List of supported zones and vSwitches.
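The credential upload in step 1 can also be scripted. The following sketch assumes the OSS Python SDK (oss2) is installed and that RAM credentials are exported as OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET; the bucket name, region endpoint, and object key are placeholders:

```python
import os

# Placeholders; replace with your bucket, region endpoint, and object key.
BUCKET = "your-bucket"
ENDPOINT = "https://oss-cn-hangzhou.aliyuncs.com"  # example region endpoint
KEY = "path/to/credential.share"

def credential_oss_path(bucket: str = BUCKET, key: str = KEY) -> str:
    """Return the oss:// path that the Spark LOCATION clause will reference."""
    return f"oss://{bucket}/{key}"

def upload_credential(local_path: str) -> str:
    """Upload the .share profile to a private OSS bucket (requires `pip install oss2`)."""
    import oss2  # Alibaba Cloud OSS Python SDK
    auth = oss2.Auth(os.environ["OSS_ACCESS_KEY_ID"], os.environ["OSS_ACCESS_KEY_SECRET"])
    bucket = oss2.Bucket(auth, ENDPOINT, BUCKET)
    bucket.put_object_from_file(KEY, local_path)
    return credential_oss_path()

print(credential_oss_path())  # oss://your-bucket/path/to/credential.share
```

Keep the bucket private and grant the task's RAM role read access, as noted above.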

Step 3: Access data in Serverless Spark

In this step, you write and configure a Spark task that reads Databricks data using the credential file.

  1. Add network connectivity

    When you add network connectivity in the workspace, select the VPC that you configured for public network access and its corresponding vSwitch. For more information, see Add network connectivity.

  2. Prepare and upload the SQL file

    • Prepare an SQL file to write the shared dataset to DLF.

      -- Create a temporary table.
      -- Replace the credential file path and the shared dataset:
      -- demo_share is the name of the Share created earlier;
      -- default.alitable is the shared schema and table.
      CREATE TEMPORARY TABLE dbc_delta_sharing USING deltaSharing LOCATION 'oss://your-bucket/path/to/credential.share#demo_share.default.alitable';
      SELECT * FROM dbc_delta_sharing LIMIT 10;
      
      -- Create a database and a table.
      CREATE DATABASE IF NOT EXISTS demo_day_ss_dlf;
      DROP TABLE IF EXISTS demo_day_ss_dlf.dw_songs_ss_dlf;
      
      CREATE TABLE demo_day_ss_dlf.dw_songs_ss_dlf AS SELECT * FROM dbc_delta_sharing LIMIT 10;
      SELECT * FROM demo_day_ss_dlf.dw_songs_ss_dlf LIMIT 10;
    • Upload the SQL file to the Serverless Spark workspace using the file management feature. For more information, see Manage files.

  3. Add a data catalog

    In the Add Data Catalog dialog box, select DLF Data Catalog. For more information about data catalogs, see Manage data catalogs.

  4. Create a Spark task

    Notebook task

    • Create a Notebook session

      On the Session Manager page, create a Notebook session. Configure the following parameters and leave the others at their default settings.

      • Name: Enter a name for the Notebook session.

      • Engine Version: The latest version is recommended. This topic uses esr-4.6.0.

      • Network Connectivity: Select the network connectivity that you created.

      • Spark Configuration: Add the following configuration parameters.

      spark.jars.packages              io.delta:delta-sharing-spark_2.12:3.1.0
      spark.sql.extensions             io.delta.sql.DeltaSparkSessionExtension,org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions
      spark.sql.catalog.spark_catalog  org.apache.spark.sql.delta.catalog.DeltaCatalog
    • Create a Notebook task

      On the Data Development page, create a Notebook task of the Interactive Development type. Copy and save the following code.

      # Create a temporary table.
      # Replace the credential file path and the shared dataset:
      # demo_share is the name of the Share created earlier;
      # default.alitable is the shared schema and table.
      spark.sql("CREATE TEMPORARY TABLE dbc_delta_sharing USING deltaSharing LOCATION 'oss://your-bucket/path/to/credential.share#demo_share.default.alitable'")
      
      spark.sql("SELECT * FROM dbc_delta_sharing LIMIT 10").show()
    • View the results

      After the task is run, the query results are displayed.

    SQL task

    On the Data Development page, create an SQL task of the Batch Job type. For more information about how to create an SQL task, see Develop a batch job or a streaming job.

    • Configure task parameters

      On the task configuration page, configure the following parameters and leave the others at their default settings.

      • SQL File: The SQL file used to submit the task. Set Type to Workspace Resource and select the uploaded SQL file from the drop-down list.

      • Engine Version: The latest version is recommended. This topic uses esr-4.6.0.

      • Network Connectivity: Select the network connectivity that you created.

      • Spark Configuration: Add the following configuration parameters.

      spark.jars.packages              io.delta:delta-sharing-spark_2.12:3.1.0
      spark.sql.extensions             io.delta.sql.DeltaSparkSessionExtension,org.apache.paimon.spark.extensions.PaimonSparkSessionExtensions
      spark.sql.catalog.spark_catalog  org.apache.spark.sql.delta.catalog.DeltaCatalog
    • Run the task and view the results

      • Click Run to submit the task. After the task is run, go to the run history section. In the Actions column of the task, click Log Details to view the log information.

      • Query the DLF data.

References

For more information about how to use other types of Spark tasks to read data from Databricks Delta Sharing, see delta-sharing.