
MaxCompute:Configure a Databricks data source

Last Updated: Feb 28, 2026

MaxCompute Migration Service (MMS) migrates data from Databricks to MaxCompute. To start migrating, add a Databricks data source, establish network connectivity between the data source and MMS, and then synchronize the Databricks metadata so that migration jobs can be configured.

Billing

Data migration with MMS consumes resources across multiple platforms. The following table summarizes the billable items and responsible parties.

| Operation | Billable item | Billed by |
| --- | --- | --- |
| Data source runtime and data migration task execution | Compute costs: Spark jobs run on MaxCompute and consume compute resources. | Alibaba Cloud (MaxCompute) |
| Reading data from Azure Blob Storage | Traffic costs: data transferred out of Azure Blob Storage. | Azure |
| Reading data through Databricks JDBC | Compute costs: queries run through Databricks JDBC. VM costs: Azure virtual machine usage for reading data. | Databricks |
| Data validation (if enabled) | Compute costs: validation SQL statements run on both Databricks and MaxCompute. | Alibaba Cloud (MaxCompute) and Databricks |
| Network configuration | Network costs: leased line fees apply if you use a leased line. If you access Databricks and Azure over the public network, SNAT (Source Network Address Translation) fees apply for allowing the Alibaba Cloud VPC to access the external network. | Leased line provider or Alibaba Cloud |

To reduce migration costs, use subscription compute resources and exclusive Data Transmission Service resources to run migration jobs.

Before you begin

Complete the preparations for the destination MaxCompute project, and then prepare the following items on the Databricks and Azure sides.

Databricks personal access token

  1. Create a personal access token. For details, see Databricks personal access token authentication.

  2. Grant the token at least the following permissions on the directories to migrate:

    • BROWSE

    • EXECUTE

    • SELECT

    • USE CATALOG

    • USE SCHEMA
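The privileges above can be granted with Unity Catalog GRANT statements. The following sketch only builds the statements for illustration; the catalog, schema, and principal names are hypothetical, and the choice of granting each privilege at the catalog or schema level is an assumption to adapt to your own layout.

```python
def grant_statements(catalog: str, schema: str, principal: str) -> list[str]:
    """Build Unity Catalog GRANT statements for the privileges listed above.

    The split between catalog-level and schema-level grants is an assumption;
    run the statements in a Databricks SQL session against your own objects.
    """
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}",
        f"GRANT BROWSE ON CATALOG {catalog} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal}",
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO {principal}",
        f"GRANT EXECUTE ON SCHEMA {catalog}.{schema} TO {principal}",
    ]

# Hypothetical names for illustration only.
for stmt in grant_statements("main", "sales", "`mms-migration@example.com`"):
    print(stmt + ";")
```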

Active compute resource

Make sure an active compute resource is available in Databricks. MMS uses this resource to run SQL statements during metadata synchronization.

Azure Storage credentials (required if reading from Azure Blob Storage)

If you read Databricks data directly from Azure Blob Storage instead of through Databricks JDBC, prepare one of the following authentication methods:

  • SAS for the Azure Storage Account -- The shared access signature (SAS) must have the following permissions on the corresponding container: Read, List, Write, and Add.

  • Azure service principal -- The principal must have the Storage Blob Data Contributor role on the corresponding Storage Account. To create a service principal and assign the role, see the Azure documentation. The following example uses the Azure CLI to create a service principal with the Storage Blob Data Contributor role:

      az ad sp create-for-rbac \
          --name "blob_owner" \
          --role "Storage Blob Data Contributor" \
          --scopes /subscriptions/08****58-****-****-****-1f6d****48ff/resourcegroups/<resourcegroupName>/providers/Microsoft.Storage/storageAccounts/<storageAccountName>

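To sanity-check a SAS token before adding the data source, you can inspect its `sp` (signed permissions) query parameter, which encodes the granted permissions as single letters (r=Read, a=Add, w=Write, l=List). A minimal sketch; the token values are hypothetical:

```python
from urllib.parse import parse_qs

# Permissions MMS needs on the container, per the list above.
REQUIRED = {"r": "Read", "a": "Add", "w": "Write", "l": "List"}

def missing_sas_permissions(sas_token: str) -> list[str]:
    """Return the names of required permissions absent from a SAS token.

    Only looks at the `sp` parameter; it does not validate the signature,
    expiry, or resource scope of the token.
    """
    params = parse_qs(sas_token.lstrip("?"))
    granted = set(params.get("sp", [""])[0])
    return [name for letter, name in REQUIRED.items() if letter not in granted]

# Hypothetical token granting only Read and List:
print(missing_sas_permissions("sv=2022-11-02&sp=rl&sig=XXXX"))  # ['Add', 'Write']
```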

Note

If the Storage Account was created by Databricks during instance creation, only Databricks can access it. To allow MMS to read data from Azure Blob Storage, request that Azure remove the deny assignment from the Storage Account.

Procedure

Step 1: Add the data source

  1. Log on to the MaxCompute console, and select a region in the upper-left corner.

  2. In the navigation pane on the left, choose Data Transfer > Migration Service.

  3. On the Data Source tab, click Add Data Source.

    In the MaxCompute Service-linked Role dialog box, click OK to create the role. If this dialog box does not appear, the role has already been created.

    On the Add Data Source page, configure the data source information and then click Add to create the data source.

    • DataSource Basic Info

      | Parameter | Required | Description |
      | --- | --- | --- |
      | Name | Yes | A custom name for the data source. Supports letters, digits, and Chinese characters. Special characters are not allowed. |
      | Type | Yes | Select Databricks. |
      | Network Connection | Yes | The network connection that MMS uses to communicate with a VPC and reach the data source. Network connections are created in the MaxCompute console under Manage Configurations > Network Connection. The console displays this field as Network link. |
      | Databricks Workspace URL | Yes | The URL of the Databricks workspace. To find this URL, navigate to the Databricks instance from the Databricks menu in the Azure portal. |
      | Databricks Unity Catalog | Yes | The Databricks Unity Catalog to migrate. Each MMS data source supports only one catalog. The console displays this field as Databricks Catalog. |
      | Databricks Cluster ID | Yes | The ID of the Databricks cluster that runs SQL statements during metadata pulling. |
      | Databricks Access Token | Yes | The personal access token for Databricks. For details, see Databricks personal access token authentication. |
      | Use Databricks JDBC to read table data | No | Specifies whether to read Databricks tables through Databricks JDBC. |
      | Databricks JDBC URL | No | The JDBC URL for connecting to Databricks. |
      | Azure Authentication | Yes | The authentication method for accessing the Azure Storage Account. SAS: enter the Azure Storage Account SAS Token. To create a token, see the Azure documentation. Service Principal: enter the Azure Application ID, Azure App Client Secret, and Azure Tenant ID. |
      | Default Destination MaxCompute Project | Yes | The destination project for data migration mapping. This parameter cannot be modified. |
      | Destination MaxCompute Project List | No | Additional destination projects, used when data from one data source must go to multiple projects. |
      | MaxCompute Project For Migration Jobs | Yes | The project in which Spark, SQL, and other migration jobs run. The default compute quota of this project is used. |
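If you enable reading table data through JDBC, the Databricks JDBC URL generally follows the Databricks JDBC driver's `jdbc:databricks://<host>:443/...` form. The following sketch assembles such a URL for illustration; the httpPath (taken from the cluster's JDBC/ODBC settings) and the AuthMech=3 token-authentication property are assumptions based on the Databricks JDBC driver, so verify the exact string in your Databricks console.

```python
from urllib.parse import urlparse

def jdbc_url(workspace_url: str, http_path: str) -> str:
    """Assemble a Databricks JDBC URL from a workspace URL and an HTTP path.

    AuthMech=3 selects personal-access-token authentication in the
    Databricks JDBC driver (an assumption to verify for your driver version).
    """
    # Accept either a bare host or a full https:// workspace URL.
    host = urlparse(workspace_url).netloc or workspace_url
    return (
        f"jdbc:databricks://{host}:443/default;"
        f"transportMode=http;ssl=1;httpPath={http_path};AuthMech=3"
    )

# Hypothetical workspace and cluster path for illustration only.
print(jdbc_url("https://adb-1234567890123456.7.azuredatabricks.net",
               "sql/protocolv1/o/1234567890123456/0123-456789-abcde"))
```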
    • Other Info

      The following parameters are optional. Configure them as needed.

      | Parameter | Description |
      | --- | --- |
      | Meta Timer | Specifies whether to periodically pull data source metadata. Enable: pulls metadata on a schedule. Set the Update Cycle to Daily (runs once per day at a specified time, accurate to the minute) or Hourly (runs once per hour at a specified minute), and configure the Update Started At time. Disable: metadata is not pulled on a schedule. |
      | Metastore Access Concurrency | The number of concurrent connections to the metastore. |
      | Schema Whitelist | The schemas to migrate. Separate multiple schemas with commas (,). |
      | Schema Blacklist | The schemas to exclude from migration. Separate multiple schemas with commas (,). |
      | Table Blacklist | The Databricks tables to exclude from migration. Format: schema.table or table. Separate multiple entries with commas (,). |
      | Table Whitelist | The Databricks tables to migrate. Format: schema.table or table. Separate multiple entries with commas (,). |
      | Maximum Concurrency For Data Migration Tasks | The maximum number of concurrent MMS migration tasks. A high value may overload Databricks compute resources. Default: 20. |
      | SQL Parameters For MaxCompute Migration Tasks | SQL parameters for migration jobs. For details, see Flag parameters. |
      | Table Name Character Mapping | A JSON-format mapping of special characters in source table names to characters supported by MaxCompute. Example: {"%": "_"}. |
      | Column Name Character Mapping | A JSON-format mapping of special characters in source column names to characters supported by MaxCompute. Example: {"-": "_"}. |
      | Maximum Partitions Per Task | The number of partitions migrated per MaxCompute Migration Assist (MMA) task. Migrating partitions in batches reduces SQL submissions and saves time. Default: 50. Typically does not need to be changed. |
      | Maximum Data Size Per Task (GB) | The maximum data size per migration task. Default: 5 GB. Typically does not need to be changed. |
      | Partition Value Character Mapping | A JSON-format mapping of special characters in source partition values to characters supported by MaxCompute. Example: {"/": "_"}. |
      | Migration Time Window | The time window during which migration tasks run. Configure a start time and an end time. |
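The three character-mapping parameters above take JSON objects that map each disallowed source character to a replacement. MMS applies the substitution internally; the following sketch only illustrates the effect, using the example mappings from the table and hypothetical identifier names.

```python
import json

def apply_mapping(name: str, mapping: dict[str, str]) -> str:
    """Replace each mapped character in a source identifier."""
    for src, dst in mapping.items():
        name = name.replace(src, dst)
    return name

# Example mappings from the table above, given as JSON strings.
table_mapping = json.loads('{"%": "_"}')
column_mapping = json.loads('{"-": "_"}')

# Hypothetical source names for illustration only.
print(apply_mapping("sales%2024", table_mapping))  # sales_2024
print(apply_mapping("order-id", column_mapping))   # order_id
```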

Step 2: Synchronize metadata

After the data source starts, MMS creates a job instance that connects the data source to the service and synchronizes the source metadata. This enables subsequent migration job configuration.

Note

This job instance consumes 4 compute units (CUs). The system shuts down the data source when no migration or metadata synchronization jobs are pending or running. Restart the data source before you use it again.

  1. On the Data Source tab, find the target data source and click Update metadata in the Actions column.

  2. On the Data Source tab, you can view the Status of the target data source.

  3. If Meta Timer is enabled, the system automatically updates metadata at the configured interval. Manual synchronization is not required in this case.

Next steps

After the data source is configured, create a migration job.