Before you migrate data using MaxCompute Migration Service (MMS), you must add and start a BigQuery data source. Ensure that there is network connectivity between the data source and the MMS service. You can then synchronize the BigQuery metadata with the MMS service to prepare for configuring migration jobs.
Regions
This feature is available only in the Singapore and Indonesia (Jakarta) regions.
Migration costs
Data migration using MMS consumes various resources and incurs costs. The main costs are described in the following table:
| MMS operation | Billing item | Billed by |
| --- | --- | --- |
| Run migration jobs | Compute costs: Spark jobs are generated on MaxCompute and consume compute resources. | Alibaba Cloud MaxCompute |
| Source data storage | Storage costs: Incurred when the source accesses stored files if you use an object storage service such as Object Storage Service (OSS) or S3. | BigQuery |
| If Enable verification is configured for the migration job | Compute costs: Incurred by executing verification SQL statements on BigQuery and MaxCompute. | Alibaba Cloud MaxCompute and BigQuery |
| Network configuration | Network costs: Incurred when data is transmitted over a leased line or the Alibaba Cloud network. | Leased line provider or Alibaba Cloud network |
To reduce migration costs, we recommend that you use subscription compute resources and dedicated Data Transmission Service resources to run migration jobs.
Procedure
Ensure that you have completed the preparations for the destination MaxCompute project.
Step 1: Prepare the external data source
In the source BigQuery, perform the following steps:
Create a BigQuery service account and download the authentication JSON file.
Create a BigQuery project. Grant the BigQuery service account the required permissions to read the metadata and data of the project.
Step 2: Add a data source
Log on to the MaxCompute console and select a region in the top-left corner.
In the navigation pane on the left, choose .
On the Data Source tab, click Add Data Source.
In the MaxCompute Service-linked Role dialog box, click OK to create the role. If this dialog box does not appear, the role has already been created.
On the Add Data Source page, configure the data source information and then click Add to create the data source.
DataSource Basic Info

| Parameter | Required | Description |
| --- | --- | --- |
| Name | Yes | Name of the data source. You can customize the name. The name cannot contain special characters. |
| Type | Yes | Select BigQuery. |
| Network Connection | Yes | Select the network connection to use. Network connections are created in the MaxCompute console under . They are used for communication between MMS and a VPC to connect to the data source. |
| Service Account Key File | Yes | The key file for the BigQuery service account. You can create a service account and download the authentication JSON file from the BigQuery IAM console. For more information, see Service account overview. |
| Project ID | Yes | The name of the BigQuery project to migrate. |
| Default Destination MaxCompute Project | Yes | The destination project for data migration mapping. This cannot be modified. |
| Destination MaxCompute Project List | No | If data from one data source needs to be migrated to multiple destination projects, configure the list of destination MaxCompute projects. |
| Project whose quota will be used for sql | Yes | The project that runs migration jobs, such as Spark and SQL jobs, on MaxCompute. The default compute quota associated with this project is used. |
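The service account key is the JSON file generated when you create a key for the service account in Google Cloud. As a local sanity-check sketch before you upload it (the field names below are the standard fields of a Google service-account key; `validate_key` is a hypothetical helper, not part of MMS):

```python
import json

# Standard fields of a Google Cloud service-account key file.
REQUIRED_FIELDS = {
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "token_uri",
}

def validate_key(key: dict) -> str:
    """Check that a parsed key looks like a service-account key
    and return its project ID."""
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected credential type: {key['type']!r}")
    return key["project_id"]

def validate_key_file(path: str) -> str:
    """Load and validate the downloaded JSON key file."""
    with open(path) as f:
        return validate_key(json.load(f))
```

If the check fails, re-download the key file from the Google Cloud console rather than editing it by hand.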
Other Information
The following parameters are optional. You can configure them as needed.
| Parameter | Description |
| --- | --- |
| Change Range Partitioned Table as | Specifies how BigQuery range-partitioned tables are migrated. Valid values: Partition (default) and Cluster. |
| BigQuery Spark response compression option | Specifies the compression type for BigQuery data. |
| BigQuery type BigNumeric default precision | Specifies the precision for the BigQuery BIGNUMERIC data type. Default value: 38. |
| BigQuery type BigNumeric default scale | Specifies the number of decimal places for the BigQuery BIGNUMERIC data type. Default value: 18. |
| Auto partition | Specifies whether to enable automatic partitioning. This feature is not supported. Keep this parameter disabled. |
| Force the use of Append 2.0 tables | Specifies whether to force the destination table to use Append 2.0. This feature is not supported. Keep this parameter disabled. |
| BigQuery execution project | Specifies the name of the project that runs jobs on BigQuery. |
| Meta Timer | Specifies whether to periodically pull metadata from the data source. Valid values: Enable (enables periodic pulls; set the Update Cycle to Daily, which runs once a day at a specified time accurate to the minute, or Hourly, which runs every hour at a specified minute, and configure the Update Started At time) and Disable (pulls metadata on demand). |
| Api concurrent access number of meta | Specifies the number of concurrent requests to the MaxCompute metastore. A larger value can speed up the retrieval of MaxCompute metadata. |
| Dataset whitelist | Specifies the BigQuery datasets that you want to migrate. Separate multiple datasets with commas (,). |
| Dataset Blacklist | Specifies the BigQuery datasets that you want to exclude from the migration. Separate multiple datasets with commas (,). |
| Table blacklist | Specifies the BigQuery tables that you want to exclude from the migration. Use the format dbname.tablename. Separate multiple tables with commas (,). |
| Table Whitelist | Specifies the BigQuery tables that you want to migrate. Use the format dbname.tablename. Separate multiple tables with commas (,). |
| The max migration task number | The maximum number of migration tasks. Configure this parameter as required. |
| Flags for MaxCompute SQL execution | Specifies the SQL parameters. For more information, see Flag parameter list. |
| Table Name Character Mapping | Specifies the character mapping for table names. For example, you can map the hyphen (-) in a BigQuery table name to an underscore (_) in the corresponding MaxCompute table name. |
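The character mapping itself is configured in MMS, but its effect on table names can be previewed locally. A minimal sketch, assuming only the hyphen-to-underscore mapping from the example above (`CHAR_MAP` and `mapped_table_name` are illustrative names, not MMS settings):

```python
# Preview how a Table Name Character Mapping would rename tables:
# characters that are invalid in MaxCompute table names (such as "-")
# are replaced with mapped characters (such as "_").
CHAR_MAP = str.maketrans({"-": "_"})

def mapped_table_name(bq_table: str) -> str:
    """Return the MaxCompute table name after character mapping."""
    return bq_table.translate(CHAR_MAP)

print(mapped_table_name("sales-2024-q1"))  # sales_2024_q1
```

Names that contain no mapped characters pass through unchanged.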
Step 3: Synchronize metadata
After you start the data source, a job instance is generated to connect the data source to the MMS service and synchronize the source metadata with the service. This synchronization is a prerequisite for configuring subsequent migration jobs.
This job instance consumes 4 CUs of compute resources. If no migration or metadata synchronization jobs are pending or running for the data source, the system automatically shuts down the data source. In this case, you must restart the data source before you can use it.
On the Data Source tab, find the target data source and click Update metadata in the Actions column.
On the Data Source tab, you can view the Status of the target data source.
If the metadata of the data source changes and the Scheduled metadata update feature is enabled, the system automatically updates the metadata at the configured time. In this case, you do not need to manually synchronize the metadata.
After you configure the external data source, you can create a migration job.