Before you migrate data using MaxCompute Migration Service (MMS), you must add and start a BigQuery data source. Ensure that there is network connectivity between the data source and the MMS service. You can then synchronize the BigQuery metadata with the MMS service to prepare for configuring migration jobs.
Regions
This feature is available only in the Singapore and Indonesia (Jakarta) regions.
Migration costs
Data migration using MMS consumes various resources and incurs costs. The main costs are described in the following table:
| MMS operation | Billing item | Billed by |
| --- | --- | --- |
| Run migration jobs | Compute costs: Spark jobs are generated on MaxCompute and consume compute resources. | Alibaba Cloud MaxCompute |
| Source data storage | Storage costs: Incurred when the source accesses stored files if you use an object storage service such as Object Storage Service (OSS) or S3. | BigQuery |
| If Enable verification is configured for the migration job | Compute costs: Incurred by executing verification SQL statements on BigQuery and MaxCompute. | Alibaba Cloud MaxCompute and BigQuery |
| Network configuration | Network costs: Incurred when data is transmitted over a leased line or the Alibaba Cloud network. | Leased line provider or Alibaba Cloud network |
To reduce migration costs, we recommend that you use subscription compute resources and dedicated Data Transmission Service resources to run migration jobs.
Procedure
Ensure that you have completed the preparations for the destination MaxCompute project.
Step 1: Prepare the external data source
In the source BigQuery, perform the following steps:
Create a BigQuery service account and download the authentication JSON file.
Create a BigQuery project. Grant the BigQuery service account the required permissions to read the metadata and data of the project.
Step 2: Add a data source
Log on to the MaxCompute console and select a region in the top-left corner.
In the navigation pane on the left, choose .
On the Data Source tab, click Add Data Source.
In the MaxCompute Service-linked Role dialog box, click OK to create the role. If this dialog box does not appear, the role has already been created.
On the Add Data Source page, configure the data source information and then click Add to create the data source.
DataSource Basic Info

| Parameter | Required | Description |
| --- | --- | --- |
| Name | Yes | Name of the data source. You can customize the name. The name cannot contain special characters. |
| Type | Yes | Select BigQuery. |
| Network Connection | Yes | Select the network connection to use. Network connections are created in the MaxCompute console under . They are used for communication between MMS and a VPC to connect to the data source. |
| Service Account Key File | Yes | The key file for the BigQuery service account. You can create a service account and download the authentication JSON file from the BigQuery IAM console. For more information, see Service account overview. |
| Project ID | Yes | The name of the BigQuery project to migrate. |
| Default Destination MaxCompute Project | Yes | The destination project for data migration mapping. This cannot be modified. |
| Destination MaxCompute Project List | No | If data from one data source needs to be migrated to multiple destination projects, configure the list of destination MaxCompute projects. |
| Project whose quota will be used for sql | Yes | The project that runs migration jobs, such as Spark and SQL jobs, on MaxCompute. The default compute quota associated with this project is used. |
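The service account key is the JSON file generated when you create a key for the service account in Google Cloud. As a local sanity-check sketch before you upload it (the field names below are the standard fields of a Google service-account key; `validate_key` is a hypothetical helper, not part of MMS):

```python
import json

# Standard fields of a Google Cloud service-account key file.
REQUIRED_FIELDS = {
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "token_uri",
}

def validate_key(key: dict) -> str:
    """Check that a parsed key looks like a service-account key
    and return its project ID."""
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected credential type: {key['type']!r}")
    return key["project_id"]

def validate_key_file(path: str) -> str:
    """Load and validate the downloaded JSON key file."""
    with open(path) as f:
        return validate_key(json.load(f))
```

If the check fails, re-download the key file from the Google Cloud console rather than editing it by hand.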
Other Information
The following parameters are optional. You can configure them as needed.
| Parameter | Description |
| --- | --- |
| Change Range Partitioned Table as | Specifies how BigQuery range-partitioned tables are migrated. Valid values: Partition (default) and Cluster. |
| BigQuery Spark response compression option | Specifies the compression type for BigQuery data. |
| BigQuery type BigNumeric default precision | Specifies the precision for the BigQuery BIGNUMERIC data type. Default value: 38. |
| BigQuery type BigNumeric default scale | Specifies the number of decimal places for the BigQuery BIGNUMERIC data type. Default value: 18. |
| Auto partition | Specifies whether to enable automatic partitioning. This feature is not supported. Keep this parameter disabled. |
| Force the use of Append 2.0 tables | Specifies whether to force the destination table to use Append 2.0. This feature is not supported. Keep this parameter disabled. |
| BigQuery execution project | Specifies the name of the project that runs jobs on BigQuery. |
| Meta Timer | Specifies whether to periodically pull metadata from the data source. Valid values: Enable (enables periodic pulls; set the Update Cycle to Daily, which runs once a day at a specified time accurate to the minute, or Hourly, which runs every hour at a specified minute, and configure the Update Started At time) and Disable (pulls metadata on demand). |
| Api concurrent access number of meta | Specifies the number of concurrent requests to the MaxCompute metastore. A larger value can speed up the retrieval of MaxCompute metadata. |
| Dataset whitelist | Specifies the BigQuery datasets that you want to migrate. Separate multiple datasets with commas (,). |
| Dataset Blacklist | Specifies the BigQuery datasets that you want to exclude from the migration. Separate multiple datasets with commas (,). |
| Table blacklist | Specifies the BigQuery tables that you want to exclude from the migration. Use the format dbname.tablename. Separate multiple tables with commas (,). |
| Table Whitelist | Specifies the BigQuery tables that you want to migrate. Use the format dbname.tablename. Separate multiple tables with commas (,). |
| The max migration task number | The maximum number of migration tasks. Configure this parameter as required. |
| Flags for MaxCompute SQL execution | Specifies the SQL parameters. For more information, see Flag parameter list. |
| Table Name Character Mapping | Specifies the character mapping for table names. For example, you can map the hyphen (-) in a BigQuery table name to an underscore (_) in the corresponding MaxCompute table name. |
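The character mapping itself is configured in MMS, but its effect on table names can be previewed locally. A minimal sketch, assuming only the hyphen-to-underscore mapping from the example above (`CHAR_MAP` and `mapped_table_name` are illustrative names, not MMS settings):

```python
# Preview how a Table Name Character Mapping would rename tables:
# characters that are invalid in MaxCompute table names (such as "-")
# are replaced with mapped characters (such as "_").
CHAR_MAP = str.maketrans({"-": "_"})

def mapped_table_name(bq_table: str) -> str:
    """Return the MaxCompute table name after character mapping."""
    return bq_table.translate(CHAR_MAP)

print(mapped_table_name("sales-2024-q1"))  # sales_2024_q1
```

Names that contain no mapped characters pass through unchanged.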
Step 3: Synchronize metadata
After you start the data source, a job instance is generated to connect the data source to the MMS service and synchronize the source metadata with the service. This synchronization is a prerequisite for configuring subsequent migration jobs.
This job instance consumes 4 CUs of compute resources. If no migration or metadata synchronization jobs are pending or running for the data source, the system automatically shuts down the data source. In this case, you must restart the data source before you can use it.
On the Data Source tab, find the target data source and click Update metadata in the Actions column.
On the Data Source tab, you can view the Status of the target data source.
If the metadata of the data source changes and the Scheduled metadata update feature is enabled, the system automatically updates the metadata at the configured time. In this case, you do not need to manually synchronize the metadata.
After you configure the external data source, you can create a migration job.