All Products
Search
Document Center

DataWorks:BigQuery

Last Updated:Mar 26, 2026

The BigQuery data source connects DataWorks to Google BigQuery, letting you read BigQuery data as part of an offline synchronization task. Configure the task using the codeless UI or the code editor.

Supported features

Feature Supported
Offline (batch) sync Yes
Codeless UI configuration Yes
Code editor configuration Yes
Incremental sync (via where filter) Yes
Partition-based sync Yes
Concurrent sync (via splitPk) Yes

Supported regions

BigQuery data sources are available in the following regions: China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), UK (London), US (Silicon Valley), and US (Virginia).

The BigQuery SDK version is google-cloud-bigquery 2.29.0. For SDK details, see the BigQuery documentation.

Supported field types

The following table lists the BigQuery field types that DataWorks supports and their corresponding Java types.

BigQuery type Java type
BOOL Bool
INT64 Long
FLOAT64 BigDecimal
NUMERIC BigDecimal
BIGNUMERIC BigDecimal
STRING String
BYTES Bytes
STRUCT String
ARRAY String
TIMESTAMP Date
DATE Date
TIME Date
DATETIME Date
GEOGRAPHY String
JSON String
INTERVAL String

For the full BigQuery type reference, see the BigQuery data types documentation.

Prerequisites

Before you begin, make sure you have:

  • Network connectivity between your BigQuery data source and a serverless resource group or an exclusive resource group for Data Integration. Use a serverless resource group for most synchronization tasks. For setup instructions, see Network connectivity solutions.

  • A service account authentication file (JSON key) downloaded from Google Cloud.

  • The BigQuery Project ID for your Google Cloud project.

Add the BigQuery data source

Add BigQuery as a data source in DataWorks before creating a synchronization task. Follow the instructions in Data source management.

The following parameters are specific to the BigQuery data source:

Parameter Description
BigQuery Project ID The ID of your Google Cloud project.
BigQuery authorization information Upload the JSON authentication file downloaded from Google Cloud.
Tip: View parameter descriptions directly in the DataWorks console when adding the data source.

Configure a synchronization task

For step-by-step task configuration, see:

Appendix: Script example and parameter descriptions

Use the following script example and parameter reference when configuring a BigQuery batch synchronization task in the code editor. For general code editor requirements, see Configure a task in the code editor.

Reader script example

{
  "stepType": "bigquery",
  "parameter": {
    "datasource": "bq_test1",
    "dataSet": "database_0724",
    "table": "partition_1107",
    "column": [
      "id",
      "table_id",
      "table_no",
      "table_name",
      "table_status"
    ],
    "where": "xxx=3",
    "partition": [
      "_PARTITIONTIME='2023-11-07'"
    ]
  },
  "name": "Reader",
  "category": "reader"
}

Reader parameters

Parameter Required Default Description
datasource Yes None The name of the data source. Must match the name of the data source added in DataWorks.
dataSet Yes None The BigQuery dataset.
table Yes None The name of the table to synchronize.
column Yes None The columns to read. Example: ["id", "name", "age"].
where No None A filter condition applied to the query. BigQuery Reader combines column, table, and where into an SQL statement to extract data. For example, use LIMIT 10 for testing, or gmt_create>$bizdate for daily incremental sync. Leave blank to read all data.
partition No None One or more partitions to synchronize. Example: ["_PARTITIONTIME='2023-11-07'"].
splitPk No None The field used to partition data for concurrent synchronization. Has no effect if partition is specified.

What's next