Configure the RestAPI data source for data synchronization - DataWorks

Limitations

Resource groups: Only Serverless resource groups and exclusive resource groups for Data Integration are supported.
Request timeout: The built-in timeout is 60 seconds. You cannot configure a custom timeout value. If an API query takes longer than 60 seconds to respond, the task fails.
Table schema: Only a flat (single-layer) table schema is supported at the destination. Nested field structures are not supported. For example, if an API returns {"data": {"user": {"id": 1, "name": "lily"}, "value": 123}}, flatten the fields into parallel columns such as user_id, user_name, and value at the destination.
Scheduling parameters: The REST API plugin does not support scheduling parameters.
Paging: Only manual paging is supported. Specify the page range using startIndex, endIndex, and step. Automatic paging (stopping when no more data is returned) is not supported. If the specified page count exceeds the actual number of pages, the extra pages are treated as empty query results and the task continues to the next page without failing.
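
The table schema limitation above can be illustrated with a short Python sketch. It is not part of the plugin. It only shows how the nested sample response maps onto the flat column names user_id, user_name, and value. The helper name flatten and the underscore separator are assumptions made for this example.

```python
def flatten(obj, parent_key="", sep="_"):
    """Flatten nested dicts into a single-level dict with joined keys."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and prefix their keys.
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# Sample response from the limitation above, written as valid JSON.
response = {"data": {"user": {"id": 1, "name": "lily"}, "value": 123}}

# Hypothetical illustration: one flat destination row per response.
row = flatten(response["data"])
print(row)  # {'user_id': 1, 'user_name': 'lily', 'value': 123}
```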

Supported field types

Type	Data Integration column type
Integer	LONG, INT
String	STRING
Floating-point	DOUBLE, FLOAT
Boolean	BOOLEAN
Date and time	DATE

Add a data source

Before you develop a synchronization task, add the REST API data source in the DataWorks console. For instructions, see Data source management.

Develop a synchronization task

To configure a single-table offline synchronization task, use either the codeless UI or the code editor.

For the full parameter reference and sample scripts, see Appendix: Script reference.

Examples

FAQ

Can I specify only the number of pages for data requests?

Yes. Use requestTimes: "multiple" with startIndex, endIndex, and step to define the page range.

Is automatic paging supported?

No. Specify the page range in advance using startIndex, endIndex, and step. The plugin cannot detect when there is no more data and stop automatically.

What happens if the specified page count is greater than the actual number of pages?

Empty pages are treated as empty query results. The task continues to the next page without failing.
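
The paging answers above can be sketched in Python as follows. This is a simplified model, not the plugin's implementation. It assumes that startIndex and endIndex are both inclusive (as stated in the reader parameters) and that an empty page simply contributes no records. The names page_values, fetch_all, and fake_api are hypothetical.

```python
def page_values(start_index, end_index, step):
    """Return the loop values, treating both bounds as inclusive."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(start_index, end_index + 1, step))


def fetch_all(fetch_page, start_index, end_index, step):
    """Request every page in the fixed range; there is no early stop."""
    records = []
    for page in page_values(start_index, end_index, step):
        rows = fetch_page(page)  # an empty list stands for an empty page
        records.extend(rows)     # empty pages add nothing; the loop continues
    return records


# Hypothetical API that only has two pages of data.
fake_api = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}

# Pages 3 and 4 do not exist, so they return empty results without failing.
result = fetch_all(lambda page: fake_api.get(page, []), 1, 4, 1)
print(result)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Because the range is fixed in advance, choosing endIndex too small silently drops data, while choosing it too large only costs extra empty requests.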

Is only single-layer JSON parsing supported?

Yes. Deep (nested) parsing is not supported. Use dataPath to point to the target field, and flatten nested structures at the destination.

How do I read non-array data from a REST API?

Set dataPath to the path of the non-array field (for example, dataPath: "data.list") and set dataMode to multiData. In multiData mode, the column configuration is not applicable; specify the data path directly in dataPath.

Example:

{
  "reader": {
    "name": "restapi",
    "parameter": {
      "dataPath": "data.list",
      "dataMode": "multiData"
    }
  }
}

Appendix: Script reference

How it works

The REST API plugin sends an HTTP or HTTPS request and receives a JSON response body. Use dataPath to specify the JSONPath for extracting data from the response, and dataMode to control how the extracted data is passed to the writer.

Example 1: Array response (`multiData` mode)

The API returns a response where DATA is an array containing multiple records:

{
  "HEADER": { "BUSID": "bid1", "RECID": "uuid", "SENDER": "dc", "RECEIVER": "pre", "DTSEND": "202201250000" },
  "DATA": [
    { "SERNR": "sernr1" },
    { "SERNR": "sernr2" }
  ]
}

To extract each item in DATA as a separate synchronization record:

column:   ["SERNR"]
dataMode: "multiData"
dataPath: "DATA"

Example 2: Single-object response (`oneData` mode)

The API returns a response where content.DATA is a single object:

{
  "HEADER": { "BUSID": "bid1", "RECID": "uuid", "SENDER": "dc", "RECEIVER": "pre", "DTSEND": "202201250000" },
  "content": {
    "DATA": { "SERNR": "sernr2" }
  }
}

To extract content.DATA as a single synchronization record:

column:   ["SERNR"]
dataMode: "oneData"
dataPath: "content.DATA"
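
The two examples above can be reproduced with a minimal Python sketch. It assumes that dataPath is a simple dot-separated path and that each column name is a key on the extracted item. The plugin's actual JSONPath handling may be richer. The function names resolve and extract are hypothetical.

```python
def resolve(obj, path):
    """Follow a dot-separated path such as 'content.DATA' through nested dicts."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def extract(response, data_path, data_mode, columns):
    """Return one list of column values per synchronization record."""
    node = resolve(response, data_path)
    # multiData: the node is an array of records; oneData: it is one record.
    items = node if data_mode == "multiData" else [node]
    return [[resolve(item, col) for col in columns] for item in items]


array_response = {
    "HEADER": {"BUSID": "bid1"},
    "DATA": [{"SERNR": "sernr1"}, {"SERNR": "sernr2"}],
}
object_response = {
    "HEADER": {"BUSID": "bid1"},
    "content": {"DATA": {"SERNR": "sernr2"}},
}

multi = extract(array_response, "DATA", "multiData", ["SERNR"])
one = extract(object_response, "content.DATA", "oneData", ["SERNR"])
print(multi)  # [['sernr1'], ['sernr2']]
print(one)    # [['sernr2']]
```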

Reader script example

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "restapi",
      "parameter": {
        "url": "http://127.0.0.1:5000/get_array5",
        "dataMode": "oneData",
        "responseType": "json",
        "column": [
          {
            "type": "long",
            "name": "a.b"
          },
          {
            "type": "string",
            "name": "a.c"
          }
        ],
        "dirtyData": "null",
        "method": "get",
        "socketTimeout": "60000",
        "defaultHeader": {
          "X-Custom-Header": "test header"
        },
        "customHeader": {
          "X-Custom-Header2": "test header2"
        },
        "parameters": "abc=1&def=1"
      },
      "name": "restapireader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": ""
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Reader parameters

The following parameters apply when adding a data source and configuring a Data Integration node. The plugin does not support scheduling parameters.

Parameter	Required	Default	Description
`url`	Yes	—	The address of the RESTful API.
`method`	Yes	—	The HTTP request method. Valid values: `get`, `post`.
`dataMode`	Yes	—	How the JSON response is processed. `oneData`: reads one record from the response. `multiData`: reads a JSON array and passes multiple records to the writer.
`responseType`	Yes	`json`	The response format. Only `json` is supported.
`column`	Yes	—	The list of fields to read. Each field requires `type` (the data type) and `name` (the JSONPath to the value). Example: `[{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]`
`dirtyData`	Yes	`dirty`	How to handle records where a column's JSONPath returns no value. `dirty`: marks the record as dirty data. `null`: sets the column value to null.
`requestTimes`	Yes	`single`	Whether to send one or multiple requests. `single`: sends one request. `multiple`: loops through page parameters defined by `startIndex`, `endIndex`, and `step`.
`dataPath`	No	—	The JSONPath to a single object or array in the response.
`socketTimeout`	No	`60000`	The socket timeout for the API request, in milliseconds.
`customHeader`	No	—	Custom HTTP headers to include in the request.
`parameters`	No	—	Request parameters. For GET requests, use the `key=value&key=value` format. For POST requests, use JSON format.
`requestParam`	No	—	The loop parameter name (for example, `pageNumber`) when `requestTimes` is `multiple`.
`startIndex`	No	—	The start index for the loop request (inclusive).
`endIndex`	No	—	The end index for the loop request (inclusive).
`step`	No	—	The step size for the loop request.
`authType`	No	—	The authentication method. See Authentication methods.
`authUsername` / `authPassword`	No	—	The username and password for Basic authentication.
`authToken`	No	—	The token for token-based authentication. Example: `{"Authorization": "Bearer TokenXXXXXX"}`. To use a custom encryption method, provide the encrypted credentials as the `authToken` value.
`accessKey` / `accessSecret`	No	—	The access key and access secret for Alibaba Cloud API signature authentication.
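
The effect of `dirtyData` on a record with a missing column can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the plugin's code. It treats each column `name` as a dot-separated path and returns a flag instead of writing to a dirty-data log. The names lookup and read_record are hypothetical.

```python
_MISSING = object()  # sentinel: distinguishes "path not found" from a JSON null


def lookup(obj, path):
    """Resolve a dot-separated path; return _MISSING if any key is absent."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return _MISSING
        obj = obj[key]
    return obj


def read_record(item, columns, dirty_data="dirty"):
    """Return (values, is_dirty) for one response item."""
    values = []
    for col in columns:
        value = lookup(item, col["name"])
        if value is _MISSING:
            if dirty_data == "dirty":
                return None, True  # the whole record counts as dirty data
            value = None           # dirtyData "null": fill the column with null
        values.append(value)
    return values, False


columns = [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]
item = {"a": {"b": 7}}  # "a.c" is missing from this response item

as_null = read_record(item, columns, "null")
as_dirty = read_record(item, columns, "dirty")
print(as_null)   # ([7, None], False)
print(as_dirty)  # (None, True)
```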

Authentication methods

Check your API documentation to identify the authentication method, then configure the corresponding parameters. The following keywords in your API documentation indicate which method to use:

Method	Keywords in your API docs	Parameters to configure
Basic authentication	"Basic Auth", "Basic HTTP", `Authorization: Basic`	`authUsername`, `authPassword`
Token-based authentication	"Bearer token", "API token", `Authorization: Bearer`	`authToken`
Alibaba Cloud API signature	"AK/SK", "AccessKey", Alibaba Cloud signature	`accessKey`, `accessSecret`
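
For reference, the `Authorization: Basic` and `Authorization: Bearer` keywords in the table correspond to the standard HTTP header formats sketched below. This Python example only shows what those headers look like on the wire. How the plugin builds them internally is not documented here, and the function names are hypothetical. The Alibaba Cloud API signature method is omitted because its signing procedure is not described in this topic.

```python
import base64


def basic_auth_header(username, password):
    """Build a standard HTTP Basic header: base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_auth_header(token):
    """Build a Bearer header in the same shape as the authToken example."""
    return {"Authorization": f"Bearer {token}"}


basic = basic_auth_header("user", "pass")
bearer = bearer_auth_header("TokenXXXXXX")
print(basic)   # {'Authorization': 'Basic dXNlcjpwYXNz'}
print(bearer)  # {'Authorization': 'Bearer TokenXXXXXX'}
```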

Writer script example

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "restapi",
      "parameter": {
        "url": "http://127.0.0.1:5000/writer1",
        "dataMode": "oneData",
        "responseType": "json",
        "column": [
          {
            "type": "long",
            "name": "a.b"
          },
          {
            "type": "string",
            "name": "a.c"
          }
        ],
        "method": "post",
        "defaultHeader": {
          "X-Custom-Header": "test header"
        },
        "customHeader": {
          "X-Custom-Header2": "test header2"
        },
        "parameters": "abc=1&def=1",
        "batchSize": 256
      },
      "name": "restapiwriter",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Writer parameters

Parameter	Required	Default	Description
`url`	Yes	—	The address of the RESTful API.
`method`	Yes	—	The HTTP request method. Valid values: `post`, `put`.
`dataMode`	Yes	—	How records are sent. `oneData`: sends one record per request. `multiData`: sends a batch of records per request; the number of requests depends on the tasks split on the reader side.
`column`	Yes	—	The list of field paths for the generated JSON. Each field requires `type` and `name` (the JSONPath where the column's data is placed). Example: `[{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]`
`batchSize`	Yes	`512`	The maximum number of records per request when `dataMode` is `multiData`.
`dataPath`	No	—	The JSONPath of the object where the output data is placed.
`customHeader`	No	—	Custom HTTP headers to include in the request.
`authType`	No	—	The authentication method. See Authentication methods.
`authUsername` / `authPassword`	No	—	The username and password for Basic authentication.
`authToken`	No	—	The token for token-based authentication.
`accessKey` / `accessSecret`	No	—	The access key and access secret for Alibaba Cloud API signature authentication.
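
The interaction of `column`, `dataMode`, and `batchSize` on the writer side can be sketched in Python as follows. The exact request body layout is an assumption: the sketch places each column value at its dot-separated path and, for `multiData`, groups the resulting objects into lists of at most `batchSize` items. The names place, build_object, and build_bodies are hypothetical.

```python
def place(target, path, value):
    """Store value at a dot-separated path, creating nested dicts as needed."""
    keys = path.split(".")
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = value


def build_object(record, columns):
    """Turn one record (a list of values) into a nested JSON-style object."""
    obj = {}
    for col, value in zip(columns, record):
        place(obj, col["name"], value)
    return obj


def build_bodies(records, columns, data_mode="oneData", batch_size=512):
    """Return one request body per HTTP request."""
    objects = [build_object(record, columns) for record in records]
    if data_mode == "oneData":
        return objects  # one record per request
    # multiData: at most batch_size records per request
    return [objects[i:i + batch_size] for i in range(0, len(objects), batch_size)]


columns = [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]
records = [[1, "x"], [2, "y"], [3, "z"]]

single = build_bodies(records, columns, "oneData")
batched = build_bodies(records, columns, "multiData", batch_size=2)
print(single[0])     # {'a': {'b': 1, 'c': 'x'}}
print(len(batched))  # 2
```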