The REST API data source lets you sync JSON data from RESTful APIs into destinations such as MaxCompute, or receive data from other data sources into a REST API endpoint. This topic describes the supported capabilities, configuration parameters, and script reference for the REST API data source in DataWorks Data Integration.
Limitations
- Resource groups: Only Serverless resource groups and exclusive resource groups for Data Integration are supported.
- Request timeout: The built-in timeout is 60 seconds. You cannot configure a custom timeout value. If an API query takes longer than 60 seconds to respond, the task fails.
- Table schema: Only a flat (single-layer) table schema is supported at the destination. Nested field structures are not supported. For example, if an API returns {data: {user: {id: 1, name: 'lily'}, value: 123}}, flatten the fields to parallel columns such as user_id, user_name, and value at the destination.
- Scheduling parameters: The REST API plugin does not support scheduling parameters.
- Paging: Manual paging is supported. Specify the page range using startIndex, endIndex, and step. Automatic paging (stopping when no more data is returned) is not supported. If the specified page count exceeds the actual number of pages, empty pages are treated as empty query results and the task continues to the next page without failing.
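The flattening described in the table schema limitation can be sketched in Python as follows. This is only an illustration of how nested fields map to parallel column names: the helper name flatten and the underscore separator are assumptions made for this example and are not part of the plugin.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Hypothetical helper for illustration; the plugin does not expose it.
    """
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# The nested response from the limitation above, written as valid JSON/Python.
response = {"data": {"user": {"id": 1, "name": "lily"}, "value": 123}}
row = flatten(response["data"])
print(row)  # {'user_id': 1, 'user_name': 'lily', 'value': 123}
```

The resulting keys (user_id, user_name, value) correspond to the parallel columns that the destination table needs.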
Supported field types
| Type | Data Integration column type |
|---|---|
| Integer | LONG, INT |
| String | STRING |
| Floating-point | DOUBLE, FLOAT |
| Boolean | BOOLEAN |
| Date and time | DATE |
Add a data source
Before you develop a synchronization task, add the REST API data source in the DataWorks console. For instructions, see Data source management.
Develop a synchronization task
To configure a single-table offline synchronization task, use either the codeless UI or the code editor.
For the full parameter reference and sample scripts, see Appendix: Script reference.
Examples
FAQ
Can I specify only the number of pages for data requests?
Yes. Use requestTimes: "multiple" with startIndex, endIndex, and step to define the page range.
Is automatic paging supported?
No. Specify the page range in advance using startIndex, endIndex, and step. The plugin cannot detect when there is no more data and stop automatically.
What happens if the specified page count is greater than the actual number of pages?
Empty pages are treated as empty query results. The task continues to the next page without failing.
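The page range defined by startIndex, endIndex, and step can be pictured with the following Python sketch. It assumes that the loop value is sent as a query parameter named by requestParam on a GET request and that both bounds are inclusive, as stated in the reader parameter table. The endpoint URL and the function names are hypothetical, and the plugin's internal request construction may differ.

```python
from urllib.parse import urlencode


def page_values(start_index, end_index, step):
    """Yield loop values from start_index to end_index, both inclusive."""
    if step <= 0:
        raise ValueError("step must be a positive integer")
    value = start_index
    while value <= end_index:
        yield value
        value += step


def build_urls(url, request_param, start_index, end_index, step, extra=None):
    """Build one GET URL per loop value (assumed behavior, for illustration)."""
    for value in page_values(start_index, end_index, step):
        query = dict(extra or {})
        query[request_param] = value
        yield f"{url}?{urlencode(query)}"


# Hypothetical endpoint; pageNumber is the requestParam example from this topic.
urls = list(build_urls("http://127.0.0.1:5000/items", "pageNumber", 1, 5, 2, {"abc": 1}))
for u in urls:
    print(u)
```

With startIndex 1, endIndex 5, and step 2, the sketch issues three requests for the values 1, 3, and 5. Because the range is fixed in advance, a value beyond the last real page simply yields an empty result.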
Is only single-layer JSON parsing supported?
Yes. Deep (nested) parsing is not supported. Use dataPath to point to the target field, and flatten nested structures at the destination.
How do I read non-array data from a REST API?
Set dataPath to the path of the non-array field (for example, dataPath: "data.list") and set dataMode to multiData. In multiData mode, the column configuration is not applicable — specify the data path directly in dataPath.
Example:
{
  "reader": {
    "name": "restapi",
    "parameter": {
      "dataPath": "data.list",
      "dataMode": "multiData"
    }
  }
}
Appendix: Script reference
How it works
The REST API plugin sends an HTTP or HTTPS request and receives a JSON response body. Use dataPath to specify the JSONPath for extracting data from the response, and dataMode to control how the extracted data is passed to the writer.
Example 1: Array response (`multiData` mode)
The API returns a response where DATA is an array containing multiple records:
{
  "HEADER": { "BUSID": "bid1", "RECID": "uuid", "SENDER": "dc", "RECEIVER": "pre", "DTSEND": "202201250000" },
  "DATA": [
    { "SERNR": "sernr1" },
    { "SERNR": "sernr2" }
  ]
}
To extract each item in DATA as a separate synchronization record:
column: ["SERNR"]
dataMode: "multiData"
dataPath: "DATA"
Example 2: Single-object response (`oneData` mode)
The API returns a response where content.DATA is a single object:
{
  "HEADER": { "BUSID": "bid1", "RECID": "uuid", "SENDER": "dc", "RECEIVER": "pre", "DTSEND": "202201250000" },
  "content": {
    "DATA": { "SERNR": "sernr2" }
  }
}
To extract content.DATA as a single synchronization record:
column: ["SERNR"]
dataMode: "oneData"
dataPath: "content.DATA"
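The two examples above can be reproduced with a short Python sketch. It is a simplified model, not the plugin's implementation: it assumes that dataPath is a dot-separated path, that multiData expects an array at that path, and that oneData wraps the single object into one record. The function names are hypothetical.

```python
def resolve_path(document, path):
    """Walk a dot-separated path such as "content.DATA" through nested dicts."""
    node = document
    for key in path.split("."):
        node = node[key]
    return node


def extract_records(response, data_path, data_mode, columns):
    """Return one list of column values per synchronization record."""
    node = resolve_path(response, data_path) if data_path else response
    # multiData: each array item is a record; oneData: the object is one record.
    items = node if data_mode == "multiData" else [node]
    return [[item.get(name) for name in columns] for item in items]


array_response = {
    "HEADER": {"BUSID": "bid1"},
    "DATA": [{"SERNR": "sernr1"}, {"SERNR": "sernr2"}],
}
object_response = {
    "HEADER": {"BUSID": "bid1"},
    "content": {"DATA": {"SERNR": "sernr2"}},
}

multi = extract_records(array_response, "DATA", "multiData", ["SERNR"])
one = extract_records(object_response, "content.DATA", "oneData", ["SERNR"])
print(multi)  # [['sernr1'], ['sernr2']]
print(one)    # [['sernr2']]
```

In this model, Example 1 produces two records and Example 2 produces one, which matches the descriptions of the two modes.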
Reader script example
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "restapi",
      "parameter": {
        "url": "http://127.0.0.1:5000/get_array5",
        "dataMode": "oneData",
        "responseType": "json",
        "column": [
          {
            "type": "long",
            "name": "a.b"
          },
          {
            "type": "string",
            "name": "a.c"
          }
        ],
        "dirtyData": "null",
        "method": "get",
        "socketTimeout": "60000",
        "defaultHeader": {
          "X-Custom-Header": "test header"
        },
        "customHeader": {
          "X-Custom-Header2": "test header2"
        },
        "parameters": "abc=1&def=1"
      },
      "name": "restapireader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": ""
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
Reader parameters
The following parameters apply when adding a data source and configuring a Data Integration node. The plugin does not support scheduling parameters.
| Parameter | Required | Default | Description |
|---|---|---|---|
| url | Yes | - | The address of the RESTful API. |
| method | Yes | - | The HTTP request method. Valid values: get, post. |
| dataMode | Yes | - | How the JSON response is processed. oneData: reads one record from the response. multiData: reads a JSON array and passes multiple records to the writer. |
| responseType | Yes | json | The response format. Only json is supported. |
| column | Yes | - | The list of fields to read. Each field requires type (the data type) and name (the JSONPath to the value). Example: [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}] |
| dirtyData | Yes | dirty | How to handle records where a column's JSONPath returns no value. dirty: marks the record as dirty data. null: sets the column value to null. |
| requestTimes | Yes | single | Whether to send one or multiple requests. single: sends one request. multiple: loops through page parameters defined by startIndex, endIndex, and step. |
| dataPath | No | - | The JSONPath to a single object or array in the response. |
| socketTimeout | No | 60000 | The socket timeout for the API request, in milliseconds. |
| customHeader | No | - | Custom HTTP headers to include in the request. |
| parameters | No | - | Request parameters. For GET requests, use the key=value&key=value format. For POST requests, use JSON format. |
| requestParam | No | - | The loop parameter name (for example, pageNumber) when requestTimes is multiple. |
| startIndex | No | - | The start index for the loop request (inclusive). |
| endIndex | No | - | The end index for the loop request (inclusive). |
| step | No | - | The step size for the loop request. |
| authType | No | - | The authentication method. See Authentication methods. |
| authUsername / authPassword | No | - | The username and password for Basic authentication. |
| authToken | No | - | The token for token-based authentication. Example: {"Authorization": "Bearer TokenXXXXXX"}. To use a custom encryption method, provide the encrypted credentials as the AuthToken value. |
| accessKey / accessSecret | No | - | The access key and access secret for Alibaba Cloud API signature authentication. |
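The interaction between column and dirtyData can be illustrated with the following Python sketch. It is a simplified model under stated assumptions: column names are treated as dot-separated paths, the type-to-cast mapping is invented for this example, and a dirty record is represented by a flag. None of the function names belong to the plugin.

```python
_MISSING = object()

# Assumed mapping from column types to Python casts, for illustration only.
CASTS = {"long": int, "string": str, "double": float}


def lookup(record, path):
    """Return the value at a dot-separated path, or _MISSING if absent."""
    node = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def read_row(record, columns, dirty_data="dirty"):
    """Return (row, is_dirty) for one record according to dirty_data."""
    row = []
    for col in columns:
        value = lookup(record, col["name"])
        if value is _MISSING:
            if dirty_data == "dirty":
                return None, True   # the whole record counts as dirty data
            row.append(None)        # dirty_data == "null": fill with null
        else:
            row.append(CASTS[col["type"]](value))
    return row, False


columns = [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]
complete = read_row({"a": {"b": 1, "c": "x"}}, columns)
as_null = read_row({"a": {"b": 1}}, columns, "null")
as_dirty = read_row({"a": {"b": 1}}, columns, "dirty")
print(complete)  # ([1, 'x'], False)
print(as_null)   # ([1, None], False)
print(as_dirty)  # (None, True)
```

Choosing null keeps incomplete records flowing to the writer, while dirty counts them against the error limit configured for the task.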
Authentication methods
Check your API documentation to identify the authentication method, then configure the corresponding parameters. The following keywords in your API documentation indicate which method to use:
| Method | Keywords in your API docs | Parameters to configure |
|---|---|---|
| Basic authentication | "Basic Auth", "Basic HTTP", Authorization: Basic | authUsername, authPassword |
| Token-based authentication | "Bearer token", "API token", Authorization: Bearer | authToken |
| Alibaba Cloud API signature | "AK/SK", "AccessKey", Alibaba Cloud signature | accessKey, accessSecret |
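To help match your API documentation against the table, the following Python sketch shows what the Authorization header conventionally looks like for Basic and Bearer authentication under the HTTP standards. The function names and credentials are hypothetical, and this topic does not describe how the plugin builds the headers internally. Alibaba Cloud signature authentication is omitted because its signing procedure is outside the scope of this topic.

```python
import base64


def basic_auth_header(username, password):
    """Build a standard HTTP Basic header: base64 of "username:password"."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(token):
    """Build a standard Bearer header from an already issued token."""
    return {"Authorization": f"Bearer {token}"}


# Placeholder credentials for illustration only.
basic = basic_auth_header("user", "pass")
bearer = bearer_auth_header("TokenXXXXXX")
print(basic)   # {'Authorization': 'Basic dXNlcjpwYXNz'}
print(bearer)  # {'Authorization': 'Bearer TokenXXXXXX'}
```

If a sample request in your API documentation carries a header of the first form, configure authUsername and authPassword. If it carries the second form, configure authToken.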
Writer script example
{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "restapi",
      "parameter": {
        "url": "http://127.0.0.1:5000/writer1",
        "dataMode": "oneData",
        "responseType": "json",
        "column": [
          {
            "type": "long",
            "name": "a.b"
          },
          {
            "type": "string",
            "name": "a.c"
          }
        ],
        "method": "post",
        "defaultHeader": {
          "X-Custom-Header": "test header"
        },
        "customHeader": {
          "X-Custom-Header2": "test header2"
        },
        "parameters": "abc=1&def=1",
        "batchSize": 256
      },
      "name": "restapiwriter",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "throttle": true,
      "concurrent": 1,
      "mbps": "12"
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}
Writer parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| url | Yes | - | The address of the RESTful API. |
| method | Yes | - | The HTTP request method. Valid values: post, put. |
| dataMode | Yes | - | How records are sent. oneData: sends one record per request. multiData: sends a batch of records per request; the number of requests depends on the tasks split on the reader side. |
| column | Yes | - | The list of field paths for the generated JSON. Each field requires type and name (the JSONPath where the column's data is placed). Example: [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}] |
| batchSize | Yes | 512 | The maximum number of records per request when dataMode is multiData. |
| dataPath | No | - | The JSONPath of the object where the output data is placed. |
| customHeader | No | - | Custom HTTP headers to include in the request. |
| authType | No | - | The authentication method. See Authentication methods. |
| authUsername / authPassword | No | - | The username and password for Basic authentication. |
| authToken | No | - | The token for token-based authentication. |
| accessKey / accessSecret | No | - | The access key and access secret for Alibaba Cloud API signature authentication. |
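The way the writer's column paths and batchSize shape outgoing requests can be modeled with the following Python sketch. It rests on assumptions that this topic does not confirm: column names are treated as dot-separated paths, a multiData request body is modeled as a plain JSON array, and the effect of dataPath is left out. All function names are hypothetical.

```python
import json


def set_path(target, path, value):
    """Place value at a dot-separated path, creating nested dicts as needed."""
    keys = path.split(".")
    node = target
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def build_record(row, columns):
    """Turn one source row into a JSON object using the column paths."""
    record = {}
    for value, col in zip(row, columns):
        set_path(record, col["name"], value)
    return record


def build_payloads(rows, columns, data_mode, batch_size=512):
    """Return one request body per request (assumed body shapes)."""
    records = [build_record(row, columns) for row in rows]
    if data_mode == "oneData":
        return records  # one record per request
    # multiData: at most batch_size records per request
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


columns = [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]
rows = [[1, "x"], [2, "y"], [3, "z"]]
single = build_payloads(rows, columns, "oneData")
batched = build_payloads(rows, columns, "multiData", batch_size=2)
print(json.dumps(single[0]))  # {"a": {"b": 1, "c": "x"}}
print(len(batched))           # 2
```

In this model, three rows with a batchSize of 2 result in two requests in multiData mode and three requests in oneData mode.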