You can create a RestAPI data source to read JSON data from RESTful APIs and write it to other data sources (such as MaxCompute) through synchronization tasks. A RestAPI data source can also serve as a destination that receives data from other data sources. This topic describes the data synchronization capabilities of RestAPI data sources.
Limits
RestAPI data sources support only exclusive resource groups for Data Integration.
DataWorks does not allow you to configure a timeout period when you use this type of data source. The built-in timeout period for a request in DataWorks is 60 seconds. If the time required to return the result of your API call exceeds 60 seconds, your task may fail.
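Because the 60-second limit cannot be changed, it can help to measure how long an API call takes before you schedule a task. The following is a minimal sketch, not part of DataWorks: the helper names timed_call and within_timeout are hypothetical, and the placeholder call stands in for your real request.

```python
import time
from typing import Any, Callable, Tuple

# Built-in request timeout stated in this topic.
DATAWORKS_TIMEOUT_SECONDS = 60


def timed_call(call: Callable[[], Any]) -> Tuple[Any, float]:
    """Run call() and return (result, elapsed seconds)."""
    start = time.monotonic()
    result = call()
    return result, time.monotonic() - start


def within_timeout(elapsed: float, limit: float = DATAWORKS_TIMEOUT_SECONDS) -> bool:
    """Return True if a call that took `elapsed` seconds does not exceed the limit."""
    return elapsed <= limit


# Placeholder call; replace the lambda with a real request to your API,
# for example urllib.request.urlopen(url, timeout=60).read().
_, elapsed = timed_call(lambda: time.sleep(0.01))
print(f"elapsed={elapsed:.2f}s within_limit={within_timeout(elapsed)}")
```

If the measured time is close to the limit, consider reducing the amount of data returned by each request.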
Supported field types
Category | Data Integration Column Configuration Types |
Integer | LONG, INT |
String | STRING |
Floating point | DOUBLE, FLOAT |
Boolean | BOOLEAN |
Date and time | DATE |
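When you decide which type to configure for a column, you can inspect a sample response first. The sketch below suggests a type from the table above for each scalar JSON value. The mapping and the function name suggest_column_type are illustrative assumptions, not the logic that DataWorks itself uses.

```python
import json
from typing import Any


def suggest_column_type(value: Any) -> str:
    """Suggest a column configuration type for one sample JSON value."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        # Dates also arrive as JSON strings; choose DATE manually where appropriate.
        return "STRING"
    raise ValueError(f"no scalar column type for {type(value).__name__}")


sample = json.loads('{"id": 7, "price": 9.5, "name": "x", "active": true}')
for key, value in sample.items():
    print(key, suggest_column_type(value))
```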
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Add and manage data sources. You can view the infotips of parameters in the DataWorks console to understand the meanings of the parameters when you add a data source.
Develop a data synchronization task
For information about the entry point for and the procedure of configuring a synchronization task, see the following configuration guides.
Configure a batch synchronization task to synchronize data of a single table
For more information about the configuration procedure, see Configure a batch synchronization task using the codeless UI and Configure a batch synchronization task using the code editor.
For information about all parameters that are configured and the code that is run when you use the code editor to configure a batch synchronization task, see Appendix: Code and parameters.
FAQ
Can I control pagination only by specifying the number of page requests?
Yes.
Does it support automatic pagination that stops when a request returns no more data?
No. This is not supported because such a task cannot be split into chunks.
The number of page requests that I specified is greater than the actual number of pages in the response, so the additional pages contain no data. How does the system handle this?
If a request returns no result, the corresponding page contains no data. In this case, the system continues with the next request.
Can RestAPI Reader parse only one level of data in the JSON-formatted response?
Yes. RestAPI Reader does not perform deep parsing.
How do I configure RestAPI Reader to read data of a non-array type?
In the parameter block of the reader, set the dataPath parameter to the path that points to your data of a non-array type, such as dataPath:"data.list". This helps the plugin correctly locate the data fields to read. Next, set the dataMode parameter to multiData. This way, DataWorks processes the data of a non-array type as multiple separate data records.
Note: In multiData mode, the column parameter does not apply. You must specify the data path directly in dataPath.
The following code provides a configuration example:
reader: {
    name: "restapi",
    parameter: {
        dataPath: "data.list",
        dataMode: "multiData",
        // Other parameters
    }
}
Appendix: Code and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a batch synchronization task by using the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script example
Sample code:
{
    "type":"job",
    "version":"2.0",
    "steps":[
        {
            "stepType":"restapi",
            "parameter":{
                "url":"http://127.0.0.1:5000/get_array5",
                "dataMode":"oneData",
                "responseType":"json",
                "column":[
                    {
                        "type":"long",
                        "name":"a.b" // Query data in the a.b path.
                    },
                    {
                        "type":"string", // Query data in the a.c path.
                        "name":"a.c"
                    }
                ],
                "dirtyData":"null",
                "method":"get",
                "defaultHeader":{
                    "X-Custom-Header":"test header"
                },
                "customHeader":{
                    "X-Custom-Header2":"test header2"
                },
                "parameters":"abc=1&def=1"
            },
            "name":"restapireader",
            "category":"reader"
        },
        {
            "stepType":"stream",
            "parameter":{
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":""
        },
        "speed":{
            "throttle":true, // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
            "concurrent":1, // The maximum number of parallel threads.
            "mbps":"12" // The maximum transmission rate. Unit: MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
Take note of the following information when you configure RestAPI Reader using the code editor:
After RestAPI Reader sends an HTTP or HTTPS request, a JSON-formatted response is returned. The dataPath parameter is used to specify the path of the JSON-formatted data record or JSON array that is queried. Examples:
In the following sample response, a JSON array is returned for the DATA parameter that contains the business data.
{
    "HEADER": {
        "BUSID": "bid1",
        "RECID": "uuid",
        "SENDER": "dc",
        "RECEIVER": "pre",
        "DTSEND": "202201250000"
    },
    "DATA": [
        {
            "SERNR": "sernr1"
        },
        {
            "SERNR": "sernr2"
        }
    ]
}
To extract multiple data records from the JSON array and transfer the data records to a writer, you must configure the column parameter in the "column": [ "SERNR" ] format, the dataMode parameter in the "dataMode": "multiData" format, and the dataPath parameter in the "dataPath": "DATA" format.
In the following sample response, a JSON object is returned for the content.DATA parameter that contains the business data.
{
    "HEADER": {
        "BUSID": "bid1",
        "RECID": "uuid",
        "SENDER": "dc",
        "RECEIVER": "pre",
        "DTSEND": "202201250000"
    },
    "content": {
        "DATA": {
            "SERNR": "sernr2"
        }
    }
}
To extract one data record from the JSON object and transfer the data record to a writer, you must configure the column parameter in the "column": [ "SERNR" ] format, the dataMode parameter in the "dataMode": "oneData" format, and the dataPath parameter in the "dataPath": "content.DATA" format.
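The two examples above can be reproduced locally to check a dataPath, dataMode, and column combination against a sample response. The following is a minimal sketch under stated assumptions: the functions resolve_path and extract_records are hypothetical and only mimic the behavior described in this topic, not the actual RestAPI Reader implementation. In multiData mode, the sketch handles only a JSON array at dataPath.

```python
import json
from typing import Any, Dict, List


def resolve_path(node: Any, path: str) -> Any:
    """Follow a dot-separated path such as "content.DATA" through nested objects."""
    for key in path.split("."):
        node = node[key]  # raises KeyError if the path does not exist
    return node


def extract_records(response: Dict[str, Any], data_path: str,
                    data_mode: str, columns: List[str]) -> List[List[Any]]:
    """Return one row per data record, with one value per configured column."""
    data = resolve_path(response, data_path)
    if data_mode == "multiData":
        if not isinstance(data, list):
            raise TypeError("this sketch handles only a JSON array in multiData mode")
        items = data
    elif data_mode == "oneData":
        items = [data]
    else:
        raise ValueError(f"unknown dataMode: {data_mode}")
    return [[resolve_path(item, name) for name in columns] for item in items]


# Shortened versions of the two sample responses in this topic.
array_response = json.loads(
    '{"HEADER": {"BUSID": "bid1"}, "DATA": [{"SERNR": "sernr1"}, {"SERNR": "sernr2"}]}'
)
object_response = json.loads(
    '{"HEADER": {"BUSID": "bid1"}, "content": {"DATA": {"SERNR": "sernr2"}}}'
)

print(extract_records(array_response, "DATA", "multiData", ["SERNR"]))
print(extract_records(object_response, "content.DATA", "oneData", ["SERNR"]))
```

The first call yields two rows and the second call yields one row, which matches the multiData and oneData descriptions above.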
Parameters in code for RestAPI Reader
You must configure the parameters that are described in the following table when you add a RestAPI data source and configure a data integration node.
Scheduling parameters are not supported for a data synchronization node that uses RestAPI Reader.
Parameter | Description | Required | Default value |
url | The URL of the RESTful API. | Yes | No default value |
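dataMode | The format of the JSON data returned by a RESTful request. Valid values: oneData (one data record is extracted from the returned JSON object) and multiData (multiple data records are extracted from the returned JSON array). | Yes | No default value |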
responseType | The format of the response returned by the RESTful API. Only the JSON format is supported. | Yes | JSON |
column | The names of the fields from which you want to read data. The type parameter specifies the data type of a field. The name parameter specifies the JSON-formatted path in which the field is located. You can configure the column parameter in the following format: "column":[{"type":"long","name":"a.b" // Query data in the a.b path.},{"type":"string","name":"a.c"// Query data in the a.c path.}] You must configure the type and name parameters for each field. | Yes | No default value |
dataPath | The path of the JSON-formatted data record or JSON array that is queried. | No | No default value |
method | The request method. Valid values: get and post. | Yes | No default value |
customHeader | The header information transferred to the RESTful API. | No | No default value |
parameters | The parameter information transferred to the RESTful API. | No | No default value |
dirtyData | The processing mechanism that is used when no data is found in the JSON-formatted path specified using the column parameter. Valid values include dirty (the default value) and null (the value used in the reader script example). | Yes | dirty |
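requestTimes | The number of times data is requested from the RESTful address. Valid values: single (data is requested once) and multiple (data is requested multiple times based on the requestParam, startIndex, endIndex, and step parameters). | Yes | single |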
requestParam | If you set the requestTimes parameter to multiple, you must configure a parameter that you want to repeatedly pass to the RESTful API in each request. For example, if you configure the pageNumber parameter, RestAPI Reader passes the pageNumber parameter to the RESTful API based on the settings of the startIndex, endIndex, and step parameters. | No | No default value |
startIndex | The start point of requests. The data at the start point is also requested. | No | No default value |
endIndex | The end point of requests. The data at the end point is also requested. | No | No default value |
step | The step at which requests are sent. | No | No default value |
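authType | The authentication method. Supported methods: basic authentication, token-based authentication, and authentication based on Alibaba Cloud API signature. See the authUsername/authPassword, authToken, and accessKey/accessSecret parameters. | No | No default value |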
authUsername/authPassword | The username and password used for basic authentication. | No | No default value |
authToken | The token used for token-based authentication. | No | No default value |
accessKey/accessSecret | The AccessKey pair used for authentication based on Alibaba Cloud API signature. | No | No default value |
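The requestParam, startIndex, endIndex, and step parameters in the table above describe a sequence of requests in which both the start point and the end point are included. The following sketch lists the values that would be sent under that reading. How RestAPI Reader attaches the value to each request is not stated in this topic, so appending it to the query string is an assumption, and the function names are hypothetical.

```python
from typing import List
from urllib.parse import urlencode


def page_values(start_index: int, end_index: int, step: int) -> List[int]:
    """Values of the repeated parameter, including both startIndex and endIndex."""
    if step <= 0:
        raise ValueError("step must be a positive integer")
    return list(range(start_index, end_index + 1, step))


def request_queries(request_param: str, start_index: int, end_index: int,
                    step: int, base_parameters: str = "") -> List[str]:
    """Build one query string per request (assumed transport: query string)."""
    queries = []
    for value in page_values(start_index, end_index, step):
        extra = urlencode({request_param: value})
        queries.append(f"{base_parameters}&{extra}" if base_parameters else extra)
    return queries


for query in request_queries("pageNumber", 1, 3, 1, "abc=1&def=1"):
    print(query)
```

Because automatic pagination is not supported, choose endIndex so that the sequence covers the pages you expect the API to return.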
Writer script example
{
    "type":"job",
    "version":"2.0",
    "steps":[
        {
            "stepType":"stream",
            "parameter":{
            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"restapi",
            "parameter":{
                "url":"http://127.0.0.1:5000/writer1",
                "dataMode":"oneData",
                "responseType":"json",
                "column":[
                    {
                        "type":"long", // Store data in the a.b path.
                        "name":"a.b"
                    },
                    {
                        "type":"string", // Store data in the a.c path.
                        "name":"a.c"
                    }
                ],
                "method":"post",
                "defaultHeader":{
                    "X-Custom-Header":"test header"
                },
                "customHeader":{
                    "X-Custom-Header2":"test header2"
                },
                "parameters":"abc=1&def=1",
                "batchSize":256
            },
            "name":"restapiwriter",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0" // The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":true, // Specifies whether to enable throttling. The value false indicates that throttling is disabled, and the value true indicates that throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
            "concurrent":1, // The maximum number of parallel threads.
            "mbps":"12" // The maximum transmission rate. Unit: MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
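The column entries in the script above use dot-separated names such as a.b and a.c to describe where each value is stored in the generated JSON. The following sketch shows one way to read that rule by building a nested object from a record. The exact request body that RestAPI Writer sends is not shown in this topic, so treat the function build_body as a hypothetical illustration.

```python
import json
from typing import Any, Dict, List


def build_body(columns: List[Dict[str, str]], record: List[Any]) -> Dict[str, Any]:
    """Place each record value at the dot-separated path given by its column name."""
    if len(columns) != len(record):
        raise ValueError("the record must contain one value per column")
    body: Dict[str, Any] = {}
    for column, value in zip(columns, record):
        keys = column["name"].split(".")
        node = body
        for key in keys[:-1]:
            node = node.setdefault(key, {})  # create intermediate objects as needed
        node[keys[-1]] = value
    return body


columns = [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]
print(json.dumps(build_body(columns, [1, "x"])))  # {"a": {"b": 1, "c": "x"}}
```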
Parameters in code for RestAPI Writer
Parameter | Description | Required | Default value |
url | The URL of the RESTful API. | Yes | No default value |
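dataMode | The format in which RestAPI Writer transfers JSON-formatted data. Valid values: oneData and multiData. When this parameter is set to multiData, the batchSize parameter limits the number of data records in each request. | Yes | No default value |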
column | The columns to which you want to write the generated JSON-formatted data. The type field specifies the data type of a column. The name field specifies the JSON-formatted path where the column is stored. You can configure the column parameter in the following format: "column":[{"type":"long","name":"a.b" // Store data in the a.b path.},{"type":"string","name":"a.c"// Store data in the a.c path.}] Note You must configure the type and name parameters for each field. | Yes | No default value |
dataPath | The path that is used to store the JSON-formatted data. | No | No default value |
method | The request method. Valid values: post and put. | Yes | No default value |
customHeader | The header information transferred to the RESTful API. | No | No default value |
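authType | The authentication method. Supported methods: basic authentication, token-based authentication, and authentication based on Alibaba Cloud API signature. See the authUsername/authPassword, authToken, and accessKey/accessSecret parameters. | No | No default value |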
authUsername/authPassword | The username and password used for basic authentication. | No | No default value |
authToken | The token used for token-based authentication. | No | No default value |
accessKey/accessSecret | The AccessKey pair used for authentication based on Alibaba Cloud API signature. | No | No default value |
batchSize | The maximum number of data records that can be transferred in each request when the dataMode parameter is set to multiData. | Yes | 512 |
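The batchSize parameter caps the number of data records per request in multiData mode. The following sketch shows how a list of records would be grouped under that cap. The function name batches is hypothetical, and the grouping is an illustration of the limit, not the writer's internal code.

```python
from typing import Any, Iterator, List


def batches(records: List[Any], batch_size: int = 512) -> Iterator[List[Any]]:
    """Yield consecutive groups of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


records = list(range(600))
print([len(group) for group in batches(records, 256)])  # [256, 256, 88]
```

With the default value of 512, the same 600 records would be sent in two requests.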