DataWorks:REST API data source (HTTP)

Last Updated:Jun 30, 2026

You can create a REST API data source to read JSON data from a RESTful API and write it to another data source, such as MaxCompute, by using a synchronization task. A REST API data source can also act as a destination that receives data from other data sources. This topic describes the data synchronization capabilities of the REST API data source in DataWorks.

Limitations

Supported column types

Important

When you synchronize data to a destination, only a flat, single-level table structure is supported. Nested column structures are not supported. For example, if an API returns a structure such as {"data": {"user": {"id": 1, "name": "lily"}, "value": 123}}, the nested fields must be flattened into parallel columns such as user_id, user_name, and value in the destination.
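
The flattening described above can be pictured with a short Python sketch. This is only an illustration of the target shape, not DataWorks code: the helper name flatten and the underscore separator are assumptions chosen to reproduce the user_id, user_name, and value columns from the example.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining nested keys with `sep`."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# The response structure from the example above, written as valid JSON.
response = {"data": {"user": {"id": 1, "name": "lily"}, "value": 123}}
row = flatten(response["data"])
print(row)  # {'user_id': 1, 'user_name': 'lily', 'value': 123}
```

Each key of the resulting dictionary corresponds to one parallel column in the destination table.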

Type | Column type
Integer | LONG, INT
String | STRING
Floating-point | DOUBLE, FLOAT
Boolean | BOOLEAN
Date and time | DATE

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data source management. When you add the data source, refer to the parameter descriptions in the DataWorks console for the meaning of each parameter.

Data source authentication

The REST API data source supports the following three authentication methods:

  • No Auth: No authentication is required. You can directly access the API. This method is suitable for public APIs that do not require authentication.

  • Basic Auth: Authentication is performed by using a username and password. After you select this method, enter the username and password on the configuration page.

  • Token Auth: Authentication is performed by using a token. After you select this method, enter the access_token obtained from the third-party API in the token field on the configuration page.

DataWorks does not provide a built-in tool for obtaining third-party API tokens. If your third-party API uses token-based authentication such as OAuth 2.0, you must obtain the access_token from the API provider on your own. The following example shows how to obtain a token by using curl:

curl -X POST https://api.example.com/oauth/token \
  -d 'grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET'

After you obtain the token, set Authentication Method to Token Auth when you create a REST API data source, and enter the token in the corresponding field.
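
To make the three authentication methods concrete, the following Python sketch builds the HTTP Authorization header that each method conventionally implies. It is a sketch under stated assumptions, not the plug-in's implementation: the function name build_auth_headers is hypothetical, the Basic scheme follows the standard base64 encoding of username:password, and the Bearer prefix is assumed from the token example given in the parameter description.

```python
import base64


def build_auth_headers(auth_type, username=None, password=None, token=None):
    """Return the Authorization header implied by an authentication method."""
    if auth_type == "No Auth":
        return {}  # public API, no credentials sent
    if auth_type == "Basic Auth":
        # Standard HTTP Basic scheme: base64("username:password").
        credentials = f"{username}:{password}".encode("utf-8")
        encoded = base64.b64encode(credentials).decode("ascii")
        return {"Authorization": "Basic " + encoded}
    if auth_type == "Token Auth":
        # Assumption: the token is sent as a Bearer token.
        return {"Authorization": f"Bearer {token}"}
    raise ValueError(f"unsupported authentication method: {auth_type}")


print(build_auth_headers("Basic Auth", username="user", password="pass"))
print(build_auth_headers("Token Auth", token="TokenXXXXXX"))
```

A tool such as curl can send the same header by hand, which is a quick way to check that a token works before you enter it in the data source configuration.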

Develop a data synchronization task

For the entry point and the procedure for configuring a synchronization task, see the following configuration guide.

Configure a single-table batch synchronization task

Examples

FAQ

  • Is specifying the number of pagination requests the only way to paginate?

    • Answer: Yes.

  • Is automatic pagination supported? For example, stop paginating when the request returns no data.

    • Answer: No. The number of requests must be fixed in advance. Otherwise, the task cannot be split into shards.

  • If I specify more pagination pages than actually exist, causing empty data for the remaining pages, how does the system handle this?

    • Answer: A page that returns empty data is treated like an SQL query that returns no rows. No records are read from that page, and the system continues with the next request.

  • Does the system support parsing only one level of JSON data?

    • Answer: Yes. Deeper-level parsing is not performed.

  • How do I configure a non-array data type for a REST API in DataWorks Data Integration?

    • Answer: In the parameter section of the reader, set dataPath to the path that points to the non-array data, for example "dataPath": "data.list". This lets the plug-in locate the data that you want to read. Then, set dataMode to multiData so that DataWorks processes the data as multiple individual records, even if the source data is not in array form.

      Note

      In multiData mode, the column configuration no longer applies. Specify the data path directly in dataPath.

      The following is an example of configuring a non-array data type for the REST API in Data Integration:

      "reader": {
        "name": "restapi",
        "parameter": {
          "dataPath": "data.list",
          "dataMode": "multiData"
          // Other parameters
        }
      }
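
The pagination answers above can be pictured with a small Python sketch. It assumes the loop parameters described in the reader script parameters (startIndex and endIndex, both inclusive, and step) and uses a hypothetical fake_api function in place of a real endpoint. It is not the plug-in's code. It only shows that the number of requests is fixed up front and that an empty page contributes no records.

```python
def page_values(start_index, end_index, step):
    """Values of the loop parameter; both indexes are inclusive."""
    return list(range(start_index, end_index + 1, step))


def fake_api(page_number):
    """Hypothetical stand-in for a paged API that has only two pages of data."""
    pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    return pages.get(page_number, [])  # pages past the end return no data


def read_all(start_index, end_index, step):
    """Send one request per loop value and collect every returned record."""
    records = []
    requests_sent = 0
    for page_number in page_values(start_index, end_index, step):
        requests_sent += 1
        # An empty page adds nothing, and the loop moves on to the next request.
        records.extend(fake_api(page_number))
    return records, requests_sent


records, requests_sent = read_all(1, 4, 1)
print(len(records), requests_sent)  # 3 4
```

Because the list of loop values is known before any request is sent, it can be divided among concurrent shards, which is not possible when paging stops only after an empty response.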

Appendix: Script demo and parameter description

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Script mode configuration. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script demo

  • The following is a script example:

    {
        "type":"job",
        "version":"2.0",
        "steps":[
            {
                "stepType":"restapi",
                "parameter":{
                    "url":"http://127.0.0.1:5000/get_array5",
                    "dataMode":"oneData",
                    "responseType":"json",
                    "column":[
                        {
                            "type":"long",
                            "name":"a.b"  //Find data from the a.b path
                        },
                        {
                            "type":"string",  //Find data from the a.c path
                            "name":"a.c"
                        }
                    ],
                    "dirtyData":"null",
                    "method":"get",
                    "socketTimeout":"60000",
                    "defaultHeader":{
                        "X-Custom-Header":"test header"
                    },
                    "customHeader":{
                        "X-Custom-Header2":"test header2"
                    },
                    "parameters":"abc=1&def=1"
                },
                "name":"restapireader",
                "category":"reader"
            },
            {
                "stepType":"stream",
                "parameter":{
    
                },
                "name":"Writer",
                "category":"writer"
            }
        ],
        "setting":{
            "errorLimit":{
                "record":""
            },
            "speed":{
                "throttle":true,  //When throttle is set to false, the mbps parameter does not take effect, indicating no throttling. When throttle is set to true, throttling is enabled.
                "concurrent":1,  //Job concurrency.
                "mbps":"12"//Throttling. Here 1 mbps = 1 MB/s.
            }
        },
        "order":{
            "hops":[
                {
                    "from":"Reader",
                    "to":"Writer"
                }
            ]
        }
    }
  • The script mode configuration is described as follows:

    After the RESTful API plugin sends an HTTP(S) request, it receives a response body (the body is a JSON object). The dataPath parameter specifies the JSON path to extract data from the body. Here are two examples:
    
    
    Using the following API response body as an example, the business data is in DATA, and the API returns multiple rows of data at once (DATA is an array):
    {
        "HEADER": {
            "BUSID": "bid1",
            "RECID": "uuid",
            "SENDER": "dc",
            "RECEIVER": "pre",
            "DTSEND": "202201250000"
        },
        "DATA": [
            {
                "SERNR": "sernr1"
            },
            {
                "SERNR": "sernr2"
            }
        ]
    }
    
    To extract multiple rows of data from DATA as multiple sync records, configure column as "column": [ "SERNR" ], dataMode as "dataMode": "multiData", and dataPath as "dataPath": "DATA".
    
    
    Using the following API response body as an example, the business data is in content.DATA, and the API returns one row of data at a time (DATA is an object):
    {
        "HEADER": {
            "BUSID": "bid1",
            "RECID": "uuid",
            "SENDER": "dc",
            "RECEIVER": "pre",
            "DTSEND": "202201250000"
        },
        "content": {
            "DATA": {
                "SERNR": "sernr2"
            }
        }
    }
    
    To extract one row of data from content.DATA as a single sync record, configure column as "column": [ "SERNR" ], dataMode as "dataMode": "oneData", and dataPath as "dataPath": "content.DATA".
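
The two cases above can be reproduced with a short Python sketch. It is an illustration of how dataPath, dataMode, and column interact, not the plug-in's implementation. The helper names extract_records and to_rows are hypothetical, and the HEADER objects are shortened for brevity.

```python
def extract_records(body, data_path, data_mode):
    """Return the list of JSON records selected by dataPath and dataMode."""
    node = body
    for key in data_path.split("."):
        node = node[key]  # walk one level of the dotted path
    if data_mode == "multiData" and isinstance(node, list):
        return node  # each array element becomes one sync record
    return [node]  # otherwise the node itself is the single record


def to_rows(records, columns):
    """Pick the configured columns out of each record."""
    return [[record.get(name) for name in columns] for record in records]


array_body = {
    "HEADER": {"BUSID": "bid1"},
    "DATA": [{"SERNR": "sernr1"}, {"SERNR": "sernr2"}],
}
object_body = {
    "HEADER": {"BUSID": "bid1"},
    "content": {"DATA": {"SERNR": "sernr2"}},
}

multi_rows = to_rows(extract_records(array_body, "DATA", "multiData"), ["SERNR"])
one_row = to_rows(extract_records(object_body, "content.DATA", "oneData"), ["SERNR"])
print(multi_rows)  # [['sernr1'], ['sernr2']]
print(one_row)     # [['sernr2']]
```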
                    

Reader script parameters

Note

The following parameters are involved in the process of adding a data source and configuring a Data Integration task node.

The current plug-in does not support scheduling parameters.

Parameter

Description

Required

Default value

url

The RESTful API URL.

Yes

N/A

dataMode

The format of the JSON data returned by the RESTful API request.

  • oneData: Retrieves one record from the returned JSON.

  • multiData: Retrieves a JSON array from the returned JSON and passes multiple records to the writer.

Yes

N/A

responseType

The data format of the response. Currently, only the JSON format is supported.

Yes

JSON

column

The list of columns to read. The type parameter specifies the data type of the source data, and the name parameter specifies the JSON path from which the column data is retrieved. For example, the following configuration retrieves the first column from path a.b and the second column from path a.c:

"column":[{"type":"long","name":"a.b"},{"type":"string","name":"a.c"}]

For each column you specify, type and name are required.

Yes

N/A

dataPath

The path to a single JSON object or JSON array in the response.

No

N/A

method

The request method. GET and POST are supported.

Yes

N/A

socketTimeout

The socket timeout for accessing the RESTful API, in milliseconds.

No

60000

customHeader

The header information passed to the RESTful API.

No

N/A

parameters

The parameter information passed to the RESTful API.

  • For the GET method, enter abc=1&def=1.

  • For the POST method, enter JSON parameters.

No

N/A

dirtyData

Specifies how to handle data when no data is found at the specified column JSON path.

  • dirty: When a column cannot be found during data parsing, the record is marked as dirty data.

  • null: When a column cannot be found during data parsing, the column value is set to null.

Yes

dirty

requestTimes

The number of times to request data from the RESTful API.

  • single: Sends only one request.

  • multiple: Sends multiple requests.

Yes

single

requestParam

When requestTimes is set to multiple, you must specify the loop parameter, such as pageNumber. The plug-in loops through the pageNumber parameter based on the startIndex, endIndex, and step values, and passes it to the RESTful API for multiple requests.

No

N/A

startIndex

The start index of the loop requests. The start index is inclusive.

No

N/A

endIndex

The end index of the loop requests. The end index is inclusive.

No

N/A

step

The step size of the loop requests.

No

N/A

authType

The authentication method. Valid values:

  • Basic Auth: Basic authentication.

    If the data source API supports authentication with a username and password, select this method. Then, configure the username and password. During data integration, the credentials are sent to the RESTful endpoint through the Basic Auth protocol for authentication.

  • Token Auth: Token-based authentication.

    If the data source API supports token-based authentication, select this method. Then, configure a fixed token value. During data integration, the token is passed in the request header for authentication. For example: {"Authorization":"Bearer TokenXXXXXX"}.

    Note

    To use a custom encryption method, select Token Auth and provide the encrypted authentication information as the authToken value.

No

N/A

authUsername/authPassword

The username and password for Basic Auth authentication.

No

N/A

authToken

The token for Token Auth authentication.

No

N/A

accessKey/accessSecret

The account information for Alibaba Cloud API signature authentication.

No

N/A
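
The interaction between the column paths and the dirtyData parameter in the table above can be sketched in Python. This is a minimal illustration under stated assumptions, not the plug-in's code: the helper names lookup and read_row are hypothetical, type conversion is left out, and a dirty record is modeled simply as a flag.

```python
def lookup(record, path):
    """Follow a dotted path such as "a.b" and return (found, value)."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return False, None
        current = current[key]
    return True, current


def read_row(record, columns, dirty_data="dirty"):
    """Return (row, is_dirty) for one JSON record."""
    row = []
    for column in columns:
        found, value = lookup(record, column["name"])
        if not found and dirty_data == "dirty":
            return None, True  # the whole record is marked as dirty data
        row.append(value if found else None)  # "null" mode fills in None
    return row, False


columns = [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]
incomplete = {"a": {"b": 1}}  # path a.c is missing

print(read_row(incomplete, columns, dirty_data="null"))   # ([1, None], False)
print(read_row(incomplete, columns, dirty_data="dirty"))  # (None, True)
```

With null, the record is still written and the missing column is empty. With dirty, the record counts toward the errorLimit setting of the task.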

Writer script demo

{
    "type":"job",
    "version":"2.0",
    "steps":[
        {
            "stepType":"stream",
            "parameter":{

            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"restapi",
            "parameter":{
                "url":"http://127.0.0.1:5000/writer1",
                "dataMode":"oneData",
                "responseType":"json",
                "column":[
                    {
                        "type":"long", //Place column data to path a.b
                        "name":"a.b"
                    },
                    {
                        "type":"string", //Place column data to path a.c
                        "name":"a.c"
                    }
                ],
                "method":"post",
                "defaultHeader":{
                    "X-Custom-Header":"test header"
                },
                "customHeader":{
                    "X-Custom-Header2":"test header2"
                },
                "parameters":"abc=1&def=1",
                "batchSize":256
            },
            "name":"restapiwriter",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0" //The error count.
        },
        "speed":{
            "throttle":true,//If throttle is set to false, the mbps parameter does not take effect, which means throttling is disabled. If throttle is set to true, throttling is enabled.
            "concurrent":1, //The concurrency of the job.
            "mbps":"12"//Throttling. 1 mbps = 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

Writer script parameters

Parameter

Description

Required

Default value

url

The RESTful API URL.

Yes

N/A

dataMode

The format of the JSON data passed through the RESTful request.

  • oneData: Sends only one record per request. The number of requests equals the number of records.

  • multiData: Sends a batch of records per request. The number of requests is determined by the number of tasks split on the reader side.

Yes

N/A

column

The list of column paths for generating JSON data. The type parameter specifies the data type of the source data, and the name parameter specifies the JSON path where the column data is placed. For example, the following configuration places the first column at path a.b and the second column at path a.c:

"column":[{"type":"long","name":"a.b"},{"type":"string","name":"a.c"}]

Note

For each column you specify, type and name are required.

Yes

N/A

dataPath

The JSON object path where the data result is placed.

No

N/A

method

The request method. POST and PUT are supported.

Yes

N/A

customHeader

The header information passed to the RESTful API.

No

N/A

authType

The authentication method.

  • Basic Auth: Basic authentication.

    If the data source API supports authentication with a username and password, select this method. Then, configure the username and password. During data integration, the credentials are sent to the RESTful endpoint through the Basic Auth protocol for authentication.

  • Token Auth: Token-based authentication.

    If the data source API supports token-based authentication, select this method. Then, configure a fixed token value. During data integration, the token is passed in the request header for authentication. For example: {"Authorization":"Bearer TokenXXXXXX"}.

    Note

    To use a custom encryption method, select Token Auth and provide the encrypted authentication information as the authToken value.

No

N/A

authUsername/authPassword

The username and password for Basic Auth authentication.

No

N/A

authToken

The token for Token Auth authentication.

No

N/A

accessKey/accessSecret

The account information for Alibaba Cloud API signature authentication.

No

N/A

batchSize

The maximum number of records per request when dataMode is set to multiData.

Yes

512
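
The writer parameters above can be pictured with a Python sketch that turns rows into request bodies. It is a sketch under stated assumptions rather than the plug-in's implementation: the helper names build_body and build_payloads are hypothetical, dataPath is not modeled, and multiData batching is reduced to the batchSize limit without the reader-side task split.

```python
def build_body(row, columns):
    """Place each value of `row` at the dotted path named by its column."""
    body = {}
    for value, column in zip(row, columns):
        *parents, leaf = column["name"].split(".")
        node = body
        for key in parents:
            node = node.setdefault(key, {})  # create nested objects on demand
        node[leaf] = value
    return body


def build_payloads(rows, columns, data_mode, batch_size=512):
    """Return one payload per request for the given dataMode."""
    bodies = [build_body(row, columns) for row in rows]
    if data_mode == "oneData":
        return bodies  # one record per request
    # multiData: at most batch_size records per request
    return [bodies[i:i + batch_size] for i in range(0, len(bodies), batch_size)]


columns = [{"type": "long", "name": "a.b"}, {"type": "string", "name": "a.c"}]
rows = [[1, "x"], [2, "y"], [3, "z"]]

one = build_payloads(rows, columns, "oneData")
multi = build_payloads(rows, columns, "multiData", batch_size=2)
print(one[0])                   # {'a': {'b': 1, 'c': 'x'}}
print(len(one), len(multi))     # 3 2
```

In oneData mode the three rows produce three requests. In multiData mode with a batch size of 2, the same rows fit into two requests.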