All Products
Search
Document Center

DataWorks:Use RestAPI Reader

Last Updated:Mar 26, 2026

RestAPI Reader reads data from RESTful APIs and writes it to a destination store through Data Integration. Configure an HTTP request URL and RestAPI Reader fetches the response, converts the JSON data to supported types, and passes it to a writer.

This topic shows four common patterns: time-range queries, paginated results, POST requests, and loop traversal with a for-each node.

For parameter reference, see RestAPI (HTTP). For RestAPI Writer configuration, see RestAPI (HTTP) and Writer script parameters.

Capabilities

Dimension Details
Supported response format JSON only
Supported data types INT, BOOLEAN, DATE, DOUBLE, FLOAT, LONG, STRING
Request methods GET, POST over HTTP
Authentication None, Basic Auth, Token Auth, Alibaba Cloud API signature

Authentication options

Use the following guidance to identify which authentication method your API requires:

  • None: Your API is publicly accessible without credentials.

  • Basic Auth: Your API documentation mentions keywords such as "Basic Auth", "Basic HTTP", or "Authorization: Basic". During sync, the username and password are passed in the RESTful API URL for authentication.

  • Token Auth: Your API documentation mentions keywords such as "bearer token", "token authentication", or "Authorization: Bearer". Pass a fixed token in the request header, for example: {"Authorization":"Bearer TokenXXXXXX"}. For APIs with custom encryption, provide the encrypted value as the AuthToken.

  • Alibaba Cloud API signature: Your API uses Alibaba Cloud's signature authentication scheme.

RestAPI data sources require an exclusive resource group for Data Integration. Select and test network connectivity to the resource group before configuring a task.

Prerequisites

Before you begin, ensure that you have:

  • A DataWorks workspace with DataStudio access

  • An exclusive resource group for Data Integration, with network connectivity to your RESTful API endpoint tested and confirmed

  • A MaxCompute data source configured in your workspace (for the examples in this topic)

Example 1: Read data within a time range

This example synchronizes time-range query results from a GET-based RESTful API to a MaxCompute partitioned table. The API accepts startTime and endTime parameters and returns matching records.

The API used here is for reference only. Adjust configurations to match your own API.

Sample API

Request:

http://TestAPIAddress:Port/rest/test2?startTime=<StartTime>&endTime=<EndTime>

Response:

{
    "status": "success",
    "totalNum": 187,
    "data": [
        {"axis": "series1", "value": 9191352, "createTime": "2023-01-04 00:07:20"},
        {"axis": "series1", "value": 6645322, "createTime": "2023-01-04 00:14:47"},
        {"axis": "series1", "value": 2078369, "createTime": "2023-01-04 00:22:13"},
        {"axis": "series1", "value": 7325410, "createTime": "2023-01-04 00:29:30"},
        {"axis": "series1", "value": 7448456, "createTime": "2023-01-04 00:37:04"},
        {"axis": "series1", "value": 5808077, "createTime": "2023-01-04 00:44:30"},
        {"axis": "series1", "value": 5625821, "createTime": "2023-01-04 00:52:06"}
    ]
}

The queried records are in data. The returned fields are axis, value, and createTime.

apirest2

Step 1: Create a MaxCompute partitioned table

CREATE TABLE IF NOT EXISTS ods_xiaobo_rest2
(
  `axis`       STRING,
  `value`      BIGINT,
  `createTime` STRING
)
PARTITIONED BY
(
  ds STRING
)
LIFECYCLE 3650;

If you work in a workspace in standard mode, deploy the table to the production environment to make it visible in Data Map.

With the write mode set to overwriting, the sync task overwrites partition data on each run, making reruns safe and preventing duplicate records.

Step 2: Add a RestAPI data source

Add a RestAPI data source to your workspace. For details, see Add a RestAPI data source.

image.png

Key fields:

  • url: The RESTful API endpoint URL.

  • Auth method: The authentication method supported by your data source and its required parameters.

Step 3: Configure the batch synchronization task

On the DataStudio page, create a batch synchronization task using the codeless UI. For details, see Use the codeless UI.

Source settings:

Setting Value
Data source The RestAPI data source added above
Request Method GET
Data Structure Array Data
json path to store data data

Request parameters — combine request parameters with scheduling parameters to sync data daily:

startTime=${extract_day} ${start_time}&endTime=${extract_day} ${end_time}

Add these scheduling parameters in the task's scheduling properties:

extract_day=${yyyy-mm-dd}
start_time=00:00:00
end_time=23:59:59

When the task runs on 2023-01-05, extract_day resolves to 2023-01-04, producing:

startTime=2023-01-04 00:00:00&endTime=2023-01-04 23:59:59

Destination settings:

Setting Value
Data source The MaxCompute data source
Table ods_xiaobo_rest2
Partition information ${bizdate} — add scheduling parameter bizdate=$bizdate

When the task runs on 2023-01-05, data is written to partition 20230104.

Field mappings: Specify the fields returned by the API (axis, value, createTime). Field names are case-sensitive. Click Map Fields with Same Name to auto-map, or drag lines to map manually.

Step 4: Test the task

Click Run with Parameters in the toolbar. Specify values for the scheduling parameters in the dialog.

带参运行

After the run, check the logs at the bottom of the configuration tab to confirm the parameter values are correct.

Step 5: Verify the data

Create an ad hoc query node in DataStudio to check the result:

SELECT * FROM ods_xiaobo_rest2 WHERE ds='20230104' ORDER BY createtime;

20230104 is the partition written in the test run above.

结果1

Step 6: Deploy and backfill

Deploy the task to the production environment. For standard mode workspaces, see Publishing process for tasks in a standard mode workspace.

提交发布

Find the deployed task on the Cycle Task page in Operation Center and backfill historical data as needed. See Backfill data for an auto triggered task and view data backfill instances generated for the task.

Example 2: Read paginated results

This example synchronizes paged data from a GET-based RESTful API to MaxCompute. The API uses pageSize and pageNum to control pagination.

The API used here is for reference only. Adjust configurations to match your own API.

Sample API

Request:

http://TestAPIAddress:Port/rest/test1?pageSize=5&pageNum=1

pageSize controls how many records to return per page. pageNum specifies the page number.

Response:

{
    "status": "success",
    "totalNum": 304,
    "data": [
        {"id": 6,  "name": "User 6"},
        {"id": 7,  "name": "User 7"},
        {"id": 8,  "name": "User 8"},
        {"id": 9,  "name": "User 9"},
        {"id": 10, "name": "User 10"}
    ]
}

The queried records are in data. The returned fields are id and name.

apirest1

Step 1: Create a MaxCompute partitioned table

CREATE TABLE IF NOT EXISTS ods_xiaobo_rest1
(
  `id`   BIGINT,
  `name` STRING
)
PARTITIONED BY
(
  ds STRING
)
LIFECYCLE 3650;

Deploy the table to the production environment in standard mode to view it in Data Map.

Step 2: Add a RestAPI data source

Add a RestAPI data source to your workspace. For details, see Add a RestAPI data source.

image.png

Step 3: Configure the batch synchronization task

On the DataStudio page, create a batch synchronization task using the codeless UI. For details, see Use the codeless UI.

Source settings:

Setting Value
Data source The RestAPI data source added above
Request Method GET
Data Structure Array Data
json path to store data data
Request parameters pageSize=50
Number of requests Multiple requests

For Multiple requests, configure the pagination traversal:

Parameter Value Description
Multiple requests parameter pageNum The page number parameter in the API
StartIndex 1 First page number
Step 1 Increment per request
EndIndex 100 Last page to fetch

Setting pageSize=50 tells the API to return 50 records per page. Keep this value reasonable — an excessively large pageSize increases load on the API server and the sync task.

With this configuration, Data Integration sends requests from pageNum=1 to pageNum=100 in steps of 1, collecting all returned records.

Destination settings:

Setting Value
Data source The MaxCompute data source
Table ods_xiaobo_rest1
Partition information ${bizdate} — add scheduling parameter bizdate=$bizdate

When the task runs on 2023-01-05, data is written to partition 20230104.

Field mappings: Specify id and name. Field names are case-sensitive. Click Map Fields with Same Name to auto-map.

Step 4: Test the task

Click Run with Parameters and specify the scheduling parameter values.

带参运行

Step 5: Verify the data

SELECT * FROM ods_xiaobo_rest1 WHERE ds='20230104' ORDER BY id;
结果1

Example 3: Read data from a POST-based API

This example synchronizes data from a POST-based RESTful API. The request body is JSON and includes a user ID and time range. The API returns nested records.

The API used here is for reference only. Adjust configurations to match your own API.

Sample API

Request:

http://TestAPIAddress:Port/rest/test3

Request body:

{
  "userId": 16,
  "startTime": "2023-01-04 00:00:00",
  "endTime": "2023-01-04 23:59:59"
}

Response:

{
    "status": "success",
    "totalNum": 289,
    "data": [
        {
            "user": {"id": 16, "name": "User 16"},
            "axis": "series1",
            "value": 8231053,
            "createTime": "2023-01-04 00:04:57"
        },
        {
            "user": {"id": 16, "name": "User 16"},
            "axis": "series1",
            "value": 6519928,
            "createTime": "2023-01-04 00:09:51"
        },
        {
            "user": {"id": 16, "name": "User 16"},
            "axis": "series1",
            "value": 2915920,
            "createTime": "2023-01-04 00:14:36"
        },
        {
            "user": {"id": 16, "name": "User 16"},
            "axis": "series1",
            "value": 7971851,
            "createTime": "2023-01-04 00:19:51"
        },
        {
            "user": {"id": 16, "name": "User 16"},
            "axis": "series1",
            "value": 6598996,
            "createTime": "2023-01-04 00:24:30"
        }
    ]
}

The returned fields are user.id, user.name, axis, value, and createTime. Note that user.id and user.name are nested fields — use a period (.) to reference them in field mappings.

apirest3

Step 1: Create a MaxCompute partitioned table

CREATE TABLE IF NOT EXISTS ods_xiaobo_rest3
(
  `user_id`     BIGINT,
  `name`        STRING,
  `axis`        STRING,
  `value`       BIGINT,
  `create_time` STRING
)
PARTITIONED BY
(
  ds STRING
)
LIFECYCLE 3650;

Deploy the table to the production environment in standard mode to view it in Data Map.

Step 2: Add a RestAPI data source

Add a RestAPI data source to your workspace. For details, see Add a RestAPI data source.

image.png

Step 3: Configure the batch synchronization task

On the DataStudio page, create a batch synchronization task using the codeless UI. For details, see Use the codeless UI.

Source settings:

Setting Value
Data source The RestAPI data source added above
Request Method POST
Data Structure Array Data
json path to store data data
Header {"Content-Type":"application/json"}

Request parameters — use a JSON body with scheduling parameters for daily sync:

{
    "userId": 16,
    "startTime": "${extract_day} 00:00:00",
    "endTime": "${extract_day} 23:59:59"
}

Add this scheduling parameter in the task's scheduling properties:

extract_day=${yyyy-mm-dd}

Destination settings:

Setting Value
Data source The MaxCompute data source
Table ods_xiaobo_rest3
Partition information ${bizdate} — add scheduling parameter bizdate=$bizdate

When the task runs on 2023-01-05, data is written to partition 20230104.

Field mappings: Specify the fields returned by the API. For nested fields, use dot notation — for example, enter user.id to reference the id field inside user. Field names are case-sensitive. Click Map Fields with Same Name to auto-map, or drag lines to map manually.

Step 4: Test the task

Click Run with Parameters and specify the scheduling parameter values.

带参运行

Step 5: Verify the data

SELECT * FROM ods_xiaobo_rest3 WHERE ds='20230105' ORDER BY create_time;

20230105 is the partition written in the test run above.

结果3

Example 4: Traverse request parameters in a loop

This example fetches weather data for multiple city-province combinations by looping through a parameter table. An assignment node reads the city list from MaxCompute, a for-each node iterates over it, and a batch synchronization task runs once per combination.

The API used here is for reference only. Adjust configurations to match your own API.

Sample API

Request:

http://TestAPIAddress:Port/rest/test5?date=2023-01-04&province=zhejiang&city=hangzhou

Response:

{
  "province": "P1",
  "city": "hz",
  "date": "2023-01-04",
  "minTemperature": "-14",
  "maxTemperature": "-7",
  "unit": "℃",
  "weather": "cool"
}
image.png

Step 1: Create the parameter table and destination table

The parameter table stores the province-city combinations to iterate over. The destination table stores the sync results, partitioned by province, city, and date.

Create a parameter table

-- Parameter table: stores the loop values
CREATE TABLE IF NOT EXISTS `citys`
(
  `province` STRING,
  `city`     STRING
);

INSERT INTO citys
SELECT 'shanghai', 'shanghai'
UNION ALL SELECT 'zhejiang', 'hangzhou'
UNION ALL SELECT 'sichuan', 'chengdu';

Create a MaxCompute partitioned table

-- Destination table: partitioned by province, city, and date
CREATE TABLE IF NOT EXISTS ods_xiaobo_rest5
(
    `minTemperature` STRING,
    `maxTemperature` STRING,
    `unit`           STRING,
    `weather`        STRING
)
PARTITIONED BY
(
    `province` STRING,
    `city`     STRING,
    `ds`       STRING
)
LIFECYCLE 3650;

In standard mode, deploy both tables to the production environment to view them in Data Map.

With the write mode set to overwriting, each run overwrites partition data, making reruns safe and preventing duplicate records.

Step 2: Add a RestAPI data source

Add a RestAPI data source to your workspace. For details, see Add a RestAPI data source.

image.png

Step 3: Create an assignment node

On the DataStudio page, create an assignment node named setval_citys. For details, see Assignment node.

Configure the node as follows:

Setting Value
Programming language ODPS SQL
Code SELECT province, city FROM citys;
Rerun property Allow Regardless of Running Status

After configuration, deploy the assignment node.

Step 4: Create a for-each node

On the DataStudio page, create a for-each node. For details, see the For-each node documentation.

Parameter Value
Rerun property Allow Regardless of Running Status
Ancestor node setval_citys
Input parameter Select the sources of the input parameters

Step 5: Configure the batch synchronization task inside the for-each node

Create a batch synchronization task using the codeless UI. For details, see Use the codeless UI.

The for-each node passes each row from citys as output parameters: dag.foreach.current[0] is province and dag.foreach.current[1] is city. Use these to build the API request and destination partition for each iteration.

Scheduling parameters:

bizdate=$[yyyymmdd-1]
bizdate_year=$[yyyy-1]
bizdate_month=$[mm-1]
bizdate_day=$[dd-1]

Request parameters (builds the API query string from scheduling and for-each parameters):

date=${bizdate_year}-${bizdate_month}-${bizdate_day}&province=${dag.foreach.current[0]}&city=${dag.foreach.current[1]}

Destination partition configuration:

Parameter Value Description
province ${dag.foreach.current[0]} Province from the current for-each iteration
city ${dag.foreach.current[1]} City from the current for-each iteration
ds ${bizdate} Scheduling parameter for the date partition

Field mappings: Specify the fields returned by the API (minTemperature, maxTemperature, unit, weather). Field names are case-sensitive. Click Map Fields with Same Name to auto-map, or drag lines to map manually.

After configuration, deploy the for-each node.

Step 6: Test the assignment node and for-each node

  1. On the Cycle Task page in Operation Center, find the assignment node and trigger a data backfill. For details, see Perform O&M on data backfill instances.

  2. Select the data timestamp range (start and end time) and the nodes to backfill.

  3. After the backfill instances are generated, check the run logs of the assignment node to confirm that scheduling parameters are rendered correctly. When parameters are correct, data is written to partitions such as province=shanghai,city=shanghai,ds=20231215.

Step 7: Verify the data

SELECT  weather,
        mintemperature,
        maxtemperature,
        unit,
        province,
        city,
        ds
FROM    ods_xiaobo_rest5
WHERE   ds != 1
ORDER BY ds, province, city;

ods_xiaobo_rest5 is the destination table created in Step 1.