
DataWorks:HttpFile

Last Updated: Mar 26, 2026

DataWorks Data Integration uses HttpFile to download files from remote endpoints over HTTP and sync them to a target data source.

Supported resource groups

HttpFile supports the following resource groups:

Supported field types

| Data type | Description |
| --- | --- |
| STRING | Text. |
| LONG | Integer. |
| BYTES | Byte array. Text content is converted to a UTF-8 encoded byte array. |
| BOOL | Boolean. |
| DOUBLE | Decimal. |
| DATE | Date and time. Supported formats: yyyy-MM-dd HH:mm:ss, yyyy-MM-dd, HH:mm:ss. |
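The type mappings above can be sketched in Python. The `convert` helper and the parsing order of the date formats are illustrative assumptions, not part of the connector:

```python
from datetime import datetime

# Date formats listed in the field-type table, tried in order.
DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%H:%M:%S")

def convert(raw: str, col_type: str):
    """Illustrative mapping of a raw text field to an HttpFile field type."""
    if col_type == "string":
        return raw
    if col_type == "long":
        return int(raw)
    if col_type == "double":
        return float(raw)
    if col_type in ("bool", "boolean"):
        return raw.strip().lower() == "true"
    if col_type == "bytes":
        # Text content becomes a UTF-8 encoded byte array.
        return raw.encode("utf-8")
    if col_type == "date":
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(raw, fmt)
            except ValueError:
                continue
        raise ValueError(f"unparseable date: {raw!r}")
    raise ValueError(f"unknown type: {col_type}")
```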

Supported file formats and compression

| File format | Supported |
| --- | --- |
| CSV | Yes |
| TEXT (delimited) | Yes |

| Compression | Supported |
| --- | --- |
| gzip | Yes |
| bzip2 | Yes |
| zip | Yes |

Note: skipHeader is not supported for compressed files.
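A minimal sketch of what reading a gzip-compressed CSV payload involves, using only the Python standard library. Because the source is compressed, the reader cannot apply skipHeader, so the header row (if any) arrives as data:

```python
import csv
import gzip
import io

# Simulate a gzip-compressed CSV payload as it would arrive over HTTP.
payload = gzip.compress(b"id,name\n1,alpha\n2,beta\n")

# Decompress as a stream and parse the delimited text.
text = io.TextIOWrapper(gzip.GzipFile(fileobj=io.BytesIO(payload)), encoding="utf-8")
rows = list(csv.reader(text))
# rows[0] is the header row; with a compressed source it is not skipped.
```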

Add a data source

Add the HttpFile data source on the Data Source Management page before creating a synchronization task. For instructions, see Data source management.

Configure a synchronization task

Configure an offline synchronization task

Use the codeless UI or the code editor to configure your task.

For the full script reference and parameter descriptions, see Script reference.

Script reference

Reader script example

The following script reads from a CSV file over HTTP using GET, skips the header row, and maps five columns to different data types.

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "httpfile",
      "parameter": {
        "datasource": "<data-source-name>",
        "fileName": "/data/export.csv",
        "requestMethod": "GET",
        "requestHeaders": {
          "Authorization": "Bearer <token>"
        },
        "socketTimeoutSeconds": 3600,
        "connectTimeoutSeconds": 60,
        "bufferByteSizeInKB": 1024,
        "fileFormat": "csv",
        "encoding": "utf-8",
        "fieldDelimiter": ",",
        "skipHeader": true,
        "compress": "",
        "column": [
          { "index": 0, "type": "long" },
          { "index": 1, "type": "boolean" },
          { "index": 2, "type": "double" },
          { "index": 3, "type": "string" },
          { "index": 4, "type": "date" }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "concurrent": 1
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Replace the placeholders with your actual values:

| Placeholder | Description | Example |
| --- | --- | --- |
| <data-source-name> | The name of the HttpFile data source on the Data Source Management page. | my-http-source |
| <token> | Your API authentication token. | eyJhbGc... |
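For reference, the script above corresponds to an HTTP request like the following. This is a sketch with the standard library only; the base URL is a hypothetical stand-in for the URL configured on the data source, and the request is constructed but not sent:

```python
from urllib.request import Request

BASE_URL = "https://example.com"  # hypothetical data source base URL
file_name = "/data/export.csv"    # fileName from the reader script

# The final request URL combines the data source base URL with fileName,
# and requestHeaders are attached to the request.
req = Request(
    BASE_URL + file_name,
    headers={"Authorization": "Bearer <token>"},
    method="GET",
)
```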

Reader parameters

Parameters are grouped by function. Connection parameters define how to reach the endpoint; read behavior parameters control how the file is parsed.

Connection parameters

| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| datasource | The name of the HttpFile data source. Must match exactly the name on the Data Source Management page. | Yes | None |
| fileName | The file path on the HTTP server. URL-encode any special characters or non-ASCII characters. For example, a space in /file/test abc.csv becomes /file/test%20abc.csv. The final request URL combines the data source base URL with this path. For encoding rules, see HTML URL Encoding Reference. | Yes | None |
| requestMethod | The HTTP method. Valid values: GET, POST, PUT. | No | GET |
| requestParam | Query parameters appended to the URL. Takes effect only when requestMethod is GET. URL-encode any special characters. For example, start=2024-03-25 17:06:54 becomes start=2024-03-25%2017:06:54. | No | None |
| requestBody | The request body. Takes effect only when requestMethod is POST or PUT. Pair with Content-Type in requestHeaders. Example: {"requestBody": "{\"a\":\"b\"}", "requestHeaders": {"Content-Type": "application/json"}} | No | None |
| requestHeaders | HTTP request headers as key-value pairs. Example: {"Content-Type": "application/json"} | No | {"User-Agent": "DataX Http File Reader"} |
| connectTimeoutSeconds | How long to wait when establishing an HTTP connection, in seconds. If exceeded, the task fails. Available in Advanced mode only; not configurable in the codeless UI. | No | 60 |
| socketTimeoutSeconds | How long to wait between consecutive data packets, in seconds. If exceeded, the task fails. Available in Advanced mode only; not configurable in the codeless UI. | No | 3600 |
| bufferByteSizeInKB | Download buffer size, in KB. | No | 1024 |
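The URL-encoding rules for fileName and requestParam can be reproduced with Python's standard library; this sketch matches the two examples given above:

```python
from urllib.parse import quote

# fileName: quote() leaves "/" unescaped by default, so the path
# structure is preserved while the space becomes %20.
file_name = quote("/file/test abc.csv")

# requestParam value: keep ":" literal and encode the space.
start = quote("2024-03-25 17:06:54", safe=":")
```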

Read behavior parameters

| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| fileFormat | Source file format. Valid values: csv, text. Both formats support custom field delimiters. | No | None |
| encoding | File character encoding. | No | utf-8 |
| fieldDelimiter | Field delimiter. For non-printable characters, use the Unicode representation, for example \u001b. | Yes | , |
| useMultiCharDelimiter | Specifies whether the field delimiter is a multi-character string. | No | false |
| lineDelimiter | Line delimiter. Takes effect only when fileFormat is text. | No | None |
| skipHeader | Specifies whether to skip the first row. Set to true for files with a header row. Not supported for compressed files. | No | false |
| compress | Compression format of the source file. Leave blank if the file is uncompressed. Valid values: gzip, bzip2, zip. | No | None (uncompressed) |
| column | List of columns to read. Each entry requires type and either index or value (not both). See Column configuration. | Yes | All columns read as STRING |
| nullFormat | The string in the source file that represents a null value. For example, "nullFormat": "null" treats the string null as null; "nullFormat": "\u0001" treats the non-printable character as null. If not set, source data is written to the destination as-is. | No | None |

Column configuration

Each entry in the column array uses the following fields:

| Field | Description |
| --- | --- |
| type | Data type of the column. Required. Valid values: long, boolean, double, string, date. |
| index | Column position in the source file, starting from 0. Specify either index or value, not both. |
| value | A constant value to populate the column with, instead of reading from the source file. Specify either index or value, not both. |

To read all columns as STRING without specifying individual types:

"column": ["*"]

To map specific columns with types and inject a constant:

"column": [
  { "type": "long", "index": 0 },
  { "type": "string", "value": "alibaba" }
]
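The index/value semantics above can be sketched as a projection over one source row; `project` is illustrative, not part of the connector:

```python
def project(row: list[str], columns: list[dict]):
    """Build an output record: index entries read from the source row,
    value entries inject a constant column."""
    out = []
    for col in columns:
        if "index" in col:
            out.append(row[col["index"]])
        else:
            out.append(col["value"])  # constant column
    return out
```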

What's next