DataWorks:HttpFile data source

Last Updated: May 17, 2024

DataWorks Data Integration supports HttpFile data sources. You can download files over HTTP and synchronize the files to a destination data source.

Limits

HttpFile data sources support only exclusive resource groups for Data Integration.

Data type mappings

| Data type | Description |
| --- | --- |
| STRING | Text. |
| LONG | Integer. |
| BYTES | Byte array. The text that is read is converted to a byte array. The encoding format is UTF-8. |
| BOOL | Boolean. |
| DOUBLE | Decimal. |
| DATE | Date and time. Supported formats: yyyy-MM-dd HH:mm:ss, yyyy-MM-dd, and HH:mm:ss. |
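To check in advance whether a source value matches one of the three DATE formats listed above, a local test such as the following can help. This is a minimal Python sketch, not DataWorks code. The mapping from the listed patterns to Python strptime directives is an assumption, and Python's parser may be more lenient than the reader (for example, with zero padding).

```python
from datetime import datetime

# Python equivalents of the three patterns listed above (assumed mapping).
DATE_FORMATS = {
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "yyyy-MM-dd": "%Y-%m-%d",
    "HH:mm:ss": "%H:%M:%S",
}


def matching_pattern(text: str):
    """Return the first listed pattern that parses the text, or None."""
    for pattern, strptime_format in DATE_FORMATS.items():
        try:
            datetime.strptime(text, strptime_format)
            return pattern
        except ValueError:
            continue
    return None


print(matching_pattern("2024-05-17 08:30:00"))  # yyyy-MM-dd HH:mm:ss
print(matching_pattern("2024-05-17"))           # yyyy-MM-dd
print(matching_pattern("08:30:00"))             # HH:mm:ss
print(matching_pattern("17/05/2024"))           # None
```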

Develop a data synchronization task

The following sections describe where to start and how to configure a data synchronization task. For information about the parameter settings, view the infotip of each parameter on the configuration tab of the task.

Add a data source

Before you configure a data synchronization task to synchronize data from a specific data source, you must add the data source to DataWorks. For more information, see Add and manage data sources.

Configure a batch synchronization task to synchronize data of a single table

Appendix: Code and parameters

Configure a batch synchronization task by using the code editor

If you use the code editor to configure a batch synchronization task, you must configure parameters for the reader of the related data source based on the format requirements in the code editor. For more information about the format requirements, see Configure a batch synchronization task by using the code editor. The following information describes the configuration details of parameters for the reader in the code editor.

Code for HttpFile Reader

The following code configures a synchronization task that reads data from an HttpFile data source:

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "httpfile",
      "parameter": {
        "datasource": "",
        "fileName": "/f/z/1.csv",
        "requestMethod": "GET",
        "requestBody": "",
        "requestHeaders": {
          "header1": "v1",
          "header2": "v2"
        },
        "socketTimeoutSeconds": 3600,
        "connectTimeoutSeconds": 60,
        "bufferByteSizeInKB": 1024,
        "fileFormat": "csv",
        "encoding": "utf8/gbk/...",
        "fieldDelimiter": ",",
        "useMultiCharDelimiter": true,
        "lineDelimiter": "\n",
        "skipHeader": true,
        "compress": "zip/gzip",
        "column": [
          {
            "index": 0,
            "type": "long"
          },
          {
            "index": 1,
            "type": "boolean"
          },
          {
            "index": 2,
            "type": "double"
          },
          {
            "index": 3,
            "type": "string"
          },
          {
            "index": 4,
            "type": "date"
          }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "concurrent": 1
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Parameters in code for HttpFile Reader

Parameter

Description

Required

Default value

datasource

The name of the data source. It must be the same as the name of the added data source.

Yes

No default value

fileName

The file path. If the file path contains special characters, you must URL-encode them before you enter the value.

For example, you must encode a space as %20.

Original file path: /file/test abc.csv

Value of this parameter: /file/test%20abc.csv

Note
  • A file path can contain multiple special characters, such as spaces, number signs (#), and percent signs (%). The escape sequence varies based on the special character. You can view the supported special characters in the DataWorks console. For information about the escape methods, see HTML Uniform Resource Locators.

  • The final access path consists of the URL domain name of the data source and the file path.
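The escaping rule above can be checked locally before you enter the value in the task configuration. The following is a minimal Python sketch, not part of DataWorks. It uses the standard-library urllib.parse.quote function to percent-encode a file path while keeping the slashes, and then joins the result to a data source URL. The helper name escape_file_path and the domain https://example.com are assumptions made for illustration.

```python
from urllib.parse import quote


def escape_file_path(path: str) -> str:
    """Percent-encode special characters in a file path, keeping '/' as is."""
    return quote(path, safe="/")


# Hypothetical URL domain name of the data source (assumption, not a real endpoint).
BASE_URL = "https://example.com"

file_name = escape_file_path("/file/test abc.csv")
print(file_name)             # /file/test%20abc.csv
print(BASE_URL + file_name)  # https://example.com/file/test%20abc.csv

# Number signs and percent signs are encoded as well.
print(escape_file_path("/file/a#b%c.csv"))  # /file/a%23b%25c.csv
```

Encode the original path exactly once. Running an already escaped path through the function again turns %20 into %2520.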

Yes

No default value

bufferByteSizeInKB

The buffer size of the downloaded file. Unit: KB.

No

1024

requestMethod

The request method. Valid values: GET, POST, and PUT.

No

GET

requestParam

The request parameters. This parameter takes effect only when the requestMethod parameter is set to GET. If a parameter value contains special characters, you must URL-encode the value. Example:

Original value of the start parameter: 2024-03-25 17:06:54

Value of this parameter: start=2024-03-25%2017:06:54

Note

The start parameter specifies the start time of an operation when a GET request is initiated.
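The encoded value in the example above can be reproduced with a few lines of Python. This is a sketch under assumptions rather than DataWorks code: build_request_param is a hypothetical helper, and the colon is kept unescaped only to match the example in this topic.

```python
from urllib.parse import quote


def build_request_param(name: str, value: str) -> str:
    """Return 'name=value' with the value percent-encoded (':' is left as is)."""
    return f"{name}={quote(value, safe=':')}"


print(build_request_param("start", "2024-03-25 17:06:54"))
# start=2024-03-25%2017:06:54
```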

No

No default value

requestBody

The content of the request. This parameter takes effect only when the requestMethod parameter is set to POST or PUT. This parameter must be used with the Content-Type parameter in requestHeaders. Example:

{
 "requestBody":"{\"a\":\"b\"}",
 "requestHeaders": {
 "Content-Type": "application/json"
 }
}
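Because requestBody is a string that itself contains JSON, the inner quotation marks must be escaped, which is easy to get wrong by hand. The following Python sketch generates the fragment programmatically. The payload is a hypothetical example, and only the parameter names come from this topic.

```python
import json

# Hypothetical payload; replace it with the body that the HTTP server expects.
payload = {"a": "b"}

reader_fragment = {
    "requestMethod": "POST",
    # requestBody is a string, so the payload is serialized first.
    "requestBody": json.dumps(payload, separators=(",", ":")),
    "requestHeaders": {"Content-Type": "application/json"},
}

# Serializing the fragment escapes the inner quotation marks automatically.
print(json.dumps(reader_fragment, indent=2))
```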

No

No default value

requestHeaders

The request headers, which are specified as key-value pairs. Example:

{
 "Content-Type": "application/json"
}

No

{
 "User-Agent": "DataX Http File Reader"
}

fileFormat

The type of the source file. Valid values: csv and text. You can specify delimiters for the two types of files.

No

No default value

column

The names of the columns from which you want to read data.

  • The type parameter specifies the source data type.

  • The index parameter specifies the ID of the column in the source file, starting from 0.

  • The value parameter specifies the column value if the column is a constant column. The reader does not read a constant column from the source. Instead, the reader generates data in a constant column based on the column value that you specify.

By default, the reader reads all data as strings based on the following configuration:

"column": ["*"]

You can also configure the column parameter in the following way:

"column": [
    {
       "type": "long",
       "index": 0    // Read the first column of the source file as a LONG value.
    },
    {
       "type": "string",
       "value": "alibaba"  // A constant column whose value is always alibaba.
    }
]
Note

For each column, you must configure the type parameter and either the index parameter or the value parameter. You cannot configure both the index and value parameters for the same column.
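The rule in the preceding note can be expressed as a small check that you run against a column list before you submit the task. This is a minimal Python sketch. validate_column is a hypothetical helper and does not reflect how the reader itself validates the configuration.

```python
def validate_column(entry) -> None:
    """Raise ValueError if a column entry breaks the rule described above."""
    if entry == "*":
        return  # read every column as a string
    if "type" not in entry:
        raise ValueError("type is required")
    has_index = "index" in entry
    has_value = "value" in entry
    if has_index == has_value:
        raise ValueError("configure exactly one of index and value")


columns = [
    {"type": "long", "index": 0},            # read from the first source column
    {"type": "string", "value": "alibaba"},  # constant column
]
for column in columns:
    validate_column(column)
print("column configuration is valid")
```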

Yes

"column": ["*"]

fieldDelimiter

The column delimiter that is used in the file from which you want to read data.

Note

HttpFile Reader requires a column delimiter. If you do not specify one, the default delimiter, a comma (,), is used.

If the delimiter is a non-printable character, enter its Unicode escape sequence, such as \u001b or \u007c.
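To see what a Unicode-escaped delimiter looks like in the JSON configuration, you can serialize it with a script. The following Python sketch uses ESC (U+001B) as a hypothetical delimiter and splits a sample line on it. It illustrates the escape notation only, not the reader's parsing logic.

```python
import json

# ESC (U+001B) used as a hypothetical column delimiter.
delimiter = "\u001b"

# json.dumps writes control characters as \uXXXX escapes.
print(json.dumps({"fieldDelimiter": delimiter}))  # {"fieldDelimiter": "\u001b"}

# Splitting a sample line on the delimiter.
line = "1" + delimiter + "true" + delimiter + "alibaba"
print(line.split(delimiter))  # ['1', 'true', 'alibaba']
```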

Yes

,

lineDelimiter

The row delimiter that is used in the file from which you want to read data.

Note

This parameter takes effect only when the fileFormat parameter is set to text.

No

No default value

compress

The format in which files are compressed. By default, this parameter is left empty, which indicates that files are not compressed. The following compression formats are supported: GZIP, BZIP2, and ZIP.

No

No default value

encoding

The encoding format of the file from which you want to read data.

No

utf-8

nullFormat

The string that represents a null value. Text files have no standard string that represents a null value. You can use this parameter to define one. Examples:

  • If you specify nullFormat:"null", the reader considers the printable string null as a null value.

  • If you specify nullFormat:"\u0001", the reader considers the non-printable string \u0001 as a null value.

  • If you do not configure the nullFormat parameter, the reader does not convert source data.
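The three cases above can be pictured with a short Python sketch. apply_null_format is a hypothetical function that mimics the described behavior on one row of already split fields. It is an illustration, not the reader's implementation.

```python
def apply_null_format(fields, null_format=None):
    """Replace fields equal to null_format with None; no-op if it is unset."""
    if null_format is None:
        return list(fields)
    return [None if field == null_format else field for field in fields]


row = ["1", "null", "\u0001"]
print(apply_null_format(row, "null"))    # ['1', None, '\x01']
print(apply_null_format(row, "\u0001"))  # ['1', 'null', None]
print(apply_null_format(row))            # ['1', 'null', '\x01']
```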

No

No default value

skipHeader

Specifies whether to skip the headers in a CSV-like file if the file has headers. Valid values:

  • true: indicates that the headers are skipped.

  • false: indicates that the headers are not skipped.

The skipHeader parameter is unavailable for compressed files. Common file compression formats are GZIP, BZIP2, and ZIP.
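The effect of skipHeader on an uncompressed file can be previewed locally with the Python standard-library csv module. read_rows and the sample data are assumptions made for illustration. The sketch shows only which rows remain, not how the reader processes files.

```python
import csv
import io


def read_rows(text: str, field_delimiter: str = ",", skip_header: bool = False):
    """Parse delimited text and optionally drop the first (header) row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=field_delimiter))
    return rows[1:] if skip_header else rows


sample = "id,name\n1,alice\n2,bob\n"
print(read_rows(sample, skip_header=True))   # [['1', 'alice'], ['2', 'bob']]
print(read_rows(sample, skip_header=False))  # [['id', 'name'], ['1', 'alice'], ['2', 'bob']]
```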

No

false

connectTimeoutSeconds

(advanced parameter, available only in the code editor)

The timeout period for HTTP requests. Unit: seconds. If the specified timeout period is exceeded, the task fails.

No

60

socketTimeoutSeconds

(advanced parameter, available only in the code editor)

The timeout period for HTTP responses. Unit: seconds. If the interval between two packets is greater than the specified timeout period, the task fails.

No

3600
