
DataWorks:COS

Last Updated:Mar 26, 2026

COS (Tencent Cloud Object Storage) is a read-only data source in DataWorks. Connect it to read files from a COS bucket and sync the data to any supported destination.

Supported capabilities

Capability Supported
Offline sync — read (source) Yes
Offline sync — write (destination) No
Real-time sync No

Prerequisites

Before you begin, ensure that you have:

  • A Tencent Cloud account with a COS bucket

  • A SecretId and SecretKey with read access to the bucket — get them from API Key Management in the Tencent Cloud console

  • The region ID and endpoint of the bucket — see Regions and Endpoints

  • A DataWorks workspace

Supported data types

Data type Description
STRING Text
LONG Integer
DOUBLE Floating-point number
BOOL Boolean
DATE Date and time. Supported formats: yyyy-MM-dd HH:mm:ss and yyyy-MM-dd
BYTES Byte array. Text content is converted to a UTF-8 encoded byte array.
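In a sync script, each of these types maps to a lowercase type value in the column configuration (note that BOOL is written as boolean, as in the script demo later in this topic). The following column mapping is a sketch with illustrative indexes, not taken from a real job:

```json
"column": [
  { "index": 0, "type": "string" },
  { "index": 1, "type": "long" },
  { "index": 2, "type": "double" },
  { "index": 3, "type": "boolean" },
  { "index": 4, "type": "date" }
]
```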

Create a data source

Create a COS data source before configuring any sync task. For the full procedure, see Data Source Management.

The key parameters are described below.

Parameter Description
Data Source Name A unique name within the workspace. Use letters, digits, and underscores (_). The name cannot start with a digit or an underscore.
Region The region where the bucket is located. Enter the region ID. See Regions and Endpoints.
Bucket The name of the COS bucket.
Endpoint The endpoint of COS. See Regions and Endpoints.
AccessKey ID Corresponds to SecretId on Tencent Cloud. Get it from API Key Management.
AccessKey Secret Corresponds to SecretKey on Tencent Cloud. Get it from API Key Management.

Configure an offline sync task for a single table

To configure a COS offline sync task, use the codeless UI or the code editor.

Appendix: Script demo and parameters

Reader script demo

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "cos",
      "parameter": {
        "datasource": "",
        "object": ["f/z/1.csv"],
        "fileFormat": "csv",
        "encoding": "utf8/gbk/...",
        "fieldDelimiter": ",",
        "useMultiCharDelimiter": true,
        "lineDelimiter": "\n",
        "skipHeader": true,
        "compress": "zip/gzip",
        "column": [
          {
            "index": 0,
            "type": "long"
          },
          {
            "index": 1,
            "type": "boolean"
          },
          {
            "index": 2,
            "type": "double"
          },
          {
            "index": 3,
            "type": "string"
          },
          {
            "index": 4,
            "type": "date"
          }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "concurrent": 1
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Reader script parameters

Connection

Parameter Description Required Default
datasource The data source name. Must match the name of the data source you added. Yes None

Locate files

Parameter Description Required Default
object The file path. Supports the asterisk (*) wildcard and can be configured as an array. For example, to sync a/b/1.csv and a/b/2.csv, set this to a/b/*.csv. Yes None
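For example, assuming a bucket that stores files under a prefix a/b/ (the paths below are placeholders), both of the following forms are valid: a wildcard that matches every CSV file under the prefix, or an explicit array of object keys.

```json
"object": ["a/b/*.csv"]
```

```json
"object": ["a/b/1.csv", "a/b/2.csv"]
```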

Parse files

Parameter Description Required Default
fileFormat The format of the source file. Valid values: csv, text, parquet, orc. Yes None
column The fields to read. Each entry specifies:
- type: the data type in the source
- index: the column position (0-based)
- value: a constant value; no data is read from the source, and a column filled with this value is generated instead

To read all columns as STRING, use "column": ["*"].

To specify individual columns:

"column": [
  { "type": "long", "index": 0 },
  { "type": "string", "value": "alibaba" }
]

Important: Each entry must include type and either index or value.

Yes All columns as STRING
fieldDelimiter The field separator. For invisible characters, use Unicode encoding (for example, \u001b or \u007c). Yes ,
lineDelimiter The row separator. Valid only when fileFormat is text. No None
encoding The file encoding. No utf-8
compress The compression format. Supported values: gzip, bzip2, zip. Leave blank if the file is not compressed. No Uncompressed
skipHeader For a CSV file, specifies whether to skip the header row.
- true: The header row is skipped and not synchronized as data.
- false: The header row is read as a data record.

Note: Not supported for compressed files.

No false
nullFormat The string to treat as a null value. For example, if set to "null", any field containing the text null is written to the destination as null. If not set, source data is written as-is without conversion. No None
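Putting the parsing options together, a reader parameter block for an uncompressed CSV file with a header row might look like the following. The data source name and object path are placeholders; skipHeader is usable here because the file is not compressed.

```json
"parameter": {
  "datasource": "my_cos_source",
  "object": ["logs/2026-03/*.csv"],
  "fileFormat": "csv",
  "fieldDelimiter": ",",
  "encoding": "utf-8",
  "skipHeader": true,
  "nullFormat": "null",
  "column": ["*"]
}
```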

Advanced options

Parameter Description Required Default
parquetSchema Required when fileFormat is parquet. Defines the data structure using the following format:

message MessageTypeName {
  required|optional DataType ColumnName;
  ...
}

Supported data types: BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BINARY (use for string types), FIXED_LEN_BYTE_ARRAY. Set all fields to optional unless null values are not allowed. Each field definition must end with a semicolon (;), including the last one.

Example:
{"parquetSchema": "message UserProfile { optional int32 minute_id; optional int32 dsp_id; optional int32 adx_pid; optional int64 req; optional int64 res; optional int64 suc; optional int64 imp; optional double revenue; }"}

No (required for Parquet) None
csvReaderConfig Additional parameters for reading CSV files, passed as a map. Uses defaults if not specified. No None
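The exact keys accepted in csvReaderConfig depend on the underlying CSV reader; the keys below are an illustrative assumption, not a guaranteed list. A map of this shape could, for example, relax field-length safety checks and keep empty records:

```json
"csvReaderConfig": {
  "safetySwitch": false,
  "skipEmptyRecords": false,
  "useTextQualifier": false
}
```

Unspecified keys fall back to the reader's defaults.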