
DataWorks:TOS

Last Updated: Mar 26, 2026

The TOS data source connector reads files from Tinder Object Storage (TOS), parses them, and syncs the data to a destination in DataWorks.

Role: Source only. TOS cannot be used as a sync destination.

Task type: Offline sync, single-table mode.

Supported capabilities

Capability Supported
Source (read) Yes
Destination (write) No
Offline sync Yes
Real-time sync No
Single-table mode Yes
Multi-table mode No

Supported file formats

Format Notes
csv Supports header skipping, custom delimiters, and null value mapping
text Supports custom row delimiters
parquet Requires parquetSchema
orc No additional configuration required

Supported compression formats

Format Supported
gzip Yes
bzip2 Yes
zip Yes
Note: Compression cannot be combined with skipHeader. If skipHeader is configured, the source files must be uncompressed.

Supported field types

Field type Description
STRING Text
LONG Integer
BYTES Byte array. Read text is converted to a byte array using UTF-8 encoding.
BOOL Boolean
DOUBLE Floating-point number
DATE Date and time. Supported formats: yyyy-MM-dd HH:mm:ss, yyyy-MM-dd, HH:mm:ss

Add a TOS data source

Add TOS as a data source in DataWorks before creating a sync task. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.

Configure a data sync task

Configure TOS as the Reader in an offline sync task. Two configuration methods are available: the codeless UI in the DataWorks console, and the code editor (script mode). The following sections describe the script-mode configuration.

Script sample and parameter descriptions

Script sample

The following script configures TOS as the Reader in a batch synchronization task. All parameters are set in the parameter block under the tos step.

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "tos",
      "parameter": {
        "datasource": "",
        "object": ["f/z/1.csv"],
        "fileFormat": "csv",
        "encoding": "utf8/gbk/...",
        "fieldDelimiter": ",",
        "useMultiCharDelimiter": true,
        "skipHeader": true,
        "compress": "zip/gzip",
        "column": [
          {
            "index": 0,
            "type": "long"
          },
          {
            "index": 1,
            "type": "boolean"
          },
          {
            "index": 2,
            "type": "double"
          },
          {
            "index": 3,
            "type": "string"
          },
          {
            "index": 4,
            "type": "date"
          }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "concurrent": 1
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Common parameters

Parameter Type Required Default Description
datasource String Yes None Data source name. Must match the name added in the DataWorks console.
fileFormat String Yes None File format. Valid values: csv, text, parquet, orc.
object String or array Yes None Path to the file or files to read. Supports the * wildcard and arrays. To read a/b/1.csv and a/b/2.csv, set this to a/b/*.csv.
column Array Yes All columns as STRING Columns to read. See Column configuration.
fieldDelimiter String Yes , Field delimiter. For non-printable characters, use Unicode encoding, for example \u001b or \u007c.
lineDelimiter String No None Row delimiter. Valid only when fileFormat is text.
compress String No None (uncompressed) Compression format. Valid values: gzip, bzip2, zip.
encoding String No utf-8 File encoding.
nullFormat String No None String in the source file that represents a null value. Set "nullFormat": "null" to treat the literal string "null" as null, or "nullFormat": "\u0001" to treat the non-printable character \u0001 as null. If not set, no conversion is applied and the source value is written as-is.
skipHeader Boolean No false For CSV files, whether to skip the header row. true: the header row is skipped and not synced as data. false: the header row is read as a data row. Not supported for compressed files.
csvReaderConfig Map No None Advanced configuration for reading CSV files. Default values are used if not set.
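
As an illustration, several of the common parameters above can be combined in one parameter block. The data source name, object path, and nullFormat value below are placeholders, not values from your environment:

```json
{
  "datasource": "my_tos_source",
  "fileFormat": "csv",
  "object": ["a/b/*.csv"],
  "fieldDelimiter": ",",
  "encoding": "utf-8",
  "compress": "gzip",
  "nullFormat": "\\N",
  "column": ["*"]
}
```

Because skipHeader is not supported for compressed files, it is omitted here; remove compress before enabling skipHeader if the source files contain a header row.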

Column configuration

The column parameter controls which columns to read and how to map their types.

By default, all columns are read as STRING:

"column": ["*"]

To specify column types explicitly, provide an array of column definitions. Each definition requires type and either index or value:

  • index: Zero-based column position in the source file.

  • value: A constant. Creates a column with a fixed value instead of reading from the file.

Example:

"column": [
  { "type": "long", "index": 0 },
  { "type": "string", "value": "alibaba" }
]

The second entry produces a column with the constant value "alibaba" for every row, regardless of the source data.

Use explicit column definitions when you need to:

  • Read only a subset of columns.

  • Enforce specific data types instead of relying on the default STRING type.

  • Add constant-value columns to the output.

Format-specific configuration

CSV

CSV files use fieldDelimiter and skipHeader for parsing control. Use csvReaderConfig for advanced options such as quote characters and multi-line records.

For files with non-standard delimiters, specify the delimiter using Unicode encoding. For example, use \u007c for the pipe character (|).
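
For example, a sketch of a CSV configuration that reads pipe-delimited files using the Unicode escape for the delimiter (the object path is illustrative):

```json
{
  "fileFormat": "csv",
  "object": ["logs/2024/*.csv"],
  "fieldDelimiter": "\u007c",
  "skipHeader": true,
  "column": ["*"]
}
```

Note that a JSON parser resolves \u007c to the literal | character, so both spellings configure the same delimiter.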

Parquet

Use the parquetSchema parameter when fileFormat is parquet. This parameter is ignored for other formats.

parquetSchema defines the schema of the Parquet file:

message MessageTypeName {
  Rule DataType FieldName;
  ...;
}
  • MessageTypeName: Name of the message type.

  • Rule: Use required for non-null fields, optional for nullable fields. Set all fields to optional unless you have a specific reason not to.

  • DataType: Valid values are boolean, int32, int64, int96, float, double, binary, and fixed_len_byte_array. Use binary for string fields.

  • Each field definition must end with a semicolon (;), including the last field.

Example:

"parquetSchema": "message m { optional int32 minute_id; optional int32 dsp_id; optional int32 adx_pid; optional int64 req; optional int64 res; optional int64 suc; optional int64 imp; optional double revenue; }"

Make sure the full configuration remains valid JSON after adding parquetSchema.
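
Putting this together, a hypothetical Parquet reader parameter block might look like the following (the data source name, object path, and field names are placeholders):

```json
{
  "datasource": "my_tos_source",
  "fileFormat": "parquet",
  "object": ["warehouse/ads/part-*.parquet"],
  "parquetSchema": "message m { optional int32 minute_id; optional int64 req; optional double revenue; }",
  "column": [
    { "index": 0, "type": "long" },
    { "index": 1, "type": "long" },
    { "index": 2, "type": "double" }
  ]
}
```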

Text

Set lineDelimiter to define the row separator when reading plain text files.
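
A minimal text-format sketch, assuming tab-separated fields and newline-terminated rows (the object path is illustrative):

```json
{
  "fileFormat": "text",
  "object": ["raw/events.txt"],
  "fieldDelimiter": "\t",
  "lineDelimiter": "\n",
  "column": [
    { "index": 0, "type": "string" },
    { "index": 1, "type": "long" }
  ]
}
```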

ORC

No format-specific configuration is required for ORC files. Use the column parameter to select and type-map columns as needed.
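
For ORC, a sketch that reads only the first and third columns and maps their types (the object path is a placeholder):

```json
{
  "fileFormat": "orc",
  "object": ["orc/table1/*"],
  "column": [
    { "index": 0, "type": "string" },
    { "index": 2, "type": "double" }
  ]
}
```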

What's next