
DataWorks:TOS data source

Last Updated: Dec 05, 2025

A TOS data source lets you read files from Tinder Object Storage (TOS). You can use this data source to retrieve files stored in TOS, parse them, and sync the data to any destination data source. This topic describes the data synchronization capabilities of TOS in DataWorks.

Supported data types

TOS data sources in DataWorks support the following field types.

| Data type | Description |
| --- | --- |
| STRING | Text. |
| LONG | Integer. |
| BYTES | Byte array. The read text is converted into a byte array with UTF-8 encoding. |
| BOOL | Boolean. |
| DOUBLE | Floating-point. |
| DATE | Date and time. Supported formats: yyyy-MM-dd HH:mm:ss, yyyy-MM-dd, and HH:mm:ss. |
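The DATE parsing and BYTES conversion described above can be sketched in Python. The helper names are illustrative, not part of DataWorks, and the Java-style patterns from the table are translated to the equivalent strptime patterns:

```python
from datetime import datetime

# The three DATE formats from the table, translated from Java
# SimpleDateFormat patterns to Python strptime patterns.
DATE_FORMATS = [
    "%Y-%m-%d %H:%M:%S",  # yyyy-MM-dd HH:mm:ss
    "%Y-%m-%d",           # yyyy-MM-dd
    "%H:%M:%S",           # HH:mm:ss
]

def parse_date(text: str) -> datetime:
    """Try each supported DATE format in order; raise if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported DATE value: {text!r}")

def to_bytes(text: str) -> bytes:
    """BYTES columns: the read text is converted to a UTF-8 byte array."""
    return text.encode("utf-8")
```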

Create a TOS data source

Before you develop a synchronization task in DataWorks, add the required data source by following the instructions in Data Source Management. When you add the data source, you can view the infotips of parameters in the DataWorks console to understand their meanings.

Develop a data sync task

You can use a TOS data source only as a source in an offline sync task for a single table. The following section describes how to configure the data sync task.

Appendix: Script sample and parameter descriptions

Configure a batch synchronization task by using the code editor

If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.

Reader script sample

{
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "tos",
      "parameter": {
        "datasource": "",
        "object": ["f/z/1.csv"],
        "fileFormat": "csv",
        "encoding": "utf8/gbk/...",
        "fieldDelimiter": ",",
        "useMultiCharDelimiter": true,
        "skipHeader": true,
        "compress": "zip/gzip",
        "column": [
          {
            "index": 0,
            "type": "long"
          },
          {
            "index": 1,
            "type": "boolean"
          },
          {
            "index": 2,
            "type": "double"
          },
          {
            "index": 3,
            "type": "string"
          },
          {
            "index": 4,
            "type": "date"
          }
        ]
      },
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "concurrent": 1
    }
  },
  "order": {
    "hops": [
      {
        "from": "Reader",
        "to": "Writer"
      }
    ]
  }
}

Reader script parameters

The following parameters are supported. For each parameter, whether it is required and its default value are noted.

datasource

The name of the data source. It must match the name of the data source that you added in the code editor.

Required: Yes. Default value: None.

fileFormat

The format of the source file. Supported formats: csv, text, parquet, and orc.

Required: Yes. Default value: None.

object

The file path. This parameter supports the asterisk (*) wildcard character and arrays.

For example, to sync the a/b/1.csv and a/b/2.csv files, you can set this parameter to a/b/*.csv.

Required: Yes. Default value: None.
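As a rough illustration of the asterisk semantics, Python's fnmatch applies the same kind of pattern to object keys. The key list here is made up for demonstration; the actual matching is performed by the TOS Reader:

```python
from fnmatch import fnmatch

# Hypothetical object keys; the pattern mirrors the a/b/*.csv example above.
pattern = "a/b/*.csv"
keys = ["a/b/1.csv", "a/b/2.csv", "a/c/3.csv"]

# Keep only the keys that the wildcard pattern matches.
matched = [k for k in keys if fnmatch(k, pattern)]
```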

column

The columns to read. The type parameter specifies the source data type. The index parameter specifies the column number in the text file, starting from 0. The value parameter specifies a constant: instead of reading data from the source file, the column is filled with this constant value.

  • By default, all data is read as the STRING type with the following configuration:

    "column": ["*"]
  • You can also specify the column information explicitly:

    "column": [
        {
            "type": "long",
            "index": 0 // Reads an integer field from the first column of the TOS text file.
        },
        {
            "type": "string",
            "value": "alibaba" // Uses the constant "alibaba" as the value of this field.
        }
    ]
Note

For each column that you specify, you must set the type parameter and either the index or value parameter.

Required: Yes. Default value: All columns are read as the STRING type.
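A minimal sketch of how such a column configuration could be applied to one parsed row. The function name is hypothetical and handles only the two types used in the example above:

```python
# Mirrors the example configuration: one indexed long column and
# one constant string column.
columns = [
    {"type": "long", "index": 0},
    {"type": "string", "value": "alibaba"},
]

def project_row(row, columns):
    """Apply a column configuration to one row of parsed source fields."""
    out = []
    for col in columns:
        if "index" in col:
            cell = row[col["index"]]
            # Convert text to an integer for long columns.
            out.append(int(cell) if col["type"] == "long" else cell)
        else:
            # Constant column: emit the configured value instead of source data.
            out.append(col["value"])
    return out
```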

fieldDelimiter

The field separator.

Note
  • A field separator is required for TOS Reader. If you do not specify one, the comma (,) is used. The comma is also the default value on the configuration page.

  • If the separator is a non-printable character, enter its Unicode escape, such as \u001b or \u007c.

Required: Yes. Default value: comma (,).
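The Unicode escape in the job config denotes the same code point you would use when splitting a line in code. A small sketch with the \u001b separator mentioned above (the sample line is made up):

```python
# \u001b in Python source is the same non-printable code point (ESC)
# that the Unicode escape in the job configuration refers to.
line = "a\u001bb\u001bc"
fields = line.split("\u001b")
```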

lineDelimiter

The row delimiter.

Note

This parameter takes effect only when fileFormat is set to text.

Required: No. Default value: None.

compress

The compression format of the text file. By default, this parameter is left empty, which means no compression. Supported formats: gzip, bzip2, and zip.

Required: No. Default value: None.

encoding

The encoding format of the file.

Required: No. Default value: utf-8.

nullFormat

The string that represents a null value in the text file. Text files have no standard way to express null, so you can use nullFormat to define which strings are treated as null. For example:

  • If you set nullFormat:"null", a source value of "null" (a visible string) is treated as a null field.

  • If you set nullFormat:"\u0001", a source value of "\u0001" (a non-printable character) is treated as a null field.

  • If you do not specify nullFormat, no conversion is performed and the source data is written to the destination as is.

Required: No. Default value: None.
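The nullFormat conversion can be sketched as follows. The helper name is illustrative; it simply replaces any cell that equals the configured string with a null value:

```python
def apply_null_format(row, null_format):
    """Replace cells that equal the configured nullFormat string with None.

    If null_format is None (parameter not set), no conversion is performed
    and the row is returned as is.
    """
    if null_format is None:
        return row
    return [None if cell == null_format else cell for cell in row]
```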

skipHeader

For CSV files, specifies whether to skip the header row.

  • true: The header row is skipped and is not synchronized as data.

  • false: The header row is read and synchronized as data.

Note

The skipHeader parameter is not supported for compressed files.

Required: No. Default value: false.
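The effect of skipHeader on a small CSV source can be sketched like this (the function and sample data are illustrative only):

```python
import csv
import io

def read_csv(text, skip_header):
    """Parse CSV text; drop the first row when skip_header is true."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[1:] if skip_header else rows
```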

parquetSchema

The schema of the Parquet files to read. This parameter takes effect only when fileFormat is set to parquet. Make sure that the entire configuration is still valid JSON after you specify parquetSchema.

  • The format of parquetSchema is as follows:

    message MessageTypeName {
    required/optional dataType fieldName;
    ...;
    }

    • MessageTypeName: the name of the message type.

    • required/optional: use required for non-null fields and optional for nullable fields. We recommend that you set all fields to optional.

    • dataType: Parquet files support the BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BINARY, and FIXED_LEN_BYTE_ARRAY types. For string fields, use BINARY.

    • Each column definition must end with a semicolon (;), including the last one.

  • The following is a configuration example:

    "parquetSchema": "message m { optional int32 minute_id; optional int32 dsp_id; optional int32 adx_pid; optional int64 req; optional int64 res; optional int64 suc; optional int64 imp; optional double revenue; }"

Required: No. Default value: None.
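Because the schema is a single string embedded in JSON, it can help to assemble it programmatically. A hypothetical helper that builds the string from (rule, dataType, fieldName) triples and keeps the required trailing semicolon on every column:

```python
def build_parquet_schema(name, fields):
    """Assemble a parquetSchema string from (rule, dataType, fieldName) triples.

    Every column definition ends with a semicolon, including the last one,
    as required by the parquetSchema format.
    """
    body = " ".join(f"{rule} {dtype} {col};" for rule, dtype, col in fields)
    return f"message {name} {{ {body} }}"
```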

csvReaderConfig

The configuration for reading CSV files. The value is of the Map type. CSV files are read by a csvReader; if you do not configure this parameter, default values are used.

Required: No. Default value: None.