The TOS data source connector reads files from Tinder Object Storage (TOS), parses them, and syncs the data to a destination in DataWorks.
Role: Source only. TOS cannot be used as a sync destination.
Task type: Offline sync, single-table mode.
Supported capabilities
| Capability | Supported |
|---|---|
| Source (read) | Yes |
| Destination (write) | No |
| Offline sync | Yes |
| Real-time sync | No |
| Single-table mode | Yes |
| Multi-table mode | No |
Supported file formats
| Format | Notes |
|---|---|
| csv | Supports header skipping, custom delimiters, and null value mapping |
| text | Supports custom row delimiters |
| parquet | Requires parquetSchema |
| orc | No additional configuration required |
Supported compression formats
| Format | Supported |
|---|---|
| gzip | Yes |
| bzip2 | Yes |
| zip | Yes |
Compression cannot be combined with header skipping: if skipHeader is configured, the files must be uncompressed.
Supported field types
| Field type | Description |
|---|---|
| STRING | Text |
| LONG | Integer |
| BYTES | Byte array. Text read from the source is converted to a byte array using UTF-8 encoding. |
| BOOL | Boolean |
| DOUBLE | Floating-point number |
| DATE | Date and time. Supported formats: yyyy-MM-dd HH:mm:ss, yyyy-MM-dd, HH:mm:ss |
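As a sketch of how the three DATE layouts behave, the following Python snippet tries each layout in turn. The patterns are the Python strptime equivalents of the Java-style pattern letters used in the table; the helper name is illustrative, not part of the connector.

```python
from datetime import datetime

# Python strptime equivalents of the three supported DATE layouts above.
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%H:%M:%S"]

def parse_date(text):
    """Try each supported layout in order; raise if none matches."""
    for pattern in PATTERNS:
        try:
            return datetime.strptime(text, pattern)
        except ValueError:
            continue
    raise ValueError(f"unsupported DATE format: {text!r}")

print(parse_date("2024-05-01 12:30:00"))  # → 2024-05-01 12:30:00
```

A value that matches only the time layout (for example "09:15:00") still parses; strptime fills in the default date 1900-01-01.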
Add a TOS data source
Add TOS as a data source in DataWorks before creating a sync task. Follow the instructions in Data source management. Parameter descriptions are available in the DataWorks console when you add the data source.
Configure a data sync task
Configure TOS as the Reader in an offline sync task. Two configuration methods are available:
- Codeless UI: See Configure a task in the codeless UI.
- Code editor: See Configure a task in the code editor. For the full parameter reference and a script sample, see Script sample and parameter descriptions.
Script sample and parameter descriptions
Script sample
The following script configures TOS as the Reader in a batch synchronization task. All parameters are set in the parameter block under the tos step.
```json
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "tos",
            "parameter": {
                "datasource": "",
                "object": ["f/z/1.csv"],
                "fileFormat": "csv",
                "encoding": "utf8/gbk/...",
                "fieldDelimiter": ",",
                "useMultiCharDelimiter": true,
                "skipHeader": true,
                "compress": "zip/gzip",
                "column": [
                    { "index": 0, "type": "long" },
                    { "index": 1, "type": "boolean" },
                    { "index": 2, "type": "double" },
                    { "index": 3, "type": "string" },
                    { "index": 4, "type": "date" }
                ]
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "concurrent": 1
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}
```
Common parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| datasource | String | Yes | — | Data source name. Must match the name added in the DataWorks console. |
| fileFormat | String | Yes | — | File format. Valid values: csv, text, parquet, orc. |
| object | String or array | Yes | — | Path to the file or files to read. Supports the * wildcard and arrays. To read a/b/1.csv and a/b/2.csv, set this to a/b/*.csv. |
| column | Array | Yes | All columns as STRING | Columns to read. See Column configuration. |
| fieldDelimiter | String | Yes | , | Field delimiter. For non-printable characters, use Unicode encoding, for example \u001b or \u007c. |
| lineDelimiter | String | No | — | Row delimiter. Valid only when fileFormat is text. |
| compress | String | No | None (uncompressed) | Compression format. Valid values: gzip, bzip2, zip. |
| encoding | String | No | utf-8 | File encoding. |
| nullFormat | String | No | None | String in the source file that represents a null value. Set "nullFormat": "null" to treat the literal string "null" as null, or "nullFormat": "\u0001" to treat the non-printable character \u0001 as null. If not set, no conversion is applied and the source value is passed through as-is. |
| skipHeader | Boolean | No | false | For CSV files, whether to skip the header row. true: the header row is skipped during sync. false: the header row is read as data. Not supported for compressed files. |
| csvReaderConfig | Map | No | — | Advanced configuration for reading CSV files. Default values are used if not set. |
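Putting the common parameters together, the following fragment sketches a reader parameter block that reads every CSV file under a/b/, skips the header row, and treats the literal string \N as null. The data source name and paths are placeholders, not values from the original document.

```json
"parameter": {
    "datasource": "my_tos_source",
    "fileFormat": "csv",
    "object": ["a/b/*.csv"],
    "fieldDelimiter": ",",
    "skipHeader": true,
    "nullFormat": "\\N",
    "column": ["*"]
}
```

Note that the fragment leaves compress unset: skipHeader requires uncompressed files.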
Column configuration
The column parameter controls which columns to read and how to map their types.
By default, all columns are read as STRING:
```json
"column": ["*"]
```
To specify column types explicitly, provide an array of column definitions. Each definition requires type and either index or value:
- index: Zero-based column position in the source file.
- value: A constant. Creates a column with a fixed value instead of reading from the file.
Example:
```json
"column": [
    { "type": "long", "index": 0 },
    { "type": "string", "value": "alibaba" }
]
```
The second entry produces a column with the constant value "alibaba" for every row, regardless of the source data.
Use explicit column definitions when you need to:
- Read only a subset of columns.
- Enforce specific data types instead of relying on the default STRING type.
- Add constant-value columns to the output.
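The index/value distinction can be sketched in Python. This is an illustrative simulation of how a reader might apply a column configuration to one parsed row, not the connector's actual implementation:

```python
# Simulate applying a column configuration to one parsed CSV row.
def apply_columns(row, columns):
    casts = {"long": int, "double": float, "string": str,
             "boolean": lambda v: v.lower() == "true"}
    out = []
    for col in columns:
        cast = casts[col["type"]]
        if "value" in col:
            # Constant column: the source row is ignored entirely.
            out.append(cast(col["value"]))
        else:
            # Positional column: read the field at the given zero-based index.
            out.append(cast(row[col["index"]]))
    return out

row = ["42", "ignored", "3.14"]
config = [{"type": "long", "index": 0},
          {"type": "string", "value": "alibaba"},
          {"type": "double", "index": 2}]
print(apply_columns(row, config))  # → [42, 'alibaba', 3.14]
```

The middle field of the source row never appears in the output: only indexed and constant columns are emitted, which is how subset selection works.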
Format-specific configuration
CSV
CSV files use fieldDelimiter and skipHeader for parsing control. Use csvReaderConfig for advanced options such as quote characters and multi-line records.
For files with non-standard delimiters, specify the delimiter using Unicode encoding. For example, use \u007c for the pipe character (|).
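For instance, a pipe-delimited CSV source could be configured with the following fragment (a partial parameter block, shown only to illustrate the Unicode escape):

```json
"parameter": {
    "fileFormat": "csv",
    "fieldDelimiter": "\u007c"
}
```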
Parquet
Use the parquetSchema parameter when fileFormat is parquet. This parameter is ignored for other formats.
parquetSchema defines the schema of the Parquet file:
```
message MessageTypeName {
    Rule DataType FieldName;
    ...;
}
```
- MessageTypeName: Name of the message type.
- Rule: Use required for non-null fields, optional for nullable fields. Set all fields to optional unless you have a specific reason not to.
- DataType: Valid values are boolean, int32, int64, int96, float, double, binary, and fixed_len_byte_array. Use binary for string fields.
- Each field definition must end with a semicolon (;), including the last field.
Example:
```json
"parquetSchema": "message m { optional int32 minute_id; optional int32 dsp_id; optional int32 adx_pid; optional int64 req; optional int64 res; optional int64 suc; optional int64 imp; optional double revenue; }"
```
Make sure the full configuration remains valid JSON after adding parquetSchema.
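One way to avoid JSON escaping mistakes is to build the parameter block programmatically and let a JSON serializer handle quoting. A sketch in Python (the schema string here is a shortened example, not the full one above):

```python
import json

# Build the parquetSchema value as a plain Python string; json.dumps takes
# care of escaping, so the resulting configuration is guaranteed valid JSON.
schema = ("message m { optional int32 minute_id; optional int64 req; "
          "optional double revenue; }")
parameter = {"fileFormat": "parquet", "parquetSchema": schema}
snippet = json.dumps(parameter, indent=2)

# Round-trip check: the schema survives serialization unchanged.
assert json.loads(snippet)["parquetSchema"] == schema
print(snippet)
```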
Text
Set lineDelimiter to define the row separator when reading plain text files.
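For example, a fragment reading tab-separated text files with newline row separators (typical values, shown as an assumption rather than taken from the original document):

```json
"parameter": {
    "fileFormat": "text",
    "fieldDelimiter": "\t",
    "lineDelimiter": "\n"
}
```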
ORC
No format-specific configuration is required for ORC files. Use the column parameter to select and type-map columns as needed.