A TOS data source lets you read files from Tinder Object Storage (TOS). You can use this data source to retrieve files stored in TOS, parse them, and sync the data to any destination data source. This topic describes the data synchronization capabilities of TOS in DataWorks.
Supported data types
TOS data sources in DataWorks support the following field types.
| Data type | Description |
| --- | --- |
| STRING | Text. |
| LONG | Integer. |
| BYTES | Byte array. The read text is converted into a byte array. |
| BOOL | Boolean. |
| DOUBLE | Floating-point number. |
| DATE | Date and time. Multiple common datetime formats are supported. |
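The type mapping above can be sketched in Python. The converter functions below are illustrative assumptions for understanding the mapping, not the reader's actual implementation; in particular, the accepted DATE formats may differ.

```python
from datetime import datetime

# Illustrative converters from raw text fields to the supported types.
# The real reader's parsing rules (e.g. accepted DATE formats) may differ.
CONVERTERS = {
    "string": str,
    "long": int,
    "double": float,
    "bool": lambda s: s.lower() in ("true", "1"),
    "bytes": lambda s: s.encode("utf-8"),
    "date": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
}

def convert(value, type_name):
    """Convert one raw text field to the named reader type."""
    return CONVERTERS[type_name.lower()](value)
```

For example, `convert("42", "long")` yields the integer `42`, and `convert("true", "BOOL")` yields `True`.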
Create a TOS data source
Before you develop a synchronization task in DataWorks, add the required data source to DataWorks by following the instructions in Data Source Management. When you add the data source, you can view the infotip of each parameter in the DataWorks console to understand its meaning.
Develop a data sync task
You can use a TOS data source only as a source in an offline sync task for a single table. The following section describes how to configure the data sync task.
For more information, see Configure a task in the codeless UI and Configure a task in the code editor.
For all parameters and a script sample for the code editor, see Appendix: Script sample and parameter descriptions.
Appendix: Script sample and parameter descriptions
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
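Before you paste a script into the code editor, it can help to confirm that the script is well-formed JSON and contains the expected top-level sections. The helper below is a hypothetical sketch for local sanity-checking, not part of DataWorks; the set of required keys is an assumption based on the sample script in this topic.

```python
import json

# Assumed top-level sections, taken from the sample script below;
# this checker is illustrative and not a DataWorks API.
REQUIRED_KEYS = {"type", "version", "steps", "setting", "order"}

def validate_script(text):
    """Parse a sync script and verify its top-level sections exist."""
    job = json.loads(text)  # raises ValueError on malformed JSON
    missing = REQUIRED_KEYS - job.keys()
    if missing:
        raise ValueError(f"missing sections: {sorted(missing)}")
    return job

job = validate_script(
    '{"type": "job", "version": "2.0", '
    '"steps": [], "setting": {}, "order": {}}'
)
```

A script that omits a section, such as `'{"type": "job"}'`, raises a `ValueError` listing the missing keys.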
Reader script sample
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "tos",
            "parameter": {
                "datasource": "",
                "object": ["f/z/1.csv"],
                "fileFormat": "csv",
                "encoding": "utf8/gbk/...",
                "fieldDelimiter": ",",
                "useMultiCharDelimiter": true,
                "skipHeader": true,
                "compress": "zip/gzip",
                "column": [
                    {
                        "index": 0,
                        "type": "long"
                    },
                    {
                        "index": 1,
                        "type": "boolean"
                    },
                    {
                        "index": 2,
                        "type": "double"
                    },
                    {
                        "index": 3,
                        "type": "string"
                    },
                    {
                        "index": 4,
                        "type": "date"
                    }
                ]
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {
            "record": "0"
        },
        "speed": {
            "concurrent": 1
        }
    },
    "order": {
        "hops": [
            {
                "from": "Reader",
                "to": "Writer"
            }
        ]
    }
}

Reader script parameters
| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The name of the data source. It must be the same as the name of the data source that you added. | Yes | None |
| fileFormat | The format of the source file, such as csv, text, or parquet. | Yes | None |
| object | The file path. This parameter supports the asterisk (*) wildcard character and arrays. For example, to sync the a/b/1.csv and a/b/2.csv files, you can set this parameter to a/b/*.csv. | Yes | None |
| column | The columns to read. The type parameter specifies the source data type. The index parameter specifies the column number in the text file, starting from 0. The value parameter specifies a constant; it creates a column with that constant value instead of reading data from the source file. Note: For each column, you must specify the type parameter and either the index or value parameter. | Yes | All columns are read as the string type. |
| fieldDelimiter | The field delimiter. | Yes | , (comma) |
| lineDelimiter | The row delimiter. Note: This parameter is valid only when fileFormat is set to text. | No | None |
| compress | The compression format of the text file, such as zip or gzip. By default, this parameter is left empty, which means no compression. | No | None |
| encoding | The encoding format of the file, such as utf8 or gbk. | No | utf-8 |
| nullFormat | A string in the text file that represents a null value. Text files have no standard way to express null, so you can use nullFormat to define which string is parsed as null. For example, if you set nullFormat to "null", the string null in the source is treated as a null value. | No | None |
| skipHeader | For CSV files, specifies whether to skip the header row. Note: The skipHeader parameter is not supported for compressed files. | No | false |
| parquetSchema | The schema of the Parquet files to read. This parameter is valid only when fileFormat is set to parquet. Ensure that the entire configuration is still valid JSON after you specify parquetSchema. | No | None |
| csvReaderConfig | The configuration for reading CSV files. The value is of the Map type. A CsvReader is used to read CSV files. If you do not configure this parameter, default values are used. | No | None |
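To illustrate how several of these parameters interact, the sketch below applies fieldDelimiter, skipHeader, nullFormat, and a column list (with both index and value entries) to in-memory CSV text. It is a simplified mental model for understanding the parameters, not the reader's actual implementation.

```python
import csv
import io

def read_rows(text, columns, field_delimiter=",",
              skip_header=False, null_format=None):
    """Simplified model of the reader's CSV parameter handling."""
    rows = list(csv.reader(io.StringIO(text), delimiter=field_delimiter))
    if skip_header:
        rows = rows[1:]              # skipHeader drops the first row
    out = []
    for row in rows:
        record = []
        for col in columns:
            if "value" in col:       # constant column, not read from file
                cell = col["value"]
            else:                    # positional column, 0-based index
                cell = row[col["index"]]
                if null_format is not None and cell == null_format:
                    cell = None      # nullFormat string becomes null
            record.append(cell)
        out.append(record)
    return out

sample = "id,name\n1,alice\n2,null\n"
rows = read_rows(sample, skip_header=True, null_format="null",
                 columns=[{"index": 0}, {"index": 1}, {"value": "tos"}])
# rows == [["1", "alice", "tos"], ["2", None, "tos"]]
```

With skipHeader enabled the header row is discarded, the string null in the second data row is replaced with a null value, and the value entry appends the constant "tos" to every record.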