This topic describes the parameters that are supported by OSS Writer and how to configure OSS Writer by using the codeless user interface (UI) and code editor.

Background information

OSS Writer writes one or more CSV-like files to Object Storage Service (OSS). The number of files that are written to OSS depends on the number of parallel threads and the total number of files that you want to write to OSS.
Note Before you configure OSS Writer, you must configure an OSS data source. For more information, see Add an OSS data source.

OSS Writer can write files that store logical two-dimensional tables, such as CSV files that store text data, to OSS. For more information about OSS, see What is OSS?

OSS stores only unstructured data. Therefore, OSS Writer converts the data obtained from a reader to text files and writes the files to OSS. OSS Writer provides the following features:
  • Writes only text files to OSS. The data in the files must be organized as logical two-dimensional tables.
  • Writes CSV-like files with custom delimiters to OSS.
  • Uses parallel threads to write files to OSS. Each thread writes one file to OSS.
  • Supports object rotation. Files are written to OSS as objects. If the size of a file exceeds a specific threshold, OSS Writer writes excess data as another object.
OSS Writer does not support the following features:
  • Using parallel threads to write a single file to OSS.
  • Distinguishing between data types. OSS does not distinguish between data types, so OSS Writer writes all data to OSS as strings.

Parameters

Parameter Description Required Default value
datasource The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. Yes No default value
object The prefix for the names of the files that you want to write to OSS. OSS simulates the directory effect by adding delimiters to file names. Examples:
  • If you set the object parameter to datax, the names of the files start with datax and end with random strings.
  • If you set the object parameter to cdo/datax, the names of the files start with /cdo/datax and end with random strings. OSS uses forward slashes (/) in file names to simulate the directory effect.

If you do not want to add a random universally unique identifier (UUID) as the suffix, we recommend that you set the writeSingleObject parameter to true. For more information, see the description of the writeSingleObject parameter.

Yes No default value
writeMode The write mode. Valid values:
  • truncate: OSS Writer deletes all existing objects whose names start with the specified prefix before it writes files to OSS. For example, if you set the object parameter to abc, OSS Writer deletes all the objects whose names start with abc before it writes files to OSS.
  • append: OSS Writer writes all files to OSS and suffixes the file names with random UUIDs to ensure that the names of the files are different from the names of existing objects. For example, if you set the object parameter to DI, the actual names of the files written to OSS are in the DI_****_****_**** format.
  • nonConflict: If OSS contains objects whose names start with the specified prefix, OSS Writer returns an error. For example, if you set the object parameter to abc and OSS contains an object named abc123, OSS Writer returns an error.
Yes No default value
writeSingleObject Specifies whether to write a single file to OSS at a time. Valid values:
  • true: OSS Writer writes a single file to OSS at a time.
  • false: OSS Writer writes multiple files to OSS at a time.
No false
fileFormat The format in which OSS Writer writes files to OSS. Valid values:
  • csv: If a file is written as a CSV file, the file must follow CSV specifications. If the data in the file contains column delimiters, the column delimiters are escaped by double quotation marks (").
  • text: If a file is written as a text file, the data in the file is separated by column delimiters. In this case, OSS Writer does not escape the column delimiters.
Note OSS Writer can also write Parquet files to OSS. To write Parquet files, you must configure the parquetSchema parameter to define the data types of the columns.
No text
compress The compression type of the files that you want to write to OSS. This parameter is available only in the code editor.
Note CSV and text files cannot be compressed. Parquet and ORC files can be compressed in a format such as Snappy and ZIP.
No No default value
fieldDelimiter The column delimiter that is used in the files that you want to write to OSS. No ,
encoding The encoding format of the files that you want to write to OSS. No utf-8
nullFormat The string that represents a null value. Text files have no standard string that represents a null value, so you can use this parameter to define one. For example, if you set nullFormat to null, Data Integration treats the string null as a null value. No No default value
header The table headers in the files that you want to write to OSS. Example: ["id", "name", "age"]. No No default value
maxFileSize (advanced parameter, which is available only in the code editor) The maximum size of a single file that can be written to OSS. Unit: MB. Default value: 100,000. For example, if you set the maxFileSize parameter to 300, the maximum size of a single file that can be written to OSS is 300 MB. OSS Writer performs object rotation based on the value of this parameter. Object rotation is similar to log rotation in Log4j. When a file is uploaded to OSS in multiple parts, the maximum size of a part is 10 MB. This size is the minimum granularity used for object rotation. If you set the maxFileSize parameter to a value that is less than 10 MB, the maximum size of a single file that can be written to OSS is still 10 MB. A multipart upload that is started by the InitiateMultipartUploadRequest operation supports a maximum of 10,000 parts, so a single file can reach at most 10 MB × 10,000 = 100,000 MB, which is consistent with the default value.

If object rotation occurs, suffixes such as _1, _2, and _3 are appended to the new object names, which consist of the prefix and a random UUID.

No 100,000

suffix (advanced parameter, which is available only in the code editor) The file name extension of the files that you want to write to OSS. For example, if you set the suffix parameter to .csv, the final name of a file written to OSS is in the fileName****.csv format. No No default value
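
As a rough illustration of how these parameters combine, the following writer parameter block is a sketch rather than a verified configuration. The data source name my_oss_source and all values are hypothetical, and writing maxFileSize as a number is an assumption.

```json
{
    "datasource":"my_oss_source",// Hypothetical name. It must match a data source that you added. 
    "object":"cdo/datax",// File names start with this prefix. 
    "writeMode":"append",// New files get random UUID suffixes, so existing objects are kept. 
    "fileFormat":"csv",// Column delimiters inside values are escaped by double quotation marks. 
    "fieldDelimiter":",",// The column delimiter. 
    "encoding":"utf-8",// The encoding format. 
    "nullFormat":"null",// The string null represents a null value. 
    "header":["id","name","age"],// The table headers. 
    "maxFileSize":300,// Assumed numeric form. Object rotation starts after 300 MB. 
    "suffix":".csv"// File names end with .csv. 
}
```

With these settings, the files written to OSS would be expected to start with the cdo/datax prefix, carry a random string, and end with .csv.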

Configure OSS Writer by using the codeless UI

  1. Configure data sources.
    Configure Source and Target for the synchronization node.
    Parameter Description
    Connection The name of the data source to which you want to write data. This parameter is equivalent to the datasource parameter that is described in the preceding section.
    Object Name (Path Included) The prefix for the names of the files that you want to write to OSS. This parameter is equivalent to the object parameter that is described in the preceding section. Do not use this parameter to specify the name of your OSS bucket.
    File Type The format in which OSS Writer writes the files to OSS. Valid values: csv, text, and parquet.
    Field Delimiter This parameter is equivalent to the fieldDelimiter parameter that is described in the preceding section. By default, a comma (,) is used as a column delimiter.
    Encoding This parameter is equivalent to the encoding parameter that is described in the preceding section. Default value: UTF-8.
    Null String The string that represents a null value. This parameter is equivalent to the nullFormat parameter that is described in the preceding section. If the data in the file that you want to write contains the specified string, the string is replaced with null.
    Time Format The format in which data of the DATE data type is serialized in objects. Example: yyyy-MM-dd.
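    Solution to Duplicate Prefixes The solution to prefix conflicts. If an existing object has a name that starts with the specified prefix, OSS Writer deletes the existing objects before it writes files, writes new files under unique names, or returns an error, depending on the option that you select. These behaviors correspond to the truncate, append, and nonConflict values of the writeMode parameter that is described in the preceding section.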
  2. Configure field mappings. Fields in the source on the left have a one-to-one mapping with fields in the destination on the right.
    Operation Description
    Map Fields with the Same Name Click Map Fields with the Same Name to establish mappings between fields with the same name. The data types of the fields must match.
    Map Fields in the Same Line Click Map Fields in the Same Line to establish mappings between fields in the same row. The data types of the fields must match.
    Delete All Mappings Click Delete All Mappings to remove the mappings that are established.
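
The codeless UI fields correspond to parameters in the code editor. The following fragment is a sketch of that correspondence, based on the equivalences stated in step 1. All values are hypothetical, and the mapping of Time Format to dateFormat and of Solution to Duplicate Prefixes to writeMode is an assumption inferred from the parameter descriptions.

```json
"parameter":{
    "datasource":"my_oss_source",// Connection. Hypothetical data source name. 
    "object":"logs/export",// Object Name (Path Included). A prefix, not the bucket name. 
    "fileFormat":"csv",// File Type. 
    "fieldDelimiter":",",// Field Delimiter. 
    "encoding":"UTF-8",// Encoding. 
    "nullFormat":"null",// Null String. 
    "dateFormat":"yyyy-MM-dd",// Time Format (assumed mapping). 
    "writeMode":"nonConflict"// Solution to Duplicate Prefixes (assumed mapping). Returns an error on a prefix conflict. 
}
```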

Configure OSS Writer by using the code editor

In the following code, a synchronization node is configured to write data to OSS. For more information about how to configure a synchronization node by using the code editor, see Create a synchronization node by using the code editor.
{
    "type":"job",
    "version":"2.0",
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"oss",// The writer type. 
            "parameter":{
                "nullFormat":"",// The string that represents a null value. 
                "dateFormat":"",// The format in which data of the DATE data type is serialized in objects. 
                "datasource":"",// The name of the data source. 
                "writeMode":"",// The write mode. 
                "writeSingleObject":"false", // Specifies whether to write a single file to OSS at a time. 
                "encoding":"",// The encoding format. 
                "fieldDelimiter":",",// The column delimiter. 
                "fileFormat":"",// The format in which OSS Writer writes files to OSS. 
                "object":""// The prefix for the names of the files that you want to write to OSS. 
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed. 
        },
        "speed":{
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
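
Because each parallel thread writes one file, the concurrent setting also influences how many files a run produces. The following setting block is a sketch with hypothetical values. It disables throttling and allows up to four threads, so the node may write up to four files, or more if object rotation occurs. Omitting mbps when throttle is false is an assumption based on the statement that mbps takes effect only when throttling is enabled.

```json
"setting":{
    "errorLimit":{
        "record":"0"// No dirty data records are allowed. 
    },
    "speed":{
        "throttle":false,// Bandwidth throttling is disabled, so mbps is left out (assumption). 
        "concurrent":4// Hypothetical value. Up to four parallel threads, each writing its own file. 
    }
}
```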

Write ORC or Parquet files to OSS

OSS Writer writes ORC or Parquet files to OSS in the same way that HDFS Writer writes data to Hadoop Distributed File System (HDFS). In addition to the standard OSS Writer parameters, OSS Writer supports extended parameters, such as Path and FileFormat, for this purpose. For more information about the extended parameters, see HDFS Writer.

The following sample code shows how to configure a synchronization node to write an ORC file or a Parquet file to OSS.
Notice The following sample code is only for reference. You can modify the parameters based on your business requirements.
  • Write an ORC file to OSS
    {
          "stepType": "oss",
          "parameter": {
            "datasource": "",
            "fileFormat": "orc",
            "path": "/tests/case61",
            "fileName": "orc",
            "writeMode": "append",
            "column": [
              {
                "name": "col1",
                "type": "BIGINT"
              },
              {
                "name": "col2",
                "type": "DOUBLE"
              },
              {
                "name": "col3",
                "type": "STRING"
              }
            ],
            "fieldDelimiter": "\t",
            "compress": "NONE",
            "encoding": "UTF-8"
          }
    }
  • Write a Parquet file to OSS
    {
          "stepType": "oss",
          "parameter": {
            "datasource": "",
            "fileFormat": "parquet",
            "path": "/tests/case61",
            "fileName": "test",
            "writeMode": "append",
            "fieldDelimiter": "\t",
            "compress": "SNAPPY",
            "encoding": "UTF-8",
            "parquetSchema": "message test { required int64 int64_col;\n required binary str_col (UTF8);\nrequired group params (MAP) {\nrepeated group key_value {\nrequired binary key (UTF8);\nrequired binary value (UTF8);\n}\n}\nrequired group params_arr (LIST) {\n  repeated group list {\n    required binary element (UTF8);\n  }\n}\nrequired group params_struct {\n  required int64 id;\n required binary name (UTF8);\n }\nrequired group params_arr_complex (LIST) {\n  repeated group list {\n    required group element {\n required int64 id;\n required binary name (UTF8);\n}\n  }\n}\nrequired group params_complex (MAP) {\nrepeated group key_value {\nrequired binary key (UTF8);\nrequired group value {\n  required int64 id;\n required binary name (UTF8);\n  }\n}\n}\nrequired group params_struct_complex {\n  required int64 id;\n required group detail {\n  required int64 id;\n required binary name (UTF8);\n  }\n  }\n}",
            "dataxParquetMode": "fields"
          }
        }
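
The parquetSchema value in the preceding sample is a single escaped string that defines many nested fields. As a smaller sketch, the following step keeps only the first two fields of that schema. The data source name, path, and file name are hypothetical, and whether this reduced schema matches your source columns is something to verify against your own data.

```json
{
    "stepType": "oss",
    "parameter": {
        "datasource": "my_oss_source",// Hypothetical data source name. 
        "fileFormat": "parquet",
        "path": "/tests/simple",// Hypothetical path. 
        "fileName": "test",
        "writeMode": "append",
        "compress": "SNAPPY",
        "encoding": "UTF-8",
        "parquetSchema": "message test {\n required int64 int64_col;\n required binary str_col (UTF8);\n}",// Two flat fields taken from the preceding sample. 
        "dataxParquetMode": "fields"
    }
}
```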