
Configure OSS Writer

Last Updated: Apr 03, 2018

OSS Writer writes one or more table files in a CSV-like format to OSS. You must configure a data source before you configure the OSS Writer plug-in. For more information, see OSS data source config.

The data written to an OSS file is logically a two-dimensional table, for example, text in CSV format.

  • For more information on OSS products, see What is OSS.

OSS Writer converts data from the data synchronization protocol into text files in OSS, which is an unstructured data store. Currently, OSS Writer supports the following features:

  • Supports writing text files only. The schema in each text file must be a two-dimensional table.

  • Supports CSV-like files with custom delimiters.

  • Supports multithreaded writing, with each thread writing a different subfile.

  • Supports file rollover: writing switches to a new file when the current file exceeds a specified size or a specified number of lines.
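The CSV-like output with a custom delimiter, and the difference between the text and csv values of fileFormat (see Parameter description), can be sketched in Python. format_rows is a hypothetical helper, not part of OSS Writer. It assumes a single-character delimiter, because Python's csv module accepts only one character, and it uses standard CSV quoting as a stand-in for OSS Writer's csv escaping.

```python
import csv
import io


def format_rows(rows, field_delimiter=",", file_format="text"):
    """Hypothetical helper: render rows of strings as CSV-like text.

    file_format="text": join values with the delimiter, no escaping.
    file_format="csv":  quote values that contain the delimiter, a quote,
                        or a newline, following standard CSV rules.
    """
    if file_format == "text":
        return "".join(field_delimiter.join(row) + "\n" for row in rows)
    if file_format == "csv":
        buf = io.StringIO()
        # csv.writer requires a one-character delimiter.
        writer = csv.writer(buf, delimiter=field_delimiter,
                            quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
        writer.writerows(rows)
        return buf.getvalue()
    raise ValueError(f"unsupported file_format: {file_format}")


rows = [["1", "Alice"], ["2", "Smith, Bob"]]
print(format_rows(rows))                     # second line splits into 3 fields on ","
print(format_rows(rows, file_format="csv"))  # "Smith, Bob" is quoted
```

In text mode, the value Smith, Bob becomes two fields when the file is read back with a comma delimiter; in csv mode it is quoted and stays one field.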

Currently unsupported features:

  • Concurrent writing to a single file.

  • Typed data: OSS itself does not provide data types, so OSS Writer writes all data to OSS as strings.

Because OSS itself does not provide data types, the following types are defined by DataX OSS Writer:

Category          OSS data type
Integer           Long
Floating point    Double
String            String
Boolean           Bool
Date and time     Date
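Because every value reaches OSS as a string, values in each category above have to be serialized. The following Python sketch shows one plausible serialization, including nullFormat handling. to_oss_string is hypothetical; the date pattern, the lowercase true/false spelling, and the empty-string fallback when no nullFormat is set are assumptions, not documented OSS Writer behavior.

```python
from datetime import datetime


def to_oss_string(value, null_format=None, date_format="%Y-%m-%d %H:%M:%S"):
    """Hypothetical helper: serialize one typed value as the string written to OSS.

    The date format and the lowercase boolean spelling are assumptions made
    for illustration, not documented OSS Writer behavior.
    """
    if value is None:
        # Assumed fallback: an empty field when no nullFormat is configured.
        return null_format if null_format is not None else ""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):  # Long and Double
        return str(value)
    if isinstance(value, datetime):      # Date
        return value.strftime(date_format)
    return str(value)                    # String


row = [42, 3.5, "abc", True, datetime(2018, 4, 3, 12, 0, 0), None]
print([to_oss_string(v, null_format="null") for v in row])
# ['42', '3.5', 'abc', 'true', '2018-04-03 12:00:00', 'null']
```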

Parameter description

  • datasource

    • Description: The data source name. It must match the name of a data source that has been added. Adding data sources is supported in script mode.

    • Required: Yes

    • Default value: None
  • object

    • Description: The name of the object (file) written by OSS Writer. Forward slashes in object names simulate directories in OSS.

      Specify only the object name, without the bucket name, for example, test. The name of the file synchronized to OSS is based on the name you enter here.

      If "object": "test/DI" is specified, the objects written to OSS begin with test/DI, where test is the folder, DI is the file name prefix (the suffix is a random string), and the forward slash (/) is the delimiter of the simulated OSS directory.

    • Required: Yes

    • Default value: None

  • writeMode

    • Description: How OSS Writer handles existing data before writing:

      • truncate: All objects whose names begin with the object prefix are deleted before writing. For example, if "object": "abc" is specified, all objects whose names begin with abc are deleted.
      • append: Nothing is done before writing. OSS Writer writes data using the object name and appends a random UUID suffix so that file names do not conflict. For example, if the object name you specify is DI, the object is actually written as DI_xxxxxx_xxxx_xxxx.
      • nonConflict: If an object whose name begins with the object prefix already exists in the specified path, an error is reported. For example, if "object": "abc" is specified and an object named abc123 exists, an error is reported.
    • Required: Yes

    • Default value: None

  • fileFormat

    • Description: The format of the written file: text or strict csv. In csv format, a value that contains the column delimiter is escaped by enclosing it in double quotation marks ("), following CSV escaping rules. In text format, values are separated by the column delimiter without any escaping.
    • Required: No

    • Default value: text

  • fieldDelimiter

    • Description: Delimiter used to separate the written fields.

    • Required: No

    • Default value: comma (,)

  • encoding

    • Description: Encoding of the written files.

    • Required: No

    • Default value: utf-8

  • nullFormat

    • Description: Text files have no standard string that represents null (a null pointer). nullFormat defines which string represents null.

      For example, if nullFormat is set to "null" and a source value is null, Data Synchronization treats it as a null field.

    • Required: No

    • Default value: None

  • header (advanced configuration, which is not supported in wizard mode)

    • Description: The header written to the file in OSS, for example, ["id", "name", "age"].
    • Required: No

    • Default value: None

  • maxFileSize (advanced configuration, which is not supported in wizard mode)

    • Description: The maximum size of a single object written to OSS. The default is 10,000 × 10 MB (100,000 MB). Rotation works like size-based log rotation in log4j. OSS multipart upload uses 10 MB parts, so 10 MB is the minimum rotation granularity: a maxFileSize smaller than 10 MB is treated as 10 MB. Each OSS InitiateMultipartUploadRequest supports at most 10,000 parts. When rotation occurs, each object is named with the original object prefix + a random UUID + a suffix such as _1, _2, or _3.
    • Required: No

    • Default value: 100,000 MB
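The maxFileSize rules above (10 MB parts, a 10 MB floor, a default of 10,000 parts, and UUID-plus-index naming on rotation) can be checked with a small Python sketch. The helper names are hypothetical, sizes are whole numbers of MB, and the sketch assumes each object is filled to the limit before rotating and that every rotated object, including the first, carries an index suffix; the exact numbering used by OSS Writer may differ.

```python
import uuid

PART_SIZE_MB = 10   # size of each multipart-upload part, per the description
MAX_PARTS = 10_000  # maximum parts per InitiateMultipartUploadRequest


def effective_max_file_size_mb(max_file_size_mb=None):
    """Apply the documented default (10,000 x 10 MB) and the 10 MB floor."""
    if max_file_size_mb is None:
        return MAX_PARTS * PART_SIZE_MB
    return max(max_file_size_mb, PART_SIZE_MB)


def rotated_object_names(prefix, total_mb, max_file_size_mb=None):
    """Hypothetical sketch: object names produced when total_mb of data is
    written with rotation. Assumes one UUID per writer and _1, _2, ... suffixes
    once more than one object is needed.
    """
    limit = effective_max_file_size_mb(max_file_size_mb)
    count = max(1, -(-total_mb // limit))  # ceiling division
    base = f"{prefix}_{uuid.uuid4().hex}"
    if count == 1:
        return [base]
    return [f"{base}_{i}" for i in range(1, count + 1)]


print(effective_max_file_size_mb())      # 100000
print(effective_max_file_size_mb(5))     # 10: values below one part are raised
print(rotated_object_names("test/DI", 25, 10))  # three names ending in _1, _2, _3
```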

Development in wizard mode

TargetOSS

  • Data source: datasource in the preceding parameter description. Select the OSS data source.
  • Object prefix: object in the preceding parameter description. Enter the path to the OSS folder, without the bucket name.

  • Version of the type: the type version of the current target.

  • Column delimiter: fieldDelimiter in the preceding parameter description. Defaults to ",".

  • Encoding format: encoding in the preceding parameter description. Defaults to utf-8.

  • Null value: nullFormat in the preceding parameter description. Defines the string that represents a null value.

  • Time format: the format used to serialize time values.

Development in script mode

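The following is a sample script configuration. For the parameters, see Parameter description. Replace datasourceName with the name of your OSS data source, and set writeMode to one of truncate, append, or nonConflict.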

  {
    "type": "job",
    "version": "1.0",
    "configuration": {
      "setting": {
        "key": "value"
      },
      "reader": {},
      "writer": {
        "plugin": "OSS",
        "parameter": {
          "datasource": "datasourceName",
          "object": "cdo/CDP",
          "encoding": "UTF-8",
          "fieldDelimiter": ",",
          "writeMode": "truncate"
        }
      }
    }
  }
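The writeMode value in the sample selects one of three behaviors toward objects that already share the object prefix. The Python sketch below models them against an in-memory list of object names standing in for a bucket; plan_write is hypothetical and does not call OSS. The underscore-separated UUID suffix imitates the DI_xxxxxx_xxxx_xxxx pattern described for append mode.

```python
import uuid


def plan_write(existing_objects, object_prefix, write_mode):
    """Hypothetical sketch of the three writeMode options.

    existing_objects: names already in the bucket (an in-memory stand-in for OSS).
    Returns (objects_to_delete, name_to_write); raises on a nonConflict clash.
    """
    matches = [name for name in existing_objects if name.startswith(object_prefix)]
    # A random UUID suffix keeps the new object name from clashing.
    new_name = f"{object_prefix}_{str(uuid.uuid4()).replace('-', '_')}"
    if write_mode == "truncate":
        return matches, new_name  # delete every object sharing the prefix
    if write_mode == "append":
        return [], new_name       # touch nothing; rely on the UUID suffix
    if write_mode == "nonConflict":
        if matches:
            raise RuntimeError(f"objects with prefix {object_prefix!r} exist: {matches}")
        return [], new_name
    raise ValueError(f"unknown writeMode: {write_mode}")


existing = ["abc123", "abc_old", "xyz"]
print(plan_write(existing, "abc", "truncate")[0])  # ['abc123', 'abc_old']
print(plan_write(existing, "abc", "append")[0])    # []
```

With nonConflict, the same call raises an error because abc123 already begins with abc, matching the example in the parameter description.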