
Configure MaxCompute writer

Last Updated: Apr 03, 2018

The MaxCompute Writer plug-in lets ETL developers insert or update data in MaxCompute. Because it imports business data into MaxCompute, it is suitable for data transfers at the GB to TB scale. You must configure the data source before you configure the MaxCompute Writer plug-in. For more information, see MaxCompute data source config. For more information about MaxCompute, see What is MaxCompute.

Under the hood, MaxCompute Writer uses Tunnel to write data into MaxCompute, based on the project, table, partition, table field, and other information that you configured. For common Tunnel commands, see Tunnel commands.

MaxCompute Writer supports the following data types in MaxCompute.

Type            | MaxCompute data type
Integer         | bigint
Floating point  | double, decimal
String          | string
Date            | datetime
Boolean         | boolean

Parameter description

  • datasource

    • Description: The data source name. It must be identical to the name of the data source that you added. Script mode supports adding data sources.

    • Required: Yes

    • Default value: None
  • table

    • Description: The name of the data table to write data into (case-insensitive). Writing data into multiple tables is not supported.

    • Required: Yes

    • Default value: None
  • partition

    • Description: The partition of the data table to write data into. Always specify every partition level, down to the last one. For example, to write data to a table that has three levels of partitions, configure all three levels, such as pt=20150101,type=1,biz=2.

      • For non-partitioned tables, leave this parameter unset. The data is then imported directly into the target table.

      • MaxCompute Writer does not support partition routing, that is, choosing the partition from a field value in the data. For partitioned tables, always specify a last-level partition.

    • Required: Yes for partitioned tables. Leave it blank for non-partitioned tables.

    • Default value: None
  • column

    • Description: The list of fields to be imported. To import all fields, set "column": ["*"]. To write only some of the MaxCompute columns, list those columns.

      For example, "column": ["id", "name"]. MaxCompute Writer supports column filtering and column reordering. For example, if a table has three fields, a, b, and c, and you only want to synchronize fields c and b, set "column": ["c", "b"]. During the import, field a is automatically set to null.

      column must list the columns to be synchronized and cannot be blank.

    • Required: Yes

    • Default value: None
  • truncate

    • Description: Set "truncate": true to make write operations idempotent. When a write attempt fails and is retried, MaxCompute Writer clears the previously written data and imports the new data, so the data is consistent after each rerun.

      truncate is not an atomic operation. The cleanup is performed with MaxCompute SQL, which cannot guarantee atomicity. Therefore, when multiple tasks clear the same table or partition at the same time, concurrency and timing problems may occur, so proceed with caution. To avoid this problem, we recommend that you do not run DDL operations from multiple jobs on the same partition at the same time, or that you create the partitions before starting multiple concurrent jobs.

    • Required: Yes

    • Default value: None

Development in wizard mode

Target: MaxCompute

  • Data sources: datasource in the preceding parameter description. Select the ODPS data source.

  • Table: table in the preceding parameter description. Select the table to be synchronized.

  • Partition information: partition in the preceding parameter description. Configure the partition to write data into.

  • Cleansing rules:

    • Clear existing data before writing: Before the data import, all data in the table or partition is cleared, which is equivalent to Insert Overwrite.

    • Keep existing data before writing: No data is cleared before the data import and new data is always appended with each run, which is equivalent to Insert Into.

  • Field mapping: column in the preceding parameter description. Read the column information of the MaxCompute source table.

Development in script mode

The following is a script configuration sample. For relevant parameters, see Parameter description.

  {
    "type": "job",
    "version": "1.0",
    "configuration": {
      "reader": {
      },
      "writer": {
        "plugin": "odps",
        "parameter": {
          "datasource": "datasourceName",
          "column": [
            "id",
            "name"
          ], // Target column. Enter the column names of the target table, which are separated by commas. You can also enter "*" to indicate all column names.
          "table": "table",
          "partition": "pt=20140501",
          "truncate": true
        }
      }
    }
  }
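
When several jobs share this shape, the script can be generated instead of edited by hand. The following is a minimal sketch in Python, not part of any SDK: build_writer_job is a hypothetical helper name, it uses only the keys shown in the sample, and it leaves the reader empty as the sample does.

```python
import json


def build_writer_job(datasource, table, columns, partition=None, truncate=True):
    """Build a script-mode job dict (hypothetical helper, for illustration only).

    Only keys that appear in the sample script are used. The partition key is
    omitted for non-partitioned tables, as the parameter description requires.
    """
    if not columns:
        raise ValueError("column must not be blank")
    parameter = {
        "datasource": datasource,
        "column": list(columns),
        "table": table,
        "truncate": truncate,
    }
    if partition is not None:
        parameter["partition"] = partition
    return {
        "type": "job",
        "version": "1.0",
        "configuration": {
            "reader": {},  # left empty here, as in the sample
            "writer": {"plugin": "odps", "parameter": parameter},
        },
    }


# Reproduces the writer section of the sample above.
job = build_writer_job("datasourceName", "table", ["id", "name"],
                       partition="pt=20140501")
print(json.dumps(job, indent=2))
```

Calling the helper without a partition argument yields a configuration for a non-partitioned table, with no partition key at all.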

Supplemental instructions

Column filtering

MaxCompute itself does not support column filtering, reordering, or null filling, but MaxCompute Writer does. For example, to import all fields, set "column": ["*"]. As another example, if a table has three fields, a, b, and c, and you only want to synchronize fields c and b, set "column": ["c", "b"]. The first and second columns from the reader are then imported into fields c and b of the MaxCompute table, and field a is set to null in the newly inserted records.
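
The mapping described above can be pictured with a small Python sketch. It is an illustration of the described behavior, not the plug-in's actual implementation; map_record and its arguments are hypothetical names, and the check for unknown column names is an assumption of this sketch.

```python
def map_record(reader_row, configured_columns, table_columns):
    """Place reader values into the target table's column order.

    reader_row[i] goes to configured_columns[i]. Table columns that are not
    configured are filled with None (null). ["*"] expands to all table columns.
    """
    if configured_columns == ["*"]:
        configured_columns = list(table_columns)
    # Assumption of this sketch: reject names the target table does not have.
    unknown = [c for c in configured_columns if c not in table_columns]
    if unknown:
        raise ValueError(f"columns not in target table: {unknown}")
    by_name = dict(zip(configured_columns, reader_row))
    return [by_name.get(col) for col in table_columns]


# Table has fields a, b, c; only c and b are synchronized.
print(map_record(["x", "y"], ["c", "b"], ["a", "b", "c"]))  # [None, 'y', 'x']
```

The first reader value lands in field c, the second in field b, and field a is null.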

How to handle column configuration errors

To write data reliably and protect data quality, MaxCompute Writer must not silently drop data from redundant columns. When redundant columns are written, MaxCompute Writer reports an error. For example, if the MaxCompute table has fields a, b, and c, but MaxCompute Writer is asked to write more than three fields, it reports an error.
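
A pre-flight check in the same spirit can be sketched in Python. This is an assumption-laden illustration: check_field_count is a hypothetical name, and the exception type and message are not those of MaxCompute Writer.

```python
def check_field_count(record, target_columns):
    """Reject a record that carries more fields than there are target columns.

    Failing loudly avoids silently dropping the extra values.
    """
    if len(record) > len(target_columns):
        raise ValueError(
            f"record has {len(record)} fields but the target has only "
            f"{len(target_columns)} columns: {target_columns}"
        )
    return record


table_columns = ["a", "b", "c"]
check_field_count([1, 2, 3], table_columns)  # passes
try:
    check_field_count([1, 2, 3, 4], table_columns)  # one redundant field
except ValueError as err:
    print("rejected:", err)
```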

Notes on partition configuration

MaxCompute Writer only supports writing to a last-level partition. It does not support partition routing, that is, choosing the target partition based on the value of a field. For a table that has three levels of partitions, you must specify a level-3 partition. For example, configure pt=20150101,type=1,biz=2, but not pt=20150101,type=1 or pt=20150101.
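
Before submitting a job, a partition value can be checked against the table's partition keys. The Python sketch below is hypothetical: parse_partition and is_last_level are illustrative names, the list of partition keys is supplied by the caller, and requiring the keys in table order is an assumption of this sketch.

```python
def parse_partition(spec):
    """Parse 'pt=20150101,type=1,biz=2' into an ordered list of (key, value)."""
    pairs = []
    for part in spec.split(","):
        key, sep, value = part.strip().partition("=")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"malformed partition item: {part!r}")
        pairs.append((key.strip(), value.strip()))
    return pairs


def is_last_level(spec, partition_keys):
    """True only if spec names every partition key of the table, in order."""
    return [key for key, _ in parse_partition(spec)] == list(partition_keys)


keys = ["pt", "type", "biz"]  # a table with three partition levels
print(is_last_level("pt=20150101,type=1,biz=2", keys))  # True
print(is_last_level("pt=20150101,type=1", keys))        # False
print(is_last_level("pt=20150101", keys))               # False
```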

Task rerunning and failover

In MaxCompute Writer, setting "truncate": true makes write operations idempotent. When a write attempt fails and is retried, MaxCompute Writer clears the previously written data and imports the new data, so the data is consistent after each rerun. If the task is interrupted by an exception while running, the atomicity of the data cannot be guaranteed, and the data is neither rolled back nor rerun automatically. You must rerun the task yourself, relying on this idempotence, to guarantee data integrity.

Setting "truncate" to true clears all the data in the specified partition or table, so proceed with caution.
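
Why clearing before writing makes a rerun safe can be seen in a toy Python model. It uses an in-memory dict in place of MaxCompute and a hypothetical write function. It also assumes that writing without truncate behaves like the "Keep existing data before writing" rule, that is, it appends.

```python
def write(storage, partition, rows, truncate):
    """Write rows into an in-memory 'partition'; clear it first if truncate."""
    if truncate:
        storage[partition] = []  # comparable to Insert Overwrite
    storage.setdefault(partition, []).extend(rows)


rows = [(1, "a"), (2, "b")]

overwrite = {}
write(overwrite, "pt=20140501", rows, truncate=True)
write(overwrite, "pt=20140501", rows, truncate=True)  # rerun of the same task
print(len(overwrite["pt=20140501"]))  # 2

append = {}
write(append, "pt=20140501", rows, truncate=False)
write(append, "pt=20140501", rows, truncate=False)  # rerun duplicates the rows
print(len(append["pt=20140501"]))  # 4
```

With truncate the partition holds the same two rows after any number of reruns; without it, each rerun adds another copy.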
