This topic describes the data types and parameters that are supported by MaxCompute Writer and how to configure MaxCompute Writer by using the codeless user interface (UI) and code editor.

Prerequisites

Before you configure MaxCompute Writer, you must configure a MaxCompute data source. For more information, see Add a MaxCompute data source.

Background information

MaxCompute Writer is designed for developers to insert data into or update data in MaxCompute. MaxCompute Writer can write gigabytes or terabytes of data to MaxCompute. For more information about MaxCompute, see What is MaxCompute?.

MaxCompute Writer writes data to MaxCompute by using Tunnel based on the information that you specify, such as the destination project, table, partition, and field. For more information about common Tunnel commands, see Tunnel commands.

For a table with a strict schema, such as a table in a MySQL database or MaxCompute project, Data Integration reads data from the table and stores the data in the memory. Then, Data Integration converts the data to the format that is supported by the destination and writes the data to the destination.

If the data conversion fails or the data fails to be written to the destination, the data is regarded as dirty data. You can specify a maximum number of dirty data records allowed.
Note If the data in the source contains a null value, MaxCompute Writer cannot convert the data to the VARCHAR type.
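
The dirty data limit described above can be illustrated with a minimal Python sketch. It is not the Data Integration implementation: the function name convert_records, the exception DirtyDataLimitExceeded, and the choice to treat TypeError and ValueError as conversion failures are assumptions made for illustration only.

```python
from typing import Any, Callable, Iterable, List, Tuple


class DirtyDataLimitExceeded(Exception):
    """Raised when more records fail than the configured limit allows (hypothetical)."""


def convert_records(
    records: Iterable[Any],
    convert: Callable[[Any], Any],
    error_limit: int,
) -> Tuple[List[Any], List[Any]]:
    """Convert each record and collect failed records as dirty data.

    Stops with an error as soon as the number of dirty records
    exceeds error_limit.
    """
    converted: List[Any] = []
    dirty: List[Any] = []
    for record in records:
        try:
            converted.append(convert(record))
        except (TypeError, ValueError):
            # Assumption: a failed conversion marks the record as dirty.
            dirty.append(record)
            if len(dirty) > error_limit:
                raise DirtyDataLimitExceeded(
                    f"{len(dirty)} dirty records exceed the limit of {error_limit}"
                )
    return converted, dirty


# One bad record is tolerated when the limit is 1.
ok, bad = convert_records(["1", "2", "x"], int, error_limit=1)
```

With error_limit=0, which corresponds to "record":"0" under errorLimit in the code editor example, the first failed record stops the sketch with an error.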

Parameters

Parameter Description Required Default value
datasource The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. Yes No default value
table The name of the table to which you want to write data. The name is not case-sensitive. You can specify only one table. Yes No default value
partition The partitions to which you want to write data. The lowest-level partition must be specified. For example, if you want to write data to a table with three-level partitions, set the partition parameter to a value that contains the third-level partition information, such as pt=20150101, type=1, biz=2.
  • To write data to a non-partitioned table, do not specify this parameter. The data is directly written to the destination table.
  • MaxCompute Writer does not support data write operations based on the partition route. To write data to a partitioned table, make sure that the data is written to the lowest-level partition.
Required only for partitioned tables No default value
column The names of the columns to which you want to write data. If you want to write data to all the columns in the destination table, set this parameter to an asterisk (*), such as "column": ["*"]. If you want to write data to some columns in the destination table, set this parameter to the names of the specified columns. Separate the names with commas (,), such as "column": ["id","name"].
  • MaxCompute Writer can filter columns and change the order of columns. For example, a MaxCompute table has three columns: a, b, and c. If you want to write data only to column c and column b, you can enter "column": ["c","b"]. During data synchronization, column a is automatically set to null.
  • The column parameter must explicitly specify all the columns to which you want to write data. This parameter cannot be left empty.
Yes No default value
truncate To ensure the idempotence of write operations, set the truncate parameter to true. If a failed synchronization node is rerun due to a write failure, MaxCompute Writer deletes the data that has been written before and writes the source data again. This ensures that the same data is written for each rerun.

MaxCompute Writer uses MaxCompute SQL to delete data. MaxCompute SQL cannot ensure data atomicity. Therefore, the TRUNCATE operation is not an atomic operation. Conflicts may occur when multiple nodes delete data from the same table or partition in parallel.

To prevent this issue, we recommend that you do not execute multiple DDL statements to write data to the same partition at the same time. You can create different partitions for nodes that need to run in parallel.

Yes No default value
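
The effect of the truncate parameter on reruns can be illustrated with a small simulation. This is a sketch under stated assumptions, not how MaxCompute Writer is implemented: the dictionary stands in for a MaxCompute table, write_partition is a hypothetical helper, and the first run is assumed to have left a partial write behind before it failed.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a table: partition name -> rows.
Table = Dict[str, List[Tuple[int, str]]]


def write_partition(
    table: Table, partition: str, rows: List[Tuple[int, str]], truncate: bool
) -> None:
    """Write rows to one partition, clearing it first when truncate is True."""
    if truncate:
        table[partition] = []
    table.setdefault(partition, []).extend(rows)


source = [(1, "a"), (2, "b")]

# truncate=true: the rerun clears the partial data from the failed run.
with_truncate: Table = {}
write_partition(with_truncate, "pt=20150101", source[:1], truncate=True)  # failed run
write_partition(with_truncate, "pt=20150101", source, truncate=True)      # rerun

# truncate=false: the rerun appends, so the partial data is duplicated.
without_truncate: Table = {}
write_partition(without_truncate, "pt=20150101", source[:1], truncate=False)
write_partition(without_truncate, "pt=20150101", source, truncate=False)
```

In this simulation the partition holds exactly the source rows after the rerun only when truncate is true. The sketch does not model the non-atomic deletion or the conflicts between parallel nodes that are described above.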

Configure MaxCompute Writer by using the codeless UI

  1. Configure data sources.
    Configure Source and Target for the synchronization node.
    Parameter Description
    Connection The name of the data source to which you want to write data. This parameter is equivalent to the datasource parameter that is described in the preceding section.
    Table The name of the table to which you want to write data. This parameter is equivalent to the table parameter that is described in the preceding section.
    Writing Rule The write rule. Valid values:
    • Write with Original Data Deleted (Insert Overwrite): All data in the table or partition is deleted before MaxCompute Writer writes data. This rule is equivalent to the INSERT OVERWRITE statement.
    • Write with Original Data Retained (Insert Into): No data is deleted before MaxCompute Writer writes data. New data is appended upon each run. This rule is equivalent to the INSERT INTO statement.
    Note
    • MaxCompute Reader reads data by using Tunnel. Synchronization nodes cannot filter data. Each synchronization node reads all the data from a table or partition.
    • MaxCompute Writer writes data by using Tunnel instead of the INSERT INTO statement. You can view complete data in the destination table only after a synchronization node is properly run. Pay attention to the node dependencies.
    Convert Empty Strings to Null Specifies whether to convert empty strings to null.
  2. Configure field mappings. This operation is equivalent to setting the column parameter that is described in the preceding section. Fields in the source on the left have a one-to-one mapping with fields in the destination on the right.
    Operation Description
    Map Fields with the Same Name Click Map Fields with the Same Name to establish mappings between fields with the same name. The data types of the fields must match.
    Map Fields in the Same Line Click Map Fields in the Same Line to establish mappings between fields in the same row. The data types of the fields must match.
    Delete All Mappings Click Delete All Mappings to remove the mappings that are established.
    Auto Layout Click Auto Layout. Then, the system automatically sorts the fields based on specific rules.
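
The difference between the two mapping operations above can be sketched in Python. This only illustrates the idea and is not the logic of the codeless UI: the function names are hypothetical, "must match" is modeled as identical type names, and fields whose types do not match are assumed to be left unmapped.

```python
from typing import Dict, List, Tuple

Field = Tuple[str, str]  # (field name, data type)


def map_by_name(source: List[Field], target: List[Field]) -> Dict[str, str]:
    """Map each source field to the target field that has the same name."""
    target_types = dict(target)
    return {
        name: name
        for name, dtype in source
        if target_types.get(name) == dtype
    }


def map_by_position(source: List[Field], target: List[Field]) -> Dict[str, str]:
    """Map each source field to the target field in the same row."""
    return {
        s_name: t_name
        for (s_name, s_type), (t_name, t_type) in zip(source, target)
        if s_type == t_type
    }


source_fields = [("id", "bigint"), ("name", "string"), ("age", "bigint")]
target_fields = [("uid", "bigint"), ("name", "string"), ("age", "string")]

same_name = map_by_name(source_fields, target_fields)
same_line = map_by_position(source_fields, target_fields)
```

For these sample fields, the name-based mapping pairs only name with name, whereas the position-based mapping pairs id with uid and name with name. In both cases age is left unmapped because its types differ.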

Configure MaxCompute Writer by using the code editor

You can configure MaxCompute Writer by using the code editor. For more information, see Create a sync node by using the code editor.

In the following code, a synchronization node is configured to write data to a MaxCompute table. For more information about the parameters, see the preceding parameter description.
{
    "type":"job",
    "version":"2.0",// The version number. 
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"odps",// The writer type. 
            "parameter":{
                "partition":"",// The partitions to which you want to write data. 
                "truncate":true,// The write rule. 
                "compress":false,// Specifies whether to enable compression. 
                "datasource":"odps_first",// The name of the data source. 
                "column": [// The names of the columns to which you want to write data. 
                    "id",
                    "name",
                    "age",
                    "sex",
                    "salary",
                    "interest"
                ],
                "emptyAsNull":false,// Specifies whether to convert empty strings to null. 
                "table":""// The name of the table to which you want to write data. 
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed. 
        },
        "speed":{
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
If you want to specify the Tunnel endpoint, you can configure the data source directly in the code editor. To do so, replace the "datasource" entry in the preceding code (for example, "datasource":"odps_first",) with the detailed parameters of the data source. Example:
"accessId":"<yourAccessKeyId>",
"accessKey":"<yourAccessKeySecret>",
"endpoint":"http://service.eu-central-1.maxcompute.aliyun-inc.com/api",
"odpsServer":"http://service.eu-central-1.maxcompute.aliyun-inc.com/api",
"tunnelServer":"http://dt.eu-central-1.maxcompute.aliyun.com",
"project":"**********",

Additional instructions

  • Column filter

    MaxCompute Writer allows you to perform operations that MaxCompute does not support, such as filtering columns, reordering columns, and setting empty fields to null. To write data to all the columns in the destination table, set the column parameter to an asterisk (*), such as "column": ["*"].

    For example, a MaxCompute table has three columns: a, b, and c. If you want to write data only to column c and column b, you can set the column parameter to "column": ["c","b"]. The first column and the second column of the source table are written to column c and column b in the MaxCompute table. During data synchronization, column a is automatically set to null.

  • Handling column configuration errors

    To prevent data loss caused by redundant columns and ensure high data reliability, MaxCompute Writer returns an error message if the number of columns to be written exceeds the number of columns in the destination table. For example, if a MaxCompute table contains columns a, b, and c, MaxCompute Writer returns an error message if more than three columns are to be written to the table.

  • Partition configuration

    MaxCompute Writer can write data to the lowest-level partition but cannot route data to partitions based on the value of a field. To write data to a partitioned table, specify the lowest-level partition. For example, if you want to write data to a table with three-level partitions, set the partition parameter to a value that contains the third-level partition information, such as pt=20150101, type=1, biz=2. The data cannot be written if you set the partition parameter to pt=20150101, type=1 or pt=20150101.

  • Node rerunning
    To ensure the idempotence of write operations, set the truncate parameter to true. If a failed synchronization node is rerun due to a write failure, MaxCompute Writer deletes the data that was previously written and writes the source data again. This ensures that the same data is written for each rerun. If a synchronization node is interrupted due to other exceptions, the data cannot be rolled back and the node cannot be automatically rerun. In either case, setting the truncate parameter to true ensures the idempotence of write operations and data integrity when the node is rerun.
    Note If the truncate parameter is set to true, all data in the specified partition or table is deleted before a rerun. Exercise caution when you set this parameter to true.
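
The column and partition rules in this section can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the checks that MaxCompute Writer actually runs: the function names are hypothetical, configured column names are assumed to exist in the destination table, and the partition check compares only the partition keys, in order, against the partition levels of the table.

```python
from typing import Any, Dict, List, Sequence


def resolve_columns(configured: List[str], table_columns: List[str]) -> List[str]:
    """Expand ["*"] and reject more columns than the destination table has."""
    if not configured:
        raise ValueError("column must not be empty")
    if configured == ["*"]:
        return list(table_columns)
    if len(configured) > len(table_columns):
        raise ValueError(
            f"{len(configured)} columns configured, "
            f"but the table has only {len(table_columns)}"
        )
    return list(configured)


def build_row(
    source_row: Sequence[Any], configured: List[str], table_columns: List[str]
) -> Dict[str, Any]:
    """Assign source fields by position to the configured columns.

    Columns that are not configured are set to None (null).
    """
    row: Dict[str, Any] = {name: None for name in table_columns}
    for name, value in zip(resolve_columns(configured, table_columns), source_row):
        row[name] = value
    return row


def check_partition(partition: str, partition_keys: List[str]) -> None:
    """Require a value for every partition level, down to the lowest one."""
    specified = [
        part.split("=", 1)[0].strip()
        for part in partition.split(",")
        if part.strip()
    ]
    if specified != partition_keys:
        raise ValueError(
            f"partition must specify all levels {partition_keys}, got {specified}"
        )


# Columns c and b receive the first and second source fields; a is null.
row = build_row(("x", "y"), ["c", "b"], ["a", "b", "c"])

# All three partition levels are specified, so this check passes.
check_partition("pt=20150101, type=1, biz=2", ["pt", "type", "biz"])
```

In this sketch, row is {"a": None, "b": "y", "c": "x"}. Calling check_partition with pt=20150101, type=1 for the same table raises an error, which mirrors the rule that the lowest-level partition must be specified.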