This topic describes the data types and parameters supported by MaxCompute Writer and how to configure it by using the codeless user interface (UI) and code editor.

MaxCompute Writer is designed for developers to insert data to or update data in MaxCompute. MaxCompute Writer is suitable for importing data of the GB or TB level to MaxCompute.

Note You must configure a connection before configuring MaxCompute Writer. For more information, see Configure a MaxCompute connection. For more information about MaxCompute, see What is MaxCompute?.

Based on the specified information such as the source project, table, partition, and field, MaxCompute Writer writes data to MaxCompute through Tunnel. For more information about common Tunnel commands, see Tunnel commands.

Data Integration reads data from tables with strict schemas in a source database such as MySQL or MaxCompute to the memory, and converts the obtained data to the format supported by the destination data store before writing data to the destination data store.

If the data conversion fails or the data fails to be written to the destination data store, the data is regarded as dirty data. You can process the dirty data based on the maximum number of dirty data records allowed.
Note If data records contain the null value, MaxCompute Writer does not support data of the VARCHAR type.

Parameters

Parameter Description Required Default value
datasource The connection name. It must be identical to the name of the added connection. You can add connections in the code editor. Yes None
table The name of the destination table. The name is case-insensitive. You can specify only one table as the destination table. Yes None
partition The partition to which data is written. The last-level partition must be specified. For example, if you want to write data to a three-level partitioned table, set the partition parameter to a value that contains the last-level partition information, such as pt=20150101, type=1, biz=2.
  • To write data to a non-partitioned table, do not set this parameter. The data is directly written to the destination table.
  • MaxCompute Writer does not support writing data based on the partition route. To write data to a partitioned table, make sure that data is written to the last-level partition.
Required only for writing data to a partitioned table None
column The columns in the destination table to which data is written. Set the value to an asterisk (*) if data is written to all the columns in the destination table. That is, set the column parameter as follows: "column": ["*"]. Set the value to the specified columns if data is written to only some of the columns in the destination table. Separate the columns with a comma (,). Example: "column": ["id","name"].
  • MaxCompute Writer supports filtering columns and changing the order of columns. For example, a MaxCompute table has three columns: a, b, and c. If you want to write data only to column c and column b, you can set the column parameter as follows: "column": ["c","b"]. During data synchronization, column a is automatically set to null.
  • The column parameter must explicitly specify a set of columns to which data is written. The parameter cannot be left empty.
Yes None
truncate To guarantee the idempotence of write operations, set the value to true. That is, set the truncate parameter as follows: "truncate": "true". When a failed sync node is rerun due to a write failure, MaxCompute Writer deletes the data that has been written before importing the source data again. This guarantees that the same data is written for each rerun.

MaxCompute Writer uses MaxCompute SQL to delete data. MaxCompute SQL cannot guarantee the atomicity. Therefore, the truncation operation is not an atomic operation. Conflicts may occur when concurrent nodes delete data from the same table or partition.

To avoid this issue, we recommend that you do not run concurrent Data Definition Language (DDL) nodes to write data to the same partition. You can create different partitions for nodes that need to run concurrently.

Yes None

Configure MaxCompute Writer by using the codeless UI

  1. Configure the connections.
    Configure the source and destination connections for the sync node.Connections
    Parameter Description
    Connection The datasource parameter in the preceding parameter description. Select a connection type, and enter the name of a connection that has been configured in DataWorks.
    Table The table parameter in the preceding parameter description.
    Partition Key Column To write data to all the columns in the destination table, enter "column": ["*"]. The partition parameter supports wildcards and includes one or more partitions.
    • "partition":"pt=20140501/ds=*" specifies that data is written to all ds partitions with pt=20140501.
    • "partition":"pt=top?" specifies that data is written to the partitions with pt=top and pt=to.
    You can specify the partition key columns to which data is written. Assume that the partition key column of a MaxCompute table is pt=${bdp.system.bizdate}. You can configure the column to which data is written to pt. Ignore it if the column is marked as unidentified.
    • To write data to all partitions, enter pt=*.
    • To write data to some of the partitions, specify the corresponding dates.
    Writing Rule
    • Write with Original Data Deleted (Insert Overwrite): All data in the table or partition is deleted before data import. This rule is equivalent to the INSERT OVERWRITE statement.
    • Write with Original Data Retained (Insert Into): No data is deleted before data import. New data is always appended with each run. This rule is equivalent to the INSERT INTO statement.
    Note
    • MaxCompute Reader reads data by using Tunnel. Sync nodes do not support data filtering. Instead, they must read all the data in a specific table or partition.
    • MaxCompute Writer writes data through Tunnel, instead of by using the INSERT INTO statement. You can view the complete data in the destination table only after a sync node is run. Pay attention to the node dependencies.
    Convert Empty Strings to Null Default value: Yes.
  2. Configure field mapping, that is, the column parameter in the preceding parameter description.
    Fields in the source table on the left have a one-to-one mapping with fields in the destination table on the right.Mappings
  3. Configure channel control policies.Channel

Configure MaxCompute Writer by using the code editor

In the following code, a node is configured to write data to a MaxCompute table. For more information about the parameters, see the preceding parameter description.

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"odps",// The writer type.
            "parameter":{
                "partition":"",// The partition information.
                "truncate":true,// The write rule.
                "compress":false,// Specifies whether to enable compression.
                "datasource":"odps_first",// The connection name.
            "column": [// The columns to which data is written.
                "id",
                "name",
                "age",
                "sex",
                "salary",
                "interest"
                ],
                "emptyAsNull":false,// Specifies whether to convert empty strings to null.
                "table":""// The name of the destination table.
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":false,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent":1,// The maximum number of concurrent threads.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

If you want to specify the Tunnel endpoint of MaxCompute, you can configure the connection in the code editor. To configure the connection, replace "datasource":"", in the preceding code with detailed parameters of the connection. Example:

"accessId":"***********",
 "accessKey":"*********",
 "endpoint":"http://service.eu-central-1.maxcompute.aliyun-inc.com/api",
 "odpsServer":"http://service.eu-central-1.maxcompute.aliyun-inc.com/api", 
"tunnelServer":"http://dt.eu-central-1.maxcompute.aliyun.com", 
"project":"**********", 

Additional instructions

  • Column filter

    By configuring MaxCompute Writer, you can perform operations that MaxCompute does not support, such as filtering columns, reordering columns, and setting empty fields to null. To write data to all the columns in the destination table, set the column parameter as follows: "column": ["*"].

    For example, a MaxCompute table has three columns: a, b, and c. If you want to write data only to column c and column b, you can set the column parameter as follows: "column": ["c","b"]. The first column and the second column of the source data are written to column c and column b in the MaxCompute table respectively. During data synchronization, column a is automatically set to null.

  • Column configuration error handling

    To avoid losing the data of redundant columns and guarantee high data reliability, MaxCompute Writer returns an error message if more columns are to be written than expected. For example, a MaxCompute table has three columns: a, b, and c. MaxCompute Writer returns an error message if it is configured to write data to more than three columns.

  • Partition configuration

    MaxCompute Writer can only write data to the last-level partition, and does not support writing data to the specified partition based on a field. To write data to a partitioned table, specify the last-level partition. For example, if you want to write data to a three-level partitioned table, set the partition parameter to a value that contains the last-level partition information, such as pt=20150101, type=1, biz=2. The data cannot be written if you set the partition parameter to pt=20150101, type=1 or pt=20150101.

  • Node rerunning

    To guarantee the idempotence of write operations, set the truncate parameter to true. When a failed sync node is rerun due to a write failure, MaxCompute Writer deletes the data that has been written before importing the source data again. This guarantees that the same data is written for each rerun. If a sync node is interrupted due to other exceptions, the data cannot be rolled back and the node cannot be rerun automatically. By setting the truncate parameter to true, you can guarantee the idempotence of write operations and the data integrity.

    Note If the truncate parameter is set to true, all data of the specified partition or table is deleted before a rerun. Exercise caution when you set this parameter to true.