This topic describes the data types and parameters that are supported by MaxCompute Writer and how to configure MaxCompute Writer by using the codeless user interface (UI) and the code editor.
Prerequisites
Before you configure MaxCompute Writer, you must configure a MaxCompute data source. For more information, see Add a MaxCompute data source.
Background information
MaxCompute Writer is designed for developers to insert data into or update data in MaxCompute. MaxCompute Writer can write gigabytes or terabytes of data to MaxCompute. For more information about MaxCompute, see What is MaxCompute?.
MaxCompute Writer writes data to MaxCompute by using Tunnel based on the information that you specify, such as the destination project, table, partition, and fields. For more information about common Tunnel commands, see Tunnel commands.
For a table with a strict schema, such as a table in a MySQL database or a MaxCompute project, Data Integration reads data from the table and stores the data in memory. Then, Data Integration converts the data to the format that is supported by the destination and writes the data to the destination.
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
datasource | The name of the data source. It must be the same as the name of the added data source. You can add data sources by using the code editor. | Yes | No default value |
table | The name of the table to which you want to write data. The name is not case-sensitive. You can specify only one table. | Yes | No default value |
partition | The partition to which you want to write data. You must specify the lowest-level partition. For example, if you want to write data to a table with three-level partitions, set the partition parameter to a value that contains the third-level partition information, such as `pt=20150101, type=1, biz=2`. | Required only for partitioned tables | No default value |
column | The names of the columns to which you want to write data. To write data to all the columns in the destination table, set this parameter to an asterisk (*), such as `"column": ["*"]`. To write data to some columns in the destination table, set this parameter to the names of these columns and separate the names with commas (,), such as `"column": ["id","name"]`. | Yes | No default value |
truncate | Specifies whether to delete existing data in the destination table or partition before data is written. To ensure the idempotence of write operations, set this parameter to true. If a failed synchronization node is rerun due to a write failure, MaxCompute Writer deletes the data that has been written and writes the source data again, so the same data is written for each rerun. MaxCompute Writer uses MaxCompute SQL to delete data, and MaxCompute SQL cannot ensure atomicity. Therefore, the TRUNCATE operation is not atomic, and conflicts may occur when multiple nodes delete data from the same table or partition in parallel. To prevent this issue, do not run multiple DDL operations on the same partition at the same time. Instead, create different partitions for nodes that need to run in parallel. | Yes | No default value |
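The following writer step is a minimal sketch that combines the parameters in the preceding table for a table with three-level partitions. The table name my_table is a hypothetical placeholder. odps_first is the data source name used in the sample configuration later in this topic. The partition value follows the format shown in the table.

```json
{
    "stepType":"odps",// The writer type.
    "parameter":{
        "datasource":"odps_first",// Must match the name of an added data source.
        "table":"my_table",// Hypothetical table name. Only one table can be specified.
        "partition":"pt=20150101, type=1, biz=2",// Specified down to the third-level (lowest-level) partition.
        "column":["*"],// Write data to all the columns in the destination table.
        "truncate":true// Delete existing data before writing so that each rerun writes the same data.
    },
    "name":"Writer",
    "category":"writer"
}
```

Because partition is required only for partitioned tables, a step that writes to a non-partitioned table would presumably omit it.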
Configure MaxCompute Writer by using the codeless UI
- Configure data sources.
Configure Source and Target for the synchronization node.
Parameter | Description |
---|---|
Connection | The name of the data source to which you want to write data. This parameter is equivalent to the datasource parameter that is described in the preceding section. |
Table | The name of the table to which you want to write data. This parameter is equivalent to the table parameter that is described in the preceding section. |
Writing Rule | The write rule. Valid values: Write with Original Data Deleted (Insert Overwrite): All data in the table or partition is deleted before MaxCompute Writer writes data. This rule is equivalent to the INSERT OVERWRITE statement. Write with Original Data Retained (Insert Into): No data is deleted before MaxCompute Writer writes data, and new data is appended upon each run. This rule is equivalent to the INSERT INTO statement. |
Convert Empty Strings to Null | Specifies whether to convert empty strings to null. |
Note:
- MaxCompute Reader reads data by using Tunnel. Synchronization nodes cannot filter data, and each synchronization node reads all the data from a table or partition.
- MaxCompute Writer writes data by using Tunnel instead of the INSERT INTO statement. The complete data is visible in the destination table only after the synchronization node finishes successfully. Configure node dependencies accordingly.
- Configure field mappings. This operation is equivalent to setting the column parameter that is described in the preceding section. Fields in the source on the left have a one-to-one mapping with fields in the destination on the right.
Operation | Description |
---|---|
Map Fields with the Same Name | Click Map Fields with the Same Name to establish mappings between fields with the same name. The data types of the fields must match. |
Map Fields in the Same Line | Click Map Fields in the Same Line to establish mappings between fields in the same row. The data types of the fields must match. |
Delete All Mappings | Click Delete All Mappings to remove the mappings that are established. |
Auto Layout | Click Auto Layout to have the system automatically sort the fields based on specific rules. |
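For reference, the codeless UI settings appear to correspond to writer parameters in the code editor. Connection corresponds to datasource, Table to table, Convert Empty Strings to Null to emptyAsNull, and the field mappings to column. The mapping of Writing Rule to truncate is an assumption based on the comments in the sample configuration: true for Insert Overwrite and false for Insert Into. The table and column names below are hypothetical.

```json
{
    "stepType":"odps",
    "parameter":{
        "datasource":"odps_first",// Connection
        "table":"my_table",// Table (hypothetical name)
        "truncate":false,// Writing Rule: assumed to correspond to Write with Original Data Retained (Insert Into)
        "emptyAsNull":true,// Convert Empty Strings to Null
        "column":["id","name","age"]// Destination fields in the field mappings (hypothetical)
    },
    "name":"Writer",
    "category":"writer"
}
```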
Configure MaxCompute Writer by using the code editor
You can configure MaxCompute Writer by using the code editor. For more information, see Create a sync node by using the code editor. The following code provides a sample configuration.
```json
{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"odps",// The writer type.
            "parameter":{
                "partition":"",// The partition to which you want to write data.
                "truncate":true,// The write rule. The value true indicates that existing data is deleted before data is written.
                "compress":false,// Specifies whether to enable compression.
                "datasource":"odps_first",// The name of the data source.
                "column": [// The names of the columns to which you want to write data.
                    "id",
                    "name",
                    "age",
                    "sex",
                    "salary",
                    "interest"
                ],
                "emptyAsNull":false,// Specifies whether to convert empty strings to null.
                "table":""// The name of the table to which you want to write data.
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value true enables throttling, and the value false disables it. The mbps parameter takes effect only when throttle is set to true.
            "concurrent":1, // The maximum number of parallel threads.
            "mbps":"12"// The maximum transmission rate.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
```
"datasource":"",
in the preceding code with detailed parameters of the data source. Example: "accessId":"<yourAccessKeyId>",
"accessKey":"<yourAccessKeySecret>",
"endpoint":"http://service.eu-central-1.maxcompute.aliyun-inc.com/api",
"odpsServer":"http://service.eu-central-1.maxcompute.aliyun-inc.com/api",
"tunnelServer":"http://dt.eu-central-1.maxcompute.aliyun.com",
"project":"**********",
Additional instructions
- Column filter
MaxCompute Writer allows you to perform operations that MaxCompute does not support, such as filtering columns, reordering columns, and setting empty fields to null. To write data to all the columns in the destination table, set the column parameter to an asterisk (*), such as `"column": ["*"]`. For example, a MaxCompute table has three columns: a, b, and c. To write data only to columns c and b, set the column parameter to `"column": ["c","b"]`. The first and second columns of the source are then written to columns c and b of the MaxCompute table, respectively, and column a is automatically set to null.
- Handling column configuration errors
To prevent data loss caused by redundant columns and ensure high data reliability, MaxCompute Writer returns an error if the number of columns to be written exceeds the number of columns in the destination table. For example, if a MaxCompute table contains columns a, b, and c, MaxCompute Writer returns an error if more than three columns are to be written to the table.
- Partition configuration
MaxCompute Writer can write data only to the lowest-level partition. It cannot write data to partitions that are determined by the value of a field. To write data to a partitioned table, specify the lowest-level partition. For example, to write data to a table with three-level partitions, set the partition parameter to a value that contains the third-level partition information, such as `pt=20150101, type=1, biz=2`. The data cannot be written if you set the partition parameter to `pt=20150101, type=1` or `pt=20150101`.
- Node rerunning
To ensure the idempotence of write operations, set the truncate parameter to true. If a failed synchronization node is rerun due to a write failure, MaxCompute Writer deletes the data that has been written and writes the source data again, so the same data is written for each rerun. If a synchronization node is interrupted due to other exceptions, the data cannot be rolled back and the node cannot be automatically rerun. In this case, setting the truncate parameter to true still ensures idempotent writes and data integrity when the node is rerun.
Note: If the truncate parameter is set to true, all data in the specified partition or table is deleted before MaxCompute Writer writes data. Exercise caution when you set this parameter to true.
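The column filter and partition rules can be illustrated with a writer step for the example table with columns a, b, and c. This is a sketch: the table name my_table is hypothetical, and the table is assumed to have three-level partitions. The commented-out lines show partition values that cannot be written.

```json
{
    "stepType":"odps",
    "parameter":{
        "datasource":"odps_first",
        "table":"my_table",// Hypothetical table with columns a, b, and c.
        "column":["c","b"],// Source column 1 is written to c and source column 2 to b. Column a is set to null.
        "truncate":true,// Delete existing data in the partition before writing.
        "partition":"pt=20150101, type=1, biz=2"// Valid: specifies the lowest-level partition.
        // "partition":"pt=20150101, type=1"  Invalid: the third-level partition is missing.
        // "partition":"pt=20150101"  Invalid: the second-level and third-level partitions are missing.
    },
    "name":"Writer",
    "category":"writer"
}
```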