DataWorks Data Integration uses the Lindorm Reader and Lindorm Writer plug-ins to read data from and write data to Lindorm. This topic describes the data read and write capabilities that DataWorks provides for Lindorm.
Applicability
LindormTable supports serverless resource groups (recommended) and exclusive resource groups for Data Integration. The Lindorm compute engine supports only serverless resource groups.
Lindorm is a multi-model database. DataWorks currently supports only LindormTable and the Lindorm compute engine. For more information, see the Lindorm documentation.
Supported field types
Lindorm Reader and Lindorm Writer support most, but not all, Lindorm data types. Verify that the data types you use are supported.
The following table lists the data type conversions for Lindorm Reader and Lindorm Writer.
Type categorization | Data type |
Integer | INT, LONG, SHORT |
Floating-point | DOUBLE, FLOAT |
String | STRING |
Date and time | DATE |
Boolean | BOOLEAN |
Binary | BINARYSTRING |
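The conversions above can be pictured as a simple lookup from a Lindorm column type to its conversion category. The following sketch is illustrative only; the dictionary and function names are assumptions for demonstration, not DataWorks internals.

```python
# Illustrative only: maps the Lindorm data types listed in the table
# above to their conversion categories. The category names are
# assumptions for demonstration, not part of any DataWorks API.
LINDORM_TYPE_CATEGORIES = {
    "INT": "integer", "LONG": "integer", "SHORT": "integer",
    "DOUBLE": "floating-point", "FLOAT": "floating-point",
    "STRING": "string",
    "DATE": "date-and-time",
    "BOOLEAN": "boolean",
    "BINARYSTRING": "binary",
}

def category_of(lindorm_type: str) -> str:
    """Return the conversion category for a Lindorm column type."""
    try:
        return LINDORM_TYPE_CATEGORIES[lindorm_type.upper()]
    except KeyError:
        raise ValueError(f"Unsupported Lindorm type: {lindorm_type}")
```

A type outside this table (for example, a hypothetical DECIMAL column) would need to be cast to a supported type before synchronization.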
Develop a data synchronization task
For the entry point and the procedure for configuring a synchronization task, see the following configuration guides.
Offline single-table sync
Supported data sources: All data source types that Data Integration supports.
Configuration guide: Offline single-table sync task
For a list of all parameters and a script demo for the code editor, see Appendix: Script demo and parameters.
Real-time single-table sync
Supported data sources: Kafka, LogHub, and Hologres
Configuration guide: Real-time single-table sync task
Real-time full-database sync
Supported data source: PostgreSQL
Configuration guide: Configure a real-time full-database sync task
Appendix: Script demo and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
Reader script demo
Configure a job to extract data from a LindormTable Lindorm SQL table to a local machine.
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "lindorm",
            "parameter": {
                "mode": "FixedColumn",
                "caching": 128,
                "column": [ "id", "value" ],
                "envType": 1,
                "datasource": "lindorm",
                "tableMode": "tableService",
                "table": "lindorm_table"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "mysql",
            "parameter": {
                "postSql": [],
                "datasource": "lindorm",
                "session": [],
                "envType": 1,
                "column": [ "id", "value" ],
                "socketTimeout": 3600000,
                "writeMode": "insert",
                "batchSize": 1024,
                "encoding": "UTF-8",
                "table": "",
                "preSql": []
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "jvmOption": "",
        "executeMode": null,
        "speed": {
            // Sets the transmission speed in byte/s. DataX tries to reach but not exceed this speed.
            "byte": 1048576
        },
        // Error limit
        "errorLimit": {
            // The maximum number of error records. If the number of error records exceeds this value, the job fails.
            "record": 0,
            // The maximum percentage of error records. For example, 1.0 means 100%, and 0.02 means 2%.
            "percentage": 0.02
        }
    },
    "order": {
        "hops": [
            { "from": "Reader", "to": "Writer" }
        ]
    }
}
Configure a job to extract data from a LindormTable Lindorm HBaseLike (WideColumn) table to a local machine.
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "lindorm",
            "parameter": {
                "mode": "FixedColumn",
                "column": [ "STRING|rowkey", "INT|f:a" ],
                "envType": 1,
                "datasource": "lindorm",
                "tableMode": "wideColumn",
                "table": "lindorm_table"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "mysql",
            "parameter": {
                "postSql": [],
                "datasource": "_IDB.TAOBAO",
                "session": [],
                "envType": 1,
                "column": [ "id", "value" ],
                "socketTimeout": 3600000,
                "guid": "",
                "writeMode": "insert",
                "batchSize": 1024,
                "encoding": "UTF-8",
                "table": "",
                "preSql": []
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "jvmOption": "",
        "executeMode": null,
        "speed": {
            // Sets the transmission speed in byte/s. DataX tries to reach but not exceed this speed.
            "byte": 1048576
        },
        // Error limit
        "errorLimit": {
            // The maximum number of error records. If the number of error records exceeds this value, the job fails.
            "record": 0,
            // The maximum percentage of error records. For example, 1.0 means 100%, and 0.02 means 2%.
            "percentage": 0.02
        }
    },
    "order": {
        "hops": [
            { "from": "Reader", "to": "Writer" }
        ]
    }
}
Configure a job to extract data from a compute engine table to a local machine.
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "lindorm",
            "parameter": {
                "datasource": "lindorm_datasource",
                "column": [ "id", "value" ],
                "tableComment": "",
                "where": "",
                "session": [],
                "splitPk": "id",
                "table": "auto_ob_149912212480"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "mysql",
            "parameter": {
                "postSql": [],
                "datasource": "_IDB.TAOBAO",
                "session": [],
                "envType": 1,
                "column": [ "id", "value" ],
                "socketTimeout": 3600000,
                "guid": "",
                "writeMode": "insert",
                "batchSize": 1024,
                "encoding": "UTF-8",
                "table": "",
                "preSql": []
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "jvmOption": "",
        "executeMode": null,
        "speed": {
            // Sets the transmission speed in byte/s. DataX tries to reach but not exceed this speed.
            "byte": 1048576
        },
        // Error limit
        "errorLimit": {
            // The maximum number of error records. If the number of error records exceeds this value, the job fails.
            "record": 0,
            // The maximum percentage of error records. For example, 1.0 means 100%, and 0.02 means 2%.
            "percentage": 0.02
        }
    },
    "order": {
        "hops": [
            { "from": "Reader", "to": "Writer" }
        ]
    }
}
Reader script parameters
Parameter | Description | Required | Default value |
mode | Specific to LindormTable. Specifies the data read mode. Valid values: FixedColumn and DynamicColumn. | Yes | FixedColumn |
tableMode | Specific to LindormTable. Valid values: table (standard SQL table mode) and wideColumn (wide table mode). The default value is table. If you use table mode, you do not need to specify this parameter. | No | Not specified by default |
table | The name of the Lindorm table from which to read data. The table name is case-sensitive. | Yes | None |
encoding | Specific to LindormTable. The codec. Valid values: UTF-8 and GBK. This parameter is typically used to convert a Lindorm byte[] value stored in binary to a String. | No | UTF-8 |
caching | Specific to LindormTable. The number of records to retrieve in a single batch. A larger value reduces network interactions between the data synchronization system and Lindorm and improves overall throughput. An excessively large value may overload the Lindorm server or cause an out-of-memory (OOM) error in the synchronization process. | No | 100 |
selects | Specific to LindormTable. The system does not automatically shard data for this table type, so the job runs with a single concurrent process by default. Manually configure the selects parameter to shard the data. | No | None |
session | Specific to the compute engine. Session-level job parameters. | No | None |
splitPk | Specific to the compute engine. The shard key for reading data from compute engine tables. If you specify splitPk, data is sharded on the specified field and synchronized by concurrent tasks, which improves efficiency. | No | None |
columns | The list of fields to read. You can crop columns (select a subset of columns to export) and reorder them (export columns in an order different from the table schema). | Yes | None |
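The effect of splitPk can be pictured as follows: the value range of the shard key is cut into contiguous intervals, and each interval is read by one concurrent task. This is a hypothetical sketch of the idea; the function names, the SQL shape, and the range-splitting strategy are assumptions for illustration, not the actual DataWorks implementation.

```python
# Hypothetical sketch of shard-key (splitPk) based concurrency:
# the [min, max] range of the key column is split into roughly equal
# inclusive ranges, and each range becomes a WHERE clause for one task.
def split_ranges(min_pk: int, max_pk: int, concurrency: int) -> list[tuple[int, int]]:
    """Split [min_pk, max_pk] into roughly equal inclusive ranges."""
    span = max_pk - min_pk + 1
    step = max(1, -(-span // concurrency))  # ceiling division
    ranges = []
    lo = min_pk
    while lo <= max_pk:
        hi = min(lo + step - 1, max_pk)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

def shard_queries(table: str, pk: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Build one read query per shard (placeholder table/column names)."""
    return [f"SELECT * FROM {table} WHERE {pk} BETWEEN {lo} AND {hi}"
            for lo, hi in ranges]
```

For example, splitting key values 1 through 100 across 4 concurrent tasks yields the ranges (1, 25), (26, 50), (51, 75), and (76, 100), each read independently.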
Writer script demo
Configure a job to write data from a MySQL data source to a LindormTable Lindorm SQL table.
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "mysql",
            "parameter": {
                "checkSlave": true,
                "datasource": " ",
                "envType": 1,
                "column": [ "id", "value" ],
                "socketTimeout": 3600000,
                "masterSlave": "slave",
                "connection": [
                    { "datasource": " ", "table": [] }
                ],
                "where": "",
                "splitPk": "",
                "encoding": "UTF-8",
                "print": true
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "lindorm",
            "parameter": {
                "nullMode": "skip",
                "datasource": "lindorm_datasource",
                "envType": 1,
                "column": [ "id", "value" ],
                "dynamicColumn": "false",
                "table": "lindorm_table",
                "encoding": "utf8"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "jvmOption": "",
        "executeMode": null,
        "speed": {
            // Sets the transmission speed in byte/s. DataX tries to reach but not exceed this speed.
            "byte": 1048576
        },
        // Error limit
        "errorLimit": {
            // The maximum number of error records. If the number of error records exceeds this value, the job fails.
            "record": 0,
            // The maximum percentage of error records. For example, 1.0 means 100%, and 0.02 means 2%.
            "percentage": 0.02
        }
    },
    "order": {
        "hops": [
            { "from": "Reader", "to": "Writer" }
        ]
    }
}
Configure a job to write data from a MySQL data source to a LindormTable Lindorm HBaseLike (WideColumn) table.
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "mysql",
            "parameter": {
                "envType": 0,
                "datasource": " ",
                "column": [ "id", "value" ],
                "connection": [
                    { "datasource": " ", "table": [] }
                ],
                "where": "",
                "splitPk": "",
                "encoding": "UTF-8"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "lindorm",
            "parameter": {
                "datasource": "lindorm_datasource",
                "table": "xxxxxx",
                "encoding": "utf8",
                "nullMode": "skip",
                "dynamicColumn": "false",
                "caching": 128,
                "column": [
                    // Maps fields from the source in order.
                    "ROW|STRING",    // The rowkey. This is a fixed configuration. The first field from the source is mapped to the rowkey. In this example, the id field is mapped to the rowkey.
                    "cf:name|STRING" // cf specifies the column family name, which you can change. name specifies the column name in the destination, which you can change.
                ]
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "jvmOption": "",
        "errorLimit": { "record": "0" },
        "speed": {
            "concurrent": 3,
            "throttle": false
        }
    },
    "order": {
        "hops": [
            { "from": "Reader", "to": "Writer" }
        ]
    }
}
Configure a job to write data from a MySQL data source to a compute engine table.
{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "mysql",
            "parameter": {
                "envType": 0,
                "datasource": " ",
                "column": [ "id", "value" ],
                "connection": [
                    { "datasource": " ", "table": [] }
                ],
                "where": "",
                "splitPk": "",
                "encoding": "UTF-8"
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "lindorm",
            "parameter": {
                "datasource": "lindorm_datasource",
                "table": "xxxxxx",
                "column": [ "id", "value" ],
                "formatType": "ICEBERG"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "jvmOption": "",
        "errorLimit": { "record": "0" },
        "speed": {
            "concurrent": 3,
            "throttle": false
        }
    },
    "order": {
        "hops": [
            { "from": "Reader", "to": "Writer" }
        ]
    }
}
Writer script parameters
Parameter | Description | Required | Default value |
table | The name of the Lindorm table to which to write data. The table name is case-sensitive. | Yes | None |
encoding | Specific to LindormTable. The codec. Valid values: UTF-8 and GBK. This parameter is typically used to convert a Lindorm byte[] value stored in binary to a String. | No | UTF-8 |
columns | The list of fields to write. You can crop columns (select a subset of columns to write) and reorder them (write columns in an order different from the table schema). | Yes | None |
nullMode | Specific to LindormTable. Specifies how to handle null values from the source data. | No | EMPTY_BYTES |
formatType | Specific to the compute engine. Specifies the format of the destination table for the sync task. | No | None |
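As a mental model for nullMode, the writer must decide what to do when a source field is null before it writes the row. The sketch below assumes two behaviors consistent with the names used in this topic: "skip" leaves the column unwritten, and "EMPTY_BYTES" (the default) writes an empty byte string. These semantics and the function itself are illustrative assumptions, not DataWorks source code.

```python
# Illustrative sketch (assumed semantics, not DataWorks internals):
# "skip"        -> a null source value causes the column to be omitted.
# "EMPTY_BYTES" -> a null source value is written as an empty byte string.
def apply_null_mode(row: dict, null_mode: str = "EMPTY_BYTES") -> dict:
    out = {}
    for col, value in row.items():
        if value is None:
            if null_mode == "skip":
                continue           # do not write the column at all
            elif null_mode == "EMPTY_BYTES":
                out[col] = b""     # write an empty byte string
            else:
                raise ValueError(f"Unknown nullMode: {null_mode}")
        else:
            out[col] = value
    return out
```

The WideColumn writer demo above sets `"nullMode": "skip"`, so null source fields would simply not be written to the destination table under this model.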