How to perform batch writes to a Redis data source in DataWorks

DataWorks Data Integration allows you to use Redis Writer to write data to Redis. This topic describes how to perform batch writes to a Redis data source by using DataWorks.

Limitations

Data import tasks can be run on serverless resource groups (recommended) and exclusive resource groups for Data Integration.
When Redis Writer writes List values, rerunning a synchronization task is not idempotent, because the lpush and rpush modes append the same elements to the list again instead of overwriting it. Therefore, if the value type is List, you must manually clear the corresponding data from Redis before you rerun the task.
Important
Redis Writer does not currently support Bloom filter configuration. To handle duplicate data, you can use a workaround: add a node, such as a Shell, Python, or PyODPS node, before or after the synchronization node in your workflow to perform deduplication.
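The difference between overwriting and appending explains why only List writes need manual cleanup before a rerun. The following Python sketch illustrates this with a toy, in-memory stand-in for Redis. The `ToyRedis` class and the `run_task` function are hypothetical: they are not a Redis client and not how Redis Writer is implemented, and they only mirror the documented semantics of the Redis SET, RPUSH, and DEL commands.

```python
from collections import defaultdict


class ToyRedis:
    """Hypothetical in-memory stand-in for Redis: strings and lists only."""

    def __init__(self):
        self.strings = {}
        self.lists = defaultdict(list)

    def set(self, key, value):
        self.strings[key] = value        # like SET: replaces the old value

    def rpush(self, key, *values):
        self.lists[key].extend(values)   # like RPUSH: always appends

    def delete(self, key):
        self.strings.pop(key, None)      # like DEL: removes the key entirely
        self.lists.pop(key, None)


def run_task(db, rows):
    """One run of a hypothetical sync task that writes each (key, value) row."""
    for key, value in rows:
        db.set("str:" + key, value)
        db.rpush("list:" + key, value)


rows = [("user1", "a"), ("user1", "b")]
db = ToyRedis()
run_task(db, rows)
run_task(db, rows)                   # rerun without cleaning up first
print(db.strings["str:user1"])       # b  (same as after a single run)
print(db.lists["list:user1"])        # ['a', 'b', 'a', 'b']  (duplicated)

db.delete("list:user1")              # clear the List key, then rerun
run_task(db, rows)
print(db.lists["list:user1"])        # ['a', 'b']
```

On a real Redis instance, the equivalent cleanup is a DEL command on the affected keys before you rerun the task.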

Supported data types

Redis supports a rich set of value data types, including String, List, Set, ZSet (sorted set), and Hash. For more information about Redis, see redis.io.

Data synchronization task development

For the entry point and the procedure for configuring a synchronization task, see the following guides.

For the configuration procedure, see Configure a task in the codeless UI and Configure a task in the code editor.
For a complete list of parameters and code samples for the code editor, see Appendix: Code samples and parameters.

Appendix: Code samples and parameters

Configure a batch synchronization task by using the code editor

To configure a batch synchronization task in the code editor, you must configure the parameters in the script according to the unified script format. For more information, see Use the Code Editor. The following sections describe the data source parameters that you must configure for a batch synchronization task in the code editor.

Writer code sample

The following code is a sample data synchronization task that reads data from a MySQL database and writes it to Redis. It shows the code for both the MySQL Reader and the Redis Writer.

{
    "type":"job",
    "version":"2.0",  // The version number.
    "steps":[
        { // The following is a code sample for the Reader. For more information about Reader parameters, see the documentation for the corresponding Reader plugin.
            "stepType":"mysql",   
            "parameter": {
                "envType": 0,
                "datasource": "xc_mysql_demo2",
                "column": [
                    "id",
                    "name",
                    "age",
                    "sex"
                ],
                "connection": [
                    {
                        "datasource": "xc_mysql_demo2",
                        "table": []
                    }
                ],
                "where": "",
                "splitPk": "",
                "encoding": "UTF-8"
            },
            "name":"Reader",
            "category":"reader"
        },
        {// The following is a code sample for the Writer.
            "stepType":"redis",                    // The plugin name for Redis Writer. Set this parameter to redis.
            "parameter":{                          // The following section describes the main parameters of Redis Writer.
                "expireTime":{                     // The cache expiration time for Redis values. Specify either "seconds" or "unixtime".
                    "seconds":"1000"
                },
                "keyFieldDelimiter":"\u0001",      // The delimiter used to join multiple key columns.
                "dateFormat":"yyyy-MM-dd HH:mm:ss",// The date format used when data is written to Redis.
                "datasource":"xc_mysql_demo2",     // The data source name. This must be the same as the name of the data source you added.
                "envType": 0,                      // The environment type. Development environment: 1. Production environment: 0.
                "writeMode":{                      // The write mode.
                    "type":"string",               // The value type.
                    "mode":"set",                  // The write mode for the specified value type.
                    "valueFieldDelimiter":"\u0001" // The delimiter between values.
                },
                "keyIndexes":[0,1],                // The source columns used as the Redis key (indexes start from 0). [0,1] combines the first and second source columns into the key.
                "batchSize":"1000",                // The number of records in each batch.
                "column": [                        // Applies to type string with mode set. If column is omitted, the value is a delimiter-separated string of the non-key columns (with id=1, name="John", age=18, sex=male, the value is "18::male"). If column is configured as follows, the value is written in JSON format, for example {"id":1,"name":"John","age":18,"sex":"male"}.
                    {
                        "name": "id",
                        "index": "0"
                    },
                    {
                        "name": "name",
                        "index": "1"
                    },
                    {
                        "name": "age",
                        "index": "2"
                    },
                    {
                        "name": "sex",
                        "index": "3"
                    }
                ]
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"                           // The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":true,// Specifies whether throttling is enabled. If throttle is false, throttling is disabled and mbps does not take effect.
            "concurrent":1, // The concurrency of the job.
            "mbps":"12"// The throttling rate. 1 mbps = 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
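A hand-edited script of this size makes it easy to drop or double a comma. If you maintain many similar scripts, one option is to generate them from Python dictionaries and serialize them with `json.dumps`, which always emits syntactically valid JSON without comments. The following sketch is a hypothetical helper, not a DataWorks API: only the field names are taken from the sample script, and `my_mysql_source`, `my_redis_source`, and `my_table` are placeholder names.

```python
import json
from typing import Optional


def redis_writer_step(datasource: str, key_indexes: list,
                      expire_seconds: Optional[int] = None,
                      batch_size: int = 1000) -> dict:
    """Build a Redis Writer step that writes String values with SET.

    Hypothetical helper: the field names mirror the sample script.
    """
    parameter = {
        "datasource": datasource,
        "envType": 0,                          # 0: production, 1: development
        "keyIndexes": key_indexes,
        "keyFieldDelimiter": "\u0001",
        "writeMode": {"type": "string", "mode": "set",
                      "valueFieldDelimiter": "\u0001"},
        "batchSize": str(batch_size),          # the sample passes a string
    }
    if expire_seconds is not None:
        parameter["expireTime"] = {"seconds": str(expire_seconds)}
    return {"stepType": "redis", "parameter": parameter,
            "name": "Writer", "category": "writer"}


job = {
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "mysql",
            "parameter": {
                "envType": 0,
                "datasource": "my_mysql_source",           # placeholder name
                "column": ["id", "name", "age", "sex"],
                "connection": [{"datasource": "my_mysql_source",
                                "table": ["my_table"]}],   # placeholder name
            },
            "name": "Reader",
            "category": "reader",
        },
        redis_writer_step("my_redis_source", [0, 1], expire_seconds=1000),
    ],
    "setting": {
        "errorLimit": {"record": "0"},
        "speed": {"throttle": True, "concurrent": 1, "mbps": "12"},
    },
    "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
}

# json.dumps never emits comments or stray commas, and it escapes the
# control character U+0001 as \u0001 in the output.
script = json.dumps(job, indent=4)
print(script)
```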

Writer parameters

Parameter	Description	Required	Default value
expireTime	The cache expiration time for Redis values, in seconds. If this parameter is not specified, the default value `0` is used, which indicates that the data never expires. You can configure expireTime in one of the following two ways: seconds: a relative expiration time, expressed as the number of seconds from the current time until the data expires. unixtime: an absolute expiration time, expressed as the number of seconds from 00:00:00 UTC on January 1, 1970 until the data expires.	No	0 (the data never expires)
keyFieldDelimiter	The delimiter for Redis keys. For example, key=key1\u0001id. This parameter is required if multiple keys need to be concatenated. If only one key is used, you can skip this parameter.	No	\u0001
dateFormat	The date format used when date values are written to Redis, for example, yyyy-MM-dd HH:mm:ss.	No	N/A
datasource	The data source name. The value must be the same as the name of the data source you added.	Yes	N/A
selectDatabase	The Redis database to write to. Valid values: `0` to `N-1`, where N is the value of the `databases` setting in the Redis configuration. Database selection is not supported for Redis clusters.	No	0
writeMode	Redis Writer supports the following five value types for writing data to Redis: String, List, Set, ZSet (sorted set), and Hash. The writeMode configuration varies slightly depending on the value type. For more information, see writeMode parameter description below. Note When you configure Redis Writer, you must set writeMode to one of the five supported data types, and only one type can be specified. If you do not configure this parameter, writeMode uses the default value `string`.	No	string
keyIndexes	Specifies the source columns to be used as the Redis key. Column indexes start from 0 (the first column has an index of 0, the second column has an index of 1, and so on). If a single source column is used as the Redis key, set this parameter to the index of that column. For example, if the first column is used as the key, set this parameter to `0`. If multiple consecutive source columns are combined as the Redis key, set this parameter to a two-element array that contains the indexes of the first and the last of these columns. For example, if the second through fourth columns are combined as the key, set this parameter to `[1,3]`. Note After you configure keyIndexes, Redis Writer uses the remaining columns as the value. If you want to synchronize only specific columns as the key and specific columns as the value, you do not need to synchronize all columns: specify the column parameter in the Reader plugin to filter columns.	Yes	N/A
batchSize	The number of records in each batch. This parameter can significantly reduce the number of network interactions between the data synchronization system and Redis, and improve overall throughput. If this value is set too large, the data synchronization process may encounter an out-of-memory (OOM) error.	No	1,000
timeout	The timeout for writing data to Redis, in milliseconds.	No	30,000
redisMode	The running mode of Redis. Valid values: Cluster mode: set redisMode to `ClusterMode`. In cluster mode, the synchronization task communicates directly with the Redis cluster when it writes data. Self-managed Redis cluster addresses and Alibaba Cloud Redis direct connection addresses typically use this mode. Cluster mode supports batch writes. Non-cluster mode: leave redisMode empty (do not configure any value). Alibaba Cloud Redis cluster proxy addresses, read/write splitting addresses, and Standard Edition addresses typically use this mode. Non-cluster mode does not currently support batch writes. Note This parameter is supported on serverless resource groups (recommended) and exclusive resource groups for Data Integration.	No	N/A
column	The column configuration for writing data to Redis. This parameter applies when the writeMode type is string and the mode is set. If column is not configured, the value is a delimiter-separated string (CSV format). For example, if ID=1, name="John", age=18, and sex=male, the Redis value is "18::male". If column is configured, for example as `"column": [{"index":"0", "name":"id"}, {"index":"1", "name":"name"}]`, the value is stored in Redis in JSON format: `{"id":"value of the corresponding source column","name":"value of the corresponding source column"}`. For example, if ID=1 and name="John", the Redis value is `{"id":"1","name":"John"}`.	No	N/A
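To make the key and value mapping concrete, the following Python sketch models how one source row could be turned into a Redis key and String value under the rules in the table above, and how a unixtime value for expireTime can be computed. It is an illustration, not Redis Writer's implementation. Assumptions: keyIndexes is either a single index or a `[start, end]` range of consecutive columns, both delimiters use the default `\u0001`, values in the JSON output are rendered as strings (as in the column example in the table), and the unixtime form of expireTime is written as `{"unixtime": "<seconds>"}` by analogy with the `seconds` form in the sample script.

```python
import json
from datetime import datetime, timezone

KEY_DELIMITER = "\u0001"    # default keyFieldDelimiter
VALUE_DELIMITER = "\u0001"  # default valueFieldDelimiter


def key_positions(key_indexes):
    """Resolve keyIndexes to the 0-based column positions that form the key.

    Assumption: an int selects one column; [start, end] selects the
    consecutive columns start..end, inclusive (so [1, 3] -> [1, 2, 3]).
    """
    if isinstance(key_indexes, int):
        return [key_indexes]
    start, end = key_indexes
    return list(range(start, end + 1))


def to_string_pair(row, key_indexes, column=None):
    """Return the (key, value) pair for one row, String type, mode set."""
    positions = key_positions(key_indexes)
    key = KEY_DELIMITER.join(str(row[i]) for i in positions)
    if column is None:
        # No column config: join the remaining (non-key) columns.
        rest = [str(v) for i, v in enumerate(row) if i not in positions]
        return key, VALUE_DELIMITER.join(rest)
    # With column config: a JSON object keyed by each entry's "name".
    obj = {c["name"]: str(row[int(c["index"])]) for c in column}
    return key, json.dumps(obj, ensure_ascii=False)


def unixtime_expire_at(moment):
    """Seconds from 1970-01-01 00:00:00 UTC to a timezone-aware datetime."""
    return int(moment.timestamp())


row = [1, "John", 18, "male"]
print(to_string_pair(row, [0, 1]))
# ('1\x01John', '18\x01male')
print(to_string_pair(row, [0, 1], column=[{"index": "0", "name": "id"},
                                           {"index": "1", "name": "name"}]))
# ('1\x01John', '{"id": "1", "name": "John"}')
print({"unixtime": str(unixtime_expire_at(
    datetime(2030, 1, 1, tzinfo=timezone.utc)))})
# {'unixtime': '1893456000'}
```

The sketch always uses an aware UTC datetime for expireTime because `datetime.timestamp()` interprets naive datetimes as local time, which would shift the expiration by the local UTC offset.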

writeMode parameter description

When you configure Redis Writer, you must set writeMode to one of the five supported data types, and only one type can be specified. If you do not configure this parameter, writeMode uses the default value string.

Value type	type parameter (required)	mode parameter (required)	valueFieldDelimiter parameter (optional)	writeMode configuration example
String	Set type to `string`.	mode is the write mode parameter. When the value is a string, set mode to `set`. If the data to be stored already exists, the existing data is overwritten.	valueFieldDelimiter is the delimiter between values. The default value is `\u0001`. This parameter is mainly used when each row of source data has more than two columns. For example, if three columns are written as the value, they are joined as value1\u0001value2\u0001value3. If the source data has only two columns (key and value), you do not need to configure this parameter.	`"writeMode":{ "type": "string", "mode": "set", "valueFieldDelimiter": "\u0001" }`
List	Set type to `list`.	mode is the write mode parameter. When the value is a list, the following options are available: `lpush`: stores data at the leftmost position of the list. `rpush`: stores data at the rightmost position of the list.	Same as for String.	`"writeMode":{ "type": "list", "mode": "lpush", "valueFieldDelimiter": "\u0001" }` (set mode to `rpush` to store data at the rightmost position instead)
Set	Set type to `set`.	mode is the write mode parameter. When the value is a set, set mode to `sadd` to store data in the set. If a member already exists in the set, it is not added again.	Same as for String.	`"writeMode":{ "type": "set", "mode": "sadd", "valueFieldDelimiter": "\u0001" }`
ZSet (sorted set)	Set type to `zset`.	mode is the write mode parameter. When the value is a ZSet (sorted set), set mode to `zadd` to store data in the sorted set. If a member already exists, its score is updated.	This parameter does not need to be configured.	`"writeMode":{ "type": "zset", "mode": "zadd" }` Note When the value type is zset, each row of source data must follow the corresponding format. Each row can contain only one score-value pair in addition to the key, and the score must precede the value so that Redis Writer can correctly identify which column corresponds to the score and which to the value.
Hash	Set type to `hash`.	mode is the write mode parameter. When the value is a hash: Set mode to `hset` to store data in the hash. If the data to be stored already exists, the existing data is overwritten.	This parameter does not need to be configured.	`"writeMode":{ "type": "hash", "mode": "hset" }` Note When the value type is hash, each row of source data must follow the corresponding format. Each row can contain only one attribute-value pair in addition to the key, and the attribute must precede the value so that Redis Writer can correctly identify which column corresponds to the attribute and which to the value.
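The type and mode combinations in the table above, and the row layout that ZSet and Hash require, can be captured in a small validator. The following Python sketch is hypothetical and is not part of DataWorks. It assumes that the key is the first source column (keyIndexes set to `0`), so that a ZSet or Hash row has exactly three columns: the key, then the score or field, then the member or value. The returned tuples follow the argument order of the Redis ZADD and HSET commands.

```python
# type -> modes allowed by the writeMode table
ALLOWED_MODES = {
    "string": {"set"},
    "list": {"lpush", "rpush"},
    "set": {"sadd"},
    "zset": {"zadd"},
    "hash": {"hset"},
}


def check_write_mode(write_mode=None):
    """Validate a writeMode dict and return a copy of it.

    If writeMode is omitted, fall back to type string; because set is the
    only String mode in the table, mode set is assumed as well.
    """
    if write_mode is None:
        return {"type": "string", "mode": "set"}
    wtype, mode = write_mode.get("type"), write_mode.get("mode")
    if wtype not in ALLOWED_MODES:
        raise ValueError(f"unsupported value type: {wtype!r}")
    if mode not in ALLOWED_MODES[wtype]:
        raise ValueError(f"mode {mode!r} is not valid for type {wtype!r}")
    return dict(write_mode)


def zset_or_hash_command(write_mode, row):
    """Map a 3-column row (key, score|field, member|value) to a command tuple.

    Assumes the key is the first column (keyIndexes = 0).
    """
    wtype = check_write_mode(write_mode)["type"]
    if wtype not in ("zset", "hash"):
        raise ValueError("only zset and hash rows are handled here")
    if len(row) != 3:
        raise ValueError("a zset/hash row must hold the key and one pair")
    key, first, second = row
    if wtype == "zset":
        return ("ZADD", key, float(first), second)  # score comes before member
    return ("HSET", key, first, second)             # field comes before value


print(zset_or_hash_command({"type": "zset", "mode": "zadd"},
                           ["leaderboard", "95", "alice"]))
# ('ZADD', 'leaderboard', 95.0, 'alice')
print(zset_or_hash_command({"type": "hash", "mode": "hset"},
                           ["user:1", "name", "John"]))
# ('HSET', 'user:1', 'name', 'John')
```

Rejecting rows with extra columns mirrors the requirement that each ZSet or Hash row carries exactly one pair in addition to the key; a row in the wrong order (value before score) would instead fail the `float()` conversion or silently swap field and value, which is why the table stresses the column order.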