
DataWorks: Tablestore

Last Updated: Mar 26, 2026

The Tablestore data source gives DataWorks a bidirectional channel to read data from and write data to Tablestore, a NoSQL storage service built on the Alibaba Cloud Apsara Distributed File System.

Supported capabilities

The Tablestore Reader and Writer plugins support two table types and two read/write modes. Use the following table to confirm whether your scenario is supported before configuring a task.

Table type          Row mode       Column mode
Wide table          Read / Write   Read / Write
Time series table   Read / Write   Not supported
Set newVersion to true to enable all modes above. The legacy version (newVersion: false) supports only row mode for wide tables.

Key concepts

Row mode and column mode

Row mode maps each record to one row in the Tablestore table. The data format is {primary key values, attribute column values}.

Column mode uses Tablestore's three-level multi-version model: Row > Column > Version. Each column can have multiple versions identified by a timestamp. In column mode, each record is a four-tuple: {primary key value, column name, timestamp, column value}.

The following example shows how column mode transforms data. A table with two primary keys (gid, uid) produces nine input records in four-tuple format:

// Input: 9 records in four-tuple format
1, pk1, row1, 1677699863871, value_0_0
1, pk1, row2, 1677699863871, value_0_1
1, pk1, row3, 1677699863871, value_0_2
2, pk2, row1, 1677699863871, value_1_0
2, pk2, row2, 1677699863871, value_1_1
2, pk2, row3, 1677699863871, value_1_2
3, pk3, row1, 1677699863871, value_2_0
3, pk3, row2, 1677699863871, value_2_1
3, pk3, row3, 1677699863871, value_2_2

After writing to a wide table in column mode, the result is:

// Output: 3 rows in the wide table
gid  uid  row1       row2       row3
1    pk1  value_0_0  value_0_1  value_0_2
2    pk2  value_1_0  value_1_1  value_1_2
3    pk3  value_2_0  value_2_1  value_2_2
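The pivot from four-tuples to wide rows can be sketched in Python. This is an illustrative helper, not DataWorks code; the record values are taken from the example above:

```python
from collections import defaultdict

# Each input record: (gid, uid, column_name, timestamp, value)
records = [
    (1, "pk1", "row1", 1677699863871, "value_0_0"),
    (1, "pk1", "row2", 1677699863871, "value_0_1"),
    (2, "pk2", "row1", 1677699863871, "value_1_0"),
]

# Group records by primary key; for each column, the latest timestamp wins.
rows = defaultdict(dict)
for gid, uid, col, ts, value in records:
    current = rows[(gid, uid)].get(col)
    if current is None or ts >= current[0]:
        rows[(gid, uid)][col] = (ts, value)

# Each entry of `rows` is one wide-table row keyed by (gid, uid).
for pk, cols in sorted(rows.items()):
    print(pk, {name: v for name, (ts, v) in cols.items()})
```

The same grouping happens inside the plugin when column-mode data is materialized as wide-table rows.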

Wide tables and time series tables

Wide tables are standard Tablestore tables with primary key columns and attribute columns. Both row mode and column mode are supported.

Time series tables store timestamped data organized by measurement name, data source, and tags. They support only row mode; column mode is not compatible with time series tables.

Plugin versions

The newVersion parameter controls which version of the Reader and Writer plugins DataWorks uses.

newVersion value Supported modes Notes
false (default) Row mode, wide tables only Legacy version
true (recommended) Row mode and column mode for wide tables; row mode for time series tables; Writer also supports auto-increment primary key columns Uses fewer system resources; compatible with existing configurations

Limits

  • Only one table per synchronization task is supported. To synchronize multiple tables, configure a separate task for each table.

  • Source column order must match the order of primary key columns followed by attribute columns in the destination Tablestore table. A mismatch causes a column mapping error.

  • Tablestore Reader splits the data range into N concurrent tasks, where N is the concurrency level configured for the synchronization job.

  • Tablestore does not support date types. Use the LONG type to store UNIX timestamps at the application layer.

  • Configure INTEGER columns as INT in the code editor. DataWorks converts INT to INTEGER internally. Configuring the type directly as INTEGER causes a task failure.
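Since Tablestore has no date type, the timestamp limit above pushes the conversion to the application layer. A minimal sketch using only the Python standard library (function names are illustrative):

```python
from datetime import datetime, timezone

def to_epoch_millis(dt: datetime) -> int:
    """Convert a datetime to a UNIX timestamp in milliseconds for a LONG column."""
    return int(dt.timestamp() * 1000)

def from_epoch_millis(ms: int) -> datetime:
    """Recover a UTC datetime from a LONG column value."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

dt = datetime(2023, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
ms = to_epoch_millis(dt)  # store this LONG value in Tablestore
```

Reading the value back is the inverse call, so the round trip is lossless at millisecond precision.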

Supported field types

Tablestore Reader and Writer support all Tablestore data types.

Type category    Tablestore data type    Code editor value
Integer          INTEGER                 INT
Floating-point   DOUBLE                  DOUBLE
String           STRING                  STRING
Boolean          BOOLEAN                 BOOLEAN
Binary           BINARY                  BINARY

Add a Tablestore data source

Before configuring a synchronization task, add Tablestore as a data source in DataWorks. See Data source management for instructions. Parameter descriptions are available in the DataWorks console when you add the data source.

Configure a synchronization task

For the entry point and general procedure for configuring a synchronization task, see Configure a task in the code editor.

The following sections provide script examples and parameter references for the Tablestore Reader and Writer.

Appendix I: Reader script demo and parameter description

All Tablestore Reader scripts use the stepType: ots plugin. Set newVersion to true and configure mode and isTimeseriesTable to select the read mode.

Reader script examples

Row mode — wide table

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "ots",
            "parameter": {
                "datasource": "<data-source-name>",
                "newVersion": "true",
                "mode": "normal",
                "isTimeseriesTable": "false",
                "table": "<table-name>",
                "column": [
                    {"name": "column1"},
                    {"name": "column2"},
                    {"name": "column3"},
                    {"name": "column4"},
                    {"name": "column5"}
                ],
                "range": {
                    "begin": [
                        {"type": "STRING", "value": "beginValue"},
                        {"type": "INT", "value": "0"},
                        {"type": "INF_MIN"},
                        {"type": "INF_MIN"}
                    ],
                    "end": [
                        {"type": "STRING", "value": "endValue"},
                        {"type": "INT", "value": "100"},
                        {"type": "INF_MAX"},
                        {"type": "INF_MAX"}
                    ],
                    "split": [
                        {"type": "STRING", "value": "splitPoint1"},
                        {"type": "STRING", "value": "splitPoint2"},
                        {"type": "STRING", "value": "splitPoint3"}
                    ]
                }
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {"record": "0"},
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [{"from": "Reader", "to": "Writer"}]
    }
}

Row mode — time series table

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "ots",
            "parameter": {
                "datasource": "<data-source-name>",
                "table": "<table-name>",
                "mode": "normal",
                "newVersion": "true",
                "isTimeseriesTable": "true",
                "measurementName": "measurement_1",
                "column": [
                    {"name": "_m_name"},
                    {"name": "tagA", "is_timeseries_tag": "true"},
                    {"name": "double_0", "type": "DOUBLE"},
                    {"name": "string_0", "type": "STRING"},
                    {"name": "long_0", "type": "INT"},
                    {"name": "binary_0", "type": "BINARY"},
                    {"name": "bool_0", "type": "BOOL"},
                    {"type": "STRING", "value": "testString"}
                ]
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {"record": "0"},
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [{"from": "Reader", "to": "Writer"}]
    }
}

Column mode — wide table

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "ots",
            "parameter": {
                "datasource": "<data-source-name>",
                "table": "<table-name>",
                "newVersion": "true",
                "mode": "multiversion",
                "column": [
                    {"name": "mobile"},
                    {"name": "name"},
                    {"name": "age"},
                    {"name": "salary"},
                    {"name": "marry"}
                ],
                "range": {
                    "begin": [{"type": "INF_MIN"}, {"type": "INF_MAX"}],
                    "end": [{"type": "INF_MAX"}, {"type": "INF_MIN"}],
                    "split": []
                }
            },
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {"record": "0"},
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [{"from": "Reader", "to": "Writer"}]
    }
}

Reader general parameters

Parameter Description Required Default
endpoint The endpoint of the Tablestore instance. See Endpoints. Yes None
accessId The AccessKey ID for Tablestore. Yes None
accessKey The AccessKey secret for Tablestore. Yes None
instanceName The name of the Tablestore instance. An instance is the basic unit for resource management in Tablestore. Access control and resource metering are performed at the instance level. Yes None
table The name of the table to read from. Only one table per task is supported. Yes None
newVersion The plugin version. Set to true to use the new Reader, which supports row mode, column mode, wide tables, and time series tables, and uses fewer system resources. The new plugin is compatible with existing configurations — existing tasks continue to work after adding newVersion: true. No false
mode The read mode. normal reads in row mode with format {primary key values, attribute column values}. multiVersion reads in column mode with format {primary key, attribute column name, timestamp, value}. Takes effect only when newVersion: true. No normal
isTimeseriesTable Set to true if the table is a time series table. Takes effect only when newVersion: true and mode: normal. Time series tables cannot be read in column mode. No false

Row mode parameters — wide table

Parameter Description Required Default
column The columns to read. Use a JSON array of column name objects. Supports normal columns ({"name":"col1"}), constant columns ({"type":"STRING","value":"DataX"}), and INF_MIN/INF_MAX constants. If you use INF_MIN or INF_MAX, do not specify the value property, or an error occurs. Functions and custom expressions are not supported. Yes None
begin / end The primary key range to read. Use {"type":"INF_MIN"} and {"type":"INF_MAX"} for open-ended ranges. When fewer values are configured than the number of primary key columns, the leftmost primary key matching principle applies — unconfigured columns default to [INF_MIN, INF_MAX). The last primary key column uses a left-closed, right-open interval; all other columns use a closed interval. No [INF_MIN, INF_MAX)
split Custom split points for data sharding. Use this when hot spots cause uneven data distribution. The task splits data at the specified values and reads each segment concurrently. Set more split points than the task concurrency level for best performance. If not configured, the Reader automatically splits data by finding the min and max of the partition key and dividing evenly (integers by integer division, strings by Unicode code of the first character). No Auto split
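The automatic split described above can be approximated for an integer partition key as follows. This is an illustrative sketch of the even-division rule; the actual splitting is internal to the Reader:

```python
def split_points(min_pk: int, max_pk: int, n: int) -> list[int]:
    """Evenly divide [min_pk, max_pk] into n segments; return the n-1 interior split points."""
    step = (max_pk - min_pk) // n  # integers are divided by integer division
    return [min_pk + step * i for i in range(1, n)]

# A job with concurrency 4 over partition keys 0..100 splits at:
print(split_points(0, 100, 4))  # [25, 50, 75]
```

When hot spots make this even division skewed, configure explicit split points instead, as described in the table.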

`begin`/`end` examples

To read rows where Hundreds is in [3, 5] and Tens is in [4, 6] from a table with three primary key columns [Hundreds, Tens, Ones]:

"range": {
    "begin": [
        {"type": "INT", "value": "3"},
        {"type": "INT", "value": "4"}
    ],
    "end": [
        {"type": "INT", "value": "5"},
        {"type": "INT", "value": "6"}
    ]
}

To further narrow the range so that Ones is in [5, 7):

"range": {
    "begin": [
        {"type": "INT", "value": "3"},
        {"type": "INT", "value": "4"},
        {"type": "INT", "value": "5"}
    ],
    "end": [
        {"type": "INT", "value": "5"},
        {"type": "INT", "value": "6"},
        {"type": "INT", "value": "7"}
    ]
}

Row mode parameters — time series table

Parameter Description Required Default
column The columns to read. Each element is either a constant column (requires type and value) or a normal column (requires name). Predefined column names: _m_name (measurement name, String), _data_source (data source, String), _tags (tags, String), _time (timestamp, Long). Set is_timeseries_tag: true to read an individual tag value as a separate column. Yes None
measurementName The measurement name to filter by. If not configured, data is read from the entire table. No None
timeRange The time range to read, as [begin, end). begin must be less than end. The timestamp unit is microseconds. begin defaults to 0; end defaults to Long.MAX_VALUE (9223372036854775807). No All versions
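Because timeRange is expressed in microseconds, timestamps from other units must be converted before they are placed in the configuration. A small helper (the function name is illustrative):

```python
from datetime import datetime, timezone

def to_micros(dt: datetime) -> int:
    """Convert a datetime to the microsecond timestamp expected by timeRange."""
    return int(dt.timestamp() * 1_000_000)

# Read one day of data: [2023-03-01, 2023-03-02)
begin = to_micros(datetime(2023, 3, 1, tzinfo=timezone.utc))
end = to_micros(datetime(2023, 3, 2, tzinfo=timezone.utc))
assert begin < end  # timeRange requires begin < end
```

Passing millisecond or second values where microseconds are expected silently shifts the range, so converting explicitly is safer than relying on raw numbers.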

Column mode parameters — wide table

Parameter Description Required Default
column The attribute columns to export. Primary key columns and constant columns are not supported. The exported four-tuple includes the complete primary key automatically. Each column can be listed only once. Yes All columns
range The primary key range to read, as [begin, end). If begin is less than end, data is read in ascending order. If begin is greater than end, data is read in descending order. begin and end cannot be equal. Supported types: string, int, binary (Base64-encoded), INF_MIN, INF_MAX. The split sub-parameter accepts partition key values (first primary key column only) to enable concurrent reading. No All data
timeRange The time range of versions to read, as [begin, end). The timestamp unit is microseconds. No All versions
maxVersion The maximum number of versions to read per column. Value range: 1 to INT32_MAX. No All versions

Appendix II: Writer script demo and parameter description

All Tablestore Writer scripts use the stepType: ots plugin. Set newVersion to true and configure mode and isTimeseriesTable to select the write mode.

Writer script examples

Row mode — wide table

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "ots",
            "parameter": {
                "datasource": "<data-source-name>",
                "table": "<table-name>",
                "newVersion": "true",
                "mode": "normal",
                "isTimeseriesTable": "false",
                "primaryKey": [
                    {"name": "gid", "type": "INT"},
                    {"name": "uid", "type": "STRING"}
                ],
                "column": [
                    {"name": "col1", "type": "INT"},
                    {"name": "col2", "type": "DOUBLE"},
                    {"name": "col3", "type": "STRING"},
                    {"name": "col4", "type": "STRING"},
                    {"name": "col5", "type": "BOOL"}
                ],
                "writeMode": "PutRow"
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {"record": "0"},
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [{"from": "Reader", "to": "Writer"}]
    }
}

Row mode — time series table

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "ots",
            "parameter": {
                "datasource": "<data-source-name>",
                "table": "testTimeseriesTableName01",
                "mode": "normal",
                "newVersion": "true",
                "isTimeseriesTable": "true",
                "timeunit": "microseconds",
                "column": [
                    {"name": "_m_name"},
                    {"name": "_data_source"},
                    {"name": "_tags"},
                    {"name": "_time"},
                    {"name": "string_1", "type": "string"},
                    {"name": "tag3", "is_timeseries_tag": "true"}
                ]
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {"record": "0"},
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [{"from": "Reader", "to": "Writer"}]
    }
}

Column mode — wide table

{
    "type": "job",
    "version": "2.0",
    "steps": [
        {
            "stepType": "stream",
            "parameter": {},
            "name": "Reader",
            "category": "reader"
        },
        {
            "stepType": "ots",
            "parameter": {
                "datasource": "<data-source-name>",
                "table": "<table-name>",
                "newVersion": "true",
                "mode": "multiVersion",
                "primaryKey": ["gid", "uid"]
            },
            "name": "Writer",
            "category": "writer"
        }
    ],
    "setting": {
        "errorLimit": {"record": "0"},
        "speed": {
            "throttle": true,
            "concurrent": 1,
            "mbps": "12"
        }
    },
    "order": {
        "hops": [{"from": "Reader", "to": "Writer"}]
    }
}

Writer general parameters

Parameter Description Required Default
datasource The data source name as configured in DataWorks. The value must match the name of the added data source exactly. Yes None
endPoint The endpoint of the Tablestore instance. See Endpoints. Yes None
accessId The AccessKey ID for Tablestore. Yes None
accessKey The AccessKey secret for Tablestore. Yes None
instanceName The name of the Tablestore instance. An instance is the basic unit for resource management in Tablestore. Access control and resource metering are performed at the instance level. Yes None
table The name of the table to write to. Only one table per task is supported. Yes None
newVersion The plugin version. Set to true to use the new Writer, which supports row mode, column mode, wide tables, time series tables, and auto-increment primary key columns, and uses fewer system resources. The new plugin is compatible with existing configurations. No false
mode The write mode. normal writes in row mode. multiVersion writes in column mode. Takes effect only when newVersion: true. No normal
isTimeseriesTable Set to true if the table is a time series table. Takes effect only when newVersion: true and mode: normal. Column mode is not compatible with time series tables. No false

Row mode parameters — wide table

Parameter Description Required Default
primaryKey The primary key columns of the destination table. Specify each column as {"name":"<col>","type":"<type>"}. Tablestore primary keys support only STRING and INT types. Yes None
column The attribute columns to write. Specify each column as {"name":"<col>","type":"<type>"}. Supported types: STRING, INT, DOUBLE, BOOL, BINARY. Constants, functions, and custom expressions are not supported. Yes None
writeMode The write behavior when a row with matching primary keys already exists in the destination table. PutRow overwrites the entire existing row with the new data. UpdateRow adds, modifies, or deletes the values of specified columns in the existing row based on the request content. If the row does not exist, both modes create a new row. Yes None
enableAutoIncrement Set to true if the destination table has an auto-increment primary key column. The Writer automatically detects and handles the auto-increment column — do not include it in primaryKey. Set to false (or omit) if the table has no auto-increment column; writing to a table with an auto-increment column without this flag causes an error. No false
requestTotalSizeLimitation The maximum size of a single row, in bytes. No 1 MB
attributeColumnSizeLimitation The maximum size of a single attribute column value, in bytes. No 2 MB
primaryKeyColumnSizeLimitation The maximum size of a single primary key column value, in bytes. No 1 KB
attributeColumnMaxCount The maximum number of attribute columns per row. No 1,024

Row mode parameters — time series table

Parameter Description Required Default
column The fields to write. Each element maps to one field in the time series record. Predefined column names: _m_name (measurement name, String), _data_source (data source, String), _tags (tags, String, format: ["key1=value1","key2=value2"]), _time (timestamp, Long). Set is_timeseries_tag: true to write an individual column as a tag key-value pair. The _m_name and _time fields are required. Yes None
timeunit The unit of the _time timestamp field. Supported values: NANOSECONDS, MICROSECONDS, MILLISECONDS, SECONDS, MINUTES. No MICROSECONDS

Column mode parameters — wide table

Parameter Description Required Default
primaryKey The primary key column names of the destination table, as a JSON array of strings. The input record format must be {pk0, pk1, ...}, {columnName}, {timestamp}, {value} — primary key values come first, followed by the column name, timestamp, and value. Yes None
columnNamePrefixFilter A prefix to strip from column names before writing. Use this when importing from HBase, where the column family prefix (e.g., cf:) is part of the column name but is not supported by Tablestore. If a column name does not start with the configured prefix, the record is sent to the dirty data queue. No None
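The prefix handling can be illustrated with a short sketch that mirrors the columnNamePrefixFilter behavior described above (illustrative only, not plugin code):

```python
PREFIX = "cf:"  # HBase column family prefix, not supported by Tablestore

def strip_prefix(column_name):
    """Return the Tablestore column name, or None to mark the record as dirty data."""
    if column_name.startswith(PREFIX):
        return column_name[len(PREFIX):]
    return None  # no prefix match: the record goes to the dirty data queue

print(strip_prefix("cf:age"))  # age
```

Only the part after the configured prefix is written as the Tablestore column name.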

FAQ

How do I write to a table that has an auto-increment primary key column?

Add "newVersion": "true" and "enableAutoIncrement": "true" to the Writer configuration. Do not include the auto-increment column name in primaryKey. The total number of entries in primaryKey plus column must equal the number of columns in the upstream Reader output.

What is the difference between `_tags` and `is_timeseries_tag` in time series configurations?

They control whether tags are handled as a batch or individually.

_tags reads or writes all tags as a single column. The format is ["phone=xiaomi","camera=LEICA","RAM=8G"]. Use this when you want to preserve the full tag set as one value.

is_timeseries_tag: true reads or writes a single tag by key name as a separate column. For example, configuring {"name": "phone", "is_timeseries_tag": "true"} and {"name": "camera", "is_timeseries_tag": "true"} produces two columns with values xiaomi and LEICA respectively.

You can mix both approaches in the same configuration. For example, if the upstream data has one column containing ["phone=xiaomi","camera=LEICA","RAM=8G"] and another column with a numeric value 6499 that you want to add as a tag price:

"column": [
    {"name": "_tags"},
    {"name": "price", "is_timeseries_tag": "true"}
]

This imports the first column as a tag batch and adds price=6499 as an additional individual tag.
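The _tags batch format shown above is an array of key=value strings. A short sketch of how an application might parse or assemble it (illustrative helpers, not part of DataWorks):

```python
def parse_tags(tags):
    """Split ["key=value", ...] entries into a tag dictionary."""
    return dict(entry.split("=", 1) for entry in tags)

def format_tags(tags):
    """Render a tag dictionary back into the _tags batch format."""
    return [f"{k}={v}" for k, v in tags.items()]

tags = parse_tags(["phone=xiaomi", "camera=LEICA", "RAM=8G"])
tags["price"] = "6499"  # adding one tag, like an is_timeseries_tag column would
```

Splitting on the first "=" only keeps tag values that themselves contain "=" intact.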
