
MaxCompute:Tablestore external tables

Last Updated:Jul 16, 2025

This topic describes how to import data from Tablestore to MaxCompute to establish seamless connections between multiple data sources.

Background information

Tablestore is a NoSQL data storage service built on the Alibaba Cloud Apsara distributed system. Tablestore provides storage and real-time access to massive amounts of structured data. For more information, see Tablestore documentation.

You can create, search for, configure, and process external tables in the DataWorks console. You can also query and analyze data using the external table feature. For more information, see External tables.

Precautions

  • You must ensure network connectivity between MaxCompute and Tablestore. When you access Tablestore data from MaxCompute in the public cloud, we recommend that you use the private endpoint of Tablestore. The private endpoint ends with ots-internal.aliyuncs.com, for example, tablestore://odps-ots-dev.cn-shanghai.ots-internal.aliyuncs.com.

  • Tablestore and MaxCompute use different data type systems. The following table lists the mappings between the data types supported by Tablestore and MaxCompute.

    MaxCompute Type    Tablestore Type
    STRING             STRING
    BIGINT             INTEGER
    DOUBLE             DOUBLE
    BOOLEAN            BOOLEAN
    BINARY             BINARY

  • Tablestore external tables do not support the clustering attribute.
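The type mapping above can be captured as a small lookup, for example to check a planned external table schema before you write the DDL. The helper below is illustrative only and is not part of MaxCompute:

```python
# Illustrative lookup based on the type mapping table above (not a MaxCompute API).
TABLESTORE_TO_MAXCOMPUTE = {
    "STRING": "STRING",
    "INTEGER": "BIGINT",
    "DOUBLE": "DOUBLE",
    "BOOLEAN": "BOOLEAN",
    "BINARY": "BINARY",
}

def maxcompute_type_for(tablestore_type: str) -> str:
    """Return the MaxCompute type to declare for a Tablestore column type."""
    return TABLESTORE_TO_MAXCOMPUTE[tablestore_type.upper()]

print(maxcompute_type_for("INTEGER"))  # BIGINT
```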

Create an external table

MaxCompute provides the external table feature. You can use external tables to register Tablestore tables in the MaxCompute metadata system and process their data directly, without importing the data first. The following section describes how to create a Tablestore external table.

Sample statements:

DROP TABLE IF EXISTS ots_table_external;
CREATE EXTERNAL TABLE IF NOT EXISTS ots_table_external
(
  odps_orderkey bigint,
  odps_orderdate string,
  odps_custkey bigint,
  odps_orderstatus string,
  odps_totalprice double,
  odps_createdate timestamp
)
STORED BY 'com.aliyun.odps.TableStoreStorageHandler'
WITH SERDEPROPERTIES (
  'tablestore.columns.mapping'=':o_orderkey,:o_orderdate,o_custkey,o_orderstatus,o_totalprice,o_createdate',
  'tablestore.table.name'='ots_tpch_orders',
  'odps.properties.rolearn'='acs:ram::xxxxx:role/aliyunodpsdefaultrole',
  'tablestore.read.mode'='permissive',
  'tablestore.corrupt.column'='ColumnName',
  'tablestore.timestamp.ticks.unit'='seconds',
  'tablestore.column.odps_createdate.timestamp.ticks.unit'='millis',
  'tablestore.table.put.row'='true'
)
LOCATION 'tablestore://odps-ots-dev.cn-shanghai.ots-internal.aliyuncs.com';

The following table describes the key parameters used in the preceding table creation statement.

Parameter

Required

Description

com.aliyun.odps.TableStoreStorageHandler

Yes

A built-in MaxCompute storage handler that is used to process Tablestore data. The storage handler defines the interaction between MaxCompute and Tablestore. The related logic is implemented by MaxCompute.

tablestore.columns.mapping

Yes

The columns of the Tablestore table that you want MaxCompute to access. The columns include primary key columns and attribute columns.

  • A colon (:) at the beginning of a column name indicates that the column is a primary key column of the Tablestore table, for example, :o_orderkey and :o_orderdate. The other columns are attribute columns.

  • Tablestore supports one to four primary key columns. Primary key columns can be of the STRING, INTEGER, or BINARY type. The first primary key column is the partition key.

  • When you define column mappings, you must specify all primary key columns of the Tablestore table and the attribute columns that you want MaxCompute to access.

tablestore.table.name

Yes

The name of the Tablestore table that you want MaxCompute to access. In this example, the name is ots_tpch_orders.

odps.properties.rolearn

Yes

The Alibaba Cloud Resource Name (ARN) of AliyunODPSDefaultRole in Resource Access Management (RAM). You can obtain the ARN in the RAM Console. In the navigation pane on the left, choose Identity Management > Roles. Click the role name to go to the details page to obtain the ARN.

tablestore.timestamp.ticks.unit

No

The table-level time unit setting. This parameter specifies the time unit that all fields of the INTEGER data type in the external table use. Valid values:

  • seconds

  • millis

  • micros

  • nanos

tablestore.column.<col1_name>.timestamp.ticks.unit

No

The column-level time unit setting. This parameter specifies the time unit of a single column in the external table. Valid values:

  • seconds

  • millis

  • micros

  • nanos

Note

If both tablestore.timestamp.ticks.unit and tablestore.column.<col1_name>.timestamp.ticks.unit are configured, the tablestore.column.<col1_name>.timestamp.ticks.unit parameter takes precedence.
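To see what these units mean in practice, the same integer tick value decodes to different points in time depending on the configured unit. The following conversion helper is illustrative only and is not part of MaxCompute:

```python
from datetime import datetime, timezone

# Divisors that convert integer ticks to seconds, one per valid value of
# tablestore.timestamp.ticks.unit (illustrative helper, not a MaxCompute API).
UNIT_DIVISOR = {"seconds": 1, "millis": 1_000, "micros": 1_000_000, "nanos": 1_000_000_000}

def ticks_to_utc(ticks: int, unit: str) -> datetime:
    return datetime.fromtimestamp(ticks / UNIT_DIVISOR[unit], tz=timezone.utc)

# The same instant stored in two different units:
print(ticks_to_utc(1_600_000_000, "seconds"))     # 2020-09-13 12:26:40+00:00
print(ticks_to_utc(1_600_000_000_000, "millis"))  # same instant
```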

tablestore.table.put.row

No

Specifies the write mode of the PutRow operation. Valid values:

  • true: enabled.

  • false (default): disabled.

Note

You can also configure the following flag parameter to specify the write mode of the PutRow operation. The default value is false. For more information, see Flag parameter list.

set odps.sql.unstructured.tablestore.put.row=true;

tablestore.read.mode

No

Specifies the read behavior if MaxCompute detects dirty data in the Tablestore external table. Valid values:

  • permissive (default): If MaxCompute detects dirty data, MaxCompute skips the dirty data.

  • failfast: If MaxCompute detects dirty data, MaxCompute reports an error.

For examples of dirty data processing, see Examples of dirty data processing.

tablestore.corrupt.column

No

Specifies the column to which dirty data is written.

  • This parameter is required only when the tablestore.read.mode parameter is set to permissive.

  • The specified column must be the last column of the MaxCompute external table.

  • You cannot specify the primary key column of the Tablestore table as the column to which dirty data is written.

For examples of dirty data processing, see Examples of dirty data processing.

LOCATION

Yes

Specifies information about Tablestore, such as the name and endpoint of a Tablestore instance. You must complete RAM or Security Token Service (STS) authorization to ensure secure access to Tablestore data.

Note

If an error indicating inconsistent network types is returned when you use the public endpoint, you can change the network type to the classic network.

You can execute the following statement to view the structure of the external table that you created:

DESC EXTENDED <table_name>;
Note

In the execution result, Extended Info includes the basic information about the external table, the information about the storage handler, and the location of the external table.

Query data in the external table

After you create an external table, you can execute a MaxCompute SQL statement to access Tablestore data using the external table. Example:

SELECT odps_orderkey, odps_orderdate, SUM(odps_totalprice) AS sum_total
FROM ots_table_external
WHERE odps_orderkey > 5000 AND odps_orderkey < 7000 AND odps_orderdate >= '1996-05-03' AND odps_orderdate < '1997-05-01'
GROUP BY odps_orderkey, odps_orderdate
HAVING sum_total > 400000.0;
Note

When you query external tables or their fields, table names and field names are not case-sensitive, and forced uppercase or lowercase conversion is not supported.

If you access Tablestore data using MaxCompute SQL statements, all operations, such as the selection of column names, are performed in MaxCompute. In the preceding example, the column names are odps_orderkey and odps_totalprice rather than the names of the primary key column o_orderkey and attribute column o_totalprice in the Tablestore table. This is because the mappings are defined in the DDL statement used to create the external table. You can also retain the names of the primary key columns and attribute columns in the Tablestore table as needed.

If you want to compute one piece of data multiple times, you can import the data from Tablestore to an internal table of MaxCompute. This way, you do not need to read the data from Tablestore every time you want to compute the data using MaxCompute. Example:

CREATE TABLE internal_orders AS
SELECT odps_orderkey, odps_orderdate, odps_custkey, odps_totalprice
FROM ots_table_external
WHERE odps_orderkey > 5000;

internal_orders is a MaxCompute table that supports all features of a MaxCompute internal table. It uses efficiently compressed columnar storage and carries complete metadata and statistics. Because the internal_orders table is stored in MaxCompute, it can be accessed faster than a Tablestore table. This method is suitable for data that needs to be computed multiple times.

Export data from MaxCompute to Tablestore

Note

MaxCompute does not create the destination Tablestore table for you. Before you export data to a Tablestore table, make sure that the table exists. Otherwise, an error is reported.

An external table named ots_table_external is created to allow MaxCompute to access the ots_tpch_orders table in Tablestore, and the data is stored in an internal MaxCompute table named internal_orders. If you want to process the data in the internal_orders table and then write the processed data back to Tablestore, you can execute an INSERT OVERWRITE TABLE statement on the external table. Example:

INSERT OVERWRITE TABLE ots_table_external
SELECT odps_orderkey, odps_orderdate, odps_custkey, CONCAT(odps_custkey, 'SHIPPED'), CEIL(odps_totalprice)
FROM internal_orders;
Note

If the data in the internal MaxCompute table is sorted by primary key, the data is written to a single partition of the Tablestore table, and distributed writes cannot be fully utilized. In this scenario, we recommend that you use DISTRIBUTE BY rand() to randomly distribute the data. Example:

INSERT OVERWRITE TABLE ots_table_external
SELECT odps_orderkey, odps_orderdate, odps_custkey, CONCAT(odps_custkey, 'SHIPPED'), CEIL(odps_totalprice)
FROM (SELECT * FROM internal_orders DISTRIBUTE BY rand()) t;

Tablestore is a NoSQL data storage service that stores data in the key-value pair format. Data outputs from MaxCompute affect only the rows that include the primary keys of the Tablestore table. In this example, only the rows that include odps_orderkey and odps_orderdate are affected. Only the attribute columns that are specified when you create the ots_table_external table are updated. The columns that are not included in the external table are not modified.
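The effect described above can be pictured as a key-value update: only rows whose primary keys appear in the written output are replaced, and all other rows keep their current values. The following toy Python model illustrates the semantics and is not how MaxCompute actually writes data:

```python
# Toy model of writing to a Tablestore table: the table is a dict keyed by
# primary key; INSERT OVERWRITE replaces only the matching keys.
destination = {
    ("k1", "2023-01-01"): {"o_totalprice": 100.0},
    ("k2", "2023-01-02"): {"o_totalprice": 200.0},
}
written = {("k1", "2023-01-01"): {"o_totalprice": 150.0}}

destination.update(written)  # ("k1", ...) is overwritten; ("k2", ...) is untouched
```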

Note
  • If the size of data that you want to write from MaxCompute to Tablestore in a single batch is greater than 4 MB, an error may occur. In this case, reduce the amount of data that is written in a single batch and try again.

    ODPS-0010000:System internal error - Output to TableStore failed with exception:
    TableStore BatchWrite request id XXXXX failed with error code OTSParameterInvalid and message:The total data size of BatchWriteRow request exceeds the limit
  • A batch of data entries written at the same time, or a single row, is considered one operation. For more information, see BatchWriteRow. If you want to write a large amount of data at a time, you can write the data row by row instead.

  • If you want to write multiple data entries at a time, make sure that you do not write duplicate rows. If duplicate rows exist, the following error may occur:

    ErrorCode: OTSParameterInvalid, ErrorMessage: The input parameter is invalid 

    For more information, see OTSParameterInvalid error when you use BatchWriteRow to submit 100 data entries at a time.

  • Because Tablestore is a key-value storage service, when you use insert overwrite table to write data to a Tablestore table, the entire content of the destination Tablestore table is not cleared. Only the values of the keys that match the keys in the source table are overwritten.
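If you load data into Tablestore with your own client instead of through MaxCompute, the limits above can be respected by batching and de-duplicating rows before each request. The sketch below is illustrative; the helper names and the byte-size estimate are our assumptions, not part of any SDK:

```python
# Sketch: group rows into batches whose estimated size stays under the
# 4 MB BatchWriteRow limit, and drop duplicate primary keys within a request.
MAX_BATCH_BYTES = 4 * 1024 * 1024

def batch_rows(rows, row_size, limit=MAX_BATCH_BYTES):
    batches, current, current_bytes = [], [], 0
    for row in rows:
        size = row_size(row)
        if current and current_bytes + size > limit:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(row)
        current_bytes += size
    if current:
        batches.append(current)
    return batches

def dedupe_by_key(rows, key):
    # BatchWriteRow rejects duplicate primary keys in one request;
    # keep the last value written for each key.
    seen = {}
    for row in rows:
        seen[key(row)] = row
    return list(seen.values())
```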

Examples of dirty data processing

  1. Create a Tablestore table named mf_ots_test and prepare data. For more information, see Quick Start for wide table model.

    The following sample code shows the default data of the Tablestore table.

    +----+------+---------------------+
    | id | name | desc                |
    +----+------+---------------------+
    | 1  | Jack | Description of Jack |
    +----+------+---------------------+
  2. Create a MaxCompute external table.

    CREATE EXTERNAL TABLE IF NOT EXISTS mf_ots_external_permi
    (
      id string,
      name bigint,
      desc string,
      corrupt_col string
    )
    STORED BY 'com.aliyun.odps.TableStoreStorageHandler'
    WITH SERDEPROPERTIES (
      'tablestore.columns.mapping'=':id,name,desc',
      'tablestore.table.name'='mf_ots_test',
      'tablestore.read.mode'='permissive',
      'tablestore.corrupt.column'='corrupt_col',
      'odps.properties.rolearn'='acs:ram::139699392458****:role/aliyunodpsdefaultrole'
    )
    LOCATION 'tablestore://santie-doc.cn-shanghai.ots-internal.aliyuncs.com';
  3. Run the following code to query data in the MaxCompute external table:

    -- Query data
    SELECT * FROM mf_ots_external_permi;

    The following result is returned. The value that failed conversion is written to the corrupt_col column in JSON format.

    +------------+------------+----------------------+----------------------+
    | id         | name       | desc                 | corrupt_col          |
    +------------+------------+----------------------+----------------------+
    | 1          | NULL       | Description of Jack  | {"name": "\"Jack\""} |
    +------------+------------+----------------------+----------------------+
    Note

    If tablestore.read.mode is not configured or is set to permissive but tablestore.corrupt.column is not configured to specify the column to which dirty data is written, the error message "Columns not match with columns mapping and corrupt column" is returned when you query the external table.
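Because the corrupt_col value is plain JSON, the raw value that failed conversion can be recovered downstream. For example (the literal below copies the corrupt_col value from the query result above):

```python
import json

# corrupt_col as returned above: the raw Tablestore value that could not be
# cast to bigint, keyed by the source column name.
corrupt_col = '{"name": "\\"Jack\\""}'
bad_values = json.loads(corrupt_col)
print(bad_values["name"])  # the STRING value (with its quotes) that failed the bigint cast
```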