This topic describes how to use an external table to import MaxCompute data to an ApsaraDB for ClickHouse cluster (for ClickHouse 20.3.10.75).

Basic syntax

Log on to the destination ApsaraDB for ClickHouse cluster and create an external table. The following syntax is used:
CREATE TABLE <table_name> [on cluster default]
(
'col_name' col_type,[col_name col_type]
)
ENGINE = MaxCompute('<tunnel-endpoint>', '<project-name>', '<table-name>', '<partition-spec>', '<access-key-id>', '<access-key-secret>', '<read-thread-num>');
Parameter Description
table_name The name of the custom external table.
'col_name' col_type The name and type of the column. The column type of the ApsaraDB for ClickHouse table must map that of the MaxCompute table. For more information, see Data type mapping between MaxCompute and ApsaraDB for ClickHouse.
ENGINE = MaxCompute Indicates an external table. The storage engine is MaxCompute.
tunnel-endpoint The endpoint of MaxCompute.
Note To ensure normal connection between the ApsaraDB for ClickHouse cluster and MaxCompute, the tunnel-endpoint must be an endpoint of VPC type and MaxCompute must be in the same region as the ApsaraDB for ClickHouse cluster, such as both in the China (Shanghai) region. For more information about how to view the tunnel endpoints of MaxCompute, see Configure endpoints.
project-name The name of the MaxCompute project.
Note It must be consistent with the name in MaxCompute.
table-name The name of the MaxCompute table.
Note It must be consistent with the name in MaxCompute.
partition-spec The MaxCompute partition that you want to access. You must specify partitions. The expression to specify multiple partitions cannot contain spaces or single quotation marks ('). Example: sale_date=2020-09-01,region=beijing.
access-key-id The AccessKey ID used to access the MaxCompute table.
Note The account corresponding to the AccessKey ID must have the read permissions on the MaxCompute project and table.
access-key-secret The AccessKey secret used to access the destination table in MaxCompute.
read-thread-num (optional) The number of concurrent reads for a single shard. Default value: 1.

Example

If a MaxCompute table contains the following data types:

CREATE TABLE IF NOT EXISTS data
(
    v1  TINYINT                  ,
    v2  SMALLINT                 ,
    v3  INT                      ,
    v4  BIGINT                   ,
    v5  FLOAT                    ,
    v7  DOUBLE                   ,
    v8  DECIMAL(38,18)           ,
    v9  VARCHAR(20)              ,
    v10 CHAR(20)                 ,
    v11 STRING                   ,
    v13 DATETIME                 ,
    v14 DATE                     ,
    v15 TIMESTAMP                ,
    v16 BOOLEAN                  ,
    v18 ARRAY<STRING>            ,
    v19 ARRAY<INT>               ,
    v20 MAP<STRING,INT>          ,
    v21 STRUCT<A1:STRING, A2:INT>
) 
PARTITIONED BY
(
    v17 STRING                   
)
LIFECYCLE 100;

You can create the following external table:

CREATE TABLE default.odpsTable ON CLUSTER default
(
    `v1` Int8, 
    `v2` Nullable(Int16), 
    `v3` Nullable(Int32), 
    `v4` Nullable(Int64), 
    `v5` Nullable(Float32), 
    `v7` Nullable(Float64), 
    `v8` Nullable(String), 
    `v9` Nullable(String), 
    `v10` Nullable(String), 
    `v11` Nullable(String), 
    `v13` Nullable(Datetime), 
    `v14` Nullable(Date), 
    `v16` Nullable(UInt8), 
    `v18` Array(Nullable(String)), 
    `v19` Array(Nullable(Int32)), 
    `v20` Nested(
    aa Nullable(String), 
    bb Nullable(Int32)), 
    `v21` Tuple(Nullable(String), Nullable(Int32))
)
ENGINE = MaxCompute('http://dt.cn-hangzhou.maxcompute.aliyun-inc.com', '*', 'data', 'v17=2020', '*', '*', 3)

Create an ApsaraDB for ClickHouse table and a distributed table.

CREATE TABLE default.odpsTableData ON CLUSTER default
(
    `v1` Int8, 
    `v2` Nullable(Int16), 
    `v3` Nullable(Int32), 
    `v4` Nullable(Int64), 
    `v5` Nullable(Float32), 
    `v7` Nullable(Float64), 
    `v8` Nullable(String), 
    `v9` Nullable(String), 
    `v10` Nullable(String), 
    `v11` Nullable(String), 
    `v13` Nullable(Datetime), 
    `v14` Nullable(Date), 
    `v16` Nullable(UInt8), 
    `v18` Array(Nullable(String)), 
    `v19` Array(Nullable(Int32)), 
    `v20` Nested(
    aa Nullable(String), 
    bb Nullable(Int32)), 
    `v21` Tuple(Nullable(String), Nullable(Int32))
)
ENGINE = MergeTree ORDER BY v1;


CREATE TABLE default.odpsTableData_dist ON CLUSTER default as odpsTableData ENGINE = Distributed(default, default, odpsTableData, rand());

Import data from the external table to the ApsaraDB for ClickHouse table.

insert into odpsTableData_dist select * from odpsTable;

Data type mapping between MaxCompute and ApsaraDB for ClickHouse

MaxCompute data type ApsaraDB for ClickHouse data type
BOOLEAN UINT8
TINYINT UINT8 or INT8
SMALLLINT UINT16 or INT16
INT UINT32 or INT32
BIGINT UINT64 or INT64
FLOAT FLOAT32
DOUBLE FLOAT64
CHAR STRING
VARCHAR STRING
BINARY STRING
STRING STRING
DATE DATE
DATETIME DATETIME
UUID Not supported
INTERVAL Not supported
DECIMAL STRING
TIMESTAMP Not supported
MAP NESTED(Nullable(keyType), Nullable(valueType)). It only supports non-nested MAP structures.

For example, MAP(string, string) in MaxCompute corresponds to NESTED(k: Nullable(String), v Nullable(String)) in ApsaraDB for ClickHouse. k and v can be any names.

Note The data within the NESTED structure must be nullable. Otherwise, an error occurs when a table is created.
ARRAY ARRAY(Nullable(Type)). It only supports non-nested ARRAY structures.
For example, ARRAY(int) in MaxCompute corresponds to ARRAY(Nullable(Int32)) in ApsaraDB for ClickHouse.
Note The data within the ARRAY structure must be nullable. Otherwise, an error occurs when a table is created.
STRUCT TUPLE(Nullable(Type)) only supports non-nested STRUCT structures. Only basic data types are allowed in STRUCT structures.
For example, STRUCT<x:int, y:string> in MaxCompute corresponds to TUPLE(Nullable(Int32), Nullable(String)) in ApsaraDB for ClickHouse. The attribute names in the MaxCompute table are lost. However, you can obtain the attribute names by using the TUPLE structure in ApsaraDB for ClickHouse based on the order of the original fields. For example, if field va is of the TUPLE type, you can use va.1 to obtain the first attribute name and va.2 to obtain the second attribute name. Other attribute names can be obtained in the same way.
Note The data within the TUPLE structure must be nullable. Otherwise, an error occurs when a table is created.