This topic describes how to use an external table to import MaxCompute data to an ApsaraDB ClickHouse cluster. The kernel version of the cluster is 19.15.2.2.

Import data

  1. Log on to the destination ApsaraDB for ClickHouse cluster and create an external table.
    Use the following syntax to create an external table:
    CREATE TABLE <table_name> [on cluster default]
    (
    'col_name' col_type,[col_name col_type]
    )
    ENGINE = MaxCompute('<tunnel-endpoint>', '<project-name>', '<table-name>', '<partition-spec>', '<access-key-id>', '<access-key-secret>');
    Parameter Description
    table_name The name of the custom external table.
    'col_name' col_type The name and data type of the column.

    The data types of columns in ApsaraDB for ClickHouse must be mapped to those of columns in MaxCompute. For more information, see Data type mapping between MaxCompute and ApsaraDB for ClickHouse.

    ENGINE = MaxCompute The MaxCompute storage engine that is used for the external table.
    tunnel-endpoint The endpoint of MaxCompute.
    Note To ensure a connection between the ApsaraDB for ClickHouse cluster and MaxCompute, the value of tunnel-endpoint must be a VPC endpoint, and MaxCompute must be deployed in the same region as the ApsaraDB for ClickHouse cluster. For example, MaxCompute and the ApsaraDB for ClickHouse cluster are deployed in the China (Shanghai) region. For information about the tunnel endpoints of MaxCompute, see Endpoints.
    project-name The name of the MaxCompute project which contains the data that you want to import.
    Note The project name must be the same as the project name that is used in MaxCompute.
    table-name The name of the MaxCompute table which contains the data that you want to import.
    Note The table name must be the same as the table name that is used in MaxCompute.
    partition-spec The MaxCompute partition that you want to access.

    You must specify partitions. The expression to specify multiple partitions cannot contain spaces or single quotation marks ('). Example: sale_date=2020-09-01,region=beijing.

    access-key-id The AccessKey ID of your account that is used to access the data.
    Note The account must have read permissions on the MaxCompute project and the MaxCompute table.
    access-key-secret The AccessKey secret of your account that is used to access the MaxCompute table.
    Example
    CREATE TABLE default.odps_src_tbl on cluster default
    (
        `shop_name` String,
        `customer_id` String,
        `total_price` Float64
    )
    ENGINE = MaxCompute('http://dt.cn-shanghai.maxcompute.aliyun-inc.com', 'test_project', 'test_tbl', 'sale_date=2020-09-01,region=beijing', '<your-access-key-id>', '<your-access-key-secret>');
  2. Create a destination table.
    CREATE TABLE default.odps_dst_tbl_local on cluster default
    (
        `shop_name` String,
        `customer_id` String,
        `total_price` Float64
    )
    ENGINE= MergeTree()
    ORDER BY customer_id;
    
    CREATE TABLE default.odps_dst_tbl_dist on cluster default
    as default.odps_dst_tbl_local
    ENGINE= Distributed(default, default, odps_dst_tbl_local, rand());
  3. Import data from the external table to the destination table.
    insert into odps_dst_tbl_dist select * from odps_src_tbl;

Data type mapping between MaxCompute and ApsaraDB for ClickHouse

Data type in MaxCompute Data type in ApsaraDB for ClickHouse
boolean UInt8
tinyint UInt8, Int8
smalllint UInt16, Int16
int UInt32, Int32
bigint UInt64, Int64
float Float32
double Float64
char String
varchar String
binary String
string String
date UInt16
datetime UInt32
UUID Not supported
INTERVAL Not supported
Decimal Not supported
Timestamp Not supported
Composite data type Not supported