Use an external table to read MaxCompute data directly in ApsaraDB for ClickHouse, then insert it into a local distributed table. This page applies to clusters running kernel version 19.15.2.2.
## Prerequisites
Before you begin, ensure that you have:

- An ApsaraDB for ClickHouse cluster running kernel version 19.15.2.2
- A MaxCompute project and table that contain the data you want to import
- An AccessKey ID and AccessKey secret with read permissions on the MaxCompute project and table
- MaxCompute deployed in the same region as your ApsaraDB for ClickHouse cluster
- The Virtual Private Cloud (VPC) tunnel endpoint for MaxCompute in that region (see Endpoint)
## Import data
The import process has three steps: create an external table that maps to the MaxCompute table, create the destination table, and then insert the data.
### Step 1: Create an external table
Log on to your ApsaraDB for ClickHouse cluster and run the following statement, replacing the placeholder values with your own:
```sql
CREATE TABLE default.odps_src_tbl ON CLUSTER default
(
    `shop_name` String,
    `customer_id` String,
    `total_price` Float64
)
ENGINE = MaxCompute(
    'http://dt.cn-shanghai.maxcompute.aliyun-inc.com', -- VPC tunnel endpoint
    'test_project',                                    -- MaxCompute project name
    'test_tbl',                                        -- MaxCompute table name
    'sale_date=2020-09-01,region=beijing',             -- partition spec
    '<your-access-key-id>',                            -- AccessKey ID
    '<your-access-key-secret>'                         -- AccessKey secret
);
```

The `ON CLUSTER default` clause is optional. Include it only if you are running a multi-node cluster and want the table to be created on all nodes.

The `ENGINE = MaxCompute(...)` parameters are positional and must appear in this order:
| Parameter | Description |
|---|---|
| `tunnel-endpoint` | VPC tunnel endpoint for MaxCompute. MaxCompute must be in the same region as the ApsaraDB for ClickHouse cluster. |
| `project-name` | Name of the MaxCompute project, exactly as it appears in MaxCompute. |
| `table-name` | Name of the MaxCompute table, exactly as it appears in MaxCompute. |
| `partition-spec` | Partition to read. A partition specification is required. The expression must not contain spaces or single quotation marks (`'`). To specify multiple partitions, use comma-separated key-value pairs, for example: `sale_date=2020-09-01,region=beijing`. |
| `access-key-id` | AccessKey ID of an account with read permissions on the MaxCompute project and table. |
| `access-key-secret` | AccessKey secret of the same account. |
Column names and data types in the external table must match the corresponding columns in the MaxCompute table. For the supported type mappings, see Data type mapping.
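Before you run the full import, it can be useful to confirm that the external table can actually read from MaxCompute. A minimal spot check, assuming the example table name from the statement above:

```sql
-- Preview a few rows read through the external table; an error here
-- usually points to a wrong endpoint, credentials, or partition spec
SELECT * FROM default.odps_src_tbl LIMIT 10;
```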
### Step 2: Create the destination table
Create a local MergeTree table and a distributed table on top of it:
```sql
-- Local table
CREATE TABLE default.odps_dst_tbl_local ON CLUSTER default
(
    `shop_name` String,
    `customer_id` String,
    `total_price` Float64
)
ENGINE = MergeTree()
ORDER BY customer_id;

-- Distributed table (used for writes and queries across the cluster)
CREATE TABLE default.odps_dst_tbl_dist ON CLUSTER default
AS default.odps_dst_tbl_local
ENGINE = Distributed(default, default, odps_dst_tbl_local, rand());
```

### Step 3: Insert the data
Insert data from the external table into the distributed table:
```sql
INSERT INTO odps_dst_tbl_dist SELECT * FROM odps_src_tbl;
```

After the insert completes, query the distributed table to verify that the data was imported correctly.
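One simple verification, assuming the example table names above, is to compare row counts between the source and the destination:

```sql
-- Row count read from MaxCompute through the external table
SELECT count() FROM default.odps_src_tbl;

-- Row count now stored in the distributed table; the two should match
SELECT count() FROM default.odps_dst_tbl_dist;
```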
## Data type mapping between MaxCompute and ApsaraDB for ClickHouse
When creating the external table, map each MaxCompute column type to its corresponding ApsaraDB for ClickHouse type:
| MaxCompute type | ApsaraDB for ClickHouse type |
|---|---|
| boolean | UInt8 |
| tinyint | UInt8, Int8 |
| smallint | UInt16, Int16 |
| int | UInt32, Int32 |
| bigint | UInt64, Int64 |
| float | Float32 |
| double | Float64 |
| char | String |
| varchar | String |
| binary | String |
| string | String |
| date | UInt16 |
| datetime | UInt32 |
| UUID | Not supported |
| INTERVAL | Not supported |
| Decimal | Not supported |
| Timestamp | Not supported |
| Composite data type | Not supported |
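For example, a hypothetical MaxCompute table declared with `bigint`, `string`, and `double` columns could be mapped in the external table definition like this (the table and column names are illustrative, and the engine parameters are the same six values described in Step 1):

```sql
-- MaxCompute side (for reference):
--   CREATE TABLE example_tbl (id BIGINT, name STRING, score DOUBLE);
-- Matching ApsaraDB for ClickHouse external table columns:
CREATE TABLE default.odps_example_tbl
(
    `id`    Int64,   -- bigint  -> Int64 (or UInt64)
    `name`  String,  -- string  -> String
    `score` Float64  -- double  -> Float64
)
ENGINE = MaxCompute(...);  -- same positional parameters as in Step 1
```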