Use an external table to read MaxCompute data directly in ApsaraDB for ClickHouse, then insert it into a local distributed table. This page applies to clusters running kernel version 19.15.2.2.
## Prerequisites
Before you begin, ensure that you have:

- An ApsaraDB for ClickHouse cluster running kernel version 19.15.2.2
- A MaxCompute project and table that contain the data you want to import
- An AccessKey ID and AccessKey secret with read permissions on the MaxCompute project and table
- MaxCompute deployed in the same region as your ApsaraDB for ClickHouse cluster
- The Virtual Private Cloud (VPC) tunnel endpoint for MaxCompute in that region (see Endpoint)
## Import data
The import process has three steps: create an external table that maps to the MaxCompute table, create the destination table, and then insert the data.
### Step 1: Create an external table
Log on to your ApsaraDB for ClickHouse cluster and run the following statement, replacing the placeholder values with your own:
```sql
CREATE TABLE default.odps_src_tbl ON CLUSTER default
(
    `shop_name` String,
    `customer_id` String,
    `total_price` Float64
)
ENGINE = MaxCompute(
    'http://dt.cn-shanghai.maxcompute.aliyun-inc.com', -- VPC tunnel endpoint
    'test_project',                                    -- MaxCompute project name
    'test_tbl',                                        -- MaxCompute table name
    'sale_date=2020-09-01,region=beijing',             -- partition spec
    '<your-access-key-id>',                            -- AccessKey ID
    '<your-access-key-secret>'                         -- AccessKey secret
);
```

The `ON CLUSTER default` clause is optional. Include it only if you are running a multi-node cluster and want the table to be created on all nodes.

The `ENGINE = MaxCompute(...)` parameters are positional and must appear in this order:
| Parameter | Description |
|---|---|
| `tunnel-endpoint` | VPC tunnel endpoint for MaxCompute. MaxCompute must be in the same region as the ApsaraDB for ClickHouse cluster. |
| `project-name` | Name of the MaxCompute project, exactly as it appears in MaxCompute. |
| `table-name` | Name of the MaxCompute table, exactly as it appears in MaxCompute. |
| `partition-spec` | Partition to read. A partition specification is required. The expression must not contain spaces or single quotation marks (`'`). To specify multiple partitions, use comma-separated key-value pairs, for example: `sale_date=2020-09-01,region=beijing`. |
| `access-key-id` | AccessKey ID of an account with read permissions on the MaxCompute project and table. |
| `access-key-secret` | AccessKey secret of the same account. |
Column names and data types in the external table must match the corresponding columns in the MaxCompute table. For the supported type mappings, see Data type mapping.
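Before you run the full import, it can be useful to confirm that the external table can actually read from MaxCompute. A minimal spot check, assuming the example table name from the statement above:

```sql
-- Preview a few rows read through the external table; an error here
-- usually points to a wrong endpoint, credentials, or partition spec
SELECT * FROM default.odps_src_tbl LIMIT 10;
```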
### Step 2: Create the destination table
Create a local MergeTree table and a distributed table on top of it:
```sql
-- Local table
CREATE TABLE default.odps_dst_tbl_local ON CLUSTER default
(
    `shop_name` String,
    `customer_id` String,
    `total_price` Float64
)
ENGINE = MergeTree()
ORDER BY customer_id;

-- Distributed table (used for writes and queries across the cluster)
CREATE TABLE default.odps_dst_tbl_dist ON CLUSTER default
AS default.odps_dst_tbl_local
ENGINE = Distributed(default, default, odps_dst_tbl_local, rand());
```

### Step 3: Insert the data
Insert data from the external table into the distributed table:
```sql
INSERT INTO odps_dst_tbl_dist SELECT * FROM odps_src_tbl;
```

After the insert completes, query the distributed table to verify that the data was imported correctly.
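One simple verification, assuming the example table names above, is to compare row counts between the source and the destination:

```sql
-- Row count read from MaxCompute through the external table
SELECT count() FROM default.odps_src_tbl;

-- Row count now stored in the distributed table; the two should match
SELECT count() FROM default.odps_dst_tbl_dist;
```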
## Data type mapping between MaxCompute and ApsaraDB for ClickHouse
When creating the external table, map each MaxCompute column type to its corresponding ApsaraDB for ClickHouse type:
| MaxCompute type | ApsaraDB for ClickHouse type |
|---|---|
| boolean | UInt8 |
| tinyint | UInt8, Int8 |
| smallint | UInt16, Int16 |
| int | UInt32, Int32 |
| bigint | UInt64, Int64 |
| float | Float32 |
| double | Float64 |
| char | String |
| varchar | String |
| binary | String |
| string | String |
| date | UInt16 |
| datetime | UInt32 |
| UUID | Not supported |
| INTERVAL | Not supported |
| Decimal | Not supported |
| Timestamp | Not supported |
| Composite data type | Not supported |
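For example, a hypothetical MaxCompute table declared with `bigint`, `string`, and `double` columns could be mapped in the external table definition like this (the table and column names are illustrative, and the engine parameters are the same six values described in Step 1):

```sql
-- MaxCompute side (for reference):
--   CREATE TABLE example_tbl (id BIGINT, name STRING, score DOUBLE);
-- Matching ApsaraDB for ClickHouse external table columns:
CREATE TABLE default.odps_example_tbl
(
    `id`    Int64,   -- bigint  -> Int64 (or UInt64)
    `name`  String,  -- string  -> String
    `score` Float64  -- double  -> Float64
)
ENGINE = MaxCompute(...);  -- same positional parameters as in Step 1
```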