
ApsaraDB for ClickHouse:Import from OSS

Last Updated: Mar 28, 2026

Import data from Object Storage Service (OSS) into ApsaraDB for ClickHouse to run log queries, analytics, and data processing workloads. Two import methods are supported: the OSS table engine for persistent access to a fixed OSS data source, and the oss() table function for one-off or ad hoc imports.

Prerequisites

Before you begin, ensure that you have:

  • An ApsaraDB for ClickHouse cluster and an OSS bucket in the same region. Imports access OSS through the internal VPC endpoint, so the bucket and cluster must be co-located.

  • An AccessKey pair (AccessKey ID and AccessKey secret) with permission to read the OSS bucket. For details, see Obtain an AccessKey pair.

Choose a method

Method                 | Best for
OSS table engine       | A fixed OSS data source you query repeatedly. Define it once as an external table and reuse it.
`oss()` table function | A one-off or ad hoc import. No external table required — OSS connection details are passed directly in the INSERT INTO statement.

Both methods use version-specific syntax. Check your cluster version in the Cluster Properties section on the Cluster Information page before running any commands.

Prepare test data

Save the following data as test.csv and upload it to your OSS bucket. ApsaraDB for ClickHouse uses a comma (,) as the default column delimiter.

1,yang,32,shanghai,http://example1.com
2,wang,22,beijing,http://example2.com
3,xiao,23,shenzhen,http://example3.com
4,jess,45,hangzhou,http://example4.com
5,jack,14,shanghai,http://example5.com
6,tomy,25,hangzhou,http://example6.com
7,lucy,45,shanghai,http://example7.com
8,tengyin,26,shanghai,http://example8.com
9,wangli,27,shenzhen,http://example9.com
10,xiaohua,37,shanghai,http://example10.com
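If you prefer to generate and upload the file programmatically, the following Python sketch writes test.csv and, when credentials are supplied, uploads it with the official OSS Python SDK (oss2). The bucket name and endpoint are the placeholder values used elsewhere in this guide, not real resources.

```python
import csv
import os

# The ten sample rows from this guide: id, user_name, age, city, access_url.
rows = [
    (1, "yang", 32, "shanghai", "http://example1.com"),
    (2, "wang", 22, "beijing", "http://example2.com"),
    (3, "xiao", 23, "shenzhen", "http://example3.com"),
    (4, "jess", 45, "hangzhou", "http://example4.com"),
    (5, "jack", 14, "shanghai", "http://example5.com"),
    (6, "tomy", 25, "hangzhou", "http://example6.com"),
    (7, "lucy", 45, "shanghai", "http://example7.com"),
    (8, "tengyin", 26, "shanghai", "http://example8.com"),
    (9, "wangli", 27, "shenzhen", "http://example9.com"),
    (10, "xiaohua", 37, "shanghai", "http://example10.com"),
]

# Write test.csv with a comma delimiter, the ApsaraDB for ClickHouse default.
with open("test.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

def upload_to_oss(endpoint, bucket_name, access_key_id, access_key_secret):
    """Upload test.csv to OSS using the oss2 SDK (pip install oss2)."""
    import oss2  # imported lazily so writing the CSV works without the SDK
    auth = oss2.Auth(access_key_id, access_key_secret)
    bucket = oss2.Bucket(auth, endpoint, bucket_name)
    with open("test.csv", "rb") as f:
        bucket.put_object("test.csv", f)

# Only attempt the upload when real credentials are present in the environment.
if os.environ.get("OSS_ACCESS_KEY_ID"):
    upload_to_oss(
        "https://oss-cn-shanghai.aliyuncs.com",  # public endpoint, for uploading from outside the VPC
        "testBucketName",                        # placeholder bucket name from this guide
        os.environ["OSS_ACCESS_KEY_ID"],
        os.environ["OSS_ACCESS_KEY_SECRET"],
    )
```

You can also upload test.csv through the OSS console or ossutil; the import steps below are the same either way.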

Import data from OSS

Step 1: Connect to the cluster

Connect to your ApsaraDB for ClickHouse cluster.

Step 2: Create a local table

Create the local table oss_test_tbl_local. Select the CREATE TABLE statement that matches your cluster's replica configuration, which you can view in the Cluster Properties section on the Cluster Information page.

Important
  • The schema of your ApsaraDB for ClickHouse table must match the schema of the OSS data. Pay special attention to nullable fields — mismatches cause data parsing failures and cluster exceptions.
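The schema-match requirement can be turned into a quick pre-flight check before importing. The following Python sketch is ours, not part of ApsaraDB for ClickHouse; the validators mirror the example table in this guide (UInt8 fits 0..255, UInt16 fits 0..65535, String accepts anything).

```python
# Declared schema of oss_test_tbl_local: (column name, validator predicate).
SCHEMA = [
    ("id",         lambda v: v.isdigit() and 0 <= int(v) <= 255),    # UInt8
    ("user_name",  lambda v: True),                                  # String
    ("age",        lambda v: v.isdigit() and 0 <= int(v) <= 65535),  # UInt16
    ("city",       lambda v: True),                                  # String
    ("access_url", lambda v: True),                                  # String
]

def check_rows(rows):
    """Return (line_number, column, value) tuples for values that would not
    parse into the non-nullable schema above. Feed it rows from csv.reader."""
    problems = []
    for lineno, row in enumerate(rows, start=1):
        if len(row) != len(SCHEMA):
            problems.append((lineno, "<column count>", len(row)))
            continue
        for (name, ok), value in zip(SCHEMA, row):
            if not ok(value):
                problems.append((lineno, name, value))
    return problems

# Row 2 has an id out of UInt8 range and an age out of UInt16 range.
problems = check_rows([
    ["1", "yang", "32", "shanghai", "http://example1.com"],
    ["300", "bad", "70000", "nowhere", "http://example.com"],
])
```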

Single-replica cluster

CREATE TABLE oss_test_tbl_local ON CLUSTER default
(
  id         UInt8,
  user_name  String,
  age        UInt16,
  city       String,
  access_url String
)
ENGINE = MergeTree()
ORDER BY id;

Dual-replica cluster

CREATE TABLE oss_test_tbl_local ON CLUSTER default
(
  id         UInt8,
  user_name  String,
  age        UInt16,
  city       String,
  access_url String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{database}/{table}/{shard}', '{replica}')
ORDER BY id;

For more information about table engines, see Table engines.

Step 3: Create a distributed table (optional)

Create the distributed table oss_test_tbl_distributed if you want to distribute imported data across all local tables simultaneously. Skip this step if you only need the data in a single local table.

CREATE TABLE oss_test_tbl_distributed ON CLUSTER default
(
  id         UInt8,
  user_name  String,
  age        UInt16,
  city       String,
  access_url String
)
ENGINE = Distributed(default, default, oss_test_tbl_local, rand());

Step 4: Import data

Choose the method that fits your use case.

Method 1: OSS table engine

Use this method when you need to repeatedly query or import from the same OSS data source. Define the external table once; subsequent imports only require an INSERT INTO ... SELECT statement.

Create an OSS external table

The syntax differs by cluster version.

Syntax for versions earlier than 22.8

CREATE TABLE <table_name> [ON CLUSTER cluster]
(
  <col_name1> <col_type1>,
  <col_name2> <col_type2>,
  ...
)
ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');

Syntax for version 22.8 and later

CREATE TABLE <table_name> [ON CLUSTER cluster]
(
  <col_name1> <col_type1>,
  <col_name2> <col_type2>,
  ...
)
ENGINE = OSS('https://<BucketName>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');

Parameters

Parameter            | Description
table_name           | The name of the table.
ON CLUSTER cluster   | Creates the table on each node. The value is fixed as ON CLUSTER default.
col_name1, col_name2 | Column names.
col_type1, col_type2 | Column data types. The schema must match the data in OSS.
BucketName           | The name of the OSS bucket.
oss-endpoint         | The internal VPC endpoint for OSS. The bucket and cluster must be in the same region. Access OSS through Alibaba Cloud VPC. For available endpoints, see Regions and endpoints.
file-name            | The name of the file in OSS.
access-key-id        | The AccessKey ID used to access OSS. For details, see Obtain an AccessKey pair.
access-key-secret    | The AccessKey secret used to access OSS. For details, see Obtain an AccessKey pair.
oss-file-path        | The OSS storage path, in the format oss://<bucket-name>/<path-to-file>. Supports wildcard characters for fuzzy matching.
file-format-name     | The file format. This example uses CSV.
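The key difference between the two syntaxes is how the OSS location is addressed: pre-22.8 engines take a separate endpoint plus an oss:// storage path, while 22.8 and later take a single virtual-hosted-style URL. A tiny sketch makes the distinction concrete (the helper names are ours, not part of any SDK):

```python
def oss_path_pre_22_8(bucket, path):
    """Pre-22.8: the endpoint is a separate argument, and the data location
    is an oss:// storage path."""
    return f"oss://{bucket}/{path}"

def oss_url_22_8(bucket, endpoint, file_name):
    """22.8 and later: one virtual-hosted-style URL,
    https://<BucketName>.<oss-endpoint>/<file-name>."""
    return f"https://{bucket}.{endpoint}/{file_name}"
```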

Example for versions earlier than 22.8

CREATE TABLE oss_test_tbl ON CLUSTER default
(
  id         UInt8,
  user_name  String,
  age        UInt16,
  city       String,
  access_url String
)
ENGINE = OSS('oss-cn-shanghai-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://testBucketName/test.csv', 'CSV');

Example for version 22.8 and later

CREATE TABLE oss_test_tbl ON CLUSTER default
(
  id         UInt8,
  user_name  String,
  age        UInt16,
  city       String,
  access_url String
)
ENGINE = OSS('http://testBucketName.oss-cn-shanghai-internal.aliyuncs.com/test.csv', 'STS.****************', 'STS.****************', 'CSV');

Load data into ClickHouse

Run an INSERT INTO ... SELECT statement to import data from the OSS external table into your distributed table:

INSERT INTO oss_test_tbl_distributed SELECT * FROM oss_test_tbl;

To import into a local table instead, replace oss_test_tbl_distributed with the local table name.

If your CSV file uses a delimiter other than a comma, specify it with format_csv_delimiter. For example, to import a pipe-delimited (|) file:

INSERT INTO oss_test_tbl_distributed SELECT * FROM oss_test_tbl SETTINGS format_csv_delimiter='|';

Method 2: oss() table function

Use this method for a one-off import. No external table is required — pass the OSS connection details directly in the INSERT INTO statement.

Syntax for versions earlier than 22.8

INSERT INTO <table_name> SELECT * FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>', '<col_name> <col_type>(,...)');

Syntax for version 22.8 and later

INSERT INTO <table_name> SELECT * FROM oss('https://<BucketName>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>', '<col_name> <col_type>(,...)');

For more information about the parameters, see the parameter descriptions.

Example for versions earlier than 22.8

INSERT INTO oss_test_tbl_distributed SELECT * FROM oss('oss-cn-shanghai-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://testBucketName/test.csv', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');

Example for version 22.8 and later

INSERT INTO oss_test_tbl_distributed SELECT * FROM oss('http://testBucketName.oss-cn-shanghai-internal.aliyuncs.com/test.csv', 'STS.****************', 'STS.****************', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');

If your CSV file uses a delimiter other than a comma, specify it with format_csv_delimiter:

INSERT INTO oss_test_tbl_distributed SELECT * FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>', '<col_name> <col_type>(,...)') SETTINGS format_csv_delimiter='|';

Step 5: Verify the import

Query the distributed table to confirm the data was imported correctly:

SELECT * FROM oss_test_tbl_distributed;

Expected output:

┌─id─┬─user_name─┬──age──┬───city─────┬─────access_url────────┐
│  1 │  yang     │   32  │  shanghai  │  http://example1.com  │
│  2 │  wang     │   22  │  beijing   │  http://example2.com  │
│  3 │  xiao     │   23  │  shenzhen  │  http://example3.com  │
│  4 │  jess     │   45  │  hangzhou  │  http://example4.com  │
│  5 │  jack     │   14  │  shanghai  │  http://example5.com  │
│  6 │  tomy     │   25  │  hangzhou  │  http://example6.com  │
│  7 │  lucy     │   45  │  shanghai  │  http://example7.com  │
│  8 │  tengyin  │   26  │  shanghai  │  http://example8.com  │
│  9 │  wangli   │   27  │  shenzhen  │  http://example9.com  │
│ 10 │  xiaohua  │   37  │  shanghai  │  http://example10.com │
└────┴───────────┴───────┴────────────┴───────────────────────┘

Match multiple OSS files with wildcards

When OSS stores many small files that follow a consistent naming pattern, use wildcard characters in the oss-file-path parameter to match them in a single import.

Wildcard     | Matches                        | Example
*            | Any file name or folder name   | /dir/* matches all files in /dir
{x,y,z}      | Any of the listed values       | file_{x,y,z} matches file_x, file_y, or file_z
{num1..num2} | Any value in the numeric range | file_{1..3} matches file_1, file_2, or file_3
?            | Any single character           | file_? matches file_a, file_b, or file_c

Example

Given the following folder structure in OSS:

oss://testBucketName/
  doc-data/
    oss-import/
      small_files/
        access_log_csv_1.txt
        access_log_csv_2.txt
        access_log_csv_3.txt

The following oss-file-path values all match these files:

  • oss://testBucketName/doc-data/oss-import/small_files/*

  • oss://testBucketName/doc-data/oss-import/small_files/access*

  • oss://testBucketName/doc-data/oss-import/small_files/access_log_csv_{1,2,3}.txt

  • oss://testBucketName/doc-data/oss-import/*/access_log_csv_{1,2,3}.txt

  • oss://testBucketName/doc-data/oss-import/*/*

  • oss://testBucketName/doc-data/oss-import/*/access_log_csv_{1..3}.txt

  • oss://testBucketName/doc-data/oss-import/*/access_log_csv_?.txt
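To see why each of these paths matches, the wildcard rules above can be approximated in a short Python sketch. This is a best-effort re-implementation for illustration, not the actual OSS matcher; `*` is treated as matching within a single path segment, consistent with the examples above.

```python
import re

def oss_glob_to_regex(pattern):
    """Translate the OSS wildcard syntax (*, ?, {x,y,z}, {num1..num2})
    into a regular expression; every other character is matched literally."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            out.append("[^/]*")          # any file or folder name (one segment)
        elif c == "?":
            out.append("[^/]")           # any single character
        elif c == "{":
            j = pattern.index("}", i)
            body = pattern[i + 1 : j]
            if ".." in body:             # {num1..num2}: numeric range
                lo, hi = body.split("..")
                out.append("(" + "|".join(str(n) for n in range(int(lo), int(hi) + 1)) + ")")
            else:                        # {x,y,z}: listed alternatives
                out.append("(" + "|".join(re.escape(p) for p in body.split(",")) + ")")
            i = j
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)

files = [
    "oss://testBucketName/doc-data/oss-import/small_files/access_log_csv_1.txt",
    "oss://testBucketName/doc-data/oss-import/small_files/access_log_csv_2.txt",
    "oss://testBucketName/doc-data/oss-import/small_files/access_log_csv_3.txt",
]
pattern = "oss://testBucketName/doc-data/oss-import/*/access_log_csv_{1..3}.txt"
matched = [f for f in files if re.fullmatch(oss_glob_to_regex(pattern), f)]
```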