
ApsaraDB for ClickHouse:Import from OSS

Last Updated: Mar 28, 2026

Import data from Object Storage Service (OSS) into ApsaraDB for ClickHouse to run log queries, analytics, and data processing workloads. Two import methods are supported: the OSS table engine for persistent access to a fixed OSS data source, and the oss() table function for one-off or ad hoc imports.

Prerequisites

Before you begin, ensure that you have:

  • An ApsaraDB for ClickHouse cluster and an OSS bucket in the same region. Imports access OSS through the internal VPC endpoint, so the bucket and cluster must be co-located.

  • An AccessKey pair (AccessKey ID and AccessKey secret) with permission to read the OSS bucket. For details, see Obtain an AccessKey pair.

Choose a method

Method                 | Best for
OSS table engine       | A fixed OSS data source you query repeatedly. Define it once as an external table and reuse it.
`oss()` table function | A one-off or ad hoc import. No external table required — OSS connection details are passed directly in the INSERT INTO statement.

Both methods use version-specific syntax. Check your cluster version in the Cluster Properties section on the Cluster Information page before running any commands.

Prepare test data

Save the following data as test.csv and upload it to your OSS bucket. ApsaraDB for ClickHouse uses a comma (,) as the default column delimiter.

1,yang,32,shanghai,http://example1.com
2,wang,22,beijing,http://example2.com
3,xiao,23,shenzhen,http://example3.com
4,jess,45,hangzhou,http://example4.com
5,jack,14,shanghai,http://example5.com
6,tomy,25,hangzhou,http://example6.com
7,lucy,45,shanghai,http://example7.com
8,tengyin,26,shanghai,http://example8.com
9,wangli,27,shenzhen,http://example9.com
10,xiaohua,37,shanghai,http://example10.com
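If you prefer to generate and upload the file programmatically, the following Python sketch writes test.csv and, when credentials are supplied, uploads it with the official OSS Python SDK (oss2). The bucket name and endpoint are the placeholder values used elsewhere in this guide, not real resources.

```python
import csv
import os

# The ten sample rows from this guide: id, user_name, age, city, access_url.
rows = [
    (1, "yang", 32, "shanghai", "http://example1.com"),
    (2, "wang", 22, "beijing", "http://example2.com"),
    (3, "xiao", 23, "shenzhen", "http://example3.com"),
    (4, "jess", 45, "hangzhou", "http://example4.com"),
    (5, "jack", 14, "shanghai", "http://example5.com"),
    (6, "tomy", 25, "hangzhou", "http://example6.com"),
    (7, "lucy", 45, "shanghai", "http://example7.com"),
    (8, "tengyin", 26, "shanghai", "http://example8.com"),
    (9, "wangli", 27, "shenzhen", "http://example9.com"),
    (10, "xiaohua", 37, "shanghai", "http://example10.com"),
]

# Write test.csv with a comma delimiter, the ApsaraDB for ClickHouse default.
with open("test.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

def upload_to_oss(endpoint, bucket_name, access_key_id, access_key_secret):
    """Upload test.csv to OSS using the oss2 SDK (pip install oss2)."""
    import oss2  # imported lazily so writing the CSV works without the SDK
    auth = oss2.Auth(access_key_id, access_key_secret)
    bucket = oss2.Bucket(auth, endpoint, bucket_name)
    with open("test.csv", "rb") as f:
        bucket.put_object("test.csv", f)

# Only attempt the upload when real credentials are present in the environment.
if os.environ.get("OSS_ACCESS_KEY_ID"):
    upload_to_oss(
        "https://oss-cn-shanghai.aliyuncs.com",  # public endpoint, for uploading from outside the VPC
        "testBucketName",                        # placeholder bucket name from this guide
        os.environ["OSS_ACCESS_KEY_ID"],
        os.environ["OSS_ACCESS_KEY_SECRET"],
    )
```

You can also upload test.csv through the OSS console or ossutil; the import steps below are the same either way.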

Import data from OSS

Step 1: Connect to the cluster

Connect to your ApsaraDB for ClickHouse cluster.

Step 2: Create a local table

Create the local table oss_test_tbl_local. Select the CREATE TABLE statement that matches your cluster's replica configuration, which you can view in the Cluster Properties section on the Cluster Information page.

Important
  • The schema of your ApsaraDB for ClickHouse table must match the schema of the OSS data. Pay special attention to nullable fields — mismatches cause data parsing failures and cluster exceptions.
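The schema-match requirement can be turned into a quick pre-flight check before importing. The following Python sketch is ours, not part of ApsaraDB for ClickHouse; the validators mirror the example table in this guide (UInt8 fits 0..255, UInt16 fits 0..65535, String accepts anything).

```python
# Declared schema of oss_test_tbl_local: (column name, validator predicate).
SCHEMA = [
    ("id",         lambda v: v.isdigit() and 0 <= int(v) <= 255),    # UInt8
    ("user_name",  lambda v: True),                                  # String
    ("age",        lambda v: v.isdigit() and 0 <= int(v) <= 65535),  # UInt16
    ("city",       lambda v: True),                                  # String
    ("access_url", lambda v: True),                                  # String
]

def check_rows(rows):
    """Return (line_number, column, value) tuples for values that would not
    parse into the non-nullable schema above. Feed it rows from csv.reader."""
    problems = []
    for lineno, row in enumerate(rows, start=1):
        if len(row) != len(SCHEMA):
            problems.append((lineno, "<column count>", len(row)))
            continue
        for (name, ok), value in zip(SCHEMA, row):
            if not ok(value):
                problems.append((lineno, name, value))
    return problems

# Row 2 has an id out of UInt8 range and an age out of UInt16 range.
problems = check_rows([
    ["1", "yang", "32", "shanghai", "http://example1.com"],
    ["300", "bad", "70000", "nowhere", "http://example.com"],
])
```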

Single-replica cluster

CREATE TABLE oss_test_tbl_local ON CLUSTER default
(
  id         UInt8,
  user_name  String,
  age        UInt16,
  city       String,
  access_url String
)
ENGINE = MergeTree()
ORDER BY id;

Dual-replica cluster

CREATE TABLE oss_test_tbl_local ON CLUSTER default
(
  id         UInt8,
  user_name  String,
  age        UInt16,
  city       String,
  access_url String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{database}/{table}/{shard}', '{replica}')
ORDER BY id;

For more information about table engines, see Table engines.

Step 3: Create a distributed table (optional)

Create the distributed table oss_test_tbl_distributed if you want to distribute imported data across all local tables simultaneously. Skip this step if you only need the data in a single local table.

CREATE TABLE oss_test_tbl_distributed ON CLUSTER default
(
  id         UInt8,
  user_name  String,
  age        UInt16,
  city       String,
  access_url String
)
ENGINE = Distributed(default, default, oss_test_tbl_local, rand());

Step 4: Import data

Choose the method that fits your use case.

Method 1: OSS table engine

Use this method when you need to repeatedly query or import from the same OSS data source. Define the external table once; subsequent imports only require an INSERT INTO ... SELECT statement.

Create an OSS external table

The syntax differs by cluster version.

Syntax for versions earlier than 22.8

CREATE TABLE <table_name> [ON CLUSTER cluster]
(
  <col_name1> <col_type1>,
  <col_name2> <col_type2>,
  ...
)
ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');

Syntax for version 22.8 and later

CREATE TABLE <table_name> [ON CLUSTER cluster]
(
  <col_name1> <col_type1>,
  <col_name2> <col_type2>,
  ...
)
ENGINE = OSS('https://<BucketName>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');

Parameters

Parameter            | Description
table_name           | The name of the table.
ON CLUSTER cluster   | Creates the table on each node. The value is fixed as ON CLUSTER default.
col_name1, col_name2 | Column names.
col_type1, col_type2 | Column data types. The schema must match the data in OSS.
BucketName           | The name of the OSS bucket.
oss-endpoint         | The internal VPC endpoint for OSS. The bucket and cluster must be in the same region. Access OSS through Alibaba Cloud VPC. For available endpoints, see Regions and endpoints.
file-name            | The name of the file in OSS.
access-key-id        | The AccessKey ID used to access OSS. For details, see Obtain an AccessKey pair.
access-key-secret    | The AccessKey secret used to access OSS. For details, see Obtain an AccessKey pair.
oss-file-path        | The OSS storage path, in the format oss://<bucket-name>/<path-to-file>. Supports wildcard characters for fuzzy matching.
file-format-name     | The file format. This example uses CSV.
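The key difference between the two syntaxes is how the OSS location is addressed: pre-22.8 engines take a separate endpoint plus an oss:// storage path, while 22.8 and later take a single virtual-hosted-style URL. A tiny sketch makes the distinction concrete (the helper names are ours, not part of any SDK):

```python
def oss_path_pre_22_8(bucket, path):
    """Pre-22.8: the endpoint is a separate argument, and the data location
    is an oss:// storage path."""
    return f"oss://{bucket}/{path}"

def oss_url_22_8(bucket, endpoint, file_name):
    """22.8 and later: one virtual-hosted-style URL,
    https://<BucketName>.<oss-endpoint>/<file-name>."""
    return f"https://{bucket}.{endpoint}/{file_name}"
```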

Example for versions earlier than 22.8

CREATE TABLE oss_test_tbl ON CLUSTER default
(
  id         UInt8,
  user_name  String,
  age        UInt16,
  city       String,
  access_url String
)
ENGINE = OSS('oss-cn-shanghai-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://testBucketName/test.csv', 'CSV');

Example for version 22.8 and later

CREATE TABLE oss_test_tbl ON CLUSTER default
(
  id         UInt8,
  user_name  String,
  age        UInt16,
  city       String,
  access_url String
)
ENGINE = OSS('http://testBucketName.oss-cn-shanghai-internal.aliyuncs.com/test.csv', 'STS.****************', 'STS.****************', 'CSV');

Load data into ClickHouse

Run an INSERT INTO ... SELECT statement to import data from the OSS external table into your distributed table:

INSERT INTO oss_test_tbl_distributed SELECT * FROM oss_test_tbl;

To import into a local table instead, replace oss_test_tbl_distributed with the local table name.

If your CSV file uses a delimiter other than a comma, specify it with format_csv_delimiter. For example, to import a pipe-delimited (|) file:

INSERT INTO oss_test_tbl_distributed SELECT * FROM oss_test_tbl SETTINGS format_csv_delimiter='|';

Method 2: oss() table function

Use this method for a one-off import. No external table is required — pass the OSS connection details directly in the INSERT INTO statement.

Syntax for versions earlier than 22.8

INSERT INTO <table_name> SELECT * FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>', '<col_name> <col_type>(,...)');

Syntax for version 22.8 and later

INSERT INTO <table_name> SELECT * FROM oss('https://<BucketName>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>', '<col_name> <col_type>(,...)');

For more information about the parameters, see the parameter descriptions.

Example for versions earlier than 22.8

INSERT INTO oss_test_tbl_distributed SELECT * FROM oss('oss-cn-shanghai-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://testBucketName/test.csv', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');

Example for version 22.8 and later

INSERT INTO oss_test_tbl_distributed SELECT * FROM oss('http://testBucketName.oss-cn-shanghai-internal.aliyuncs.com/test.csv', 'STS.****************', 'STS.****************', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');

If your CSV file uses a delimiter other than a comma, specify it with format_csv_delimiter:

INSERT INTO oss_test_tbl_distributed SELECT * FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>', '<col_name> <col_type>(,...)') SETTINGS format_csv_delimiter='|';

Step 5: Verify the import

Query the distributed table to confirm the data was imported correctly:

SELECT * FROM oss_test_tbl_distributed;

Expected output:

┌─id─┬─user_name─┬──age──┬───city─────┬─────access_url────────┐
│  1 │  yang     │   32  │  shanghai  │  http://example1.com  │
│  2 │  wang     │   22  │  beijing   │  http://example2.com  │
│  3 │  xiao     │   23  │  shenzhen  │  http://example3.com  │
│  4 │  jess     │   45  │  hangzhou  │  http://example4.com  │
│  5 │  jack     │   14  │  shanghai  │  http://example5.com  │
│  6 │  tomy     │   25  │  hangzhou  │  http://example6.com  │
│  7 │  lucy     │   45  │  shanghai  │  http://example7.com  │
│  8 │  tengyin  │   26  │  shanghai  │  http://example8.com  │
│  9 │  wangli   │   27  │  shenzhen  │  http://example9.com  │
│ 10 │  xiaohua  │   37  │  shanghai  │  http://example10.com │
└────┴───────────┴───────┴────────────┴───────────────────────┘

Match multiple OSS files with wildcards

When OSS stores many small files that follow a consistent naming pattern, use wildcard characters in the oss-file-path parameter to match them in a single import.

Wildcard     | Matches                        | Example
*            | Any file name or folder name   | /dir/* matches all files in /dir
{x,y,z}      | Any of the listed values       | file_{x,y,z} matches file_x, file_y, or file_z
{num1..num2} | Any value in the numeric range | file_{1..3} matches file_1, file_2, or file_3
?            | Any single character           | file_? matches file_a, file_b, or file_c

Example

Given the following folder structure in OSS:

oss://testBucketName/
  doc-data/
    oss-import/
      small_files/
        access_log_csv_1.txt
        access_log_csv_2.txt
        access_log_csv_3.txt

The following oss-file-path values all match these files:

  • oss://testBucketName/doc-data/oss-import/small_files/*

  • oss://testBucketName/doc-data/oss-import/small_files/access*

  • oss://testBucketName/doc-data/oss-import/small_files/access_log_csv_{1,2,3}.txt

  • oss://testBucketName/doc-data/oss-import/*/access_log_csv_{1,2,3}.txt

  • oss://testBucketName/doc-data/oss-import/*/*

  • oss://testBucketName/doc-data/oss-import/*/access_log_csv_{1..3}.txt

  • oss://testBucketName/doc-data/oss-import/*/access_log_csv_?.txt
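To see why each of these paths matches, the wildcard rules above can be approximated in a short Python sketch. This is a best-effort re-implementation for illustration, not the actual OSS matcher; `*` is treated as matching within a single path segment, consistent with the examples above.

```python
import re

def oss_glob_to_regex(pattern):
    """Translate the OSS wildcard syntax (*, ?, {x,y,z}, {num1..num2})
    into a regular expression; every other character is matched literally."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "*":
            out.append("[^/]*")          # any file or folder name (one segment)
        elif c == "?":
            out.append("[^/]")           # any single character
        elif c == "{":
            j = pattern.index("}", i)
            body = pattern[i + 1 : j]
            if ".." in body:             # {num1..num2}: numeric range
                lo, hi = body.split("..")
                out.append("(" + "|".join(str(n) for n in range(int(lo), int(hi) + 1)) + ")")
            else:                        # {x,y,z}: listed alternatives
                out.append("(" + "|".join(re.escape(p) for p in body.split(",")) + ")")
            i = j
        else:
            out.append(re.escape(c))
        i += 1
    return "".join(out)

files = [
    "oss://testBucketName/doc-data/oss-import/small_files/access_log_csv_1.txt",
    "oss://testBucketName/doc-data/oss-import/small_files/access_log_csv_2.txt",
    "oss://testBucketName/doc-data/oss-import/small_files/access_log_csv_3.txt",
]
pattern = "oss://testBucketName/doc-data/oss-import/*/access_log_csv_{1..3}.txt"
matched = [f for f in files if re.fullmatch(oss_glob_to_regex(pattern), f)]
```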