ApsaraDB for ClickHouse: Use an external table to access OSS data

Last Updated: Apr 21, 2025

This topic describes how to query Object Storage Service (OSS) data from an ApsaraDB for ClickHouse cluster and how to write or export data from a cluster to OSS.

Prerequisites

  • OSS requirements:

    • OSS is activated. For more information, see Activate OSS.

    • A Resource Access Management (RAM) user that is used to access OSS is granted read and write permissions on OSS objects. For more information, see Access control.

  • Cluster requirements:

    A database account is created. For more information, see Account Management.

Preparations

  1. Create an OSS bucket. For more information, see Create buckets in the console.

    Important

    Make sure that the OSS bucket is in the same region as your ApsaraDB for ClickHouse cluster.

    In this topic, a bucket named ck-test-oss is used as an example.

  2. Upload an object to OSS.

    Files uploaded to OSS are referred to as objects. In this topic, the test.csv object is used as an example. Before you access the OSS bucket from ApsaraDB for ClickHouse, upload the object to OSS. For more information, see Upload objects in the console.

    If an object already exists in OSS, make sure that the object format is supported by ApsaraDB for ClickHouse. ApsaraDB for ClickHouse supports all object formats except Protobuf and CapnProto. For more information, see File formats supported by ClickHouse.
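    The examples in this topic assume that test.csv is a headerless CSV object whose columns match the table schema used in the following sections (id, user_name, age, city, and access_url). For example, a test.csv object that is consistent with the query results shown later in this topic could contain the following row:

    1,tick,32,shanghai,http://example.com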

Query OSS data

Method 1: Use an OSS external table

Enterprise Edition or Community-compatible Edition V22.8 or later

  1. Log on to the database.

    1. Log on to the ApsaraDB for ClickHouse console.

    2. On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.

    3. In the upper-right corner of the Cluster Information page, click Log On to Database.

    4. In the Log On to Database Instance dialog box, enter the database username and password and click Login.

  2. Create an OSS external table.

    An OSS external table is a special table type supported by ApsaraDB for ClickHouse. You can use an external table to query objects stored in OSS without importing the data into the cluster's local storage, which separates storage and computing.

    Sample statement:

    CREATE TABLE <table_name> [ON CLUSTER default]
    (
        col_name1 col_type1,
        col_name2 col_type2,
        ...
    )
    ENGINE = OSS('https://<bucket-name>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');

    The following list describes the parameters.

      • table_name: The name of the table. Example: oss_test_tb.

      • col_name1, col_name2: The names of the columns. Example: user_name.

      • col_type1, col_type2: The types of the columns. Important: The schema of the OSS external table must match the OSS data. Example: String.

      • bucket-name: The name of the OSS bucket. Example: ck-test-oss.

      • oss-endpoint: The internal endpoint that is used to access the OSS bucket from an Elastic Compute Service (ECS) instance over a virtual private cloud (VPC). For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. Example: oss-cn-hangzhou-internal.aliyuncs.com.

      • file-name: The name of the object in the OSS bucket. Example: test.csv.

      • access-key-id: The AccessKey ID of the RAM user that is used to access OSS data. Example: LTAI****************.

      • access-key-secret: The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. Example: yourAccessKeySecret.

      • file-format-name: The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of the format name. Example: CSV.

    Sample statement:

    CREATE TABLE oss_test_tb ON CLUSTER default
    (
        id UInt8,
        user_name String,
        age UInt16,
        city String,
        access_url String
    )
    ENGINE = OSS('http://ck-test-oss.oss-cn-hangzhou-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret', 'CSV');
  3. Query OSS data.

    SELECT * FROM oss_test_tb;

    Sample result:

    +--------------+---------------------+---------------+----------------+----------------------+
    | id           | user_name           | age           | city           | access_url           |
    +--------------+---------------------+---------------+----------------+----------------------+
    | 1            | tick                | 32            | shanghai       | http://example.com   |
    +--------------+---------------------+---------------+----------------+----------------------+
    Returned rows: [1], Time elapsed: [183ms]
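
    An OSS external table can be queried in the same way as a regular table. For example, the following statement, a minimal sketch based on the same sample table, aggregates the OSS data by city without copying it into the cluster:

    -- Count rows per city, reading the data directly from OSS.
    SELECT city, count() AS user_count
    FROM oss_test_tb
    GROUP BY city;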

Community-compatible Edition V21.8 or earlier

  1. Log on to the database.

    1. Log on to the ApsaraDB for ClickHouse console.

    2. On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.

    3. In the upper-right corner of the Cluster Information page, click Log On to Database.

    4. In the Log On to Database Instance dialog box, enter the database username and password and click Login.

  2. Create an OSS external table.

    Sample statement:

    CREATE TABLE <table_name> [ON CLUSTER default]
    (
        col_name1 col_type1,
        col_name2 col_type2,
        ...
    )
    ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');

    The following list describes the parameters.

      • table_name: The name of the table. Example: oss_test_tb.

      • col_name1, col_name2: The names of the columns. Example: user_name.

      • col_type1, col_type2: The types of the columns. Important: The schema of the OSS external table must match the OSS data. Example: String.

      • oss-endpoint: The internal endpoint that is used to access the OSS bucket from an ECS instance over a VPC. For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. Example: oss-cn-hangzhou-internal.aliyuncs.com.

      • access-key-id: The AccessKey ID of the RAM user that is used to access OSS data. Example: LTAI****************.

      • access-key-secret: The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. Example: yourAccessKeySecret.

      • oss-file-path: The storage path of the object. In most cases, the path is in the oss://<bucket-name>/<path-to-file> format. Note: The oss-file-path parameter supports fuzzy matching with wildcard characters. For more information, see Use wildcard characters to fuzzy match the storage path of OSS, and see the wildcard example after the sample statement below. Example: oss://ck-test-oss/test.csv.

      • file-format-name: The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of the format name. Example: CSV.

    Sample statement:

    CREATE TABLE oss_test_tb ON CLUSTER default
    (
        id UInt8,
        user_name String,
        age UInt16,
        city String,
        access_url String
    )
    ENGINE = OSS('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV');
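
    If the OSS path contains multiple objects of the same format, you can use wildcard characters in the oss-file-path parameter, as noted in the parameter description above. The following statement is a minimal sketch that assumes the bucket contains several CSV objects whose names start with test and that share the same schema; the supported wildcard characters are described in the linked topic. The table name oss_csv_all is a hypothetical example.

    -- Hypothetical table that reads every object matching test*.csv in the bucket root.
    CREATE TABLE oss_csv_all ON CLUSTER default
    (
        id UInt8,
        user_name String,
        age UInt16,
        city String,
        access_url String
    )
    ENGINE = OSS('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test*.csv', 'CSV');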
  3. Query OSS data.

    SELECT * FROM oss_test_tb;

    Sample result:

    +--------------+---------------------+---------------+----------------+----------------------+
    | id           | user_name           | age           | city           | access_url           |
    +--------------+---------------------+---------------+----------------+----------------------+
    | 1            | tick                | 32            | shanghai       | http://example.com   |
    +--------------+---------------------+---------------+----------------+----------------------+
    Returned rows: [1], Time elapsed: [183ms]

Method 2: Use table functions

Enterprise Edition or Community-compatible Edition V22.8 or later

  1. Log on to the database.

    1. Log on to the ApsaraDB for ClickHouse console.

    2. On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.

    3. In the upper-right corner of the Cluster Information page, click Log On to Database.

    4. In the Log On to Database Instance dialog box, enter the database username and password and click Login.

  2. Use a table function to query data.

    Sample statement:

    SELECT * FROM oss('https://<bucket-name>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>', '<col_name> <col_type>(,...)');

    The parameters in the preceding sample statement are the same as the parameters used for creating an OSS external table. For more information, see Parameter description.

    Sample statement:

    SELECT * FROM oss('http://ck-test-oss.oss-cn-hangzhou-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');

    Sample result:

    +--------------+---------------------+---------------+----------------+----------------------+
    | id           | user_name           | age           | city           | access_url           |
    +--------------+---------------------+---------------+----------------+----------------------+
    | 1            | tick                | 32            | shanghai       | http://example.com   |
    +--------------+---------------------+---------------+----------------+----------------------+
    Returned rows: [1], Time elapsed: [183ms]

Community-compatible Edition V21.8 or earlier

  1. Log on to the database.

    1. Log on to the ApsaraDB for ClickHouse console.

    2. On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.

    3. In the upper-right corner of the Cluster Information page, click Log On to Database.

    4. In the Log On to Database Instance dialog box, enter the database username and password and click Login.

  2. Use a table function to query data.

    Sample statement:

    SELECT * FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>', '<col_name> <col_type>(,...)');

    The parameters in the preceding sample statement are the same as the parameters used for creating an OSS external table. For more information, see Parameter description.

    Sample statement:

    SELECT * FROM oss('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');

    Sample result:

    +--------------+---------------------+---------------+----------------+----------------------+
    | id           | user_name           | age           | city           | access_url           |
    +--------------+---------------------+---------------+----------------+----------------------+
    | 1            | tick                | 32            | shanghai       | http://example.com   |
    +--------------+---------------------+---------------+----------------+----------------------+
    Returned rows: [1], Time elapsed: [183ms]

Export or write data to OSS

Usage notes

Enterprise Edition or Community-compatible Edition V22.8 or later

When you export or write data from an ApsaraDB for ClickHouse cluster to OSS, you must set the s3_truncate_on_insert or s3_create_new_file_on_insert parameter to 1. If an object already exists in the destination path and you do not configure either parameter, or you set both parameters to 0, the export or write operation fails.

  • s3_truncate_on_insert=1: ApsaraDB for ClickHouse checks whether an object exists in the destination path when you export or write data to OSS.

    • If an object exists, ApsaraDB for ClickHouse overwrites the existing object with the new data.

    • This setting is suitable for full overwrite during data exports or writes. However, existing data in OSS will be overwritten. Proceed with caution.

  • s3_create_new_file_on_insert=1: ApsaraDB for ClickHouse checks whether an object exists in the destination path when you export or write data to OSS.

    • If an object exists, ApsaraDB for ClickHouse creates a new object in the destination path and writes the data to the new object. The new object name consists of the name of the existing object, a sequence number, and the file name extension. The sequence number starts from 1 and increments by 1 each time a new object is created.

      For example, if an object named test.csv exists in the destination path that is specified when you create the OSS external table, the first export or write operation creates an object named test1.csv, and the second operation creates an object named test2.csv.

    • This method is suitable for incremental exports or writes, where existing data remains unchanged.

Community-compatible Edition V21.8 or earlier

When you export or write data from an ApsaraDB for ClickHouse cluster to OSS, no new objects are created and the existing OSS object is overwritten with the new data. Proceed with caution.

Step 1: Log on to the database

  1. Log on to the ApsaraDB for ClickHouse console.

  2. On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.

  3. In the upper-right corner of the Cluster Information page, click Log On to Database.

  4. In the Log On to Database Instance dialog box, enter the database username and password and click Login.

Step 2: (Optional) Prepare the data to be exported

If you want to export data from the cluster to OSS but no data exists in the cluster, you can perform the following steps to prepare data.

If the cluster already contains data, you can skip this step.

  1. Create a table based on the cluster edition.

    In this example, a single-replica cluster that runs Community-compatible Edition is used. If your cluster is a double-replica cluster that runs Enterprise Edition or Community-compatible Edition, you can follow the instructions described in Create a table.

    CREATE TABLE test_tb_local ON CLUSTER default
    (
        id UInt8,
        user_name String,
        age UInt16,
        city String,
        access_url String
    )
    ENGINE = MergeTree()
    ORDER BY id;
  2. Write sample data to the table.

    INSERT INTO test_tb_local VALUES (13, 'tick', 25, 'shanghai', 'http://example.com');

  3. (Optional) View the data.

    Execute the following statement to check whether the data is written:

    SELECT * FROM test_tb_local;
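
    If only the sample row from the previous step has been inserted, the result is similar to the following output:

    +--------------+---------------------+---------------+----------------+----------------------+
    | id           | user_name           | age           | city           | access_url           |
    +--------------+---------------------+---------------+----------------+----------------------+
    | 13           | tick                | 25            | shanghai       | http://example.com   |
    +--------------+---------------------+---------------+----------------+----------------------+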

Step 3: Create an OSS external table

Enterprise Edition or Community-compatible Edition V22.8 or later

Sample statement:

CREATE TABLE <table_name> [ON CLUSTER default]
(
    col_name1 col_type1,
    col_name2 col_type2,
    ...
)
ENGINE = OSS('https://<bucket-name>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');

The following list describes the parameters.

  • table_name: The name of the table. Example: oss_test_tb.

  • col_name1, col_name2: The names of the columns. Example: user_name.

  • col_type1, col_type2: The types of the columns. Important: The schema of the OSS external table must match the OSS data. Example: String.

  • bucket-name: The name of the OSS bucket. Example: ck-test-oss.

  • oss-endpoint: The internal endpoint that is used to access the OSS bucket from an Elastic Compute Service (ECS) instance over a virtual private cloud (VPC). For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. Example: oss-cn-hangzhou-internal.aliyuncs.com.

  • file-name: The name of the object in the OSS bucket. Example: test.csv.

  • access-key-id: The AccessKey ID of the RAM user that is used to access OSS data. Example: LTAI****************.

  • access-key-secret: The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. Example: yourAccessKeySecret.

  • file-format-name: The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of the format name. Example: CSV.

Sample statement:

CREATE TABLE oss_test_tb ON CLUSTER default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = OSS('http://ck-test-oss.oss-cn-hangzhou-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret', 'CSV');

Community-compatible Edition V21.8 or earlier

Sample statement:

CREATE TABLE <table_name> [ON CLUSTER default]
(
    col_name1 col_type1,
    col_name2 col_type2,
    ...
)
ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');

The following list describes the parameters.

  • table_name: The name of the table. Example: oss_test_tb.

  • col_name1, col_name2: The names of the columns. Example: user_name.

  • col_type1, col_type2: The types of the columns. Important: The schema of the OSS external table must match the OSS data. Example: String.

  • oss-endpoint: The internal endpoint that is used to access the OSS bucket from an ECS instance over a VPC. For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. Example: oss-cn-hangzhou-internal.aliyuncs.com.

  • access-key-id: The AccessKey ID of the RAM user that is used to access OSS data. Example: LTAI****************.

  • access-key-secret: The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. Example: yourAccessKeySecret.

  • oss-file-path: The storage path of the object. In most cases, the path is in the oss://<bucket-name>/<path-to-file> format. Note: The oss-file-path parameter supports fuzzy matching with wildcard characters. For more information, see Use wildcard characters to fuzzy match the storage path of OSS. Example: oss://ck-test-oss/test.csv.

  • file-format-name: The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of the format name. Example: CSV.

Sample statement:

CREATE TABLE oss_test_tb ON CLUSTER default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = OSS('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV');

Step 4: Export or write data to OSS

To export or write data to OSS, use the OSS external table together with the INSERT INTO statement. For more information, see INSERT INTO. The following sections provide sample statements for common scenarios.

Export data to OSS

Enterprise Edition or Community-compatible Edition V22.8 or later

  • Full data overwrite in exports: export data to OSS and overwrite existing data.

    INSERT INTO oss_test_tb SETTINGS s3_truncate_on_insert = 1 SELECT * FROM test_tb_local;
  • Incremental exports: export data to OSS without overwriting existing data.

    INSERT INTO oss_test_tb SETTINGS s3_create_new_file_on_insert = 1 SELECT * FROM test_tb_local;

    After this operation is complete, a new object is created in the OSS bucket. The new object name consists of the name of the existing object, a sequence number, and the file name extension. The sequence number starts from 1 and increments by 1 each time a new object is created.

    For example, if an object named test.csv exists in the destination path that is specified when you create the OSS external table, the first export or write operation creates an object named test1.csv, and the second operation creates an object named test2.csv. You can view the new object in the OSS console.

Community-compatible Edition V21.8 or earlier

Important

When you export data from the ApsaraDB for ClickHouse cluster to OSS, existing data in OSS is overwritten. Proceed with caution.

INSERT INTO oss_test_tb SELECT * FROM test_tb_local;

Write data to OSS

Enterprise Edition or Community-compatible Edition V22.8 or later

  • Write new data to OSS and overwrite existing data.

    INSERT INTO oss_test_tb SETTINGS s3_truncate_on_insert = 1 VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');
  • Write new data to OSS without overwriting existing data.

    INSERT INTO oss_test_tb SETTINGS s3_create_new_file_on_insert = 1 VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');

    After this operation is complete, a new object is created in the OSS bucket. The new object name consists of the name of the existing object, a sequence number, and the file name extension. The sequence number starts from 1 and increments by 1 each time a new object is created.

    For example, if an object named test.csv exists in the destination path that is specified when you create the OSS external table, the first export or write operation creates an object named test1.csv, and the second operation creates an object named test2.csv. You can view the new object in the OSS console.

Community-compatible Edition V21.8 or earlier

Important

When you write data from the ApsaraDB for ClickHouse cluster to OSS, existing data in OSS is overwritten. Proceed with caution.

INSERT INTO oss_test_tb VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');

More operations: import data from OSS to ApsaraDB for ClickHouse

If you want to use OSS as intermediate storage to migrate data to an ApsaraDB for ClickHouse cluster, import the data from OSS to the cluster. For more information, see Import data from OSS.
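
For example, after you create an OSS external table such as oss_test_tb and a destination table such as test_tb_local as described in this topic, a minimal import is an INSERT INTO ... SELECT statement that reads from the external table and writes to the local table. The following statement is a sketch that assumes both tables already exist and share the same schema.

-- Read all rows from the OSS object through the external table and write them to the local table.
INSERT INTO test_tb_local SELECT * FROM oss_test_tb;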