This topic describes how to query Object Storage Service (OSS) data from an ApsaraDB for ClickHouse cluster and how to write or export data from a cluster to OSS.
Prerequisites
OSS requirements:
OSS is activated. For more information, see Activate OSS.
A Resource Access Management (RAM) user that is used to access OSS is granted the read and write permissions on OSS objects. For more information, see Access control.
Cluster requirements:
A database account is created. For more information, see Account Management.
Preparations
An OSS bucket is created. For more information, see Create buckets in the console.
Important: Make sure that the OSS bucket is in the same region as your ApsaraDB for ClickHouse cluster.
In this topic, a bucket named ck-test-oss is used as an example.
Upload an object to OSS.
Files uploaded to OSS are referred to as objects. The test.csv file is used as an example. Before you access the OSS bucket, you must upload the object to OSS. For more information, see Upload objects in the console.
If an object already exists in OSS, make sure that the object format is supported by ApsaraDB for ClickHouse. ApsaraDB for ClickHouse supports all object formats except Protobuf and CapnProto. For more information, see File formats supported by ClickHouse.
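The remainder of this topic assumes that test.csv contains comma-separated rows that match the table schema used in the examples (id, user_name, age, city, and access_url). A minimal object with that layout might look like the following (sample data, not required content):

```
1,tick,32,shanghai,http://example.com
```

If your object uses a different column order or delimiter, adjust the external table schema and the format name accordingly.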
Query OSS data
Method 1: Use an OSS external table
Enterprise Edition or Community-compatible Edition V22.8 or later
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.
In the upper-right corner of the Cluster Information page, click Log On to Database.
In the Log On to Database Instance dialog box, enter the database username and password and click Login.
Create an OSS external table.
An OSS external table is a special table type supported by ApsaraDB for ClickHouse. You can use an external table to query the objects stored in OSS without the need to export data to the local storage, achieving separation of storage and computing.
Sample statement:
CREATE TABLE <table_name> [ON cluster default]
(
    col_name1 col_type1,
    col_name2 col_type2,
    ...
)
ENGINE = OSS('https://<bucket-name>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');
The following table describes the parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| table_name | The name of the table. | oss_test_tb |
| col_name1,col_name2 | The names of the columns. | user_name |
| col_type1,col_type2 | The types of the columns. Important: The schema of the OSS external table must match the OSS data. | String |
| bucket-name | The name of the OSS bucket. | ck-test-oss |
| oss-endpoint | The internal endpoint that is used to access the OSS bucket from an Elastic Compute Service (ECS) instance over a virtual private cloud (VPC). For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. | oss-cn-hangzhou-internal.aliyuncs.com |
| file-name | The name of the object in the OSS bucket. | test.csv |
| access-key-id | The AccessKey ID of the RAM user that is used to access OSS data. | LTAI**************** |
| access-key-secret | The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. | yourAccessKeySecret |
| file-format-name | The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of a format name. | CSV |
Sample statement:
CREATE TABLE oss_test_tb ON cluster default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = OSS('http://ck-test-oss.oss-cn-hangzhou-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret', 'CSV');
Query OSS data.
SELECT * FROM oss_test_tb;
Sample result:
+----+-----------+-----+----------+--------------------+
| id | user_name | age | city     | access_url         |
+----+-----------+-----+----------+--------------------+
|  1 | tick      |  32 | shanghai | http://example.com |
+----+-----------+-----+----------+--------------------+
Returned rows: [1], Time elapsed: [183ms]
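An OSS external table can be queried like any other ApsaraDB for ClickHouse table. The following is a minimal sketch that applies a filter and an aggregate to the sample table created above; the query shape is illustrative, not part of the required procedure:

```sql
-- Filters and aggregates are applied after the object is read from OSS.
SELECT city, count() AS user_count
FROM oss_test_tb
WHERE age >= 18
GROUP BY city;
```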
Community-compatible Edition V21.8 or earlier
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.
In the upper-right corner of the Cluster Information page, click Log On to Database.
In the Log On to Database Instance dialog box, enter the database username and password and click Login.
Create an OSS external table.
Sample statement:
CREATE TABLE <table_name> [ON cluster default]
(
    col_name1 col_type1,
    col_name2 col_type2,
    ...
)
ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');
The following table describes the parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| table_name | The name of the table. | oss_test_tb |
| col_name1,col_name2 | The names of the columns. | user_name |
| col_type1,col_type2 | The types of the columns. Important: The schema of the OSS external table must match the OSS data. | String |
| oss-endpoint | The internal endpoint that is used to access the OSS bucket from an ECS instance over a VPC. For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. | oss-cn-hangzhou-internal.aliyuncs.com |
| access-key-id | The AccessKey ID of the RAM user that is used to access OSS data. | LTAI**************** |
| access-key-secret | The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. | yourAccessKeySecret |
| oss-file-path | The storage path of the object. In most cases, the path is in the oss://<bucket-name>/<path-to-file> format. Note: The oss-file-path parameter supports fuzzy matching in which wildcard characters are used. For more information, see Use wildcard characters to fuzzy match the storage path of OSS. | oss://ck-test-oss/test.csv |
| file-format-name | The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of a format name. | CSV |
Sample statement:
CREATE TABLE oss_test_tb ON cluster default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = OSS('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV');
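Because the oss-file-path parameter supports wildcard characters, one external table can cover multiple objects that share a schema. The following is a minimal sketch that assumes the bucket contains several matching CSV objects, such as test1.csv and test2.csv (hypothetical object names), and uses a hypothetical table name oss_test_all:

```sql
-- Match every object in the bucket whose name starts with "test"
-- and ends with ".csv". All matched objects must share the same schema.
CREATE TABLE oss_test_all ON cluster default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = OSS('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test*.csv', 'CSV');
```

Queries against oss_test_all then scan all matched objects.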
Query OSS data.
SELECT * FROM oss_test_tb;
Sample result:
+----+-----------+-----+----------+--------------------+
| id | user_name | age | city     | access_url         |
+----+-----------+-----+----------+--------------------+
|  1 | tick      |  32 | shanghai | http://example.com |
+----+-----------+-----+----------+--------------------+
Returned rows: [1], Time elapsed: [183ms]
Method 2: Use table functions
Enterprise Edition or Community-compatible Edition V22.8 or later
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.
In the upper-right corner of the Cluster Information page, click Log On to Database.
In the Log On to Database Instance dialog box, enter the database username and password and click Login.
Use a table function to query data.
Sample statement:
SELECT * FROM oss('https://<bucket-name>.<oss-endpoint>/<file-name>','<access-key-id>', '<access-key-secret>', '<file-format-name>', '<col_name> <col_type>(,...)');
The parameters in the preceding sample statement are the same as the parameters used for creating an OSS external table. For more information, see Parameter description.
Sample statement:
SELECT * FROM oss('http://ck-test-oss.oss-cn-hangzhou-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');
Sample result:
+----+-----------+-----+----------+--------------------+
| id | user_name | age | city     | access_url         |
+----+-----------+-----+----------+--------------------+
|  1 | tick      |  32 | shanghai | http://example.com |
+----+-----------+-----+----------+--------------------+
Returned rows: [1], Time elapsed: [183ms]
Community-compatible Edition V21.8 or earlier
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.
In the upper-right corner of the Cluster Information page, click Log On to Database.
In the Log On to Database Instance dialog box, enter the database username and password and click Login.
Use a table function to query data.
Sample statement:
SELECT * FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>', '<col_name> <col_type>(,...)');
The parameters in the preceding sample statement are the same as the parameters used for creating an OSS external table. For more information, see Parameter description.
Sample statement:
SELECT * FROM oss('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');
Sample result:
+----+-----------+-----+----------+--------------------+
| id | user_name | age | city     | access_url         |
+----+-----------+-----+----------+--------------------+
|  1 | tick      |  32 | shanghai | http://example.com |
+----+-----------+-----+----------+--------------------+
Returned rows: [1], Time elapsed: [183ms]
Export or write data to OSS
Usage notes
Enterprise Edition or Community-compatible Edition V22.8 or later
When you export or write data from an ApsaraDB for ClickHouse cluster to OSS, you must set the s3_truncate_on_insert or s3_create_new_file_on_insert parameter to 1. If an object already exists in the destination path and neither parameter is set to 1, the export or write operation fails.
s3_truncate_on_insert=1: When you export or write data to OSS, ApsaraDB for ClickHouse checks whether an object exists in the destination path. If an object exists, ApsaraDB for ClickHouse overwrites the existing object with the new data.
This setting is suitable for full overwrites during exports or writes. Existing data in OSS is overwritten. Proceed with caution.
s3_create_new_file_on_insert=1: When you export or write data to OSS, ApsaraDB for ClickHouse checks whether an object exists in the destination path. If an object exists, ApsaraDB for ClickHouse creates a new object in the destination path and writes the data to the new object. The new object is named in the format <name of the existing object><sequence number>.<object format name>. The sequence number increments by 1 for each new object. For example, if an object named test.csv exists in the destination path that is specified when you create the OSS external table, the first export or write creates an object named test1.csv and the second creates an object named test2.csv.
This setting is suitable for incremental exports or writes, where existing data remains unchanged.
Community-compatible Edition V21.8 or earlier
When you export or write data from an ApsaraDB for ClickHouse cluster to OSS, no new objects are created and the existing OSS object is overwritten with the new data. Proceed with caution.
Step 1: Log on to the database
Log on to the ApsaraDB for ClickHouse console.
On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.
In the upper-right corner of the Cluster Information page, click Log On to Database.
In the Log On to Database Instance dialog box, enter the database username and password and click Login.
Step 2: (Optional) Prepare the data to be exported
If you want to export data from the cluster to OSS but no data exists in the cluster, you can perform the following steps to prepare data.
If the cluster already contains data, you can skip this step.
Create a table based on the cluster edition.
In this example, a single-replica cluster that runs Community-compatible Edition is used. If your cluster is a double-replica cluster that runs Enterprise Edition or Community-compatible Edition, you can follow the instructions described in Create a table.
CREATE TABLE test_tb_local ON cluster default ( id UInt8, user_name String, age UInt16, city String, access_url String ) ENGINE = MergeTree() ORDER BY id;
Write sample data to the table.
INSERT INTO test_tb_local VALUES (13, 'tick', 25, 'shanghai', 'http://example.com');
(Optional) View the data.
Execute the following statement to check whether the data is written:
SELECT * FROM test_tb_local;
Step 3: Create an OSS external table
Enterprise Edition or Community-compatible Edition V22.8 or later
Sample statement:
CREATE TABLE <table_name> [ON cluster default]
(
col_name1 col_type1,
col_name2 col_type2,
...
)
ENGINE = OSS('https://<bucket-name>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');
The following table describes the parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| table_name | The name of the table. | oss_test_tb |
| col_name1,col_name2 | The names of the columns. | user_name |
| col_type1,col_type2 | The types of the columns. Important: The schema of the OSS external table must match the OSS data. | String |
| bucket-name | The name of the OSS bucket. | ck-test-oss |
| oss-endpoint | The internal endpoint that is used to access the OSS bucket from an Elastic Compute Service (ECS) instance over a virtual private cloud (VPC). For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. | oss-cn-hangzhou-internal.aliyuncs.com |
| file-name | The name of the object in the OSS bucket. | test.csv |
| access-key-id | The AccessKey ID of the RAM user that is used to access OSS data. | LTAI**************** |
| access-key-secret | The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. | yourAccessKeySecret |
| file-format-name | The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of a format name. | CSV |
Sample statement:
CREATE TABLE oss_test_tb ON cluster default
(
id UInt8,
user_name String,
age UInt16,
city String,
access_url String
)
ENGINE = OSS('http://ck-test-oss.oss-cn-hangzhou-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret', 'CSV');
Community-compatible Edition V21.8 or earlier
Sample statement:
CREATE TABLE <table_name> [ON cluster default]
(
col_name1 col_type1,
col_name2 col_type2,
...
)
ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');
The following table describes the parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| table_name | The name of the table. | oss_test_tb |
| col_name1,col_name2 | The names of the columns. | user_name |
| col_type1,col_type2 | The types of the columns. Important: The schema of the OSS external table must match the OSS data. | String |
| oss-endpoint | The internal endpoint that is used to access the OSS bucket from an ECS instance over a VPC. For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. | oss-cn-hangzhou-internal.aliyuncs.com |
| access-key-id | The AccessKey ID of the RAM user that is used to access OSS data. | LTAI**************** |
| access-key-secret | The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. | yourAccessKeySecret |
| oss-file-path | The storage path of the object. In most cases, the path is in the oss://<bucket-name>/<path-to-file> format. Note: The oss-file-path parameter supports fuzzy matching in which wildcard characters are used. For more information, see Use wildcard characters to fuzzy match the storage path of OSS. | oss://ck-test-oss/test.csv |
| file-format-name | The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of a format name. | CSV |
Sample statement:
CREATE TABLE oss_test_tb ON cluster default
(
id UInt8,
user_name String,
age UInt16,
city String,
access_url String
)
ENGINE = OSS('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV');
Step 4: Export or write data to OSS
When you export or write data to OSS, you use the external table and the INSERT INTO statement. For more information, see INSERT INTO. The following section provides the sample statements used in various scenarios.
Export data to OSS
Enterprise Edition or Community-compatible Edition V22.8 or later
Full data overwrite in exports: export data to OSS and overwrite existing data.
INSERT INTO oss_test_tb SETTINGS s3_truncate_on_insert = 1 SELECT * FROM test_tb_local;
Incremental exports: export data to OSS without overwriting existing data.
INSERT INTO oss_test_tb SETTINGS s3_create_new_file_on_insert = 1 SELECT * FROM test_tb_local;
After this operation is complete, a new object is created in the OSS bucket. The new object is named in the format <name of the existing object><sequence number>.<object format name>. The sequence number increments by 1 for each new object. For example, if an object named test.csv exists in the destination path that is specified when you create the OSS external table, the first export or write creates an object named test1.csv and the second creates an object named test2.csv. You can view the new objects in the OSS console.
Community-compatible Edition V21.8 or earlier
When you export data from the ApsaraDB for ClickHouse cluster to OSS, existing data in OSS is overwritten. Proceed with caution.
INSERT INTO oss_test_tb SELECT * FROM test_tb_local;
Write data to OSS
Enterprise Edition or Community-compatible Edition V22.8 or later
Write new data to OSS and overwrite existing data.
INSERT INTO oss_test_tb SETTINGS s3_truncate_on_insert = 1 VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');
Write new data to OSS without overwriting existing data.
INSERT INTO oss_test_tb SETTINGS s3_create_new_file_on_insert = 1 VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');
After this operation is complete, a new object is created in the OSS bucket. The new object is named in the format <name of the existing object><sequence number>.<object format name>. The sequence number increments by 1 for each new object. For example, if an object named test.csv exists in the destination path that is specified when you create the OSS external table, the first export or write creates an object named test1.csv and the second creates an object named test2.csv. You can view the new objects in the OSS console.
Community-compatible Edition V21.8 or earlier
When you write data from the ApsaraDB for ClickHouse cluster to OSS, existing data in OSS is overwritten. Proceed with caution.
INSERT INTO oss_test_tb VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');
More operations: import data from OSS to ApsaraDB for ClickHouse
If you want to use OSS as an intermediate storage to migrate data to an ApsaraDB for ClickHouse cluster, you need to import the data from OSS to the cluster. For more information, see Import data from OSS.