This topic describes how to query Object Storage Service (OSS) data from an ApsaraDB for ClickHouse cluster and how to write or export data from a cluster to OSS.
Prerequisites
OSS requirements:
OSS is activated. For more information, see Activate OSS.
A Resource Access Management (RAM) user that is used to access OSS is granted the read and write permissions on OSS objects. For more information, see Access control.
Cluster requirements:
A database account is created. For more information, see Account Management.
Preparations
An OSS bucket is created. For more information, see Create buckets in the console.
Important: Make sure that the OSS bucket is in the same region as your ApsaraDB for ClickHouse cluster.
In this topic, a bucket named ck-test-oss is used as an example.
Upload an object to OSS.
Files uploaded to OSS are referred to as objects. The test.csv file is used as an example. Before you access the OSS bucket, you must upload the object to OSS. For more information, see Upload objects in the console.
If an object already exists in OSS, make sure that the object format is supported by ApsaraDB for ClickHouse. ApsaraDB for ClickHouse supports all object formats except Protobuf and CapnProto. For more information, see File formats supported by ClickHouse.
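The remainder of this topic assumes that test.csv contains comma-separated rows that match the table schema used in the examples (id, user_name, age, city, and access_url). A minimal object with that layout might look like the following (sample data, not required content):

```
1,tick,32,shanghai,http://example.com
```

If your object uses a different column order or delimiter, adjust the external table schema and the format name accordingly.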
Query OSS data
Method 1: Use an OSS external table
Enterprise Edition or Community-compatible Edition V22.8 or later
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.
In the upper-right corner of the Cluster Information page, click Log On to Database.
In the Log On to Database Instance dialog box, enter the database username and password and click Login.
Create an OSS external table.
An OSS external table is a special table type supported by ApsaraDB for ClickHouse. You can use an external table to query the objects stored in OSS without the need to export data to the local storage, achieving separation of storage and computing.
Sample statement:
CREATE TABLE <table_name> [ON cluster default]
(
    col_name1 col_type1,
    col_name2 col_type2,
    ...
)
ENGINE = OSS('https://<bucket-name>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');
The following table describes the parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| table_name | The name of the table. | oss_test_tb |
| col_name1,col_name2 | The names of the columns. | user_name |
| col_type1,col_type2 | The types of the columns. Important: The schema of the OSS external table must match the OSS data. | String |
| bucket-name | The name of the OSS bucket. | ck-test-oss |
| oss-endpoint | The internal endpoint that is used to access the OSS bucket from an Elastic Compute Service (ECS) instance over a virtual private cloud (VPC). For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. | oss-cn-hangzhou-internal.aliyuncs.com |
| file-name | The name of the object in the OSS bucket. | test.csv |
| access-key-id | The AccessKey ID of the RAM user that is used to access OSS data. | LTAI**************** |
| access-key-secret | The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. | yourAccessKeySecret |
| file-format-name | The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of a format name. | CSV |
Sample statement:
CREATE TABLE oss_test_tb ON cluster default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = OSS('http://ck-test-oss.oss-cn-hangzhou-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret', 'CSV');
Query OSS data.
SELECT * FROM oss_test_tb;
Sample result:
+----+-----------+-----+----------+--------------------+
| id | user_name | age | city     | access_url         |
+----+-----------+-----+----------+--------------------+
|  1 | tick      |  32 | shanghai | http://example.com |
+----+-----------+-----+----------+--------------------+
Returned rows: [1], Time elapsed: [183ms]
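An OSS external table can be queried like any other ApsaraDB for ClickHouse table. The following is a minimal sketch that applies a filter and an aggregate to the sample table created above; the query shape is illustrative, not part of the required procedure:

```sql
-- Filters and aggregates are applied after the object is read from OSS.
SELECT city, count() AS user_count
FROM oss_test_tb
WHERE age >= 18
GROUP BY city;
```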
Community-compatible Edition V21.8 or earlier
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.
In the upper-right corner of the Cluster Information page, click Log On to Database.
In the Log On to Database Instance dialog box, enter the database username and password and click Login.
Create an OSS external table.
Sample statement:
CREATE TABLE <table_name> [ON cluster default]
(
    col_name1 col_type1,
    col_name2 col_type2,
    ...
)
ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');
The following table describes the parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| table_name | The name of the table. | oss_test_tb |
| col_name1,col_name2 | The names of the columns. | user_name |
| col_type1,col_type2 | The types of the columns. Important: The schema of the OSS external table must match the OSS data. | String |
| oss-endpoint | The internal endpoint that is used to access the OSS bucket from an ECS instance over a VPC. For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. | oss-cn-hangzhou-internal.aliyuncs.com |
| access-key-id | The AccessKey ID of the RAM user that is used to access OSS data. | LTAI**************** |
| access-key-secret | The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. | yourAccessKeySecret |
| oss-file-path | The storage path of the object. In most cases, the path is in the oss://<bucket-name>/<path-to-file> format. Note: The oss-file-path parameter supports fuzzy matching in which wildcard characters are used. For more information, see Use wildcard characters to fuzzy match the storage path of OSS. | oss://ck-test-oss/test.csv |
| file-format-name | The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of a format name. | CSV |
Sample statement:
CREATE TABLE oss_test_tb ON cluster default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = OSS('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV');
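Because the oss-file-path parameter supports wildcard characters, one external table can cover multiple objects that share a schema. The following is a minimal sketch that assumes the bucket contains several matching CSV objects, such as test1.csv and test2.csv (hypothetical object names), and uses a hypothetical table name oss_test_all:

```sql
-- Match every object in the bucket whose name starts with "test"
-- and ends with ".csv". All matched objects must share the same schema.
CREATE TABLE oss_test_all ON cluster default
(
    id UInt8,
    user_name String,
    age UInt16,
    city String,
    access_url String
)
ENGINE = OSS('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test*.csv', 'CSV');
```

Queries against oss_test_all then scan all matched objects.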
Query OSS data.
SELECT * FROM oss_test_tb;
Sample result:
+----+-----------+-----+----------+--------------------+
| id | user_name | age | city     | access_url         |
+----+-----------+-----+----------+--------------------+
|  1 | tick      |  32 | shanghai | http://example.com |
+----+-----------+-----+----------+--------------------+
Returned rows: [1], Time elapsed: [183ms]
Method 2: Use table functions
Enterprise Edition or Community-compatible Edition V22.8 or later
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.
In the upper-right corner of the Cluster Information page, click Log On to Database.
In the Log On to Database Instance dialog box, enter the database username and password and click Login.
Use a table function to query data.
Sample statement:
SELECT * FROM oss('https://<bucket-name>.<oss-endpoint>/<file-name>','<access-key-id>', '<access-key-secret>', '<file-format-name>', '<col_name> <col_type>(,...)');
The parameters in the preceding sample statement are the same as the parameters used for creating an OSS external table. For more information, see Parameter description.
Sample statement:
SELECT * FROM oss('http://ck-test-oss.oss-cn-hangzhou-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');
Sample result:
+----+-----------+-----+----------+--------------------+
| id | user_name | age | city     | access_url         |
+----+-----------+-----+----------+--------------------+
|  1 | tick      |  32 | shanghai | http://example.com |
+----+-----------+-----+----------+--------------------+
Returned rows: [1], Time elapsed: [183ms]
Community-compatible Edition V21.8 or earlier
Log on to the database.
Log on to the ApsaraDB for ClickHouse console.
On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.
In the upper-right corner of the Cluster Information page, click Log On to Database.
In the Log On to Database Instance dialog box, enter the database username and password and click Login.
Use a table function to query data.
Sample statement:
SELECT * FROM oss('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>', '<col_name> <col_type>(,...)');
The parameters in the preceding sample statement are the same as the parameters used for creating an OSS external table. For more information, see Parameter description.
Sample statement:
SELECT * FROM oss('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV', 'id UInt8, user_name String, age UInt16, city String, access_url String');
Sample result:
+----+-----------+-----+----------+--------------------+
| id | user_name | age | city     | access_url         |
+----+-----------+-----+----------+--------------------+
|  1 | tick      |  32 | shanghai | http://example.com |
+----+-----------+-----+----------+--------------------+
Returned rows: [1], Time elapsed: [183ms]
Export or write data to OSS
Usage notes
Enterprise Edition or Community-compatible Edition V22.8 or later
When you export or write data from an ApsaraDB for ClickHouse cluster to OSS, you must set the s3_truncate_on_insert or s3_create_new_file_on_insert parameter to 1. If an object already exists in the destination path and neither parameter is set to 1, the export or write operation fails.
s3_truncate_on_insert=1: When you export or write data to OSS, ApsaraDB for ClickHouse checks whether an object exists in the destination path. If an object exists, ApsaraDB for ClickHouse overwrites the existing object with the new data.
This setting is suitable for full overwrites during exports or writes. Existing data in OSS is overwritten. Proceed with caution.
s3_create_new_file_on_insert=1: When you export or write data to OSS, ApsaraDB for ClickHouse checks whether an object exists in the destination path. If an object exists, ApsaraDB for ClickHouse creates a new object in the destination path and writes the data to the new object. The new object is named in the format <name of the existing object><sequence number>.<object format name>. The sequence number increments by 1 for each new object. For example, if an object named test.csv exists in the destination path that is specified when you create the OSS external table, the first export or write creates an object named test1.csv and the second creates an object named test2.csv.
This setting is suitable for incremental exports or writes, where existing data remains unchanged.
Community-compatible Edition V21.8 or earlier
When you export or write data from an ApsaraDB for ClickHouse cluster to OSS, no new objects are created and the existing OSS object is overwritten with the new data. Proceed with caution.
Step 1: Log on to the database
Log on to the ApsaraDB for ClickHouse console.
On the Clusters of Community-compatible Edition tab of the Clusters page, click the ID of the cluster that you want to manage.
In the upper-right corner of the Cluster Information page, click Log On to Database.
In the Log On to Database Instance dialog box, enter the database username and password and click Login.
Step 2: (Optional) Prepare the data to be exported
If you want to export data from the cluster to OSS but no data exists in the cluster, you can perform the following steps to prepare data.
If the cluster already contains data, you can skip this step.
Create a table based on the cluster edition.
In this example, a single-replica cluster that runs Community-compatible Edition is used. If your cluster is a double-replica cluster that runs Enterprise Edition or Community-compatible Edition, you can follow the instructions described in Create a table.
CREATE TABLE test_tb_local ON cluster default ( id UInt8, user_name String, age UInt16, city String, access_url String ) ENGINE = MergeTree() ORDER BY id;
Write sample data to the table.
INSERT INTO test_tb_local VALUES (13, 'tick', 25, 'shanghai', 'http://example.com');
(Optional) View the data.
Execute the following statement to check whether the data is written:
SELECT * FROM test_tb_local;
Step 3: Create an OSS external table
Enterprise Edition or Community-compatible Edition V22.8 or later
Sample statement:
CREATE TABLE <table_name> [ON cluster default]
(
col_name1 col_type1,
col_name2 col_type2,
...
)
ENGINE = OSS('https://<bucket-name>.<oss-endpoint>/<file-name>', '<access-key-id>', '<access-key-secret>', '<file-format-name>');
The following table describes the parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| table_name | The name of the table. | oss_test_tb |
| col_name1,col_name2 | The names of the columns. | user_name |
| col_type1,col_type2 | The types of the columns. Important: The schema of the OSS external table must match the OSS data. | String |
| bucket-name | The name of the OSS bucket. | ck-test-oss |
| oss-endpoint | The internal endpoint that is used to access the OSS bucket from an Elastic Compute Service (ECS) instance over a virtual private cloud (VPC). For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. | oss-cn-hangzhou-internal.aliyuncs.com |
| file-name | The name of the object in the OSS bucket. | test.csv |
| access-key-id | The AccessKey ID of the RAM user that is used to access OSS data. | LTAI**************** |
| access-key-secret | The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. | yourAccessKeySecret |
| file-format-name | The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of a format name. | CSV |
Sample statement:
CREATE TABLE oss_test_tb ON cluster default
(
id UInt8,
user_name String,
age UInt16,
city String,
access_url String
)
ENGINE = OSS('http://ck-test-oss.oss-cn-hangzhou-internal.aliyuncs.com/test.csv', 'LTAI****************', 'yourAccessKeySecret', 'CSV');
Community-compatible Edition V21.8 or earlier
Sample statement:
CREATE TABLE <table_name> [ON cluster default]
(
col_name1 col_type1,
col_name2 col_type2,
...
)
ENGINE = OSS('<oss-endpoint>', '<access-key-id>', '<access-key-secret>', '<oss-file-path>', '<file-format-name>');
The following table describes the parameters.
| Parameter | Description | Example |
| --- | --- | --- |
| table_name | The name of the table. | oss_test_tb |
| col_name1,col_name2 | The names of the columns. | user_name |
| col_type1,col_type2 | The types of the columns. Important: The schema of the OSS external table must match the OSS data. | String |
| oss-endpoint | The internal endpoint that is used to access the OSS bucket from an ECS instance over a VPC. For more information, see Regions and endpoints of OSS. Important: Make sure that the OSS bucket and your ApsaraDB for ClickHouse cluster are in the same region. | oss-cn-hangzhou-internal.aliyuncs.com |
| access-key-id | The AccessKey ID of the RAM user that is used to access OSS data. | LTAI**************** |
| access-key-secret | The AccessKey secret of the RAM user that is used to access OSS data. For more information, see View the AccessKey information of a RAM user. | yourAccessKeySecret |
| oss-file-path | The storage path of the object. In most cases, the path is in the oss://<bucket-name>/<path-to-file> format. Note: The oss-file-path parameter supports fuzzy matching in which wildcard characters are used. For more information, see Use wildcard characters to fuzzy match the storage path of OSS. | oss://ck-test-oss/test.csv |
| file-format-name | The format of the object. Important: The format must be a supported format described in Formats overview. Do not change the case of a format name. | CSV |
Sample statement:
CREATE TABLE oss_test_tb ON cluster default
(
id UInt8,
user_name String,
age UInt16,
city String,
access_url String
)
ENGINE = OSS('oss-cn-hangzhou-internal.aliyuncs.com', 'LTAI****************', 'yourAccessKeySecret', 'oss://ck-test-oss/test.csv', 'CSV');
Step 4: Export or write data to OSS
When you export or write data to OSS, you use the external table and the INSERT INTO statement. For more information, see INSERT INTO. The following section provides the sample statements used in various scenarios.
Export data to OSS
Enterprise Edition or Community-compatible Edition V22.8 or later
Full data overwrite in exports: export data to OSS and overwrite existing data.
INSERT INTO oss_test_tb SETTINGS s3_truncate_on_insert = 1 SELECT * FROM test_tb_local;
Incremental exports: export data to OSS without overwriting existing data.
INSERT INTO oss_test_tb SETTINGS s3_create_new_file_on_insert = 1 SELECT * FROM test_tb_local;
After this operation is complete, a new object is created in the OSS bucket. The new object is named in the format <name of the existing object><sequence number>.<object format name>. The sequence number increments by 1 for each new object. For example, if an object named test.csv exists in the destination path that is specified when you create the OSS external table, the first export or write creates an object named test1.csv and the second creates an object named test2.csv. You can view the new objects in the OSS console.
Community-compatible Edition V21.8 or earlier
When you export data from the ApsaraDB for ClickHouse cluster to OSS, existing data in OSS is overwritten. Proceed with caution.
INSERT INTO oss_test_tb SELECT * FROM test_tb_local;
Write data to OSS
Enterprise Edition or Community-compatible Edition V22.8 or later
Write new data to OSS and overwrite existing data.
INSERT INTO oss_test_tb SETTINGS s3_truncate_on_insert = 1 VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');
Write new data to OSS without overwriting existing data.
INSERT INTO oss_test_tb SETTINGS s3_create_new_file_on_insert = 1 VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');
After this operation is complete, a new object is created in the OSS bucket. The new object is named in the format <name of the existing object><sequence number>.<object format name>. The sequence number increments by 1 for each new object. For example, if an object named test.csv exists in the destination path that is specified when you create the OSS external table, the first export or write creates an object named test1.csv and the second creates an object named test2.csv. You can view the new objects in the OSS console.
Community-compatible Edition V21.8 or earlier
When you write data from the ApsaraDB for ClickHouse cluster to OSS, existing data in OSS is overwritten. Proceed with caution.
INSERT INTO oss_test_tb VALUES (14, 'tick', 25, 'shanghai', 'http://example.com');
More operations: import data from OSS to ApsaraDB for ClickHouse
If you want to use OSS as an intermediate storage to migrate data to an ApsaraDB for ClickHouse cluster, you need to import the data from OSS to the cluster. For more information, see Import data from OSS.