Use JindoSDK with Trino to query data in OSS-HDFS

Trino is an open-source, distributed SQL query engine for interactive analytics. This topic describes how to use JindoSDK with Trino to query data in OSS-HDFS.

Prerequisites

You have purchased an ECS instance to use as the deployment environment. For more information, see Purchase an ECS instance.
You have created a Hadoop environment. For more information, see Create a Hadoop runtime environment.
You have deployed Trino. For more information, see Deploy Trino.
You have activated OSS-HDFS and granted the required permissions. For more information, see Activate OSS-HDFS.

Procedure

Connect to an ECS instance. For more information, see Connect to an instance.
Configure JindoSDK.
1. Download the latest JindoSDK package (a .tar.gz archive). For the download link, see GitHub.
2. Decompress the JindoSDK package.
  The following command decompresses jindosdk-x.x.x-linux.tar.gz. If you use a different version of JindoSDK, replace the package name with the actual one.
```
tar zxvf jindosdk-x.x.x-linux.tar.gz
```
  Note
  x.x.x represents the version number of the JindoSDK package.
3. Copy the JindoSDK JAR files to the Trino Hive plugin directory so that they are on the plugin classpath.
```
cp jindosdk-x.x.x-linux/lib/*.jar "$Trino_HOME/plugin/hive-hadoop2/"
```

Configure the OSS-HDFS implementation class and AccessKey.

On all Trino nodes, add the OSS-HDFS implementation class to the Hadoop core-site.xml configuration file.

<configuration>
    <property>
        <name>fs.AbstractFileSystem.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOSS</value>
    </property>

    <property>
        <name>fs.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
    </property>
</configuration>

On all Trino nodes, add the AccessKey ID and AccessKey secret for your OSS-HDFS-enabled bucket to the Hadoop core-site.xml configuration file.

<configuration>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>LTAI********</value>
    </property>

    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>KZo1********</value>
    </property>
</configuration>
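
Each of the two snippets above is shown with its own <configuration> root, but a core-site.xml file has a single root element, so all four properties sit side by side in one file. The following sketch writes such a combined file to a scratch directory and counts the properties as a quick sanity check. The scratch path and the variable names are hypothetical, and the AccessKey values are the masked placeholders from this topic, not real credentials.

```shell
# Write a combined example to a scratch directory (hypothetical location).
# In a real deployment, add the <property> elements to the existing
# core-site.xml instead of replacing the file.
conf_dir="$(mktemp -d)"
conf_file="$conf_dir/core-site.xml"

cat > "$conf_file" <<'EOF'
<configuration>
    <property>
        <name>fs.AbstractFileSystem.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOSS</value>
    </property>
    <property>
        <name>fs.oss.impl</name>
        <value>com.aliyun.jindodata.oss.JindoOssFileSystem</value>
    </property>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>LTAI********</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>KZo1********</value>
    </property>
</configuration>
EOF

# Sanity check: one root element and four fs.* properties.
root_count=$(grep -c '^<configuration>$' "$conf_file")
property_count=$(grep -c '<name>fs\.' "$conf_file")
echo "roots: $root_count, properties: $property_count"
```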

Configure the OSS-HDFS Endpoint.
To access an OSS bucket using OSS-HDFS, you must configure an Endpoint. The recommended access path format is oss://{yourBucketName}.{yourBucketEndpoint}/{path}, for example, oss://examplebucket.cn-shanghai.oss-dls.aliyuncs.com/exampleobject.txt. After the configuration is complete, JindoSDK uses the Endpoint in the access path to access the OSS-HDFS API.
You can also configure the OSS-HDFS Endpoint using other methods. Endpoints configured using different methods have different priorities. For more information, see Appendix 1: Other ways to configure an Endpoint.
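
As a rough illustration of how the recommended access path is assembled, the following shell sketch joins a bucket name, an OSS-HDFS Endpoint, and an object path. The variable names are hypothetical; the values are the ones from the example path above.

```shell
# Hypothetical variable names; values taken from the example path in this topic.
bucket="examplebucket"
endpoint="cn-shanghai.oss-dls.aliyuncs.com"
object_path="exampleobject.txt"

# Format: oss://{yourBucketName}.{yourBucketEndpoint}/{path}
uri="oss://${bucket}.${endpoint}/${object_path}"
echo "$uri"
```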
Important
After you complete the preceding configurations, restart the Trino service for the configurations to take effect.
Query data in OSS-HDFS.
The following example uses a standard Hive catalog to show how to use Trino to create a schema whose data is stored in OSS-HDFS and run simple SQL statements against it. Because the Hive catalog depends on Hive Metastore, you must also install and deploy JindoSDK for the Hive service. For more information, see Use JindoSDK with Hive to process data in OSS-HDFS.
1. Start the Trino CLI and connect to the Trino server.
```
trino --server <Trino_server_address>:<Trino_server_port> --catalog hive
```
2. Create a schema in OSS.
```
create schema testDB with (location='oss://{yourBucketName}.{yourBucketEndpoint}/{schema_dir}');
```
3. Use the schema.
```
use testDB;
```
4. Create a table.
```
create table tbl (key int, val int);
```
5. Insert data into the table.
```
insert into tbl values (1,666);
```
6. Query the table.
```
select * from tbl;
```
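
To script the final check instead of typing it interactively, the Trino CLI can run a single statement through its --execute option. The following is a minimal sketch, assuming the trino CLI is on PATH. TRINO_SERVER is a hypothetical environment variable, and localhost:8080 is only a placeholder address. Trino folds unquoted identifiers to lowercase, so the schema created as testDB is referenced here as testdb.

```shell
# Placeholder address; replace with <Trino_server_address>:<Trino_server_port>.
TRINO_SERVER="${TRINO_SERVER:-localhost:8080}"

# Same query as step 6, with the schema name spelled out.
query='select * from testdb.tbl'

if command -v trino >/dev/null 2>&1; then
  # Run the statement non-interactively against the hive catalog.
  trino --server "$TRINO_SERVER" --catalog hive --execute "$query"
else
  echo "trino CLI not found on PATH" >&2
fi
```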