OSS-HDFS provides an HDFS-compatible interface that allows big data engines such as Trino to access OSS data directly. Set up Trino on an E-MapReduce (EMR) cluster to run interactive SQL queries against data stored in OSS-HDFS.
Prerequisites
Before you begin, make sure that you have:
An EMR cluster of EMR V3.42.0 or later, or EMR V5.8.0 or later, with the Trino service selected. For more information, see Create a cluster.
A bucket with OSS-HDFS enabled and the required access permissions granted. For more information, see Enable OSS-HDFS and grant access permissions.
Step 1: Create an EMR cluster
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
Create an EMR cluster with the following settings:
- Product Version: EMR-3.46.2 or later, or EMR-5.12.2 or later
- Root Storage Directory of Cluster: a bucket with OSS-HDFS enabled
- Other parameters: default values

For more information, see Create a cluster.
Step 2: Connect to the Trino server
Get the Trino server address and port:
On the EMR on ECS page, click the name of your cluster.
Go to Services > Trino > Configure to find the server address and port.
Run the following command to connect to the Trino server:

```shell
trino --server <Trino_server_address>:<Trino_server_port> --catalog <catalog_name>
```

Replace the placeholders with your values:

- <Trino_server_address>: the IP address or hostname of the Trino server, shown on the Configure tab.
- <Trino_server_port>: the port number of the Trino server, shown on the Configure tab.
- <catalog_name>: the catalog to connect to, such as hive.

Note: The --catalog flag requires a value. Specify the catalog that maps to your data source.
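Before launching the client, you can sanity-check the <host>:<port> value you pass to --server. A minimal sketch; the variable names and example server value are illustrative, not part of the Trino CLI:

```shell
# Split and check a "<host>:<port>" value before passing it to --server.
# SERVER below is an example value; substitute your own from the Configure tab.
SERVER="master-1-1.cluster:9090"
HOST="${SERVER%:*}"     # everything before the last colon
PORT="${SERVER##*:}"    # everything after the last colon
case "$PORT" in
  ''|*[!0-9]*) echo "invalid port: $PORT" >&2; exit 1 ;;
esac
echo "host=$HOST port=$PORT"
```

If the port is missing or non-numeric, the check fails before the client is invoked, which gives a clearer error than a failed connection attempt.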
Step 3: Query data stored in OSS-HDFS
After you connect to the Trino server, run the following SQL statements to create a schema, load data, and run a query.
Create a schema that points to an OSS-HDFS location.
```sql
create schema testDB with (location='oss://<Bucket>.<Endpoint>/<schema_dir>');
```

Replace the placeholders with your values:

- <Bucket>: the name of the OSS bucket with OSS-HDFS enabled. Example: my-data-bucket
- <Endpoint>: the OSS-HDFS endpoint for your region. Example: cn-hangzhou.oss-dls.aliyuncs.com
- <schema_dir>: the directory path for the schema. Example: trino/testDB

The full URI follows this format:

oss://<Bucket>.<Endpoint>/<schema_dir>

Switch to the new schema.
```sql
use testDB;
```

Create a table.

```sql
create table tbl (key int, val int);
```

Insert data into the table.

```sql
insert into tbl values (1,666);
```

Query data in the table.

```sql
select * from tbl;
```

Expected output:

```
 key | val
-----+-----
   1 | 666
(1 row)
```
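The location URI in the create schema statement above can be assembled from its three parts before you paste it into the statement. A minimal sketch; the variable names are illustrative, and the example values come from the placeholder table in this step:

```shell
# Assemble oss://<Bucket>.<Endpoint>/<schema_dir> for the CREATE SCHEMA statement.
# Example values only; substitute your own bucket, region endpoint, and path.
BUCKET="my-data-bucket"
ENDPOINT="cn-hangzhou.oss-dls.aliyuncs.com"
SCHEMA_DIR="trino/testDB"
LOCATION="oss://${BUCKET}.${ENDPOINT}/${SCHEMA_DIR}"
echo "$LOCATION"
# oss://my-data-bucket.cn-hangzhou.oss-dls.aliyuncs.com/trino/testDB
```

Building the URI this way avoids the most common mistake in this step: joining the bucket and endpoint with a slash instead of a dot.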
Result
The query returns the row you inserted. The data is stored in the OSS-HDFS location specified in the schema.