Query data stored in OSS-HDFS by running Trino on an EMR cluster.
Prerequisites
- An EMR cluster (EMR-3.42.0 or later, or EMR-5.8.0 or later) is created with the Trino service selected. For details, see Create a cluster.
- OSS-HDFS is enabled and access to it is authorized. For details, see Enable the OSS-HDFS service.
Procedure
- Log on to the E-MapReduce console. In the left-side navigation pane, click EMR on ECS and create an EMR cluster.
When you create the EMR cluster, make sure that Product Version is EMR-3.46.2 or later, or EMR-5.12.2 or later, and Root Storage Directory of Cluster is set to an OSS-HDFS-enabled bucket. Use the defaults for other parameters. For details, see Create a cluster.
- Query data in the OSS-HDFS service.
- Connect to the Trino CLI.
On the EMR on ECS console, go to Services > Trino > the Configure tab to get <Trino_server_address> and <Trino_server_port>.
trino --server <Trino_server_address>:<Trino_server_port> --catalog hive
- Create a schema in OSS.
create schema testDB with (location='oss://<Bucket>.<Endpoint>/<schema_dir>');
- Use the schema.
use testDB;
- Create a table.
create table tbl (key int, val int);
- Insert data into the table.
insert into tbl values (1,666);
- Query the table.
select * from tbl;
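The query steps above can also be run non-interactively. The following is a minimal sketch, not part of the original procedure: it assumes the `trino` CLI is on the PATH of the node where you run it, and it relies on the CLI's `--execute` option for batch mode. The function names `build_statements` and `run_statements` are hypothetical helpers introduced for illustration. Instead of `use testDB;`, the sketch qualifies the table name with the schema so that each statement is independent of session state.

```shell
#!/usr/bin/env bash
# Sketch: run the procedure's SQL statements in one batch.
# Function names are illustrative and not part of EMR or Trino.

# Print the SQL statements from the procedure, one per line.
# Arguments: <Bucket> <Endpoint> <schema_dir>
build_statements() {
  local bucket="$1" endpoint="$2" schema_dir="$3"
  printf "%s\n" \
    "create schema testDB with (location='oss://${bucket}.${endpoint}/${schema_dir}');" \
    "create table testDB.tbl (key int, val int);" \
    "insert into testDB.tbl values (1,666);" \
    "select * from testDB.tbl;"
}

# Send the statements to Trino through the CLI in batch mode.
# Arguments: <Trino_server_address> <Trino_server_port> <Bucket> <Endpoint> <schema_dir>
run_statements() {
  local address="$1" port="$2"
  shift 2
  trino --server "${address}:${port}" --catalog hive \
    --execute "$(build_statements "$@")"
}
```

To use it, source the file and call `run_statements <Trino_server_address> <Trino_server_port> <Bucket> <Endpoint> <schema_dir>` with the values for your cluster and bucket. Calling `build_statements` on its own prints the statements so that you can review them before anything is sent to the cluster.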