DLA CU Edition allows you to access self-managed HDFS databases. This topic describes how to access and query data in a self-managed HDFS database by using DLA.
- DLA CU Edition is activated. For more information, see Quick start for the DLA Presto CU edition.
- The network that you specified when you created a virtual cluster (VC) is in the same VPC as your HDFS cluster.
- The CSV file that contains the following information is created. You can also create a TXT file, for example, example.txt, and paste the following information into the file.

  7,8,9

  Run the following commands to upload the file to your HDFS cluster:

  hadoop fs -mkdir -p hdfs://172.16.199.0:9000/test/p/d=1
  hadoop fs -copyFromLocal example.txt hdfs://172.16.199.0:9000/test/p/d=1/example.txt

  In the preceding commands, 172.16.199.0:9000 is only for your reference. Replace it with the host information of your HDFS cluster.
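The file-preparation and upload steps above can be scripted. A minimal sketch: the hadoop commands are left commented out because they require a configured Hadoop client, and 172.16.199.0:9000 is the sample endpoint from this guide, not a real address:

```shell
# Create the sample data file used throughout this guide.
printf '7,8,9\n' > example.txt

# Upload it to the partition directory. Uncomment once a Hadoop client is
# configured, and replace 172.16.199.0:9000 with your NameNode address:
#   hadoop fs -mkdir -p hdfs://172.16.199.0:9000/test/p/d=1
#   hadoop fs -copyFromLocal example.txt hdfs://172.16.199.0:9000/test/p/d=1/example.txt

# Show what will be uploaded.
cat example.txt
```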
- Log on to the DLA console.
- In the left-side navigation pane, choose Serverless Presto > Execute.
- In the upper-right corner, click Log on in DMS to execute SQL operations, and run the following SQL command to create your HDFS database:
CREATE DATABASE `my_hdfs_db` WITH DBPROPERTIES (
    catalog = 'hive',
    location = 'hdfs://172.16.199.0:9000/test/'
)

Note: You can also use a MySQL client or program code to connect to DLA and run the preceding command to create your HDFS database.

Parameters:
- catalog: the type of schema to create. The value hive indicates that a Hive schema is created.
- location: the directory in which the HDFS database files are stored.
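As the note above mentions, the same statement can also be issued from a standard MySQL client. A minimal sketch; your-dla-endpoint, your-user, and your-password are placeholders for the connection details shown in your DLA console:

```shell
# The CREATE DATABASE statement from this guide, held in a shell variable.
SQL="CREATE DATABASE \`my_hdfs_db\` WITH DBPROPERTIES ( catalog = 'hive', location = 'hdfs://172.16.199.0:9000/test/' )"
echo "$SQL"

# Run it against your DLA endpoint with a MySQL client (placeholders --
# substitute the endpoint and account from your DLA console):
# mysql -h your-dla-endpoint -P 3306 -u your-user -p -e "$SQL"
```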
- Create an HDFS table.
CREATE EXTERNAL TABLE p (
    `a` int,
    `b` int,
    `c` int
)
PARTITIONED BY (d int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS `TEXTFILE`
LOCATION 'hdfs://172.16.199.0:9000/test/p/';

Note: The parameters and attributes of an HDFS table are mostly the same as those of an OSS table. The differences lie in the following aspects:
- Due to network connectivity issues, DLA does not check directories when you create a database or table. You must ensure that the directories are correct.
- For the same reason, HDFS tables do not support the auto.create.location attribute.
- Query or access data.
Only the CUs of DLA can connect to the network where your HDFS cluster resides. Therefore, you must add the hint /*+cluster=your-vc-name*/ to the SQL statements that access the HDFS table so that these statements are executed on CUs.

Example:
mysql> /*+ cluster=vc-test */
    -> alter table p
    -> add partition (d=1)
    -> location 'hdfs://172.18.0.6:9000/test/p/d=1';
Query OK, 0 rows affected (8.63 sec)

mysql> /*+ cluster=vc-test */
    -> alter table p
    -> drop partition (d=1);
Query OK, 0 rows affected (6.08 sec)

mysql> /*+ cluster=vc-test */ msck repair table p;
+-----------+-----------+
| Partition | Operation |
+-----------+-----------+
| d=1       | CREATED   |
+-----------+-----------+
1 row in set (16.47 sec)

mysql> /*+ cluster=vc-test */ select * from p;
+------+------+------+------+
| a    | b    | c    | d    |
+------+------+------+------+
|    7 |    8 |    9 |    1 |
+------+------+------+------+
1 row in set (4.74 sec)
For more information, see Common SQL statements.