DLA CU Edition allows you to access data stored in self-managed HDFS clusters. This topic describes how to use DLA to access and query data in a self-managed HDFS database.

Prerequisites

  • DLA CU Edition is activated. For more information, see Quick start for the DLA Presto CU edition.
  • The network that you specified when you created the virtual cluster (VC) is in the same VPC as your HDFS cluster.
  • A CSV or TXT file that contains the test data is created. For example, create a file named example.txt and paste the following content into it:
    7,8,9
    Run the following commands to upload the file to your HDFS cluster:
    hadoop fs -mkdir -p hdfs://172.16.199.0:9000/test/p/d=1
    hadoop fs -copyFromLocal example.txt hdfs://172.16.199.0:9000/test/p/d=1/example.txt
    In the preceding commands, 172.16.199.0:9000 is an example address. Replace it with the actual address of your HDFS cluster.
Note DLA cannot access HDFS clusters that have Kerberos authentication enabled. If Kerberos authentication is enabled for your HDFS cluster, submit a ticket or contact DLA technical support personnel on DingTalk.

Procedure

  1. Log on to the DLA console.
  2. In the left-side navigation pane, choose Serverless Presto > Execute.
  3. In the upper-right corner, click Log on in DMS to execute SQL statements, and run the following SQL statement to create your HDFS database:
    CREATE DATABASE `my_hdfs_db`
    WITH DBPROPERTIES (
        catalog = 'hive',
        location = 'hdfs://172.16.199.0:9000/test/'
    )
    Note You can also use a MySQL client or program code to connect to DLA and run the preceding command to create your HDFS database.
    Parameters:
    • catalog: the type of the database. The value hive indicates that a database with a Hive-compatible schema is created.
    • location: the HDFS directory in which the database files are stored.
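    As the note above mentions, you can also create the database from program code through the MySQL protocol that DLA exposes. The following is a minimal Python sketch; the endpoint, user name, and password are placeholders that you replace with the connection details from your DLA console, and pymysql is a third-party package (pip install pymysql) that is imported lazily so the statement can be composed without it.

    ```python
    # The CREATE DATABASE statement from the step above, unchanged.
    CREATE_DB_SQL = """\
    CREATE DATABASE `my_hdfs_db`
    WITH DBPROPERTIES (
        catalog = 'hive',
        location = 'hdfs://172.16.199.0:9000/test/'
    )"""


    def create_hdfs_db(host: str, port: int, user: str, password: str) -> None:
        """Connect to DLA over the MySQL protocol and create the database.

        host/port/user/password are placeholders for your own DLA endpoint
        and credentials.
        """
        import pymysql  # third-party package: pip install pymysql

        conn = pymysql.connect(host=host, port=port, user=user, password=password)
        try:
            with conn.cursor() as cur:
                cur.execute(CREATE_DB_SQL)
        finally:
            conn.close()


    # Example call with placeholder values:
    # create_hdfs_db("your-dla-endpoint", 10000, "your-user", "your-password")
    ```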
  4. Create an HDFS table.
    CREATE EXTERNAL TABLE p (
        `a` int,
        `b` int,
        `c` int
    ) partitioned by (d int)
    ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION 'hdfs://172.16.199.0:9000/test/p/';
    Note The parameters and attributes of the HDFS table are mostly the same as those of OSS tables. The differences lie in the following aspects:
    • Because of network connectivity constraints, DLA does not verify the directories when you create a database or table. You must make sure that the directories are correct.
    • For the same reason, HDFS tables do not support the auto.create.location attribute.
  5. Query or access data.
    Only the CUs of DLA can connect to the network where your HDFS cluster resides. Therefore, you must add the hint /*+ cluster=your-vc-name */ to the SQL statements that access the HDFS table so that the statements are executed on the CUs.
    Example:
    mysql> /*+ cluster=vc-test */
        -> alter table p
        -> add partition (d=1)
        -> location 'hdfs://172.18.0.6:9000/test/p/d=1';
    Query OK, 0 rows affected (8.63 sec)
    
    mysql> /*+ cluster=vc-test */
        -> alter table p
        -> drop partition (d=1) ;
    Query OK, 0 rows affected (6.08 sec)
    
    mysql> /*+ cluster=vc-test */ msck repair table p;
    +-----------+-----------+
    | Partition | Operation |
    +-----------+-----------+
    | d=1       | CREATED   |
    +-----------+-----------+
    1 row in set (16.47 sec)
    
    mysql> /*+ cluster=vc-test */ select * from  p;
    +------+------+------+------+
    | a    | b    | c    | d    |
    +------+------+------+------+
    |    7 |    8 |    9 |    1 |
    +------+------+------+------+
    1 row in set (4.74 sec)
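    If you submit these statements from program code rather than a MySQL client, a small helper can prepend the required cluster hint for you. The following Python sketch uses the vc-test name from the examples above; replace it with the name of your own VC.

    ```python
    def with_cluster_hint(sql: str, vc_name: str) -> str:
        """Return the SQL statement with the /*+ cluster=... */ hint prepended,
        so that DLA executes it on the CUs of the named virtual cluster."""
        return f"/*+ cluster={vc_name} */ {sql.lstrip()}"


    print(with_cluster_hint("select * from p", "vc-test"))
    # /*+ cluster=vc-test */ select * from p
    ```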

    For more information, see Common SQL statements.