
Use DLA to access LindormDFS

Last Updated: Jul 11, 2021

This topic describes how to connect to LindormDFS by using Data Lake Analytics (DLA) and query data in LindormDFS.

DLA allows you to use the CU edition of Serverless Presto and Serverless Spark to access LindormDFS.

For more information about how to use DLA Serverless Spark to access LindormDFS, see Hadoop.

The following sections describe how to use DLA Serverless SQL CU to access LindormDFS.

Preparations

  1. Activate the LindormDFS service. For more information, see Activate the LindormDFS service.

  2. You can use only the CU edition of Serverless Presto to access LindormDFS. Make sure that the CU edition of Serverless Presto is activated. For more information, see Use the DLA Presto-compatible Presto CU edition.

  3. Make sure that the network that you specify when you create a virtual cluster (VC) and the network in which LindormDFS is deployed are in the same virtual private cloud (VPC).

  4. Configure the whitelist that controls access to LindormDFS and add the CIDR block of the VPC to the whitelist.

  5. A simple CSV-formatted file is used in the following example. Create a local file named example.txt and paste the following content into the file:

    7,8,9

    Run the following commands to upload the file to LindormDFS:

    hadoop fs -mkdir -p hdfs://${Instance ID}-master1-001.lindorm.rds.aliyuncs.com:8020/test/p/d=1
    hadoop fs -copyFromLocal example.txt hdfs://${Instance ID}-master1-001.lindorm.rds.aliyuncs.com:8020/test/p/d=1/example.txt
    Note

    When you use the serverless SQL service provided by DLA to access LindormDFS, the high availability mode is not supported. Therefore, you must replace the address in each of the preceding commands with the address of the host on which the primary node of LindormDFS runs. You can find the information about the primary and secondary nodes in the hdfs-site configuration file. For more information about this file, see Generation of access configuration.

  6. Run the following command to check whether the file is uploaded:

    hadoop fs -ls hdfs://${Instance ID}-master1-001.lindorm.rds.aliyuncs.com:8020/test/p/d=1

    If the output of the command lists the example.txt file, the upload is successful.

Procedure

  1. Log on to the DLA console.

  2. In the left-side navigation pane, choose Serverless SQL > Execute.

  3. In the upper-right corner, click Log on in DMS to execute SQL operations. Then, execute the following SQL statement to create a database:

    CREATE DATABASE `my_lindorm_store_db`
    WITH DBPROPERTIES (
        catalog = 'hive',
        location = 'hdfs://${Instance ID}-master1-001.lindorm.rds.aliyuncs.com:8020/test/'
    )
    Note

    You can also use a MySQL client or program code to connect to DLA and execute the preceding SQL statement to create a database.

    The following list describes the parameters in the SQL statement:

    1. catalog: The value hive indicates that a Hive schema is created.

    2. location: The directory in which the database is located.
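
    To confirm that the database is created, you can list the databases that are visible to your account. The following statement is a minimal sketch and assumes that the standard SHOW DATABASES syntax is supported in your DLA environment:

    SHOW DATABASES;
    -- The database my_lindorm_store_db is expected to appear in the returned list.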

  4. Create a table by executing the following SQL statement:

    USE `my_lindorm_store_db`;
    CREATE EXTERNAL TABLE p (
        `a` int,
        `b` int,
        `c` int
    ) partitioned by (d int)
    ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ','
    STORED AS `TEXTFILE`
    LOCATION 'hdfs://${Instance ID}-master1-001.lindorm.rds.aliyuncs.com:8020/test/p/';
    Note

    The parameters and attributes of the HDFS table are similar to those of an Object Storage Service (OSS) table. The differences lie in the following aspects:

    • Due to network connectivity issues, the directory information is not checked when you create a database or a table. You must make sure that the directory is valid.

    • For the same reason, HDFS tables do not support the auto.create.location attribute.
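
    After the table is created, you can verify its definition from the same connection. The following statements are a minimal sketch and assume that the SHOW TABLES and SHOW CREATE TABLE statements are supported in your DLA environment:

    -- Switch to the database that was created in the previous step.
    USE `my_lindorm_store_db`;
    -- List the tables in the database. The table p is expected to appear.
    SHOW TABLES;
    -- Display the table definition, including the partition column d.
    SHOW CREATE TABLE p;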

  5. Query or access data.

    Add the hint /*+ pool-selector=your-vc-name */ to the SQL statements that access tables in LindormDFS. This step is required because only the CUs of DLA Serverless SQL can access the network in which LindormDFS is deployed. After the hint is added, the SQL statements are executed in the DLA Serverless SQL CUs as expected.
    mysql> /*+ pool-selector=test-vc */
        -> alter table p
        -> add partition (d=1)
        -> location 'hdfs://${Instance ID}-master1-001.lindorm.rds.aliyuncs.com:8020/test/p/d=1';
    Query OK, 0 rows affected (8.63 sec)
    
    mysql> /*+ pool-selector=test-vc */ select * from  p;
    +------+------+------+------+
    | a    | b    | c    | d    |
    +------+------+------+------+
    |    7 |    8 |    9 |    1 |
    +------+------+------+------+
    1 row in set (4.74 sec)

    The preceding output indicates that the data in LindormDFS is queried as expected.
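
    The same hint can be applied to other queries against the table. The following statement is a minimal sketch that filters data by the partition column d; test-vc is a placeholder for the name of your virtual cluster:

    mysql> /*+ pool-selector=test-vc */ select a, b from p where d = 1;

    Based on the example data that is uploaded in the Preparations section, this query is expected to return a single row in which a is 7 and b is 8.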

    For more information, see Common SQL statements.