Data Lake Analytics (DLA) CU Edition allows you to access a self-managed Hive metastore. This topic describes how to use DLA to access and query Hive metastore data from Hadoop Distributed File System (HDFS).

Prerequisites

  • DLA CU Edition is activated. For more information, see Use the DLA Presto-compatible Presto CU edition.
    Note
    • When you create a virtual cluster (VC), make sure that the network of the data source, the Hive metastore, and the HDFS cluster reside in the same virtual private cloud (VPC).
    • DLA is not allowed to access HDFS clusters for which Kerberos authentication is enabled. If Kerberos authentication is enabled for your HDFS cluster, submit a ticket.
  • A database and a table are created in the Hive metastore. Data is inserted into the table. Sample statements:
    CREATE DATABASE testDb;
    
    CREATE EXTERNAL TABLE if not exists testDb.testTable(
                    id int,
                    name string);
    
    insert into testDb.testTable(id, name) values (1, "jack");

Procedure

  1. Log on to the DLA console.
  2. In the left-side navigation pane, choose Serverless Presto > Execute. On the Execute page, click Log on to DMS to execute SQL, and execute the following SQL statements to create a database and a table.
    Note You can also use a MySQL client or program code to connect to DLA and execute the following SQL statements to create a database and a table.
    1. Create a database.
      CREATE DATABASE `dlaDb`
      WITH DBPROPERTIES (
          catalog = 'customer_hive',
          database = 'testDb',
          location = '172.16.199.34:9083',
          vpc_id = 'xxx',
          hdfs_properties = 'fs.defaultFS=hdfs://172.16.199.41:9000'
      )
      Parameter Description
      catalog The catalog of the database that you want to create. The data source of the database is the metadata stored in the Hive metastore. Set this parameter to customer_hive.
      database The name of the database that you want to create. Set this parameter to the name of the database in the Hive metastore.
      location The endpoint of the database that you want to create. Set this parameter to the endpoint of the Hive metastore.
      vpc_id The ID of the VPC where the Hive metastore resides.
      hdfs_properties The default HDFS cluster configuration in the Hive metastore that you want to access. The HDFS cluster can be a non-HA HDFS cluster or an HA HDFS cluster.
      • Example value for a non-HA HDFS cluster:
        // The default HDFS cluster configuration is the endpoint of a self-managed non-HA HDFS cluster. The endpoint is a combination of an IP address and a port number that is mapped to a domain name. A hostname is not supported.
        hdfs_properties='fs.defaultFS=hdfs://172.16.199.41:9000'
      • Example value for an HA HDFS cluster:
        // The default HDFS cluster configuration is the endpoint of a self-managed HA HDFS cluster. The endpoint is a combination of an IP address and a port number that is mapped to a domain name. A hostname is not supported.
        hdfs_properties='fs.defaultFS=hdfs://emr-cluster;dfs.nameservices=emr-cluster;dfs.client.failover.proxy.provider.emr-cluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider;dfs.ha.namenodes.emr-cluster=nn1,nn2;dfs.nameservice.id=emr-cluster;dfs.namenode.rpc-address.emr-cluster.nn1=172.16.199.34:8020;dfs.namenode.rpc-address.emr-cluster.nn2=172.16.199.35:8020'
    2. Create a table.
      • Execute the CREATE TABLE statement to create a table.
        CREATE EXTERNAL TABLE if not exists dlaDb.testTable(
                        id int,
                        name string);
        Note The table name testTable must be the same as the table name of the database in the Hive metastore.
      • Execute the MSCK statement to create a table.
        msck repair database dlaDb;
  3. Query or access data.
    Only DLA CU Edition can access the Hive metastore. Therefore, you must add the hint hint: /*+cluster=your-vc-name*/ to all the SQL statements that are used to access tables in the Hive metastore. This way, SQL statements with the hint can be executed in DLA CU Edition.
    Sample statement:
    mysql> /*+ cluster=vc-test */ select * from  dlaDb.testTable;
    +------+------+
    | id   | name |
    +------+------+
    |    1 | jack |
    +------+------+
    1 row in set (1.74 sec)