Data Lake Analytics (DLA) uses Java Database Connectivity (JDBC) to connect to a user-created Druid database that is hosted on Elastic Compute Service (ECS) and to query data in that database.

This topic describes how to use DLA to connect to a user-created Druid database and query its data.

Prerequisites

DLA connects to a user-created Druid database that is hosted on an ECS instance over a virtual private cloud (VPC). Therefore, you must make sure that DLA and the ECS instance are in the same region.

The druid.sql.enable and druid.sql.avatica.enable parameters are set to true so that the Druid broker accepts SQL queries over the Avatica JDBC endpoint. For more information about other SQL parameters, see the Druid documentation.

The Wikipedia Edits dataset is used as the example dataset. You can load the data into the Druid database by using the data loading wizard.

You have added an ECS security group rule to allow access from the CIDR block 100.104.0.0/16.

When you add the ECS security group rule, open the broker port and set the authorization object to 100.104.0.0/16. By default, the broker listens on port 8082.

Procedure

  1. Connect to DLA.
  2. Execute the following SQL statement to create a Druid schema in DLA:
      CREATE DATABASE `my_druid_db`
      WITH DBPROPERTIES (
        catalog = 'druid',
        location = 'jdbc:avatica:remote:url=http://BROKER:8082/druid/v2/sql/avatica/',
        user = '******',
        password = '******',
        VPC_ID = 'vpc-*****',
        INSTANCE_ID = '********'
      );

    Parameters:

    • catalog: the type of the schema that you want to create. The value druid indicates that a Druid schema is created.
    • location: the JDBC URL of the Druid database. Replace BROKER with the IP address or domain name of the Druid broker. For more information, see the Druid documentation.
    • user: the username used to access the Druid database.
    • password: the password used to access the Druid database.
    • VPC_ID: the ID of the VPC where the broker resides.
    • INSTANCE_ID: the ID of the ECS instance where the broker resides.
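
    After you create the schema, you can confirm that it is visible before you create any tables. The following statements are a minimal check; they assume that you run them from a client that is already connected to DLA:

     SHOW DATABASES;
     USE my_druid_db;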
  3. Execute the following SQL statement to create a wikipedia table in the my_druid_db schema:

     CREATE EXTERNAL TABLE `wikipedia` (
       `__time` TIMESTAMP NULL COMMENT '',
       `added` BIGINT NULL COMMENT '',
       `channel` STRING NULL COMMENT '',
       `cityname` STRING NULL COMMENT '',
       `comment` STRING NULL COMMENT '',
       `countryisocode` STRING NULL COMMENT '',
       `countryname` STRING NULL COMMENT '',
       `deleted` BIGINT NULL COMMENT '',
       `delta` BIGINT NULL COMMENT '',
       `isanonymous` STRING NULL COMMENT '',
       `isminor` STRING NULL COMMENT '',
       `isnew` STRING NULL COMMENT '',
       `isrobot` STRING NULL COMMENT '',
       `isunpatrolled` STRING NULL COMMENT '',
       `namespace` STRING NULL COMMENT '',
       `page` STRING NULL COMMENT '',
       `regionisocode` STRING NULL COMMENT '',
       `regionname` STRING NULL COMMENT '',
       `user` STRING NULL COMMENT ''
     )
     TBLPROPERTIES (
       TABLE_MAPPING = 'druid.wikipedia',
       COLUMN_MAPPING = 'cityname,cityName; countryisocode,countryIsoCode; countryname,countryName; isanonymous,isAnonymous; isminor,isMinor; isnew,isNew; isrobot,isRobot; isunpatrolled,isUnpatrolled; regionisocode,regionIsoCode; regionname,regionName; '
     );

    You can also run the MSCK command to automatically synchronize the tables of the Druid database to the my_druid_db schema:

     msck repair database my_druid_db;
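
    After the MSCK command completes, you can optionally list the tables that were synchronized to the schema to verify the result:

     SHOW TABLES FROM my_druid_db;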

    After the table is created, you can execute a SELECT statement to query data from the Druid database.

     SELECT * FROM wikipedia;
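
    Beyond a full table scan, you can also run analytic queries against the Druid dataset. The following aggregation is an illustrative example that assumes the wikipedia table created in the previous step; it returns the channels with the most added lines:

     SELECT channel, SUM(added) AS total_added
     FROM wikipedia
     GROUP BY channel
     ORDER BY total_added DESC
     LIMIT 10;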