This topic describes how to use Spark SQL to perform data analysis and interactive development on the Druid data source.

CREATE TABLE syntax

create table tbName
using druid
options(propertyKey=propertyValue[, propertyKey=propertyValue]*);

Parameters

Parameter Description Required
curator.connect The host and port of ZooKeeper, such as emr-header-1:2181. Yes
curator.max.retries The maximum number of connection retries upon a ZooKeeper connection failure. Default value: 5. No
curator.retry.base.sleep The initial interval at which a connection retry is made upon a ZooKeeper connection failure. Default value: 100. Unit: milliseconds. No
curator.retry.max.sleep The maximum interval at which a connection retry is made upon a ZooKeeper connection failure. Default value: 3000. Unit: milliseconds. No
index.service The indexing service, such as druid or overlord. Yes
data.source The name of the data source from which data is written to Druid. Yes
discovery.path The discovery path of Druid. The default value is druid or discovery. No
firehose The firehose, such as druid:firehose:%s. Yes
rollup.aggregators The rollup aggregators of Tranquility, in JSON format. Example:
{\"metricsSpec\":[{\"type\":\"count\",\"name\":\"count\"},
                  {\"type\":\"doubleSum\",\"fieldName\":\"value\",\"name\":\"sum\"},
                  {\"type\":\"doubleMin\",\"fieldName\":\"value\",\"name\":\"min\"},
                  {\"type\":\"doubleMax\",\"fieldName\":\"value\",\"name\":\"max\"}]}
where, metricsSpec is fixed.
Yes
rollup.dimensions The dimension from which data is written to Druid. Yes
rollup.query.granularities The rollup granularity, such as minute. Yes
tuning.window.period The size of the time window. Default value: PT10M. No
tuning.segment.granularity The segment granularity. Default value: DAY. No
tuning.partitions The number of partitions. Default value: 1. No
tuning.replications The number of replicas. Default value: 1. No
timestampSpec.column The name of the timestamp column when data is written to Druid. Default value: timestamp. No
timestampSpec.format The format of the timestamp column name when data is written to Druid. Default value: iso. No

Table schema

When you create a Druid data table, you do not need to explicitly define the fields in the data table. Example:
create table druid_test_table
using druid
options(
curator.connect="${ZooKeeper-host}:${ZooKeeper-port}}",
index.service="druid/overlord",
data.source="test_source",
discovery.path="/druid/discovery",
firehouse="druid:firehose:%s",
rollup.aggregators="{\"metricsSpec\":[{\"type\":\"count\",\"name\":\"count\"},
                                      {\"type\":\"doubleSum\",\"fieldName\":\"value\",\"name\":\"sum\"},
                                      {\"type\":\"doubleMin\",\"fieldName\":\"value\",\"name\":\"min\"},
                                      {\"type\":\"doubleMax\",\"fieldName\":\"value\",\"name\":\"max\"}]}",
rollup.dimensions="timestamp,metric,userId",
rollup.query.granularities="minute",
tuning.segment.granularity="FIVE_MINUTE",
tuning.window.period="PT5M",
timestampSpec.column="timestamp",
timestampSpec.format="posix");