This topic describes how to use Spark Streaming SQL to perform data analysis and interactive development on the Druid data source.
CREATE TABLE syntax
create table tbName
using druid
options(propertyKey=propertyValue[, propertyKey=propertyValue]*);
Table schema
When you create a Druid data table, you do not need to explicitly define the fields
in the data table. Example:
create table druid_test_table
using druid
options(
curator.connect="${ZooKeeper-host}:${ZooKeeper-port}",
index.service="druid/overlord",
data.source="test_source",
discovery.path="/druid/discovery",
firehose="druid:firehose:%s",
rollup.aggregators="{\"metricsSpec\":[{\"type\":\"count\",\"name\":\"count\"},
{\"type\":\"doubleSum\",\"fieldName\":\"value\",\"name\":\"sum\"},
{\"type\":\"doubleMin\",\"fieldName\":\"value\",\"name\":\"min\"},
{\"type\":\"doubleMax\",\"fieldName\":\"value\",\"name\":\"max\"}]}",
rollup.dimensions="timestamp,metric,userId",
rollup.query.granularities="minute",
tuning.segment.granularity="FIVE_MINUTE",
tuning.window.period="PT5M",
timestampSpec.column="timestamp",
timestampSpec.format="posix");
Parameters
Parameter | Description | Required |
---|---|---|
curator.connect | The host and port of ZooKeeper, such as emr-header-1:2181. | Yes |
curator.max.retries | The maximum number of connection retries upon a ZooKeeper connection failure. Default value: 5. | No |
curator.retry.base.sleep | The initial interval at which a connection retry is made upon a ZooKeeper connection failure. Default value: 100. Unit: milliseconds. | No |
curator.retry.max.sleep | The maximum interval at which a connection retry is made upon a ZooKeeper connection failure. Default value: 3000. Unit: milliseconds. | No |
index.service | The indexing service, such as druid/overlord. | Yes |
data.source | The name of the Druid data source into which data is written. | Yes |
discovery.path | The discovery path of Druid. Default value: /druid/discovery. | No |
firehose | The firehose, such as druid:firehose:%s. | Yes |
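rollup.aggregators | The rollup aggregators of Tranquility, in JSON format. The top-level key metricsSpec is fixed. For a sample value, see the rollup.aggregators option in the CREATE TABLE example in the Table schema section. | Yes |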
rollup.dimensions | The dimensions of the data that is written into Druid, separated by commas, such as timestamp,metric,userId. | Yes |
rollup.query.granularities | The rollup granularity, such as minute. | Yes |
tuning.window.period | The size of the time window. Default value: PT10M. | No |
tuning.segment.granularity | The segment granularity. Default value: DAY. | No |
tuning.partitions | The number of partitions. Default value: 1. | No |
tuning.replications | The number of replicas. Default value: 1. | No |
timestampSpec.column | The name of the timestamp column when data is written into Druid. Default value: timestamp. | No |
timestampSpec.format | The format of the timestamp column when data is written into Druid. Default value: iso. | No |
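
The Required column above implies that a table can be declared with only seven options. The following is a minimal sketch under that assumption, not a verified configuration. The table name druid_minimal_table and the data source name minimal_source are hypothetical, the ZooKeeper address reuses the sample value from the parameter table, and the aggregator list is trimmed to a single count metric taken from the earlier example.

```sql
-- Minimal sketch: sets only the parameters marked "Yes" in the Required column.
-- druid_minimal_table and minimal_source are hypothetical names.
-- emr-header-1:2181 is the sample ZooKeeper address from the parameter table.
create table druid_minimal_table
using druid
options(
curator.connect="emr-header-1:2181",
index.service="druid/overlord",
data.source="minimal_source",
firehose="druid:firehose:%s",
rollup.aggregators="{\"metricsSpec\":[{\"type\":\"count\",\"name\":\"count\"}]}",
rollup.dimensions="timestamp,metric,userId",
rollup.query.granularities="minute");
```

Because the optional parameters are omitted, the documented defaults would apply: a window period of PT10M, a segment granularity of DAY, one partition, one replica, and a timestamp column named timestamp in the iso format.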