This topic describes how to use Spark Streaming SQL to perform data analysis and interactive development on the LogHub data source.

CREATE TABLE syntax

CREATE TABLE tbName(columnName dataType [,columnName dataType]*)
USING loghub
OPTIONS(propertyName=propertyValue[,propertyName=propertyValue]*);
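The following Python sketch shows how the parts of this syntax fit together. It assembles a statement from three inputs: a table name, an optional column list, and a dictionary of options.

- The helper name `build_loghub_ddl` is hypothetical and is not part of EMR or Spark.
- The helper does not escape quotes inside option values.
- The AccessKey options are left out of the usage line only to keep it short.

```python
def build_loghub_ddl(table, options, columns=None):
    """Render a CREATE TABLE ... USING loghub statement (hypothetical helper).

    table:   name of the table to create.
    options: mapping of property name -> value, emitted as name="value".
             Values are NOT escaped; embedded double quotes would break the SQL.
    columns: optional list of (column_name, data_type) pairs. Pass None to omit
             the schema and let the connector supply its default fields.
    """
    # Schema part: "(col type, col type, ...)" or nothing when no columns are given.
    schema = ""
    if columns:
        schema = "(" + ", ".join(f"{name} {dtype}" for name, dtype in columns) + ")"
    # OPTIONS part: comma-separated property="value" pairs.
    props = ", ".join(f'{key}="{value}"' for key, value in options.items())
    return f"CREATE TABLE {table}{schema}\nUSING loghub\nOPTIONS({props});"


ddl = build_loghub_ddl(
    "loghub_table_test",
    {"endpoint": "sls.aliyuncs.com", "sls.project": "test", "sls.store": "myInstance"},
    columns=[("content", "string")],
)
print(ddl)
```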

Table schema

When you create a LogHub table, you can explicitly define its fields. An explicit schema is required in EMR SDK versions earlier than 2.0.0. The custom field names in the table schema must be the same as the key names of the logs in Log Service.
spark-sql> CREATE TABLE loghub_table_test(content string)
         > USING loghub
         > OPTIONS(
         > endpoint="sls.aliyuncs.com",
         > access.key.id="yHiu*******BG2s",
         > access.key.secret="ABctuw0M***************iKKljZy",
         > sls.project="test",
         > sls.store="myInstance");

spark-sql> DESC loghub_table_test;
content  string  NULL
Time taken: 0.436 seconds, Fetched 1 row(s)
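Each declared field is matched to a Log Service key by name. A quick check against one sample log entry can therefore catch misspelled field names before you create the table.

The following Python sketch does this check under three stated assumptions:

- The function `unmatched_columns` is hypothetical.
- The sample log entry is made-up data.
- The exact, case-sensitive comparison is assumed; the connector's actual matching rules are not described here.

```python
def unmatched_columns(columns, sample_log):
    """Return declared column names that have no matching key in a sample log entry.

    columns:    field names from the CREATE TABLE schema.
    sample_log: one log entry as a dict of key -> value (hypothetical sample data).
    The comparison is exact and case-sensitive, which is an assumption.
    """
    return [name for name in columns if name not in sample_log]


# Hypothetical log entry with two keys.
sample = {"content": "GET /index.html 200", "level": "INFO"}

print(unmatched_columns(["content"], sample))         # []
print(unmatched_columns(["content", "msg"], sample))  # ['msg']
```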
Example of a CREATE TABLE statement that does not specify a schema:
spark-sql> CREATE TABLE loghub_table_test
         > USING loghub
         > OPTIONS
         > (...);

spark-sql> DESC loghub_table_test;
__logProject__ string  NULL
__logStore__ string NULL
__shard__ string NULL
__time__ string NULL
__topic__ string NULL
__source__ string NULL
__value__ string NULL
__sequence_number__ string NULL
Time taken: 0.436 seconds, Fetched 8 row(s)

Parameters

Parameter          Description                           Required
endpoint           The endpoint of the Log Service API.  Yes
access.key.id      The AccessKey ID.                     Yes
access.key.secret  The AccessKey secret.                 Yes
sls.store          The name of the Logstore.             Yes
sls.project        The name of the Log Service project.  Yes
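All five parameters are required. The following Python sketch checks an options dictionary for missing values before the dictionary is turned into an OPTIONS clause.

- The helper `missing_options` is hypothetical and is not part of EMR.
- Treating an empty string as missing is a choice made for this example only.

```python
# Required options, in the order listed in the parameter table above.
REQUIRED_OPTIONS = (
    "endpoint",
    "access.key.id",
    "access.key.secret",
    "sls.store",
    "sls.project",
)


def missing_options(options):
    """Return required LogHub options that are absent or empty in `options`."""
    return [name for name in REQUIRED_OPTIONS if not options.get(name)]


opts = {"endpoint": "sls.aliyuncs.com", "sls.project": "test", "sls.store": "myInstance"}
print(missing_options(opts))  # ['access.key.id', 'access.key.secret']
```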
For EMR SDK 2.0.0 or later, the LogHub table schema is optional. If you omit the schema, the table contains the following default fields.
Field                Type    Description
__logProject__       STRING  The name of the Log Service project.
__logStore__         STRING  The name of the Logstore.
__shard__            STRING  The shard of the Logstore.
__time__             STRING  The time when the log entry was created.
__topic__            STRING  The topic of the log entry.
__source__           STRING  The source IP address of the log entry.
__value__            STRING  The content of the log entry, in JSON format.
__sequence_number__  STRING  The sequence number of the record. This field is available only when appendSequenceNumber is set to true. Default value: NULL.
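With the default schema, every field is a string and the log content arrives as a JSON document in __value__. Downstream code usually needs to decode both the content and the timestamp.

The following Python sketch decodes one such row under these assumptions:

- The function `decode_row` and the sample row are hypothetical.
- __time__ is assumed to hold a UNIX timestamp in seconds. The table above only says it is the creation time.

```python
import json
from datetime import datetime, timezone


def decode_row(row):
    """Decode one row read through the default LogHub schema (hypothetical helper).

    row: dict keyed by the default column names; all values are strings or None.
    Assumes __time__ is a UNIX timestamp in seconds.
    """
    fields = json.loads(row["__value__"])  # log content is a JSON string
    created = datetime.fromtimestamp(int(row["__time__"]), tz=timezone.utc)
    return row["__logStore__"], created, fields


# Hypothetical row using the default field names.
row = {
    "__logProject__": "test",
    "__logStore__": "myInstance",
    "__time__": "1700000000",
    "__value__": '{"content": "hello"}',
    "__sequence_number__": None,  # NULL unless appendSequenceNumber is true
}
store, created, fields = decode_row(row)
print(store, created.isoformat(), fields["content"])
```

Inside Spark SQL itself, the built-in get_json_object function can extract a single key from __value__, for example get_json_object(__value__, '$.content').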