This topic describes how to use Spark Streaming SQL to perform data analysis and interactive development on the LogHub data source.
CREATE TABLE syntax
CREATE TABLE tbName(columnName dataType [,columnName dataType]*)
USING loghub
OPTIONS(propertyName=propertyValue[,propertyName=propertyValue]*);
Table schema
When you create a LogHub data table with a schema, you must explicitly define each
field in the data table. Each custom field name in the table schema must match the
corresponding key name in Log Service. The following example defines a single
content field:
spark-sql> CREATE TABLE loghub_table_test(content string)
> USING loghub
> OPTIONS(
> endpoint="sls.aliyuncs.com",
> access.key.id="yHiu*******BG2s",
> access.key.secret="ABctuw0M***************iKKljZy",
> sls.project="test",
> sls.store="myInstance");
spark-sql> DESC loghub_table_test;
content string NULL
Time taken: 0.436 seconds, Fetched 1 row(s)
The following example creates a table with no schema specified. The table then contains the default fields:
spark-sql> CREATE TABLE loghub_table_test
> USING loghub
> OPTIONS
> (...)
spark-sql> DESC loghub_table_test;
__logProject__ string NULL
__logStore__ string NULL
__shard__ string NULL
__time__ string NULL
__topic__ string NULL
__source__ string NULL
__value__ string NULL
__sequence_number__ string NULL
Time taken: 0.436 seconds, Fetched 1 row(s)
Parameters
Parameter | Description | Required |
---|---|---|
endpoint | The endpoint of the Log Service API. | Yes |
access.key.id | The AccessKey ID. | Yes |
access.key.secret | The AccessKey secret. | Yes |
sls.store | The name of the Logstore. | Yes |
sls.project | The name of the Log Service project. | Yes |
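As a minimal sketch of how the required parameters fit into the CREATE TABLE syntax, the following Python snippet checks that all five options from the table above are present and then renders the statement text. The helper name build_create_table and the placeholder credentials are hypothetical and are not part of EMR or Log Service. The snippet only produces a string and does not connect to any service.

```python
# Required OPTIONS keys, taken from the Parameters table above.
REQUIRED_OPTIONS = (
    "endpoint",
    "access.key.id",
    "access.key.secret",
    "sls.project",
    "sls.store",
)


def build_create_table(table_name, options, columns=None):
    """Render a CREATE TABLE ... USING loghub statement (hypothetical helper)."""
    missing = [key for key in REQUIRED_OPTIONS if key not in options]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))

    # The column list is optional; leave it out to rely on the default fields.
    column_clause = ""
    if columns:
        column_clause = "(" + ", ".join(f"{name} {dtype}" for name, dtype in columns) + ")"

    # Values are quoted as-is; this sketch does not escape embedded quotes.
    option_clause = ",\n  ".join(f'{key}="{value}"' for key, value in options.items())
    return (
        f"CREATE TABLE {table_name}{column_clause}\n"
        f"USING loghub\n"
        f"OPTIONS(\n  {option_clause});"
    )


# Placeholder values only; substitute your own endpoint and credentials.
example_options = {
    "endpoint": "sls.aliyuncs.com",
    "access.key.id": "<your-access-key-id>",
    "access.key.secret": "<your-access-key-secret>",
    "sls.project": "test",
    "sls.store": "myInstance",
}
statement = build_create_table(
    "loghub_table_test", example_options, [("content", "string")]
)
print(statement)
```

Calling the helper with an incomplete options dictionary raises a ValueError that names the missing keys, which surfaces a configuration mistake before the statement reaches spark-sql.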
The LogHub schema is optional for EMR SDK 2.0.0 or later. If you do not specify a
schema, the table contains the default fields described in the following table.
Field | Type | Description |
---|---|---|
__logProject__ | STRING | The name of the Log Service project. |
__logStore__ | STRING | The name of the Logstore. |
__shard__ | STRING | The shard of the Logstore. |
__time__ | STRING | The time when the log entry was created. |
__topic__ | STRING | The topic of the log. |
__source__ | STRING | The source IP address of the log entry. |
__value__ | STRING | The content of the log entry, in JSON format. |
__sequence_number__ | STRING | The sequence number of the record. This field is available only when appendSequenceNumber is set to true. Default value: NULL. |
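
Because __value__ carries the log content as a JSON string, downstream code usually has to decode it before it can read individual keys. The following Python sketch illustrates that step on a hand-written sample row. The row contents, the key names content and level, and the helper extract_fields are assumptions made for illustration. The exact JSON layout that Log Service produces is not specified here, so the sketch assumes a flat JSON object.

```python
import json

# Hypothetical row shaped like the default fields in the table above.
sample_row = {
    "__logProject__": "test",
    "__logStore__": "myInstance",
    "__topic__": "",
    "__value__": '{"content": "hello loghub", "level": "INFO"}',
}


def extract_fields(row, keys):
    """Decode the JSON in __value__ and pick the requested keys (None if absent)."""
    payload = json.loads(row["__value__"])
    return {key: payload.get(key) for key in keys}


fields = extract_fields(sample_row, ["content", "level", "missing_key"])
print(fields)
```

Keys that are absent from the decoded object come back as None, which mirrors the NULL that a missing column would yield in a query.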