This topic describes how to configure a Streaming SQL job.

Background information

For more information about Streaming SQL, see Spark Streaming SQL.

When you configure a Streaming SQL job, you must specify dependency libraries. The following list describes the versions, release dates, reference strings, and supported data sources of the dependency library provided by Spark Streaming SQL. We recommend that you use the latest version of the dependency library.

Dependency library: datasources-bundle

  • Version 2.0.0 (recommended)
    Release date: 2020/02/26
    Reference string: sharedlibs:streamingsql:datasources-bundle:2.0.0
    Supported data sources: Kafka, LogHub, Druid, Tablestore, HBase, JDBC, DataHub, Redis, Kudu, and DTS
  • Version 1.9.0
    Release date: 2019/11/20
    Reference string: sharedlibs:streamingsql:datasources-bundle:1.9.0
    Supported data sources: Kafka, LogHub, Druid, Tablestore, HBase, JDBC, DataHub, Redis, and Kudu
  • Version 1.8.0
    Release date: 2019/10/17
    Reference string: sharedlibs:streamingsql:datasources-bundle:1.8.0
    Supported data sources: Kafka, LogHub, Druid, Tablestore, HBase, JDBC, DataHub, and Redis
  • Version 1.7.0
    Release date: 2019/07/29
    Reference string: sharedlibs:streamingsql:datasources-bundle:1.7.0
    Supported data sources: Kafka, LogHub, Druid, Tablestore, HBase, and JDBC
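
To illustrate how one of the listed data sources is used, the following is a minimal sketch of a Kafka source table definition. It assumes the Kafka data source of Spark Streaming SQL; the table name, broker address, and topic name are hypothetical placeholders.

    -- Minimal sketch: create a Kafka source table
    -- (table name, broker address, and topic name are hypothetical).
    CREATE TABLE IF NOT EXISTS kafka_test_table
    USING kafka
    OPTIONS (
        kafka.bootstrap.servers = 'broker1:9092',
        subscribe = 'test_topic'
    );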

For more information, see Overview.

Prerequisites

  • A project is created. For more information, see Manage projects.
  • Resources and data files required for the job are obtained, such as JAR packages, the names of the data files, and the storage paths of the JAR packages and data files.

Procedure

  1. Go to the Data Platform tab.
    1. Log on to the Alibaba Cloud EMR console by using your Alibaba Cloud account.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Data Platform tab.
  2. In the Projects section, find your project and click Edit Job in the Actions column.
  3. Create a Streaming SQL job.
    1. In the Edit Job pane on the left, right-click the folder in which you want to create the job and select Create Job.
    2. In the Create Job dialog box, specify Name and Description, and select Streaming SQL from the Job Type drop-down list.
    3. Click OK.
  4. Edit job content.
    In the Content field, enter the SQL statements of the job. Example:
     -- Create a Log Service table.
     CREATE TABLE IF NOT EXISTS ${slsTableName}
     USING loghub
     OPTIONS (
         sls.project = '${logProjectName}',
         sls.store = '${logStoreName}',
         access.key.id = '${accessKeyId}',
         access.key.secret = '${accessKeySecret}',
         endpoint = '${endpoint}'
     );
     -- Import data to HDFS.
     INSERT INTO ${hdfsTableName}
     SELECT col1, col2
     FROM ${slsTableName}
     WHERE ${condition};
     Note: The command used to submit a Streaming SQL job is streaming-sql -f {sql_script}. The SQL statements that you enter in the job editor are saved to the sql_script file.
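     Spark Streaming SQL also supports SET statements for per-query settings. As a minimal sketch, the following statements, placed before the INSERT INTO statement, name the streaming query and set its checkpoint directory; the query name and path are hypothetical examples, and the exact configuration keys should be verified against the Spark Streaming SQL documentation.
     -- Hypothetical example: name the streaming query and set its
     -- checkpoint directory before the INSERT INTO statement.
     SET streaming.query.name=test_query;
     SET spark.sql.streaming.checkpointLocation.test_query=/tmp/spark/test_query;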
  5. Configure dependency libraries and actions on failures.
    1. Click Job Settings in the upper-right corner.
    2. On the Shared Libraries and Streaming Task Settings tabs, configure dependency libraries and actions on failures.
      • Dependent Libraries section, Libraries configuration item: Job execution depends on library files related to data sources. EMR publishes these libraries to the repository of the scheduling center as dependency libraries. You must specify dependency libraries when you create a job. To specify a dependency library, enter its reference string, such as sharedlibs:streamingsql:datasources-bundle:2.0.0.
      • Actions on Failures section, Action on Current Statement Failure configuration item: The action to perform when EMR fails to execute a statement. You can select one of the following actions:
        • Execute Next Statement: Execute the next statement.
        • Terminate Job: Terminate the job.
    3. Click Save.