This topic describes how to configure Streaming SQL jobs.

Prerequisites

  • You have created a project. For more information, see Manage a workflow project.
  • You have downloaded the Spark Streaming SQL dependent libraries. For more information, see Background information.

Background information

For more information about Streaming SQL, see Spark Streaming SQL.

When you configure a Streaming SQL job, you must specify dependent libraries. Spark Streaming SQL provides a dependent library that supports multiple data sources. The following table lists the version and other details of this library. We recommend that you use the latest version.

Name: datasources-bundle
Version: 1.7.0
Release date: July 29, 2019
Reference string: sharedlibs:streamingsql:datasources-bundle:1.7.0
Details: Supported data sources: Kafka, Log Service, Druid, Table Store, HBase, and JDBC.
  • Copy the reference string to Job Settings > Streaming Task Settings > Dependent Libraries on the Data Platform page of the E-MapReduce console.
  • All data sources listed in the preceding table support both reading and writing streams. For an example, see the sketch after this list.
  • For more information, see Data sources.
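
For example, after you specify the datasources-bundle library, you can declare a Kafka source table in a Streaming SQL job. The following is a minimal sketch: the table name, broker address, and topic are placeholder values, and the exact option names may vary by library version. For the authoritative option list, see Data sources.

-- Hypothetical Kafka source table. Replace the broker address
-- and topic with your own values before running the job.
CREATE TABLE IF NOT EXISTS kafka_source_table
USING kafka
OPTIONS (
    kafka.bootstrap.servers = 'broker-host:9092',
    subscribe = 'my_topic'
);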

Step 1: Create a Streaming SQL job.

  1. Log on to the E-MapReduce console by using an Alibaba Cloud account.
  2. Click the Data Platform tab. The Projects list is displayed.
  3. Click Workflows next to the target project. In the left-side navigation pane, select Edit Job.
  4. In the left-side Edit Job pane, right-click a folder and then select Create Job.
    Note You can also right-click a folder to create a subfolder, rename the folder, or delete it.
  5. In the Create Job dialog box, enter the Name and Description, and select Streaming SQL from the Job Type drop-down list.

  6. Click OK to create the Streaming SQL job.
    The system then automatically opens the job so that you can enter code.

Step 2: Configure Streaming SQL statements for the job.

On the E-MapReduce backend, Streaming SQL jobs are submitted in the form of streaming-sql -f {SQL_SCRIPT}, where SQL_SCRIPT is the file that contains the Streaming SQL statements of the job.

After you create the job, enter Streaming SQL statements into the text box of the job.

Streaming SQL statement example:

-- Create a Log Service table. Replace the ${...} placeholders
-- with your actual values.
CREATE TABLE IF NOT EXISTS ${slsTableName}
USING loghub
OPTIONS (
    sls.project = '${logProjectName}',
    sls.store = '${logStoreName}',
    access.key.id = '${accessKeyId}',
    access.key.secret = '${accessKeySecret}',
    endpoint = '${endpoint}'
);

-- Import data to HDFS.
INSERT INTO ${hdfsTableName}
SELECT col1, col2
FROM ${slsTableName}
WHERE ${condition};
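
The INSERT statement writes to ${hdfsTableName}, which must already exist. The following is a minimal sketch of how such a table might be defined using standard Spark SQL data source syntax; the column names, types, and HDFS path are assumptions for illustration only.

-- Hypothetical HDFS sink table backed by Parquet files.
-- The columns must match those selected from the source table.
CREATE TABLE IF NOT EXISTS ${hdfsTableName} (
    col1 STRING,
    col2 STRING
)
USING PARQUET
LOCATION 'hdfs:///path/to/output';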


Step 3: Specify dependent libraries and actions on failures.

Dependent libraries: Streaming SQL jobs depend on data source libraries. E-MapReduce publishes these libraries to the repository of the scheduling center as dependent libraries. When you create a job, you must specify the dependent libraries that the job requires.

Actions on failures: Specify the action to be performed when E-MapReduce fails to execute the current statement.
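
For example, a job can contain multiple statements that are executed in order, and the failure action determines whether the remaining statements run after one of them fails. The following hypothetical two-statement job illustrates this; all table names are placeholders.

-- Statement 1: if this INSERT fails, the failure action decides
-- what happens next: Execute Next Statement continues with
-- statement 2, while Terminate Job stops the whole job here.
INSERT INTO ${tableA} SELECT col1 FROM ${sourceTable};

-- Statement 2: runs after statement 1 fails only if the failure
-- action is Execute Next Statement.
INSERT INTO ${tableB} SELECT col2 FROM ${sourceTable};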

  1. After you enter statements into the job, click Job Settings in the upper-right corner, and then click the Streaming Task Settings tab.
  2. On the Streaming Task Settings tab, specify the dependent libraries and actions on failures.
    Section: Actions on Failures
    Parameter: Action on Current Statement Failure
    Description: The action to be performed when E-MapReduce fails to execute the current statement. Valid values:
    • Execute Next Statement: executes the next statement.
    • Terminate Job: terminates the job.

    Section: Dependent Libraries
    Parameter: Libraries
    Description: Enter the reference strings of dependent libraries, for example, sharedlibs:streamingsql:datasources-bundle:1.7.0.
  3. Click Save to complete the configuration of the Streaming SQL job.