Lindorm: Use DMS to manage jobs

Last Updated: Sep 28, 2023

Lindorm Distributed Processing System (LDPS) allows you to use the task orchestration feature of Data Management (DMS) to schedule Lindorm Spark jobs and view their publishing history and logs. LDPS can meet your computing requirements in scenarios such as data production, interactive analytics, machine learning, and graph computing. This topic describes how to use DMS to manage Lindorm Spark jobs.

Prerequisites

Create a Lindorm Spark task flow

  1. Log on to the DMS console V5.0.

  2. Go to the Task Orchestration page.

    • Simple mode:

      1. In the Scene Guide section, click Data Transmission and Processing (DTS).

      2. Click Task Orchestration in the Data processing section on the right side of the page.

    • Normal mode: In the top navigation bar, choose DTS > Data Development > Task Orchestration.

  3. On the Task Orchestration page, click Create Task Flow.

  4. In the Create Task Flow dialog box, specify Task Flow Name and Description, and click OK.

  5. In the Task Type section on the left side, drag Lindorm Spark nodes to the canvas and connect the nodes to specify the dependencies between them.

  6. Configure the Lindorm Spark node.

    1. Double-click the Lindorm Spark node, or click the Lindorm Spark node and then click the Configure Node icon.

    2. On the page that appears, configure the basic parameters and custom parameters of the job that you want to run.

      • In the Basic configuration section, configure the following basic parameters:

        • Region: Select the region where your Lindorm instance is deployed.

        • Lindorm Instance: Select the ID of your Lindorm instance.

        • Task Type: Select a Spark job type. The following types of jobs are supported: JAR, Python, and SQL.

      • In the Job configuration section, configure the custom parameters of the job that you want to run. The following sections describe the configuration templates and custom parameters of different types of Spark jobs.

        • The following section describes the configuration template and custom parameters of a JAR Spark job:

          {
            "mainResource" : "oss://path/to/your/file.jar",
            "mainClass" : "path.to.main.class",
            "args" : [ "arg1", "arg2" ],
            "configs" : {
              "spark.hadoop.fs.oss.endpoint" : "",
              "spark.hadoop.fs.oss.accessKeyId" : "",
              "spark.hadoop.fs.oss.accessKeySecret" : "",
              "spark.hadoop.fs.oss.impl" : "org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem",
              "spark.sql.shuffle.partitions" : "20"
            }
          }

          The following list describes the parameters in the template:

          • mainResource (String, required): The HDFS or OSS path in which the JAR package is stored. Examples:

            • HDFS path: hdfs:///path/spark-examples_2.12-3.1.1.jar

            • OSS path: oss://testBucketName/path/spark-examples_2.12-3.1.1.jar

          • mainClass (String, required): The class that serves as the entry point of the program in the JAR job. Example: com.aliyun.ldspark.SparkPi

          • args (Array, optional): The parameters that are passed to the class specified by mainClass. Example: ["arg1", "arg2"]

          • configs (JSON, optional): The system parameters of the Spark job. If the JAR package is stored in OSS, you must configure the following parameters in configs:

            • spark.hadoop.fs.oss.endpoint: the endpoint of the OSS bucket in which the Spark job is stored.

            • spark.hadoop.fs.oss.accessKeyId: the AccessKey ID that is used to access OSS. You can obtain the AccessKey ID in the console. For more information, see Obtain an AccessKey pair.

            • spark.hadoop.fs.oss.accessKeySecret: the AccessKey secret that is used to access OSS. You can obtain the AccessKey secret in the console. For more information, see Obtain an AccessKey pair.

            • spark.hadoop.fs.oss.impl: the class that is used to access OSS. Set the value to org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.

            Example: { "spark.sql.shuffle.partitions": "200" }

        • The following section describes the configuration template and custom parameters of a Python Spark job:

          {
            "mainResource" : "oss://path/to/your/file.py",
            "args" : [ "arg1", "arg2" ],
            "configs" : {
              "spark.hadoop.fs.oss.endpoint" : "",
              "spark.hadoop.fs.oss.accessKeyId" : "",
              "spark.hadoop.fs.oss.accessKeySecret" : "",
              "spark.hadoop.fs.oss.impl" : "org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem",
              "spark.submit.pyFiles" : "oss://path/to/your/project_file.py,oss://path/to/your/project_module.zip",
              "spark.archives" : "oss://path/to/your/environment.tar.gz#environment",
              "spark.sql.shuffle.partitions" : "20"
            }
          }

          The following list describes the parameters in the template:

          • mainResource (String, required): The OSS or HDFS path in which the Python file is stored. Examples:

            • OSS path: oss://testBucketName/path/spark-examples.py

            • HDFS path: hdfs:///path/spark-examples.py

          • args (Array, optional): The parameters that are passed to the main Python file. Example: ["arg1", "arg2"]

          • configs (JSON, optional): The system parameters of the Spark job. If the Python file is stored in OSS, you must configure the following parameters in configs:

            • spark.hadoop.fs.oss.endpoint: the endpoint of the OSS bucket in which the Spark job is stored.

            • spark.hadoop.fs.oss.accessKeyId: the AccessKey ID that is used to access OSS. You can obtain the AccessKey ID in the console. For more information, see Obtain an AccessKey pair.

            • spark.hadoop.fs.oss.accessKeySecret: the AccessKey secret that is used to access OSS. You can obtain the AccessKey secret in the console. For more information, see Obtain an AccessKey pair.

            • spark.hadoop.fs.oss.impl: the class that is used to access OSS. Set the value to org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.

            Example: { "spark.sql.shuffle.partitions": "200" }

        • The following section describes the configuration template and custom parameters of an SQL job:

          {
            "mainResource" : "oss://path/to/your/file.sql",
            "configs" : {
              "spark.hadoop.fs.oss.endpoint" : "",
              "spark.hadoop.fs.oss.accessKeyId" : "",
              "spark.hadoop.fs.oss.accessKeySecret" : "",
              "spark.hadoop.fs.oss.impl" : "org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem",
              "spark.sql.shuffle.partitions" : "20"
            }
          }

          The following list describes the parameters in the template:

          • mainResource (String, required): The OSS or HDFS path in which the SQL file is stored. Examples:

            • OSS path: oss://testBucketName/path/spark-examples.sql

            • HDFS path: hdfs:///path/spark-examples.sql

          • configs (JSON, optional): Other configurations of the SQL job. Example: { "spark.executor.memory" : "8g" }

    3. After the preceding configuration is complete, click Try Run in the upper-left corner to check whether the job runs as expected.

  7. Publish the task flow. After all nodes are configured, click Publish in the upper-left corner of the page of the current task flow.

View the publishing history and logs of a task flow

  1. On the Task Orchestration page, click the name of the task flow whose publishing history and logs you want to view.

  2. On the page that appears, click Go to O&M in the upper-right corner.

  3. View the publishing history and logs of the task flow.

    • View the publishing history of the task flow: On the Task Flow Information page, click the Published Tasks tab.

    • View the logs of the task flow.

      1. On the Running History tab, select Scheduling Trigger or Triggered Manually from the drop-down list in the upper-left corner to view the details of all nodes in the task flow.

      2. Click View in the row of the node that you want to view. Then, view the logs for the submission of the Lindorm Spark job, and obtain the job ID and the Spark UI address of the node.

        Note

        If the job fails to be submitted, provide the job ID and the Spark UI address when you submit a ticket.

Advanced settings

Note

The following advanced settings are configured in the DMS console. After you modify the settings of a Lindorm Spark task flow, you must republish the task flow for the changes to take effect.

Configure the scheduling settings

You can configure a scheduling policy based on your business requirements. The Lindorm Spark task flow is automatically executed based on the scheduling policy. You can perform the following steps to configure a scheduling policy:

  1. On the Task Orchestration page, click the name of the task flow for which you want to configure a scheduling policy.

  2. In the lower-left corner of the page that appears, click Task Flow Information.

  3. In the Scheduling Settings section on the right, turn on Enable Scheduling and configure a scheduling policy. The following parameters can be configured:

    • Scheduling Type: The scheduling type of the task flow. Valid values:

      • Cyclic scheduling: The task flow is periodically scheduled. For example, the task flow is run once a week.

      • Schedule once: The task flow is run once at a specific point in time. You need to specify only the point in time when the task flow is run.

    • Effective Time: The period during which the scheduling properties take effect. The default period is from January 1, 1970 to January 1, 9999, which indicates that the scheduling properties permanently take effect.

    • Scheduling Cycle: The scheduling cycle of the task flow. Valid values:

      • Hour: The task flow is run within the hours that you specify. If you select this value, you must set the Timed Scheduling parameter.

      • Day: The task flow is run at the specified point in time every day. If you select this value, you must set the Specific Point in Time parameter.

      • Week: The task flow is run at the specified point in time on the days that you select every week. If you select this value, you must set the Specified Time and Specific Point in Time parameters.

      • Month: The task flow is run at the specified point in time on the days that you select every month. If you select this value, you must set the Specified Time and Specific Point in Time parameters.

    • Timed Scheduling: The scheduling method of the task flow. DMS provides the following scheduling methods:

      • Run at an interval:

        • Starting Time: the time when DMS starts to run the task flow.

        • Intervals: the interval at which the task flow is run. Unit: hours.

        • End Time: the time when DMS stops running the task flow.

        For example, if you set the Starting Time parameter to 00:00, the Intervals parameter to 6, and the End Time parameter to 20:59, DMS runs the task flow at 00:00, 06:00, 12:00, and 18:00.

      • Run at the specified point in time: You can select the hours at which DMS runs the task flow by using the Specified Time parameter.

        For example, if you select 0Hour and 5Hour, DMS runs the task flow at 00:00 and 05:00.

    • Specified Time:

      • If you set the Scheduling Cycle parameter to Week, you can select one or more days of a week from the drop-down list.

      • If you set the Scheduling Cycle parameter to Month, you can select one or more days of a month from the drop-down list.

    • Specific Point in Time: The point in time on the specified days at which the task flow is run. For example, if you set this parameter to 02:55, DMS runs the task flow at 02:55 on the specified days.

    • Cron Expression: The CRON expression that is automatically generated based on the values that you specify for the preceding parameters.

    Example: If you want your task flow to be scheduled at 00:00 and 12:00 every day, configure the scheduling policy by setting the following parameters:

    • Set Scheduling Type to Cyclic scheduling.

    • Select Hour from the Scheduling Cycle drop-down list.

    • In the Timed Scheduling field, select Specified Time. Then, select 0Hour and 12Hour from the Specified Time drop-down list.
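    For the example above, assuming that DMS generates a Quartz-style CRON expression (an assumption for illustration; the exact format is produced by the console), the generated Cron Expression value would look similar to the following:

      0 0 0,12 * * ?

    In Quartz notation, the fields are second, minute, hour, day of month, month, and day of week, so this expression means "at minute 0 of hours 0 and 12, every day".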

Configure variables

For a task flow for which cyclic scheduling is enabled, you can configure a time variable for the jobs that you want to execute. For example, you can configure the time variable bizdate for a node. This variable represents the day before the date on which the task is executed. You can perform the following steps to configure the time variable:

  1. On the current task flow page, double-click the Lindorm Spark node, or click the Lindorm Spark node and click the Configure Node icon.

  2. In the right-side navigation pane, click Variable Setting.

  3. On the Node Variable or Task Flow Variable tab, add a variable.

  4. In the Job configuration section, use the variable, as shown in the sketch after this procedure. For information about other variables, see Variables.
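The following is a minimal sketch of how the variable might be referenced in the Job configuration of a Lindorm Spark node. It assumes that a variable named bizdate has been added and that variables are referenced with the ${variable_name} syntax; verify the exact syntax in Variables. The paths come from the JAR template shown earlier in this topic:

  {
    "mainResource" : "oss://path/to/your/file.jar",
    "mainClass" : "path.to.main.class",
    "args" : [ "${bizdate}" ]
  }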

Manage notifications

If you enable the notification feature for your task flow, the system sends a notification message based on the execution result of the task flow. You can perform the following steps to enable the notification feature:

  1. In the lower-left corner of the current task flow page, click Notification Configurations.

  2. Turn on one of the following notification switches based on your business requirements:

    • Basic Notifications

      • Success Notification: The system sends a notification message if the task flow is successfully executed.

      • Failure Notification: The system sends a notification message if the task flow fails to be executed.

    • Timeout Notification: The system sends a notification message if the task flow times out.

    • Alert Notification: The system sends a notification message when the task is about to start.

  3. (Optional): Configure the recipients of the messages. For more information about how to configure message recipients, see Manage notification rules.

Execute SQL statements

  1. Log on to the DMS console V5.0.

  2. Click the Home tab.

  3. In the left-side navigation pane, click the icon for creating an instance.

  4. In the Add Instance dialog box, select Lindorm_Compute in the NoSQL Database section.

  5. Specify Instance Region, Instance ID, Database Account, and Database Password for the instance, and then click Submit.

  6. In the dialog box that appears, click Submit to go to SQLConsole.

  7. On the SQLConsole tab, enter the SQL statement to be executed and click Execute.

References

For more information about the task orchestration feature of DMS, see Overview.