This topic describes how to create and run Spark jobs in the DLA console.
Before you begin, make sure that the following prerequisites are met:
- A virtual cluster (VC) is created. For more information, see Create a virtual cluster.
Note: When you create a VC, you must set Engine to Spark.
- Your RAM user is granted the permissions to submit jobs. This is required only if you log on to DLA as a RAM user. For more information, see Grant permissions to a RAM user (detailed version). The SparkPi sample job does not access external data sources. To run it, you only need to bind a DLA child account to the RAM user and grant the DLA child account the permissions to access DLA.
To create and run a Spark job, perform the following steps:
- Log on to the DLA console.
- In the top navigation bar, select the region where DLA is deployed.
- In the left-side navigation pane, choose Serverless Spark > Submit job.
- On the Parameter Configuration page, click Create Job.
- In the Create Job dialog box, specify the parameters as required.
File Name: The name of the folder or file.
Data Format: The data format. You can select File or Folder from the Data Format drop-down list.
Parent: The parent directory of the file or folder.
- Job List is the root directory. All jobs must be created under the Job List directory.
- You can first create a folder in the Job List directory, and then create jobs in the created folder. Alternatively, you can directly create jobs in the Job List directory.
- Click OK to create a Spark job.
- Write the Spark job. For more information, see Configure a Spark job.
- Proceed with one of the following operations as required.
- Click Save to save the Spark job. You can then reuse the job if needed.
- Click Execute to run the Spark job. You can then check the execution status of the job in real time.
Task ID: The ID of the Spark job, which is generated by DLA.
State: The running status of the Spark job.
- STARTING: indicates that the Spark job is being submitted.
- RUNNING: indicates that the Spark job is running.
- SUCCESS: indicates that the Spark job succeeded.
- DEAD: indicates that an error occurred while the Spark job was running. You can view the job logs to troubleshoot the error.
- KILLED: indicates that the Spark job is killed.
Task Name: The name of the Spark job that you created.
Submit Time: The time when the Spark job was submitted.
Start Up Time: The time when the Spark job started.
Update time: The time when the status of the Spark job changed.
Duration: The time required to run the Spark job.
Operation: You can perform the following operations:
- Log: shows the log of the Spark job. Only the latest 300 lines of the log can be queried.
- SparkUI: opens the Apache Spark web UI of the job. If the token expires, click Refresh to obtain the latest address.
- Details: shows the JSON script that is used to submit the Spark job.
- Kill: kills the Spark job.
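If you track job status outside the console, for example in your own notes or tooling, the states listed above can be modeled as follows. This is a minimal sketch, not part of DLA: the names JobState, TERMINAL_STATES, is_finished, and needs_log_review are hypothetical, and treating SUCCESS, DEAD, and KILLED as final states is an assumption based on their descriptions.

```python
from enum import Enum


class JobState(Enum):
    """Job states shown in the State column of the job list."""
    STARTING = "STARTING"  # the job is being submitted
    RUNNING = "RUNNING"    # the job is running
    SUCCESS = "SUCCESS"    # the job succeeded
    DEAD = "DEAD"          # an error occurred while the job was running
    KILLED = "KILLED"      # the job was killed


# Assumption: a job in one of these states does not change state again.
TERMINAL_STATES = {JobState.SUCCESS, JobState.DEAD, JobState.KILLED}


def is_finished(state: str) -> bool:
    """Return True if the state string names a terminal state."""
    return JobState(state) in TERMINAL_STATES


def needs_log_review(state: str) -> bool:
    """Return True for DEAD, the state for which you view the logs to troubleshoot."""
    return JobState(state) is JobState.DEAD
```

For example, is_finished("RUNNING") returns False, while needs_log_review("DEAD") returns True, which is the point at which the Log operation is useful.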
DLA provides a sample job named SparkPi. To view its configuration, click Example. To run the job, click Execute.
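For orientation, the following sketch builds a JSON job configuration of the kind that a SparkPi job might use. The field names (name, file, className, args, conf), the resource settings, and the JAR path are assumptions for illustration only. The configuration that the console displays and the Configure a Spark job topic are authoritative. The function build_sparkpi_config is hypothetical.

```python
import json


def build_sparkpi_config(executors: int = 5, resource_spec: str = "medium") -> dict:
    """Build an illustrative SparkPi job configuration (all values are assumptions)."""
    return {
        "name": "SparkPi",
        # Placeholder path to the examples JAR; replace with the real location.
        "file": "local:///tmp/spark-examples.jar",
        # Main class of the SparkPi example in Apache Spark.
        "className": "org.apache.spark.examples.SparkPi",
        # SparkPi takes the number of partitions as an optional argument.
        "args": ["100"],
        "conf": {
            "spark.driver.resourceSpec": resource_spec,
            "spark.executor.instances": executors,
            "spark.executor.resourceSpec": resource_spec,
        },
    }


if __name__ == "__main__":
    # Print the JSON text so that it can be compared with the console example.
    print(json.dumps(build_sparkpi_config(), indent=4))
```

Generating the JSON from a function rather than editing it by hand makes it easier to vary the executor count or resource specification between runs.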
- DLA provides demo code for developing Spark jobs. For more information, see Aliyun DLA Demo. You can clone the code and run the mvn package command to build it. We recommend that you follow the instructions in this topic when you configure the POM file.
- For more information about how to use DMS to orchestrate and schedule Spark jobs, see Use the task orchestration feature of DMS to train a machine learning model.