This topic describes how to schedule Alink jobs in the E-MapReduce (EMR) console.

Prerequisites

  • An EMR Data Science cluster is created. For more information, see Create a cluster.
  • A project is created. For more information, see Manage projects.
  • Alink is configured. For more information, see PAI-Alink.
  • PuTTY and SSH Secure File Transfer Client are installed on your computer.

Obtain the task script

  1. Log on to the Alibaba Cloud EMR console and access the homepage of Alink. For more information, see PAI-Alink.
  2. On the homepage of Alink, find the template that you want to use and click Create from Template to create an experiment based on that template.
    In this topic, the Scoring Card Function template is used as an example.
  3. In the Create Sample Experiment dialog box, specify Experiment Name and click OK.
  4. In the lower part of the page, choose Deploy > Generate Deployment Script.
    The Deployment Script dialog box displays the script details.
    Note The script is the executable script of the experiment.
  5. Save the script as a file named script.py on your computer.

Configure the deployment script

  1. Create a configuration file named config.txt.
    Add the following content to the file:
    userId=default
    alinkServerEndpoint=http://127.0.0.1:9301
    hadoopHome=/usr/lib/hadoop-current
    hadoopUserName=hadoop
    token=ZSHTIeEkwrtZJJsN1ZZmCJJmr5jaj1wO
  2. Use SSH Secure File Transfer Client to upload the config.txt and script.py files to the root directory of the master node in the Data Science cluster. A command-line alternative that uses scp is sketched after this list.
  3. Log on to the master node of the cluster. For more information, see Connect to the master node of an EMR cluster in SSH mode.
  4. Run the following command to verify that the files are in the root directory of the master node:
    ls -l /root/config.txt /root/script.py
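
If you prefer the command line to SSH Secure File Transfer Client, you can also copy the two files with scp and then confirm them on the master node. The following commands are only a sketch; they assume that you connect as the root user and that <master-node-ip> is a placeholder for the IP address of your master node.
    # Run on your computer: upload both files to the /root directory of the master node.
    # Replace <master-node-ip> with the IP address of your master node.
    scp config.txt script.py root@<master-node-ip>:/root/
    # Run on the master node: confirm that both files are in place.
    ls -l /root/config.txt /root/script.py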

Configure a scheduling task

  1. Log on to the EMR console.
  2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
  3. Click the Data Platform tab.
  4. Create a task execution cluster.
    1. Find the project where you want to configure a scheduling task, and click Edit Job in the Actions column.
    2. Click the Projects tab.
    3. In the left-side navigation pane, click Cluster Settings.
    4. In the upper-right corner, click Add Cluster.
    5. In the Add Cluster dialog box, select the created Data Science cluster from the Select Cluster drop-down list and click OK.
  5. Create a job.
    1. Click the Data Platform tab.
    2. Create a Shell job.
      For more information about how to create a job, see Configure a Shell job.
    3. Enter job content in the Content field.
      Example:
      sudo alinkcmd run -c /root/config.txt -f /root/script.py
  6. Configure the job.
    1. Click Job Settings in the upper-right corner.
    2. In the Job Settings pane, click the Advanced Settings tab.
    3. In the Mode section, select Header/Gateway Node from the Job Submission drop-down list.
  7. Run the job.
    1. Click Save in the upper-right corner.
    2. Click Run in the upper-right corner.
    3. In the Run Job dialog box, set the execution cluster to the created Data Science cluster and click OK.
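
Before you rely on the scheduled run, you can optionally run the same command manually on the master node to confirm that the configuration file and script work together. The command below is the same one used in the job content; checking the exit code is a suggested sanity check.
    # Run on the master node: execute the experiment once.
    sudo alinkcmd run -c /root/config.txt -f /root/script.py
    # A zero exit code indicates that the run succeeded.
    echo $?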