MaxCompute allows you to schedule jobs by using Azkaban. This helps you efficiently complete high-frequency data analysis. This topic describes how to use Azkaban to schedule SQL jobs by running commands on the MaxCompute client.

Background information

Azkaban is a system that is used to schedule different types of jobs, such as Command, Hadoop MapReduce, Hive, Spark, and Pig jobs. Command jobs are the most commonly used. Azkaban also allows you to use custom plug-ins. For more information about Azkaban, see Azkaban.

To schedule jobs, you must package the required files into a ZIP file and upload the ZIP file to Azkaban. The files include the source data file and the script files for creating tables, importing data, and querying data.

This topic demonstrates how to use the job scheduling feature of Azkaban to implement the SQL logic for creating tables, importing data, and querying data. The following figure shows the workflow for scheduling jobs and lists the job files and script files that are used for each job.

Workflow and related files
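
As a quick reference, the job files in Step 1 define the following dependency chain. Each job declares its parent job in the dependencies parameter, and Azkaban derives the workflow from these declarations:

  start (start.job)
   └── upload_data (upload_data.job)    # dependencies=start
       └── mc (mc.job)                  # dependencies=upload_data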

Prerequisites

Before you schedule MaxCompute jobs, make sure that the following conditions are met:

  • The MaxCompute client (odpscmd) is installed and configured.
  • Azkaban is installed, and you have an account that can log on to Azkaban.

Procedure

  1. Step 1: Prepare the required files and package them into a ZIP file.
    Prepare the source data files and script files that are required for scheduling jobs and package them into a ZIP file.
  2. Step 2: Upload the ZIP file to Azkaban.
    Create an Azkaban project, upload the ZIP file to Azkaban by using the project, and create a job scheduling workflow.
  3. Step 3: Run the workflow.
    Run the job scheduling workflow.
  4. Step 4: Query the execution results of the workflow.
    Query the execution results of the workflow.

Step 1: Prepare the required files and package them into a ZIP file

  1. Prepare the required files.
    The following files are required in this topic:
    • The source data file. This file is in the TXT format. In this topic, the emp.txt file is prepared. This file contains the following data:
      7369,SMITH,CLERK,7902,1980-12-17 00:00:00,800,,20
      7499,ALLEN,SALESMAN,7698,1981-02-20 00:00:00,1600,300,30
      7521,WARD,SALESMAN,7698,1981-02-22 00:00:00,1250,500,30
      7566,JONES,MANAGER,7839,1981-04-02 00:00:00,2975,,20
      7654,MARTIN,SALESMAN,7698,1981-09-28 00:00:00,1250,1400,30
      7698,BLAKE,MANAGER,7839,1981-05-01 00:00:00,2850,,30
      7782,CLARK,MANAGER,7839,1981-06-09 00:00:00,2450,,10
      7788,SCOTT,ANALYST,7566,1987-04-19 00:00:00,3000,,20
      7839,KING,PRESIDENT,,1981-11-17 00:00:00,5000,,10
      7844,TURNER,SALESMAN,7698,1981-09-08 00:00:00,1500,0,30
      7876,ADAMS,CLERK,7788,1987-05-23 00:00:00,1100,,20
      7900,JAMES,CLERK,7698,1981-12-03 00:00:00,950,,30
      7902,FORD,ANALYST,7566,1981-12-03 00:00:00,3000,,20
      7934,MILLER,CLERK,7782,1982-01-23 00:00:00,1300,,10
      7948,JACCKA,CLERK,7782,1981-04-12 00:00:00,5000,,10
      7956,WELAN,CLERK,7649,1982-07-20 00:00:00,2450,,10
      7956,TEBAGE,CLERK,7748,1982-12-30 00:00:00,1300,,10
    • The script file for creating tables and uploading data. This file is in the SQL format. In this topic, the upload.sql file is prepared. This file contains the following content:
      drop table if exists azkaban_emp;
      create table azkaban_emp
         (empno bigint,
          ename string,
          job string,
          mgr bigint,
          hiredate datetime,
          sal bigint,
          comm bigint,
          deptno bigint) lifecycle 1;
      tunnel upload emp.txt azkaban_emp;
    • The script file for querying data. This file is in the SQL format. In this topic, the cat_data.sql file is prepared. This file contains the following content:
      select * from azkaban_emp;
    • The file for starting the job. This file is in the job format. In this topic, the start.job file is prepared. This file contains the following content:
      #start
      type=command
      command=echo 'job start'
    • The file for uploading the job data. This file is in the job format. In this topic, the upload_data.job file is prepared. This file contains the following content:
      #upload_data
      type=command
      dependencies=start
      command=D:/odpscmd_public/bin/odpscmd.bat -f 'upload.sql'

      The value of command specifies the path of the MaxCompute client executable in its local installation directory. In this topic, D:/odpscmd_public/bin/odpscmd.bat is used.

    • The file for querying the job data. This file is in the job format. In this topic, the mc.job file is prepared. This file contains the following content:
      #mc
      type=command
      command=D:/odpscmd_public/bin/odpscmd.bat -f 'cat_data.sql'
      dependencies=upload_data

      As in upload_data.job, the value of command specifies the path of the MaxCompute client executable. In this topic, D:/odpscmd_public/bin/odpscmd.bat is used.

  2. Package the files that you prepare into a ZIP file.
    In this topic, the preceding files are packaged into the demo1.zip file. The following figure shows the files in the demo1.zip file.
    Files in the demo1.zip file
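
If you prefer to work from a command line, you can verify the SQL scripts on the MaxCompute client and then create the archive. The following sketch assumes a Unix-like shell with a zip utility (for example, Git Bash on Windows) and the client path that is used in this topic; you can also create the archive with the compression tool of your operating system. All six files are kept at the top level of the archive, as shown in the preceding figure.

  # Optional: run the scripts once on the MaxCompute client to verify them
  D:/odpscmd_public/bin/odpscmd.bat -f upload.sql
  D:/odpscmd_public/bin/odpscmd.bat -f cat_data.sql

  # Package all six files at the top level of the archive
  zip demo1.zip emp.txt upload.sql cat_data.sql start.job upload_data.job mc.job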

Step 2: Upload the ZIP file to Azkaban

  1. Log on to Azkaban.
    For more information, see UserManager.
  2. Create an Azkaban project.
    For more information about how to create an Azkaban project, see Create Projects.
  3. Upload the ZIP file that is generated in Step 1 to the Azkaban project.
    For more information about how to upload a ZIP file, see Upload Projects.
    After the ZIP file is uploaded, you can view the workflow on the Graph tab. Azkaban names the workflow after the final job in the dependency chain, which is mc in this example. For more information about how to view a workflow, see Flow View.

Step 3: Run the workflow

After the workflow is created, click Schedule/Execute Flow in the upper-right corner. In the dialog box that appears, click Execute in the Flow View panel to run the jobs.

For more information about how to run a workflow, see Executing Flow View.
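
If you want to trigger the workflow from a script instead of the web UI, Azkaban also provides an AJAX API. The following sketch is a minimal example that assumes a default solo-server deployment at https://localhost:8443 and the built-in azkaban test account; adjust the host, port, credentials, and project name (demo1 in this sketch) to match your deployment. The flow name mc is the terminal job that is defined in Step 1.

  # Log on; the JSON response contains a session.id
  curl -k -X POST --data "action=login&username=azkaban&password=azkaban" https://localhost:8443

  # Trigger the flow by using the returned session.id
  curl -k --get --data "session.id=<session.id>&ajax=executeFlow&project=demo1&flow=mc" https://localhost:8443/executor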


Step 4: Query the execution results of the workflow

After the workflow is run, you can view the execution results of each job on the Job List tab of the Execution page. You can also find a job on the Job List tab and click Details in the Details column to query the details of this job.

For more information about how to view the execution results of a workflow, see Execution.
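
You can also retrieve the execution results through the AJAX API. The following minimal sketch assumes the same session as in Step 3 and the execution ID that Azkaban returns when the flow is triggered:

  # Fetch the status of every job in an execution
  curl -k --get --data "session.id=<session.id>&ajax=fetchexecflow&execid=<execid>" https://localhost:8443/executor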
