MaxCompute: Job priority

Last Updated: Aug 04, 2023

MaxCompute provides the job priority feature for projects that use the subscription billing method. This topic describes how to enable the job priority feature and how to configure and view job priorities.

Background information

The subscription-based computing resources that you purchase for MaxCompute projects are limited. During data development, more important jobs should take precedence in using these resources. For example, if data must be generated before 06:00, sufficient computing resources must be available for the relevant jobs or workflows.

MaxCompute allows you to configure priorities for the jobs in projects that use the subscription billing method. This way, jobs that have higher priorities can preferentially use computing resources. When a job with a higher priority is run, the job can preempt computing resources from jobs with lower priorities.

Overview

Each MaxCompute job can have a priority that ranges from 0 to 9. A smaller value indicates a higher priority. Computing resources are preferentially allocated to jobs with higher priorities.

Take note of the following points:

  • In MaxCompute, you can configure priorities only for jobs in projects that use the subscription billing method.

  • If the job priority feature is not enabled for a project, the default priority is 9 for all jobs except the following jobs:

    For Machine Learning Platform for AI (PAI) jobs, the default priority is 1.
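
The ordering rule above (a smaller value indicates a higher priority) can be illustrated with a minimal sketch. The `Job` class and the `allocationOrder` method are hypothetical names introduced only for this illustration. They are not part of the MaxCompute SDK, and the sketch does not describe how the MaxCompute scheduler is actually implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PriorityOrderSketch {
    // Hypothetical job descriptor used only for this illustration.
    static final class Job {
        final String name;
        final int priority;

        Job(String name, int priority) {
            // Priorities range from 0 to 9, as described in this topic.
            if (priority < 0 || priority > 9) {
                throw new IllegalArgumentException("priority must be in the range of 0 to 9: " + priority);
            }
            this.name = name;
            this.priority = priority;
        }
    }

    // Returns job names sorted so that higher-priority jobs (smaller values) come first.
    static List<String> allocationOrder(List<Job> jobs) {
        List<Job> sorted = new ArrayList<>(jobs);
        sorted.sort(Comparator.comparingInt((Job j) -> j.priority));
        List<String> names = new ArrayList<>();
        for (Job job : sorted) {
            names.add(job.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job("adhoc_report", 9));
        jobs.add(new Job("daily_etl", 2));
        jobs.add(new Job("pai_training", 1));
        // Prints [pai_training, daily_etl, adhoc_report]
        System.out.println(allocationOrder(jobs));
    }
}
```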

Enable the job priority feature

Only the owner of a project or a user that is assigned the Super_Administrator role can run the following command to enable the job priority feature for the project:

setproject odps.instance.priority.enable=true;

After you enable the job priority feature for a project that uses the subscription billing method, the priority specified for each job takes effect immediately. If the specified priorities are inappropriate, jobs may run in an unexpected order.

Important

We recommend that you use Information Schema to check the priorities of existing jobs and set the priority to 9 for specific jobs before you enable the job priority feature.

To check and adjust the priorities of existing jobs in a project, perform the following steps:

  1. Collect statistics on priorities of existing jobs.

    Example:

    SELECT  get_json_object(
                REPLACE(settings, '.', '_')
                ,'$.odps_instance_priority'
            ) AS priority
            ,task_type
            ,COUNT(1) AS cnt
    FROM    information_schema.tasks_history
    WHERE   ds = '${bizdate}' -- Set bizdate to the data timestamp.
    GROUP BY get_json_object(
                 REPLACE(settings, '.', '_')
                 ,'$.odps_instance_priority'
             )
             ,task_type
    ORDER BY cnt DESC
    LIMIT   100
    ;

    The following result is returned:

    +----------+-----------+------------+
    | priority | task_type | cnt        |
    +----------+-----------+------------+
    | 9        | SQL       | 4          |
    | NULL     | SQL       | 1          |
    | 2        | SQL       | 1          |
    +----------+-----------+------------+

    The returned result shows that the existing jobs have the following priorities: NULL, 2, and 9. You must find jobs whose priority is not 9. In this example, find jobs whose priority is 2 or NULL. Jobs that have a priority of NULL are DDL jobs. You can ignore these jobs.

  2. Query jobs whose priority is not 9. Example:

    SELECT  inst_id
            ,owner_name
            ,task_name
            ,task_type
            ,settings
    FROM    information_schema.tasks_history
    WHERE   ds = '${bizdate}'
    AND     get_json_object(REPLACE(settings, '.', '_'), '$.odps_instance_priority') = '${priority}'
    LIMIT   100
    ;
    • bizdate: the data timestamp. For example, you can set this parameter to 20200517.

    • priority: the job priority that is not 9. For example, you can set this parameter to 2.

    The following result is returned:

    +---------+------------+-----------+-----------+----------+
    | inst_id | owner_name | task_name | task_type | settings |
    +---------+------------+-----------+-----------+----------+
    | 20200517160200907g4jm**** | ALIYUN$odps_dev_****@prod.trusteeship.aliyunid.com | console_query_task_158973132**** | SQL       | {"SKYNET_ID": "21000041****", "odps.instance.priority": "2", "SKYNET_ONDUTY": "113058643178****", "user_agent": "JavaSDK Revision:33acd11 Version:0.30.9 JavaVersion:1.8.0_112 CLT(0.30.2 : 9da012b); Linux(/)", "biz_id": "210000416174_20200517_211843317416_210033365461_1_habai_test_1130586431784115_39419845061****", "SKYNET_NODENAME": "test_priority"} |
    +---------+------------+-----------+-----------+----------+
    • SKYNET_ID: the ID of the DataWorks node on which the job is run. If the returned result does not contain this parameter, the job is not submitted in DataWorks. You must query these jobs based on the owner and client information that are indicated by the owner_name and user_agent parameters in the returned result.

    • SKYNET_ONDUTY: the ID of the auto triggered DataWorks node on which the job is run. To view more information about the job, you can log on to the DataWorks console and go to Operation Center of the workspace to which the auto triggered node belongs. In Operation Center, choose Cycle Task Maintenance > Cycle Instance in the left-side navigation pane. On the page that appears, find the auto triggered node on which the job is run.

  3. Adjust job priorities based on your business requirements.

    • Jobs that are submitted in DataWorks: If the DataWorks node on which a job is run is added to a baseline, check whether the baseline is properly configured. If the baseline is not properly configured, delete the baseline. For more information, see Manage baselines.

    • Jobs that are not submitted in DataWorks: Based on the returned result in Step 2, find the owner and code of a job and delete the priority settings of the job from the code. Then, the default priority 9 takes effect for the job.
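
The checks in the preceding steps can also be applied to a settings value after it is exported from Information Schema. The following minimal sketch assumes that the settings value is a JSON string shaped like the sample result in Step 2. The class and method names are hypothetical, and the regular expression is a simplification for illustration, not a full JSON parser.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SettingsPrioritySketch {
    // Matches "odps.instance.priority": "<digits>" inside the settings string.
    private static final Pattern PRIORITY =
            Pattern.compile("\"odps\\.instance\\.priority\"\\s*:\\s*\"(\\d+)\"");

    // Returns the explicitly configured priority, or empty if none is set.
    static OptionalInt priorityOf(String settings) {
        if (settings == null) {
            return OptionalInt.empty();
        }
        Matcher m = PRIORITY.matcher(settings);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    // A job needs review if it sets a priority other than the default 9.
    static boolean needsReview(String settings) {
        OptionalInt priority = priorityOf(settings);
        return priority.isPresent() && priority.getAsInt() != 9;
    }

    // Settings that contain SKYNET_ID indicate a job submitted in DataWorks.
    static boolean submittedInDataWorks(String settings) {
        return settings != null && settings.contains("\"SKYNET_ID\"");
    }

    public static void main(String[] args) {
        // Abbreviated from the sample result in Step 2.
        String settings = "{\"SKYNET_ID\": \"21000041****\", "
                + "\"odps.instance.priority\": \"2\", "
                + "\"SKYNET_NODENAME\": \"test_priority\"}";
        System.out.println("needs review: " + needsReview(settings));
        System.out.println("submitted in DataWorks: " + submittedInDataWorks(settings));
    }
}
```

Jobs flagged by `needsReview` correspond to the jobs queried in Step 2. The result of `submittedInDataWorks` indicates which branch of Step 3 applies.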

Configure job priorities

To configure job priorities, you can use one of the following methods:

  • Run the MaxCompute client. Go to a project and configure a priority for each job.

    This method is used to configure priorities for ad hoc query jobs. Example:

    set odps.instance.priority=values;
    -- Set values to a value in the range of 0 to 9.
  • Run the MaxCompute client with startup parameters. Specify the project, the job priority, and the SQL statement to run on the command line.

    This method is used to configure priorities for ad hoc query jobs. Example:

    bin/odpscmd --config=xxx --project=xxx --instance-priority=x -e "<sql>"
  • Use SDK for Java to configure the priority for each job.

    You can use this method to customize the code for configuring job priorities. For more information, see SDK for Java. Example:

    import com.aliyun.odps.Instance;
    import com.aliyun.odps.LogView;
    import com.aliyun.odps.Odps;
    import com.aliyun.odps.OdpsException;
    import com.aliyun.odps.account.Account;
    import com.aliyun.odps.account.AliyunAccount;
    import com.aliyun.odps.task.SQLTask;
    public class OdpsPriorityDemo {
        public static void main(String args[]) throws OdpsException {
            // The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. Using these credentials to perform operations in MaxCompute is a high-risk operation. We recommend that you use a Resource Access Management (RAM) user to call API operations or perform routine O&M. To create a RAM user, log on to the RAM console.
            // In this example, the AccessKey ID and AccessKey secret are configured as environment variables. You can also save your AccessKey pair in the configuration file based on your business requirements.
            // We recommend that you do not hard-code the AccessKey ID and AccessKey secret in your code. Otherwise, the AccessKey pair may be leaked.
            Account account = new AliyunAccount(System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"),System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"));
            Odps odps = new Odps(account);
            // The URL of MaxCompute. 
            String odpsUrl = "http://service.odps.aliyun.com/api"; 
            odps.setEndpoint(odpsUrl);
            odps.setDefaultProject("xxxxxxxxxx");
            SQLTask task = new SQLTask();
            task.setName("adhoc_sql_task_1");
            task.setQuery("select count(*) from aa;");
            // The priority of the job. For example, you can set the priority to 5.
            Instance instance = odps.instances().create(task, 5); 
            LogView logView = new LogView(odps);
            // Optional. Display the Logview information, based on which you can view the status of the job. 
            System.out.println(logView.generateLogView(instance, 24)); 
            // Optional. Wait until the job is run. 
            instance.waitForSuccess(); 
        }
    }
  • Use the baseline management feature of DataWorks to configure the priority for one or more jobs at a time.

    This method is used to configure the priority for jobs that are run on auto triggered nodes. This ensures that an auto triggered node and its ancestor nodes can generate data in a timely manner. You can use the baseline management feature to configure the priority for all jobs in a workflow at the same time. For more information, see Manage baselines.

    You can configure the priority of a DataWorks baseline to 1, 3, 5, 7, or 8. A larger value indicates a higher priority. If you use the baseline management feature of DataWorks to configure the priority for MaxCompute jobs, the MaxCompute job priority is calculated by using the following formula: MaxCompute job priority = 9 - DataWorks baseline priority.

    Note

    By default, DataWorks does not provide a baseline priority for ad hoc query jobs. Therefore, the lowest priority of MaxCompute jobs that are initiated in an ad hoc query is 9.

    The default baseline priority in a DataWorks workflow is 1. Therefore, the lowest priority of MaxCompute jobs that are initiated in a workflow is 8 (9 - 1).

  • Configure the priority for each job in the code of the DataWorks node on which the job is run.

    This method is used to configure priorities for ad hoc query jobs. Example:

    set odps.instance.priority=x;
    -- Set x to a value in the range of 0 to 9.
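
The conversion used by the baseline management method above (MaxCompute job priority = 9 - DataWorks baseline priority) can be sketched as follows. The class and method names are hypothetical and are not part of the DataWorks or MaxCompute APIs. The sketch accepts only the baseline priorities that this topic lists (1, 3, 5, 7, and 8).

```java
class BaselinePrioritySketch {
    // Converts a DataWorks baseline priority to the resulting MaxCompute job priority.
    static int toMaxComputePriority(int baselinePriority) {
        switch (baselinePriority) {
            case 1:
            case 3:
            case 5:
            case 7:
            case 8:
                // MaxCompute job priority = 9 - DataWorks baseline priority
                return 9 - baselinePriority;
            default:
                throw new IllegalArgumentException(
                        "baseline priority must be 1, 3, 5, 7, or 8: " + baselinePriority);
        }
    }

    public static void main(String[] args) {
        int[] baselinePriorities = {1, 3, 5, 7, 8};
        for (int baseline : baselinePriorities) {
            // Prints the MaxCompute priorities 8, 6, 4, 2, and 1, one per line.
            System.out.println("baseline " + baseline
                    + " -> MaxCompute priority " + toMaxComputePriority(baseline));
        }
    }
}
```

Because a larger baseline priority is more important in DataWorks and a smaller value is more important in MaxCompute, the subtraction reverses the scale: the highest baseline priority, 8, maps to MaxCompute priority 1.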

View job priorities

You can view the value of the odps.instance.priority parameter, which specifies the job priority, on the Json Summary tab of Logview 2.0. For more information about the operations in Logview 2.0, see Use Logview V2.0 to view job information.

Note

If you view the source XML file in Logview, the job priority shown in the XML file may be inaccurate. Regardless of the value in the XML file, the job priority is fixed to 9 for the following types of projects: projects that use the pay-as-you-go billing method, and projects that use the subscription billing method but have the job priority feature disabled. This ensures that jobs are queued in a reasonable manner.