Track which RAM users drive the most compute spend and which jobs take the longest to run. This tutorial shows you how to back up job history from Information Schema, run cost and duration analysis queries, and push daily top-N reports to a DingTalk group.
Background
When you use MaxCompute in DataWorks standard mode, most jobs run under the root account. Information Schema records this root account as the executor, so only a small number of jobs show a RAM user as the owner. This makes per-user cost attribution and duration analysis difficult out of the box.
Data source options
| Source | Retention | Use when |
|---|---|---|
| Project-level Information Schema (TASKS_HISTORY view) | 14 days | You need job-level detail for a single project |
| Tenant-level Information Schema | Longer, more complete | You need cross-project or longer-horizon analysis |
This tutorial uses the TASKS_HISTORY view. To back up beyond 14 days, the workflow writes daily snapshots to a persistent table. To use tenant-level Information Schema instead, see Tenant-level Information Schema.
How it works
1. Install the Information Schema permission package for each project to query.
2. Back up TASKS_HISTORY daily into a persistent table (information_history).
3. Parse the settings field to extract the actual RAM user from the SKYNET_ONDUTY key. This corrects the root-account attribution problem.
4. Run aggregation queries to produce cost_topn (billing) and time_topn (duration) tables.
5. Push the ranked results to a DingTalk group using a Java program packaged as a JAR file.
6. Wire the three DataWorks nodes into a dependency chain and schedule them to run daily.
Prerequisites
Before you begin, make sure you have:
- A MaxCompute project with the project owner account or a RAM user that has the Super_Administrator role
- Access to the DataWorks console with permission to create and schedule nodes
- A DingTalk group where you have permission to add a robot
- IntelliJ IDEA (or another Maven-compatible IDE) to compile the notification JAR
Step 1: Install the Information Schema package
Starting March 1, 2024, MaxCompute no longer automatically installs the project-level Information Schema package for new projects. If you are working with a new project, query the tenant-level Information Schema instead. For existing projects, follow the steps below.
The installation must be performed by the project owner or a RAM user with the Super_Administrator role. Install the package using either of the following methods:
- Log on to the MaxCompute client and run:
  install package Information_Schema.systables;
- Log on to the DataWorks console, go to Ad-hoc Query, and run:
  install package Information_Schema.systables;
To analyze metadata from multiple MaxCompute projects, install the package for each project separately. Then insert the backups from all projects into a single table for unified analysis.
For more information about the features and limits of Information Schema, see Project-level Information Schema.
(Optional) Step 2: Grant permissions to other users
By default, only the project owner can view Information Schema data. To let other users or roles query it, grant them read access to the package.
Run the following commands in the MaxCompute client:
grant <actions> on package Information_Schema.systables to user <user_name>;
grant <actions> on package Information_Schema.systables to role <role_name>;
| Parameter | Description |
|---|---|
| <actions> | The permission to grant. Set this to Read. |
| <user_name> | An Alibaba Cloud account or RAM user added to the project. Run list users; to see available accounts. |
| <role_name> | A role added to the project. Run list roles; to see available roles. |
Example:
grant read on package Information_Schema.systables to user RAM$Bob@aliyun.com:user01;
For more information, see Access resources across projects based on packages.
Step 3: Back up job metadata
Create a persistent backup table and schedule a daily DataWorks node to populate it from TASKS_HISTORY.
3.1 Create the backup table
Log on to the MaxCompute client and run:
-- Replace <project_name> with your MaxCompute project name.
CREATE TABLE IF NOT EXISTS <project_name>.information_history
(
task_catalog STRING
,task_schema STRING
,task_name STRING
,task_type STRING -- Job type: SQL, SQLRT (query acceleration), or CUPID (Spark)
,inst_id STRING -- Instance ID, used to identify time-consuming jobs
,`status` STRING
,owner_id STRING
,owner_name STRING
,result STRING
,start_time DATETIME
,end_time DATETIME
,input_records BIGINT
,output_records BIGINT
,input_bytes BIGINT -- Input data size in bytes; used in cost calculation
,output_bytes BIGINT
,input_tables STRING
,output_tables STRING
,operation_text STRING
,signature STRING
,complexity DOUBLE -- Job complexity coefficient; multiplied with input_bytes to estimate cost
,cost_cpu DOUBLE -- CPU-hours consumed; used for Spark job cost calculation
,cost_mem DOUBLE
,settings STRING -- JSON blob containing scheduler metadata; parse SKYNET_ONDUTY to get the true RAM user
,ds STRING
);
3.2 Create the ODPS SQL node for daily backup
In DataWorks Data Studio, create an ODPS SQL node named information_history. See Create an ODPS SQL node for setup details.
Add the following SQL to the node:
-- Replace <project_name> with your MaxCompute project name.
use <project_name>;
insert into table <project_name>.information_history
select * from information_schema.tasks_history where ds ='${datetime1}';
${datetime1} is a DataWorks scheduling parameter. In the Scheduling Configuration panel on the right, go to Basic Properties and set Parameter to datetime1=${yyyymmdd}.
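With this configuration, ${datetime1} resolves to the node's data timestamp as an eight-digit date string, which matches the ds partition values in TASKS_HISTORY. The expected format can be sketched in Java (illustrative only; the helper name toDs is hypothetical):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DsDemo {
    /** Format a date the way the yyyymmdd scheduling expression resolves, e.g. 20240301. */
    static String toDs(LocalDate date) {
        return date.format(DateTimeFormatter.ofPattern("yyyyMMdd"));
    }

    public static void main(String[] args) {
        System.out.println(toDs(LocalDate.of(2024, 3, 1))); // 20240301
    }
}
```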
Save the node by clicking the save icon in the upper-left corner.
To aggregate metadata from multiple MaxCompute projects, create one ODPS SQL node per project, each inserting into the same information_history table.
Step 4: Create analysis jobs for top-N billing accounts and time-consuming jobs
The settings field in TASKS_HISTORY contains a JSON object with scheduler metadata including SKYNET_ONDUTY, which identifies the RAM user who submitted the job—not just the root account. The analysis jobs use GET_JSON_OBJECT(settings, "$.SKYNET_ONDUTY") to attribute costs and durations to the correct user.
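The fallback logic can be sketched in plain Java (a minimal illustration of the CASE expression the node runs, not production code; the helper name resolveTaskOwner is hypothetical, and JSON parsing is simplified to string matching for brevity):

```java
public class OnDutyDemo {
    /**
     * Return the SKYNET_ONDUTY value from the settings JSON blob,
     * or fall back to the recorded owner name when the key is absent.
     */
    static String resolveTaskOwner(String settings, String ownerName) {
        String key = "\"SKYNET_ONDUTY\":\"";
        int start = settings == null ? -1 : settings.indexOf(key);
        if (start < 0) {
            return ownerName; // no scheduler metadata: keep the recorded owner
        }
        int valueStart = start + key.length();
        int valueEnd = settings.indexOf('"', valueStart);
        return settings.substring(valueStart, valueEnd);
    }

    public static void main(String[] args) {
        String withDuty = "{\"SKYNET_ONDUTY\":\"ram_user_01\",\"SKYNET_ID\":\"123\"}";
        System.out.println(resolveTaskOwner(withDuty, "root_account")); // ram_user_01
        System.out.println(resolveTaskOwner("{}", "root_account"));     // root_account
    }
}
```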
4.1 Create the supporting tables
Log on to the MaxCompute client and run the following commands.
RAM user lookup table (user_ram): maps account IDs to display names.
CREATE TABLE IF NOT EXISTS <project_name>.user_ram
(
user_id STRING
,user_name STRING
);
Top-N billing accounts table (cost_topn):
CREATE TABLE IF NOT EXISTS <project_name>.cost_topn
(
cost_sum DECIMAL(38,5)
,task_owner STRING
)
PARTITIONED BY
(
ds STRING
);
Top-N time-consuming jobs table (time_topn):
CREATE TABLE IF NOT EXISTS <project_name>.time_topn
(
inst_id STRING
,cost_time BIGINT
,task_owner STRING
)
PARTITIONED BY
(
ds STRING
);
4.2 Create the analysis ODPS SQL node
In DataWorks Data Studio, create an ODPS SQL node named topn. Add the following SQL:
-- Enable data type 2.0. For more information, see Data type editions.
SET odps.sql.decimal.odps2=true;
-- Populate cost_topn.
-- Cost formulas by job type:
-- SQL / SQLRT: input_bytes (GB) x complexity x 0.3 (CNY)
-- CUPID (Spark, Terminated only): cost_cpu / 100 / 3600 x 0.66 (CNY)
-- Replace <user_id> with the account IDs you want to map to display names (from user_ram).
INSERT INTO TABLE <project_name>.cost_topn PARTITION (ds = '${datetime1}')
SELECT
NVL(cost_sum, 0) cost_sum
,CASE WHEN a.task_owner = '<user_id>' OR a.task_owner = '<user_id>' OR a.task_owner = '<user_id>'
THEN b.user_name
ELSE a.task_owner
END task_owner
FROM (
SELECT
inst_id
,owner_name
,task_type
,a.input_bytes
,a.cost_cpu
,a.status
,CASE WHEN a.task_type = 'SQL' THEN CAST(a.input_bytes/1024/1024/1024 * a.complexity * 0.3 AS DECIMAL(18,5))
WHEN a.task_type = 'SQLRT' THEN CAST(a.input_bytes/1024/1024/1024 * a.complexity * 0.3 AS DECIMAL(18,5))
WHEN a.task_type = 'CUPID' AND a.status = 'Terminated'
THEN CAST(a.cost_cpu/100/3600 * 0.66 AS DECIMAL(18,5))
ELSE 0
END cost_sum
,a.settings
,GET_JSON_OBJECT(settings, "$.SKYNET_ONDUTY") owner
,CASE WHEN GET_JSON_OBJECT(a.settings, "$.SKYNET_ONDUTY") IS NULL THEN owner_name
ELSE GET_JSON_OBJECT(a.settings, "$.SKYNET_ONDUTY")
END task_owner
FROM information_history
WHERE ds = '${datetime1}'
) a
LEFT JOIN <project_name>.user_ram b
ON a.task_owner = b.user_id;
-- Populate time_topn.
-- cost_time is measured in seconds using DATEDIFF(end_time, start_time, 'ss').
INSERT INTO TABLE <project_name>.time_topn PARTITION (ds = '${datetime1}')
SELECT
inst_id
,cost_time
,CASE WHEN a.task_owner = '<user_id>' OR a.task_owner = '<user_id>' OR a.task_owner = '<user_id>'
THEN b.user_name
ELSE a.task_owner
END task_owner
FROM (
SELECT
inst_id
,task_type
,status
,DATEDIFF(a.end_time, a.start_time, 'ss') AS cost_time
,CASE WHEN GET_JSON_OBJECT(a.settings, "$.SKYNET_ONDUTY") IS NULL THEN owner_name
ELSE GET_JSON_OBJECT(a.settings, "$.SKYNET_ONDUTY")
END task_owner
FROM <project_name>.information_history a
WHERE ds = '${datetime1}'
) a
LEFT JOIN <project_name>.user_ram b
ON a.task_owner = b.user_id;
In the example, task_type = 'SQL' represents an SQL job, task_type = 'SQLRT' represents a query acceleration job, and task_type = 'CUPID' represents a Spark job. To include other billable job types such as MapReduce or Mars, add CASE branches using their respective billing formulas. For more information, see Compute unit-based billing (pay-as-you-go).
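The per-job cost arithmetic used by the cost_topn query can be checked with a small Java helper (a sketch mirroring the SQL CASE expression, assuming the same 0.3 CNY/GB and 0.66 CNY rates; the method name estimateCost is hypothetical):

```java
public class CostDemo {
    /** Estimate a job's cost in CNY using the same formulas as the cost_topn SQL. */
    static double estimateCost(String taskType, String status,
                               long inputBytes, double complexity, double costCpu) {
        switch (taskType) {
            case "SQL":
            case "SQLRT":
                // input size in GB x complexity x 0.3 CNY
                return inputBytes / 1024.0 / 1024.0 / 1024.0 * complexity * 0.3;
            case "CUPID":
                // Spark jobs are counted only when Terminated; mirrors cost_cpu/100/3600 x 0.66
                return "Terminated".equals(status) ? costCpu / 100.0 / 3600.0 * 0.66 : 0.0;
            default:
                return 0.0; // other job types would need their own billing formulas
        }
    }

    public static void main(String[] args) {
        // A 10 GB SQL job with complexity 1.0 costs 10 x 1.0 x 0.3 = 3 CNY.
        System.out.println(estimateCost("SQL", "Terminated", 10L * 1024 * 1024 * 1024, 1.0, 0.0));
    }
}
```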
In the Scheduling Configuration panel, set Parameter to datetime1=${yyyymmdd}. Save the node.
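The cost_time column produced by the time_topn query counts whole elapsed seconds between start_time and end_time. An illustrative Java equivalent of DATEDIFF(end_time, start_time, 'ss') (the helper name costTimeSeconds is hypothetical):

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class DurationDemo {
    /** Elapsed whole seconds between two timestamps, like DATEDIFF(end, start, 'ss'). */
    static long costTimeSeconds(LocalDateTime start, LocalDateTime end) {
        return Duration.between(start, end).getSeconds();
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2024, 3, 1, 10, 0, 0);
        LocalDateTime end = LocalDateTime.of(2024, 3, 1, 10, 2, 30);
        System.out.println(costTimeSeconds(start, end)); // 150
    }
}
```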
Step 5: Push top-N results to DingTalk
5.1 Create a DingTalk group chatbot
The following steps use the DingTalk PC client as an example.
1. Open the target DingTalk group and click the settings icon in the upper-right corner.
2. On the Group Settings panel, click Smart Group Assistant.
3. On the Smart Group Assistant panel, click Add Robot.
4. In the Add Robot section of the Group Robot dialog box, click the add icon.
5. In the Group Robot dialog box, click Custom.
6. In the Robot Details dialog box, click Add.
7. In the Add Robot dialog box, configure the robot:

   | Property | Description |
   |---|---|
   | Profile picture | Click the icon in the lower-right corner of the profile picture to edit it. |
   | Robot Name | Enter a name for the robot. |
   | Security Settings | Configure at least one security setting. Three options are available: Custom Keywords (up to 10 keywords), Add Signature (select to get the robot's signing key), and IP Address (Range) (restricts requests to a specified IP range). |

   Select I have read and agree to the "Custom Robot Service and Disclaimer", then click Finish.
8. Copy the generated Webhook URL, then click Finish.
Important: Keep the Webhook URL confidential. Exposing it publicly allows anyone to post messages to your group.
5.2 Build the notification program
Use IntelliJ IDEA to create a Maven project that queries cost_topn and time_topn and sends the results to DingTalk via the Webhook URL. After compilation, the build produces a JAR file named topn_new.jar.
Configure POM dependencies:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>DingTalk_Information</groupId>
<artifactId>DingTalk_Information</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>odps-sdk-core</artifactId>
<version>0.35.5-public</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.15</version>
<exclusions>
<exclusion>
<groupId>com.sun.jmx</groupId>
<artifactId>jmxri</artifactId>
</exclusion>
<exclusion>
<groupId>com.sun.jdmk</groupId>
<artifactId>jmxtools</artifactId>
</exclusion>
<exclusion>
<groupId>javax.jms</groupId>
<artifactId>jms</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.aliyun</groupId>
<artifactId>alibaba-dingtalk-service-sdk</artifactId>
<version>1.0.1</version>
</dependency>
<dependency>
<groupId>com.aliyun.odps</groupId>
<artifactId>odps-jdbc</artifactId>
<version>3.0.1</version>
<classifier>jar-with-dependencies</classifier>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4.1</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>com.alibaba.sgri.message.test</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Java program example:
package com.alibaba.sgri.message;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;
import com.aliyun.odps.Instance;
import com.aliyun.odps.Odps;
import com.aliyun.odps.OdpsException;
import com.aliyun.odps.account.Account;
import com.aliyun.odps.account.AliyunAccount;
import com.aliyun.odps.data.ResultSet;
import com.aliyun.odps.task.SQLTask;
import com.dingtalk.api.DefaultDingTalkClient;
import com.dingtalk.api.DingTalkClient;
import com.dingtalk.api.request.OapiRobotSendRequest;
import com.dingtalk.api.response.OapiRobotSendResponse;
import com.taobao.api.ApiException;
public class test {
public static void main(String[] args) throws ApiException {
if (args.length < 1) {
System.out.println("Please enter the date parameter.");
System.exit(1); // Exit with a non-zero status when the required date argument is missing.
}
System.out.println("Start reading data.");
DingTalkClient client = new DefaultDingTalkClient(
"<Your chatbot's Webhook URL>");
OapiRobotSendRequest request = new OapiRobotSendRequest();
request.setMsgtype("markdown");
OapiRobotSendRequest.Markdown markdown = new OapiRobotSendRequest.Markdown();
// The date here is used as a parameter.
markdown.setText(getContent(args[0]));
markdown.setTitle("Top N jobs by consumption");
request.setMarkdown(markdown);
OapiRobotSendResponse response = client.execute(request);
System.out.println("Message sent successfully.");
}
/**
* Read from ODPS to get the data to send.
*/
public static String getContent(String day) {
Odps odps = createOdps();
StringBuilder sb = new StringBuilder();
try {
//==================Billing accounts=====================
String costTopnSql = "select sum(cost_sum)cost_sum,task_owner from cost_topn where ds='" + day + "' " + "group by task_owner order by cost_sum desc limit 5;";
Instance costInstance = SQLTask.run(odps, costTopnSql);
costInstance.waitForSuccess();
ResultSet costTopnRecords = SQLTask.getResultSet(costInstance);
sb.append("<font color=#FF0000 size=4>").append("Top N Billing Accounts (").append(day).append(
")[Calculated based on Alibaba Cloud pay-as-you-go billing]").append("</font>").append("\n\n");
AtomicInteger costIndex = new AtomicInteger(1);
costTopnRecords.forEach(item -> {
sb.append(costIndex.getAndIncrement()).append(".").append("Account:");
sb.append("<font color=#2E64FE>").append(item.getString("task_owner")).append("\n\n").append("</font>");
sb.append(" ").append(" ").append("Cost:").append("<font color=#2E64FE>").append(item.get("cost_sum"))
.append(" CNY").append(
"</font>").append("\n\n")
.append("</font>");
});
//==================Time-consuming jobs=====================
String timeTopnSql = "select * from time_topn where ds='" + day + "' ORDER BY cost_time DESC limit 5;";
Instance timeInstance = SQLTask.run(odps, timeTopnSql);
timeInstance.waitForSuccess();
ResultSet timeTopnRecords = SQLTask.getResultSet(timeInstance);
sb.append("<font color=#FF8C00 size=4>").append("Top N Time-consuming Jobs (").append(day).append(")")
.append("\n\n").append("</font>");
AtomicInteger timeIndex = new AtomicInteger(1);
timeTopnRecords.forEach(item -> {
sb.append(timeIndex.getAndIncrement()).append(".").append("Job:");
sb.append("<font color=#2E64FE>").append(item.getString("inst_id")).append("\n\n").append("</font>");
sb.append(" ").append("Account:").append("<font color=#2E64FE>").append(item.getString("task_owner")).append("\n\n").append("</font>");
sb.append(" ").append("Duration:").append("<font color=#2E64FE>").append(item.get("cost_time"))
.append("s").append(
"</font>").append("\n\n");
});
} catch (OdpsException | IOException e) {
e.printStackTrace();
}
return sb.toString();
}
/**
* Create an ODPS object.
*/
public static Odps createOdps() {
String project = "<project_name>";
// Store AccessKey credentials in environment variables rather than hardcoding them.
// An Alibaba Cloud account AccessKey has permissions to access all APIs.
// For least-privilege access, use a RAM user's AccessKey instead.
String accessId = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
String accessKey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
String endPoint = "http://service.odps.aliyun.com/api";
Account account = new AliyunAccount(accessId, accessKey);
Odps odps = new Odps(account);
odps.setEndpoint(endPoint);
odps.setDefaultProject(project);
return odps;
}
}
For more information about using IntelliJ IDEA, see the Help menu in the IDE.
5.3 Upload the JAR and create the Shell node
1. Upload topn_new.jar as a MaxCompute resource. See Create and use MaxCompute resources.
2. In DataWorks Data Studio, create a Shell node named dingsend. See Shell node for setup details. Add the following command to the node:

   java -jar topn_new.jar $1

   In the Scheduling Configuration panel, set Parameter to ${yyyymmdd}. Save the node.
Step 6: Configure node dependencies and run
On the business flow canvas, connect the three nodes in sequence: information_history → topn → dingsend. This ensures each stage runs only after the previous one completes.
Configure the rerun properties and ancestor node dependencies for each node. When the configuration is complete, right-click a node and select Run Node.
For dependency configuration details, see Configure same-cycle scheduling dependencies and Configure node context.