This topic describes how to analyze MaxCompute bills.

Background information

MaxCompute is a big data analytics platform. You can select the pay-as-you-go or subscription billing method when you purchase computing resources. Computing resources are charged by project on a daily basis, and the bills are generated before 06:00 the next day. For more information about MaxCompute billing methods, see Billing method.

Bill fluctuations (typically increases) generally occur during data development or shortly before a product launch. You can download bill details from Billing Management and optimize your tasks based on the fluctuations that you find. For more information, see View billing details.

Description of bill details

The fields in a bill are described as follows:
  • ProjectId: indicates the MaxCompute projects of the current Alibaba Cloud account or its RAM users.
  • MeteringId: includes the IDs of storage, computing, upload, and download tasks. The ID of an SQL task is its InstanceId, and the ID of an upload or download task is its Tunnel SessionId.
  • Type: includes Storage (data storage), ComputationSql (SQL computing), UploadIn (data uploaded over an internal network), UploadEx (data uploaded over the Internet), DownloadIn (data downloaded over an internal network), and DownloadEx (data downloaded over the Internet). Not all of these items incur charges; see the billing rules for the items that are charged.
  • Storage: indicates the volume of stored data, which is metered on an hourly basis. Unit: bytes.
  • StartTime/EndTime: indicates the time at which a task starts and the time at which it ends. Only storage is metered on an hourly basis.
  • ComputationSqlInput: indicates the volume of input data that is computed by executing an SQL statement. Unit: bytes.
  • ComputationSqlComplexity: indicates the complexity of the SQL statement, which is one of the SQL billing factors.
  • UploadEx/DownloadEx: indicates the volume of data that is uploaded or downloaded over the Internet. Unit: bytes.
  • CoreSecond: indicates the computing time of a MapReduce or Spark task, in core-seconds. Divide the value by 3,600 to convert it to the core-hours that are used in the fee formulas, as shown in the sketch after this list.
  • InputOTS/InputOSS: indicates the volume of data that is read from OTS or OSS external tables, which incurs charges. Unit: bytes.
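
For example, the following minimal sketch converts the metered CoreSecond values into the core-hours that the fee formulas in this topic use. It assumes the maxcomputefee table that is created in the next section.
    -- Convert CoreSecond (cu_usage) into core-hours for fee estimation.
    SELECT  projectid
            ,cu_usage                    -- computing time in core-seconds
            ,cu_usage/3600 AS core_hours
    FROM    maxcomputefee
    WHERE   type IN ('MapReduce','spark');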

Upload bill details to MaxCompute

  1. Verify that the data and column delimiters in the CSV file are correct. Data in the file is separated with commas (,), and cell values are enclosed in double quotation marks (").
  2. To facilitate upload with tools such as Tunnel, remove all double quotation marks ("). For example, open the file in a text editor, replace all double quotation marks with empty strings, and click Replace All.
  3. Create a MaxCompute table named maxcomputefee to store the bill details.
    DROP TABLE IF EXISTS maxcomputefee;
    CREATE TABLE IF NOT EXISTS maxcomputefee
    (
        projectid STRING COMMENT 'ProjectId'
        ,feeid STRING COMMENT 'MeteringId'
        ,type STRING COMMENT 'Type, including Storage, ComputationSql, and DownloadEx'
        ,storage BIGINT COMMENT 'Storage'
        ,endtime DATETIME COMMENT 'EndTime'
        ,computationsqlinput BIGINT COMMENT 'ComputationSqlInput'
        ,computationsqlcomplexity DOUBLE COMMENT 'ComputationSqlComplexity'
        ,uploadex BIGINT COMMENT 'UploadEx'
        ,download BIGINT COMMENT 'DownloadEx'
        ,cu_usage DOUBLE COMMENT 'CoreSecond'
        ,input_ots BIGINT COMMENT 'InputOTS'
        ,input_oss BIGINT COMMENT 'InputOSS'
        ,starttime DATETIME COMMENT 'StartTime'
        ,source_type STRING COMMENT 'OdpsSpecCode'
        ,source_id STRING COMMENT 'DataWorks Scheduling Job ID'
    );
  4. Run the tunnel upload command to upload data. For more information, see Tunnel commands.
    odps@ sz_mc>tunnel upload /Users/yangyi/Desktop/ODPS_2019-01-12_2019-01-14.csv maxcomputefee -c "UTF-8" -h "true" -dfp "yyyy-MM-dd HH:mm:ss";
    Note You can also upload data by using the data import feature of DataWorks. For more information, see Import or export data by using Data Integration.
  5. Execute the following statement to verify data:
    SELECT * FROM maxcomputefee LIMIT 10;
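    To further confirm that the upload is complete, you can also count the records for each billing type, as in the following minimal sketch. The expected counts depend on your own bill file.
    -- Count the uploaded records by billing type.
    SELECT  type
            ,COUNT(*) AS record_num
    FROM    maxcomputefee
    GROUP BY type;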

Use SQL to analyze the bill data

  1. Analyze SQL computing fees. MaxCompute SQL meets the requirements of 95% of cloud users, and SQL fees account for a large proportion of MaxCompute bills.
    Note One-time SQL computing fee = Input data volume × SQL statement complexity × USD 0.0438/GB
    -- Analyze the SQL fees. The SQL fees are ranked by sqlmoney.
    SELECT  to_char(endtime,'yyyymmdd') as ds,feeid as instanceid
            ,projectid
            ,computationsqlcomplexity  -- Complexity
            ,SUM((computationsqlinput / 1024 / 1024 / 1024)) as computationsqlinput  -- Input data (GB)
            ,SUM((computationsqlinput / 1024 / 1024 / 1024)) * computationsqlcomplexity * 0.0438 AS sqlmoney
    FROM    maxcomputefee
    WHERE   TYPE = 'ComputationSql'
    AND to_char(endtime,'yyyymmdd') >= '20190112'
    GROUP BY to_char(endtime,'yyyymmdd'),feeid
             ,projectid
             ,computationsqlcomplexity
    ORDER BY sqlmoney DESC 
    LIMIT   10000
    ;
    The query results show the following:
    • You can lower costs by reducing the amount of data to read and complexity of large-scale SQL tasks.
    • You can summarize data based on the ds field (on a daily basis) and analyze the SQL fee trend over a specific period of time. For example, you can create a line chart in Excel or by using tools such as Quick BI to display the data. A sketch of such a daily aggregation follows this step.
    • You can follow these steps to locate the node you want to optimize based on the execution result:
      1. Obtain the value of the instanceid field.
        Run the wait <instanceid>; command in the MaxCompute console or DataWorks to view the task and the SQL statements that it executed. The command output also includes a Logview URL.
      2. Open a browser and enter the Logview URL.
        For more information about the Logview, see Logview.
      3. Obtain the name of the DataWorks node from the Logview.
        In the Logview, click XML in the SourceXML column that corresponds to the task to view its running details. In the XML, the value of SKYNET_NODENAME is the name of the DataWorks node. Only tasks that are executed by the scheduling system have this value. With the node name, you can locate the node that you want to optimize and identify its owner.
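    The following minimal sketch aggregates the estimated SQL fees by day, which you can use as the data source for the trend chart mentioned above. It reuses the fee formula from the preceding query.
    -- Aggregate the estimated SQL fees by day.
    SELECT  TO_CHAR(endtime,'yyyymmdd') AS ds
            ,SUM((computationsqlinput/1024/1024/1024)*computationsqlcomplexity*0.0438) AS sqlmoney
    FROM    maxcomputefee
    WHERE   type = 'ComputationSql'
    AND     TO_CHAR(endtime,'yyyymmdd') >= '20190112'
    GROUP BY TO_CHAR(endtime,'yyyymmdd')
    ORDER BY ds
    LIMIT   10000
    ;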
  2. Analyze the trend in the number of tasks. A surge in the number of tasks, caused by repeated operations or incorrectly configured scheduling properties, results in fee increases.
    -- Analyze the task number increase trend.
    SELECT  TO_CHAR(endtime,'yyyymmdd') AS ds
            ,projectid
            ,COUNT(*) AS tasknum
    FROM    maxcomputefee
    WHERE   TYPE = 'ComputationSql'
    AND     TO_CHAR(endtime,'yyyymmdd') >= '20190112'
    GROUP BY TO_CHAR(endtime,'yyyymmdd')
             ,projectid
    ORDER BY tasknum DESC
    LIMIT   10000
    ;
    The result shows the fluctuation of tasks that were submitted to MaxCompute and successfully executed from January 12, 2019 to January 14, 2019.
  3. Analyze storage fees.
    -- Analyze storage fees.
    SELECT  t.ds
            ,t.projectid
            ,t.storage
            ,CASE    WHEN t.storage < 0.5 THEN 0.01
                     WHEN t.storage >= 0.5 AND t.storage <= 10240 THEN t.storage*0.0072
                     WHEN t.storage > 10240 AND t.storage <= 102400 THEN (10240*0.0072+(t.storage-10240)*0.006)
                     WHEN t.storage > 102400  THEN (10240*0.0072+(102400-10240)*0.006+(t.storage-102400)*0.004)
            END storage_fee
    FROM    (
                SELECT  to_char(starttime,'yyyymmdd') as ds
                        ,projectid
                        ,SUM(storage/1024/1024/1024)/24 AS storage
                FROM    maxcomputefee
                WHERE   TYPE = 'Storage'
                and to_char(starttime,'yyyymmdd') >= '20190112'
                GROUP BY to_char(starttime,'yyyymmdd')
                         ,projectid
            ) t
    ORDER BY storage_fee DESC
    ;
    The results show the following:
    • Storage fees increased on January 12, 2019 but decreased on January 14, 2019.
    • You can reduce storage fees by setting a lifecycle for tables and by deleting unnecessary temporary tables, as shown in the sketch below.
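    For example, the following minimal sketch sets a table lifecycle and drops a temporary table. The 90-day lifecycle and the table name tmp_example_table are placeholders.
    -- Recycle the table automatically if it is not modified for 90 days.
    ALTER TABLE maxcomputefee SET LIFECYCLE 90;
    -- Drop a temporary table that is no longer needed.
    DROP TABLE IF EXISTS tmp_example_table;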
  4. Analyze download fees.
    Note MaxCompute charges for data that is downloaded over the Internet or across regions. One-time download fee = Downloaded data volume × USD 0.1166/GB
    -- Analyze the download fee details.
    SELECT  TO_CHAR(starttime,'yyyymmdd') AS ds
            ,projectid
            ,SUM((download/1024/1024/1024)*0.1166) AS download_fee
    FROM    maxcomputefee
    WHERE   type = 'DownloadEx'
    AND     TO_CHAR(starttime,'yyyymmdd') >= '20190112'
    GROUP BY TO_CHAR(starttime,'yyyymmdd')
             ,projectid
    ORDER BY download_fee DESC
    ;
  5. Analyze MapReduce task fees.
    Note MapReduce task fee = Total computing hours × USD 0.0690/hour per task
    -- Analyze MapReduce task fees.
    SELECT  TO_CHAR(starttime,'yyyymmdd') AS ds
            ,projectid
            ,(cu_usage/3600)*0.0690 AS mr_fee
    FROM    maxcomputefee
    WHERE   type = 'MapReduce'
    AND     TO_CHAR(starttime,'yyyymmdd') >= '20190112'
    GROUP BY TO_CHAR(starttime,'yyyymmdd')
             ,projectid
             ,cu_usage
    ORDER BY mr_fee DESC
    ;
  6. Analyze external table (OTS and OSS) task fees.
    Note One-time SQL computing fee for an external table = Input data volume × SQL statement complexity (currently fixed at 1) × USD 0.0044/GB
    -- Analyze SQL task fees of an OTS table.
    SELECT  TO_CHAR(starttime,'yyyymmdd') AS ds
            ,projectid
            ,(input_ots/1024/1024/1024)*1*0.0044 AS ots_fee
    FROM    maxcomputefee
    WHERE   type = 'ComputationSql'
    AND     TO_CHAR(starttime,'yyyymmdd') >= '20190112'
    GROUP BY TO_CHAR(starttime,'yyyymmdd')
             ,projectid
             ,input_ots
    ORDER BY ots_fee DESC
    ;
    
    -- Analyze SQL task fees of an OSS table.
    SELECT  TO_CHAR(starttime,'yyyymmdd') AS ds
            ,projectid
            ,(input_oss/1024/1024/1024)*1*0.0044 AS oss_fee
    FROM    maxcomputefee
    WHERE   type = 'ComputationSql'
    AND     TO_CHAR(starttime,'yyyymmdd') >= '20190112'
    GROUP BY TO_CHAR(starttime,'yyyymmdd')
             ,projectid
             ,input_oss
    ORDER BY oss_fee DESC
    ;
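    The two queries can also be combined into one daily overview of external table fees, as in the following minimal sketch:
    -- Estimate OTS and OSS external table fees in one query.
    SELECT  TO_CHAR(starttime,'yyyymmdd') AS ds
            ,projectid
            ,SUM((input_ots/1024/1024/1024)*1*0.0044) AS ots_fee
            ,SUM((input_oss/1024/1024/1024)*1*0.0044) AS oss_fee
    FROM    maxcomputefee
    WHERE   type = 'ComputationSql'
    AND     TO_CHAR(starttime,'yyyymmdd') >= '20190112'
    GROUP BY TO_CHAR(starttime,'yyyymmdd')
             ,projectid
    ORDER BY ds
    LIMIT   10000
    ;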
  7. Analyze Lightning query fees.
    Note One-time Lightning query fee = Input data volume × USD 0.03/GB
    SELECT  to_char(endtime,'yyyymmdd') as ds,feeid as instanceid
            ,projectid
            ,computationsqlcomplexity
            ,SUM((computationsqlinput / 1024 / 1024 / 1024)) as computationsqlinput
            ,SUM((computationsqlinput / 1024 / 1024 / 1024)) * computationsqlcomplexity * 0.03 AS sqlmoney
    FROM    maxcomputefee
    WHERE   TYPE = 'LightningQuery'
    --AND to_char(endtime,'yyyymmdd') >= '20190112'
    GROUP BY to_char(endtime,'yyyymmdd'),feeid
             ,projectid
             ,computationsqlcomplexity
    ORDER BY sqlmoney DESC 
    LIMIT   10000
    ;
  8. Analyze Spark computing fees.
    Note Spark computing task fee = Total computing hours × USD 0.1041/hour per task
    -- Analyze Spark computing fees.
    SELECT  TO_CHAR(starttime,'yyyymmdd') AS ds
            ,projectid
            ,(cu_usage/3600)*0.1041 AS spark_fee
    FROM    maxcomputefee
    WHERE   type = 'spark'
    AND     TO_CHAR(starttime,'yyyymmdd') >= '20190112'
    GROUP BY TO_CHAR(starttime,'yyyymmdd')
             ,projectid
             ,cu_usage
    ORDER BY spark_fee DESC
    ;
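  9. Optionally, summarize the daily fees. The following minimal sketch combines the per-type estimates from the preceding steps into one daily overview per project. It groups all types on endtime for simplicity and uses the unit prices stated above, so treat the output as an estimate rather than an exact bill.
    -- Estimate daily fees per project across the main billing types.
    SELECT  TO_CHAR(endtime,'yyyymmdd') AS ds
            ,projectid
            ,SUM(CASE WHEN type = 'ComputationSql'
                      THEN (computationsqlinput/1024/1024/1024)*computationsqlcomplexity*0.0438
                 END) AS sql_fee
            ,SUM(CASE WHEN type = 'DownloadEx'
                      THEN (download/1024/1024/1024)*0.1166
                 END) AS download_fee
            ,SUM(CASE WHEN type = 'MapReduce'
                      THEN (cu_usage/3600)*0.0690
                 END) AS mr_fee
            ,SUM(CASE WHEN type = 'spark'
                      THEN (cu_usage/3600)*0.1041
                 END) AS spark_fee
    FROM    maxcomputefee
    GROUP BY TO_CHAR(endtime,'yyyymmdd')
             ,projectid
    ORDER BY ds
    LIMIT   10000
    ;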