Dynamic Route for CDN (DCDN) log analysis is a log data analysis and query service that integrates Object Storage Service (OSS) and DCDN. You can enable offline log delivery in the DCDN console to automatically deliver DCDN logs to your OSS buckets for long-term management. Then, you can use the serverless Presto engine of Data Lake Analytics (DLA) to analyze the log data. This topic describes how to use the serverless Presto engine of DLA to analyze and query DCDN log data.

Prerequisites

Before you analyze DCDN log data, make sure that you have performed the following operations:

Background information

After you enable offline log delivery in the DCDN console, DLA in the region where DCDN offline log delivery is enabled is automatically activated and DLA automatically creates a table to store the delivered logs.

Procedure

This example assumes that DCDN logs are stored in the China (Shanghai) region where DCDN offline log delivery is enabled.

  1. Log on to the DLA console.
  2. Select China (Shanghai) from the region drop-down list in the upper-left corner, as shown in the following figure.
  3. In the left-side navigation pane, choose Serverless Presto > Execute.
  4. On the Execute page, execute an SQL statement to query and analyze DCDN log data. Examples:
    • Top 10 URL path access errors
      // Query the HTTP status codes that are greater than 400. Obtain access errors that correspond to each URL path based on the URL paths contained in the requests, excluding aggregated URL paths. Then, sort the errors in descending order of the error quantity to obtain top 10 errors.
      SELECT urlPath,count(*) as c from alicdn_offline_log.alicdn_offline_log_dcdn WHERE httpcode > 400 GROUP BY urlPath ORDER BY  c DESC LIMIT 10;
    • Access source percentage
      // Use the domain name filter condition to filter out the logs for which the access source is empty. Aggregate the number of requests from each access source. Then, calculate the percentage of requests for a single domain to all requests. Retain two decimal places for the percentage.
      select
        refer_domain as "Access source",
        c as "Requests",
        round(c * 100.0 /(sum(c) over()), 2) as "Percentage (%)"
      from
        (
          select
            domain as refer_domain,
            count(*) as c
          from
            alicdn_offline_log.alicdn_offline_log_dcdn
          group by
            refer_domain
          order by
            c desc
          limit
            100
        )
        where refer_domain ! = "";
    • Top domains based on access times
      // Use GROUP BY to group access times by domain name, use the COUNT (*) function to calculate the number of times of accesses to each domain, use ORDER BY to sort the domains based on the access times, and use DESC to sort the domains in descending order. In addition, use Limit 100 to return the top 100 domains, use a pie chart for graphical presentation, use return_code for classification, and set the value column to c.
      select
        Domain,
        count(*) as c
      FROM
        alicdn_offline_log.alicdn_offline_log_dcdn
      group by
        Domain
      order by
        c desc
      limit
        100
    • Distribution of offline logs of different data types
      // Use the SUM function to calculate the total value of respsize, and use the calculated value as sum_res. Use GROUP BY to group offline logs of different data types by content_type, use ORDER BY to sort offline logs of different data types by sum_res. Then, obtain top 10 data types, use a pie chart for graphical presentation, use content_type for classification, and set the value column to sum_res.
      select
        contenttype,
        sum(respsize) as sum_res
      FROM alicdn_offline_log.alicdn_offline_log_dcdn
      group by
        contenttype
      order by
        sum_res desc
      limit
        10;