This topic describes how to query and analyze logs in Web Application Firewall (WAF). The log query and analysis features of Log Service are integrated into the WAF console. After you enable the WAF log collection feature for a domain, you can query and analyze the collected log entries in real time, view or edit dashboards, and configure monitoring and alert rules on the Log Service page of the WAF console.

Procedure

  1. Log on to the Web Application Firewall console.
  2. In the top navigation bar, select the resource group to which the instance belongs and the region, Mainland China or International, in which the instance is deployed.
  3. In the left-side navigation pane, choose Log Management > Log Service.
  4. On the Log Service page that appears, select a domain and ensure that Status next to the domain is turned on.
  5. Click the Log Query tab.
    This tab is integrated with the log query and analysis page in the Log Service console. A query statement, such as matched_host: "<yourDomainName>", is automatically entered. This statement queries all log entries about the domain that you select.
  6. Enter a query and analysis statement, select a time range, and click Search & Analyze.

More operations on the Log Query tab

On the Log Query tab, you can perform the following operations on log entries:
  • Customize query and analysis

    Log Service supports a wide range of query and analysis statements that can be used in complex scenarios. For more information, see Query and analyze log entries by using statements.

  • View the distribution of log entries within a specified time range
    The column chart below the search bar shows the distribution of log entries that are filtered by a time range and query statements. The horizontal axis indicates the time range, and the vertical axis indicates the number of log entries. The total number of the log entries is displayed below the chart.
    Note You can drag a column in the column chart to narrow down the time range. The time picker automatically changes the time range, and the corresponding query results are updated.
  • View raw logs

    On the Raw Logs tab, the details of each log entry are displayed in a separate section. The details include the time when the log was collected, the content, and other fields. Click Display Content Column to set the display mode to Full Line or New Line for long strings in the Content column. Click Column Settings to select the columns you want to view. Click the download icon to download the query results.

  • View analysis charts

    Log analysis results are displayed in charts. You can select various types of charts on the Graph tab. For more information, see Overview.

  • Perform quick analysis

    On the Raw Logs tab, view the distribution of a log field within a specified time range with one click. This helps reduce the time needed to query key data. For more information, see Quick analysis.


Query and analyze log entries by using statements

A query and analysis statement consists of a search clause and an analytics clause that are separated with a vertical bar (|).

$Search | $Analytics
  • Search clause: specifies the keyword, fuzzy string, numeric value, range, or a combination of these items as the query condition. If the clause is empty or contains only an asterisk (*) wildcard, all log entries are queried.
  • Analytics clause: calculates and analyzes the query results or all log entries.
Note Both the search and analytics clauses are optional.
  • If the search clause is empty, all log entries within a specified time range are queried and analyzed.
  • If the analytics clause is empty, the query results are returned without analysis.

Query syntax

The query syntax of Log Service supports full text query and field-based query. Statements in the search bar can be displayed in multiple lines and highlighted.
  • Full text query

    When you enter keywords, you do not need to specify fields for log entry queries. To query log entries that contain an exact phrase, enclose the phrase in a pair of double quotation marks ("). To query log entries that contain multiple keywords, separate the keywords with spaces or with and.

    Examples
    • Query log entries based on multiple keywords

      You can execute one of the following statements to query all the log entries that contain www.aliyun.com and error:

      www.aliyun.com error
      www.aliyun.com and error

    • Query log entries based on query conditions

      You can execute the following statement to query the log entries that contain www.aliyun.com and error or 404:

      www.aliyun.com and (error or 404)
    • Query log entries based on a prefix

      You can execute the following statement to query the log entries that contain www.aliyun.com and start with failed_:

      www.aliyun.com and failed_*
      Note The asterisk (*) wildcard can only be added as a suffix and cannot be added as a prefix. For example, the statement cannot be *_error.
  • Field-based query

    You can query log entries based on fields.

    You can specify a field and value in the format of Field name: Value, or compare a numeric field with a value in the format of Field name >= Value. You can also use the and and or operators to combine multiple fields, or use field-based query together with full text query.
    Note The log entries that record access, operations, and attacks on a specific domain in Log Service for WAF can also be queried by field. For information about the definition, type, and format of each field, see Log fields.
    Examples
    • Query log entries based on multiple fields

      You can execute the following statement to query the log entries that record HTTP flood attacks blocked by WAF on the www.aliyun.com domain:

      matched_host: www.aliyun.com and cc_blocks: 1

      You can execute the following statement to query all the log entries that record access, with a 404 error reported, from a specified client whose IP address is 1.2.3.4 to the www.aliyun.com domain:

      real_client_ip: 1.2.3.4 and matched_host: www.aliyun.com and status: 404
      Note In this example, the matched_host, cc_blocks, real_client_ip, and status fields are the fields defined in a WAF log.
    • Query log entries based on numeric fields

      You can execute the following statement to query the log entries that record slow requests whose response time exceeds 5 seconds:

      request_time_msec > 5000

      You can also query log entries based on a time range. For example, you can execute the following statement to query the log entries that record the requests whose response time exceeds 5 seconds but does not exceed 10 seconds:

      request_time_msec in (5000 10000]
      Note You can execute the following statement to obtain the same query results: request_time_msec > 5000 and request_time_msec <= 10000.
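      To make the interval semantics concrete, the following Python sketch mimics the filter in (5000 10000] on a few hypothetical log entries; it is an illustration of the range notation, not how Log Service evaluates queries:

      ```python
      # Hypothetical WAF log entries; request_time_msec values are in milliseconds.
      records = [
          {"request_uri": "/a", "request_time_msec": 4800},
          {"request_uri": "/b", "request_time_msec": 5000},   # excluded: interval is open at 5000
          {"request_uri": "/c", "request_time_msec": 7200},
          {"request_uri": "/d", "request_time_msec": 10000},  # included: interval is closed at 10000
          {"request_uri": "/e", "request_time_msec": 12000},
      ]

      # in (5000 10000] keeps values v with 5000 < v <= 10000.
      slow = [r for r in records if 5000 < r["request_time_msec"] <= 10000]
      print([r["request_uri"] for r in slow])  # ['/c', '/d']
      ```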
    • Check the field availability status

      You can determine whether a field is available in log entries by executing the following statements:

      • Query the log entries that contain the ua_browser field.
        ua_browser: *
      • Query the log entries that do not contain the ua_browser field.
        not ua_browser: *
For more information about the query statements supported by Log Service, see Overview.

Analytics syntax

You can use SQL-92 statements to analyze log entries.

For more information about the statement syntax and functions supported by Log Service, see Real-time analysis.
Note
  • You can omit the from Table name part (that is, from log) in standard SQL statements.
  • By default, the first 100 log entries are returned. You can modify the number by using the LIMIT syntax.

Query and analysis examples

Time-based log query and analysis

Each log entry has a time field, which indicates the time the log entry was generated. The time is in the format of yyyy-MM-ddTHH:mm:ss+Time zone. For example, in 2018-05-31T20:11:58+08:00, the time zone is UTC+8.

Each log entry has a built-in field __time__. This field also indicates the time the log entry was generated. The time is in the format of a UNIX timestamp. The value is used in the time-based calculation. The value of this field indicates the number of seconds that have elapsed since the UTC time 00:00:00, January 1, 1970. If you want to obtain a recognizable calculation result, you must convert the format first.
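As a rough illustration of that conversion, the following Python sketch turns a hypothetical __time__ value into the readable log time format for the UTC+8 time zone; this mirrors the relationship between the two time fields, not a Log Service API:

```python
from datetime import datetime, timezone, timedelta

# __time__ is a UNIX timestamp: seconds elapsed since 1970-01-01T00:00:00 UTC.
unix_time = 1527768718  # hypothetical value of __time__

# Convert to a readable time in UTC+8, matching the log time format.
tz = timezone(timedelta(hours=8))
readable = datetime.fromtimestamp(unix_time, tz).strftime("%Y-%m-%dT%H:%M:%S%z")
print(readable)  # 2018-05-31T20:11:58+0800
```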
  • Select and display the time
    You can use the time field to display time information in logs. For example, you can search for the last 10 log entries that record HTTP flood attacks on the www.aliyun.com domain that WAF blocked within a specific time period, and display the time, real_client_ip, and http_user_agent fields.
    matched_host: www.aliyun.com and cc_blocks: 1 
    |  select time, real_client_ip, http_user_agent
        order by time desc
        limit 10
  • Calculate the time
    You can use the __time__ field to calculate the time. For example, calculate the number of days that have elapsed since the domain suffered an HTTP flood attack.
    matched_host: www.aliyun.com  and cc_blocks: 1 
    |  select time, round((to_unixtime(now()) - __time__)/86400, 1) as "days_passed", real_client_ip, http_user_agent
          order by time desc
          limit 10

    In this example, round((to_unixtime(now()) - __time__)/86400, 1) is used to calculate the number of days that have elapsed since the domain suffered an HTTP flood attack. First, use now() to obtain the current time and convert the current time into a Unix timestamp by using to_unixtime. Then, subtract the value of the built-in field __time__ from the converted time to obtain the number of seconds that have elapsed. Finally, divide this number of seconds by 86400 (the total number of seconds in a day) and apply the round(data, 1) function to keep one decimal place. The result is the number of days that have elapsed since each attack log entry was generated.
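    The same arithmetic can be reproduced outside of Log Service. The following Python sketch uses hypothetical timestamps to show the calculation step by step:

    ```python
    # Sketch of round((to_unixtime(now()) - __time__) / 86400, 1).
    log_time = 1527768718                # hypothetical __time__ of an attack log entry (UNIX seconds)
    now = log_time + 3 * 86400 + 43200   # pretend "now" is 3.5 days after the attack

    # Seconds elapsed, divided by seconds per day, rounded to one decimal place.
    days_passed = round((now - log_time) / 86400, 1)
    print(days_passed)  # 3.5
    ```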

  • Perform statistical analysis in groups based on the time
    You can query the log entries about the trend of HTTP flood attacks on a specific domain within a specified time period.
    matched_host: www.aliyun.com  and cc_blocks: 1 
    | select date_trunc('day', __time__) as dt, count(1) as PV 
          group by dt 
          order by dt

    In this example, the built-in field __time__ is used by the date_trunc('day', ..) function to align the time of the entries by day. Each log entry is grouped based on the day when the log entry was generated. The total number of log entries in each group is counted by using count(1). Then, these entries are ordered by the group. You can use other values for the first parameter of the date_trunc function to group the log entries based on other time units, such as second, minute, hour, week, month, and year. For more information about this function, see Date and time functions.

    We recommend that you display the results in a line chart.
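    To illustrate what the day-level alignment does, the following Python sketch mimics date_trunc('day', __time__) and the count(1) grouping on hypothetical UNIX timestamps; it is an analogy for the SQL behavior, not the Log Service implementation:

    ```python
    from collections import Counter
    from datetime import datetime, timezone

    # Hypothetical __time__ values of attack log entries (UNIX seconds, UTC).
    timestamps = [1527724800, 1527728400, 1527811200, 1527814800, 1527818400]

    # date_trunc('day', __time__) aligns each timestamp to the start of its day.
    def trunc_day(ts):
        return datetime.fromtimestamp(ts, timezone.utc).strftime("%Y-%m-%d")

    # count(1) ... group by dt: count the entries that fall on each day.
    pv_per_day = Counter(trunc_day(ts) for ts in timestamps)
    for dt, pv in sorted(pv_per_day.items()):
        print(dt, pv)
    ```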

  • Group log entries by a custom time period
    If you want to use more flexible groupings to analyze log entries based on time, complex calculations are required. For example, you can query the log entries about the trend of HTTP flood attacks on a specified domain within every five minutes.
    matched_host: www.aliyun.com  and cc_blocks: 1 
    | select from_unixtime(__time__ - __time__% 300) as dt, 
             count(1) as PV 
          group by dt 
          order by dt 
          limit 1000

    In this example, the built-in field is used to align the time by using the formula __time__ - __time__% 300, and the from_unixtime function is used to convert the format of the result. Then, each log entry is assigned to a group that indicates a time period of five minutes (300 seconds), and the total number of log entries in each group is counted by using count(1). Finally, the query results are ordered by time range and the first 1,000 result entries are returned, which include the log entries that were generated within the first 83 hours of the specified time period.

    We recommend that you display the results in a line chart.
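    The bucketing formula __time__ - __time__ % 300 is plain modular arithmetic, which the following Python sketch reproduces on hypothetical timestamps; the from_unixtime conversion is omitted to keep the buckets visible as numbers:

    ```python
    from collections import Counter

    # Hypothetical __time__ values (UNIX seconds) of attack log entries.
    timestamps = [1000, 1100, 1299, 1300, 1550, 1999, 2100]

    # ts - ts % 300 floors each timestamp to the start of its 300-second bucket.
    pv_per_bucket = Counter(ts - ts % 300 for ts in timestamps)
    print(sorted(pv_per_bucket.items()))  # [(900, 2), (1200, 2), (1500, 1), (1800, 1), (2100, 1)]
    ```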

The date_parse and date_format functions convert the time format. For more information about the functions that can be used to parse the time, see Date and time functions.

Client IP address-based log query and analysis

A WAF log contains the real_client_ip field, which records the real IP address of the client. However, if a user accesses your website through a proxy server, or if the IP address in the request header is invalid, WAF cannot obtain the real client IP address, and the value of real_client_ip is a hyphen (-). In this case, you can use the remote_addr field, which records the IP address of the device that directly connects to WAF.
  • Classify attackers by country
    You can query the log entries about the distribution of HTTP flood attacks by country.
    matched_host: www.aliyun.com  and cc_blocks: 1 
    | SELECT ip_to_country(if(real_client_ip='-', remote_addr, real_client_ip)) as country, 
             count(1) as "number of attacks" 
             group by country

    In this example, the function if(condition, option1, option2) returns the real client IP address. If real_client_ip is -, the function returns the value of remote_addr. Otherwise, the function returns the value of real_client_ip. Then, the ip_to_country function is used to obtain the country information from the client IP address.

    We recommend that you display the results on a world map.
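    The fallback logic of if(real_client_ip='-', remote_addr, real_client_ip) can be sketched in Python as follows; the field names come from the WAF log examples above, and the function is a hypothetical helper for illustration:

    ```python
    # Return the real client IP address, falling back to remote_addr
    # when WAF could not determine it (real_client_ip is "-").
    def client_ip(entry):
        if entry["real_client_ip"] == "-":
            return entry["remote_addr"]
        return entry["real_client_ip"]

    print(client_ip({"real_client_ip": "1.2.3.4", "remote_addr": "5.6.7.8"}))  # 1.2.3.4
    print(client_ip({"real_client_ip": "-", "remote_addr": "5.6.7.8"}))        # 5.6.7.8
    ```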

  • Classify visitors by province
    To further obtain the distribution of visitors by province, you can use the ip_to_province function to obtain the province information from IP addresses.
    matched_host: www.aliyun.com  and cc_blocks: 1 
    | SELECT ip_to_province(if(real_client_ip='-', remote_addr, real_client_ip)) as province, 
             count(1) as "number of attacks" 
             group by province

    In this example, the ip_to_province function is used to obtain the province information from the real IP address of a client. If the IP address is in a region outside mainland China, the function attempts to obtain the state or province information for that country. If you display the results on a map of China, IP addresses in regions outside mainland China are not displayed.

    We recommend that you display the results with a map of China.

  • Classify attackers on a heat map
    You can use the ip_to_geo function to obtain the geographic information (the latitude and the longitude) from the real IP addresses of clients. This information can be used to generate a heat map to indicate the density of attacks.
    matched_host: www.aliyun.com  and cc_blocks: 1 
    | SELECT ip_to_geo(if(real_client_ip='-', remote_addr, real_client_ip)) as geo, 
             count(1) as "number of attacks" 
             group by geo
             limit 10000

    In this example, the ip_to_geo function is used to obtain the latitude and the longitude from the real IP addresses of clients. The first 10,000 result entries are returned.

    Select AMAP and click Show Heat Map.

For more information about IP-based functions, see IP functions. For example, you can use the ip_to_provider function to obtain the provider information of an IP address. You can use the ip_to_domain function to determine whether an IP address is public or private.