ログデータ分析の一般的な例 - Simple Log Service - Alibaba Cloud ドキュメントセンター

このトピックでは、ログデータ分析の例をいくつか紹介します。

過去 5 分間でエラー率が 40% を超えた場合にアラートをトリガーする

HTTP 500 エラーが返される割合を毎分計算し、過去 5 分間でエラー率が 40% を超えた場合にアラートをトリガーします。

status:500 | select __topic__, max_by(error_count,window_time)/1.0/sum(error_count) as error_ratio, sum(error_count) as total_error  from (
select __topic__, count(*) as error_count , __time__ - __time__ % 300  as window_time  from log group by __topic__, window_time
) 
group by __topic__  having  max_by(error_count,window_time)/1.0/sum(error_count)   > 0.4  and sum(error_count) > 500 order by total_error desc limit 100

トラフィック統計を収集し、アラートを設定する

トラフィック統計を毎分収集し、トラフィック量が特定のしきい値を下回った場合にアラートをトリガーします。直前の 1 分間に収集された統計は、完全な 1 分間をカバーしていません。1 分あたりの平均トラフィック統計を収集するには、正規化のために統計値を greatest(max(__time__) - min(__time__),1) で除算します。

* | SELECT SUM(inflow) / greatest(max(__time__) - min(__time__),1) as inflow_per_minute, date_trunc('minute',__time__)  as minute group by minute

データサイズごとの平均レイテンシを計算する

データサイズに基づいてデータを複数のバケットに分散し、各バケットのデータの平均レイテンシを計算します。

* | select avg(latency) as latency , case when originSize < 5000 then 's1' when originSize < 20000 then 's2' when originSize < 500000 then 's3' when originSize < 100000000 then 's4' else 's5' end as os group by os

さまざまな結果の割合を返す

さまざまな部署のカウント結果と結果の割合を返します。このクエリには、サブクエリとウィンドウ関数が含まれています。sum(c) over() は、すべての行の値の合計を示します。

* |  select   department, c*1.0/ sum(c) over ()  from(select  count(1) as c, department   from log group by department)

クエリ条件を満たす URL の数をカウントする

特性に基づいて URL の数をカウントするには、CASE WHEN 句または COUNT_IF 句を使用できます。後者の句の方が簡単です。

* | select  count_if(uri like '%login') as login_num, count_if(uri like '%register') as register_num, date_format(date_trunc('minute', __time__), '%m-%d %H:%i') as time  group by time order by time limit 100

一般的な集計分析

PV 別にクライアントのグローバル分布をクエリする

ip_to_country 関数を使用して IP アドレスが属する国を取得し、ip_country フィールドに基づいて国を集計してから、各国にあるクライアントの数をカウントします。結果を表示するために世界地図を設定できます。詳細については、「世界地図を設定する」をご参照ください。

* |
select
  ip_to_country(client_ip) as ip_country,
  count(*) as pv
group by
  ip_country
order by
  pv desc
limit
  500

リクエストメソッドのカテゴリと PV トレンドをクエリする

時間値を分単位で切り捨て、切り捨てられた時間値を集計およびグループ化して request_method フィールドに基づいてページビュー（PV）数を計算し、時間値をソートします。結果を表示するために折れ線グラフを設定できます。折れ線グラフでは、x 軸を t、y 軸を pv、集計列を request_method に設定します。詳細については、「折れ線グラフ」をご参照ください。

* |
select
  date_format(date_trunc('minute', __time__), '%m-%d %H:%i') as t,
  request_method,
  count(*) as pv
group by
  t,
  request_method
order by
  t asc
limit
  10000

http_user_agent ベースのリクエストの PV による分布をクエリする

http_user_agent フィールドに基づいてデータを集計およびグループ化し、各エージェントからのリクエスト数とクライアントに返された合計応答サイズをクエリします。合計応答サイズの単位をバイトから MB に変換し、サイズを小数点以下 2 桁に四捨五入します。次に、CASE WHEN 式を使用して、ステータス文字列を 2xx、3xx、4xx、5xx などのレイヤーに分類し、各レイヤーの割合を計算します。結果を表示するためにテーブルを設定できます。詳細については、「基本テーブル」をご参照ください。

* |
select
  http_user_agent as "User agent",
  count(*) as pv,
  round(sum(request_length) / 1024.0 / 1024, 2) as "Request traffic (MB)",
  round(sum(body_bytes_sent) / 1024.0 / 1024, 2) as "Response traffic (MB)",
  round(
    sum(
      case
        when status >= 200
        and status < 300 then 1
        else 0
      end
    ) * 100.0 / count(1),
    6
  ) as "Percentage of status code 2xx (%)",
  round(
    sum(
      case
        when status >= 300
        and status < 400 then 1
        else 0
      end
    ) * 100.0 / count(1),
    6
  ) as "Percentage of status code 3xx (%)",
  round(
    sum(
      case
        when status >= 400
        and status < 500 then 1
        else 0
      end
    ) * 100.0 / count(1),
    6
  ) as "Percentage of status code 4xx (%)",
  round(
    sum(
      case
        when status >= 500
        and status < 600 then 1
        else 0
      end
    ) * 100.0 / count(1),
    6
  ) as "Percentage of status code 5xx (%)"
group by
  "User agent"
order by
  pv desc
limit
  100

当月の 1 日あたりの消費量とトレンド予測結果をクエリする

請求書番号に基づいてデータをグループ化し、時間、サービス識別子、請求タイプ、支払い額などの情報を取得します。日次費用を集計および計算し、費用が発生した最新の日を時間で検索します。sls_inner_ts_regression 関数を使用して、当月の 1 日あたりの消費量とトレンド予測結果を取得します。sls_inner_ts_regression 関数では、時間、費用、渡された文字列、予測期間、予測ポイント数が指定されます。渡された文字列の名前をエイリアスとして使用できます。当月の 1 日あたりの消費量とトレンド予測結果を表示するために折れ線グラフを設定できます。折れ線グラフでは、x 軸を時間、y 軸を実際の消費量と予測消費量に設定します。詳細については、「折れ線グラフ」をご参照ください。

source :bill |
select
  date_format(res.stamp, '%Y-%m-%d') as time,
  res.real as "Actual consumption",case
    when is_nan(res.real) then res.pred
    else null
  end as "Forecast consumption",
  res.instances
from(
    select
      sls_inner_ts_regression(
        cast(day as bigint),
        total,
        array ['total'],
        86400,
        60
      ) as res
    from
      (
        select
          *
        from
          (
            select
              *,
              max(day) over() as lastday
            from
              (
                select
                  to_unixtime(date_trunc('day', __time__)) as day,
                  sum(PretaxAmount) as total
                from
                  (
                    select
                      RecordID,
                      arbitrary(__time__) as __time__,
                      arbitrary(ProductCode) as ProductCode,
                      arbitrary(item) as item,
                      arbitrary(PretaxAmount) as PretaxAmount
                    from
                      log
                    group by
                      RecordID
                  )
                group by
                  day
                order by
                  day
              )
          )
        where
          day < lastday
      )
  )
limit
  1000

当月の各サービスの消費量の割合をクエリする

サービス名ごとに各サービスの合計費用を集計および計算し、row_number 関数を使用して費用でサービスを降順にソートし、費用が最も高い上位 6 つのサービスを検索してから、その他のサービスをその他にグループ化します。結果を表示するためにドーナツチャートを設定できます。詳細については、「ドーナツチャートを設定する」をご参照ください。

source :bill |
select
  case
    when rnk > 6
    or pretaxamount <= 0 then 'Other'
    else ProductName
  end as ProductName,
  sum(PretaxAmount) as PretaxAmount
from(
    select
      *,
      row_number() over(
        order by
          pretaxamount desc
      ) as rnk
    from(
        select
          ProductName,
          sum(PretaxAmount) as PretaxAmount
        from
          log
        group by
          ProductName
      )
  )
group by
  ProductName
order by
  PretaxAmount desc
limit
  1000

前日の費用と前月の同日の費用の比率を計算する

前日の費用を計算し、COALESCE 式を使用して費用が発生しなかった場合に 0 が返されるようにし、round 関数を使用して結果を小数点以下 3 桁に四捨五入してから、compare 関数を使用して前日の費用と前月の同日の費用の比率を計算します。結果を表示するためにトレンドチャートを設定できます。詳細については、「トレンドチャート」をご参照ください。

source :bill |
select
  round(diff [1], 2),
  round(diff [2], 2),
  round(diff [3] * 100 -100, 2)
from(
    select
      compare("Expense of the previous day", 604800) as diff
    from(
        select
          round(coalesce(sum(PretaxAmount), 0), 3) as "Expense of the previous day"
        from
          log
      )
  )

Tomcat Web サービス分析

Tomcat リクエストステータスのトレンドをクエリする

date_trunc 関数を使用してログの時間値を分単位で切り捨て、date_format 関数を使用して時間と分の時間値を抽出し、GROUP BY 句を使用して抽出された時間値と状態コードに基づいてデータを集計して、1 分あたりの各状態コードの数をカウントします。結果を表示するためにフローチャートを設定できます。フローチャートでは、x 軸を時間、y 軸をカウント、集計列をステータスに設定します。詳細については、「フローチャート」をご参照ください。

* |
select
  date_format(date_trunc('minute', __time__), '%H:%i') as time,
  COUNT(1) as c,
  status
GROUP by
  time,
  status
ORDER by
  time
LIMIT
  1000

Tomcat アクセスの PV と UV の経時分布をクエリする

time_series 関数を使用してログの時間値を 2 分単位で切り捨て、時間でデータを集計してから、PV 数とユニークビジター（UV）数を計算します。approx_distinct 関数を使用して重複排除後の PV 数と UV 数を計算し、結果を時間でソートします。結果を表示するために複数の y 軸を持つ折れ線グラフを設定できます。折れ線グラフでは、x 軸を時間、y 軸を uv と pv に設定します。詳細については、「複数の y 軸を持つ折れ線グラフを作成する」をご参照ください。

* |
select
  time_series(__time__, '2m', '%H:%i', '0') as time,
  COUNT(1) as pv,
  approx_distinct(remote_addr) as uv
GROUP by
  time
ORDER by
  time
LIMIT
  1000

Tomcat エラーリクエストの数と前時間との比較をクエリする

内部 SQL ステートメントを使用して状態コードが 400 以上のリクエスト数を取得し、SQL ステートメントの中央にある compare 関数を使用して 3,600 秒前に生成されたデータを取得します。外部 SQL ステートメントでは、c1 は現在のエラーリクエスト数、c2 は 3,600 秒前に生成されたエラーリクエスト数、c3 は c1 と c2 の比率を指定します。結果を表示するためにドーナツチャートを設定できます。ドーナツチャートでは、c1 は表示値、c3 は比較値です。詳細については、「ドーナツチャートを設定する」をご参照ください。

status >= 400 |
SELECT
  diff [1] AS c1,
  diff [2] AS c2,
  round(diff [1] * 100.0 / diff [2] - 100.0, 2) AS c3
FROM
  (
    select
      compare(c, 3600) AS diff
    from
      (
        select
          count(1) as c
        from
          log
      )
  )

Tomcat リクエストの上位 10 個の URI をクエリする

request_uri フィールドに基づいてデータを集計およびグループ化し、各 request_uri の PV 数を計算し、PV で結果を降順にソートしてから、上位 10 個のレコードを取得します。基本的なバーゲージを設定できます。バーゲージでは、x 軸をページ、y 軸を pv に設定します。詳細については、「基本的なバーゲージを設定する」をご参照ください。

* |
SELECT
  request_uri as page,
  COUNT(*) as pv
GROUP by
  page
ORDER by
  pv DESC
LIMIT
  10

Tomcat クライアントのタイプと分布をクエリする

user_agent フィールドに基づいてデータを集計および分析し、各タイプのクライアントの数を計算します。結果を表示するためにドーナツチャートを設定できます。ドーナツチャートでは、タイプを user_agent、値の列を c に設定します。詳細については、「ドーナツチャートを設定する」をご参照ください。

* |
SELECT
  user_agent,
  COUNT(*) AS c
GROUP BY
  user_agent
ORDER BY
  c DESC

送信 Tomcat トラフィックの統計を収集する

time_series 関数を使用して時間値を切り捨て、時間値を集計して body_bytes_sent フィールドの値の合計を計算します。折れ線グラフの x 軸と y 軸を設定して結果を表示できます。折れ線グラフでは、x 軸を時間、y 軸を body-sent に設定します。詳細については、「折れ線グラフの x 軸と y 軸を設定する」をご参照ください。

* |
select
  time_series(__time__, '10s', '%H:%i:%S', '0') as time,
  sum(body_bytes_sent) as body_sent
GROUP by
  time
ORDER by
  time
LIMIT
  1000

Tomcat エラーリクエスト数の割合をクエリする

内部 SQL ステートメントで状態コードが 400 以上のエラーリクエスト数とリクエストの総数をクエリし、エラー Tomcat リクエスト数とリクエスト総数の比率を計算します。結果を表示するためにダイヤルを設定できます。詳細については、「ダイヤルを設定する」をご参照ください。

* |
select
  round((errorCount * 100.0 / totalCount), 2) as errorRatio
from
  (
    select
      sum(
        case
          when status >= 400 then 1
          else 0
        end
      ) as errorCount,
      count(1) as totalCount
    from
      log
  )