This topic uses an example to show how to monitor core video metrics for live streaming with Realtime Compute for Apache Flink.

Background information

As Internet technologies evolve, live streaming and the ecosystem around it attract more and more attention. Live streaming allows viewers to watch content such as sports events, major events, and news online and in real time.

Poor user experience leads to a loss of users. Therefore, live streaming platforms must focus on the following points:
  • Quality of experience (QoE) of casters and audiences: Focus on system metrics, such as the frame freezing rate of video or audio signals, latency, and packet loss rate.
  • Timeliness: Identify system issues in real time and locate them as early as possible.
  • Overall operations of the website: Track user trends and identify popular videos.

This topic provides a use case to describe how to use Realtime Compute for Apache Flink to monitor the system stability and the operations of a live streaming platform.

Description

To build a highly interactive social community and cover more live streaming scenarios to realize more profit, a platform operator can take the following actions:
  • Hire multiple casters for its live streaming website.
  • Allow each caster to stream to audiences in a channel.
  • Allow users to watch the video of the caster in the current channel and hear the voice of the caster.
  • Allow each caster to invite multiple audience members in a channel to a private chat.
Figure 1. Workflow
  1. Casters and audiences use a live streaming app that sends streaming data to the server every 10 seconds.
  2. After the server receives data from the app, the server saves the data to its local disk. Alibaba Cloud Log Service then collects the data from the server.
  3. Realtime Compute for Apache Flink subscribes to the data on Log Service.
  4. Realtime Compute for Apache Flink analyzes live streaming data.

Business objectives

Obtain the following metrics based on the tracked logs sent from the client app:
  • Channel faults, including frame freezing, frame drops, and audio and video that are out of sync
  • Average end-to-end latency collected by region
  • Overall frame freezing rate collected in real time (Number of online users who encounter frame freezing/Total number of online users × 100%. This metric can be used to measure the scope of users who encounter frame freezing.)
  • Average number of frame freezing times per user (Total number of times online frame freezing occurs/Total number of online users. This metric can be used to measure the overall severity of frame freezing based on frame freezing times.)
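
The two frame freezing formulas above can be illustrated with a small worked example. The following Python sketch is only an illustration of the arithmetic, not part of the Realtime Compute for Apache Flink job. The user IDs, the counts, and the function names overall_freeze_rate and freezes_per_online_user are hypothetical.

```python
# Hypothetical freeze counts per online user in one time window.
# User IDs and counts are made-up sample values.
freezes_per_user = {
    "u1": 0,
    "u2": 3,
    "u3": 0,
    "u4": 1,
}


def overall_freeze_rate(freezes):
    """Percentage of online users who hit at least one freeze."""
    if not freezes:
        return 0.0
    affected = sum(1 for count in freezes.values() if count > 0)
    return affected / len(freezes) * 100


def freezes_per_online_user(freezes):
    """Average number of freezes per online user."""
    if not freezes:
        return 0.0
    return sum(freezes.values()) / len(freezes)


rate = overall_freeze_rate(freezes_per_user)
average = freezes_per_online_user(freezes_per_user)
print(f"overall frame freezing rate: {rate:.1f}%")
print(f"frame freezing times per user: {average:.2f}")
```

In this sample, two of the four online users encounter frame freezing, so the rate is 50%, while the four freezes spread over four users give an average of one freeze per user. The first metric shows how many users are affected, and the second shows how severely.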

These metrics are expected to be calculated and written to an ApsaraDB RDS database in real time. This way, online data and even some alerts can be displayed in reports and dashboards.

Data format

The following table describes the data format of the tracked logs that the app client sends to the server.
Field name    Description
ip            The IP address of the client.
agent         The type of the client.
roomid        The ID of the channel.
userid        The user ID.
abytes        The audio bitrate.
afcnt         The number of audio frames.
adrop         The number of dropped audio frames.
afts          The audio timestamp.
alat          The end-to-end latency of audio frames.
vbytes        The video bitrate.
vfcnt         The number of video frames.
vdrop         The number of dropped video frames.
vfts          The video timestamp.
vlat          The end-to-end latency of video frames.
ublock        The number of upstream frame freezing times.
dblock        The number of downstream frame freezing times.
timestamp     The timestamp when a log was generated.
region        The region where live streaming is performed.
Log Service uses semi-structured storage and displays the preceding fields in the following log format:
{
    "ip": "ip",
    "agent": "agent",
    "roomid": "123456789",
    "userid": "123456789",
    "abytes": "123456",
    "afcnt": "34",
    "adrop": "3",
    "afts": "1515922566",
    "alat": "123",
    "vbytes": "123456",
    "vfcnt": "34",
    "vdrop": "4",
    "vfts": "1515922566",
    "vlat": "123",
    "ublock": "1",
    "dblock": "2",
    "timestamp": "1515922566",
    "region": "hangzhou"
}
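
Because every value in the log is stored as a string, a consumer must convert the numeric fields before it can aggregate them. The following Python sketch shows one way to do this for a single record outside Realtime Compute for Apache Flink. The parse_heartbeat function and the NUMERIC_FIELDS tuple are hypothetical names used for illustration, and the sample values are made up.

```python
import json

# Fields that the tracked log carries as numeric strings.
NUMERIC_FIELDS = (
    "roomid", "userid", "abytes", "afcnt", "adrop", "afts", "alat",
    "vbytes", "vfcnt", "vdrop", "vfts", "vlat", "ublock", "dblock",
    "timestamp",
)


def parse_heartbeat(raw):
    """Decode one JSON log record and convert numeric strings to int."""
    record = json.loads(raw)
    for field in NUMERIC_FIELDS:
        record[field] = int(record[field])
    return record


# Made-up sample record that follows the log format described above.
sample = json.dumps({
    "ip": "192.0.2.10", "agent": "android",
    "roomid": "123456789", "userid": "123456789",
    "abytes": "123456", "afcnt": "34", "adrop": "3",
    "afts": "1515922566", "alat": "123",
    "vbytes": "123456", "vfcnt": "34", "vdrop": "4",
    "vfts": "1515922566", "vlat": "123",
    "ublock": "1", "dblock": "2",
    "timestamp": "1515922566", "region": "hangzhou",
})

record = parse_heartbeat(sample)
print(record["adrop"] + record["vdrop"])  # dropped audio plus video frames
```

The string fields ip, agent, and region are left unchanged, and only the counters, latencies, IDs, and timestamps become integers.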

SQL statements

  • Data cleansing
    Declare the source table in Realtime Compute for Apache Flink.
    CREATE TABLE app_heartbeat_stream_source (
        ip VARCHAR,
        agent VARCHAR,
        roomid VARCHAR,
        userid VARCHAR,
        abytes VARCHAR,
        afcnt VARCHAR,
        adrop VARCHAR,
        afts VARCHAR,
        alat VARCHAR,
        vbytes VARCHAR,
        vfcnt VARCHAR,
        vdrop VARCHAR,
        vfts VARCHAR,
        vlat VARCHAR,
        ublock VARCHAR,
        dblock VARCHAR,
        `timestamp` VARCHAR,
        region VARCHAR,
        app_ts AS TO_TIMESTAMP(CAST(`timestamp` AS BIGINT)), -- Event time column on which the watermark is defined.
        WATERMARK FOR app_ts AS withOffset(app_ts, 10000) -- Define the watermark with an offset of 10 seconds (10000 ms).
    ) WITH (
        type ='sls',
        endPoint ='http://cn-hangzhou-corp.sls.aliyuncs.com',
        accessId ='xxxxxxxxxxx',
        accessKey ='xxxxxxxxxxxxxxxxxxxxxxxxxxxx',
        project ='xxxx',
        logStore ='app_heartbeat_stream_source'
    );
    For business convenience, all data is processed as the VARCHAR type in the source table. For subsequent processing, data in the source table is cleaned for the following purposes:
    1. Format conversion: Convert some data of the VARCHAR type to the BIGINT type.
    2. Business data supplement: For example, enter region-related information.
    CREATE VIEW view_app_heartbeat_stream AS
    SELECT
        ip,
        agent,
        CAST(roomid AS BIGINT) AS roomid,
        CAST(userid AS BIGINT) AS userid,
        CAST(abytes AS BIGINT) AS abytes,
        CAST(afcnt AS BIGINT) AS afcnt,
        CAST(adrop AS BIGINT) AS adrop,
        CAST(afts AS BIGINT) AS afts,
        CAST(alat AS BIGINT) AS alat,
        CAST(vbytes AS BIGINT) AS vbytes,
        CAST(vfcnt AS BIGINT) AS vfcnt,
        CAST(vdrop AS BIGINT) AS vdrop,
        CAST(vfts AS BIGINT) AS vfts,
        CAST(vlat AS BIGINT) AS vlat,
        CAST(ublock AS BIGINT) AS ublock,
        CAST(dblock AS BIGINT) AS dblock,
        app_ts,
        region
    FROM
        app_heartbeat_stream_source;
  • Channel fault statistics
    Use a 10-minute tumbling window to collect statistics on channel faults, including frame freezing, frame drops, and audio and video that are out of sync.
    CREATE VIEW room_error_statistics_10min AS
    SELECT
        CAST(TUMBLE_START(app_ts, INTERVAL '10' MINUTE) as VARCHAR) as app_ts,
        roomid,
        SUM(ublock) as ublock, -- Number of upstream frame freezing times in the last 10 minutes.
        SUM(dblock) as dblock, -- Number of downstream frame freezing times in the last 10 minutes.
        SUM(adrop) as adrop, -- Number of audio frames dropped in the last 10 minutes.
        SUM(vdrop) as vdrop, -- Number of video frames dropped in the last 10 minutes.
        SUM(alat) as alat, -- Total audio latency in the last 10 minutes.
        SUM(vlat) as vlat -- Total video latency in the last 10 minutes.
    FROM
        view_app_heartbeat_stream
    GROUP BY
        TUMBLE(app_ts, INTERVAL '10' MINUTE), roomid;
  • Latency statistics collected by region
    Collect statistics on the average end-to-end latency of audio and video data by region every 10 minutes.
    CREATE VIEW region_lat_statistics_10min AS
    SELECT 
        CAST(TUMBLE_START(app_ts, INTERVAL '10' MINUTE) as VARCHAR) as app_ts,
        region,
        SUM(alat)/COUNT(alat) as alat,
        SUM(vlat)/COUNT(vlat) as vlat
    FROM
        view_app_heartbeat_stream
    GROUP BY
        TUMBLE(app_ts, INTERVAL '10' MINUTE), region;               
  • Overall frame freezing rate collected in real time
    Calculate the overall frame freezing rate by using the following formula: Number of online users who encounter frame freezing/Total number of online users × 100%. This metric can be used to measure the scope of users who encounter frame freezing.
    CREATE VIEW block_total_statistics_10min AS
    SELECT
        CAST(TUMBLE_START(app_ts, INTERVAL '10' MINUTE) as VARCHAR) as app_ts,
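        -- Distinct users with at least one freeze divided by distinct online users.
        -- COUNT(DISTINCT) is supported only in Blink of a version later than 1.4.4.
        COUNT(DISTINCT CASE WHEN ublock <> 0 OR dblock <> 0 THEN userid END) / CAST(COUNT(DISTINCT userid) AS DOUBLE) as block_rate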
    FROM
        view_app_heartbeat_stream
    GROUP BY
        TUMBLE(app_ts, INTERVAL '10' MINUTE);         
  • Number of frame freezing times per user
    Calculate the number of frame freezing times per user by using the following formula: Total number of online frame freezing times/Total number of online users. This metric can be used to measure the overall severity of frame freezing based on the number of times frame freezing occurs.
    CREATE VIEW block_peruser_statistics_10min AS
    SELECT
        CAST(TUMBLE_START(app_ts, INTERVAL '10' MINUTE) as VARCHAR) as app_ts,
        SUM(ublock+dblock) / CAST(COUNT(DISTINCT userid) AS DOUBLE) as block_peruser -- COUNT(DISTINCT) is supported only in Blink of a version later than 1.4.4.
    FROM
        view_app_heartbeat_stream
    GROUP BY
        TUMBLE(app_ts, INTERVAL '10' MINUTE);                
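
To make the 10-minute tumbling window grouping in the preceding statements easier to follow, the following Python sketch reproduces the channel fault sums on a few in-memory records. It is a simplified illustration only: it ignores watermarks and late data, uses plain epoch seconds, and the function names window_start and room_error_statistics as well as the sample records are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 10 * 60  # length of one tumbling window
FAULT_FIELDS = ("ublock", "dblock", "adrop", "vdrop")


def window_start(ts):
    """Return the start (epoch seconds) of the window that contains ts."""
    return ts - ts % WINDOW_SECONDS


def room_error_statistics(records):
    """Sum the fault counters per (window start, roomid) pair."""
    totals = defaultdict(lambda: dict.fromkeys(FAULT_FIELDS, 0))
    for rec in records:
        key = (window_start(rec["ts"]), rec["roomid"])
        for name in FAULT_FIELDS:
            totals[key][name] += rec[name]
    return dict(totals)


# Made-up heartbeats: ts is in epoch seconds, counters are per heartbeat.
records = [
    {"ts": 5, "roomid": 1, "ublock": 1, "dblock": 0, "adrop": 2, "vdrop": 0},
    {"ts": 595, "roomid": 1, "ublock": 0, "dblock": 2, "adrop": 1, "vdrop": 4},
    {"ts": 600, "roomid": 1, "ublock": 3, "dblock": 0, "adrop": 0, "vdrop": 0},
    {"ts": 30, "roomid": 2, "ublock": 0, "dblock": 0, "adrop": 0, "vdrop": 1},
]

stats = room_error_statistics(records)
for (start, roomid), counters in sorted(stats.items()):
    print(start, roomid, counters)
```

The heartbeats at 5 and 595 seconds fall into the same window for channel 1, while the heartbeat at 600 seconds opens the next window. Each window is closed on the left and open on the right, so a record is counted in exactly one window.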

Demo code and source code

Alibaba Cloud provides a complete demo based on the core video metric monitoring for live streaming that is described in this topic. You can refer to the demo code to register upstream and downstream storage and to develop your own solution to monitor core video metrics. You can click Demo to download it. To use the demo, you must upload a CSV file to DataHub as the source table and create an ApsaraDB RDS result table to store the result data.