
The Details of the Stress Testing Guarantee Technology behind Livestreaming during Double 11

This article introduces some of the livestreaming architectures and the challenges they bring to application architecture.

By Zijin

Reviewing & Proofreading by Fengyun

Editing & Typesetting by Jiuyuan

Released by Alibaba Developer

"From January this year to now, the number of Taobao Live users has exceeded 500 million. By August, network traffic had also increased by 59%, and the GMV of core merchants had increased by 55%. Double 11 started on the evening of October 20. We hope Taobao Live, as the home field, will take on this event." A few days ago, Cheng Daofang, Head of the Livestreaming Division of Taobao Business Group, revealed in an interview with a reporter from 21st Century Business Herald that livestreaming has been thriving over the past year and will become more professional in the coming year.


With such a large number of users, what challenges do livestreaming applications bring to backend services? Today, we will introduce some of the livestreaming architectures and the challenges they bring to application architecture.

Livestreaming Architecture

We usually see the following kinds of livestreaming:

  1. Single-person livestreaming, such as Taobao Live, which is usually accompanied by flash sales, live comments, gift rockets (user contribution), and other business logic.
  2. Multi-person livestreaming, such as voice chat and online group meetings.
  3. Recorded broadcasts, used in scenarios such as training and meetings, where the livestream must be recorded so the video can be retained and shared later. These scenarios usually have low real-time requirements.

When you watch a livestream, if the service is connected to Alibaba Cloud CDN, the playback end selects the nearest Alibaba Cloud CDN node and pulls the stream from it for playback. In this case, the pull streaming pressure falls on Alibaba Cloud CDN. If the service is not connected to Alibaba Cloud CDN, the playback end pulls the stream directly from the livestreaming origin server.

The following figure shows the architecture of a common video stream and its two data flows:

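[Figure: common livestreaming architecture, with the video stream push-pull flow in blue and the regular business logic flow in yellow]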

  1. Video stream push-pull logic, as shown in the blue line
  2. Regular business logic, as shown in the yellow line

There are four main modules:

  1. Push Streaming End: Its main function is to collect audio and video data from the streamers and push it to the streaming media server.
  2. Streaming Media Server: Its main function is to convert the data transmitted from the push streaming end into a specified format and push it to the playback end for users to watch. Cloud vendors also provide complete solutions for the streaming media server.
  3. Business Server End: It mainly deals with some common business logic, such as flash sales and live comments.
  4. Playback End: In short, the playback end pulls audio and video for playback and presents the corresponding content to the users.

The four modules communicate with one another over streaming media transmission protocols. Most livestreaming systems adopt the structure shown in the preceding figure; the main difference is whether Alibaba Cloud CDN is introduced. In general, we recommend introducing Alibaba Cloud CDN to reduce the impact of livestreaming network traffic on the servers. The protocols used between the modules do not have to be the same.

Next, we will discuss the main risk points of this architecture and how to identify them through stress testing.

Challenges with Livestreaming

Challenge 1: The Pressure of Video Streaming on the Streaming Media Server

In the push-pull logic, the video involves large network traffic and a long transmission path, which puts pressure on the streaming media server. The common solution is to introduce Alibaba Cloud CDN. When users start watching the video, they first pull the stream from the nearest Alibaba Cloud CDN node. If the video is not cached on that node at this time, Alibaba Cloud CDN goes back to the origin, that is, the streaming media server, to fetch it.

However, a risk arises when numerous users start watching at the same time: a large number of Alibaba Cloud CDN nodes go back to the origin simultaneously. This kind of pulse network traffic can have unpredictable effects on the streaming media server.
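The back-to-origin pulse can be illustrated with a toy model. The sketch below is not how Alibaba Cloud CDN is implemented. It assumes a hypothetical `EdgeNode` class that caches a segment after a single origin fetch, and it only counts how many origin requests a cold start produces compared with a warmed-up cache.

```python
# Toy model only: EdgeNode and origin_requests are hypothetical names, and a
# real CDN node may behave differently (for example under concurrent misses).

class EdgeNode:
    def __init__(self):
        self.cache = set()

    def serve(self, segment, origin_hits):
        """Serve a segment; on a cache miss, record one back-to-origin fetch."""
        if segment not in self.cache:
            origin_hits.append(segment)
            self.cache.add(segment)


def origin_requests(num_nodes, viewers_per_node, segment, warm=False):
    """Count origin fetches when viewers arrive at every node for one segment."""
    nodes = [EdgeNode() for _ in range(num_nodes)]
    origin_hits = []
    if warm:
        # Warm-up pass before the event, e.g. driven by a stress test.
        for node in nodes:
            node.serve(segment, origin_hits)
        origin_hits.clear()  # only count fetches made during the live event
    for node in nodes:
        for _ in range(viewers_per_node):
            node.serve(segment, origin_hits)
    return len(origin_hits)


cold = origin_requests(num_nodes=200, viewers_per_node=1000, segment="seg-1")
warmed = origin_requests(num_nodes=200, viewers_per_node=1000, segment="seg-1", warm=True)
print(cold, warmed)
```

In this model, every cold node reaches the origin once at the moment viewers arrive, so the origin load scales with the number of nodes that miss at the same time. Warming the cache in advance removes that burst.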

We usually use stress testing to verify the whole process in advance. We can even use stress testing to warm up the video in Alibaba Cloud CDN in advance. However, traditional stress testing based on HTTP requests cannot support this scenario, and stress testing video streams is difficult for the following reasons:

  1. Open-source tools, such as srs_bench and JMeter plug-ins, are available. However, they require users to have a deep understanding of video protocols, which raises the barrier to use.
  2. Video stress testing has demanding bandwidth requirements, which means the cost of the stress testing machines is high.
  3. Video stress testing needs to consider the impact of region on transmission quality.

Performance Testing Service (PTS) has added support for the RTMP and HLS protocols and abstracted them around stress testing scenarios to solve the preceding problems, so users can stress test different protocols from a single interface.


PTS also provides rich orchestration modes, so scenarios can be orchestrated easily and freely. More importantly, you can use the PTS national customized mode to simulate customer requests from different regions and detect problems more quickly.

Challenge 2: Low-Latency Interaction Protocols

Unlike traditional promotion events, livestreaming involves frequent interaction with viewers through live comments, remarks, real-time chat, and flash sales. If the host speaks enthusiastically and the users do not respond, the livestream is unsuccessful. However, common HTTP requests cannot meet the timeliness requirements, because HTTP is a stateless request-response protocol in which the server cannot proactively push messages to the client. Therefore, these features are usually implemented with WebSocket, which establishes a persistent connection between the client and the server to deliver messages in real time and reduce performance overhead.

Every time a WebSocket connection is established, an HTTP request is initiated during the handshake phase. The WebSocket version, the subprotocol, the origin, the host address, and other information are provided to the server through HTTP. The key part of the message is the Upgrade header, which tells the server to upgrade the current HTTP request to the WebSocket protocol. If the server supports it, the returned status code must be 101:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept:xxxxxxxxxxxxxxxxxxxx

With the preceding response, the WebSocket connection is established successfully. Data is then transmitted according to the WebSocket protocol.
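The handshake described above can be sketched with the Python standard library. This is only an illustration, not PTS or JMeter code. The helper names (`make_client_key`, `expected_accept`, `build_handshake`, `handshake_ok`) and the host and path used here are assumptions made for the example. The GUID constant and the SHA-1 plus Base64 computation of `Sec-WebSocket-Accept` come from RFC 6455.

```python
import base64
import hashlib
import os

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_client_key():
    """Random 16-byte nonce, Base64-encoded, sent as Sec-WebSocket-Key."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def expected_accept(client_key):
    """Value the server must return in Sec-WebSocket-Accept for this key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_handshake(host, path, client_key):
    """Build the HTTP upgrade request that opens the handshake."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {client_key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",
    ]
    return "\r\n".join(lines)


def handshake_ok(response_text, client_key):
    """Check for status 101 and a matching Sec-WebSocket-Accept header."""
    status_line, _, header_block = response_text.partition("\r\n")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or parts[1] != "101":
        return False
    headers = {}
    for line in header_block.split("\r\n"):
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return headers.get("sec-websocket-accept") == expected_accept(client_key)


# Sample key taken from RFC 6455; host and path are placeholders.
key = "dGhlIHNhbXBsZSBub25jZQ=="
request = build_handshake("live.example.com", "/comments", key)
response = (
    "HTTP/1.1 101 Switching Protocols\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    f"Sec-WebSocket-Accept: {expected_accept(key)}\r\n\r\n"
)
print(handshake_ok(response, key))
```

A stress testing client would perform this check for every virtual user before it starts sending frames, so handshake failures can be counted separately from message failures.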

JMeter provides plug-ins that simulate the whole WebSocket communication process. However, they also require you to understand the protocol and are relatively difficult to use. PTS abstracts the protocol in terms of business meaning, so you do not need to understand the complex protocol details. You only need to complete the scenario configuration and the pressure configuration with basic settings, such as the stress testing URL, parameters, and checkpoints.


In addition to livestreaming, WebSocket is widely used in scenarios with very high real-time requirements, such as online games, equity funds, live sports updates, chat rooms, live comments, and online education.

Challenge 3: High Concurrency Pulse Network Traffic

The service time of livestreaming applications differs from that of common applications because it is concentrated: a large number of users flood in within a few hours. A livestream by a top influencer (a Big V) usually causes millions of users to log on, so the livestreaming system must be highly capable of handling pulse network traffic. Moreover, unlike a traditional flash sale, which starts at a fixed time, a flash sale in a livestream starts whenever the livestreamer announces it, so the start time is often hard to predict. Pulse network traffic also places extremely high requirements on the system. Many problems that do not occur at ordinary times or under traditional heavy network traffic, such as lazy loading, JIT warm-up, and hot and cold data switching, will occur.

These two features require the stress testing tool to be able to initiate large network traffic instantly. This requires more load-generating machines and precise network traffic control to keep up with rapid network traffic growth.
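To make the idea of a traffic pulse concrete, the following sketch computes a per-second target request rate for a short ramp followed by a hold, and divides that target across several load-generating machines. It does not reflect how PTS schedules traffic. The function names `pulse_schedule` and `split_across_engines` and all the numbers are assumptions chosen for illustration.

```python
def pulse_schedule(baseline_rps, peak_rps, ramp_seconds, hold_seconds):
    """Target RPS per second: linear ramp from baseline to peak, then hold."""
    if ramp_seconds < 1:
        raise ValueError("ramp_seconds must be at least 1")
    schedule = []
    for t in range(1, ramp_seconds + 1):
        # Integer arithmetic keeps the targets exact and reproducible.
        schedule.append(baseline_rps + (peak_rps - baseline_rps) * t // ramp_seconds)
    schedule.extend([peak_rps] * hold_seconds)
    return schedule


def split_across_engines(target_rps, engines):
    """Divide a target rate across engines so that the shares sum exactly."""
    base, extra = divmod(target_rps, engines)
    return [base + (1 if i < extra else 0) for i in range(engines)]


# Hypothetical pulse: 1,000 RPS rising to 100,000 RPS within 3 seconds.
for second, rps in enumerate(pulse_schedule(1000, 100000, 3, 2), start=1):
    shares = split_across_engines(rps, engines=8)
    print(second, rps, shares[0])
```

The shorter the ramp, the larger the jump each machine has to absorb per second, which is why a pulse test needs both enough machines and rate control at a fine time granularity.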

These two points are the strengths of Alibaba Cloud PTS. PTS stands on the shoulders of Double 11 and is an extension of Alibaba's full-link stress testing. PTS initiates network traffic from millions of users through scaling and elasticity, saving machine and labor costs. PTS can also control network traffic precisely in real time, which makes it an excellent solution for the rapidly rising network traffic pulses of ApsaraVideo Live.

Conclusion

PTS has comprehensively upgraded its supported protocols in response to changes in the video and livestreaming industries. In addition to traditional HTTP requests, it now supports HTTP/2, streaming media, MQTT, and other protocols, allowing users to test in more scenarios.

Note: The products/solutions involved in this article are currently only published on the Alibaba Cloud domestic website. You are welcome to leave a message if you have an interest in an international version!
