An Overview of Alibaba Cloud's Cutting-Edge Live Broadcast Technology

In this blog, we'll explore some of the interesting features of Alibaba Cloud's live broadcast technology and share its application scenarios.

By Alibaba Cloud Edge Plus


The Tokyo Olympics has come to an end. During the Games, hundreds of millions of viewers around the world flocked to broadcasting platforms to watch the events, which made the live broadcast capabilities of those platforms particularly important. As the technology foundation of ApsaraVideo Live, Alibaba Cloud has advantages in product technology, resource bandwidth, and service assurance. On this basis, Alibaba Cloud can provide end-to-end technical support and assurance for major live broadcast platforms to deliver a high-quality viewing experience. This article introduces how Alibaba Cloud implements its live broadcast technology.

New Trends and Challenges in the Development of Live Video Broadcast

According to forecasts by iiMedia Research, a third-party consultancy, the live video broadcast industry was in a stage of high-speed development from 2017 to 2020. In 2020, the industry generated over 1 trillion yuan in market revenue and covered a total of 526 million users.

The application scope of live video broadcast has expanded from pan-Internet industries, including video entertainment and e-commerce, to traditional industries such as online education, video security, radio and television media, and medical services. "Live broadcast +" has become a new trend. With huge market potential, the live video broadcast industry is highly competitive and crowded with participants. To attract more users, live broadcast providers must refine their live content, enrich their live broadcast scenarios, and innovate their marketing models. To do so, live broadcast platforms need to incorporate real-time interaction and short videos into the overall experience.

Live broadcast providers that build their own live broadcast platforms face great challenges:

  • Self-built systems require large investments in resources and hardware and incur high bandwidth costs, and the results are often still unsatisfactory. Most enterprises are better off saving that money for their core business and choosing a professional, flexible live broadcast service provider.
  • The implementation involves many technologies, such as distributed storage, distributed computing, video encoding and decoding, video encryption, and CDN delivery. This generates extremely high labor and time costs for development and O&M.
  • The operating costs are high. Live video traffic is bursty, and self-built systems cannot meet the elastic bandwidth demand, which results in high operating costs. In addition, manual review of live content adds further operating costs.

Service Architecture of Alibaba Cloud ApsaraVideo Live

Alibaba Cloud ApsaraVideo Live is an audio and video live broadcast platform based on leading technologies, including content access and distribution network and large-scale distributed real-time video processing. It features easy access, low latency, and high concurrency, providing high-definition and smooth audio and video live broadcast services.


As shown in the preceding figure, a caster captures live content with capture devices and then uses the stream ingest SDK to push the live stream. The stream enters ApsaraVideo Live through edge stream ingest: it is accelerated over CDN edge nodes, which keeps uplink transmission stable, and is delivered to an Alibaba Cloud live center. After the stream reaches the live center, it can be processed as needed. For example, it can be transcoded, time-shifted, recorded, or captured as snapshots.

The processed stream is delivered to client devices for playback through CDN nodes. Mobile players can be developed by integrating the player SDK provided by Alibaba Cloud. In addition to transcoding live streams and capturing snapshots, users can deliver recorded live streams to ApsaraVideo VOD by using the Live-to-VOD feature. In ApsaraVideo VOD, users can edit the recorded live streams online into short videos and offer them as on-demand videos. This process links live streaming with the production and dissemination of short videos.

Core Advantages of ApsaraVideo Live


Global Acceleration: Global Network of Edge Cloud Nodes

Alibaba Cloud has over 2,800 edge cloud nodes and nine live centers around the world, which supports seamless deployment of overseas business. Supported by the Alibaba Cloud global real-time transport network (GRTN) for audio and video, live streams from anywhere in the world can be ingested at the nearest node and quickly transmitted to the designated live center through Express Connect for content delivery.


Ultimate Audio-visual Effect: Exclusive Audio and Video Technologies to Ensure the Best Experience

Alibaba Cloud's Narrowband HD technology intelligently analyzes the scenes, actions, content, textures, and other details in videos. For example, in a football match, different encoding optimization strategies are applied to different content such as the ball, the players, and the grass. As a result, the bitrate is reduced while image quality is maintained, which saves 20% to 40% of the bandwidth cost.


The image on the left shows normal transcoding, while the image on the right shows Narrowband HD transcoding. When the audience sees this picture, the focus is on the human face. With intelligent analysis, the system allocates more bits to the face, which makes its details and texture clearer. Now, let's look at the bitrate analysis. If the video image on the left is complex, the bitrate is between 1.5 Mbit/s and 2 Mbit/s.

When there are fewer details in the video image, for example, during the halftime break of a football match, intelligent recognition reduces the bitrate consumed. With this technology, the overall bandwidth is reduced by 30% to 40% on average. In other words, bandwidth is saved while images become clearer. This is Alibaba Cloud's Narrowband HD 2.0 technology.
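
To get a feel for what a 30% to 40% reduction means at scale, the arithmetic can be sketched as follows. This is only an illustration: the function name, the 2 Mbit/s stream bitrate, and the audience of 100,000 viewers are assumptions chosen for the example, not figures published for Narrowband HD.

```python
def saved_bandwidth_mbps(bitrate_mbps: float, viewers: int, saving_ratio: float) -> float:
    """Return the aggregate egress bandwidth saved, in Mbit/s.

    bitrate_mbps  -- per-viewer bitrate before optimization (assumed value)
    viewers       -- number of concurrent viewers (assumed value)
    saving_ratio  -- fraction of bitrate removed, e.g. 0.30 for 30%
    """
    if not 0.0 <= saving_ratio <= 1.0:
        raise ValueError("saving_ratio must be between 0 and 1")
    return bitrate_mbps * viewers * saving_ratio


if __name__ == "__main__":
    # Hypothetical event: 100,000 viewers watching a 2 Mbit/s stream.
    for ratio in (0.30, 0.40):
        saved = saved_bandwidth_mbps(2.0, 100_000, ratio)
        print(f"{ratio:.0%} saving: about {saved / 1000:.0f} Gbit/s less egress")
```

Because the saving scales linearly with both bitrate and audience size, the same percentage matters far more for a major event than for a small stream.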


Alibaba Cloud has also developed a real-time high-performance video encoder called Ali S265, which supports high-quality real-time H.265 1080p transcoding, video enhancement algorithms, and image enhancement. Encoding in live video broadcast scenarios has a critical prerequisite: it must be real-time, which means a one-hour video must be transcoded within one hour. More precisely, each second of video content must be transcoded within one second, second after second, to keep up with the live stream.

Ali S265 can achieve 1080p high-quality real-time transcoding for videos and use an image enhancement algorithm to enhance the image quality. In the example above, you can see that the details of the snowflakes on the tree behind the animal have been enhanced after being processed by Ali S265. On the basis of ensuring real-time transcoding and image quality, the image is processed by an enhancement algorithm to be clearer and more layered.
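
The real-time prerequisite described above can be expressed as a simple check on the ratio of encoding time to media duration. The following is a minimal sketch with hypothetical function names and made-up timings. It says nothing about how Ali S265 measures or enforces this internally.

```python
from typing import Iterable, Tuple


def real_time_factor(encode_seconds: float, media_seconds: float) -> float:
    """Ratio of time spent encoding to the duration of the media encoded.

    A value of 1.0 or less means the encoder keeps pace with the live stream.
    """
    if media_seconds <= 0:
        raise ValueError("media_seconds must be positive")
    return encode_seconds / media_seconds


def keeps_up(timings: Iterable[Tuple[float, float]], limit: float = 1.0) -> bool:
    """Return True if every (encode_seconds, media_seconds) pair is real-time.

    Checking each chunk separately matters: an encoder that is fast on
    average but slow on one chunk still falls behind at that moment.
    """
    return all(real_time_factor(enc, media) <= limit for enc, media in timings)


if __name__ == "__main__":
    # Made-up timings for three one-second chunks of video.
    steady = [(0.80, 1.0), (0.95, 1.0), (0.70, 1.0)]
    spiky = [(0.50, 1.0), (1.20, 1.0), (0.50, 1.0)]
    print("steady keeps up:", keeps_up(steady))
    print("spiky keeps up:", keeps_up(spiky))
```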

Leading Technology: Further Innovations of Live Broadcast Technology

Based on ApsaraVideo Live, Real-time Streaming (RTS) optimizes underlying technologies such as full-procedure latency monitoring, CDN protocol transformation, and UDP transmission. Integrated with ApsaraVideo Player SDK, it achieves millisecond-level latency among nodes in scenarios with tens of millions of concurrent requests. This cuts the 3 to 6 seconds of latency typical of traditional live broadcasts and delivers low latency, less stalling, fast access, and a smooth viewing experience. RTS can be used in a wide range of industry scenarios and has been proven in practice with hundreds of customers.


Based on ApsaraVideo Live and ApsaraVideo Media Processing (MTS), the Production Studio service of Alibaba Cloud moves traditional video production tools to the cloud. It enriches directed broadcasts by integrating video AI recognition, bilingual translation, and various interaction features. You can use Production Studio on demand without purchasing extra hardware. The service provides a console, APIs, and Web SDKs, which you can use directly or build on for secondary development. The console is easy to use, which reduces learning costs.

In addition to live stream and on-demand video sources, multiple types of content sources, such as pictures, documents, and web pages, are supported. A maximum of six videos can be mixed and encoded at the same time. Capabilities such as multi-view, real-time image and text packaging components, multi-language subtitles, and video AI are provided. They help to package and produce live broadcasts at any time and synchronize them online with one click, creating a wonderful and immersive live broadcast experience.


The multi-location function combines and switches among multiple streams from multiple locations on different sites of the event. Videos from different locations are transmitted through video frame-level synchronous playback, enabling users to have multiple viewing angles at the same time and helping them enjoy all wonderful scenes. The virtual studio is realized by using the real-time automatic matting technology based on the depth algorithm, which supports multi-device, multi-location, and remote broadcast. Through cloud matting and synthesis capabilities, broadcast scenes such as dual screens, split screens, and picture in picture are realized, creating an immersive live broadcast experience.

This feature is used to gather multiple video programs, create live broadcast rooms similar to carousel studios, and diversify live broadcast scenarios and program forms. Users can add, remove, modify, and search for programs in an episode list and modify program content. Users can use this feature to implement business scenarios in a flexible, easy, and collaborative manner.

Production Studio real-time subtitles integrates Production Studio with Damo Academy ASR and translation services to provide real-time, multi-language speech-to-subtitle conversion for live streams. It supports long-term storage of translated subtitles during live recording and settings for parameters such as font, background, effect, and display time. Templates are available in multiple languages, such as Chinese, English, French, Spanish, and Russian. As live speech is converted to text, the translation is overlaid on the live stream in real time as subtitles.


Production Studio also supports the integration of live video clips, on-demand video clips, images, text, dynamic H5 components, and AI capabilities. In this way, it reshapes the video production workflow, displays data in multiple dimensions, enriches content, expands traffic exposure, and generates advertising revenue.

Video Intelligence: The Application of Video AI

The Video Review service is realized based on massive labeled data and deep learning algorithms. This service can accurately identify prohibited content in media files, including pornography, violence, terrorism, advertising, and unhealthy scenarios, in several dimensions, including voices, texts, and visual display. This service also supports the content review of videos, images, and files to ensure content security.


The stream ingest SDK of Alibaba Cloud is a powerful audio and video broadcast service based on the Content Delivery Network (CDN) and real-time audio and video communication technologies of Alibaba Cloud. It provides easy-to-use open APIs, a smooth and network-adaptive playback experience, multi-node low-latency optimization, and real-time retouching. Intelligent retouching uses intelligent vision algorithms to detect and recognize large numbers of human faces. It provides capabilities such as retouching, face shaping, makeup, shooting filters, and stickers.

The exclusive locating technology for facial key positions covers 106 basic positions and 280 high-accuracy positions, which makes effects realistic. The intelligent vision algorithm and real-time rendering technology are optimized on a regular basis for a better user experience. Face retouching and shaping effects, filters, stickers, and materials are constantly upgraded and enriched to make images more enjoyable. Comprehensive developer support ensures quick response to customer needs as well as excellent and reliable services.


Security and Stability: Multiple Security Policies to Ensure the Security of Live Videos

ApsaraVideo Live supports access control, such as Referer and User-Agent (UA) blacklists/whitelists and IP blacklists/whitelists. It also supports playback center authentication and business remote authentication. Playback center authentication includes URL authentication for stream ingest and playback. Secure URL authentication lets you customize the authentication key and the expiration time that are used to dynamically generate authentication URLs. Business remote authentication forwards the business request information to the customer's own authentication center for a validity check.
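
The idea behind URL authentication with a key and an expiration time can be sketched as follows. This is a generic signed-URL illustration, not the exact format used by ApsaraVideo Live: the `auth_key` parameter name, the field layout, the HMAC-SHA256 choice, and the example domain are all assumptions made for the sketch.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def _digest(path: str, expires: int, key: str) -> str:
    # Bind the signature to both the resource path and the expiry time.
    message = f"{path}-{expires}".encode()
    return hmac.new(key.encode(), message, hashlib.sha256).hexdigest()


def sign_url(url: str, key: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Append a hypothetical auth_key=<expires>-<digest> parameter to url."""
    current = int(time.time() if now is None else now)
    expires = current + ttl_seconds
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}auth_key={expires}-{_digest(urlsplit(url).path, expires, key)}"


def verify_url(signed_url: str, key: str, now: Optional[float] = None) -> bool:
    """Return True if the signature matches and the URL has not expired."""
    parts = urlsplit(signed_url)
    values = parse_qs(parts.query).get("auth_key")
    if not values:
        return False
    expires_text, _, digest = values[0].partition("-")
    if not expires_text.isdigit():
        return False
    expires = int(expires_text)
    current = int(time.time() if now is None else now)
    if current > expires:
        return False
    expected = _digest(parts.path, expires, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), digest.encode())


if __name__ == "__main__":
    signed = sign_url("https://play.example.com/app/stream.flv", "my-secret", 300)
    print(signed)
    print("valid now:", verify_url(signed, "my-secret"))
```

Because the expiry time is part of the signed message, a viewer cannot extend a link's lifetime by editing the timestamp, and a leaked link stops working once it expires.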

Reliable and stable live broadcast is achieved by switching between active and standby streams. The switching process is simple and easy to operate. ApsaraVideo Live also supports customized authentication through EdgeScript. Users can write authentication scripts tailored to their business and deploy and publish them quickly. EdgeScript runs on the CDN edge nodes used for live broadcast, so users do not need to manage hardware configuration, regional deployment, scheduling, or automatic scaling. After a script is uploaded, it can be deployed to ApsaraVideo Live edge cloud nodes around the globe, and requests from all over the world are processed on those nodes according to the script logic.

Live video encryption is a cloud-device integrated video encryption solution that uses a proprietary cryptography algorithm to ensure the security of video stream transmission. It supports general-purpose DRM encryption, as well as multi-terminal, multi-platform, and comprehensive copyright protection. This encryption solution uses independent encryption keys to avoid a wide range of security problems caused by the leakage of a single key. It supports encryption transcoding and decryption for playback. With dynamic key management, this solution provides better protection for video resources and effectively prevents video leaks and hotlinks. With the application of digital watermarking technology in live videos, we can obtain evidence, trace the source, and investigate the responsible persons for infringement of copyrights in live broadcasts of major sports events.

ApsaraVideo Live provides real-time monitoring of the quality of live stream ingest, views, error status, viewers, playback traffic bandwidth, and playback quality in seconds. Users can detect the exceptions in the live broadcast process in a timely manner with ultra-low latency. Real-time log delivery is designed to deliver the logs of domains in ApsaraVideo Live to Log Service. Users can also analyze the logs to detect and identify issues related to stream ingest or formulate operation strategies based on the analysis of live stream audience.

Application Scenarios of ApsaraVideo Live

Based on their applications, typical live video broadcast scenarios include live broadcasts of large sports events, pan-entertainment (shows, games, and social media), e-commerce, party activities, online education, and enterprises.

Live Broadcast of Major Events


  • Applicable scenarios: Live broadcast of large events such as the Olympic Games, World Cup, sports events, and e-sports events.
  • Scenario demands: Highly reliable, high-quality, and low-latency live broadcast services. Support for stable and smooth concurrent viewing of tens of millions of users, full-procedure disaster recovery and emergency plans, and cinema-like immersive viewing experience.
  • Absolute stability: Active/standby stream ingest, remote dual-center disaster recovery, multi-bitrate alignment, and httpDNS + 302 scheduling. Various solutions ensure stable live broadcasts.
  • Content upgrading: Smart production automatically generates competition highlights. Other features include second-level time shifting for highlights, insertion of content at the beginning and end of sports event broadcasts, brand logo exposure, and integration of videos and ads. Production Studio helps boost brand marketing and monetization under huge traffic.
  • Extremely smooth viewing experience: Narrowband HD™ 2.0 provides cinema-like image quality. A frame rate of 50 frames per second gives users a smoother, more immersive viewing experience. Multiple streams are merged and optimized on the cloud to dynamically generate the output stream with the best frame rate.
  • Live broadcast security: Live broadcast DRM ensures content security. Comprehensive disaster recovery and emergency plans guarantee a smooth experience for tens of millions of concurrent users who watch the broadcast and interact through bullet-screen comments.

Pan-entertainment Live Broadcast


  • Application scenarios: Live shows (live singing and talk shows), UGC videos (life, entertainment, and making friends), and live game commentary.
  • Scenario demands: Live broadcast of shows, games, and social activities; low-cost live broadcast transcoding and distribution capabilities for global culture and entertainment broadcasting industries to help customers quickly broadcast activities worldwide.
  • Capabilities of the live broadcast side: The caster ingests streams on mobile phones or PCs, and viewers watch streams on terminals. Alibaba Cloud ApsaraVideo Live provides the stream ingest SDK and play SDK with built-in face beautifying functions.
  • Capabilities of the live broadcast server: The GRTN transmission network and real-time transcoding capability enable stable, smooth, and high-quality live content for millions of viewers.
  • Interaction between casters and audiences: Likes, comments, and interactions in streaming studios.
  • Automated review plans: The strategy for reviewing pornographic and terrorism content in videos can be flexibly adjusted to the user's criteria. Multiple recognition solutions cover multiple audio scenarios. Scheduled control for specific ad events identifies ad variants. Static and meaningless video content is effectively identified to improve operation and control efficiency.

E-commerce Live Broadcast


  • Applicable scenarios: The live broadcast of the mall (product introduction, list sharing, sales conversion maximizing) and WeChat-business interaction ("live + interaction" mode, promoting the sales of products through social media).
  • Scenario demands: Two most important aspects of live commerce are the live broadcast capability and the interactive communication capability. With live broadcast capability, viewers can watch live streams; with interactive communication capability, viewers can participate in real-time interactions in the studio.
  • Capabilities of the live broadcast side: The caster can ingest streams through mobile phones, computers, or professional devices. Visitors can watch the videos through mobile phones, PCs, the Web, and applets.
  • Capabilities of the live broadcast server: The live broadcast server can access the nearest live streams and deliver them after acceleration. This ensures stable, smooth, and high-definition live content for buyers from around the world.
  • Live broadcast recording and playback: Short videos are generated from the best product introductions during the live broadcast. After the live broadcast, these short videos can be retained as quality content to promote product sales. The time shifting feature allows on-demand playback of any highlight in the live broadcast, and buyers can drag the timeline to watch highlights they missed.
  • Interaction between casters and buyers: Buyers may inquire about some information about certain goods, leave comments, or interact with the casters during live broadcasts. Flash sales, lottery, and interactive marketing with red envelopes may also be involved in live broadcasts. The end-to-end latency of live stream is one second or less.
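
Time shifting, as described in the list above, can be pictured as a lookup from a position on the timeline to a piece of the recording. The sketch below assumes that the recording is stored as time-ordered segments with known start offsets, which is how segment-based delivery commonly works. The article does not describe how ApsaraVideo Live implements this, and the function name and the six-second segment length are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def segment_for_offset(segment_starts: Sequence[float], seek_seconds: float) -> int:
    """Return the index of the segment that contains seek_seconds.

    segment_starts must be sorted in ascending order; each entry is the
    offset, in seconds from the start of the broadcast, at which that
    segment begins.
    """
    if not segment_starts or seek_seconds < segment_starts[0]:
        raise ValueError("seek position is before the start of the recording")
    # bisect_right finds the first segment that starts after the seek point,
    # so the segment just before it is the one that contains the seek point.
    return bisect_right(segment_starts, seek_seconds) - 1


if __name__ == "__main__":
    # Hypothetical recording cut into six-second segments.
    starts = [0.0, 6.0, 12.0, 18.0, 24.0]
    for position in (0.0, 13.5, 24.0):
        print(f"seek to {position}s: play from segment {segment_for_offset(starts, position)}")
```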

Party Activities Live Broadcast


  • Applicable scenarios: Live broadcast for activities related to news reports, sports shows, and variety shows.
  • Scenario demands: High-quality and highly reliable live broadcast for various evening galas and activities. An audio-visual feast of live video broadcast featuring UHD, Dolby Atmos, and large-scale global content delivery.
  • Higher definition + Dolby Atmos: A 4K HD broadcast vehicle for signal transmission and Dolby Atmos make the sound heard by audiences in front of the screen "more real than that heard by the on-site audiences". While users are on a shopping spree, they enjoy both the audio and the visuals.
  • High reliability: The dual-channel SRT return technology ensures seamless switching. If an anomaly occurs in the main signal channel, the picture is not interrupted when the broadcast switches to the standby channel. This ensures high-quality transmission of important content from program sources in complex networks. It enables more stable, faster, and more complete content delivery to user screens at a lower cost.
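
The main/standby switching described in the list above comes down to a selection rule: stay on the main channel while it is healthy, and fall back to the standby channel when the main one goes silent. The sketch below shows only that rule. It is not an SRT implementation, and the class name, function name, and 0.5-second silence timeout are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChannelStatus:
    name: str
    last_packet_at: float  # time the last packet arrived, in seconds


def pick_channel(
    main: ChannelStatus,
    standby: ChannelStatus,
    now: float,
    timeout: float = 0.5,
) -> Optional[str]:
    """Return the name of the channel to play from, or None if both are silent.

    The main channel is preferred whenever it has delivered a packet within
    the timeout window; otherwise the standby channel is used if it is alive.
    """
    if now - main.last_packet_at <= timeout:
        return main.name
    if now - standby.last_packet_at <= timeout:
        return standby.name
    return None


if __name__ == "__main__":
    main = ChannelStatus("main", last_packet_at=10.0)
    standby = ChannelStatus("standby", last_packet_at=10.9)
    print("at t=10.2:", pick_channel(main, standby, now=10.2))
    print("at t=11.0:", pick_channel(main, standby, now=11.0))
```

Keeping both channels running at all times is what makes the switch seamless: the standby stream is already flowing when it is needed, so there is nothing to start up.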

Live Broadcast of Online Education


  • Applicable scenarios: Large classes of adult education.
  • Scenario demands: Stable viewing of high-quality live classes for students in different regions with different internet connections. RTS ensures better teacher-student interaction (low-latency live streams and synchronous message interaction). Live broadcast security protects core teaching content.
  • Live broadcast concurrent views: ApsaraVideo Live supports tens of millions of concurrent views and covers more than 2,800 CDN nodes around the world. ApsaraVideo Live reserves a bandwidth of 150 Tbit/s and provides a leased line to guarantee live broadcast quality across countries.
  • Live playback recording: The live courses are recorded on the cloud and can generate a playback file that can be viewed at any time. Live broadcast time shifting supports on-demand playback of any highlights.
  • Live interaction and Q&A: The interaction between teachers and students promotes teaching effect of online education. The integrated SDK supports interactive video connection, comment presentation, and group management.
  • Live broadcast security and anti-theft: ApsaraVideo Live supports link and content encryption functions, such as URL authentication, remote authentication, Alibaba encryption, and DRM encryption. They protect video content from piracy and illegal use.
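
The relationship between the reserved bandwidth and the number of concurrent views mentioned in the list above can be checked with simple arithmetic. The 150 Tbit/s figure comes from this article. The per-viewer bitrates below are assumptions chosen for illustration, and the function name is hypothetical.

```python
def max_concurrent_viewers(reserved_tbps: float, bitrate_mbps: float) -> int:
    """Upper bound on viewers that fit in the reserved bandwidth.

    reserved_tbps -- reserved egress bandwidth in Tbit/s
    bitrate_mbps  -- bitrate delivered to each viewer in Mbit/s (assumed)
    """
    if bitrate_mbps <= 0:
        raise ValueError("bitrate_mbps must be positive")
    # 1 Tbit/s equals 1,000,000 Mbit/s.
    return int(reserved_tbps * 1_000_000 / bitrate_mbps)


if __name__ == "__main__":
    for bitrate in (2.0, 5.0):
        viewers = max_concurrent_viewers(150, bitrate)
        print(f"{bitrate} Mbit/s per viewer: up to {viewers:,} concurrent viewers")
```

Under these assumptions, the reserved bandwidth corresponds to tens of millions of simultaneous streams, which is consistent with the concurrency claim in the list.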

Enterprise Live Broadcast


  • Applicable scenarios: Enterprise marketing live broadcast and financial live broadcast.
  • Scenario demands: Enterprises often lack well-equipped offline studios and professional live broadcast teams. Multiple camera locations, directing, and virtual studios on the cloud are required to polish live broadcast content. At the same time, the live broadcast needs to be low-latency and highly interactive.
  • Short-latency interactive live broadcast to achieve good marketing results: Live broadcasts for enterprise marketing and financial scenarios require complete features, high cost performance, and ultra-low latency. The live broadcast should support millions of concurrent views at an end-to-end latency of 1 second. As a result, interaction is more timely, flash sales and red envelope activities run more smoothly, and GMV and user conversion improve.
  • Cloud-based video processing makes live broadcast more professional: Marketing live broadcasts can prepare more targeted content and strategies and make the content more informative and professional. Production Studio supports converged switching between video-on-demand and live sources and enables seamless insertion of content such as premium VOD trailers during live broadcast. The virtual studio supports multiple devices, multiple camera locations, and remote broadcasting. Through the cloud matting and synthesis capabilities, dual-screen, split-screen, picture-in-picture, and other broadcasting scenes can be implemented, creating an immersive live broadcast experience.

Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.
