You can use Function Compute and Serverless Workflow to build an elastic and highly available audio and video processing system in a serverless architecture. This topic describes the benefits and scenarios of this serverless solution, and compares this serverless solution with the conventional solution in terms of performance, costs, and engineering efficiency.

New requirements for audio and video processing

Among video-on-demand solutions, video transcoding is the one that consumes the most computing power. You can use the dedicated transcoding services provided by Alibaba Cloud. However, you may want to build your own transcoding service in the following scenarios:
  • You require a more elastic video processing service.

    For example, you have deployed a video processing service on a virtual machine or container platform by using FFmpeg. You want to improve the elasticity and availability of the service.

  • You require parallel processing of multiple files.

    For example, you need to process a large number of videos at a time and want the video files to be processed in parallel.

  • You need to process a large number of oversized videos at a time with high efficiency.

    For example, hundreds of 1080p videos, each more than 4 GB in size, are regularly generated every Friday. You need to complete the processing of the videos within a few hours.

  • You require more advanced custom processing features.

    For example, you want to record the transcoding details in your database each time a video is transcoded. Alternatively, you want the popular videos to be automatically prefetched to Alibaba Cloud CDN nodes after the videos are transcoded to relieve pressure on the origin server.

  • You need to convert audio formats, customize the sample rate of audio streams, or reduce noise in audio streams.
  • You require simple transcoding services or lightweight media processing services.

    For example, you want to obtain a GIF image that is generated based on the first few frames of a video, or query the duration of an audio file or a video file. In this case, building a custom media processing system is cost-effective.

  • You need to directly read and process your mezzanine files.

    For example, your mezzanine video files are stored in Apsara File Storage NAS or on disks that are attached to Elastic Compute Service (ECS) instances. If you build a custom video processing system, the system can directly read and process your mezzanine video files, without migrating them to Object Storage Service (OSS).

  • You need to add more features to an existing media processing system, such as one that converts videos to other formats.

    For example, your video processing system can be used to transcode videos, add watermarks to videos, and generate GIF images based on video thumbnails. You want to add more features to your video processing system, such as adjusting the parameters used for transcoding. You also want the existing online services provided by the system to remain unaffected when new features are released.

If you have one of the preceding requirements for your media processing system, or you want to use an elastic, highly available, cost-effective, and O&M-free system that supports all processing logic, see the best practices described in the following sections of this topic.

Conventional solution

As computer and network technologies develop, the video-on-demand technology is favored by industries such as education and entertainment because this technology provides excellent human-computer interaction and media streaming services. Nowadays, cloud service providers are continuously refining and optimizing their product lines. This frees you from hardware procurement and technology development when you build video-on-demand applications in the cloud. The following figure shows the conventional solution of Alibaba Cloud.

In this solution, OSS stores a large number of videos. Uploaded videos are transcoded so that they can be played on different types of terminals, and CDN accelerates video playback on those terminals.

Simple video processing system

You may require simple video processing. The following figure shows the architecture of the solution.

When you upload a video to OSS, the OSS trigger automatically triggers a specific function. The function uses FFmpeg to transcode the video and sends the transcoded video to OSS. For more information about the demo and procedure of the simple video processing system, see simple-video-processing. For more information about OSS triggers, see Overview.
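The flow above can be illustrated with a minimal Python sketch. It is not the code of the simple-video-processing demo. The event layout (`events[0].oss.bucket.name` and `events[0].oss.object.key`) is an assumption about the OSS trigger payload, and the helper names, bucket name, and `/tmp` working directory are hypothetical. The sketch only parses the event and builds the FFmpeg argument list; downloading from and uploading to OSS are left out.

```python
import json
import posixpath


def parse_oss_event(event_bytes):
    """Return (bucket, key) of the first record in an OSS trigger event.

    The field layout used here is an assumption about the event payload.
    """
    record = json.loads(event_bytes)["events"][0]
    return record["oss"]["bucket"]["name"], record["oss"]["object"]["key"]


def build_transcode_command(src_path, dst_format, out_dir="/tmp"):
    """Build an FFmpeg argument list that converts src_path to dst_format."""
    stem = posixpath.splitext(posixpath.basename(src_path))[0]
    dst_path = posixpath.join(out_dir, f"{stem}.{dst_format}")
    # -y overwrites an existing output file; FFmpeg infers the output
    # container from the file extension of dst_path.
    return ["ffmpeg", "-y", "-i", src_path, dst_path], dst_path


# Hypothetical event for an uploaded MOV file.
sample_event = json.dumps(
    {"events": [{"oss": {"bucket": {"name": "my-bucket"},
                         "object": {"key": "video/inputs/a.mov"}}}]}
).encode()

bucket, key = parse_oss_event(sample_event)
local_src = posixpath.join("/tmp", posixpath.basename(key))
command, output_path = build_transcode_command(local_src, "mp4")
print(bucket, key, command, output_path)
```

In a real function, the handler would download the object to `local_src`, run `command` with `subprocess.run(command, check=True)`, and upload `output_path` back to OSS.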

Video processing workflow system

You may want to speed up the transcoding of long videos or perform complex operations on videos. In this case, you can use Serverless Workflow to orchestrate functions to implement a powerful video processing workflow system. The following figure shows the architecture of the solution.
For example, when you upload a video in the MOV format to OSS, the OSS trigger automatically triggers a specific function. The function uses Serverless Workflow to transcode the video to one or more formats at the same time. The formats of the transcoded videos are determined by the DST_FORMATS environment variable of the function. This way, the following requirements can be met:
  • A video can be transcoded to various formats at the same time for custom processing, such as adding watermarks to the video or synchronizing the updated information about the processed video to the database.
  • When multiple videos are uploaded to OSS at the same time, Function Compute automatically scales out the resources to process the videos in parallel. Function Compute can also transcode the videos to multiple formats at the same time.
  • You can transcode oversized videos by using NAS and segmenting the video stream. Each oversized video is first segmented, the video segments are then transcoded in parallel, and finally the transcoded segments are merged into a single video. If you specify a proper time interval at which an oversized video is segmented, the transcoding of the video can be accelerated.
    Note: Video segmenting refers to splitting a video stream into multiple segments at specified time intervals and recording the information about the segments in a generated index file.
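The fan-out described in the list above can be sketched as a small planning step. This is an illustration, not the fc-fnf-video-processing implementation. It assumes that DST_FORMATS holds a comma-separated list (the actual format of the variable is not specified here), and all function names are hypothetical. Real segment boundaries also depend on keyframe positions, so the computed offsets are approximate.

```python
import math


def parse_dst_formats(value):
    """Split a DST_FORMATS-style string into a list of target formats.

    Assumption: the value is comma-separated, for example "mp4, flv".
    """
    return [item.strip().lower() for item in value.split(",") if item.strip()]


def plan_segments(duration_s, interval_s):
    """Return (start, length) pairs that cover a video of duration_s seconds."""
    if duration_s <= 0 or interval_s <= 0:
        raise ValueError("duration and interval must be positive")
    count = math.ceil(duration_s / interval_s)
    return [
        (i * interval_s, min(interval_s, duration_s - i * interval_s))
        for i in range(count)
    ]


def plan_tasks(duration_s, interval_s, formats):
    """One transcoding task per (format, segment); tasks can run in parallel."""
    segments = plan_segments(duration_s, interval_s)
    return [(fmt, start, length) for fmt in formats for start, length in segments]


formats = parse_dst_formats("mp4, flv")
tasks = plan_tasks(89, 45, formats)
for task in tasks:
    print(task)
```

Each task can be handed to a separate function invocation. After all tasks for a format finish, the transcoded segments are merged in start order.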

For more information about how to implement the solution, see fc-fnf-video-processing.

Benefits of a serverless solution

Remarkable engineering efficiency

Item | Serverless solution | Self-managed system
Infrastructure | N/A | You must purchase and manage infrastructure resources.
Development efficiency | You can focus on the development of business logic. You can use Serverless Devs to orchestrate and deploy resources. | In addition to necessary business logic development, you must build an online runtime environment, which includes the installation of related software, service configuration, and security updates.
Parallel and distributed video processing | You can use Serverless Workflow to orchestrate resources so that you can implement parallel processing of multiple videos or distributed processing of a single oversized video. Alibaba Cloud ensures the service stability and monitors video processing. | Strong development capabilities and a sound monitoring system are required to ensure the stability of the video processing system.
Training costs | You need only to write code in the corresponding programming language and be familiar with FFmpeg. | In addition to the understanding of the corresponding programming language and FFmpeg, you must be familiar with the features, terms, and parameters that are related to Kubernetes and ECS.
Business cycle | The solution is estimated to consume three man-days, including two man-days for development and debugging and one man-day for stress testing and checking. | The system is estimated to consume about 30 man-days for the following items, which exclude business logic development: hardware procurement, software and environment configuration, system development, testing, monitoring and alerting, and canary release of the system.

Auto scaling and O&M-free

Item | Serverless solution | Self-managed system
Elasticity and high availability | Function Compute can automatically adjust resources in milliseconds. This allows you to scale out the underlying system with high efficiency to cope with traffic peaks. In addition, Function Compute does not require O&M and provides excellent transcoding performance. | You must manage a Server Load Balancer (SLB) instance, and your self-managed system cannot match the auto-scaling speed of Function Compute.
Monitoring, alerting, and queries | The solution provides fine-grained views of Serverless Workflow executions and function executions. You can query the latency and logs of each function execution. In addition, the solution provides a more complete monitoring and alerting mechanism. | You can use metrics similar to those of Auto Scaling or containers.

Excellent transcoding performance

For example, a video file lasts for 89 seconds and is in the MOV format. A cloud service takes 188 seconds to perform regular transcoding on the video file to convert it to the MP4 format. This amount of time, 188 seconds, is used as the reference value T. The percentage of performance acceleration is calculated by using the following formula:

Percentage of performance acceleration = T/Time consumed for transcoding by Function Compute × 100%

Time interval of video segmenting (s) | Amount of time consumed for transcoding by Function Compute (s) | Percentage of performance acceleration (%)
45 | 160 | 117.5
25 | 100 | 188
15 | 70 | 268.6
10 | 45 | 417.8
5 | 35 | 537.1
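The percentages in the table can be reproduced from the reference time T of 188 seconds. The following short Python check uses only the values from the table; the function and variable names are illustrative.

```python
BASELINE_SECONDS = 188  # reference time T from the text

# (segment interval in seconds, Function Compute transcoding time in seconds)
measurements = [(45, 160), (25, 100), (15, 70), (10, 45), (5, 35)]


def acceleration_percent(baseline_s, elapsed_s):
    """Reference time divided by measured time, expressed as a percentage."""
    return 100 * baseline_s / elapsed_s


rows = [
    (interval, elapsed, round(acceleration_percent(BASELINE_SECONDS, elapsed), 1))
    for interval, elapsed in measurements
]
for interval, elapsed, percent in rows:
    print(f"{interval:>2} s segments: {elapsed:>3} s, {percent}%")
```

A value above 100% means that Function Compute finished faster than the reference. Shorter segments allow more parallelism, which is why the percentage rises as the interval shrinks.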

Cost-effective

In some scenarios, the serverless solution that uses Function Compute costs less for video processing than a self-managed system. In addition, the serverless solution is more cost-competitive than the video transcoding services provided by other cloud service providers.

The following part uses an example of the conversion between MP4 and FLV, which are the formats most commonly used in video-on-demand services. Based on experiments, the memory capacity of Function Compute is set to 3 GB for the serverless solution. The following table describes the fees for transcoding MP4 files to the FLV format.

Table 1. Transcode MP4 files into the FLV format
Resolution | Bitrate | Frame rate (fps) | Amount of time consumed for transcoding by Function Compute | Fee for transcoding by Function Compute | Fee for transcoding by a specific cloud service provider | Percentage of cost reduction
Standard definition (SD): 640 × 480 pixels | 889 KB/s | 24 | 11.2s | 0.003732288 | 0.032 | 88.3%
High definition (HD): 1280 × 720 pixels | 1963 KB/s | 24 | 20.5s | 0.00683142 | 0.065 | 89.5%
Ultra HD: 1920 × 1080 pixels | 3689 KB/s | 24 | 40s | 0.0133296 | 0.126 | 89.4%
4K: 3840 × 2160 pixels | 11185 KB/s | 24 | 142s | 0.04732008 | 0.556 | 91.5%

Table 2. Transcode FLV files into the MP4 format
Resolution | Bitrate | Frame rate (fps) | Amount of time consumed for transcoding by Function Compute | Fee for transcoding by Function Compute | Fee for transcoding by a specific cloud service provider | Percentage of cost reduction
SD: 640 × 480 pixels | 712 KB/s | 24 | 34.5s | 0.01149678 | 0.032 | 64.1%
HD: 1280 × 720 pixels | 1806 KB/s | 24 | 100.3s | 0.033424 | 0.065 | 48.6%
Ultra HD: 1920 × 1080 pixels | 3911 KB/s | 24 | 226.4s | 0.0754455 | 0.126 | 40.1%
4K: 3840 × 2160 pixels | 15109 KB/s | 24 | 912s | 0.30391488 | 0.556 | 45.3%

Percentage of cost reduction = (Fee for transcoding by a specific cloud service provider - Fee for transcoding by Function Compute)/Fee for transcoding by a specific cloud service provider × 100%

The specific cloud service provider charges users based on the billing policy of regular transcoding. The minimum billable duration is 1 minute for each video. In this example, videos that last for 2 minutes are used. If the billing duration is replaced by 1.5 minutes, the percentage of cost reduction generally fluctuates by less than 10%.
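The fees and percentages in the preceding tables can be checked with a few lines of Python. The unit price of 0.00011108 per GB-second is not stated in this topic. It is inferred by dividing each listed Function Compute fee by the transcoding time and the 3 GB memory size, so treat it as an assumption rather than a published price. The function names are illustrative.

```python
MEMORY_GB = 3  # memory size used in the experiments
UNIT_PRICE_PER_GB_SECOND = 0.00011108  # inferred from the tables, not a published price


def function_compute_fee(seconds, memory_gb=MEMORY_GB,
                         unit_price=UNIT_PRICE_PER_GB_SECOND):
    """Fee for one transcoding run: duration x memory x unit price."""
    return seconds * memory_gb * unit_price


def cost_reduction_percent(provider_fee, fc_fee):
    """Share of the provider fee that is saved, expressed as a percentage."""
    return (provider_fee - fc_fee) / provider_fee * 100


# SD row of Table 1 (MP4 to FLV): 11.2 s, provider fee 0.032
sd_fee = function_compute_fee(11.2)
sd_saving = round(cost_reduction_percent(0.032, sd_fee), 1)

# 4K row of Table 2 (FLV to MP4): 912 s, provider fee 0.556
uhd_fee = function_compute_fee(912)
uhd_saving = round(cost_reduction_percent(0.556, uhd_fee), 1)

print(sd_fee, sd_saving)
print(uhd_fee, uhd_saving)
```

Because the Function Compute fee scales with execution time, a video that transcodes faster costs proportionally less, whereas the provider fee in the tables is fixed per resolution tier.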

The preceding tables show that the serverless solution that uses Function Compute and Serverless Workflow has great cost competitiveness on the bidirectional conversion between the FLV and MP4 formats. Specifically, the conversion from FLV to MP4 requires complex computing capabilities, whereas the conversion from MP4 to FLV requires simple computing capabilities. Based on practical experience, the actual cost reduction is usually more obvious than that described in the preceding tables. This is due to the following reasons:
  • The test videos have high bitrates, whereas most of the videos in actual use are in the SD or low definition (LD) quality and they have lower bitrates than the test videos. The videos in actual use require fewer computing resources. Transcoding by Function Compute consumes a smaller amount of time and thus the costs become lower. However, the configured pricing policies of general cloud transcoding services do not vary with the video quality or bitrate.
  • If general cloud transcoding services are used, costs are higher for some resolutions. For example, a video to be transcoded with a resolution of 856 × 480 pixels or 1368 × 768 pixels is billed based on the higher resolution level. A video with a resolution of 856 × 480 pixels is billed as an HD video whose resolution is 1280 × 720 pixels. Similarly, a video with a resolution of 1368 × 768 pixels is billed as an ultra HD video whose resolution is 1920 × 1080 pixels. In this case, the unit price for video transcoding is greatly increased, whereas the increase in computing capabilities is probably less than 30%. To resolve this issue, you can use Function Compute, which allows you to pay only for consumed computing resources.

Operations and deployment

Prerequisites

Procedure

Strengths

Integration of the benefits of Function Compute and Serverless Workflow

  • The serverless solution does not require you to purchase and manage infrastructure resources such as servers. You can focus on the development of video processing services. This greatly shortens the required delivery time and reduces labor costs.
  • Function Compute provides features such as log query, performance monitoring, and alerting to help you troubleshoot issues with high efficiency.
  • Function Compute uses events to trigger responses to requests.
  • The serverless solution is O&M-free and can provide excellent performance. Function Compute can automatically adjust resources in milliseconds. This allows you to scale out the underlying system with high efficiency to cope with traffic peaks.
  • The serverless solution has great cost competitiveness.

Benefits of the serverless solution compared with general transcoding services

  • The serverless solution is highly customized and transparent to you. It allows you to develop the appropriate audio and video processing logic with high efficiency based on specific audio and video processing tools or commands, such as FFmpeg.
  • You can migrate the original FFmpeg-based audio and video processing service to the serverless solution with a few clicks.
  • The serverless solution features higher elasticity. It ensures sufficient computing resources to provide transcoding services. For example, hundreds of 1080p videos, each more than 4 GB in size, are regularly generated every Friday, and you need to complete the processing of the videos within a few hours.
  • The serverless solution allows you to convert audio formats, customize the sample rate of audio streams, or reduce noise in audio streams. This is similar to the features of professional audio processing tools, such as AACGain and MP3Gain.
  • You can use Serverless Workflow to perform complex and custom task orchestration. For example, you may want to record the transcoding details in your database each time a video is transcoded and want the popular videos to be automatically prefetched to CDN nodes after the videos are transcoded to relieve pressure on the origin server.
  • You can use more event-driven methods. For example, you can trigger a function based on OSS or Message Service (MNS) messages.
  • The serverless solution has great cost competitiveness in most scenarios.

Benefits of the serverless solution compared with self-managed systems

  • Function Compute can automatically adjust resources in milliseconds. The serverless solution supports the use of a large amount of resources and the computing capabilities of tens of thousands of CPU cores. For example, you can use the serverless solution to complete the transcoding of 10,000 online courses in half an hour.
  • The serverless solution allows you to focus on developing business logic code. The built-in event-driven mode of Function Compute simplifies the development and programming procedure. The serverless solution caters to the priority of audio and video processing tasks. This greatly improves the O&M efficiency.
  • Function Compute uses the three-zone deployment mode to ensure high availability. Computing resources are also distributed across zones to ensure that each user can obtain the required maximum computing power.
  • The serverless solution provides an out-of-box monitoring system. The system monitors function executions from multiple perspectives. Based on the monitoring information provided by the system, you can identify the causes of issues with high efficiency and analyze different objects, such as the format distribution and size distribution of videos.
  • The serverless solution has great cost competitiveness in most scenarios. This is because Function Compute allows you to pay for only consumed computing resources and the billing granularity is 100 milliseconds. Because you are not billed for idle time, the effective CPU utilization of Function Compute can be considered 100%.

FAQ

  • I have deployed a video processing system on a virtual machine or container platform by using FFmpeg. How can I improve the elasticity and availability of the system?

    You can migrate your system that is developed by using FFmpeg from the virtual machine or container platform to Function Compute with ease. Function Compute can run FFmpeg commands directly. The migration cost is low, and the migrated system inherits the elasticity and high availability of Function Compute.

  • What can I do if I need to concurrently process a large number of videos?

    For more information about the solution, see Video processing workflow system. When multiple videos are uploaded to OSS at the same time, Function Compute automatically scales out the resources to process the videos in parallel. For more information, see fc-fnf-video-processing.

  • Hundreds of 1080p videos, each more than 4 GB in size, are regularly generated every Friday, and I want to complete the processing of the videos within a few hours. How can I process such a large number of oversized videos at a time with high efficiency?

    You can control the size of video segments to ensure that the original oversized video has adequate computing resources for transcoding. Video segmenting can greatly improve transcoding efficiency. For more information about the solution, see fc-fnf-video-processing.

  • I want to record the transcoding details in my database each time a video is transcoded. I also want the popular videos to be automatically prefetched to CDN nodes after the videos are transcoded to relieve pressure on the origin server. How can I use such advanced custom processing features?

    For more information about the solution, see Video processing workflow system. You can perform some custom operations during media processing or perform additional operations based on the process. For example, you can add preprocessing steps before the process begins or add subsequent steps.

  • My custom video processing workflow contains multiple operations, such as transcoding videos, adding watermarks to videos, and generating GIF images based on video thumbnails. After that, I want to add more features to my video processing system, such as adjusting the parameters used for transcoding. I also hope that the existing online services provided by the system are not affected when new features are released. How can I achieve this goal?

    For more information about the solution, see Video processing workflow system. Serverless Workflow is used only for function orchestration. Therefore, you can focus on updating functions used for media processing. Function versions and aliases are also supported for you to better control the canary release. For more information, see Introduction to versions.

  • I require only simple transcoding services or lightweight media processing services. For example, I want to obtain a GIF image that is generated based on the first few frames of a video, or query the duration of an audio file or a video file. In this case, building a custom media processing system is cost-effective. How can I achieve this goal?

    Function Compute supports custom features. You can run specific FFmpeg commands to achieve your goal. For more information about the typical sample project, see fc-oss-ffmpeg.

  • My mezzanine video files are stored in NAS or on disks that are attached to ECS instances. I want to build a custom video processing system that can directly read and process my mezzanine video files, without migrating them to OSS. How can I achieve this goal?

    You can integrate Function Compute with NAS to allow Function Compute to process the files that are stored in NAS. For more information, see Configure a NAS file system.
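For the lightweight tasks mentioned in this FAQ, such as generating a GIF image from the first few frames of a video or querying the duration of a media file, a function often only needs to run a single FFmpeg or ffprobe command. The following Python sketch builds such commands. It is not taken from the fc-oss-ffmpeg project, and the function names, default values, and file paths are hypothetical.

```python
def build_gif_command(src_path, dst_path, frames=30, fps=10, width=320):
    """FFmpeg arguments that write a GIF with at most `frames` output frames."""
    return [
        "ffmpeg", "-y", "-i", src_path,
        # Resample to `fps` frames per second and scale to `width` pixels
        # wide; a height of -1 keeps the aspect ratio.
        "-vf", f"fps={fps},scale={width}:-1",
        "-frames:v", str(frames),
        dst_path,
    ]


def build_duration_command(src_path):
    """ffprobe arguments that print only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        src_path,
    ]


def parse_duration(ffprobe_output):
    """Convert the text printed by the ffprobe command into a float."""
    return float(ffprobe_output.strip())


gif_command = build_gif_command("/tmp/a.mp4", "/tmp/a.gif", frames=20)
duration_command = build_duration_command("/tmp/a.mp4")
print(gif_command)
print(duration_command)
```

In a function, each argument list would be passed to `subprocess.run(..., capture_output=True, text=True, check=True)`, and the standard output of the ffprobe call would be passed to `parse_duration`. If the source file is stored in NAS, `src_path` can point to the mounted directory instead of `/tmp`.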