This topic uses a detailed example to describe how to build an elastic, highly available audio and video processing system on a serverless architecture by using Function Compute and Serverless Workflow. It also describes the benefits and application scenarios of this solution, and compares the solution with conventional solutions in terms of performance, costs, and engineering efficiency.

New requirements for audio and video processing

Among ApsaraVideo VOD solutions, video transcoding consumes the most computing power. You can use the dedicated transcoding services provided by Alibaba Cloud. However, you may prefer to build your own transcoding service in the following scenarios:

  • You have deployed a video processing system on a virtual machine or container platform by using FFmpeg. You want to improve the elasticity and availability of the system.
  • You need to process a large number of videos at a time and require parallel processing of the video files.
  • You need to efficiently process large batches of oversized videos. For example, hundreds of 1080p videos, each more than 4 GB in size, are generated every Friday, and you want to finish processing them within a few hours after they are generated.
  • You require more advanced custom processing features. For example, you want to record the transcoding details in your database each time a video is transcoded. Alternatively, you want the popular videos to be automatically prefetched to Alibaba Cloud CDN nodes after the videos are transcoded to relieve pressure on the origin server.
  • Your custom video processing workflow contains multiple operations, such as transcoding videos, adding watermarks to videos, and generating GIF images from video thumbnails. Later, you want to add features to the system, such as adjusting the transcoding parameters, without affecting the existing online services when the new features are launched.
  • You require simple transcoding services or lightweight media processing services. For example, you want to obtain a GIF image that is generated based on the first few frames of a video, or query the duration of an audio file or a video file. In this case, you can save costs if you build a custom media processing system.
  • You want to convert audio files into different coding formats, customize the sample rate of audio streams, or reduce noise in audio streams.
  • Your video mezzanine files are stored in Apsara File Storage NAS or on disks that are attached to Elastic Compute Service (ECS) instances. If you build a custom video processing system, the system can directly read and process your video mezzanine files, without migrating them to Object Storage Service (OSS).

If you have any of the preceding requirements, or you want an elastic, highly available, and cost-efficient system that requires no operations and maintenance (O&M) and supports all of your processing logic, the best practice in this topic can meet your needs.

Conventional solution

As computer and network technologies develop, video-on-demand technology is favored by industries such as education and entertainment because it provides excellent human-computer interaction and media streaming services. Cloud service providers continue to refine and optimize their product lines, which frees you from hardware procurement and technology development when you build video-on-demand applications in the cloud. The following figure shows the conventional Alibaba Cloud solution.
[Figure: flow chart of the conventional solution]

In this solution, OSS buckets can store a large number of videos. Uploaded videos can be transcoded into formats suited to different terminals, and playback can be accelerated by content delivery network (CDN). Content moderation requirements, such as pornography and terrorism detection, can also be met.

Simple video processing system

Assume that you require simple video processing. The following figure shows the architecture of the solution.
[Figure: architecture of the simple video processing system]

When you upload a video to OSS, the OSS trigger automatically invokes a specific function. The function uses FFmpeg to transcode the video and saves the transcoded video to OSS. For more information about the demo and procedure of the simple video processing system, see simple-video-processing. For more information about OSS triggers, see Overview.
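As a rough illustration of this flow, the following Python sketch parses an OSS trigger event and builds the FFmpeg command for each uploaded object. It is not the simple-video-processing demo code: the output prefix, the target format, and the helper names are assumptions, and the OSS download and upload steps are left as comments because they depend on the OSS SDK that you use.

```python
import json
import posixpath

# Hypothetical settings; the demo may use different values.
DST_FORMAT = "mp4"
OUTPUT_PREFIX = "transcoded/"


def parse_oss_event(event: bytes) -> list[tuple[str, str]]:
    """Return (bucket, object_key) pairs from an OSS trigger event payload."""
    payload = json.loads(event)
    return [
        (record["oss"]["bucket"]["name"], record["oss"]["object"]["key"])
        for record in payload["events"]
    ]


def output_key(src_key: str, dst_format: str = DST_FORMAT) -> str:
    """Map an object key such as 'videos/a.mov' to 'transcoded/videos/a.mp4'."""
    stem, _ = posixpath.splitext(src_key)
    return f"{OUTPUT_PREFIX}{stem}.{dst_format}"


def ffmpeg_transcode_cmd(src_path: str, dst_path: str) -> list[str]:
    """Build an FFmpeg command that transcodes a local file (-y overwrites output)."""
    return ["ffmpeg", "-y", "-i", src_path, dst_path]


def handler(event, context):
    for bucket, key in parse_oss_event(event):
        name = posixpath.basename(key)
        local_src = "/tmp/" + name
        local_dst = "/tmp/transcoded-" + posixpath.splitext(name)[0] + "." + DST_FORMAT
        # 1. Download oss://{bucket}/{key} to local_src (OSS SDK call omitted).
        # 2. subprocess.run(ffmpeg_transcode_cmd(local_src, local_dst), check=True)
        # 3. Upload local_dst to output_key(key) (OSS SDK call omitted).
        print(bucket, key, "->", output_key(key))
    return "ok"
```

If the function writes its output back to the same bucket, configure the prefix or suffix filter of the trigger (for example, only `.mov` objects) so that the transcoded files do not invoke the function again.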

You may want to perform multiple operations on a short video, or process an oversized video. However, the maximum execution time of a function in Function Compute is 10 minutes. If the operations take longer than this limit, the function execution fails. If you need more than 10 minutes, you can use one of the following methods:
  • Segment the video stream, transcode the segments, and then merge them into one video. For more information, see fc-fnf-video-processing.
  • Join the DingTalk group for Function Compute support by searching for the DingTalk group number 11721331 or scanning the QR code at the end of this topic. Alternatively, you can submit a ticket and make the following requests:
    • Increase the maximum time allowed for function executions.
    • Use a performance instance with 8 CPU cores and 16 GB of memory. This instance type supports function executions of up to 2 hours. For more information, see Instance specifications and usage modes.

Video processing workflow system

You may want to work around the runtime limits of Function Compute, speed up the transcoding of long videos, or perform complex operations on videos. In this case, you can use Serverless Workflow to orchestrate functions into a powerful video processing workflow system. The following figure shows the architecture of the solution.
[Figure: flow chart of the video processing workflow system]
When you upload a video in the MOV format to OSS, the OSS trigger automatically invokes a specific function. The function uses Serverless Workflow to transcode the video into one or more formats at the same time. The target formats are determined by the DST_FORMATS environment variable of the function. This way, the following requirements can be met:
  • A video can be transcoded into multiple formats at the same time, and custom processing can be added, such as adding watermarks to the video or synchronizing information about the processed video to a database.
  • When multiple videos are uploaded to OSS at the same time, Function Compute automatically scales out the resources to process the videos in parallel. Function Compute can also transcode the videos to multiple formats at the same time.
  • By using NAS and segmenting the video stream, you can transcode oversized videos, such as a video more than 3 GB in size. Each oversized video is first split into segments, the segments are then transcoded in parallel, and finally the transcoded segments are merged into one video. Choosing an appropriate segmenting interval can accelerate the transcoding of the video.
    Note: Video segmenting refers to splitting a video stream into multiple segments at specified time intervals and recording the information about the segments in a generated index file.
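The split, transcode, and merge steps can be sketched with standard FFmpeg options: the segment muxer for splitting and the concat demuxer for merging. This is a minimal sketch, not code from fc-fnf-video-processing. It assumes that DST_FORMATS holds a comma-separated list such as `mp4,flv`, and the helper names and file layout are hypothetical. Each helper only builds a command line; in a real deployment, each step would run in its own function invocation orchestrated by Serverless Workflow, with the segments stored on NAS.

```python
import os


def dst_formats(env=os.environ) -> list[str]:
    """Target formats, assuming DST_FORMATS is comma-separated (e.g. "mp4, flv")."""
    raw = env.get("DST_FORMATS", "mp4")  # hypothetical default
    return [fmt.strip() for fmt in raw.split(",") if fmt.strip()]


def split_cmd(src: str, seg_time: int, seg_dir: str) -> list[str]:
    """Split src into segments of about seg_time seconds without re-encoding.

    With stream copy, cuts land on keyframes, so segment lengths are approximate.
    The .txt suffix makes FFmpeg write a flat index file (one segment per line).
    """
    ext = os.path.splitext(src)[1]
    return [
        "ffmpeg", "-y", "-i", src,
        "-c", "copy",
        "-f", "segment",
        "-segment_time", str(seg_time),
        "-reset_timestamps", "1",
        "-segment_list", f"{seg_dir}/segments.txt",
        f"{seg_dir}/seg_%03d{ext}",
    ]


def transcode_cmd(segment: str, fmt: str) -> list[str]:
    """Transcode one segment to fmt; each call can run in a separate invocation."""
    out = os.path.splitext(segment)[0] + "_out." + fmt
    return ["ffmpeg", "-y", "-i", segment, out]


def concat_list(segments: list[str]) -> str:
    """Build the input file for FFmpeg's concat demuxer, keeping segment order."""
    return "".join(f"file '{path}'\n" for path in segments)


def merge_cmd(list_file: str, dst: str) -> list[str]:
    """Merge the transcoded segments listed in list_file without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", dst]
```

For example, with DST_FORMATS set to `mp4,flv`, a workflow could run `split_cmd` once, run `transcode_cmd` for every segment and format in parallel, and then run `merge_cmd` once per format.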

For more information about how to implement the solution, see fc-fnf-video-processing.

Benefits of a serverless solution

Remarkable engineering efficiency
| Item | Serverless solution by using Function Compute and Serverless Workflow | Self-managed system |
|---|---|---|
| Infrastructure | None | You must purchase and manage infrastructure resources. |
| Development efficiency | You can focus on the development of business logic. By using Funcraft, you can orchestrate and deploy resources with a few clicks. | In addition to necessary business logic development, you must build an online runtime environment, which includes the installation of related software, service configuration, and security updates. |
| Parallel and distributed video processing | You can use Serverless Workflow to orchestrate resources for parallel processing of multiple videos or distributed processing of a single oversized video. Alibaba Cloud ensures service stability and monitors video processing. | Strong development capabilities and a sound monitoring system are required to ensure the stability of the video processing system. |
| Training costs | You only need to write code in the corresponding programming language and be familiar with FFmpeg. | In addition to the corresponding programming language and FFmpeg, you must be familiar with the features, terms, and parameters related to Kubernetes or Auto Scaling. |
| Business cycle | The solution is estimated to take 3 man-days: 2 man-days for development and debugging and 1 man-day for stress testing and checking. | The system is estimated to take about 30 man-days, excluding business logic development, for hardware procurement, software and environment configuration, system development, testing, monitoring and alerting, and phased release. |
Auto scaling and O&M-free
| Item | Serverless solution by using Function Compute and Serverless Workflow | Self-managed system |
|---|---|---|
| Elastic and highly available | Function Compute can automatically adjust resources in milliseconds, which lets you scale out the underlying system efficiently to cope with traffic peaks. In addition, Function Compute requires no O&M and provides excellent transcoding services. | You must manage a Server Load Balancer (SLB) instance, and your self-managed system cannot match the auto-scaling speed of Function Compute. |
| Monitoring, alerting, and queries | The solution provides fine-grained information about Serverless Workflow executions and function executions. You can query the latency and logs of each function execution. The solution also provides a more complete monitoring and alerting mechanism. | You can use metrics similar to those of ECS instances or containers. |
Excellent transcoding performance

Assume that a video file lasts 89 seconds and is in the MOV format. A cloud service takes 188 seconds to perform general transcoding to convert it to the MP4 format. This time, 188 seconds, is used as the reference time T. The percentage of performance acceleration is calculated by using the following formula, where $T_{\text{FC}}$ is the time consumed for transcoding by Function Compute:

$$\text{Percentage of performance acceleration} = \frac{T}{T_{\text{FC}}} \times 100\%$$

| Time interval of video segmenting (s) | Time consumed for transcoding by Function Compute (s) | Percentage of performance acceleration (%) |
|---|---|---|
| 45 | 160 | 117.5 |
| 25 | 100 | 188 |
| 15 | 70 | 268.6 |
| 10 | 45 | 417.8 |
| 5 | 35 | 537.1 |
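As a quick check of the formula, the following Python snippet recomputes the last column of the table from the first two, using T = 188 s from the text.

```python
T_REFERENCE = 188  # seconds for general transcoding by a cloud service (T in the formula)

# Segmenting interval (s) -> time consumed for transcoding by Function Compute (s),
# copied from the table above.
fc_times = {45: 160, 25: 100, 15: 70, 10: 45, 5: 35}


def acceleration_percent(t_fc: float, t_ref: float = T_REFERENCE) -> float:
    """Percentage of performance acceleration = T / T_FC x 100%, rounded to 0.1."""
    return round(t_ref / t_fc * 100, 1)


for interval, t_fc in fc_times.items():
    print(f"{interval} s segments: {acceleration_percent(t_fc)}%")
```

Shorter intervals create more segments that are transcoded in parallel, which is why the acceleration grows as the interval shrinks.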
High cost efficiency
  • In some video processing scenarios, peak and off-peak hours are distinct. For example, you may need to process videos only during specific time periods of a day and have almost no videos to process for the rest of the day. In this case, you can select the pay-as-you-go billing method to pay only for the computing resources that you use.
  • In other video processing scenarios, peak and off-peak hours are not distinct. In this case, you can select the subscription billing method to save costs.

For more information, see Billing overview.

Assume that you have built a video transcoding system on ECS. Such a system is CPU-intensive, so this topic uses average CPU utilization as the core metric for measuring costs. The following figure compares the CPU utilization of the serverless solution and the self-managed system when the computing capacity of 10 c5 ECS instances is about 30% utilized over one month.
[Figure: CPU utilization of the serverless solution and the self-managed system]

Based on the preceding figure, the following billing model can be used:

  • Price of the Function Compute subscription plan (3 GB per month): CNY 246.27. The computing capability of this plan is equivalent to that of one c5 ECS instance.
  • Price of one c5 ECS instance (2 vCPUs and 4 GB of memory) and its attached disk: CNY 219 per month.
  • Pay-as-you-go Function Compute resources account for 10% or less of the total computing resources used. Therefore, the pay-as-you-go fee is 3 × 864 × 10% = CNY 259.2.

    Here, 864 is the rounded fee for running a function with 3 GB of memory at full load for one month: 0.00011108 × 3 × 30 × 24 × 3600 ≈ CNY 863.8, where 0.00011108 is the unit price in CNY per GB-second.

| Item | Average CPU utilization | Fee for computing resources (CNY) | Total (CNY) |
|---|---|---|---|
| Combination of the subscription and pay-as-you-go billing methods for Function Compute | ≥ 80% | 998 (246.27 × 3 + 259.2) | ≤ 998 |
| Reserved ECS instances based on peak hours | ≤ 30% | 2190 (10 × 219) | ≥ 2190 |
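The figures in this billing model can be reproduced with a few lines of arithmetic. This sketch only restates the numbers given above (the unit price of 0.00011108 CNY per GB-second, 3 GB of memory, and a 10% pay-as-you-go share); it is not a general pricing calculator, and current prices may differ.

```python
UNIT_PRICE = 0.00011108          # CNY per GB-second, from the text
MEMORY_GB = 3
SECONDS_PER_MONTH = 30 * 24 * 3600

# Pay-as-you-go fee for one 3 GB function running at full load for a month.
full_load = UNIT_PRICE * MEMORY_GB * SECONDS_PER_MONTH    # about 863.76

# Pay-as-you-go share: 3 x 864 x 10%, using the rounded full-load fee.
payg = 3 * round(full_load) * 0.10                        # about 259.2

subscription = 3 * 246.27                                 # three 3 GB subscription plans
fc_total = subscription + payg                            # about 998.01
ecs_total = 10 * 219                                      # ten c5 instances with disks

print(f"Function Compute: {fc_total:.2f} CNY, ECS: {ecs_total} CNY")
```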

In this estimated billing model, the serverless solution based on Function Compute is highly cost-competitive. In practice, the CPU utilization of a self-managed video transcoding system built on ECS rarely reaches 20%, for the following reasons:

  • You may need to transcode videos only during specific time periods of a day.
  • To improve user experience, you may have specific requirements for transcoding speed. For example, you may require 10 ECS instances to transcode a single video in parallel. In this case, you must purchase a large number of reserved ECS instances.

Therefore, in practice, the serverless solution based on Function Compute saves more on video processing costs than a self-managed system. It is also more cost-competitive than the video transcoding services provided by other cloud service providers.

This topic uses the conversion between MP4 and FLV, the formats most commonly used in video-on-demand services, as an example. Based on experiments, the function memory is set to 3 GB for the serverless solution. The following tables describe the fees for transcoding between the MP4 and FLV formats.

Table 1. Transcode MP4 files into the FLV format

| Resolution | Bitrate | Frame rate (FPS) | Time consumed for transcoding by Function Compute (s) | Fee for transcoding by Function Compute (CNY) | Fee for transcoding by a specific cloud service provider (CNY) | Percentage of cost reduction |
|---|---|---|---|---|---|---|
| Standard definition (SD): 640 × 480 pixels | 889 KB/s | 24 | 11.2 | 0.003732288 | 0.032 | 88.3% |
| High definition (HD): 1280 × 720 pixels | 1963 KB/s | 24 | 20.5 | 0.00683142 | 0.065 | 89.5% |
| Ultra-high definition (ultra HD): 1920 × 1080 pixels | 3689 KB/s | 24 | 40 | 0.0133296 | 0.126 | 89.4% |
| 4K: 3840 × 2160 pixels | 11185 KB/s | 24 | 142 | 0.04732008 | 0.556 | 91.5% |
Table 2. Transcode FLV files into the MP4 format

| Resolution | Bitrate | Frame rate (FPS) | Time consumed for transcoding by Function Compute (s) | Fee for transcoding by Function Compute (CNY) | Fee for transcoding by a specific cloud service provider (CNY) | Percentage of cost reduction |
|---|---|---|---|---|---|---|
| SD: 640 × 480 pixels | 712 KB/s | 24 | 34.5 | 0.01149678 | 0.032 | 64.1% |
| HD: 1280 × 720 pixels | 1806 KB/s | 24 | 100.3 | 0.033424 | 0.065 | 48.6% |
| Ultra HD: 1920 × 1080 pixels | 3911 KB/s | 24 | 226.4 | 0.0754455 | 0.126 | 40.1% |
| 4K: 3840 × 2160 pixels | 15109 KB/s | 24 | 912 | 0.30391488 | 0.556 | 45.3% |

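The percentage of cost reduction is calculated by using the following formula, where $F_{\text{provider}}$ is the fee for transcoding by a specific cloud service provider and $F_{\text{FC}}$ is the fee for transcoding by Function Compute:

$$\text{Percentage of cost reduction} = \frac{F_{\text{provider}} - F_{\text{FC}}}{F_{\text{provider}}} \times 100\%$$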
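The Function Compute fees in the tables are consistent with multiplying the transcoding time by the 3 GB memory setting and the unit price from the billing model above (fee = time × 3 × 0.00011108). The following Python sketch recomputes Table 1 under that assumption and applies the cost-reduction formula; the same functions reproduce Table 2.

```python
UNIT_PRICE = 0.00011108  # CNY per GB-second, from the billing model above
MEMORY_GB = 3            # function memory used in the tests


def fc_fee(seconds: float) -> float:
    """Function Compute fee for one transcoding run of the given duration."""
    return seconds * MEMORY_GB * UNIT_PRICE


def cost_reduction_percent(fc: float, provider: float) -> float:
    """(F_provider - F_FC) / F_provider x 100%, rounded to 0.1."""
    return round((provider - fc) / provider * 100, 1)


# (resolution, Function Compute time in s, provider fee in CNY) from Table 1.
table_1 = [("SD", 11.2, 0.032), ("HD", 20.5, 0.065), ("Ultra HD", 40, 0.126), ("4K", 142, 0.556)]

for name, seconds, provider in table_1:
    fee = fc_fee(seconds)
    print(f"{name}: fee {fee:.8f} CNY, cost reduction {cost_reduction_percent(fee, provider)}%")
```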

The specific cloud service provider charges users based on its general transcoding billing policy, with a minimum billable duration of 1 minute per video. In this example, 2-minute videos are used. If a billable duration of 1.5 minutes is used instead, the percentage of cost reduction generally changes by less than 10%.

The preceding tables show that the serverless solution based on Function Compute and Serverless Workflow is highly cost-competitive for conversion in both directions between FLV and MP4. Converting FLV to MP4 is computationally complex, whereas converting MP4 to FLV is relatively simple. In practice, the cost reduction is usually even greater than the tables show, for the following reasons:

  • The test videos have high bitrates, whereas most of the videos in actual use are in the SD or low definition (LD) quality and they have lower bitrates than the test videos. The videos in actual use require fewer computing resources. Functions can be executed in a shorter time and thus the costs become lower. However, the configured pricing policies of general cloud transcoding services do not vary with the video quality or bitrate.
  • General cloud transcoding services charge heavily for videos with nonstandard resolutions, which are billed at the next higher resolution level. For example, a video with a resolution of 856 × 480 pixels is billed as an HD video (1280 × 720 pixels), and a video with a resolution of 1368 × 768 pixels is billed as an ultra HD video (1920 × 1080 pixels). In these cases, the unit price for transcoding rises sharply, whereas the additional computing power required is probably less than 30%. Function Compute avoids this issue because you pay only for the computing resources that you consume.
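The tier rounding described in the second item can be illustrated with a small model. This is a hypothetical model of such a policy, not any provider's actual rules: it uses the four resolutions from the tables as tiers and assumes that a video is billed at the smallest tier whose width and height both cover it.

```python
# Hypothetical billing tiers from Tables 1 and 2, smallest first: (name, width, height).
TIERS = [
    ("SD", 640, 480),
    ("HD", 1280, 720),
    ("Ultra HD", 1920, 1080),
    ("4K", 3840, 2160),
]


def billing_tier(width: int, height: int) -> str:
    """Return the smallest tier that covers the video's resolution."""
    for name, tier_w, tier_h in TIERS:
        if width <= tier_w and height <= tier_h:
            return name
    raise ValueError(f"{width}x{height} exceeds the largest tier")


for w, h in [(856, 480), (1368, 768)]:
    print(f"{w} x {h} is billed as {billing_tier(w, h)}")
```

Under this model, both example resolutions jump a full tier, while Function Compute bills only the execution time that the video actually needs.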

Operations and deployment

For the prerequisites and deployment procedure, see fc-fnf-video-processing.

Strengths

Integration of the benefits of Function Compute and Serverless Workflow
  • The serverless solution does not require you to purchase and manage infrastructure resources such as servers. You can focus on the development of video processing services. This greatly shortens the required delivery time and saves costs.
  • Function Compute provides features such as log query, performance monitoring, and alerting to help you troubleshoot issues efficiently.
  • Function Compute uses events to trigger an application to respond to your requests.
  • The serverless solution requires no O&M and can provide excellent performance. Function Compute can automatically adjust resources in milliseconds, which allows you to scale out the underlying system with efficiency to cope with traffic peaks.
  • The serverless solution has great cost competitiveness.
Benefits of the serverless solution compared with general transcoding services
  • The serverless solution is highly customized and transparent to you. It allows you to develop the appropriate audio and video processing logic with efficiency based on specific audio and video processing tools or commands, such as FFmpeg.
  • You can migrate the original FFmpeg-based audio and video processing service to the serverless solution with a few clicks.
  • The serverless solution features higher elasticity. It ensures sufficient computing resources to provide transcoding services. For example, hundreds of 1080p videos, each more than 4 GB in size, are regularly generated every Friday, and you want to complete the processing in a few hours after the videos are generated.
  • The serverless solution allows you to convert audio files into different coding formats, customize the sample rate of audio streams, or reduce noise in audio streams. This is similar to the features of professional audio processing tools, such as AACGain and MP3Gain.
  • You can use Serverless Workflow to perform complex and custom task orchestration. For example, you may want the popular videos to be automatically prefetched to CDN nodes after the videos are transcoded to relieve pressure on the origin server.
  • You can use more event sources. For example, you can trigger a function with OSS events or Message Service (MNS) messages.
  • The serverless solution has great cost competitiveness in most scenarios.
Benefits of the serverless solution compared with self-managed systems
  • Function Compute can automatically adjust resources in milliseconds. The solution supports the use of a large amount of resources and the computing capabilities of tens of thousands of CPU cores. For example, you can use the solution to complete the transcoding of 10,000 online courses in half an hour.
  • The serverless solution allows you to focus on developing business logic code. The built-in event-driven mode of Function Compute simplifies development and programming. The solution lets you concentrate on the audio and video processing tasks themselves, which greatly improves O&M efficiency.
  • Function Compute uses three-zone deployment to ensure high availability. Computing resources are also distributed across regions so that each user can obtain the maximum computing power required.
  • The serverless solution provides an out-of-the-box monitoring system that monitors function executions from multiple perspectives. Based on the monitoring information, you can efficiently identify the causes of issues and analyze data such as the format distribution and size distribution of videos.
  • The serverless solution is cost-competitive in most scenarios, because Function Compute charges only for the computing resources that you consume, with a billing granularity of 100 milliseconds. In effect, you pay as if CPU utilization were 100%, because idle time is not billed.
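To illustrate per-execution billing, the following sketch assumes that each execution's duration is rounded up to the next multiple of 100 ms and billed at the unit price and memory size used in the billing model above. The rounding rule and helper names are assumptions for illustration; see Billing overview for the current rules.

```python
GRANULARITY_MS = 100
UNIT_PRICE = 0.00011108  # CNY per GB-second, from the billing model above


def billed_ms(duration_ms: int) -> int:
    """Round an execution duration up to the next multiple of 100 ms."""
    return -(-duration_ms // GRANULARITY_MS) * GRANULARITY_MS


def execution_fee(duration_ms: int, memory_gb: float) -> float:
    """Fee for one execution; time when no function is running is not billed."""
    return billed_ms(duration_ms) / 1000 * memory_gb * UNIT_PRICE


# An 11,150 ms run is billed as 11,200 ms.
print(billed_ms(11_150), execution_fee(11_150, 3))
```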

FAQ

  • I have deployed a video processing system on a virtual machine or container platform by using FFmpeg. How can I improve the elasticity and availability of the system?

    You can easily migrate your FFmpeg-based system from the virtual machine or container platform to Function Compute, which can run FFmpeg commands directly. The migration requires little refactoring, and the system inherits the high availability of Function Compute.

  • What can I do if I need to concurrently process a large number of videos?

    For more information about the solution, see Video processing workflow system. When multiple videos are uploaded to OSS at the same time, Function Compute automatically scales out resources to process the videos in parallel. For more information, see fc-fnf-video-processing.

  • Hundreds of 1080p videos, each more than 4 GB in size, are regularly generated every Friday and I want to complete the processing in a few hours after the videos are generated. How can I process such a large number of oversized videos in batches with efficiency?

    You can control the size of video segments so that each segment of the original oversized video has adequate computing resources for transcoding. Video segmenting can greatly improve transcoding efficiency. For more information about the deployment solution, see fc-fnf-video-processing.
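To size segments, you can compute how many parallel transcoding tasks a given interval produces. The following sketch is a hypothetical planning helper, not part of fc-fnf-video-processing: it splits a duration into fixed intervals, and each (start, length) pair could be passed to one parallel branch that runs FFmpeg with `-ss` and `-t`. Smaller intervals produce more, shorter tasks, each more likely to finish within the execution time limit of a function.

```python
import math


def plan_segments(duration_s: float, interval_s: float) -> list[tuple[float, float]]:
    """Split a duration into (start, length) pairs of at most interval_s seconds."""
    count = math.ceil(duration_s / interval_s)
    return [
        (i * interval_s, min(interval_s, duration_s - i * interval_s))
        for i in range(count)
    ]


def segment_cmd(src: str, start: float, length: float, dst: str) -> list[str]:
    """Transcode one time range of src: -ss seeks to start, -t limits the length."""
    return ["ffmpeg", "-y", "-ss", str(start), "-i", src, "-t", str(length), dst]


# An 89-second video split at 10-second intervals yields 9 tasks.
segments = plan_segments(89, 10)
print(len(segments), segments[-1])
```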

  • I want to record the transcoding details in my database each time a video is transcoded. I also want the popular videos to be automatically prefetched to CDN nodes after the videos are transcoded to relieve pressure on the origin server. How can I use such advanced custom processing features?

    For more information about the deployment solution, see Video processing workflow system. You can perform some custom operations during media processing or perform additional operations based on the process. For example, you can add preprocessing steps before the process begins or add subsequent steps.

  • My custom video processing workflow contains multiple operations, such as transcoding videos, adding watermarks to videos, and generating GIF images based on video thumbnails. After that, I want to add more features to my video processing system, such as adjusting the parameters used for transcoding. I also hope that the existing online services provided by the system are not affected when new features are launched. How can I achieve this goal?

    For more information about the deployment solution, see Video processing workflow system. Serverless Workflow handles only function orchestration, so you can focus on updating the functions that perform media processing. Function Compute also supports versions and aliases, which help you control phased releases. For more information, see Introduction to versions.

  • I require only simple transcoding services or lightweight media processing services. For example, I want to obtain a GIF image that is generated based on the first few frames of a video, or query the duration of an audio file or a video file. In this case, I can save costs if I build a custom media processing system. How can I achieve this goal?

    Function Compute runs your custom code, so you can run specific FFmpeg commands to achieve your goal. For a typical sample application, see fc-oss-ffmpeg.
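For example, the two tasks in the question each map to a single FFmpeg or ffprobe invocation. The following sketch only builds the commands and parses the ffprobe output. The helper names, the 3-second GIF length, the frame rate, and the width are illustrative choices, not values from fc-oss-ffmpeg.

```python
def gif_cmd(src: str, dst: str, seconds: int = 3, fps: int = 10, width: int = 320) -> list[str]:
    """Build a GIF from the first `seconds` of src, scaled to `width` (aspect kept)."""
    return [
        "ffmpeg", "-y", "-i", src,
        "-t", str(seconds),
        "-vf", f"fps={fps},scale={width}:-1",
        dst,
    ]


def duration_cmd(src: str) -> list[str]:
    """Ask ffprobe to print only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        src,
    ]


def parse_duration(stdout: str) -> float:
    """Convert ffprobe output such as '89.000000' plus a newline to a float."""
    return float(stdout.strip())
```

In a function, you could run the query with `subprocess.run(duration_cmd(path), capture_output=True, text=True, check=True)` on a file downloaded to `/tmp`, and then pass the `stdout` attribute of the result to `parse_duration`.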

  • My video mezzanine files are stored in NAS or on disks that are attached to ECS instances. I want to build a custom video processing system that can directly read and process my video mezzanine files, without migrating them to OSS. How can I achieve this goal?

    You can integrate Function Compute with NAS to allow Function Compute to process the files that are stored in NAS. For more information, see Configure NAS.
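After NAS is mounted, a function reads the files through an ordinary file system path. The following sketch assumes a hypothetical mount directory `/mnt/nas` and a hypothetical output directory on the same file system. It lists the video files to process and maps each one to an output path, so no copy to OSS is needed.

```python
from pathlib import Path

NAS_MOUNT_DIR = "/mnt/nas"            # hypothetical mount directory configured for the function
OUTPUT_DIR = "/mnt/nas/transcoded"    # hypothetical output directory on the same file system
VIDEO_SUFFIXES = {".mov", ".mp4", ".flv"}


def find_videos(root: str = NAS_MOUNT_DIR) -> list[Path]:
    """Recursively list video files under root, skipping the output directory."""
    out = Path(OUTPUT_DIR)
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES and out not in p.parents
    )


def output_path(src: Path, fmt: str, root: str = NAS_MOUNT_DIR) -> Path:
    """Mirror the location of src under OUTPUT_DIR with the new extension."""
    return Path(OUTPUT_DIR) / src.relative_to(root).with_suffix("." + fmt)
```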