This topic describes how to use Function Compute to resolve issues such as process blocking in CPU-intensive scenarios in Yuque.
Yuque is a professional cloud knowledge base for team document collaboration. Yuque is an essential service that Alibaba employees use to write documents and accumulate knowledge, and it has been commercially available since 2018.
Customer pain points
Yuque is a complex web application and a typical data-intensive application that depends on a large number of cloud services such as databases. The Yuque server is built entirely on the Node.js technology stack. Node.js is single-threaded and uses non-blocking, asynchronous I/O, which makes it well suited to building scalable, I/O-intensive network applications such as web services. However, in CPU-intensive scenarios, a single long-running synchronous method blocks the entire process.
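The blocking behavior can be reproduced in a few lines. The following TypeScript sketch is illustrative only and is not taken from the Yuque code base. The helper name busyWait and the 50 ms duration are assumptions chosen for the demonstration. It schedules a timer that is due immediately and then runs a synchronous loop. The timer callback cannot run until the loop returns.

```typescript
// Spin the CPU synchronously for roughly `ms` milliseconds.
// This stands in for a slow, CPU-bound parser call.
function busyWait(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Intentionally empty: the single thread is fully occupied here.
  }
}

let timerFired = false;

// Due immediately, but it can run only after the current synchronous code finishes.
setTimeout(() => {
  timerFired = true;
}, 0);

busyWait(50);

// Still false: the event loop had no chance to run the callback during the loop.
const firedDuringLoop = timerFired;
console.log(`timer fired while the loop was running: ${firedDuringLoop}`);
```

In a web server, every pending request is in the same position as this timer: it waits until the synchronous work completes.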
Because Yuque implements its entire server logic in Node.js, some operations can consume a large amount of CPU resources or trap the process in an infinite loop. For example, converting content written in various languages to Markdown may run slowly or loop indefinitely. When Node.js was first introduced, it was difficult to find a good solution for these issues. Even programming languages that are based on the thread concurrency model, such as Java, do not offer an easy solution in such scenarios. CPU is an important resource for any web application. As infrastructure has improved, Function Compute now provides a way to resolve these issues.
Since adopting Function Compute, Yuque runs its CPU-intensive and unstable operations there. This way, Yuque can keep the I/O-intensive application model for the primary service and continue to benefit from the R&D efficiency of Node.js.
The following scenario from Yuque is an example. When you upload documents in the HTML or Markdown format, Yuque must convert them into the format that Yuque uses internally. In most cases, the content that you enter is parsed efficiently. However, unexpected input may trigger parser bugs and lead to infinite loops. Yuque may even hold off on upgrading Markdown parsing libraries and related plug-ins to avoid introducing more issues. Since adopting Function Compute, Yuque runs the CPU-consuming conversion logic in Function Compute to ensure the stability of the primary service.
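One way the primary service might call such an offloaded conversion is sketched below. This is a minimal sketch under stated assumptions, not Yuque's implementation: RemoteConvert is a hypothetical stand-in for whatever client call actually invokes the function, and convertWithDeadline and the deadline value are illustrative names. The deadline is effective only because the conversion runs outside the Node.js process. An infinite loop inside the process would also block the timer.

```typescript
// Hypothetical signature for the call that invokes the remote conversion function.
type RemoteConvert = (source: string) => Promise<string>;

type ConvertResult =
  | { ok: true; html: string }
  | { ok: false; reason: string };

// Wait for the remote conversion, but give up after `deadlineMs` so that a
// stuck parser costs the primary service one failed request at most.
async function convertWithDeadline(
  remoteConvert: RemoteConvert,
  source: string,
  deadlineMs: number,
): Promise<ConvertResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`no result within ${deadlineMs} ms`)),
      deadlineMs,
    );
  });
  try {
    const html = await Promise.race([remoteConvert(source), deadline]);
    return { ok: true, html };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer);
  }
}

// A fake remote call that never answers, standing in for a parser stuck in a loop.
const stuckConvert: RemoteConvert = () => new Promise<string>(() => {});

convertWithDeadline(stuckConvert, "# title", 100).then((result) => {
  console.log(result.ok ? result.html : `conversion skipped: ${result.reason}`);
});
```

The primary service stays responsive in both outcomes, and the caller decides how to report a skipped conversion to the user.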
Yuque lets you create graphics from code in various forms, including PlantUML diagrams, Mermaid diagrams, and formulas. Yuque also allows you to export documents as PDF files or images. These scenarios share the following characteristics:
- Complex application software such as Puppeteer and Graphviz must be used.
- The content that you enter may need to be executed.
It may seem that calling child_process.exec meets all the preceding requirements. However, Yuque needs these features to run as a stable external service, and the preceding application software may not be designed for long-term running. When such software runs for a long period of time, CPU utilization and stability problems may occur. When such software is called with high concurrency, it places a heavy load on the CPU. In addition, the code that you enter must be run in some scenarios. Attackers can craft malicious input to run malicious code on the server, which is dangerous.
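Shell injection is one illustrative instance of this risk. The following sketch only builds command descriptions and executes nothing. The function names and the hostile file name are assumptions made for the example, and dot -Tsvg is the usual Graphviz invocation for SVG output. It contrasts interpolating user input into a shell string, which is what a string passed to child_process.exec is run as, with keeping the program and its arguments separate, which is the shape that child_process.execFile accepts.

```typescript
// Naive approach: user input is pasted into a string that a shell will parse.
function naiveShellCommand(userFile: string): string {
  return `dot -Tsvg ${userFile}`;
}

// Safer shape: program and arguments stay separate, so no shell parses the input.
function argvCommand(userFile: string): { file: string; args: string[] } {
  return { file: "dot", args: ["-Tsvg", userFile] };
}

// Hypothetical hostile "file name" supplied by a user.
const hostile = "diagram.dot; cat /etc/passwd";

// A shell would see two commands here: the diagram render and the injected one.
console.log(naiveShellCommand(hostile));

// Here the whole value is a single argument: an odd file name and nothing more.
console.log(JSON.stringify(argvCommand(hostile)));
```

Separating arguments closes only this particular hole. Software that interprets user content by design, such as a browser that renders user-supplied HTML, can still be abused, which is why an isolated execution environment remains valuable.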
Before adopting Function Compute, Yuque supported these features with a dedicated task cluster that ran the third-party software and handled requests from the primary service, so that the stability of the primary service was not affected. However, this approach was costly because Yuque had to perform the following operations:
- Maintain a large task cluster, although most of the time a large amount of resources in the task cluster are not required.
- Restart third-party application software on a regular basis to avoid memory leaks caused by long-term running. Some special requests may also affect the stability of third-party application software.
- Detect and filter user input to prevent hacker attacks. However, it is difficult to completely prevent malicious code. Security risks are high.
Yuque now deploys all the third-party services to Function Compute: the features of the task cluster are split into functions, and these functions are deployed to Function Compute. Function Compute offers the following features to resolve the preceding issues:
- You are charged only for the time during which the CPU actually runs your code. This eliminates the need to keep a task cluster running for a long period of time.
- Although Function Compute keeps functions resident as an execution optimization, issues caused by long-term running are still prevented, and calls are independent of each other.
- Code that you enter runs in a sandbox container. Regardless of whether the user input is filtered, hackers cannot obtain sensitive information or enter the internal network to run malicious code, which is more secure.
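The billing difference in the first item can be made concrete with simple arithmetic. The following sketch is a simplified model in which every figure is a hypothetical placeholder and not an Alibaba Cloud price. Real billing typically has more dimensions, such as memory size and request count. The interface and function names are also assumptions made for the example.

```typescript
// All figures below are hypothetical placeholders, not real prices.
interface ResidentCluster {
  instances: number;            // machines kept running all month
  hoursPerMonth: number;        // billed whether or not requests arrive
  pricePerInstanceHour: number;
}

interface PayPerUse {
  invocationsPerMonth: number;
  avgCpuSecondsPerInvocation: number;
  pricePerCpuSecond: number;
}

// An always-on cluster is paid for by the hour, busy or idle.
function residentCost(c: ResidentCluster): number {
  return c.instances * c.hoursPerMonth * c.pricePerInstanceHour;
}

// Pay-per-use cost grows only with the work that is actually performed.
function payPerUseCost(p: PayPerUse): number {
  return p.invocationsPerMonth * p.avgCpuSecondsPerInvocation * p.pricePerCpuSecond;
}

const taskCluster: ResidentCluster = {
  instances: 4,
  hoursPerMonth: 720,
  pricePerInstanceHour: 0.25,
};

const offloaded: PayPerUse = {
  invocationsPerMonth: 200000,
  avgCpuSecondsPerInvocation: 2,
  pricePerCpuSecond: 0.0001,
};

console.log(`resident cluster: ${residentCost(taskCluster).toFixed(2)}`);
console.log(`pay per use:      ${payPerUseCost(offloaded).toFixed(2)}`);
```

The comparison favors pay-per-use when the workload is bursty or infrequent, and a resident cluster can be cheaper when it is busy most of the time.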
In addition to the preceding features, Yuque replaces ApsaraVideo VOD with a combination of Object Storage Service (OSS) and Function Compute to transcode videos and audio files.
Only a few video and audio formats can be played directly by browsers. Most videos that you upload must be transcoded before they can be played on Yuque. FFmpeg is a common tool for transcoding video and audio files. Transcoding is CPU-intensive, so a self-managed video transcoding cluster can waste a large amount of resources. ApsaraVideo VOD, in turn, is costly and offers limited control over resources. Function Compute integrates FFmpeg in the application center to provide audio and video processing capabilities and works with Log Service to implement monitoring and data analysis. After migrating the audio and video processing service from ApsaraVideo VOD to Function Compute, Yuque reduced costs by 80% by optimizing the compression rate and avoiding unnecessary transcoding.
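The idea of avoiding unnecessary transcoding can be sketched as a check that runs before any FFmpeg work is started. The following example is an assumption-laden illustration and not Yuque's logic. The list of directly playable extensions, the function names, and the file names are hypothetical, and a real check would inspect the codecs inside the file because an extension alone proves nothing. The FFmpeg options shown (-i, -c:v, -c:a, -movflags +faststart) are standard, and the libx264 encoder is assumed to be available in the FFmpeg build.

```typescript
// Assumption: files with these extensions are served as uploaded.
const DIRECTLY_PLAYABLE = ["mp4", "webm", "mp3"];

// Decide from the file extension whether a transcoding job is needed at all.
function needsTranscode(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  return DIRECTLY_PLAYABLE.indexOf(ext) === -1;
}

// Build an FFmpeg argument list that re-encodes to H.264 video and AAC audio.
// "+faststart" moves the index to the front of the MP4 so playback can begin early.
function ffmpegArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-i", inputPath,
    "-c:v", "libx264",
    "-c:a", "aac",
    "-movflags", "+faststart",
    outputPath,
  ];
}

const uploads = ["lecture.avi", "demo.mp4", "clip.flv"];
for (const upload of uploads) {
  if (needsTranscode(upload)) {
    console.log(`ffmpeg ${ffmpegArgs(upload, upload + ".mp4").join(" ")}`);
  } else {
    console.log(`${upload}: served as uploaded`);
  }
}
```

Because each upload is handled independently, a check like this fits naturally into a function that is triggered once per uploaded object.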
Benefits of Function Compute
Unlike the Serverless for Frontend (SFF) approach, Yuque does not migrate its web services to Function Compute. The current architecture of Function Compute is not well suited to that model. However, Function Compute plays an important role in the stability, security, and cost control of the overall architecture of Yuque. Function Compute is ideal for the following scenarios:
- Perform CPU-intensive operations that are not time-sensitive, which offloads CPU load from the primary service.
- Work as a sandbox container to run code that you enter.
- Run unstable third-party application software.
- Provide highly scalable services.
Since Function Compute was adopted, the architecture of Yuque remains centered on a monolithic application. Independent functional modules are split out as microservices or serverless functions based on their scenarios and capability requirements. An application architecture is closely tied to the team behind it and to the form of the business. As cloud services and infrastructure improve, Yuque can choose a more appropriate architecture.
Based on the serverless architecture, Yuque can migrate tasks that have security risks or consume a large amount of CPU resources to Function Compute. This way, these tasks run in a sandbox environment to protect against malicious code. This approach also removes these CPU-intensive tasks from the primary service so that they do not block the primary service during concurrent operations. The pay-as-you-go billing method can significantly reduce costs because you do not need to deploy a resident service for low-frequency feature scenarios.