How to use layers to solve dependency package problems

If you have run into any of the following problems when using Alibaba Cloud Function Compute, this article should help:

1. Third-party dependency packages are too large, so every code update is slow and may even exceed the code package size limit. What should I do?

2. After I install a third-party dependency package, the code runs successfully locally but reports an error once it is uploaded to Alibaba Cloud Function Compute. Why?

3. Many dependency packages are commonly used by many users. Can't Function Compute build them directly into the runtime environment?

4. Several of my functions use the same dependency packages. How can I manage them in one place?

Layers provide a way to centrally manage code and data and to share them across multiple functions.

In January 2021, Alibaba Cloud Function Compute released the "custom layer" feature, which lets users build their own layers and share them across functions. In August 2022, it released the "public layer" feature, which provides official public layers that users can use directly, further improving the user experience.

Next, we introduce the features and purpose of custom layers.

Custom layer

Before the layer feature was released, code had to be packaged and deployed together with its dependencies. These dependencies were often the same across different functions, and in many cases they were far larger than the code itself.

With layers, we can package the code dependencies, or the parts shared by multiple functions, into a Zip file and publish it as a custom layer in Function Compute. Different functions can then use this custom layer, and Function Compute loads the layer together with the function code when the function is invoked. To create a custom layer and configure it in a function, refer to documents [1] and [2] listed at the end of the article.

Why use custom layers?

Using custom layers has the following advantages:

Reuse code across functions

Extract the code or data shared by multiple functions, package it into a Zip file, and publish it as a custom layer that different functions can reference. This avoids maintaining the same code or data in several places.

It also separates dependencies from business logic, so users can focus on the core business logic.

Make the code package smaller

As a function's code package grows, deployment becomes slower, which makes the function harder to maintain and test.

In addition, the function code package has a size limit. For example, a code package on Alibaba Cloud Function Compute is limited to 500 MB (as of September 2022). Layers are one way to get past this limit. Layers have size restrictions too: at present a single layer's code package is limited to 500 MB, a single function can be configured with up to five layers, and the total size cannot exceed 2 GB.
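The limits quoted above can be checked before deployment. The following is a minimal sketch, assuming the figures stated in this article; the function name check_layer_config is hypothetical, the sketch treats 1 MB as 1024 * 1024 bytes, and it applies the 2 GB total to the layer sizes passed in. None of this is part of a Function Compute API.

```python
MB = 1024 * 1024  # assumption: binary megabytes
GB = 1024 * MB

# Limits quoted in this article (September 2022); they may change.
MAX_LAYER_BYTES = 500 * MB
MAX_LAYERS_PER_FUNCTION = 5
MAX_TOTAL_BYTES = 2 * GB


def check_layer_config(layer_sizes):
    """Return a list of problems for the given layer sizes (in bytes).

    An empty list means the configuration fits within the quoted limits.
    """
    problems = []
    if len(layer_sizes) > MAX_LAYERS_PER_FUNCTION:
        problems.append(
            f"{len(layer_sizes)} layers configured, limit is {MAX_LAYERS_PER_FUNCTION}"
        )
    for index, size in enumerate(layer_sizes):
        if size > MAX_LAYER_BYTES:
            problems.append(f"layer {index} is {size} bytes, limit is {MAX_LAYER_BYTES}")
    total = sum(layer_sizes)
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size is {total} bytes, limit is {MAX_TOTAL_BYTES}")
    return problems
```

For instance, check_layer_config([100 * MB, 800 * MB]) reports one problem, because the second layer is above the per-layer limit.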

Accelerate code deployment and simplify function management

The smaller the function code package, the faster it deploys. This matters most for large dependencies: the core function code may be only a few megabytes while the dependencies run to hundreds of megabytes. For example, the Puppeteer dependency package exceeds 100 MB, and Alibaba Cloud's DataX dependency package exceeds 800 MB.

These dependencies are rarely modified, so packaging them into layers avoids re-uploading them every time the core code changes. The dependencies can also be split into several layers, so that only one layer needs to be updated for a given change. For example, suppose we build a Python 3.10 custom runtime and a SciPy scientific computing library compatible with that runtime. The custom runtime and the dependency package can be split into two layers. When the dependency package needs an update, only its layer is updated, while the custom runtime layer stays unchanged.

Pain points of custom layers

Building a layer has a barrier to entry

The Zip package of a layer must follow a certain format specification. Taking Python's requests library as an example, the file structure after packaging is:

my-layer-code.zip
└── python
    └── requests

Why is there such a requirement? It comes from how each runtime searches for third-party dependency packages. For example, the Python runtime searches for dependency packages in the directories listed in sys.path. The Zip package above is decompressed to the /opt directory of the function instance, so after decompression the requests package is located in the /opt/python directory.

The Function Compute platform then adds certain directories to the runtime language's dependency search path. For example, the Python runtime adds /opt/python to sys.path, so the requests library can be imported directly in the code. For other runtimes, refer to the document on creating custom layers.
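The packaging layout described above can be produced with a short script. This is a minimal sketch using only the Python standard library; the function name build_layer_zip and the local directory names are hypothetical, and it assumes the packages were already installed into a local directory (for example with pip's --target option).

```python
import zipfile
from pathlib import Path


def build_layer_zip(packages_dir, zip_path, prefix="python"):
    """Zip every file under packages_dir so that it sits under `prefix/` in the archive."""
    packages_dir = Path(packages_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(packages_dir.rglob("*")):
            if path.is_file():
                # e.g. <packages_dir>/requests/api.py -> python/requests/api.py
                arcname = Path(prefix) / path.relative_to(packages_dir)
                archive.write(path, arcname.as_posix())
    return zip_path


# Hypothetical usage, with ./build filled by "pip install requests --target build":
# build_layer_zip("build", "my-layer-code.zip")
```

Write the Zip file outside packages_dir so that the archive does not end up containing itself.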

You can also build a layer without following this format specification. In that case, you need to add the corresponding search path in your code. For the specific method, refer to the document: How to reference the dependency in the layer in the Custom Runtime?
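For Python, adding a search path in code can look like the following minimal sketch. The directory /opt/deps and the helper name add_layer_path are hypothetical; the sketch assumes a layer whose Zip package stored its packages under deps/ instead of python/.

```python
import sys


def add_layer_path(layer_dir, search_path=None):
    """Prepend layer_dir to the module search path if it is not already present."""
    if search_path is None:
        search_path = sys.path
    if layer_dir not in search_path:
        search_path.insert(0, layer_dir)
    return search_path


# Hypothetical layout: the layer Zip stored packages under "deps/", so they are
# unpacked to /opt/deps on the function instance. Run this before the imports
# that rely on the layer.
add_layer_path("/opt/deps")
```

Prepending means packages in the layer take precedence over packages of the same name elsewhere on the path; append instead if the opposite order is wanted.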

The layer needs to be built on the specified operating system and processor architecture

Some dependencies are specific to an operating system and processor architecture, for example Python's scientific computing library NumPy. If you install it on macOS with an M1 chip, the installed build is

numpy-1.23.3-cp39-cp39-macosx_11_0_arm64

The tag shows that this build targets macOS and the arm64 processor architecture. However, function instances on Function Compute run Linux x86_64, currently the Debian 9 distribution, so a NumPy library installed on an M1 Mac cannot be used on Alibaba Cloud Function Compute.
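The platform part of such a build name can be inspected programmatically. The sketch below is a rough check, assuming the standard wheel naming scheme in which the platform tag is the last dash-separated field; the helper names are hypothetical, and the check ignores the glibc version that manylinux tags imply.

```python
def wheel_platform_tag(wheel_name):
    """Return the platform tag, the last dash-separated field of a wheel name."""
    name = wheel_name[:-4] if wheel_name.endswith(".whl") else wheel_name
    return name.rsplit("-", 1)[-1]


def runs_on_linux_x86_64(wheel_name):
    """Rough check: is this wheel pure Python or built for Linux x86_64?"""
    # A wheel may carry several platform tags separated by dots, such as
    # "manylinux_2_17_x86_64.manylinux2014_x86_64".
    tags = wheel_platform_tag(wheel_name).split(".")
    return any(
        tag == "any"
        or (tag.startswith(("linux", "manylinux")) and tag.endswith("x86_64"))
        for tag in tags
    )
```

With the build named above, wheel_platform_tag returns "macosx_11_0_arm64" and runs_on_linux_x86_64 returns False, which matches the conclusion that the M1 build cannot be used on the platform.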

We recommend installing dependencies on a Debian 9 system, but users may not have this environment locally. In that case, you can build the dependencies online or build them with the official Function Compute runtime image, which is not covered in detail here.

The layer needs to contain additional shared dynamic libraries

Some dependency libraries require additional shared dynamic libraries, and these must be included when the layer's Zip package is built. For example, the Node.js library Puppeteer needs more than 20 additional shared dynamic libraries (such as libxss1 and libnspr4), all of which must be packaged into the layer's Zip package. Installing Puppeteer successfully is not a simple matter.

We recommend placing shared dynamic libraries in the lib directory of the Zip package. The Function Compute platform adds the /opt/lib directory to LD_LIBRARY_PATH (built-in runtimes only).
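Because the platform adds /opt/lib only for built-in runtimes, other setups may need to put the directory on LD_LIBRARY_PATH themselves. The following is a minimal sketch of one way to do that when launching a child process; the helper name env_with_layer_libs is hypothetical. The variable has to be set before the process that loads the libraries starts, since the dynamic loader reads it at process startup.

```python
import os


def env_with_layer_libs(base_env=None, lib_dir="/opt/lib"):
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    entries = [entry for entry in current.split(":") if entry]
    if lib_dir not in entries:
        entries.insert(0, lib_dir)
    env["LD_LIBRARY_PATH"] = ":".join(entries)
    return env


# Hypothetical usage when starting a program that needs the layer's libraries:
# subprocess.run(["./my-program"], env=env_with_layer_libs())
```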

Cannot share across accounts

By default, a custom layer can only be shared between functions in the same account and region; it cannot be shared across accounts. A custom layer created by user A therefore cannot be used by user B, which duplicates work for users and also prevents the same layer from being reused on the host.

Public layer

To address these pain points of custom layers, Alibaba Cloud Function Compute released the public layer feature in August 2022. It enables layers to be shared across accounts and provides official public layers that users can use directly, making it easier to develop sample prototypes quickly.

Alibaba Cloud Function Compute mainly provides three types of official public layers:

• Custom runtimes (such as Python 3.10, Node.js 17, PHP 8.1, Java 17, .NET 6, etc.)

• Common dependency libraries (such as PyTorch, SciPy, Puppeteer, etc.)

• Alibaba Cloud SDKs (such as Alibaba Cloud DataX)

For details, refer to the official document on configuring official public layers in a function. The set of official public layers is still growing. If there is a runtime or dependency library that you would like to use through an official public layer, you can contact us through the DingTalk Q&A group (DingTalk group number: 11721331), or submit an issue directly at github.com/awesome-fc/awesome-layers.

How do I make a custom layer public?

At present, the feature for making layers public is in internal testing. If you need it, please contact us through the DingTalk Q&A group (DingTalk group number: 11721331). For how to make a custom layer public, refer to github.com/awesome-fc/awesome-layers.

We also welcome contributions of public layers to the repository github.com/awesome-fc/awesome-layers. We will soon provide contribution methods and examples in the repository.

Examples

For the latest version and instructions of the official public layer, please refer to github.com/awesome-fc/awesome-layers. Here are some typical examples of using the official public layer.

Example 1. A web page screenshot sample program based on Node.js 16 + Puppeteer

Puppeteer is a Node.js library that provides a high-level API to control Chrome (or Chromium) through the DevTools protocol. Put simply, it works as a headless Chrome browser that can be used to automate many tasks, such as:

• Generating screenshots or PDFs

• Automatically submitting forms, testing UIs, simulating keyboard input, etc.

• And more

This example uses Puppeteer to build a sample program that takes web page screenshots.

First, we use the built-in Node.js 16 runtime to create a function named start-poppeter, with the request handler type set to "Process HTTP requests".

Then, set the memory specification to 1 GB in the advanced configuration; the sample program uses about 550 MB of memory.

After the function is created, open the index.js file in the console, replace its contents with the sample code, and click the deploy button.

The core logic of the sample code is as follows. First, the code parses the query parameter to obtain the URL of the page to capture (if parsing fails, the home page of the Serverless Devs official website is used by default). It then uses Puppeteer to take a screenshot of the page and saves it to the /tmp/example file on the running instance. Finally, it returns the file directly as the body of the HTTP response.

Then, we need to configure the Puppeteer public layer: find the layer setting in the function configuration, click Edit, and select Add Official Public Layer.

Example 2. Quickly implement a .NET 6 custom runtime based on a public layer

First, create a .NET 6 custom runtime function through the console. At the top, select "Create using custom runtime", then select "Process HTTP requests" and the .NET 6 runtime. Keep the default values for the other settings.

After the function is created, you can view the sample code Program.cs in the WebIDE.

There are four parts to note in the sample code:

• The example listens on 0.0.0.0, port 9000. A service started by a Custom Runtime must listen on 0.0.0.0:CAPort or *:CAPort, not on 127.0.0.1 or localhost. Refer to the document Custom Runtime > Basic Principles for details.

• It adds a route / that directly returns the string "Hello World!"

• It adds a route /invoke, which is the path used by the event request handler. Refer to the document Custom Runtime > Event Handler.

• It adds a route /initialize, which is the path of the function initialization callback. This method is executed once during instance initialization. Refer to the document Custom Runtime > Function Instance Lifecycle Callback.
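The four points above can be illustrated with a small HTTP server. The following is a minimal sketch in Python, not the Program.cs sample itself: the response bodies for /invoke and /initialize are assumptions made for illustration, and make_server is a hypothetical helper name. Only the listening address, the port, the three routes, and the "Hello World!" reply come from the description above.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def _reply(self, status, text):
        body = text.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        path = self.path.split("?", 1)[0]
        if path == "/":
            self._reply(200, "Hello World!")
        else:
            self._reply(404, "Not Found")

    def do_POST(self):
        path = self.path.split("?", 1)[0]
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        if path == "/invoke":
            # Event handler route; echoing the event back is an assumption.
            self._reply(200, payload.decode("utf-8"))
        elif path == "/initialize":
            # Initialization callback route; the reply text is an assumption.
            self._reply(200, "initialized")
        else:
            self._reply(404, "Not Found")


def make_server(host="0.0.0.0", port=9000):
    """Bind to all interfaces by default, not to 127.0.0.1 or localhost."""
    return ThreadingHTTPServer((host, port), Handler)


# To start serving: make_server().serve_forever()
```

Binding to 0.0.0.0 is the part that matters most here: a server bound to 127.0.0.1 would accept only local connections and would not satisfy the listening requirement stated in the first point.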

First, we test directly with the test address on the trigger management page, without adding any path. The results are shown in the following figure:

Then, we test the /invoke path. Because the route uses the POST method, we test it directly with curl -XPOST:

Next, we test /initialize in the same way.

Note: This is only a test. The initialization callback does not need to be called manually; the Function Compute platform calls it automatically after the instance starts (don't forget to enable the Initializer callback in the configuration).

Finally, let's run one more small test. Delete the HTTP trigger on the trigger management page; after deletion, the function type is converted to an event request handler. Then enable the Initializer callback in the function configuration.

Test the function on the console, and the results are shown in the following figure:

Click the real-time log button to see that the Initialize callback method has been executed before the request is executed.

What are the best practices for layers?

The preceding sections introduced what a custom layer is, why to use one, and what a public layer is, along with two examples that use official public layers. Some questions about using layers remain, however. In what scenarios do we recommend layers? What is the difference between a layer and a code package? Are there features similar to layers, and how do layers compare with them? The following sections try to answer these questions.

In what scenarios are layers recommended?

At present, there are two main scenarios for using layers: custom runtimes and dependency libraries for various languages. We strongly recommend building and using custom runtimes through layers. For dependency libraries, consider the following suggestions:

• Prefer official public layers where they are available

• For non-compiled languages, we recommend managing dependency libraries with layers; for compiled languages, decide case by case (for example, with a custom runtime, if a Java program runs from a JAR package, dependencies in the layer cannot be loaded. Refer to the document: How to reference the dependency in the layer in the Custom Runtime?)

• If the dependency library is large but does not exceed the layer size limit, we recommend using a layer

• If the dependency library requires additional shared dynamic libraries, we recommend using a layer (if the build is complex, contact the Function Compute team to build it)

• If code or data needs to be shared between multiple functions or accounts, we recommend using a layer

What is the difference between a layer and a code package?

Intuitively, a layer is just part of the original code package split off into a new code package. So why build a layer? The main difference is that layers and code packages follow different design concepts.

• Layers have a simpler version management scheme

The version of a layer increments automatically, starting from 1. At present, a layer supports up to 100 available versions (excluding deleted versions). Code packages have no version concept of their own; versions exist only at the service level, which is comparatively more complex.

• Layer versions are read-only and immutable

The content of a layer version cannot be changed after it is created (except for permissions). To modify the content of a layer, you must publish a new version. This read-only property prevents layer changes from affecting functions.

• Layers can be shared

Layers can be shared across functions and accounts; code packages cannot.

• Layer versions are soft-deleted

Deleting a layer version does not affect functions that are already configured with that version. When a layer version is deleted, Alibaba Cloud Function Compute does not remove its code immediately. It first performs a soft deletion so that new functions cannot use the deleted version. Only when no function references the layer version any longer is it deleted completely.
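The version semantics described in this list can be summarized as a toy in-memory model. This sketch is not the platform's implementation: the class and method names are hypothetical, and it only mirrors the rules stated above (versions start at 1 and auto-increment, at most 100 available versions, content is fixed once published, and a deleted version is purged only after the last function stops referencing it).

```python
MAX_AVAILABLE_VERSIONS = 100  # limit quoted in this article


class Layer:
    """Toy model of layer versions; not how the platform stores layers."""

    def __init__(self):
        self._next_version = 1
        self._content = {}          # version -> content, never modified
        self._references = {}       # version -> names of functions using it
        self._soft_deleted = set()  # deleted versions that are still referenced

    def publish(self, content):
        available = len(self._content) - len(self._soft_deleted)
        if available >= MAX_AVAILABLE_VERSIONS:
            raise RuntimeError("too many available versions")
        version = self._next_version
        self._next_version += 1
        self._content[version] = bytes(content)
        self._references[version] = set()
        return version

    def attach(self, version, function_name):
        # A soft-deleted version cannot be configured on new functions.
        if version not in self._content or version in self._soft_deleted:
            raise LookupError(f"version {version} is not available")
        self._references[version].add(function_name)

    def detach(self, version, function_name):
        self._references.get(version, set()).discard(function_name)
        self._purge_if_unused(version)

    def delete(self, version):
        self._soft_deleted.add(version)
        self._purge_if_unused(version)

    def exists(self, version):
        return version in self._content

    def _purge_if_unused(self, version):
        if version in self._soft_deleted and not self._references.get(version):
            self._content.pop(version, None)
            self._references.pop(version, None)
            self._soft_deleted.discard(version)
```

In this model, deleting a version that a function still uses leaves its content in place, attach rejects the version for any new function, and the content disappears once the last function detaches.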

Conclusion

In Alibaba Cloud Function Compute, layers are positioned as immutable infrastructure. The read-only nature of layer versions ensures the consistency and reliability of layers. This article first introduced the characteristics and pain points of custom layers, then the recently released public layer feature, described in detail two sample programs built on official public layers, and finally discussed best practices for layers. We hope it helps readers better understand the concept of layers and where they apply.

The layer feature is still being improved. Next, we will focus on the following directions:

• Improve the official public layer experience by adding more common dependency libraries and custom runtimes as official public layers, with complete application examples.

• Provide methods and examples for contributing public layers, to promote open-source co-construction of public layers.
