Installing a Dependency Library for Function Compute

This article explains how to use existing tools to install dependency libraries to a Function Compute project with minimal manual intervention.

In common programming practice, a project, its libraries, and the system environment are installed and configured together. Alibaba Cloud Function Compute instead runs code in a prebuilt runtime environment, which trades flexibility for higher concurrency and security. Because the system and code directories are read-only at runtime, dependency libraries must be pre-installed into the code directory rather than a system directory. The installation tools of the new Function Compute platform do not yet handle this change.

Developing for Function Compute involves two types of dependency packages: DEB packages installed by the APT package manager, and packages installed by a language-specific package manager (such as Maven or pip). The following sections analyze the package managers of the different language environments.

Installation Directory of a Package Manager

Currently, Function Compute supports the Java, Python, and Node.js environments. The package managers for these three languages are Maven, pip, and NPM respectively. The following describes the installation directory of each of these package managers.

maven

Maven is the package manager for Java. Maven downloads the dependencies declared in the project file pom.xml from the central repository or a private repository to the local repository, which defaults to $HOME/.m2/repository. All Java projects on a development machine share the JAR packages in this local repository. When the project is packaged (mvn package) into a self-contained deliverable, the dependent JAR packages are bundled into it. Therefore, Java projects run without depending on any files in the local repository.

pip

pip is currently the most popular and recommended Python package manager. Before looking at how to install a package to a local directory, it helps to understand how Python package management evolved, so the following first gives a brief history.

Before 2004, the recommended way to install a Python module was setup.py. To use it, download the module and run the setup.py file supplied with it.

python setup.py install

setup.py is built on Distutils. Released in 2000 as part of the Python standard library, Distutils is used to build and install Python modules.

Therefore, you can also use setup.py to release a Python module or

python setup.py sdist

even package the module into an RPM or EXE file.

python setup.py bdist_rpm
python setup.py bdist_wininst

Similar to a Makefile, setup.py can be used for both building and installation. However, the two steps are coupled, so a module must be built every time it is installed, which wastes resources. In 2004, the Python community released setuptools, which includes the easy_install tool. Python then gained support for the EGG format and the online repository PyPI, which correspond to the JAR format and the Maven repository in the Java community, respectively.

The online module repository PyPI offers two key advantages:

  1. Only the pre-compiled EGG package needs to be installed, which improves efficiency.
  2. Dependent packages are automatically downloaded from PyPI and installed.

Since its release in 2008, pip has gradually replaced easy_install and become the de facto standard Python package manager. pip remains compatible with the EGG format but prefers the Wheel format, and it can also install a module directly from a version control repository (for example, on GitHub).

The following describes the directory structure of a Python module. The directories of installation files in both EGG and Wheel formats are classified into five types: purelib, platlib, headers, scripts, and data.

Directory | Installation location | Purpose
purelib | $prefix/lib/pythonX.Y/site-packages | Pure Python implementation library
platlib | $exec-prefix/lib/pythonX.Y/site-packages | Platform-specific DLLs
headers | $prefix/include/pythonX.Yabiflags/distname | C header files
scripts | $prefix/bin | Executable files
data | $prefix | Data files, such as .conf configuration files and SQL initialization files

$prefix and $exec-prefix are set when Python is compiled and can be read at runtime from sys.prefix and sys.exec_prefix. On Linux, both default to /usr/local.
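
The locations in the table can be checked for any given interpreter. The following is a minimal sketch, assuming Python 3 and the standard-library sysconfig module; note that sysconfig names the headers category "include", and that the exact directories depend on how the interpreter was built.

```python
import sys
import sysconfig

# Prefixes fixed when this interpreter was built.
print("prefix:     ", sys.prefix)
print("exec_prefix:", sys.exec_prefix)

# sysconfig.get_paths() maps each installation category to its directory
# under the default installation scheme of this interpreter.
paths = sysconfig.get_paths()
for category in ("purelib", "platlib", "include", "scripts", "data"):
    print("{:8} -> {}".format(category, paths[category]))
```

Running this on the development machine and inside the runtime environment is a quick way to see where the two differ.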

npm

npm is the package manager for Node.js. When npm install is run, dependent packages are downloaded to the node_modules sub-directory of the current directory, so the runtime dependencies of a Node.js project are all contained in the project directory. However, some Node.js libraries include native components that are built against the local environment when the module is installed. Such libraries cannot run if the build environment (such as Windows) differs from the runtime environment (such as Linux). Additionally, if a module was built against development and runtime libraries that the operating system package manager (such as apt-get) installed locally, those DLLs may not exist in the container of the runtime environment.

Troubleshooting Problems

Next, you will see how to resolve problems that occur when dependent libraries of Function Compute are installed.

Dependencies Installed to the Global System Directory

Maven and pip install dependent packages to a system directory rather than the project directory. When a project is built, Maven packages all external dependencies into the final deliverable, so projects managed by Maven are free from dependency-related problems at runtime. For Java projects not managed by Maven, it is also common practice to place dependent JAR packages in the current directory or one of its sub-directories and package them into the final deliverable, which likewise prevents dependency-related problems at runtime. Python environments managed by pip, however, do have this problem: pip installs dependencies to a system directory, but the production environment of Function Compute is read-only (except for the /tmp directory), so nothing can be installed into the prebuilt environment.

Native Dependencies

Many common Python and Node.js libraries depend on the native environment of the system. The required DLLs must be installed in both the compilation environment and the runtime environment, which makes such libraries poorly portable.

When Function Compute runs on Debian or Ubuntu, the APT package manager is used to manage system programs and libraries. By default, these programs and libraries are installed to system directories such as /usr/bin, /usr/lib, /usr/local/bin, or /usr/local/lib. Therefore, native dependencies also need to be installed to a local directory.

Recommended Solutions

Several intuitive solutions are described as follows:

  1. Ensure that the system on which dependencies are installed is consistent with the production execution system. Use fcli sbox to install the dependencies.
  2. Place all the dependency files in a local directory. Copy the modules, executable files, and .dll or .so files installed by pip to the current directory.

However, in practice, it is difficult to place the dependency files into the current directory.

  1. Library files installed by pip and apt-get are scattered to different directories. This means that you must be familiar with different package managers in order to find these files.
  2. Library files have transitive dependencies. When a library is installed, other libraries on which the library depends are also installed. This makes it very tedious to manually retrieve these dependencies.

So how can dependencies be installed to the current directory with minimal manual intervention? The following describes several methods for the pip and APT package managers and compares their pros and cons.

Installation of Dependencies to the Current Directory

Python

Method 1: Use the --install-option parameter

pip install --install-option="--install-lib=$(pwd)" PyMySQL

When --install-option is used, its parameters are passed to setup.py. However, neither .egg nor .whl files contain a setup.py file. Therefore, using --install-option forces pip to install from the source package, and running setup.py triggers the module build process.

--install-option supports the following options:

File type | Option
Python modules | --install-purelib
extension modules | --install-platlib
all modules | --install-lib
scripts | --install-scripts
data | --install-data
C headers | --install-headers

When --install-lib is used, the values of --install-purelib and --install-platlib are overwritten.

In addition, --install-option="--prefix=$(pwd)" supports installation to the current directory, but a sub-directory named lib/python2.7/site-packages will be created under the current directory.

Advantages:

  1. You can install individual parts of the module, such as purelib, to a local directory.

Disadvantages:

  1. This method does not work for modules that do not provide a source package.
  2. The module is built from source, so the benefits of the Wheel package are lost.
  3. To install the module completely, many more parameters need to be configured, which is tedious.

Method 2: Use the --target or -t parameter

pip install --target=$(pwd) PyMySQL

--target is a relatively new pip parameter. When it is used, the module is installed directly to the current directory, without creating the lib/python2.7/site-packages sub-directory. This method is easy to use and is suitable for modules with few dependencies.

Method 3: Use PYTHONUSERBASE in conjunction with --user

PYTHONUSERBASE=$(pwd) pip install --user PyMySQL

When --user is used, the module is installed to the site.USER_BASE directory. Its default value is ~/.local on Linux, ~/Library/Python/X.Y on macOS, and %APPDATA%\Python on Windows. The PYTHONUSERBASE environment variable can be used to change the value of site.USER_BASE.

Similar to --prefix=, when --user is used, the sub-directory named lib/python2.7/site-packages is created.
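
The effect of PYTHONUSERBASE can be verified without installing anything. The sketch below assumes Python 3; the helper name user_base_with and the temporary directory standing in for the project directory are illustrative only. It starts a child interpreter with PYTHONUSERBASE set and asks it for its user base through site.getuserbase().

```python
import os
import subprocess
import sys
import tempfile


def user_base_with(base_dir):
    """Return the user base directory seen by a child interpreter
    that is started with PYTHONUSERBASE set to base_dir."""
    env = dict(os.environ, PYTHONUSERBASE=base_dir)
    out = subprocess.check_output(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env,
    )
    return out.decode().strip()


# Hypothetical stand-in for the project directory.
project_dir = tempfile.mkdtemp()
print(user_base_with(project_dir))
```

The child interpreter reports the directory passed in, which is where pip install --user would then create its lib/pythonX.Y/site-packages sub-directory.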

Method 4: Use virtualenv

pip install virtualenv
virtualenv path/to/my/virtual-env
source path/to/my/virtual-env/bin/activate
pip install PyMySQL

virtualenv is the method recommended by the Python community because it does not contaminate the global environment. When virtualenv is used, both the desired modules (such as PyMySQL) and the packaging tools (such as setuptools, pip, and wheel) are saved to a local directory. The packaging tools increase the size of the package even though they are not used at runtime.

apt-get

DLLs and executable files installed by apt-get also need to be installed to a local directory. We tried the chroot and apt-get -o RootDir=$(pwd) methods recommended on the Internet but discarded them because of their defects. Building on those methods, we designed an improved approach that uses apt-get to download the DEB packages and dpkg to extract them.

apt-get install -d -o=dir::cache=$(pwd) libx11-6 libx11-xcb1 libxcb1
# Extract every downloaded DEB package into the current directory.
for f in ./archives/*.deb
do
    dpkg -x "$f" "$(pwd)"
done

Running Method

Java loads JAR and class files from the configured classpath, and Node.js automatically loads the packages under node_modules in the current directory. These common operations are not covered here.

Python

Python loads the module file from the directory list that sys.path points to.

>>> import sys
>>> print('\n'.join(sys.path))

/usr/lib/python2.7
/usr/lib/python2.7/plat-x86_64-linux-gnu
/usr/lib/python2.7/lib-tk
/usr/lib/python2.7/lib-old
/usr/lib/python2.7/lib-dynload
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages

By default, sys.path includes the directory of the running script (the current directory in an interactive session). Therefore, Method 2 does not require any change to sys.path, because the --target (or -t) parameter installs the module in the current directory.

For the other methods, you can call sys.path.append(dir) when the program starts, because sys.path is a mutable list. To improve the portability of the program, you can also set the PYTHONPATH environment variable instead.

export PYTHONPATH=$PYTHONPATH:$(pwd)/lib/python2.7/site-packages
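
The in-program alternative can be sketched as follows. The helper name add_local_packages is hypothetical, and the sketch assumes that the packages were installed under lib/python2.7/site-packages relative to the code directory, as in the methods that create this sub-directory. It uses insert(0, ...) rather than append so that the local copy takes priority over a system-wide one; use append if the opposite is wanted.

```python
import os
import sys


def add_local_packages(base_dir, subdir="lib/python2.7/site-packages"):
    """Put base_dir/subdir at the front of sys.path (only once) and return it."""
    path = os.path.join(base_dir, *subdir.split("/"))
    if path not in sys.path:
        # Front of the list: modules found here shadow system-wide copies.
        sys.path.insert(0, path)
    return path


# Assumption: the program is started from the root of the code directory.
local = add_local_packages(os.getcwd())
```

Call this before importing any module that lives in the local directory.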

apt-get

Ensure that the executable files and DLLs installed using apt-get can be found in the directory lists set by the PATH and LD_LIBRARY_PATH environment variables.

PATH

The PATH variable indicates a list of directories that the system uses to search for executable programs. Add bin or sbin directories such as bin, usr/bin, and usr/local/bin to the PATH variable.

export PATH=$(pwd)/bin:$(pwd)/usr/bin:$(pwd)/usr/local/bin:$PATH

Note that the preceding command applies to Bash. In Java, Python, or Node.js, modify the PATH environment variable of the current process using the corresponding language API.
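
For Python, the same adjustment might look like the sketch below. The helper name prepend_to_path is hypothetical, and the sub-directories are the ones used in the Bash example. Changes to os.environ apply to the current process and are inherited by child processes started afterwards, for example through subprocess.

```python
import os


def prepend_to_path(base_dir, subdirs=("bin", "usr/bin", "usr/local/bin")):
    """Prepend base_dir/<subdir> entries to PATH for this process
    and for any child process started afterwards."""
    extra = [os.path.join(base_dir, *s.split("/")) for s in subdirs]
    current = os.environ.get("PATH", "")
    parts = extra + ([current] if current else [])
    os.environ["PATH"] = os.pathsep.join(parts)
    return extra


# Assumption: the DEB packages were extracted into the current directory.
added = prepend_to_path(os.getcwd())
```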

LD_LIBRARY_PATH

Similar to PATH, LD_LIBRARY_PATH is a list of directories that are searched for DLLs. Typically, the system places dynamic libraries in the /lib, /usr/lib, and /usr/local/lib directories. Some are placed in sub-directories of these directories, such as /usr/lib/x86_64-linux-gnu, and these sub-directories are usually recorded in the files under /etc/ld.so.conf.d/.

cat /etc/ld.so.conf.d/x86_64-linux-gnu.conf
# Multiarch support
/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu

Therefore, the .so files in the directories declared in the files under $(pwd)/etc/ld.so.conf.d/ must also be reachable through the directory list set by the LD_LIBRARY_PATH environment variable.
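
One way to derive that directory list is to read the extracted configuration files. The following is a rough sketch, not a full ld.so.conf parser: the function names are hypothetical, it assumes the DEB packages were extracted under a directory called root, and it only handles plain directory lines (comments and any other directives are skipped because they do not resolve to an existing directory).

```python
import glob
import os


def library_dirs(root):
    """Collect the directories listed in root/etc/ld.so.conf.d/*.conf,
    re-rooted under root, keeping only those that actually exist."""
    dirs = []
    pattern = os.path.join(root, "etc", "ld.so.conf.d", "*.conf")
    for conf in sorted(glob.glob(pattern)):
        with open(conf) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and comments
                candidate = os.path.join(root, line.lstrip("/"))
                if os.path.isdir(candidate) and candidate not in dirs:
                    dirs.append(candidate)
    return dirs


def ld_library_path(root):
    """Join the collected directories into an LD_LIBRARY_PATH value."""
    return ":".join(library_dirs(root))
```

For example, print(ld_library_path(os.getcwd())) gives a value that can be exported before the program starts.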

Note that changes to the LD_LIBRARY_PATH environment variable made at runtime may not take effect; this is true for Python at least. The /code/lib directory is already preset in LD_LIBRARY_PATH, so you can instead create symbolic links to all the dependent .so files in the /code/lib directory.
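
The linking step could be automated at build time along the following lines. This is a sketch under stated assumptions: the helper name link_shared_objects is hypothetical, Python 3 is assumed, and the lib directory at the root of the project is assumed to end up as /code/lib once the code is deployed. Relative links are used so that they remain valid after the project directory is moved.

```python
import os


def link_shared_objects(search_root, lib_dir):
    """Create relative symlinks in lib_dir for every *.so or *.so.N file
    found under search_root. Returns the sorted names that were linked."""
    os.makedirs(lib_dir, exist_ok=True)
    linked = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        if os.path.abspath(dirpath) == os.path.abspath(lib_dir):
            continue  # do not link lib_dir onto itself
        for name in filenames:
            if name.endswith(".so") or ".so." in name:
                link = os.path.join(lib_dir, name)
                if not os.path.lexists(link):
                    # Link target is expressed relative to lib_dir.
                    source = os.path.join(dirpath, name)
                    os.symlink(os.path.relpath(source, lib_dir), link)
                    linked.append(name)
    return sorted(linked)
```

A typical call would be link_shared_objects("usr/lib", "lib") from the project root, after the DEB packages have been extracted there.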

Conclusion

This document explains how to run the pip and apt-get commands to install libraries in the local directory and set environment variables during runtime so that the program can find the installed local library files.

The four pip-based methods for Python cover most common scenarios. They differ slightly, as described above, so choose the one that fits your needs.

For apt-get, the method described above keeps the package small because DEB packages that are already installed in the system are not downloaded again. To reduce the size further, you can delete installed files that are not needed at runtime, such as user manuals.

This document is part of the groundwork for building better tools. On this basis, we will provide tools that further simplify development.

vangie