How to Migrate LibreOffice to Function Compute

This blog explains how to compile, compress, and migrate LibreOffice to Alibaba Cloud Function Compute to build a serverless file converter.

By Du Wan

LibreOffice [1] is a free and open source office suite developed by The Document Foundation. The suite includes a word processor, a spreadsheet program, a presentation program, a vector graphics editor and charting tool, a database management program, and an application for creating and editing mathematical formulas. With LibreOffice's CLI, Microsoft Office files can easily be converted to PDF files, for example:

$ soffice --convert-to pdf --outdir /tmp /tmp/test.doc

A full LibreOffice installation is 2 GB in size. In Function Compute, however, the /tmp cache directory is limited to 512 MB and the zip code package is limited to 50 MB. Fortunately, the community's aws-lambda-libreoffice project [2] has already migrated LibreOffice to the AWS Lambda platform. Building on its methods and experience, I created the fc-libreoffice project, which enables LibreOffice to run on Alibaba Cloud Function Compute. On top of aws-lambda-libreoffice, fc-libreoffice does the following:

  • Recompiles and trims LibreOffice to match the built-in gcc and kernel versions of the FC nodejs8 runtime environment.
  • Installs the libssl3 dependency, provided by the libnss3 package, which is missing at runtime.
  • Downloads the package from OSS and extracts it at runtime to work around the 50 MB limit on zip packages.
  • Provides an example project that supports one-click deployment for a quick trial.
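The size constraints above can be expressed as a small decision helper. This is only a sketch: the function and constant names are hypothetical, "MB" is assumed to mean 2^20 bytes, and the rule that a downloaded archive and its extracted tree must both fit in /tmp is a simplifying assumption rather than something fc-libreoffice enforces.

```javascript
// Limits quoted in the text; treating 1 MB as 1024 * 1024 bytes is an assumption.
const MB = 1024 * 1024;
const ZIP_LIMIT_BYTES = 50 * MB;   // limit on the zip code package
const TMP_LIMIT_BYTES = 512 * MB;  // limit on the /tmp directory

// Decide how an artifact could be shipped, given its compressed and unpacked sizes.
function chooseDelivery(compressedBytes, unpackedBytes) {
  // Small enough to ship inside the code package and unpack into /tmp.
  if (compressedBytes <= ZIP_LIMIT_BYTES && unpackedBytes <= TMP_LIMIT_BYTES) {
    return 'bundle-in-zip';
  }
  // Too big for the zip, but the archive plus the extracted files still fit in /tmp.
  if (compressedBytes + unpackedBytes <= TMP_LIMIT_BYTES) {
    return 'download-from-oss';
  }
  // Needs further trimming or stronger compression.
  return 'too-large';
}
```

An archive that exceeds the zip limit but still fits in /tmp once extracted falls into the second branch, which is the case this post addresses.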

This document focuses on the entire migration process and records the key steps, so that it can serve as a reference when migrating similar conversion tools to Function Compute in the future. If you are interested in how to quickly build a cheap and scalable Word-to-PDF cloud service, see Launching a Word-to-PDF Cloud Service on Function Compute.

Preparation

We recommend that you prepare a high-specification Debian/Ubuntu machine because compiling LibreOffice is resource-intensive. Install and configure the following tools on the machine.

fun is the command line tool used later in this post to deploy the function. On macOS, install it with Homebrew:

brew tap vangie/formula
brew install fun

On other platforms, install it with npm:

npm install @alicloud/fun -g

ossutil is the command line tool for OSS. Download it and place it in a directory that is on your $PATH.

Compile LibreOffice

We use the aliyunfc/runtime-nodejs8:build Docker image provided by fc-docker to compile LibreOffice. fc-docker provides a range of Docker images whose environments closely match the actual Function Compute runtime environments. Because we will run LibreOffice in the nodejs8 environment, aliyunfc/runtime-nodejs8:build is used in this case. Compared with the other images, the images with the build tag come with more basic packages preinstalled.

Start a Compilation Environment

Run the following command to start a container for building LibreOffice.

docker run --name libre-builder --rm  -v $(pwd):/code -d -t --cap-add=SYS_PTRACE --security-opt seccomp=unconfined aliyunfc/runtime-nodejs8:build bash

This starts a container named libre-builder and mounts the current directory to the /code directory inside the container. The additional parameters --cap-add=SYS_PTRACE and --security-opt seccomp=unconfined are required for compiling C++ programs. Without them, you will be prompted with warnings. Here, -d runs the container in the background (detached mode) and -t allocates a tty. The trailing bash command keeps the container from exiting, and --rm removes the container automatically once it stops.

Install the Compilation Tool

Now, enter the container and install the build tools.

apt-get install -y ccache
apt-get build-dep -y libreoffice

ccache is a compiler cache that speeds up repeated GCC compilations of the same program. The first compilation still takes a relatively long time, but subsequent compilations are significantly faster.

The build-dep subcommand of apt-get sets up a build environment for the program to be compiled. Specifically, it installs all the tools and packages required to build it.

Clone the Source Code

git clone --depth=1 git://anongit.freedesktop.org/libreoffice/core libreoffice
cd libreoffice

The --depth=1 parameter is added because a full clone of a project as large as LibreOffice is time-consuming, and the Git commit history is not needed for compilation.

Configure and Compile

# When the program is compiled multiple times, a larger cache speeds up subsequent compilations.
ccache --max-size 16G && ccache -s

Disable unwanted modules with the --disable options to reduce the size of the build output.

# The most important part. Run ./autogen.sh --help to see what each option means
./autogen.sh --disable-report-builder --disable-lpsolve --disable-coinmp \
    --enable-mergelibs --disable-odk --disable-gtk --disable-cairo-canvas \
    --disable-dbus --disable-sdremote --disable-sdremote-bluetooth --disable-gio --disable-randr \
    --disable-gstreamer-1-0 --disable-cve-tests --disable-cups --disable-extension-update \
    --disable-postgresql-sdbc --disable-lotuswordpro --disable-firebird-sdbc --disable-scripting-beanshell \
    --disable-scripting-javascript --disable-largefile --without-helppack-integration \
    --without-system-dicts --without-java --disable-gtk3 --disable-dconf --disable-gstreamer-0-10 \
    --disable-firebird-sdbc --without-fonts --without-junit --with-theme="no" --disable-evolution2 \
    --disable-avahi --without-myspell-dicts --with-galleries="no" \
    --disable-kde4 --with-system-expat --with-system-libxml --with-system-nss \
    --disable-introspection --without-krb5 --disable-python --disable-pch \
    --with-system-openssl --with-system-curl --disable-ooenv --disable-dependency-tracking

Start compiling

make

The compilation result is stored in the ./instdir/ directory.

Reduce the Size

Run the strip command to remove symbols and debugging information from the binary files.

# this will remove ~100 MB of symbols from shared objects
strip ./instdir/**/*

Delete the unnecessary files.

# remove unneeded stuff for headless mode
rm -rf ./instdir/share/gallery \
    ./instdir/share/config/images_*.zip \
    ./instdir/readmes \
    ./instdir/CREDITS.fodt \
    ./instdir/LICENSE* \
    ./instdir/NOTICE

Verification

Run the following command to test whether the compiled soffice can properly convert a .txt file to a .pdf file.

echo "hello world" > a.txt
./instdir/program/soffice --headless --invisible --nodefault --nofirststartwizard \
    --nolockcheck --nologo --norestore --convert-to pdf --outdir $(pwd) a.txt
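Because the function itself runs on Node.js, the same invocation can be built programmatically. The following is a minimal sketch, not the fc-libreoffice implementation: the helper names (buildConvertArgs, expectedOutputPath, convert) are hypothetical, and the flags are copied from the verification command above.

```javascript
const path = require('path');
const { execFileSync } = require('child_process');

// Flags copied from the verification command in this section.
const HEADLESS_FLAGS = [
  '--headless', '--invisible', '--nodefault', '--nofirststartwizard',
  '--nolockcheck', '--nologo', '--norestore',
];

// Build the argument list for a single conversion.
function buildConvertArgs(inputFile, outDir, format = 'pdf') {
  return [...HEADLESS_FLAGS, '--convert-to', format, '--outdir', outDir, inputFile];
}

// soffice writes the result to outDir, named after the input file with the new extension.
function expectedOutputPath(inputFile, outDir, format = 'pdf') {
  const base = path.basename(inputFile, path.extname(inputFile));
  return path.join(outDir, `${base}.${format}`);
}

// Run the conversion; sofficePath is wherever the compiled binary lives,
// for example ./instdir/program/soffice in the build container.
function convert(sofficePath, inputFile, outDir, format = 'pdf') {
  execFileSync(sofficePath, buildConvertArgs(inputFile, outDir, format), { stdio: 'inherit' });
  return expectedOutputPath(inputFile, outDir, format);
}
```

Passing the arguments as an array to execFileSync avoids shell quoting problems when file names contain spaces.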

Package

# archive
tar -zcvf lo.tar.gz instdir

Run the following command to copy the lo.tar.gz file in the container file system to the host file system.

docker cp libre-builder:/code/libreoffice/lo.tar.gz ./lo.tar.gz

Gzip vs Zopfli vs Brotli

Gzip, Zopfli, and Brotli are three open source compression algorithms. The following table compares their results when compressing a chromium file of about 130 MB:

File         Algorithm  Size (MiB)  Compression Ratio  Decompression Time
chromium     -          130.62      -                  -
chromium.gz  Gzip       44.13       66.22%             0.968s
chromium.gz  Zopfli     43.00       67.08%             0.935s
chromium.br  Brotli     33.21       74.58%             0.712s

The preceding table shows that Brotli performs best: it produces the smallest file and decompresses the fastest.
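The compression ratio column appears to be the relative size reduction, (1 - compressed / original) × 100. The short sketch below recomputes it from the rounded MiB values in the table. The function name is hypothetical, and because the inputs are rounded, the results agree with the table only to within about 0.01 percentage points.

```javascript
// Percentage by which compression shrinks a file: (1 - compressed / original) * 100.
function sizeReductionPercent(originalMiB, compressedMiB) {
  return (1 - compressedMiB / originalMiB) * 100;
}

// Sizes in MiB, taken from the table.
const originalMiB = 130.62;
const compressedMiB = { Gzip: 44.13, Zopfli: 43.00, Brotli: 33.21 };

for (const [name, size] of Object.entries(compressedMiB)) {
  console.log(`${name}: ${sizeReductionPercent(originalMiB, size).toFixed(2)}%`);
}
```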

aliyunfc/runtime-nodejs8:build is based on Debian Jessie, on which Brotli is difficult to install. Therefore, we use an Ubuntu container to convert the tar.gz file to a tar.br file.

docker run --name brotli-util --rm -v $(pwd):/root -w /root -d -t ubuntu:18.04 bash
docker exec -t brotli-util apt-get update
docker exec -t brotli-util apt-get install -y brotli
docker exec -t brotli-util gzip -d lo.tar.gz
docker exec -t brotli-util brotli -q 11 -j -f lo.tar

In the current directory, a lo.tar.br file is generated.
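As an alternative to the Ubuntu container, a recent Node.js on the build machine could do the same conversion, because the built-in zlib module gained Brotli support in Node.js 10.16 and 11.7. This does not apply to the nodejs8 runtime itself. The sketch below works on in-memory buffers for brevity, and the function name gzipToBrotli is hypothetical.

```javascript
const zlib = require('zlib');

// Re-encode gzip-compressed bytes as Brotli at maximum quality (comparable to `brotli -q 11`).
function gzipToBrotli(gzBuffer) {
  const raw = zlib.gunzipSync(gzBuffer);
  return zlib.brotliCompressSync(raw, {
    params: {
      [zlib.constants.BROTLI_PARAM_QUALITY]: 11,
      [zlib.constants.BROTLI_PARAM_SIZE_HINT]: raw.length,
    },
  });
}

// Round trip on a small in-memory payload to check that no data is lost.
const sample = Buffer.from('hello world\n'.repeat(1000));
const brotliBytes = gzipToBrotli(zlib.gzipSync(sample));
console.log(zlib.brotliDecompressSync(brotliBytes).equals(sample));
```

For an archive as large as lo.tar, the streaming variants (zlib.createGunzip and zlib.createBrotliCompress) would be a better fit than the synchronous calls, which hold the whole file in memory.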

Install Dependencies

To run soffice in the nodejs8 environment of Function Compute, you must install two dependencies: the npm package @shelf/aws-lambda-brotli-unpacker, which decompresses tar.br files, and libnss3, which is installed with apt-get. Start a nodejs8 container so that the dependencies are installed in an environment consistent with the runtime environment.

docker run --rm --name libreoffice-builder -t -d -v $(pwd):/code --entrypoint /bin/sh aliyunfc/runtime-nodejs8

Note: @shelf/aws-lambda-brotli-unpacker has a native binding, so running npm install on macOS and then packaging and uploading the result does not work.

docker exec -t libreoffice-builder npm install

Because deb packages cannot be installed globally in the Function Compute runtime, download libnss3 and the deb packages it depends on, and extract them into the current working directory rather than into the system directories. The extracted files can then be packaged and uploaded along with the code.

docker exec -t libreoffice-builder apt-get install -y -d -o=dir::cache=/code libnss3
docker exec -t libreoffice-builder bash -c 'for f in $(ls /code/archives/*.deb); do dpkg -x $f $(pwd) ; done;'

libnss3 contains many .so shared library files. On Linux, the dynamic linker finds shared libraries only in the default system directories and in the directories listed in the LD_LIBRARY_PATH environment variable. On Function Compute, the /code/lib directory is included in LD_LIBRARY_PATH by default. Therefore, the following script links all .so files into the /code/lib directory.

docker exec -t libreoffice-builder bash -c "rm -rf /code/archives/; mkdir -p /code/lib;cd /code/lib; find ../usr/lib -type f \( -name '*.so' -o -name '*.chk' \) -exec ln -sf {} . \;"
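For readers who prefer to script this step in Node.js, the linking logic of the find and ln command can be sketched as follows. The function names are hypothetical, and the sketch assumes a Node.js version that supports the recursive option of fs.mkdirSync (10.12 or later), so it is meant for the build machine rather than the nodejs8 runtime.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect regular files under dir whose names end with one of the suffixes
// (the equivalent of `find dir -type f -name '*.so' -o -name '*.chk'`).
function findFiles(dir, suffixes) {
  const found = [];
  for (const name of fs.readdirSync(dir)) {
    const full = path.join(dir, name);
    const stat = fs.lstatSync(full);
    if (stat.isDirectory()) {
      found.push(...findFiles(full, suffixes));
    } else if (stat.isFile() && suffixes.some((suffix) => name.endsWith(suffix))) {
      found.push(full);
    }
  }
  return found;
}

// Create a relative symlink in libDir for every matching file, replacing
// any existing link of the same name (the equivalent of `ln -sf`).
function linkLibraries(srcDir, libDir, suffixes = ['.so', '.chk']) {
  fs.mkdirSync(libDir, { recursive: true });
  const linked = [];
  for (const file of findFiles(srcDir, suffixes)) {
    const linkPath = path.join(libDir, path.basename(file));
    try {
      fs.unlinkSync(linkPath);
    } catch (err) {
      if (err.code !== 'ENOENT') throw err;
    }
    fs.symlinkSync(path.relative(libDir, file), linkPath);
    linked.push(path.basename(file));
  }
  return linked;
}

// Example call, mirroring the shell command above:
// linkLibraries('/code/usr/lib', '/code/lib');
```

Relative link targets are used so that the links stay valid as long as the usr and lib directories are packaged and deployed together.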

Download and Decompress the tar.br Package

To use the lo.tar.br file, upload it to OSS first.

ossutil cp $SCRIPT_DIR/../node_modules/fc-libreoffice/bin/lo.tar.br oss://${OSS_BUCKET}/lo.tar.br \
     -i ${ALIBABA_CLOUD_ACCESS_KEY_ID} -k ${ALIBABA_CLOUD_ACCESS_KEY_SECRET} -e oss-${ALIBABA_CLOUD_DEFAULT_REGION}.aliyuncs.com -f

Then download the tar.br package in the initializer of the function.

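const fs = require('fs');
const co = require('co');
// Assumption: OSS is the client class of the ali-oss SDK; the original snippet omits its imports.
const OSS = require('ali-oss');

// Assumption: local path of the downloaded archive; the original snippet does not show its value.
const binPath = '/tmp/lo.tar.br';
let store;

module.exports.initializer = (context, callback) => {

    // Create the OSS client with the temporary credentials supplied in the context.
    store = new OSS({
        region: `oss-${process.env.ALIBABA_CLOUD_DEFAULT_REGION}`,
        bucket: process.env.OSS_BUCKET,
        accessKeyId: context.credentials.accessKeyId,
        accessKeySecret: context.credentials.accessKeySecret,
        stsToken: context.credentials.securityToken,
        internal: process.env.OSS_INTERNAL === 'true'
    });

    // Skip the download if the archive is already in place.
    if (fs.existsSync(binPath) === true) {
        callback(null, "already downloaded.");
        return;
    }

    // Download lo.tar.br from the bucket to binPath.
    co(store.get('lo.tar.br', binPath)).then(function (val) {
        callback(null, val)
    }).catch(function (err) {
        callback(err)
    });
};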
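The download-once logic of the initializer can also be written with async/await and an injected client, which makes it easy to exercise without a real bucket. This is a sketch under assumptions: ensureDownloaded and makeInitializer are hypothetical names, and the store argument is assumed to be any object whose get(key, destPath) returns a promise.

```javascript
const fs = require('fs');

// Download `key` to `destPath` unless the file already exists there.
// `store` is any object with a promise-returning get(key, destPath) method.
async function ensureDownloaded(store, key, destPath) {
  if (fs.existsSync(destPath)) {
    return 'already downloaded.';
  }
  await store.get(key, destPath);
  return 'downloaded.';
}

// Adapt the helper to the (context, callback) initializer signature used above.
// createStore(context) builds the client from the credentials in the context.
function makeInitializer(createStore, key, destPath) {
  return (context, callback) => {
    ensureDownloaded(createStore(context), key, destPath)
      .then((message) => callback(null, message), (err) => callback(err));
  };
}
```

Because /tmp survives between invocations that reuse the same container, the existence check lets such invocations skip the download.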

Use the @shelf/aws-lambda-brotli-unpacker npm package to decompress the lo.tar.br package.

const path = require('path');
const {unpack} = require('@shelf/aws-lambda-brotli-unpacker');
const {execSync} = require('child_process');

const inputPath = path.join(__dirname, '..', 'bin', 'lo.tar.br');
const outputPath = '/tmp/instdir/program/soffice';

module.exports.handler = async event => {
  // Decompress lo.tar.br so that the soffice binary exists at outputPath.
  await unpack({inputPath, outputPath});

  // Convert the document with the unpacked soffice binary.
  execSync(`${outputPath} --convert-to pdf --outdir /tmp /tmp/example.docx`);
};

Deploy Functions on Fun

Create a template.yml file and write all Function Compute settings to it. Then run the fun deploy command to deploy the function.

ROSTemplateFormatVersion: '2015-09-01'
Transform: 'Aliyun::Serverless-2018-04-03'
Resources:
  libre-svc: # service name
    Type: 'Aliyun::Serverless::Service'
    Properties:
      Description: 'fc test'
      Policies: 
        - AliyunOSSFullAccess
    libre-fun: # function name
      Type: 'Aliyun::Serverless::Function'
      Properties:
        Handler: index.handler
        Initializer: index.initializer
        Runtime: nodejs8
        CodeUri: './'
        Timeout: 60
        MemorySize: 640
        EnvironmentVariables:
          ALIBABA_CLOUD_DEFAULT_REGION: ${ALIBABA_CLOUD_DEFAULT_REGION}
          OSS_BUCKET: ${OSS_BUCKET}
          OSS_INTERNAL: 'true'

In actual scenarios, it is inappropriate to hardcode keys and environment-specific variables in the template.yml file. To separate the code from the settings, the variable placeholders ${ALIBABA_CLOUD_DEFAULT_REGION} and ${OSS_BUCKET} are used in the example.

Replace the placeholders with envsubst.

SCRIPT_DIR=`dirname -- "$0"`
source $SCRIPT_DIR/../.env

export ALIBABA_CLOUD_DEFAULT_REGION OSS_BUCKET
envsubst < $SCRIPT_DIR/../template.yml.tpl > $SCRIPT_DIR/../template.yml

cd $SCRIPT_DIR/../

All the preceding settings are stored in the .env file. dotenv is a common community convention that is supported by many tools.
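To make the substitution step concrete, here is a rough Node.js equivalent of reading a .env file and filling in the ${NAME} placeholders. It is only a sketch with hypothetical function names: parseEnv handles a small subset of the dotenv syntax, and substitute handles only the braced ${NAME} form, whereas envsubst also accepts $NAME.

```javascript
// Parse KEY=VALUE lines, skipping blank lines and # comments (a small subset of dotenv syntax).
function parseEnv(text) {
  const vars = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue;
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue;
    const key = trimmed.slice(0, eq).trim();
    // Strip one pair of matching single or double quotes around the value.
    const value = trimmed.slice(eq + 1).trim().replace(/^(['"])(.*)\1$/, '$2');
    vars[key] = value;
  }
  return vars;
}

// Replace ${NAME} placeholders; unset names become empty strings, as with envsubst.
function substitute(template, vars) {
  return template.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (match, name) =>
    (Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : ''));
}
```

For example, substitute('OSS_BUCKET: ${OSS_BUCKET}', parseEnv('OSS_BUCKET=my-bucket')) fills in the bucket line of the template, where my-bucket is a made-up value.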

Conclusion

This document explains how to compile LibreOffice, which is the challenging part of the migration process. Running LibreOffice also requires an npm package with a native binding and apt-get packages extracted into a local directory, so this example also shows how to handle both kinds of dependencies on Function Compute. The steps in this document rely heavily on the fc-docker images for both compilation and dependency installation. These images resolve the problem of environmental differences, greatly reducing migration difficulty.

Loading a large file at runtime is another common Function Compute problem. In the conversion tool scenario, the large file is usually the binary program. In the machine learning scenario, it is normally the data file of a trained model. In both scenarios, you can download the package from OSS and decompress it at runtime. Given that Function Compute now supports NAS, mounting shared network storage through NAS is also an option.

For the full source code used in this document, refer to the fc-libreoffice project.

References

1.  https://en.wikipedia.org/wiki/LibreOffice
2.  How to Run LibreOffice in AWS Lambda for Dirty-Cheap PDFs at Scale
3.  https://github.com/alixaxel/chrome-aws-lambda
4.  https://github.com/shelfio/aws-lambda-brotli-unpacker
