Elastic Compute Service: Deploy an end-to-end RAG solution on a TDX-enabled instance

Last Updated: Jan 09, 2025

This topic describes how to deploy a Retrieval-Augmented Generation (RAG) solution on an Elastic Compute Service (ECS) instance of the g8i general-purpose instance family with Intel® Trust Domain Extensions (TDX) enabled.

Background information

RAG is a state-of-the-art AI application framework that allows large language models (LLMs) to provide answers based on data stored in an organization's private knowledge bases. RAG is widely used in scenarios such as enterprise-level knowledge bases, chatbots, and AI assistants. Because the security and privacy of this private data are major considerations, RAG requires a secure and trusted environment.

The Alibaba Cloud ECS instances of the g8i general-purpose instance family with Intel® TDX enabled (also known as TDX-enabled instances) are an ideal choice for this use case. TDX-enabled instances provide hardware-based trusted confidential environments to ensure the confidentiality and integrity of tenant system-level data during runtime.

This topic provides an end-to-end guide on how to deploy a RAG-powered chatbot based on the Haystack software stack. A TDX-enabled ECS instance is used to ensure the privacy and security of user data at each stage. This topic helps you achieve the following goals:

  • Deepen your understanding of the TDX encryption technology deployed on Alibaba Cloud servers.

  • Understand the end-to-end data security solution that is based on TDX encryption technology.

  • Obtain a TDX framework and scripts to quickly get started with TDX-enabled instances.

Architecture description

When RAG generates answers, it uses pre-trained LLMs together with knowledge fragments extracted from the knowledge base, which enriches the content and improves the accuracy and variety of the answers. In the knowledge extraction phase, RAG measures the similarity between word vectors to identify the content that best matches the question. In the answer generation phase, the selected knowledge data is injected into the LLM prompt to generate answers that better fit the context.

The RAG workflow consists of the following parts, as sketched in the example after this list:

  • Document processing: You encrypt the uploaded documents, decrypt the documents on a TDX-enabled instance, split and vectorize the documents, and then write the documents to a database.

  • Data retrieval: A retrieval model looks for relevant segments from the database and ranks the retrieved content based on the relevance of the content to your question.

  • Answer generation: An LLM generates an answer by combining the prompt words and the retrieved document content.
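
The following minimal sketch illustrates these three stages in plain Python. It is for orientation only and is not part of the CCZoo code: the embed function is a stand-in for the encoder models that this solution actually uses, and the sample segments are hypothetical.

    import numpy as np

    def embed(text, dim=8):
        # Stand-in for a real encoder model: a deterministic pseudo-embedding
        # keeps the example self-contained.
        rng = np.random.default_rng(sum(text.encode()))
        vector = rng.random(dim)
        return vector / np.linalg.norm(vector)

    # Document processing: split a document into segments and vectorize them.
    segments = ["TDX encrypts guest memory at runtime.",
                "LUKS encrypts data that is written to disks."]
    index = {segment: embed(segment) for segment in segments}

    # Data retrieval: rank segments by cosine similarity to the question.
    # The vectors are normalized, so the dot product is the cosine similarity.
    question = "How is data on disks protected?"
    query_vector = embed(question)
    ranked = sorted(segments, key=lambda s: float(query_vector @ index[s]), reverse=True)

    # Answer generation: inject the best-matching segment into the LLM prompt.
    prompt = f"Context: {ranked[0]}\nQuestion: {question}\nAnswer:"
    print(prompt)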

Traditional RAG solution

Traditional RAG frameworks are exposed to security threats at multiple points: when data is written to disks, when queries are submitted at the frontend, and within the databases and LLMs themselves. The following figure shows a traditional RAG framework.

[Figure: traditional RAG framework]

RAG solution deployed in a TDX environment

The following figure shows the RAG framework deployed on a TDX-enabled instance.

[Figure: RAG framework deployed on a TDX-enabled instance]

Solution composition

The RAG architecture consists of two processes: deployment in the cloud and online Q&A processing. RAG deployment in the cloud involves the following operations. A sketch of how the modules can be wired together follows the list.

  • A service deployer deploys the RAG service on a TDX-enabled instance. The RAG service consists of the document splitting module, vector database module, ranking module, LLM module, and frontend module.

    • Document splitting module: extracts and splits text from uploaded documents.

    • Vector database module: vectorizes the formatted data that is generated by the document splitting module and stores the data in the database. A combination of Facebook AI Similarity Search (Faiss) and MySQL is used in this architecture.

    • Ranking module: compares the vectorized question with the data in the vector database and passes the text with the highest similarity to the LLM.

    • LLM module: uses the text output from the ranking module and specific prompt words to generate the final answers.

    • Frontend module: the interface where users ask questions and obtain answers from the LLM.

  • The service deployer uploads documents to be analyzed to the database.
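
Under the assumption that the service is built on the Haystack 1.x software stack, the following sketch shows how these modules might be wired together. The database host and database name are placeholders, the model paths refer to the directories that are downloaded in Step 4, and the actual wiring in the CCZoo images may differ.

    from haystack.document_stores import FAISSDocumentStore
    from haystack.nodes import DensePassageRetriever, SentenceTransformersRanker
    from haystack.pipelines import Pipeline

    # Vector database module: Faiss stores the embeddings and MySQL stores the
    # document text. <db-host> and the database name are placeholders.
    document_store = FAISSDocumentStore(sql_url="mysql://root:123456@<db-host>/haystack")

    # Encoder models vectorize the passages and the questions.
    retriever = DensePassageRetriever(
        document_store=document_store,
        query_embedding_model="dpr-question_encoder-single-nq-base",
        passage_embedding_model="dpr-ctx_encoder-single-nq-base",
    )

    # Ranking module: re-ranks the retrieved passages against the question.
    ranker = SentenceTransformersRanker(model_name_or_path="ms-marco-MiniLM-L-12-v2")

    pipeline = Pipeline()
    pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
    pipeline.add_node(component=ranker, name="Ranker", inputs=["Retriever"])

    # The top-ranked documents are then injected into the prompt of the LLM
    # module (Llama-2-7b-chat-hf) to generate the final answer.
    results = pipeline.run(query="What do the uploaded documents say?")
    top_documents = results["documents"]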

Benefits

The RAG solution can address threats in the traditional RAG solution and provide the following benefits:

  • A communication scheme based on TDX Remote Attestation-Transport Layer Security (RA-TLS) is built. In this scheme, remote attestation is performed between the frontend and backend to protect user requests.

  • RAG can be run in the Trust Domain (TD) virtual machine (VM) to protect the runtime.

  • Data can be stored in always-confidential databases, where data remains confidential at all times.

  • Linux Unified Key Setup (LUKS) is used to protect data when data is written to disks, which secures the uploaded documents and LLMs.

Security and protection

The RAG solution provides data security and privacy protection mainly in the following aspects:

  • Memory encryption: RAG frontend and backend services run in the TD environment in which memory is encrypted. This prevents malicious parties from stealing the data of programs that run in the TD environment.

  • RA-TLS communication: When the Docker images in the RAG framework are deployed on different TDX-enabled instances, the RA-TLS communication scheme can be used to verify the identity of the remote node and ensure data security during transmission, as shown in the sketch after this list. For more information about RA-TLS, see RA-TLS Enhanced gRPC.

  • Linux Unified Key Setup (LUKS) encryption and Object Storage Service (OSS): The LUKS encryption technology and OSS are used to protect data when data in the database is written to disks. This prevents malicious parties from stealing model information from disks.
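
For orientation, the following client-side sketch shows where RA-TLS plugs into a gRPC connection. It uses the standard grpcio TLS API and is not the CCZoo implementation: in RA-TLS, the server presents a self-signed certificate that embeds a TDX attestation quote, and the CA-based check in this sketch is replaced by verification of that quote. The certificate file name and the backend address are placeholders.

    import grpc

    # With RA-TLS, no CA file is involved: trust is established by verifying
    # the TDX attestation quote that is embedded in the server certificate.
    with open("ca.crt", "rb") as f:
        root_certificate = f.read()

    credentials = grpc.ssl_channel_credentials(root_certificates=root_certificate)
    # <backend-host> and <port> are placeholders for the backend service address.
    channel = grpc.secure_channel("<backend-host>:<port>", credentials)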

Procedure

Step 1: Create a TDX-enabled instance

  1. Go to the instance buy page in the ECS console.

  2. Configure parameters as prompted to create an ECS instance.

    Take note of the following parameters. For information about how to configure other parameters on the ECS instance buy page, see Create an instance on the Custom Launch tab.

    • Instance: To ensure that the models run stably, select ecs.g8i.4xlarge (which provides 64 GiB of memory) or a higher instance type.

    • Image: Select Alibaba Cloud Linux 3.2104 LTS 64-bit (UEFI) as the image version and select Confidential VM.

    • Public IP Address: To accelerate model downloads, select Assign Public IPv4 Address, set Bandwidth Billing Method to Pay-by-traffic, and then set Maximum Bandwidth to 100 Mbit/s.

    • Data Disk: We recommend that you set the data disk size to at least 100 GiB.

    • Security Group: Open ports 22, 80, and 8502.

  3. Install Python 3.8.

    The instance comes with Python 3.6 by default, but this RAG solution requires Python 3.8 or later. You must manually install Python 3.8.

    1. Install the Python 3.8 package.

      sudo yum install -y python38
    2. Configure Python 3.8 as the default Python version.

      sudo update-alternatives --config python

      Follow the prompts and enter the number that corresponds to Python 3.8 to set it as the default version. In this example, the number is 4.

    3. Update pip for the new default Python version.

      sudo python -m ensurepip --upgrade
      sudo python -m pip install --upgrade pip
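
      To confirm that the default interpreter now meets the requirement, you can run the following optional check:

      import sys
      # The RAG solution requires Python 3.8 or later.
      assert sys.version_info >= (3, 8), f"Python 3.8+ required, found {sys.version}"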

Step 2: Deploy a Docker image

  1. Install Docker on the TDX-enabled instance that you created.

    For more information, see the Alibaba Cloud Linux 3 part of the "Install Docker" section in the "Install and use Docker on a Linux instance" topic.

  2. Download Confidential Computing Zoo (CCZoo) to the ECS instance.

    CCZoo provides example end-to-end security solutions for different scenarios based on Intel Trusted Execution Environment (TEE) technologies, including Software Guard Extensions (SGX) and TDX. CCZoo helps you develop confidential computing solutions in a simpler, case-by-case manner.

    Note

    Replace the <workdir> value with the actual directory. In this example, /home/ecs-user is used.

    cd <workdir>
    git clone https://github.com/intel/confidential-computing-zoo.git
  3. Download or compile the Docker image.

    Note

    The system requires an extended period of time to download or compile the Docker images.

    • Download the Docker image from Docker Hub.

      sudo docker pull intelcczoo/tdx-rag:backend
      sudo docker pull intelcczoo/tdx-rag:frontend
    • Compile the Docker image.

      cd confidential-computing-zoo/cczoo/rag
      ./build-images.sh

Step 3: Create an encrypted partition

  1. Create an encrypted partition.

    Create an encrypted partition to store model files and document data.

    cd confidential-computing-zoo/cczoo/rag/luks_tools
    sudo yum install -y cryptsetup
    VFS_SIZE=30G
    VIRTUAL_FS=/home/vfs
    sudo ./create_encrypted_vfs.sh ${VFS_SIZE} ${VIRTUAL_FS}
    1. When the Are you sure? (Type 'yes' in capital letters): message appears, enter YES.

    2. When the Enter passphrase for /home/vfs: message appears, enter a password for the encrypted partition.

      After the encrypted partition is created, the system outputs the number of the bound loop device. In this example, the loop device is /dev/loop1.

  2. Configure the LOOP_DEVICE environment variable of the loop device.

    Replace the <the bound loop device> value with the loop device number obtained in the previous step.

    export LOOP_DEVICE=<the bound loop device>
  3. Format the block loop device to the Ext4 file system.

    1. Create the /home/encrypted_storage directory and grant permissions to the current user. In this example, ecs-user is used.

      sudo mkdir /home/encrypted_storage
      sudo chown -R ecs-user:ecs-user /home/encrypted_storage/
    2. Format the block loop device to the Ext4 file system.

      ./mount_encrypted_vfs.sh ${LOOP_DEVICE} format

      When the Enter passphrase for /home/vfs: message appears, enter the password that you specified when you created the encrypted partition.

      The command output indicates that the block loop device is formatted.

Step 4: Download document data and backend models

    Important

    Alibaba Cloud does not guarantee the legality, security, or accuracy of third-party models. Alibaba Cloud is not liable for any damages caused thereby.

    You must abide by the user agreements, usage specifications, and relevant laws and regulations of the third-party models. You agree that your use of third-party models is at your sole risk.

The following data and backend models are used by default:

  • Document data: Sample content in <workdir>/confidential-computing-zoo/cczoo/rag/data/data.txt.

  • Backend LLM: Llama-2-7b-chat-hf.

  • Ranking model: ms-marco-MiniLM-L-12-v2.

  • Encoder models: dpr-ctx_encoder-single-nq-base and dpr-question_encoder-single-nq-base.

This example describes how to download the required models from the ModelScope community and the Hugging Face website. If you want to download other models, you can use the following steps as a reference.

Note

You can also obtain uploaded models or data from OSS. For more information, see Download objects.

ossutil64 cp oss://<your dir>/<your file or your data> /home/encrypted_storage
  1. Switch to the /home/encrypted_storage directory.

    cd /home/encrypted_storage
  2. Install the modelscope runtime library and configure the environment variable for the library.

    pip install modelscope
    export MODELSCOPE_CACHE=/home/encrypted_storage
  3. Download the Llama-2-7b-chat-hf model.

    python3.8 -c "from modelscope import snapshot_download; model_dir = snapshot_download('shakechen/Llama-2-7b-chat-hf')"
    mv shakechen/Llama-2-7b-chat-hf Llama-2-7b-chat-hf
  4. Install the huggingface_hub runtime library and configure the environment variable for the library.

    pip install -U huggingface_hub
    export HF_ENDPOINT=https://hf-mirror.com
  5. Download pre-trained models such as the ranking model and encoder models.

    huggingface-cli download --resume-download --local-dir-use-symlinks False cross-encoder/ms-marco-MiniLM-L-12-v2 --local-dir ms-marco-MiniLM-L-12-v2
    huggingface-cli download --resume-download --local-dir-use-symlinks False facebook/dpr-ctx_encoder-single-nq-base --local-dir dpr-ctx_encoder-single-nq-base
    huggingface-cli download --resume-download --local-dir-use-symlinks False facebook/dpr-question_encoder-single-nq-base --local-dir dpr-question_encoder-single-nq-base
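
    As an optional check that is not part of the CCZoo scripts, you can confirm that a downloaded model is readable from the encrypted partition. The following snippet assumes that the transformers library is installed in your Python 3.8 environment:

    # Load the ranking model's tokenizer from the encrypted partition to confirm
    # that the download is complete and the LUKS-backed mount is readable.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("/home/encrypted_storage/ms-marco-MiniLM-L-12-v2")
    print(type(tokenizer).__name__)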

Step 5: Activate the RAG service

  1. Switch to the rag directory.

    cd <workdir>/confidential-computing-zoo/cczoo/rag
  2. Start the database service container.

    sudo ./run.sh db

    If the command output indicates that the database service is running, the database service container is started.

  3. Start the backend service container.

    sudo ./run.sh backend

    After the script is executed, the content of the data.txt file is split and stored in the database. When prompted, enter the IP address of the database, which is the public IP address of the ECS instance, the default database account root, and the default database password 123456.

    For information about how to use the always-confidential database feature to encrypt sensitive data in a database, see Overview.

    If the command output indicates that the backend service is running, the backend service container is started.

  4. Open a new terminal session and start the frontend service container.

    cd <workdir>/confidential-computing-zoo/cczoo/rag
    sudo ./run.sh frontend

    If the command output indicates that the frontend service is running, the frontend service container is started. The output includes the external URL of the frontend service.

  5. In a browser on your on-premises computer, enter the external URL that you obtained in the previous step to start an AI conversation.

    A message on a green background indicates that a secure connection is established between the frontend and backend. For more information about custom modifications and issues of the RAG framework, see Haystack.
