This topic describes how to use a DataCache to accelerate the creation of an Alpaca-LoRa application. You can pull the llama-7b-hf model data and the alpaca-lora-7b weight data into a DataCache in advance. When you create the pod that corresponds to the Alpaca-LoRa application, you can mount the cached model data and weight data to the pod. This eliminates the need to pull data at startup and accelerates the startup of the Alpaca-LoRa application.
Background information
Alpaca-LoRa is a lightweight language model that is fine-tuned from the LLaMA (Large Language Model Meta AI) model by using the Low-Rank Adaptation (LoRA) technique. Alpaca-LoRa can simulate natural language dialogue and interaction, generate different texts based on the instructions entered by a user, and help users complete tasks such as writing, translation, and coding.
Alibaba Cloud does not guarantee the legality, security, or accuracy of third-party models. Alibaba Cloud is not liable for any damages caused thereby.
You must abide by the user agreements, usage specifications, and relevant laws and regulations of the third-party models. You agree that your use of the third-party models is at your sole risk.
Prerequisites
A DataCache custom resource definition (CRD) is deployed in the cluster. For more information, see Deploy a DataCache CRD.
The virtual private cloud (VPC) in which the cluster resides is associated with an Internet NAT gateway. An SNAT entry is configured for the Internet NAT gateway to allow resources in the VPC, or resources connected to vSwitches in the VPC, to access the Internet.
Note: If the VPC is not associated with an Internet NAT gateway, you must associate an elastic IP address (EIP) when you create the DataCache and when you deploy the application. This way, data can be pulled over the Internet.
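For the application pod, one way to attach an EIP is to use an ECI annotation, as in the following minimal sketch. The k8s.aliyun.com/eci-with-eip annotation is an assumption that you should verify against the ECI documentation for your cluster version, as is the equivalent setting on the DataCache side:
apiVersion: v1
kind: Pod
metadata:
  name: eip-example
  annotations:
    k8s.aliyun.com/eci-with-eip: "true"   # Assumption: automatically creates an EIP and associates it with the pod.
spec:
  containers:
  - name: app
    image: busybox   # Placeholder image for illustration.
    command: ["sleep", "3600"]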
Procedure
Create an Alpaca-LoRa image
Create an image based on your business requirements. A sketch of the typical commands appears after the following steps.
Visit alpaca-lora and clone the repository to your on-premises machine.
Modify the requirements.txt file and the Dockerfile in the repository.
Use the Dockerfile to build an image.
Push the image to the image repository.
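The following is a minimal sketch of these steps, assuming the upstream tloen/alpaca-lora repository on GitHub and a hypothetical namespace in your image repository; replace both with your actual values:
# Clone the repository to your on-premises machine.
git clone https://github.com/tloen/alpaca-lora.git
cd alpaca-lora

# After you modify requirements.txt and the Dockerfile, build the image.
docker build -t registry.cn-hangzhou.aliyuncs.com/<your-namespace>/alpaca-lora:v3.5 .

# Log on to the image repository, then push the image.
docker login registry.cn-hangzhou.aliyuncs.com
docker push registry.cn-hangzhou.aliyuncs.com/<your-namespace>/alpaca-lora:v3.5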
Create a DataCache
Visit Hugging Face and obtain the IDs of the models.
In this example, the following models are used. Find the models on Hugging Face and copy the model IDs from the upper part of each model details page.
decapoda-research/llama-7b-hf
tloen/alpaca-lora-7b
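To confirm that a model ID is valid before you create a DataCache, you can query the public Hugging Face Hub API, which returns JSON metadata for an existing repository. A minimal sketch, assuming your machine has Internet access:
# Returns model metadata as JSON if the ID exists; returns an error message otherwise.
curl -s https://huggingface.co/api/models/tloen/alpaca-lora-7b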
Create DataCaches
Create a DataCache for llama-7b-hf.
kubectl apply -f llama-7b-hf.yaml
The following shows the content of llama-7b-hf.yaml:
apiVersion: eci.aliyun.com/v1alpha1
kind: DataCache
metadata:
  name: llama-7b-hf
spec:
  path: /model/llama-7b-hf   # Specify the storage path of the model data.
  bucket: test               # Specify the bucket in which you want to store the DataCache.
  dataSource:
    type: URL
    options:
      repoSource: "HuggingFace/Model"           # Specify Hugging Face as the data source of the model.
      repoId: "decapoda-research/llama-7b-hf"   # Specify the ID of the model.
  netConfig:
    securityGroupId: sg-2ze63v3jtm8e6sy******
    vSwitchId: vsw-2ze94pjtfuj9vaym******       # Specify a vSwitch for which an SNAT entry is configured.
Create a DataCache for alpaca-lora-7b.
kubectl apply -f alpaca-lora-7b.yaml
The following shows the content of alpaca-lora-7b.yaml:
apiVersion: eci.aliyun.com/v1alpha1
kind: DataCache
metadata:
  name: alpaca-lora-7b
spec:
  path: /model/alpaca-lora-7b   # Specify the storage path of the weight data.
  bucket: test                  # Specify the bucket in which you want to store the DataCache.
  dataSource:
    type: URL
    options:
      repoSource: "HuggingFace/Model"   # Specify Hugging Face as the data source of the model.
      repoId: "tloen/alpaca-lora-7b"    # Specify the ID of the model.
  netConfig:
    securityGroupId: sg-2ze63v3jtm8e6sy******
    vSwitchId: vsw-2ze94pjtfuj9vaym******   # Specify a vSwitch for which an SNAT entry is configured.
Query the status of the DataCaches.
kubectl get edc
After the data is downloaded and the status of the DataCaches changes to Available, the DataCaches are ready for use.
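To block until a DataCache is ready instead of polling manually, you can watch the resource or use kubectl wait. A minimal sketch; the .status.phase field name is an assumption, so inspect the output of kubectl get edc llama-7b-hf -o yaml to confirm the exact status field of the CRD:
# Watch the DataCaches until their status changes.
kubectl get edc -w

# Block until the llama-7b-hf DataCache reports Available.
# Assumption: the state is exposed at .status.phase; adjust the JSONPath if the CRD uses a different field.
kubectl wait --for=jsonpath='{.status.phase}'=Available edc/llama-7b-hf --timeout=60m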
Deploy the Alpaca-LoRa application
Write a YAML configuration file for the Alpaca-LoRa application, and then use the YAML file to deploy the application.
kubectl create -f alpacalora.yaml
The following sample shows the content of alpacalora.yaml. The file creates two resource objects:
Deployment: The name of the Deployment is alpacalora. The Deployment contains one pod replica. The pod has an additional temporary storage space of 20 GiB, and the llama-7b-hf and alpaca-lora-7b DataCaches are mounted to the pod. The image of the container in the pod is the Alpaca-LoRa image that you created. After the container starts, it runs python3.10 generate.py --load_8bit --base_model /data/llama-7b-hf --lora_weights /data/alpaca-lora-7b.
Service: The name of the Service is alpacalora-svc. The type of the Service is LoadBalancer. The Service exposes port 80 and forwards traffic to port 7860 of pods that have the app: alpacalora label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpacalora
  labels:
    app: alpacalora
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alpacalora
  template:
    metadata:
      labels:
        app: alpacalora
      annotations:
        k8s.aliyun.com/eci-data-cache-bucket: "test"         # Specify the bucket in which the DataCache is stored.
        k8s.aliyun.com/eci-extra-ephemeral-storage: "20Gi"   # Increase the temporary storage space by 20 GiB.
    spec:
      containers:
      - name: alpacalora
        image: registry.cn-hangzhou.aliyuncs.com/****/alpaca-lora:v3.5   # Use the image that you created.
        command: ["/bin/sh","-c"]
        args: ["python3.10 generate.py --load_8bit --base_model /data/llama-7b-hf --lora_weights /data/alpaca-lora-7b"]   # Replace the arguments in the startup command with actual values.
        resources:
          limits:
            cpu: "16000m"
            memory: "64.0Gi"
        ports:
        - containerPort: 7860
        volumeMounts:
        - mountPath: /data/llama-7b-hf      # Specify the mount path of llama-7b-hf in the container.
          name: llama-model
        - mountPath: /data/alpaca-lora-7b   # Specify the mount path of alpaca-lora-7b in the container.
          name: alpacalora-weight
      volumes:
      - name: llama-model
        hostPath:
          path: /model/llama-7b-hf          # Specify the storage path of llama-7b-hf.
      - name: alpacalora-weight
        hostPath:
          path: /model/alpaca-lora-7b       # Specify the storage path of alpaca-lora-7b.
---
apiVersion: v1
kind: Service
metadata:
  name: alpacalora-svc
spec:
  ports:
  - port: 80
    targetPort: 7860
    protocol: TCP
  selector:
    app: alpacalora
  type: LoadBalancer
Check the deployment status of the application.
kubectl get deployment alpacalora
kubectl get pod
If the Deployment is available and the pod is in the Running state, the Alpaca-LoRa application is deployed.
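Loading the model weights can take some time after the pod enters the Running state. To confirm that the application is ready, you can stream the container logs; a minimal sketch:
# Stream the logs and wait until the application reports that the web server is listening on port 7860.
kubectl logs deployment/alpacalora -f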
View the IP address of the Service.
kubectl get svc alpacalora-svc
In this example, the IP address of the Service that is displayed in the EXTERNAL-IP column of the output is 123.57.XX.XX.
Test the model
Add an inbound rule that opens port 80 to the security group to which the pod belongs.
Open a browser and visit the external IP address of the Service over port 80. To first verify connectivity from a command line, see the sketch after these steps.
Enter text to test the model.
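Before you test in the browser, you can check that the Service responds over port 80. A minimal sketch, using the masked EXTERNAL-IP from the previous step; substitute your actual address:
# Send a request to the web UI through the LoadBalancer Service.
# Expect an HTTP response from the application listening on port 7860 behind the Service.
curl -i http://123.57.XX.XX/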