This topic describes how to use a DataCache to accelerate the creation of an Alpaca-LoRa application. You can pull the llama-7b-hf model data and the alpaca-lora-7b weight data into a DataCache in advance. When you create the pod that corresponds to the Alpaca-LoRa application, you can mount the cached model data and weight data to the pod. This eliminates the need to pull data at startup and accelerates the startup of the Alpaca-LoRa application.
Background information
Alpaca-LoRa is a lightweight language model that is fine-tuned from the LLaMA (Large Language Model Meta AI) model by using the Low-Rank Adaptation (LoRA) technique. Alpaca-LoRa can simulate natural language dialogue and interaction, generate different texts based on the instructions entered by a user, and help users complete tasks such as writing, translation, and coding.
Alibaba Cloud does not guarantee the legality, security, or accuracy of third-party models. Alibaba Cloud is not liable for any damages caused thereby.
You must abide by the user agreements, usage specifications, and relevant laws and regulations of the third-party models. You agree that your use of the third-party models is at your sole risk.
Prerequisites
A DataCache custom resource definition (CRD) is deployed in the cluster. For more information, see Deploy a DataCache CRD.
The virtual private cloud (VPC) in which the cluster resides is associated with an Internet NAT gateway. An SNAT entry is configured for the Internet NAT gateway to allow resources in the VPC, or resources connected to vSwitches in the VPC, to access the Internet.
Note: If the VPC is not associated with an Internet NAT gateway, you must associate an elastic IP address (EIP) when you create the DataCache and when you deploy the application. This way, data can be pulled over the Internet.
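For the application pod, one way to attach an EIP is to use an ECI annotation, as in the following minimal sketch. The k8s.aliyun.com/eci-with-eip annotation is an assumption that you should verify against the ECI documentation for your cluster version, as is the equivalent setting on the DataCache side:
apiVersion: v1
kind: Pod
metadata:
  name: eip-example
  annotations:
    k8s.aliyun.com/eci-with-eip: "true"   # Assumption: automatically creates an EIP and associates it with the pod.
spec:
  containers:
  - name: app
    image: busybox   # Placeholder image for illustration.
    command: ["sleep", "3600"]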
Procedure
Create an Alpaca-LoRa image
Create an image based on your business requirements. A sketch of the typical commands appears after the following steps.
Visit alpaca-lora and clone the repository to your on-premises machine.
Modify the requirements.txt file and the Dockerfile in the repository.
Use the Dockerfile to build an image.
Push the image to the image repository.
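The following is a minimal sketch of these steps, assuming the upstream tloen/alpaca-lora repository on GitHub and a hypothetical namespace in your image repository; replace both with your actual values:
# Clone the repository to your on-premises machine.
git clone https://github.com/tloen/alpaca-lora.git
cd alpaca-lora

# After you modify requirements.txt and the Dockerfile, build the image.
docker build -t registry.cn-hangzhou.aliyuncs.com/<your-namespace>/alpaca-lora:v3.5 .

# Log on to the image repository, then push the image.
docker login registry.cn-hangzhou.aliyuncs.com
docker push registry.cn-hangzhou.aliyuncs.com/<your-namespace>/alpaca-lora:v3.5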
Create a DataCache
Visit Hugging Face and obtain the IDs of the models.
In this example, the following models are used. Find the models on Hugging Face and copy the model IDs from the upper part of each model details page.
decapoda-research/llama-7b-hf
tloen/alpaca-lora-7b
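To confirm that a model ID is valid before you create a DataCache, you can query the public Hugging Face Hub API, which returns JSON metadata for an existing repository. A minimal sketch, assuming your machine has Internet access:
# Returns model metadata as JSON if the ID exists; returns an error message otherwise.
curl -s https://huggingface.co/api/models/tloen/alpaca-lora-7b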
Create DataCaches
Create a DataCache for llama-7b-hf.
kubectl apply -f llama-7b-hf.yaml
The following shows the content of llama-7b-hf.yaml:
apiVersion: eci.aliyun.com/v1alpha1
kind: DataCache
metadata:
  name: llama-7b-hf
spec:
  path: /model/llama-7b-hf   # Specify the storage path of the model data.
  bucket: test               # Specify the bucket in which you want to store the DataCache.
  dataSource:
    type: URL
    options:
      repoSource: "HuggingFace/Model"           # Specify Hugging Face as the data source of the model.
      repoId: "decapoda-research/llama-7b-hf"   # Specify the ID of the model.
  netConfig:
    securityGroupId: sg-2ze63v3jtm8e6sy******
    vSwitchId: vsw-2ze94pjtfuj9vaym******       # Specify a vSwitch for which an SNAT entry is configured.
Create a DataCache for alpaca-lora-7b.
kubectl apply -f alpaca-lora-7b.yaml
The following shows the content of alpaca-lora-7b.yaml:
apiVersion: eci.aliyun.com/v1alpha1
kind: DataCache
metadata:
  name: alpaca-lora-7b
spec:
  path: /model/alpaca-lora-7b   # Specify the storage path of the weight data.
  bucket: test                  # Specify the bucket in which you want to store the DataCache.
  dataSource:
    type: URL
    options:
      repoSource: "HuggingFace/Model"   # Specify Hugging Face as the data source of the model.
      repoId: "tloen/alpaca-lora-7b"    # Specify the ID of the model.
  netConfig:
    securityGroupId: sg-2ze63v3jtm8e6sy******
    vSwitchId: vsw-2ze94pjtfuj9vaym******   # Specify a vSwitch for which an SNAT entry is configured.
Query the status of the DataCaches.
kubectl get edc
After the data is downloaded and the status of the DataCaches changes to Available, the DataCaches are ready for use.
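To block until a DataCache is ready instead of polling manually, you can watch the resource or use kubectl wait. A minimal sketch; the .status.phase field name is an assumption, so inspect the output of kubectl get edc llama-7b-hf -o yaml to confirm the exact status field of the CRD:
# Watch the DataCaches until their status changes.
kubectl get edc -w

# Block until the llama-7b-hf DataCache reports Available.
# Assumption: the state is exposed at .status.phase; adjust the JSONPath if the CRD uses a different field.
kubectl wait --for=jsonpath='{.status.phase}'=Available edc/llama-7b-hf --timeout=60m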
Deploy the Alpaca-LoRa application
Write a YAML configuration file for the Alpaca-LoRa application, and then use the YAML file to deploy the application.
kubectl create -f alpacalora.yaml
The following sample shows the content of alpacalora.yaml. The file creates two resource objects:
Deployment: The name of the Deployment is alpacalora. The Deployment contains one pod replica. The pod has an additional temporary storage space of 20 GiB, and the llama-7b-hf and alpaca-lora-7b DataCaches are mounted to the pod. The image of the container in the pod is the Alpaca-LoRa image that you created. After the container starts, it runs python3.10 generate.py --load_8bit --base_model /data/llama-7b-hf --lora_weights /data/alpaca-lora-7b.
Service: The name of the Service is alpacalora-svc. The type of the Service is LoadBalancer. The Service exposes port 80 and forwards traffic to port 7860 of pods that have the app: alpacalora label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpacalora
  labels:
    app: alpacalora
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alpacalora
  template:
    metadata:
      labels:
        app: alpacalora
      annotations:
        k8s.aliyun.com/eci-data-cache-bucket: "test"         # Specify the bucket in which the DataCache is stored.
        k8s.aliyun.com/eci-extra-ephemeral-storage: "20Gi"   # Increase the temporary storage space by 20 GiB.
    spec:
      containers:
      - name: alpacalora
        image: registry.cn-hangzhou.aliyuncs.com/****/alpaca-lora:v3.5   # Use the image that you created.
        command: ["/bin/sh","-c"]
        args: ["python3.10 generate.py --load_8bit --base_model /data/llama-7b-hf --lora_weights /data/alpaca-lora-7b"]   # Replace the arguments in the startup command with actual values.
        resources:
          limits:
            cpu: "16000m"
            memory: "64.0Gi"
        ports:
        - containerPort: 7860
        volumeMounts:
        - mountPath: /data/llama-7b-hf      # Specify the mount path of llama-7b-hf in the container.
          name: llama-model
        - mountPath: /data/alpaca-lora-7b   # Specify the mount path of alpaca-lora-7b in the container.
          name: alpacalora-weight
      volumes:
      - name: llama-model
        hostPath:
          path: /model/llama-7b-hf          # Specify the storage path of llama-7b-hf.
      - name: alpacalora-weight
        hostPath:
          path: /model/alpaca-lora-7b       # Specify the storage path of alpaca-lora-7b.
---
apiVersion: v1
kind: Service
metadata:
  name: alpacalora-svc
spec:
  ports:
  - port: 80
    targetPort: 7860
    protocol: TCP
  selector:
    app: alpacalora
  type: LoadBalancer
Check the deployment status of the application.
kubectl get deployment alpacalora
kubectl get pod
If the Deployment is available and the pod is in the Running state, the Alpaca-LoRa application is deployed.
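Loading the model weights can take some time after the pod enters the Running state. To confirm that the application is ready, you can stream the container logs; a minimal sketch:
# Stream the logs and wait until the application reports that the web server is listening on port 7860.
kubectl logs deployment/alpacalora -f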
View the IP address of the Service.
kubectl get svc alpacalora-svc
In this example, the IP address of the Service that is displayed in the EXTERNAL-IP column of the output is 123.57.XX.XX.
Test the model
Add an inbound rule that opens port 80 to the security group to which the pod belongs.
Open a browser and visit the external IP address of the Service over port 80. To first verify connectivity from a command line, see the sketch after these steps.
Enter text to test the model.
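Before you test in the browser, you can check that the Service responds over port 80. A minimal sketch, using the masked EXTERNAL-IP from the previous step; substitute your actual address:
# Send a request to the web UI through the LoadBalancer Service.
# Expect an HTTP response from the application listening on port 7860 behind the Service.
curl -i http://123.57.XX.XX/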