
Platform for AI: Use PAI and a vector database to implement intelligent dialogue based on LLMs

Last Updated: Feb 21, 2024

You can use a vector database as a dedicated enterprise knowledge base to perform vector search, and perform model inference based on large language models (LLMs) in Elastic Algorithm Service (EAS) of Platform for AI (PAI). You can also use the open source framework LangChain to integrate vector search and EAS model inference into your business and obtain optimized inference results. This topic describes how to use PAI and a vector database to implement intelligent dialogue based on LLMs.

Background information

Principles

LangChain is an open source framework that allows AI developers to integrate LLMs, such as Tongyi Qianwen, with external data sources to improve performance and results while reducing the costs of computing resources. LangChain performs natural language processing on the input knowledge files and stores them in the vector database. In each inference, LangChain searches the vector database for answers that are related to the question, and then passes the retrieved answers together with the question to the LLM service in EAS, which generates an answer that is customized based on the knowledge base. LangChain also supports user-defined prompts and provides multi-round dialogue based on the LLM and vector search.
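
The following minimal Python sketch illustrates this retrieval-augmented flow. It is a conceptual outline only, not the LangChain implementation: the embed function is a stand-in for a real embedding model, and the endpoint, token, and request format are hypothetical placeholders.

  import numpy as np
  import requests

  EAS_URL = 'http://xx.vpc.pai-eas.aliyuncs.com/api/predict/chatllm_demo'  # placeholder
  EAS_TOKEN = 'xxxxxxx=='  # placeholder

  def embed(text):
      # Stand-in for a real embedding model, such as SGPT-125M (768 dimensions).
      rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
      return rng.random(768, dtype=np.float32)

  # 1. Vectorize the knowledge chunks. In production, these vectors are stored
  #    in the vector database (Hologres, AnalyticDB, Elasticsearch, or Faiss).
  chunks = ['PAI is a machine learning platform.', 'EAS serves online models.']
  index = np.stack([embed(c) for c in chunks])

  def retrieve(question, k=1):
      # 2. Return the top-K chunks whose embeddings are closest to the question.
      q = embed(question)
      scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
      return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

  def answer(question):
      # 3. Combine the retrieved context with the question and query the LLM in EAS.
      context = '\n'.join(retrieve(question))
      prompt = f'Answer based on the context.\nContext: {context}\nQuestion: {question}'
      resp = requests.post(EAS_URL, headers={'Authorization': EAS_TOKEN},
                           data=prompt.encode('utf-8'), timeout=60)
      return resp.text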

Related products

• Platform for AI (PAI)

• Hologres

• AnalyticDB for PostgreSQL

• Alibaba Cloud Elasticsearch

Prerequisites

Step 1: Prepare a vector database

You can use Faiss to build an on-premises vector database that does not require activation or purchase, or activate Hologres, AnalyticDB for PostgreSQL, or Elasticsearch based on your business requirements. In all cases, prepare and save the connection parameters, which you later use to configure the web UI and connect to the vector database.

Hologres

  1. Purchase a Hologres instance and create a database. For more information, see Purchase a Hologres instance. You need to save the name of the database that you created to your on-premises machine.

  2. On the Instance Details page, view the invocation information.

    1. Click the name of the instance to go to the Instance Details page.

    2. In the Network Information section, click Copy next to Select VPC and save the part of the endpoint before :80 (the domain name) to your on-premises machine.

  3. Go to the Account Management tab and create a custom user. Save the account and password that are used to connect to the Hologres instance to your on-premises machine. For more information, see the "Create a custom account" section in the Manage users topic.

    In the Select Member Role section, select Super Administrator (SuperUser).
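
  Because Hologres is compatible with the PostgreSQL protocol, you can optionally verify the saved connection parameters with a short script before you configure the web UI. The following sketch uses the psycopg2 library and placeholder values; the same check works for AnalyticDB for PostgreSQL if you substitute its endpoint and port.

    import psycopg2  # pip install psycopg2-binary

    # Placeholders: the domain name (the part before :80), database name,
    # and custom account that you saved in the preceding steps.
    conn = psycopg2.connect(
        host='hgpostcn-cn-xxxxxx.vpc.hologres.aliyuncs.com',
        port=80,
        dbname='langchain',
        user='user',
        password='password',
    )
    with conn.cursor() as cur:
        cur.execute('SELECT version();')
        print(cur.fetchone())  # a version string indicates a working connection
    conn.close()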

AnalyticDB for PostgreSQL

  1. Create an instance in the AnalyticDB for PostgreSQL console. For more information, see Create an instance.

    In the Vector Engine Optimization section, select Enabled.

  2. Click the name of the instance to go to the Basic Information page. In the Database Connection Information section, copy the internal and public endpoints of the instance and save them to your on-premises machine.

    Note
    • If no public endpoint is available, click Apply for Public Endpoint. For more information, see Manage public endpoints.

    • If you connect to an instance that resides in the same VPC, you only need to use the internal endpoint.

  3. Create a database account. Save the database account and password that are used to connect to the database to your on-premises machine. For more information, see Create a database account.

  4. Set the whitelist to 0.0.0.0/0. For more information, see Configure an IP address whitelist.

Elasticsearch

  1. Create an Alibaba Cloud Elasticsearch cluster. For more information, see Create an Alibaba Cloud Elasticsearch cluster.

    When you configure the parameters, save the Username and Password that you specify to your on-premises machine.

  2. Click the name of the instance to go to the Basic Information page. Obtain and save the Private IP Address and Private Port to your on-premises machine.
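
  Optionally, you can verify that the cluster is reachable with the saved credentials before you proceed. Run the check from a machine in the same VPC because the endpoint is private. The following sketch uses placeholder values for the endpoint, port, username, and password.

    import requests

    # Placeholders: the private endpoint, port, and credentials saved above.
    ES_URL = 'http://es-cn-xxx.elasticsearch.aliyuncs.com:9200'

    resp = requests.get(ES_URL, auth=('elastic', 'password'), timeout=10)
    print(resp.status_code)  # 200 indicates that the cluster is reachable
    print(resp.json().get('version', {}).get('number'))  # Elasticsearch version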

Faiss

You can use Faiss to build an on-premises vector database in a lightweight manner, without the need to purchase or activate online vector databases.
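
The following sketch shows the core Faiss operations that the service later performs for you: build an index, add vectors, search, and persist the index to a local folder. The folder and file names are only examples.

  import os

  import faiss  # pip install faiss-cpu
  import numpy as np

  dim = 768  # matches the embedding dimension of SGPT-125M used in this topic
  index = faiss.IndexFlatL2(dim)

  vectors = np.random.random((100, dim)).astype('float32')  # stand-in embeddings
  index.add(vectors)

  # Search for the 3 nearest neighbors of the first vector.
  distances, ids = index.search(vectors[:1], 3)
  print(ids)

  # Persist the index to a folder, similar to the Path and Index settings on the web UI.
  os.makedirs('faiss_path', exist_ok=True)
  faiss.write_index(index, os.path.join('faiss_path', 'faiss_index'))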

Step 2: Use EAS to deploy the LLM inference service

  1. Go to the EAS-Online Model Services page.

    1. Log on to the Platform for AI (PAI) console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace to which the model service that you want to manage belongs.

    3. In the left-side navigation pane, choose Model Deployment > Elastic Algorithm Service (EAS) to go to the EAS-Online Model Services page.

  2. On the EAS-Online Model Services page, click Deploy Service. In the dialog box that appears, select Custom Deployment and click OK.

  3. On the Deploy Service page, configure the parameters. The following list describes the key parameters.

    • Service Name: the name of the service.

    • Deployment Method: Select Deploy Web App by Using Image.

    • Select Image: Click PAI Image, select chat-llm-webui from the drop-down list, and select 2.0 as the image version.

      Note: You can select the latest version of the image when you deploy the model service.

    • Command to Run: The command varies based on the model that you use. In all cases, set the port number to 8000.

      • chatglm2-6b: python webui/webui_server.py --port=8000 --model-path=THUDM/chatglm2-6b

      • Tongyi Qianwen-7b: python webui/webui_server.py --port=8000 --model-path=Qwen/Qwen-7B-Chat

      • llama2-7b: python webui/webui_server.py --port=8000 --model-path=meta-llama/Llama-2-7b-chat-hf

      • llama2-13b: python webui/webui_server.py --port=8000 --model-path=meta-llama/Llama-2-13b-chat-hf --precision=fp16

    • Resource Group Type: Select Public Resource Group.

    • Resource Configuration Mode: Select General.

    • Resource Configuration: Select an instance type on the GPU tab. For cost-effectiveness, we recommend the ml.gu7i.c16m60.1-gu30 instance type.

    • VPC: If you use Hologres, AnalyticDB for PostgreSQL, or Elasticsearch as the vector database, select the same VPC as the vector database. If you use Faiss, select any available VPC.

  4. Click Deploy. The deployment takes several minutes to complete.

    When the Service Status becomes Running, the service is deployed.

  5. Obtain the service endpoint and token for invocation.

    1. Click the service name to go to the Service Details page.

    2. In the Basic Information section, click View Endpoint Information.

    3. On the VPC Endpoints tab of the Call Information dialog box, obtain the endpoint and token of the service, and then save the endpoint and token to your on-premises machine.
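
  To confirm that the LLM service responds before you wire up LangChain, you can send a test prompt from a machine in the same VPC. The following sketch assumes that the chat-llm-webui image accepts the prompt as the raw POST body; the exact request format can vary with the image version, so treat this as an assumption to adapt.

    import requests

    # The VPC endpoint and token that you saved in the preceding substeps.
    EAS_URL = 'http://xx.vpc.pai-eas.aliyuncs.com/api/predict/chatllm_demo_glm2'
    EAS_TOKEN = 'xxxxxxx=='

    # Assumption: the service accepts a plain-text prompt as the request body.
    response = requests.post(EAS_URL,
                             headers={'Authorization': EAS_TOKEN},
                             data='Hello!'.encode('utf-8'),
                             timeout=60)
    print(response.status_code, response.text)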

Step 3: Deploy the LangChain service and launch the web UI

PAI lets you deploy services in a convenient manner: you can select an image to deploy the LangChain web UI service in EAS. For more information about LangChain, see the LangChain repository on GitHub.

  1. On the EAS-Online Model Services page, click Deploy Service. In the dialog box that appears, select Custom Deployment and click OK.

  2. On the Deploy Service page, configure the required parameters. The following list describes the key parameters.

    • Service Name: the name of the service. In this example, the name chatbot_langchain_vpc is used.

    • Deployment Method: Select Deploy Web App by Using Image.

    • Select Image: Click PAI Image, select chatbot-langchain from the drop-down list, and select 1.0 as the image version. You can select the latest version of the image when you deploy the model service.

    • Command to Run: uvicorn webui:app --host 0.0.0.0 --port 8000. Set the port number to 8000.

    • Resource Group Type: Select Public Resource Group.

    • Resource Configuration Mode: In the Resource Configuration section, click CPU and select ecs.c7.4xlarge as the instance type. Set Additional System Disk to 60 GB.

    • VPC Settings: If you use Hologres, AnalyticDB for PostgreSQL, or Elasticsearch as the vector database, select the same VPC as the vector database. If you use Faiss, select the same VPC as the LLM service that you deployed in Step 2.

  3. Click Deploy. The deployment takes several minutes to complete.

    When the Service Status becomes Running, the service is deployed.

  4. After you deploy the service, click View Web App in the Service Type column to launch the web UI.

Step 4: Use LangChain to integrate business data for intelligent dialogue

Configure parameters for the web UI

  1. On the Settings tab of the web UI page, configure the parameters based on the vector database that you select.

    • Embedding Model: Select an embedding model. We recommend that you use SGPT-125M-weightedmean-nli-bitfit.

    • Embedding Dimension: After you select an embedding model, the system automatically sets the embedding dimension.

    • EAS Url: the service endpoint that you obtained in Step 2.

    • EAS Token: the service token that you obtained in Step 2.

    • Configure Vector Store. The configuration varies based on the vector database that you use.

      Hologres

      • Host: the domain name of the Hologres instance that you obtained in Step 1 (the part of the endpoint before :80).

      • Database: the name of the database that you created in Step 1.

      • User: the account of the custom user that you created in Step 1.

      • Password: the password of the user that you created in Step 1.

      After you configure the preceding parameters, click Connect Hologres to check whether the connection to the Hologres instance is established as expected.

      Elasticsearch

      • URL: the private endpoint and port that you obtained in Step 1, in the http://<private endpoint>:<port> format.

      • Index: the name of the index.

      • User: the logon name that you configured when you created the Elasticsearch instance in Step 1.

      • Password: the logon password that you configured when you created the Elasticsearch instance in Step 1.

      After you configure the preceding parameters, click Connect Elasticsearch to check whether the connection to the Elasticsearch instance is established as expected.

      AnalyticDB

      • Host: the public endpoint that you obtained in Step 1.

        Note

        If you connect to an instance that resides in the same VPC, you only need to use the internal endpoint.

      • User: the database account that you created in Step 1.

      • Database: the name of the database. You can connect to the database and view the name. For more information, see Connect to a database. image.png

      • Password: the password of the database that you created in Step 1.

      • Pre_delete: specifies whether to delete the existing database. Valid values: True (delete) and False (do not delete).

      Faiss

      • Path: the name of the folder in which the index is stored. Example: faiss_path.

      • Index: the name of the index. Example: faiss_index.

    You can also upload a configuration file on the Settings tab and click Parse Config to parse the configuration file. After the configuration file is parsed, the web UI automatically specifies the parameters based on the configuration file. Sample configuration files for vector databases:

    Hologres

    {
      "embedding": {
        "model_dir": "embedding_model/",
        "embedding_model": "SGPT-125M-weightedmean-nli-bitfit",
        "embedding_dimension": 768
      },
    
      "EASCfg": {
        "url": "http://xx.vpc.pai-eas.aliyuncs.com/api/predict/chatllm_demo_glm2",
        "token": "xxxxxxx=="
      },
    
      "vector_store": "Hologres",
    
      "HOLOCfg": {
        "PG_HOST": "hgpostcn-cn.xxxxxx.vpc.hologres.aliyuncs.com",
        "PG_PORT": "80",
        "PG_DATABASE": "langchain",
        "PG_USER": "user",
        "PG_PASSWORD": "password"
      }
    }

    In the preceding sample file, EASCfg specifies the endpoint and token of the LLM service that you deployed in Step 2, and HOLOCfg specifies the connection information of the Hologres instance. Set the parameters to the same values that you use on the web UI.

    ElasticSearch

    {
      "embedding": {
        "model_dir": "embedding_model/",
        "embedding_model": "SGPT-125M-weightedmean-nli-bitfit",
        "embedding_dimension": 768
      },
    
      "EASCfg": {
        "url": "http://xx.pai-eas.aliyuncs.com/api/predict/chatllm_demo_glm2",
        "token": "xxxxxxx=="
      },
    
      "vector_store": "ElasticSearch",
    
      "ElasticSearchCfg": {
        "ES_URL": "http://es-cn-xxx.elasticsearch.aliyuncs.com:9200",
        "ES_USER": "elastic",
        "ES_PASSWORD": "password",
        "ES_INDEX": "test_index"
      }
    }

    In the preceding sample file, EASCfg specifies the endpoint and token of the LLM service that you deployed in Step 2, and ElasticSearchCfg specifies the connection information of the Elasticsearch cluster. Set the parameters to the same values that you use on the web UI.

    AnalyticDB

    {
      "embedding": {
        "model_dir": "embedding_model/",
        "embedding_model": "SGPT-125M-weightedmean-nli-bitfit",
        "embedding_dimension": 768
      },
    
      "EASCfg": {
        "url": "http://xx.pai-eas.aliyuncs.com/api/predict/chatllm_demo_glm2",
        "token": "xxxxxxx=="
      },
    
      "vector_store": "AnalyticDB",
    
      "ADBCfg": {
        "PG_HOST": "gp.xxxxx.rds.aliyuncs.com",
        "PG_USER": "xxxxxxx", 
        "PG_DATABASE": "xxxxxxx", 
        "PG_PASSWORD": "passwordxxxx"
      }
    }

    In the preceding sample file, EASCfg specifies the endpoint and token of the LLM service that you deployed in Step 2, and ADBCfg specifies the connection information of the AnalyticDB for PostgreSQL instance. Set the parameters to the same values that you use on the web UI.

    Faiss

    {
      "embedding": {
        "model_dir": "embedding_model/",
        "embedding_model": "SGPT-125M-weightedmean-nli-bitfit",
        "embedding_dimension": 768
      },
    
      "EASCfg": {
        "url": "http://xx.vpc.pai-eas.aliyuncs.com/api/predict/chatllm_demo_glm2",
        "token": "xxxxxxx=="
      },
    
      "vector_store": "FAISS",
    
      "FAISS": {
        "index_path": "faiss_index",
        "index_name": "faiss_file"
      }
    }

    In the preceding sample file, EASCfg specifies the endpoint and token of the LLM service that you deployed in Step 2. index_path specifies the folder in which the index is stored, and index_name specifies the name of the index.
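
    Before you upload a configuration file, you can optionally check that it is well-formed JSON, which catches copy-paste errors such as trailing commas. The file name below is the Hologres example; use the name of your own file.

    import json

    with open('config_holo.json') as f:
        cfg = json.load(f)  # raises an error if the file is not valid JSON
    print(cfg['vector_store'])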

  2. On the Upload tab of the web UI page, upload a knowledge base file and configure the required parameters.

    • Chunk Size: the size of each chunk into which the knowledge file is split. Default value: 200. Unit: bytes.

    • Chunk Overlap: the overlap between adjacent chunks. Default value: 0. For an illustration of how these two parameters interact, see the sketch after this list.

    • Files: Upload a knowledge base file based on the on-screen instructions, and then click Upload. You can upload multiple files in the TXT, DOCX, or PDF format.

    • Directory: Upload a directory that contains the knowledge base file based on the on-screen instructions, and then click Upload.
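
    The following sketch illustrates how Chunk Size and Chunk Overlap interact when a file is split. It is a simplified character-based sliding-window splitter, not the exact splitter that the service uses.

      def split_text(text, chunk_size=200, chunk_overlap=0):
          # Each chunk starts chunk_size - chunk_overlap characters
          # after the start of the previous one.
          step = max(chunk_size - chunk_overlap, 1)
          return [text[i:i + chunk_size] for i in range(0, len(text), step)]

      doc = 'PAI ' * 150  # a 600-character toy document
      chunks = split_text(doc, chunk_size=200, chunk_overlap=50)
      print(len(chunks), len(chunks[0]))  # 4 overlapping chunks of up to 200 characters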

  3. On the Chat tab of the web UI page, perform intelligent dialogue.

    • You can select one of the following modes:

      • Vector Store: The system searches the vector database and returns the top K results.

      • LLM: The system returns results of the LLM service.

      • Vector Store + LLM: The system passes the question together with the answers retrieved from the vector database to the LLM service in EAS and returns the results.

    • Retrieval top K answers: the number of results to return from the vector database. Default value: 3.

    • Please choose the prompt template type: Select a prompt template based on the on-screen instructions.

Inference demo

In the following section, Hologres is used as an example to demonstrate the inference results. The procedure for the other vector databases is the same.

  1. Configure the required parameters on the Settings tab based on the Configure parameters for the web UI section in this topic, and check whether the connection is established as expected.

  2. On the Upload tab, upload the knowledge base file based on the on-screen instructions, and then click Upload.

    After you upload the file, you can view the written data and vectors in Hologres. For more information, see Manage an internal table.

  3. On the Chat tab, select a query method (Vector Store, LLM, or Vector Store + LLM) and view the returned results.

API calls

  1. Obtain the invocation information of the LangChain web UI service.

    1. Click the name of the LangChain web UI service that you deployed in Step 3 to go to the Service Details page.

    2. In the Basic Information section, click View Endpoint Information.

    3. On the Public Endpoint tab, obtain the endpoint and token.

  2. Call the service by using APIs. Hologres is used as an example.

    1. Upload the config_holo.json file to establish a service connection.

      cURL command

      Prepare a config_holo.json configuration file based on the Configure parameters for the web UI section and run the following command in the directory where the config_holo.json file resides:

      curl -X 'POST' '<service_url>config' -H 'Authorization: <service_token>' -H 'accept: application/json'  -H 'Content-Type: multipart/form-data'  -F 'file=@config_holo.json'

      Replace <service_url> with the service endpoint that you obtained in Step 1. Replace <service_token> with the service token that you obtained in Step 1.

      Python script

      Prepare a config_holo.json configuration file based on the Configure parameters for the web UI section and run the following script in the directory where the config_holo.json file resides:

      import requests
      
      # The endpoint of the LangChain web UI service that you obtained in Step 1.
      EAS_URL = 'http://chatbot-langchain.xx.cn-beijing.pai-eas.aliyuncs.com'
      
      
      def test_post_api_config():
          url = EAS_URL + '/config'
          # The service token that you obtained in Step 1.
          headers = {
              'Authorization': 'xxxxx==',
          }
          # Upload the configuration file to initialize the vector store connection.
          files = {'file': open('config_holo.json', 'rb')}
          response = requests.post(url, headers=headers, files=files)
          if response.status_code != 200:
              raise ValueError(f'Error post to {url}, code: {response.status_code}')
          ans = response.json()
          return ans['response']
      print(test_post_api_config())

      In the preceding example, set the EAS_URL parameter to the service endpoint that you obtained in Step 1 and the Authorization parameter to the service token that you obtained in Step 1.

    2. Upload an on-premises knowledge base file.

      cURL command

      Prepare the knowledge base file in the TXT, DOCX, or PDF format. Example: PAI.txt. Run the following command in the directory where the knowledge base file resides.

      curl -X 'POST' '<service_url>uploadfile' -H 'Authorization: <service_token>' -H 'accept: application/json'  -H 'Content-Type: multipart/form-data'  -F 'file=@PAI.txt;type=text/plain'

      Replace <service_url> with the service endpoint that you obtained in Step 1. Replace <service_token> with the service token that you obtained in Step 1.

      Python script

      Prepare the knowledge base file in the TXT, DOCX, or PDF format. Example: PAI.txt. Run the following script in the directory where the knowledge base file resides.

      import requests
      
      # The endpoint of the LangChain web UI service that you obtained in Step 1.
      EAS_URL = 'http://chatbot-langchain.xx.cn-beijing.pai-eas.aliyuncs.com'
      
      
      def test_post_api_uploadfile():
          url = EAS_URL + '/uploadfile'
          # The service token that you obtained in Step 1.
          headers = {
              'Authorization': 'xxxxx==',
          }
          # Upload the on-premises knowledge base file.
          files = {'file': open('PAI.txt', 'rb')}
          response = requests.post(url, headers=headers, files=files)
          if response.status_code != 200:
              raise ValueError(f'Error post to {url}, code: {response.status_code}')
          ans = response.json()
          return ans['response']
      print(test_post_api_uploadfile())
      # Expected output: success

      In the preceding example, set the EAS_URL parameter to the service endpoint that you obtained in Step 1 and the Authorization parameter to the service token that you obtained in Step 1.

    3. Perform intelligent dialogue by calling one of the following endpoints: chat/vectorstore, chat/llm, or chat/langchain.

      cURL command

      Method 1: chat/vectorstore

      curl -X 'POST' '<service_url>chat/vectorstore' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question": "What is Machine Learning Platform for AI?"}'

      Method 2: chat/llm

      curl -X 'POST' '<service_url>chat/llm' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question": "What is Machine Learning Platform for AI?"}'

      Method 3: chat/langchain

      curl -X 'POST' '<service_url>chat/langchain' -H 'Authorization: <service_token>' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"question": "What is Machine Learning Platform for AI?"}'

      Replace <service_url> with the service endpoint that you obtained in Step 1. Replace <service_token> with the service token that you obtained in Step 1.

      Python script

      import requests
      
      EAS_URL = 'http://chatbot-langchain.xx.cn-beijing.pai-eas.aliyuncs.com'
      
      
      def test_post_api_chat():
          url = EAS_URL + '/chat/vectorstore'
          # url = EAS_URL + '/chat/llm'
          # url = EAS_URL + '/chat/langchain'
          headers = {
              'accept': 'application/json',
              'Content-Type': 'application/json',
              'Authorization': 'xxxxx==',
          }
          data = {
              'question': 'What is Machine Learning Platform for AI?'
          }
          response = requests.post(url, headers=headers, json=data)
      
          if response.status_code != 200:
              raise ValueError(f'Error post to {url}, code: {response.status_code}')
          ans = response.json()
          return ans['response']
      print(test_post_api_chat())

      In the preceding example, set the EAS_URL parameter to the service endpoint that you obtained in Step 1 and the Authorization parameter to the service token that you obtained in Step 1.
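
    The three endpoints can also be chained into a single script. The following sketch reuses the placeholder endpoint, token, and file names from the preceding examples: it connects the vector store, uploads a knowledge file, and then asks a question.

      import requests
      
      # Placeholder endpoint and token of the LangChain web UI service (Step 1).
      EAS_URL = 'http://chatbot-langchain.xx.cn-beijing.pai-eas.aliyuncs.com'
      HEADERS = {'Authorization': 'xxxxx=='}
      
      
      def post_file(path, filename):
          # Upload a file (configuration or knowledge base) to the service.
          with open(filename, 'rb') as f:
              response = requests.post(EAS_URL + path, headers=HEADERS,
                                       files={'file': f})
          response.raise_for_status()
          return response.json()['response']
      
      
      def ask(path, question):
          # Send a question to one of the chat endpoints.
          response = requests.post(EAS_URL + path, headers=HEADERS,
                                   json={'question': question})
          response.raise_for_status()
          return response.json()['response']
      
      
      print(post_file('/config', 'config_holo.json'))  # connect the vector store
      print(post_file('/uploadfile', 'PAI.txt'))       # ingest the knowledge file
      print(ask('/chat/langchain', 'What is Machine Learning Platform for AI?'))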