Chat with PDF Using Alibaba Cloud AnalyticDB for PostgreSQL
Tags: Database, AnalyticDB, PostgreSQL, Chatbot, Tutorials, AnalyticDB for PostgreSQL, LLM
Abstract: This article will explain how to build a Chatbot that can handle PDF files step by step.

This article explains, step by step, how to build a chatbot that can handle PDF files. If you prefer a faster deployment, or want to see how the whole solution works first, refer to the following one-click deployment guide, which can be completed in several minutes:

Link: https://www.alibabacloud.com/blog/llm-chatbot-powered-by-llmchatgpt-%2B-langchain-%2B-analyticdb-pg-docs_600259


This is what it looks like in a live environment. The chatbot takes a PDF as input and converts it to text. Once the PDF document is in text form, an LLM or another AI model can analyze the text and extract relevant information. For example, an LLM could be trained on a dataset of PDF documents on a specific topic (such as finance or healthcare) and used to identify key concepts or trends within the text.

We also have an AnalyticDB for PostgreSQL (ADBPG) free trial available.

To read the text from the PDF document, the LLM first needs to tokenize it, which means breaking it down into individual words or phrases. The model then processes the tokens and generates a vector representation of the text, which can be stored in a vector store or other data structure.
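As a rough illustration of tokenizing text and turning the tokens into a vector, the sketch below splits text into word tokens and maps them to a fixed-length count vector with a hashing trick. This is a toy stand-in, not the embedding model the chatbot uses: the function names `tokenize` and `hashed_vector` and the dimension of 8 are assumptions chosen for illustration only.

```python
import re
import zlib


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hashed_vector(tokens, dim=8):
    """Map tokens to a fixed-length count vector (hashing trick).

    A real embedding model produces dense vectors that capture meaning;
    this toy version only counts how many tokens fall into each bucket.
    """
    vector = [0] * dim
    for token in tokens:
        bucket = zlib.crc32(token.encode("utf-8")) % dim
        vector[bucket] += 1
    return vector


tokens = tokenize("AnalyticDB for PostgreSQL stores vectors.")
print(tokens)
print(hashed_vector(tokens))
```

In a real deployment, an embedding model would replace `hashed_vector`, and the resulting vectors would have hundreds or thousands of dimensions.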

The vector representation of the text can be used for a variety of tasks (such as classification, clustering, or similarity analysis). For example, ChatGPT could use the vector representation of a PDF document to identify other documents with similar content or classify the document into a specific category based on its content.
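To make the similarity idea concrete, here is a minimal sketch of cosine similarity, one common way to compare two vectors. The three-dimensional vectors and the document names are made-up sample data, and the article does not state which distance measure the chatbot's vector store actually uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, documents):
    """Return the name of the stored vector closest to the query vector."""
    return max(documents, key=lambda name: cosine_similarity(query, documents[name]))


# Hypothetical document vectors, for illustration only.
documents = {
    "finance_report": [0.9, 0.1, 0.0],
    "health_guide": [0.1, 0.8, 0.3],
}
print(most_similar([0.8, 0.2, 0.0], documents))
```

A vector database performs the same kind of comparison, but uses an index so that it does not have to scan every stored vector.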


How Does It Work?

Let's use LangChain with ADBPG as the vector store:

  1. Convert the PDF Documents to Text Format: As mentioned earlier, you need to use OCR software or other text extraction tools to convert the PDF documents into a text format that can be processed by ChatGPT.
  2. Tokenize the Text: Once you have the text from the PDF documents, you need to tokenize the text into individual words or phrases. This involves breaking down the text into meaningful units that can be processed by ChatGPT.
  3. Generate Vector Representations of the Text: You can use ChatGPT or another language model to generate vector representations of the text. These vectors can be used to represent the meaning and context of the text in a high-dimensional space.
  4. Store the Vector Representations in Postgres: You can store the vector representations of the text in Postgres as a vectorstore. This allows you to efficiently query and retrieve similar documents or perform other types of vector-based operations.
  5. Use LangChain for Q & A of the Document
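The last step, question answering, usually comes down to placing the most relevant retrieved text in front of the question before sending it to the model. The sketch below shows that idea in plain Python. It is not LangChain's implementation: the function name `build_prompt`, the prompt wording, the character budget, and the sample chunks are all assumptions made for illustration.

```python
def build_prompt(question, chunks, max_chars=2000):
    """Assemble a question-answering prompt from retrieved text chunks.

    Chunks are expected in order of relevance (most relevant first) and
    are added until the character budget would be exceeded.
    """
    context_parts = []
    used = 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n---\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical chunks, standing in for the results of a vector search.
retrieved = [
    "AnalyticDB for PostgreSQL can be used as a vector store.",
    "The chatbot UI is published on port 8501.",
]
print(build_prompt("Which port does the UI use?", retrieved))
```

The budget exists because a model accepts only a limited amount of input, so the least relevant chunks are the ones dropped.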


Here are the required cloud components in Alibaba Cloud:


Step 1. Cloud Resources

1.1 Create an ECS Instance with Port 8501 Open in the Security Group



Note: If you have a VPC setup, use it. If not, please create one.


Create Security Group


1.2 Create AnalyticDB for PostgreSQL with fastann Enabled


This will take around 10–15 mins.

Get the public access endpoint:


Create an admin account:


For example: username aigcpostgres and password alibabacloud666


Create a database with the name: aigcpostgres


Please refer to this link for more information about DMS.

Add whitelist IP to


Step 2. Initialize the Environment


apt update && apt install git -y && apt install unzip -y && apt install docker-compose -y && apt install postgresql -y


Step 3. Install Packages

git clone https://github.com/daviddhc20120601/chat-with-pdf.git && cd chat-with-pdf/


Step 4. Run the Docker Container

cp .devops/Dockerfile . && docker build . -t haidonggpt/front:1.0 && docker run -d -p 8501:8501 haidonggpt/front:1.0
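Once the container is running, the chatbot should be reachable on port 8501, the port published by the command above and opened in the security group. As a quick sanity check, a small script along these lines can test whether the port accepts TCP connections. The helper name `is_port_open` is an assumption, and `127.0.0.1` should be replaced with the public IP of the ECS instance when checking from another machine.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Check the port published by `docker run -p 8501:8501` on the local machine.
print(is_port_open("127.0.0.1", 8501))
```

A result of `False` from a remote machine while the local check returns `True` usually points to the security group rule rather than the container.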


Step 5. Insert Your Token and Get Started


Note: My token and credentials have been invalidated and revoked. They are shown here only to help readers understand what everything looks like and where to put it.

  • ChatGPT token:


  • adbpg host name: gp-gs5inp2dl746742muo-master.gpdbmaster.singapore.rds.aliyuncs.com


  • port: 5432


  • database name: aigcpostgres


  • adb pg username: aigcpostgres
  • adb pg password: alibabacloud666
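The fields above are the usual parts of a PostgreSQL connection. As a hedged sketch, the snippet below combines them into a standard `postgresql://` connection URI, percent-encoding the credentials so that special characters in a password do not break the URI. Whether the chatbot builds its connection this way is not stated in this article. The helper name `build_postgres_uri` is an assumption, and the values are the revoked sample values from this post.

```python
from urllib.parse import quote


def build_postgres_uri(host, port, database, user, password):
    """Build a PostgreSQL connection URI, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Sample (revoked) values from this article; substitute your own.
uri = build_postgres_uri(
    host="gp-gs5inp2dl746742muo-master.gpdbmaster.singapore.rds.aliyuncs.com",
    port=5432,
    database="aigcpostgres",
    user="aigcpostgres",
    password="alibabacloud666",
)
print(uri)
```

In practice, read the password from an environment variable or a secrets manager instead of hard-coding it.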

harold c