Vector analysis in AnalyticDB for PostgreSQL is built on the massively parallel processing (MPP) query architecture. It works as follows: AnalyticDB for PostgreSQL uses an AI algorithm to extract the features of unstructured data, and then uses the resulting feature vector to uniquely identify that data. The distance between two vectors measures how similar the corresponding unstructured data items are. Vector analysis lets you use SQL statements to retrieve unstructured data and to perform association analysis between unstructured data and structured data.
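
As a minimal illustration of how distance stands in for similarity, the following Python sketch compares made-up feature vectors by Euclidean distance. The function name `euclidean_distance` and the vector values are assumptions for illustration only. They are not part of the AnalyticDB for PostgreSQL interface, and real feature vectors come from a feature extraction model and usually have many more dimensions.

```python
import math

def euclidean_distance(a, b):
    """Return the Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical 4-dimensional feature vectors (made-up values).
query = [0.0, 0.0, 1.0, 1.0]
similar_item = [0.0, 0.0, 1.0, 0.0]
different_item = [3.0, 4.0, 1.0, 1.0]

# A smaller distance means the two items are treated as more similar.
d_similar = euclidean_distance(query, similar_item)
d_different = euclidean_distance(query, different_item)
```

Here `d_similar` is smaller than `d_different`, so `similar_item` would rank ahead of `different_item` in a similarity search for `query`.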

Typical scenarios

Vector analysis of AnalyticDB for PostgreSQL helps you develop intelligent applications such as:

  • Search by image: Search for images similar to a given image.
  • Voiceprint recognition: Search for audio similar to given audio.
  • Semantics-based text retrieval and recommendation: Search for texts similar to a given text.
  • File deduplication: Remove duplicate files based on the fingerprint of a given file.
  • Product image analysis: Analyze which images contain the same product among a large number of images.

As an advanced feature of AnalyticDB for PostgreSQL, vector analysis has served a number of Alibaba businesses, such as Data Mid-End, Alibaba e-commerce new retail businesses, and Alibaba Cloud City Brain.

Typical architecture

Figure 1. Example of vector analysis on unstructured data based on AnalyticDB for PostgreSQL
  • A web application uses a feature extraction service to extract feature vectors of unstructured data (such as texts, images, and audio) and writes the vectors into the vector library in AnalyticDB for PostgreSQL.
  • During retrieval, the web application first uses the feature_extractor function to extract a vector from the unstructured data being searched for, and then runs the query by calling the vector analysis query interface of AnalyticDB for PostgreSQL.
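
The write path and the retrieval path described above can be sketched in Python with an in-memory stand-in for the vector library. Everything named here is hypothetical: `toy_feature_extractor` is a trivial substitute for a real feature extraction service, and `InMemoryVectorLibrary` is not the AnalyticDB for PostgreSQL interface. The sketch only shows the flow of extracting a vector, storing it, and later retrieving the nearest stored items.

```python
import heapq
import math

def toy_feature_extractor(text):
    """Hypothetical stand-in for a feature extraction service.

    Maps text to a 3-dimensional vector of character counts
    (vowels, digits, everything else). A real service would run
    a trained model instead.
    """
    vowels = sum(ch in "aeiou" for ch in text.lower())
    digits = sum(ch.isdigit() for ch in text)
    others = len(text) - vowels - digits
    return [float(vowels), float(digits), float(others)]

class InMemoryVectorLibrary:
    """Toy in-memory substitute for a vector library (an assumption, not the product API)."""

    def __init__(self):
        self._rows = []  # list of (item_id, vector) pairs

    def write(self, item_id, vector):
        """Store one feature vector under the given identifier."""
        self._rows.append((item_id, vector))

    def top_k(self, query_vector, k):
        """Return the ids of the k stored items closest to query_vector (Euclidean distance)."""
        scored = ((math.dist(query_vector, vec), item_id) for item_id, vec in self._rows)
        return [item_id for _, item_id in heapq.nsmallest(k, scored)]

# Write path: extract a vector for each item and store it.
library = InMemoryVectorLibrary()
for item_id, text in [("doc1", "hello world"), ("doc2", "12345"), ("doc3", "hello there")]:
    library.write(item_id, toy_feature_extractor(text))

# Retrieval path: extract a vector from the query, then look up the nearest items.
results = library.top_k(toy_feature_extractor("hello worlds"), k=2)
```

This sketch scans every stored vector on each query, which is only practical for small collections.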