AnalyticDB for PostgreSQL - Supports Hybrid Search for Dense and Sparse Vectors
Jul 15 2024
Target customers: all users.

Features released: In vector databases, vectors are generally classified as dense vectors or sparse vectors. In a sparse vector, most values are zero: out of tens of thousands of dimensions, only a few have non-zero values. When sparse vectors are used for search, each sparse vector represents a document. Each dimension corresponds to a keyword in a dictionary or vocabulary, and the value of that dimension indicates how important the keyword is in the document. If the sparse vectors are generated by using the BM25 algorithm, the dimension values incorporate the number of keyword matches, the keyword occurrence frequency, and other text relevance factors.

In machine learning and natural language processing (NLP), storing such vectors as ordinary arrays or lists wastes storage space, because every zero value must be stored explicitly. Sparse vectors solve this by storing only the non-zero dimensions and their values. Sparse vectors can represent text, images, or other types of data, and this data structure significantly reduces the storage and computing resources required to store and process high-dimensional data.
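To make the storage argument and the BM25 keyword-weight idea concrete, here is a minimal Python sketch under stated assumptions. It is a toy illustration, not the storage format or SQL API of AnalyticDB for PostgreSQL: the function names (`build_vocab`, `bm25_doc_vectors`, `query_vector`, `sparse_dot`, `to_dense`) are hypothetical, the dictionary is built by naive whitespace tokenization, the BM25 variant (Lucene-style IDF with k1 = 1.2 and b = 0.75) is one common choice, and query keywords are assumed to carry a weight of 1.0. Each document becomes a map from dimension index to weight, so only non-zero dimensions are stored.

```python
import math
from collections import Counter

# Assumed BM25 parameters: common defaults, not AnalyticDB settings.
K1 = 1.2
B = 0.75


def build_vocab(docs):
    """Hypothetical toy dictionary: map each distinct keyword to a dimension index."""
    vocab = {}
    for doc in docs:
        for term in doc.lower().split():
            vocab.setdefault(term, len(vocab))
    return vocab


def bm25_doc_vectors(docs, vocab):
    """Encode each document as a sparse vector {dimension index: BM25 weight}."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each keyword.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        dl = len(toks)
        vec = {}
        for term, tf in Counter(toks).items():
            # Lucene-style IDF (always positive) times saturated, length-normalized TF.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf + K1 * (1 - B + B * dl / avgdl)
            vec[vocab[term]] = idf * tf * (K1 + 1) / norm
        vectors.append(vec)  # only non-zero dimensions are stored
    return vectors


def query_vector(query, vocab):
    """Assumed query encoding: weight 1.0 per distinct known keyword."""
    return {vocab[t]: 1.0 for t in set(query.lower().split()) if t in vocab}


def sparse_dot(a, b):
    """Inner product that visits only the non-zero dimensions of the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)


def to_dense(vec, dim):
    """Expand a sparse vector into an ordinary list, storing every zero explicitly."""
    dense = [0.0] * dim
    for i, v in vec.items():
        dense[i] = v
    return dense


if __name__ == "__main__":
    docs = [
        "sparse vectors store only non-zero values",
        "dense vectors store every dimension",
        "bm25 scores keyword matches in text",
    ]
    vocab = build_vocab(docs)
    doc_vecs = bm25_doc_vectors(docs, vocab)
    q = query_vector("sparse vectors", vocab)
    scores = [sparse_dot(q, d) for d in doc_vecs]
    print(max(range(len(docs)), key=scores.__getitem__))  # 0
    print(len(doc_vecs[0]), "non-zero entries vs", len(vocab), "dense slots")
    # 6 non-zero entries vs 15 dense slots
```

With unit query weights, the sparse inner product equals the BM25 score of the document for that query, and computing it touches only the dimensions that are non-zero in the query. With a real dictionary of tens of thousands of keywords, the gap between stored entries and dense slots is far larger than in this three-document example.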