This article walks through an image search case that applies to scenarios such as image recognition, cashless payment, biometric clock-in, and feature recognition. Such scenarios are common in almost all industries, including the Internet, new retail, transportation, smart buildings, education, gaming, medical care, social networks, and public security. Let's take a look at some examples:
This solution has the following disadvantages:
This solution has the following advantages:
Note: This solution accelerates vector search inside the database. It does not cover the extraction of image feature values (that is, converting images into high-dimensional vectors), which can be done at the application layer.
Currently, the PASE plug-in of ApsaraDB RDS for PostgreSQL supports two popular vector index algorithms: IVFFlat and HNSW. In the future, it will continue to integrate cutting-edge vector index algorithms from the industry.
The IVFFlat algorithm
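As commonly described, IVFFlat clusters the vectors with k-means, stores each vector in the inverted list of its nearest centroid, and at query time scans only the few lists whose centroids are closest to the query. The following is a minimal pure-Python sketch of that general idea, not the PASE implementation. The names `build_ivfflat`, `search_ivfflat`, `n_lists`, and `n_probe` are hypothetical and chosen only for illustration.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: dist2(centroids[i], v))


def build_ivfflat(vectors, n_lists, iters=5, seed=0):
    """Run a few k-means rounds, then assign every vector to an inverted list."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(iters):
        buckets = [[] for _ in range(n_lists)]
        for v in vectors:
            buckets[nearest(centroids, v)].append(v)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster ends up empty
                dim = len(bucket[0])
                centroids[i] = [sum(v[d] for v in bucket) / len(bucket)
                                for d in range(dim)]
    lists = [[] for _ in range(n_lists)]
    for idx, v in enumerate(vectors):
        lists[nearest(centroids, v)].append(idx)
    return centroids, lists


def search_ivfflat(vectors, centroids, lists, query, k=5, n_probe=2):
    """Scan only the n_probe lists nearest to the query; return k vector indexes."""
    order = sorted(range(len(centroids)),
                   key=lambda i: dist2(centroids[i], query))
    candidates = [idx for c in order[:n_probe] for idx in lists[c]]
    candidates.sort(key=lambda idx: dist2(vectors[idx], query))
    return candidates[:k]
```

In this sketch, a smaller `n_probe` scans fewer vectors per query at the cost of possibly missing some true nearest neighbors, while setting `n_probe` equal to `n_lists` degenerates into an exact full scan.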
The HNSW algorithm
For more information about the PASE plug-in, refer to the official documentation for ApsaraDB RDS for PostgreSQL.
1) ApsaraDB RDS for PostgreSQL supports index retrieval for high-dimensional vectors by using the PASE plug-in. This enables highly efficient similarity searches on image vectors, with a single request completing in milliseconds.
2) The high-dimensional vector retrieval function can be used not only in image searches but also in any feature search that can be digitized, such as feature searches for user profiling and similar people selection in marketing systems.
3) ApsaraDB RDS for PostgreSQL supports searches through a combination of indexes. Vector conditions and other common query conditions can therefore be filtered together in a single query, which substantially improves performance.
4) ApsaraDB RDS for PostgreSQL meets the high-concurrency requirements of image recognition and image search applications in industries such as the Internet, new retail, transportation, smart buildings, education, gaming, medical care, social networks, and public security, as well as those of similar people selection in marketing systems. Compared with the general MySQL solution, the acceleration solution of ApsaraDB RDS for PostgreSQL based on the PASE vector index plug-in is far more cost-effective and efficient for image recognition, image search, and similar people selection.
5) With this solution, the response speed of a single query is improved by 2,457,900% (about 24,580 times faster), the number of concurrent queries per second is increased by 1,908,449%, and the response time is reduced to milliseconds.
The preceding comparison data comes from tests run on one million images in a quad-core, 8 GB RDS database instance.
Currently, the ApsaraDB RDS for PostgreSQL version that supports this function is V11.
In the future, this function will be supported by ApsaraDB RDS for PostgreSQL V10 and later.
For more information about this function, see this guide.
Prerequisites include the following operations:
1) Purchase an ApsaraDB RDS for PostgreSQL V11 instance.
2) Set up a whitelist.
3) Create a user.
4) Create a database.
Solution 1: The general solution, with all records returned to the application layer for computing
Step 1) Create a test table.
Step 2) Create a function to generate random vectors for simulating image feature values. In real-life scenarios, use actual image feature values.
Step 3) Write one million random vectors into the table.
Step 4) Query all one million records and return them to the client, where the similarity is computed.
Step 5) Conduct the concurrency capability test.
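The steps above return every record to the client and leave the similarity computation to the application. The following minimal Python sketch illustrates what that application-side work could look like, assuming Euclidean distance and in-memory rows instead of a real database connection. The helper names `random_vector`, `euclidean`, and `top_k_bruteforce` are hypothetical, and the sketch uses 10,000 rows rather than one million to keep it quick to run.

```python
import heapq
import math
import random

DIM = 80  # the test case in this article uses 80-dimensional vectors


def random_vector(rng, dim=DIM):
    """Simulate an image feature value with random floats (assumption)."""
    return [rng.random() for _ in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_bruteforce(rows, query, k=5):
    """Scan every (id, vector) row and return k (distance, id) pairs, nearest first."""
    return heapq.nsmallest(
        k, ((euclidean(vec, query), row_id) for row_id, vec in rows)
    )


if __name__ == "__main__":
    rng = random.Random(42)
    # Stand-in for the rows fetched from the test table.
    rows = [(i, random_vector(rng)) for i in range(10_000)]
    query = random_vector(rng)
    for distance, row_id in top_k_bruteforce(rows, query):
        print(row_id, round(distance, 4))
```

Because every query must transfer and scan all rows, the cost of this approach grows linearly with the table size.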
Solution 2: ApsaraDB RDS for PostgreSQL with in-database image search by using the PASE plug-in
Step 1) Create the PASE vector index plug-in.
Step 2) Create a test table.
Step 3) Create a function to generate random vectors for simulating image feature values. In real-life scenarios, use actual image feature values.
Step 4) Write one million random vectors to the table.
Step 5) Create a vector index by using the HNSW algorithm. The PASE plug-in currently supports two index types: IVFFlat and HNSW. For usage details, see the topic about the PASE plug-in in the official documentation for ApsaraDB RDS for PostgreSQL. The index parameters must be set correctly. In particular, the dimension specified for the index must match the actual dimension of the vectors.
After the index is created, it is updated automatically whenever image feature values are updated or added. No additional index needs to be created.
Step 6) Generate a random query vector, search for the five vectors most similar to it, and return them in order of vector distance.
Step 7) Conduct the concurrency capability test.
Sample simulation query:
The test result:
The following table describes the test environment:
| Database | Computing specifications | Storage specifications |
| --- | --- | --- |
| MySQL 8.0 | Quad-core, 8 GB | 1,500 GB ESSD |
| PostgreSQL V11 | Quad-core, 8 GB | 1,500 GB ESSD |
The following table shows the performance comparison:
| Case (1 million images, 80-dimensional vectors) | Solution 1: The general solution for MySQL and PostgreSQL, with all records returned to the application layer for computing | Solution 2: ApsaraDB RDS for PostgreSQL with in-database image search by using the PASE plug-in | Increased by |
| --- | --- | --- | --- |
| Response speed of a single query | 61.45 seconds | 2.5 milliseconds | 2,457,900% |
| Concurrent queries per second | 0.05533 | 1,056 | 1,908,449% |
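The "Increased by" figures can be reproduced from the measured values in the table. The short sketch below assumes the column is a plain percentage increase, with the single-query row compared as speed (queries per second, that is, the reciprocal of the response time). The function name `percent_increase` is hypothetical.

```python
def percent_increase(old, new):
    """Percentage increase from old to new."""
    return (new / old - 1) * 100


# Single query: 61.45 seconds versus 2.5 milliseconds, compared as speed.
single_query = percent_increase(1 / 61.45, 1 / 0.0025)

# Throughput: 0.05533 versus 1,056 concurrent queries per second.
throughput = percent_increase(0.05533, 1056)

print(round(single_query))  # 2457900
print(round(throughput))    # 1908449
```

Put differently, the single-query response is about 24,580 times faster and the throughput is about 19,085 times higher.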