One-Click Fitting: Online Retrieval of AnalyticDB Vector for Taobao AI Fitting Room Technology

This article explains how AnalyticDB for MySQL provides low-latency online retrieval of high-dimensional vectors for Taobao's AI fitting room.

As Taobao continues to flourish, online shopping has seamlessly integrated into our daily lives. It offers the convenience of exploring a vast array of products from home and comparing prices across multiple vendors. However, it also brings some problems.

We've likely all experienced the uncertainties of buying clothes online. When browsing, we rely on information like price, style, and size to make a selection that might fit us. But what will the clothes look like on us? How should we pair them for the best look? Should we opt for a ladylike or a more rugged, dystopian style? After all, Taobao's models seem to make anything look fabulous.

To address these concerns, Taobao Clothing has collaborated with Tongyi Laboratory's XR Lab to introduce their groundbreaking AI fitting room, powered by robust generative AI technology.

To try it, search for "ifashion" in the Taobao app, upload your photos, and enjoy the AI fitting experience.

The AI fitting room feature is supported by AnalyticDB for MySQL, a cloud-native data warehouse of Alibaba Cloud ApsaraDB, which offers an online high-dimensional vector retrieval service with low latency. The following sections will describe the technology.

1. Technology behind Taobao AI Fitting Room

The AI fitting room, which you reach by searching for "ifashion" in the Taobao app, is a joint innovation of Taobao Clothing and Tongyi Laboratory's XR Lab that combines artificial intelligence with fashion. It uses measurement data from hundreds of millions of users and a vast collection of clothing images to recommend outfits that fit you. An essential service here is quickly retrieving the right images from a massive pool of product photos. To improve the recall rate and support online services that demand low latency and high concurrency, images are embedded and stored as high-dimensional feature vectors.

In this scenario, cloud-native AnalyticDB for MySQL supports search by image and supplies the fitting room with images that meet specific similarity requirements. Training the large AI fitting model requires several different images of each subject, but at present users are asked to provide only one. To fill the gap, a recognition algorithm first crops the models' upper-body images from the product SKU images or the detail image library. Then, using the main image provided by the user as the query, a similarity search over the cropped images returns additional images for training the algorithm model. These images satisfy the quantity and similarity requirements, and deterministic conditions can be added to the search as extra input. The following figure briefly describes the process.

[Figure: image retrieval process for AI fitting model training]

AnalyticDB for MySQL provides the search by image feature, which allows you to search for similar images based on product image categories, attributes, or other similar images. In addition to similarity search, AnalyticDB can combine vector retrieval with queries on structured data, including multi-table joins.

Suppose we are searching for product images that are similar to the input image, were put on the shelves within the past three months, and are priced from 200 to 300 yuan. For demonstration purposes, assume that the feature vectors are eight-dimensional.

1.1 Data Model

The vector search feature provided by cloud-native AnalyticDB for MySQL is easy to use. It stores vector feature values in an array data type. This array type supports four value types: byte, smallint, int, and float. If dealing with a large dataset, a vector index can be defined for the feature column to enhance search speed. Managing vector feature columns and vector indexes is akin to the DDL operations for standard columns and indexes. They can be set up during table creation or added later using the ALTER TABLE statement.

The simplified data table of clothing images is defined as follows.

CREATE TABLE products (
  product_id BIGINT COMMENT 'Product ID',
  gmt_create DATETIME COMMENT 'Creation time',
  gmt_modified DATETIME COMMENT 'Modification time',
  image_url VARCHAR COMMENT 'Product image address',
  price FLOAT COMMENT 'Product price',
  document JSON COMMENT 'Knowledge document in the JSON structure',
  status INT COMMENT 'Document status: 1 approved, 0 pending approval, -1 unapproved',
  feature ARRAY <float>(8) COMMENT 'Product image vector result',
  PRIMARY KEY (product_id, gmt_create),
  ANN INDEX idx_feature(`feature`)
) DISTRIBUTE BY HASH(product_id) PARTITION BY VALUE(date_format(gmt_create, '%Y%m')) LIFECYCLE 36 INDEX_ALL = 'Y';

1.2 Data Preparation

Both real-time writes and batch imports are supported. In the following example, one row of test data is inserted by using the INSERT INTO statement.

INSERT INTO products (product_id, gmt_create, price, image_url, feature)
VALUES(6, NOW(), 288.00, 'https://xxx/img6.jpg', '[0.83891445,0.50359607,0.9299093,0.19440076,0.5789051,0.12121256,0.6587046,0.86555034]');
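
The feature value in the INSERT statement above is passed as a bracketed, comma-separated string. As a minimal sketch, an application could build that string from an embedding as follows. The helper name `to_vector_literal` and the validation rules are assumptions made for illustration, not part of any AnalyticDB client library.

```python
import math

def to_vector_literal(embedding, dim=8):
    """Serialize an embedding as the '[v1,v2,...]' string used in the INSERT above.

    Hypothetical helper: the dimension check mirrors the ARRAY<float>(8) column.
    """
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(embedding)}")
    if not all(math.isfinite(v) for v in embedding):
        raise ValueError("embedding contains NaN or infinity")
    # repr() of a Python float round-trips exactly, so no precision is lost.
    return "[" + ",".join(repr(float(v)) for v in embedding) + "]"

feature = [0.83891445, 0.50359607, 0.9299093, 0.19440076,
           0.5789051, 0.12121256, 0.6587046, 0.86555034]
literal = to_vector_literal(feature)
print(literal)
```

In practice the resulting string would normally be sent as a bound query parameter rather than concatenated into the SQL text.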

1.3 Data Search (Vector Retrieval)

AnalyticDB for MySQL supports integrated queries. The following conditions must be met:

  • Condition 1: Return the five most similar images, sorted by similarity (KNN + Top-K).
  • Condition 2: The price is from 200 to 300 yuan.
  • Condition 3: The material was created within the last three months (90 days).

Conditions 2 and 3 are structured data scalar value calculations, and condition 1 is an unstructured similarity calculation. The business scenario expects that the three conditions can be met in one engine at the same time to improve efficiency and reduce maintenance costs. AnalyticDB can easily support this scenario and is easy to use.

SELECT product_id, l2_distance(feature, '[0.83891445,0.50359607,0.9299093,0.19440076,0.5789051,0.12121256,0.6587046,0.86555034]') as dis, image_url, price, document
FROM products
WHERE l2_distance(feature, '[0.83891445,0.50359607,0.9299093,0.19440076,0.5789051,0.12121256,0.6587046,0.86555034]') < 10  -- Set similarity threshold to exclude images that are significantly different
  AND gmt_create > DATE_SUB(NOW(), INTERVAL 90 DAY) -- Within the past 90 days
  AND price between 200.00 and 300.00 -- From 200 to 300 yuan
ORDER BY l2_distance(feature, '[0.83891445,0.50359607,0.9299093,0.19440076,0.5789051,0.12121256,0.6587046,0.86555034]')
LIMIT 5;  -- Top 5 similar images
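
To make the semantics of the query above concrete, here is a brute-force reference sketch of the same logic: scalar filters, a distance threshold, then the top five by distance. It assumes `l2_distance` is the plain Euclidean distance (the article does not say whether the engine returns the squared form), and the rows are made-up test data. A vector index would return approximately the same result without scanning every row.

```python
import math
from datetime import datetime, timedelta

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hybrid_search(rows, query, now, max_dist=10.0, days=90,
                  price_min=200.0, price_max=300.0, k=5):
    """Exact scan: apply scalar filters, a distance threshold, then keep the top k."""
    cutoff = now - timedelta(days=days)
    hits = []
    for row in rows:
        if row["gmt_create"] <= cutoff:                     # older than 90 days
            continue
        if not (price_min <= row["price"] <= price_max):    # outside 200 to 300 yuan
            continue
        dist = l2_distance(row["feature"], query)
        if dist < max_dist:                                 # similarity threshold
            hits.append((dist, row["product_id"]))
    hits.sort()                                             # closest first
    return hits[:k]

# Hypothetical rows, for illustration only.
now = datetime(2024, 1, 1)
query = [0.0] * 8
rows = [
    {"product_id": 1, "gmt_create": now - timedelta(days=10), "price": 250.0, "feature": [0.1] * 8},
    {"product_id": 2, "gmt_create": now - timedelta(days=10), "price": 350.0, "feature": [0.0] * 8},
    {"product_id": 3, "gmt_create": now - timedelta(days=120), "price": 250.0, "feature": [0.0] * 8},
    {"product_id": 4, "gmt_create": now - timedelta(days=1), "price": 200.0, "feature": [0.05] * 8},
]
results = hybrid_search(rows, query, now)
print([pid for _, pid in results])
```

Products 2 and 3 are dropped by the price and date filters, so only products 4 and 1 remain, ordered by distance.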

In addition to real-time OLAP multi-dimensional analysis and search, AnalyticDB for MySQL provides vector retrieval, which supports the AI fitting room scenario of the Taobao app. By searching structured and unstructured data in one engine, it removes the need for redundant engines, suits multi-modal, mixed-load search scenarios, and lowers the usage and O&M costs of vector search.

2. Experience and Summary

In the e-commerce industry, image search enables users to find similar products by just taking a photo of what they see. In the gaming sector, this feature can help better understand the emotions and attitudes of players, allowing for targeted optimization and improvement to enhance the gameplay experience. Additionally, in the field of intelligent customer service, combining enterprise knowledge with Large Language Model semantic comprehension enables the implementation of intelligent customer service. The successful deployment of these intelligent applications is heavily reliant on vector databases. Vector databases are widely used in various scenarios, such as searching for images by text or image, and identifying music by audio.

The technology has been widely applied, for example:

  1. Image search. You can search for images that are similar to a specified image.
  2. Video search. You can search for video images that are similar to a specified video image.
  3. Recommendation system. Suitable items are recommended based on user characteristics.
  4. Text search. You can search for texts that are similar to a specified text based on semantics.
  5. Q&A chatbots that are built in combination with large language models.

Generally, vector indexes are built to make searching feature vectors fast. Vector indexes implement approximate nearest neighbor search (ANNS). This differs from equality comparison on numbers and term matching on strings, and also from LIKE or fuzzy matching in full-text retrieval: an ANNS index ranks unstructured data by similarity and returns the closest matches.

ANNS vector indexes fall into different types depending on how they are implemented. The two major categories are graph-based indexes and quantization-based indexes, where the former are mainly HNSW and RNSW, and the latter are mainly PQ, FLAT, SQ8, and SQ8H. At present, the industry mainly uses two approaches to make ANNS vector indexes practical in production. One is to expose them as a standalone service that provides vector index creation and search, making them part of the artificial intelligence service system. The other is to integrate ANNS vector indexes into a traditional structured database, so that simple SQL can express complex searches that combine structured and unstructured conditions.
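
As an illustration of the quantization idea behind the SQ8 family named above, the following is a generic sketch of 8-bit scalar quantization: each dimension is mapped from its observed range to one byte. It is a textbook-style example under that assumption and is not taken from AnalyticDB; the function names are made up.

```python
def train_sq8(vectors):
    """Learn the per-dimension [min, max] range from sample vectors."""
    dim = len(vectors[0])
    lows = [min(v[d] for v in vectors) for d in range(dim)]
    highs = [max(v[d] for v in vectors) for d in range(dim)]
    return lows, highs

def encode_sq8(vec, lows, highs):
    """Map each float to a code in 0..255, so a vector takes one byte per dimension."""
    codes = []
    for x, lo, hi in zip(vec, lows, highs):
        span = hi - lo
        ratio = 0.0 if span == 0 else (x - lo) / span
        ratio = min(1.0, max(0.0, ratio))   # clamp values outside the trained range
        codes.append(round(ratio * 255))
    return bytes(codes)

def decode_sq8(codes, lows, highs):
    """Reconstruct approximate floats from the one-byte codes."""
    return [lo + (c / 255) * (hi - lo) for c, lo, hi in zip(codes, lows, highs)]

sample = [[0.0, 1.0], [0.5, 3.0], [1.0, 2.0]]
lows, highs = train_sq8(sample)
codes = encode_sq8([0.5, 3.0], lows, highs)
restored = decode_sq8(codes, lows, highs)
print(len(codes), restored)
```

Compared with 4-byte floats, the codes use a quarter of the storage, at the cost of a reconstruction error of at most half a quantization step per dimension.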

AnalyticDB for MySQL is a cloud-native data warehouse product developed by the Alibaba Cloud database team. It supports the integration of vector and structured data for efficient retrieval in scenarios involving various query conditions. The data warehouse and lakehouse of AnalyticDB provide storage for unstructured data and offer common database search services. By leveraging deep learning networks, it can convert unstructured data into vectors and enable vector-based similarity searches.

AnalyticDB for MySQL integrates and optimizes the vector search engine. Its algorithm combines HNSW (Hierarchical Navigable Small World) graphs with multi-version Product Quantization (PQ) coding. This allows vectors in the index to be added, deleted, queried, and modified in real time across different scenarios. With the vector search engine built in, the database remains easy to use. The research paper on the vector search engine technology used by AnalyticDB has been published in VLDB. It primarily covers the implementation of the HNSW and PQ algorithms and the corresponding optimization strategies in AnalyticDB.
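
The following sketch shows the basic idea of product quantization only: a vector is split into sub-vectors, each sub-vector is replaced by the index of its nearest centroid, and distances are computed against those centroids. The codebooks here are hand-written for illustration (real systems train them, typically with k-means), and nothing in this example reflects AnalyticDB's multi-version PQ or its HNSW integration.

```python
def split(vec, m):
    """Split a vector into m equal-length sub-vectors."""
    if len(vec) % m != 0:
        raise ValueError("dimension must be divisible by the number of sub-spaces")
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vec, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    code = []
    for sub, book in zip(split(vec, len(codebooks)), codebooks):
        best = min(range(len(book)), key=lambda i: sq_dist(sub, book[i]))
        code.append(best)
    return code

def pq_distance(query, code, codebooks):
    """Asymmetric squared distance: exact query against a quantized vector."""
    subs = split(query, len(codebooks))
    return sum(sq_dist(sub, book[c]) for sub, book, c in zip(subs, codebooks, code))

# Hypothetical codebooks: 2 sub-spaces, 2 centroids each (normally learned from data).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
code = pq_encode([0.9, 1.0, 0.1, 0.0], codebooks)
approx = pq_distance([1.0, 1.0, 0.0, 0.0], code, codebooks)
print(code, approx)
```

Each stored vector shrinks to a few small integers, which is why quantization is often paired with a graph index to keep memory usage manageable.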

3. Outlook

• Pluggable vector embedding: provide serverless, function-based embedding services so that you can select different embedding models based on the business scenario.
• Configurable recall rate for similarity retrieval: provide parameters that allow a more autonomous and flexible trade-off between QPS and recall.
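
The retrieval rate in the last item is commonly measured as recall@k: the share of the true k nearest neighbors that the approximate index actually returns. The sketch below shows that calculation under this assumption; the ID lists are invented for illustration.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbors that appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    found = set(approx_ids[:k]) & exact_top
    return len(found) / len(exact_top)

# Hypothetical results: exact brute-force ranking vs. an approximate index.
exact = [4, 1, 7, 9, 2]
approx = [4, 7, 1, 3, 2]
score = recall_at_k(approx, exact, k=5)
print(score)
```

Index parameters that raise recall usually make each query examine more candidates, which lowers QPS; a configurable setting lets each workload choose its own balance.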
