This article is the fifth in the Get Started with Data + AI series published by Alibaba Cloud ApsaraDB. The series describes application scenarios of the Data + AI solution across industries, based on real customer cases and best practices.
This article discusses how to use the Alibaba Cloud Data + AI solution to build intelligent data query services, based on real DingTalk AI assistant scenarios, so that everyone can have a dedicated data analyst and the efficiency of data query and analysis can be greatly improved.
DingTalk is an enterprise-level communication tool launched by Alibaba Group that provides enterprises with an efficient and secure mobile office platform. It offers functions such as instant messaging, video conferencing, file management, and attendance check-in to help enterprises collaborate across departments and regions. At the DingTalk 7.5 product conference, themed "My Super Assistant", an AI assistant product built around the needs of 700,000 enterprises was officially released. The product further lowers the threshold for using AI, allowing everyone to easily create their own AI assistant.
Once business data is connected, the intelligent data query function of the DingTalk AI assistant helps users query and analyze business data in areas such as sales, business travel, and personnel across multiple application scenarios. Users can ask questions freely about their data stored in DingTalk, and an officially preset instruction center helps them quickly learn effective ways to phrase questions. Through conversational data AI, combined with capabilities such as knowledge graphs and natural language understanding, intelligent data query provides enterprises with functions such as intelligent Q&A, intelligent recommendation, and early-warning attribution. It helps users find data quickly, interpret data intuitively, and mine data intelligently, so that everyone has a dedicated data analyst and the efficiency of data query and analysis is greatly improved.
In intelligent data query scenarios, accurately identifying entities (such as branch office names, department names, and proper nouns) during Q&A is a difficult problem. For example, when a manager asks in natural language, "Help me check the performance of product xx in East China in the third quarter", which branch offices does East China include? Or consider "Check the Q1 performance of the Product Department", when the department's full name within the enterprise is the Product Design and Management Department. Product SKUs may also use special codes within the enterprise that large models cannot recognize. In general, it is very difficult to provide AI services that meet enterprise needs without grounding them in enterprise-specific data.
Therefore, the AnalyticDB for PostgreSQL vector search engine is used to vectorize more than 1 billion core enterprise entities (enterprise names, department names, employee names, and proper nouns). Vector search recalls the most relevant enterprise entities for arbitrary user questions, and the results are then combined with large models to provide services such as intelligent Q&A and intelligent data query, greatly improving the AI assistant's entity recognition and the accuracy of the large models.
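As a minimal illustration of this entity-recall idea, the sketch below embeds a small dictionary of invented department names and recalls the closest match for a loosely phrased query. The character-bigram "embedding" is only a stand-in for a real large-model embedding API, and the in-memory list stands in for the AnalyticDB for PostgreSQL vector store; this is not the production implementation.

```python
import math

# Toy bigram "embedding" standing in for a large-model embedding API.
def embed(text, dims=64):
    text = text.lower()
    vec = [0.0] * dims
    for a, b in zip(text, text[1:]):
        vec[(ord(a) * 31 + ord(b)) % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# Hypothetical enterprise entity dictionary (full department names).
entities = [
    "Product Design and Management Department",
    "East China Sales Department",
    "Human Resources Department",
]
index = [(name, embed(name)) for name in entities]

# Recall the entities whose vectors are closest to the query vector.
def recall(query, top_k=1):
    q = embed(query)
    score = lambda vec: sum(a * b for a, b in zip(q, vec))
    return [n for n, v in sorted(index, key=lambda it: score(it[1]), reverse=True)[:top_k]]

print(recall("Product Department"))
```

Even with this crude similarity measure, the short form "Product Department" recalls the full department name, which can then be passed to the large model in place of the ambiguous user phrasing.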
Although large models can answer general-purpose questions, they cannot cover enterprise-specific knowledge in vertical fields and cannot guarantee the timeliness of data updates, which makes it difficult to apply them in enterprises. Enterprises can use DMS together with the AnalyticDB for PostgreSQL vector search engine to build an enterprise-specific knowledge base that embeds structured, semi-structured, and unstructured data and stores it in AnalyticDB for PostgreSQL. Combined with the large model inference service, the knowledge base integrates enterprise private data into large model applications such as intelligent Q&A, intelligent data query, and intelligent creation. The steps for building an enterprise-specific large model knowledge base are as follows.
1) Data preprocessing: Before vectorization, unstructured documents and images need to be preprocessed, including document/image parsing and segmentation. The quality of preprocessing greatly affects the recall and accuracy of Q&A.
2) Embedding: The embedding algorithm of a large model vectorizes the preprocessed data blocks and stores the results in a vector database.
3) Vector retrieval: After the user's question is vectorized, the system performs vector retrieval with approximate nearest neighbor search in the vector database, combined with structured conditional filtering to limit permissions and scope.
4) Query recall: The large model reasons over the vector retrieval results and returns the answer closest to the question. Because semantic retrieval may not cover all aspects, full-text retrieval can be used to supplement the answer.
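The four steps above can be sketched end to end. This is a minimal in-memory illustration under stated assumptions: the toy bigram embedding stands in for a large-model embedding API, a Python list stands in for the AnalyticDB for PostgreSQL vector store, the final large-model inference call is omitted, and the document text is invented.

```python
import math

# Toy bigram "embedding" standing in for a large-model embedding API.
def embed(text, dims=64):
    text = text.lower()
    vec = [0.0] * dims
    for a, b in zip(text, text[1:]):
        vec[(ord(a) * 31 + ord(b)) % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# 1) Data preprocessing: parse the document and split it into chunks.
def split_chunks(document, size=80):
    return [document[i:i + size] for i in range(0, len(document), size)]

# 2) Embedding: vectorize each chunk; the list returned here stands in
#    for the vector database (AnalyticDB for PostgreSQL in the real system).
def build_index(document):
    return [(c, embed(c)) for c in split_chunks(document)]

# 3) Vector retrieval: embed the question and rank chunks by similarity.
def retrieve(index, question, top_k=2):
    q = embed(question)
    score = lambda vec: sum(a * b for a, b in zip(q, vec))
    return [c for c, v in sorted(index, key=lambda it: score(it[1]), reverse=True)[:top_k]]

# 4) Query recall: assemble the retrieved context into a prompt for the
#    large model to reason over (the model call itself is omitted here).
def build_prompt(question, contexts):
    return "Context:\n" + "\n".join(contexts) + "\nQuestion: " + question

doc = ("The Product Design and Management Department owns SKU naming. "
       "Travel expenses are reimbursed within 15 working days. "
       "East China covers the Shanghai, Hangzhou, and Nanjing branches.")
index = build_index(doc)
question = "Which branches does East China cover?"
print(build_prompt(question, retrieve(index, question)))
```

The quality of step 1 matters in practice: naive fixed-size chunking (as above) can split sentences mid-word, which is one reason the article notes that preprocessing greatly affects recall and accuracy.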
Enterprises can create dedicated AnalyticDB for PostgreSQL instances on the public cloud to store their own data. Through DMS, they can orchestrate data processes, implement business logic with ChatBI, and deploy NL2SQL models in the private domain, meeting the strictest security requirements of different enterprises. Combined with AnalyticDB for PostgreSQL features such as row-level and column-level access control, dynamic data masking, data encryption, and SQL audit, ChatBI maximizes the security of enterprise data. This allows enterprises to benefit from large model application services without worrying about the security of data in the private domain.
A single SQL statement can integrate structured data analysis, vector analysis, and full-text retrieval to implement multimodal retrieval.
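Such a hybrid query might look as follows. The table and column names are hypothetical, and the `<->` distance operator follows the pgvector-style syntax common to PostgreSQL-compatible vector engines; consult the AnalyticDB for PostgreSQL documentation for the exact operators it supports. The helper assembles the statement as a string purely for illustration; production code should use parameterized queries rather than string interpolation.

```python
# Assembles a hypothetical hybrid SQL query that combines a structured
# filter, full-text retrieval, and vector similarity in one statement.
# NOTE: string interpolation is for illustration only; use parameterized
# queries in production to avoid SQL injection.
def hybrid_search_sql(query_vector, keywords, department, top_k=5):
    vec_literal = "[" + ",".join(str(x) for x in query_vector) + "]"
    return (
        "SELECT id, title, content\n"
        "FROM enterprise_docs\n"
        f"WHERE department = '{department}'  -- structured filter\n"
        f"  AND to_tsvector(content) @@ plainto_tsquery('{keywords}')  -- full-text retrieval\n"
        f"ORDER BY doc_embedding <-> '{vec_literal}'  -- vector similarity\n"
        f"LIMIT {top_k};"
    )

print(hybrid_search_sql([0.12, 0.05, 0.91], "Q1 performance", "East China Sales Department"))
```

Combining all three predicates in one statement lets the database plan the query as a whole, instead of the application stitching together results from separate keyword, vector, and relational systems.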
Integrated with DMS, AnalyticDB for PostgreSQL uses OneMeta and OneOps to provide global data management, data development, and model inference services, and supports the open-source Dify framework for end-to-end Data + AI process orchestration.
• It supports streaming import of vector data, index compression, transactions, and various similarity algorithms.
• It provides higher write throughput and query performance than similar products.
• DMS + X provides a complete set of API operations including document parsing, chunking, embedding, vector approximation calculation, and retrieval, allowing users to quickly deploy their services.
• It provides out-of-the-box Data + AI capabilities based on DMS and one-click deployment of Dify. It allows you to build enterprise-specific large models and vector databases within 10 minutes and quickly build enterprise-level Gen-AI applications.
• It supports service solutions such as image-based search and text-based search.
• Out-of-the-box: Automatically identify user database metadata for out-of-the-box analysis.
• Integration of large and small models: Analyze user intent through large models and generate precise SQL statements through small models to implement more accurate services.
• Enterprise-specific data security assurance: Deploy end-to-end data links and inference services in the private domain to ensure data security.
• Sustainable performance optimization: Combine continuous learning, history labeling, and RAG intervention to improve accuracy. The NL2SQL model developed by DMS currently provides three service levels:
| Service Level | Suggested Scenarios | Accuracy |
|---|---|---|
| Level 1 (out of the box) | User experience and tests | 65% |
| Level 2 (dedicated deployment) | Production | 85% |
| Level 3 (dedicated optimization) | Dedicated high-accuracy production | 95% |
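The division of labor between large and small models described above (the large model analyzes intent, the small model generates SQL) can be sketched roughly as follows. Both model calls are mocked with simple rules, and the table, column, and department names are invented; this illustrates only the shape of the pipeline, not the DMS NL2SQL model itself.

```python
import re

def analyze_intent(question):
    # Stand-in for the large model: extract the metric, entity, and time
    # period from the question (a real system would use LLM intent analysis).
    period = re.search(r"Q[1-4]", question)
    entity = None
    if "Product Department" in question:
        # Entity resolution maps the short form to the full enterprise name,
        # e.g. via the vector-based entity recall described earlier.
        entity = "Product Design and Management Department"
    return {"metric": "performance",
            "entity": entity,
            "period": period.group(0) if period else None}

def generate_sql(intent):
    # Stand-in for the small NL2SQL model: fill a vetted SQL template
    # instead of free-form generation, which keeps output predictable.
    return (f"SELECT SUM(amount) FROM sales "
            f"WHERE department = '{intent['entity']}' "
            f"AND quarter = '{intent['period']}';")

sql = generate_sql(analyze_intent("Check the Q1 performance of the Product Department"))
print(sql)
```

Separating intent analysis from SQL generation is one way to get both flexibility (the large model handles loose natural language) and precision (the small model emits constrained, auditable SQL).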
DingTalk AI Assistant uses the AnalyticDB vector search engine to create an enterprise-specific knowledge base and combines it with large model inference services to integrate enterprise-specific data into applications such as intelligent Q&A, intelligent data query, and intelligent creation. It also orchestrates data processes through DMS to implement ChatBI business logic and deploy high-quality NL2SQL models in the private domain, meeting the strictest requirements of enterprises that data stay within the domain. The DingTalk AI assistant has served thousands of customers across retail, Internet, logistics, transportation, and other industries.
Data + AI provides a new way for enterprise growth. Enterprises must recognize the importance of Data + AI and use it as the key solution to promote intelligent transformation. By integrating Data + AI into the core business, enterprises can better mine data value and optimize operation processes and decision-making mechanisms, thus promoting intelligent transformation and significantly improving their market competitiveness.
In the future, with the customizable and orchestratable LLM workflows of the Alibaba Cloud Data + AI solution and other continuously evolving offerings, intelligent data query can be extended to solve a wide range of enterprise operation problems through large model solutions, improving operational efficiency, accelerating intelligent transformation, and bringing new impetus to enterprise development.