As AI technology evolves, data infrastructure has become central to AI applications. ApsaraDB for SelectDB is a high-performance, real-time analytical database designed for the AI era. It combines full-text search, vector search, AI functions, and Model Context Protocol (MCP) integration into an all-in-one AI data stack for data storage, retrieval, and analysis. Because it is high-performance, low-cost, and easy to integrate, ApsaraDB for SelectDB supports scenarios such as Lakehouse for AI, semantic search, hybrid search and analytics, retrieval-augmented generation (RAG), Agent Facing Analytics, and AI observability.
Lakehouse for AI
Scenario: AI model development involves stages such as data preparation, feature engineering, and model evaluation, which typically require processing massive amounts of data. In traditional architectures, data must be moved frequently between data lakes and analytics engines. The data lakehouse architecture combines the open storage of data lakes with a real-time analytics engine. This lets you complete the entire AI development process on a single platform, which eliminates data silos and speeds up development iterations.
Applications in the AI development process:
Large-scale data preparation: Use the efficient data processing capabilities of ApsaraDB for SelectDB to filter, sample, and clean petabyte-scale data in data lakes and quickly build high-quality training datasets.
Real-time feature engineering: Use the real-time analytics capabilities of ApsaraDB for SelectDB to perform online feature extraction, transformation, and aggregation. This provides real-time feature services for model training and inference.
Model and data quality evaluation: Quickly perform multidimensional analysis on test datasets and online data to continuously monitor model performance and data drift.
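One common way to quantify data drift is the Population Stability Index (PSI), which compares how a feature is distributed in a baseline sample and in a newer sample. The sketch below is a generic illustration in plain Python, not an ApsaraDB for SelectDB feature. The function name `psi`, the bin count, and the sample values are assumptions made for this example.

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between a baseline and a newer sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the baseline range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature values: a baseline and a sample skewed toward high values.
baseline = [1, 2, 3, 4, 5, 6, 7, 8]
shifted = [5, 6, 7, 8, 7, 8, 8, 8]

no_drift = psi(baseline, baseline)
drift = psi(baseline, shifted)
```

A PSI of zero means the two samples fall into the bins in identical proportions. Larger values indicate a larger shift between the samples.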
ApsaraDB for SelectDB advantages:
Data lakehouse architecture: Builds an open data lakehouse on open lake table formats (such as Iceberg and Paimon) and catalogs. This unifies the management of analytical data and AI data.
High-speed SQL engine: As a real-time analytics engine, it supports interactive queries and lightweight extract, transform, and load (ETL). This provides high-speed SQL computing for data preparation and feature engineering.
Seamless data flow: Reads from and writes to data lakes directly, without data migration. This allows for unified data management at the storage layer and flexible acceleration at the computation layer.
Semantic search
Scenario: Semantic search uses vectorization technology to capture the deeper meaning of text. It can retrieve semantically related content even if the user's search query does not exactly match the document content. This technology is crucial for scenarios such as cross-language retrieval, synonym recognition, and intention recognition. It can significantly improve search recall rates and the user experience.
Typical applications:
Corporate document retrieval: Employees describe problems in natural language. The system understands the intent and retrieves semantically related policies, procedures, and knowledge from a large volume of documents.
E-commerce product search: A user enters "breathable shoes for summer". The system understands the request and retrieves relevant products, rather than just matching keywords.
Content recommendation platform: Makes intelligent recommendations based on the semantic similarity of articles and videos. It discovers content that users might be interested in but that uses different wording.
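The idea behind these applications can be sketched as a nearest-neighbor lookup over embedding vectors. The example below is a minimal brute-force illustration, not how ApsaraDB for SelectDB executes vector search. The three-dimensional vectors are made up for this example. A real system would obtain embeddings from a model and use an approximate index instead of scanning every item.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, items, k=2):
    """Return the names of the k items most similar to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in items]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical product embeddings (toy values, not produced by a real model).
catalog = [
    ("mesh running shoes", [0.9, 0.8, 0.1]),
    ("leather winter boots", [0.1, 0.2, 0.9]),
    ("canvas sandals", [0.8, 0.6, 0.2]),
]

# Stands in for the embedding of the query "breathable shoes for summer".
query = [0.85, 0.75, 0.1]
results = top_k(query, catalog)
```

Neither returned product name contains the words "breathable" or "summer". The match comes from vector proximity alone.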
ApsaraDB for SelectDB advantages:
High-performance vector search: Supports HNSW and IVF algorithms to deliver sub-second responses for hundreds of millions of vectors. This meets the needs of large-scale semantic search.
Enhanced hybrid search: Combines semantic search and keyword filtering in a single SQL statement. This balances the breadth of semantic retrieval with the precision of keyword matching.
Multimodal extension: Supports semantic search not only for text but also for multimodal content such as images and audio.
Flexible quantization optimization: Uses SQ/PQ quantization techniques to significantly reduce storage and computation costs while maintaining search precision.
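To illustrate why quantization reduces storage, the sketch below shows a simple form of scalar quantization (SQ): each vector component is mapped to one byte. This is a generic illustration under stated assumptions, not the quantizer used by ApsaraDB for SelectDB. The function names `sq_encode` and `sq_decode` and the sample vector are hypothetical.

```python
def sq_encode(vec):
    """Map each component to an 8-bit code using a per-vector min and scale."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = bytes(round((x - lo) / scale) for x in vec)
    return codes, lo, scale

def sq_decode(codes, lo, scale):
    """Reconstruct approximate component values from the 8-bit codes."""
    return [lo + c * scale for c in codes]

vec = [0.12, -0.53, 0.98, 0.40]  # made-up embedding values
codes, lo, scale = sq_encode(vec)
approx = sq_decode(codes, lo, scale)

# The reconstruction error per component is at most half a quantization step.
max_error = max(abs(x - y) for x, y in zip(vec, approx))
```

Compared with 32-bit floating-point components, one byte per component is a fourfold reduction in vector storage, at the cost of a bounded rounding error.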
Hybrid search and analytics
Scenario: Semi-structured and unstructured data, such as customer reviews, chat records, and production logs, is becoming more valuable in business decision-making, and traditional analytics solutions struggle to handle it. Hybrid search and analytics combines full-text search, vector search, and structured data analysis on a single platform. This allows for both semantic search and multidimensional aggregate analysis.
Typical applications:
Customer insights: Combines review text retrieval with user behavior analysis to accurately identify customer needs and satisfaction trends.
Smart manufacturing: Integrates full-text search of production logs, device image recognition, and Internet of Things (IoT) metric analysis to enable fault prediction and quality optimization.
Internet of vehicles: Combines in-vehicle signal data analysis, user feedback text mining, and driving behavior vector search to enhance the smart cockpit experience.
ApsaraDB for SelectDB advantages:
Unified architecture: Handles structured analysis, full-text search, and vector search on a single platform. This eliminates the need for data migration and integration with heterogeneous systems.
Hybrid query performance: Supports vector similarity search, keyword filtering, and aggregate analysis in a single SQL statement, delivering excellent query performance.
Flexible schema support: The VARIANT type natively supports dynamic JSON structures. Light Schema Change lets you modify fields and indexes in seconds.
Full-stack optimization: Provides end-to-end optimization from inverted indexes and vector indexes to the Massively Parallel Processing (MPP) execution engine. This balances search precision and analysis efficiency.
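The three steps that a hybrid query combines (keyword filter, vector ranking, aggregation) can be sketched in plain Python. This is a conceptual illustration only. It does not show ApsaraDB for SelectDB SQL syntax, and the review records, two-dimensional vectors, and function names are made up for this example.

```python
from collections import defaultdict

# Hypothetical review rows: structured fields, text, and a toy embedding.
reviews = [
    {"product": "A", "text": "battery drains fast", "rating": 2, "vec": [0.9, 0.1]},
    {"product": "A", "text": "battery life is short", "rating": 3, "vec": [0.8, 0.2]},
    {"product": "B", "text": "battery lasts all day", "rating": 5, "vec": [0.1, 0.9]},
    {"product": "B", "text": "screen is bright", "rating": 4, "vec": [0.2, 0.3]},
]

def dot(a, b):
    """Inner product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))

def hybrid_query(query_vec, keyword, k):
    # Step 1: keyword filter on the tokenized text.
    hits = [r for r in reviews if keyword in r["text"].split()]
    # Step 2: rank the filtered rows by vector similarity and keep the top k.
    hits.sort(key=lambda r: dot(query_vec, r["vec"]), reverse=True)
    top = hits[:k]
    # Step 3: aggregate a structured column over the retrieved rows.
    ratings = defaultdict(list)
    for r in top:
        ratings[r["product"]].append(r["rating"])
    return {p: sum(v) / len(v) for p, v in ratings.items()}

# The query vector stands in for an embedding of "battery complaints".
avg_rating = hybrid_query([1.0, 0.0], "battery", k=2)
```

In a unified engine these steps run inside one query, so the filtered and ranked rows never have to be exported to a second system before aggregation.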
Retrieval-augmented generation (RAG)
Scenario: RAG provides context to large language models (LLMs) by retrieving relevant information from external knowledge bases. This can effectively reduce model hallucinations and address the problem of outdated knowledge. A vector database is a core component of a RAG system. It must be able to quickly retrieve the most relevant document fragments from a massive knowledge base and support high-concurrency user queries.
Typical applications:
Corporate knowledge base: Builds an AI chat system based on internal documents and manuals. This lets employees quickly obtain accurate answers using natural language.
Intelligent customer service assistant: Combines a product knowledge base with historical cases to provide precise response suggestions for customer service representatives or chatbots.
Intelligent document assistant: Quickly locates relevant content within large document collections to assist with research, writing, and decision-making.
ApsaraDB for SelectDB advantages:
High-concurrency performance: The distributed architecture supports high-concurrency vector search, effectively handling concurrent access from many users.
Hybrid search capability: Supports vector similarity search and keyword filtering in a single SQL statement. This balances semantic retrieval and exact matching.
Scalability: Search performance scales linearly as you scale out the cluster. It supports smooth scaling from millions to tens of billions of vectors.
All-in-one solution: Manages vector data, original documents, and business data on a single platform. This simplifies the data architecture for RAG applications.
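The retrieval step of a RAG pipeline can be sketched as follows: rank document fragments by similarity to the query, then place the best matches into the prompt sent to the LLM. This is a minimal illustration under stated assumptions. The fragments, toy vectors, prompt wording, and function names are hypothetical, and the call to the LLM itself is omitted.

```python
def dot(a, b):
    """Inner product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vec, chunks, k=2):
    """Return the text of the k fragments most similar to the query."""
    ranked = sorted(chunks, key=lambda c: dot(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:k]]

def build_prompt(question, fragments):
    """Assemble the retrieved fragments and the question into one prompt."""
    context = "\n".join(f"- {text}" for text in fragments)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical knowledge-base fragments with made-up embeddings.
chunks = [
    {"text": "Refunds are issued within 7 days.", "vec": [0.9, 0.1, 0.0]},
    {"text": "Shipping takes 3 to 5 business days.", "vec": [0.1, 0.9, 0.0]},
    {"text": "Refund requests need an order number.", "vec": [0.8, 0.0, 0.2]},
]

question = "How do refunds work?"
query_vec = [1.0, 0.0, 0.0]  # stands in for the embedding of the question
fragments = retrieve(query_vec, chunks)
prompt = build_prompt(question, fragments)
```

Because the model sees only the retrieved fragments, the quality and speed of the retrieval step bound the quality and latency of the final answer.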
Agent Facing Analytics
Scenario: With the rise of AI Agent technology, an increasing number of analytical decisions will be automated by AI. Unlike traditional manual analysis, Agent Facing Analytics requires data platforms to have excellent real-time performance and high-concurrency capabilities. Data queries must be completed in milliseconds to support decision-making and meet the concurrent access demands of a massive number of Agents.
Typical applications:
Real-time fraud detection
Intelligent ad delivery
Personalized recommendation
ApsaraDB for SelectDB advantages:
Sub-second data latency: Supports real-time data ingestion and updates to ensure that Agent decisions are based on the latest data.
Millisecond-level query response: Delivers an average query latency of less than 100 ms to meet the real-time decision-making needs of Agents.
High QPS concurrency: Supports over 10,000 queries per second (QPS) to handle concurrent queries from a massive number of Agents.
Native Agent integration: Seamlessly connects with AI Agents through MCP Server to simplify the development and integration process.
AI observability
Scenario: The training and operation of AI models generate massive amounts of log, metric, and trace data. As a key part of AI infrastructure, AI observability systems analyze this data to pinpoint issues and continuously optimize performance. These systems must handle multiple challenges, such as high-throughput writes for petabyte-scale data, millisecond-level search responses, and cost control.
Typical applications:
Model training monitoring: Tracks training metrics and resource consumption in real time to quickly identify training anomalies and performance bottlenecks.
Inference service tracing: Records the complete path of each inference request to analyze latency sources and error patterns.
AI application log analysis: Performs full-text search and aggregate analysis on massive application logs to support troubleshooting and behavior insights.
ApsaraDB for SelectDB advantages:
High performance: Sustains continuous writes of about 10 GB/s, which is close to a petabyte per day. It uses an inverted index to accelerate log retrieval and return results within seconds.
Cost optimization: Achieves a high compression ratio of 5:1 to 10:1, which can save 50% to 80% on storage costs. It also supports low-cost storage for cold data.
Flexible schema: Light Schema Change lets you modify fields in seconds. The VARIANT type natively supports dynamic JSON structures.
Ecosystem-friendly: Compatible with the OpenTelemetry and ELK ecosystems. It also supports integration with mainstream visualization tools such as Grafana and Kibana.