Platform For AI: Overview

Last Updated: Mar 18, 2026

Manage and share feature data for machine learning and AI training. Ensure consistency between offline and online data.

What is FeatureStore?

FeatureStore stores and manages feature data for offline and online services. It integrates with DataHub, Flink, Hologres, Tablestore, and FeatureDB, a feature database for search and recommendation.

Behavioral logs and real-time item and user properties arrive through DataHub. They are synchronized to MaxCompute, processed with Flink, and written to an online store through FeatureStore. Recommendation engines, user growth applications, and risk control applications then call the FeatureStore SDK to access feature data in the online store.

The following figure shows how data is ingested from MaxCompute and DataHub, processed for feature calculation and model sample management, and published to an online store for client applications.

image

Terms

  • Feature entity: Named collection of feature tables. For example, in a recommendation scenario, define two feature entities: user and item.

  • Feature view: Group of features with information about derived features. It maps an offline feature table to an online feature table.

  • Join ID: Field in a feature table that associates a feature view with a feature entity. Each feature entity has a Join ID to link features from multiple feature views.

    Note

    Each feature view has a primary key, or index key, to retrieve feature data. The index key name can differ from the Join ID name.

    For example, in a recommendation scenario, set the Join ID to user_id and item_id, which are the primary keys of the user and item tables.

  • Label table: Stores labels for model training. It contains the model training target and the Join ID of the feature entity. In recommendation scenarios, it is usually obtained from a behavior table using operations such as group by user_id/item_id/request_id.
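The terms above can be pictured as a small data model. The following Python sketch is illustrative only: the class names, fields, and the views_for helper are hypothetical and are not part of the FeatureStore SDK. It shows a feature view whose index key name (uid) differs from the Join ID (user_id) of its feature entity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureEntity:          # hypothetical class, not an SDK type
    name: str                 # e.g. "user"
    join_id: str              # e.g. "user_id"


@dataclass
class FeatureView:            # hypothetical class, not an SDK type
    name: str
    entity: FeatureEntity     # the entity this view is associated with
    index_key: str            # primary key column of the view's table
    features: List[str]


@dataclass
class LabelTable:             # hypothetical class, not an SDK type
    name: str
    join_ids: List[str]       # Join IDs of the feature entities
    label: str                # model training target


def views_for(entity: FeatureEntity, views: List[FeatureView]) -> List[str]:
    """Return the names of all feature views linked to the entity's Join ID."""
    return [v.name for v in views if v.entity.join_id == entity.join_id]


user = FeatureEntity(name="user", join_id="user_id")
item = FeatureEntity(name="item", join_id="item_id")

# The index key of a view may be named differently from the Join ID.
user_profile = FeatureView("user_profile", user, index_key="uid", features=["age", "city"])
item_stats = FeatureView("item_stats", item, index_key="item_id", features=["ctr_7d"])
labels = LabelTable("click_labels", join_ids=["user_id", "item_id"], label="is_click")

print(views_for(user, [user_profile, item_stats]))
```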

Use cases

  • Recommendation systems and ad ranking: Centrally manage user and item features, including browsing history, purchase records, and personas. Real-time feature read and write capabilities improve model performance and increase ad delivery accuracy.

  • Search result ranking: Feature data includes keyword match degree, click-through rate (CTR), and sales volume. Train a ranking model that re-ranks the results recalled from search engines such as Elasticsearch or OpenSearch. Call the scoring service of a TensorFlow model deployed in EAS to provide users with accurate and personalized search results.

  • User growth or risk control: Manage feature data such as user personal information, transaction behavior, and credit records. Combine this data with machine learning models such as XGBoost or GBDT to perform risk assessment. This improves risk control accuracy and efficiency.

  • Offline KV data synchronization to online store: Manage feature data such as product attribute tables and user attributes. This simplifies scheduling tasks for synchronizing offline data to an online store.

Features

Diverse data sources

Manage the entire process from features to models. Register and manage feature tables from multiple offline and online data sources:

  • Offline store: MaxCompute

  • Online stores: FeatureDB, Hologres, and Tablestore

Benefits of registering a feature table in FeatureStore:

  • Automatic synchronization: Automatically build online and offline tables to ensure consistency between online and offline data.

  • Cost savings: Store features once and share among multiple teams to reduce resource costs.

  • Improved efficiency: Complete complex operations, such as exporting training tables or importing data to an online database, with a single line of code.

Management of offline and real-time features

Manage offline feature views and real-time feature views. Offline features include attribute features and statistical features of users and items. Real-time features include features of newly published users or items that Flink writes directly to an online store (such as Hologres). They also include statistics computed over time windows, such as clicks, forwards, purchases, and conversion rate within the last hour.
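As an illustration of a time-window statistical feature, the sketch below counts clicks per item over the last hour. It is plain Python rather than a Flink job, the HourlyClickCounter class is hypothetical, and it assumes that events for each item arrive in timestamp order.

```python
from collections import deque


class HourlyClickCounter:
    """Hypothetical helper: sliding-window click count per item."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self.events = {}  # item_id -> deque of click timestamps (seconds)

    def add_click(self, item_id, ts):
        # Assumes timestamps for one item are appended in increasing order.
        self.events.setdefault(item_id, deque()).append(ts)

    def clicks_in_window(self, item_id, now):
        # Evict clicks at or before (now - window), then count what remains.
        q = self.events.get(item_id, deque())
        cutoff = now - self.window_seconds
        while q and q[0] <= cutoff:
            q.popleft()
        return len(q)


counter = HourlyClickCounter()
for ts in (0, 1800, 3500):
    counter.add_click("i1", ts)
print(counter.clicks_in_window("i1", now=3599))
```

In a real deployment the window state would live in the stream processor, and only the resulting count would be written to the online store.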

Real-time statistical features and user sequence features

Model feature complexity and real-time requirements increase over time. Manage real-time statistical features and user behavior sequence features computed by Flink in real time. You can also define offline user sequence features, such as the sequence of item IDs that a user has clicked. An item ID sequence alone is often insufficient, because models also use item attribute features (SideInfo). Transmitting SideInfo over the network for online requests consumes substantial bandwidth. In EasyRec, the FeatureStore SDK caches item features locally to reduce inference response time and improve inference performance.
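The idea of attaching SideInfo from a local cache can be sketched as follows. This is a simplified illustration, not the EasyRec or FeatureStore SDK implementation: the item_cache contents and the expand_sequence function are hypothetical.

```python
# Hypothetical in-memory cache of item attribute features (SideInfo).
item_cache = {
    "i1": {"category": "shoes", "price_bucket": 3},
    "i2": {"category": "bags", "price_bucket": 5},
}


def expand_sequence(item_ids, cache, fields, default=None):
    """Turn a sequence of item IDs into one aligned sequence per SideInfo field.

    Only the item IDs need to be sent over the network; the attributes
    are looked up locally. Items missing from the cache yield `default`.
    """
    return {
        field: [cache.get(item_id, {}).get(field, default) for item_id in item_ids]
        for field in fields
    }


clicked = ["i2", "i1", "i9"]  # user click sequence; "i9" is not cached
print(expand_sequence(clicked, item_cache, ["category", "price_bucket"]))
```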

Automatic feature association and model sample export

Manage generated samples using PAI-FeatureStore. When a model uses features from a real-time feature view, use the Create Model Feature function to automatically generate correct samples based on the real-time feature update information recorded in FeatureDB. This function automatically associates real-time features without requiring a callback interface in the PAI-Rec engine.
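One common way to build correct samples from recorded feature updates is a point-in-time lookup: for each sample, use the latest feature value written at or before the request time. The sketch below illustrates that general technique only. How FeatureDB records and replays updates is not described here, so the history layout and the feature_as_of function are assumptions.

```python
import bisect


def feature_as_of(history, ts):
    """Return the value that was current at time `ts`.

    history: list of (update_ts, value) tuples sorted by update_ts
             (an assumed layout, not the FeatureDB format).
    Returns None if no update happened at or before `ts`.
    """
    times = [update_ts for update_ts, _ in history]
    idx = bisect.bisect_right(times, ts)  # number of updates with update_ts <= ts
    return history[idx - 1][1] if idx else None


# Click count of one item as it changed over time.
click_count_history = [(100, 1), (200, 5), (300, 9)]

# A request served at ts=250 saw the value 5, not the later value 9.
print(feature_as_of(click_count_history, 250))
```

Using the value seen at request time, rather than the latest value, keeps training samples consistent with what the model saw online.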

Feature sharing

When an algorithm or BI developer creates a new set of user or item features, design a new ModelFeature to associate the new and old features required by the training dataset. Export samples for offline training using the FeatureStore SDK. Publish the samples to an online store for online services. When multiple models reference the same feature view, only one copy is stored online. This benefits algorithm engineering, especially when adding features to iteratively optimize a model.

Multi-language SDKs

FeatureStore provides Go, Java, and Python SDKs. These SDKs help you use features in the joint solution of PAI-Rec and EasyRec Processor. Use the Java SDK to call EasyRec Processor or other model scoring engines from your own server-side engines (search, recommendation, risk control). Use the Python SDK to access data in online stores for data analytics, modeling, and other tasks.

Feature generation SDK

Feature generation is the process of defining and producing features. Define features in a Python script, run the script to produce them, and then register them on the PAI-FeatureStore platform. The feature generation SDK is a standalone open source package built on MaxCompute SQL that reduces the complexity of feature generation. Its implementation uses day-level intermediate data, which significantly saves compute resources when you calculate user preference statistics over 30 days of behavioral data.
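A plausible reading of why day-level intermediate data saves compute is that each day is aggregated once and the 30-day statistic is then a cheap sum over daily results. The sketch below shows that pattern in plain Python. It is not the feature generation SDK, which is based on MaxCompute SQL, and the function names and event layout are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def daily_counts(events):
    """Aggregate raw events once per day.

    events: iterable of (day, user_id, category) tuples (assumed layout).
    Returns {day: Counter({(user_id, category): clicks})}.
    """
    daily = defaultdict(Counter)
    for day, user_id, category in events:
        daily[day][(user_id, category)] += 1
    return daily


def window_counts(daily, end_day, days=30):
    """Sum the day-level intermediates for the `days` days ending at end_day."""
    total = Counter()
    for offset in range(days):
        total.update(daily.get(end_day - timedelta(days=offset), Counter()))
    return total


events = [
    (date(2026, 3, 1), "u1", "shoes"),
    (date(2026, 3, 1), "u1", "shoes"),
    (date(2026, 3, 2), "u1", "shoes"),
    (date(2026, 1, 1), "u1", "shoes"),  # outside the 30-day window
]
daily = daily_counts(events)
print(window_counts(daily, end_day=date(2026, 3, 2))[("u1", "shoes")])
```

When the window moves forward by one day, only the new day has to be aggregated from raw logs. The other 29 daily results are reused.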

EasyRec recommendation engine integration

FeatureStore integrates deeply with EasyRec and TorchEasyRec. It supports efficient feature engineering (FG) and model training. Deploy models directly online to EasyRec Processor and TorchEasyRec Processor to build high-performing recommendation systems quickly. EasyRec provides memory cache for item feature tables and offers efficient model scoring.

The FeatureStore Cpp SDK integrated into EasyRec Processor is optimized for large-scale scenarios. Benefits:

  • Memory usage: The built-in FeatureStore Cpp SDK in EasyRec Processor optimizes feature storage. It saves 50% of memory compared to native memory caching. Savings are more significant when processing many features, reducing resource consumption.

  • Feature pull time: Offline feature views quickly cache features to memory cache. This is more than 5 times faster than using online data sources. It increases speed and reduces pressure on online data sources. The high stability of the offline data source lets you scale out to hundreds of EAS instances simultaneously. Each instance loads all features within minutes. Scaling out does not put significant pressure on the online store.

  • Model scoring time: Model scoring extracts features in real time from the optimized cache. With specific optimizations in the FeatureStore Cpp SDK, tp100 latency improves significantly, scoring becomes more stable, and fewer requests time out.

How it works

  • Connect to offline and online storage products to enable unified reading, writing, and management of offline and online feature data.

  • Register offline and online feature tables in feature views to aggregate and map feature data.

  • Store label tables in the offline store MaxCompute and register them through the offline data source. The registered label table maps to the actual label table data.

  • Use the Join ID of a feature entity to associate feature views across projects and link all features of the entity. Combine them with label tables to produce model feature tables (Train Set tables) and store them in MaxCompute.

image
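The last step, combining feature views with a label table through Join IDs, amounts to a left join from the label table onto each feature view. The sketch below illustrates this with in-memory dictionaries. The build_train_set function and the sample rows are hypothetical, and the real operation runs on tables in MaxCompute.

```python
def build_train_set(label_rows, user_features, item_features):
    """Left-join label rows with user and item features by Join ID.

    label_rows:    list of dicts containing "user_id", "item_id", and the label
    user_features: {user_id: {feature_name: value}}
    item_features: {item_id: {feature_name: value}}
    Rows without matching features are kept; their feature columns are absent.
    """
    rows = []
    for label in label_rows:
        row = dict(label)  # copy so the label table is not modified
        row.update(user_features.get(label["user_id"], {}))
        row.update(item_features.get(label["item_id"], {}))
        rows.append(row)
    return rows


label_rows = [
    {"request_id": "r1", "user_id": "u1", "item_id": "i1", "is_click": 1},
    {"request_id": "r2", "user_id": "u2", "item_id": "i1", "is_click": 0},
]
user_features = {"u1": {"age": 30}}        # no features recorded for u2
item_features = {"i1": {"ctr_7d": 0.12}}

for row in build_train_set(label_rows, user_features, item_features):
    print(row)
```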

Regions and zones

Available regions:

Asia-Pacific:

  • China (Hangzhou)

  • China (Shanghai)

  • China (Beijing)

  • China (Shenzhen)

  • China (Hong Kong)

  • Singapore

  • Indonesia (Jakarta)

Europe and America:

  • Germany (Frankfurt)

  • US (Silicon Valley)

  • US (Virginia)

Procedure

  1. Create a data source. Data sources include offline stores and online stores.

  2. Create a project. Create feature entities, feature views, and label tables to produce a model feature train set table (training dataset).

  3. Run a data synchronization task to synchronize offline data to an online store.

  4. After you start the synchronization task, view the task status and details in Task Hub.

  5. To read and use online data in a Java or Go online service, join DingTalk group 34415007523 to contact technical support.
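Conceptually, the synchronization in step 3 copies each offline row into a key-value entry in the online store, keyed by the index key of the feature view. The sketch below models the online store as a Python dict. The sync_to_online and read_features functions are hypothetical stand-ins, not FeatureStore SDK calls.

```python
def sync_to_online(offline_rows, online_store, index_key):
    """Write each offline row to the online store, keyed by its index key.

    Returns the number of rows written. Existing keys are overwritten,
    so rerunning the task with the same data leaves the store unchanged.
    """
    written = 0
    for row in offline_rows:
        features = {k: v for k, v in row.items() if k != index_key}
        online_store[row[index_key]] = features
        written += 1
    return written


def read_features(online_store, key, names):
    """Read selected features for one key; missing values come back as None."""
    row = online_store.get(key, {})
    return {name: row.get(name) for name in names}


offline_rows = [
    {"item_id": "i1", "price": 9.5, "category": "shoes"},
    {"item_id": "i2", "price": 20.0, "category": "bags"},
]
online_store = {}
print(sync_to_online(offline_rows, online_store, index_key="item_id"))
print(read_features(online_store, "i1", ["price", "category"]))
```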