Feature Database (FeatureDB) is a database service provided by FeatureStore of Platform for AI (PAI). It serves as an online data store for FeatureStore, providing online feature storage and high-performance read/write optimization for search, recommendation, and advertising scenarios. This topic describes what FeatureDB is, along with its features and benefits.
What is FeatureDB
FeatureDB is a high-performance distributed database provided by FeatureStore. It supports data in KV and KKV formats, and stores arrays as the Array type and key-value pairs as the Map type. Storing Array and Map data in structured form improves the performance of subsequent reads, writes, and inference services. FeatureDB fully supports the production, update, and consumption pipelines of offline features, real-time features, and user behavior sequence features.
How to activate
You can activate the service by following the prompts on the interface when creating a FeatureDB data store.
Features
FeatureDB has implemented the following features and optimizations for FeatureStore feature reading:
Reading and writing KV and KKV type features.
Reading and writing MaxCompute complex type features (Array, Map).
Pulling all feature data under a FeatureView.
Millisecond-level polling to update real-time feature data.
Second-level TTL, automatically cleaning expired data.
Pay-as-you-go billing based on actual read and write usage.
FeatureDB can shard the data of a FeatureView and adjust the number of shards to meet the read and write performance requirements of different scenarios. It also supports replicas to ensure data stability and security. The number of shards is determined by the Estimated Order of Magnitude that you provide:
Less than 10 million (default): 5 shards.
Between 10 million and 100 million: 10 shards.
More than 100 million: 20 shards.
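The shard tiers above can be expressed as a small lookup. The following is a minimal sketch only: ShardPlanner and shardsFor are hypothetical names that are not part of the FeatureStore SDK, and treating exactly 100 million rows as belonging to the middle tier is an assumption, because the list above does not state which side that boundary falls on.

```java
// Sketch only: ShardPlanner is a hypothetical helper, not a FeatureStore SDK class.
final class ShardPlanner {
    private static final long TEN_MILLION = 10_000_000L;
    private static final long HUNDRED_MILLION = 100_000_000L;

    /** Maps an estimated row count to the shard count listed in the tiers above. */
    static int shardsFor(long estimatedRows) {
        if (estimatedRows < 0) {
            throw new IllegalArgumentException("estimatedRows must not be negative");
        }
        if (estimatedRows < TEN_MILLION) {
            return 5;   // Less than 10 million (default)
        }
        // Assumption: a value of exactly 100 million is placed in the middle tier.
        if (estimatedRows <= HUNDRED_MILLION) {
            return 10;  // Between 10 million and 100 million
        }
        return 20;      // More than 100 million
    }

    public static void main(String[] args) {
        // A feature table with 17,689,586 rows falls into the middle tier.
        System.out.println(shardsFor(17_689_586L));
    }
}
```

In practice the tier is chosen on the console through Estimated Order of Magnitude; the sketch only makes the mapping explicit.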

Benefits
Cost-effective
For customers with smaller feature storage requirements, using FeatureDB can reduce costs.
Meets high-frequency update requirements
When you use real-time statistical features, the features must be pushed every few seconds to the storage of multiple EasyRec Processor (model inference service) instances. FeatureDB can sustain this update frequency.
Supports complex type features
In search and promotion businesses, Array and Map type features, user behavior long sequence features, and their SideInfo are widely used. If complex type features are stored as strings, they must be deserialized into the Map type every time they are used, which reduces performance.
FeatureDB supports storing complex type data and synchronizing MaxCompute 2.0 complex type data to FeatureDB for high-performance read operations.
Supports elastic scaling
For larger-scale customers, the number of shards can be flexibly increased according to the feature view to improve read and write performance.
Solves monitoring blind spots
When integrating third-party data sources, monitoring the entire data link becomes difficult, especially for real-time features. FeatureDB can monitor key performance indicators such as read and write QPS, RT, data update latency, and storage usage at the view granularity level.
Functions
VPC direct connection
FeatureDB provides VPC direct connection based on PrivateLink. After successful configuration, you can then use FeatureStore SDK in your VPC to access FeatureDB through a private connection based on PrivateLink, thereby improving read and write data performance and reducing access latency.
You can configure VPC direct connection through one of the following methods.
Method 1: If you have no FeatureDB data store, click Create Source on the Store tab. When creating a FeatureDB data source, configure VPC, Zone and vSwitch, and Security Group Name in the VPC Direct Connection Configurations section. For specific instructions, see Online data store: FeatureDB.
Method 2: If you have already created a FeatureDB data source, click feature_db on the Store tab. On the page that appears, click VPC Direct Connection Configurations. Specify VPC, Zone and vSwitch, and Security Group Name, and click OK.
Notes
The VPC setting cannot be modified after it is set. Make sure that the VPC you configure is the VPC where your online service using FeatureStore resides.
We recommend that you deploy your service in the following zones to avoid network latency and improve performance.
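| Area | Region | Recommended zone |
| --- | --- | --- |
| Asia Pacific | China (Hangzhou) | Zone G |
| Asia Pacific | China (Shanghai) | Zone L |
| Asia Pacific | China (Beijing) | Zone F |
| Asia Pacific | China (Shenzhen) | Zone F |
| Asia Pacific | China (Hong Kong) | Zone B |
| Asia Pacific | Singapore | Zone C |
| Europe and Americas | Germany (Frankfurt) | Zone A |
| Europe and Americas | United States (Silicon Valley) | Zone B |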
Zone and vSwitch: Make sure that you select a vSwitch in the zone where your online service instance resides. We recommend that you select vSwitches in at least two zones for high availability and stability.
After you confirm, you cannot modify or delete the configurations. You can only add vSwitches in other zones.
Write features
For offline features, you can use FeatureStore Python SDK to run scheduled tasks through DataWorks to synchronize data from MaxCompute to FeatureDB.
For real-time features, you can write feature data directly using Java SDK.
// Requires java.util.ArrayList, java.util.HashMap, java.util.List, and java.util.Map,
// plus the FeatureStore Java SDK classes used below.

// Configure the region ID, Alibaba Cloud account credentials, and FeatureStore project name
Configuration configuration = new Configuration("cn-beijing",
        Constants.accessId, Constants.accessKey, "fs_demo_featuredb");
// Configure the FeatureDB username and password
configuration.setUsername(Constants.username);
configuration.setPassword(Constants.password);
// To connect to FeatureStore over the public network, set the domain (refer to the domain information above).
// In a VPC environment, you do not need to set it.
// configuration.setDomain(Constants.host);
ApiClient client = new ApiClient(configuration);
// For a public network connection, set usePublicAddress = true. In a VPC environment, you do not need to set it.
// FeatureStoreClient featureStoreClient = new FeatureStoreClient(client, Constants.usePublicAddress);
FeatureStoreClient featureStoreClient = new FeatureStoreClient(client);

Project project = featureStoreClient.getProject("fs_demo_featuredb");
if (null == project) {
    throw new RuntimeException("project not found");
}
FeatureView featureView = project.getFeatureView("user_test_2");
if (null == featureView) {
    throw new RuntimeException("featureview not found");
}

// Construct simulated data to write: 10 rows, one field per supported basic type
List<Map<String, Object>> writeData = new ArrayList<>();
for (int i = 0; i < 10; i++) {
    Map<String, Object> data = new HashMap<>();
    data.put("user_id", i);
    data.put("string_field", String.format("test_%d", i));
    data.put("int32_field", i);
    data.put("int64_field", Long.valueOf(i));
    data.put("float_field", Float.valueOf(i));
    data.put("double_field", Double.valueOf(i));
    data.put("boolean_field", i % 2 == 0);
    writeData.add(data);
}

// Submit the same batch 100 times
for (int i = 0; i < 100; i++) {
    featureView.writeFeatures(writeData);
}
// Call writeFlush only once, after all data has been submitted. It ensures that all pending writes are completed.
// After it is called, writeFeatures cannot be called again.
featureView.writeFlush();
For real-time feature writing, the entire data row will be updated by default. If the written data only contains some fields, the unwritten fields will be set to empty. If you want to update only the written fields and merge them with the original data, you can make the following settings:
Use Java SDK: Specify InsertMode.PartialFieldWrite.
for (int i = 0; i < 100; i++) {
    featureView.writeFeatures(writeData, InsertMode.PartialFieldWrite);
}
Use Flink Connector: Set insert_mode to partial_field_write.
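The difference between the default full-row update and a partial field write can be illustrated with plain maps. This is a minimal in-memory sketch under stated assumptions, not the FeatureDB implementation: WriteModeSketch and its methods are hypothetical names, and an "empty" field is modeled simply as a key that is absent from the resulting row.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: models the two write behaviors described above with in-memory maps.
final class WriteModeSketch {

    /** Default behavior: the written row replaces the stored row, so unwritten fields become empty. */
    static Map<String, Object> fullRowWrite(Map<String, Object> stored, Map<String, Object> update) {
        return new HashMap<>(update);
    }

    /** Partial field write: only the written fields change; all other stored fields are kept. */
    static Map<String, Object> partialFieldWrite(Map<String, Object> stored, Map<String, Object> update) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(update);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("user_id", 1);
        stored.put("string_field", "test_1");
        stored.put("int32_field", 7);

        Map<String, Object> update = new HashMap<>();
        update.put("user_id", 1);
        update.put("int32_field", 8);

        // string_field is missing after a full-row write and preserved after a partial field write.
        System.out.println(fullRowWrite(stored, update).containsKey("string_field"));
        System.out.println(partialFieldWrite(stored, update).get("string_field"));
    }
}
```

Choose the partial mode when different producers each own a subset of the fields in the same row, so that one writer does not blank out fields owned by another.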
Read features
You can use FeatureStore SDK (Go/Java) or EasyRec Processor to read features.
FeatureStore SDK (Go/Java) supports KV point queries for offline/real-time features. By specifying the JoinID (primary key) value and feature name, you can complete key-value (KV) queries within milliseconds to obtain the target feature data. FeatureStore SDK (Go/Java) also supports KKV queries for behavior sequence features. By specifying the UserID value, you can query the assembled sequence feature results.
EasyRec Processor integrates the FeatureStore C++ SDK, which can pull all feature data from FeatureDB into memory and poll at millisecond-level intervals to refresh real-time feature data in memory, thereby achieving higher read performance.
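To make the two query shapes concrete, the following sketch models a KV point query and a KKV sequence query with in-memory maps. It does not use the FeatureStore SDK: KvKkvSketch, its method names, the use of an item ID as the secondary key, and the newest-first ordering of the assembled sequence are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: an in-memory model of KV and KKV lookups, not the FeatureStore SDK.
final class KvKkvSketch {
    // KV: JoinID (primary key) -> feature name -> value
    private final Map<String, Map<String, Object>> kv = new HashMap<>();
    // KKV: user ID -> secondary key (hypothetically an item ID) -> event time in milliseconds
    private final Map<String, Map<String, Long>> kkv = new HashMap<>();

    void putFeature(String joinId, String feature, Object value) {
        kv.computeIfAbsent(joinId, k -> new HashMap<>()).put(feature, value);
    }

    /** KV point query: returns the requested features for one JoinID. */
    Map<String, Object> getFeatures(String joinId, List<String> features) {
        Map<String, Object> row = kv.getOrDefault(joinId, Collections.emptyMap());
        Map<String, Object> result = new LinkedHashMap<>();
        for (String feature : features) {
            if (row.containsKey(feature)) {
                result.put(feature, row.get(feature));
            }
        }
        return result;
    }

    void putBehavior(String userId, String itemId, long eventTimeMs) {
        kkv.computeIfAbsent(userId, k -> new HashMap<>()).put(itemId, eventTimeMs);
    }

    /** KKV query: assembles the behavior sequence of one user, newest event first (assumed ordering). */
    List<String> getSequence(String userId) {
        Map<String, Long> events = kkv.getOrDefault(userId, Collections.emptyMap());
        List<Map.Entry<String, Long>> entries = new ArrayList<>(events.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        List<String> items = new ArrayList<>();
        for (Map.Entry<String, Long> entry : entries) {
            items.add(entry.getKey());
        }
        return items;
    }

    public static void main(String[] args) {
        KvKkvSketch store = new KvKkvSketch();
        store.putFeature("u1", "age", 30);
        store.putBehavior("u1", "itemA", 100L);
        store.putBehavior("u1", "itemB", 300L);
        System.out.println(store.getFeatures("u1", Collections.singletonList("age")));
        System.out.println(store.getSequence("u1"));
    }
}
```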
Metrics
If you use FeatureDB as an online data source, after creating a feature view, click Data Monitoring on the right side of the target view to view metrics such as read and write QPS and RT for that view.
Real-time feature link
The storage service provided by FeatureStore mainly includes three parts: Feature Service (access layer), MSMQ (DataHub), and FeatureDB.
For real-time features, users can call the feature service through FeatureStore Java SDK or Flink Connector to write feature data to FeatureDB. Data written through the feature service is also synchronized to the user's MaxCompute table, which can be used for real-time feature sample export and further model training.
For feature data stored in FeatureDB, users can read it through FeatureStore's Java/Go SDK, or pull all features through EasyRec Processor and store them in local cache for higher performance reading. For real-time features, the latest feature information can be obtained at the millisecond level.
Real-time feature lifecycle
When creating a real-time feature view, you can specify Feature Lifecycle for the FeatureDB table. When the survival time of a row reaches the lifecycle, it will be automatically cleaned up within seconds.

You can specify the survival time using the following methods:
Method 1: Do not set an Event Time field. In this case, the survival time will be calculated based on the data's write time.
Method 2: Select Event Time for a feature field. The value must be in milliseconds. Let event_time be the value of the Event Time field, time_now the current time, and time_ttl = time_now - ttl the event_time threshold before which data is already expired. Written feature data is then handled as follows:
If using PartialFieldWrite mode for partial field update writing, the survival time will be based on the actual data write time.
event_time > time_now + 15min: Data will not be written. (This prevents timestamp differences between different systems, allowing a 15-minute buffer)
time_ttl < event_time <= time_now + 15min: Data is written normally, survival time is calculated starting from event_time, and the row of data is automatically cleaned up after reaching the lifecycle.
0 < event_time < time_ttl: Data will be automatically cleaned up after being written. Note that the unit of event_time is milliseconds. If the value of your Event Time field is in seconds, it falls into this case, so the data does not persist after the write.
event_time <= 0: Survival time is calculated based on the actual data write time.
Invalid value (cannot be converted to integer): Data will not be written.
Registered an Event Time field but did not pass in the value of the Event Time field: Data is written normally, survival time is calculated based on the actual data write time.
No Event Time field: Data is written normally, survival time is calculated based on the actual data write time.
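The handling rules above can be summarized as a single classification function. This is a minimal sketch, not FeatureDB code: EventTimeTtlSketch, Outcome, and classify are hypothetical names, the order in which the checks are applied is an assumption, and so is the treatment of event_time exactly equal to time_ttl (handled here as already expired), because the rules above do not cover that boundary.

```java
// Sketch only: encodes the Event Time handling rules listed above.
final class EventTimeTtlSketch {
    enum Outcome {
        REJECTED,                    // data is not written
        WRITTEN_TTL_FROM_EVENT_TIME, // written; lifetime counted from event_time
        WRITTEN_THEN_EXPIRED,        // written, then cleaned up automatically
        WRITTEN_TTL_FROM_WRITE_TIME  // written; lifetime counted from the write time
    }

    // 15-minute buffer for timestamp differences between systems
    static final long FUTURE_TOLERANCE_MS = 15L * 60 * 1000;

    /**
     * @param rawEventTime      Event Time value as received, or null if none was passed or registered
     * @param nowMs             current time in milliseconds
     * @param ttlMs             feature lifecycle in milliseconds
     * @param partialFieldWrite whether the write uses PartialFieldWrite mode
     */
    static Outcome classify(String rawEventTime, long nowMs, long ttlMs, boolean partialFieldWrite) {
        // Assumption: the partial-write and missing-value rules are checked before parsing.
        if (partialFieldWrite || rawEventTime == null) {
            return Outcome.WRITTEN_TTL_FROM_WRITE_TIME;
        }
        long eventTime;
        try {
            eventTime = Long.parseLong(rawEventTime.trim());
        } catch (NumberFormatException e) {
            return Outcome.REJECTED; // invalid value: cannot be converted to an integer
        }
        if (eventTime <= 0) {
            return Outcome.WRITTEN_TTL_FROM_WRITE_TIME;
        }
        if (eventTime > nowMs + FUTURE_TOLERANCE_MS) {
            return Outcome.REJECTED; // too far in the future
        }
        long timeTtl = nowMs - ttlMs;
        if (eventTime > timeTtl) {
            return Outcome.WRITTEN_TTL_FROM_EVENT_TIME;
        }
        // Assumption: event_time == time_ttl is treated the same as event_time < time_ttl.
        return Outcome.WRITTEN_THEN_EXPIRED;
    }

    public static void main(String[] args) {
        long now = 1_700_000_000_000L;
        long oneDay = 86_400_000L;
        // A timestamp in seconds instead of milliseconds is far older than time_ttl.
        System.out.println(classify("1700000000", now, oneDay, false));
    }
}
```

The seconds-versus-milliseconds case in main is the pitfall called out above: the value parses as a valid but very old timestamp, so the row is cleaned up right after it is written.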
Additionally, in FeatureDB, the value of event_time is used as the timestamp (ts) for the row. This means that to update the data for a key, the new value of the Event Time field must be equal to or greater than the value already stored for that row. If the new event_time is less than the original event_time, the data corresponding to this key will not be updated.
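The update rule based on event_time can be sketched as a guarded put. This is an in-memory illustration only: EventTimeOrderingSketch and its members are hypothetical names, and the real service applies the rule on the server side.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a row is replaced only when the incoming event_time is not older than the stored one.
final class EventTimeOrderingSketch {
    private static final class Row {
        final long eventTime;
        final String value;

        Row(long eventTime, String value) {
            this.eventTime = eventTime;
            this.value = value;
        }
    }

    private final Map<String, Row> rows = new HashMap<>();

    /** Returns true if the write was applied, false if it was ignored as stale. */
    boolean write(String key, long eventTime, String value) {
        Row existing = rows.get(key);
        if (existing != null && eventTime < existing.eventTime) {
            return false; // new event_time < original event_time: no update
        }
        rows.put(key, new Row(eventTime, value));
        return true;
    }

    String get(String key) {
        Row row = rows.get(key);
        return row == null ? null : row.value;
    }

    public static void main(String[] args) {
        EventTimeOrderingSketch table = new EventTimeOrderingSketch();
        table.write("user_1", 1_000L, "first");
        boolean applied = table.write("user_1", 900L, "late arrival");
        System.out.println(applied + " " + table.get("user_1"));
    }
}
```

A practical consequence is that late-arriving events with older timestamps are silently dropped, so producers that replay or backfill data need to account for the stored event_time.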
Performance testing
The following is an example of performance test results for reading FeatureDB data using FeatureStore Go SDK. The feature table data selected is user-side data in a recommendation scenario, with a total of 17,689,586 rows in the feature table. The test machine has 4 cores and 8 GiB of memory. The test results are for reference only.
VPC direct connection configured, and online service in recommended zone:
| Number of feature fields (columns) | Number of keys read (rows) | Average latency (ms) | TP95 (ms) | TP99 (ms) |
| --- | --- | --- | --- | --- |
| 260 | 1 | 0.89 | 1.20 | 1.45 |
| 260 | 10 | 1.17 | 1.52 | 1.87 |
| 260 | 50 | 1.91 | 2.56 | 2.92 |
| 260 | 100 | 2.87 | 3.58 | 3.93 |
| 260 | 200 | 4.43 | 5.25 | 5.80 |
VPC direct connection configured, but online service in non-recommended zone:
| Number of feature fields (columns) | Number of keys read (rows) | Average latency (ms) | TP95 (ms) | TP99 (ms) |
| --- | --- | --- | --- | --- |
| 260 | 1 | 2.54 | 2.86 | 3.15 |
| 260 | 10 | 2.75 | 3.12 | 3.56 |
| 260 | 50 | 3.95 | 4.75 | 5.19 |
| 260 | 100 | 4.82 | 5.66 | 6.21 |
| 260 | 200 | 6.84 | 7.75 | 8.25 |
VPC direct connection not configured:
| Number of feature fields (columns) | Number of keys read (rows) | Average latency (ms) | TP95 (ms) | TP99 (ms) |
| --- | --- | --- | --- | --- |
| 260 | 1 | 3.62 | 3.83 | 4.27 |
| 260 | 10 | 3.82 | 4.11 | 4.61 |
| 260 | 50 | 4.54 | 5.19 | 5.60 |
| 260 | 100 | 5.40 | 6.13 | 6.56 |
| 260 | 200 | 7.15 | 7.93 | 8.47 |
Billing
For more information, see Billing of FeatureStore.