In the PAI-Rec console, developers can choose an appropriate recommendation algorithm service by selecting a suitable service type and resources.
1. Service type selection
Alibaba Cloud provides developers with the following three easy-to-use recommendation algorithm services:
| No. | Service type | Description | Catalog price |
| --- | --- | --- | --- |
| 1 | Standard Edition | Recommendation engine configurations; service release management; metric registration and customization, and experiment reports; A/B testing platform; recommendation data troubleshooting tools; consistency check tools | USD 1,231/month |
| 2 | Advanced Edition | All Standard Edition capabilities, plus: intelligent data diagnostics; custom recommendation algorithms (customized recall, feature engineering, and ranking code, one-click deployment, and cold start of new items) | USD 1,538/month |
| 3 | Value-added capabilities | Precision marketing (recommendation control, item blocking and pinning, and fixed positions) | N/A |
If developers use algorithm services with standard industry configurations or assemble algorithms by themselves, they do not need to pay additional fees. If developers require custom algorithm services based on scenarios, pipeline configuration design, model selection, or service optimization, they must pay for the customization.
If developers use the PAI-Rec service for the first time, we recommend Advanced Edition, which provides data diagnostics and recommendation algorithm customization. The data diagnostics feature allows developers to analyze the validity of user features, item features, and user behavior tables, and to determine the parameters of custom recommendation algorithms, such as feature and ranking model parameters. The recommendation algorithm customization feature helps users generate relevant code, quickly produce recall- and ranking-related data and models, and deploy these models with a few clicks.
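The kind of validity analysis that a data diagnostics step performs can be illustrated with a minimal sketch. This is not the PAI-Rec implementation; the table and column names here are invented for the example:

```python
import pandas as pd

# Hypothetical user feature table; the columns are illustrative only.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "age": [25, None, 31, 40],
    "city": ["A", "B", None, None],
})

def feature_validity(df: pd.DataFrame) -> pd.DataFrame:
    """Report per-feature coverage and cardinality, the kind of
    statistics inspected before choosing features for recall and
    ranking models."""
    return pd.DataFrame({
        "coverage": df.notna().mean(),  # share of non-null values
        "distinct": df.nunique(),       # number of distinct values
    })

report = feature_validity(users.drop(columns=["user_id"]))
print(report)
```

A feature with very low coverage or a single distinct value contributes little signal and is usually excluded from model training.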
2. Resource selection
A complete recommendation system requires independent data, algorithm, and online pipeline modules. To build one, developers must select appropriate resources and assemble them based on their development habits and the data architecture of their existing business systems.
Based on big data development practices, we recommend that developers select the following resources:
| No. | Module/Usage | Cloud service |
| --- | --- | --- |
| 1 | Modeling, data cleansing, and task scheduling | Platform for AI (PAI), DataWorks, MaxCompute |
| 2 | Model storage | Object Storage Service (OSS) |
| 3 | Real-time feature storage engine, vector recall engine | Hologres, GraphCompute |
| 4 | Online inference engine | Elastic Algorithm Service (EAS) of PAI |
In addition, we recommend that developers activate the following services to ensure convenient O&M, quick data backflow, and flexible code-level development:
| No. | Cloud service | Module/Usage |
| --- | --- | --- |
| 1 | DataHub | Real-time log backflow, continuous updates of user behaviors, and model training |
| 2 | Simple Log Service | Request logs are pushed to Simple Log Service and can be managed in the Simple Log Service console. |
| 3 | Container Registry | |
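To illustrate real-time log backflow, the following sketch batches user-behavior events before shipping them to a log channel. The event fields are assumptions, not a DataHub schema requirement, and the shipping step is a stand-in for a real SDK call:

```python
import json
import time

def make_event(user_id, item_id, action):
    """Illustrative user-behavior event for real-time log backflow."""
    return {
        "user_id": user_id,
        "item_id": item_id,
        "action": action,  # e.g. "expose", "click", "like"
        "ts": int(time.time() * 1000),
    }

class EventBuffer:
    """Batch events before shipping them to a log channel such as
    DataHub; the shipping call here is a local stand-in."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.buf = []
        self.shipped = []  # batches "sent" so far

    def add(self, event):
        self.buf.append(event)
        if len(self.buf) >= self.max_size:
            self.flush()

    def flush(self):
        if self.buf:
            # In production this would call the log service's
            # put-records API instead of appending locally.
            self.shipped.append([json.dumps(e) for e in self.buf])
            self.buf = []

buf = EventBuffer(max_size=2)
buf.add(make_event(1, 101, "expose"))
buf.add(make_event(1, 101, "click"))  # reaches max_size, triggers a flush
```

Batching amortizes per-request overhead, which matters when behavior logs arrive at high volume.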
Suggestions
Developers can activate different cloud services and enable different algorithms based on their business requirements; different algorithms require different cloud services.
The numbers of daily active users (DAU) mentioned in this topic are not strict boundaries. The selection of services and resources depends on whether the improvement in recommendation results generates enough business value to cover the cost of the recommendation system.
The solution for a DAU of less than 50,000 is the baseline solution. Each solution for a higher DAU tier builds on the solution for the previous tier.
| DAU size | Suggestion for recall models | Suggestion for ranking models | Suggestion for storage of user features |
| --- | --- | --- | --- |
| Less than 50,000 | We recommend the collaborative filtering algorithm eTREC, the Swing algorithm, and group-based popular item recall. We recommend that you do not use a vector recall model or deploy a vector recall engine. | We recommend the single-objective multi-tower model, which provides fast inference and consumes fewer resources of the EAS module of PAI. Subscription MaxCompute resources can be used for feature engineering, sample processing, and deep learning model training. | We recommend that you use Realtime Compute for Apache Flink to write user features to ApsaraDB for Redis. |
| Greater than 50,000 | You can use vector recall. Hologres can be used to store features and query vectors. | You can use a multi-objective ranking model. | You can use GraphCompute to store rapidly changing user features. |
| Greater than 200,000 | You can perform real-time inference of user vectors and use the built-in Facebook AI Similarity Search (FAISS) library of EAS for vector search. You can use item cold start algorithms such as the collaborative metric learning algorithm. | We recommend incremental training to save training costs and adding real-time item features. | You can use GraphCompute. |
| Greater than 500,000 | If frequent activities affect the performance of the recommendation system, you can add online learning solutions. Realtime Compute for Apache Flink can be used to concatenate samples in real time for online learning, and the online models are updated multiple times a day. | You can use subscription MaxCompute resources. | |
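To make the vector recall suggestion concrete, the following sketch scores item vectors against a user vector by inner product and returns the top-k candidates. This is the operation that the FAISS library built into EAS performs at scale with optimized indexes; the vectors here are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
item_vectors = rng.normal(size=(1000, 32)).astype("float32")  # item embedding table

def recall_top_k(user_vec: np.ndarray, items: np.ndarray, k: int = 10):
    """Brute-force inner-product vector recall: score every item
    against the user vector and return the top-k item indices."""
    scores = items @ user_vec               # inner-product similarity
    top = np.argpartition(-scores, k)[:k]   # unordered top-k candidates
    return top[np.argsort(-scores[top])]    # sorted by score, descending

user_vec = rng.normal(size=32).astype("float32")
candidates = recall_top_k(user_vec, item_vectors, k=10)
```

Brute-force scoring is fine at small catalog sizes; a dedicated vector engine becomes worthwhile when the item count and query rate grow.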
For scenarios with a lot of new items
We recommend that you use the item cold start algorithm to recommend new items in a more reasonable manner.
For scenarios where the recommendations of a specified item or items of a specified category need to be controlled
We recommend that you use the recommendation control algorithm to adjust the number of exposures and the percentage of exposures for specific items, item sets, and item categories.
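As an illustration of the idea behind exposure control (this is not the PAI-Rec recommendation control algorithm; the class and names are hypothetical), the following sketch caps how many times an item may be exposed:

```python
from collections import Counter

class ExposureController:
    """Cap how often an item may appear in recommendation results.
    A production version would also track exposure percentages and
    reset counts per time window."""

    def __init__(self, max_exposures: int):
        self.max_exposures = max_exposures
        self.counts = Counter()

    def filter(self, ranked_items):
        """Drop items that have reached their exposure cap, then
        record exposures for the items actually returned."""
        kept = [i for i in ranked_items
                if self.counts[i] < self.max_exposures]
        self.counts.update(kept)
        return kept

ctrl = ExposureController(max_exposures=2)
ctrl.filter(["a", "b"])          # -> ["a", "b"]
ctrl.filter(["a", "c"])          # -> ["a", "c"]
third = ctrl.filter(["a", "b"])  # "a" has hit its cap -> ["b"]
```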
Other suggestions
For scenarios with a large number of user features: You can store user features in Tablestore.
For the EAS module of PAI: We recommend that you configure scheduled scale-out for peak hours and configure automatic scale-in to reduce resources during off-peak hours.
For the scoring service on the EAS module of PAI: You can use subscription resources together with elastic scaling resources.
For scenarios where social connections are found among users: We recommend that you use GraphCompute to manage follow and friend relationships among users. Graph algorithms such as the GraphSAGE algorithm can then be used for recommendations in these scenarios.