To ensure high availability, high performance, and stability of your recommendation system in a production environment, follow these configuration and operation recommendations.
System monitoring and alarm configuration
Configure an alarm to send a message to DingTalk or a mobile phone if the recommendation system's response time (RT) exceeds a specified threshold within a short period, such as one minute.
Recommendation engine deployment
Configure and test the recommendation engine in the staging environment. Verify that the recommendation diagnosis feature works correctly. If the recommendation results are irrelevant to user behaviors, such as clicks and purchases, it indicates a potential system issue. For example, the system might display popular items at the top instead of relevant recommendations.
Next, configure a consistency check. After you confirm that the features are consistent, you can deploy the engine to the production environment.
Sorting model warmup
Set the warmup_data_path parameter in model_config, for example: warmup_data_path: '/warmup'. Then send requests from the recommendation result diagnosis page. TorchEasyRec records the requests as PB files in the path specified by warmup_data_path.
Warmup on restart: the system reads the requests from the path specified by warmup_data_path and automatically resends them. When the model is updated daily, warmup files already exist, so a manual warmup is not required.
Other parameters:
warmup_pb_files: the number of online requests to save as PB files. The default value is 64.
warm_up_count: the number of warmup iterations for each PB file. The default value is 20.
num_warm_threads: the size of the concurrent warmup thread pool. The default value is 4.
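Assuming these parameters live together in the serving model_config (the exact nesting and syntax are an assumption for illustration; check your TorchEasyRec serving configuration reference), they might be combined like this sketch:

```
model_config {
  warmup_data_path: '/warmup'   # where online requests are saved as PB files
  warmup_pb_files: 64           # number of requests to record
  warm_up_count: 20             # warmup iterations per PB file
  num_warm_threads: 4           # concurrent warmup thread pool size
}
```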
Sorting model service
For the TorchEasyRec sorting model service, set the NO_GRAD_GUARD parameter to 1 to disable gradient calculation.
Recommendation engine configuration
The BatchCount parameter for the fine-grained sorting algorithm model has a default value of 100. This parameter determines the number of items scored by PAI-EAS in each request. A larger value results in slower scoring for each request. For example, if the value is set to the default of 100 and a sorting pass includes 1,000 candidate items, the items are split into 10 separate requests to the scoring service.
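The splitting described above can be sketched as follows (the function name is illustrative; BatchCount and the candidate count are the example values from this section):

```python
import math

def num_scoring_requests(num_candidates: int, batch_count: int = 100) -> int:
    """Number of separate requests sent to the PAI-EAS scoring service
    when candidates are split into batches of batch_count items."""
    return math.ceil(num_candidates / batch_count)

# 1,000 candidate items with the default BatchCount of 100
# are split into 10 requests to the scoring service.
print(num_scoring_requests(1000, 100))  # → 10
```

A larger batch_count means fewer requests, but each request scores more items and therefore takes longer.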
Scaling the sorting model
First, stress test the queries per second (QPS) of the new model. Then, based on the test results, set the number of service instances for the new PAI-EAS service. Finally, switch the traffic to the new model.
Scheduled auto scaling: Schedule a scale-out operation before daily peak QPS. For example, if peak hours start at 8:00 PM, you can begin scaling out 30 minutes in advance, assuming the scale-out can be completed within that time. After peak hours, you can schedule a scale-in operation.
Horizontal auto scaling: Add a scale-out policy based on metrics such as CPU and GPU utilization. For example, you can trigger a scale-out when CPU utilization exceeds 50%. You can also base the policy on the peak QPS of a single instance, which is determined by stress testing.
When you switch sorting experiment traffic from r1 to r2, scale out the PAI-EAS resources for the r2 service in advance.
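One way to turn stress-test results into an instance count is sketched below (the function name and the safety margin are illustrative assumptions, not PAI-EAS parameters):

```python
import math

def required_instances(peak_qps: float, per_instance_qps: float,
                       safety_margin: float = 0.7) -> int:
    """Instances needed so that each instance runs at no more than
    safety_margin of its stress-tested peak QPS."""
    return math.ceil(peak_qps / (per_instance_qps * safety_margin))

# Example: 3,000 QPS at peak, one instance sustains 200 QPS in stress tests.
print(required_instances(3000, 200))  # → 22
```

The same calculation applies when sizing r2 before switching experiment traffic from r1: use r2's stress-tested per-instance QPS and the traffic share it will receive.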
Scaling the recommendation engine
Scale the recommendation engine in the same way as the sorting model: stress test the QPS, set the number of instances based on the results, and configure scheduled or horizontal auto scaling.
Deploying reranking logic
If you use complex reranking logic, such as logic implemented through custom engine development, perform stress testing first. The results help you determine whether to scale out the engine service resources.
Sorting model degradation
Prepare a simpler sorting model, such as a baseline multi-tower model, and configure it through experiment configuration and decision-making. Keep this model offline or set its traffic to 0. If the primary sorting model is under heavy load and cannot be scaled out in time, switch a portion of the traffic to this simpler model.
Alternatively, you can prepare an experiment that contains only a collaborative filtering algorithm and no fine-grained model sorting pipeline. For more information, see How to build a related recommendation scenario for a product page using PAI-Rec.
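Switching a portion of traffic can be done with a deterministic hash-based split, as in this sketch (the function and percentage are illustrative assumptions, not part of PAI-Rec's experiment API):

```python
import hashlib

def use_fallback_model(user_id: str, fallback_percent: int) -> bool:
    """Deterministically route fallback_percent of users to the
    simpler sorting model based on a hash of their user ID."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return bucket < fallback_percent

# Route roughly 30% of users to the degraded model during overload.
use_fallback_model("user_123", 30)
```

Hashing the user ID keeps each user on the same model across requests, so recommendations stay consistent during the degradation window.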
Client-side engine fallback (Required)
On the client side, you must implement a simple recommendation feature, such as real-time collaborative filtering or popular item retrieval. If the response from PAI-Rec times out or is empty, you can use the results from this client-side feature.
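The client-side fallback can be as simple as the following sketch (the function names and the locally cached popular-item list are assumptions; substitute your actual PAI-Rec client call):

```python
def recommend_with_fallback(fetch_from_pai_rec, local_popular_items,
                            timeout_s: float = 0.2):
    """Return PAI-Rec results, falling back to a locally cached
    popular-item list if the call times out or returns nothing."""
    try:
        items = fetch_from_pai_rec(timeout=timeout_s)
    except TimeoutError:
        items = None
    return items if items else list(local_popular_items)
```

The local popular-item list (or a real-time collaborative-filtering result) should be refreshed periodically so the fallback stays reasonably fresh.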
Set a timeout fallback to prevent online issues
Set a timeout for calls to the PAI-Rec recommendation engine. If the number of results is insufficient, you can supplement or replace them. For more information, see Popular Item Retrieval.
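Supplementing an insufficient result set might look like this sketch (the function name is illustrative, and the popular-item source is an assumption; see Popular Item Retrieval for a real source):

```python
def pad_results(engine_items, popular_items, target_count: int):
    """Top up engine results with popular items, skipping duplicates,
    until target_count items are returned."""
    seen = set(engine_items)
    padded = list(engine_items)
    for item in popular_items:
        if len(padded) >= target_count:
            break
        if item not in seen:
            padded.append(item)
            seen.add(item)
    return padded[:target_count]

print(pad_results(["a", "b"], ["b", "c", "d"], 4))  # → ['a', 'b', 'c', 'd']
```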