Social networking services store large amounts of data and use it in varied forms. Mainstream social services rely on a recommendation module to push news, messages, and articles to suitable users. This topic describes a social networking architecture built on ApsaraDB for HBase.
Social networking architecture and service flow
Posting and reading posts after logon
After a user logs on, the system records the user's geolocation, encodes it with GeoHash, and stores the result in ApsaraDB for HBase.
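GeoHash encodes a latitude/longitude pair as a short base-32 string in which nearby points usually share a common prefix. The following Python sketch shows the standard encoding, plus a hypothetical row key layout (`location_row_key`: GeoHash followed by the user ID) under which a prefix scan in ApsaraDB for HBase would return users in the same area. The row key format and the default precision are assumptions, not part of the architecture described here.

```python
# Standard GeoHash base-32 alphabet (omits a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a GeoHash string of `precision` chars."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # GeoHash interleaves bits, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def location_row_key(user_id: str, lat: float, lon: float, precision: int = 9) -> str:
    """Hypothetical row key: GeoHash first, so nearby users sort next to each other."""
    return f"{geohash_encode(lat, lon, precision)}#{user_id}"


print(location_row_key("user42", 42.6, -5.6, precision=5))  # ezs42#user42
```

Longer GeoHash strings describe smaller cells, so the precision chosen at write time limits how finely a later prefix scan can search.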
When the user sends a post, the system writes the post to ApsaraDB for HBase. The system can also write the post to the recommendation module immediately after it is sent so the user can see the post.
When the user browses posts, the system records the post information and stores the browsing history in ApsaraDB for HBase. The system then analyzes the data and sends the analysis result to the user characteristics - user persona module.
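A common HBase pattern for this kind of history is a row key that starts with the user ID followed by a reversed timestamp, so that a prefix scan returns the most recent views first. The sketch below emulates that in memory with a sorted Python dict; `history_row_key`, `latest_history`, and the key layout are hypothetical and only illustrate the idea.

```python
# Java's Long.MAX_VALUE; subtracting timestamps from it is a common HBase trick
# to make the newest rows sort first.
MAX_TS = 2**63 - 1


def history_row_key(user_id: str, ts_ms: int, post_id: str) -> str:
    """Hypothetical row key: user ID, then a zero-padded reversed timestamp."""
    return f"{user_id}#{MAX_TS - ts_ms:019d}#{post_id}"


def latest_history(table: dict, user_id: str, limit: int) -> list:
    """Emulate an HBase prefix scan: keys sort lexicographically, newest first."""
    prefix = f"{user_id}#"
    keys = sorted(k for k in table if k.startswith(prefix))
    return [table[k] for k in keys[:limit]]


history = {}
for ts, post in [(1000, "p1"), (3000, "p3"), (2000, "p2")]:
    history[history_row_key("u1", ts, post)] = post
history[history_row_key("u2", 5000, "p9")] = "p9"
print(latest_history(history, "u1", 2))  # ['p3', 'p2']
```

Zero-padding the reversed timestamp keeps every key the same width, which is what makes lexicographic order match time order.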
User characteristics - user persona
When users register, they can specify their interests. This data forms the initial user characteristics.
As users browse posts, the system generates browsing history and writes it to ApsaraDB for HBase.
Each evening, the system runs Spark jobs or custom code to analyze the browsing history and adjust the tags in each user persona.
Finally, the system creates a user characteristics - user persona table.
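The nightly analysis runs on Spark or custom code; the sketch below is plain Python for a single user and shows only one possible tag-adjustment rule. The decay factor, the one-point-per-view scoring, the top-k cut, and `update_persona` itself are hypothetical.

```python
from collections import Counter


def update_persona(tags: dict, viewed_categories: list,
                   decay: float = 0.8, top_k: int = 5) -> dict:
    """Nightly persona update for one user (hypothetical scoring scheme).

    tags: current persona tag -> weight (initially from registration interests).
    viewed_categories: category of each post the user viewed that day.
    """
    # Age out old interests so the persona follows recent behavior.
    scores = Counter({tag: weight * decay for tag, weight in tags.items()})
    # Each view adds 1 to the score of the viewed post's category.
    scores.update(viewed_categories)
    top = scores.most_common(top_k)
    total = sum(weight for _, weight in top)
    # Normalize so the kept weights sum to 1.
    return {tag: weight / total for tag, weight in top} if total else {}


persona = update_persona({"sports": 1.0, "music": 1.0}, ["tech", "tech", "tech", "sports"])
print(max(persona, key=persona.get))  # tech
```

Decaying instead of resetting the weights lets interests declared at registration fade gradually rather than disappear after one day of different browsing.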
Recommendations and feeds
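Typically, the system uses push mode or pull mode, depending on whether recommendations and feeds are driven by posts or by users. In push mode, each new post is written to the feed of every matched user, which causes write amplification. In pull mode, each feed is assembled at read time from all relevant sources, which causes read amplification. ApsaraDB for HBase is based on a Log-Structured Merge-tree (LSM tree), which buffers writes in memory and flushes them to disk sequentially. This makes it efficient for write-heavy workloads and therefore better suited to push mode.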
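To make the trade-off concrete, the following toy Python model counts storage operations in both modes. It is not an HBase client: the `FeedStore` class is hypothetical, and its `writes` and `reads` counters stand in for row writes and scans.

```python
from collections import defaultdict


class FeedStore:
    """Toy in-memory model that counts row writes and scans in push and pull mode."""

    def __init__(self, followers: dict):
        self.followers = followers          # author -> list of followers
        self.following = defaultdict(list)  # reader -> authors they follow
        for author, fans in followers.items():
            for fan in fans:
                self.following[fan].append(author)
        self.inbox = defaultdict(list)      # push mode: one row per (reader, post)
        self.outbox = defaultdict(list)     # pull mode: one row per (author, post)
        self.writes = 0
        self.reads = 0

    def publish_push(self, author: str, post_id: str) -> None:
        for fan in self.followers.get(author, []):
            self.inbox[fan].append(post_id)
            self.writes += 1                # write amplification: one write per follower

    def read_push(self, reader: str) -> list:
        self.reads += 1                     # a single scan of the reader's own rows
        return list(self.inbox[reader])

    def publish_pull(self, author: str, post_id: str) -> None:
        self.outbox[author].append(post_id)
        self.writes += 1                    # stored once, regardless of audience size

    def read_pull(self, reader: str) -> list:
        posts = []
        for author in self.following[reader]:
            posts.extend(self.outbox[author])
            self.reads += 1                 # read amplification: one scan per followed author
        return posts


graph = {"alice": ["u1", "u2", "u3"], "bob": ["u1"]}
push, pull = FeedStore(graph), FeedStore(graph)
for publish in (push.publish_push, pull.publish_pull):
    publish("alice", "p1")
    publish("bob", "p2")
print(push.read_push("u1"), push.writes, push.reads)  # ['p1', 'p2'] 4 1
print(pull.read_pull("u1"), pull.writes, pull.reads)  # ['p1', 'p2'] 2 2
```

Push mode pays at write time in proportion to audience size; pull mode pays at read time in proportion to the number of sources a reader follows.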
After a post or article is created, the system stores a copy in ApsaraDB for HBase.
At the same time, the system analyzes the post's attributes, classifies the post, and extracts its features.
Based on these features and the user characteristics - user persona table, the system writes the matched users into the push table. Depending on the business, the matching logic can be complex: it may add location factors and weights, take into account the list of friends a user follows, and remove inactive users.
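The sketch below shows one way such matching might look, combining persona tag weights with a location factor, a follow bonus, and an inactivity filter. `UserProfile`, `match_users`, and every weight and threshold are hypothetical; a real system would tune these per business.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    user_id: str
    tags: dict              # persona tag -> weight
    geohash: str            # last known location
    following: set          # IDs of authors this user follows
    days_since_active: int


def match_users(post_tags, author_id, post_geohash, users,
                threshold=0.5, max_inactive_days=30,
                geo_prefix_len=5, geo_bonus=0.2, follow_bonus=0.5):
    """Return (user_id, score) pairs to write into the push table, best first."""
    matched = []
    for user in users:
        if user.days_since_active > max_inactive_days:
            continue  # skip inactive users to avoid wasted push writes
        score = sum(user.tags.get(tag, 0.0) for tag in post_tags)
        if user.geohash[:geo_prefix_len] == post_geohash[:geo_prefix_len]:
            score += geo_bonus  # location factor: same GeoHash cell
        if author_id in user.following:
            score += follow_bonus  # the user follows the post's author
        if score >= threshold:
            matched.append((user.user_id, score))
    matched.sort(key=lambda pair: pair[1], reverse=True)
    return matched


users = [
    UserProfile("u1", {"tech": 0.6}, "wx4g0abc", set(), 1),
    UserProfile("u2", {"music": 0.9}, "u4pruydq", {"a1"}, 2),
    UserProfile("u3", {"tech": 0.9}, "wx4g0abc", set(), 90),
    UserProfile("u4", {"music": 0.1}, "ezs42aaa", set(), 0),
]
print([uid for uid, _ in match_users({"tech"}, "a1", "wx4g0xyz", users)])  # ['u1', 'u2']
```

Filtering inactive users before scoring is the cheapest way to cut push-mode write amplification, since every matched user costs one write.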
The system maintains a post recommendation table, which typically holds about 1 TB to 100 TB of data. Data in this table usually expires after about 3 to 4 days.
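One common way to implement this expiry in HBase is a time to live (TTL) on the column family, specified in seconds: expired cells are no longer returned by reads and are physically removed during compaction, so the application does not need to delete old feed entries itself. The Python sketch below only emulates that read-side behavior in memory; the 4-day value and `read_feed` are assumptions.

```python
FEED_TTL_SECONDS = 4 * 24 * 60 * 60  # assumed 4-day retention = 345600 seconds


def read_feed(cells, now_s, ttl_s=FEED_TTL_SECONDS):
    """Return live post IDs for one user's feed, newest first.

    cells: list of (post_id, write_time_s) pairs. Cells older than the TTL are
    skipped, which mirrors how a column-family TTL hides expired cells from reads.
    """
    live = [(post_id, ts) for post_id, ts in cells if now_s - ts < ttl_s]
    live.sort(key=lambda cell: cell[1], reverse=True)
    return [post_id for post_id, _ in live]


now = 1_700_000_000
feed = [("p1", now - 5 * 86400), ("p2", now - 3600), ("p3", now - 2 * 86400)]
print(read_feed(feed, now))  # ['p2', 'p3']
```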
For posts from VIP users and for blast messages, we recommend pull mode. Such content reaches a very large number of users, so pushing it would write one copy per recipient; pull mode stores it once and significantly reduces write amplification. In actual businesses, push mode and pull mode are typically used together.
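A hybrid setup needs a rule for choosing the delivery mode and a way to merge the two sources at read time. In the sketch below, the follower cutoff, `delivery_mode`, and `merge_feed` are hypothetical; feed entries are modeled as `(timestamp, post_id)` pairs.

```python
PUSH_FOLLOWER_LIMIT = 10_000  # hypothetical cutoff between push and pull


def delivery_mode(follower_count: int, is_vip: bool = False, is_blast: bool = False,
                  limit: int = PUSH_FOLLOWER_LIMIT) -> str:
    """Choose how a new post is delivered (hypothetical policy)."""
    if is_vip or is_blast or follower_count > limit:
        return "pull"  # store once; readers fetch it at query time
    return "push"      # fan out to each matched user's feed rows


def merge_feed(pushed: list, pulled: list, limit: int) -> list:
    """Merge (timestamp, post_id) pairs from both modes, newest first, without duplicates."""
    seen, merged = set(), []
    for _, post_id in sorted(pushed + pulled, reverse=True):
        if post_id in seen:
            continue
        seen.add(post_id)
        merged.append(post_id)
        if len(merged) == limit:
            break
    return merged


print(delivery_mode(500), delivery_mode(50_000), delivery_mode(10, is_vip=True))  # push pull pull
print(merge_feed([(100, "a"), (50, "b")], [(120, "v1"), (50, "b")], limit=3))     # ['v1', 'a', 'b']
```

Deduplicating during the merge matters because the same post can reach a reader through both paths.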
Posts and news query
After a user launches an app, the user can browse the latest recommendations.
Posts from VIP users and blast messages, which are delivered in pull mode, are queried separately.
The system first queries the user's post IDs in the post recommendation table, and then retrieves the post content, which is cached in Redis or a CDN. On a cache miss, the system reads the post from ApsaraDB for HBase.
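This read path follows the cache-aside pattern. The sketch below models the recommendation table, the cache, and the HBase post table as plain dicts; `PostReader` is hypothetical, and writing a missed post back into the cache is an assumed design choice, not something this architecture prescribes.

```python
class PostReader:
    """Read path sketch: feed table, then cache, then HBase fallback (cache-aside)."""

    def __init__(self, feed_table: dict, cache: dict, post_table: dict):
        self.feed_table = feed_table  # user_id -> post IDs (stand-in for the post recommendation table)
        self.cache = cache            # post_id -> content (stand-in for Redis or a CDN)
        self.post_table = post_table  # post_id -> content (stand-in for the HBase post table)
        self.table_reads = 0

    def read_feed(self, user_id: str) -> list:
        posts = []
        for post_id in self.feed_table.get(user_id, []):
            content = self.cache.get(post_id)
            if content is None:  # cache miss: fall back to the post table
                content = self.post_table.get(post_id)
                self.table_reads += 1
                if content is not None:
                    self.cache[post_id] = content  # assumed back-fill for later reads
            if content is not None:
                posts.append(content)
        return posts


reader = PostReader({"u1": ["p1", "p2"]}, {"p1": "hello"}, {"p1": "hello", "p2": "world"})
print(reader.read_feed("u1"), reader.table_reads)  # ['hello', 'world'] 1
print(reader.read_feed("u1"), reader.table_reads)  # ['hello', 'world'] 1
```

After the first miss is back-filled, repeated reads of the same feed no longer touch the post table.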
Geolocation-based friend radar
The system recommends users with similar interests based on a user's stored geolocation.
The system searches for nearby users based on the geolocation, queries the user persona table to find those with similar interests, and then recommends them to the target user.
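The friend radar can be sketched as a GeoHash prefix match followed by an interest-similarity filter. This example assumes each user's GeoHash string is already stored (as in the logon flow) and uses Jaccard similarity over interest tags as a stand-in for persona matching; `friend_radar`, `jaccard`, and the thresholds are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of interests two users have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def friend_radar(target_id: str, locations: dict, interests: dict,
                 prefix_len: int = 5, min_similarity: float = 0.3, limit: int = 10) -> list:
    """Return nearby users with similar interests, most similar first."""
    prefix = locations[target_id][:prefix_len]
    target_tags = interests.get(target_id, set())
    scored = []
    for user_id, geohash in locations.items():
        if user_id == target_id or not geohash.startswith(prefix):
            continue  # in HBase this filter would be a row-key prefix scan
        similarity = jaccard(target_tags, interests.get(user_id, set()))
        if similarity >= min_similarity:
            scored.append((similarity, user_id))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [user_id for _, user_id in scored[:limit]]


locations = {"me": "wx4g0ab", "u1": "wx4g0cd", "u2": "wx4g1zz", "u3": "wx4g0ef"}
interests = {"me": {"hiking", "jazz"}, "u1": {"hiking", "jazz", "chess"},
             "u2": {"hiking", "jazz"}, "u3": {"golf"}}
print(friend_radar("me", locations, interests))  # ['u1']
```

Two users who are physically close can still fall into adjacent GeoHash cells with different prefixes, so a production version would typically also scan the eight neighboring cells.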