Social networks are a major information platform, where you can add friends from the "People you may be interested in" list provided by a social networking website or app. This feature, also called Friend Suggestion, recommends people to send friend requests to based on the friends you have in common. Here, we use a simple example to describe how to implement the Friend Suggestion feature with MapReduce.
The following figure shows the friend relationships among A, B, C, D, and E, where a solid line indicates that two users are friends. How can we count the mutual friends of two users who are not yet friends, so that the count can serve as a reference for new friend suggestions?
Follow these steps:
Divide the relationships into two maps, each containing three friend relationships. Split each friend relationship into Key-Value records. If the two users in the Key are friends, set the Value to 0. If they are not, set the Value to 1. Sort the results. (A B) and (B A) are treated as the same Key (A B).
Combine the records within each of the two maps. If the Keys of two records are the same and neither Value is 0, add the Values together.
Keep the records whose Value is 0 during the Combine process. Otherwise, the Reduce process can no longer tell that the two users are already friends and may return wrong results.
Merge the Combine results of the two maps by using the Reduce method.
If the Keys of two records are the same and neither Value is 0, add the Values together.
Delete the records whose Value is 0.
Obtain the number of mutual friends between each pair of users who are not friends. The Key shows the two users who are not friends, and the Value shows the number of mutual friends between them. Social networking websites or apps can suggest friends based on the Value.
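The steps above can be sketched in plain Python, outside MaxCompute. This is a minimal simulation, not the code in friends_mr.jar: the function names (map_phase, merge, combine_phase, reduce_phase), the five-user graph, and the way the records are divided into two splits are all assumptions made for illustration.

```python
from itertools import combinations


def map_phase(records):
    """Emit ((userA, userB), value) records for each (user, friends) input record."""
    out = []
    for user, friends in records:
        # The user and each listed friend are already friends: Value 0.
        for friend in friends:
            out.append((tuple(sorted((user, friend))), 0))
        # Any two friends of the same user share that user as a mutual friend: Value 1.
        for a, b in combinations(sorted(friends), 2):
            out.append(((a, b), 1))
    return out


def merge(records):
    """Logic shared by Combine and Reduce: a 0 wins, otherwise Values are summed."""
    merged = {}
    for key, value in records:
        if key not in merged:
            merged[key] = value
        elif merged[key] == 0 or value == 0:
            merged[key] = 0
        else:
            merged[key] += value
    return merged


def combine_phase(records):
    # Value-0 records are kept so that Reduce can still see existing friendships.
    return list(merge(records).items())


def reduce_phase(records):
    # Value-0 records are dropped: those pairs are already friends.
    return {key: value for key, value in merge(records).items() if value != 0}


# Hypothetical friendships (not the figure's): A-B, A-C, B-D, C-D, D-E.
split_1 = [("A", ["B", "C"]), ("B", ["A", "D"]), ("C", ["A", "D"])]
split_2 = [("D", ["B", "C", "E"]), ("E", ["D"])]

combined = combine_phase(map_phase(split_1)) + combine_phase(map_phase(split_2))
suggestions = reduce_phase(combined)
print(suggestions)
```

In this hypothetical graph, B and C are not friends but share A and D, so their count is 2. Pairs such as (A B) never appear in the result because their Value-0 record survives Combine and is removed only in Reduce.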
Follow these steps to create the data tables:
Log on to the DataWorks console, and click Enter Workspace in the corresponding project space.
Click Data Development from the upper menu to enter the Data Development homepage. Click New > Create Script, or click Create Script directly.
Complete the configurations in the New Script File dialog box. Enter the file name, select ODPS SQL for the type, and click Submit. See the following figure.
The statements used for table creation are as follows:
drop table if exists dual; -- Drop the system pseudo table dual if it already exists
create table dual (id bigint); -- If the project does not have the pseudo table, create it and initialize its data
insert overwrite table dual select count(*) from dual; -- Initialize the pseudo table with one row
-- Create the input table for the Friend Suggestion MR job: uid is a user, and friends holds the friends of that user.
create table friends_in (uid string, friends string);
-- Create the output table for the Friend Suggestion MR job: userA and userB are two users, and cnt is the number of mutual friends between them.
create table friends_out (userA string, userB string, cnt bigint);
Click Run. When the log returns success, the target tables have been created.
Click Save to save the table creation statements.
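To make the two table schemas concrete, the following sketch parses one hypothetical friends_in row and builds one friends_out row. It assumes the friends column holds a space-separated list of user IDs, which this tutorial does not specify. The helper parse_friends_row, the FriendsOutRow type, and the sample values are illustrative only.

```python
from typing import List, NamedTuple, Tuple


class FriendsOutRow(NamedTuple):
    """Mirrors friends_out: two users and their mutual-friend count."""
    userA: str
    userB: str
    cnt: int


def parse_friends_row(uid: str, friends: str, sep: str = " ") -> Tuple[str, List[str]]:
    """Split the friends string of one friends_in row (the separator is an assumption)."""
    return uid, [f for f in friends.split(sep) if f]


# Hypothetical input row: user A with friends B, C, and D.
uid, friend_list = parse_friends_row("A", "B C D")

# Hypothetical output row: B and C share one mutual friend.
row = FriendsOutRow(userA="B", userB="C", cnt=1)
print(uid, friend_list, row)
```

If the real file uses a different separator, only the sep argument needs to change.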
Click Import > Import Local Data from the upper menu, and open the friends_in_data.csv local file (Click here to download).
Keep the default settings, preview the imported data, and click Next.
In a real working environment, the data must be imported as a txt or csv file.
On the Import Local Data page, enter friends_in in Import to Table to import the test data of this case into the friends_in input table, and check that each Target Field matches its Source Field. Click Import.
Because the data set is large, the import may take one to two minutes.
After the data is imported, enter the following statement to check. See the following figure.
Click Resource in the left-side navigation pane, and click Upload in the upper-right corner of the list.
Configure the information in the Upload Resource window that appears, and select the Friends_MR file to upload. See the following figure.
The uploaded JAR package friends_mr.jar appears under Resource Management in the left-side navigation pane.
Click New > New Task from the upper menu to create the MR task for this case.
In the dialog box that appears, set Task Type to Node Task and configure the task as follows:
Enter all the configurations on the task page, as shown in the following figure:
Configuration item description:
- MRJar package: Click the text box, and select friends_mr.jar.
- Resource: friends_mr.jar by default.
- Input table: Enter friends_in.
- mapper: Enter friends_mr_odps.FriendsMapper, which is the full name of the Mapper class in the JAR package.
- reducer: Enter friends_mr_odps.FriendsReducer, which is the full name of the Reducer class in the JAR package.
- combiner: Enter friends_mr_odps.FriendsReducer, which is the full name of the Combiner class in the JAR package. The same class serves as both Combiner and Reducer.
- Output table: Enter friends_out.
- Output Key: Enter userA:String, userB:String.
- Output Val: Enter cnt:Bigint.
Save and run the configured OPEN MR task, then check the status and result in Logs. See the following figure.
Enter the following SQL statement in the script file, and click Run to query the user pairs that have more than two mutual friends.
SELECT * FROM friends_out WHERE cnt > 2 ORDER BY cnt DESC LIMIT 100;
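For readers who want to check the query's semantics on a small result set, the following sketch applies the same filter, ordering, and limit to in-memory rows. The function name top_suggestions and the sample rows are hypothetical; this only mirrors the logic of the query above and does not connect to MaxCompute.

```python
def top_suggestions(rows, min_cnt=2, limit=100):
    """Keep rows with cnt strictly greater than min_cnt, sorted by cnt descending."""
    kept = [row for row in rows if row[2] > min_cnt]
    kept.sort(key=lambda row: row[2], reverse=True)
    return kept[:limit]


# Hypothetical friends_out rows: (userA, userB, cnt).
sample_rows = [("A", "D", 2), ("B", "E", 3), ("B", "C", 5)]
result = top_suggestions(sample_rows)
print(result)
```

The comparison is strict, so a pair with exactly two mutual friends, such as ("A", "D", 2), is excluded.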