PageRank

Last Updated: Oct 31, 2017

PageRank is a typical algorithm used to calculate the web page ranking. In the input directed graph G, vertices indicate web pages. If a link exists between web pages A and B, an edge connecting A and B exists. The basic principle of the algorithm is as follows:

  • Initialization: The vertex value indicates the rank value (of the double type) of PageRank. In the initial phase, the value of all vertices is 1/TotalNumVertices.

  • Iteration formula: PageRank(i) = 0.15/TotalNumVertices + 0.85 x sum. Sum indicates the sum of PageRank(j)/out_degree(j). (j indicates all vertices pointing to vertex i.)

The basic principle shows that the algorithm is applicable to solutions using the MaxCompute Graph program. Each vertex j maintains the value of PageRank. PageRank(j)/out_degree(j) is sent to the adjacent vertex (for voting) in each iteration. In the next iteration, each vertex recalculates the PageRank value using the iteration formula.

Source code

  1. import java.io.IOException;
  2. import org.apache.log4j.Logger;
  3. import com.aliyun.odps.io.WritableRecord;
  4. import com.aliyun.odps.graph.ComputeContext;
  5. import com.aliyun.odps.graph.GraphJob;
  6. import com.aliyun.odps.graph.GraphLoader;
  7. import com.aliyun.odps.graph.MutationContext;
  8. import com.aliyun.odps.graph.Vertex;
  9. import com.aliyun.odps.graph.WorkerContext;
  10. import com.aliyun.odps.io.DoubleWritable;
  11. import com.aliyun.odps.io.LongWritable;
  12. import com.aliyun.odps.io.NullWritable;
  13. import com.aliyun.odps.data.TableInfo;
  14. import com.aliyun.odps.io.Text;
  15. import com.aliyun.odps.io.Writable;
  16. public class PageRank {
  17. private final static Logger LOG = Logger.getLogger(PageRank.class);
  18. public static class PageRankVertex extends
  19. Vertex<Text, DoubleWritable, NullWritable, DoubleWritable> {
  20. @Override
  21. public void compute(
  22. ComputeContext<Text, DoubleWritable, NullWritable, DoubleWritable> context,
  23. Iterable<DoubleWritable> messages) throws IOException {
  24. if (context.getSuperstep() == 0) {
  25. setValue(new DoubleWritable(1.0 / context.getTotalNumVertices()));
  26. } else if (context.getSuperstep() >= 1) {
  27. double sum = 0;
  28. for (DoubleWritable msg : messages) {
  29. sum += msg.get();
  30. }
  31. DoubleWritable vertexValue = new DoubleWritable(
  32. (0.15f / context.getTotalNumVertices()) + 0.85f * sum);
  33. setValue(vertexValue);
  34. }
  35. if (hasEdges()) {
  36. context.sendMessageToNeighbors(this, new DoubleWritable(getValue()
  37. .get() / getEdges().size()));
  38. }
  39. }
  40. @Override
  41. public void cleanup(
  42. WorkerContext<Text, DoubleWritable, NullWritable, DoubleWritable> context)
  43. throws IOException {
  44. context.write(getId(), getValue());
  45. }
  46. }
  47. public static class PageRankVertexReader extends
  48. GraphLoader<Text, DoubleWritable, NullWritable, DoubleWritable> {
  49. @Override
  50. public void load(
  51. LongWritable recordNum,
  52. WritableRecord record,
  53. MutationContext<Text, DoubleWritable, NullWritable, DoubleWritable> context)
  54. throws IOException {
  55. PageRankVertex vertex = new PageRankVertex();
  56. vertex.setValue(new DoubleWritable(0));
  57. vertex.setId((Text) record.get(0));
  58. System.out.println(record.get(0));
  59. for (int i = 1; i < record.size(); i++) {
  60. Writable edge = record.get(i);
  61. System.out.println(edge.toString());
  62. if (!(edge.equals(NullWritable.get()))) {
  63. vertex.addEdge(new Text(edge.toString()), NullWritable.get());
  64. }
  65. }
  66. LOG.info("vertex edgs size: "
  67. + (vertex.hasEdges() ? vertex.getEdges().size() : 0));
  68. context.addVertexRequest(vertex);
  69. }
  70. }
  71. private static void printUsage() {
  72. System.out.println("Usage: <in> <out> [Max iterations (default 30)]");
  73. System.exit(-1);
  74. }
  75. public static void main(String[] args) throws IOException {
  76. if (args.length < 2)
  77. printUsage();
  78. GraphJob job = new GraphJob();
  79. job.setGraphLoaderClass(PageRankVertexReader.class);
  80. job.setVertexClass(PageRankVertex.class);
  81. job.addInput(TableInfo.builder().tableName(args[0]).build());
  82. job.addOutput(TableInfo.builder().tableName(args[1]).build());
  83. // default max iteration is 30
  84. job.setMaxIteration(30);
  85. if (args.length >= 3)
  86. job.setMaxIteration(Integer.parseInt(args[2]));
  87. long startTime = System.currentTimeMillis();
  88. job.run();
  89. System.out.println("Job Finished in "
  90. + (System.currentTimeMillis() - startTime) / 1000.0 + " seconds");
  91. }
  92. }

Code description

The source code of PageRank includes the following parts:

  • Row 55: Defines the PageRankVertexReader class, loads a graph, and resolves each record in the table into a vertex. The first column of the record is the start vertex and other columns are the destination vertices.

  • Row 23: Defines PageRankVertex, where:

    • The vertex value indicates the current PageRank value of the vertex (web page).

    • The compute() method uses the iteration formula PageRank(i) = 0.15/TotalNumVertices + 0.85 x sum to update the vertex value.

    • The cleanup() method writes the vertex and its PageRank value to the result table.

  • Row 88: Runs the main program (main function), defines GraphJob, and specifies the implementation of Vertex/GraphLoader, the maximum number of iterations (30 by default), and input and output tables.
Thank you! We've received your feedback.