Summary

Last Updated: Sep 27, 2017

MaxCompute GRAPH is a set of iterative graph computing and processing framework. Graph computing job is modeled by graph. Graph is composed of Vertex and Edge; the vertex and edge contain weights (Value). MaxCompute GRAPH supports the following graph operations:

  • Modify the weight(value) for the vertex or edge.
  • Add/delete the vertex.
  • Add/delete the edge(s).

Note:

  • To edit the vertex or edge, user must maintan the relations between vertex and edge.
  • At present, graph function is in test stage. If you need to use it, please submit the application list and the project name in work order system. We will deal with it during 7 days.

Through iteration, edit and evolute the graph and then get the result finally. The typical apllications include: PageRank, SSSP, K-Means, etc.Use can use Java SDK provided by MaxCompute GRAPH to compile graph computing program.

Graph Data Structure

The graph processed by MaxCompute Graph module must be a directed graph, composed of vertices and edges. As MaxCompute can only provide two-dimensional table storage structure, it requires users to map graph data into a two-dimensional table and save it in MaxCompute. During graph calculation and analysis stage, user defined Graphloader is used to convert two-demensional table data into vertices and edges through MaxCompute Graph engine. As for how to map graph data into a two-dimensional table, users can make decisions according to their own business scenarios. In Example Programs, the given examples use different table formats to express the graph data structure, for your reference.

The Vertex structure can be expressed simply as < ID, Value, Halted, Edges >, which indicates the vertex identifier (ID), weight value (Value), state (Halted, which indicates whether to stop iteration) and edge set (Edges, all edge lists which take this vertex as beginning) seperately. The Edge structure can be expressed simply as <DestVertexID, Value >, which indicates destination vertex (DestVertexID) and weight value (Value).

For example, the graph shown above is composed of the following vertices:

Vertex <ID, Value, Halted, Edges>
v0 <0, 0, false, [ <1, 5 >, <2, 10 > ] >
v1 <1, 5, false, [ <2, 3>, <3, 2>, <5, 9>]>
v2 <2, 8, false, [<1, 2>, <5, 1 >]>
v3 <3, Long.MAX_VALUE, false, [<0, 7>, <5, 6>]>
v5 <5, Long.MAX_VALUE, false, [<3, 4 > ]>

Graph Programming Logic

1. Load Graph

Load graph: the framework calls user defined GraphLoader to parse input table records to vertex or edge. Distributed optimization: the framework calls user defined Partitioner to slice vertices (the default slicing logic: convert vertex ID into hash value and get modulus by Worker number) and distributes them to corresponding Workers.

For example, suppose that the worker number is 2 in the graph above, then v0 and v2 will be distributed to Worker0 because the result of ID getting modulus by 2 is 0. V1, v3 and v5 will be distributed to Worker1, because the result of ID acquiring modulus from 2 is 1.

2. Iterative calculation:

  • Iteration is a ‘SuperStep’, to traverse all vertices in non-end state (Halted value is false) or the vertices which have received messages (the vertex with end state will be awaken automatically after receiving messages.) and call the method ‘compute (ComputeContext context, Iterable messages)’.
  • In the method ‘compute (ComputeContext context, Iterable messages)’:
    • Process the messages sent by last SuperStep to current vertex;
    • Edit the graph according to actual requirements: 1) modify the value of vertex/edge; 2) send messages to certain vertices; 3) add/delete vertex or edge.
    • Aggregate information to global information through Aggregator.
    • Set the state of current vertex: end state or non-end state (Halted value is true or false).
    • During interation processes, the framework will send messages to corresponding Worker asynchronously and process them in the next SuperStep. Users do not need to care about it.

Iteration termination (only need to meet any one in the following items):

  • All vertices are in end state (Halted value is true) and new message is not generated;
  • Reach maximum number of iterations;
  • The terminate method of a certain Aggregator returns true.

Pseudo code description is shown as follows:

  1. // 1. load
  2. for each record in input_table {
  3. GraphLoader.load();
  4. }
  5. // 2. setup
  6. WorkerComputer.setup();
  7. for each aggr in aggregators {
  8. aggr.createStartupValue();
  9. }
  10. for each v in vertices {
  11. v.setup();
  12. }
  13. // 3. superstep
  14. for (step = 0; step < max; step ++) {
  15. for each aggr in aggregators {
  16. aggr.createInitialValue();
  17. }
  18. for each v in vertices {
  19. v.compute();
  20. }
  21. }
  22. // 4. cleanup
  23. for each v in vertices {
  24. v.cleanup();
  25. }
  26. WorkerComputer.cleanup();
Thank you! We've received your feedback.