Summary

Last Updated: May 09, 2018

MaxCompute Graph is a processing framework designed for iterative graph computing. MaxCompute Graph jobs use graphs to build models. Graphs are composed of vertices and edges.

Vertices and edges contain values. MaxCompute Graph supports the following graph editing operations:

  • Modify the value of a vertex or edge.
  • Add/delete a vertex.
  • Add/delete an edge.

Note:

When editing a vertex and an edge, you must maintain their relationship.

This process outputs a final solution after performing iterative graph editing and evolution. Typical applications include PageRank, SSSP algorithm, and Kmeans algorithm.

You can use Java SDK, an interface provided by MaxCompute Graph, to compile graph computing programs.

Graph Data Structure

Graphs processed by MaxCompute Graph must be directed graphs consisting of vertices and edges. As MaxCompute only provides a two-dimensional storage structure, you must resolve graph data into two-dimensional tables and store them in MaxCompute.

During graph computing analysis, use custom GraphLoader to convert two-dimensional table data to vertices and edges in the MaxCompute Graph engine. You can determine how to resolve graph data into two-dimensional tables based on your service scenarios. In the sample code, the table formats correspond to different graph data structures.

The vertex structure can be described as < ID, Value, Halted, Edges >, which respectively indicate the vertex ID (ID), value (Value), status (Halted, indicating whether an iteration is to be stopped), and edge set (Edges, indicating lists of all edges starting from the vertex). The edge structure can be described as < DestVertexID, Value >, which respectively indicate the destination vertex (DestVertexID) and value (Value).

For example, the preceding figure consists of the following vertices.

Vertex
v0 <0, 0, false, [ <1, 5 >, <2, 10 > ] >
v1 <1, 5, false, [ <2, 3>, <3, 2>, <5, 9>]>
v2 <2, 8, false, [<1, 2>, <5, 1 >]>
v3 <3, Long.MAX_VALUE, false, [<0, 7>, <5, 6>]>
v5 <5, Long.MAX_VALUE, false, [<3, 4 > ]>

Graph program logic

Graph loading

Graph loading: The framework calls custom GraphLoader and resolves records of an input table to vertices or edges.

Distributed architecture: The framework calls custom Partitioner to partition vertices and distributes them to corresponding Workers. (Default partitioning logic: Calculate the hash value of a vertex ID and perform the modulo operation on the number of Workers.)

For example, assume in the preceding figure that the number of Workers is 2. v0 and v2 are allocated to Worker 0 because the result of the ID mod 2 is 0. v1, v3, and v5 are allocated to Worker 1 because the result of the ID mod 2 is 1.

Iteration calculation

  • An iteration is called a superstep. It traverses all vertices in the non-halted status (the value of Halted is false) or all vertices that receive messages (a vertex in halted status is automatically woken up after receiving a message), and calls their compute (ComputeContext context, Iterable messages) method.
  • You can follow these steps on your implemented compute (ComputeContext context, Iterable messages) method:
    • Process messages sent from the last superstep to the current vertex.
    • Edit a graph as required:
      • Modify the vertex/edge value.
      • Send messages to some vertices.
      • Add/delete a vertex or edge.
    • Use Aggregator to collect information to global information.
    • Set the current vertex to the halted or non-halted status.
    • During iteration, the framework asynchronously sends messages to the corresponding Worker and processes the messages in the next superstep without your intervention.

Iteration termination (only if any of the following conditions is met)

If any of the following conditions is met, iteration becomes terminate.

  • All vertices are in the halted status (the value of Halted is true) and no new message is generated.
  • The maximum number of iterations is reached.
  • The terminate() method of an Aggregator returns true.

The pseudocode is described as follows.

  1. // 1. load
  2. for each record in input_table {
  3. GraphLoader.load();
  4. }
  5. // 2. setup
  6. WorkerComputer.setup();
  7. for each aggr in aggregators {
  8. aggr.createStartupValue();
  9. }
  10. for each v in vertices {
  11. v.setup();
  12. }
  13. // 3. superstep
  14. for (step = 0; step < max; step ++) {
  15. for each aggr in aggregators {
  16. aggr.createInitialValue();
  17. }
  18. for each v in vertices {
  19. v.compute();
  20. }
  21. }
  22. // 4. cleanup
  23. for each v in vertices {
  24. v.cleanup();
  25. }
  26. WorkerComputer.cleanup();
Thank you! We've received your feedback.