Function overview

Last Updated: Dec 05, 2017

The MaxCompute client provides a jar command to run MaxCompute Graph jobs. The command is used in the same way as the JAR command that runs MapReduce jobs. This document briefly introduces the command.

    Usage: jar [<GENERIC_OPTIONS>] <MAIN_CLASS> [ARGS]
        -conf <configuration_file>       Specify an application configuration file
        -classpath <local_file_list>     classpaths used to run mainClass
        -D <name>=<value>                Property value pair, which will be used to run mainClass
        -local                           Run job in local mode
        -resources <resource_name_list>  file/table resources used in graph, separated by comma

<GENERIC_OPTIONS> can be the following parameters (all are optional); a sample command that combines them is shown after this list:

  • -conf <configuration_file>: Specifies the JobConf configuration file.

  • -classpath <local_file_list>: Specifies the local classpath used to run the main class, mainly the JAR package that contains the main function. The main function and the Graph job are usually written in the same package, for example, the Single Source Shortest Path (SSSP) package, so the -resources and -classpath parameters in the sample code both contain the same JAR package. The difference is that -resources references the Graph job classes and runs in the distributed environment, while -classpath references the main function and runs locally. The specified JAR package path is a local file path. Multiple package paths are separated by the system default path delimiter, which is usually a semicolon (;) on Windows and a colon (:) on Linux.

  • -D <prop_name>=<prop_value>: Specifies Java properties for the locally running <mainClass>. Multiple properties can be defined.

  • -local: Runs the Graph job in local mode, which is mainly used for program debugging.

  • -resources <resource_name_list>: Specifies the resources used when the Graph job runs. Generally, the name of the resource that contains the Graph job classes must be specified in resource_name_list. If the Graph job reads other MaxCompute resources, their names must also be added to resource_name_list. Resource names are separated by commas. When resources from another project are used, the prefix PROJECT_NAME/resources/ must be added, for example, -resources otherproject/resources/resfile.
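
For reference, a minimal sketch of such a command in the MaxCompute client is shown below. The main class com.example.graph.SSSP and the table names sssp_in and sssp_out are hypothetical placeholders; only the option layout follows the usage shown above, and the same JAR package is passed to both -classpath and -resources as described for -classpath.

    jar -classpath mapreduce-examples.jar -resources mapreduce-examples.jar com.example.graph.SSSP sssp_in sssp_out;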

In addition, you can run the main function of the Graph job to submit the job to MaxCompute directly, rather than submitting the job through the MaxCompute client. The following example uses the PageRank algorithm:

    public static void main(String[] args) throws Exception {
      if (args.length < 2)
        printUsage();
      // accessId, accessKey, endPoint, and project are assumed to be defined elsewhere in the sample
      Account account = new AliyunAccount(accessId, accessKey);
      Odps odps = new Odps(account);
      odps.setEndpoint(endPoint);
      odps.setDefaultProject(project);
      SessionState ss = SessionState.get();
      ss.setOdps(odps);
      ss.setLocalRun(false);
      String resource = "mapreduce-examples.jar";
      GraphJob job = new GraphJob();
      // Add the JAR file in use and other files to the class cache resource, corresponding to the resources specified by -resources in the jar command
      job.addCacheResourcesToClassPath(resource);
      job.setGraphLoaderClass(PageRankVertexReader.class);
      job.setVertexClass(PageRankVertex.class);
      job.addInput(TableInfo.builder().tableName(args[0]).build());
      job.addOutput(TableInfo.builder().tableName(args[1]).build());
      // The default max iteration is 30
      job.setMaxIteration(30);
      if (args.length >= 3)
        job.setMaxIteration(Integer.parseInt(args[2]));
      long startTime = System.currentTimeMillis();
      job.run();
      System.out.println("Job Finished in "
          + (System.currentTimeMillis() - startTime) / 1000.0
          + " seconds");
    }

Input and output

The input and output of a MaxCompute Graph job must be tables. Custom input and output formats are not supported.

Define the job input. Multiple inputs are supported (a GraphLoader sketch follows the code below):

    GraphJob job = new GraphJob();
    job.addInput(TableInfo.builder().tableName("tblname").build());                          // Table as input
    job.addInput(TableInfo.builder().tableName("tblname").partSpec("pt1=a/pt2=b").build());  // Partition as input
    // Read only columns col2 and col0 of the input table. In the load() method of GraphLoader, col2 is obtained by record.get(0) and col0 by record.get(1), in the order specified here
    job.addInput(TableInfo.builder().tableName("tblname").partSpec("pt1=a/pt2=b").build(), new String[]{"col2", "col0"});
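
The following is a minimal, hedged sketch of how the selected columns arrive in a custom GraphLoader. The class names MyVertexReader and MyVertex are hypothetical, the vertex type parameters are assumed to be LongWritable, and the two selected columns are assumed to be BIGINT; the only point illustrated is that record.get(0) holds col2 and record.get(1) holds col0.

    import java.io.IOException;

    import com.aliyun.odps.graph.ComputeContext;
    import com.aliyun.odps.graph.GraphLoader;
    import com.aliyun.odps.graph.MutationContext;
    import com.aliyun.odps.graph.Vertex;
    import com.aliyun.odps.io.LongWritable;
    import com.aliyun.odps.io.WritableRecord;

    // Hypothetical vertex type used only for this sketch.
    class MyVertex extends Vertex<LongWritable, LongWritable, LongWritable, LongWritable> {
      @Override
      public void compute(ComputeContext<LongWritable, LongWritable, LongWritable, LongWritable> context,
          Iterable<LongWritable> messages) throws IOException {
        // No algorithm here; a real vertex would implement its logic in compute()
        voteToHalt();
      }
    }

    // The framework calls load() once per input record and passes the selected columns in order.
    public class MyVertexReader extends
        GraphLoader<LongWritable, LongWritable, LongWritable, LongWritable> {
      @Override
      public void load(LongWritable recordNum, WritableRecord record,
          MutationContext<LongWritable, LongWritable, LongWritable, LongWritable> context)
          throws IOException {
        MyVertex vertex = new MyVertex();
        // Because the input was added with new String[]{"col2", "col0"}:
        vertex.setId((LongWritable) record.get(0));     // col2, assumed here to be the vertex ID
        vertex.setValue((LongWritable) record.get(1));  // col0, assumed here to be the vertex value
        context.addVertexRequest(vertex);
      }
    }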

Note:

  • For more information about the job input definition, see the description of the addInput() method in GraphJob. The framework reads records from the input tables and passes them to the custom GraphLoader, which loads them into the graph.
  • Restrictions: Partition filtering conditions are not currently supported. For more restrictions on applications, see Application restrictions.

Define job output. Multiple outputs are supported. Each output is marked using a label:

    GraphJob job = new GraphJob();
    // If the output table is a partitioned table, the partition spec must be specified down to the last-level partition
    job.addOutput(TableInfo.builder().tableName("table_name").partSpec("pt1=a/pt2=b").build());
    // The second parameter true means overwriting the partition specified by tableInfo (INSERT OVERWRITE); false means INSERT INTO
    job.addOutput(TableInfo.builder().tableName("table_name").partSpec("pt1=a/pt2=b").label("output1").build(), true);

Note:

  • For more information about the job output definition, see the description of the addOutput() method in GraphJob.
  • When a Graph job runs, records can be written to an output table using the write() method of WorkerContext. Labels must be specified when there are multiple outputs, such as "output1" in the preceding code; see the sketch after this list.
  • For more restrictions on applications, see Application restrictions.
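
As an illustration, the following hedged sketch writes results from a vertex's cleanup() method, once to the default output and once to the output registered above with the label "output1". The class name OutputVertex and the LongWritable type parameters are hypothetical; the labeled write() overload is assumed based on the note above.

    import java.io.IOException;

    import com.aliyun.odps.graph.ComputeContext;
    import com.aliyun.odps.graph.Vertex;
    import com.aliyun.odps.graph.WorkerContext;
    import com.aliyun.odps.io.LongWritable;

    // Hypothetical vertex that writes its result when the worker finishes.
    public class OutputVertex extends Vertex<LongWritable, LongWritable, LongWritable, LongWritable> {
      @Override
      public void compute(ComputeContext<LongWritable, LongWritable, LongWritable, LongWritable> context,
          Iterable<LongWritable> messages) throws IOException {
        // No algorithm here; this sketch only shows output writing
        voteToHalt();
      }

      @Override
      public void cleanup(WorkerContext<LongWritable, LongWritable, LongWritable, LongWritable> context)
          throws IOException {
        // Write to the default (unlabeled) output table
        context.write(getId(), getValue());
        // Write to the output table registered with the label "output1"
        context.write("output1", getId(), getValue());
      }
    }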

Read resources

Add resources to the Graph program

Besides the -resources option of the jar command, you can use the following two methods of GraphJob to specify the resources read by the Graph job (a brief example follows the list):

    void addCacheResources(String resourceNames)
    void addCacheResourcesToClassPath(String resourceNames)
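
For example, a job-side sketch might look like the following; the resource names dict.txt and mapreduce-examples.jar are placeholders for resources assumed to be already uploaded to the project.

    GraphJob job = new GraphJob();
    // A plain file resource that workers will read at run time (hypothetical name)
    job.addCacheResources("dict.txt");
    // A JAR resource that should also be added to the workers' classpath (hypothetical name)
    job.addCacheResourcesToClassPath("mapreduce-examples.jar");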

Use resources in the Graph program

You can use the following methods of WorkerContext to read resources in the Graph program:

    public byte[] readCacheFile(String resourceName) throws IOException;
    public Iterable<byte[]> readCacheArchive(String resourceName) throws IOException;
    public Iterable<byte[]> readCacheArchive(String resourceName, String relativePath) throws IOException;
    public Iterable<WritableRecord> readResourceTable(String resourceName);
    public BufferedInputStream readCacheFileAsStream(String resourceName) throws IOException;
    public Iterable<BufferedInputStream> readCacheArchiveAsStream(String resourceName) throws IOException;
    public Iterable<BufferedInputStream> readCacheArchiveAsStream(String resourceName, String relativePath) throws IOException;

Note:

  • Resources are usually read in the setup() method of WorkerComputer, stored in the worker value, and obtained later using the getWorkerValue() method; see the sketch after this list.
  • We recommend using the preceding stream APIs to read and process resources at the same time, which reduces memory consumption.
  • For more restrictions on applications, see Application restrictions.
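
The following hedged sketch shows two of the listed WorkerContext methods in use. The helper class ResourceReader and the resource names dict.txt and config_table are hypothetical; in a real job, such code would typically run in the setup() method of a WorkerComputer as described above.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    import com.aliyun.odps.graph.WorkerContext;
    import com.aliyun.odps.io.WritableRecord;

    // Hypothetical helper demonstrating the resource-reading methods listed above.
    public class ResourceReader {
      // Reads a small file resource fully into memory and decodes it as UTF-8 text.
      public static String readDict(WorkerContext<?, ?, ?, ?> context) throws IOException {
        byte[] raw = context.readCacheFile("dict.txt");  // hypothetical resource name
        return new String(raw, StandardCharsets.UTF_8);
      }

      // Iterates over a table resource record by record.
      public static long countResourceRecords(WorkerContext<?, ?, ?, ?> context) {
        long count = 0;
        for (WritableRecord record : context.readResourceTable("config_table")) {  // hypothetical resource name
          count++;
        }
        return count;
      }
    }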