MaxCompute does not provide a Graph development plug-in, but you can still develop MaxCompute Graph programs in Eclipse. The recommended development flow is as follows:
- Write the Graph code and run basic tests in local mode.
- Debug on the cluster to verify the results.
This section describes how to develop and debug a Graph program in Eclipse, using the SSSP (single-source shortest path) algorithm as an example. The steps to develop SSSP are as follows:
- Create a Java project, for example graph_examples.
- Add the JAR packages in the “lib” directory of the MaxCompute console to the Build Path of the Eclipse project. The following figure shows a configured Eclipse project.
- Develop the MaxCompute Graph code. In practice, you usually start by copying an existing example (such as SSSP) and then modify it. In this example, only the package path was changed, to: package com.aliyun.odps.graph.example.
- Compile and package the program. In Eclipse, right-click the “src” directory and click Export -> Java -> JAR file to generate the JAR package, then select the destination path for it, for example “D:\odps\clt\odps-graph-example-sssp.jar”.
- Run SSSP in the MaxCompute console. For the related operations, see Quick Start-Run Graph.
- For more about the development steps, see Graph Development Plug-in.
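In a vertex-centric (Pregel-style) framework such as MaxCompute Graph, SSSP typically works like this: in each superstep a vertex takes the smallest distance it has received, and if that improves its current distance it sends distance + edge weight to its neighbors; the computation ends when no messages remain. The following is a minimal plain-Java sketch of that idea that simulates supersteps in a single process. It does not use the MaxCompute Graph API: the class SsspSketch, its methods, and the sample graph are hypothetical and only illustrate the algorithm you would be modifying.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of vertex-centric SSSP.
// This is NOT the MaxCompute Graph API; it only mirrors the superstep/message idea.
class SsspSketch {
    static final long UNREACHED = Long.MAX_VALUE;

    // Adds a directed, weighted edge; both endpoints become vertices.
    static void addEdge(Map<Long, Map<Long, Long>> g, long from, long to, long weight) {
        g.computeIfAbsent(from, k -> new HashMap<>()).put(to, weight);
        g.computeIfAbsent(to, k -> new HashMap<>());
    }

    // Returns the shortest distance from source to every vertex (UNREACHED if none).
    static Map<Long, Long> run(Map<Long, Map<Long, Long>> g, long source) {
        Map<Long, Long> dist = new TreeMap<>();
        for (Long v : g.keySet()) {
            dist.put(v, UNREACHED);
        }

        // Messages for the next superstep: vertex id -> candidate distances.
        Map<Long, List<Long>> inbox = new HashMap<>();
        inbox.computeIfAbsent(source, k -> new ArrayList<>()).add(0L);

        // Each loop iteration is one superstep; stop when no messages are left.
        while (!inbox.isEmpty()) {
            Map<Long, List<Long>> outbox = new HashMap<>();
            for (Map.Entry<Long, List<Long>> msg : inbox.entrySet()) {
                long v = msg.getKey();
                long best = Collections.min(msg.getValue()); // like a min combiner
                if (best < dist.getOrDefault(v, UNREACHED)) {
                    dist.put(v, best);
                    // Only vertices whose distance improved notify their neighbors.
                    for (Map.Entry<Long, Long> edge
                            : g.getOrDefault(v, Collections.emptyMap()).entrySet()) {
                        outbox.computeIfAbsent(edge.getKey(), k -> new ArrayList<>())
                              .add(best + edge.getValue());
                    }
                }
            }
            inbox = outbox;
        }
        return dist;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Long>> g = new HashMap<>();
        addEdge(g, 1, 2, 2);
        addEdge(g, 1, 3, 1);
        addEdge(g, 1, 4, 4);
        addEdge(g, 2, 4, 1);
        addEdge(g, 3, 2, 2);
        addEdge(g, 3, 5, 1);
        addEdge(g, 5, 4, 1);
        System.out.println(run(g, 1)); // {1=0, 2=2, 3=1, 4=3, 5=2}
    }
}
```

Taking the minimum of all incoming messages before comparing is what a min combiner would do in a real job; it keeps each vertex to one comparison per superstep.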
MaxCompute Graph supports a local debugging mode, in which you can use Eclipse for breakpoint debugging. The steps are as follows:
- Download the odps-graph-local Maven package.
- In the Eclipse project, right-click the Graph Java file that contains the main function and configure its run parameters (Run As -> Run Configurations…), as shown in the following figure.
- On the Arguments tab, set Program arguments to “1 sssp_in sssp_out”: the start vertex ID, the input table, and the output table.
- On the Arguments tab, set VM arguments to: -Dodps.runner.mode=local -Dodps.project.name=<project.name> -Dodps.end.point=<end.point> -Dodps.access.id=<access.id> -Dodps.access.key=<access.key>
- In pure local mode (that is, when odps.end.point is not specified), create the tables sssp_in and sssp_out in the warehouse and add data to the input table sssp_in. The input data is shown as follows. For an introduction to the warehouse, see MapReduce Local Operation.
- Click Run to run SSSP in local mode.
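The VM arguments above are ordinary Java system properties, so they can also be assembled programmatically, for example by a script that generates run configurations. The sketch below is a hypothetical helper (LocalRunArgs and vmArgs are illustrative names, not part of any MaxCompute SDK, and "my_project" is a placeholder) that builds the VM arguments string from the property names used in this section. For a pure local run it leaves out the endpoint and credentials, so all data comes from the warehouse.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the VM arguments for a local Graph run.
// Property names come from this section; the helper itself is illustrative only.
class LocalRunArgs {
    static String vmArgs(String project, String endPoint, String accessId, String accessKey) {
        if (project == null || project.isEmpty()) {
            throw new IllegalArgumentException("odps.project.name is required");
        }
        List<String> args = new ArrayList<>();
        args.add("-Dodps.runner.mode=local");          // must be "local" for local runs
        args.add("-Dodps.project.name=" + project);
        // Without an endpoint, tables and resources come only from the local
        // warehouse and the access ID/key have no effect, so omit all three.
        if (endPoint != null && !endPoint.isEmpty()) {
            args.add("-Dodps.end.point=" + endPoint);
            args.add("-Dodps.access.id=" + accessId);
            args.add("-Dodps.access.key=" + accessKey);
        }
        return String.join(" ", args);
    }

    public static void main(String[] args) {
        // Pure local mode: everything is read from the local warehouse.
        System.out.println(vmArgs("my_project", null, null, null));
        // prints: -Dodps.runner.mode=local -Dodps.project.name=my_project
    }
}
```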
You can take the parameter values from conf/odps_config.ini in the MaxCompute console. The parameters are described as follows:
- odps.runner.mode: set to “local”. Required in local mode.
- odps.project.name: the current project. Required.
- odps.end.point: the endpoint of the current MaxCompute service. Optional. If it is not specified, the metadata and data of tables and resources can only be read from the warehouse, and an exception is thrown if they are not found there. If it is specified, they are read from the warehouse first and, if not found there, fetched from MaxCompute remotely.
- odps.access.id: the access ID used to connect to MaxCompute. Takes effect only when odps.end.point is specified.
- odps.access.key: the access key used to connect to MaxCompute. Takes effect only when odps.end.point is specified.
- odps.cache.resources: the resources to use, similar to the “-resources” parameter of the jar command.
- odps.local.warehouse: the local path of the warehouse. Defaults to “./warehouse”.
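The lookup rules for odps.end.point and odps.local.warehouse come down to a short decision sequence. The following minimal sketch is hypothetical (LocalLookup, Source, and resolve are illustrative names, not the runner's code) and assumes the caller already knows which tables and resources exist in the local warehouse. It shows only the order: warehouse first, then the remote service if an endpoint is configured, otherwise an exception. It also shows how a -D value such as odps.local.warehouse is read with its default through System.getProperty.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;

// Hypothetical sketch of the data-source order described above; not the runner's code.
class LocalLookup {
    enum Source { LOCAL_WAREHOUSE, REMOTE_SERVICE }

    // -Dodps.local.warehouse overrides the default "./warehouse".
    static Path warehouseRoot() {
        return Paths.get(System.getProperty("odps.local.warehouse", "./warehouse"));
    }

    // name: a table or resource; localNames: what exists in the local warehouse
    // (assumed to be known already); endPoint: odps.end.point, or null if unset.
    static Source resolve(String name, Set<String> localNames, String endPoint) {
        if (localNames.contains(name)) {
            return Source.LOCAL_WAREHOUSE;   // the warehouse is always checked first
        }
        if (endPoint != null && !endPoint.isEmpty()) {
            return Source.REMOTE_SERVICE;    // fall back to MaxCompute only with an endpoint
        }
        throw new IllegalStateException(
            name + " is not in the local warehouse and odps.end.point is not set");
    }

    public static void main(String[] args) {
        String endPoint = System.getProperty("odps.end.point"); // null if not passed
        System.out.println(warehouseRoot());
        System.out.println(resolve("sssp_in", Set.of("sssp_in", "sssp_out"), endPoint));
    }
}
```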
After SSSP runs in local mode, the following output is displayed:
graph task finish
Note: In the example above, the tables sssp_in and sssp_out must exist in the warehouse. For more details, see Quick Start-Run Graph.
Each local debugging run creates a new temporary directory in the Eclipse project directory, as follows:
The temporary directory of a locally run Graph job contains the following directories and files:
- counters - stores counter information collected during the run.
- inputs - stores the input data of the Graph job. Input is read from the local warehouse first; if it is not found there, it is read from the server through the MaxCompute SDK (this requires odps.end.point to be set). By default, 10 records are read per input. You can change this with “-Dodps.mapred.local.record.limit”, but the value cannot exceed 10000.
- outputs - stores the output data of the Graph job. If the output table exists in the local warehouse, the result data in outputs overwrites the corresponding table in the local warehouse.
- resources - stores the resources used by the job. As with inputs, resources are read from the local warehouse first; if they are not found there, they are read from the server through the MaxCompute SDK (this requires odps.end.point to be set).
- job.xml - job configuration.
- superstep - stores the persisted messages of each iteration (superstep).
- To output detailed logs during local debugging, put the log4j configuration file “log4j.properties_odps_graph_cluster_debug” in the src directory.
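The record limit for the inputs directory is another -D system property. The minimal sketch below is hypothetical (RecordLimit and effectiveLimit are illustrative names, not the runner's code) and shows how a program could apply the stated rules: 10 records per input by default, and at most 10000 when odps.mapred.local.record.limit is set. This section does not say whether larger values are rejected or capped; the sketch rejects them, and it also rejects non-positive values, which is an additional assumption.

```java
// Hypothetical sketch of the local input record limit; not the runner's code.
class RecordLimit {
    static final int DEFAULT_LIMIT = 10;    // records read per input by default
    static final int MAX_LIMIT = 10000;     // upper bound stated for local runs

    // configured: raw value of odps.mapred.local.record.limit, or null if unset.
    static int effectiveLimit(String configured) {
        if (configured == null) {
            return DEFAULT_LIMIT;
        }
        int n = Integer.parseInt(configured.trim());
        // Assumption: out-of-range values are rejected rather than capped,
        // and non-positive values are treated as invalid.
        if (n < 1 || n > MAX_LIMIT) {
            throw new IllegalArgumentException(
                "odps.mapred.local.record.limit must be between 1 and " + MAX_LIMIT + ", got " + n);
        }
        return n;
    }

    public static void main(String[] args) {
        // For example, started with -Dodps.mapred.local.record.limit=100
        int limit = effectiveLimit(System.getProperty("odps.mapred.local.record.limit"));
        System.out.println("records per input: " + limit);
    }
}
```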