
Batch Compute: Quick start for Java SDK

Last Updated:Aug 06, 2019

This section describes how to use the Java SDK to submit a job. The job aims to count the number of times INFO, WARN, ERROR, and DEBUG appear in a log file.

Note: Make sure that you have signed up for the Batch Compute service in advance.

Contents:

  • Prepare a job
    • Upload the data file to the OSS
    • Use the sample code
    • Compile and package the code
    • Upload the package to the OSS
  • Use the SDK to create (submit) the job
  • Check the result

1. Prepare a job

The job aims to count the number of times “INFO”, “WARN”, “ERROR”, and “DEBUG” appear in a log file.

This job contains the following tasks:

  • The split task is used to divide the log file into three parts.
  • The count task is used to count the number of times “INFO”, “WARN”, “ERROR”, and “DEBUG” appear in each part of the log file. In the count task, InstanceCount must be set to 3, indicating that three count tasks are started concurrently.
  • The merge task is used to merge all the count results.
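To illustrate what each count instance computes, the per-level counting can be sketched in plain Java as follows. This is a hypothetical helper for illustration only, not the code shipped in java-log-count.zip:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class LogLevelCounter {
    static final String[] LEVELS = {"INFO", "WARN", "ERROR", "DEBUG"};

    /** Count how many lines contain each log level. */
    public static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String level : LEVELS) {
            result.put(level, 0);
        }
        for (String line : lines) {
            for (String level : LEVELS) {
                if (line.contains(level)) {
                    result.merge(level, 1, Integer::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList(
                "2019-08-06 INFO start",
                "2019-08-06 ERROR oops",
                "2019-08-06 INFO done"));
        System.out.println(counts);
    }
}
```

In the real job, each of the three count instances runs this kind of logic over its own part of the split log file, and the merge task sums the three partial maps.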

[Figure: DAG of the job (split → count → merge)]

1.1. Upload the data file to the OSS

Download the data file used in this example: log-count-data.txt

Upload the log-count-data.txt file to:

oss://your-bucket/log-count/log-count-data.txt

  • your-bucket is a bucket that you have created. In this example, the bucket is assumed to be in the cn-shenzhen region.
  • To upload the file to the OSS, see Upload files to the OSS.

1.2. Use the sample code

The tasks of this job are written in Java and built with Maven. IntelliJ IDEA is recommended; you can download the free Community edition from http://www.jetbrains.com/idea/download/.

Download the sample program: java-log-count.zip

This is a Maven project.

  • NOTE: You do not need to modify the code.

1.3. Compile and package the code

Run the following command to compile and package the code:

    mvn package

The following .jar packages are generated in the target directory:

    batchcompute-job-log-count-1.0-SNAPSHOT-Split.jar
    batchcompute-job-log-count-1.0-SNAPSHOT-Count.jar
    batchcompute-job-log-count-1.0-SNAPSHOT-Merge.jar

Run the following command to package the three .jar files into a tar.gz file:

    > cd target                                # Switch to the target directory
    > tar -czf worker.tar.gz *SNAPSHOT-*.jar   # Package the .jar files

Run the following command to check whether the package content is correct:

    > tar -tvf worker.tar.gz
    batchcompute-job-log-count-1.0-SNAPSHOT-Split.jar
    batchcompute-job-log-count-1.0-SNAPSHOT-Count.jar
    batchcompute-job-log-count-1.0-SNAPSHOT-Merge.jar
  • NOTE: Batch Compute supports only compressed packages with the .tar.gz extension. Make sure that you use the preceding method (gzip) for packaging; otherwise, the package cannot be parsed.
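Because Batch Compute rejects archives that are not gzip-compressed, it can be worth verifying the archive locally before uploading. The sketch below is a hypothetical helper (not part of the sample project) that checks a file for the gzip magic bytes 0x1f 0x8b using only the JDK:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class GzipCheck {
    /** Return true if the file starts with the gzip magic bytes 0x1f 0x8b. */
    public static boolean isGzip(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            int b1 = in.read();
            int b2 = in.read();
            return b1 == 0x1f && b2 == 0x8b;
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "worker.tar.gz";
        System.out.println(path + " is gzip: " + isGzip(path));
    }
}
```

A package produced with `tar -czf` passes this check; one produced with plain `tar -cf` or `zip` does not.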

1.4. Upload the package to the OSS

In this example, upload worker.tar.gz to your-bucket in the OSS.

oss://your-bucket/log-count/worker.tar.gz

  • To run the job in this example, you must create your own bucket and upload worker.tar.gz to a path in that bucket.

2. Use the SDK to create (submit) the job

2.1. Create a Maven project

Add the following dependencies to pom.xml:

    <dependencies>
        <dependency>
            <groupId>com.aliyun</groupId>
            <artifactId>aliyun-java-sdk-batchcompute</artifactId>
            <version>5.2.0</version>
        </dependency>
        <dependency>
            <groupId>com.aliyun</groupId>
            <artifactId>aliyun-java-sdk-core</artifactId>
            <version>3.2.3</version>
        </dependency>
    </dependencies>
  • Make sure that you use the latest version of the SDK. For more information, see Java SDK.

2.2. Create a Java class: Demo.java

When submitting a job, you must specify a cluster ID or use the AutoCluster parameters.

In this example, AutoCluster is used. You must configure the following parameters for AutoCluster:

  • Available image ID. You can use an image provided by the system or a custom image. For more information about how to create a custom image, see Use an image.
  • InstanceType. For more information about the instance type, see Currently supported instance types.

Create an OSS path for StdoutRedirectPath (program output) and StderrRedirectPath (error logs). In this example, the created path is oss://your-bucket/log-count/logs/.

  • To run the program in this example, modify the commented variables in the program based on the variables and OSS paths described above.

The following provides a sample program that uses the Java SDK to submit a job. For specific meanings of parameters in the program, see SDK interface description.

Demo.java:

    /*
     * IMAGE_ID: ECS image ID. It can be obtained as described above.
     * INSTANCE_TYPE: Instance type. It can be obtained as described above.
     * REGION_ID: Region ID. It must be consistent with the region of the bucket
     *            used to store worker.tar.gz in the OSS.
     * ACCESS_KEY_ID: The AccessKeyId can be obtained as described above.
     * ACCESS_KEY_SECRET: The AccessKeySecret can be obtained as described above.
     * WORKER_PATH: OSS path to which worker.tar.gz is uploaded.
     * LOG_PATH: OSS path for error feedback and task outputs.
     */
    import com.aliyuncs.batchcompute.main.v20151111.*;
    import com.aliyuncs.batchcompute.model.v20151111.*;
    import com.aliyuncs.batchcompute.pojo.v20151111.*;
    import com.aliyuncs.exceptions.ClientException;

    import java.util.ArrayList;
    import java.util.List;

    public class Demo {
        static String IMAGE_ID = "img-ubuntu";          // Enter the ECS image ID
        static String INSTANCE_TYPE = "ecs.sn1.medium"; // Enter an instance type available in the region
        static String REGION_ID = "cn-shenzhen";        // Enter the region
        static String ACCESS_KEY_ID = "";     // "your-AccessKeyId"; enter your AccessKeyId
        static String ACCESS_KEY_SECRET = ""; // "your-AccessKeySecret"; enter your AccessKeySecret
        static String WORKER_PATH = "";       // "oss://your-bucket/log-count/worker.tar.gz"; OSS path of worker.tar.gz
        static String LOG_PATH = "";          // "oss://your-bucket/log-count/logs/"; OSS path for error feedback and task outputs
        static String MOUNT_PATH = "";        // "oss://your-bucket/log-count/";

        public static void main(String[] args) {
            /** Construct the BatchCompute client */
            BatchCompute client = new BatchComputeClient(REGION_ID, ACCESS_KEY_ID, ACCESS_KEY_SECRET);
            try {
                /** Construct the job object */
                JobDescription jobDescription = genJobDescription();

                // Create a job
                CreateJobResponse response = client.createJob(jobDescription);

                // After the job is created successfully, the jobId is returned
                String jobId = response.getJobId();
                System.out.println("Job created success, got jobId: " + jobId);

                // Query the job status
                GetJobResponse getJobResponse = client.getJob(jobId);
                Job job = getJobResponse.getJob();
                System.out.println("Job state:" + job.getState());
            } catch (ClientException e) {
                e.printStackTrace();
                System.out.println("Job created failed, errorCode:" + e.getErrCode()
                        + ", errorMessage:" + e.getErrMsg());
            }
        }

        private static JobDescription genJobDescription() {
            JobDescription jobDescription = new JobDescription();
            jobDescription.setName("java-log-count");
            jobDescription.setPriority(0);
            jobDescription.setDescription("log-count demo");
            jobDescription.setJobFailOnInstanceFail(true);
            jobDescription.setType("DAG");

            DAG taskDag = new DAG();

            /** Add the split task */
            TaskDescription splitTask = genTaskDescription();
            splitTask.setTaskName("split");
            splitTask.setInstanceCount(1);
            splitTask.getParameters().getCommand().setCommandLine("java -jar batchcompute-job-log-count-1.0-SNAPSHOT-Split.jar");
            taskDag.addTask(splitTask);

            /** Add the count task */
            TaskDescription countTask = genTaskDescription();
            countTask.setTaskName("count");
            countTask.setInstanceCount(3);
            countTask.getParameters().getCommand().setCommandLine("java -jar batchcompute-job-log-count-1.0-SNAPSHOT-Count.jar");
            taskDag.addTask(countTask);

            /** Add the merge task */
            TaskDescription mergeTask = genTaskDescription();
            mergeTask.setTaskName("merge");
            mergeTask.setInstanceCount(1);
            mergeTask.getParameters().getCommand().setCommandLine("java -jar batchcompute-job-log-count-1.0-SNAPSHOT-Merge.jar");
            taskDag.addTask(mergeTask);

            /** Add the task dependencies: split --> count --> merge */
            List<String> taskNameTargets = new ArrayList<>();
            taskNameTargets.add("merge");
            taskDag.addDependencies("count", taskNameTargets);

            List<String> taskNameTargets2 = new ArrayList<>();
            taskNameTargets2.add("count");
            taskDag.addDependencies("split", taskNameTargets2);

            jobDescription.setDag(taskDag);
            return jobDescription;
        }

        private static TaskDescription genTaskDescription() {
            AutoCluster autoCluster = new AutoCluster();
            autoCluster.setInstanceType(INSTANCE_TYPE);
            autoCluster.setImageId(IMAGE_ID);
            // autoCluster.setResourceType("OnDemand");

            TaskDescription task = new TaskDescription();
            // task.setTaskName("Find");

            // If a VPC instance is used, configure the CIDR block and avoid CIDR block conflicts
            Configs configs = new Configs();
            Networks networks = new Networks();
            VPC vpc = new VPC();
            vpc.setCidrBlock("192.168.0.0/16");
            networks.setVpc(vpc);
            configs.setNetworks(networks);
            autoCluster.setConfigs(configs);

            Parameters p = new Parameters();
            Command cmd = new Command();
            // cmd.setCommandLine("");
            // Complete OSS path of the packaged and uploaded job
            cmd.setPackagePath(WORKER_PATH);
            p.setCommand(cmd);
            // Error feedback storage path
            p.setStderrRedirectPath(LOG_PATH);
            // Final result storage path
            p.setStdoutRedirectPath(LOG_PATH);
            task.setParameters(p);
            task.addInputMapping(MOUNT_PATH, "/home/input");
            task.addOutputMapping("/home/output", MOUNT_PATH);
            task.setAutoCluster(autoCluster);
            // task.setClusterId(clusterId);
            task.setTimeout(30000);   /* 30000 seconds */
            task.setInstanceCount(1); /** Use one instance to run the program */
            return task;
        }
    }

Example of the normal output:

    Job created success, got jobId: job-01010100010192397211
    Job state:Waiting

3. Check job status

You can view the job status by referring to Obtain the job information.

    // Query the job status
    GetJobResponse getJobResponse = client.getJob(jobId);
    Job job = getJobResponse.getJob();
    System.out.println("Job state:" + job.getState());

A job may be in one of the following states: Waiting, Running, Finished, Failed, and Stopped.
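A submitted job typically moves from Waiting through Running to one of the terminal states. The sketch below is a hypothetical helper (not part of the sample project) showing a generic polling loop; in Demo.java, the state supplier would wrap client.getJob(jobId).getJob().getState(), with ClientException handling added, since Supplier cannot throw checked exceptions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class JobPoller {
    static final List<String> TERMINAL = Arrays.asList("Finished", "Failed", "Stopped");

    /**
     * Poll fetchState until it returns a terminal state, sleeping
     * intervalMillis between calls. Returns the terminal state.
     */
    public static String waitForTerminalState(Supplier<String> fetchState,
                                              long intervalMillis) throws InterruptedException {
        while (true) {
            String state = fetchState.get();
            if (TERMINAL.contains(state)) {
                return state;
            }
            Thread.sleep(intervalMillis);
        }
    }
}
```

In production you would also cap the number of polls or the total wait time instead of looping indefinitely.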

4. Check job execution result

You can view the job status by logging on to the Batch Compute console.

After the job finishes running, you can log on to the OSS console and check the following file under your-bucket: /log-count/merge_result.json.

The expected result is as follows:

    {"INFO": 2460, "WARN": 2448, "DEBUG": 2509, "ERROR": 2583}

Alternatively, you can use the OSS SDK to obtain the results.
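If you download merge_result.json programmatically, the flat result shown above can be parsed without a JSON library. The following minimal parser is a hypothetical helper that assumes exactly this flat string-to-integer format; it is not general-purpose JSON parsing:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeResultParser {
    /**
     * Parse a flat JSON object mapping string keys to integer values, such as
     * {"INFO": 2460, "WARN": 2448, "DEBUG": 2509, "ERROR": 2583}.
     */
    public static Map<String, Integer> parse(String json) {
        Map<String, Integer> result = new LinkedHashMap<>();
        String body = json.trim();
        body = body.substring(1, body.length() - 1); // strip the outer braces
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":");
            String key = kv[0].trim().replace("\"", "");
            int value = Integer.parseInt(kv[1].trim());
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("{\"INFO\": 2460, \"WARN\": 2448, \"DEBUG\": 2509, \"ERROR\": 2583}"));
    }
}
```

For anything beyond this fixed shape, use a real JSON library such as the one already bundled with the Alibaba Cloud SDK dependencies.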