This section describes how to use the Batch Compute-cli tool to submit a job that counts the occurrences of “INFO”, “WARN”, “ERROR”, and “DEBUG” in a log file.
Note: Make sure that you have signed up for the Batch Compute service in advance.
Contents:
Install and configure the Batch Compute-cli tool
Prepare a job
Upload the data file to the OSS
Prepare task programs
Submit the job
Check the job running status
Check the job execution result
1. Install and configure the Batch Compute-cli tool
For instructions on installing and configuring the Batch Compute-cli tool, see Preparation.
2. Prepare a job
The job aims to count the occurrences of “INFO”, “WARN”, “ERROR”, and “DEBUG” in a log file.
This job contains the following tasks:
The split task is used to divide the log file into three parts.
The count task counts the number of times “INFO”, “WARN”, “ERROR”, and “DEBUG” appear in each part of the log file. For the count task, InstanceCount must be set to 3, so that three instances of the count task run concurrently.
The merge task merges all the results of the count task.
(DAG of the job: split -> count -> merge)
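The data flow of the three tasks can be sketched in-process as follows. This is an illustrative sketch only; split_log, count_levels, and merge_counts are hypothetical names and are not the actual task programs downloaded later in this section.

```python
from collections import Counter

LEVELS = ["INFO", "WARN", "ERROR", "DEBUG"]

def split_log(lines, parts=3):
    """split: divide the log lines into `parts` roughly equal chunks."""
    size = (len(lines) + parts - 1) // parts
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def count_levels(chunk):
    """count: tally occurrences of each level within one chunk."""
    counts = Counter()
    for line in chunk:
        for level in LEVELS:
            counts[level] += line.count(level)
    return counts

def merge_counts(partials):
    """merge: sum the per-chunk tallies into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

# Toy input standing in for log-count-data.txt
log = ["INFO start", "WARN low disk", "ERROR failed",
       "INFO done", "DEBUG trace", "ERROR retry"]
chunks = split_log(log)                       # split task
partials = [count_levels(c) for c in chunks]  # three concurrent count instances
print(merge_counts(partials))                 # {'INFO': 2, 'WARN': 1, 'ERROR': 2, 'DEBUG': 1}
```

In the real job, each count instance runs on its own node and the intermediate results pass through OSS rather than in-process lists.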
2.1. Upload the data file to the OSS
Download the data file used in this example: log-count-data.txt
Upload the log-count-data.txt file to:
oss://your-bucket/log-count/log-count-data.txt
where your-bucket is the bucket you created. In this example, the region is cn-shenzhen.
bcs oss upload ./log-count-data.txt oss://your-bucket/log-count/log-count-data.txt
bcs oss cat oss://your-bucket/log-count/log-count-data.txt # Check whether the file is uploaded successfully
The bcs oss command performs common operations on your OSS bucket; bcs oss -h shows the help information for this command. We recommend using this command only when testing with a small amount of data. For a large amount of data, uploads and downloads take a long time because multithreading is not implemented yet. For more information about how to upload data to OSS, see OSS tools.
2.2. Prepare task programs
The job program used in this example is written in Python. Download the program: log-count.tar.gz.
Decompress the program package into the following directory:
mkdir log-count && tar -xvf log-count.tar.gz -C log-count
After decompression, the log-count/ directory structure is as follows:
log-count
|-- conf.py # Configuration
|-- split.py # split task program
|-- count.py # count task program
|-- merge.py # merge task program
Note: Do not change the task programs.
3. Submit job
3.1. Compile job configuration
In the parent directory of log-count, create a file named job.cfg with the following content:
[DEFAULT]
job_name=log-count
description=demo
pack=./log-count/
deps=split->count;count->merge
[split]
cmd=python split.py
[count]
cmd=python count.py
nodes=3
[merge]
cmd=python merge.py
The file describes a multi-task job, with tasks executed in the following sequence: split->count->merge.
For more information about task description in a .cfg file, see Multiple tasks.
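As a rough illustration of how a deps string describes the DAG (this is not the tool's actual parser), the string can be read as edges and resolved into an execution order:

```python
def parse_deps(deps):
    """Turn a string like 'split->count;count->merge' into (src, dst) edges."""
    edges = []
    for part in deps.split(";"):
        src, dst = part.split("->")
        edges.append((src.strip(), dst.strip()))
    return edges

def topo_order(edges):
    """Return the tasks in an order that respects every src->dst edge."""
    tasks = {t for edge in edges for t in edge}
    done, order = set(), []
    while len(order) < len(tasks):
        for t in sorted(tasks - done):
            # A task is ready once all of its predecessors are done.
            if all(dst != t or src in done for src, dst in edges):
                done.add(t)
                order.append(t)
    return order

edges = parse_deps("split->count;count->merge")
print(topo_order(edges))  # ['split', 'count', 'merge']
```

This mirrors how the service schedules the job: count starts only after split finishes, and merge starts only after all count instances finish.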
3.2. Submit the job
bcs sub --file job.cfg -r oss://your-bucket/log-count/:/home/input -w oss://your-bucket/log-count/:/home/output
In this command, -r specifies a read-only directory attachment (the OSS path is attached at /home/input) and -w specifies a writable directory mapping (/home/output is mapped to the OSS path). For more information, see Access data on OSS.
The same OSS path can be attached to different local directories, but different OSS paths cannot be attached to the same local directory.
4. Check job running status
bcs j # Obtain the job list. The list is cached each time it is fetched; generally, the first job in the cache is the one you just submitted.
bcs ch 1 # Check the status of the first job in the cache.
bcs log 1 # Check the log of the first job in the cache.
5. Check job execution result
After the job is executed, run the following command to check the result on OSS:
bcs oss cat oss://your-bucket/log-count/merge_result.json
The expected result is as follows:
{"INFO": 2460, "WARN": 2448, "DEBUG": 2509, "ERROR": 2583}
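The result file is ordinary JSON, so once fetched (for example by redirecting the bcs oss cat output to a local file) it can be inspected programmatically. Using the expected result shown above:

```python
import json

# The expected merge result from the tutorial
result = json.loads('{"INFO": 2460, "WARN": 2448, "DEBUG": 2509, "ERROR": 2583}')

# Total number of matched log levels across all three count instances
print(sum(result.values()))  # 10000
```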