Contents:
Before you start
Prepare a job
Upload data file to OSS
Prepare task programs
Submit job
Write job configuration
Run command to submit job
Check job running status
Check job execution result
1. Before you start
In Batch Compute, submitting a job that runs in a Docker container works the same way as submitting one that does not, with the following differences:
The image specified by ImageId must support Docker.
Set the ImageId field in the task description to the ID of the Batch Compute public image that supports Docker (img-ubuntu), or specify the ID of a cluster that was created with this image ID.
The following two environment variables (EnvVars) are added to the task description:
Parameter | Description | Required
BATCH_COMPUTE_DOCKER_IMAGE | Docker image name | No
BATCH_COMPUTE_DOCKER_REGISTRY_OSS_PATH | Storage path of the Docker image in the OSS registry | No
If the task description does not contain the BATCH_COMPUTE_DOCKER_IMAGE parameter, no Docker container is used. In this case, the BATCH_COMPUTE_DOCKER_REGISTRY_OSS_PATH parameter is ignored.
If the task description contains BATCH_COMPUTE_DOCKER_IMAGE, a Docker container is used.
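As a sketch, the two variables can be pictured as entries in a task's EnvVars map; the image name and OSS path below are taken from the submit command later in this article and stand in for your own values:

```python
# Illustrative EnvVars fragment for a task description. The image name and
# OSS path are placeholders matching the example submit command below.
env_vars = {
    "BATCH_COMPUTE_DOCKER_IMAGE": "localhost:5000/myubuntu",
    "BATCH_COMPUTE_DOCKER_REGISTRY_OSS_PATH": "oss://your-bucket/dockers/",
}

# If BATCH_COMPUTE_DOCKER_IMAGE were absent, no Docker container would be
# used, and BATCH_COMPUTE_DOCKER_REGISTRY_OSS_PATH would be ignored.
use_docker = "BATCH_COMPUTE_DOCKER_IMAGE" in env_vars
print(use_docker)  # True
```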
2. Prepare a job
Use Python to write a job that counts the number of times INFO, WARN, ERROR, and DEBUG appear in a log file.
This job contains the following tasks:
The split task is used to divide the log file into three parts.
The count task is used to count the number of times INFO, WARN, ERROR, and DEBUG appear in each part of the log file. In the count task, InstanceCount must be set to 3, indicating that three count tasks are started concurrently.
The merge task is used to merge all the count results.
DAG: split -> count (x3) -> merge
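The heart of the count step is tallying keyword occurrences. The following is a minimal standalone sketch of that logic only, not the actual count.py (which, per the note below, must not be modified):

```python
from collections import Counter

KEYWORDS = ("INFO", "WARN", "ERROR", "DEBUG")

def count_keywords(lines):
    """Count how often each log level appears across the given lines."""
    counts = Counter()
    for line in lines:
        for kw in KEYWORDS:
            counts[kw] += line.count(kw)
    return dict(counts)

sample = ["INFO start", "DEBUG detail", "ERROR fail", "INFO done"]
print(count_keywords(sample))
# {'INFO': 2, 'WARN': 0, 'ERROR': 1, 'DEBUG': 1}
```

In the real job, each of the three concurrent count instances runs this kind of tally over its own part of the split log file, and merge adds the three partial results together.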
2.1. Upload data file to OSS
Download the data file used in this example: log-count-data.txt.
Upload the log-count-data.txt file to oss://your-bucket/log-count/log-count-data.txt, where your-bucket is the bucket you created. In this example, the region is cn-shenzhen.
To upload the file to OSS, see Upload files to the OSS.
2.2. Prepare task programs
The job program used in this example is written in Python. Download the program: log-count.tar.gz.
Decompress the program package into the following directory:
mkdir log-count && tar -xvf log-count.tar.gz -C log-count
After decompression, the log-count/ directory structure is as follows:
log-count
|-- conf.py # Configuration
|-- split.py # split task program
|-- count.py # count task program
|-- merge.py # merge task program
Note: Do not change the task programs.
3. Submit job
You can submit the job using the Python SDK, the Java SDK, or the console. In this example, the job is submitted with the command line tool.
3.1. Write job configuration
In the parent directory of log-count, create a file named job.cfg with the following content:
[DEFAULT]
job_name=log-count
description=demo
pack=./log-count/
deps=split->count;count->merge
[split]
cmd=python split.py
[count]
cmd=python count.py
nodes=3
[merge]
cmd=python merge.py
The file describes a multi-task job, with tasks executed in the following sequence: split->count->merge.
For more information about task description in a .cfg file, see Multi-task support.
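The deps line encodes the edges of the task DAG. As an illustration only (this is not how the bcs tool itself parses the file), a string such as split->count;count->merge expands into edge pairs like this:

```python
def parse_deps(deps):
    """Expand a deps string like 'split->count;count->merge' into edge pairs."""
    edges = []
    for part in deps.split(";"):
        chain = part.split("->")          # e.g. ['split', 'count']
        for src, dst in zip(chain, chain[1:]):
            edges.append((src, dst))      # each adjacent pair is one edge
    return edges

print(parse_deps("split->count;count->merge"))
# [('split', 'count'), ('count', 'merge')]
```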
3.2. Run command to submit job
bcs sub --file job.cfg -r oss://your-bucket/log-count/:/home/input -w oss://your-bucket/log-count/:/home/output --docker localhost:5000/myubuntu@oss://your-bucket/dockers/
In the command, -r and -w map OSS paths to local directories as read-only input and writable output, respectively. For more information, see OSS directory attaching.
The same OSS path can be attached to different local directories, but different OSS paths cannot be attached to the same local directory.
--docker means that a Docker image is used; its value has the format image_name@storage_oss_path. The tool automatically sets the Docker image name and the registry's OSS path in the two environment variables described earlier.
Note: The region specified for BCS must be the same as the region where the Docker container is located.
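The --docker value packs two pieces of information into one string. Splitting on the first @ recovers the image name and the OSS registry path; this is a sketch of the format, not the tool's actual parsing code:

```python
def split_docker_arg(value):
    """Split 'image_name@storage_oss_path' into its two components."""
    image, _, oss_path = value.partition("@")  # split on the first '@'
    return image, oss_path

image, oss_path = split_docker_arg(
    "localhost:5000/myubuntu@oss://your-bucket/dockers/"
)
print(image)     # localhost:5000/myubuntu
print(oss_path)  # oss://your-bucket/dockers/
```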
4. Check job running status
bcs j # Obtain the job list. The job list obtained each time is cached. Generally, the first job in the cache is the one you just submitted.
bcs ch 1 # Check the status of the first job in the cache.
bcs log 1 # Check the log of the first job in the cache.
5. Check job execution result
After the job finishes, run the following command to check the result on OSS.
bcs oss cat oss://your-bucket/log-count/merge_result.txt
The expected result is as follows:
{"INFO": 2460, "WARN": 2448, "DEBUG": 2509, "ERROR": 2583}
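The merge output is plain JSON, so a quick sanity check is to parse it and total the counts; the figures below are the expected result shown above:

```python
import json

# Parse the expected merge_result.txt content and total the counts.
result = json.loads('{"INFO": 2460, "WARN": 2448, "DEBUG": 2509, "ERROR": 2583}')
total = sum(result.values())
print(total)  # 10000
```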