
Run offline tasks

Last Updated: Dec 07, 2017

Container Service abstracts the basic model of offline computing and provides offline computing based on Docker containers.

The core functions include:

  • Job orchestration
  • Job scheduling and lifecycle management
  • Integration of storage and other functions

Basic concepts

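The following list compares the concepts of offline applications with those of online applications.

  • Container: In an offline application, the task execution unit. In an online application, the service execution unit.
  • Operation history: In an offline application, the execution history of tasks that encountered an error and were re-executed. Online applications have no equivalent.
  • Service (Task): In an offline application, a task is a specific function that can be divided across several containers for execution. In an online application, a service is a group of containers that provide the same function.
  • Application (Job): In an offline application, a job is a combination of several tasks. In an online application, an application is a combination of several services.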

In short, an offline job contains several tasks. Each task can be executed by several containers, and each container can have multiple operation histories. By contrast, an online application contains several services, and each service can be provided by several containers simultaneously.
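To make this hierarchy concrete, the following Python sketch models a job as a list of tasks, each task as a list of containers, and each container's operation history as a list of runs. The class names, fields, and the `s1_1`-style container naming are hypothetical illustrations, not a Container Service API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical in-memory model of the offline concepts; names are illustrative only.

@dataclass
class Run:
    """One entry in a container's operation history (one execution attempt)."""
    exit_code: int

@dataclass
class Container:
    """Task execution unit; each re-execution appends a Run to its history."""
    name: str
    history: List[Run] = field(default_factory=list)

@dataclass
class Task:
    """A function of the job, divided across several containers."""
    name: str
    containers: List[Container] = field(default_factory=list)

@dataclass
class Job:
    """An offline application: a combination of several tasks."""
    name: str
    tasks: List[Task] = field(default_factory=list)

def build_job(name: str, scales: Dict[str, int]) -> Job:
    """Create a job whose tasks each hold `scale` containers with empty histories."""
    job = Job(name)
    for task_name, scale in scales.items():
        task = Task(task_name)
        for i in range(1, scale + 1):
            task.containers.append(Container(f"{task_name}_{i}"))
        job.tasks.append(task)
    return job

job = build_job("demo", {"s1": 10, "s2": 4})
print(len(job.tasks), sum(len(t.containers) for t in job.tasks))  # prints: 2 14
```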

Docker Compose-based job orchestration

As with online applications, you can use Docker Compose to describe and orchestrate jobs. Docker Compose supports most Docker functions, such as:

  • CPU, memory, and other resource limits
  • Data volumes
  • Environment variables and labels
  • Network models and port exposure

In addition, Alibaba Cloud Container Service provides the following extensions:

  • Container quantity: The number of containers that each task is divided into.
  • Number of retries: The number of retries made by each container.
  • Remove containers: Whether to delete a container after it has completed its run. You can select one of the following policies: remove-finished (deletes containers that finish successfully), remove-failed (deletes containers that fail), remove-all (deletes all containers), and remove-none (does not delete any container).
  • DAG model task dependencies: Tasks in a job can have dependencies between each other. Tasks that others depend on are executed first.
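The remove-containers policies reduce to a simple decision for each completed container. The following is a minimal sketch, assuming a completed container is either Finished (exit code 0) or Failed (non-zero exit code). The function name `should_remove` is hypothetical, and exactly when the service applies the policy (for example, relative to retries) is not specified here.

```python
# Assumed status values: "Finished" (exit code 0) or "Failed" (non-zero exit code).
POLICIES = {"remove-finished", "remove-failed", "remove-all", "remove-none"}

def should_remove(policy: str, status: str) -> bool:
    """Return True if a completed container with `status` should be deleted."""
    if policy not in POLICIES:
        raise ValueError(f"unknown remove_containers policy: {policy}")
    if policy == "remove-all":
        return True
    if policy == "remove-none":
        return False
    if policy == "remove-finished":
        return status == "Finished"
    # remaining policy: remove-failed
    return status == "Failed"

print(should_remove("remove-finished", "Failed"))  # prints: False
```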

The following is an example Docker Compose template for an offline job.

    version: "2"
    labels:
      aliyun.project_type: "batch"          # marks the whole application as an offline job
    services:
      s1:
        image: registry.aliyuncs.com/jimmycmh/testret:latest
        restart: "no"                       # quoted: an unquoted no is parsed as boolean false
        cpu_shares: 10
        mem_limit: 100000000                # bytes (about 100 MB)
        labels:
          aliyun.scale: "10"                # divide task s1 into 10 containers
          aliyun.retry_count: "20"          # retry each container up to 20 times
          aliyun.remove_containers: "remove-all"
      s2:
        image: registry.aliyuncs.com/jimmycmh/testret:latest
        cpu_shares: 50
        mem_limit: 100000000
        labels:
          aliyun.scale: "4"
          aliyun.retry_count: "20"
          aliyun.remove_containers: "remove-finished"
          aliyun.depends: "s1"              # s1 is executed before s2

Note:

  • Only version 2 of the Docker Compose file format is supported.
  • Add the label aliyun.project_type: "batch" at the job level. If this label is missing or its value is not batch, the application is treated as an online application.
  • Any value you set for restart is changed to no.
  • Use the aliyun.depends label to specify dependencies. A task can depend on several other tasks. Separate the tasks by using commas (,).
  • The default value of aliyun.retry_count is 3.
  • The default value of aliyun.remove_containers is remove-finished.
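The following sketch shows one way to interpret the labels in the preceding template: it applies the documented defaults for aliyun.retry_count and aliyun.remove_containers, splits aliyun.depends on commas, and derives one valid execution order with Python's `graphlib.TopologicalSorter` (Python 3.9 or later). The helper names and the default scale of 1 are assumptions for illustration; the actual scheduler may run independent tasks in parallel.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Defaults documented in the notes above.
DEFAULT_RETRY_COUNT = 3
DEFAULT_REMOVE_POLICY = "remove-finished"
# Assumption: the scale default is not documented; 1 is used for illustration.
DEFAULT_SCALE = 1

def is_batch(job_labels: dict) -> bool:
    """A job is offline only if aliyun.project_type is exactly "batch"."""
    return job_labels.get("aliyun.project_type") == "batch"

def parse_task(labels: dict) -> dict:
    """Read the aliyun.* labels of one service, applying defaults."""
    depends = labels.get("aliyun.depends", "")
    return {
        "scale": int(labels.get("aliyun.scale", DEFAULT_SCALE)),
        "retry_count": int(labels.get("aliyun.retry_count", DEFAULT_RETRY_COUNT)),
        "remove_containers": labels.get("aliyun.remove_containers", DEFAULT_REMOVE_POLICY),
        # comma-separated list of task names; blanks are ignored
        "depends": [d.strip() for d in depends.split(",") if d.strip()],
    }

def execution_order(services: dict) -> list:
    """Return task names so that each task follows the tasks it depends on.

    Raises graphlib.CycleError (a ValueError subclass) if dependencies form a cycle.
    """
    graph = {name: parse_task(labels)["depends"] for name, labels in services.items()}
    return list(TopologicalSorter(graph).static_order())

services = {
    "s1": {"aliyun.scale": "10", "aliyun.retry_count": "20",
           "aliyun.remove_containers": "remove-all"},
    "s2": {"aliyun.scale": "4", "aliyun.retry_count": "20",
           "aliyun.remove_containers": "remove-finished", "aliyun.depends": "s1"},
}
print(execution_order(services))  # prints: ['s1', 's2']
```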

Job lifecycle management

The container status is determined by the container running and exit status. The task status is determined by the statuses of all the containers in the task. The job status is determined by the statuses of all the tasks in the job.

Container status

  • Running: The container is running.
  • Finished: The container exits and ExitCode==0.
  • Failed: The container exits and ExitCode!=0.

Task status

  • Running: At least one container is running.
  • Finished: All containers are finished.
  • Failed: The number of failures of a container exceeds the configured retry count (aliyun.retry_count).

Job status

  • Running: At least one task is running.
  • Finished: All tasks are finished.
  • Failed: At least one task has failed.
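The status rules above can be expressed as three small aggregation functions. This is a sketch, assuming each container is reported as a (status, failure count) pair. Treating a failed container that is still within its retry count as keeping the task Running is an assumption, because the rules above do not cover that case.

```python
from typing import List, Tuple

def container_status(running: bool, exit_code: int = 0) -> str:
    """Container status from its run state and exit code."""
    if running:
        return "Running"
    return "Finished" if exit_code == 0 else "Failed"

def task_status(containers: List[Tuple[str, int]], retry_count: int = 3) -> str:
    """Aggregate (status, failure_count) pairs into a task status.

    retry_count defaults to 3, the documented default of aliyun.retry_count.
    """
    if any(failures > retry_count for _, failures in containers):
        return "Failed"
    if all(status == "Finished" for status, _ in containers):
        return "Finished"
    # Assumption: failed containers within their retry count are being retried.
    return "Running"

def job_status(task_statuses: List[str]) -> str:
    """Aggregate task statuses into a job status."""
    if "Failed" in task_statuses:
        return "Failed"
    if all(s == "Finished" for s in task_statuses):
        return "Finished"
    return "Running"

print(job_status([task_status([("Finished", 0), ("Failed", 4)])]))  # prints: Failed
```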

All of the preceding statuses can be retrieved through the API, which facilitates automated operations and maintenance (O&M).

Shared storage

Containers and tasks often need to share and exchange data, and shared storage addresses this need. For example, when you run a MapReduce (MR) job on Hadoop, HDFS is used for data exchange. Container Service supports two types of shared storage. Their features and application scenarios are compared as follows:

  • OSSFS data volumes
    Advantage: Cross-host sharing.
    Disadvantages: Low read/write and ls performance; modifying a file causes the whole file to be overwritten.
    Application scope: Shared configuration files; attachment uploads.
  • Third-party storage that you integrate, such as Portworx
    Advantages: Virtualizes the cloud disks in the cluster into one large shared disk; high performance; snapshots and multiple replicas.
    Disadvantage: Requires a certain level of O&M capability.
    Application scope: I/O-intensive applications that need data sharing, such as file servers; I/O-intensive applications that need fast migration, such as databases.

For more information about how to use the data volumes, see Use OSSFS data volumes to share WordPress attachments.

Monitoring service

Monitoring is an important tool for analyzing offline jobs. Alibaba Cloud Container Service integrates with CloudMonitor. By adding a label to the orchestration template, you can collect CPU, memory, and other metrics of containers in CloudMonitor. For details, see Container monitoring service.

Procedure

  1. Log on to the Container Service console.

  2. Create a cluster.

    For details, see Create a cluster.

  3. Click Applications in the left-side navigation pane and then click Create Application in the upper-right corner.

  4. Enter the basic information for the application and then click Create with Orchestration Template.

  5. Enter the preceding orchestration template and then click Create and Deploy.

    For more information about how to create an application, see Create an application.

  6. On the Application List page, click the application name to view the application running status.
