Container Service for Kubernetes: Use Argo Workflows to manage objects in a secure and efficient manner

Last Updated: Oct 16, 2024

Workflow clusters are fully hosted Argo Workflows services that provide enhanced features for managing objects in a secure and efficient manner. Compared with open source Argo Workflows, workflow clusters have advantages in scenarios such as batch processing, data processing, and continuous integration. This topic describes how to use workflow clusters to manage objects in a secure and efficient manner.

Storage challenges in complex workflow orchestration

Argo Workflows is an open source cloud-native workflow engine and a graduated project of the Cloud Native Computing Foundation (CNCF). Argo Workflows can automatically orchestrate complex workflows on Kubernetes and is suitable for various scenarios, such as scheduled jobs, machine learning, extract-transform-load (ETL), data analysis, model training, data flow pipelines, and CI/CD.

When you use Argo Workflows for job orchestration, you must efficiently manage and store artifacts, especially in data-intensive scenarios such as model training, data processing, and bioinformatics analysis. You may face the following challenges when you adopt open source solutions:

  • Oversized objects cannot be uploaded: Objects larger than 5 GiB fail to be uploaded due to client upload limits.

  • Lack of an object cleanup mechanism: If temporary objects or output results of completed jobs are not cleaned up in a timely manner, Object Storage Service (OSS) storage space is wasted.

  • High disk usage on Argo Server: When you download objects through Argo Server, the objects must first be written to the server's disk before they are transferred. High disk usage degrades server performance and may lead to service interruptions or data loss.

Kubernetes clusters for distributed Argo workflows (workflow clusters) are fully hosted Argo Workflows services that comply with open source specifications and address the challenges of large-scale, high-security object management. Workflow clusters provide enhanced features, such as large object upload, artifact garbage collection (GC), and artifact stream transmission. These features help you manage OSS objects on Alibaba Cloud in an efficient, secure, and fine-grained manner.

As fully hosted Argo Workflows services, workflow clusters provide the following advantages over open source solutions in artifact management:

| Feature | Workflow clusters | Open source Argo Workflows |
| --- | --- | --- |
| Object upload | Multipart upload of large objects is supported. | Only objects smaller than 5 GiB can be uploaded. Larger objects cannot be uploaded. |
| GC | Artifact GC is supported. | GC is not supported. |
| Object download | Stream transmission is supported. | Objects must be stored on disk before they are transferred, which is inefficient. |

Scenario 1: Support for large object upload

Open source Argo Workflows does not support large object upload, which limits its use in data-intensive jobs. To resolve this problem, workflow clusters optimize the logic for uploading large objects to OSS and support multipart upload and resumable upload, which significantly improves the efficiency and reliability of large object processing. In addition, the integrity of each part is verified independently, which enhances data integrity and system fault tolerance.

Examples

By default, large object upload is enabled for workflow clusters. After you configure artifacts, you can submit the following sample workflow, which creates a 20-GiB file named testfile.txt and uploads it to OSS as an output artifact. If the object appears in your OSS bucket, large object upload works as expected.
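
For reference, the following is a minimal sketch of an OSS artifact repository configuration in the open source artifact-repositories ConfigMap format. The endpoint, bucket name, and Secret name (my-oss-bucket, oss-credentials) are placeholders, and the exact configuration steps for your workflow cluster may differ; follow the artifact configuration instructions for your environment.

apiVersion: v1
kind: ConfigMap
metadata:
  name: artifact-repositories
  annotations:
    workflows.argoproj.io/default-artifact-repository: default-oss-repository   # Mark this entry as the default repository for the namespace.
data:
  default-oss-repository: |
    oss:
      endpoint: http://oss-cn-hangzhou-internal.aliyuncs.com   # Placeholder region endpoint.
      bucket: my-oss-bucket                                    # Placeholder bucket name.
      accessKeySecret:                                         # Kubernetes Secret that stores the AccessKey ID.
        name: oss-credentials
        key: accessKey
      secretKeySecret:                                         # Kubernetes Secret that stores the AccessKey secret.
        name: oss-credentials
        key: secretKey

With the artifact repository in place, submit the sample workflow: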

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-
spec:
  entrypoint: main
  templates:
    - name: main
      metadata:
        annotations:
          k8s.aliyun.com/eci-extra-ephemeral-storage: "20Gi"   # Specify the size of the additional temporary storage space. 
          k8s.aliyun.com/eci-use-specs: "ecs.g7.xlarge"   # Specify the ECS instance type used to run the elastic container instance.
      container:
        image: alpine:latest
        command:
          - sh
          - -c
        args:
          - |
            mkdir -p /out
            dd if=/dev/random of=/out/testfile.txt bs=20M count=1024   # Create a 20-GiB object. 
            echo "created files!"
      outputs: # Trigger the system to upload the object to OSS. 
        artifacts:
          - name: out
            path: /out/testfile.txt
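
For reference, the following commands show one way to submit this workflow and verify the uploaded object. The file name large-artifact.yaml, the namespace, and the bucket name are placeholders; adjust them to your environment.

# Submit the workflow and watch it until it completes.
argo submit large-artifact.yaml -n default --watch

# Check that the 20-GiB object was uploaded (placeholder bucket).
ossutil ls oss://my-oss-bucket/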

Scenario 2: Configure the object cleanup mechanism

Open source Argo solutions cannot automatically clean up OSS objects, which increases storage and O&M costs. Workflow clusters use the artifact GC mechanism to clean up objects that are no longer needed after a workflow is complete, such as intermediate results and logs. This saves storage space, reduces costs, and prevents unbounded consumption of storage resources.

Workflow clusters optimize the OSS object cleanup logic. You can use the following features by configuring simple cleanup logic:

  • Automatic cleanup: After a workflow is completed, or after an administrator deletes the workflow and its related resources, the objects that the workflow uploaded to OSS are automatically cleaned up.

  • Flexible cleanup configuration: You can configure cleanup policies only for successful workflow tasks so that the logs of failed tasks are retained for subsequent issue tracking and troubleshooting. You can also configure cleanup policies only for failed workflow tasks to remove invalid intermediate outputs.

  • Lifecycle management policies: You can use the lifecycle management feature of OSS to configure policies based on parameters such as time and prefix. These policies automatically delete stale artifacts or archive historical artifacts to cold storage, which ensures data integrity and reduces storage costs. A time-based rule can also be declared in the artifact repository configuration, as shown in the sketch after this list.
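
For reference, the following sketch shows how a time-based rule might be declared through the lifecycleRule field of the OSS artifact repository configuration in open source Argo Workflows. The day values and bucket details are placeholders, and support for this field should be verified against the Argo Workflows version used by your cluster; equivalent rules can also be configured directly in the OSS console.

data:
  default-oss-repository: |
    oss:
      endpoint: http://oss-cn-hangzhou-internal.aliyuncs.com   # Placeholder endpoint, as in the repository sketch in Scenario 1.
      bucket: my-oss-bucket                                    # Placeholder bucket name.
      lifecycleRule:
        markInfrequentAccessAfterDays: 14   # Move artifacts to Infrequent Access storage after 14 days.
        markDeletionAfterDays: 30           # Delete artifacts after 30 days.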

Examples

You can enable this feature by configuring an artifact GC policy. In the following example, the global artifact GC policy of the workflow is to clean up artifacts when the workflow is deleted, and the artifact-level GC policy of the on-completion.txt file is to clean up that artifact when the workflow is completed. After you submit the workflow, you can check OSS and observe that the on-completion.txt object is cleaned up when the workflow is completed, and that the on-deletion.txt object is cleaned up after the workflow is deleted.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion # The global cleanup policy: artifacts are cleaned up when the workflow is deleted. This policy can be overridden at the artifact level.
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "hello world" > /tmp/on-completion.txt
            echo "hello world" > /tmp/on-deletion.txt
      outputs: # Upload an object to OSS. 
        artifacts:
          - name: on-completion
            path: /tmp/on-completion.txt
            artifactGC:
              strategy: OnWorkflowCompletion # The artifact-level cleanup policy: this artifact is cleaned up when the workflow is completed. This overrides the global cleanup policy.
          - name: on-deletion
            path: /tmp/on-deletion.txt
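
For reference, the following commands show one way to observe the two cleanup policies. The file name artifact-gc.yaml and the bucket name are placeholders; replace <workflow-name> with the generated workflow name, for example artifact-gc-xxxxx.

# Submit the workflow and watch it until it completes; on-completion.txt should be cleaned up shortly afterwards.
argo submit artifact-gc.yaml -n default --watch

# List the remaining objects (placeholder bucket); on-deletion.txt should still exist at this point.
ossutil ls oss://my-oss-bucket/

# Delete the workflow; on-deletion.txt should then be cleaned up as well.
argo delete <workflow-name> -n default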

Scenario 3: Support for the stream transmission mechanism

In open source solutions, when objects are downloaded through Argo Server, the server must write the objects to its disk before transferring them. This increases disk usage, which may degrade server performance and cause service interruptions or data loss.

Workflow clusters support the OpenStream API operation of OSS. When you download an object in the Argo Workflows UI, Argo Server streams the object directly from OSS to you instead of first downloading the entire object to its local disk and then serving it. This stream transmission mechanism is suitable for workflow tasks that involve large-scale data transmission and storage, and provides the following advantages:

  • Improved download performance: Objects are streamed directly from OSS to users, which eliminates the wait for the entire object to be downloaded to Argo Server first. This reduces download latency, improves responsiveness, and provides a smoother user experience.

  • Reduced resource consumption and higher concurrency: Stream transmission reduces the memory and disk requirements of Argo Server. With the same hardware resources, Argo Server can transfer more objects in parallel, which increases the concurrency of the system. The service can also scale out efficiently as the number of users and the size of files grow, without being constrained by the disk space of Argo Server.

  • Enhanced security and compliance: The stream transmission mechanism avoids temporarily storing data on Argo Server, which reduces security risks and potential data leaks and helps meet data protection and compliance requirements.

Streaming artifacts reduces the load on Argo Server and improves the performance of file downloads from the UI, effectively turning Argo Server into a lightweight data forwarding layer rather than a heavily loaded storage and computing node.
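
For reference, an artifact can also be streamed outside the UI by calling the artifact download endpoint exposed by open source Argo Server, as sketched below. The server address, token, namespace, workflow name, node ID, and artifact name are all placeholders, and the exact endpoint available in your workflow cluster may differ.

# Stream the "out" artifact of a workflow node directly to a local file (all values are placeholders).
curl -fL \
  -H "Authorization: Bearer $ARGO_TOKEN" \
  "https://$ARGO_SERVER/artifacts/default/<workflow-name>/<node-id>/out" \
  -o testfile.txt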

Contact us

If you have any questions about ACK One, join the DingTalk group 35688562.