
ACK One Argo Workflow Clusters: Mastering Container Object Storage Service

This article introduces Argo Workflows, Artifacts, OSS, and the advantages of ACK One Serverless Argo Workflows.

By Su Yashi and Caishu

Terms

  • Argo Workflows: a cloud-native workflow engine designed to coordinate the execution of multiple tasks or steps in Kubernetes clusters.
  • Artifacts: persistent data objects generated during workflow execution, typically representing the output of a step or task.
  • OSS: Alibaba Cloud Object Storage Service, commonly used for storing Artifacts.

Background

Argo Workflows is an open-source, cloud-native workflow engine and a CNCF graduated project. It simplifies the automation and management of complex workflows on Kubernetes, making it suitable for various scenarios, including scheduled tasks, machine learning, ETL and data analysis, model training, data flow pipelines, and CI/CD.


When we use Argo Workflows to orchestrate tasks, especially in scenarios that involve large amounts of data, such as model training, data processing, and bioinformatics analysis, efficient management of Artifacts (usually stored in OSS in the Alibaba Cloud environment) is critical. However, users who adopt the open-source solution may encounter several challenges, including:

Upload failures for oversized files: if a file exceeds 5 Gi, the upload fails because of the client-side single-object upload limit.

Lack of a file cleanup mechanism: temporary files generated during a workflow, and the outputs of completed tasks, consume OSS storage space unnecessarily if they are not cleaned up in time.

High disk usage on Argo Server: when files are downloaded through Argo Server, the data must be persisted to the server's disk before it is transferred. The resulting disk usage not only degrades server performance but can also cause service interruptions or data loss.

As a fully managed Argo Workflows service that fully adheres to community standards, ACK One Serverless Argo Workflows addresses the challenges of large-scale, high-security file management. This article introduces a series of enhancements the service makes in this regard, including multipart upload of oversized files, automatic Artifacts garbage collection (GC), and Artifacts streaming. These features help users manage OSS files efficiently, securely, and in a fine-grained manner in the Alibaba Cloud environment.


1. Support Multipart Upload of Oversized Files

When orchestrating tasks with Argo Workflows, we often need to upload intermediate outputs, execution results, and process logs to OSS as Artifacts: for data persistence and sharing, to relieve temporary storage pressure on Pods, and for disaster recovery and backup. In scenarios such as model training, data processing, bioinformatics analysis, and audio and video processing, many of these files are very large.

The open-source solution does not support uploading oversized objects, which is a significant inconvenience. ACK One Serverless Argo Workflows optimizes the logic for uploading oversized objects to OSS and supports multipart and resumable uploads. This is essential for the efficiency and reliability of large file processing, especially in data-intensive and distributed computing environments: it optimizes resource use and improves the ability to handle large data sets. In addition, each part supports independent integrity verification, which better guarantees data integrity and enhances the system's fault tolerance and data security.

Sample code:

This feature is enabled by default in ACK One Serverless Argo Workflows. After configuring Artifacts, we can submit a sample workflow that generates a 20 Gi file named testfile.txt and uploads it to OSS, demonstrating that an oversized object can be uploaded.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-
spec:
  entrypoint: main
  templates:
    - name: main
      metadata:
        annotations:
          k8s.aliyun.com/eci-extra-ephemeral-storage: "20Gi"  # Scale up the temporary storage space to hold the generated file.
          k8s.aliyun.com/eci-use-specs: "ecs.g7.xlarge"
      container:
        image: alpine:latest
        command:
          - sh
          - -c
        args:
          - |
            mkdir -p /out
            dd if=/dev/random of=/out/testfile.txt bs=20M count=1024 # Generate a 20Gi file
            echo "created files!"
      outputs: # Trigger the upload of a file to OSS.
        artifacts:
          - name: out
            path: /out/testfile.txt
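The multipart mechanism described above can be sketched in a few lines: the file is split into numbered parts, each part carries its own checksum so it can be retried or verified in isolation, and the upload completes only when every part is present. This is a minimal illustration of the general technique (the helper names are hypothetical, not the service's actual implementation):

```python
import hashlib

PART_SIZE = 8 * 1024 * 1024  # 8 MiB per part, for illustration

def split_into_parts(data: bytes, part_size: int = PART_SIZE):
    """Split a payload into numbered parts, each with its own MD5 checksum.

    Per-part checksums let a client retry or verify one part in isolation,
    which is what makes a multipart upload resumable and fault tolerant.
    """
    parts = []
    for number, offset in enumerate(range(0, len(data), part_size), start=1):
        chunk = data[offset:offset + part_size]
        parts.append({
            "part_number": number,
            "etag": hashlib.md5(chunk).hexdigest(),
            "size": len(chunk),
        })
    return parts

def verify_and_combine(parts):
    """Simulate completing a multipart upload: all parts must be present, in order."""
    expected = list(range(1, len(parts) + 1))
    got = sorted(p["part_number"] for p in parts)
    if got != expected:
        raise ValueError("missing or duplicate parts; upload cannot be completed")
    return sum(p["size"] for p in parts)
```

If one part fails, only that part is re-uploaded and re-verified; the rest of the transfer is unaffected, which is why multipart upload removes the single-object size ceiling in practice.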

2. Support OSS Artifacts GC

The Artifact garbage collection (GC) mechanism of Argo Workflows deletes files that are no longer needed (such as intermediate results and logs) after the workflow ends, which saves storage space and cost and prevents unbounded consumption of storage resources.

In the open-source solution, files in OSS are not reclaimed automatically, which increases usage and O&M costs. ACK One Serverless Argo Workflows therefore optimizes file cleanup on OSS. With a simple configuration of the reclaim logic, the following can be implemented:

When a workflow completes, or when an administrator manually deletes the workflow's resources from the cluster, the files uploaded to OSS are automatically reclaimed after a certain period of time.

Reclaims can be configured only for successful workflows, so that logs from failed runs are preserved for troubleshooting; or only for failed workflows, to reclaim invalid intermediate output.

With the lifecycle management policy provided by OSS, we can set rules that automatically delete old Artifacts based on parameters such as time and prefix, or archive early Artifacts to cold storage to reduce costs while preserving data integrity.
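The lifecycle-based cleanup mentioned above is expressed as bucket lifecycle rules in OSS. A hypothetical example (the `artifacts/` and `archive/` prefixes and the retention periods are assumptions for illustration, not defaults of the service): expire objects under one prefix 30 days after creation, and transition objects under another to Archive storage after 7 days.

```xml
<LifecycleConfiguration>
  <!-- Delete workflow artifacts 30 days after they are created. -->
  <Rule>
    <ID>expire-workflow-artifacts</ID>
    <Prefix>artifacts/</Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>30</Days>
    </Expiration>
  </Rule>
  <!-- Move early artifacts to cold (Archive) storage after 7 days. -->
  <Rule>
    <ID>archive-early-artifacts</ID>
    <Prefix>archive/</Prefix>
    <Status>Enabled</Status>
    <Transition>
      <Days>7</Days>
      <StorageClass>Archive</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
```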

Sample code:

Configure an artifactGC policy to use this feature. In the following example, the workflow-level artifactGC strategy is OnWorkflowDeletion, while the on-completion artifact overrides it with OnWorkflowCompletion. After the workflow is submitted, you can observe in OSS that on-completion.txt is reclaimed when the workflow completes, and on-deletion.txt is reclaimed after the workflow is deleted.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion # The global reclaim policy to recycle the Artifact when the workflow is deleted, which can be overwritten
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "hello world" > /tmp/on-completion.txt
            echo "hello world" > /tmp/on-deletion.txt
      outputs: # Upload an object to OSS
        artifacts:
          - name: on-completion
            path: /tmp/on-completion.txt
            artifactGC:
              strategy: OnWorkflowCompletion # Overwrite the global reclaim policy and recycle the Artifact when the workflow is completed
          - name: on-deletion
            path: /tmp/on-deletion.txt

3. Support Artifacts Streaming

When the open-source solution downloads files through Argo Server, the data must be persisted to disk before being transferred. The resulting disk usage not only degrades server performance but may also lead to service interruptions or data loss.

ACK One Serverless Argo Workflows implements the OpenStream interface for OSS. When a user downloads a file from the Argo Workflows UI, Argo Server streams the file directly from the OSS server to the user instead of first downloading it to the server and then serving it. This streaming mechanism is especially suitable for workflows that transfer and store data at scale:

Improved download performance: streaming transfers a file from the OSS server without waiting for the entire file to reach Argo Server first. The download starts with lower latency, providing a faster response and a smoother experience.

Lower resource usage and higher concurrency: streaming reduces the memory and disk requirements on Argo Server, allowing it to handle more parallel file transfers with the same hardware and improving the system's concurrency. As users or file sizes grow, direct streaming lets the service scale without being constrained by Argo Server's disk space.

Better security and compliance: streaming avoids temporarily storing data on Argo Server, reducing the risk of data leakage and helping meet data protection and compliance requirements.

Streaming maximizes the performance of UI file downloads while minimizing pressure on the Argo Server single point. With Artifact streaming, Argo Server becomes a lightweight data-flow hub rather than a heavyweight storage and compute node.
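The core of this behavior is chunked forwarding: the server reads a bounded chunk from the OSS response and relays it immediately, so its memory use stays constant regardless of file size. A minimal sketch of the idea (a generic generator, not the service's actual OpenStream implementation):

```python
import io

CHUNK_SIZE = 64 * 1024  # stream in 64 KiB chunks

def stream_artifact(source, chunk_size: int = CHUNK_SIZE):
    """Yield an artifact chunk by chunk instead of buffering it on disk.

    Memory use is bounded by chunk_size no matter how large the file is,
    which is what lets one server handle many concurrent large downloads.
    """
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Usage: pipe any file-like source (standing in for an OSS object stream)
# straight to the client without materializing it on disk.
source = io.BytesIO(b"a" * (200 * 1024))
total = sum(len(chunk) for chunk in stream_artifact(source))
```

In a real server, each yielded chunk would be written to the client's HTTP response as it arrives, so the first bytes reach the user before the last bytes leave OSS.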

Summary

As a fully managed Argo Workflows service, ACK One Serverless Argo Workflows has the following advantages over the open-source solution in Artifacts file management:

| OSS file management capability | Open-source Argo Workflows | ACK One Serverless Argo Workflows |
| --- | --- | --- |
| File upload | Only files smaller than 5 Gi; oversized files are not supported | Multipart upload of oversized files |
| File reclaim | Not supported | Artifacts GC |
| File download | Requires persisting data to disk first | Streaming |

Going forward, Serverless Argo will contribute these enhancements back to the community and grow with it, continuing to integrate community practices and experience and to improve stability and usability, so as to provide users with a high-performance, scalable workflow platform.
