You can combine a local workflow with an Alibaba Cloud Genomics Service (AGS) workflow to form a hybrid workflow. We recommend that you run time-consuming, standardized tasks that require a large amount of computing resources, such as gene sequence mapping, in the AGS workflow, and run the other tasks in the local workflow. The two workflows exchange data through Object Storage Service (OSS). This improves efficiency and saves resources. This topic describes how to compile and run a hybrid workflow that uses AGS to perform gene sequence mapping.

Prerequisites

The permissions to use AGS are granted. AGS is in public preview. Before you use AGS, make sure that you have the required permissions. To apply for the permissions, visit AGS application.
Note If you log on as a RAM user, provide the ID of the RAM user when you apply for the permissions. You can obtain the ID of the RAM user in the Account Management console.

Procedure

  1. Configure AGS.

    Download and install the AGS CLI. For more information, see Introduction to AGS CLI. Then, initialize the configuration:

    ags config init
    After this command completes, the AGS configuration file is generated at ~/.ags/config.
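
    To confirm that the configuration file was generated before you create the ConfigMap in the next step, you can run a quick check (assuming the default ~/.ags/config location):

    ls -l ~/.ags/config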
  2. Create a ConfigMap.
    kubectl create configmap config --from-file=$HOME/.ags/config -n argo

    Note $HOME is used instead of ~ because most shells do not expand a tilde that follows an equal sign.
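
    To verify that the ConfigMap was created and contains the configuration, you can run:

    kubectl get configmap config -n argo -o yaml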
  3. Add the following content to the hybridStorage.yaml file.

    Replace the following settings with values for your environment: the mount target (server) and path of the NAS volume, the name and endpoint (url) of the OSS bucket, and the AccessKey ID and AccessKey secret that are used to access the bucket.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: gene-shenzhen-cache-nas
      labels:
        alicloud-pvname: gene-shenzhen-cache-nas
    spec:
      capacity:
        storage: 20Gi
      storageClassName: alicloud-nas
      accessModes:
        - ReadWriteMany
      csi:
        driver: nasplugin.csi.alibabacloud.com
        volumeHandle: gene-shenzhen-cache-nas-pvc
        volumeAttributes:
          server: "xxxxxxxx-fbi71.cn-beijing.nas.aliyuncs.com"
          path: "/tarTest"
      mountOptions:
      - nolock,tcp,noresvport
      - vers=3 
    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: gene-shenzhen-cache-nas-pvc
      namespace: argo
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 20Gi
      storageClassName: alicloud-nas
      selector:
        matchLabels:
          alicloud-pvname: gene-shenzhen-cache-nas
    ---  
    apiVersion: v1
    kind: Secret
    metadata:
      name: oss-secret
      namespace: argo
    stringData:
      akId: xxxxxxxxxx
      akSecret: xxxxxxxxxxx
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: gene-shenzhen-cache-oss-pvc
      namespace: argo
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 200Gi
      selector:
        matchLabels:
          alicloud-pvname: gene-shenzhen-cache-oss   
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: gene-shenzhen-cache-oss
      labels:
        alicloud-pvname: gene-shenzhen-cache-oss  
    spec:
      capacity:
        storage: 200Gi
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      csi:
        driver: ossplugin.csi.alibabacloud.com
        volumeHandle: gene-shenzhen-cache-oss   # Set the value to the name of the persistent volume (PV).
        nodePublishSecretRef:
          name: oss-secret
          namespace: argo
        volumeAttributes:
          bucket: "oss-test-tsk"
          url: "oss-cn-beijing.aliyuncs.com"
          otherOpts: "-o max_stat_cache_size=0 -o allow_other"
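
    Before you create the objects, you can optionally run a client-side validation of the manifest:

    kubectl create --dry-run=client -f hybridStorage.yaml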
  4. Create PVs and persistent volume claims (PVCs).
    kubectl create -f hybridStorage.yaml
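
    To verify that each PVC is bound to its PV, check the status columns (both PVCs should be in the Bound state):

    kubectl get pv gene-shenzhen-cache-nas gene-shenzhen-cache-oss
    kubectl get pvc -n argo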
  5. Add the following content to the hybrid.yaml file.
    apiVersion: argoproj.io/v1alpha1
    kind: Workflow                #an Argo Workflow resource
    metadata:
      generateName: mpileup-      #prefix of the generated workflow name
      namespace: argo
    spec:
      entrypoint: mpileup         #invoke the mpileup DAG template
      arguments:
        parameters:
    
        # fastq define
        - name: bucket
          value: "my-test-shenzhen"
        - name: fastqDir
          value: "sample"
        - name: fastq1
          value: "MGISEQ2000_PCR-free_NA12878_1_V100003043_L01_1.fq.gz"
        - name: fastq2
          value: "MGISEQ2000_PCR-free_NA12878_1_V100003043_L01_2.fq.gz"
        - name: reference
          value: "hg19"
    
        # gene define
        - name: bamDir
          value: "outbam/bam"
        - name: vcfDir
          value: "output/vcf"
        - name: bamFileName
          value: "gene.bam"
        - name: cpuNumber
          value: "32"
        - name: vcfFileName
          value: "gene.vcf"
        - name: service
          value: "s"
    
      volumes:
      - name: workdir
        persistentVolumeClaim:
          claimName: gene-shenzhen-cache-nas-pvc
      - name: outputdir
        persistentVolumeClaim:
          claimName: gene-shenzhen-cache-oss-pvc
      - name: agsconfig
        configMap:
          name: config
    
      templates:
      #This template is executed remotely in AGS and accelerated
      - name: agsmapping
        container:
          image: registry.cn-beijing.aliyuncs.com/shuangkun/genetool:v1.1
          imagePullPolicy: Always
          command: [sh,-c]
          args:
          - ags remote run mapping --region cn-shenzhen --fastq1 {{workflow.parameters.fastqDir}}/{{workflow.parameters.fastq1}} --fastq2 {{workflow.parameters.fastqDir}}/{{workflow.parameters.fastq2}} --bucket {{workflow.parameters.bucket}} --output-bam {{workflow.parameters.bamDir}}/{{workflow.parameters.bamFileName}} --reference {{workflow.parameters.reference}} --service {{workflow.parameters.service}} --watch;
          volumeMounts:
          - name: agsconfig
            mountPath: /root/.ags/config
            subPath: config
            
      #Copy the BAM file from OSS to NAS
      - name: mpileupprepare
        container:
          image: registry.cn-beijing.aliyuncs.com/shuangkun/genetool:v1.1
          imagePullPolicy: Always
          command: [sh,-c]
          args:
          - cp /output/{{workflow.parameters.bamDir}}/{{workflow.parameters.bamFileName}} /data/sample/.
          resources:
            requests:
              memory: 8Gi
              cpu: 4
          volumeMounts:
          - name: workdir
            mountPath: /data
          - name: outputdir
            mountPath: /output  
      #This template is executed locally
      - name: samtoolsindex
        retryStrategy:
          limit: 3
        container:
          image: registry.cn-beijing.aliyuncs.com/shuangkun/genetool:v1.1
          command: [sh,-c]
          args:
          - samtools index -@ {{workflow.parameters.cpuNumber}} /data/sample/{{workflow.parameters.bamFileName}}
          volumeMounts:
          - name: workdir
            mountPath: /data
          resources:
            requests:
              memory: 20Gi
              cpu: 4
      
      #Upload the index file from NAS to OSS
      - name: upload
        retryStrategy:
          limit: 3
        container:
          image: registry.cn-beijing.aliyuncs.com/shuangkun/genetool:v1.1
          command: [sh,-c]
          args:
          - cp /data/sample/{{workflow.parameters.bamFileName}}.bai /output/{{workflow.parameters.bamDir}}/.
          volumeMounts:
          - name: workdir
            mountPath: /data
          - name: outputdir
            mountPath: /output    
          resources:
            requests:
              memory: 20Gi
              cpu: 4      
              
              
      - name: mpileup
        dag:
          tasks:
          - name: agsmappingtask
            template: agsmapping
    
          - name: download-data
            dependencies: [agsmappingtask]
            template: mpileupprepare
    
          - name: index
            dependencies: [download-data]
            template: samtoolsindex
            
          - name: upload-data
            dependencies: [index]
            template: upload 
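
    If the Argo CLI is installed, you can lint the Workflow specification before you submit it:

    argo lint hybrid.yaml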
  6. Create the hybrid workflow.
    kubectl create -f hybrid.yaml

    After the workflow is created, it automatically starts to run.
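
    To monitor the progress of the workflow, query the Workflow resource with kubectl. The workflow name below is illustrative; kubectl get wf shows the actual generated name:

    kubectl get wf -n argo
    kubectl describe wf mpileup-xxxxx -n argo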

    To check the result, find the bucket that corresponds to gene-shenzhen-cache-oss-pvc and go to the /outbam/bam/ path in the bucket. If the gene.bam.bai file exists, the hybrid workflow was created and ran as expected.
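
    If you have the ossutil CLI configured, you can also check for the index file from the command line (oss-test-tsk is the sample bucket name used in hybridStorage.yaml; replace it with your own bucket):

    ossutil ls oss://oss-test-tsk/outbam/bam/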