You can combine a local workflow and an Alibaba Cloud Genomics Service (AGS) workflow
into a hybrid workflow. We recommend that you run time-consuming tasks that follow a
standard process and require large amounts of computing resources in the AGS workflow,
and run the other tasks in the local workflow. Workflow data is transferred through
Object Storage Service (OSS). This improves efficiency and saves resources. This
topic uses a gene sequence mapping task as an example to describe how to compile and
run a hybrid workflow with AGS.
Prerequisites
The permissions to use AGS are granted. AGS is in public preview. Before you use this
feature, make sure that you have the required permissions. To apply for the permissions,
visit AGS application.
Note If you log on as a RAM user, enter the ID of the RAM user when you apply for the
permissions to use AGS. You can obtain the ID of the RAM user in the
Account Management console.
Procedure
- Configure AGS.
After you configure AGS, the AGS configuration file config is generated in the ~/.ags directory.
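The following steps assume that the configuration file is located at ~/.ags/config, which is the path read by the command in the next step. You can quickly confirm that the file exists:
ls -l ~/.ags/config    # The file must exist before you create the ConfigMap.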
- Create a ConfigMap.
kubectl create configmap config --from-file=~/.ags/config -n argo
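As an optional check, you can confirm that the ConfigMap stores the configuration under a key named config (the key name is derived from the file name):
kubectl get configmap config -n argo -o yaml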
- Add the following content to the hybridStorage.yaml file.
Modify the following settings based on your requirements: the mount target (the server field) and path of the NAS volume, and the AccessKey ID and AccessKey secret used to access the OSS bucket.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gene-shenzhen-cache-nas
  namespace: argo
  labels:
    alicloud-pvname: gene-shenzhen-cache-nas
spec:
  capacity:
    storage: 20Gi
  storageClassName: alicloud-nas
  accessModes:
    - ReadWriteMany
  csi:
    driver: nasplugin.csi.alibabacloud.com
    volumeHandle: gene-shenzhen-cache-nas-pvc
    volumeAttributes:
      server: "xxxxxxxx-fbi71.cn-beijing.nas.aliyuncs.com"
      path: "/tarTest"
  mountOptions:
    - nolock,tcp,noresvport
    - vers=3
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: gene-shenzhen-cache-nas-pvc
  namespace: argo
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  storageClassName: alicloud-nas
  selector:
    matchLabels:
      alicloud-pvname: gene-shenzhen-cache-nas
---
apiVersion: v1
kind: Secret
metadata:
  name: oss-secret
  namespace: argo
stringData:
  akId: xxxxxxxxxx
  akSecret: xxxxxxxxxxx
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gene-shenzhen-cache-oss-pvc
  namespace: argo
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Gi
  selector:
    matchLabels:
      alicloud-pvname: gene-shenzhen-cache-oss
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gene-shenzhen-cache-oss
  namespace: argo
  labels:
    alicloud-pvname: gene-shenzhen-cache-oss
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: gene-shenzhen-cache-oss  # Set the value to the name of the persistent volume (PV).
    nodePublishSecretRef:
      name: oss-secret
      namespace: argo
    volumeAttributes:
      bucket: "oss-test-tsk"
      url: "oss-cn-beijing.aliyuncs.com"
      otherOpts: "-o max_stat_cache_size=0 -o allow_other"
- Create PVs and persistent volume claims (PVCs).
kubectl create -f hybridStorage.yaml
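The PVs and PVCs are bound to each other through the alicloud-pvname labels and selectors in the manifest. Before you proceed, you can verify that both claims are in the Bound state:
kubectl get pv
kubectl get pvc -n argo    # gene-shenzhen-cache-nas-pvc and gene-shenzhen-cache-oss-pvc must be Bound.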
- Add the following content to the hybrid.yaml file.
apiVersion: argoproj.io/v1alpha1
kind: Workflow                # A new type of Kubernetes spec provided by Argo.
metadata:
  generateName: mpileup-      # The name prefix of the workflow spec.
  namespace: argo
spec:
  entrypoint: mpileup         # Invoke the mpileup template.
  arguments:
    parameters:
      # FASTQ settings
      - name: bucket
        value: "my-test-shenzhen"
      - name: fastqDir
        value: "sample"
      - name: fastq1
        value: "MGISEQ2000_PCR-free_NA12878_1_V100003043_L01_1.fq.gz"
      - name: fastq2
        value: "MGISEQ2000_PCR-free_NA12878_1_V100003043_L01_2.fq.gz"
      - name: reference
        value: "hg19"
      # Gene settings
      - name: bamDir
        value: "outbam/bam"
      - name: vcfDir
        value: "output/vcf"
      - name: bamFileName
        value: "gene.bam"
      - name: cpuNumber
        value: "32"
      - name: vcfFileName
        value: "gene.vcf"
      - name: service
        value: "s"
  volumes:
    - name: workdir
      persistentVolumeClaim:
        claimName: gene-shenzhen-cache-nas-pvc
    - name: outputdir
      persistentVolumeClaim:
        claimName: gene-shenzhen-cache-oss-pvc
    - name: agsconfig
      configMap:
        name: config
  templates:
    # This template is executed remotely in AGS and accelerated.
    - name: agsmapping
      container:
        image: registry.cn-beijing.aliyuncs.com/shuangkun/genetool:v1.1
        imagePullPolicy: Always
        command: [sh, -c]
        args:
          - ags remote run mapping --region cn-shenzhen --fastq1 sample/{{workflow.parameters.fastq1}} --fastq2 sample/{{workflow.parameters.fastq2}} --bucket {{workflow.parameters.bucket}} --output-bam {{workflow.parameters.bamDir}}/{{workflow.parameters.bamFileName}} --reference {{workflow.parameters.reference}} --service s --watch;
        volumeMounts:
          - name: agsconfig
            mountPath: /root/.ags/config
            subPath: config
    # Download the BAM file from OSS to NAS.
    - name: mpileupprepare
      container:
        image: registry.cn-beijing.aliyuncs.com/shuangkun/genetool:v1.1
        imagePullPolicy: Always
        command: [sh, -c]
        args:
          - cp /output/outbam/bam/gene.bam /data/sample/.
        resources:
          requests:
            memory: 8Gi
            cpu: 4
        volumeMounts:
          - name: workdir
            mountPath: /data
          - name: outputdir
            mountPath: /output
    # This template is executed locally.
    - name: samtoolsindex
      retryStrategy:
        limit: 3
      container:
        image: registry.cn-beijing.aliyuncs.com/shuangkun/genetool:v1.1
        command: [sh, -c]
        args:
          - samtools index -@ {{workflow.parameters.cpuNumber}} /data/sample/gene.bam
        volumeMounts:
          - name: workdir
            mountPath: /data
        resources:
          requests:
            memory: 20Gi
            cpu: 4
    # Upload the index file from NAS to OSS.
    - name: upload
      retryStrategy:
        limit: 3
      container:
        image: registry.cn-beijing.aliyuncs.com/shuangkun/genetool:v1.1
        command: [sh, -c]
        args:
          - cp /data/sample/gene.bam.bai /output/outbam/bam/.
        volumeMounts:
          - name: workdir
            mountPath: /data
          - name: outputdir
            mountPath: /output
        resources:
          requests:
            memory: 20Gi
            cpu: 4
    - name: mpileup
      dag:
        tasks:
          - name: agsmappingtask
            template: agsmapping
          - name: download-data
            dependencies: [agsmappingtask]
            template: mpileupprepare
          - name: index
            dependencies: [download-data]
            template: samtoolsindex
          - name: upload-data
            dependencies: [index]
            template: upload
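The mpileup template at the end of the file defines a DAG that runs the four tasks in sequence. Conceptually, the workflow performs the following sequence of commands; this sketch is for orientation only and repeats the commands from the templates above, with cpuNumber substituted as 32:
ags remote run mapping ... --watch                  # agsmappingtask: run the alignment remotely in AGS and write the BAM file to OSS
cp /output/outbam/bam/gene.bam /data/sample/.       # download-data: copy the BAM file from the OSS mount to the NAS mount
samtools index -@ 32 /data/sample/gene.bam          # index: build the BAM index locally on the cluster
cp /data/sample/gene.bam.bai /output/outbam/bam/.   # upload-data: copy the index file from the NAS mount back to the OSS mount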
- Create a hybrid workflow.
kubectl create -f hybrid.yaml
After you create the hybrid workflow, it automatically starts to run.
Find the bucket that corresponds to gene-shenzhen-cache-oss-pvc and go to the /outbam/bam/ path in the bucket. If the gene.bam.bai file exists, the hybrid workflow was created and is running as expected.
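You can also track the run from the cluster side. The following commands assume that the Argo Workflow CRDs are installed in the cluster; the workflow name is generated with the mpileup- prefix:
kubectl get workflows -n argo                         # The STATUS column shows Running or Succeeded.
kubectl describe workflow <workflow-name> -n argo     # Shows the state of each DAG task.
If you have the ossutil command-line tool configured, you can also list the output objects directly, for example: ossutil ls oss://oss-test-tsk/outbam/bam/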