The GATK analysis workflow is provided jointly by Alibaba Cloud and the Broad Institute. The Broad Institute's GATK workflows are best written in Workflow Definition Language (WDL) and parsed and executed by the Cromwell workflow engine integrated into BatchCompute. You are billed only for the computing and storage resources that your jobs actually consume; there are no additional fees.
The Broad Institute GATK website and forum provide more background information, documentation, and support for GATK tools and WDL.
To use WDL to program a universal workflow, see In-app use —3. Cromwell workflow engine and WDL support.
* Currently, the GATK and WDL support functions are open for testing. To test the functions, please open a ticket.
To run GATK on BatchCompute, the input and output files must be stored in OSS. Therefore, you must first activate OSS and create a bucket.
NOTE: You must create the bucket in the same region in which you plan to run GATK on BatchCompute.
pip install batchcompute-cli
After installing the command-line tool, you must configure it.
For specific configuration instructions, click here.
Run this command to generate the demo code:
bcs gen ./demo -t gatk
This command generates the following directory structure:
Run the GATK demo
The GATK demo uses human reference genome build 38 to process whole-genome sequencing data. The input file is in unmapped BAM format.
In this example, we use public data for the NA12878 sample; Alibaba Cloud provides free storage for this data.
Now, run the following demo on your terminal:
bcs asub cromwell gatk-job\
This command is already written in main.sh, so alternatively, you can simply run:
The following message indicates submission was successful:
Job created: job-0000000059DC658400006822000001E3
job-0000000059DC658400006822000001E3 is the ID of the submitted job.
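When scripting these steps, it can help to capture the job ID from the submission message instead of copying it by hand. The following is a minimal sketch, assuming the message has exactly the `Job created: <job ID>` form shown above; `extract_job_id` is a hypothetical helper name and is not part of batchcompute-cli.

```shell
# Hypothetical helper (not part of batchcompute-cli): read text on stdin and
# print the job ID from a line of the form "Job created: job-<hex digits>".
# Assumes the message format shown in this document.
extract_job_id() {
  sed -n 's/^Job created: \(job-[0-9A-Fa-f][0-9A-Fa-f]*\)[[:space:]]*$/\1/p'
}

# Example using the message shown above as input.
job_id=$(printf '%s\n' 'Job created: job-0000000059DC658400006822000001E3' | extract_job_id)
echo "$job_id"
```

The captured value can then be passed to `bcs j` or `bcs log` in place of the literal job ID.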
Check the job status:
bcs j # Get the job list
bcs j job-0000000059DC658400006822000001E3 # View job details
View a job log:
bcs log job-0000000059DC658400006822000001E3
View intermediate data and information in the working directory:
bcs o ls oss://my_bucket/my_key/worker_dir/
View all output files:
bcs o ls oss://my_bucket/my_key/outputs/
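Because both listings share the same bucket and key prefix, the two paths can be derived from a single pair of values. The following is a minimal sketch, assuming the `worker_dir/` and `outputs/` layout shown above; `oss_job_path` is a hypothetical helper, and `my_bucket` and `my_key` are the placeholder names from the example.

```shell
# Hypothetical helper: build an OSS path under a bucket and key prefix.
# Usage: oss_job_path <bucket> <key_prefix> <subdir>
oss_job_path() {
  bucket=$1
  key=${2%/}       # drop one trailing slash from the key prefix, if present
  subdir=${3%/}    # drop one trailing slash from the subdirectory, if present
  printf 'oss://%s/%s/%s/\n' "$bucket" "$key" "$subdir"
}

# Derive both paths used in this document from the same bucket and key.
worker_dir=$(oss_job_path my_bucket my_key worker_dir)
outputs=$(oss_job_path my_bucket my_key outputs)
echo "$worker_dir"
echo "$outputs"
```

Either value can be given to `bcs o ls` in place of the literal path.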
You have now successfully run Broad Institute GATK on BatchCompute.