Alibaba Cloud Genomics Service (AGS) is an ultra-fast, cost-effective, and high-precision genome sequencing and secondary analysis service developed by Alibaba Cloud. This topic introduces AGS and its benefits.
What is AGS
AGS is primarily used for genome sequencing and secondary analysis. It needs only 15 minutes to complete high-precision analysis of a 30x whole-genome sequencing (WGS) sample, including sequence alignment, sorting, deduplication, and variant calling. AGS is 120 times faster than the traditional solution and 2 to 4 times faster than the widely used FPGA/GPU-based solutions.
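The speedup factors above imply rough runtimes for the other solutions. Assuming both factors are measured against the same 15-minute AGS run on one 30x WGS sample (an assumption; the test conditions are not stated here), the arithmetic is:

```latex
\begin{aligned}
T_{\text{traditional}} &\approx 120 \times T_{\text{AGS}} = 120 \times 15\ \text{min} = 1800\ \text{min} = 30\ \text{h} \\
T_{\text{FPGA/GPU}} &\approx (2 \text{ to } 4) \times T_{\text{AGS}} = 30 \text{ to } 60\ \text{min}
\end{aligned}
```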
AGS provides strong support for genetic disease detection and cancer screening by analyzing individual genome sequences and identifying the mutations they carry. It is expected to play a larger role in clinical medicine and genetic diagnosis. The human genome consists of about three billion base pairs, and one 30x WGS sample is about 100 GB in size. AGS offers benefits in computation speed, precision, cost, usability, and compatibility with upstream sequencers. It is also suitable for scenarios such as SNP/INDEL and CNV detection in DNA, and RNA virus detection.
- Ultra-fast and high-precision
In actual tests, the solution completed the analysis of eight 30x WGS samples within 15 minutes. In other words, it performed alignment, sorting, deduplication, and variant calling on 720 billion base pairs in 15 minutes, a 120-fold speedup without compromising accuracy. When the NA12878 results were compared with the gold-standard VCF files, the accuracy of AGS was comparable to that of BWA-0.7.17/GATK 4.1.3: equal or slightly higher for SNPs, with SNP precision reaching 99.80%, and within 0.02 percentage points for INDELs.
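The 720 billion figure follows from the genome size, the coverage depth, and the number of samples. The short Python check below reproduces it and derives the average throughput, assuming exactly 3 billion base pairs per genome and a 15-minute wall-clock run (both rounded values taken from the text above):

```python
# Back-of-the-envelope check of the throughput figures quoted above.
HUMAN_GENOME_BASES = 3_000_000_000  # approx. 3 billion base pairs (rounded)
COVERAGE = 30                       # 30x whole-genome sequencing
SAMPLES = 8                         # samples analyzed in one test run
RUNTIME_SECONDS = 15 * 60           # 15 minutes of wall-clock time


def bases_per_sample(genome_size: int, coverage: int) -> int:
    """Sequenced bases needed to cover the genome `coverage` times."""
    return genome_size * coverage


def total_bases(genome_size: int, coverage: int, samples: int) -> int:
    """Sequenced bases across all samples in the run."""
    return bases_per_sample(genome_size, coverage) * samples


def throughput(bases: int, seconds: int) -> float:
    """Average number of bases processed per second."""
    return bases / seconds


if __name__ == "__main__":
    per_sample = bases_per_sample(HUMAN_GENOME_BASES, COVERAGE)
    total = total_bases(HUMAN_GENOME_BASES, COVERAGE, SAMPLES)
    print(f"per sample: {per_sample:.2e} bases")
    print(f"total:      {total:.2e} bases")
    print(f"throughput: {throughput(total, RUNTIME_SECONDS):.2e} bases/s")
```

With these inputs, each sample carries 90 billion bases, the run totals 720 billion, and the average rate is about 800 million bases per second.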
Dataset: 30x NA12878

| Variant type | Pipeline | Recall | Precision | F1 |
|---|---|---|---|---|
| SNP | GATK 4.1 | 99.86% | 99.79% | 99.82% |
| SNP | AGS | 99.86% | 99.80% | 99.83% |
| INDEL | GATK 4.1 | 99.28% | 99.70% | 99.49% |
| INDEL | AGS | 99.27% | 99.68% | 99.47% |
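The F1 score in the NA12878 results is the harmonic mean of recall and precision, F1 = 2PR / (P + R). The Python sketch below recomputes it from the reported recall and precision values to show that the columns are consistent. The formula is the standard F1 definition; the row data are copied from the results above:

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (works in % or as fractions)."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (variant type, pipeline, recall %, precision %, reported F1 %)
ROWS = [
    ("SNP", "GATK 4.1", 99.86, 99.79, 99.82),
    ("SNP", "AGS", 99.86, 99.80, 99.83),
    ("INDEL", "GATK 4.1", 99.28, 99.70, 99.49),
    ("INDEL", "AGS", 99.27, 99.68, 99.47),
]

for vtype, pipeline, recall, precision, reported in ROWS:
    f1 = f1_score(recall, precision)
    print(f"{vtype:5} {pipeline:8} F1={f1:.2f}% (reported {reported:.2f}%)")
```

Because F1 is scale-invariant in this form, percentages can be passed directly without converting to fractions.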
Container Service for Kubernetes (ACK) and AGS provide PaaS-based acceleration. For example, they help BGI Group perform secondary analysis on large-scale FASTQ data from genome sequencers on a hybrid cloud architecture, which lowers the cost of secondary analysis and shortens the delivery cycle by 95%.
- Wide applications
AGS meets flexible requirements and supports custom analysis pipelines for different platforms and data types without compromising throughput. It provides simpler and more efficient solutions for storage, automated analysis, data transmission, project collaboration, and bioinformatics tool development to major sequencing service providers and research institutions.
AGS supports Kubernetes-native workflows, which let you run DAG-structured workflows in Kubernetes clusters. These workflows also apply to scenarios such as genetic computing and data processing.
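In a DAG-structured workflow, a step starts only after all of its upstream steps finish, and independent steps can run in parallel. The Python sketch below illustrates that scheduling idea with the standard library's `graphlib` (Python 3.9+). It is not the AGS workflow engine, and the step names are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical secondary-analysis pipeline: step -> set of steps it depends on.
PIPELINE = {
    "align": {"upload_fastq"},
    "sort": {"align"},
    "dedup": {"sort"},
    "call_variants": {"dedup"},
    "qc_report": {"align"},
    "upload_results": {"call_variants", "qc_report"},
}


def execution_waves(dag):
    """Group steps into waves; steps in the same wave have no mutual dependencies."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


for i, wave in enumerate(execution_waves(PIPELINE), start=1):
    print(i, wave)
```

Here `qc_report` and `sort` land in the same wave because both depend only on `align`, so a scheduler could run them on separate pods at the same time.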
- Easy to use
AGS supports elastic scheduling for large-scale computing based on the auto scaling capability of the cloud architecture. In practice, you do not need to plan computing resources, processing logic, or data caching. You only need to upload your offline FASTQ data to Object Storage Service (OSS) and authorize AGS to access your OSS buckets. AGS then runs the analysis and uploads the result data to the storage location you specify.
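The upload step can be scripted. The Python sketch below checks file extensions and uploads FASTQ files under an object-key prefix. It accepts any bucket object that provides `put_object_from_file(key, filename)`, the method offered by `oss2.Bucket` in the OSS Python SDK. The prefix layout and the accepted extensions are assumptions for illustration, and the step that authorizes AGS to access the bucket is not shown:

```python
import os
from typing import List, Sequence

# Assumed set of FASTQ file extensions (plain and gzip-compressed).
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def fastq_object_key(prefix: str, local_path: str) -> str:
    """Build an OSS object key such as 'wgs/sample01/reads_R1.fastq.gz'."""
    return f"{prefix.strip('/')}/{os.path.basename(local_path)}"


def upload_fastq_files(bucket, local_paths: Sequence[str], prefix: str) -> List[str]:
    """Upload FASTQ files to `bucket` under `prefix` and return the object keys."""
    # Validate everything first so a bad path does not leave a partial upload.
    for path in local_paths:
        if not path.endswith(FASTQ_SUFFIXES):
            raise ValueError(f"not a FASTQ file: {path}")
    keys = []
    for path in local_paths:
        key = fastq_object_key(prefix, path)
        bucket.put_object_from_file(key, path)
        keys.append(key)
    return keys


# Example wiring with the OSS Python SDK (oss2); the endpoint, bucket name,
# and credentials are placeholders:
#   import oss2
#   auth = oss2.Auth("<AccessKeyId>", "<AccessKeySecret>")
#   bucket = oss2.Bucket(auth, "https://oss-cn-hangzhou.aliyuncs.com", "<your-bucket>")
#   upload_fastq_files(bucket, ["s1_R1.fastq.gz", "s1_R2.fastq.gz"], "wgs/sample01")
```

Passing the bucket in as a parameter keeps the function independent of credentials and makes it easy to test with a stand-in object.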
AGS also addresses workflow management, massive data storage, data migration and transmission, and security compliance requirements. For more information, see Perform Whole Genome Sequencing (WGS) through AGS, Use workflows in Kubernetes clusters, and Introduction to AGS CLI.