17 January 2019
Illumina, Inc. announces the release of open source, novel artificial intelligence (AI) software that can find previously overlooked noncoding mutations in patients with rare genetic diseases. The new AI software for genome interpretation has been publicly released through Illumina’s BaseSpace Sequence Hub and GitHub. Additionally, these AI capabilities will be integrated into Illumina’s advanced BaseSpace Variant Interpreter Software.
“The open source release of our software demonstrates Illumina’s commitment to not only being the world’s largest enabler of DNA sequencing data, but also making widely available the AI tools that will enable clinicians and researchers to keep up with the enormous depth and breadth of genomic data being generated,” said Mostafa Ronaghi, Chief Technology Officer at Illumina.
The Illumina team, working with collaborators at the University of California, San Francisco and Stanford University, developed SpliceAI, a state-of-the-art deep neural network, to find previously overlooked noncoding mutations in patients with autism and intellectual disability. The team then experimentally validated 75 percent of these predictions by using RNA-seq to identify aberrant splicing events in cells from these patients. The results are published in the January 17 edition of Cell.
About the Study
The potential of machine learning and AI to uncover new biological insights from next-generation sequencing (NGS) data remains largely unexplored. The authors of the Cell study, led by Kyle Farh, M.D., Ph.D., at Illumina, used deep neural networks to predict with high accuracy where splicing events occur in the genome. The researchers then applied their model to NGS data to predict mutations that have the potential to alter splicing. Around 10 percent of pathogenic mutations in patients with neurodevelopmental disorders and other rare genetic diseases may be noncoding mutations that disrupt splicing, leading to the loss of an important protein.
“Less than two percent of human DNA is made of protein-coding sequence, which list the ingredients for making thousands of proteins,” said Stephan Sanders, BMBS, Ph.D., a leading autism researcher at the University of California, San Francisco, and co-author of the paper. “DNA mutations in these regions can disrupt proteins, often causing human disorders. The other 98 percent of DNA is noncoding and contains vital information about when, where and how these proteins should be made. In contrast to coding regions, little is known about the impact of noncoding mutations on human disorders, and we are only beginning to understand how they affect human health and disease.”
By leveraging NGS data, the study demonstrates a novel framework for using AI to generate actionable insights about biology and disease. This improves our ability to understand the clinical effects of mutations, especially in the noncoding genome.
For Research Use Only. Not for use in diagnostic procedures.