Mar 8, 2007: Massively Parallel Plant Genomics
Filed in: Colloquium
Scott Emrich, Iowa State University
Abstract
Although only the size of a refrigerator, the 1,024-node CyBlue BlueGene/L at Iowa State University can perform 5.7 trillion floating-point operations per second. We have been applying this massive parallelism to multiple biological problems.
Today, researchers must break any genome into smaller pieces prior to sequencing. We present a method that uses efficient algorithms and BlueGene/L to reassemble those pieces, solving these large "puzzles". CyBlue can currently process over 3.2 million complex maize genomic sequences in less than 2 hours. Further, experiments on up to 8,000 nodes indicate that this approach scales well on larger BlueGene/L supercomputers. The resulting MAGI assemblies, or Maize Assembled Genomic Islands, are widely used. We describe our own computational analyses, which revealed significant biological features of the maize genome, including "orphan" genes and highly similar gene duplications. Both may have played important roles during the domestication and evolution of this crop species.
Supercomputing is not the only technology capable of inducing paradigm shifts. New massively parallel pyrosequencing produces hundreds of thousands of sequences in only a few hours, and we describe its applications and the associated computational difficulties. As a case study, we isolated maize shoot apical meristem (SAM) ESTs, which were then sequenced by 454 Life Sciences. Approximately 25,000 MAGIs were annotated after a single sequencing run of the inbred line B73. Moreover, we discovered thousands of putative SNPs after similarly sequencing another inbred maize line, Mo17. Most (>80%) of the computational predictions tested were validated and could be used to generate accurate genetic markers. Hence, the coupling of BlueGene and 454 technologies could be invaluable for upcoming plant genome sequencing and annotation, including U.S. biofuel-related projects such as switchgrass and Brachypodium.
Bio
Scott Emrich obtained a BS in Biology and Computer Science from Loyola College (MD). Since then, he has been working on an interdisciplinary PhD in Bioinformatics and Computational Biology at Iowa State University. He recently received a departmental research excellence award from Electrical and Computer Engineering and was a visiting scholar at the Indian Institute of Technology (IIT) Bombay. In addition to his research, he has led two undergraduate honors seminars and has given multiple cross-disciplinary talks on applying high-performance computing to biological problems.