SAND - Scalable Assembly at Notre Dame
SAND is a set of modules for genome assembly that are built atop the
Work Queue platform for large-scale distributed computation on clusters,
clouds, or grids. SAND was designed as a modular replacement for the
conventional overlapper in the Celera assembler, and it splits overlap
computation into two distinct steps: candidate filtering and alignment.
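The value of splitting overlapping into a cheap filter followed by an expensive alignment can be shown with a toy sketch. This is not SAND's code. The rule that two reads are candidates if they share an exact k-mer, the Smith-Waterman scoring used for the alignment step, the parameter values, and all function names below are hypothetical choices for illustration. The filter prunes the all-pairs comparison so that alignment only runs on promising pairs.

```python
from collections import defaultdict
from itertools import combinations

def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def candidate_pairs(reads, k, min_shared=1):
    """Step 1 (hypothetical filter): keep read pairs sharing >= min_shared k-mers."""
    index = defaultdict(set)          # k-mer -> ids of reads that contain it
    for rid, seq in reads.items():
        for km in kmers(seq, k):
            index[km].add(rid)
    shared = defaultdict(int)         # (id_a, id_b) -> number of shared k-mers
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return sorted(p for p, n in shared.items() if n >= min_shared)

def local_align_score(a, b, match=1, mismatch=-1, gap=-2):
    """Step 2 (stand-in aligner): Smith-Waterman local alignment score, linear gaps."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def overlap_candidates(reads, k=4, min_score=5):
    """Run the filter, then align only the pairs that survived it."""
    results = []
    for a, b in candidate_pairs(reads, k):
        score = local_align_score(reads[a], reads[b])
        if score >= min_score:
            results.append((a, b, score))
    return results
```

For example, reads ACGTACGTTGCA and CGTTGCAAGGTC share four 4-mers (CGTT, GTTG, TTGC, TGCA), so they pass the filter and are aligned, while a read made only of Ts shares no 4-mer with either and is never aligned at all.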
To use SAND, you start your assembly process as usual, then run a lightweight
worker program on as many other machines as you can access.
You can start workers by hand, run them on the cloud, or submit
them to batch systems such as Condor or SGE. SAND organizes these machines into
a workforce that, under the right conditions, can speed up assembly tasks by
several hundredfold.
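The master/worker arrangement described above can be sketched locally. This is a stand-in, not the Work Queue API: a Python thread pool plays the role of workers running on other machines, and the batching scheme, `batch_size`, and the exact suffix-prefix overlap used as each task's work are hypothetical. In SAND, the master would instead submit tasks through Work Queue, and workers started by hand, on the cloud, or via a batch system would pick them up.

```python
from concurrent.futures import ThreadPoolExecutor

def suffix_prefix_overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b (exact match only)."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def run_task(batch):
    """What one hypothetical worker does: score every read pair shipped with its task."""
    return [(a_id, b_id, suffix_prefix_overlap(a_seq, b_seq))
            for a_id, a_seq, b_id, b_seq in batch]

def master(pairs, reads, batch_size=2, workers=4):
    """Split pairs into tasks, farm them out, and gather results in submission order."""
    # Each task carries the sequences it needs, mimicking inputs sent to a remote worker.
    tasks = [
        [(a, reads[a], b, reads[b]) for a, b in pairs[i:i + batch_size]]
        for i in range(0, len(pairs), batch_size)
    ]
    results = []
    # A local thread pool stands in for workers running on other machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for task_result in pool.map(run_task, tasks):
            results.extend(task_result)
    return results
```

Grouping several pairs into one task amortizes per-task overhead such as shipping inputs to a worker; the batch size trades that overhead against how evenly work spreads across the available workers.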
SAND's output has been validated as correct on the Anopheles gambiae, Sorghum bicolor,
and Homo sapiens datasets listed below.
For More Information
SAND User's Manual
Download SAND Software
Getting Help with SAND
Sample Data
The following are the datasets used for evaluating SAND in our various publications.
Publications
- Christopher Moretti, Andrew Thrasher, Li Yu, Michael Olson, Scott Emrich, and Douglas Thain,
A Framework for Scalable Genome Assembly on Clusters, Clouds, and Grids, IEEE Transactions on Parallel and Distributed Systems, 23(12), December, 2012. DOI: 10.1109/TPDS.2012.80
- Andrew Thrasher, Zachary Musgrave, Douglas Thain, and Scott Emrich,
Shifting the Bioinformatics Computing Paradigm: A Case Study in Parallelizing Genome Annotation Using Maker and Work Queue, IEEE International Conference on Computational Advances in Bio and Medical Sciences, February, 2012.
- Rory Carmichael, Patrick Braga-Henebry, Douglas Thain, and Scott Emrich,
Biocompute 2.0: An Improved Collaborative Workspace for Data Intensive Bio-Science, Concurrency and Computation: Practice and Experience, 23(17), pages 2305-2314, December, 2011. DOI: 10.1002/cpe.1782
- Irena Lanc, Peter Bui, Douglas Thain, and Scott Emrich,
Adapting Bioinformatics Applications for Heterogeneous Systems: A Case Study, Emerging Computational Methods for the Life Sciences Workshop at ACM HPDC, pages 7-13, June, 2011. DOI: 10.1145/1996023.1996025
- Andrew Thrasher, Rory Carmichael, Peter Bui, Li Yu, Douglas Thain, and Scott Emrich,
Taming Complex Bioinformatics Workflows with Weaver, Makeflow, and Starch, Workshop on Workflows in Support of Large Scale Science, pages 1-6, November, 2010. DOI: 10.1109/WORKS.2010.5671858
- Rory Carmichael, Patrick Braga-Henebry, Douglas Thain, and Scott Emrich,
Biocompute: Toward a Collaborative Workspace for Data Intensive Bio-Science, Workshop on Emerging Computational Methods for Life Sciences at ACM HPDC 2010, pages 489-498, June, 2010. DOI: 10.1145/1851476.1851547
- Christopher Moretti, Michael Olson, Scott Emrich, and Douglas Thain,
Highly Scalable Genome Assembly on Campus Grids, Many-Task Computing on Grids and Supercomputers (MTAGS), November, 2009. DOI: 10.1145/1646468.1646480
- Christopher Moretti, Michael Olson, Scott Emrich, and Douglas Thain,
Scalable Modular Genome Assembly on Campus Grids, University of Notre Dame, Computer Science and Engineering Department, Technical Report 2009-04, July, 2009.
- Li Yu, Christopher Moretti, Scott Emrich, Kenneth Judd, and Douglas Thain,
Harnessing Parallelism in Multicore Clusters with the All-Pairs and Wavefront Abstractions, IEEE High Performance Distributed Computing, pages 1-10, June, 2009. DOI: 10.1145/1551609.1551613
- Christophe Blanchet, Remi Mollon, Douglas Thain, and Gilbert Deleage,
Grid Deployment of Legacy Bioinformatics Applications with Transparent Data Access, IEEE Grid Computing, pages 120-127, September, 2006. DOI: 10.1109/ICGRID.2006.311006