ScaMPI: a program for genome Scaffolding using Mate Paired Information
CAMPAGNA, DAVIDE;FORCATO, CLAUDIO;VITULO, NICOLA;VALLE, GIORGIO
2013
Abstract
Motivation: The revolution in sequencing technologies referred to as "Next Generation Sequencing" has enabled rapid genome sequencing at reduced cost. While it has become easier to obtain a “draft” of a genome (usually highly fragmented into small contigs), producing a high-quality genome assembly, with scaffolds spanning entire chromosomes, still presents hurdles, and dedicated tools are lacking.
Methods: ScaMPI is a comprehensive suite of programs for genome scaffolding using mate-paired reads (in particular SOLiD color-space encoded reads). ScaMPI provides a greedy algorithm for scaffolding with mate-paired reads, a web-based interface to assist manual scaffolding and refinement of the assembly, and a set of tools for complementary tasks: contig consistency validation via physical coverage check, primer design, gap closure, BAC-end validation of the assembly, and de novo telomere identification (TRAP, Telomeric Repeat Analysis Program).
Results: The ScaMPI suite was used to scaffold the contigs of a genome project of an oil-producing microalga (N. gaditana) sequenced with the 454 platform (N50: 40 kbp). ScaMPI automatically produced a set of scaffolds (N50: 600 kbp) using two libraries of SOLiD mate pairs. The web interface was then used for manual refinement, producing a set of 58 scaffolds (N50: 1 Mbp). The telomere-identification module was used to find telomeres, revealing that 21 scaffolds were complete chromosomes (out of an estimated 30). By sequencing a set of 528 BAC ends, we found that 97% of them confirmed the assembly of 32 large scaffolds (accounting for 20 Mbp), while the remaining 3% did not contradict it.
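The results above are reported as N50 values. As a reading aid, the following is a minimal sketch of how N50 is commonly computed from a list of contig or scaffold lengths. It is not taken from ScaMPI: the function name `n50` and the toy lengths are assumptions made for illustration only.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the sum to half the total.
        if 2 * running >= total:
            return length


# Hypothetical toy lengths (not data from the paper).
contigs = [80, 70, 50, 40, 30, 20]
# The same bases after joining 80+70 and 50+40 into two scaffolds
# (gap sizes ignored for simplicity).
scaffolds = [150, 90, 30, 20]

print(n50(contigs))
print(n50(scaffolds))
```

In this toy case, joining contigs into longer scaffolds leaves the total size unchanged but raises the N50, which is the kind of improvement the reported values (40 kbp to 600 kbp to 1 Mbp) describe.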