Efficient k-mer Indexing with Application to Mapping-free SNP Genotyping

Andreace, F; Comin, M
2022

Abstract

Advances in sequencing technologies and computational methods have enabled rapid and accurate identification of genetic variants. Accurate genotype calls and allele frequency estimates are crucial for population genomics analyses. One of the most demanding steps in the genotyping pipeline is mapping reads to the human reference genome. Recently, mapping-free methods such as Lava and VarGeno have been proposed for the genotyping problem. They are reported to run 30 times faster than a standard alignment-based genotyping pipeline while achieving comparable accuracy. Moreover, these methods can include known genomic variants in the reference, making both read mapping and genotyping variant-aware. However, they require a large k-mer database, of about 60 GB, to be loaded in memory. In this paper we study the problem of genotyping using new efficient data structures based on k-mer set compression, and we present a fast mapping-free genotyping tool named GenoLight. GenoLight reports accuracy similar to the standard pipeline while being up to 8 times faster. GenoLight also uses 5 to 10 times less memory than the other mapping-free tools, and it can be run on a laptop. Availability: https://github.com/CominLab/GenoLight.
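To make the mapping-free idea concrete, the following is a minimal sketch of k-mer based SNP genotyping in general, not GenoLight's actual implementation. It assumes a plain Python dictionary as the k-mer database (no compression), ignores reverse complements, sequencing errors, and k-mers that also occur elsewhere in the genome, and uses a simple count threshold instead of a statistical model. The names `kmers`, `build_index`, `genotype`, and `min_count` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(reference, snps, k):
    """Map each k-mer that overlaps a known SNP to (position, allele).

    snps maps a 0-based position to a (ref_allele, alt_allele) pair.
    Illustrative only: a real tool would store this set in a compact,
    compressed structure rather than a Python dict.
    """
    index = {}
    for pos, alleles in snps.items():
        start = max(0, pos - k + 1)
        end = min(len(reference), pos + k)
        for allele in alleles:
            # Window around the SNP with the chosen allele substituted in.
            window = reference[start:pos] + allele + reference[pos + 1:end]
            for kmer in kmers(window, k):
                index[kmer] = (pos, allele)
    return index


def genotype(reads, index, snps, k, min_count=2):
    """Call a genotype per SNP by counting allele-specific k-mers in reads.

    No read is aligned: each read k-mer is looked up directly in the index.
    min_count is an assumed threshold, not a value taken from the paper.
    """
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            hit = index.get(kmer)
            if hit is not None:
                counts[hit] += 1

    calls = {}
    for pos, (ref_allele, alt_allele) in snps.items():
        has_ref = counts[(pos, ref_allele)] >= min_count
        has_alt = counts[(pos, alt_allele)] >= min_count
        if has_ref and has_alt:
            calls[pos] = "0/1"   # heterozygous
        elif has_alt:
            calls[pos] = "1/1"   # homozygous alternate
        elif has_ref:
            calls[pos] = "0/0"   # homozygous reference
        else:
            calls[pos] = "./."   # not enough evidence
    return calls


if __name__ == "__main__":
    reference = "GATTACAGGCTTAACCGTGA"
    snps = {10: ("T", "G")}          # toy SNP: T>G at position 10
    index = build_index(reference, snps, k=5)
    reads = ["TACAGGCTTAACCG", "TACAGGCGTAACCG"]
    print(genotype(reads, index, snps, k=5))
```

Even in this toy form, the index stores one entry per allele-specific k-mer, which suggests why a database covering all known human variants grows to tens of gigabytes and why compressing the k-mer set matters.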
2022
Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies - BIOINFORMATICS
978-989-758-552-4
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3459736