A Linear Algorithm For Efficient Representation of k-mer Sets Using De Bruijn Graphs

Rossignolo, Enrico; Comin, Matteo
2026

Abstract

A fundamental operation in computational genomics is the decomposition of input sequences into their constituent k-mers. Space-efficient methods for representing a set of k-mers are crucial for improving the scalability of bioinformatics analyses. A common strategy is to build a de Bruijn graph from the k-mer set and then obtain a compact representation by computing a path cover with the smallest number of paths. In this article, we introduce USTAR2, a novel algorithm for compressing k-mer sets. USTAR2 exploits the connectivity of nodes in the de Bruijn graph to select paths more effectively when constructing the path cover. We tested USTAR2 on real read datasets and compared it with several other tools. USTAR2 achieved better compression while requiring less memory and running significantly faster (up to 96x). The code of USTAR2 is available at the repository https://github.com/CominLab/USTAR2.
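The de Bruijn graph and path-cover strategy summarized in the abstract can be illustrated with a small sketch. It makes several assumptions: only forward-strand k-mers are used (real k-mer tools typically merge each k-mer with its reverse complement), the graph is node-centric so that k-mer u links to k-mer v when the (k-1)-suffix of u equals the (k-1)-prefix of v, and paths are chosen by a hypothetical greedy rule that starts and extends each path at the candidate with the fewest unvisited predecessors. This rule only loosely imitates the idea of connectivity-aware path selection; it is not USTAR2's actual procedure, and all helper names are illustrative. Because consecutive k-mers on a path overlap by k-1 characters, a cover with p paths over n k-mers spells strings of total length n + p(k-1), so fewer paths means better compression.

```python
ALPHABET = "ACGT"


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of forward-strand k-mers of seq (no canonicalization)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def successors(kmer: str, kset: set[str]) -> list[str]:
    """K-mers in kset whose (k-1)-prefix equals the (k-1)-suffix of kmer."""
    return [kmer[1:] + c for c in ALPHABET if kmer[1:] + c in kset]


def predecessors(kmer: str, kset: set[str]) -> list[str]:
    """K-mers in kset whose (k-1)-suffix equals the (k-1)-prefix of kmer."""
    return [c + kmer[:-1] for c in ALPHABET if c + kmer[:-1] in kset]


def greedy_path_cover(kset: set[str]) -> list[list[str]]:
    """Cover every k-mer exactly once with vertex-disjoint paths.

    Hypothetical heuristic (not USTAR2's rule): start each path, and extend
    it, at the candidate with the fewest unvisited predecessors, breaking
    ties lexicographically so the result is deterministic.
    """
    unvisited = set(kset)

    def free_preds(x: str) -> int:
        # Number of predecessors of x that no path has consumed yet.
        return sum(1 for p in predecessors(x, kset) if p in unvisited)

    paths = []
    while unvisited:
        cur = min(unvisited, key=lambda x: (free_preds(x), x))
        unvisited.remove(cur)
        path = [cur]
        while True:
            cands = [s for s in successors(cur, kset) if s in unvisited]
            if not cands:
                break  # dead end: close this path and start a new one
            cur = min(cands, key=lambda x: (free_preds(x), x))
            unvisited.remove(cur)
            path.append(cur)
        paths.append(path)
    return paths


def spell(path: list[str]) -> str:
    """Consecutive k-mers overlap by k-1, so each one adds a single character."""
    return path[0] + "".join(x[-1] for x in path[1:])


if __name__ == "__main__":
    ks = kmers("ACGTTGCA", 3) | kmers("TTGCAAT", 3)  # 8 distinct 3-mers
    print([spell(p) for p in greedy_path_cover(ks)])  # ['ACGTTGCAAT']
```

Here the 3-mers of two overlapping reads collapse into one 10-character string instead of eight separate 3-mers (24 characters). The choice of paths matters: on the 3-mers of ACGTACGA the same rule returns two paths (ACGA and CGTAC), although the single path CGTACGA covers all five k-mers, which is the kind of gap that better path-selection strategies aim to close. Note also that this sketch rescans all unvisited k-mers to pick each path's start, so it is not linear-time.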
Communications in Computer and Information Science
17th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2024
ISBN 9783031968983

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3560666