Fast and Succinct Compression of k-mer Sets with Plain Text Representation of Colored de Bruijn Graphs
Rossignolo, Enrico; Comin, Matteo
2026
Abstract
A fundamental operation in computational genomics is the reduction of input sequences into their constituent k-mers. Designing space-efficient ways to represent a k-mer collection is essential to improve the scalability of bioinformatics analyses. A widely used approach converts the k-mer set into a de Bruijn graph and then produces a compact plain text representation by identifying a minimum path cover of that graph. In this article, we present USTAR-CR, a novel algorithm for compressing multiple k-mer sets. USTAR-CR leverages node connectivity principles in the colored de Bruijn graph to obtain a more compact plain text representation, combined with an efficient encoding of k-mer colors. We tested USTAR-CR on real read datasets and compared it with the state-of-the-art tool GGCAT. USTAR-CR achieved better compression while requiring less memory and running significantly faster (up to 51x). The software is available at https://github.com/enricorox/USTAR-CR.
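To make the pipeline described in the abstract concrete, the following is a minimal sketch of the general idea: extract the k-mers of a sequence, treat them as nodes of a de Bruijn graph, and spell a set of vertex-disjoint paths as plain text strings. It is not the USTAR-CR algorithm. The function names `kmers` and `greedy_path_cover` are hypothetical, the cover is built with a simple greedy heuristic rather than a minimum path cover, and the sketch assumes a DNA alphabet while ignoring reverse complements and colors.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_path_cover(kmer_set):
    """Cover every k-mer exactly once with greedily extended paths.

    Each path is returned as the string it spells. This is a heuristic
    sketch (an assumption of this example), not a minimum path cover.
    All k-mers are assumed to have the same length.
    """
    unvisited = set(kmer_set)
    strings = []
    for start in sorted(kmer_set):  # sorted only for deterministic output
        if start not in unvisited:
            continue
        unvisited.remove(start)
        k = len(start)
        path = start

        # Extend to the right while an unvisited successor exists.
        while True:
            suffix = path[len(path) - k + 1:]
            for c in "ACGT":
                nxt = suffix + c
                if nxt in unvisited:
                    unvisited.remove(nxt)
                    path += c
                    break
            else:
                break  # no successor found

        # Extend to the left while an unvisited predecessor exists.
        while True:
            prefix = path[:k - 1]
            for c in "ACGT":
                prv = c + prefix
                if prv in unvisited:
                    unvisited.remove(prv)
                    path = c + path
                    break
            else:
                break  # no predecessor found

        strings.append(path)
    return strings


if __name__ == "__main__":
    ks = kmers("ACGTTGCA", 3)
    for s in greedy_path_cover(ks):
        print(s)
```

For the input "ACGTTGCA" with k = 3, the six distinct 3-mers (18 characters if listed one by one) are spelled by a single 8-character string, which illustrates why path-based plain text representations are more compact than listing k-mers individually.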