DuoHash: Fast Hashing of Spaced Seeds with Application to Spaced K-mers Counting
Pizzi, Cinzia; Comin, Matteo
2026
Abstract
Alignment-free genomic sequence analysis has facilitated high-throughput processing in numerous bioinformatics workflows. A central task in alignment-free applications is hashing k-mers, which is commonly used for indexing, querying, and fast similarity searches. Recently, spaced seeds, specialized patterns designed to accommodate errors or mutations, have increasingly replaced k-mers, enhancing sensitivity in various applications. However, spaced seed hashing is computationally intensive and introduces significant delays. This paper addresses the challenge of efficient spaced seed hashing and presents DuoHash, a framework that enables the efficient computation of several hash functions. Our experimental results demonstrate that the proposed method substantially outperforms existing algorithms, achieving speedups of up to 11x. To illustrate its practical utility, we further applied DuoHash to the problem of spaced k-mer counting. The code of DuoHash is available at https://github.com/CominLab/DuoHash/.
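To make the hashing task concrete, the following is a minimal sketch of the naive baseline that fast spaced seed hashing methods aim to improve on. It is not the DuoHash algorithm. The 2-bit nucleotide encoding, the '1'/'0' seed notation, and the function names `spaced_seed_hashes` and `count_spaced_kmers` are assumptions made for illustration only.

```python
from collections import Counter

# 2-bit nucleotide encoding (a common convention, assumed here).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def spaced_seed_hashes(seq, seed):
    """Yield (position, hash) for every window of `seq` under `seed`.

    `seed` is a string of '1' (care) and '0' (don't care) characters.
    The hash packs the 2-bit codes of the care positions into one integer.
    Windows with a symbol outside ACGT at a care position are skipped.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(seq) - len(seed) + 1):
        h = 0
        for offset in care:
            code = ENCODE.get(seq[start + offset])
            if code is None:
                break  # unknown symbol at a care position: skip this window
            h = (h << 2) | code
        else:
            yield start, h


def count_spaced_kmers(seq, seed):
    """Count occurrences of each spaced k-mer hash (naive illustration)."""
    return Counter(h for _, h in spaced_seed_hashes(seq, seed))


if __name__ == "__main__":
    for pos, h in spaced_seed_hashes("ACGTACGT", "101"):
        print(pos, h)
    print(count_spaced_kmers("ACGTACGT", "101"))
```

This baseline recomputes every hash from scratch, so its cost grows with both the sequence length and the number of care positions in the seed. That per-window recomputation is the overhead the abstract describes as computationally intensive.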