Abstract We present variants of classical data compression paradigms by Ziv, Lempel, and Welch in which the phrases used in compression are selected among suitably chosen motifs, defined here as strings of intermittently solid and wild characters that recur more or less frequently in the source textstring. This notion emerged primarily in the analysis of biological sequences and molecules. Whereas the number of motifs in a sequence or family may be exponential in the size of the input, a linear-sized basis of irredundant motifs may be defined such that any other motif can be obtained by the union of a suitable subset from the basis. Previous study has exposed the advantages of using irredundant motifs in lossy as well as lossless offline compression. In the present paper, we examine adaptations and extensions of classical incremental ZL and ZLW paradigms. First, hybrid schemata are proposed along these lines, in which motifs may be discovered and selected off-line, while the parse and encoding is still conducted on-line. The performances thus obtained improve on the one hand over previous off-line implementations of motif-based compression, and on the other, over the traditionally best implementations of ZLW. On the basis of this, both lossy and lossless motif-based schemata are introduced and tested that follow more closely the ZL and ZLW paradigms.

Motifs in Ziv-Lempel-Welch Clef

APOSTOLICO, ALBERTO;COMIN, MATTEO;
2004

Abstract

Abstract We present variants of classical data compression paradigms by Ziv, Lempel, and Welch in which the phrases used in compression are selected among suitably chosen motifs, defined here as strings of intermittently solid and wild characters that recur more or less frequently in the source textstring. This notion emerged primarily in the analysis of biological sequences and molecules. Whereas the number of motifs in a sequence or family may be exponential in the size of the input, a linear-sized basis of irredundant motifs may be defined such that any other motif can be obtained by the union of a suitable subset from the basis. Previous study has exposed the advantages of using irredundant motifs in lossy as well as lossless offline compression. In the present paper, we examine adaptations and extensions of classical incremental ZL and ZLW paradigms. First, hybrid schemata are proposed along these lines, in which motifs may be discovered and selected off-line, while the parse and encoding is still conducted on-line. The performances thus obtained improve on the one hand over previous off-line implementations of motif-based compression, and on the other, over the traditionally best implementations of ZLW. On the basis of this, both lossy and lossless motif-based schemata are introduced and tested that follow more closely the ZL and ZLW paradigms.
2004
DCC Data Compression Conference
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/2433251
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 15
  • ???jsp.display-item.citation.isi??? 13
social impact