Many tasks of contemporary Molecular Biology rely increasingly on au- tomated techniques for the discovery of interesting patterns and associations among them, both in individual sequences and across sequence families. A number of computational models and tools have been set up in recent years in response to these needs. This paper concentrates on approaches based on discrete combinatorial algorithms. As the order of magnitude of sequences and sequence repositories scales up to genomic proportions, problems of massive pattern and association discovery pose daunting methodological and algorith- mic challenges. Several formulations of pattern discovery implicate tables and synopses which, far from being a fraction of the input size, grow non-linearly, even exponentially with it. Often, these phenomena are intrinsic to the problem formulations and are not easily reverted. In some cases, a prudent mixture of combinatorial, algorithmic and statistical properties enable us to control such paradoxes. The focus of the present paper is thus on such “space-conscious” approaches to pattern discovery. The paper reflects mostly recent experience and work by its author, and does not pretend to be exhaustive.

Pattern Discovery and the Algorithmics of Surprise (Invited Paper)

APOSTOLICO, ALBERTO
2003

Abstract

Many tasks of contemporary Molecular Biology rely increasingly on au- tomated techniques for the discovery of interesting patterns and associations among them, both in individual sequences and across sequence families. A number of computational models and tools have been set up in recent years in response to these needs. This paper concentrates on approaches based on discrete combinatorial algorithms. As the order of magnitude of sequences and sequence repositories scales up to genomic proportions, problems of massive pattern and association discovery pose daunting methodological and algorith- mic challenges. Several formulations of pattern discovery implicate tables and synopses which, far from being a fraction of the input size, grow non-linearly, even exponentially with it. Often, these phenomena are intrinsic to the problem formulations and are not easily reverted. In some cases, a prudent mixture of combinatorial, algorithmic and statistical properties enable us to control such paradoxes. The focus of the present paper is thus on such “space-conscious” approaches to pattern discovery. The paper reflects mostly recent experience and work by its author, and does not pretend to be exhaustive.
2003
Artificial Intelligence and Heuristic Methods for Bioinformatics
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/1363088
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact