COMPARISON OF CORRELATION-BASED DISSIMILARITY MEASURES FOR CLUSTERING GENES WITH DAG STRUCTURE

Di Lascio, Francesca Marta Lilja; Roverato, Alberto

Clustering techniques have been widely used in the analysis of microarray data. One of the main tasks is the identification of clusters of genes that form relevant functional structures. In this context, a central role is played by transcription modules, each consisting of a transcription factor and its set of associated target genes. Following the Bayesian networks literature, Roverato and Di Lascio [14] represented these functional modules by means of directed acyclic graph (DAG) models and introduced, in a hierarchical clustering context, a theoretical framework for comparing dissimilarity measures on the basis of their ability to identify modules of this kind. Moreover, they proposed a novel dissimilarity measure based on Wilks’ Lambda statistic and compared its performance with that of the most commonly used linkage rules applied to the “1 − squared correlation coefficient” proximity between genes; see also Di Lascio and Roverato [5]. In this paper we use the tools introduced in [14] to carry out a formal comparison of several dissimilarity measures. These measures are then compared on both simulated and real microarray data.
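
As a rough illustration of the “1 − squared correlation coefficient” proximity mentioned above, the following minimal Python sketch builds that dissimilarity matrix from a genes-by-samples array and passes it to a standard hierarchical clustering routine. It is not the procedure of [14] and does not implement the Wilks’ Lambda measure. The function names, the choice of average linkage, and the toy data (two artificial modules, each a simulated regulator with noisy linear targets) are assumptions made only for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def squared_corr_dissimilarity(expr):
    """Return the genes x genes matrix of 1 - r^2 (hypothetical helper).

    expr is a 2-D array with one row per gene and one column per sample.
    """
    r = np.corrcoef(expr)            # Pearson correlation between rows
    d = 1.0 - r ** 2                 # squaring ignores the sign of r
    np.fill_diagonal(d, 0.0)         # remove rounding error on the diagonal
    return np.clip(d, 0.0, 1.0)


def cluster_genes(expr, n_clusters, method="average"):
    """Hierarchical clustering on 1 - r^2; the linkage rule is an assumption."""
    d = squared_corr_dissimilarity(expr)
    condensed = squareform(d, checks=False)   # condensed form used by linkage
    tree = linkage(condensed, method=method)
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data, invented for illustration: two regulators and their targets.
rng = np.random.default_rng(0)
tf1 = rng.normal(size=50)
tf2 = rng.normal(size=50)
expr = np.vstack([
    tf1,
    2.0 * tf1 + 0.1 * rng.normal(size=50),    # activated target of tf1
    -1.5 * tf1 + 0.1 * rng.normal(size=50),   # repressed target of tf1
    tf2,
    0.8 * tf2 + 0.1 * rng.normal(size=50),    # target of tf2
])

dissimilarity = squared_corr_dissimilarity(expr)
labels = cluster_genes(expr, n_clusters=2)
print(labels)
```

Because the correlation is squared, a target that is strongly negatively correlated with its regulator is treated as close to it, which is one reason this proximity is a natural baseline when a module contains both activated and repressed genes.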