The identification of significant patterns, defined as patterns whose frequency significantly deviates from what is expected under a suitable null model of the data, is a key data mining task with application in several areas. We present PROMISE, an algorithm for identifying significant sequential patterns while guaranteeing that the probability that one or more false discoveries are reported in output (i.e., the Family-Wise Error Rate - FWER) is less than a user-defined threshold. PROMISE employs the Westfall-Young method to correct for multiple hypothesis testing, a more powerful method than the commonly used Bonferroni correction. PROMISE crucially hinges on the generation of (random) permuted datasets with features similar to the input dataset, for which we provide two efficient strategies. We also provide a rigorous analysis of one of such strategies, which is based on a properly defined swap operation, proving a rigorous bound on the number of swaps it requires. The results of our experimental evaluation show that PROMISE is an efficient method that allows the discovery of statistically significant sequential patterns from transactional datasets while properly controlling for false discoveries.

Permutation strategies for mining significant sequential patterns

Tonon A.;Vandin F.
2019

Abstract

The identification of significant patterns, defined as patterns whose frequency significantly deviates from what is expected under a suitable null model of the data, is a key data mining task with application in several areas. We present PROMISE, an algorithm for identifying significant sequential patterns while guaranteeing that the probability that one or more false discoveries are reported in output (i.e., the Family-Wise Error Rate - FWER) is less than a user-defined threshold. PROMISE employs the Westfall-Young method to correct for multiple hypothesis testing, a more powerful method than the commonly used Bonferroni correction. PROMISE crucially hinges on the generation of (random) permuted datasets with features similar to the input dataset, for which we provide two efficient strategies. We also provide a rigorous analysis of one of such strategies, which is based on a properly defined swap operation, proving a rigorous bound on the number of swaps it requires. The results of our experimental evaluation show that PROMISE is an efficient method that allows the discovery of statistically significant sequential patterns from transactional datasets while properly controlling for false discoveries.
2019
Proceedings - IEEE International Conference on Data Mining, ICDM
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/3329322
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 18
  • ???jsp.display-item.citation.isi??? 10
social impact