The task of measuring the dependence between terms is computationally expensive for IR systems which have to deal with large and sparse datasets. The current approaches to mining frequent term sets are based on the enumeration of the term sets found in a set of documents and on monotonicity, the latter being the property that a term set is frequent only if all its subsets are frequent as implemented by Apriori. However, the computational time can be very large. An alternative approach is to store the dataset in a FPT and to visit and prune the tree in a recursive way as implemented by FPGrowth. However, the storage space can still be very large. We introduce the BWI as a conceptual enhancement of monotonicity to predict with certainty when an itemset is frequent and when it is infrequent. We describe the empirical validation that the BWI can significantly reduce both the computational time of Apriori and the storage space of pattern tree-based algorithms such as FPGrowth. The empirical...

Efficient Term Set Prediction Using the Bell-Wigner Inequality

MELUCCI, MASSIMO
2015

Abstract

The task of measuring the dependence between terms is computationally expensive for IR systems which have to deal with large and sparse datasets. The current approaches to mining frequent term sets are based on the enumeration of the term sets found in a set of documents and on monotonicity, the latter being the property that a term set is frequent only if all its subsets are frequent as implemented by Apriori. However, the computational time can be very large. An alternative approach is to store the dataset in a FPT and to visit and prune the tree in a recursive way as implemented by FPGrowth. However, the storage space can still be very large. We introduce the BWI as a conceptual enhancement of monotonicity to predict with certainty when an itemset is frequent and when it is infrequent. We describe the empirical validation that the BWI can significantly reduce both the computational time of Apriori and the storage space of pattern tree-based algorithms such as FPGrowth. The empirical...
2015
Proceedings of the Symposium on String Processing and Information Retrieval
22nd International Symposium on String Processing and Information Retrieval, SPIRE 2015
9783319238258
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/3167800
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
  • OpenAlex 0
social impact