Unsupervised Evaluation of Data Integration Processes

Ferro, N.
2020

Abstract

The quality of data integration processes is usually evaluated through onerous manual data inspections. This task is particularly heavy in real business scenarios, where the large amount of data makes checking every tuple infeasible and frequent updates, i.e. changes in the sources and/or new sources, require the evaluation to be repeated over and over. Our idea is to address this issue by providing experts with an unsupervised measure, based on word frequencies, that quantifies how representative one dataset is of another. The measure indicates how good the integration process is, whether deviations are occurring, and whether a manual inspection is needed. We also conducted preliminary experiments on shared datasets, which show the effectiveness of the proposed measures in typical data integration scenarios.
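The abstract does not define the measure itself, so the following is only an illustrative sketch of what a word-frequency comparison between two datasets could look like. It assumes rows are tuples of values, tokenizes them into lowercase words, and scores similarity as one minus the Jensen-Shannon divergence of the two word distributions. The function names `word_frequencies` and `representativeness`, the tokenization, and the choice of divergence are all assumptions made for illustration, not the method proposed in the paper.

```python
import math
import re
from collections import Counter


def word_frequencies(rows):
    """Relative frequency of each lowercase word over all values in the rows."""
    counts = Counter()
    for row in rows:
        for value in row:
            counts.update(re.findall(r"\w+", str(value).lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def representativeness(source_rows, integrated_rows):
    """Hypothetical score in [0, 1]: 1 minus the base-2 Jensen-Shannon
    divergence between the word distributions of the two datasets.
    This is an assumed stand-in, not the measure defined in the paper."""
    p = word_frequencies(source_rows)
    q = word_frequencies(integrated_rows)
    if not p or not q:
        return 0.0
    jsd = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mean = (pw + qw) / 2
        if pw:
            jsd += 0.5 * pw * math.log2(pw / mean)
        if qw:
            jsd += 0.5 * qw * math.log2(qw / mean)
    # Clamp to guard against tiny floating-point overshoot.
    return min(1.0, max(0.0, 1.0 - jsd))


# Toy data, invented for this example only.
source = [("Alice Rossi", "Padova"), ("Bob Bianchi", "Venezia")]
integrated = [("Alice Rossi", "Padova"), ("Bob Bianchi", "Verona")]
score = representativeness(source, integrated)
```

Under these assumptions, a score that drops after a source update would suggest that the integrated dataset has drifted from its source and that a manual inspection may be worthwhile.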
Proc. 22nd International Conference on Information Integration and Web-based Applications & Services (iiWAS 2020), 2020
ISBN 9781450389228
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3367842