3-D Environment to Represent Textual Documents for Duplicate Detection and Collection Examination

DI NUNZIO, GIORGIO MARIA
2005

Abstract

As massive collections of textual documents become increasingly available in digital format, organizing and classifying these documents (for example in a Digital Library Management System (DLMS)) becomes an important issue. For this reason, a suitable graphical representation of documents would help system designers during raw data exploration and help users interpret results more clearly. Automatic Text Categorization (ATC), the task of organizing large collections of documents into predefined categories by means of Machine Learning (ML) methods, is a potential (and rarely explored) field of application for Visual Data Exploration (VDE) techniques. Starting from a recent approach that represents documents in only two dimensions, we study the possibilities of an enhanced three-dimensional plot. In particular, we tackle the problem of detecting duplicates within document collections and examine how visualizing these duplicates may help both system designers and users.
Proceedings of the 7-th International Workshop of the EU Network of Excellence DELOS on Audio-Visual Content and Information Visualization in Digital Libraries (AVIVDiLib 2005)

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/1468212