In this paper, a general methodology for gathering and mining information from Web log files is proposed. A series of tools to retrieve, store, and analyze the data extracted from log files have been designed and implemented. The aim is to form general methods by abstracting from the analysis of logs which use a well-defined standard format, such as the Extended Log File Format proposed by W3C. The methodology has been experimented on the Web log files of The European Library portal; the experimental analyses led to personal, technical, geographical and temporal findings about the usage and traffic load. Considerations about a more accurate tracking of users and users profiles, and a better management of crawler accesses using authentication are presented.
Gathering and Mining Information from Web Log Files.
AGOSTI, MARISTELLA;DI NUNZIO, GIORGIO MARIA
2007
Abstract
In this paper, a general methodology for gathering and mining information from Web log files is proposed. A series of tools to retrieve, store, and analyze the data extracted from log files have been designed and implemented. The aim is to form general methods by abstracting from the analysis of logs which use a well-defined standard format, such as the Extended Log File Format proposed by W3C. The methodology has been experimented on the Web log files of The European Library portal; the experimental analyses led to personal, technical, geographical and temporal findings about the usage and traffic load. Considerations about a more accurate tracking of users and users profiles, and a better management of crawler accesses using authentication are presented.Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.