The development of neural network (NN) models able to encode structured input, and the more recent definition of kernels for structures, makes it possible to directly apply machine learning approaches to generic structured data. However, the effectiveness of a kernel can depend on its sparsity with respect to a specific data set. In fact, the accuracy of a kernel method typically reduces as the kernel sparsity increases. The sparsity problem is particularly common in structured domains involving discrete variables which may take on many different values. In this paper, we explore this issue on two well-known kernels for trees, and propose to face it by recurring to self-organizing maps (SOMs) for structures. Specifically, we show that a suitable combination of the two approaches, obtained by defining a new class of kernels based on the activation map of a SOM for structures, can be effective in avoiding the sparsity problem and results in a system that can be significantly more accurate for categorization tasks on structured data. The effectiveness of the proposed approach is demonstrated experimentally on two relatively large corpora of XML formatted data and a data set of user sessions extracted from Website logs.

Learning Nonsparse Kernels by Self-Organizing Maps for Structured Data

AIOLLI, FABIO;DA SAN MARTINO, GIOVANNI;SPERDUTI, ALESSANDRO
2009

Abstract

The development of neural network (NN) models able to encode structured input, and the more recent definition of kernels for structures, makes it possible to directly apply machine learning approaches to generic structured data. However, the effectiveness of a kernel can depend on its sparsity with respect to a specific data set. In fact, the accuracy of a kernel method typically reduces as the kernel sparsity increases. The sparsity problem is particularly common in structured domains involving discrete variables which may take on many different values. In this paper, we explore this issue on two well-known kernels for trees, and propose to face it by recurring to self-organizing maps (SOMs) for structures. Specifically, we show that a suitable combination of the two approaches, obtained by defining a new class of kernels based on the activation map of a SOM for structures, can be effective in avoiding the sparsity problem and results in a system that can be significantly more accurate for categorization tasks on structured data. The effectiveness of the proposed approach is demonstrated experimentally on two relatively large corpora of XML formatted data and a data set of user sessions extracted from Website logs.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

Caricamento pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/2448580
Citazioni
  • ???jsp.display-item.citation.pmc??? 0
  • Scopus 19
  • ???jsp.display-item.citation.isi??? 17
social impact