The authors describe a statistical approach based on hidden Markov models (HMMs), for generating stemmers automatically. The proposed approach requires little effort to insert new languages in the system even if minimal linguistic knowledge is available. This is a key advantage especially for digital libraries, which are often developed for a specific institution or government because the program can manage a great amount of documents written in local languages. The evaluation described in the article shows that the stemmers implemented by means of HMMs are as effective as those based on linguistic rules.

Design, Implementation, and Evaluation of a Methodology for Automatic Stemmer Generation

MELUCCI, MASSIMO;ORIO, NICOLA
2007

Abstract

The authors describe a statistical approach based on hidden Markov models (HMMs), for generating stemmers automatically. The proposed approach requires little effort to insert new languages in the system even if minimal linguistic knowledge is available. This is a key advantage especially for digital libraries, which are often developed for a specific institution or government because the program can manage a great amount of documents written in local languages. The evaluation described in the article shows that the stemmers implemented by means of HMMs are as effective as those based on linguistic rules.
File in questo prodotto:
File Dimensione Formato  
article-revised-final-29-09-05.pdf

accesso aperto

Tipologia: Preprint (AM - Author's Manuscript - submitted)
Licenza: Accesso libero
Dimensione 331.88 kB
Formato Adobe PDF
331.88 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/1776177
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 5
  • OpenAlex ND
social impact