Identifying specific textual units of documents taken from large corpora. Comparing methods

TUZZI, ARJUNA
2006

Abstract

Recent software packages for textual analysis often include procedures for identifying the specific features of documents in large corpora in order to distinguish among them, in particular algorithms based on the hypergeometric probabilistic model. This paper proposes some new directions based on bootstrap techniques. The corpus is composed of documents written by different stakeholders during the seven preparatory meetings of the first phase of the United Nations World Summit on the Information Society (WSIS, Geneva 2003).
JADT 2006 Actes des 8es Journées internationales d'Analyse statistique des Données Textuelles
ISBN 9782848671307
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/2463573