Identifying specific textual units of documents taken from large corpora. Comparing methods
TUZZI, ARJUNA
2006
Abstract
Recent software for textual analysis often includes procedures for identifying the specific features of documents in large corpora in order to distinguish among them, most notably algorithms based on the hypergeometric probability model. This paper proposes some new directions based on bootstrap techniques. The corpus consists of documents written by different stakeholders during the seven preparatory meetings of the first phase of the United Nations World Summit on the Information Society (WSIS, Geneva 2003).
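The abstract does not spell out either statistic. As an illustration only, here is a minimal Python sketch of the two ideas it contrasts. The first is the common hypergeometric "specificity" test, which gives the upper-tail probability of seeing a word at least f times in a part of t tokens when the corpus has T tokens and F occurrences of that word. The second is one possible bootstrap counterpart: a percentile bootstrap interval for the word's relative frequency within the part, compared against the corpus-wide rate. The function names, the toy two-document corpus, and the choice of a percentile bootstrap are assumptions made for this sketch. They are not the procedure described in the paper.

```python
import math
import random


def hypergeom_upper_tail(T: int, F: int, t: int, f: int) -> float:
    """P(X >= f) for X ~ Hypergeometric(T corpus tokens, F occurrences, t tokens drawn)."""
    upper = min(F, t)
    # math.comb(n, k) returns 0 when k > n, so impossible terms contribute nothing
    num = sum(math.comb(F, k) * math.comb(T - F, t - k) for k in range(f, upper + 1))
    return num / math.comb(T, t)


def specificity_p(parts: dict, part: str, word: str) -> float:
    """Upper-tail probability that `word` is this frequent in `part` by chance alone."""
    T = sum(len(toks) for toks in parts.values())      # corpus size in tokens
    F = sum(toks.count(word) for toks in parts.values())  # word frequency in corpus
    t = len(parts[part])                                 # size of the part
    f = parts[part].count(word)                          # word frequency in the part
    return hypergeom_upper_tail(T, F, t, f)


def bootstrap_rate_interval(tokens, word, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the relative frequency of `word` in `tokens`."""
    rng = random.Random(seed)
    n = len(tokens)
    # resample the part's tokens with replacement and record the word's rate each time
    rates = sorted(rng.choices(tokens, k=n).count(word) / n for _ in range(n_boot))
    lo = rates[int(alpha / 2 * n_boot)]
    hi = rates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical toy corpus: two "documents" of 20 tokens each
parts = {
    "doc_a": ["access"] * 8 + ["network"] * 2 + ["rights"] * 10,
    "doc_b": ["access"] * 2 + ["network"] * 8 + ["rights"] * 10,
}
p = specificity_p(parts, "doc_a", "access")
lo, hi = bootstrap_rate_interval(parts["doc_a"], "access")
corpus_rate = 10 / 40
print(f"hypergeometric p = {p:.4f}; bootstrap 95% interval = [{lo:.2f}, {hi:.2f}]"
      f" vs corpus rate {corpus_rate:.2f}")
```

A small p suggests the word is over-represented in the part. In the bootstrap variant, the analogous signal is an interval lying entirely above the corpus rate. The hypergeometric test assumes tokens are exchangeable draws without replacement. Resampling relaxes that parametric assumption, at the cost of Monte Carlo noise and sensitivity to short documents.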