Identifying specific textual units of documents taken from large corpora. Comparing methods

Recent software packages for textual analysis often include procedures for identifying the specific features of documents in large corpora in order to distinguish among them, most notably algorithms based on the hypergeometric probability model. This paper proposes some new directions based on bootstrap techniques. The corpus consists of documents written by different stakeholders during the seven preparatory meetings of the first phase of the United Nations World Summit on the Information Society (WSIS, Geneva 2003).
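As a rough illustration of the hypergeometric model mentioned in the abstract, the sketch below computes the two tail probabilities commonly used to flag a word as over- or under-represented in one document relative to the whole corpus. It is a minimal sketch under stated assumptions, not the procedure from the paper, and it does not show the bootstrap-based alternatives the paper proposes. The names `T` (corpus size in tokens), `F` (frequency of the word in the corpus), `t` (document size in tokens), `f` (frequency of the word in the document) and the helper functions are hypothetical.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via lgamma
    so that large corpus sizes do not produce huge integers."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_pmf(x: int, T: int, F: int, t: int) -> float:
    """P(X = x) when t tokens are drawn without replacement from a corpus of
    T tokens, F of which are occurrences of the word of interest."""
    if x < max(0, t - (T - F)) or x > min(t, F):
        return 0.0  # outside the support of the distribution
    return math.exp(log_comb(F, x) + log_comb(T - F, t - x) - log_comb(T, t))


def specificity(f: int, T: int, F: int, t: int) -> tuple[float, float]:
    """Return (p_over, p_under) for an observed count f in a document of t tokens.

    p_over  = P(X >= f): a small value suggests the word is over-represented.
    p_under = P(X <= f): a small value suggests the word is under-represented.
    """
    lo, hi = max(0, t - (T - F)), min(t, F)
    p_over = sum(hypergeom_pmf(x, T, F, t) for x in range(max(f, lo), hi + 1))
    p_under = sum(hypergeom_pmf(x, T, F, t) for x in range(lo, min(f, hi) + 1))
    return p_over, p_under


if __name__ == "__main__":
    # Hypothetical figures: a 10,000-token corpus in which a word occurs 50 times,
    # and a 1,000-token document that contains it 15 times (about 5 expected).
    p_over, p_under = specificity(f=15, T=10_000, F=50, t=1_000)
    print(f"P(X >= 15) = {p_over:.2e}, P(X <= 15) = {p_under:.4f}")
```

With these made-up counts the upper tail is very small, which is the pattern a specificity procedure would read as a word characteristic of the document. Working in log space via `math.lgamma` is a design choice to keep the computation stable for corpus-sized counts.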

Pauli, Francesco; Tuzzi, Arjuna

2006

Year: 2006
Book title: JADT 2006 Actes des 8es Journées internationales d'Analyse statistique des Données Textuelles
OpenAlex ID: W2182203819
ISBN: 9782848671307
Record type: 02.01 - Contribution in volume (Chapter or Essay)

Files in this record:

No files are associated with this record.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/2463573
