Distance measures for exploring pairs of novels in a large corpus of Italian literature

Arjuna Tuzzi
2020

Abstract

In text clustering, most distance-based methods summarize the occurrences of a set of linguistic features to obtain a distance. This distance should decrease when texts are written by the same author; however, other properties might also influence the result: the gender of the authors, their age, their geographical origin, the publication date of the novels, their size, etc. In this study, regression analyses compare the performance of three distances and highlight, among the available covariates, the preeminent effect of the author's hand, as well as interesting patterns in the effect of the novels' size.
Book of Short Papers SIS2020
9788891910776

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3355296