
Authorship Attribution and Text Clustering for Contemporary Italian Novels

Cortelazzo, Michele; Tuzzi, Arjuna
2016

Abstract

When dealing with authorship attribution (AA), famous cases of disputed authorship naturally come to mind: who wrote William Shakespeare’s works? Is Corneille really hiding behind Molière’s signature in certain successful comedies? Why does Joanne Rowling write under the pen name of Robert Galbraith? And, in Italy, what is Elena Ferrante’s real identity? Style is determined by both the words and the syntactic structures that writers choose, consciously or unconsciously, when drafting their texts, and AA methods are called upon to reveal the “author’s hand”. However, the relevant literature includes hundreds of different proposals (Rudman 1998, Koppel et al. 2008, Stamatatos 2009), and no single approach seems preferable in absolute terms: the choice of method depends heavily on the text type and on the objectives of the analysis. Furthermore, AA may currently be considered a partially uncharted area of research, because no standard parameters and protocols are available to compare results obtained with different procedures (Juola 2015). In quantitative approaches, AA is often treated as a problem of measuring the similarity of (or distance between) two texts, as in the particular case of text clustering. In previous works we have sought to contribute to the debate on AA testing, to improve Labbé’s intertextual distance (Cortelazzo et al. 2012, 2013) and to propose new graphical representations for comparing the results of different measuring methods (Tuzzi 2010). So far, however, we have tested different methods only on texts whose authors were known. This new analysis deals with a corpus of contemporary novels written in Italian and introduces the following novelties: (1) a limited time span; (2) a larger number of novels by the same author; (3) a focus on cases of disputed authorship (e.g. Giorgio Faletti, Elena Ferrante); (4) last but not least, a focus on improving comparison protocols by introducing innovative methods to assess results that go beyond traditional dichotomous measures (accuracy, precision, recall).
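Because the abstract frames AA as measuring the distance between two texts and refers to Labbé’s intertextual distance, a small sketch of the basic, size-normalised form of that distance may help readers unfamiliar with it. It follows the formula as usually stated, D(A,B) = Σ|F_iA − U·F_iB| / (2·N_A), where N_A ≤ N_B are the text lengths in tokens, U = N_A/N_B, and the sum runs over the joint vocabulary. This is a minimal illustration under stated assumptions, not the authors’ implementation. It omits the refinements of Labbé’s original proposal and of the revised version by Cortelazzo et al. The naive regex tokenizer, the function names `tokenize` and `labbe_distance`, and the toy texts are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption, not the authors' pipeline): lowercase
    # runs of Unicode word characters, so "d'estate" becomes "d", "estate".
    return re.findall(r"\w+", text.lower())


def labbe_distance(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Basic Labbé-style intertextual distance between two token lists.

    Returns a value in [0, 1]: 0 for identical frequency profiles,
    1 when the two texts share no word types.
    """
    # Let A be the shorter text; B's frequencies are scaled down to A's length.
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a
    n_a, n_b = len(tokens_a), len(tokens_b)
    if n_a == 0:
        raise ValueError("both texts must contain at least one token")
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    u = n_a / n_b  # U = N_A / N_B
    # Absolute differences between observed counts in A and expected counts
    # in B (F_iB * U), summed over the joint vocabulary of A and B.
    total = sum(abs(freq_a[w] - freq_b[w] * u) for w in set(freq_a) | set(freq_b))
    # Denominator N_A + N_B * U, which equals 2 * N_A.
    return total / (2 * n_a)


if __name__ == "__main__":
    # Toy "texts" (hypothetical, not from the authors' corpus).
    corpus = {
        "t1": "la casa è bianca e la strada è lunga",
        "t2": "la casa è bianca e la strada è lunga",
        "t3": "il mare era calmo quella mattina d'estate",
    }
    tokens = {name: tokenize(text) for name, text in corpus.items()}
    names = sorted(tokens)
    # Pairwise distances, the kind of matrix a text clustering step would use.
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            print(x, y, round(labbe_distance(tokens[x], tokens[y]), 3))
```

Identical texts give a distance of 0, and texts that share no word types give 1, up to floating-point rounding. Scaling the longer text down to the shorter one is what makes novels of different lengths comparable. A full matrix of such distances could then be passed to a clustering procedure.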
QUALICO 2016, Trier, 24-28 August 2016 (Abstracts)

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3197597