Dimensionality Reduction of Unstructured and Network Data for Stance Detection

Sciandra A.
2022

Abstract

This work grew out of our participation in several shared tasks on stance detection held at NLP conferences. In these competitions, participants aimed to build the best system for predicting stance ('favor', 'against', or 'none') on selected topics, using the messages of users of a social networking site and the relationships among them. The available data therefore comprised both textual and network data. The teams we collaborated with reduced the dimensionality of the network data by means of multidimensional scaling. The textual data, by contrast, were handled with various feature extraction methods, with no particular attention paid to dimensionality reduction for unstructured data. In this paper we present the empirical results of a two-step strategy for obtaining lower-dimensional textual data, based on text mining techniques and principal component analysis. The resulting accuracy is comparable to that of classical feature extraction techniques and of the best task models, despite relying on far fewer predictors.
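As a rough illustration of what a two-step reduction of textual data might look like, the sketch below first builds a filtered term-count matrix and then projects it onto a few principal components. The abstract does not specify the text mining step, so the whitespace tokenization and the document-frequency filter are assumptions, and the names `term_matrix` and `pca` are hypothetical. The PCA step uses power iteration with deflation only to keep the example free of dependencies; it is not presented as the paper's implementation.

```python
import math
from collections import Counter


def term_matrix(docs, min_df=2):
    """Step 1 (assumed): term counts, keeping terms found in >= min_df documents."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vocab = sorted(t for t, n in doc_freq.items() if n >= min_df)
    col = {t: j for j, t in enumerate(vocab)}
    rows = []
    for tokens in tokenized:
        row = [0.0] * len(vocab)
        for t in tokens:
            if t in col:
                row[col[t]] += 1.0
        rows.append(row)
    return rows, vocab


def _matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def pca(rows, n_components=2, n_iter=500):
    """Step 2: principal component scores via power iteration with deflation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the centered term counts.
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    components = []
    for _k in range(n_components):
        # Fixed, non-uniform start vector, normalized to unit length.
        start = [1.0 / (j + 1) for j in range(p)]
        scale = math.sqrt(sum(x * x for x in start))
        v = [x / scale for x in start]
        for _ in range(n_iter):
            w = _matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        # Deflate: remove the direction just found before the next pass.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(p)]
               for a in range(p)]
        components.append(v)
    scores = [[sum(x[j] * c[j] for j in range(p)) for c in components]
              for x in centered]
    return scores, components


# Toy messages, invented purely for illustration.
docs = [
    "climate change is real and urgent",
    "climate action is urgent now",
    "climate change is a hoax",
    "this hoax is not real",
    "urgent action on climate change now",
    "the hoax is not urgent",
]
rows, vocab = term_matrix(docs, min_df=2)
scores, components = pca(rows, n_components=2)
print(len(vocab), "terms reduced to", len(scores[0]), "predictors per message")
```

Each message ends up described by two component scores instead of one count per retained term, and those scores could then be passed to any classifier. In practice a library routine for PCA would normally replace the hand-written power iteration.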
JADT 2022 - Proceedings of the 16th International Conference on Statistical Analysis of Textual Data
ISBN: 979-12-80153-31-9
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3466135