Search Engines (SE) have been shown to perpetuate well-known gender stereotypes identified in psychology literature and to influence users accordingly. Similar biases were found encoded in Word Embeddings (WEs) learned from large online corpora. In this context, we propose the Gender Stereotype Reinforcement (GSR) measure, which quantifies the tendency of a SE to support gender stereotypes, leveraging gender-related information encoded in WEs. Through the critical lens of construct validity, we validate the proposed measure on synthetic and real collections. Subsequently, we use GSR to compare widely-used Information Retrieval ranking algorithms, including lexical, semantic, and neural models. We check if and how ranking algorithms based on WEs inherit the biases of the underlying embeddings. We also consider the most common debiasing approaches for WEs proposed in the literature and test their impact in terms of GSR and common performance measures. To the best of our knowledge, GSR is the first specifically tailored measure for IR, capable of quantifying representational harms.

Gender stereotype reinforcement: Measuring the gender bias conveyed by ranking algorithms

Fabris A.;Purpura A.;Silvello G.;Susto G. A.
2020

Abstract

Search Engines (SE) have been shown to perpetuate well-known gender stereotypes identified in psychology literature and to influence users accordingly. Similar biases were found encoded in Word Embeddings (WEs) learned from large online corpora. In this context, we propose the Gender Stereotype Reinforcement (GSR) measure, which quantifies the tendency of a SE to support gender stereotypes, leveraging gender-related information encoded in WEs. Through the critical lens of construct validity, we validate the proposed measure on synthetic and real collections. Subsequently, we use GSR to compare widely-used Information Retrieval ranking algorithms, including lexical, semantic, and neural models. We check if and how ranking algorithms based on WEs inherit the biases of the underlying embeddings. We also consider the most common debiasing approaches for WEs proposed in the literature and test their impact in terms of GSR and common performance measures. To the best of our knowledge, GSR is the first specifically tailored measure for IR, capable of quantifying representational harms.
File in questo prodotto:
Non ci sono file associati a questo prodotto.
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/3350593
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 32
  • ???jsp.display-item.citation.isi??? 21
social impact