Gender stereotype reinforcement: Measuring the gender bias conveyed by ranking algorithms

Search Engines (SE) have been shown to perpetuate well-known gender stereotypes identified in psychology literature and to influence users accordingly. Similar biases were found encoded in Word Embeddings (WEs) learned from large online corpora. In this context, we propose the Gender Stereotype Reinforcement (GSR) measure, which quantifies the tendency of a SE to support gender stereotypes, leveraging gender-related information encoded in WEs. Through the critical lens of construct validity, we validate the proposed measure on synthetic and real collections. Subsequently, we use GSR to compare widely-used Information Retrieval ranking algorithms, including lexical, semantic, and neural models. We check if and how ranking algorithms based on WEs inherit the biases of the underlying embeddings. We also consider the most common debiasing approaches for WEs proposed in the literature and test their impact in terms of GSR and common performance measures. To the best of our knowledge, GSR is the first specifically tailored measure for IR, capable of quantifying representational harms.
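The abstract does not say how GSR turns the gender information in word embeddings into a score. The sketch below shows one widely used technique from the WE bias literature for reading that information: project words onto a "gender direction" such as the vector he - she. It is only an illustration of the underlying idea and is not the authors' GSR formula. The toy vectors and the helper names (`gender_direction`, `word_polarity`, `text_polarity`) are hypothetical. Averaging token polarities over a text is also an assumption made here for illustration.

```python
import math

# HYPOTHETICAL 3-d toy vectors for illustration only; real word embeddings
# (e.g. word2vec, GloVe) have hundreds of dimensions learned from large corpora.
TOY_EMBEDDINGS = {
    "he":       [1.0, 0.2, 0.0],
    "she":      [-1.0, 0.2, 0.0],
    "nurse":    [-0.6, 0.5, 0.3],
    "engineer": [0.7, 0.4, 0.2],
    "table":    [0.0, 0.1, 0.9],
}


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gender_direction(emb, male="he", female="she"):
    """Unit vector along emb[male] - emb[female] (hypothetical helper)."""
    diff = [m - f for m, f in zip(emb[male], emb[female])]
    return normalize(diff)


def word_polarity(word, emb, direction):
    """Cosine between a word vector and the unit gender direction.

    Positive values lean towards the 'male' pole, negative towards 'female'.
    """
    v = normalize(emb[word])
    return sum(a * b for a, b in zip(v, direction))


def text_polarity(tokens, emb, direction):
    """Mean polarity over in-vocabulary tokens; 0.0 if no token is known."""
    scores = [word_polarity(t, emb, direction) for t in tokens if t in emb]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    d = gender_direction(TOY_EMBEDDINGS)
    for w in ("nurse", "engineer", "table"):
        print(w, round(word_polarity(w, TOY_EMBEDDINGS, d), 3))
    print("text:", round(text_polarity(["the", "nurse", "table"], TOY_EMBEDDINGS, d), 3))
```

With these toy vectors, "nurse" gets a negative polarity, "engineer" a positive one, and the gender-neutral "table" scores zero. A measure built on top of such scores would then have to relate them to the ranking a search engine returns. How GSR does this is defined in the paper itself.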

Fabris A.; Purpura A.; Silvello G.; Susto G. A.

2020

Year: 2020
Journal: INFORMATION PROCESSING & MANAGEMENT
DOI: https://dx.doi.org/10.1016/j.ipm.2020.102377
WOS ID: WOS:000582206800091
Scopus ID: 2-s2.0-85090152698
OpenAlex ID: W3081529310
Item type: 01.01 - Journal article

Files in this item:

File	Size	Format
fabris.pdf (restricted access; license: private access, not public)	1.85 MB	Adobe PDF
2009.01334v1.pdf (open access; type: preprint, AM - Author's Manuscript - submitted; license: Creative Commons)	914.46 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3350593

