Reducing Human Effort to Validate LLM Relevance Judgements via Stratified Sampling

Merlo, Simone; Marchesin, Stefano; Faggioli, Guglielmo; Ferro, Nicola

doi:10.1007/978-3-032-21289-4_27

Information Retrieval (IR) evaluation relies heavily on human-made relevance judgments. To reduce the high cost of collecting these judgments, a potential solution is to use LLMs as judges in place of human annotators. However, validating LLM-generated judgments is fundamental for their informed use. Standard validation approaches typically rely on simple sampling techniques to collect a sample of the LLM-generated judgments and estimate the LLM's agreement with human annotators. In this work, we propose using stratified sampling, a more sophisticated sampling strategy that, by leveraging appropriate stratification features, reduces human involvement in the validation process while still providing statistical guarantees on the human-LLM agreement estimate. By analyzing various candidate features, we identify the LLM-generated judgments themselves as the most promising stratification feature. Our approach achieves up to an 85% reduction in the human involvement required for validation.
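To make the idea of stratifying by the LLM-generated judgments concrete, the following is a minimal sketch of a textbook stratified estimator of a proportion, applied to human-LLM agreement. It is not the authors' implementation: the function name `stratified_agreement`, the `human_label` callback (standing in for a human assessor), the per-stratum sample sizes, and the normal-approximation 95% interval are all assumptions made for illustration. The allocation strategy and the exact statistical guarantees used in the paper are not described in this record.

```python
import math
import random
from collections import defaultdict


def stratified_agreement(items, sample_sizes, human_label, seed=0):
    """Estimate human-LLM agreement with stratified random sampling.

    items        -- list of (item_id, llm_label) pairs (hypothetical input format)
    sample_sizes -- dict mapping each LLM label (stratum) to the number of
                    items to send to the human assessor
    human_label  -- callable item_id -> label; a stand-in for the human assessor
    Returns (estimate, half_width) of an approximate 95% confidence interval.
    """
    rng = random.Random(seed)

    # Strata are defined by the LLM-generated judgment itself.
    strata = defaultdict(list)
    for item_id, llm_label in items:
        strata[llm_label].append(item_id)

    total = len(items)
    estimate, variance = 0.0, 0.0
    for label, ids in strata.items():
        N_h = len(ids)
        n_h = min(sample_sizes[label], N_h)
        if n_h < 1:
            raise ValueError(f"stratum {label!r} needs at least one sampled item")

        # Simple random sample without replacement inside the stratum.
        sample = rng.sample(ids, n_h)
        agree = sum(1 for i in sample if human_label(i) == label)
        p_h = agree / n_h          # agreement rate observed in this stratum
        W_h = N_h / total          # stratum weight

        estimate += W_h * p_h
        if n_h > 1:
            fpc = 1 - n_h / N_h    # finite population correction
            variance += W_h ** 2 * fpc * p_h * (1 - p_h) / (n_h - 1)

    # Normal approximation (an assumption of this sketch).
    half_width = 1.96 * math.sqrt(variance)
    return estimate, half_width


if __name__ == "__main__":
    # Synthetic data: binary LLM labels, with a simulated human who
    # disagrees on every fourth item. Purely illustrative.
    items = [(i, i % 2) for i in range(200)]
    human = lambda i: (i % 2) if i % 4 != 0 else 1 - (i % 2)
    est, hw = stratified_agreement(items, {0: 30, 1: 10}, human)
    print(f"agreement ~ {est:.3f} +/- {hw:.3f}")
```

The intuition behind the design is that when agreement is nearly uniform within a stratum, that stratum contributes little variance, so fewer human judgments are needed there for the same interval width.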


Year: 2026
Book title: Lecture Notes in Computer Science
Series: Lecture Notes in Computer Science
Conference: 48th European Conference on Information Retrieval, ECIR 2026
DOI: https://dx.doi.org/10.1007/978-3-032-21289-4_27
OpenAlex ID: W7140142263
ISBN: 9783032212887; 9783032212894
Project title: Conversational Agents: Mastering, Evaluating, Optimizing
Project acronym: CAMEO
Funder: Ministero dell'Università e della Ricerca
Contract no.: 2022ZLL7MW
Publication type: 04.01 - Conference proceedings contribution


Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3590312
