CUBIC: Unsupervised Detection of Conceptual Biases via Vision-Language Embeddings

Confalonieri R.
2025

Abstract

Deep vision models often rely on biases learned from spurious correlations in their training data. To identify these biases, methods that operate on high-level, human-understandable concepts are more effective than those relying primarily on low-level features such as heatmaps. A major challenge for such concept-based methods is the lack of image annotations indicating potentially bias-inducing concepts: producing these annotations requires detailed labeling for each dataset and concept, which is highly labor-intensive. To address this, we propose CUBIC (Concept embeddings for Unsupervised Bias IdentifiCation), a novel methodology that automatically discovers human-understandable concepts capable of biasing a classifier's behavior. Unlike existing approaches, CUBIC relies neither on predefined bias candidates nor on examples of model failures tied to specific biases, as these are not always available in the data. Instead, it leverages the shared image-text latent space of a Vision-Language Model (VLM) and linear classifier probes to examine how the latent representation of a superclass label, one shared by all instances in the dataset, shifts when a concept is present. By measuring these shifts against the normal vector to the classifier's decision boundary, CUBIC identifies the concepts that most strongly influence model predictions. Our experiments demonstrate that CUBIC effectively uncovers previously unknown biases without requiring prior knowledge of candidate biases or access to samples on which the classifier underperforms due to bias.
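To make the mechanism described above concrete, the following is a minimal sketch of one plausible reading of the idea, not the authors' implementation. It assumes a CLIP-style VLM (here loaded via Hugging Face `transformers` as a stand-in), an already-trained linear probe whose weight vector `w` is the normal to the decision boundary in the shared image-text space, and illustrative prompt templates; the function name `concept_influence` and the prompts are hypothetical.

```python
# Hedged sketch of the idea in the abstract: measure how adding a concept to a
# superclass prompt shifts its text embedding, projected onto the probe normal.
# NOT the authors' reference implementation; prompts, names, and `w` are assumed.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def text_embedding(prompt: str) -> torch.Tensor:
    """L2-normalised CLIP text embedding for a single prompt."""
    inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    with torch.no_grad():
        emb = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(emb, dim=-1).squeeze(0)

def concept_influence(superclass: str, concept: str, w: torch.Tensor) -> float:
    """Shift of the superclass embedding induced by the concept, projected onto
    the unit normal of the linear probe's decision boundary (illustrative score)."""
    base = text_embedding(f"a photo of a {superclass}")
    shifted = text_embedding(f"a photo of a {superclass} with {concept}")
    w_unit = torch.nn.functional.normalize(w, dim=-1)
    return torch.dot(shifted - base, w_unit).item()

# Example usage (hypothetical): rank candidate concepts by |influence| for a
# "bird" superclass, where `w` comes from a probe trained on image embeddings.
# scores = {c: concept_influence("bird", c, w) for c in ["water", "forest", "bamboo"]}
```

Concepts whose embedding shift has a large component along the probe normal are the ones most likely to move predictions across the decision boundary, which is the sense in which this sketch flags them as potential biases.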


Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3573262