Categorical Explaining Functors: Ensuring Coherence in Logical Explanations

Fioravanti S.; Frazzetto P.; Confalonieri R.; Navarin N.
2025

Abstract

Post-hoc methods in Explainable AI (XAI) elucidate black-box models by identifying the input features critical to the model's decision-making. Recent advances in these methods have enabled the generation of logic-based explanations that capture interactions among input features. However, these techniques often suffer from critical limitations, notably the inability to ensure logical consistency and fidelity between the generated explanations and the model's actual decision-making process. Such inconsistencies jeopardize the reliability of explanations, particularly in high-risk domains. To address this gap, we introduce a novel, theoretically rigorous approach rooted in category theory. Specifically, we propose the concept of an explaining functor, which structurally preserves logical entailment between the explanations and the decisions of a black-box model. By establishing a categorical framework, our method guarantees the coherence and accuracy of the extracted explanations, overcoming the common pitfalls of heuristic-based explanation methods. We demonstrate the practical efficacy of our theoretical contributions on two synthetic benchmarks, which show significant reductions in contradictory and unfaithful explanations. Our experiments show that our framework provides mathematically grounded, compositional, and coherent explanations.
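To make the idea concrete, the following is a minimal worked sketch in the spirit of the abstract; the choice of preorder categories and the symbols $\mathcal{D}$, $\mathcal{E}$, $F$ are illustrative assumptions, not the paper's exact definitions. Take $\mathcal{E}$ to be the preorder category whose objects are propositional formulas over the input features, with a (unique) morphism $\varphi \to \psi$ exactly when $\varphi \models \psi$, and take $\mathcal{D}$ to be a category whose objects are model decisions and whose morphisms are refinements between decisions. An explaining functor $F \colon \mathcal{D} \to \mathcal{E}$ assigns a formula to every decision and an entailment to every refinement, subject to functoriality:
\[
F(\mathrm{id}_d) = \mathrm{id}_{F(d)}, \qquad F(g \circ f) = F(g) \circ F(f) \quad \text{for } f \colon d \to d',\; g \colon d' \to d''.
\]
Because morphisms in $\mathcal{E}$ are entailments, any chain of decision refinements $d \to d' \to d''$ is sent to a transitive chain $F(d) \models F(d') \models F(d'')$, so explanations extracted along a decision path cannot contradict one another by construction.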
Proceedings of the 22nd International Conference on Principles of Knowledge Representation and Reasoning (KR 2025)
   Symbolic conditioning of Graph Generative Models (SymboliG)
   Funded by the European Union under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.3, NextGenerationEU, Code PE0000013, Concession Decree No. 1555 of October 11, 2022, CUP C63C22000770006

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3590889