Advancing Cross-Document Relation Extraction with Hybrid Retrieval and Knowledge-Augmented Reasoning

Marco Martinelli
2025

Abstract

Existing Relation Extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents. In practice, however, a large number of relational facts can only be inferred by reasoning across multiple documents. In this work, we introduce the task of Cross-Document Relation Extraction (CDRE), which sits at the intersection of Information Retrieval (IR) and Natural Language Processing (NLP). CDRE enables the acquisition of knowledge in the wild, making it better suited to real-world use cases where relevant information is scattered across multiple sources. After formally introducing the task and the components of a CDRE system, we present the research directions that we plan to pursue to advance the state of the art. Specifically, we propose to integrate sparse and dense retrieval models with the heuristic-based methods currently employed in CDRE, so that relevant passages are retrieved more effectively from multiple documents. To further improve retrieval, we introduce path-ranking algorithms as re-rankers that filter out less informative passages. Additionally, we explore graph-based representations to enhance document retrieval. Next, we plan to adapt Knowledge Injection (KI) techniques, widely employed in sentence- and document-level RE, to the CDRE setting, aiming to improve robustness against syntactic and semantic variations and thereby extraction effectiveness. Finally, we present an evaluation framework designed to assess the overall performance of CDRE systems and to analyze the impact of each individual component.
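
The abstract does not say how the sparse, dense, and heuristic-based retrievers would be combined. As a minimal sketch only, assuming each retriever returns a ranked list of passage identifiers, one common option is reciprocal rank fusion (RRF). The function name, the passage identifiers, and the constant `k = 60` below are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of passage ids into a single ranking.

    Each passage receives 1 / (k + rank) from every list it appears in;
    k = 60 is a conventional default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


# Hypothetical outputs of three retrievers for one entity pair.
sparse_ranking = ["p3", "p1", "p7"]      # e.g. a BM25-style retriever
dense_ranking = ["p1", "p5", "p3"]       # e.g. an embedding-based retriever
heuristic_ranking = ["p1", "p3"]         # e.g. an entity-overlap heuristic

fused = reciprocal_rank_fusion(
    [sparse_ranking, dense_ranking, heuristic_ranking]
)
print(fused)
```

Passages ranked highly by several retrievers rise to the top of the fused list, and a re-ranker such as the path-ranking step mentioned in the abstract could then operate on that list.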
Proceedings of the 33rd Symposium on Advanced Database Systems
SEBD 2025: the 33rd Symposium on Advanced Database Systems
   HetERogeneous sEmantic Data integratIon for the guT-bRain interplaY
   HEREDITARY
   European Commission
   Horizon Europe Framework Programme - HORIZON Research and Innovation Actions
   101137074

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3590458