Advancing Cross-Document Relation Extraction with Hybrid Retrieval and Knowledge-Augmented Reasoning
Marco Martinelli
2025
Abstract
Existing Relation Extraction (RE) methods typically focus on extracting relational facts between entity pairs within single sentences or documents. In practice, however, a large number of relational facts can only be inferred by reasoning across multiple documents. In this work, we introduce the task of Cross-Document Relation Extraction (CDRE), which sits between the domains of Information Retrieval (IR) and Natural Language Processing (NLP). CDRE enables the acquisition of knowledge in the wild, making it better suited to real-world use cases where relevant information is scattered across multiple sources. After formally introducing the task and the components of a CDRE system, we present the research directions that we plan to pursue to advance the state of the art. Specifically, we propose to integrate sparse and dense retrieval models with the heuristic-based methods currently employed in CDRE, in order to retrieve relevant passages from multiple documents more effectively. To further improve this retrieval, we introduce path-ranking algorithms as re-rankers that filter out less informative passages. Additionally, we explore graph-based representations to enhance document retrieval. Next, we plan to adapt Knowledge Injection (KI) techniques, widely employed in sentence- and document-level RE, to the CDRE setting, aiming to improve their robustness against syntactic and semantic variations and thereby enhance extraction effectiveness. Finally, we present an evaluation framework designed to assess the overall performance of CDRE systems and to analyze the impact of each individual component.




