SARD at LongEval: Longitudinal Evaluation of IR Systems by Using Query Rewriting and Hybrid Queries
Ferro N.
2025
Abstract
This paper presents the participation of Team SARD in the Conference and Labs of the Evaluation Forum (CLEF) LongEval 2025 shared task, which investigates the longitudinal evaluation of information retrieval systems on evolving web collections. After a careful analysis of the training dataset, we first applied several common techniques for improving performance and then focused on those that yielded the best experimental results. In particular, we explore a wide range of indexing and querying configurations by varying analyzers, language detection granularity, and query formulation strategies using the tools provided by the Apache Lucene framework. Additionally, we examine the impact of synonym-based query expansion using external lexical resources and of query rewriting to correct typing errors. The evaluation is based on standard retrieval metrics: MAP, nDCG, and interpolated precision at standard recall levels, all computed using trec_eval. A custom MATLAB-based pipeline was developed to automate metric extraction, aggregation, and visualization across monthly snapshots. Our results indicate that document-level language detection and a combination of Boolean and phrase queries improve performance in most scenarios. Conversely, synonym-based query expansion often degraded performance, whereas LLM-based query rewriting to fix user input errors led to noticeable improvements, demonstrating how even small interventions can enhance retrieval quality. We also observe high variability across queries and reflect on its implications for evaluation reliability. Future work will explore learning-to-rank approaches and the integration of large language models, contingent on the availability of higher-quality computational resources and adequate relevance judgments.
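
The "hybrid queries" mentioned in the title and abstract combine term-level Boolean clauses with a phrase clause over the same analyzed terms. The following is a minimal illustrative sketch of that idea in Java using the Apache Lucene API; the field name "contents", the EnglishAnalyzer, and the boost value of 2.0 are assumptions for the example and are not taken from the paper's actual configuration.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.BoostQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a hybrid query: each analyzed term as an optional (SHOULD) clause,
 * plus a boosted phrase clause over the full term sequence.
 * Field name and boost are illustrative assumptions, not the paper's settings.
 */
public class HybridQueryBuilder {

    public static Query build(String queryText, String field, Analyzer analyzer) throws IOException {
        // Run the query text through the analyzer to obtain normalized terms.
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, queryText)) {
            CharTermAttribute attr = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(attr.toString());
            }
            ts.end();
        }

        BooleanQuery.Builder bool = new BooleanQuery.Builder();

        // Boolean part: each term is optional, so documents matching any subset
        // of the terms are still retrieved and ranked.
        for (String term : terms) {
            bool.add(new TermQuery(new Term(field, term)), BooleanClause.Occur.SHOULD);
        }

        // Phrase part: documents containing the terms in sequence receive an
        // additional, boosted match (2.0 is an arbitrary illustrative boost).
        if (terms.size() > 1) {
            PhraseQuery.Builder phrase = new PhraseQuery.Builder();
            for (String term : terms) {
                phrase.add(new Term(field, term));
            }
            bool.add(new BoostQuery(phrase.build(), 2.0f), BooleanClause.Occur.SHOULD);
        }

        return bool.build();
    }

    public static void main(String[] args) throws IOException {
        Query q = build("climate change report", "contents", new EnglishAnalyzer());
        // Prints something like:
        // contents:climat contents:chang contents:report (contents:"climat chang report")^2.0
        System.out.println(q);
    }
}
```

Because the phrase clause is optional rather than required, it only rewards documents that preserve the query's word order without excluding those that match individual terms, which is one plausible reading of why such a combination helps in most scenarios.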




