The CLEF 2024 Monster Track: One Lab to Rule Them All

Ferro, N.
2024

Abstract

Generative Artificial Intelligence (AI) and Large Language Models (LLMs) are revolutionizing technology and society thanks to their versatility and applicability to a wide array of tasks and use cases, across multiple media and modalities. As a new and relatively untested technology, LLMs raise several challenges for research and application alike, including questions about their quality, reliability, predictability, and veracity, as well as about how to develop proper evaluation methodologies to assess their various capacities. This evaluation lab will focus on a specific aspect of LLMs, namely their versatility. The CLEF Monster Track is organized as a meta-challenge across a selection of tasks chosen from other evaluation labs running in CLEF 2024, and participants will be asked to develop or adapt a generative AI or LLM-based system that will be run on all the tasks with no or minimal task adaptation. This will allow us to systematically evaluate the performance of the same LLM-based system across a wide range of very different tasks and to provide feedback to each targeted task about how a general-purpose LLM system performs compared to systems specifically developed for that task. Since the datasets for CLEF 2024 have not yet been released publicly, we will be able to experiment with previously unseen data, thus reducing the risk of contamination, which is one of the most serious problems faced by LLM evaluation datasets.
Year: 2024
Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Conference: 46th European Conference on Information Retrieval, ECIR 2024
ISBN: 9783031560712; 9783031560729
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3524142
Citations
  • PMC: not available
  • Scopus: 1
  • Web of Science (ISI): 0
  • OpenAlex: not available