Injecting User Models and Time into Precision via Markov Chains
Marco Ferrante; Nicola Ferro; Maria Maistro
2014
Abstract
We propose a family of new evaluation measures, called Markov Precision (MP), which exploits continuous-time and discrete-time Markov chains in order to inject user models into precision. Continuous-time MP behaves like time-calibrated measures, bringing the time spent by the user into the evaluation of a system; discrete-time MP behaves like traditional evaluation measures. Being part of the same Markovian framework, the time-based and rank-based versions of MP produce values that are directly comparable. We show that it is possible to re-create Average Precision (AP) using specific user models, and this helps in providing an explanation of AP in terms of user models more realistic than the ones currently used to justify it. We also propose several alternative models that take into account different possible behaviors in scanning a ranked result list. Finally, we conduct a thorough experimental evaluation of MP on standard TREC collections in order to show that MP is as reliable as other measures, and we provide an example of calibration of its time parameters based on click logs from Yandex.
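The discrete-time idea described in the abstract can be illustrated with a toy sketch: compute precision at each rank that holds a relevant document, then weight those precision values by the stationary distribution of a Markov chain whose states are the relevant documents. When the user model yields a uniform stationary distribution over the relevant documents, the measure reduces to AP. This is a simplified illustration under those assumptions, not the authors' exact formulation; the function names and the power-iteration choice are ours.

```python
import numpy as np

def precision_at_relevant(rels):
    """Precision at each rank holding a relevant document.

    rels: binary relevance judgments in rank order, e.g. [1, 0, 1].
    Returns the list of precision values at the relevant ranks.
    """
    precs, hits = [], 0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precs.append(hits / rank)
    return precs

def stationary(P, iters=1000):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi

def markov_precision(rels, P):
    """Toy discrete-time MP: stationary probability of sitting at each
    relevant document times the precision at that document's rank."""
    precs = np.asarray(precision_at_relevant(rels))
    return float(stationary(P) @ precs)

# A user model that jumps uniformly among the 3 relevant documents:
# its stationary distribution is uniform, so MP coincides with AP.
rels = [1, 0, 1, 0, 1]
P = np.full((3, 3), 1.0 / 3.0)
print(markov_precision(rels, P))  # equals AP = (1 + 2/3 + 3/5) / 3
```

Different transition matrices encode different scanning behaviors (e.g. a bias toward moving forward in the list), each producing a different weighting of the same precision values.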