Efficient top-N recommendation for very large scale binary rated datasets
AIOLLI, FABIO
2013
Abstract
We present a simple and scalable algorithm for top-N recommendation able to deal with very large datasets and (binary rated) implicit feedback. We focus on memory-based collaborative filtering algorithms similar to the well-known neighbor-based technique for explicit feedback. The major difference, which makes the algorithm particularly scalable, is that it uses positive feedback only and never needs to compute the complete (user-by-user or item-by-item) similarity matrix explicitly. The proposed algorithm has been studied on data from the Million Song Dataset (MSD) challenge, whose task was to suggest a set of songs (out of more than 380k available songs) to more than 100k users, given half of each user's listening history and the complete listening history of another 1 million people. In particular, we investigate the entire recommendation pipeline, from the definition of suitable similarity and scoring functions to suggestions on how to aggregate multiple ranking strategies to define the overall recommendation. The technique we are proposing extends and improves the one that already won the MSD challenge last year.
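The idea of scoring from positive feedback only, without materializing a full similarity matrix, can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not specify the similarity or scoring functions, so the item-based cosine similarity over binary listening sets and the plain sum used for scoring are assumptions, and the names `build_index` and `recommend` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_index(user_items):
    """Invert a user -> set(items) mapping into item -> set(users)."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    return item_users


def recommend(user, user_items, item_users, n=10):
    """Return the top-n unseen items for `user`.

    Assumption: an unseen item is scored by summing its cosine similarity
    to each item in the user's history. Only similarities between a
    history item and items that co-occur with it are ever computed, so
    the full item-by-item matrix is never built.
    """
    history = user_items[user]
    scores = defaultdict(float)
    for seen in history:
        # Count how many users share `seen` with each candidate item.
        overlap = defaultdict(int)
        for other_user in item_users[seen]:
            for candidate in user_items[other_user]:
                if candidate not in history:
                    overlap[candidate] += 1
        # Cosine similarity for binary data: |A ∩ B| / sqrt(|A| * |B|).
        for candidate, shared in overlap.items():
            norm = sqrt(len(item_users[seen]) * len(item_users[candidate]))
            scores[candidate] += shared / norm
    # Highest score first; ties broken by item id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]
```

A call such as `recommend("u1", user_items, build_index(user_items), n=10)` returns only items the user has not already listened to. Other similarity or scoring functions can be substituted in the inner loop without changing the overall structure.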
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.