Efficient top-n recommendation for very large scale binary rated datasets

Aiolli, Fabio

doi:10.1145/2507157.2507189

We present a simple and scalable algorithm for top-N recommendation able to deal with very large datasets and (binary rated) implicit feedback. We focus on memory-based collaborative filtering algorithms similar to the well-known neighbor-based technique for explicit feedback. The major difference, which makes the algorithm particularly scalable, is that it uses positive feedback only and requires no explicit computation of the complete (user-by-user or item-by-item) similarity matrix. The proposed algorithm was studied on data from the Million Songs Dataset (MSD) challenge, whose task was to suggest a set of songs (out of more than 380k available songs) to more than 100k users, given half of each user's listening history and the complete listening history of 1 million other users. In particular, we investigate the entire recommendation pipeline, starting from the definition of suitable similarity and scoring functions, and give suggestions on how to aggregate multiple ranking strategies to define the overall recommendation. The technique we propose extends and improves the one that won the MSD challenge last year.
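
To make the idea of scoring from positive feedback alone, without materializing a full similarity matrix, more concrete, the following is a minimal sketch and not the paper's implementation. It assumes an item-based variant with plain cosine similarity on binary feedback and a simple sum as the scoring function; the abstract does not specify these choices. The names `build_index`, `recommend`, and the toy `histories` data are hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from math import sqrt


def build_index(histories):
    """Invert user -> items into item -> users (positive feedback only)."""
    item_users = defaultdict(set)
    for user, items in histories.items():
        for item in items:
            item_users[item].add(user)
    return item_users


def recommend(user, histories, item_users, n=3):
    """Rank unseen items by summed cosine similarity to the user's items.

    Similarities are computed on the fly, and only for item pairs that
    actually co-occur in some history, so no complete item-by-item
    matrix is ever built. Cosine similarity is an assumption made here.
    """
    seen = histories[user]
    scores = defaultdict(float)
    for i in seen:
        # Count co-occurrences of item i with each unseen item j.
        co_counts = defaultdict(int)
        for v in item_users[i]:
            for j in histories[v]:
                if j not in seen:
                    co_counts[j] += 1
        # Binary cosine: |U(i) & U(j)| / sqrt(|U(i)| * |U(j)|).
        for j, c in co_counts.items():
            scores[j] += c / sqrt(len(item_users[i]) * len(item_users[j]))
    # Sort by descending score; break ties by item id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


# Toy binary feedback: each user maps to the set of items listened to.
histories = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"b", "c"},
    "u4": {"a", "d"},
}
item_users = build_index(histories)
top = recommend("u1", histories, item_users)
print(top)
```

In this sketch the work per user grows with the number of co-occurring item pairs reachable from that user's history rather than with the square of the catalog size, which is the property the abstract attributes to avoiding the complete similarity matrix.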