Toward an Automatically Generated Soundtrack from Low-level Cross-modal Correlations for Automotive Scenarios

Cristani, M.; Pesarin, A.; Drioli, C.; Murino, V.; Rodà, A.; Grapulin, M.; Sebe, N.

doi:10.1145/1873951.1874024

In this paper, we propose a novel recommendation policy for driving scenarios. While driving a car, listening to an audio track may enrich the atmosphere, conveying emotions that give the driver a more arousing experience. We introduce a recommendation policy that, given a video sequence taken by a camera mounted onboard a car, chooses the most suitable audio piece from a predetermined set of melodies. The mixing mechanism takes inspiration from a set of generic qualitative aesthetic rules for cross-modal linking, realized by associating audio and video features. The contribution of this paper is to translate these qualitative rules into quantitative terms by learning cross-modal statistical correlations from an extensive training dataset and validating them thoroughly. This lets us determine which audio and video features correlate best (thereby supporting or rejecting individual aesthetic rules) and how strong those correlations are. This knowledge is then used to build the recommendation policy. A set of user studies illustrates and validates the policy, encouraging further development toward a real implementation in an automotive application.
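The abstract does not specify the features or the scoring rule, so the following Python sketch is only a toy illustration of the general idea: estimate correlations between video and audio features from matched training pairs, then pick the track whose features agree best with a new video. The feature names (motion, brightness, tempo, loudness), the Pearson correlation, the bilinear score, and all data values are assumptions made for this example, not the authors' method.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def correlation_matrix(video_rows, audio_rows):
    """corr[i][j] = correlation of video feature i with audio feature j."""
    video_cols = list(zip(*video_rows))
    audio_cols = list(zip(*audio_rows))
    return [[pearson(v, a) for a in audio_cols] for v in video_cols]

def recommend(video_feats, tracks, corr):
    """Return the name of the track with the highest correlation-weighted score.

    Assumes all features are zero-centered (e.g. z-scored), so that the sign
    of a product v * a indicates whether the two features move together.
    """
    def score(audio_feats):
        return sum(corr[i][j] * v * a
                   for i, v in enumerate(video_feats)
                   for j, a in enumerate(audio_feats))
    return max(tracks, key=lambda name: score(tracks[name]))

# Hypothetical training pairs: video = (motion, brightness),
# audio = (tempo, loudness). Values are invented and already centered.
video_rows = [(-1.0, -0.5), (-0.5, 0.5), (0.5, -0.5), (1.0, 0.5)]
audio_rows = [(-1.0, 0.2), (-0.4, -0.3), (0.6, 0.1), (0.8, 0.0)]
corr = correlation_matrix(video_rows, audio_rows)

# Hypothetical candidate melodies described by the same audio features.
tracks = {"calm": (-1.0, 0.0), "upbeat": (1.0, 0.0)}

best = recommend((1.0, 0.0), tracks, corr)  # high-motion video
print(best)  # "upbeat" with this toy data
```

In this toy data, motion and tempo are strongly correlated, so a high-motion clip is paired with the faster track. A real system would need the learned correlation intensities and validated feature pairs that the paper describes.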