Audio-video people recognition system for an intelligent environment

Anzalone, S. M.; Menegatti, Emanuele; Pagello, Enrico; Yoshikawa, Y.; Ishiguro, H.; Chella, A.

doi:10.1109/HSI.2011.5937372

This paper presents an audio-video system that recognizes people in intelligent environments. Users are tracked inside the environment, and their positions and activities can be logged. Users' identities are assessed through a multimodal approach that detects and recognizes voices and faces using the cameras and microphones installed in the environment. This approach was chosen to create a flexible, inexpensive, yet reliable system implemented with consumer electronics. Voice features are extracted by short-time cepstrum analysis, and face features are extracted using the eigenfaces technique. The recognition task is solved using the same Support Vector Machine (SVM) for both voice and face features. In a set-up phase, the system learns the features of each person using the SVM; in this phase the two modalities are also bound together through a cross-anchoring learning rule based on the mutual exclusivity selection principle. In the running phase, the system can recognize a person's identity from voice features, face features, or both. The system scales in the number of cameras and microphones thanks to NMM, a middleware that manages the processing of the individual sensors and the communication among the software nodes.
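
As an illustration of the voice front end mentioned in the abstract, the following is a minimal sketch of short-time cepstrum analysis: the signal is cut into overlapping frames, each frame is windowed, and the real cepstrum is taken as the inverse DFT of the log magnitude spectrum. The frame length, hop size, number of retained coefficients, Hamming window, and all function names are assumptions made for this example and are not taken from the paper. A naive DFT is used only to keep the sketch dependency-free; a practical system would use an FFT library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustrative only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # The inverse transform carries the 1/n normalization.
    return [v / n for v in out] if inverse else out


def real_cepstrum(frame, eps=1e-10):
    """Real cepstrum of one frame: IDFT(log |DFT(windowed frame)|)."""
    n = len(frame)
    # Hamming window (an assumed choice) to reduce spectral leakage.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    spectrum = dft([s * w for s, w in zip(frame, window)])
    # eps guards against log(0) for empty spectral bins.
    log_mag = [math.log(abs(c) + eps) for c in spectrum]
    # The log magnitude of a real signal is even, so the result is real.
    return [c.real for c in dft(log_mag, inverse=True)]


def short_time_cepstrum(signal, frame_len=64, hop=32, n_coeffs=12):
    """Return one vector of low-order cepstral coefficients per frame."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [real_cepstrum(signal[i:i + frame_len])[:n_coeffs] for i in starts]


if __name__ == "__main__":
    # Synthetic two-tone signal standing in for a voice recording.
    demo = [
        math.sin(2 * math.pi * 5 * t / 64) + 0.3 * math.sin(2 * math.pi * 11 * t / 64)
        for t in range(160)
    ]
    features = short_time_cepstrum(demo)
    print(len(features), "frames,", len(features[0]), "coefficients each")
```

Each per-frame coefficient vector could then serve as one training or test sample for a classifier such as the SVM described above; how the paper actually aggregates frames is not stated in the abstract.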