Wake Sound Spotting in Internet of Things
Manuele Favero; Leonardo Badia; Sergio Canazza
2026
Abstract
Wake word detection is widely used in Internet of Things (IoT) systems to enable voice-based interaction with devices. However, traditional approaches rely on speech-based triggers designed for direct human interaction. In this work, we introduce the concept of Wake Sound, defined as a short, distinctive acoustic signal that activates a predefined task in an electronic device without relying on linguistic content. This paradigm enables autonomous activation of IoT systems through simple sound cues and extends the traditional wake word concept toward the Internet of Audio Things (IoAuT). We formulate the task of Wake Sound Spotting (WSS) as the continuous monitoring of an audio stream to detect the presence of a predefined acoustic marker with low latency and high reliability. To address this problem, we evaluate multiple lightweight neural architectures suitable for embedded systems, including a 1D convolutional neural network (CNN) operating on raw audio, a transfer learning approach based on YAMNet embeddings, a DualBranch CNN combining spectral and cepstral features, and a hybrid residual network. The models are trained using extensive data augmentation designed to simulate realistic acoustic conditions. Experimental results across three different Wake Sounds, including human-composed and AI-generated signals, show that the proposed approaches achieve high detection accuracy (around 97-99%) with low false acceptance rates, demonstrating the feasibility of reliable Wake Sound detection on resource-constrained devices.
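To make the WSS formulation concrete, the following is a minimal Python sketch of continuous monitoring over an audio stream, not the method used in the paper. It assumes a sliding window scored at a fixed hop and a simple threshold rule; the names `spot_wake_sound`, `make_template_scorer`, `window_size`, `hop_size` and `threshold` are hypothetical. The normalized-correlation scorer is only a stand-in for one of the neural classifiers mentioned above, which would be plugged in through the same `score_window` callable.

```python
import math
from collections import deque
from typing import Callable, Iterable, Iterator, Sequence


def make_template_scorer(template: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Return a scorer based on normalized correlation with a fixed template.

    Assumption: this is a placeholder for a trained model's confidence output.
    """
    t_norm = math.sqrt(sum(x * x for x in template))

    def score(window: Sequence[float]) -> float:
        w_norm = math.sqrt(sum(x * x for x in window))
        if w_norm == 0.0 or t_norm == 0.0:
            return 0.0  # silent window: no evidence of the marker
        dot = sum(a * b for a, b in zip(window, template))
        return dot / (w_norm * t_norm)

    return score


def spot_wake_sound(
    stream: Iterable[float],
    score_window: Callable[[Sequence[float]], float],
    window_size: int,
    hop_size: int = 1,
    threshold: float = 0.8,
) -> Iterator[int]:
    """Yield the index of the last sample of each window that triggers a detection."""
    window = deque(maxlen=window_size)  # keeps only the most recent samples
    since_last_score = 0
    armed = True  # fire once per event; re-arm after the score drops again
    for index, sample in enumerate(stream):
        window.append(sample)
        since_last_score += 1
        if len(window) < window_size or since_last_score < hop_size:
            continue
        since_last_score = 0
        score = score_window(list(window))
        if score >= threshold:
            if armed:
                armed = False
                yield index
        else:
            armed = True


if __name__ == "__main__":
    # Hypothetical marker: a 13-sample Barker code, chosen for its sharp autocorrelation.
    marker = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
    audio = [0.0] * 20 + [float(x) for x in marker] + [0.0] * 20
    scorer = make_template_scorer(marker)
    print(list(spot_wake_sound(audio, scorer, window_size=len(marker))))
```

In this sketch the detection latency is bounded by `hop_size` samples after the marker ends. A larger hop reduces computation on an embedded device at the cost of later and possibly weaker detections.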