Wake Sound Spotting in Internet of Things

Manuele Favero; Leonardo Badia; Sergio Canazza
2026

Abstract

Wake word detection is widely used in Internet of Things (IoT) systems to enable voice-based interaction with devices. However, traditional approaches rely on speech-based triggers designed for direct human interaction. In this work, we introduce the concept of Wake Sound, defined as a short, distinctive acoustic signal that activates a predefined task in an electronic device without relying on linguistic content. This paradigm enables autonomous activation of IoT systems through simple sound cues and extends the traditional wake word concept toward the Internet of Audio Things (IoAuT). We formulate the task of Wake Sound Spotting (WSS) as the continuous monitoring of an audio stream to detect the presence of a predefined acoustic marker with low latency and high reliability. To address this problem, we evaluate multiple lightweight neural architectures suitable for embedded systems, including a 1D convolutional neural network (CNN) operating on raw audio, a transfer learning approach based on YAMNet embeddings, a DualBranch CNN combining spectral and cepstral features, and a hybrid residual network. The models are trained using extensive data augmentation designed to simulate realistic acoustic conditions. Experimental results across three different Wake Sounds, including human-composed and AI-generated signals, show that the proposed approaches achieve high detection accuracy (around 97-99%) with low false acceptance rates, demonstrating the feasibility of reliable Wake Sound detection on resource-constrained devices.
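The WSS formulation above (continuous monitoring of an audio stream for a predefined acoustic marker) can be illustrated with a minimal sliding-window sketch. This is not the authors' implementation: the function names, window and hop sizes, threshold, and the `min_hits` smoothing rule are all hypothetical, and the template-correlation scorer is only a stand-in for one of the trained neural models named in the abstract.

```python
import math
from collections import deque
from typing import Callable, Iterable, Iterator, List, Sequence


def sliding_windows(stream: Iterable[float], window: int, hop: int) -> Iterator[List[float]]:
    """Yield overlapping windows of `window` samples, one every `hop` samples."""
    buf: deque = deque(maxlen=window)
    since_last = 0
    for sample in stream:
        buf.append(sample)
        since_last += 1
        # Emit once the buffer is full, then again after every `hop` new samples.
        if len(buf) == window and since_last >= hop:
            since_last = 0
            yield list(buf)


def make_template_scorer(template: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Hypothetical scorer: absolute normalized correlation with a fixed template.

    In a real system this would be replaced by a trained classifier's output.
    """
    template_energy = sum(t * t for t in template)

    def score(frame: Sequence[float]) -> float:
        dot = sum(a * b for a, b in zip(frame, template))
        norm = math.sqrt(sum(a * a for a in frame) * template_energy)
        return abs(dot) / norm if norm > 0 else 0.0

    return score


def spot_wake_sound(
    stream: Iterable[float],
    score_fn: Callable[[Sequence[float]], float],
    window: int,
    hop: int,
    threshold: float = 0.9,
    min_hits: int = 2,
) -> List[int]:
    """Return the window indices at which a detection fires.

    A detection fires once per run of consecutive above-threshold windows,
    at the moment the run reaches `min_hits` windows (assumed smoothing rule).
    """
    detections: List[int] = []
    hits = 0
    for index, frame in enumerate(sliding_windows(stream, window, hop)):
        if score_fn(frame) >= threshold:
            hits += 1
            if hits == min_hits:
                detections.append(index)
        else:
            hits = 0
    return detections
```

Requiring several consecutive above-threshold windows trades a small amount of latency for fewer spurious activations, which is one simple way to approach the low false acceptance rates the abstract emphasizes.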
2026 Mediterranean Artificial Intelligence and Networking Conference (MAIN)

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3596105